Commit Graph

1803 Commits

Author SHA1 Message Date
Ulrich Weigand
342f805812 Use vselect in NaN canonicalization pass. (#5192)
Change add_nan_canon_seq to use vselect instead of bitselect.
This is more straightforward and removes bitcast operations.
Codegen should be unchanged.
2022-11-03 20:36:38 +00:00
Ulrich Weigand
137a8b710f Move bitselect->vselect optimization to x64 back-end (#5191)
The simplifier was performing an optimization to replace bitselect
with vselect if the all bytes of the condition mask could be shown
to be all ones or all zeros.

This optimization only ever made any difference in codegen on the
x64 target.  Therefore, move this optimization to the x64 back-end
and perform it in ISLE instead.  Resulting codegen should be
unchanged, with slightly improved compile time.

This also eliminates a few endian-dependent bitcast operations.
2022-11-03 20:17:36 +00:00
Afonso Bordado
3ef30b5b67 cranelift: Rename i{min,max} to s{min,max} (#5187)
This brings these instructions with our general naming convention
of signed instructions being prefixed with `s`.
2022-11-03 18:20:33 +00:00
Afonso Bordado
2c69b94744 cranelift: Add support for bswap.i128 (#5186)
* fuzzgen: Request only one variable for bswap

This was included by accident. Bswap only has one input, instead of two.

* cranelift: Add `bswap.i128` support

Adds support only for x86, AArch64, S390X.

RISCV does not yet have bswap.
2022-11-03 18:03:37 +00:00
Alex Crichton
22159848c5 Fix instruction size test for Rust 1.65.0 (#5188)
Looks like Rust generously shrank our `enum` in 1.65.0, so update the
test assertion to pass CI.
2022-11-03 16:53:51 +00:00
Trevor Elliott
aeceea28e2 Remove trapif and trapff (#5162)
This branch removes the trapif and trapff instructions, in favor of using an explicit comparison and trapnz. This moves us closer to removing iflags and fflags, but introduces the need to implement instructions like iadd_cout in the x64 and aarch64 backends.
2022-11-03 09:25:11 -07:00
Ulrich Weigand
961107ec63 Merge raw_bitcast and bitcast (#5175)
- Allow bitcast for vectors with differing lane widths
- Remove raw_bitcast IR instruction
- Change all users of raw_bitcast to bitcast
- Implement support for no-op bitcast cases across backends

This implements the second step of the plan outlined here:
https://github.com/bytecodealliance/wasmtime/issues/4566#issuecomment-1234819394
2022-11-02 10:16:27 -07:00
Alex Crichton
2afaac5181 Return anyhow::Error from host functions instead of Trap, redesign Trap (#5149)
* Return `anyhow::Error` from host functions instead of `Trap`

This commit refactors how errors are modeled when returned from host
functions and additionally refactors how custom errors work with `Trap`.
At a high level functions in Wasmtime that previously worked with
`Result<T, Trap>` now work with `Result<T>` instead where the error is
`anyhow::Error`. This includes functions such as:

* Host-defined functions in a `Linker<T>`
* `TypedFunc::call`
* Host-related callbacks like call hooks

Errors are now modeled primarily as `anyhow::Error` throughout Wasmtime.
This subsequently removes the need for `Trap` to have the ability to
represent all host-defined errors as it previously did. Consequently the
`From` implementations for any error into a `Trap` have been removed
here and the only embedder-defined way to create a `Trap` is to use
`Trap::new` with a custom string.

After this commit the distinction between a `Trap` and a host error is
the wasm backtrace that it contains. Previously all errors in host
functions would flow through a `Trap` and get a wasm backtrace attached
to them, but now this only happens if a `Trap` itself is created meaning
that arbitrary host-defined errors flowing from a host import to the
other side won't get backtraces attached. Some internals of Wasmtime
itself were updated or preserved to use `Trap::new` to capture a
backtrace where it seemed useful, such as when fuel runs out.

The main motivation for this commit is that it now enables hosts to
thread a concrete error type from a host function all the way through to
where a wasm function was invoked. Previously this could not be done
since the host error was wrapped in a `Trap` that didn't provide the
ability to get at the internals.

A consequence of this commit is that when a host error is returned that
isn't a `Trap` we'll capture a backtrace and then won't have a `Trap` to
attach it to. To avoid losing the contextual information this commit
uses the `Error::context` method to attach the backtrace as contextual
information to ensure that the backtrace is itself not lost.

This is a breaking change for likely all users of Wasmtime, but it's
hoped to be a relatively minor change to workaround. Most use cases can
likely change `-> Result<T, Trap>` to `-> Result<T>` and otherwise
explicit creation of a `Trap` is largely no longer necessary.

* Fix some doc links

* add some tests and make a backtrace type public (#55)

* Trap: avoid a trailing newline in the Display impl

which in turn ends up with three newlines between the end of the
backtrace and the `Caused by` in the anyhow Debug impl

* make BacktraceContext pub, and add tests showing downcasting behavior of anyhow::Error to traps or backtraces

* Remove now-unnecesary `Trap` downcasts in `Linker::module`

* Fix test output expectations

* Remove `Trap::i32_exit`

This commit removes special-handling in the `wasmtime::Trap` type for
the i32 exit code required by WASI. This is now instead modeled as a
specific `I32Exit` error type in the `wasmtime-wasi` crate which is
returned by the `proc_exit` hostcall. Embedders which previously tested
for i32 exits now downcast to the `I32Exit` value.

* Remove the `Trap::new` constructor

This commit removes the ability to create a trap with an arbitrary error
message. The purpose of this commit is to continue the prior trend of
leaning into the `anyhow::Error` type instead of trying to recreate it
with `Trap`. A subsequent simplification to `Trap` after this commit is
that `Trap` will simply be an `enum` of trap codes with no extra
information. This commit is doubly-motivated by the desire to always use
the new `BacktraceContext` type instead of sometimes using that and
sometimes using `Trap`.

Most of the changes here were around updating `Trap::new` calls to
`bail!` calls instead. Tests which assert particular error messages
additionally often needed to use the `:?` formatter instead of the `{}`
formatter because the prior formats the whole `anyhow::Error` and the
latter only formats the top-most error, which now contains the
backtrace.

* Merge `Trap` and `TrapCode`

With prior refactorings there's no more need for `Trap` to be opaque or
otherwise contain a backtrace. This commit parse down `Trap` to simply
an `enum` which was the old `TrapCode`. All various tests and such were
updated to handle this.

The main consequence of this commit is that all errors have a
`BacktraceContext` context attached to them. This unfortunately means
that the backtrace is printed first before the error message or trap
code, but given all the prior simplifications that seems worth it at
this time.

* Rename `BacktraceContext` to `WasmBacktrace`

This feels like a better name given how this has turned out, and
additionally this commit removes having both `WasmBacktrace` and
`BacktraceContext`.

* Soup up documentation for errors and traps

* Fix build of the C API

Co-authored-by: Pat Hickey <pat@moreproductive.org>
2022-11-02 16:29:31 +00:00
Alex Crichton
cd53bed898 Implement AOT compilation for components (#5160)
* Pull `Module` out of `ModuleTextBuilder`

This commit is the first in what will likely be a number towards
preparing for serializing a compiled component to bytes, a precompiled
artifact. To that end my rough plan is to merge all of the compiled
artifacts for a component into one large object file instead of having
lots of separate object files and lots of separate mmaps to manage. To
that end I plan on eventually using `ModuleTextBuilder` to build one
large text section for all core wasm modules and trampolines, meaning
that `ModuleTextBuilder` is no longer specific to one module. I've
extracted out functionality such as function name calculation as well as
relocation resolving (now a closure passed in) in preparation for this.

For now this just keeps tests passing, and the trajectory for this
should become more clear over the following commits.

* Remove component-specific object emission

This commit removes the `ComponentCompiler::emit_obj` function in favor
of `Compiler::emit_obj`, now renamed `append_code`. This involved
significantly refactoring code emission to take a flat list of functions
into `append_code` and the caller is responsible for weaving together
various "families" of functions and un-weaving them afterwards.

* Consolidate ELF parsing in `CodeMemory`

This commit moves the ELF file parsing and section iteration from
`CompiledModule` into `CodeMemory` so one location keeps track of
section ranges and such. This is in preparation for sharing much of this
code with components which needs all the same sections to get tracked
but won't be using `CompiledModule`. A small side benefit from this is
that the section parsing done in `CodeMemory` and `CompiledModule` is no
longer duplicated.

* Remove separately tracked traps in components

Previously components would generate an "always trapping" function
and the metadata around which pc was allowed to trap was handled
manually for components. With recent refactorings the Wasmtime-standard
trap section in object files is now being generated for components as
well which means that can be reused instead of custom-tracking this
metadata. This commit removes the manual tracking for the `always_trap`
functions and plumbs the necessary bits around to make components look
more like modules.

* Remove a now-unnecessary `Arc` in `Module`

Not expected to have any measurable impact on performance, but
complexity-wise this should make it a bit easier to understand the
internals since there's no longer any need to store this somewhere else
than its owner's location.

* Merge compilation artifacts of components

This commit is a large refactoring of the component compilation process
to produce a single artifact instead of multiple binary artifacts. The
core wasm compilation process is refactored as well to share as much
code as necessary with the component compilation process.

This method of representing a compiled component necessitated a few
medium-sized changes internally within Wasmtime:

* A new data structure was created, `CodeObject`, which represents
  metadata about a single compiled artifact. This is then stored as an
  `Arc` within a component and a module. For `Module` this is always
  uniquely owned and represents a shuffling around of data from one
  owner to another. For a `Component`, however, this is shared amongst
  all loaded modules and the top-level component.

* The "module registry" which is used for symbolicating backtraces and
  for trap information has been updated to account for a single region
  of loaded code holding possibly multiple modules. This involved adding
  a second-level `BTreeMap` for now. This will likely slow down
  instantiation slightly but if it poses an issue in the future this
  should be able to be represented with a more clever data structure.

This commit additionally solves a number of longstanding issues with
components such as compiling only one host-to-wasm trampoline per
signature instead of possibly once-per-module. Additionally the
`SignatureCollection` registration now happens once-per-component
instead of once-per-module-within-a-component.

* Fix compile errors from prior commits

* Support AOT-compiling components

This commit adds support for AOT-compiled components in the same manner
as `Module`, specifically adding:

* `Engine::precompile_component`
* `Component::serialize`
* `Component::deserialize`
* `Component::deserialize_file`

Internally the support for components looks quite similar to `Module`.
All the prior commits to this made adding the support here
(unsurprisingly) easy. Components are represented as a single object
file as are modules, and the functions for each module are all piled
into the same object file next to each other (as are areas such as data
sections). Support was also added here to quickly differentiate compiled
components vs compiled modules via the `e_flags` field in the ELF
header.

* Prevent serializing exported modules on components

The current representation of a module within a component means that the
implementation of `Module::serialize` will not work if the module is
exported from a component. The reason for this is that `serialize`
doesn't actually do anything and simply returns the underlying mmap as a
list of bytes. The mmap, however, has `.wasmtime.info` describing
component metadata as opposed to this module's metadata. While rewriting
this section could be implemented it's not so easy to do so and is
otherwise seen as not super important of a feature right now anyway.

* Fix windows build

* Fix an unused function warning

* Update crates/environ/src/compilation.rs

Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>

Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
2022-11-02 15:26:26 +00:00
Trevor Elliott
09d8df6fab Switch to x64_rbp to avoid the use of a pinned register (#5168)
Avoid a use of preg_rpb in the x64 backend, using x64_rbp instead.
2022-11-01 13:23:33 -07:00
Trevor Elliott
0c53941364 Remove the need for count_operands by restructuring emit in s390x (#5164)
Remove the need for count_operands by restructuring emit in the s390x backend to instead take the AllocationConsumer as an argument.
2022-11-01 10:05:45 -07:00
Nick Fitzgerald
d0673ff7da Cranelift: use iterators instead of indexing; clean up match expressions (#5161) 2022-11-01 09:48:40 -07:00
Nick Fitzgerald
6d289723bd Cranelift: Use a single, shared vector allocation for all ABIArgs (#5127)
* Cranelift: Use a single, shared vector allocation for all `ABIArg`s

Instead of two `SmallVec`s per `SigData`.

* Remove `Deref` and `DerefMut` impls for `ArgsAccumulator`
2022-10-31 14:32:17 -07:00
Afonso Bordado
04fa8951ba aarch64: Deduplicate bmask lowering rule (#5154) 2022-10-31 12:44:35 -07:00
11evan
4ca9e82bd1 cranelift: Add Bswap instruction (#1092) (#5147)
Adds Bswap to the Cranelift IR. Implements the Bswap instruction
in the x64 and aarch64 codegen backends. Cranelift users can now:
```
builder.ins().bswap(value)
```
to get a native byteswap instruction.

* x64: implements the 32- and 64-bit bswap instruction, following
the pattern set by similar unary instrutions (Neg and Not) - it
only operates on a dst register, but is parameterized with both
a src and dst which are expected to be the same register.

As x64 bswap instruction is only for 32- or 64-bit registers,
the 16-bit swap is implemented as a rotate left by 8.

Updated x64 RexFlags type to support emitting for single-operand
instructions like bswap

* aarch64: Bswap gets emitted as aarch64 rev16, rev32,
or rev64 instruction as appropriate.

* s390x: Bswap was already supported in backend, just had to add
a bit of plumbing

* For completeness, added bswap to the interpreter as well.

* added filetests and runtests for each ISA

* added bswap to fuzzgen, thanks to afonso360 for the code there

* 128-bit swaps are not yet implemented, that can be done later
2022-10-31 19:30:00 +00:00
Afonso Bordado
2fb76be2e4 x64: Add bmask implementation (#5148) 2022-10-28 17:17:22 -07:00
Afonso Bordado
879b52825f cranelift: Implement ineg.i128 for everyone (#5129)
* cranelift: Add `ineg` runtests

* aarch64: Implement `ineg.i128`

* x64: Implement `ineg.i128`

* riscv: Implement `ineg.i128`

* fuzzgen: Enable `ineg.i128`
2022-10-28 16:10:00 -07:00
Afonso Bordado
e8f3d03bbe cranelift: Mask high bits on bmask for types smaller than a register (#5118)
* aarch64: Fix incorrect masking for small types on bmask

`bmask` was accidentally relying on the uppermost bits of the register
for small types.

This was found by fuzzgen,  when it generated a shift left followed by
a bmask, the shift left shifted the bits out of the range of the input
type (i8), however these are not automatically cleared since they
remained inside the 32 bits of the register.

That caused issues when the bmask tried to compare the whole register
instead of just the bottom bits. The solution here is to mask the upper
bits for small types.

* aarch64: Emit 32bit cmp on bmask

This fixes an issue where bmask was accidentally comparing the
upper bits of the register by always using a 64bit cmp.

* riscv: Mask high bits in bmask

* riscv: Add compile tests for br{z,nz}

* riscv: Use shifts to mask 32bit values

This produces less code than the AND since that version needs to
load an immediate constant from memory.

* cranelift: Update test input to hexadecimal values

This makes it a bit more clear what is being tested.

* riscv: Use addiw for masking 32 bit values

Co-authored-by: Trevor Elliott <telliott@fastly.com>

* aarch64: Update bmask rule priority

Co-authored-by: Trevor Elliott <telliott@fastly.com>
2022-10-27 09:45:39 -07:00
Trevor Elliott
02620441c3 Add uadd_overflow_trap (#5123)
Add a new instruction uadd_overflow_trap, which is a fused version of iadd_ifcout and trapif. Adding this instruction removes a dependency on the iflags type, and would allow us to move closer to removing it entirely.

The instruction is defined for the i32 and i64 types only, and is currently only used in the legalization of heap_addr.
2022-10-27 09:43:15 -07:00
Afonso Bordado
4867813f77 cranelift: Remove copy instruction (#5125) 2022-10-25 17:27:33 -07:00
Ulrich Weigand
b61e678309 s390x: Fix more regalloc checker errors (#5121)
For VecInsertLane[Undef] and VecExtractLane, if lane_reg is zero_reg(),
the instruction does not actually use any register value.

Fixes https://github.com/bytecodealliance/wasmtime/issues/5090
2022-10-25 18:04:31 +00:00
Ulrich Weigand
39b3b1d772 s390x: Fix handling of sret arguments (#5116)
Skip synthetic StructReturn entries in the return value list.
Fixes https://github.com/bytecodealliance/wasmtime/issues/5089
2022-10-25 10:40:10 -07:00
Chris Fallin
e62e530b7c egraphs: fix fill-in-the-types logic for multiple projections of one value. (#5112)
In particular, this was found to happen in #5099 because a `Result`
projection node was not deduplicating across two separate `isplit`s that
created it. (This is a separate issue we should also fix; `needs_dedup`
is I think overly conservative because `Result` can project out a single
value from a pure or impure node, but the projection itself should be
treated like any other pure operator.)

In any case, if we have a value `v0` and two separate `Result { value:
v0, result: N, ty }` nodes, each of these will fill in the type `ty` for
the `N`th output of `v0`, and the second will idempotently overwrite the
first; we should loosen the assert so that it allows this case.

Fixes #5099. Fixes #5100.
2022-10-25 05:22:28 +00:00
Nick Fitzgerald
097d1087e0 Cranelift: Avoid calling ensure_struct_return_pointer_is_returned and cloning sigs for every call (#5113)
* Cranelift: pass iterators to `ABIMachineSpec::compute_arg_locs`

Instead of slices. This gives us more flexibility to pass custom sequences
without needing to allocate a `Vec` to hold them and pass in as a slice.

* Cranelift: Avoid cloning `ir::Signature`s in `SigData::from_func_sig`

This avoids two heap allocations per signature that are unnecessary 99% of the
time.

* fix typo

* Simplify condition in `missing_struct_return`
2022-10-24 17:21:34 -07:00
Trevor Elliott
ec12415b1f cranelift: Remove redundant branch and select instructions (#5097)
As discussed in the 2022/10/19 meeting, this PR removes many of the branch and select instructions that used iflags, in favor if using brz/brnz and select in their place. Additionally, it reworks selectif_spectre_guard to take an i8 input instead of an iflags input.

For reference, the removed instructions are: br_icmp, brif, brff, trueif, trueff, and selectif.
2022-10-24 16:14:35 -07:00
Afonso Bordado
c8791073d6 cranelift: Remove iconst.i128 (#5075)
* cranelift: Remove iconst.i128

* bugpoint: Report Changed when only one instruction is mutated

* cranelift: Fix egraph bxor rule

* cranelift: Remove some simple_preopt opts for i128
2022-10-24 12:43:28 -07:00
Ulrich Weigand
bfcf6616fe s390x: clean up remnants of non-SSA code generation (#5096)
Eliminate a few remaining instances of non-SSA code.
Remove infrastructure previously used for non-SSA code emission.
Related cleanup around flags handling.
2022-10-24 12:40:50 -07:00
Nick Fitzgerald
442f9fa01b Cranelift: pass iterators to ABIMachineSpec::compute_arg_locs (#5095)
Instead of slices. This gives us more flexibility to pass custom sequences
without needing to allocate a `Vec` to hold them and pass in as a slice.
2022-10-21 16:08:09 -07:00
Nick Fitzgerald
5c5fa192f7 Cranelift: use .enumerate() to avoid indexing in s390x backend (#5094)
This can help rustc/llvm avoid bounds checks, but more importantly I will have
future changes here that remove indexing of params, and instead hand them out as
an iterator.
2022-10-21 13:08:56 -07:00
Nick Fitzgerald
4a66c3b855 Cranelift: Remove duplicate IR signature legalizations (#5093)
The `SigData::from_func_sig` constructor will already ensure that the struct
return pointer is returned, so this is a purely unnecessary call.

Note that this is not a performance speed up, since
`ensure_struct_return_ptr_is_returned` doesn't do any significant work if the
signature is already legalized.
2022-10-21 13:08:44 -07:00
Ulrich Weigand
9dadba60a0 s390x: use constraints for call arguments and return values (#5092)
Use the regalloc constraint-based CallArgList / CallRetList
mechanism instead of directly using physregs in instructions.
2022-10-21 11:01:22 -07:00
Chris Fallin
86e77953f8 Fix some egraph-related issues. (#5088)
This fixes #5086 by addressing two separate issues:

- The `ValueDataPacked::set_type()` helper had an embarrassing bitfield-manipulation bug that would mangle the rest of a `ValueDef` when setting its type. This is not normally used, only when the egraph elaboration fills in types after-the-fact on a multi-value node.
- The lowering rules for `isplit` on aarch64 and s390x were dispatching on the first output type, rather than the input type. When only the second output is used (as in the example in #5086), the first output type actually remains `INVALID` (and this is fine because it's never used).
2022-10-21 10:24:48 -07:00
Trevor Elliott
d9753fac2b Remove uses of reg_mod from s390x (#5073)
Remove uses of reg_mod from the s390x backend. This required moving away from using r0/r1 as the result registers from a few different pseudo instructions, standardizing instead on r2/r3. That change was necessary as regalloc2 will not correctly allocate registers that aren't listed in the allocatable set, which r0/r1 are not.

Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Co-authored-by: Chris Fallin <chris@cfallin.org>
2022-10-21 09:22:16 -07:00
Chris Fallin
c392e461a3 egraphs: a few miscellaneous compile-time optimizations. (#5072)
* egraphs: a few miscellaneous compile-time optimizations.

These optimizations together are worth about a 2% compile-time
reduction, as measured on one core with spidermonkey.wasm as an input,
using `hyperfine` on `wasmtime compile`.

The changes included are:
- Some better pre-allocation (blockparams and side-effects concatenated
  list vecs);
- Avoiding the indirection of storing list-of-types for every Pure and
  Inst node, when almost all nodes produce only a single result;
  instead, store arity and single type if it exists, and allow result
  projection nodes to fill in types otherwise;
- Pack the `MemoryState` enum into one `u32` (this together with the
  above removal of the type slice allows `Node` to
  shrink from 48 bytes to 32 bytes);
- always-inline an accessor (`entry` on `CtxHash`) that wasn't
  (`always(inline)` appears to be load-bearing, rather than just
  `inline`);
- Split the update-analysis path into two hotpaths, one for the union
  case and one for the new-node case (and the former can avoid
  recomputing for the contained node when replacing a node with
  node-and-child eclass entry).

* Review feedback.

* Fix test build.

* Fix to lowering when unused output with invalid type is present.
2022-10-19 11:05:00 -07:00
Trevor Elliott
32a7593c94 cranelift: Remove booleans (#5031)
Remove the boolean types from cranelift, and the associated instructions breduce, bextend, bconst, and bint. Standardize on using 1/0 for the return value from instructions that produce scalar boolean results, and -1/0 for boolean vector elements.

Fixes #3205

Co-authored-by: Afonso Bordado <afonso360@users.noreply.github.com>
Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Co-authored-by: Chris Fallin <chris@cfallin.org>
2022-10-17 16:00:27 -07:00
Afonso Bordado
766ecb561e fuzzgen: Always generate reachable blocks (#5034)
* fuzzgen: Always reachable blocks

* fuzzgen: Rename BlockTerminator

* fuzzgen: Rename `finalize_block`

* fuzzgen: Use `cloned` instead of map clone

Thanks @jameysharp!

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* fuzzgen: `rustfmt`

* fuzzgen: Document paramless targets

* fuzzgen:  Add `BlockTerminatorKind`

* fuzzen: Update BrTable/Switch comment

* fuzzgen: Minor cleanup

Co-authored-by: Jamey Sharp <jamey@minilop.net>
2022-10-17 12:51:20 -07:00
Chris Fallin
1aaea279e5 egraph opts: fix uextend-of-i32. (#5061)
This is a simple error in the const-prop rules: uextend was not
masking iconst's u64 immediate when extending from i32 to
i64. Arguably an iconst.i32 should not have nonzero bits in the upper
32 of its immediate, but that's a separate design question. For now,
if our invariant is that the upper bits are ignored, then it is
required to mask the bits when const-evaling a `uextend`.

Fixes #5047.
2022-10-17 12:45:49 -07:00
Nick Fitzgerald
03d77d4d6b Cranelift: Derive Copy for InstructionData (#5043)
* Cranelift: Derive `Copy` for `InstructionData`

And update `clone` calls to be copies.

* Add a test for `InstructionData`'s size
2022-10-12 07:58:27 -07:00
Chris Fallin
2be12a5167 egraph-based midend: draw the rest of the owl (productionized). (#4953)
* egraph-based midend: draw the rest of the owl.

* Rename `egg` submodule of cranelift-codegen to `egraph`.

* Apply some feedback from @jsharp during code walkthrough.

* Remove recursion from find_best_node by doing a single pass.

Rather than recursively computing the lowest-cost node for a given
eclass and memoizing the answer at each eclass node, we can do a single
forward pass; because every eclass node refers only to earlier nodes,
this is sufficient. The behavior may slightly differ from the earlier
behavior because we cannot short-circuit costs to zero once a node is
elaborated; but in practice this should not matter.

* Make elaboration non-recursive.

Use an explicit stack instead (with `ElabStackEntry` entries,
alongside a result stack).

* Make elaboration traversal of the domtree non-recursive/stack-safe.

* Work analysis logic in Cranelift-side egraph glue into a general analysis framework in cranelift-egraph.

* Apply static recursion limit to rule application.

* Fix aarch64 wrt dynamic-vector support -- broken rebase.

* Topo-sort cranelift-egraph before cranelift-codegen in publish script, like the comment instructs me to!

* Fix multi-result call testcase.

* Include `cranelift-egraph` in `PUBLISHED_CRATES`.

* Fix atomic_rmw: not really a load.

* Remove now-unnecessary PartialOrd/Ord derivations.

* Address some code-review comments.

* Review feedback.

* Review feedback.

* No overlap in mid-end rules, because we are defining a multi-constructor.

* rustfmt

* Review feedback.

* Review feedback.

* Review feedback.

* Review feedback.

* Remove redundant `mut`.

* Add comment noting what rules can do.

* Review feedback.

* Clarify comment wording.

* Update `has_memory_fence_semantics`.

* Apply @jameysharp's improved loop-level computation.

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fix suggestion commit.

* Fix off-by-one in new loop-nest analysis.

* Review feedback.

* Review feedback.

* Review feedback.

* Use `Default`, not `std::default::Default`, as per @fitzgen

Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>

* Apply @fitzgen's comment elaboration to a doc-comment.

Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>

* Add stat for hitting the rewrite-depth limit.

* Some code motion in split prelude to make the diff a little clearer wrt `main`.

* Take @jameysharp's suggested `try_into()` usage for blockparam indices.

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Take @jameysharp's suggestion to avoid double-match on load op.

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fix suggestion (add import).

* Review feedback.

* Fix stack_load handling.

* Remove redundant can_store case.

* Take @jameysharp's suggested improvement to FuncEGraph::build() logic

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Tweaks to FuncEGraph::build() on top of suggestion.

* Take @jameysharp's suggested clarified condition

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Clean up after suggestion (unused variable).

* Fix loop analysis.

* loop level asserts

* Revert constant-space loop analysis -- edge cases were incorrect, so let's go with the simple thing for now.

* Take @jameysharp's suggestion re: result_tys

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fix up after suggestion

* Take @jameysharp's suggestion to use fold rather than reduce

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fixup after suggestion

* Take @jameysharp's suggestion to remove elaborate_eclass_use's return value.

* Clarifying comment in terminator insts.

Co-authored-by: Jamey Sharp <jamey@minilop.net>
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
2022-10-11 18:15:53 -07:00
Nick Fitzgerald
e2f1ced0b6 Cranelift: Make Opcode represented as a u8 instead of u16 and remove vestigial conversion impls (#5042)
* Cranelift: Make `Opcode` represented as a `u8` instead of `u16`

* Cranelift: Remove unused conversion impls for `Opcode`

These are vestigial, left over from Peepmatic.
2022-10-11 12:57:12 -07:00
Jun Ryung Ju
39fbff92c3 cranelift: Added fp and, or, xor, not ops to interpreter. (#4999)
* cranelift: Added fp and, or, xor, not ops to interpreter.

* Formatting.

* Removed archtecture dependent test on float-bitops.
2022-10-06 18:24:45 -07:00
Chris Fallin
e95ffe4413 Fix StructReturn handling: properly mark the clobber, and offset actual rets. (#5023)
* Fix StructReturn handling: properly mark the clobber, and offset actual rets.

The legalization of `StructReturn` was causing issues in the new
call-handling code: the `StructReturn` ret was included in the `SigData` as
if it were an actual CLIF-level return value, but it is not.

Prior to using regalloc constraints for return values, we
unconditionally included rax (or the architecture's usual return
register) as a def, so it would be properly handled as "clobbered" by
the regalloc. With the new scheme, we include defs on the call only for
CLIF-level outputs. Callees with `StructReturn` args were thus not known
to clobber the return-value register, and values might be corrupted.

This PR updates the code to include a `StructReturn` ret as a clobber
rather than a returned value in the relevant spots. I observed it
causing saves/restores of rax in some CLIF that @bjorn3 provided me, but
I was having difficulty minimizing this into a test-case that I would be
comfortable including as a precise-output case (including the whole
thing verbatim would lock down a bunch of other irrelevant details and
cause test-update noise later). If we can find a more minimized example
I'm happy to include it as a filetest.

Fixes #5018.
2022-10-07 00:14:38 +00:00
Jamey Sharp
04b30acad9 Misc cleanups (#5014)
* Replace resize+copy_from_slice with extend_from_slice

Vec::resize initializes the new space, which is wasted effort if we're
just going to call `copy_from_slice` on it immediately afterward. Using
`extend_from_slice` is simpler, and very slightly faster.

If the new size were bigger than the buffer we're copying from, then it
would make sense to initialize the excess. But it isn't: it's always
exactly the same size.

* Move helpers from Context to CompiledCode

These methods only use information from Context::compiled_code, so they
should live on CompiledCode instead.

* Remove an unnecessary #[cfg_attr]

There are other uses of `#[allow(clippy::too_many_arguments)]` in this
file, so apparently it doesn't need to be guarded by the "cargo-clippy"
feature.

* Fix a few comments

Two of these were wrong/misleading:

- `FunctionBuilder::new` does not clear the provided func_ctx. It does
  debug-assert that the context is already clear, but I don't think
  that's worth a comment.

- `switch_to_block` does not "create values for the arguments." That's
  done by the combination of `append_block_params_for_function_params`
  and `declare_wasm_parameters`.

* wasmtime-cranelift: Misc cleanups

The main change is to use the `CompiledCode` reference we already had
instead of getting it out of `Context` repeatedly. This removes a bunch
of `unwrap()` calls.

* wasmtime-cranelift: Factor out uncached compile
2022-10-05 10:35:59 -07:00
Trevor Elliott
a209cb63f5 ISLE: Enable the overlap checker (#5011)
This PR turns the overlap checker on by default, requiring the use of priorities to resolve overlap between rules.
2022-10-04 21:56:49 +00:00
yuyang
07584f6ac8 fix issue 4996. (#5003) 2022-10-04 11:18:42 -07:00
Trevor Elliott
c9ff14e00b Resolve overlap in the s390x backend (#5002)
Resolve overlap in the s390x backend by adding rule priorities to disambiguate rule order.
2022-10-03 17:06:10 -07:00
Trevor Elliott
ab4be2bdd1 ISLE: Resolve overlaps in the aarch64 backend (#4988) 2022-09-30 12:57:50 -07:00
Trevor Elliott
c1d6ca48a7 ISLE: Resolve overlap in the riscv64 backend (#4982)
Resolve overlap in the RiscV64 backend by adding priorities to rules. Additionally, one test updated as a result of this work, as a peephole optimization for addition with immediates fires now.
2022-09-29 17:22:25 -07:00
Trevor Elliott
46e42601eb ISLE: Always default the priority to 0 (#4983)
* Always default priorities to 0 in the ISLE IR
* Fix a hidden overlap in the x64 backend
2022-09-29 15:32:29 -07:00
Trevor Elliott
7d5b2b5bb2 ISLE: Add the overlap_errors pragma (#4981)
* Add the overlap_errors pragma to ISLE
* Enable overlap errors in the x64 backend
2022-09-29 12:40:39 -07:00