wasmtime

Author	SHA1	Message	Date
Chris Fallin	f980defe17	egraph support: rewrite to work in terms of CLIF data structures. (#5382 ) * egraph support: rewrite to work in terms of CLIF data structures. This work rewrites the "egraph"-based optimization framework in Cranelift to operate on aegraphs (acyclic egraphs) represented in the CLIF itself rather than as a separate data structure to which and from which we translate the CLIF. The basic idea is to add a new kind of value, a "union", that is like an alias but refers to two other values rather than one. This allows us to represent an eclass of enodes (values) as a tree. The union node allows for a value to have multiple representations: either constituent value could be used, and (in well-formed CLIF produced by correct optimization rules) they must be equivalent. Like the old egraph infrastructure, we take advantage of acyclicity and eager rule application to do optimization in a single pass. Like before, we integrate GVN (during the optimization pass) and LICM (during elaboration). Unlike the old egraph infrastructure, everything stays in the DataFlowGraph. "Pure" enodes are represented as instructions that have values attached, but that are not placed into the function layout. When entering "egraph" form, we remove them from the layout while optimizing. When leaving "egraph" form, during elaboration, we can place an instruction back into the layout the first time we elaborate the enode; if we elaborate it more than once, we clone the instruction. The implementation performs two passes overall: - One, a forward pass in RPO (to see defs before uses), that (i) removes "pure" instructions from the layout and (ii) optimizes as it goes. As before, we eagerly optimize, so we form the entire union of optimized forms of a value before we see any uses of that value. This lets us rewrite uses to use the most "up-to-date" form of the value and canonicalize and optimize that form. 
The eager rewriting and acyclic representation make each other work (we could not eagerly rewrite if there were cycles; and acyclicity does not miss optimization opportunities only because the first time we introduce a value, we immediately produce its "best" form). This design choice is also what allows us to avoid the "parent pointers" and fixpoint loop of traditional egraphs. This forward optimization pass keeps a scoped hashmap to "intern" nodes (thus performing GVN), and also interleaves on a per-instruction level with alias analysis. The interleaving with alias analysis allows alias analysis to see the most optimized form of each address (so it can see equivalences), and allows the next value to see any equivalences (reuses of loads or stored values) that alias analysis uncovers. - Two, a forward pass in domtree preorder, that "elaborates" pure enodes back into the layout, possibly in multiple places if needed. This tracks the loop nest and hoists nodes as needed, performing LICM as it goes. Note that by doing this in forward order, we avoid the "fixpoint" that traditional LICM needs: we hoist a def before its uses, so when we place a node, we place it in the right place the first time rather than moving later. This PR replaces the old (a)egraph implementation. It removes both the cranelift-egraph crate and the logic in cranelift-codegen that uses it. On `spidermonkey.wasm` running a simple recursive Fibonacci microbenchmark, this work shows 5.5% compile-time reduction and 7.7% runtime improvement (speedup). Most of this implementation was done in (very productive) pair programming sessions with Jamey Sharp, thus: Co-authored-by: Jamey Sharp <jsharp@fastly.com> * Review feedback. * Review feedback. * Review feedback. * Bugfix: cprop rule: `(x + k1) - k2` becomes `x - (k2 - k1)`, not `x - (k1 - k2)`. Co-authored-by: Jamey Sharp <jsharp@fastly.com>	2022-12-06 14:58:57 -08:00
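The cprop bugfix noted at the end of the commit message above can be checked with ordinary wrapping arithmetic. This is a minimal sketch, not Cranelift's rule code: the function names are hypothetical, and `i64` wrapping operations stand in (as an assumption) for CLIF `iadd`/`isub` semantics.

```rust
/// The unoptimized expression `(x + k1) - k2`.
/// Assumption: two's-complement wrapping models the integer semantics.
fn unoptimized(x: i64, k1: i64, k2: i64) -> i64 {
    x.wrapping_add(k1).wrapping_sub(k2)
}

/// The corrected rewrite from the bugfix: `x - (k2 - k1)`.
fn corrected_rewrite(x: i64, k1: i64, k2: i64) -> i64 {
    x.wrapping_sub(k2.wrapping_sub(k1))
}

/// The rewrite the bugfix replaced: `x - (k1 - k2)`.
/// It has the wrong sign on the folded constant.
fn buggy_rewrite(x: i64, k1: i64, k2: i64) -> i64 {
    x.wrapping_sub(k1.wrapping_sub(k2))
}
```

With `x = 10`, `k1 = 3`, `k2 = 5`, the unoptimized form and the corrected rewrite both give 8, while the old rewrite gives 12.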
Trevor Elliott	293bb5b334	riscv64: Only emit jumps at the end of basic blocks (#5381 ) This PR fixes two bugs in the riscv64 backend, where branch instructions were emitted in the middle of a basic block: (1) constant emission, where the constants are inlined into the vcode and are jumped over at runtime; and (2) the BrTableCheck pseudo-instruction, which was always emitted before a BrTable instruction, and would handle jumping to the default label. The first bug was resolved by introducing two new pseudo-instructions, LoadConst32 and LoadConst64. Both of these instructions serve to delay the original encoding to emission time, after regalloc2 has run. The second bug was fixed by removing the BrTableCheck instruction. As it was always emitted directly before BrTable, it was easier to remove it and merge the two into a single instruction.	2022-12-06 10:54:10 -08:00
Chris Fallin	feaa7ca75f	Alias analysis: refactor for use by other driver loops. (#5380 ) * Alias analysis: refactor for use by other driver loops. This PR pulls the core of the alias analysis infrastructure into a `process_inst()` method that operates on a single instruction, and allows another compiler pass to apply store-to-load forwarding and redundant-load elimination interleaved with other work. The existing behavior remains unchanged; the pass's toplevel loop calls this extracted method. This refactor is a prerequisite for using the alias analysis as part of a refactored egraph-based optimization framework. * Review feedback: remove unneeded mut.	2022-12-06 18:30:02 +00:00
Trevor Elliott	353a681671	Avoid reusing a register during constant loading (#5379 ) Avoid reusing a register when loading a constant, allocating a temporary instead.	2022-12-05 18:37:53 -08:00
Alex Crichton	4933762d81	Add release notes for 3.0.1 and update some versions (#5364 ) * Add release notes for 3.0.1 * Update some version directives for crates in Wasmtime * Mark anything with `publish = false` as version 0.0.0 * Mark the icache coherence crate with the same version as Wasmtime * Fix manifest directives	2022-12-06 01:26:39 +00:00
Trevor Elliott	7d28d586da	riscv64: Don't reuse registers when loading constants (#5376 ) Rework the constant loading functions in the riscv64 backend to generate fresh temporaries instead of reusing the destination register.	2022-12-05 16:51:52 -08:00
Saúl Cabrera	28cfa57533	cranelift: Small documentation fixes (#5377 ) * `translate_operator` doesn't return a boolean. * `from_base_offset` doesn't panic if offset is smaller than base.	2022-12-06 00:46:58 +00:00
Trevor Elliott	817c2b205c	riscv64: Use a temporary when translating shift amount (#5375 ) Use a temporary when translating the shift amount, instead of reusing the destination register.	2022-12-05 20:54:14 +00:00
Trevor Elliott	b475b9bd19	Terminate blocks with a single branch in riscv64 (#5374 ) Ensure that we're terminating blocks with a single branch instruction, when testing I128 values against zero.	2022-12-05 20:13:28 +00:00
Anton Romanov	29d4d1063f	[codegen] Fixed mutability of domtree reference (#5371 )	2022-12-05 08:19:32 -08:00
Trevor Elliott	6aea8e0d7e	Don't reuse destination registers when lowering splat on aarch64 (#5370 )	2022-12-05 08:18:49 -08:00
wasmtime-publish	a28d4d3c89	Bump Wasmtime to 5.0.0 (#5372 ) Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>	2022-12-05 08:38:57 -06:00
Trevor Elliott	2e9b0802ab	aarch64: Rework amode compilation to produce SSA code (#5369 ) Rework the compilation of amodes in the aarch64 backend to stop reusing registers and instead generate fresh virtual registers for intermediates. This resolves some SSA checker violations with the aarch64 backend, and as a nice side-effect removes some unnecessary movs in the generated code.	2022-12-02 01:23:15 +00:00
Trevor Elliott	d54a27d0ea	Allocate temporary intermediates when loading constants on aarch64 (#5366 ) As loading constants on aarch64 can take up to 4 instructions, we need to plumb through some additional registers. Rather than pass a fixed list of registers in, pass an allocation function.	2022-12-01 22:29:36 +00:00
Nick Fitzgerald	4510a4a805	Cranelift: mark post-legalization trapping blocks as cold (#5367 ) Trapping is a rare event.	2022-12-01 12:46:26 -08:00
Nick Fitzgerald	1eeec7b698	cranelift-wasm: Remove `ModuleTranslationState` (#5365 ) * cranelift-wasm: Remove `ModuleTranslationState` We were putting data into it, but never reading data out of it. Can be removed. * cranelift-wasm: move `funct_state.rs` sub module to `state.rs` Since it is the only submodule of `state` it can just be the whole `state` module.	2022-12-01 19:04:36 +00:00
Trevor Elliott	37c3c5b1e0	Remove an unnecessary debug trace (#5359 )	2022-11-30 20:37:20 -08:00
Trevor Elliott	c16f2956db	Allocate a temporary for 64-bit constant loads in the s390x backend (#5357 ) Avoid reusing a destination virtual register for 64-bit constants in the s390x backend. This change addresses a case identified by the regalloc2 ssa validator, as the destination register was written to twice when constants were generated via the MachInst::gen_constant function.	2022-11-30 17:01:14 -08:00
Jamey Sharp	0e65f87e37	cranelift-isle: Reject unreachable rules (#5322 ) Some of our ISLE rules can never fire because there's a higher-priority rule that will always fire instead. Sometimes the worst that can happen is we generate sub-optimal output. That's not so bad but we'd still like to know about it so we can fix it. In other cases there might be instructions which can't be lowered in isolation. If a general rule for lowering one of the instructions is higher-priority than the rule for lowering the combined sequence, then lowering the combined sequence will always fail. Either way, this is always a bug, so make it a fatal error if we can detect it.	2022-11-30 15:06:00 -08:00
Trevor Elliott	d8dbabfe6b	Don't reuse registers in the x64 div lowering (#5356 ) Introduce a temporary for an intermediate value in the lowering of div in the x64 backend. Additionally, add a src argument to the shift_r smart constructor, which is why the diff got larger than just the div lowering.	2022-11-30 22:44:59 +00:00
Trevor Elliott	87b63174b1	Don't reuse registers in make_i64x2_from_lanes (#5355 ) Avoid reusing output registers in make_i64x2_from_lanes by threading the output name instead, and using smart constructors for x64_pinsrd instead of constructing the instructions directly.	2022-11-30 14:37:01 -08:00
Nick Fitzgerald	79f7fa6079	Cranelift: implement `heap_{load,store}` instruction legalization (#5351 ) * Cranelift: implement `heap_{load,store}` instruction legalization This does not remove `heap_addr` yet, but it does factor out the common bounds-check-and-compute-the-native-address functionality that is shared between all of `heap_{addr,load,store}`. Finally, this adds a missing optimization for when we can dedupe explicit bounds checks for static memories and Spectre mitigations. * Cranelift: Enable `heap_load_store_*` run tests on all targets	2022-11-30 19:12:49 +00:00
Alex Crichton	830885383f	Implement inline stack probes for AArch64 (#5353 ) * Turn off probestack by default in Cranelift The probestack feature is not implemented for the aarch64 and s390x backends and currently the on-by-default status requires the aarch64 and s390x implementations to be a stub. Turning off probestack by default allows the s390x and aarch64 backends to panic with an error message to avoid providing a false sense of security. When the probestack option is implemented for all backends, however, it may be reasonable to re-enable. * aarch64: Improve codegen for AMode fallback Currently the final fallback for finalizing an `AMode` will generate both a constant-loading instruction as well as an `add` instruction to the base register into the same temporary. This commit improves the codegen by removing the `add` instruction and folding the final add into the finalized `AMode`. This changes the `extendop` used but both registers are 64-bit so shouldn't be affected by the extending operation. * aarch64: Implement inline stack probes This commit implements inline stack probes for the aarch64 backend in Cranelift. The support here is modeled after the x64 support where unrolled probes are used up to a particular threshold after which a loop is generated. The instructions here are similar in spirit to x64 except that unlike x64 the stack pointer isn't modified during the unrolled loop to avoid needing to re-adjust it back up at the end of the loop. * Enable inline probestack for AArch64 and Riscv64 This commit enables inline probestacks for the AArch64 and Riscv64 architectures in the same manner that x86_64 has it enabled now. Some more testing was additionally added since on Unix platforms we should be guaranteed that Rust's stack overflow message is now printed too. 
* Enable probestack for aarch64 in cranelift-fuzzgen * Address review comments * Remove implicit stack overflow traps from x64 backend This commit removes implicit `StackOverflow` traps inserted by the x64 backend for stack-based operations. This was historically required when stack overflow was detected with page faults but Wasmtime no longer requires that since it's not suitable for wasm modules which call host functions. Additionally no other backend implements this form of implicit trap-code additions so this is intended to synchronize the behavior of all the backends. This fixes a test added prior for aarch64 to properly abort the process instead of accidentally being caught by Wasmtime. * Fix a style issue	2022-11-30 12:30:00 -06:00
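The strategy described in the commit above (unrolled probes up to a threshold, a loop beyond it) can be sketched as a small planning function. This is an illustration only, not the backend's code: `ProbePlan`, `plan_probes`, and the threshold value `MAX_UNROLLED_PROBES` are all hypothetical names and numbers chosen for the example.

```rust
/// How a hypothetical backend might choose to emit inline stack probes.
#[derive(Debug, PartialEq)]
enum ProbePlan {
    /// The frame is smaller than one guard region; no probe is needed.
    None,
    /// Emit one probe per listed offset below the (unmodified) stack pointer.
    Unrolled(Vec<u32>),
    /// Emit a loop that performs `probe_count` probes.
    Loop { probe_count: u32 },
}

/// Assumed unroll threshold; the real value is not given in the commit message.
const MAX_UNROLLED_PROBES: u32 = 3;

/// Decide how to probe a frame of `frame_size` bytes when each probe
/// must touch one location per `guard_size` bytes.
fn plan_probes(frame_size: u32, guard_size: u32) -> ProbePlan {
    assert!(guard_size > 0, "guard size must be nonzero");
    let probe_count = frame_size / guard_size;
    if probe_count == 0 {
        ProbePlan::None
    } else if probe_count <= MAX_UNROLLED_PROBES {
        // Offsets are relative to the stack pointer, which stays fixed,
        // so nothing needs re-adjusting after the probes.
        ProbePlan::Unrolled((1..=probe_count).map(|i| i * guard_size).collect())
    } else {
        ProbePlan::Loop { probe_count }
    }
}
```

Under these assumptions, an 8 KiB frame with a 4 KiB guard size yields two unrolled probes, and a 64 KiB frame falls back to a 16-iteration loop.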
Timothy Chen	67fc5389b0	Remove sig data arg and ret fields to reduce size (#5319 ) * Remove sig data arg and ret fields to reduce size * Update cranelift/codegen/src/machinst/abi.rs Co-authored-by: Jamey Sharp <jamey@minilop.net> * Update cranelift/codegen/src/machinst/abi.rs Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fix offsets * Add comment Co-authored-by: Jamey Sharp <jamey@minilop.net>	2022-11-30 07:19:41 -08:00
Alex Crichton	86acb9a438	Use workspace inheritance for some more dependencies (#5349 ) Deduplicate some dependency directives through `[workspace.dependencies]`	2022-11-29 22:32:56 +00:00
Nick Fitzgerald	2ad3f78624	Cranelift: fix `heap_{load,store}` test generator script (#5348 ) Was missing some '$' characters and so was comparing string literals against string literals instead of variable values against string literals. Regenerated tests to fix them and add missing tests.	2022-11-29 20:53:14 +00:00
Nick Fitzgerald	913a2ec8c8	Cranelift: consider heap's guard pages when legalizing `heap_addr` (#5335 ) * Cranelift: consider heap's guard pages when legalizing `heap_addr` Fixes #5328 * Update comment to align more directly with implementation * Add legalization tests for `heap_addr` and offset guard pages	2022-11-29 19:54:25 +00:00
Trevor Elliott	f138fc0ed3	Bump regalloc2 to 0.5.0 (#5345 ) * Bump the regalloc2 dependency to 0.5.0 * Replace preg_set_from_machine_env with PRegSet::from * Vet the regalloc2 update	2022-11-29 11:25:35 -08:00
Jamey Sharp	3b76874834	cranelift-isle: Fix representation for overlap checks (#5337 ) Ulrich Weigand identified two bugs in this code due to it falsely claiming there were unreachable rules in the s390x backend. The fixes are: - Add constraints for pure constructors. I didn't notice that a constructor which is declared pure (which currently implies that it is fallible), when used on the left-hand side of a rule, can cause the rule to fail to match. Therefore, any constructors on the left-hand side must be noted as additional constraints on the rule, so that overlap checking can see them. - Ignore subset-overlaps for rules with equality constraints This eliminates false positives when checking for unreachable rules. It introduces false negatives instead but we prefer to fail to detect an error instead of claiming that valid input is wrong. We can implement a more accurate check later.	2022-11-29 11:02:12 -08:00
Afonso Bordado	ec342c20e3	cranelift: Add `iadd_cout` lowerings for aarch64 (#5177 ) * cranelift: Add `iadd_cout`/`isub_bout` i128 tests * aarch64: Add `iadd_cout` lowerings * fuzzgen: Add `iadd_cout`	2022-11-29 10:58:44 -08:00
Jamey Sharp	ff5abfd993	cranelift-isle: Minor error-handling cleanups (#5338 ) - Remove remaining references to Miette - Borrow implementation of `line_starts` from codespan-reporting - Clean up a use of `Result` that no longer conflicts with a local definition - When printing plain errors, add a blank line between errors for readability	2022-11-29 03:07:05 +00:00
Trevor Elliott	a5a0645aff	Don't allow reuse_def constraints in the s390x Loop instruction (#5336 )	2022-11-28 17:52:11 -08:00
Trevor Elliott	368004428a	Fix rule shadowing instances in x64 and aarch64 backends (#5334 ) Fix shadowing identified in #5322 for imul and swiden_high/swiden_low/uwiden_high/uwiden_low combinations in the x64 backend, and remove some redundant rules from the aarch64 dynamic neon ruleset. Additionally, add tests to the x64 backend showing that the imul specializations are firing.	2022-11-28 15:48:34 -08:00
Nick Fitzgerald	58a5089e48	Cranelift: log number of CLIF insts/blocks to optimize/lower (#5333 )	2022-11-28 19:35:29 +00:00
Nick Fitzgerald	6fe69d00ca	Cranelift: add debug logs counting how many vcode instructions and blocks we lower to (#5332 )	2022-11-28 18:57:02 +00:00
Trevor Elliott	54a6d2f79a	Generate more fixed_nonallocatable constraints, and add debug assertions (#5132 ) Add assertions to the OperandCollector that show we're not using pinned vregs, and use reg_fixed_nonallocatable constraints when a real register is used with other constraint generation functions like reg_use etc.	2022-11-28 10:31:56 -08:00
Rodrigo Batista de Moraes	28cf995fd3	cranelift-frontend: make `FunctionBuilder::finalize` consume self (#5316 )	2022-11-23 23:41:52 +00:00
Jamey Sharp	044b57f334	cranelift-isle: Rewrite error reporting (#5318 ) There were several issues with ISLE's existing error reporting implementation. - When using Miette for more readable error reports, it would panic if errors were reported from multiple files in the same run. - Miette is pretty heavy-weight for what we're doing, with a lot of dependencies. - The `Error::Errors` enum variant led to normalization steps in many places, to avoid using that variant to represent a single error. This commit: - replaces Miette with codespan-reporting - gets rid of a bunch of cargo-vet exemptions - replaces the `Error::Errors` variant with a new `Errors` type - removes source info from `Error` variants so they're easy to construct - adds source info only when formatting `Errors` - formats `Errors` with a custom `Debug` impl - shares common code between ISLE's callers, islec and cranelift-codegen - includes a source snippet even with fancy-errors disabled I tried to make this a series of smaller commits but I couldn't find any good split points; everything was too entangled with everything else.	2022-11-23 14:20:48 -08:00
Timothy Chen	48ee42efc2	Refactor Sigdata methods with sigset (#5307 ) * Refactor sigdata methods * Update cranelift/codegen/src/machinst/abi.rs Co-authored-by: Jamey Sharp <jamey@minilop.net> * Address comments Co-authored-by: Jamey Sharp <jamey@minilop.net>	2022-11-22 09:03:51 -08:00
Nick Fitzgerald	d0d3245a35	Cranelift: Add `heap_load` and `heap_store` instructions (#5300 ) * Cranelift: Define `heap_load` and `heap_store` instructions * Cranelift: Implement interpreter support for `heap_load` and `heap_store` * Cranelift: Add a suite runtests for `heap_{load,store}` There are so many knobs we can twist for heaps and I wanted to exhaustively test all of them, so I wrote a script to generate the tests. I've checked in the script in case we want to make any changes in the future, but I don't think it is worth adding this to CI to check that scripts are up to date or anything like that. * Review feedback	2022-11-21 23:00:39 +00:00
Trevor Elliott	54cfa4df34	cranelift: Fix implicit pointer argument register use (#5301 ) * Fix arg handling to write to VRegs instead of physical regs * Make is_included_in_clobbers required, and handle Args on x64 and riscv64	2022-11-18 16:47:03 -08:00
Jamey Sharp	54207d343e	cranelift-isle: Specialize for Term at rule root (#5295 ) In #5174 we decided it doesn't make sense for a rule to have a bind-pattern at the root of its left-hand side. There's no Rust value corresponding to the root value of such a term, because it actually represents a function declaration with one or more arguments. This commit takes that to its logical conclusion. `sema::Rule` previously had an `lhs` field whose value must always be a `Pattern::Term` variant, and anyone using that structure had to deal with the possibility of finding the wrong variant there. Now the relevant fields from that variant are stored directly in `Rule` instead. Also, the (tiny!) portion of `translate_pattern` which applied when the pattern was the root term is now inlined in `collect_rules`. Because `translate_pattern` no longer has to special-case the root term, we can delete its `rule_term` and `is_root` arguments. That brings it down to a more manageable four arguments, which means many calls fit on one line now.	2022-11-18 11:21:08 -08:00
Jun Ryung Ju	e5f93d9ec0	cranelift: Support `bnot`, `band`, `bor`, `bxor` for x86_64. (#5036 ) * Support `bnot`, `band`, `bor`, `bxor` for x86_64. * Fix-up to handle `B{8,16,32,64}` type on bitops * Fix-up conflict.	2022-11-18 07:45:54 -08:00
Jamey Sharp	9a44ef7443	cranelift-isle: Unify expressions and bindings (#5294 ) As it turns out, that distinction was not necessary for this representation. Removing it eliminates some complexity around wrapping expressions as bindings and vice versa. It also clears up some confusion about which category to put certain constructs in (arguments and extractors) by refusing to have different categories. While I was writing this patch I also realized that `add_match_variant` and `normalize_equivalence_classes` both need to do fundamentally the same things with enum variants, so I refactored them to share code and make their relationship clearer. Finally, I reviewed all the comments in this file and fixed some places where they could be more clear.	2022-11-17 16:00:59 -08:00
Nick Fitzgerald	3b6544dc66	Fix warnings in `cranelift-codegen` docs builds (#5292 )	2022-11-17 21:13:24 +00:00
Alex Crichton	9bf2a8e663	Remove some dead code in the cranelift-wasm crate (#5290 ) * Remove some dead code in the cranelift-wasm crate * Remove some more dead code	2022-11-17 16:28:11 +00:00
Timothy Chen	de6e4a4e20	Shrink the size of SigData in Cranelift (#5261 ) * Shrink the size of SigData in Cranelift * Update cranelift/codegen/src/machinst/abi.rs Co-authored-by: Jamey Sharp <jamey@minilop.net> * Change ret arg length to u16 * Add test Co-authored-by: Jamey Sharp <jamey@minilop.net>	2022-11-17 00:15:19 +00:00
Trevor Elliott	4780bd5902	Don't use %rcx directly with CoffTlsGetAddr (#5278 ) Avoid naming %rcx as written by the CoffTlsGetAddr pseudo-instruction in the x64 backend, and instead emit a fixed-def constraint for a fresh VReg and %rcx.	2022-11-16 11:32:09 -08:00
Trevor Elliott	07bd8bf34a	Remove unnecessary moves in x64 gen_memcpy (#5277 ) Remove some unnecessary moves in the x64 gen_memcpy implementation -- the call instruction that's generated will already constrain the args to those registers.	2022-11-16 10:33:00 -08:00
Afonso Bordado	a793648eb2	cranelift: Fix `fdemote` on the interpreter (#5158 ) * cranelift: Cleanup `fdemote`/`fpromote` tests * cranelift: Fix `fdemote`/`fpromote` instruction docs The verifier fails if the input and output types are the same for these instructions * cranelift: Fix `fdemote`/`fpromote` in the interpreter * fuzzgen: Add `fdemote`/`fpromote`	2022-11-15 22:22:00 +00:00

4305 Commits