wasmtime

Author	SHA1	Message	Date
Chris Fallin	22439f7b39	support select_spectre_guard and select on i128 conditions on all platforms. (#5460 ) Fixes #5199. Fixes #5200. Fixes #5452. Fixes #5453. On riscv64, there is apparently an autoconversion from `ValueRegs` to `Reg` that takes just the low register [0], and removing this conversion causes 48 errors. As a result of this, `select` with an `i128` condition was silently miscompiling, testing only the low 64 bits. We should remove this autoconversion to ensure we aren't missing any other silent truncations, but for now this PR just adds the explicit `I128` logic for `select` / `select_spectre_guard`. [0] `d9fdbfd50e/cranelift/codegen/src/isa/riscv64/inst.isle (L1762)`	2022-12-16 14:18:22 -08:00
Trevor Elliott	25bf8e0e67	Make DataFlowGraph::insts public, but restricted (#5450 ) We have some operations defined on DataFlowGraph purely to work around borrow-checker issues with InstructionData and other data on DataFlowGraph. Part of the problem is that indexing the DFG directly hides the fact that we're only indexing the insts field of the DFG. This PR makes the insts field of the DFG public, but wraps it in a newtype that only allows indexing. This means that the borrow checker is better able to tell when operations on memory held by the DFG won't conflict, which comes up frequently when mutating ValueLists held by InstructionData.	2022-12-16 10:46:09 -08:00
Nick Fitzgerald	c0b587ac5f	Remove heaps from core Cranelift, push them into `cranelift-wasm` (#5386 ) * cranelift-wasm: translate Wasm loads into lower-level CLIF operations Rather than using `heap_{load,store,addr}`. * cranelift: Remove the `heap_{addr,load,store}` instructions These are now legalized in the `cranelift-wasm` frontend. * cranelift: Remove the `ir::Heap` entity from CLIF * Port basic memory operation tests to .wat filetests * Remove test for verifying CLIF heaps * Remove `heap_addr` from replace_branching_instructions_and_cfg_predecessors.clif test * Remove `heap_addr` from readonly.clif test * Remove `heap_addr` from `table_addr.clif` test * Remove `heap_addr` from the simd-fvpromote_low.clif test * Remove `heap_addr` from simd-fvdemote.clif test * Remove `heap_addr` from the load-op-store.clif test * Remove the CLIF heap runtest * Remove `heap_addr` from the global_value.clif test * Remove `heap_addr` from fpromote.clif runtests * Remove `heap_addr` from fdemote.clif runtests * Remove `heap_addr` from memory.clif parser test * Remove `heap_addr` from reject_load_readonly.clif test * Remove `heap_addr` from reject_load_notrap.clif test * Remove `heap_addr` from load_readonly_notrap.clif test * Remove `static-heap-without-guard-pages.clif` test Will be subsumed when we port `make-heap-load-store-tests.sh` to generating `.wat` tests. * Remove `static-heap-with-guard-pages.clif` test Will be subsumed when we port `make-heap-load-store-tests.sh` over to `.wat` tests. * Remove more heap tests These will be subsumed by porting `make-heap-load-store-tests.sh` over to `.wat` tests. * Remove `heap_addr` from `simple-alias.clif` test * Remove `heap_addr` from partial-redundancy.clif test * Remove `heap_addr` from multiple-blocks.clif test * Remove `heap_addr` from fence.clif test * Remove `heap_addr` from extends.clif test * Remove runtests that rely on heaps Heaps are not a thing in CLIF or the interpreter anymore * Add generated load/store `.wat` tests * Enable memory-related wasm features in `.wat` tests * Remove CLIF heap from fcmp-mem-bug.clif test * Add a mode for compiling `.wat` all the way to assembly in filetests * Also generate WAT to assembly tests in `make-load-store-tests.sh` * cargo fmt * Reinstate `f{de,pro}mote.clif` tests without the heap bits * Remove undefined doc link * Remove outdated SVG and dot file from docs * Add docs about `None` returns for base address computation helpers * Factor out `env.heap_access_spectre_mitigation()` to a local * Expand docs for `FuncEnvironment::heaps` trait method * Restore f{de,pro}mote+load clif runtests with stack memory	2022-12-15 00:26:45 +00:00
Trevor Elliott	9dc4f1a83c	s390x: Move the value out of the casloop_val_reg with mov_preg (#5430 ) The casloop_emit function in the s390x backend was using the fixed non-allocatable register %r0 directly with move instructions, which produced a panic in the regalloc2 checker (#5425). This PR changes the casloop_result function to use mov_preg instead of copy_reg to fetch the result, as it's not viewed by regalloc2 as a move. Fixes #5425	2022-12-14 13:06:35 -08:00
Chris Fallin	8383e4b6bd	egraph opt rules: do `(icmp cc x x) == {0,1}` only for integer types. (#5438 ) We could do these for vectors too, in theory, but for now let's fix the bug by applying the equivalence only for integer types. Fixes #5437.	2022-12-14 19:50:42 +00:00
Ulrich Weigand	f0af622208	Simplify LowerBackend interface (#5432 ) * Refactor lower_branch to have Unit result Branches cannot have any output, so it is more straightforward to have the ISLE term return Unit instead of InstOutput. Also provide a new `emit_side_effect` term to simplify implementation of `lower_branch` rules with Unit result. * Simplify LowerBackend interface Move all remaining asserts from the LowerBackend::lower and ::lower_branch_group into the common call site. Change return value of ::lower to Option<InstOutput>, and return value of ::lower_branch_group to Option<()> to match ISLE term signature. Only pass the first branch into ::lower_branch_group and rename it to ::lower_branch. As a result of all those changes, LowerBackend routines now consists solely to calls to the corresponding ISLE routines.	2022-12-14 00:48:25 +00:00
Ulrich Weigand	299be327d5	Simplify "unimplemented" operation error message (#5429 ) Now that all operations are implemented in ISLE, simplify Rust code by providing a generic error message if any operation is not implemented in ISLE. Done across all targets.	2022-12-13 15:22:49 -08:00
Jamey Sharp	eba6b76511	aarch64: Use unsigned constants where appropriate (#5423 ) The Rust type expected in these locations is unsigned, but these constants are negative. ISLE currently emits a Rust expression with extra type conversions in order to make this work as intended. However, across all backends, only these three aarch64 constants use this particular "feature" of ISLE, and I want to make it go away.	2022-12-13 23:08:40 +00:00
Ulrich Weigand	df923f18ca	Remove MachInst::gen_constant (#5427 ) * aarch64: constant generation cleanup Add support for MOVZ and MOVN generation via ISLE. Handle f32const, f64const, and nop instructions via ISLE. No longer call Inst::gen_constant from lower.rs. * riscv64: constant generation cleanup Handle f32const, f64const, and nop instructions via ISLE. * s390x: constant generation cleanup Fix rule priorities for "imm" term. Only handle 32-bit stack offsets; no longer use load_constant64. * x64: constant generation cleanup No longer call Inst::gen_constant from lower.rs or abi.rs. * Refactor LowerBackend::lower to return InstOutput No longer write to the per-insn output registers; instead, return an InstOutput vector of temp registers holding the outputs. This will allow calling LowerBackend::lower multiple times for the same instruction, e.g. to rematerialize constants. When emitting the primary copy of the instruction during lowering, writing to the per-insn registers is now done in lower_clif_block. As a result, the ISLE lower_common routine is no longer needed. In addition, the InsnOutput type and all code related to it can be removed as well. * Refactor IsleContext to hold a LowerBackend reference Remove the "triple", "flags", and "isa_flags" fields that are copied from LowerBackend to each IsleContext, and instead just hold a reference to LowerBackend in IsleContext. This will allow calling LowerBackend::lower from within callbacks in src/machinst/isle.rs, e.g. to rematerialize constants. To avoid having to pass LowerBackend references through multiple functions, eliminate the lower_insn_to_regs subroutines in those targets that still have them, and just inline into the main lower routine. This also eliminates lower_inst.rs on aarch64 and riscv64. Replace all accesses to the removed IsleContext fields by going through the LowerBackend reference. * Remove MachInst::gen_constant This addresses the problem described in issue https://github.com/bytecodealliance/wasmtime/issues/4426 that targets currently have to duplicate code to emit constants between the ISLE logic and the gen_constant callback. After the various cleanups in earlier patches in this series, the only remaining user of get_constant is put_value_in_regs in Lower. This can now be removed, and instead constant rematerialization can be performed in the put_in_regs ISLE callback by simply directly calling LowerBackend::lower on the instruction defining the constant (using a different output register). Since the check for egraph mode is now no longer performed in put_value_in_regs, the Lower::flags member becomes obsolete. Care needs to be taken that other calls directly to the Lower::put_value_in_regs routine now handle the fact that no more rematerialization is performed. All such calls in target code already historically handle constants themselves. The remaining call site in the ISLE gen_call_common helper can be redirected to the ISLE put_in_regs callback. The existing target implementations of gen_constant are then unused and can be removed. (In some target there may still be further opportunities to remove duplication between ISLE and some local Rust code - this can be left to future patches.)	2022-12-13 13:00:04 -08:00
Chris Fallin	92ce79366c	riscv64: remove `valueregs_2_reg` extractor. (#5426 ) This extractor had a side-effect of invoking `put_in_regs`, which is not supposed to be invoked until the pattern-matching commits to evaluating a rule right-hand side (i.e., cannot backtrack). In this case the side-effect was mostly benign (in theory it could have caused additional values to be computed needlessly), but in general we should be careful to keep side-effects out of the left-hand side to enable further optimizations and work on islec. The implicit conversion from `Value` to `Reg` turns out to be enough to make the rules in question work, so we can simply remove the use of the extractor in this case.	2022-12-13 11:47:20 -08:00
Saúl Cabrera	a76e0e8aa5	Decouple `MachBufferFinalized<Stencil>` from `ir::FunctionParameters` (#5419 ) This commit allows retrieving a `MachBufferFinalized<Final>` from a `MachBufferFinalized<Stencil>` without relying on `ir::FunctionParameters`. Instead it uses the function's base source location, which is the only piece used by the previous `apply_params` definition. This change allows other uses cases (e.g. Winch) to use an opaque, common concept, exposed outside of `cranelift-codegen` to get the finalized state of the machine buffer. This change implies that Winch will transitively know about the `Stencil` compilation phase, but the `Stencil` phase is not exposed to Winch. Other alternatives considered: Parametrizing `MachBufferFinzalized` in a way such that it allows specifying which compilation phase the caller is targetting. Such approach would require also parametrizing the `MachSrcLoc` definition. One of the main drawbacks of this approach is that it also requires changing how the `MachBuffer`'s `start_srcloc` works: for caller requesting a `Final` `MachBufferFinalized`, the `MachBuffer` will need to work in terms of `SourceLoc` rather than in `RelSourceLoc` terms. This approach doesn't necessarily present more advantages than the approach presented in this change, in which there's no need to make any fundamental changes and in which all the `cranelift-codegen` primitives are already exposed.	2022-12-13 07:13:45 -05:00
Trevor Elliott	a5ecb5e647	x64: Share a zero in the ushr translation on x64 to free up a register (#5424 ) Share a zero value in the translation of ushr for i128. This increases the lifetime of the value by a few instructions, and reduces the number of registers used in the translation by one, which seems like an acceptable trade-off.	2022-12-12 18:15:43 -08:00
Chris Fallin	9397ea1abe	Cranelift: implement general select_spectre_guard fallbacks. (#5420 ) When adding some optimization rules for `icmp` in the egraph infrastructure, we ended up creating a path to legal CLIF but with patterns unsupported by three of our four backends: specifically, `select_spectre_guard` with a general truthy input, rather than an `icmp`. In #5206 we discussed replacing `select_spectre_guard` with something more specific, and that could still be a long-term solution here, but doing so now would interfere with ongoing refactoring of heap access lowering, so I've opted not to do so. (In that issue I was concerned about complexity and didn't see the need but with this fuzzbug I'm starting to feel a bit differently; maybe we should remove this non-orthogonal op in the long run.) Fixes #5417.	2022-12-12 17:13:34 -08:00
Saúl Cabrera	7adf3cacc5	cranelift-codegen: Prepare cranelift codegen for usage from Winch (#5413 ) This commit prepares the x64 pieces from cranelift codegen to be consumed by Winch for binary emission. This change doesn't introduce or modifies functionality it makes the necessary pieces for binary emission public. This change also improves documentation where applicable.	2022-12-12 09:01:06 -08:00
Timothy Chen	122872fb0c	Remove references for sig (#5414 )	2022-12-12 08:46:23 -08:00
Timothy Chen	8035945502	Reduce sig data size by changing sized spaces (#5402 ) * Reduce sig sizes * Fix test * Change compute_args_loc to return u32	2022-12-11 15:32:30 -08:00
Chris Fallin	244dce93f6	Fix optimization rules for narrow types: wrap i8 results to 8 bits. (#5409 ) * Fix optimization rules for narrow types: wrap i8 results to 8 bits. This fixes #5405. In the egraph mid-end's optimization rules, we were rewriting e.g. imuls of two iconsts to an iconst of the result, but without masking off the high bits (beyond the result type's width). This was producing iconsts with set high bits beyond their types' width, which is not legal. In addition, this PR adds some optimizations to the algebraic rules to recognize e.g. `x == x` (and all other integer comparison operators) and resolve to 1 or 0 as appropriate. * Review feedback. * Review feedback, again.	2022-12-09 22:29:25 +00:00
Ulrich Weigand	e913cf3647	Remove IFLAGS/FFLAGS types (#5406 ) All instructions using the CPU flags types (IFLAGS/FFLAGS) were already removed. This patch completes the cleanup by removing all remaining instructions that define values of CPU flags types, as well as the types themselves. Specifically, the following features are removed: - The IFLAGS and FFLAGS types and the SpecialType category. - Special handling of IFLAGS and FFLAGS in machinst/isle.rs and machinst/lower.rs. - The ifcmp, ifcmp_imm, ffcmp, iadd_ifcin, iadd_ifcout, iadd_ifcarry, isub_ifbin, isub_ifbout, and isub_ifborrow instructions. - The writes_cpu_flags instruction property. - The flags verifier pass. - Flags handling in the interpreter. All of these features are currently unused; no functional change intended by this patch. This addresses https://github.com/bytecodealliance/wasmtime/issues/3249.	2022-12-09 13:42:03 -08:00
Jamey Sharp	8726eeefb3	cranelift-isle: Add "partial" flag for constructors (#5392 ) * cranelift-isle: Add "partial" flag for constructors Instead of tying fallibility of constructors to whether they're either internal or pure, this commit assumes all constructors are infallible unless tagged otherwise with a "partial" flag. Internal constructors without the "partial" flag are not allowed to use constructors which have the "partial" flag on the right-hand side of any rules, because they have no way to report last-minute match failures. Multi-constructors should never be "partial"; they report match failures with an empty iterator instead. In turn this means you can't use partial constructors on the right-hand side of internal multi-constructor rules. However, you can use the same constructors on the left-hand side with `if` or `if-let` instead. In many cases, ISLE can already trivially prove that an internal constructor always returns `Some`. With this commit, those cases are largely unchanged, except for removing all the `Option`s and `Some`s from the generated code for those terms. However, for internal non-partial constructors where ISLE could not prove that, it now emits an `unreachable!` panic as the last-resort, instead of returning `None` like it used to do. Among the existing backends, here's how many constructors have these panic cases: - x64: 14% (53/374) - aarch64: 15% (41/277) - riscv64: 23% (26/114) - s390x: 47% (268/567) It's often possible to rewrite rules so that ISLE can tell the panic can never be hit. Just ensure that there's a lowest-priority rule which has no constraints on the left-hand side. But in many of these constructors, it's difficult to statically prove the unhandled cases are unreachable because that's only down to knowledge about how they're called or other preconditions. So this commit does not try to enforce that all terms have a last-resort fallback rule. * Check term flags while translating expressions Instead of doing it in a separate pass afterward. This involved threading all the term flags (pure, multi, partial) through the recursive `translate_expr` calls, so I extracted the flags to a new struct so they can all be passed together. * Validate multi-term usage Now that I've threaded the flags through `translate_expr`, it's easy to check this case too, so let's just do it. * Extract `ReturnKind` to use in `ExternalSig` There are only three legal states for the combination of `multi` and `infallible`, so replace those fields of `ExternalSig` with a three-state enum. * Remove `Option` wrapper from multi-extractors too If we'd had any external multi-constructors this would correct their signatures as well. * Update ISLE tests * Tag prelude constructors as pure where appropriate I believe the only reason these weren't marked `pure` before was because that would have implied that they're also partial. Now that those two states are specified separately we apply this flag more places. * Fix my changes to aarch64 `lower_bmask` and `imm` terms	2022-12-07 17:16:03 -08:00
Chris Fallin	8c55b81300	Optimizations to egraph framework (#5391 ) * Optimizations to egraph framework: - Save elaborated results by canonical value, not latest value (union value). Previously we were artificially skipping and re-elaborating some values we already had because we were not finding them in the map. - Make some changes to handling of icmp results: when icmp became I8-typed (when bools went away), many uses became `(uextend $I32 (icmp $I8 ...))`, and so patterns in lowering backends were no longer matching. This PR includes an x64-specific change to match `(brz (uextend (icmp ...)))` and similarly for `brnz`, but it also takes advantage of the ability to write rules easily in the egraph mid-end to rewrite selects with icmp inputs appropriately. - Extend constprop to understand selects in the egraph mid-end. With these changes, bz2.wasm sees a ~1% speedup, and spidermonkey.wasm with a fib.js input sees a 16.8% speedup: ``` $ time taskset 1 target/release/wasmtime run --allow-precompiled --dir=. ./spidermonkey.base.cwasm ./fib.js 1346269 taskset 1 target/release/wasmtime run --allow-precompiled --dir=. ./fib.js 2.14s user 0.01s system 99% cpu 2.148 total $ time taskset 1 target/release/wasmtime run --allow-precompiled --dir=. ./spidermonkey.egraphs.cwasm ./fib.js 1346269 taskset 1 target/release/wasmtime run --allow-precompiled --dir=. ./fib.js 1.78s user 0.01s system 99% cpu 1.788 total ``` * Review feedback.	2022-12-07 13:23:13 -08:00
Trevor Elliott	c5379051c4	Enable the ssa verifier in debug builds (#5354 ) Enable regalloc2's SSA verifier in debug builds to check for any outstanding reuse of virtual registers in def constraints. As fuzzing enables debug_assertions, this will enable the SSA verifier when fuzzing as well.	2022-12-07 12:22:51 -08:00
Nick Fitzgerald	f0c4b6f3a1	Cranelift: Implement `iadd_cout` on x64 for 32- and 64-bit integers (#5285 ) * Split the `iadd_cout` runtests by type * Implement `iadd_cout` for 32- and 64-bit values on x64 * Delete trailing whitespace in `riscv/lower.isle`	2022-12-07 19:54:14 +00:00
Jamey Sharp	29b23d41b6	ISLE rule cleanups (#5389 ) * cranelift-codegen: Use ISLE matching, not same_value The `same_value` function just wrapped an equality test into an external constructor, but we can do that with ISLE's equality constraints instead. * riscv64: Remove custom condition-code tests The `lower_icmp` term exists solely to decide whether to sign-extend or zero-extend the comparison operands, based on whether the condition code requires a signed comparison. It additionally tested whether the condition code was == or !=, but produced the same result as for other unsigned comparisons. We already have `signed_cond_code` in the ISLE prelude, which classifies the total-ordering condition codes according to whether they're signed. It also lumps == and != in the "unsigned" camp, as desired. So this commit uses the existing method from the prelude instead of riscv64-local definitions. Because this version has no constraints on the left-hand side of the rule in the unsigned case, ISLE generates Rust that always returns `Some`. That shows that the current use of `unwrap` is justified, at the only Rust-side call-site of `constructor_lower_icmp`, which is in cranelift/codegen/src/isa/riscv64/lower/isle.rs. * ISLE prelude: make offset32 infallible This extractor always returns `Some`, so it doesn't need to be fallible.	2022-12-07 02:55:59 +00:00
Chris Fallin	f980defe17	egraph support: rewrite to work in terms of CLIF data structures. (#5382 ) * egraph support: rewrite to work in terms of CLIF data structures. This work rewrites the "egraph"-based optimization framework in Cranelift to operate on aegraphs (acyclic egraphs) represented in the CLIF itself rather than as a separate data structure to which and from which we translate the CLIF. The basic idea is to add a new kind of value, a "union", that is like an alias but refers to two other values rather than one. This allows us to represent an eclass of enodes (values) as a tree. The union node allows for a value to have multiple representations: either constituent value could be used, and (in well-formed CLIF produced by correct optimization rules) they must be equivalent. Like the old egraph infrastructure, we take advantage of acyclicity and eager rule application to do optimization in a single pass. Like before, we integrate GVN (during the optimization pass) and LICM (during elaboration). Unlike the old egraph infrastructure, everything stays in the DataFlowGraph. "Pure" enodes are represented as instructions that have values attached, but that are not placed into the function layout. When entering "egraph" form, we remove them from the layout while optimizing. When leaving "egraph" form, during elaboration, we can place an instruction back into the layout the first time we elaborate the enode; if we elaborate it more than once, we clone the instruction. The implementation performs two passes overall: - One, a forward pass in RPO (to see defs before uses), that (i) removes "pure" instructions from the layout and (ii) optimizes as it goes. As before, we eagerly optimize, so we form the entire union of optimized forms of a value before we see any uses of that value. This lets us rewrite uses to use the most "up-to-date" form of the value and canonicalize and optimize that form. The eager rewriting and acyclic representation make each other work (we could not eagerly rewrite if there were cycles; and acyclicity does not miss optimization opportunities only because the first time we introduce a value, we immediately produce its "best" form). This design choice is also what allows us to avoid the "parent pointers" and fixpoint loop of traditional egraphs. This forward optimization pass keeps a scoped hashmap to "intern" nodes (thus performing GVN), and also interleaves on a per-instruction level with alias analysis. The interleaving with alias analysis allows alias analysis to see the most optimized form of each address (so it can see equivalences), and allows the next value to see any equivalences (reuses of loads or stored values) that alias analysis uncovers. - Two, a forward pass in domtree preorder, that "elaborates" pure enodes back into the layout, possibly in multiple places if needed. This tracks the loop nest and hoists nodes as needed, performing LICM as it goes. Note that by doing this in forward order, we avoid the "fixpoint" that traditional LICM needs: we hoist a def before its uses, so when we place a node, we place it in the right place the first time rather than moving later. This PR replaces the old (a)egraph implementation. It removes both the cranelift-egraph crate and the logic in cranelift-codegen that uses it. On `spidermonkey.wasm` running a simple recursive Fibonacci microbenchmark, this work shows 5.5% compile-time reduction and 7.7% runtime improvement (speedup). Most of this implementation was done in (very productive) pair programming sessions with Jamey Sharp, thus: Co-authored-by: Jamey Sharp <jsharp@fastly.com> * Review feedback. * Review feedback. * Review feedback. * Bugfix: cprop rule: `(x + k1) - k2` becomes `x - (k2 - k1)`, not `x - (k1 - k2)`. Co-authored-by: Jamey Sharp <jsharp@fastly.com>	2022-12-06 14:58:57 -08:00
Trevor Elliott	293bb5b334	riscv64: Only emit jumps at the end of basic blocks (#5381 ) This PR fixes two bugs in the riscv64 backend, where branch instructions were emitted in the middle of a basic block: Constant emission, where the constants are inlined into the vcode and are jumped over at runtime, The BrTableCheck pseudo-instruction, which was always emitted before a BrTable instruction, and would handle jumping to the default label. The first bug was resolved by introducing two new psuedo instructions, LoadConst32 and LoadConst64. Both of these instructions serve to delay the original encoding to emission time, after regalloc2 has run. The second bug was fixed by removing the BrTableCheck instruction. As it was always emitted directly before BrTable, it was easier to remove it and merge the two into a single instruction.	2022-12-06 10:54:10 -08:00
Chris Fallin	feaa7ca75f	Alias analysis: refactor for use by other driver loops. (#5380 ) * Alias analysis: refactor for use by other driver loops. This PR pulls the core of the alias analysis infrastructure into a `process_inst()` method that operates on a single instruction, and allows another compiler pass to apply store-to-load forwarding and redundant-load elimination interleaved with other work. The existing behavior remains unchanged; the pass's toplevel loop calls this extracted method. This refactor is a prerequisite for using the alias analysis as part of a refactored egraph-based optimization framework. * Review feedback: remove unneeded mut.	2022-12-06 18:30:02 +00:00
Trevor Elliott	353a681671	Avoid reusing a register during constant loading (#5379 ) Avoid reusing a register when loading a constant, allocating a temporary instead.	2022-12-05 18:37:53 -08:00
Trevor Elliott	7d28d586da	riscv64: Don't reuse registers when loading constants (#5376 ) Rework the constant loading functions in the riscv64 backend to generate fresh temporaries instead of reusing the destination register.	2022-12-05 16:51:52 -08:00
Saúl Cabrera	28cfa57533	cranelift: Small documentation fixes (#5377 ) * `translate_operator` doesn't return a boolean. * `from_base_offset` doesn't panic if offset is smaller than base.	2022-12-06 00:46:58 +00:00
Trevor Elliott	817c2b205c	riscv64: Use a temporary when translating shift amount (#5375 ) Use a temporary when translating the shift amount, instead of reusing the destination register.	2022-12-05 20:54:14 +00:00
Trevor Elliott	b475b9bd19	Terminate blocks with a single branch in riscv64 (#5374 ) Ensure that we're terminating blocks with a single branch instruction, when testing I128 values against zero.	2022-12-05 20:13:28 +00:00
Anton Romanov	29d4d1063f	[codegen] Fixed mutability of domtree reference (#5371 )	2022-12-05 08:19:32 -08:00
Trevor Elliott	6aea8e0d7e	Don't reuse destination registers when lowering splat on aarch64 (#5370 )	2022-12-05 08:18:49 -08:00
Trevor Elliott	2e9b0802ab	aarch64: Rework amode compilation to produce SSA code (#5369 ) Rework the compilation of amodes in the aarch64 backend to stop reusing registers and instead generate fresh virtual registers for intermediates. This resolves some SSA checker violations with the aarch64 backend, and as a nice side-effect removes some unnecessary movs in the generated code.	2022-12-02 01:23:15 +00:00
Trevor Elliott	d54a27d0ea	Allocate temporary intermediates when loading constants on aarch64 (#5366 ) As loading constants on aarch64 can take up to 4 instructions, we need to plumb through some additional registers. Rather than pass a fixed list of registers in, pass an allocation function.	2022-12-01 22:29:36 +00:00
Nick Fitzgerald	4510a4a805	Cranelift: mark post-legalization trapping blocks as cold (#5367 ) Trapping is a rare event.	2022-12-01 12:46:26 -08:00
Trevor Elliott	37c3c5b1e0	Remove an unnecessary debug trace (#5359 )	2022-11-30 20:37:20 -08:00
Trevor Elliott	c16f2956db	Allocate a temporary for 64-bit constant loads in the s390x backend (#5357 ) Avoid reusing a destination virtual register for 64-bit constants in the s390x backend. This change addresses a case identified by the regalloc2 ssa validator, as the destination register was written to twice when constants were generated via the MachInst::gen_constant function.	2022-11-30 17:01:14 -08:00
Trevor Elliott	d8dbabfe6b	Don't reuse registers in the x64 div lowering (#5356 ) Introduce a temporary for an intermediate value in the lowering of div in the x64 backend. Additionally, add a src argument to the shift_r smart constructor, which is why the diff got larger than just the div lowering.	2022-11-30 22:44:59 +00:00
Trevor Elliott	87b63174b1	Don't reuse registers in make_i64x2_from_lanes (#5355 ) Avoid reusing output registers in make_i64x2_from_lanes by threading the output name instead, and using smart constructors for x64_pinsrd instead of constructing the instructions directly.	2022-11-30 14:37:01 -08:00
Nick Fitzgerald	79f7fa6079	Cranelift: implement `heap_{load,store}` instruction legalization (#5351 ) * Cranelift: implement `heap_{load,store}` instruction legalization This does not remove `heap_addr` yet, but it does factor out the common bounds-check-and-compute-the-native-address functionality that is shared between all of `heap_{addr,load,store}`. Finally, this adds a missing optimization for when we can dedupe explicit bounds checks for static memories and Spectre mitigations. * Cranelift: Enable `heap_load_store_*` run tests on all targets	2022-11-30 19:12:49 +00:00
Alex Crichton	830885383f	Implement inline stack probes for AArch64 (#5353 ) * Turn off probestack by default in Cranelift The probestack feature is not implemented for the aarch64 and s390x backends and currently the on-by-default status requires the aarch64 and s390x implementations to be a stub. Turning off probestack by default allows the s390x and aarch64 backends to panic with an error message to avoid providing a false sense of security. When the probestack option is implemented for all backends, however, it may be reasonable to re-enable. * aarch64: Improve codegen for AMode fallback Currently the final fallback for finalizing an `AMode` will generate both a constant-loading instruction as well as an `add` instruction to the base register into the same temporary. This commit improves the codegen by removing the `add` instruction and folding the final add into the finalized `AMode`. This changes the `extendop` used but both registers are 64-bit so shouldn't be affected by the extending operation. * aarch64: Implement inline stack probes This commit implements inline stack probes for the aarch64 backend in Cranelift. The support here is modeled after the x64 support where unrolled probes are used up to a particular threshold after which a loop is generated. The instructions here are similar in spirit to x64 except that unlike x64 the stack pointer isn't modified during the unrolled loop to avoid needing to re-adjust it back up at the end of the loop. * Enable inline probestack for AArch64 and Riscv64 This commit enables inline probestacks for the AArch64 and Riscv64 architectures in the same manner that x86_64 has it enabled now. Some more testing was additionally added since on Unix platforms we should be guaranteed that Rust's stack overflow message is now printed too. * Enable probestack for aarch64 in cranelift-fuzzgen * Address review comments * Remove implicit stack overflow traps from x64 backend This commit removes implicit `StackOverflow` traps inserted by the x64 backend for stack-based operations. This was historically required when stack overflow was detected with page faults but Wasmtime no longer requires that since it's not suitable for wasm modules which call host functions. Additionally no other backend implements this form of implicit trap-code additions so this is intended to synchronize the behavior of all the backends. This fixes a test added prior for aarch64 to properly abort the process instead of accidentally being caught by Wasmtime. * Fix a style issue	2022-11-30 12:30:00 -06:00
Timothy Chen	67fc5389b0	Remove sig data arg and ret fields to reduce size (#5319 ) * Remove sig data arg and ret fields to reduce size * Update cranelift/codegen/src/machinst/abi.rs Co-authored-by: Jamey Sharp <jamey@minilop.net> * Update cranelift/codegen/src/machinst/abi.rs Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fix offsets * Add comment Co-authored-by: Jamey Sharp <jamey@minilop.net>	2022-11-30 07:19:41 -08:00
Nick Fitzgerald	913a2ec8c8	Cranelift: consider heap's guard pages when legalizing `heap_addr` (#5335 ) * Cranelift: consider heap's guard pages when legalizing `heap_addr` Fixes #5328 * Update comment to align more directly with implementation * Add legalization tests for `heap_addr` and offset guard pages	2022-11-29 19:54:25 +00:00
Trevor Elliott	f138fc0ed3	Bump regalloc2 to 0.5.0 (#5345 ) * Bump the regalloc2 dependency to 0.5.0 * Replace preg_set_from_machine_env with PRegSet::from * Vet the regalloc2 update	2022-11-29 11:25:35 -08:00
Afonso Bordado	ec342c20e3	cranelift: Add `iadd_cout` lowerings for aarch64 (#5177 ) * cranelift: Add `iadd_cout`/`isub_bout` i128 tests * aarch64: Add `iadd_cout` lowerings * fuzzgen: Add `iadd_cout`	2022-11-29 10:58:44 -08:00
Trevor Elliott	a5a0645aff	Don't allow reuse_def constraints in the s390x Loop instruction (#5336 )	2022-11-28 17:52:11 -08:00
Trevor Elliott	368004428a	Fix rule shadowing instances in x64 and aarch64 backends (#5334 ) Fix shadowing identified in #5322 for imul and swiden_high/swiden_low/uwiden_high/uwiden_low combinations in the x64 backend, and remove some redundant rules from the aarch64 dynamic neon ruleset. Additionally, add tests to the x64 backend showing that the imul specializations are firing.	2022-11-28 15:48:34 -08:00
Nick Fitzgerald	58a5089e48	Cranelift: log number of CLIF insts/blocks to optimize/lower (#5333 )	2022-11-28 19:35:29 +00:00
Nick Fitzgerald	6fe69d00ca	Cranelift: add debug logs counting how many vcode instructions and blocks we lower to (#5332 )	2022-11-28 18:57:02 +00:00

... 3 4 5 6 7 ...

1825 Commits