wasmtime

Author	SHA1	Message	Date
Trevor Elliott	19f337e29b	Move the default block to the front of the underlying jump table storage (#5770 ) The new api on JumpTableData makes it easy to keep the default label first, and that shrinks the diff in #5731 a bit.	2023-02-13 20:50:29 +00:00
Trevor Elliott	d99783fc91	Move default blocks into jump tables (#5756 ) Move the default block off of the br_table instruction, and into the JumpTable that it references.	2023-02-10 08:53:30 -08:00
Alex Crichton	de0e0bea3f	Legalize `b{and,or,xor}_not` into component instructions (#5709 ) * Remove trailing whitespace in `lower.isle` files * Legalize the `band_not` instruction into simpler form This commit legalizes the `band_not` instruction into `band`-of-`bnot`, or two instructions. This is intended to assist with egraph-based optimizations where the `band_not` instruction doesn't have to be specifically included in other bit-operation-patterns. Lowerings of the `band_not` instruction have been moved to a specialization of the `band` instruction. * Legalize `bor_not` into components Same as prior commit, but for the `bor_not` instruction. * Legalize bxor_not into bxor-of-bnot Same as prior commits. I think this also ended up fixing a bug in the s390x backend where `bxor_not x y` was actually translated as `bnot (bxor x y)` by accident given the test update changes. * Simplify not-fused operands for riscv64 Looks like some delegated-to rules have special-cases for "if this feature is enabled use the fused instruction" so move the clause for testing the feature up to the lowering phase to help trigger other rules if the feature isn't enabled. This should make the riscv64 backend more consistent with how other backends are implemented. * Remove B{and,or,xor}Not from cost of egraph metrics These shouldn't ever reach egraphs now that they're legalized away. * Add an egraph optimization for `x^-1 => ~x` This adds a simplification node to translate xor-against-minus-1 to a `bnot` instruction. This helps trigger various other optimizations in the egraph implementation and also various backend lowering rules for instructions. This is chiefly useful as wasm doesn't have a `bnot` equivalent, so it's encoded as `x^-1`. * Add a wasm test for end-to-end bitwise lowerings Test that end-to-end various optimizations are being applied for input wasm modules. * Specifically don't self-update rustup on CI I forget why this was here originally, but this is failing on Windows CI. 
In general there's no need to update rustup, so leave it as-is. * Cleanup some aarch64 lowering rules Previously a 32/64 split was necessary due to the `ALUOp` being different but that's been refactored away now so there's no longer any need for duplicate rules. * Narrow a x64 lowering rule This previously made more sense when it was `band_not` and rarely used, but be more specific in the type-filter on this rule that it's only applicable to SIMD types with lanes. * Simplify xor-against-minus-1 rule No need to have the commutative version since constants are already shuffled right for egraphs * Optimize band-of-bnot when bnot is on the left Use some more rules in the egraph algebraic optimizations to canonicalize band/bor/bxor with a `bnot` operand to put the operand on the right. That way the lowerings in the backends only have to list the rule once, with the operand on the right, to optimize both styles of input. * Add commutative lowering rules * Update cranelift/codegen/src/isa/x64/lower.isle Co-authored-by: Jamey Sharp <jamey@minilop.net> --------- Co-authored-by: Jamey Sharp <jamey@minilop.net>	2023-02-06 13:53:40 -06:00
Trevor Elliott	6d8f2be9e1	Use `andn` for `band_not` when bmi1 is present (#5701 ) We can use the andn instruction for the lowering of band_not on x64 when bmi1 is available.	2023-02-03 16:23:18 -08:00
Jun Ryung Ju	9cd4146939	Implemented `b{and,or,xor}_not` bitops for ty_int_ref_scalar_64 type. (#5604 ) * Implemented `b{and,or,xor}_not` bitops for ty_int_ref_scalar_64 type. * Added tests.	2023-02-01 21:57:18 -08:00
Nick Fitzgerald	bdfb746548	Cranelift: Introduce the `return_call` and `return_call_indirect` instructions (#5679 ) * Cranelift: Introduce the `tail` calling convention This is an unstable-ABI calling convention that we will eventually use to support Wasm tail calls. Co-Authored-By: Jamey Sharp <jsharp@fastly.com> * Cranelift: Introduce the `return_call` and `return_call_indirect` instructions These will be used to implement tail calls for Wasm and any other language targeting CLIF. The `return_call_indirect` instruction differs from the Wasm instruction of the same name by taking a native address callee rather than a Wasm function index. Co-Authored-By: Jamey Sharp <jsharp@fastly.com> * Cranelift: Implement verification rules for `return_call[_indirect]` They must: * have the same return types between the caller and callee, * have the same calling convention between caller and callee, * and that calling convention must support tail calls. Co-Authored-By: Jamey Sharp <jsharp@fastly.com> * cargo fmt --------- Co-authored-by: Jamey Sharp <jsharp@fastly.com>	2023-02-01 21:20:35 +00:00
Trevor Elliott	a5698cedf8	cranelift: Remove brz and brnz (#5630 ) Remove the brz and brnz instructions, as their behavior is now redundant with brif.	2023-01-30 20:34:56 +00:00
Trevor Elliott	a181ad2932	Cleanup the use of `maybe_uextend` in the x64 lowerings (#5637 ) Use maybe_uextend for the brnz lowerings on x64.	2023-01-25 17:28:48 -08:00
Trevor Elliott	b58a197d33	cranelift: Add a conditional branch instruction with two targets (#5446 ) Add a conditional branch instruction with two targets: brif. This instruction will eventually replace brz and brnz, as it encompasses the behavior of both. This PR also changes the InstructionData layout for instruction formats that hold BlockCall values, taking the same approach we use for Value arguments. This allows branch_destination to return a slice to the BlockCall values held in the instruction, rather than requiring that we pattern match on InstructionData to fetch the then/else blocks. Function generation for fuzzing has been updated to generate uses of brif, and I've run the cranelift-fuzzgen target locally for hours without triggering any new failures.	2023-01-24 14:37:16 -08:00
Trevor Elliott	1e6c13d83e	cranelift: Rework block instructions to use BlockCall (#5464 ) Add a new type BlockCall that represents the pair of a block name with arguments to be passed to it. (The mnemonic here is that it looks a bit like a function call.) Rework the implementation of jump, brz, and brnz to use BlockCall instead of storing the block arguments as varargs in the instruction's ValueList. To ensure that we're processing block arguments from BlockCall values in instructions, three new functions have been introduced on DataFlowGraph that cover both sets of arguments: inst_values - returns an iterator that traverses values in the instruction and block arguments; map_inst_values - applies a function to each value in the instruction and block arguments; overwrite_inst_values - overwrites all values in an instruction and block arguments with values from the iterator. Co-authored-by: Jamey Sharp <jamey@minilop.net>	2023-01-17 16:31:15 -08:00
Saúl Cabrera	6cb68f3287	cranelift-codegen: Expose x64 settings (#5561 ) Exposes x64 settings so that they can be consumed from Winch for binary emission.	2023-01-11 18:33:03 -05:00
Alexa VanHattum	44913825b5	cranelift: fix register for `srem.i8` on x86_64 (#5540 ) * Change register written to in specific srem case. Add regression test as filetest case. Fixes #5470 * Add another test case, newline * Update comment	2023-01-06 22:18:16 +00:00
Sam Sartor	1efa3d6f8b	Add `clif-util compile` option to output object file (#5493 ) * add clif-util compile option to output object file * switch from a box to a borrow * update objectmodule tests to use borrowed isa * put targetisa into an arc	2023-01-06 12:53:48 -08:00
uint256_t	b00455135e	Cranelift: Implement 'iabs' for scalar types on x86_64 (#5527 ) * Implement 'iabs' for scalar types on x86_64 * Small fix	2023-01-05 21:33:12 -08:00
KarelPeeters	320d67fe8d	Cranelift: include return values in instruction pretty print output. (#5489 )	2023-01-03 09:06:47 -08:00
Chris Fallin	03463458e4	Cranelift: fix branch-of-icmp/fcmp regression: look through `uextend`. (#5487 ) In #5031, we removed `bool` types from CLIF, using integers instead for "truthy" values. This greatly simplified the IR, and was generally an improvement. However, because x86's `SETcc` instruction sets only the low 8 bits of a register, we chose to use `i8` types as the result of `icmp` and `fcmp`, to avoid the need for a masking operation when materializing the result. Unfortunately this means that uses of truthy values often now have `uextend` operations, especially when coming from Wasm (where truthy values are naturally `i32`-typed). For example, where we previously had `(brz (icmp ...))`, we now have `(brz (uextend (icmp ...)))`. It's arguable whether or not we should switch to `i32` truthy values -- in most cases we can avoid materializing a value that's immediately used for a branch or select, so a mask would in most cases be unnecessary, and it would be a win at the IR level -- but irrespective of that, this change did regress our generated code quality: our backends had patterns for e.g. `(brz (icmp ...))` but not with the `uextend`, so we were always materializing truthy values. Many blocks thus ended with "cmp; setcc; cmp; test; branch" rather than "cmp; branch". In #5391 we noticed this and fixed it on x64, but it was a general problem on aarch64 and riscv64 as well. This PR introduces a `maybe_uextend` extractor that "looks through" uextends, and uses it where we consume truthy values, thus fixing the regression. This PR also adds compile filetests to ensure we don't regress again. The riscv64 backend has not been updated here because doing so appears to trigger another issue in its branch handling; fixing that is TBD.	2022-12-22 01:43:44 -08:00
Saúl Cabrera	962a911163	cranelift-codegen: Add support for immediate to memory moves in x64 (#5461 ) This change adds support for immediate to memory moves in x64 which are needed by Winch for zeroing local slots. This change follows the guideline in `isa/x64/inst/emit` and uses other instructions (immediate to register moves) as a base for the test cases. The instruction encoding expectation was derived by assembling each instruction and inspecting the assembly with `objdump`.	2022-12-19 21:54:45 +00:00
Chris Fallin	22439f7b39	support select_spectre_guard and select on i128 conditions on all platforms. (#5460 ) Fixes #5199. Fixes #5200. Fixes #5452. Fixes #5453. On riscv64, there is apparently an autoconversion from `ValueRegs` to `Reg` that takes just the low register [0], and removing this conversion causes 48 errors. As a result of this, `select` with an `i128` condition was silently miscompiling, testing only the low 64 bits. We should remove this autoconversion to ensure we aren't missing any other silent truncations, but for now this PR just adds the explicit `I128` logic for `select` / `select_spectre_guard`. [0] `d9fdbfd50e/cranelift/codegen/src/isa/riscv64/inst.isle (L1762)`	2022-12-16 14:18:22 -08:00
Ulrich Weigand	f0af622208	Simplify LowerBackend interface (#5432 ) * Refactor lower_branch to have Unit result Branches cannot have any output, so it is more straightforward to have the ISLE term return Unit instead of InstOutput. Also provide a new `emit_side_effect` term to simplify implementation of `lower_branch` rules with Unit result. * Simplify LowerBackend interface Move all remaining asserts from the LowerBackend::lower and ::lower_branch_group into the common call site. Change return value of ::lower to Option<InstOutput>, and return value of ::lower_branch_group to Option<()> to match ISLE term signature. Only pass the first branch into ::lower_branch_group and rename it to ::lower_branch. As a result of all those changes, LowerBackend routines now consist solely of calls to the corresponding ISLE routines.	2022-12-14 00:48:25 +00:00
Ulrich Weigand	299be327d5	Simplify "unimplemented" operation error message (#5429 ) Now that all operations are implemented in ISLE, simplify Rust code by providing a generic error message if any operation is not implemented in ISLE. Done across all targets.	2022-12-13 15:22:49 -08:00
Ulrich Weigand	df923f18ca	Remove MachInst::gen_constant (#5427 ) * aarch64: constant generation cleanup Add support for MOVZ and MOVN generation via ISLE. Handle f32const, f64const, and nop instructions via ISLE. No longer call Inst::gen_constant from lower.rs. * riscv64: constant generation cleanup Handle f32const, f64const, and nop instructions via ISLE. * s390x: constant generation cleanup Fix rule priorities for "imm" term. Only handle 32-bit stack offsets; no longer use load_constant64. * x64: constant generation cleanup No longer call Inst::gen_constant from lower.rs or abi.rs. * Refactor LowerBackend::lower to return InstOutput No longer write to the per-insn output registers; instead, return an InstOutput vector of temp registers holding the outputs. This will allow calling LowerBackend::lower multiple times for the same instruction, e.g. to rematerialize constants. When emitting the primary copy of the instruction during lowering, writing to the per-insn registers is now done in lower_clif_block. As a result, the ISLE lower_common routine is no longer needed. In addition, the InsnOutput type and all code related to it can be removed as well. * Refactor IsleContext to hold a LowerBackend reference Remove the "triple", "flags", and "isa_flags" fields that are copied from LowerBackend to each IsleContext, and instead just hold a reference to LowerBackend in IsleContext. This will allow calling LowerBackend::lower from within callbacks in src/machinst/isle.rs, e.g. to rematerialize constants. To avoid having to pass LowerBackend references through multiple functions, eliminate the lower_insn_to_regs subroutines in those targets that still have them, and just inline into the main lower routine. This also eliminates lower_inst.rs on aarch64 and riscv64. Replace all accesses to the removed IsleContext fields by going through the LowerBackend reference. 
* Remove MachInst::gen_constant This addresses the problem described in issue https://github.com/bytecodealliance/wasmtime/issues/4426 that targets currently have to duplicate code to emit constants between the ISLE logic and the gen_constant callback. After the various cleanups in earlier patches in this series, the only remaining user of gen_constant is put_value_in_regs in Lower. This can now be removed, and instead constant rematerialization can be performed in the put_in_regs ISLE callback by simply directly calling LowerBackend::lower on the instruction defining the constant (using a different output register). Since the check for egraph mode is now no longer performed in put_value_in_regs, the Lower::flags member becomes obsolete. Care needs to be taken that other calls directly to the Lower::put_value_in_regs routine now handle the fact that no more rematerialization is performed. All such calls in target code already historically handle constants themselves. The remaining call site in the ISLE gen_call_common helper can be redirected to the ISLE put_in_regs callback. The existing target implementations of gen_constant are then unused and can be removed. (In some targets there may still be further opportunities to remove duplication between ISLE and some local Rust code - this can be left to future patches.)	2022-12-13 13:00:04 -08:00
Trevor Elliott	a5ecb5e647	x64: Share a zero in the ushr translation on x64 to free up a register (#5424 ) Share a zero value in the translation of ushr for i128. This increases the lifetime of the value by a few instructions, and reduces the number of registers used in the translation by one, which seems like an acceptable trade-off.	2022-12-12 18:15:43 -08:00
Chris Fallin	9397ea1abe	Cranelift: implement general select_spectre_guard fallbacks. (#5420 ) When adding some optimization rules for `icmp` in the egraph infrastructure, we ended up creating a path to legal CLIF but with patterns unsupported by three of our four backends: specifically, `select_spectre_guard` with a general truthy input, rather than an `icmp`. In #5206 we discussed replacing `select_spectre_guard` with something more specific, and that could still be a long-term solution here, but doing so now would interfere with ongoing refactoring of heap access lowering, so I've opted not to do so. (In that issue I was concerned about complexity and didn't see the need but with this fuzzbug I'm starting to feel a bit differently; maybe we should remove this non-orthogonal op in the long run.) Fixes #5417.	2022-12-12 17:13:34 -08:00
Saúl Cabrera	7adf3cacc5	cranelift-codegen: Prepare cranelift codegen for usage from Winch (#5413 ) This commit prepares the x64 pieces from cranelift codegen to be consumed by Winch for binary emission. This change doesn't introduce or modify functionality; it makes the necessary pieces for binary emission public. This change also improves documentation where applicable.	2022-12-12 09:01:06 -08:00
Timothy Chen	8035945502	Reduce sig data size by changing sized spaces (#5402 ) * Reduce sig sizes * Fix test * Change compute_args_loc to return u32	2022-12-11 15:32:30 -08:00
Ulrich Weigand	e913cf3647	Remove IFLAGS/FFLAGS types (#5406 ) All instructions using the CPU flags types (IFLAGS/FFLAGS) were already removed. This patch completes the cleanup by removing all remaining instructions that define values of CPU flags types, as well as the types themselves. Specifically, the following features are removed: - The IFLAGS and FFLAGS types and the SpecialType category. - Special handling of IFLAGS and FFLAGS in machinst/isle.rs and machinst/lower.rs. - The ifcmp, ifcmp_imm, ffcmp, iadd_ifcin, iadd_ifcout, iadd_ifcarry, isub_ifbin, isub_ifbout, and isub_ifborrow instructions. - The writes_cpu_flags instruction property. - The flags verifier pass. - Flags handling in the interpreter. All of these features are currently unused; no functional change intended by this patch. This addresses https://github.com/bytecodealliance/wasmtime/issues/3249.	2022-12-09 13:42:03 -08:00
Jamey Sharp	8726eeefb3	cranelift-isle: Add "partial" flag for constructors (#5392 ) * cranelift-isle: Add "partial" flag for constructors Instead of tying fallibility of constructors to whether they're either internal or pure, this commit assumes all constructors are infallible unless tagged otherwise with a "partial" flag. Internal constructors without the "partial" flag are not allowed to use constructors which have the "partial" flag on the right-hand side of any rules, because they have no way to report last-minute match failures. Multi-constructors should never be "partial"; they report match failures with an empty iterator instead. In turn this means you can't use partial constructors on the right-hand side of internal multi-constructor rules. However, you can use the same constructors on the left-hand side with `if` or `if-let` instead. In many cases, ISLE can already trivially prove that an internal constructor always returns `Some`. With this commit, those cases are largely unchanged, except for removing all the `Option`s and `Some`s from the generated code for those terms. However, for internal non-partial constructors where ISLE could not prove that, it now emits an `unreachable!` panic as the last-resort, instead of returning `None` like it used to do. Among the existing backends, here's how many constructors have these panic cases: - x64: 14% (53/374) - aarch64: 15% (41/277) - riscv64: 23% (26/114) - s390x: 47% (268/567) It's often possible to rewrite rules so that ISLE can tell the panic can never be hit. Just ensure that there's a lowest-priority rule which has no constraints on the left-hand side. But in many of these constructors, it's difficult to statically prove the unhandled cases are unreachable because that's only down to knowledge about how they're called or other preconditions. So this commit does not try to enforce that all terms have a last-resort fallback rule. 
* Check term flags while translating expressions Instead of doing it in a separate pass afterward. This involved threading all the term flags (pure, multi, partial) through the recursive `translate_expr` calls, so I extracted the flags to a new struct so they can all be passed together. * Validate multi-term usage Now that I've threaded the flags through `translate_expr`, it's easy to check this case too, so let's just do it. * Extract `ReturnKind` to use in `ExternalSig` There are only three legal states for the combination of `multi` and `infallible`, so replace those fields of `ExternalSig` with a three-state enum. * Remove `Option` wrapper from multi-extractors too If we'd had any external multi-constructors this would correct their signatures as well. * Update ISLE tests * Tag prelude constructors as pure where appropriate I believe the only reason these weren't marked `pure` before was because that would have implied that they're also partial. Now that those two states are specified separately we apply this flag more places. * Fix my changes to aarch64 `lower_bmask` and `imm` terms	2022-12-07 17:16:03 -08:00
Chris Fallin	8c55b81300	Optimizations to egraph framework (#5391 ) * Optimizations to egraph framework: - Save elaborated results by canonical value, not latest value (union value). Previously we were artificially skipping and re-elaborating some values we already had because we were not finding them in the map. - Make some changes to handling of icmp results: when icmp became I8-typed (when bools went away), many uses became `(uextend $I32 (icmp $I8 ...))`, and so patterns in lowering backends were no longer matching. This PR includes an x64-specific change to match `(brz (uextend (icmp ...)))` and similarly for `brnz`, but it also takes advantage of the ability to write rules easily in the egraph mid-end to rewrite selects with icmp inputs appropriately. - Extend constprop to understand selects in the egraph mid-end. With these changes, bz2.wasm sees a ~1% speedup, and spidermonkey.wasm with a fib.js input sees a 16.8% speedup: ``` $ time taskset 1 target/release/wasmtime run --allow-precompiled --dir=. ./spidermonkey.base.cwasm ./fib.js 1346269 taskset 1 target/release/wasmtime run --allow-precompiled --dir=. ./fib.js 2.14s user 0.01s system 99% cpu 2.148 total $ time taskset 1 target/release/wasmtime run --allow-precompiled --dir=. ./spidermonkey.egraphs.cwasm ./fib.js 1346269 taskset 1 target/release/wasmtime run --allow-precompiled --dir=. ./fib.js 1.78s user 0.01s system 99% cpu 1.788 total ``` * Review feedback.	2022-12-07 13:23:13 -08:00
Trevor Elliott	c5379051c4	Enable the ssa verifier in debug builds (#5354 ) Enable regalloc2's SSA verifier in debug builds to check for any outstanding reuse of virtual registers in def constraints. As fuzzing enables debug_assertions, this will enable the SSA verifier when fuzzing as well.	2022-12-07 12:22:51 -08:00
Nick Fitzgerald	f0c4b6f3a1	Cranelift: Implement `iadd_cout` on x64 for 32- and 64-bit integers (#5285 ) * Split the `iadd_cout` runtests by type * Implement `iadd_cout` for 32- and 64-bit values on x64 * Delete trailing whitespace in `riscv/lower.isle`	2022-12-07 19:54:14 +00:00
Trevor Elliott	d54a27d0ea	Allocate temporary intermediates when loading constants on aarch64 (#5366 ) As loading constants on aarch64 can take up to 4 instructions, we need to plumb through some additional registers. Rather than pass a fixed list of registers in, pass an allocation function.	2022-12-01 22:29:36 +00:00
Trevor Elliott	d8dbabfe6b	Don't reuse registers in the x64 div lowering (#5356 ) Introduce a temporary for an intermediate value in the lowering of div in the x64 backend. Additionally, add a src argument to the shift_r smart constructor, which is why the diff got larger than just the div lowering.	2022-11-30 22:44:59 +00:00
Trevor Elliott	87b63174b1	Don't reuse registers in make_i64x2_from_lanes (#5355 ) Avoid reusing output registers in make_i64x2_from_lanes by threading the output name instead, and using smart constructors for x64_pinsrd instead of constructing the instructions directly.	2022-11-30 14:37:01 -08:00
Alex Crichton	830885383f	Implement inline stack probes for AArch64 (#5353 ) * Turn off probestack by default in Cranelift The probestack feature is not implemented for the aarch64 and s390x backends and currently the on-by-default status requires the aarch64 and s390x implementations to be a stub. Turning off probestack by default allows the s390x and aarch64 backends to panic with an error message to avoid providing a false sense of security. When the probestack option is implemented for all backends, however, it may be reasonable to re-enable. * aarch64: Improve codegen for AMode fallback Currently the final fallback for finalizing an `AMode` will generate both a constant-loading instruction as well as an `add` instruction to the base register into the same temporary. This commit improves the codegen by removing the `add` instruction and folding the final add into the finalized `AMode`. This changes the `extendop` used but both registers are 64-bit so shouldn't be affected by the extending operation. * aarch64: Implement inline stack probes This commit implements inline stack probes for the aarch64 backend in Cranelift. The support here is modeled after the x64 support where unrolled probes are used up to a particular threshold after which a loop is generated. The instructions here are similar in spirit to x64 except that unlike x64 the stack pointer isn't modified during the unrolled loop to avoid needing to re-adjust it back up at the end of the loop. * Enable inline probestack for AArch64 and Riscv64 This commit enables inline probestacks for the AArch64 and Riscv64 architectures in the same manner that x86_64 has it enabled now. Some more testing was additionally added since on Unix platforms we should be guaranteed that Rust's stack overflow message is now printed too. 
* Enable probestack for aarch64 in cranelift-fuzzgen * Address review comments * Remove implicit stack overflow traps from x64 backend This commit removes implicit `StackOverflow` traps inserted by the x64 backend for stack-based operations. This was historically required when stack overflow was detected with page faults but Wasmtime no longer requires that since it's not suitable for wasm modules which call host functions. Additionally no other backend implements this form of implicit trap-code additions so this is intended to synchronize the behavior of all the backends. This fixes a test added prior for aarch64 to properly abort the process instead of accidentally being caught by Wasmtime. * Fix a style issue	2022-11-30 12:30:00 -06:00
Trevor Elliott	368004428a	Fix rule shadowing instances in x64 and aarch64 backends (#5334 ) Fix shadowing identified in #5322 for imul and swiden_high/swiden_low/uwiden_high/uwiden_low combinations in the x64 backend, and remove some redundant rules from the aarch64 dynamic neon ruleset. Additionally, add tests to the x64 backend showing that the imul specializations are firing.	2022-11-28 15:48:34 -08:00
Nick Fitzgerald	d0d3245a35	Cranelift: Add `heap_load` and `heap_store` instructions (#5300 ) * Cranelift: Define `heap_load` and `heap_store` instructions * Cranelift: Implement interpreter support for `heap_load` and `heap_store` * Cranelift: Add a suite of runtests for `heap_{load,store}` There are so many knobs we can twist for heaps and I wanted to exhaustively test all of them, so I wrote a script to generate the tests. I've checked in the script in case we want to make any changes in the future, but I don't think it is worth adding this to CI to check that scripts are up to date or anything like that. * Review feedback	2022-11-21 23:00:39 +00:00
Trevor Elliott	54cfa4df34	cranelift: Fix implicit pointer argument register use (#5301 ) * Fix arg handling to write to VRegs instead of physical regs * Make is_included_in_clobbers required, and handle Args on x64 and riscv64	2022-11-18 16:47:03 -08:00
Jun Ryung Ju	e5f93d9ec0	cranelift: Support `bnot`, `band`, `bor`, `bxor` for x86_64. (#5036 ) * Support `bnot`, `band`, `bor`, `bxor` for x86_64. * Fix-up to handle `B{8,16,32,64}` type on bitops * Fix-up conflict.	2022-11-18 07:45:54 -08:00
Trevor Elliott	4780bd5902	Don't use %rcx directly with CoffTlsGetAddr (#5278 ) Avoid naming %rcx as written by the CoffTlsGetAddr pseudo-instruction in the x64 backend, and instead emit a fixed-def constraint for a fresh VReg and %rcx.	2022-11-16 11:32:09 -08:00
Trevor Elliott	07bd8bf34a	Remove unnecessary moves in x64 gen_memcpy (#5277 ) Remove some unnecessary moves in the x64 gen_memcpy implementation -- the call instruction that's generated will already constrain the args to those registers.	2022-11-16 10:33:00 -08:00
Trevor Elliott	a007e02bd2	Add fixed_nonallocatable constraints when appropriate (#5253 ) Plumb the set of allocatable registers through the OperandCollector and use it to validate uses of fixed-nonallocatable registers, like %rsp on x86_64.	2022-11-15 12:49:17 -08:00
Trevor Elliott	dece901d16	Use regalloc constraints for sse blend operations (#5251 ) Instead of using xmm0 explicitly for the mask argument to instructions like blendvpd, use regalloc constraints to constrain it to xmm0 instead.	2022-11-14 16:44:34 -08:00
Trevor Elliott	0367fbc2d4	cranelift: Rework pinned register lowering (#5249 ) Rework pinned register lowering to avoid the use of pinned virtual registers, instead using the MovFromPReg and MovToPReg pseudo instructions.	2022-11-10 16:19:25 -08:00
Trevor Elliott	b077854b57	Generate SSA code from returns (#5172 ) Modify return pseudo-instructions to have pairs of registers: virtual and real. This allows us to constrain the virtual registers to the real ones specified by the abi, instead of directly emitting moves to those real registers.	2022-11-08 16:00:49 -08:00
Ulrich Weigand	3e5938e65a	Support big- and little-endian lane order with bitcast (#5196 ) Add a MemFlags operand to the bitcast instruction, where only the `big` and `little` flags are accepted. These define the lane order to be used when casting between types of different lane counts. Update all users to pass an appropriate MemFlags argument. Implement lane swaps where necessary in the s390x back-end. This is the final part necessary to fix https://github.com/bytecodealliance/wasmtime/issues/4566.	2022-11-07 14:41:10 -08:00
Ulrich Weigand	137a8b710f	Move bitselect->vselect optimization to x64 back-end (#5191 ) The simplifier was performing an optimization to replace bitselect with vselect if all bytes of the condition mask could be shown to be all ones or all zeros. This optimization only ever made any difference in codegen on the x64 target. Therefore, move this optimization to the x64 back-end and perform it in ISLE instead. Resulting codegen should be unchanged, with slightly improved compile time. This also eliminates a few endian-dependent bitcast operations.	2022-11-03 20:17:36 +00:00
Afonso Bordado	3ef30b5b67	cranelift: Rename `i{min,max}` to `s{min,max}` (#5187 ) This brings these instructions in line with our general naming convention of signed instructions being prefixed with `s`.	2022-11-03 18:20:33 +00:00
Afonso Bordado	2c69b94744	cranelift: Add support for `bswap.i128` (#5186 ) * fuzzgen: Request only one variable for bswap This was included by accident. Bswap only has one input, instead of two. * cranelift: Add `bswap.i128` support Adds support only for x86, AArch64, S390X. RISCV does not yet have bswap.	2022-11-03 18:03:37 +00:00
Alex Crichton	22159848c5	Fix instruction size test for Rust 1.65.0 (#5188 ) Looks like Rust generously shrank our `enum` in 1.65.0, so update the test assertion to pass CI.	2022-11-03 16:53:51 +00:00
Trevor Elliott	aeceea28e2	Remove trapif and trapff (#5162 ) This branch removes the trapif and trapff instructions, in favor of using an explicit comparison and trapnz. This moves us closer to removing iflags and fflags, but introduces the need to implement instructions like iadd_cout in the x64 and aarch64 backends.	2022-11-03 09:25:11 -07:00

600 Commits