wasmtime

Author	SHA1	Message	Date
Alex Crichton	5ae8575296	x64: Take SIGFPE signals for divide traps (#6026 ) * x64: Take SIGFPE signals for divide traps Prior to this commit Wasmtime would configure `avoid_div_traps=true` unconditionally for Cranelift. This, for the division-based instructions, would change emitted code to explicitly trap on trap conditions instead of letting the `div` x86 instruction trap. There's no specific reason for Wasmtime, however, to specifically avoid traps in the `div` instruction. This means that the extra generated branches on x86 aren't necessary since the `div` and `idiv` instructions already trap for similar conditions as wasm requires. This commit instead disables the `avoid_div_traps` setting for Wasmtime's usage of Cranelift. Subsequently the codegen rules were updated slightly: * When `avoid_div_traps=true`, traps are no longer emitted for `div` instructions. * The `udiv`/`urem` instructions now list their trap as divide-by-zero instead of integer overflow. * The lowering for `sdiv` was updated to still explicitly check for zero but the integer overflow case is deferred to the instruction itself. * The lowering of `srem` no longer checks for zero and the listed trap for the `div` instruction is a divide-by-zero. This means that the codegen for `udiv` and `urem` no longer have any branches. The codegen for `sdiv` removes one branch but keeps the zero-check to differentiate the two kinds of traps. The codegen for `srem` removes one branch but keeps the -1 check since the semantics of `srem` mismatch with the semantics of `idiv` with a -1 divisor (specifically for INT_MIN). This is unlikely to have really all that much of a speedup but was something I noticed during #6008 which seemed like it'd be good to clean up. Plus Wasmtime's signal handling was already set up to catch `SIGFPE`, it was just never firing. * Remove the `avoid_div_traps` cranelift setting With no known users currently removing this should be possible and helps simplify the x64 backend. 
* x64: GC more support for avoid_div_traps Remove the `validate_sdiv_divisor` pseudo-instructions and clean up some of the ISLE rules now that `div` is allowed to itself trap unconditionally. x64: Store div trap code in instruction itself * Keep divisors in registers, not in memory Don't accidentally fold multiple traps together * Handle EXC_ARITHMETIC on macos * Update emit tests * Update winch and tests	2023-03-16 00:18:45 +00:00
Alex Crichton	03b5dbb3e0	aarch64: Use `VCodeConstant` for f64/v128 constants (#5997 ) * aarch64: Translate float and splat lowering to ISLE I was looking into `constant_f128` and its fallback lowering into memory and to get familiar with the code I figured it'd be good to port some Rust logic to ISLE. This commit ports the `constant_{f128,f64,f32}` helpers into ISLE from Rust as well as the `splat_const` helper which ended up being closely related. Tests reflect a number of regalloc changes that happened but also namely one major difference is that in the lowering of `f32` a 32-bit immediate is created now instead of a 64-bit immediate (in a GP register before it's moved into a FP register). This semantically has no change but the generated code is slightly different in a few minor cases. * aarch64: Load f64/v128 constants from a pool This commit removes the `LoadFpuConst64` and `LoadFpuConst128` pseudo-instructions from the AArch64 backend which internally loaded a nearby constant and then jumped over it. Constants now go through the `VCodeConstant` infrastructure which gets placed at the end of the function similar to how x64 works. Some minor support was added in as well to add a new addressing mode for a `MachLabel`-relative load.	2023-03-13 19:33:52 +00:00
Ulrich Weigand	411781d2fe	s390x: Fix mistake in available_in_isa (#5981 ) The 32-bit float<->int conversion instructions are part of the VXRS_EXT2 facility, not MIE2. Fixes https://github.com/bytecodealliance/wasmtime/issues/5979.	2023-03-10 19:41:41 +00:00
Chris Fallin	7f3500a172	Cranelift: x64, aarch64, s390x, riscv64: ensure addresses are I64s. (#5972 ) * Cranelift: x64, aarch64, s390x, riscv64: ensure addresses are I64s. @avanhatt has been looking at our address-mode lowering and found an example where, when feeding an `I32`-typed address into a load or store, we can violate assumptions and get incorrect codegen. This should never be reachable in practice, because all producers on 64-bit architectures use 64-bit types for addresses. However, our IR is insufficiently constrained, and allows loads/stores to `I32` addresses as well. This is nonsensical on a 64-bit architecture. Initially I had thought we should tighten either the instruction definition's accepted types, or the CLIF verifier, to reject this. However both are target-independent, and we don't want to bake an assumption of 64-bit-ness into the compiler core. Instead this PR tightens specific backends' lowerings to reject loads/stores of `I32`-typed addresses. tl;dr: no security implications as all producers use I64-typed addresses (and must, for correct operation); but we currently accept I32-typed addresses too, and this breaks other assumptions. * Allow R64 as well as I64 types. * Add an explicit extractor to match 64-bit address types.	2023-03-09 19:08:16 +00:00
Kevin Rizzo	013b35ff32	winch: Refactoring wasmtime compiler integration pieces to share more between Cranelift and Winch (#5944 ) * Enable the native target by default in winch Match cranelift-codegen's build script where if no architecture is explicitly enabled then the host architecture is implicitly enabled. * Refactor Cranelift's ISA builder to share more with Winch This commit refactors the `Builder` type to have a type parameter representing the finished ISA with Cranelift and Winch having their own typedefs for `Builder` to represent their own builders. The intention is to use this shared functionality to produce more shared code between the two codegen backends. * Moving compiler shared components to a separate crate * Restore native flag inference in compiler building This fixes an oversight from the previous commits to use `cranelift-native` to infer flags for the native host when using default settings with Wasmtime. * Move `Compiler::page_size_align` into wasmtime-environ The `cranelift-codegen` crate doesn't need this and winch wants the same implementation, so shuffle it around so everyone has access to it. * Fill out `Compiler::{flags, isa_flags}` for Winch These are easy enough to plumb through with some shared code for Wasmtime. * Plumb the `is_branch_protection_enabled` flag for Winch Just forwarding an isa-specific setting accessor. * Moving executable creation to shared compiler crate * Adding builder back in and removing from shared crate * Refactoring the shared pieces for the `CompilerBuilder` I decided to move a couple things around from Alex's initial changes. Instead of having the shared builder do everything, I went back to having each compiler have a distinct builder implementation. I refactored most of the flag setting logic into a single shared location, so we can still reduce the amount of code duplication. With them being separate, we don't need to maintain things like `LinkOpts` which Winch doesn't currently use. 
We also have an avenue to error when certain flags are sent to Winch if we don't support them. I'm hoping this will make things more maintainable as we build out Winch. I'm still unsure about keeping everything shared in a single crate (`cranelift_shared`). It's starting to feel like this crate is doing too much, which makes it difficult to name. There does seem to be a need for two distinct abstractions: creating the final executable and handling shared/ISA flags when building the compiler. I could make them into two separate crates, but there doesn't seem to be enough there yet to justify it. * Documentation updates, and renaming the finish method * Adding back in a default temporarily to pass tests, and removing some unused imports * Fixing winch tests with wrong method name * Removing unused imports from codegen shared crate * Apply documentation formatting updates Co-authored-by: Saúl Cabrera <saulecabrera@gmail.com> * Adding back in cranelift_native flag inferring * Adding new shared crate to publish list * Adding write feature to pass cargo check --------- Co-authored-by: Alex Crichton <alex@alexcrichton.com> Co-authored-by: Saúl Cabrera <saulecabrera@gmail.com>	2023-03-08 15:07:13 +00:00
Alex Crichton	07518dfd36	Remove the Cranelift `vselect` instruction (#5918 ) * Remove the Cranelift `vselect` instruction This instruction is documented as selecting lanes based on the "truthy" value of the condition lane, but the current status of the implementation of this instruction is: * x64 - uses the high bit for `f32x4` and `f64x2` and otherwise uses the high bit of each byte doing a byte-wise lane select rather than whatever the controlling type is. * AArch64 - this is the same as `bitselect` which is a bit-wise selection rather than a lane-wise selection. * s390x - this is the same as AArch64, a bit-wise selection rather than lane-wise. * interpreter - the interpreter implements the documented semantics of selecting based on "truthy" values. Coupled with the status of the implementation is the fact that this instruction is not used by WebAssembly SIMD today either. The only use of this instruction in Cranelift is the nan-canonicalization pass. By moving nan-canonicalization to `bitselect`, since that has the desired semantics, there's no longer any need for `vselect`. Given this situation, this commit subsequently removes `vselect` and all usage of it throughout Cranelift. Closes #5917 * Review comments * Bring back vselect opts as bitselect opts * Clean up vselect usage in the interpreter * Move bitcast in nan canonicalization * Add a comment about float optimization	2023-03-08 00:42:05 +00:00
Trevor Elliott	709257011e	Restrict uextend and sextend to scalar integers (#5953 )	2023-03-07 19:10:50 +00:00
Alex Crichton	8bb183f16e	Implement the relaxed SIMD proposal (#5892 ) * Initial support for the Relaxed SIMD proposal This commit adds initial scaffolding and support for the Relaxed SIMD proposal for WebAssembly. Codegen is supported on the x64 and AArch64 backends at this time. The purpose of this commit is to get all the boilerplate out of the way in terms of plumbing through a new feature, adding tests, etc. The tests are copied from the upstream repository at this time while the WebAssembly/testsuite repository hasn't been updated. A summary of changes made in this commit is: * Lowerings for all relaxed simd opcodes have been added, currently all exhibiting deterministic behavior. This means that few lowerings are optimal on the x86 backend, but on the AArch64 backend, for example, all lowerings should be optimal. * Support is added to codegen to, eventually, conditionally generate different code based on input codegen flags. This is intended to enable lowering to more efficient instructions on x86 by default, for example, while still allowing embedders to force architecture-independent semantics and behavior. One good example of this is the `f32x4.relaxed_madd` instruction which when deterministic forces the `fma` instruction, but otherwise if the backend doesn't have support for `fma` then intermediate operations are performed instead. * Lowerings of `iadd_pairwise` for `i16x8` and `i32x4` were added to the x86 backend as they're now exercised by the deterministic lowerings of relaxed simd instructions. * Sample codegen tests are added for x86 and aarch64 for some relaxed simd instructions. * Wasmtime embedder support for the relaxed-simd proposal and forcing determinism have been added to `Config` and the CLI. * Support has been added to the `.wast` runtime execution for the `(either ...)` matcher used in the relaxed-simd proposal. 
Tests for relaxed-simd are run both with a default `Engine` as well as a "force deterministic" `Engine` to test both configurations. * All tests from the upstream repository were copied into Wasmtime. These tests should be deleted when WebAssembly/testsuite is updated. * x64: Add x86-specific lowerings for relaxed simd This commit builds on the prior commit and adds an array of `x86_` instructions to Cranelift which have semantics that match their corresponding x86 equivalents. Translation for relaxed simd is then additionally updated to conditionally generate different CLIF for relaxed simd instructions depending on whether the target is x86 or not. This means that for AArch64 no changes are made but for x86 most relaxed instructions now lower to some x86-equivalent with slightly different semantics than the "deterministic" lowering. Add libcall support for fma to Wasmtime This will be required to implement the `f32x4.relaxed_madd` instruction (and others) when an x86 host doesn't specify the `has_fma` feature. * Ignore relaxed-simd tests on s390x and riscv64 * Enable relaxed-simd tests on s390x * Update cranelift/codegen/meta/src/shared/instructions.rs Co-authored-by: Andrew Brown <andrew.brown@intel.com> * Add a FIXME from review * Add notes about deterministic semantics * Don't default `has_native_fma` to `true` * Review comments and rebase fixes --------- Co-authored-by: Andrew Brown <andrew.brown@intel.com>	2023-03-07 15:52:41 +00:00
Alex Crichton	9b86a0b9b1	Remove the `widening_pairwise_dot_product_s` clif instruction (#5889 ) This was added for the wasm SIMD proposal but I've been poking around at this recently and the instruction can instead be represented by its component parts with the same semantics I believe. This commit removes the instruction and instead represents it with the existing `iadd_pairwise` instruction (among others) and updates backends with new pattern matches to have the same codegen as before. Interestingly, this entirely removed the codegen rule with no replacement on the AArch64 backend, as the existing rules already produce the same codegen.	2023-02-27 18:43:43 +00:00
Trevor Elliott	c5d9d5b10f	Remove module-level code generation tests (#5870 ) * Remove module-level code generation tests * Add cold block tests for each backend * Better cold block tests	2023-02-24 01:19:26 +00:00
Trevor Elliott	8abfe928d6	Reuse the DominatorTree postorder traversal in BlockLoweringOrder (#5843 ) * Rework the blockorder module to reuse the dom tree's cfg postorder * Update domtree tests * Treat br_table with an empty jump table as multiple block exits * Bless tests * Change branch_idx to succ_idx and fix the comment	2023-02-23 22:05:20 +00:00
Ulrich Weigand	4314210162	s390x: Fix implementation of {s,u}{min,max} (#5864 ) When expanding a min/max operation to a pair of icmp + select, do not attempt to expand the input value operands twice, as this might fail with memory operands. Fixes https://github.com/bytecodealliance/wasmtime/issues/5859.	2023-02-23 20:01:51 +00:00
Ulrich Weigand	9719147f91	s390x: Fix integer overflow during negation (#5866 ) Use wrapping_neg in i{64,32,16}_from_negated_value to avoid Rust aborts due to integer overflow. The resulting INT_MIN is already handled correctly in subsequent operations. Fixes https://github.com/bytecodealliance/wasmtime/issues/5863.	2023-02-23 16:32:10 +00:00
Trevor Elliott	f04decc4a1	Use capstone to validate precise-output tests (#5780 ) Use the capstone library to disassemble precise-output tests, in addition to pretty-printing their vcode.	2023-02-15 16:35:10 -08:00
Ulrich Weigand	305000d14b	s390x: Fix instruction encoding and disassembly format bugs (#5786 ) - Fix encoding of the AHY instruction. - Fix disassembly format of FIEBR, FIDBR, and LEDBRA instructions.	2023-02-15 08:36:44 -08:00
Ulrich Weigand	e10094dcd6	s390x: Support scalar min/max clif instructions (#5762 ) We don't have ISA instructions for that, so simply expand them to icmp + select. Also enable fuzzing for those clif instructions now.	2023-02-15 11:45:09 +00:00
Alex Crichton	a0a97f5e8f	Add (bnot (bxor x y)) lowerings for s390x/aarch64 (#5763 ) * Add (bnot (bxor x y)) lowerings for s390x/aarch64 I originally thought that s390x's original lowering in #5709 was incorrect, but as was rightfully pointed out, `(bnot (bxor x y))` is equivalent to `(bxor x (bnot y))`, so the special lowering for one should apply as a special lowering for the other. For the s390x and aarch64 backends, which already have a fused lowering of the bxor/bnot, additionally add a lowering for the bnot/bxor combination. * Add bnot(bxor(..)) tests for s390x 128-bit sizes	2023-02-13 15:41:18 +00:00
Trevor Elliott	d99783fc91	Move default blocks into jump tables (#5756 ) Move the default block off of the br_table instruction and into the JumpTable that it references.	2023-02-10 08:53:30 -08:00
Alex Crichton	de0e0bea3f	Legalize `b{and,or,xor}_not` into component instructions (#5709 ) * Remove trailing whitespace in `lower.isle` files * Legalize the `band_not` instruction into simpler form This commit legalizes the `band_not` instruction into `band`-of-`bnot`, or two instructions. This is intended to assist with egraph-based optimizations where the `band_not` instruction doesn't have to be specifically included in other bit-operation-patterns. Lowerings of the `band_not` instruction have been moved to a specialization of the `band` instruction. * Legalize `bor_not` into components Same as prior commit, but for the `bor_not` instruction. * Legalize bxor_not into bxor-of-bnot Same as prior commits. I think this also ended up fixing a bug in the s390x backend where `bxor_not x y` was actually translated as `bnot (bxor x y)` by accident given the test update changes. * Simplify not-fused operands for riscv64 Looks like some delegated-to rules have special-cases for "if this feature is enabled use the fused instruction" so move the clause for testing the feature up to the lowering phase to help trigger other rules if the feature isn't enabled. This should make the riscv64 backend more consistent with how other backends are implemented. * Remove B{and,or,xor}Not from cost of egraph metrics These shouldn't ever reach egraphs now that they're legalized away. * Add an egraph optimization for `x^-1 => ~x` This adds a simplification node to translate xor-against-minus-1 to a `bnot` instruction. This helps trigger various other optimizations in the egraph implementation and also various backend lowering rules for instructions. This is chiefly useful as wasm doesn't have a `bnot` equivalent, so it's encoded as `x^-1`. * Add a wasm test for end-to-end bitwise lowerings Test that end-to-end various optimizations are being applied for input wasm modules. * Specifically don't self-update rustup on CI I forget why this was here originally, but this is failing on Windows CI. 
In general there's no need to update rustup, so leave it as-is. * Cleanup some aarch64 lowering rules Previously a 32/64 split was necessary due to the `ALUOp` being different, but that's been refactored away now, so there's no longer any need for duplicate rules. * Narrow an x64 lowering rule This previously made more sense when it was `band_not` and rarely used, but be more specific in the type-filter on this rule that it's only applicable to SIMD types with lanes. * Simplify xor-against-minus-1 rule No need to have the commutative version since constants are already shuffled right for egraphs * Optimize band-of-bnot when bnot is on the left Use some more rules in the egraph algebraic optimizations to canonicalize band/bor/bxor with a `bnot` operand to put the operand on the right. That way the lowerings in the backends only have to list the rule once, with the operand on the right, to optimize both styles of input. * Add commutative lowering rules * Update cranelift/codegen/src/isa/x64/lower.isle Co-authored-by: Jamey Sharp <jamey@minilop.net> --------- Co-authored-by: Jamey Sharp <jamey@minilop.net>	2023-02-06 13:53:40 -06:00
Trevor Elliott	a5698cedf8	cranelift: Remove brz and brnz (#5630 ) Remove the brz and brnz instructions, as their behavior is now redundant with brif.	2023-01-30 20:34:56 +00:00
Trevor Elliott	b58a197d33	cranelift: Add a conditional branch instruction with two targets (#5446 ) Add a conditional branch instruction with two targets: brif. This instruction will eventually replace brz and brnz, as it encompasses the behavior of both. This PR also changes the InstructionData layout for instruction formats that hold BlockCall values, taking the same approach we use for Value arguments. This allows branch_destination to return a slice to the BlockCall values held in the instruction, rather than requiring that we pattern match on InstructionData to fetch the then/else blocks. Function generation for fuzzing has been updated to generate uses of brif, and I've run the cranelift-fuzzgen target locally for hours without triggering any new failures.	2023-01-24 14:37:16 -08:00
Ulrich Weigand	a2e9a608c1	fuzzgen: Enable s390x and disable unimplemented ops (#5596 ) Also fix assertion failure when using "i128 uext" or "i128 sext" arguments or return values, as discovered by the fuzzer.	2023-01-19 10:08:32 -08:00
Trevor Elliott	1e6c13d83e	cranelift: Rework block instructions to use BlockCall (#5464 ) Add a new type BlockCall that represents the pair of a block name with arguments to be passed to it. (The mnemonic here is that it looks a bit like a function call.) Rework the implementation of jump, brz, and brnz to use BlockCall instead of storing the block arguments as varargs in the instruction's ValueList. To ensure that we're processing block arguments from BlockCall values in instructions, three new functions that handle both sets of arguments have been introduced on DataFlowGraph: inst_values - returns an iterator that traverses values in the instruction and block arguments; map_inst_values - applies a function to each value in the instruction and block arguments; overwrite_inst_values - overwrites all values in an instruction and block arguments with values from the iterator. Co-authored-by: Jamey Sharp <jamey@minilop.net>	2023-01-17 16:31:15 -08:00
Sam Sartor	1efa3d6f8b	Add `clif-util compile` option to output object file (#5493 ) * add clif-util compile option to output object file * switch from a box to a borrow * update objectmodule tests to use borrowed isa * put targetisa into an arc	2023-01-06 12:53:48 -08:00
KarelPeeters	320d67fe8d	Cranelift: include return values in instruction pretty print output. (#5489 )	2023-01-03 09:06:47 -08:00
Trevor Elliott	9dc4f1a83c	s390x: Move the value out of the casloop_val_reg with mov_preg (#5430 ) The casloop_emit function in the s390x backend was using the fixed non-allocatable register %r0 directly with move instructions, which produced a panic in the regalloc2 checker (#5425). This PR changes the casloop_result function to use mov_preg instead of copy_reg to fetch the result, as it's not viewed by regalloc2 as a move. Fixes #5425	2022-12-14 13:06:35 -08:00
Ulrich Weigand	f0af622208	Simplify LowerBackend interface (#5432 ) * Refactor lower_branch to have Unit result Branches cannot have any output, so it is more straightforward to have the ISLE term return Unit instead of InstOutput. Also provide a new `emit_side_effect` term to simplify implementation of `lower_branch` rules with Unit result. * Simplify LowerBackend interface Move all remaining asserts from the LowerBackend::lower and ::lower_branch_group into the common call site. Change return value of ::lower to Option<InstOutput>, and return value of ::lower_branch_group to Option<()> to match ISLE term signature. Only pass the first branch into ::lower_branch_group and rename it to ::lower_branch. As a result of all those changes, LowerBackend routines now consist solely of calls to the corresponding ISLE routines.	2022-12-14 00:48:25 +00:00
Ulrich Weigand	299be327d5	Simplify "unimplemented" operation error message (#5429 ) Now that all operations are implemented in ISLE, simplify Rust code by providing a generic error message if any operation is not implemented in ISLE. Done across all targets.	2022-12-13 15:22:49 -08:00
Ulrich Weigand	df923f18ca	Remove MachInst::gen_constant (#5427 ) * aarch64: constant generation cleanup Add support for MOVZ and MOVN generation via ISLE. Handle f32const, f64const, and nop instructions via ISLE. No longer call Inst::gen_constant from lower.rs. * riscv64: constant generation cleanup Handle f32const, f64const, and nop instructions via ISLE. * s390x: constant generation cleanup Fix rule priorities for "imm" term. Only handle 32-bit stack offsets; no longer use load_constant64. * x64: constant generation cleanup No longer call Inst::gen_constant from lower.rs or abi.rs. * Refactor LowerBackend::lower to return InstOutput No longer write to the per-insn output registers; instead, return an InstOutput vector of temp registers holding the outputs. This will allow calling LowerBackend::lower multiple times for the same instruction, e.g. to rematerialize constants. When emitting the primary copy of the instruction during lowering, writing to the per-insn registers is now done in lower_clif_block. As a result, the ISLE lower_common routine is no longer needed. In addition, the InsnOutput type and all code related to it can be removed as well. * Refactor IsleContext to hold a LowerBackend reference Remove the "triple", "flags", and "isa_flags" fields that are copied from LowerBackend to each IsleContext, and instead just hold a reference to LowerBackend in IsleContext. This will allow calling LowerBackend::lower from within callbacks in src/machinst/isle.rs, e.g. to rematerialize constants. To avoid having to pass LowerBackend references through multiple functions, eliminate the lower_insn_to_regs subroutines in those targets that still have them, and just inline into the main lower routine. This also eliminates lower_inst.rs on aarch64 and riscv64. Replace all accesses to the removed IsleContext fields by going through the LowerBackend reference. 
* Remove MachInst::gen_constant This addresses the problem described in issue https://github.com/bytecodealliance/wasmtime/issues/4426 that targets currently have to duplicate code to emit constants between the ISLE logic and the gen_constant callback. After the various cleanups in earlier patches in this series, the only remaining user of gen_constant is put_value_in_regs in Lower. This can now be removed, and instead constant rematerialization can be performed in the put_in_regs ISLE callback by simply directly calling LowerBackend::lower on the instruction defining the constant (using a different output register). Since the check for egraph mode is now no longer performed in put_value_in_regs, the Lower::flags member becomes obsolete. Care needs to be taken that other calls directly to the Lower::put_value_in_regs routine now handle the fact that no more rematerialization is performed. All such calls in target code already historically handle constants themselves. The remaining call site in the ISLE gen_call_common helper can be redirected to the ISLE put_in_regs callback. The existing target implementations of gen_constant are then unused and can be removed. (In some targets there may still be further opportunities to remove duplication between ISLE and some local Rust code - this can be left to future patches.)	2022-12-13 13:00:04 -08:00
Timothy Chen	122872fb0c	Remove references for sig (#5414 )	2022-12-12 08:46:23 -08:00
Timothy Chen	8035945502	Reduce sig data size by changing sized spaces (#5402 ) * Reduce sig sizes * Fix test * Change compute_args_loc to return u32	2022-12-11 15:32:30 -08:00
Ulrich Weigand	e913cf3647	Remove IFLAGS/FFLAGS types (#5406 ) All instructions using the CPU flags types (IFLAGS/FFLAGS) were already removed. This patch completes the cleanup by removing all remaining instructions that define values of CPU flags types, as well as the types themselves. Specifically, the following features are removed: - The IFLAGS and FFLAGS types and the SpecialType category. - Special handling of IFLAGS and FFLAGS in machinst/isle.rs and machinst/lower.rs. - The ifcmp, ifcmp_imm, ffcmp, iadd_ifcin, iadd_ifcout, iadd_ifcarry, isub_ifbin, isub_ifbout, and isub_ifborrow instructions. - The writes_cpu_flags instruction property. - The flags verifier pass. - Flags handling in the interpreter. All of these features are currently unused; no functional change intended by this patch. This addresses https://github.com/bytecodealliance/wasmtime/issues/3249.	2022-12-09 13:42:03 -08:00
Jamey Sharp	8726eeefb3	cranelift-isle: Add "partial" flag for constructors (#5392 ) * cranelift-isle: Add "partial" flag for constructors Instead of tying fallibility of constructors to whether they're either internal or pure, this commit assumes all constructors are infallible unless tagged otherwise with a "partial" flag. Internal constructors without the "partial" flag are not allowed to use constructors which have the "partial" flag on the right-hand side of any rules, because they have no way to report last-minute match failures. Multi-constructors should never be "partial"; they report match failures with an empty iterator instead. In turn this means you can't use partial constructors on the right-hand side of internal multi-constructor rules. However, you can use the same constructors on the left-hand side with `if` or `if-let` instead. In many cases, ISLE can already trivially prove that an internal constructor always returns `Some`. With this commit, those cases are largely unchanged, except for removing all the `Option`s and `Some`s from the generated code for those terms. However, for internal non-partial constructors where ISLE could not prove that, it now emits an `unreachable!` panic as the last-resort, instead of returning `None` like it used to do. Among the existing backends, here's how many constructors have these panic cases: - x64: 14% (53/374) - aarch64: 15% (41/277) - riscv64: 23% (26/114) - s390x: 47% (268/567) It's often possible to rewrite rules so that ISLE can tell the panic can never be hit. Just ensure that there's a lowest-priority rule which has no constraints on the left-hand side. But in many of these constructors, it's difficult to statically prove the unhandled cases are unreachable because that's only down to knowledge about how they're called or other preconditions. So this commit does not try to enforce that all terms have a last-resort fallback rule. 
* Check term flags while translating expressions Instead of doing it in a separate pass afterward. This involved threading all the term flags (pure, multi, partial) through the recursive `translate_expr` calls, so I extracted the flags to a new struct so they can all be passed together. * Validate multi-term usage Now that I've threaded the flags through `translate_expr`, it's easy to check this case too, so let's just do it. * Extract `ReturnKind` to use in `ExternalSig` There are only three legal states for the combination of `multi` and `infallible`, so replace those fields of `ExternalSig` with a three-state enum. * Remove `Option` wrapper from multi-extractors too If we'd had any external multi-constructors this would correct their signatures as well. * Update ISLE tests * Tag prelude constructors as pure where appropriate I believe the only reason these weren't marked `pure` before was because that would have implied that they're also partial. Now that those two states are specified separately, we apply this flag in more places. * Fix my changes to aarch64 `lower_bmask` and `imm` terms	2022-12-07 17:16:03 -08:00
Trevor Elliott	c5379051c4	Enable the ssa verifier in debug builds (#5354 ) Enable regalloc2's SSA verifier in debug builds to check for any outstanding reuse of virtual registers in def constraints. As fuzzing enables debug_assertions, this will enable the SSA verifier when fuzzing as well.	2022-12-07 12:22:51 -08:00
Trevor Elliott	d54a27d0ea	Allocate temporary intermediates when loading constants on aarch64 (#5366 ) As loading constants on aarch64 can take up to 4 instructions, we need to plumb through some additional registers. Rather than pass a fixed list of registers in, pass an allocation function.	2022-12-01 22:29:36 +00:00
Trevor Elliott	37c3c5b1e0	Remove an unnecessary debug trace (#5359 )	2022-11-30 20:37:20 -08:00
Trevor Elliott	c16f2956db	Allocate a temporary for 64-bit constant loads in the s390x backend (#5357 ) Avoid reusing a destination virtual register for 64-bit constants in the s390x backend. This change addresses a case identified by the regalloc2 ssa validator, as the destination register was written to twice when constants were generated via the MachInst::gen_constant function.	2022-11-30 17:01:14 -08:00
Alex Crichton	830885383f	Implement inline stack probes for AArch64 (#5353 ) * Turn off probestack by default in Cranelift The probestack feature is not implemented for the aarch64 and s390x backends and currently the on-by-default status requires the aarch64 and s390x implementations to be a stub. Turning off probestack by default allows the s390x and aarch64 backends to panic with an error message to avoid providing a false sense of security. When the probestack option is implemented for all backends, however, it may be reasonable to re-enable. * aarch64: Improve codegen for AMode fallback Currently the final fallback for finalizing an `AMode` will generate both a constant-loading instruction as well as an `add` instruction to the base register into the same temporary. This commit improves the codegen by removing the `add` instruction and folding the final add into the finalized `AMode`. This changes the `extendop` used but both registers are 64-bit so shouldn't be affected by the extending operation. * aarch64: Implement inline stack probes This commit implements inline stack probes for the aarch64 backend in Cranelift. The support here is modeled after the x64 support where unrolled probes are used up to a particular threshold after which a loop is generated. The instructions here are similar in spirit to x64 except that unlike x64 the stack pointer isn't modified during the unrolled loop to avoid needing to re-adjust it back up at the end of the loop. * Enable inline probestack for AArch64 and Riscv64 This commit enables inline probestacks for the AArch64 and Riscv64 architectures in the same manner that x86_64 has it enabled now. Some more testing was additionally added since on Unix platforms we should be guaranteed that Rust's stack overflow message is now printed too. 
* Enable probestack for aarch64 in cranelift-fuzzgen * Address review comments * Remove implicit stack overflow traps from x64 backend This commit removes implicit `StackOverflow` traps inserted by the x64 backend for stack-based operations. This was historically required when stack overflow was detected with page faults but Wasmtime no longer requires that since it's not suitable for wasm modules which call host functions. Additionally no other backend implements this form of implicit trap-code additions so this is intended to synchronize the behavior of all the backends. This fixes a test added prior for aarch64 to properly abort the process instead of accidentally being caught by Wasmtime. * Fix a style issue	2022-11-30 12:30:00 -06:00
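The inline probe strategy above (unrolled probes up to a threshold, then a loop, without moving the stack pointer during the unrolled probes) can be sketched as a planning function. This is an illustration, not Cranelift's code. `ProbePlan`, `plan_probes`, and the `max_unrolled` threshold are hypothetical, and the sketch assumes the guard region is at least `page_size` bytes.

```rust
/// Hypothetical plan for probing a stack frame before it is allocated.
#[derive(Debug, PartialEq, Eq)]
pub enum ProbePlan {
    /// Frame is smaller than one page: nothing to probe.
    None,
    /// Unrolled probes: touch `sp - offset` for each offset, leaving SP as is.
    Unrolled(Vec<u64>),
    /// Too many pages to unroll: emit a loop that probes `count` pages.
    Loop { page_size: u64, count: u64 },
}

/// Probe every `page_size` bytes of a `frame_size`-byte frame so that, given a
/// guard region of at least `page_size` bytes, no access can skip over it.
pub fn plan_probes(frame_size: u64, page_size: u64, max_unrolled: u64) -> ProbePlan {
    let count = frame_size / page_size;
    if count == 0 {
        ProbePlan::None
    } else if count <= max_unrolled {
        ProbePlan::Unrolled((1..=count).map(|i| i * page_size).collect())
    } else {
        ProbePlan::Loop { page_size, count }
    }
}
```

Any remainder below the last probe is smaller than one page, so a later access there still cannot jump past the guard region.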
Timothy Chen	67fc5389b0	Remove sig data arg and ret fields to reduce size (#5319 ) * Remove sig data arg and ret fields to reduce size * Update cranelift/codegen/src/machinst/abi.rs Co-authored-by: Jamey Sharp <jamey@minilop.net> * Update cranelift/codegen/src/machinst/abi.rs Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fix offsets * Add comment Co-authored-by: Jamey Sharp <jamey@minilop.net>	2022-11-30 07:19:41 -08:00
Trevor Elliott	a5a0645aff	Don't allow reuse_def constraints in the s390x Loop instruction (#5336 )	2022-11-28 17:52:11 -08:00
Timothy Chen	48ee42efc2	Refactor Sigdata methods with sigset (#5307 ) * Refactor sigdata methods * Update cranelift/codegen/src/machinst/abi.rs Co-authored-by: Jamey Sharp <jamey@minilop.net> * Address comments Co-authored-by: Jamey Sharp <jamey@minilop.net>	2022-11-22 09:03:51 -08:00
Nick Fitzgerald	d0d3245a35	Cranelift: Add `heap_load` and `heap_store` instructions (#5300 ) * Cranelift: Define `heap_load` and `heap_store` instructions * Cranelift: Implement interpreter support for `heap_load` and `heap_store` * Cranelift: Add a suite of runtests for `heap_{load,store}` There are so many knobs we can twist for heaps and I wanted to exhaustively test all of them, so I wrote a script to generate the tests. I've checked in the script in case we want to make any changes in the future, but I don't think it is worth adding this to CI to check that scripts are up to date or anything like that. * Review feedback	2022-11-21 23:00:39 +00:00
Trevor Elliott	54cfa4df34	cranelift: Fix implicit pointer argument register use (#5301 ) * Fix arg handling to write to VRegs instead of physical regs * Make is_included_in_clobbers required, and handle Args on x64 and riscv64	2022-11-18 16:47:03 -08:00
Trevor Elliott	a007e02bd2	Add fixed_nonallocatable constraints when appropriate (#5253 ) Plumb the set of allocatable registers through the OperandCollector and use it to validate uses of fixed-nonallocatable registers, like %rsp on x86_64.	2022-11-15 12:49:17 -08:00
Trevor Elliott	b077854b57	Generate SSA code from returns (#5172 ) Modify return pseudo-instructions to have pairs of registers: virtual and real. This allows us to constrain the virtual registers to the real ones specified by the abi, instead of directly emitting moves to those real registers.	2022-11-08 16:00:49 -08:00
Ulrich Weigand	3e5938e65a	Support big- and little-endian lane order with bitcast (#5196 ) Add a MemFlags operand to the bitcast instruction, where only the `big` and `little` flags are accepted. These define the lane order to be used when casting between types of different lane counts. Update all users to pass an appropriate MemFlags argument. Implement lane swaps where necessary in the s390x back-end. This is the final part necessary to fix https://github.com/bytecodealliance/wasmtime/issues/4566.	2022-11-07 14:41:10 -08:00
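When a bitcast changes the lane count, the same register bits yield different lanes depending on which end of the register holds lane 0. The sketch below shows this for i16x8 to i32x4. `LaneOrder` and `bitcast_i16x8_to_i32x4` are hypothetical names standing in for the role of the `big`/`little` MemFlags, not Cranelift's implementation.

```rust
/// Which end of the register holds lane 0 (the role of the `big`/`little`
/// MemFlags on `bitcast`); names here are illustrative.
#[derive(Clone, Copy)]
pub enum LaneOrder {
    Little,
    Big,
}

/// Reinterpret an i16x8 value as i32x4 by going through its 16-byte image
/// under the given lane order.
pub fn bitcast_i16x8_to_i32x4(lanes: [u16; 8], order: LaneOrder) -> [u32; 4] {
    // Serialize the source lanes into the byte image of the register.
    let mut bytes = [0u8; 16];
    for (i, lane) in lanes.iter().enumerate() {
        let b = match order {
            LaneOrder::Little => lane.to_le_bytes(),
            LaneOrder::Big => lane.to_be_bytes(),
        };
        bytes[2 * i..2 * i + 2].copy_from_slice(&b);
    }
    // Read the same image back as four 32-bit lanes.
    let mut out = [0u32; 4];
    for (i, c) in bytes.chunks_exact(4).enumerate() {
        let word = [c[0], c[1], c[2], c[3]];
        out[i] = match order {
            LaneOrder::Little => u32::from_le_bytes(word),
            LaneOrder::Big => u32::from_be_bytes(word),
        };
    }
    out
}
```

With source lanes `[0x1111, 0x2222, 0, ...]`, little lane order gives `0x2222_1111` in lane 0 while big lane order gives `0x1111_2222`. This mismatch is why a backend whose native lane order differs from the requested one, such as s390x, must insert lane swaps.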
Afonso Bordado	3ef30b5b67	cranelift: Rename `i{min,max}` to `s{min,max}` (#5187 ) This brings these instructions in line with our general naming convention of signed instructions being prefixed with `s`.	2022-11-03 18:20:33 +00:00
Afonso Bordado	2c69b94744	cranelift: Add support for `bswap.i128` (#5186 ) * fuzzgen: Request only one variable for bswap This was included by accident. Bswap only has one input, instead of two. * cranelift: Add `bswap.i128` support Adds support only for x86, AArch64, S390X. RISCV does not yet have bswap.	2022-11-03 18:03:37 +00:00
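On a 64-bit target, `bswap.i128` can be expressed with two 64-bit byte swaps: swap the bytes of each half, then exchange the halves. The following sketch checks that identity against Rust's built-in `u128::swap_bytes`. The function name is hypothetical, and the sketch does not reproduce any particular backend's lowering.

```rust
/// Sketch of lowering `bswap.i128` onto two 64-bit byte swaps: byte-swap
/// each half and exchange the halves.
pub fn bswap_i128_via_halves(x: u128) -> u128 {
    let lo = x as u64;
    let hi = (x >> 64) as u64;
    // The swapped low half becomes the new high half, and vice versa.
    ((lo.swap_bytes() as u128) << 64) | (hi.swap_bytes() as u128)
}
```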
Trevor Elliott	aeceea28e2	Remove trapif and trapff (#5162 ) This branch removes the trapif and trapff instructions, in favor of using an explicit comparison and trapnz. This moves us closer to removing iflags and fflags, but introduces the need to implement instructions like iadd_cout in the x64 and aarch64 backends.	2022-11-03 09:25:11 -07:00
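Replacing `trapif` with "explicit comparison and trapnz" means the trap condition is first materialized as an ordinary value and then tested. The sketch below models an overflow-checked unsigned add this way. `Trap`, `trapnz`, and `add_or_trap` are hypothetical Rust stand-ins. Modeling the `iadd_cout` carry as a plain `u8` is an assumption, since the commit does not state its exact result type.

```rust
/// Hypothetical trap marker for the sketch.
#[derive(Debug, PartialEq, Eq)]
pub struct Trap;

/// `trapnz`-style helper: trap when the condition value is nonzero.
pub fn trapnz(cond: u8) -> Result<(), Trap> {
    if cond != 0 {
        Err(Trap)
    } else {
        Ok(())
    }
}

/// Unsigned add that traps on overflow: the carry is computed as a normal
/// value (the role `iadd_cout` plays here) and then fed to `trapnz`,
/// instead of trapping directly on CPU flags.
pub fn add_or_trap(a: u32, b: u32) -> Result<u32, Trap> {
    let (sum, carry) = a.overflowing_add(b);
    trapnz(carry as u8)?;
    Ok(sum)
}
```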
Ulrich Weigand	961107ec63	Merge raw_bitcast and bitcast (#5175 ) - Allow bitcast for vectors with differing lane widths - Remove raw_bitcast IR instruction - Change all users of raw_bitcast to bitcast - Implement support for no-op bitcast cases across backends This implements the second step of the plan outlined here: https://github.com/bytecodealliance/wasmtime/issues/4566#issuecomment-1234819394	2022-11-02 10:16:27 -07:00


187 Commits