wasmtime

Author	SHA1	Message	Date
Trevor Elliott	861220c433	Restrict the types for isplit and iconcat to match backends (#6070 ) * Restrict the types for isplit and iconcat to match backends * Admit unimplemented bitwidths to isplit/iconcat * Modify the NarrowInt type instead of shadowing it * Fix filetest failures	2023-03-21 01:21:00 +00:00
Alex Crichton	f7dda1ab2c	x64: Fix vbroadcastss with AVX2 and without AVX (#6060 ) * x64: Fix vbroadcastss with AVX2 and without AVX This commit fixes a corner case in the emission of the `vbroadcasts{s,d}` instructions. The memory-to-xmm form of these instructions was available with the AVX instruction set, but the xmm-to-xmm form of these instructions wasn't available until AVX2. The instruction requirement for these are listed as AVX but the lowering rules are appropriately annotated to use either AVX2 or AVX when appropriate. While this should work in practice this didn't work for the assertion about enabled features for each instruction. The `vbroadcastss` instruction was listed as requiring AVX but could get emitted when AVX2 was enabled (due to the reg-to-reg form being available). This caused an issue for the fuzzer where AVX2 was enabled but AVX was disabled. One possible fix would be to add more opcodes, one for reg-to-reg and one for mem-to-reg. That seemed like somewhat overkill for a pretty niche situation that shouldn't actually come up in practice anywhere. Instead this commit changes all the `has_avx` accessors to the `use_avx_simd` predicate already available in the target flags. The `use_avx2_simd` predicate was then updated to additionally require `has_avx`, so if AVX2 is enabled and AVX is disabled then the `vbroadcastss` instruction won't get emitted any more. Closes #6059 * Pass `enable_simd` on a few more files	2023-03-18 18:38:03 +00:00
Trevor Elliott	78dbe93f21	Rename `as_bool` to `as_truthy`, and fix TypeSet::as_bool (#6027 )	2023-03-17 21:11:24 +00:00
Trevor Elliott	411a3eff3e	cranelift: Emit a table of opcodes in gen_inst (#6046 ) * Emit a table of opcodes in gen_inst * Remove accidental export of OPCODE_SIGNATURES * Generate `Opcode::all` instead of a table	2023-03-17 17:38:28 +00:00
Alex Crichton	5ae8575296	x64: Take SIGFPE signals for divide traps (#6026 ) * x64: Take SIGFPE signals for divide traps Prior to this commit Wasmtime would configure `avoid_div_traps=true` unconditionally for Cranelift. This, for the division-based instructions, would change emitted code to explicitly trap on trap conditions instead of letting the `div` x86 instruction trap. There's no specific reason for Wasmtime, however, to specifically avoid traps in the `div` instruction. This means that the extra generated branches on x86 aren't necessary since the `div` and `idiv` instructions already trap for similar conditions as wasm requires. This commit instead disables the `avoid_div_traps` setting for Wasmtime's usage of Cranelift. Subsequently the codegen rules were updated slightly: * When `avoid_div_traps=true`, traps are no longer emitted for `div` instructions. * The `udiv`/`urem` instructions now list their trap as divide-by-zero instead of integer overflow. * The lowering for `sdiv` was updated to still explicitly check for zero but the integer overflow case is deferred to the instruction itself. * The lowering of `srem` no longer checks for zero and the listed trap for the `div` instruction is a divide-by-zero. This means that the codegen for `udiv` and `urem` no longer have any branches. The codegen for `sdiv` removes one branch but keeps the zero-check to differentiate the two kinds of traps. The codegen for `srem` removes one branch but keeps the -1 check since the semantics of `srem` mismatch with the semantics of `idiv` with a -1 divisor (specifically for INT_MIN). This is unlikely to have really all that much of a speedup but was something I noticed during #6008 which seemed like it'd be good to clean up. Plus Wasmtime's signal handling was already set up to catch `SIGFPE`, it was just never firing. * Remove the `avoid_div_traps` cranelift setting With no known users currently removing this should be possible and helps simplify the x64 backend. * x64: GC more support for avoid_div_traps Remove the `validate_sdiv_divisor` pseudo-instructions and clean up some of the ISLE rules now that `div` is allowed to itself trap unconditionally. x64: Store div trap code in instruction itself * Keep divisors in registers, not in memory Don't accidentally fold multiple traps together * Handle EXC_ARITHMETIC on macos * Update emit tests * Update winch and tests	2023-03-16 00:18:45 +00:00
Trevor Elliott	f5ad74e546	cranelift: Add narrower and wider constraints to the instruction DSL (#6013 ) * Add narrower and wider constraints to the instruction DSL * Add docs to narrower/wider operands * Update cranelift/codegen/meta/src/cdsl/instructions.rs Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fix assertion message * Simplify upper bounds for the wider constraint * Remove additional unnecessary cases in the verifier * Remove unused variables * Remove changes to is_ctrl_typevar_candidate These changes were only necessary when the type returned by an instruction was a variable constrained by narrow or widen. As we have switched to requiring that constraints must appear on argument types and not return types, these changes were not longer necessary. --------- Co-authored-by: Jamey Sharp <jamey@minilop.net>	2023-03-14 16:34:17 +00:00
Trevor Elliott	188f712025	Mark fpromote and fdemote as operating on float scalars (#6014 )	2023-03-14 00:47:40 +00:00
Afonso Bordado	5c95e6fbaf	riscv64: Codemotion cleanups to ISLE files (#5984 ) * riscv64: Fix typo in extensions * riscv64: Move converters to top of file * riscv64: Group up all imm12 rules * riscv64: Move zero_reg helpers to Physical Regs section * riscv64: Move helpers away from `clz` lowerings These were in the middle of the `clz` rules and are kind of distracting * riscv64: Move `cls` rules next to `ctz`/`clz` * cranelift: Move `u8_and` / `u32_add` to Primitive Arithmetic section * riscv64: Mark some imm12 constructors as pure * cranelift: Move `s32_add_fallible` next to `u32_add` * riscv64: Fix Typo	2023-03-13 19:20:15 +00:00
Alex Crichton	7956dc6ba2	Change CLIF `shuffle` to validate lane indices (#5995 ) * Change CLIF `shuffle` to validate lane indices Previously the CLIF `shuffle` instruction did not perform any validation on the lane shuffle mask and specified that out-of-bounds lanes always returned 0 as the value. This behavior though is not required by WebAssembly which validates that lane indices are always in-bounds. Additionally since these are static immediates even other code generators should be able to verify that the immediates are in-bounds. As a result this commit updates the definition of the `shuffle` instruction to specify that all byte immediates must be in-bounds in the range of [0, 32). The verifier has been updated and some test cases have been removed that were testing this functionality. Closes #5989 * Only generate valid shuffle immediates in fuzzer	2023-03-13 14:24:11 +00:00
Jamey Sharp	f877141668	cranelift-meta: Don't let-bind operand definitions (#5969 ) We've adopted this pattern in Cranelift's instruction definitions where we let-bind some calls to `Operand::new` and then later use them in one or more calls to `Inst::new`. That pattern has two problems: - It puts the type of each operand somewhere potentially far removed from the instruction in which it's used. - We let-bind the same name for many different operands, compounding the first problem by making it harder to find _which_ definition is used. So instead this commit removes all let-bindings for operand definitions and constructs a new `Operand` every time. Constructing an `Operand` at every use means we duplicate some documentation strings, but not all that many of them as it turns out. I've left the let-bound type-sets alone, so those are currently still shared across many instructions. They have some of the same problems and should be reviewed as well.	2023-03-09 17:24:19 +00:00
Kevin Rizzo	013b35ff32	winch: Refactoring wasmtime compiler integration pieces to share more between Cranelift and Winch (#5944 ) * Enable the native target by default in winch Match cranelift-codegen's build script where if no architecture is explicitly enabled then the host architecture is implicitly enabled. * Refactor Cranelift's ISA builder to share more with Winch This commit refactors the `Builder` type to have a type parameter representing the finished ISA with Cranelift and Winch having their own typedefs for `Builder` to represent their own builders. The intention is to use this shared functionality to produce more shared code between the two codegen backends. * Moving compiler shared components to a separate crate * Restore native flag inference in compiler building This fixes an oversight from the previous commits to use `cranelift-native` to infer flags for the native host when using default settings with Wasmtime. * Move `Compiler::page_size_align` into wasmtime-environ The `cranelift-codegen` crate doesn't need this and winch wants the same implementation, so shuffle it around so everyone has access to it. * Fill out `Compiler::{flags, isa_flags}` for Winch These are easy enough to plumb through with some shared code for Wasmtime. * Plumb the `is_branch_protection_enabled` flag for Winch Just forwarding an isa-specific setting accessor. * Moving executable creation to shared compiler crate * Adding builder back in and removing from shared crate * Refactoring the shared pieces for the `CompilerBuilder` I decided to move a couple things around from Alex's initial changes. Instead of having the shared builder do everything, I went back to having each compiler have a distinct builder implementation. I refactored most of the flag setting logic into a single shared location, so we can still reduce the amount of code duplication. With them being separate, we don't need to maintain things like `LinkOpts` which Winch doesn't currently use. We also have an avenue to error when certain flags are sent to Winch if we don't support them. I'm hoping this will make things more maintainable as we build out Winch. I'm still unsure about keeping everything shared in a single crate (`cranelift_shared`). It's starting to feel like this crate is doing too much, which makes it difficult to name. There does seem to be a need for two distinct abstraction: creating the final executable and the handling of shared/ISA flags when building the compiler. I could make them into two separate crates, but there doesn't seem to be enough there yet to justify it. * Documentation updates, and renaming the finish method * Adding back in a default temporarily to pass tests, and removing some unused imports * Fixing winch tests with wrong method name * Removing unused imports from codegen shared crate * Apply documentation formatting updates Co-authored-by: Saúl Cabrera <saulecabrera@gmail.com> * Adding back in cranelift_native flag inferring * Adding new shared crate to publish list * Adding write feature to pass cargo check --------- Co-authored-by: Alex Crichton <alex@alexcrichton.com> Co-authored-by: Saúl Cabrera <saulecabrera@gmail.com>	2023-03-08 15:07:13 +00:00
Trevor Elliott	7d482345fb	Restrict fcvt_to_uint and fcvt_to_sint to produce scalar integers (#5956 ) * Restrict fcvt_to_uint and fcvt_to_sint to produce scalar integers * Review feedback	2023-03-08 05:00:15 +00:00
Alex Crichton	07518dfd36	Remove the Cranelift `vselect` instruction (#5918 ) * Remove the Cranelift `vselect` instruction This instruction is documented as selecting lanes based on the "truthy" value of the condition lane, but the current status of the implementation of this instruction is: * x64 - uses the high bit for `f32x4` and `f64x2` and otherwise uses the high bit of each byte doing a byte-wise lane select rather than whatever the controlling type is. * AArch64 - this is the same as `bitselect` which is a bit-wise selection rather than a lane-wise selection. * s390x - this is the same as AArch64, a bit-wise selection rather than lane-wise. * interpreter - the interpreter implements the documented semantics of selecting based on "truthy" values. Coupled with the status of the implementation is the fact that this instruction is not used by WebAssembly SIMD today either. The only use of this instruction in Cranelift is the nan-canonicalization pass. By moving nan-canonicalization to `bitselect`, since that has the desired semantics, there's no longer any need for `vselect`. Given this situation this commit subsqeuently removes `vselect` and all usage of it throughout Cranelift. Closes #5917 * Review comments * Bring back vselect opts as bitselect opts * Clean up vselect usage in the interpreter * Move bitcast in nan canonicalization * Add a comment about float optimization	2023-03-08 00:42:05 +00:00
Trevor Elliott	709257011e	Restrict uextend and sextend to scalar integers (#5953 )	2023-03-07 19:10:50 +00:00
Trevor Elliott	b44f67b6d7	Stop supporting vectors with `bmask` (#5948 ) Co-authored-by: Jamey Sharp <jsharp@fastly.com>	2023-03-07 17:34:29 +00:00
Alex Crichton	8bb183f16e	Implement the relaxed SIMD proposal (#5892 ) * Initial support for the Relaxed SIMD proposal This commit adds initial scaffolding and support for the Relaxed SIMD proposal for WebAssembly. Codegen support is supported on the x64 and AArch64 backends on this time. The purpose of this commit is to get all the boilerplate out of the way in terms of plumbing through a new feature, adding tests, etc. The tests are copied from the upstream repository at this time while the WebAssembly/testsuite repository hasn't been updated. A summary of changes made in this commit are: * Lowerings for all relaxed simd opcodes have been added, currently all exhibiting deterministic behavior. This means that few lowerings are optimal on the x86 backend, but on the AArch64 backend, for example, all lowerings should be optimal. * Support is added to codegen to, eventually, conditionally generate different code based on input codegen flags. This is intended to enable codegen to more efficient instructions on x86 by default, for example, while still allowing embedders to force architecture-independent semantics and behavior. One good example of this is the `f32x4.relaxed_fmadd` instruction which when deterministic forces the `fma` instruction, but otherwise if the backend doesn't have support for `fma` then intermediate operations are performed instead. * Lowerings of `iadd_pairwise` for `i16x8` and `i32x4` were added to the x86 backend as they're now exercised by the deterministic lowerings of relaxed simd instructions. * Sample codegen tests for added for x86 and aarch64 for some relaxed simd instructions. * Wasmtime embedder support for the relaxed-simd proposal and forcing determinism have been added to `Config` and the CLI. * Support has been added to the `.wast` runtime execution for the `(either ...)` matcher used in the relaxed-simd proposal. Tests for relaxed-simd are run both with a default `Engine` as well as a "force deterministic" `Engine` to test both configurations. * All tests from the upstream repository were copied into Wasmtime. These tests should be deleted when WebAssembly/testsuite is updated. * x64: Add x86-specific lowerings for relaxed simd This commit builds on the prior commit and adds an array of `x86_` instructions to Cranelift which have semantics that match their corresponding x86 equivalents. Translation for relaxed simd is then additionally updated to conditionally generate different CLIF for relaxed simd instructions depending on whether the target is x86 or not. This means that for AArch64 no changes are made but for x86 most relaxed instructions now lower to some x86-equivalent with slightly different semantics than the "deterministic" lowering. Add libcall support for fma to Wasmtime This will be required to implement the `f32x4.relaxed_madd` instruction (and others) when an x86 host doesn't specify the `has_fma` feature. * Ignore relaxed-simd tests on s390x and riscv64 * Enable relaxed-simd tests on s390x * Update cranelift/codegen/meta/src/shared/instructions.rs Co-authored-by: Andrew Brown <andrew.brown@intel.com> * Add a FIXME from review * Add notes about deterministic semantics * Don't default `has_native_fma` to `true` * Review comments and rebase fixes --------- Co-authored-by: Andrew Brown <andrew.brown@intel.com>	2023-03-07 15:52:41 +00:00
Afonso Bordado	6d9ab2067b	cranelift: Make `sqmul_round_sat` vector only (#5941 )	2023-03-06 18:22:50 +00:00
wasmtime-publish	354bc48015	Bump Wasmtime to 8.0.0 (#5932 ) Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>	2023-03-06 15:08:16 +00:00
Chris Fallin	7b8854f803	egraphs: fix handling of effectful-but-idempotent ops and GVN. (#5800 ) * Revert "egraphs: disable GVN of effectful idempotent ops (temporarily). (#5808)" This reverts commit `c7e2571866`. * egraphs: fix handling of effectful-but-idempotent ops and GVN. This PR addresses #5796: currently, ops that are effectful, i.e., remain in the side-effecting skeleton (which we keep in the `Layout` while the egraph exists), but are idempotent and thus mergeable by a GVN pass, are not handled properly. GVN is still possible on effectful but idempotent ops precisely because our GVN does not create partial redundancies: it removes an instruction only when it is dominated by an identical instruction. An isntruction will not be "hoisted" to a point where it could execute in the optimized code but not in the original. However, there are really two parts to the egraph implementation that produce this effect: the deduplication on insertion into the egraph, and the elaboration with a scoped hashmap. The deduplication lets us give a single name (value ID) to all copies of an identical instruction, and then elaboration will re-create duplicates if GVN should not hoist or merge some of them. Because deduplication need not worry about dominance or scopes, we use a simple (non-scoped) hashmap to dedup/intern ops as "egraph nodes". When we added support for GVN'ing effectful but idempotent ops (#5594), we kept the use of this simple dedup'ing hashmap, but these ops do not get elaborated; instead they stay in the side-effecting skeleton. Thus, we inadvertently created potential for weird code-motion effects. The proposal in #5796 would solve this in a clean way by treating these ops as pure again, and keeping them out of the skeleton, instead putting "force" pseudo-ops in the skeleton. However, this is a little more complex than I would like, and I've realized that @jameysharp's earlier suggestion is much simpler: we can keep an actual scoped hashmap separately just for the effectful-but-idempotent ops, and use it to GVN while we build the egraph. In effect, we're fusing a separate GVN pass with the egraph pass (but letting it interact corecursively with egraph rewrites. This is in principle similar to how we keep a separate map for loads and fuse this pass with the egraph rewrite pass as well. Note that we can use a `ScopedHashMap` here without the "context" (as needed by `CtxHashMap`) because, as noted by @jameysharp, in practice the ops we want to GVN have all their args inline. Equality on the `InstructinoData` itself is conservative: two insts whose struct contents compare shallowly equal are definitely identical, but identical insts in a deep-equality sense may not compare shallowly equal, due to list indirection. This is fine for GVN, because it is still sound to skip any given GVN opportunity (and keep the original instructions). Fixes #5796. * Add comments from review.	2023-03-02 02:10:42 +00:00
Alex Crichton	c4a2c1e818	clif: Remove the type variable from `swizzle` (#5897 ) This instruction is only defined with i8x16 inputs and outputs so there's no need for a type variable, so shadow the otherwise-generic `a` result with a concrete i8x16 type.	2023-03-01 00:38:53 +00:00
Alex Crichton	9b86a0b9b1	Remove the `widening_pairwise_dot_product_s` clif instruction (#5889 ) This was added for the wasm SIMD proposal but I've been poking around at this recently and the instruction can instead be represented by its component parts with the same semantics I believe. This commit removes the instruction and instead represents it with the existing `iadd_pairwise` instruction (among others) and updates backends to with new pattern matches to have the same codegen as before. This interestingly entirely removed the codegen rule with no replacement on the AArch64 backend as the existing rules all existed to produce the same codegen.	2023-02-27 18:43:43 +00:00
Jamey Sharp	5cfb461945	Only emit ISLE/egraph terms for single-value insts (#5848 ) For instructions with no results (such as branches and stores) or instructions with multiple results (such as add with carry), we have assertions checking that an optimization rule doesn't try to match on or construct such instructions. When we generate terms for matching or constructing instructions, the terms for these instructions are guaranteed to panic if they're ever used. So let's just not generate them. In the future we may wish to generate terms with different types for these instructions, to make them usable in ISLE rules for optimization that fall outside our current egraph constraints.	2023-02-24 15:38:48 +00:00
Afonso Bordado	6e6a1034d7	riscv64: Add bitmanip extension flags (#5847 )	2023-02-21 22:12:44 +00:00
Trevor Elliott	80c147d9c0	Rework br_table to use BlockCall (#5731 ) Rework br_table to use BlockCall, allowing us to avoid adding new nodes during ssa construction to hold block arguments. Additionally, many places where we previously matched on InstructionData to extract branch destinations can be replaced with a use of branch_destination or branch_destination_mut.	2023-02-16 09:23:27 -08:00
Trevor Elliott	d99783fc91	Move default blocks into jump tables (#5756 ) Move the default block off of the br_table instrution, and into the JumpTable that it references.	2023-02-10 08:53:30 -08:00
Ayomide Bamidele	9637840b4b	Update AMD and generic x86 CPU presets to match LLVM (#5575 ) * Add x86 AMD and generic presets * Fix typos * Add zen2 definition	2023-02-08 15:56:45 -08:00
Chris Fallin	673b448cfe	Cranelift DFG: make inst clone deep-clone varargs lists. (#5727 ) When investigating #5716, I found that rematerialization of a `call`, in addition to blowing up for other reasons, caused aliasing of the varargs list (the `EntityList` in the `ListPool`), such that editing the args of the second copy of the call instruction inadvertently updated the first as well. This PR modifies `DataFlowGraph::clone_inst` so that it always clones the varargs list if present. This shouldn't have any functional impact on Cranelift today, because we don't rematerialize any instructions with varargs; but it's important to get it right to avoid a bug later!	2023-02-07 01:21:09 +00:00
wasmtime-publish	482f541101	Bump Wasmtime to 7.0.0 (#5712 ) Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>	2023-02-06 09:10:19 -06:00
Nick Fitzgerald	bdfb746548	Cranelift: Introduce the `return_call` and `return_call_indirect` instructions (#5679 ) * Cranelift: Introduce the `tail` calling convention This is an unstable-ABI calling convention that we will eventually use to support Wasm tail calls. Co-Authored-By: Jamey Sharp <jsharp@fastly.com> * Cranelift: Introduce the `return_call` and `return_call_indirect` instructions These will be used to implement tail calls for Wasm and any other language targeting CLIF. The `return_call_indirect` instruction differs from the Wasm instruction of the same name by taking a native address callee rather than a Wasm function index. Co-Authored-By: Jamey Sharp <jsharp@fastly.com> * Cranelift: Implement verification rules for `return_call[_indirect]` They must: * have the same return types between the caller and callee, * have the same calling convention between caller and callee, * and that calling convention must support tail calls. Co-Authored-By: Jamey Sharp <jsharp@fastly.com> * cargo fmt --------- Co-authored-by: Jamey Sharp <jsharp@fastly.com>	2023-02-01 21:20:35 +00:00
Trevor Elliott	b5692db7ce	Remove boolean parameters from instruction builder functions (#5658 ) Remove the boolean parameters from the instruction builder functions, as they were only ever used with true. Additionally, change the returns and branches functions to imply terminates_block.	2023-01-30 16:12:05 -08:00
Trevor Elliott	a5698cedf8	cranelift: Remove brz and brnz (#5630 ) Remove the brz and brnz instructions, as their behavior is now redundant with brif.	2023-01-30 20:34:56 +00:00
Trevor Elliott	b58a197d33	cranelift: Add a conditional branch instruction with two targets (#5446 ) Add a conditional branch instruction with two targets: brif. This instruction will eventually replace brz and brnz, as it encompasses the behavior of both. This PR also changes the InstructionData layout for instruction formats that hold BlockCall values, taking the same approach we use for Value arguments. This allows branch_destination to return a slice to the BlockCall values held in the instruction, rather than requiring that we pattern match on InstructionData to fetch the then/else blocks. Function generation for fuzzing has been updated to generate uses of brif, and I've run the cranelift-fuzzgen target locally for hours without triggering any new failures.	2023-01-24 14:37:16 -08:00
Chris Fallin	1faff8c2ce	Enable egraph-based optimization by default. (#5587 ) This PR follows up on #5382 and #5391, which rebuilt the egraph-based optimization framework to be more performant, by enabling it by default. Based on performance results in #5382 (my measurements on SpiderMonkey and bjorn3's independent confirmation with cg_clif), it seems that this is reasonable to enable. Now that we have been fuzzing compiler configurations with egraph opts (#5388) for 6 weeks, having fixed a few fuzzbugs that came up (#5409, #5420, #5438) and subsequently received no further reports from OSS-Fuzz, I believe it is stable enough to rely on. This PR enables `use_egraphs`, and also normalizes its meaning: previously it forced optimization (it basically meant "turn on the egraph optimization machinery"), now it runs egraph opts if the opt level indicates (it means "use egraphs to optimize if we are going to optimize"). The conditionals in the top-level pass driver are a little subtle, but will get simpler once we can remove the non-egraph path (which we plan to do eventually!). Fixes #5181.	2023-01-19 15:46:53 -08:00
Trevor Elliott	1e6c13d83e	cranelift: Rework block instructions to use BlockCall (#5464 ) Add a new type BlockCall that represents the pair of a block name with arguments to be passed to it. (The mnemonic here is that it looks a bit like a function call.) Rework the implementation of jump, brz, and brnz to use BlockCall instead of storing the block arguments as varargs in the instruction's ValueList. To ensure that we're processing block arguments from BlockCall values in instructions, three new functions have been introduced on DataFlowGraph that both sets of arguments: inst_values - returns an iterator that traverses values in the instruction and block arguments map_inst_values - applies a function to each value in the instruction and block arguments overwrite_inst_values - overwrite all values in an instruction and block arguments with values from the iterator Co-authored-by: Jamey Sharp <jamey@minilop.net>	2023-01-17 16:31:15 -08:00
Ayomide Bamidele	e4dc9c7944	Update Intel x86 CPU presets to match LLVM (#5490 ) * Update Intel x86 CPU presets * Add LLVM reference * Remove 32bit CPU architectures * Rename silvermont to slm * Fix haswell presets * Add icelake alias * Group streaming simd presets * Add slm silvermont preset * Remove duplicate alderlake def	2023-01-13 21:02:36 +00:00
Afonso Bordado	8ac04612ae	cranelift: Remove predicate not macro branch (#5552 )	2023-01-09 12:19:30 -08:00
Afonso Bordado	f845ebb450	cranelift: Remove `is_pic` predicate from x86 backend (#5548 ) This is still present as shared flags and we don't use the predicate anywhere.	2023-01-09 10:04:45 -08:00
Nick Fitzgerald	c50bdf600e	Cranelift: GVN all idempotently trapping but otherwise pure instructions (#5534 )	2023-01-05 15:08:06 -08:00
wasmtime-publish	7bfbec1b57	Bump Wasmtime to 6.0.0 (#5521 ) Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>	2023-01-05 09:46:01 -06:00
Nick Fitzgerald	f4a2d5337a	Cranelift: GVN `uadd_overflow_trap` (#5520 ) * Switch duplicate loads w/ dynamic memories test to `min_size = 0` This test was accidentally hitting a special case for bounds checks for when we know that `offset + access_size < min_size` and can skip some steps. This commit changes the `min_size` of the memory to zero so that we are forced to do fully general bounds checks. * Cranelift: Mark `uadd_overflow_trap` as okay for GVN Although this improves our test sequence for duplicate loads with dynamic memories, it unfortunately doesn't have any effect on sightglass benchmarks: ``` instantiation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm No difference in performance. [34448 35607.23 37158] gvn_uadd_overflow_trap.so [34566 35734.05 36585] main.so instantiation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [44101 60449.62 92712] gvn_uadd_overflow_trap.so [44011 60436.37 92690] main.so instantiation :: instructions-retired :: benchmarks/bz2/benchmark.wasm No difference in performance. [35595 36675.72 38153] gvn_uadd_overflow_trap.so [35440 36670.42 37993] main.so compilation :: instructions-retired :: benchmarks/bz2/benchmark.wasm No difference in performance. [17370195 17405125.62 17471222] gvn_uadd_overflow_trap.so [17369324 17404859.43 17470725] main.so execution :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [7055720520 7055886880.32 7056265930] gvn_uadd_overflow_trap.so [7055719554 7055843809.33 7056193289] main.so compilation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [683589861 683767276.00 684098366] gvn_uadd_overflow_trap.so [683590024 683767998.02 684097885] main.so execution :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm No difference in performance. [46436883 46437135.10 46437823] gvn_uadd_overflow_trap.so [46436883 46437087.67 46437785] main.so compilation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm No difference in performance. [126522461 126565812.58 126647044] gvn_uadd_overflow_trap.so [126522176 126565757.75 126647522] main.so execution :: instructions-retired :: benchmarks/bz2/benchmark.wasm No difference in performance. [653010531 653010533.03 653010544] gvn_uadd_overflow_trap.so [653010531 653010533.18 653010537] main.so ``` * cranelift-codegen-meta: Rename `side_effects_okay_for_gvn` to `side_effects_idempotent` * cranelift-filetests: Ensure there is a trailing newline for blessed Wasm tests	2023-01-04 22:03:16 -08:00
Nick Fitzgerald	937601c7c3	Cranelift: GVN spectre guards and run redundant load elimination twice (#5517 ) * Cranelift: Make spectre guards GVN-able While these instructions have a side effect that is otherwise invisible to the optimizer, the side effect in question is idempotent, so it can be de-duplicated by GVN. * Cranelift: Run redundant load replacement and GVN twice This allows us to actually replace redundant Wasm loads with dynamic memories. While this improves our hand-crafted test sequences, it doesn't seem to have any improvement on sightglass benchmarks run with dynamic memories, however it also isn't a hit to compilation times, so seems generally good to land anyways: ``` $ cargo run --release -- benchmark -e ~/scratch/once.so -e ~/scratch/twice.so -m insts-retired --processes 20 --iterations-per-process 3 --engine-flags="--static-memory-maximum-size 0" -- benchmarks/default.suite compilation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [683595240 683768610.53 684097577] once.so [683597068 700115966.83 1664907164] twice.so instantiation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [44107 60411.07 92785] once.so [44138 59552.32 92097] twice.so compilation :: instructions-retired :: benchmarks/bz2/benchmark.wasm No difference in performance. [17369916 17404839.78 17471458] once.so [17369935 17625713.87 30700150] twice.so compilation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm No difference in performance. [126523640 126566170.80 126648265] once.so [126523076 127174580.30 163145149] twice.so instantiation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm No difference in performance. [34569 35686.25 36513] once.so [34651 35749.97 36953] twice.so instantiation :: instructions-retired :: benchmarks/bz2/benchmark.wasm No difference in performance. [35146 36639.10 37707] once.so [34472 36580.82 38431] twice.so execution :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [7055720115 7055841324.82 7056180024] once.so [7055717681 7055877095.85 7056225217] twice.so execution :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm No difference in performance. [46436881 46437081.28 46437691] once.so [46436883 46437127.68 46437766] twice.so execution :: instructions-retired :: benchmarks/bz2/benchmark.wasm No difference in performance. [653010530 653010533.27 653010539] once.so [653010531 653010532.95 653010538] twice.so ```	2023-01-04 20:05:43 +00:00
Afonso Bordado	52ba72f341	riscv64: Fix masking on `iabs` (#5505 ) * cranelift: Add `iabs.i128` runtest * riscv64: Fix incorrect extension in iabs When lowering iabs, we were accidentally comparing the unextended value this caused the instruction to misbehave with certain top bits. This commit also adds a zbb lowering that does not use jumps.	2023-01-03 17:37:25 -08:00
Ayomide Bamidele	b47e644c3d	Remove vconcat and vsplit clif instructions (#5465 ) Fixes #5463. * remove vsplit instruction * remove vconcat instruction * remove unsused half/double vector helper functions * remove unused operand constraints * delete + inline Type::half_vector method	2022-12-20 00:41:55 +00:00
Trevor Elliott	25bf8e0e67	Make DataFlowGraph::insts public, but restricted (#5450 ) We have some operations defined on DataFlowGraph purely to work around borrow-checker issues with InstructionData and other data on DataFlowGraph. Part of the problem is that indexing the DFG directly hides the fact that we're only indexing the insts field of the DFG. This PR makes the insts field of the DFG public, but wraps it in a newtype that only allows indexing. This means that the borrow checker is better able to tell when operations on memory held by the DFG won't conflict, which comes up frequently when mutating ValueLists held by InstructionData.	2022-12-16 10:46:09 -08:00
Nick Fitzgerald	c0b587ac5f	Remove heaps from core Cranelift, push them into `cranelift-wasm` (#5386 ) * cranelift-wasm: translate Wasm loads into lower-level CLIF operations Rather than using `heap_{load,store,addr}`. * cranelift: Remove the `heap_{addr,load,store}` instructions These are now legalized in the `cranelift-wasm` frontend. * cranelift: Remove the `ir::Heap` entity from CLIF * Port basic memory operation tests to .wat filetests * Remove test for verifying CLIF heaps * Remove `heap_addr` from replace_branching_instructions_and_cfg_predecessors.clif test * Remove `heap_addr` from readonly.clif test * Remove `heap_addr` from `table_addr.clif` test * Remove `heap_addr` from the simd-fvpromote_low.clif test * Remove `heap_addr` from simd-fvdemote.clif test * Remove `heap_addr` from the load-op-store.clif test * Remove the CLIF heap runtest * Remove `heap_addr` from the global_value.clif test * Remove `heap_addr` from fpromote.clif runtests * Remove `heap_addr` from fdemote.clif runtests * Remove `heap_addr` from memory.clif parser test * Remove `heap_addr` from reject_load_readonly.clif test * Remove `heap_addr` from reject_load_notrap.clif test * Remove `heap_addr` from load_readonly_notrap.clif test * Remove `static-heap-without-guard-pages.clif` test Will be subsumed when we port `make-heap-load-store-tests.sh` to generating `.wat` tests. * Remove `static-heap-with-guard-pages.clif` test Will be subsumed when we port `make-heap-load-store-tests.sh` over to `.wat` tests. * Remove more heap tests These will be subsumed by porting `make-heap-load-store-tests.sh` over to `.wat` tests. * Remove `heap_addr` from `simple-alias.clif` test * Remove `heap_addr` from partial-redundancy.clif test * Remove `heap_addr` from multiple-blocks.clif test * Remove `heap_addr` from fence.clif test * Remove `heap_addr` from extends.clif test * Remove runtests that rely on heaps Heaps are not a thing in CLIF or the interpreter anymore * Add generated load/store `.wat` tests * Enable memory-related wasm features in `.wat` tests * Remove CLIF heap from fcmp-mem-bug.clif test * Add a mode for compiling `.wat` all the way to assembly in filetests * Also generate WAT to assembly tests in `make-load-store-tests.sh` * cargo fmt * Reinstate `f{de,pro}mote.clif` tests without the heap bits * Remove undefined doc link * Remove outdated SVG and dot file from docs * Add docs about `None` returns for base address computation helpers * Factor out `env.heap_access_spectre_mitigation()` to a local * Expand docs for `FuncEnvironment::heaps` trait method * Restore f{de,pro}mote+load clif runtests with stack memory	2022-12-15 00:26:45 +00:00
Ulrich Weigand	e913cf3647	Remove IFLAGS/FFLAGS types (#5406 ) All instructions using the CPU flags types (IFLAGS/FFLAGS) were already removed. This patch completes the cleanup by removing all remaining instructions that define values of CPU flags types, as well as the types themselves. Specifically, the following features are removed: - The IFLAGS and FFLAGS types and the SpecialType category. - Special handling of IFLAGS and FFLAGS in machinst/isle.rs and machinst/lower.rs. - The ifcmp, ifcmp_imm, ffcmp, iadd_ifcin, iadd_ifcout, iadd_ifcarry, isub_ifbin, isub_ifbout, and isub_ifborrow instructions. - The writes_cpu_flags instruction property. - The flags verifier pass. - Flags handling in the interpreter. All of these features are currently unused; no functional change intended by this patch. This addresses https://github.com/bytecodealliance/wasmtime/issues/3249.	2022-12-09 13:42:03 -08:00
Chris Fallin	f980defe17	egraph support: rewrite to work in terms of CLIF data structures. (#5382 ) * egraph support: rewrite to work in terms of CLIF data structures. This work rewrites the "egraph"-based optimization framework in Cranelift to operate on aegraphs (acyclic egraphs) represented in the CLIF itself rather than as a separate data structure to which and from which we translate the CLIF. The basic idea is to add a new kind of value, a "union", that is like an alias but refers to two other values rather than one. This allows us to represent an eclass of enodes (values) as a tree. The union node allows for a value to have multiple representations: either constituent value could be used, and (in well-formed CLIF produced by correct optimization rules) they must be equivalent. Like the old egraph infrastructure, we take advantage of acyclicity and eager rule application to do optimization in a single pass. Like before, we integrate GVN (during the optimization pass) and LICM (during elaboration). Unlike the old egraph infrastructure, everything stays in the DataFlowGraph. "Pure" enodes are represented as instructions that have values attached, but that are not placed into the function layout. When entering "egraph" form, we remove them from the layout while optimizing. When leaving "egraph" form, during elaboration, we can place an instruction back into the layout the first time we elaborate the enode; if we elaborate it more than once, we clone the instruction. The implementation performs two passes overall: - One, a forward pass in RPO (to see defs before uses), that (i) removes "pure" instructions from the layout and (ii) optimizes as it goes. As before, we eagerly optimize, so we form the entire union of optimized forms of a value before we see any uses of that value. This lets us rewrite uses to use the most "up-to-date" form of the value and canonicalize and optimize that form. The eager rewriting and acyclic representation make each other work (we could not eagerly rewrite if there were cycles; and acyclicity does not miss optimization opportunities only because the first time we introduce a value, we immediately produce its "best" form). This design choice is also what allows us to avoid the "parent pointers" and fixpoint loop of traditional egraphs. This forward optimization pass keeps a scoped hashmap to "intern" nodes (thus performing GVN), and also interleaves on a per-instruction level with alias analysis. The interleaving with alias analysis allows alias analysis to see the most optimized form of each address (so it can see equivalences), and allows the next value to see any equivalences (reuses of loads or stored values) that alias analysis uncovers. - Two, a forward pass in domtree preorder, that "elaborates" pure enodes back into the layout, possibly in multiple places if needed. This tracks the loop nest and hoists nodes as needed, performing LICM as it goes. Note that by doing this in forward order, we avoid the "fixpoint" that traditional LICM needs: we hoist a def before its uses, so when we place a node, we place it in the right place the first time rather than moving later. This PR replaces the old (a)egraph implementation. It removes both the cranelift-egraph crate and the logic in cranelift-codegen that uses it. On `spidermonkey.wasm` running a simple recursive Fibonacci microbenchmark, this work shows 5.5% compile-time reduction and 7.7% runtime improvement (speedup). Most of this implementation was done in (very productive) pair programming sessions with Jamey Sharp, thus: Co-authored-by: Jamey Sharp <jsharp@fastly.com> * Review feedback. * Review feedback. * Review feedback. * Bugfix: cprop rule: `(x + k1) - k2` becomes `x - (k2 - k1)`, not `x - (k1 - k2)`. Co-authored-by: Jamey Sharp <jsharp@fastly.com>	2022-12-06 14:58:57 -08:00
wasmtime-publish	a28d4d3c89	Bump Wasmtime to 5.0.0 (#5372 ) Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>	2022-12-05 08:38:57 -06:00
Alex Crichton	830885383f	Implement inline stack probes for AArch64 (#5353 ) * Turn off probestack by default in Cranelift The probestack feature is not implemented for the aarch64 and s390x backends and currently the on-by-default status requires the aarch64 and s390x implementations to be a stub. Turning off probestack by default allows the s390x and aarch64 backends to panic with an error message to avoid providing a false sense of security. When the probestack option is implemented for all backends, however, it may be reasonable to re-enable. * aarch64: Improve codegen for AMode fallback Currently the final fallback for finalizing an `AMode` will generate both a constant-loading instruction as well as an `add` instruction to the base register into the same temporary. This commit improves the codegen by removing the `add` instruction and folding the final add into the finalized `AMode`. This changes the `extendop` used but both registers are 64-bit so shouldn't be affected by the extending operation. * aarch64: Implement inline stack probes This commit implements inline stack probes for the aarch64 backend in Cranelift. The support here is modeled after the x64 support where unrolled probes are used up to a particular threshold after which a loop is generated. The instructions here are similar in spirit to x64 except that unlike x64 the stack pointer isn't modified during the unrolled loop to avoid needing to re-adjust it back up at the end of the loop. * Enable inline probestack for AArch64 and Riscv64 This commit enables inline probestacks for the AArch64 and Riscv64 architectures in the same manner that x86_64 has it enabled now. Some more testing was additionally added since on Unix platforms we should be guaranteed that Rust's stack overflow message is now printed too. * Enable probestack for aarch64 in cranelift-fuzzgen * Address review comments * Remove implicit stack overflow traps from x64 backend This commit removes implicit `StackOverflow` traps inserted by the x64 backend for stack-based operations. This was historically required when stack overflow was detected with page faults but Wasmtime no longer requires that since it's not suitable for wasm modules which call host functions. Additionally no other backend implements this form of implicit trap-code additions so this is intended to synchronize the behavior of all the backends. This fixes a test added prior for aarch64 to properly abort the process instead of accidentally being caught by Wasmtime. * Fix a style issue	2022-11-30 12:30:00 -06:00
Nick Fitzgerald	d0d3245a35	Cranelift: Add `heap_load` and `heap_store` instructions (#5300 ) * Cranelift: Define `heap_load` and `heap_store` instructions * Cranelift: Implement interpreter support for `heap_load` and `heap_store` * Cranelift: Add a suite runtests for `heap_{load,store}` There are so many knobs we can twist for heaps and I wanted to exhaustively test all of them, so I wrote a script to generate the tests. I've checked in the script in case we want to make any changes in the future, but I don't think it is worth adding this to CI to check that scripts are up to date or anything like that. * Review feedback	2022-11-21 23:00:39 +00:00

1 2 3 4 5 ...

624 Commits