wasmtime

Author	SHA1	Message	Date
Trevor Elliott	78dbe93f21	Rename `as_bool` to `as_truthy`, and fix TypeSet::as_bool (#6027 )	2023-03-17 21:11:24 +00:00
bjorn3	2c40c267d4	Make sequence numbers local to instructions (#6043 ) * Only allow pp_cmp within a single block Block order shouldn't matter for codegen and restricting pp_cmp to a single block will allow making instruction sequence numbers local to a block. * Make sequence numbers local to instructions This allows renumbering to be localized to a single block where previously it could affect the entire function. Also saves 32bit of overhead per block.	2023-03-17 20:53:21 +00:00
Karl Meakin	208d09e9f0	cranelift: rewrite `x*-1` to `ineg(x)` (#6052 ) cranelift: rewrite `x*-1` to `ineg(x)` Add commuted test	2023-03-17 19:52:13 +00:00
Karl Meakin	c3f5b71b6a	cranelift: cancel `ineg` when args to `imul` (#6053 ) * cranelift: cancel `ineg`/`iabs` when args to `imul` * Remove unsound `iabs(x) * iabs(y) == x*y` rewrite	2023-03-17 19:41:20 +00:00
Karl Meakin	73cc433bdd	cranelift: simplify `icmp` against UMAX/SMIN/SMAX (#6037 ) * cranelift: simplify `icmp` against UMAX/SMIN/SMAX * Add tests for icmp against numeric limits	2023-03-17 18:54:29 +00:00
bjorn3	a81c206870	Various cleanups to Layout (#6042 ) * Use inst_block instead of pp_block where possible * Remove unused is_block_gap method * Remove ProgramOrder trait It only has a single implementation * Rename Layout::cmp to pp_cmp to distinguish it from Ord::cmp * Make pp_block non-generic * Use rpo_cmp_block instead of rpo_cmp in the verifier * Remove ProgramPoint * Rename ExpandedProgramPoint to ProgramPoint * Remove From<ValueDef> for ProgramPoint impl	2023-03-17 18:46:34 +00:00
Trevor Elliott	411a3eff3e	cranelift: Emit a table of opcodes in gen_inst (#6046 ) * Emit a table of opcodes in gen_inst * Remove accidental export of OPCODE_SIGNATURES * Generate `Opcode::all` instead of a table	2023-03-17 17:38:28 +00:00
Alex Crichton	5ebe53a351	x64: Elide more uextend with extractlane (#6045 ) * x64: Elide more uextend with extractlane I've confirmed locally now that `pextr{b,w,d}` all zero the upper bits of the full 64-bit register size which means that the `extractlane` operation with a zero-extend can be elided for more cases, including 8-to-64-bit casts as well as 32-to-64. This helps elide a few extra `mov`s in a loop I was looking at and had a modest corresponding increase in performance (my guess was due to the slightly decreased code size mostly as opposed to the removed `mov`s). * Remove stray file	2023-03-17 16:18:41 +00:00
Karl Meakin	b53d66e634	cranelift: simplify `x-x` to `0` (#6032 ) * cranelift: simplify `x-x` to `0` * Guard `x-x => 0` rewrite with `fits_in_64`	2023-03-17 15:14:28 +00:00
Alex Crichton	8e500099b3	x64: Refactor and add extractlane special case for uextend/sextend (#6022 ) * x64: Refactor sextend/uextend rules Move much of the meaty logic from these lowering rules into the `extend_to_gpr` helper to benefit other callers of `extend_to_gpr` to elide instructions. This additionally simplifies `sextend` and `uextend` lowerings to rely on optimizations happening within the `extend_to_gpr` helper. * x64: Skip `uextend` for `pextr{b,w}` instructions These instructions are documented as automatically zeroing the upper bits so `uextend` operations can be skipped. This slightly improves codegen for the wasm `i{8x16,16x8}.extract_lane_u` instructions, for example. * Modernize an extractor pattern * Trim some superfluous match clauses Additionally rejigger priorities to be "mostly default" now. * Refactor 32-to-64 predicate to a helper Also adjust the pattern matched in the `extend_to_gpr` helper. * Slightly refactor pextr{b,w} case * Review comments	2023-03-16 22:14:59 +00:00
Karl Meakin	d479951469	cranelift: simplify `fneg(fneg(x))` to `x` (#6034 )	2023-03-16 22:14:12 +00:00
Karl Meakin	dccc2d6269	cranelift: simplify `ineg(ineg(x))` to `x` (#6033 )	2023-03-16 22:14:05 +00:00
Alex Crichton	5ae8575296	x64: Take SIGFPE signals for divide traps (#6026 ) * x64: Take SIGFPE signals for divide traps Prior to this commit Wasmtime would configure `avoid_div_traps=true` unconditionally for Cranelift. This, for the division-based instructions, would change emitted code to explicitly trap on trap conditions instead of letting the `div` x86 instruction trap. There's no specific reason for Wasmtime, however, to specifically avoid traps in the `div` instruction. This means that the extra generated branches on x86 aren't necessary since the `div` and `idiv` instructions already trap for similar conditions as wasm requires. This commit instead disables the `avoid_div_traps` setting for Wasmtime's usage of Cranelift. Subsequently the codegen rules were updated slightly: * When `avoid_div_traps=true`, traps are no longer emitted for `div` instructions. * The `udiv`/`urem` instructions now list their trap as divide-by-zero instead of integer overflow. * The lowering for `sdiv` was updated to still explicitly check for zero but the integer overflow case is deferred to the instruction itself. * The lowering of `srem` no longer checks for zero and the listed trap for the `div` instruction is a divide-by-zero. This means that the codegen for `udiv` and `urem` no longer have any branches. The codegen for `sdiv` removes one branch but keeps the zero-check to differentiate the two kinds of traps. The codegen for `srem` removes one branch but keeps the -1 check since the semantics of `srem` mismatch with the semantics of `idiv` with a -1 divisor (specifically for INT_MIN). This is unlikely to have really all that much of a speedup but was something I noticed during #6008 which seemed like it'd be good to clean up. Plus Wasmtime's signal handling was already set up to catch `SIGFPE`, it was just never firing. * Remove the `avoid_div_traps` cranelift setting With no known users currently removing this should be possible and helps simplify the x64 backend. 
* x64: GC more support for avoid_div_traps Remove the `validate_sdiv_divisor` pseudo-instructions and clean up some of the ISLE rules now that `div` is allowed to itself trap unconditionally. x64: Store div trap code in instruction itself * Keep divisors in registers, not in memory Don't accidentally fold multiple traps together * Handle EXC_ARITHMETIC on macos * Update emit tests * Update winch and tests	2023-03-16 00:18:45 +00:00
Alex Crichton	d76f7ee52e	x64: Improve codegen for splats (#6025 ) This commit goes through the lowerings for the CLIF `splat` instruction and improves the support for each operator. Many of these lowerings are mirrored from v8/SpiderMonkey and there are a number of improvements: * AVX2 `v{p,}broadcast` instructions are added and used when available. Float-based splats are much simpler and always a single-instruction * Integer-based splats don't insert into an uninit xmm value and instead start out with a `movd` to move into an `xmm` register. This theoretically breaks dependencies with prior instructions since `movd` creates a fresh new value in the destination register. * Loads are now sunk into all of the instructions. A new extractor, `sinkable_load_exact`, was added to sink the i8/i16 loads.	2023-03-15 21:33:56 +00:00
Afonso Bordado	a10c50afe9	cranelift: Translate `stack_*` accesses as unaligned (#6016 ) We can't currently ensure that these will be aligned, so we shouldn't mark them as such.	2023-03-15 18:05:55 +00:00
Alex Crichton	6ed90f86c8	x64: Add support for the `pblendw` instruction (#6023 ) This commit adds another case for `shuffle` lowering to the x64 backend for the `{,v}pblendw` instruction. This instruction selects 16-bit values from either of the inputs corresponding to an immediate 8-bit-mask where each bit selects the corresponding lane from the inputs.	2023-03-15 17:20:43 +00:00
Alex Crichton	fcddb9ca81	x64: Add lea-based lowering for iadd (#5986 ) * x64: Refactor `Amode` computation in ISLE This commit replaces the previous computation of `Amode` with a different set of rules that are intended to achieve the same purpose but are structured differently. The motivation for this commit is going to become more relevant in the next commit where `lea` will be used for the `iadd` instruction, possibly, on x64. When doing so it caused a stack overflow in the test suite during the compilation phase of a wasm module, namely as part of the `amode_add` function. This function is recursively defined in terms of itself and recurses as deep as the deepest `iadd`-chain in a program. A particular test in our test suite has a 10k-long chain of `iadd` which ended up causing a stack overflow in debug mode. This stack overflow is caused because the `amode_add` helper in ISLE unconditionally peels all the `iadd` nodes away and looks at all of them, even if most end up in intermediate registers along the way. Given that structure I couldn't find a way to easily abort the recursion. The new `to_amode` helper is structured in a similar fashion but attempts to instead only recurse far enough to fold items into the final `Amode` instead of recursing through items which themselves don't end up in the `Amode`. Put another way previously the `amode_add` helper might emit `x64_add` instructions, but it no longer does that. The goal of this commit is to preserve all the original `Amode` optimizations, however. For some parts, though, it relies more on egraph optimizations to run since if an `iadd` is 10k deep it doesn't try to find a constant buried 9k levels inside there to fold into the `Amode`. The hope, though, is that with egraphs having run already it's shuffled constants to the right most of the time and already folded any possible together. 
* x64: Add `lea`-based lowering for `iadd` This commit adds a rule for the lowering of `iadd` to use `lea` for 32 and 64-bit addition. The theoretical benefit of `lea` over the `add` instruction is that the `lea` variant can emulate a 3-operand instruction which doesn't destructively modify one of its operands. Additionally the `lea` operation can fold in other components such as constant additions and shifts. In practice, however, if `lea` is unconditionally used instead of `iadd` it ends up losing 10% performance on a local `meshoptimizer` benchmark. My best guess as to what's going on here is that my CPU's dedicated units for address computation are all overloaded while the ALUs are basically idle in a memory-intensive loop. Previously when the ALU was used for `add` and the address units for stores/loads it in theory pipelined things better (most of this is me shooting in the dark). To prevent the performance loss here I've updated the lowering of `iadd` to conditionally sometimes use `lea` and sometimes use `add` depending on how "complicated" the `Amode` is. Simple ones like `a + b` or `a + $imm` continue to use `add` (and its subsequent hypothetical extra `mov` necessary into the result). More complicated ones like `a + b + $imm` or `a + b << c + $imm` use `lea` as it can remove the need for extra instructions. Locally at least this fixes the performance loss relative to unconditionally using `lea`. One note is that this adds an `OperandSize` argument to the `MInst::LoadEffectiveAddress` variant to add an encoding for 32-bit `lea` in addition to the preexisting 64-bit encoding. * Conditionally use `lea` based on regalloc	2023-03-15 17:14:25 +00:00
Trevor Elliott	68b937d965	cranelift: Fix shift overflow when constructing BitSet (#6020 ) * Fix shift overflow when constructing the Wider constraint for integers * Clarify comment	2023-03-14 22:25:51 +00:00
Trevor Elliott	e4d9bb7c5a	cranelift: Exclude the control type in narrower and wider (#6018 ) * Don't include the control type in `narrower` or `wider` constraints * Add verifier tests for instructions that use narrower and wider	2023-03-14 18:09:15 +00:00
Trevor Elliott	f5ad74e546	cranelift: Add narrower and wider constraints to the instruction DSL (#6013 ) * Add narrower and wider constraints to the instruction DSL * Add docs to narrower/wider operands * Update cranelift/codegen/meta/src/cdsl/instructions.rs Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fix assertion message * Simplify upper bounds for the wider constraint * Remove additional unnecessary cases in the verifier * Remove unused variables * Remove changes to is_ctrl_typevar_candidate These changes were only necessary when the type returned by an instruction was a variable constrained by narrow or widen. As we have switched to requiring that constraints must appear on argument types and not return types, these changes were no longer necessary. --------- Co-authored-by: Jamey Sharp <jamey@minilop.net>	2023-03-14 16:34:17 +00:00
Alex Crichton	5c1b468648	x64: Migrate {s,u}{div,rem} to ISLE (#6008 ) * x64: Add precise-output tests for div traps This adds a suite of `.clif` files which are intended to test the `avoid_div_traps=true` compilation of the `{s,u}{div,rem}` instructions. x64: Remove conditional regalloc in `Div` instruction Move the 8-bit `Div` logic into a dedicated `Div8` instruction to avoid having conditionally-used registers with respect to regalloc. * x64: Migrate non-trapping, `udiv`/`urem` to ISLE * x64: Port checked `udiv` to ISLE * x64: Migrate urem entirely to ISLE * x64: Use `test` instead of `cmp` to compare-to-zero * x64: Port `sdiv` lowering to ISLE * x64: Port `srem` lowering to ISLE * Tidy up regalloc behavior and fix tests * Update docs and winch * Review comments * Reword again * More refactoring test fixes * More test fixes	2023-03-14 01:44:06 +00:00
Trevor Elliott	188f712025	Mark fpromote and fdemote as operating on float scalars (#6014 )	2023-03-14 00:47:40 +00:00
Alex Crichton	d6ce632b5b	aarch64: Specialize constant vector shifts (#5976 ) * aarch64: Specialize constant vector shifts This commit adds special lowering rules for vector-shifts-by-constant-amounts to use dedicated instructions which cuts down on the codegen here quite a bit for constant values. * Fix codegen for 0-shift-rights * Special-case zero left-shifts as well * Remove left-shift special case	2023-03-13 22:37:59 +00:00
Alex Crichton	e2a6fe99c2	x64: Add `shuffle` specialization for `palignr` (#5999 ) * x64: Add `shuffle` specialization for `palignr` This commit adds specializations for the `palignr` instruction to the x64 backend to specialize some more patterns of byte shuffles. * Fix tests	2023-03-13 21:01:24 +00:00
Alex Crichton	03b5dbb3e0	aarch64: Use `VCodeConstant` for f64/v128 constants (#5997 ) * aarch64: Translate float and splat lowering to ISLE I was looking into `constant_f128` and its fallback lowering into memory and to get familiar with the code I figured it'd be good to port some Rust logic to ISLE. This commit ports the `constant_{f128,f64,f32}` helpers into ISLE from Rust as well as the `splat_const` helper which ended up being closely related. Tests reflect a number of regalloc changes that happened but also namely one major difference is that in the lowering of `f32` a 32-bit immediate is created now instead of a 64-bit immediate (in a GP register before it's moved into a FP register). This semantically has no change but the generated code is slightly different in a few minor cases. * aarch64: Load f64/v128 constants from a pool This commit removes the `LoadFpuConst64` and `LoadFpuConst128` pseudo-instructions from the AArch64 backend which internally loaded a nearby constant and then jumped over it. Constants now go through the `VCodeConstant` infrastructure which gets placed at the end of the function similar to how x64 works. Some minor support was added in as well to add a new addressing mode for a `MachLabel`-relative load.	2023-03-13 19:33:52 +00:00
Alex Crichton	6ecdc2482e	x64: Improve memory support in `{insert,extract}lane` (#5982 ) * x64: Improve memory support in `{insert,extract}lane` This commit adds support to Cranelift to emit `pextr{b,w,d,q}` with a memory destination, merging a store-of-extract operation into one instruction. Additionally AVX support is added for the `pextr` instructions. I've additionally tried to ensure that codegen tests and runtests exist for all forms of these instructions too. Add missing commas * Fix tests	2023-03-13 19:30:44 +00:00
Afonso Bordado	5c95e6fbaf	riscv64: Codemotion cleanups to ISLE files (#5984 ) * riscv64: Fix typo in extensions * riscv64: Move converters to top of file * riscv64: Group up all imm12 rules * riscv64: Move zero_reg helpers to Physical Regs section * riscv64: Move helpers away from `clz` lowerings These were in the middle of the `clz` rules and are kind of distracting * riscv64: Move `cls` rules next to `ctz`/`clz` * cranelift: Move `u8_and` / `u32_add` to Primitive Arithmetic section * riscv64: Mark some imm12 constructors as pure * cranelift: Move `s32_add_fallible` next to `u32_add` * riscv64: Fix Typo	2023-03-13 19:20:15 +00:00
Afonso Bordado	ad0bce3a36	riscv64: Fix regalloc panic with `bor`+`bnot` on floats (#5857 )	2023-03-13 18:29:36 +00:00
Saúl Cabrera	d03612c2d9	cranelift-codegen(x64): Expose `CallInfo` (#6005 ) This commit exposes the `CallInfo` struct, needed by Winch to emit function calls.	2023-03-13 17:50:53 +00:00
Alex Crichton	7956dc6ba2	Change CLIF `shuffle` to validate lane indices (#5995 ) * Change CLIF `shuffle` to validate lane indices Previously the CLIF `shuffle` instruction did not perform any validation on the lane shuffle mask and specified that out-of-bounds lanes always returned 0 as the value. This behavior though is not required by WebAssembly which validates that lane indices are always in-bounds. Additionally since these are static immediates even other code generators should be able to verify that the immediates are in-bounds. As a result this commit updates the definition of the `shuffle` instruction to specify that all byte immediates must be in-bounds in the range of [0, 32). The verifier has been updated and some test cases have been removed that were testing this functionality. Closes #5989 * Only generate valid shuffle immediates in fuzzer	2023-03-13 14:24:11 +00:00
Chris Fallin	264089e29d	Cranelift: aarch64: fix undefined dest reg in f32x4.splat case. (#5987 ) One of the cases for a splat operation, as updated in #5370, wrote to a temp reg but then only conditionally transformed the temp into the final destination register. In another codepath, `rd` was left undefined. This causes a panic later when regalloc2 verifies SSA properties of its input (here, value not def'd before use). Fixes #5985.	2023-03-11 00:22:29 +00:00
Alex Crichton	52896e020d	aarch64: Add specialized `shuffle` lowerings (#5977 ) * aarch64: Add `shuffle` lowerings for the `uzp{1,2}` instructions This commit uses the same style of patterns in the x64 backend to start adding specific lowerings of the Cranelift `shuffle` instruction to particular AArch64 instructions. * aarch64: Add `shuffle` lowerings to the `zip{1,2}` instructions These instructions match the `punpck` family of instructions on x64 and should help provide more efficient lowerings than the current `shuffle` fallback. aarch64: Add `shuffle` lowerings for `trn{1,2}` Along the lines of prior commits adds specific patterns to lowering for individual AArch64 instructions available. * aarch64: Add a `shuffle` lowering for the `ext` instruction This instruction will more-or-less concatenate two 128-bit vector registers to create a 256-bit value, shift it right, and then take the lower 128-bits into the destination. This can be modeled with a `shuffle` of consecutive bytes so this adds a lowering rule to generate this instruction. * aarch64: Add `shuffle` special case for `dup` This commit adds special cases for Cranelift's `shuffle` on AArch64 when the lowering can be represented with a `dup` instruction which broadcasts one vector's lane into all lanes of the destination. * aarch64: Add `shuffle` specializations for `rev` instructions This commit adds shuffle mask specializations for the `rev{16,32,64}` family of instructions on AArch64 which can be used to reverse bytes, 16-bit values, or 32-bit values within larger values. * Fix tests * Add doc-comments in ISLE	2023-03-10 21:37:13 +00:00
Ulrich Weigand	411781d2fe	s390x: Fix mistake in available_in_isa (#5981 ) The 32-bit float<->int conversion instructions are part of the VXRS_EXT2 facility, not MIE2. Fixes https://github.com/bytecodealliance/wasmtime/issues/5979.	2023-03-10 19:41:41 +00:00
bjorn3	108f7917c8	Support plugging external profilers into the Cranelift timing infrastructure (#5749 ) * Remove no-std code for cranelift_codegen::timings no-std mode isn't supported by Cranelift anymore * Simplify define_passes macro * Add egraph opt timings * Replace the add_to_current api with PassTimes::add * Omit a couple of unused time measurements * Reduce divergence between run and run_passes a bit * Introduce a Profiler trait This allows plugging in external profilers into the Cranelift profiling framework. * Add Pass::description method * Remove duplicate usage of the compile pass timing * Rustfmt	2023-03-10 19:33:56 +00:00
yuyang	4e875f33a7	Codegen fix `fcvt_from_sint.f32` with small types on riscv64. (#5964 ) * fix issue5952 * We should only extend i8 and i16 * remove extra space * move some code	2023-03-10 10:29:55 +00:00
Alex Crichton	0ec7b872fa	x64: Optimize store-of-extract-lane-0 (#5924 ) * x64: Optimize store-of-extract-lane-0 The `movss` and `movsd` instructions can be used to store the 0th lane of a `t32x4` or a `t64x2` vector into memory, enabling fusing a `store` and an `extractlane` instruction. * Fix merge conflict with `main`	2023-03-10 01:06:38 +00:00
Alex Crichton	83f21e784a	x64: Add more support for more AVX instructions (#5931 ) * x64: Add a smattering of lowerings for `shuffle` specializations (#5930) * x64: Add lowerings for `punpck{h,l}wd` Add some special cases for `shuffle` for more specialized x86 instructions. * x64: Add `shuffle` lowerings for `pshufd` This commit adds special-cased lowerings for the x64 `shuffle` instruction when the `pshufd` instruction alone is necessary. This is possible when the shuffle immediate permutes 32-bit values within one of the vector inputs of the `shuffle` instruction, but not both. * x64: Add shuffle lowerings for `punpck{h,l}{q,}dq` This adds specific permutations for some x86 instructions which specifically interleave high/low bytes for 32 and 64-bit values. This corresponds to the preexisting specific lowerings for interleaving 8 and 16-bit values. * x64: Add `shuffle` lowerings for `shufps` This commit adds targeted lowerings for the `shuffle` instruction that match the pattern that `shufps` supports. The `shufps` instruction selects two elements from the first vector and two elements from the second vector which means while it's not generally applicable it should still be more useful than the catch-all lowering of `shuffle`. * x64: Add shuffle support for `pshuf{l,h}w` This commit adds special lowering cases for these instructions which permute 16-bit values within a 128-bit value either within the upper or lower half of the 128-bit value. * x64: Specialize `shuffle` with an all-zeros immediate Instead of loading the all-zeros immediate from a rip-relative address at the end of the function instead generate a zero with a `pxor` instruction and then use `pshufb` to do the broadcast. 
* Review comments * x64: Add an AVX encoding for the `pshufd` instruction This will benefit from lack of need for alignment vs the `pshufd` instruction if working with a memory operand and additionally, as I've just learned, this reduces dependencies between instructions because the `v` instructions zero the upper bits as opposed to preserving them which could accidentally create false dependencies in the CPU between instructions. x64: Add more support for AVX loads/stores This commit adds VEX-encoded versions of instructions such as `mov{ss,sd,upd,ups,dqu}` for load and store operations. This also changes some signatures so the `load` helpers specifically take a `SyntheticAmode` argument which ended up doing a small refactoring of the `_regmove` variant used for `insertlane 0` into f64x2 vectors. x64: Enable using AVX instructions for zero regs This commit refactors the internal ISLE helpers for creating zero'd xmm registers to leverage the AVX support for all other instructions. This moves away from picking opcodes to instead picking instructions with a bit of reorganization. * x64: Remove `XmmConstOp` as an instruction All existing users can be replaced with usage of the `xmm_uninit_value` helper instruction so there's no longer any need for these otherwise constant operations. This additionally reduces manual usage of opcodes in favor of instruction helpers. * Review comments * Update test expectations	2023-03-09 23:57:42 +00:00
Alex Crichton	1c3a1bda6c	x64: Add a smattering of lowerings for `shuffle` specializations (#5930 ) * x64: Add lowerings for `punpck{h,l}wd` Add some special cases for `shuffle` for more specialized x86 instructions. * x64: Add `shuffle` lowerings for `pshufd` This commit adds special-cased lowerings for the x64 `shuffle` instruction when the `pshufd` instruction alone is necessary. This is possible when the shuffle immediate permutes 32-bit values within one of the vector inputs of the `shuffle` instruction, but not both. * x64: Add shuffle lowerings for `punpck{h,l}{q,}dq` This adds specific permutations for some x86 instructions which specifically interleave high/low bytes for 32 and 64-bit values. This corresponds to the preexisting specific lowerings for interleaving 8 and 16-bit values. * x64: Add `shuffle` lowerings for `shufps` This commit adds targeted lowerings for the `shuffle` instruction that match the pattern that `shufps` supports. The `shufps` instruction selects two elements from the first vector and two elements from the second vector which means while it's not generally applicable it should still be more useful than the catch-all lowering of `shuffle`. * x64: Add shuffle support for `pshuf{l,h}w` This commit adds special lowering cases for these instructions which permute 16-bit values within a 128-bit value either within the upper or lower half of the 128-bit value. * x64: Specialize `shuffle` with an all-zeros immediate Instead of loading the all-zeros immediate from a rip-relative address at the end of the function instead generate a zero with a `pxor` instruction and then use `pshufb` to do the broadcast. * Review comments	2023-03-09 22:58:19 +00:00
Chris Fallin	7f3500a172	Cranelift: x64, aarch64, s390x, riscv64: ensure addresses are I64s. (#5972 ) * Cranelift: x64, aarch64, s390x, riscv64: ensure addresses are I64s. @avanhatt has been looking at our address-mode lowering and found an example where when feeding an `I32`-typed address into a load or store, we can violate assumptions and get incorrect codegen. This should never be reachable in practice, because all producers on 64-bit architectures use 64-bit types for addresses. However, our IR is insufficiently constrained, and allows loads/stores to `I32` addresses as well. This is nonsensical on a 64-bit architecture. Initially I had thought we should tighten either the instruction definition's accepted types, or the CLIF verifier, to reject this. However both are target-independent, and we don't want to bake an assumption of 64-bit-ness into the compiler core. Instead this PR tightens specific backends' lowerings to reject loads/stores of `I32`-typed addresses. tl;dr: no security implications as all producers use I64-typed addresses (and must, for correct operation); but we currently accept I32-typed addresses too, and this breaks other assumptions. * Allow R64 as well as I64 types. * Add an explicit extractor to match 64-bit address types.	2023-03-09 19:08:16 +00:00
Jamey Sharp	f877141668	cranelift-meta: Don't let-bind operand definitions (#5969 ) We've adopted this pattern in Cranelift's instruction definitions where we let-bind some calls to `Operand::new` and then later use them in one or more calls to `Inst::new`. That pattern has two problems: - It puts the type of each operand somewhere potentially far removed from the instruction in which it's used. - We let-bind the same name for many different operands, compounding the first problem by making it harder to find _which_ definition is used. So instead this commit removes all let-bindings for operand definitions and constructs a new `Operand` every time. Constructing an `Operand` at every use means we duplicate some documentation strings, but not all that many of them as it turns out. I've left the let-bound type-sets alone, so those are currently still shared across many instructions. They have some of the same problems and should be reviewed as well.	2023-03-09 17:24:19 +00:00
Alex Crichton	63fb30e4b4	Merge pull request from GHSA-ff4p-7xrq-q5r8 * x64: Remove incorrect `amode_add` lowering rules This commit removes two incorrect rules as part of the x64 backend's computation of addressing modes. These two rules folded a zero-extended 32-bit computation into the address mode operand, but this isn't correct as the 32-bit computation should be truncated to 32-bits but when folded into the address mode computation it happens with 64-bit operands, meaning truncation doesn't happen. * Add release notes	2023-03-08 13:00:40 -06:00
Alex Crichton	5dc2bbccbb	Merge pull request from GHSA-xm67-587q-r2vw This commit fixes an off-by-one error in the subtraction of indices when shuffling a vector with itself. Lanes 16-and-above are mapped to select from the first vector since the first and second element are the same, but the subtraction was with 15 rather than 16 by accident.	2023-03-08 13:00:00 -06:00
Kevin Rizzo	013b35ff32	winch: Refactoring wasmtime compiler integration pieces to share more between Cranelift and Winch (#5944 ) * Enable the native target by default in winch Match cranelift-codegen's build script where if no architecture is explicitly enabled then the host architecture is implicitly enabled. * Refactor Cranelift's ISA builder to share more with Winch This commit refactors the `Builder` type to have a type parameter representing the finished ISA with Cranelift and Winch having their own typedefs for `Builder` to represent their own builders. The intention is to use this shared functionality to produce more shared code between the two codegen backends. * Moving compiler shared components to a separate crate * Restore native flag inference in compiler building This fixes an oversight from the previous commits to use `cranelift-native` to infer flags for the native host when using default settings with Wasmtime. * Move `Compiler::page_size_align` into wasmtime-environ The `cranelift-codegen` crate doesn't need this and winch wants the same implementation, so shuffle it around so everyone has access to it. * Fill out `Compiler::{flags, isa_flags}` for Winch These are easy enough to plumb through with some shared code for Wasmtime. * Plumb the `is_branch_protection_enabled` flag for Winch Just forwarding an isa-specific setting accessor. * Moving executable creation to shared compiler crate * Adding builder back in and removing from shared crate * Refactoring the shared pieces for the `CompilerBuilder` I decided to move a couple things around from Alex's initial changes. Instead of having the shared builder do everything, I went back to having each compiler have a distinct builder implementation. I refactored most of the flag setting logic into a single shared location, so we can still reduce the amount of code duplication. With them being separate, we don't need to maintain things like `LinkOpts` which Winch doesn't currently use. 
We also have an avenue to error when certain flags are sent to Winch if we don't support them. I'm hoping this will make things more maintainable as we build out Winch. I'm still unsure about keeping everything shared in a single crate (`cranelift_shared`). It's starting to feel like this crate is doing too much, which makes it difficult to name. There does seem to be a need for two distinct abstraction: creating the final executable and the handling of shared/ISA flags when building the compiler. I could make them into two separate crates, but there doesn't seem to be enough there yet to justify it. * Documentation updates, and renaming the finish method * Adding back in a default temporarily to pass tests, and removing some unused imports * Fixing winch tests with wrong method name * Removing unused imports from codegen shared crate * Apply documentation formatting updates Co-authored-by: Saúl Cabrera <saulecabrera@gmail.com> * Adding back in cranelift_native flag inferring * Adding new shared crate to publish list * Adding write feature to pass cargo check --------- Co-authored-by: Alex Crichton <alex@alexcrichton.com> Co-authored-by: Saúl Cabrera <saulecabrera@gmail.com>	2023-03-08 15:07:13 +00:00
Trevor Elliott	7d482345fb	Restrict fcvt_to_uint and fcvt_to_sint to produce scalar integers (#5956 ) * Restrict fcvt_to_uint and fcvt_to_sint to produce scalar integers * Review feedback	2023-03-08 05:00:15 +00:00
Alex Crichton	07518dfd36	Remove the Cranelift `vselect` instruction (#5918 ) * Remove the Cranelift `vselect` instruction This instruction is documented as selecting lanes based on the "truthy" value of the condition lane, but the current status of the implementation of this instruction is: * x64 - uses the high bit for `f32x4` and `f64x2` and otherwise uses the high bit of each byte doing a byte-wise lane select rather than whatever the controlling type is. * AArch64 - this is the same as `bitselect` which is a bit-wise selection rather than a lane-wise selection. * s390x - this is the same as AArch64, a bit-wise selection rather than lane-wise. * interpreter - the interpreter implements the documented semantics of selecting based on "truthy" values. Coupled with the status of the implementation is the fact that this instruction is not used by WebAssembly SIMD today either. The only use of this instruction in Cranelift is the nan-canonicalization pass. By moving nan-canonicalization to `bitselect`, since that has the desired semantics, there's no longer any need for `vselect`. Given this situation this commit subsequently removes `vselect` and all usage of it throughout Cranelift. Closes #5917 * Review comments * Bring back vselect opts as bitselect opts * Clean up vselect usage in the interpreter * Move bitcast in nan canonicalization * Add a comment about float optimization	2023-03-08 00:42:05 +00:00
Alex Crichton	afde4ea4e3	Fix the original block for block params in egraphs (#5960 ) This fixes an issue where block params were always listed as being members of the current block in egraphs, even when the block param was actually defined in a separate block. This then enables instructions which depend on these parameters to get hoisted up out of inner loops at least to the block that defined the argument. Closes #5957	2023-03-07 23:58:03 +00:00
Trevor Elliott	709257011e	Restrict uextend and sextend to scalar integers (#5953 )	2023-03-07 19:10:50 +00:00
Trevor Elliott	b44f67b6d7	Stop supporting vectors with `bmask` (#5948 ) Co-authored-by: Jamey Sharp <jsharp@fastly.com>	2023-03-07 17:34:29 +00:00
Alex Crichton	8bb183f16e	Implement the relaxed SIMD proposal (#5892 ) * Initial support for the Relaxed SIMD proposal This commit adds initial scaffolding and support for the Relaxed SIMD proposal for WebAssembly. Codegen is supported on the x64 and AArch64 backends at this time. The purpose of this commit is to get all the boilerplate out of the way in terms of plumbing through a new feature, adding tests, etc. The tests are copied from the upstream repository at this time while the WebAssembly/testsuite repository hasn't been updated. A summary of changes made in this commit is: * Lowerings for all relaxed simd opcodes have been added, currently all exhibiting deterministic behavior. This means that few lowerings are optimal on the x86 backend, but on the AArch64 backend, for example, all lowerings should be optimal. * Support is added to codegen to, eventually, conditionally generate different code based on input codegen flags. This is intended to enable codegen to use more efficient instructions on x86 by default, for example, while still allowing embedders to force architecture-independent semantics and behavior. One good example of this is the `f32x4.relaxed_fmadd` instruction which when deterministic forces the `fma` instruction, but otherwise if the backend doesn't have support for `fma` then intermediate operations are performed instead. * Lowerings of `iadd_pairwise` for `i16x8` and `i32x4` were added to the x86 backend as they're now exercised by the deterministic lowerings of relaxed simd instructions. * Sample codegen tests were added for x86 and aarch64 for some relaxed simd instructions. * Wasmtime embedder support for the relaxed-simd proposal and forcing determinism have been added to `Config` and the CLI. * Support has been added to the `.wast` runtime execution for the `(either ...)` matcher used in the relaxed-simd proposal. 
Tests for relaxed-simd are run both with a default `Engine` as well as a "force deterministic" `Engine` to test both configurations. * All tests from the upstream repository were copied into Wasmtime. These tests should be deleted when WebAssembly/testsuite is updated. * x64: Add x86-specific lowerings for relaxed simd This commit builds on the prior commit and adds an array of `x86_` instructions to Cranelift which have semantics that match their corresponding x86 equivalents. Translation for relaxed simd is then additionally updated to conditionally generate different CLIF for relaxed simd instructions depending on whether the target is x86 or not. This means that for AArch64 no changes are made but for x86 most relaxed instructions now lower to some x86-equivalent with slightly different semantics than the "deterministic" lowering. Add libcall support for fma to Wasmtime This will be required to implement the `f32x4.relaxed_madd` instruction (and others) when an x86 host doesn't specify the `has_fma` feature. * Ignore relaxed-simd tests on s390x and riscv64 * Enable relaxed-simd tests on s390x * Update cranelift/codegen/meta/src/shared/instructions.rs Co-authored-by: Andrew Brown <andrew.brown@intel.com> * Add a FIXME from review * Add notes about deterministic semantics * Don't default `has_native_fma` to `true` * Review comments and rebase fixes --------- Co-authored-by: Andrew Brown <andrew.brown@intel.com>	2023-03-07 15:52:41 +00:00
Qinxuan Chen	e2dcb19099	deps: update criterion dependency to v0.4.0 (#5935 )	2023-03-07 15:21:48 +00:00

2270 Commits