`FunctionBuilder::create_stackslot` was split into `create_sized_stack_slot`
and `create_dynamic_stack_slot`. This updates the `StackBuilder`
docstring to refer to the new methods.
Fixes #5838.
* Cranelift: remove non-egraphs optimization pipeline and `use_egraphs` option.
This PR removes the LICM, GVN, and preopt passes, and associated support
pieces, from `cranelift-codegen`. Not to worry, we still have
optimizations: the egraph framework subsumes all of these, and has been
on by default since #5181.
A few decision points:
- Filetests for the legacy LICM, GVN and simple_preopt were removed too.
As we built optimizations in the egraph framework we wrote new tests
for the equivalent functionality, and many of the old tests were
testing specific behaviors in the old implementations that may not be
relevant anymore. However, if folks prefer, I could take a different
approach here and try to port over all of the tests.
- The corresponding filetest modes (commands) were deleted too. The
`test alias_analysis` mode remains, but no longer invokes a separate
GVN first (since there is no separate GVN that will not also do alias
analysis) so the tests were tweaked slightly to work with that. The
egraph testsuite also covers alias analysis.
- The `divconst_magic_numbers` module is removed since it's unused
without `simple_preopt`, though this is the one remaining optimization
we still need to build in the egraphs framework, pending #5908. The
magic numbers will live forever in git history so removing this in the
meantime is not a major issue IMHO.
- The `use_egraphs` setting itself was removed at both the Cranelift and
Wasmtime levels. It has been marked deprecated for a few releases now
(Wasmtime 6.0, 7.0, upcoming 8.0, and corresponding Cranelift
versions) so I think this is probably OK. As an alternative if anyone
feels strongly, we could leave the setting and make it a no-op.
* Update test outputs for remaining test differences.
This commit adds new lowerings to the AArch64 backend for the
element-based `fmla` and `fmls` instructions. These instructions have
one of the multiplicands as an implicit broadcast of a single lane of
another register and can help remove `shuffle` or `dup` instructions
that would otherwise be used to implement them.
This commit adds constant-propagation optimizations for
`splat`-of-constant to produce a `vconst` node. This should help hoist
these constants out of loops later if this pattern shows up in wasm.
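As a toy model of the rewrite (the real rule is an ISLE rewrite; this is
just its semantics), a `splat` of a constant lane is equivalent to
materializing the lane's bytes repeated across the vector:

```rust
// Toy model of splat-of-constant: `splat.i32x4 (iconst.i32 7)` is
// equivalent to a `vconst` whose payload repeats the lane value.
fn const_splat_i32x4(lane: i32) -> [i32; 4] {
    [lane; 4] // these 16 bytes are the `vconst` payload
}
```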
* simple_gvn: recognize commutative operators
Normalize instructions with commutative opcodes by sorting the arguments. This
means instructions like `iadd v0, v1` and `iadd v1, v0` will be considered
identical by GVN and deduplicated.
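A minimal sketch of the normalization, using toy types rather than the
real `InstructionData` (the helper names below are illustrative):

```rust
enum Opcode {
    Iadd,
    Imul,
    Isub,
}

fn is_commutative(op: &Opcode) -> bool {
    matches!(op, Opcode::Iadd | Opcode::Imul)
}

// `args` holds value numbers; sorting them makes `iadd v1, v0` key
// identically to `iadd v0, v1` in GVN's hash-map lookup.
fn normalize(op: &Opcode, args: &mut [u32; 2]) {
    if is_commutative(op) && args[0] > args[1] {
        args.swap(0, 1);
    }
}
```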
* Remove `UsubSat` and `SsubSat` from `is_commutative`
They are not actually commutative
* Remove `TODO`s
* Move InstructionData normalization into helper fn
* Add normalization of commutative instructions in the egraph implementation
* Handle reflexive icmp/fcmps in GVN
* Change formatting of `normalize_in_place`
* suggestions from code review
* ISLE: move `icmp` rewrites to separate file.
Move `icmp`-related rewrite rules from `algebraic.isle` to `icmp.isle`.
Also move `icmp`-related tests from `algebraic.clif` to `icmp.clif`.
* Put parameterized and unparameterized `icmp` tests in separate files
* Undo refactoring of (ir)reflexivity rewrites
* Fix `icmp-parameterised.clif`
* Undo formatting/comment changes
* x64: Add AVX encodings of `vcvt{ss2sd,sd2ss}`
Additionally update the instruction helpers to take an `XmmMem` argument
to allow load sinking into the instruction.
* x64: Add AVX encoding of `sqrts{s,d}`
* x64: Add AVX support for `rounds{s,d}`
* x64: Deduplicate fcmp emission logic
The `select`-of-`fcmp` lowering duplicated a good deal of `FloatCC`
lowering logic that was already done by `emit_fcmp`, so this commit
refactors these lowering rules to instead delegate to `emit_fcmp` and
then handle that result.
* Swap order of condition codes
Shouldn't affect the correctness of this operation and it's a bit more
natural to write the lowering rule this way.
* Swap the order of comparison operands
No need to swap `a` and `b`; only `x` and `y` need swapping.
* Fix x64 printing of `XmmCmove`
* Implement TLS on Aarch64 Mach-O
* Add aarch64 macho TLS filetest
* Address review comments
- `Aarch64` instead of `AArch64` in comments
- Remove unnecessary guard in tls_value lowering
- Remove unnecessary regalloc metadata in emission
* Use x1 as temporary register in emission
- Instead of passing in a temporary register to use when emitting
the TLS code, just use `x1`, as it's already in the clobber set.
This also keeps the size of `aarch64::inst::Inst` at 32 bytes.
- Update filetest accordingly
* Update aarch64 mach-o TLS filetest
Following up on the discussion in
https://github.com/bytecodealliance/wasmtime/pull/6011
this adds an improved implementation of TrapIf for s390x
using a single conditional branch instruction.
If the trap condition is true, we branch into the middle of
the branch instruction - those middle two bytes are zero,
which matches the encoding of the trap instruction.
In addition, show the trap code for Trap and TrapIf
instructions in assembler output.
* x64: Add instruction helpers for `mov{d,q}`
These will soon grow AVX equivalents so move them to instruction helpers
to have clauses for AVX in the future.
* x64: Don't auto-convert between RegMemImm and XmmMemImm
The previous conversion, `mov_rmi_to_xmm`, would move from GPR registers
to XMM registers which isn't what many of the other `convert` statements
between these newtypes do. This seemed like a possible footgun so I've
removed the auto-conversion and added an explicit helper to go from a
`u32` to an `XmmMemImm`.
* x64: Add AVX encodings of some more GPR-related insns
This commit adds some more support for AVX instructions where GPRs are
mixed in with XMM registers. This required a few more variants of
`Inst` to handle the new instructions.
* Fix vpmovmskb encoding
* Fix xmm-to-gpr encoding of vmovd/vmovq
* Fix typo
* Fix rebase conflict
* Fix rebase conflict with tests
* cranelift: Add extra runtests for `clz`/`ctz`
* riscv64: Restrict lowering rules for `ctz`/`clz`
* cranelift: Add `u64` isle helpers
* riscv64: Improve `ctz` codegen
* riscv64: Improve `clz` codegen
* riscv64: Improve `cls` codegen
* riscv64: Improve `clz.i128` codegen
Instead of checking whether the top half has 64 leading zeros, check
whether it *is* 0; that way we avoid loading the `64` constant.
* riscv64: Improve `ctz.i128` codegen
Instead of checking whether the bottom half has 64 trailing zeros, check
whether it *is* 0; that way we avoid loading the `64` constant.
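In scalar terms the improved lowerings have roughly this shape (a sketch
of the logic, not the emitted RISC-V):

```rust
// Comparing the half directly against zero avoids materializing a `64`
// constant to compare a leading/trailing-zero count against.
fn clz_i128(hi: u64, lo: u64) -> u32 {
    if hi == 0 { 64 + lo.leading_zeros() } else { hi.leading_zeros() }
}

fn ctz_i128(hi: u64, lo: u64) -> u32 {
    if lo == 0 { 64 + hi.trailing_zeros() } else { lo.trailing_zeros() }
}
```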
* riscv64: Use extended value in `lower_cls`
* riscv64: Use pattern matches on `bseti`
* Restrict the types for isplit and iconcat to match backends
* Admit unimplemented bitwidths to isplit/iconcat
* Modify the NarrowInt type instead of shadowing it
* Fix filetest failures
* Add a `MachBuffer::defer_trap` method
This commit adds a new method to `MachBuffer` to defer trap opcodes to
the end of a function in a similar manner to how constants are deferred
to the end of the function. This is useful for backends which frequently
use `TrapIf`-style opcodes. Currently a jump is emitted which skips the
next instruction, a trap, and then execution continues normally. While
there isn't any pressing problem with this construction, the trap opcode
is in the middle of the instruction stream as opposed to "off on the
side" despite rarely being taken.
With this method in place all the backends (except riscv64 since I
couldn't figure it out easily enough) have a new lowering of their
`TrapIf` opcode. Now a trap is deferred, which returns a label, and the
conditional branch jumps to that label to execute the trap. A fixup is then
recorded in `MachBuffer` to get patched later on during emission, or at
the end of the function. Subsequently all `TrapIf` instructions
translate to a single branch plus a single trap at the end of the
function.
I've additionally updated some more lowerings in the x64 backend
which were explicitly using traps to instead use `TrapIf` where
applicable to avoid jumping over traps mid-function. Other backends
didn't appear to have many jump-over-the-next-trap patterns.
Lots of tests have had their expectations updated here which should
reflect all the traps being sunk to the end of functions.
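A toy model of the deferral idea (hypothetical types and method bodies;
only the defer-a-trap-and-get-a-label shape is taken from the real
`MachBuffer::defer_trap`):

```rust
#[derive(Default)]
struct Buffer {
    code: Vec<String>,
    deferred_traps: Vec<(usize, &'static str)>, // (label, trap code)
    next_label: usize,
}

impl Buffer {
    // Record a trap to be emitted later and hand back a label that a
    // single conditional branch can target, instead of emitting a
    // jump-over-trap sequence inline.
    fn defer_trap(&mut self, trap_code: &'static str) -> usize {
        let label = self.next_label;
        self.next_label += 1;
        self.deferred_traps.push((label, trap_code));
        label
    }

    // Materialize all deferred traps after the function body, next to
    // where deferred constants already go.
    fn finish(mut self) -> Vec<String> {
        for (label, trap_code) in std::mem::take(&mut self.deferred_traps) {
            self.code.push(format!("label{label}: trap ; {trap_code}"));
        }
        self.code
    }
}
```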
* Print trap code on all platforms
* Emit traps before constants
* Preserve source location information for traps
* Fix test expectations
* Attempt to fix s390x
The MachBuffer was registering trap codes with the first byte of the
trap, but the SIGILL handler was expecting it to be registered with the
last byte of the trap. Exploit the fact that SIGILL is always generated
by a 2-byte instruction and always march backwards 2 bytes for SIGILL,
continuing to march backwards 1 byte for SIGFPE-generating instructions.
* Back out s390x changes
* Back out more s390x bits
* Review comments
Previously it could affect the PartialEq and Hash impls. Ignoring the
sequence number in PartialEq and Hash allows us to not renumber all
blocks in the incremental cache.
* x64: Fix vbroadcastss with AVX2 and without AVX
This commit fixes a corner case in the emission of the
`vbroadcasts{s,d}` instructions. The memory-to-xmm form of these
instructions was available with the AVX instruction set, but the
xmm-to-xmm form of these instructions wasn't available until AVX2.
The instruction requirement for these is listed as AVX, but the lowering
rules are annotated to use either AVX2 or AVX as appropriate.
While this should work in practice, it didn't satisfy the assertion
about enabled features for each instruction. The `vbroadcastss`
instruction was listed as requiring AVX but could get emitted when AVX2
was enabled (due to the reg-to-reg form being available). This caused an
issue for the fuzzer where AVX2 was enabled but AVX was disabled.
One possible fix would be to add more opcodes, one for reg-to-reg and
one for mem-to-reg. That seemed like somewhat overkill for a pretty
niche situation that shouldn't actually come up in practice anywhere.
Instead this commit changes all the `has_avx` accessors to the
`use_avx_simd` predicate already available in the target flags. The
`use_avx2_simd` predicate was then updated to additionally require
`has_avx`, so if AVX2 is enabled and AVX is disabled then the
`vbroadcastss` instruction won't get emitted any more.
Closes #6059
* Pass `enable_simd` on a few more files
* Only allow pp_cmp within a single block
Block order shouldn't matter for codegen and restricting pp_cmp to a
single block will allow making instruction sequence numbers local to a
block.
* Make sequence numbers local to instructions
This allows renumbering to be localized to a single block where previously it
could affect the entire function. Also saves 32 bits of overhead per block.
* Use inst_block instead of pp_block where possible
* Remove unused is_block_gap method
* Remove ProgramOrder trait
It only has a single implementation
* Rename Layout::cmp to pp_cmp to distinguish it from Ord::cmp
* Make pp_block non-generic
* Use rpo_cmp_block instead of rpo_cmp in the verifier
* Remove ProgramPoint
* Rename ExpandedProgramPoint to ProgramPoint
* Remove From<ValueDef> for ProgramPoint impl
* x64: Elide more uextend with extractlane
I've confirmed locally now that `pextr{b,w,d}` all zero the upper bits
of the full 64-bit register size, which means that the zero-extend
following an `extractlane` operation can be elided in more cases, including
8-to-64-bit casts as well as 32-to-64.
This helps elide a few extra `mov`s in a loop I was looking at and had a
modest corresponding increase in performance (my guess is that this was
mostly due to the slightly decreased code size as opposed to the removed `mov`s).
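The observation in scalar form (a sketch of the semantics, not the
lowering rule): the value `pextr{b,w,d}` writes is already zero-extended
to the full GPR width, so the `uextend` costs nothing:

```rust
// `pextrb dst, xmm, lane` zeroes bits 8..64 of `dst`, which is exactly
// the zero-extension an `extractlane`+`uextend` pair asks for.
fn extract_lane_u8_to_u64(v: [u8; 16], lane: usize) -> u64 {
    v[lane] as u64 // no separate mask or `mov` needed after `pextrb`
}
```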
* Remove stray file
* x64: Refactor sextend/uextend rules
Move much of the meaty logic from these lowering rules into the
`extend_to_gpr` helper to benefit other callers of `extend_to_gpr` to
elide instructions. This additionally simplifies `sextend` and `uextend`
lowerings to rely on optimizations happening within the `extend_to_gpr`
helper.
* x64: Skip `uextend` for `pextr{b,w}` instructions
These instructions are documented as automatically zeroing the upper
bits so `uextend` operations can be skipped. This slightly improves
codegen for the wasm `i{8x16,16x8}.extract_lane_u` instructions, for
example.
* Modernize an extractor pattern
* Trim some superfluous match clauses
Additionally rejigger priorities to be "mostly default" now.
* Refactor 32-to-64 predicate to a helper
Also adjust the pattern matched in the `extend_to_gpr` helper.
* Slightly refactor pextr{b,w} case
* Review comments
* x64: Take SIGFPE signals for divide traps
Prior to this commit Wasmtime would configure `avoid_div_traps=true`
unconditionally for Cranelift. This, for the division-based
instructions, would change emitted code to explicitly trap on trap
conditions instead of letting the `div` x86 instruction trap.
There's no particular reason, however, for Wasmtime to avoid traps in
the `div` instruction. This means that the extra generated
branches on x86 aren't necessary since the `div` and `idiv` instructions
already trap for similar conditions as wasm requires.
This commit instead disables the `avoid_div_traps` setting for
Wasmtime's usage of Cranelift. Subsequently the codegen rules were
updated slightly:
* When `avoid_div_traps=true`, traps are no longer emitted for `div`
instructions.
* The `udiv`/`urem` instructions now list their trap as divide-by-zero
instead of integer overflow.
* The lowering for `sdiv` was updated to still explicitly check for zero
but the integer overflow case is deferred to the instruction itself.
* The lowering of `srem` no longer checks for zero and the listed trap
for the `div` instruction is a divide-by-zero.
This means that the codegen for `udiv` and `urem` no longer has any
branches. The codegen for `sdiv` removes one branch but keeps the
zero-check to differentiate the two kinds of traps. The codegen for
`srem` removes one branch but keeps the -1 check since the semantics of
`srem` mismatch with the semantics of `idiv` with a -1 divisor
(specifically for INT_MIN).
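A sketch of why `srem` keeps its -1 check while divide-by-zero can now
be left to the hardware:

```rust
// x86's `idiv` traps on INT_MIN / -1 (integer overflow), but CLIF `srem`
// (like wasm's `rem_s`) defines INT_MIN % -1 == 0, so a -1 divisor must
// be special-cased; a zero divisor can simply let `idiv` raise SIGFPE.
fn srem_i64(a: i64, b: i64) -> Option<i64> {
    match b {
        0 => None,     // left to the hardware: divide-by-zero trap
        -1 => Some(0), // `idiv` would trap here, so handle it explicitly
        _ => Some(a % b),
    }
}
```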
This is unlikely to yield much of a speedup, but it was something I
noticed during #6008 that seemed worth cleaning up. Plus Wasmtime's
signal handling was already set up to catch
`SIGFPE`, it was just never firing.
* Remove the `avoid_div_traps` cranelift setting
With no known users currently, removing this should be possible and helps
simplify the x64 backend.
* x64: GC more support for avoid_div_traps
Remove the `validate_sdiv_divisor*` pseudo-instructions and clean up
some of the ISLE rules now that `div` is allowed to itself trap
unconditionally.
* x64: Store div trap code in instruction itself
* Keep divisors in registers, not in memory
Don't accidentally fold multiple traps together
* Handle EXC_ARITHMETIC on macos
* Update emit tests
* Update winch and tests
This commit goes through the lowerings for the CLIF `splat` instruction
and improves the support for each operator. Many of these lowerings are
mirrored from v8/SpiderMonkey and there are a number of improvements:
* AVX2 `v{p,}broadcast*` instructions are added and used when available.
* Float-based splats are much simpler and are always a single instruction.
* Integer-based splats don't insert into an uninit xmm value and instead
start out with a `movd` to move into an `xmm` register. This
theoretically breaks dependencies with prior instructions since `movd`
creates a fresh new value in the destination register.
* Loads are now sunk into all of the instructions. A new extractor,
`sinkable_load_exact`, was added to sink the i8/i16 loads.
This commit adds another case for `shuffle` lowering to the x64 backend
for the `{,v}pblendw` instruction. This instruction selects 16-bit lanes
from either of the two inputs according to an immediate 8-bit mask, where
each bit chooses which input the corresponding lane comes from.
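As a toy model of the instruction's semantics (not the lowering rule
itself):

```rust
// `vpblendw a, b, imm`: bit i of the 8-bit immediate picks lane i from
// `b` when set and from `a` when clear, across eight 16-bit lanes.
fn pblendw(a: [u16; 8], b: [u16; 8], imm: u8) -> [u16; 8] {
    core::array::from_fn(|i| if (imm >> i) & 1 == 1 { b[i] } else { a[i] })
}
```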
* x64: Refactor `Amode` computation in ISLE
This commit replaces the previous computation of `Amode` with a
different set of rules that are intended to achieve the same purpose but
are structured differently. The motivation for this commit is going to
become more relevant in the next commit where `lea` will be used for the
`iadd` instruction, possibly, on x64. When doing so it caused a stack
overflow in the test suite during the compilation phase of a wasm
module, namely as part of the `amode_add` function. This function is
recursively defined in terms of itself and recurses as deep as the
deepest `iadd`-chain in a program. A particular test in our test suite
has a 10k-long chain of `iadd` which ended up causing a stack overflow
in debug mode.
This stack overflow is caused because the `amode_add` helper in ISLE
unconditionally peels all the `iadd` nodes away and looks at all of
them, even if most end up in intermediate registers along the way. Given
that structure I couldn't find a way to easily abort the recursion. The
new `to_amode` helper is structured in a similar fashion but attempts to
instead only recurse far enough to fold items into the final `Amode`
instead of recursing through items which themselves don't end up in the
`Amode`. Put another way, the `amode_add` helper might previously have
emitted `x64_add` instructions, but the new helper never does.
The goal of this commit is to preserve all the original `Amode`
optimizations, however. For some parts, though, it relies more on egraph
optimizations to run since if an `iadd` is 10k deep it doesn't try to
find a constant buried 9k levels inside there to fold into the `Amode`.
The hope, though, is that since the egraph pass has already run it has
shuffled constants to the right most of the time and folded together
whatever it can.
* x64: Add `lea`-based lowering for `iadd`
This commit adds a rule for the lowering of `iadd` to use `lea` for 32
and 64-bit addition. The theoretical benefit of `lea` over the `add`
instruction is that the `lea` variant can emulate a 3-operand
instruction which doesn't destructively modify one of its operands.
Additionally the `lea` operation can fold in other components such as
constant additions and shifts.
In practice, however, unconditionally lowering `iadd` to `lea` ends up
losing 10% performance on a local `meshoptimizer` benchmark.
My best guess as to what's going on here is that my CPU's dedicated
units for address computation are all overloaded while the ALUs are
basically idle in a memory-intensive loop. Previously when the ALU was
used for `add` and the address units for stores/loads it in theory
pipelined things better (most of this is me shooting in the dark). To
prevent the performance loss here I've updated the lowering of `iadd` to
conditionally sometimes use `lea` and sometimes use `add` depending on
how "complicated" the `Amode` is. Simple ones like `a + b` or `a + $imm`
continue to use `add` (plus the extra `mov` into the result register
that it may require). More complicated ones like `a + b + $imm` or
`a + b << c + $imm` use `lea` as it can remove the need for extra
instructions. Locally at least this fixes the performance loss relative
to unconditionally using `lea`.
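A sketch of the resulting heuristic (hypothetical shapes, not the
backend's real `Amode` type):

```rust
// Shapes that `add` already covers in one instruction stay on `add`;
// shapes that fold an extra register, shift, or offset use `lea` so
// those components don't become separate instructions.
enum AddrShape {
    RegImm,      // a + $imm
    RegReg,      // a + b
    RegRegImm,   // a + b + $imm
    RegShiftImm, // a + (b << c) + $imm
}

fn use_lea(shape: &AddrShape) -> bool {
    match shape {
        AddrShape::RegImm | AddrShape::RegReg => false,
        AddrShape::RegRegImm | AddrShape::RegShiftImm => true,
    }
}
```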
One note is that this adds an `OperandSize` argument to the
`MInst::LoadEffectiveAddress` variant to add an encoding for 32-bit
`lea` in addition to the preexisting 64-bit encoding.
* Conditionally use `lea` based on regalloc