wasmtime

Author	SHA1	Message	Date
Alex Crichton	edae6c0217	x64: Add non-SSE4.1 lowerings of `pmov{s,z}x` (#6279 ) x64: Add non-SSE4.1 lowerings of `pmov{s,z}x` This commit adds lowerings for a suite of sign/zero extension instructions which don't require SSE4.1. Like before these lowerings are based on LLVM's output. This commit also deletes special cases for `i16x8.extmul_{low,high}_` since the output of the special case is the same as the default lowering of all the component instructions used within as well. * Remove SSE4.1 specialization of `uwiden_high` LLVM prefers the `punpckh*`-based lowerings and at least according to `llvm-mca` these are slightly better cycle-wise too.	2023-04-27 02:15:34 +00:00
Afonso Bordado	62cbb5045e	riscv64: Implement a few SIMD arithmetic ops (#6268 ) * riscv64: Swap order of `VecAluRRR` source registers These were accidentally reversed from what we declare in the isle emit helper * riscv64: Add SIMD `isub` * riscv64: Add SIMD `imul` * riscv64: Add `{u,s}mulhi` * riscv64: Add `b{and,or,xor}` * cranelift: Move `imul.i8x16` runtest to separate file Looks like x86 does not implement it * riscv64: Better formatting for `VecAluOpRRR` * cranelift: Enable x86 SIMD tests with `has_sse41=false`	2023-04-25 16:39:33 +00:00
Afonso Bordado	4337ccd4b7	riscv64: Support non 128bit vector sizes (#6266 ) * riscv64: Add `Zvl` extensions * riscv64: Allow lowering SIMD operations that fit in a vector register * riscv64: Support non 128bit vector sizes * riscv64: Add Zvl Presets * riscv64: Precompute `min_vec_reg_size`	2023-04-25 14:50:00 +00:00
Alex Crichton	d2bb4aa13b	x64: Add non-SSE4.1 lowering for pmulld (#6259 ) Adds a lowering for SSE2 for i32x4-based multiplication which only first became available in SSE4.1	2023-04-24 17:24:19 +00:00
Trevor Elliott	d9e27c5441	Bump regalloc2 to 0.7.0 (#6237 ) * Bump RA2 to 0.7.0 * Certify the RA2 update * Import the rustc-hash audit * Updates for regalloc2 prtest:full * Update tests	2023-04-21 00:47:58 +00:00
Trevor Elliott	e6339b2725	s390x: Remove uses of copy_reg (#6253 ) * Remove uses of `copy_reg` in s390x lowerings * Update tests * Add one copy back in for smulhi to avoid an inserted move	2023-04-20 22:05:55 +00:00
Alex Crichton	a2a38edd8a	x64: Add non-SSE 4.1 lowerings for `v{all,any}_true` (#6232 ) This commit adds lowerings to the x64 backend for two more CLIF instructions that currently require SSE 4.1. These lowerings are inspired by LLVM's lowerings and avoid the use of SSE 4.1 instructions.	2023-04-20 21:56:59 +00:00
Afonso Bordado	60e4a00413	riscv64: Initial SIMD Vector Implementation (#6240 ) * riscv64: Remove unused code * riscv64: Add vector types * riscv64: Initial Vector ABI Load/Stores * riscv64: Vector Loads/Stores * riscv64: Fix `vsetvli` encoding error * riscv64: Add SIMD `iadd` runtests * riscv64: Rename `VecSew` The SEW name is correct, but only for VType. We also use this type in loads/stores as the Effective Element Width, so the name isn't quite correct in that case. * ci: Add V extension to RISC-V QEMU * riscv64: Misc Cleanups * riscv64: Check V extension in `load`/`store` for SIMD * riscv64: Fix `sumop` doc comment * cranelift: Fix comment typo * riscv64: Add convert for VType and VecElementWidth * riscv64: Remove VecElementWidth converter	2023-04-20 21:54:43 +00:00
Trevor Elliott	7ad2fe32c9	aarch64: Avoid a copy in the extract_vector lowering (#6252 ) * Avoid a move in the aarch64 lowering of extract_vector * Update tests	2023-04-20 18:24:12 +00:00
Trevor Elliott	b667f5fa5b	Avoid introducing a move for struct return (#6245 ) * Move sret_reg handling to Lower:new, from gen_arg_setup * Update exp files * Review feedback * Review feedback	2023-04-20 18:13:40 +00:00
Trevor Elliott	f89ac63766	riscv64: Remove the gen_move2 helper (#6246 ) * Remove gen_move2 from riscv64 * Update exp files	2023-04-19 21:04:30 +00:00
Alex Crichton	62f8928bee	x64: Add non-SSE4.1 lowerings of ceil/trunc/floor/nearest (#6224 ) * x64: Add non-SSE4.1 lowerings of ceil/trunc/floor/nearest This commit adds lowerings that work with SSE2 for CLIF `ceil`, `trunc`, `floor`, and `nearest` instructions over vectors. To get these working `insertlane` for float vectors was also implemented for non-SSE4.1 instructions as well. Note that the goal of these lowerings is not speed but rather "it works", so the decompose-to-call-libcalls logic for vector is probably horrendously slow but should at least be correct. * Skip new tests on riscv64 * Update cranelift/codegen/src/isa/x64/inst.isle Co-authored-by: Andrew Brown <andrew.brown@intel.com> --------- Co-authored-by: Andrew Brown <andrew.brown@intel.com>	2023-04-18 17:23:18 +00:00
Alex Crichton	7ebff82861	Optimize sign extension via shifts (#6220 ) * Optimize sign extension via shifts This commit adds egraph optimization patterns for left-shifting a value and then right-shifting it as a form of sign extending its lower bits. This matches the behavior of the WebAssembly `i32.extend8_s` instruction, for example. Note that the lowering of that WebAssembly instruction does not use shifts, but historical versions of LLVM that didn't support the instruction, or versions with the instruction disabled, will use shifts instead. A second rule for reduction-of-extend being the same as the original value was added to keep an existing shift-related test passing as well. * Add reference assemblies for new opts	2023-04-17 18:48:08 +00:00
Afonso Bordado	9e1ff9726c	egraphs: Add `bmask` bit pattern optimization rule (#6196 ) * egraphs: Add a bmask bit pattern optimization * egraphs: Add more `ineg` rules * egraphs: Add sshr rule * egraphs: Simplify bmask rule * egraphs: Add commutative version of bmask rule * egraphs: Add more testcases * egraphs: Cleanup rule comments * egraphs: Add more `ineg` optimizations	2023-04-14 18:50:48 +00:00
Alex Crichton	2d25db047f	x64: Lower SIMD requirement to SSE4.1 from SSE4.2 (#6206 ) Cranelift only has one SIMD instruction which depends on SSE4.2 so this commit adds a lowering rule for `pcmpgtq` which doesn't use SSE4.2 and enables lowering the baseline requirement for SIMD support from SSE4.2 to SSE4.1. The `has_sse42` setting is no longer enabled by default for Cranelift. Additionally `enable_simd` no longer requires `has_sse42` on x64. Finally the fuzz-generator for Wasmtime codegen settings now enables flipping the `has_sse42` setting instead of unconditionally setting it to `true`. The specific lowering for `pcmpgtq` is copied from LLVM's lowering of this instruction.	2023-04-14 17:24:43 +00:00
T0b1-iOS	3956a6aa0f	remove `unsigned_add_overflow_condition` (#6199 )	2023-04-13 14:30:44 +00:00
T0b1-iOS	f684a5fbee	remove `iadd_cout` and `isub_bout` (#6198 )	2023-04-11 23:39:32 +00:00
Karl Meakin	c0166f78f9	ISLE: simplify select/bitselect when both choices are the same (#6141 )	2023-04-11 22:41:19 +00:00
Karl Meakin	b9a58148cf	ISLE: split algebraic.isle into several files (#6140 ) * ISLE: split algebraic.isle into several files * delete `algebraic.clif` * Add `README.md` * Remove old `algebraic.clif` tests --------- Co-authored-by: Jamey Sharp <jsharp@fastly.com>	2023-04-11 21:39:18 +00:00
T0b1-iOS	569089e473	Add `{u,s}{add,sub,mul}_overflow` instructions (#5784 ) * add `{u,s}{add,sub,mul}_overflow` with interpreter * add `{u,s}{add,sub,mul}_overflow` for x64 * add `{u,s}{add,sub,mul}_overflow` for aarch64 * 128bit filetests for `{u,s}{add,sub,mul}_overflow` * `{u,s}{add,sub,mul}_overflow` emit tests for x64 * `{u,s}{add,sub,mul}_overflow` emit tests for aarch64 * Initial review changes * add `with_flags_extended` helper * add `with_flags_chained` helper	2023-04-11 20:16:04 +00:00
Afonso Bordado	4c32dd7786	riscv64: Delete `SelectIf` instruction (#5888 ) * riscv64: Delete `SelectIf` instruction * riscv64: Fix typo in comment Co-authored-by: Trevor Elliott <awesomelyawesome@gmail.com> * riscv64: Improve `bmask` codegen * riscv64: Use `lower_bmask` in `select_spectre_guard` * riscv64: Use `lower_bmask` to extend values in `select_spectre_guard` Co-authored-by: Trevor Elliott <awesomelyawesome@gmail.com> --------- Co-authored-by: Trevor Elliott <awesomelyawesome@gmail.com>	2023-04-11 17:33:32 +00:00
Alex Crichton	435b6894d7	x64: Clarify and shrink up ModRM/SIB encoding (#6181 ) I noticed recently that for the `ImmRegRegShift` addressing mode Cranelift will unconditionally emit at least a 1-byte immediate for the offset to be added to the register addition computation, even when the offset is zero. In this case though the instruction encoding can be slightly more compact and remove a byte. This commit started off by applying this optimization, which resulted in the `*.clif` test changes in this commit. Further reading this code, however, I personally found it quite hard to follow what was happening with all the various branches and ModRM/SIB bits. I reviewed these encodings in the x64 architecture manual and attempted to improve the logic for encoding here. The new version in this commit is intended to be functionally equivalent to the prior version where dropping a zero-offset from the `ImmRegRegShift` variant is the only change.	2023-04-10 19:37:19 +00:00
Chris Fallin	8f1a7773a3	Revert "ISLE: rewrite loose inequalities to strict inequalities and strict inequalities to equalities (#6130 )" (#6193 ) This reverts commit `57e42d0c46`. Fixes #6185.	2023-04-10 18:43:15 +00:00
Jamey Sharp	ac2bd1f305	cranelift: Rename a filetest with the wrong extension (#6190 ) This test was committed with a `.isle` extension instead of `.clif`, so it wasn't actually running in the test suite. Fortunately, it still passes.	2023-04-10 16:27:42 +00:00
Chris Fallin	230e2135d6	Cranelift: remove non-egraphs optimization pipeline and `use_egraphs` option. (#6167 ) * Cranelift: remove non-egraphs optimization pipeline and `use_egraphs` option. This PR removes the LICM, GVN, and preopt passes, and associated support pieces, from `cranelift-codegen`. Not to worry, we still have optimizations: the egraph framework subsumes all of these, and has been on by default since #5181. A few decision points: - Filetests for the legacy LICM, GVN and simple_preopt were removed too. As we built optimizations in the egraph framework we wrote new tests for the equivalent functionality, and many of the old tests were testing specific behaviors in the old implementations that may not be relevant anymore. However if folks prefer I could take a different approach here and try to port over all of the tests. - The corresponding filetest modes (commands) were deleted too. The `test alias_analysis` mode remains, but no longer invokes a separate GVN first (since there is no separate GVN that will not also do alias analysis) so the tests were tweaked slightly to work with that. The egraph testsuite also covers alias analysis. - The `divconst_magic_numbers` module is removed since it's unused without `simple_preopt`, though this is the one remaining optimization we still need to build in the egraphs framework, pending #5908. The magic numbers will live forever in git history so removing this in the meantime is not a major issue IMHO. - The `use_egraphs` setting itself was removed at both the Cranelift and Wasmtime levels. It has been marked deprecated for a few releases now (Wasmtime 6.0, 7.0, upcoming 8.0, and corresponding Cranelift versions) so I think this is probably OK. As an alternative if anyone feels strongly, we could leave the setting and make it a no-op. * Update test outputs for remaining test differences.	2023-04-06 18:11:03 +00:00
bjorn3	e1812b611b	Rename define_function to define_function_with_control_plane (#6165 ) And add a define_function convenience function which uses a default control plane.	2023-04-06 16:14:13 +00:00
Remo Senekowitsch	7eb8914090	Chaos mode MVP: Skip branch optimization in MachBuffer (#6039 ) * fuzz: Add chaos mode control plane Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com> Co-authored-by: Moritz Waser <mzrw.dev@pm.me> * fuzz: Skip branch optimization with chaos mode Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com> Co-authored-by: Moritz Waser <mzrw.dev@pm.me> * fuzz: Rename chaos engine -> control plane Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com> Co-authored-by: Moritz Waser <mzrw.dev@pm.me> * chaos mode: refactoring ControlPlane to be passed through the call stack by reference Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com> Co-authored-by: Remo Senekowitsch <contact@remsle.dev> * fuzz: annotate chaos todos Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com> Co-authored-by: Moritz Waser <mzrw.dev@pm.me> * fuzz: cleanup control plane Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com> Co-authored-by: Moritz Waser <mzrw.dev@pm.me> * fuzz: remove control plane from compiler context Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com> Co-authored-by: Moritz Waser <mzrw.dev@pm.me> * fuzz: move control plane into emit state Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com> Co-authored-by: Moritz Waser <mzrw.dev@pm.me> * fuzz: fix remaining compiler errors Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com> Co-authored-by: Moritz Waser <mzrw.dev@pm.me> * fix tests * refactor emission state ctrl plane accessors Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com> Co-authored-by: Moritz Waser <mzrw.dev@pm.me> * centralize conditional compilation of chaos mode Also cleanup a few straggling dependencies on cranelift-control that aren't needed anymore. 
Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com> Co-authored-by: Moritz Waser <mzrw.dev@pm.me> * add cranelift-control to published crates prtest:full Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com> Co-authored-by: Moritz Waser <mzrw.dev@pm.me> * add cranelift-control to public crates Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com> Co-authored-by: Moritz Waser <mzrw.dev@pm.me> --------- Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com> Co-authored-by: Moritz Waser <mzrw.dev@pm.me> Co-authored-by: Remo Senekowitsch <contact@remsle.dev>	2023-04-05 19:28:46 +00:00
Alex Crichton	967543eb43	aarch64: Add more lowerings for the CLIF `fma` (#6150 ) This commit adds new lowerings to the AArch64 backend of the element-based `fmla` and `fmls` instructions. These instructions have one of the multiplicands as an implicit broadcast of a single lane of another register and can help remove `shuffle` or `dup` instructions that would otherwise be used to implement them.	2023-04-05 17:22:55 +00:00
Alex Crichton	d45cbba83f	Add egraph cprop optimizations for `splat` (#6148 ) This commit adds constant-propagation optimizations for `splat`-of-constant to produce a `vconst` node. This should help later hoisting these constants out of loops if it shows up in wasm.	2023-04-05 16:10:45 +00:00
Jamey Sharp	81545c3a86	Revert "simple_gvn: recognize commutative operators (#6135 )" (#6142 ) This reverts commit `c85bf27ff8`.	2023-04-04 20:22:44 +00:00
Karl Meakin	57e42d0c46	ISLE: rewrite loose inequalities to strict inequalities and strict inequalities to equalities (#6130 ) * ISLE: rewrite loose inequalities to strict inequalities * Rewrite strict inequalities to equalities where possible	2023-04-04 17:42:19 +00:00
Jan-Justin van Tonder	c475735f5e	cranelift-interpreter: Fix incorrect scalar_to_vector result (#6133 ) * The `vectorizelanes` function performs a check to see whether there is a single value provided in an array, and if so returns it as a scalar. While elsewhere in the interpreter this behaviour is relied upon, it yields an incorrect result when attempting to convert a scalar to a vector. The original `vectorizelanes` remains untouched, however, an unconditional variant `vectorizelanes_all` was added. * A test was added under `filetests/runtests/issue5911.clif`. Fixes #5911	2023-04-04 12:14:16 +00:00
Karl Meakin	c85bf27ff8	simple_gvn: recognize commutative operators (#6135 ) * simple_gvn: recognize commutative operators Normalize instructions with commutative opcodes by sorting the arguments. This means instructions like `iadd v0, v1` and `iadd v1, v0` will be considered identical by GVN and deduplicated. * Remove `UsubSat` and `SsubSat` from `is_commutative` They are not actually commutative * Remove `TODO`s * Move InstructionData normalization into helper fn * Add normalization of commutative instructions in the egraph implementation * Handle reflexive icmp/fcmps in GVN * Change formatting of `normalize_in_place` * suggestions from code review	2023-04-04 00:25:05 +00:00
Karl Meakin	c8c224ead6	ISLE: move `icmp` rewrites to separate file. (#6120 ) * ISLE: move `icmp` rewrites to separate file. Move `icmp`-related rewrite rules from `algebraic.isle` to `icmp.isle`. Also move `icmp`-related tests from `algebraic.clif` to `icmp.clif`. * Put parameterized and unparameterized `icmp` tests in separate files * Undo refactoring of (ir)reflexivity rewrites * Fix `icmp-parameterised.clif` * Undo formatting/comment changes	2023-03-31 17:40:31 +00:00
Alex Crichton	0b0ac3ff73	x64: Add AVX support for some more float-related instructions (#6092 ) * x64: Add AVX encodings of `vcvt{ss2sd,sd2ss}` Additionally update the instruction helpers to take an `XmmMem` argument to allow load sinking into the instruction. * x64: Add AVX encoding of `sqrts{s,d}` * x64: Add AVX support for `rounds{s,d}`	2023-03-29 18:09:49 +00:00
Alex Crichton	afb417920d	x64: Deduplicate fcmp emission logic (#6113 ) * x64: Deduplicate fcmp emission logic The `select`-of-`fcmp` lowering duplicated a good deal of `FloatCC` lowering logic that was already done by `emit_fcmp`, so this commit refactors these lowering rules to instead delegate to `emit_fcmp` and then handle that result. * Swap order of condition codes Shouldn't affect the correctness of this operation and it's a bit more natural to write the lowering rule this way. * Swap the order of comparison operands No need to swap `a b`, only the `x y` needs swapping. * Fix x64 printing of `XmmCmove`	2023-03-29 16:24:25 +00:00
Karl Meakin	dcf0ea9ff3	ISLE: rewrite `and`/`or` of `icmp` (#6095 ) * ISLE: rewrite `and`/`or` of `icmp` * Add `make-icmp-tests.sh` script * Remove unused changes	2023-03-29 00:13:27 +00:00
Karl Meakin	97d9f77d94	Add `precise_output` argument to `test optimize`. (#6111 ) * Add `precise_output` argument to `test optimize`. Also allow optimize tests to be updated by `CRANELIFT_TEST_BLESS=1` * Move `check_precise_output` and `update_test` to `subtest`	2023-03-28 22:32:04 +00:00
Maja Kądziołka	db07988ccb	x64: emit_cmp: use x64_test for comparisons with 0 (#6086 ) * x64: emit_cmp: use x64_test for comparisons with 0 See #5869 * fixup! x64: emit_cmp: use x64_test for comparisons with 0	2023-03-27 15:38:48 +00:00
Nathan Whitaker	c3decdf910	cranelift: Implement TLS on aarch64 Mach-O (Apple Silicon) (#5434 ) * Implement TLS on Aarch64 Mach-O * Add aarch64 macho TLS filetest * Address review comments - `Aarch64` instead of `AArch64` in comments - Remove unnecessary guard in tls_value lowering - Remove unnecessary regalloc metadata in emission * Use x1 as temporary register in emission - Instead of passing in a temporary register to use when emitting the TLS code, just use `x1`, as it's already in the clobber set. This also keeps the size of `aarch64::inst::Inst` at 32 bytes. - Update filetest accordingly * Update aarch64 mach-o TLS filetest	2023-03-24 17:54:01 +00:00
Afonso Bordado	602ff71fe4	riscv64: Add `Zba` extension instructions (#6087 ) * riscv64: Use `add.uw` to zero extend * riscv64: Implement `add.uw` optimizations * riscv64: Add `Zba` `iadd+ishl` optimizations * riscv64: Add `shl+uextend` optimizations based on `Zba` * riscv64: Fix some issues with `Zba` instructions * riscv64: Restrict shnadd selection * riscv64: Fix `extend` priorities * riscv64: Remove redundant `addw` rule * riscv64: Specify type for `add` extend rules * riscv64: Use `u64_from_imm64` extractor instead of `uimm8` * riscv64: Restrict `uextend` in `shnadd.uw` rules * riscv64: Use concrete type in `slli.uw` rule * riscv64: Add extra arithmetic extends tests Co-authored-by: Jamey Sharp <jsharp@fastly.com> * riscv64: Make `Adduw` types concrete * riscv64: Add extra arithmetic extend tests * riscv64: Add `sextend`+Arithmetic rules * riscv64: Fix whitespace * cranelift: Move arithmetic extends tests with i128 to separate file --------- Co-authored-by: Jamey Sharp <jsharp@fastly.com>	2023-03-23 20:06:03 +00:00
Ulrich Weigand	6f66abd5c7	s390x: Improved TrapIf implementation (#6079 ) Following up on the discussion in https://github.com/bytecodealliance/wasmtime/pull/6011 this adds an improved implementation of TrapIf for s390x using a single conditional branch instruction. If the trap condition is true, we branch into the middle of the branch instruction - those middle two bytes are zero, which matches the encoding of the trap instruction. In addition, show the trap code for Trap and TrapIf instructions in assembler output.	2023-03-23 14:50:43 +00:00
Alex Crichton	2fde25311e	x64: Refactor and fill out some gpr-vs-xmm bits (#6058 ) * x64: Add instruction helpers for `mov{d,q}` These will soon grow AVX-equivalents so move them to instruction helpers to have clauses for AVX in the future. * x64: Don't auto-convert between RegMemImm and XmmMemImm The previous conversion, `mov_rmi_to_xmm`, would move from GPR registers to XMM registers which isn't what many of the other `convert` statements between these newtypes do. This seemed like a possible footgun so I've removed the auto-conversion and added an explicit helper to go from a `u32` to an `XmmMemImm`. * x64: Add AVX encodings of some more GPR-related insns This commit adds some more support for AVX instructions where GPRs are in use mixed in with XMM registers. This required a few more variants of `Inst` to handle the new instructions. * Fix vpmovmskb encoding * Fix xmm-to-gpr encoding of vmovd/vmovq * Fix typo * Fix rebase conflict * Fix rebase conflict with tests	2023-03-22 14:58:09 +00:00
Afonso Bordado	7a3df7dcc0	riscv64: Improve `ctz`/`clz`/`cls` codegen (#5854 ) * cranelift: Add extra runtests for `clz`/`ctz` * riscv64: Restrict lowering rules for `ctz`/`clz` * cranelift: Add `u64` isle helpers * riscv64: Improve `ctz` codegen * riscv64: Improve `clz` codegen * riscv64: Improve `cls` codegen * riscv64: Improve `clz.i128` codegen Instead of checking if we have 64 zeros in the top half. Check if it is 0, that way we avoid loading the `64` constant. * riscv64: Improve `ctz.i128` codegen Instead of checking if we have 64 zeros in the bottom half. Check if it is 0, that way we avoid loading the `64` constant. * riscv64: Use extended value in `lower_cls` * riscv64: Use pattern matches on `bseti`	2023-03-21 23:15:14 +00:00
Alexa VanHattum	13be5618a7	Cranelift: ISLE: aarch64: fix `imm12_from_negated_value` for `i32`, `i16` (#6078 ) * Fix the semantics of imm12_from_negated_value, swapping to a partial term + rule * wrapping_neg	2023-03-21 19:16:25 +00:00
Trevor Elliott	861220c433	Restrict the types for isplit and iconcat to match backends (#6070 ) * Restrict the types for isplit and iconcat to match backends * Admit unimplemented bitwidths to isplit/iconcat * Modify the NarrowInt type instead of shadowing it * Fix filetest failures	2023-03-21 01:21:00 +00:00
Karl Meakin	7d9318fe77	cranelift: rewrite `iabs(ineg(x))` and `iabs(iabs(x))` (#6072 ) * cranelift: rewrite `iabs(ineg(x))` and `iabs(iabs(x))` * Fix comment on `iabs(iabs(x))` rewrite * Remove subsume on rewrite for `iabs(ineg(x))`	2023-03-21 00:12:21 +00:00
Alex Crichton	a3b21031d4	Add a `MachBuffer::defer_trap` method (#6011 ) * Add a `MachBuffer::defer_trap` method This commit adds a new method to `MachBuffer` to defer trap opcodes to the end of a function in a similar manner to how constants are deferred to the end of the function. This is useful for backends which frequently use `TrapIf`-style opcodes. Currently a jump is emitted which skips the next instruction, a trap, and then execution continues normally. While there isn't any pressing problem with this construction the trap opcode is in the middle of the instruction stream as opposed to "off on the side" despite rarely being taken. With this method in place all the backends (except riscv64 since I couldn't figure it out easily enough) have a new lowering of their `TrapIf` opcode. Now a trap is deferred, which returns a label, and then that label is jumped to when executing the trap. A fixup is then recorded in `MachBuffer` to get patched later on during emission, or at the end of the function. Subsequently all `TrapIf` instructions translate to a single branch plus a single trap at the end of the function. I've additionally further updated some more lowerings in the x64 backend which were explicitly using traps to instead use `TrapIf` where applicable to avoid jumping over traps mid-function. Other backends didn't appear to have many jump-over-the-next-trap patterns. Lots of tests have had their expectations updated here which should reflect all the traps being sunk to the end of functions. * Print trap code on all platforms * Emit traps before constants * Preserve source location information for traps * Fix test expectations * Attempt to fix s390x The MachBuffer was registering trap codes with the first byte of the trap, but the SIGILL handler was expecting it to be registered with the last byte of the trap. 
Exploit that SIGILL is always represented with a 2-byte instruction and always march 2-backwards for SIGILL, continuing to march backwards 1 byte for SIGFPE-generating instructions. * Back out s390x changes * Back out more s390x bits * Review comments	2023-03-20 21:24:47 +00:00
Alex Crichton	dd7fa81b20	x64: Run more filetests with AVX support (#6063 ) This commit goes through the `runtests` folder of the `filetests` test suite and ensure that everything which uses simd or float-related instructions on x64 is executed with the baseline support for x86_64 in addition to adding in AVX support. Most of the instructions used have AVX equivalents so this should help test all of the equivalents in addition to the codegen filetests in the x64 folder.	2023-03-20 19:13:14 +00:00
Alex Crichton	f7dda1ab2c	x64: Fix vbroadcastss with AVX2 and without AVX (#6060 ) * x64: Fix vbroadcastss with AVX2 and without AVX This commit fixes a corner case in the emission of the `vbroadcasts{s,d}` instructions. The memory-to-xmm form of these instructions was available with the AVX instruction set, but the xmm-to-xmm form of these instructions wasn't available until AVX2. The instruction requirement for these are listed as AVX but the lowering rules are appropriately annotated to use either AVX2 or AVX when appropriate. While this should work in practice this didn't work for the assertion about enabled features for each instruction. The `vbroadcastss` instruction was listed as requiring AVX but could get emitted when AVX2 was enabled (due to the reg-to-reg form being available). This caused an issue for the fuzzer where AVX2 was enabled but AVX was disabled. One possible fix would be to add more opcodes, one for reg-to-reg and one for mem-to-reg. That seemed like somewhat overkill for a pretty niche situation that shouldn't actually come up in practice anywhere. Instead this commit changes all the `has_avx` accessors to the `use_avx_simd` predicate already available in the target flags. The `use_avx2_simd` predicate was then updated to additionally require `has_avx`, so if AVX2 is enabled and AVX is disabled then the `vbroadcastss` instruction won't get emitted any more. Closes #6059 * Pass `enable_simd` on a few more files	2023-03-18 18:38:03 +00:00
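
Note on 7ebff82861 (#6220): that commit describes recognizing a left shift followed by a right shift as a sign extension of the low bits, matching `i32.extend8_s`. The following is a minimal Rust sketch of that equivalence only. It is not the ISLE rule from the commit, and the function names `sign_extend8_via_shifts` and `sign_extend8_direct` are hypothetical names chosen for illustration.

```rust
/// Sign-extends the low 8 bits of `x` with a left shift followed by an
/// arithmetic right shift (the pattern older LLVM output may contain).
fn sign_extend8_via_shifts(x: i32) -> i32 {
    // `>>` on a signed integer is an arithmetic shift in Rust.
    (x << 24) >> 24
}

/// Direct form: truncate to 8 bits, then sign-extend back to 32 bits.
fn sign_extend8_direct(x: i32) -> i32 {
    x as i8 as i32
}

fn main() {
    // Both forms should agree on every input; spot-check a few values.
    for x in [0, 1, 0x7f, 0x80, 0xff, 0x1234_5680, -1, i32::MIN, i32::MAX] {
        assert_eq!(sign_extend8_via_shifts(x), sign_extend8_direct(x));
    }
    println!("shift-based and direct sign extension agree");
}
```

Because the two forms compute the same value, an optimizer can replace the shift pair with a single extend of the truncated value, which is the kind of rewrite the commit message describes.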

1526 Commits