wasmtime

Author	SHA1	Message	Date
Alex Crichton	83f21e784a	x64: Add more support for more AVX instructions (#5931 ) * x64: Add a smattering of lowerings for `shuffle` specializations (#5930) * x64: Add lowerings for `punpck{h,l}wd` Add some special cases for `shuffle` for more specialized x86 instructions. * x64: Add `shuffle` lowerings for `pshufd` This commit adds special-cased lowerings for the x64 `shuffle` instruction when the `pshufd` instruction alone is necessary. This is possible when the shuffle immediate permutes 32-bit values within one of the vector inputs of the `shuffle` instruction, but not both. * x64: Add shuffle lowerings for `punpck{h,l}{q,}dq` This adds specific permutations for some x86 instructions which specifically interleave high/low bytes for 32 and 64-bit values. This corresponds to the preexisting specific lowerings for interleaving 8 and 16-bit values. * x64: Add `shuffle` lowerings for `shufps` This commit adds targeted lowerings for the `shuffle` instruction that match the pattern that `shufps` supports. The `shufps` instruction selects two elements from the first vector and two elements from the second vector which means while it's not generally applicable it should still be more useful than the catch-all lowering of `shuffle`. * x64: Add shuffle support for `pshuf{l,h}w` This commit adds special lowering cases for these instructions which permute 16-bit values within a 128-bit value either within the upper or lower half of the 128-bit value. * x64: Specialize `shuffle` with an all-zeros immediate Instead of loading the all-zeros immediate from a rip-relative address at the end of the function instead generate a zero with a `pxor` instruction and then use `pshufb` to do the broadcast. * Review comments * x64: Add an AVX encoding for the `pshufd` instruction This will benefit from lack of need for alignment vs the `pshufd` instruction if working with a memory operand and additionally, as I've just learned, this reduces dependencies between instructions because the `v` instructions zero the upper bits as opposed to preserving them which could accidentally create false dependencies in the CPU between instructions. x64: Add more support for AVX loads/stores This commit adds VEX-encoded versions of instructions such as `mov{ss,sd,upd,ups,dqu}` for load and store operations. This also changes some signatures so the `load` helpers specifically take a `SyntheticAmode` argument which ended up doing a small refactoring of the `_regmove` variant used for `insertlane 0` into f64x2 vectors. x64: Enable using AVX instructions for zero regs This commit refactors the internal ISLE helpers for creating zero'd xmm registers to leverage the AVX support for all other instructions. This moves away from picking opcodes to instead picking instructions with a bit of reorganization. * x64: Remove `XmmConstOp` as an instruction All existing users can be replaced with usage of the `xmm_uninit_value` helper instruction so there's no longer any need for these otherwise constant operations. This additionally reduces manual usage of opcodes in favor of instruction helpers. * Review comments * Update test expectations	2023-03-09 23:57:42 +00:00
Alex Crichton	5dc2bbccbb	Merge pull request from GHSA-xm67-587q-r2vw This commit fixes an off-by-one error in the subtraction of indices when shuffling a vector with itself. Lanes 16-and-above are mapped to select from the first vector since the first and second element are the same, but the subtraction was with 15 rather than 16 by accident.	2023-03-08 13:00:00 -06:00
Alex Crichton	c4a2c1e818	clif: Remove the type variable from `swizzle` (#5897 ) This instruction is only defined with i8x16 inputs and outputs so there's no need for a type variable, so shadow the otherwise-generic `a` result with a concrete i8x16 type.	2023-03-01 00:38:53 +00:00
Alex Crichton	f2dce812c3	x64: Sink constant loads into xmm instructions (#5880 ) A number of places in the x64 backend make use of 128-bit constants for various wasm SIMD-related instructions although most of them currently use the `x64_xmm_load_const` helper to load the constant into a register. Almost all xmm instructions, however, enable using a memory operand which means that these loads can be folded into instructions to help reduce register pressure. Automatic conversions were added for a `VCodeConstant` into an `XmmMem` value and then explicit loads were all removed in favor of forwarding the `XmmMem` value directly to the underlying instruction. Note that some instances of `x64_xmm_load_const` remain since they're used in contexts where load sinking won't work (e.g. they're the first operand, not the second for non-commutative instructions).	2023-02-27 22:02:42 +00:00
Trevor Elliott	cc073593a4	Fix block label printing in precise-output tests (#5798 ) As a follow-up to #5780, disassemble the regions identified by bb_starts, falling back on disassembling the whole buffer. This ensures that instructions like br_table that introduce a lot of constants don't throw off capstone for the remainder of the function. --------- Co-authored-by: Jamey Sharp <jamey@minilop.net>	2023-02-16 02:35:26 +00:00
Trevor Elliott	f04decc4a1	Use capstone to validate precise-output tests (#5780 ) Use the capstone library to disassemble precise-output tests, in addition to pretty-printing their vcode.	2023-02-15 16:35:10 -08:00
Trevor Elliott	b077854b57	Generate SSA code from returns (#5172 ) Modify return pseudo-instructions to have pairs of registers: virtual and real. This allows us to constrain the virtual registers to the real ones specified by the abi, instead of directly emitting moves to those real registers.	2022-11-08 16:00:49 -08:00
Trevor Elliott	32a7593c94	cranelift: Remove booleans (#5031 ) Remove the boolean types from cranelift, and the associated instructions breduce, bextend, bconst, and bint. Standardize on using 1/0 for the return value from instructions that produce scalar boolean results, and -1/0 for boolean vector elements. Fixes #3205 Co-authored-by: Afonso Bordado <afonso360@users.noreply.github.com> Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com> Co-authored-by: Chris Fallin <chris@cfallin.org>	2022-10-17 16:00:27 -07:00
Andrew Brown	f063082474	x64: remove `Inst::XmmLoadConst` (#4876 ) This is a cherry-pick of a long-ago commit, 2d46637. The original message reads: > Now that `SyntheticAmode` can refer to constants, there is no longer a > need for a separate instruction format--standard load instructions will > work. Since then, the transition to ISLE and the use of `XmmLoadConst` in many more places makes this change a larger diff than the original. The basic idea is the same, though: the extra indirection of `Inst::XMmLoadConst` is removed and replaced by a direct use of `VCodeConstant` as a `SyntheticAmode`. This has no effect on codegen, but the CLIF output is now clearer in that the actual instruction is displayed (e.g., `movdqu`) instead of a made-up instruction (`load_const`).	2022-09-07 12:52:13 -07:00
Trevor Elliott	9386409607	x64: Lower extractlane, scalar_to_vector, and splat in ISLE (#4780 ) Lower extractlane, scalar_to_vector and splat in ISLE. This PR also makes some changes to the SinkableLoad api * change the return type of sink_load to RegMem as there are more functions available for dealing with RegMem * add reg_mem_to_reg_mem_imm and register it as an automatic conversion	2022-08-25 09:38:03 -07:00
Trevor Elliott	b8b6f2781e	x64: Lower shuffle and swizzle in ISLE (#4772 ) Lower `shuffle` and `swizzle` in ISLE. This PR surfaced a bug with the lowering of `shuffle` when avx512vl and avx512vbmi are enabled: we use `vpermi2b` as the implementation, but panic if the immediate shuffle mask contains any out-of-bounds values. The behavior when the avx512 extensions are not present is that out-of-bounds values are turned into `0` in the result. I've resolved this by detecting when the shuffle immediate has out-of-bounds indices in the avx512-enabled lowering, and generating an additional mask to zero out the lanes where those indices occur. This brings the avx512 case into line with the semantics of the `shuffle` op: `94bcbe8446/cranelift/codegen/meta/src/shared/instructions.rs (L1495-L1498)`	2022-08-24 21:49:51 +00:00
Trevor Elliott	4bdfa76370	x64: Migrate get_pinned_reg, set_pinned_reg, vconst, and raw_bitcast to ISLE (#4763 ) https://github.com/bytecodealliance/wasmtime/pull/4763	2022-08-23 16:32:00 -07:00
Chris Fallin	019ebf47b1	x64: fix pretty-printing argument order for XmmRmR instructions. (#4094 ) The pretty-printing had swapped dst and src2; this was introduced when we moved to RA2 (sorry about that! IMHO we should do something to automate the mapping between regalloc arg collection and pretty printing/emission). `src2` comes at the end because it has a variable number of register mentions; this is in line with how many of the other inst formats work. Actual emitted code was never incorrect, just the pretty-printing. Updated test golden outputs look correct to me now, including the one that we saw was incorrect in #3945.	2022-05-03 10:12:58 -07:00
Chris Fallin	a0318f36f0	Switch Cranelift over to regalloc2. (#3989 ) This PR switches Cranelift over to the new register allocator, regalloc2. See [this document](https://gist.github.com/cfallin/08553421a91f150254fe878f67301801) for a summary of the design changes. This switchover has implications for core VCode/MachInst types and the lowering pass. Overall, this change brings improvements to both compile time and speed of generated code (runtime), as reported in #3942: ``` Benchmark Compilation (wallclock) Execution (wallclock) blake3-scalar 25% faster 28% faster blake3-simd no diff no diff meshoptimizer 19% faster 17% faster pulldown-cmark 17% faster no diff bz2 15% faster no diff SpiderMonkey, 21% faster 2% faster fib(30) clang.wasm 42% faster N/A ```	2022-04-14 10:28:21 -07:00
Chris Fallin	1c014d129a	Cranelift: ensure ISA level needed for SIMD is present when SIMD is enabled. (#3816 ) Addresses #3809: when we are asked to create a Cranelift backend with shared flags that indicate support for SIMD, we should check that the ISA level needed for our SIMD lowerings is present.	2022-02-16 17:29:30 -08:00
Alex Crichton	1ef0abb12c	Update lots of `isa//.clif` tests to `precise-output` (#3677 ) * Update lots of `isa//.clif` tests to `precise-output` This commit goes through the `aarch64` and `x64` subdirectories and subjectively changes tests from `test compile` to add `precise-output`. This then auto-updates all the test expectations so they can be automatically instead of manually updated in the future. Not all tests were migrated, largely subject to the whims of myself, mainly looking to see if the test was looking for specific instructions or just checking the whole assembly output. * Filter out `;;` comments from test expctations Looks like the cranelift parser picks up all comments, not just those trailing the function, so use a convention where `;;` is used for human-readable-comments in test cases and `;`-prefixed comments are the test expectation.	2022-01-10 13:38:23 -06:00
bjorn3	9e34df33b9	Remove the old x86 backend	2021-09-29 16:13:46 +02:00
Chris Fallin	cb48ea406e	Switch default to new x86_64 backend. This PR switches the default backend on x86, for both the `cranelift-codegen` crate and for Wasmtime, to the new (`MachInst`-style, `VCode`-based) backend that has been under development and testing for some time now. The old backend is still available by default in builds with the `old-x86-backend` feature, or by requesting `BackendVariant::Legacy` from the appropriate APIs. As part of that switch, it adds some more runtime-configurable plumbing to the testing infrastructure so that tests can be run using the appropriate backend. `clif-util test` is now capable of parsing a backend selector option from filetests and instantiating the correct backend. CI has been updated so that the old x86 backend continues to run its tests, just as we used to run the new x64 backend separately. At some point, we will remove the old x86 backend entirely, once we are satisfied that the new backend has not caused any unforeseen issues and we do not need to revert.	2021-04-02 11:35:53 -07:00
Andrew Brown	bb2dd5b68b	[machinst x64]: implement load*_zero for x64	2021-01-08 16:21:57 -08:00
Chris Fallin	1dddba649a	x64 regalloc register order: put caller-saves (volatiles) first. The x64 backend currently builds the `RealRegUniverse` in a way that is generating somewhat suboptimal code. In many blocks, we see uses of callee-save (non-volatile) registers (r12, r13, r14, rbx) first, even in very short leaf functions where there are plenty of volatiles to use. This is leading to unnecessary spills/reloads. On one (local) test program, a medium-sized C benchmark compiled to Wasm and run on Wasmtime, I am seeing a ~10% performance improvement with this change; it will be less pronounced in programs with high register pressure (there we are likely to use all registers regardless, so the prologue/epilogue will save/restore all callee-saves), or in programs with fewer calls, but this is a clear win for small functions and in many cases removes prologue/epilogue clobber-saves altogether. Separately, I think the RA's coalescing is tripping up a bit in some cases; see e.g. the filetest touched by this commit that loads a value into %rsi then moves to %rax and returns immediately. This is an orthogonal issue, though, and should be addressed (if worthwhile) in regalloc.rs.	2020-12-06 22:37:43 -08:00
Andrew Brown	83f182b390	Implement initial emission of constants This approach suffers from memory-size bloat during compile time due to the desire to de-duplicate the constants emitted and reduce runtime memory-size. As a first step, though, this provides an end-to-end mechanism for constants to be emitted in the MachBuffer islands.	2020-11-05 14:25:02 -08:00
Benjamin Bouvier	e32e6fb612	machinst x64: check SSE requirements for instructions against enabled features;	2020-10-08 09:21:51 +02:00
Andrew Brown	16a2538ecd	[machinst x64]: rename Inst::XmmUninitializedValue and document This approach is not the best but avoids an extra instruction; perhaps at some point, as mentioned in https://github.com/bytecodealliance/wasmtime/pull/2248, we will add the extra instruction or refactor things in such a way that this `Inst` variant is unnecessary.	2020-10-02 08:29:31 -07:00
Andrew Brown	3d9f3bf728	[machinst x64]: port CLIF tests related to comparison and lane operations	2020-10-02 08:29:31 -07:00
Andrew Brown	b7217d454f	[machinst x64]: add lane-related CLIF filetests	2020-09-29 08:45:12 -07:00

25 Commits