wasmtime

Author	SHA1	Message	Date
Afonso Bordado	02c3b47db2	x64: Implement SIMD `fma` (#4474 ) * x64: Add VEX Instruction Encoder This uses a similar builder pattern to the EVEX Encoder. Does not yet support memory accesses. * x64: Add FMA Flag * x64: Implement SIMD `fma` * x64: Use 4 register Vex Inst * x64: Reorder VEX pretty print args	2022-07-25 22:01:02 +00:00
Andrew Brown	8629cbc6a4	x64: port `atomic_rmw` to ISLE (#4389 ) * x64: port `atomic_rmw` to ISLE This change ports `atomic_rmw` to ISLE for the x64 backend. It does not change the lowering in any way, though it seems possible that the fixed regs need not be as fixed and that there are opportunities for single instruction lowerings. It does rename `inst_common::AtomicRmwOp` to `MachAtomicRmwOp` to disambiguate with the IR enum with the same name. * x64: remove remaining hardcoded register constraints for `atomic_rmw` * x64: use `SyntheticAmode` in `AtomicRmwSeq` * review: add missing reg collector for amode * review: collect memory registers in the 'late' phase	2022-07-06 23:58:59 +00:00
Chris Fallin	f85047b084	Rework x64 addressing-mode lowering to be slightly more flexible. (#4080 ) This PR refactors the x64 backend address-mode lowering to use an incremental-build approach, where it considers each node in a tree of `iadd`s that feed into a load/store address and, at each step, builds the best possible `Amode`. It will combine an arbitrary number of constant offsets (an extension beyond the current rules), and can capture a left-shifted (scaled) index in any position of the tree (another extension). This doesn't have any measurable performance improvement on our Wasm benchmarks in Sightglass, unfortunately, because the IR lowered from wasm32 will do address computation in 32 bits and then `uextend` it to add to the 64-bit heap base. We can't quite lift the 32-bit adds to 64 bits because this loses the wraparound semantics. (We could label adds as "expected not to overflow", and allow those to be lifted to 64 bit operations; wasm32 heap address computation should fit this. This is `add nuw` (no unsigned wrap) in LLVM IR terms. That's likely my next step.) Nevertheless, (i) this generalizes the cases we can handle, which should be a good thing, all other things being equal (and in this case, no compile time impact was measured); and (ii) might benefit non-Wasm frontends.	2022-05-02 16:20:39 -07:00
Chris Fallin	a0318f36f0	Switch Cranelift over to regalloc2. (#3989 ) This PR switches Cranelift over to the new register allocator, regalloc2. See [this document](https://gist.github.com/cfallin/08553421a91f150254fe878f67301801) for a summary of the design changes. This switchover has implications for core VCode/MachInst types and the lowering pass. Overall, this change brings improvements to both compile time and speed of generated code (runtime), as reported in #3942: ``` Benchmark Compilation (wallclock) Execution (wallclock) blake3-scalar 25% faster 28% faster blake3-simd no diff no diff meshoptimizer 19% faster 17% faster pulldown-cmark 17% faster no diff bz2 15% faster no diff SpiderMonkey, 21% faster 2% faster fib(30) clang.wasm 42% faster N/A ```	2022-04-14 10:28:21 -07:00
Nick Fitzgerald	dc86e7a6dc	cranelift: Use GPR newtypes extensively in x64 lowering (#3798 ) We already defined the `Gpr` newtype and used it in a few places, and we already defined the `Xmm` newtype and used it extensively. This finishes the transition to using the newtypes extensively in lowering by making use of `Gpr` in more places. Fixes #3685	2022-02-14 12:54:41 -08:00
Nick Fitzgerald	795b0aaf9a	cranelift: Add newtype wrappers for x64 register classes This primary motivation of this large commit (apologies for its size!) is to introduce `Gpr` and `Xmm` newtypes over `Reg`. This should help catch difficult-to-diagnose register class mixup bugs in x64 lowerings. But having a newtype for `Gpr` and `Xmm` themselves isn't enough to catch all of our operand-with-wrong-register-class bugs, because about 50% of operands on x64 aren't just a register, but a register or memory address or even an immediate! So we have `{Gpr,Xmm}Mem[Imm]` newtypes as well. Unfortunately, `GprMem` et al can't be `enum`s and are therefore a little bit noisier to work with from ISLE. They need to maintain the invariant that their registers really are of the claimed register class, so they need to encapsulate the inner data. If they exposed the underlying `enum` variants, then anyone could just change register classes or construct a `GprMem` that holds an XMM register, defeating the whole point of these newtypes. So when working with these newtypes from ISLE, we rely on external constructors like `(gpr_to_gpr_mem my_gpr)` instead of `(GprMem.Gpr my_gpr)`. A bit of extra lines of code are included to add support for register mapping for all of these newtypes as well. Ultimately this is all a bit wordier than I'd hoped it would be when I first started authoring this commit, but I think it is all worth it nonetheless! In the process of adding these newtypes, I didn't want to have to update both the ISLE `extern` type definition of `MInst` and the Rust definition, so I move the definition fully into ISLE, similar as aarch64. Finally, this process isn't complete. I've introduced the newtypes here, and I've made most XMM-using instructions switch from `Reg` to `Xmm`, as well as register class-converting instructions, but I haven't moved all of the GPR-using instructions over to the newtypes yet. I figured this commit was big enough as it was, and I can continue the adoption of these newtypes in follow up commits. Part of #3685.	2022-02-03 14:08:08 -08:00
Alex Crichton	1141169ff8	aarch64: Initial work to transition backend to ISLE (#3541 ) * aarch64: Initial work to transition backend to ISLE This commit is what is hoped to be the initial commit towards migrating the aarch64 backend to ISLE. There's seemingly a lot of changes here but it's intended to largely be code motion. The current thinking is to closely follow the x64 backend for how all this is handled and organized. Major changes in this PR are: * The `Inst` enum is now defined in ISLE. This avoids having to define it in two places (once in Rust and once in ISLE). I've preserved all the comments in the ISLE and otherwise this isn't actually a functional change from the Rust perspective, it's still the same enum according to Rust. * Lots of little enums and things were moved to ISLE as well. As with `Inst` their definitions didn't change, only where they're defined. This will give future ISLE PRs access to all these operations. * Initial code for lowering `iconst`, `null`, and `bconst` are implemented. Ironically none of this is actually used right now because constant lowering is handled in `put_input_in_regs` which specially handles constants. Nonetheless I wanted to get at least something simple working which shows off how to special case various things that are specific to AArch64. In a future PR I plan to hook up const-lowering in ISLE to this path so even though `iconst`-the-clif-instruction is never lowered this should use the const lowering defined in ISLE rather than elsewhere in the backend (eventually leading to the deletion of the non-ISLE lowering). * The `IsleContext` skeleton is created and set up for future additions. * Some code for ISLE that's shared across all backends now lives in `isle_prelude_methods!()` and is deduplicated between the AArch64 backend and the x64 backend. * Register mapping is tweaked to do the same thing for AArch64 that it does for x64. Namely mapping virtual registers is supported instead of just virtual to machine registers. My main goal with this PR was to get AArch64 into a place where new instructions can be added with relative ease. Additionally I'm hoping to figure out as part of this change how much to share for ISLE between AArch64 and x64 (and other backends). * Don't use priorities with rules * Update .gitattributes with concise syntax * Deduplicate some type definitions * Rebuild ISLE * Move isa::isle to machinst::isle	2021-11-18 10:38:16 -06:00
Alex Crichton	92394566fc	x64: Migrate `fabs` and `bnot` vector operations to ISLE This was my first attempt at transitioning code to ISLE to originally fix #3327 but that fix has since landed on `main`, so this is instead now just porting a few operations to ISLE. Closes #3336	2021-11-16 07:36:49 -08:00
Nick Fitzgerald	d377b665c6	Initial ISLE integration with the x64 backend On the build side, this commit introduces two things: 1. The automatic generation of various ISLE definitions for working with CLIF. Specifically, it generates extern type definitions for clif opcodes and the clif instruction data `enum`, as well as extractors for matching each clif instructions. This happens inside the `cranelift-codegen-meta` crate. 2. The compilation of ISLE DSL sources to Rust code, that can be included in the main `cranelift-codegen` compilation. Next, this commit introduces the integration glue code required to get ISLE-generated Rust code hooked up in clif-to-x64 lowering. When lowering a clif instruction, we first try to use the ISLE code path. If it succeeds, then we are done lowering this instruction. If it fails, then we proceed along the existing hand-written code path for lowering. Finally, this commit ports many lowering rules over from hand-written, open-coded Rust to ISLE. In the process of supporting ISLE, this commit also makes the x64 `Inst` capable of expressing SSA by supporting 3-operand forms for all of the existing instructions that only have a 2-operand form encoding: dst = src1 op src2 Rather than only the typical x86-64 2-operand form: dst = dst op src This allows `MachInst` to be in SSA form, since `dst` and `src1` are disentangled. ("3-operand" and "2-operand" are a little bit of a misnomer since not all operations are binary operations, but we do the same thing for, e.g., unary operations by disentangling the sole operand from the result.) There are two motivations for this change: 1. To allow ISLE lowering code to have value-equivalence semantics. We want ISLE lowering to translate a CLIF expression that evaluates to some value into a `MachInst` expression that evaluates to the same value. We want both the lowering itself and the resulting `MachInst` to be pure and referentially transparent. This is both a nice paradigm for compiler writers that are authoring and maintaining lowering rules and is a prerequisite to any sort of formal verification of our lowering rules in the future. 2. Better align `MachInst` with `regalloc2`'s API, which requires that the input be in SSA form.	2021-10-12 17:11:58 -07:00
Johnnie Birch	e373ddfe1b	Add extend-add-pairwise instructions x64	2021-07-30 15:06:58 -07:00
Johnnie Birch	500f530322	Add support for i32x4_trunc_sat_f64x2_s for x64	2021-07-26 22:24:30 -07:00
Johnnie Birch	23290f0450	Add support for i32x4_trunc_sat_f64x2_u for x64	2021-07-26 22:24:30 -07:00
Johnnie Birch	5deda27977	Add support for Saturating Rounding Q-format Multiplication for x64	2021-07-26 20:32:46 -07:00
Johnnie Birch	6fbe0b72bd	Add simd_extmul_* support for x64	2021-07-15 01:07:52 -07:00
Johnnie Birch	2d676d838f	Implements f64x2.convert_low_i32x4_u for x64	2021-07-09 10:39:05 -07:00
Johnnie Birch	1770880e19	x64: add support for packed promote and demote (#2783 ) * Add support for x64 packed promote low * Add support for x64 packed floating point demote * Update vector promote low and demote by adding constraints Also does some renaming and minor refactoring	2021-06-04 15:59:20 -07:00
Andrew Brown	2a9f458ea3	x64: lower i8x16.shuffle to VPERMI2B when possible When shuffling values from two different registers, the x64 lowering for `i8x16.shuffle` must first shuffle each register separately and then OR the results with SSE instructions. With `VPERMI2B`, available in AVX512VL + AVX512VBMI, this can be done in a single instruction after the shuffle mask has been moved into the destination register. This change uses `VPERMI2B` for that case when the CPU supports it.	2021-06-01 11:40:53 -07:00
Andrew Brown	459fce3467	x64: lower i8x16.popcnt to VPOPCNTB when possible When AVX512VL or AVX512BITALG are available, Wasm SIMD's `popcnt` instruction can be lowered to a single x64 instruction, `VPOPCNTB`, instead of 8+ instructions.	2021-05-25 12:16:25 -07:00
Chris Fallin	95559c01aa	Merge pull request from GHSA-hpqh-2wqx-7qp5 Fix spillslot reload of narrow values: zero-extend, don't sign-extend. Release v0.74.0 as security-patch release.	2021-05-21 12:01:55 -07:00
Andrew Brown	54b45d28a3	x64: lower fcvt_from_uint to VCVTUDQ2PS when possible When AVX512VL and AVX512F are available, use a single instruction (`VCVTUDQ2PS`) instead of a length 9-instruction sequence. This optimization is a port from the legacy x86 backend.	2021-05-19 12:20:11 -07:00
Chris Fallin	a1c9b06cea	Fix spillslot reload of narrow values: zero-extend, don't sign-extend. Previously, the x64 backend's ABI code would generate a sign-extending load when loading a less-than-64-bit integer from a spillslot. This is incorrect: e.g., for i32s > 0x80000000, this would result in all high bits set. This interacts poorly with another optimization. Normally, the invariant is that the high bits of a register holding a value of a certain type, beyond that type's bits, are undefined. However, as an optimization, we recognize and use the fact that on x86-64, 32-bit instructions zero the upper 32 bits. This allows us to elide a 32-to-64-bit zero-extend op (turning it into just a move, which can then sometimes disappear entirely due to register coalescing). If a spill and reload happen between the production of a 32-bit value from an instruction known to zero the upper bits and its use, then we will rely on zero upper bits that might actually be set by a sign-extend. This will result in incorrect execution. As a fix, we stick to a simple invariant: we always spill and reload a full 64 bits when handling integer registers on x64. This ensures that no bits are mangled.	2021-05-19 12:19:19 -07:00
Andrew Brown	7ef3ae2903	x64: implement vselect with variable blend instructions This change implements `vselect` using SSE4.1's `BLENDVPS`, `BLENDVPD`, and `PBLENDVB`. `vselect` is a lane-selecting instruction that is used by [simple_preopt.rs](`fa1faf5d22/cranelift/codegen/src/simple_preopt.rs (L947-L999)`) to lower `bitselect` to a single x86 instruction when the condition mask is known to be boolean (all 1s or 0s, e.g., from a conversion). This is better than `bitselect` in general, which lowers to 4-5 instructions. The old backend had the `vselect` lowering; this simply introduces it to the new backend.	2021-05-17 11:23:33 -07:00
Andrew Brown	e676589b0c	x64: lower i64x2.imul to VPMULLQ when possible This adds the machinery to encode the VPMULLQ instruction which is available in AVX512VL and AVX512DQ. When these feature sets are available, we use this instruction instead of a lengthy 12-instruction sequence.	2021-05-13 20:14:05 -07:00
Andrew Brown	0acc1451ea	x64: lower iabs.i64x2 using a single AVX512 instruction when possible (#2819 ) * x64: add EVEX encoding mechanism Also, includes an empty stub module for the VEX encoding. * x64: lower abs.i64x2 to VPABSQ when available * x64: refactor EVEX encodings to use `EvexInstruction` This change replaces the `encode_evex` function with a builder-style struct, `EvexInstruction`. This approach clarifies the code, adds documentation, and results in slight speedups when benchmarked. * x64: rename encoding CodeSink to ByteSink	2021-04-15 11:53:58 -07:00
Andrew Brown	8e495ac79d	x64: match multiple ISA requirements before emitting Because there are instructions that are present in more than one ISA feature set, we need to see if any of the ISA requirements match before emitting. This change includes the `VPABSQ` instruction as an example, which is present in both `AVX512F` and `AVX512VL`.	2021-04-08 10:30:39 -07:00
Johnnie Birch	31d3db1ec2	Implements convert low signed integer to float for x64 simd	2021-03-26 12:13:29 -07:00
Alex Crichton	3f694ae319	Use stable Rust on CI to test the x64 backend (#2766 ) * Use stable Rust on CI to test the x64 backend This commit leverages the newly-released 1.51.0 compiler to test the new backend on Windows and Linux with a stable compiler instead of a nightly compiler. This isolates the nightly build to just the nightly documentation generation and fuzzing, both of which rely on nightly for the best results right now. * Use updated stable in book build job * Run rustfmt for new stable * Silence new warnings for wasi-nn * Allow some dead code in the x64 backend Looks like new rustc is better about emitting some dead-code warnings * Update rust in peepmatic job * Fix a test in the pooling allocator * Remove `package.metdata.docs.rs` temporarily Needs resolution of https://github.com/rust-lang/cargo/pull/9300 first * Fix a warning in a wasi-nn example	2021-03-25 13:18:59 -05:00
Chris Fallin	e41d882144	Merge pull request #2678 from cfallin/x64-fastcall x86-64 Windows fastcall ABI support.	2021-03-05 10:46:47 -08:00
Chris Fallin	6c94eb82aa	x86-64 Windows fastcall ABI support. This adds support for the "fastcall" ABI, which is the native C/C++ ABI on Windows platforms on x86-64. It is similar to but not exactly like System V; primarily, its argument register assignments are different, and it requires stack shadow space. Note that this also adjusts the handling of multi-register values in the shared ABI implementation, and with this change, adjusts handling of `i128`s on both Fastcall/x64 and SysV/x64 platforms. This was done to align with actual behavior by the "rustc ABI" on both platforms, as mapped out experimentally (Compiler Explorer link in comments). This behavior is gated under the `enable_llvm_abi_extensions` flag. Note also that this does not add x64 unwind info on Windows. That will come in a future PR (but is planned!).	2021-03-03 19:53:18 -08:00
Andrew Brown	508f8fa5a9	[x64] Add i64x2.abs This instruction has a single instruction lowering in AVX512F/VL and a three instruction lowering in AVX but neither is currently supported in the x64 backend. To implement this, we instead subtract the vector from 0 and use a blending instruction to pick the lanes containing the absolute value.	2021-03-02 12:30:02 -08:00
Kasey Carrothers	7bd96c8e2f	Refactor x64::Insts that use an is_64 bool to use OperandSize.	2021-02-03 10:40:11 -08:00
Kasey Carrothers	3306408100	Refactor x64::Inst to use OperandSize instead of u8s. TODO: some types take a 'is_64_bit' bool. Those are left unchanged for now.	2021-02-03 10:40:11 -08:00
Kasey Carrothers	b12d41bfe9	Expand x64 OperandSize to support 8 and 16-bit operands. This is in preparation for refactoring all x64::Inst arms to use OperandSize. Current uses of OperandSize fall into two categories: 1. XMM operations which require 32/64 bit operands 2. Immediates which only care about 64-bit or not. Adds assertions to existing Inst constructors to check that they are passed valid sizes. This change also removes the implicit widening of 1 and 2 byte values to 4 bytes. from_bytes() is only used by category 2, so removing this behavior will not change any visible behavior. Overall this change should be a no-op.	2021-02-03 10:40:11 -08:00
Benjamin Bouvier	13027ad670	cranelift x64: add instruction set checks for popcnt/tzcnt/lzcnt;	2021-01-30 13:38:55 +01:00
Benjamin Bouvier	2275519cb1	cranelift x64: use the POPCNT instruction for Popcount when it's available;	2021-01-29 19:41:01 +01:00
Benjamin Bouvier	6bf6612d96	cranelift x64: use the TZCNT instruction for Ctz when it's available;	2021-01-29 19:41:01 +01:00
Benjamin Bouvier	d3acd9a283	cranelift x64: use the LZCNT instruction for Clz when it's available;	2021-01-29 19:41:01 +01:00
Chris Fallin	71ead6e31d	x64 backend: implement 128-bit ops and misc fixes. This implements all of the ops on I128 that are implemented by the legacy x86 backend, and includes all that are required by at least one major use-case (cg_clif rustc backend). The sequences are open-coded where necessary; for e.g. the bit operations, this can be somewhat complex, but these sequences have been tested carefully. This PR also includes a drive-by fix of clz/ctz for 8- and 16-bit cases where they were incorrect previously. Also includes ridealong fixes developed while bringing up cg_clif support, because they are difficult to completely separate due to other refactors that occurred in this PR: - fix REX prefix logic for some 8-bit instructions. When using an 8-bit register in 64-bit mode on x86-64, the REX prefix semantics are somewhat subtle: without the REX prefix, register numbers 4--7 correspond to the second-to-lowest byte of the first four registers (AH, CH, BH, DH), whereas with the REX prefix, these register numbers correspond to the usual encoding (SPL, BPL, SIL, DIL). We could always emit a REX byte for instructions with 8-bit cases (this is harmless even if unneeded), but this would unnecessarily inflate code size; instead, the usual approach is to emit it only for these registers. This logic was present in some cases but missing for some other instructions: divide, not, negate, shifts. Fixes #2508. - avoid unaligned SSE loads on some f64 ops. The implementations of several FP ops, such as fabs/fneg, used SSE instructions. This is not a problem per-se, except that load-op merging did not take alignment into account. Specifically, if an op on an f64 loaded from memory happened to merge that load, and the instruction into which it was merged was an SSE instruction, then the SSE instruction imposes stricter (128-bit) alignment requirements than the load.f64 did. This PR simply forces any instruction lowerings that could use SSE instructions to implement non-SIMD operations to take inputs in registers only, and avoid load-op merging. Fixes #2507. - two bugfixes exposed by cg_clif: urem/srem.i8, select.b1. - urem/srem.i8: the 8-bit form of the DIV instruction on x86-64 places the remainder in AH, not RDX, different from all the other width-forms of this instruction. - select.b1: we were not recognizing selects of boolean values as integer-typed operations, so we were generating XMM moves instead (!).	2021-01-14 13:45:50 -08:00
Andrew Brown	09a5b91b9d	x64: make several structures debuggable	2021-01-08 16:21:57 -08:00
Chris Fallin	dbd2241b60	x64: handle tests of b1 values correctly (only LSB is defined). Previously, `select` and `brz`/`brnz` instructions, when given a `b1` boolean argument, would test whether that boolean argument was nonzero, rather than whether its LSB was nonzero. Since our invariant for mapping CLIF state to machine state is that bits beyond the width of a value are undefined, the proper lowering is to test only the LSB. (aarch64 does not have the same issue because its `Extend` pseudoinst already properly handles masking of b1 values when a zero-extend is requested, as it is for select/brz/brnz.) Found by Nathan Ringo on Zulip [1] (thanks!). [1] https://bytecodealliance.zulipchat.com/#narrow/stream/217117-cranelift/topic/bnot.20on.20b1s	2021-01-05 14:45:46 -08:00
Yury Delendik	2964023a77	[SIMD][x86_64] Add encoding for PMADDWD (#2530 ) * [SIMD][x86_64] Add encoding for PMADDWD * also for "experimental_x64"	2020-12-24 07:52:50 -06:00
Johnnie Birch	a548516f97	Enable SIMD spec tests for f32x4_rounding and f64x4_rounding. Also address some review comments pointing out minor issues.	2020-12-02 13:44:51 -08:00
Johnnie Birch	a33e755cb2	Adds x86 SIMD support for Ceil, Floor, Trunc, and Nearest	2020-12-02 13:44:51 -08:00
Johnnie Birch	2cc501427e	Add remaining X86_64 support for pack w/ signed/unsigned saturation Adds lowering for packssdw, packusdw, packuswb	2020-11-22 23:14:29 -08:00
Johnnie Birch	124096735b	Add support for palignr for X86_64 vcode backend	2020-11-22 22:14:02 -08:00
Johnnie Birch	615a575da1	Add support for x86_64 packed move lowering for the vcode backend	2020-11-22 20:23:00 -08:00
Chris Fallin	073c727a74	x64 and aarch64: carry MemFlags on loads/stores; don't emit trap info unless an op can trap. This end result was previously enacted by carrying a `SourceLoc` on every load/store, which was somewhat cumbersome, and only indirectly encoded metadata about a memory reference (can it trap) by its presence or absence. We have a type for this -- `MemFlags` -- that tells us everything we might want to know about a load or store, and we should plumb it through to code emission instead. This PR attaches a `MemFlags` to an `Amode` on x64, and puts it on load and store `Inst` variants on aarch64. These two choices seem to factor things out in the nicest way: there are relatively few load/store insts on aarch64 but many addressing modes, while the opposite is true on x64.	2020-11-17 11:43:06 -08:00
Andrew Brown	8ba92853be	[machinst x64]: add punpack[hl]bw instructions	2020-11-12 14:21:45 -08:00
Andrew Brown	8131b15921	[machinst x64]: allow addressing of constants	2020-11-12 14:21:45 -08:00
Andrew Brown	6725b6b129	[machinst x64]: implement bitmask	2020-10-28 15:16:36 -07:00

1 2 3

109 Commits