wasmtime

Author	SHA1	Message	Date
Trevor Elliott	d8dbabfe6b	Don't reuse registers in the x64 div lowering (#5356 ) Introduce a temporary for an intermediate value in the lowering of div in the x64 backend. Additionally, add a src argument to the shift_r smart constructor, which is why the diff got larger than just the div lowering.	2022-11-30 22:44:59 +00:00
Alex Crichton	830885383f	Implement inline stack probes for AArch64 (#5353 ) * Turn off probestack by default in Cranelift The probestack feature is not implemented for the aarch64 and s390x backends and currently the on-by-default status requires the aarch64 and s390x implementations to be a stub. Turning off probestack by default allows the s390x and aarch64 backends to panic with an error message to avoid providing a false sense of security. When the probestack option is implemented for all backends, however, it may be reasonable to re-enable. * aarch64: Improve codegen for AMode fallback Currently the final fallback for finalizing an `AMode` will generate both a constant-loading instruction as well as an `add` instruction to the base register into the same temporary. This commit improves the codegen by removing the `add` instruction and folding the final add into the finalized `AMode`. This changes the `extendop` used but both registers are 64-bit so shouldn't be affected by the extending operation. * aarch64: Implement inline stack probes This commit implements inline stack probes for the aarch64 backend in Cranelift. The support here is modeled after the x64 support where unrolled probes are used up to a particular threshold after which a loop is generated. The instructions here are similar in spirit to x64 except that unlike x64 the stack pointer isn't modified during the unrolled loop to avoid needing to re-adjust it back up at the end of the loop. * Enable inline probestack for AArch64 and Riscv64 This commit enables inline probestacks for the AArch64 and Riscv64 architectures in the same manner that x86_64 has it enabled now. Some more testing was additionally added since on Unix platforms we should be guaranteed that Rust's stack overflow message is now printed too. 
* Enable probestack for aarch64 in cranelift-fuzzgen * Address review comments * Remove implicit stack overflow traps from x64 backend This commit removes implicit `StackOverflow` traps inserted by the x64 backend for stack-based operations. This was historically required when stack overflow was detected with page faults but Wasmtime no longer requires that since it's not suitable for wasm modules which call host functions. Additionally no other backend implements this form of implicit trap-code additions so this is intended to synchronize the behavior of all the backends. This fixes a test added prior for aarch64 to properly abort the process instead of accidentally being caught by Wasmtime. * Fix a style issue	2022-11-30 12:30:00 -06:00
Trevor Elliott	4780bd5902	Don't use %rcx directly with CoffTlsGetAddr (#5278 ) Avoid naming %rcx as written by the CoffTlsGetAddr pseudo-instruction in the x64 backend, and instead emit a fixed-def constraint for a fresh VReg and %rcx.	2022-11-16 11:32:09 -08:00
Trevor Elliott	a007e02bd2	Add fixed_nonallocatable constraints when appropriate (#5253 ) Plumb the set of allocatable registers through the OperandCollector and use it to validate uses of fixed-nonallocatable registers, like %rsp on x86_64.	2022-11-15 12:49:17 -08:00
Trevor Elliott	dece901d16	Use regalloc constraints for sse blend operations (#5251 ) Instead of using xmm0 explicitly for the mask argument to instructions like blendvpd, use regalloc constraints to constrain it to xmm0 instead.	2022-11-14 16:44:34 -08:00
Trevor Elliott	0367fbc2d4	cranelift: Rework pinned register lowering (#5249 ) Rework pinned register lowering to avoid the use of pinned virtual registers, instead using the MovFromPReg and MovToPReg pseudo instructions.	2022-11-10 16:19:25 -08:00
11evan	4ca9e82bd1	cranelift: Add Bswap instruction (#1092 ) (#5147 ) Adds Bswap to the Cranelift IR. Implements the Bswap instruction in the x64 and aarch64 codegen backends. Cranelift users can now: ``` builder.ins().bswap(value) ``` to get a native byteswap instruction. * x64: implements the 32- and 64-bit bswap instruction, following the pattern set by similar unary instructions (Neg and Not) - it only operates on a dst register, but is parameterized with both a src and dst which are expected to be the same register. As x64 bswap instruction is only for 32- or 64-bit registers, the 16-bit swap is implemented as a rotate left by 8. Updated x64 RexFlags type to support emitting for single-operand instructions like bswap * aarch64: Bswap gets emitted as aarch64 rev16, rev32, or rev64 instruction as appropriate. * s390x: Bswap was already supported in backend, just had to add a bit of plumbing * For completeness, added bswap to the interpreter as well. * added filetests and runtests for each ISA * added bswap to fuzzgen, thanks to afonso360 for the code there * 128-bit swaps are not yet implemented, that can be done later	2022-10-31 19:30:00 +00:00
Chris Fallin	2986f6b0ff	ABI: implement register arguments with constraints. (#4858 ) * ABI: implement register arguments with constraints. Currently, Cranelift's ABI code emits a sequence of moves from physical registers into vregs at the top of the function body, one for every register-carried argument. For a number of reasons, we want to move to operand constraints instead, and remove the use of explicitly-named "pinned vregs"; this allows for better regalloc in theory, as it removes the need to "reverse-engineer" the sequence of moves. This PR alters the ABI code so that it generates a single "args" pseudo-instruction as the first instruction in the function body. This pseudo-inst defs all register arguments, and constrains them to the appropriate registers at the def-point. Subsequently the regalloc can move them wherever it needs to. Some care was taken not to have this pseudo-inst show up in post-regalloc disassemblies, but the change did cause a general regalloc "shift" in many tests, so the precise-output updates are a bit noisy. Sorry about that! A subsequent PR will handle the other half of the ABI code, namely, the callsite case, with a similar preg-to-constraint conversion. * Update based on review feedback. * Review feedback.	2022-09-08 18:03:14 -07:00
Andrew Brown	f063082474	x64: remove `Inst::XmmLoadConst` (#4876 ) This is a cherry-pick of a long-ago commit, 2d46637. The original message reads: > Now that `SyntheticAmode` can refer to constants, there is no longer a > need for a separate instruction format--standard load instructions will > work. Since then, the transition to ISLE and the use of `XmmLoadConst` in many more places makes this change a larger diff than the original. The basic idea is the same, though: the extra indirection of `Inst::XmmLoadConst` is removed and replaced by a direct use of `VCodeConstant` as a `SyntheticAmode`. This has no effect on codegen, but the CLIF output is now clearer in that the actual instruction is displayed (e.g., `movdqu`) instead of a made-up instruction (`load_const`).	2022-09-07 12:52:13 -07:00
Chris Fallin	385bd0cbf8	x64: fix CvtFloatToUintSeq: do not clobber src. (#4842 ) This slipped through the regalloc2 operand code update in #4811: the CvtFloatToUintSeq pseudo-instruction actually clobbers its source. It was marked as a "mod" operand in the original and I mistakenly converted it to a "use" as I had not seen the actual clobber. The instruction now takes an extra temp and makes a copy of `src` in the appropriate place. Fixes #4840.	2022-09-01 22:46:57 +00:00
Afonso Bordado	08e7a7f1a0	cranelift: Add inline stack probing for x64 (#4747 ) * cranelift: Add inline stack probe for x64 * cranelift: Cleanups comments Thanks @jameysharp!	2022-09-01 22:32:54 +00:00
Trevor Elliott	fb8b9838fe	Add MInst.XmmUnaryRmRImm to handle rounding instructions (#4823 ) Add a new pseudo-instruction, XmmUnaryRmRImm, to handle instructions like roundss that only use their first register argument for the instruction's result. This has the added benefit of allowing the isle wrappers for those instructions to take an XmmMem argument, allowing for more cases where loads may be merged.	2022-08-31 08:29:32 -07:00
Chris Fallin	186c7c3b89	x64: clean up regalloc-related semantics on several instructions. (#4811 ) * x64: clean up regalloc-related semantics on several instructions. This PR removes all uses of "modify" operands on instructions in the x64 backend, and also removes all uses of "pinned vregs", or vregs that are explicitly tied to particular physical registers. In place of both of these mechanisms, which are legacies of the old regalloc design and supported via compatibility code, the backend now uses operand constraints. This is more flexible as it allows the regalloc to see the liveranges and constraints without "reverse-engineering" move instructions. Eventually, after removing all such uses (including in other backends and by the ABI code), we can remove the compatibility code in regalloc2, significantly simplifying its liverange-construction frontend and thus allowing for higher confidence in correctness as well as possibly a bit more compilation speed. Curiously, there are a few extra move instructions now; they are likely poor splitting decisions and I can try to chase these down later. * Fix cranelift-codegen tests. * Review feedback.	2022-08-30 17:21:14 -07:00
Trevor Elliott	25d960f9c4	x64: Lower tlsvalue, sqmul_round_sat, and uunarrow in ISLE (#4793 ) Lower tlsvalue, sqmul_round_sat, and uunarrow in ISLE.	2022-08-26 16:33:48 -07:00
Afonso Bordado	3ea1813173	x64: Add native lowering for scalar `fma` (#4539 ) Use `vfmadd213{ss,sd}` for these lowerings.	2022-08-11 22:48:16 +00:00
Afonso Bordado	c5bc368cfe	cranelift: Add COFF TLS Support (#4546 ) * cranelift: Implement COFF TLS Relocations * cranelift: Emit SecRel relocations * cranelift: Handle _tls_index symbol in backend	2022-08-11 09:33:40 -07:00
Trevor Elliott	a25d52046b	x64: Migrate fcvt_from_sint and fcvt_low_from_sint to ISLE (#4650 ) https://github.com/bytecodealliance/wasmtime/pull/4650	2022-08-10 10:49:02 -07:00
Trevor Elliott	1fc11bbe51	x64: Migrate brff and I128 branching instructions to ISLE (#4599 ) https://github.com/bytecodealliance/wasmtime/pull/4599	2022-08-04 08:58:50 -07:00
Nick Fitzgerald	42bba452a6	Cranelift: Add instructions for getting the current stack/frame/return pointers (#4573 ) * Cranelift: Add instructions for getting the current stack/frame pointers and return address This is the initial part of https://github.com/bytecodealliance/wasmtime/issues/4535 * x64: Remove `Amode::RbpOffset` and use `Amode::ImmReg` instead We just special case getting operands from `Amode`s now. * Fix s390x `get_return_address`; require `preserve_frame_pointers=true` * Assert that `Amode::ImmRegRegShift` doesn't use rbp/rsp * Handle non-allocatable registers in Amode::with_allocs * Use "stack" instead of "r15" on s390x * r14 is an allocatable register on s390x, so it shouldn't be used with `MovPReg`	2022-08-02 14:37:17 -07:00
Chris Fallin	43f1765272	Cranelift: remove Baldrdash support and related features. (#4571 ) * Cranelift: remove Baldrdash support and related features. As noted in Mozilla's bugzilla bug 1781425 [1], the SpiderMonkey team has recently determined that their current form of integration with Cranelift is too hard to maintain, and they have chosen to remove it from their codebase. If and when they decide to build updated support for Cranelift, they will adopt different approaches to several details of the integration. In the meantime, after discussion with the SpiderMonkey folks, they agree that it makes sense to remove the bits of Cranelift that exist to support the integration ("Baldrdash"), as they will not need them. Many of these bits are difficult-to-maintain special cases that are not actually tested in Cranelift proper: for example, the Baldrdash integration required Cranelift to emit function bodies without prologues/epilogues, and instead communicate very precise information about the expected frame size and layout, then stitched together something post-facto. This was brittle and caused a lot of incidental complexity ("fallthrough returns", the resulting special logic in block-ordering); this is just one example. As another example, one particular Baldrdash ABI variant processed stack args in reverse order, so our ABI code had to support both traversal orders. We had a number of other Baldrdash-specific settings as well that did various special things. This PR removes Baldrdash ABI support, the `fallthrough_return` instruction, and pulls some threads to remove now-unused bits as a result of those two, with the understanding that the SpiderMonkey folks will build new functionality as needed in the future and we can perhaps find cleaner abstractions to make it all work. [1] https://bugzilla.mozilla.org/show_bug.cgi?id=1781425 * Review feedback. * Fix (?) DWARF debug tests: add `--disable-cache` to wasmtime invocations. 
The debugger tests invoke `wasmtime` from within each test case under the control of a debugger (gdb or lldb). Some of these tests started to inexplicably fail in CI with unrelated changes, and the failures were only inconsistently reproducible locally. It seems to be cache related: if we disable cached compilation on the nested `wasmtime` invocations, the tests consistently pass. * Review feedback.	2022-08-02 19:37:56 +00:00
Trevor Elliott	25782b527e	x64: Migrate trapif and trapff to ISLE (#4545 ) https://github.com/bytecodealliance/wasmtime/pull/4545	2022-08-01 11:24:11 -07:00
Benjamin Bouvier	8d0224341c	cranelift: Introduce a feature to enable `trace` logs (#4484 ) * Don't use `log::trace` directly but a feature-enabled `trace` macro * Don't emit disassembly based on the log level	2022-08-01 11:19:15 +02:00
Afonso Bordado	02c3b47db2	x64: Implement SIMD `fma` (#4474 ) * x64: Add VEX Instruction Encoder This uses a similar builder pattern to the EVEX Encoder. Does not yet support memory accesses. * x64: Add FMA Flag * x64: Implement SIMD `fma` * x64: Use 4 register Vex Inst * x64: Reorder VEX pretty print args	2022-07-25 22:01:02 +00:00
Andrew Brown	8629cbc6a4	x64: port `atomic_rmw` to ISLE (#4389 ) * x64: port `atomic_rmw` to ISLE This change ports `atomic_rmw` to ISLE for the x64 backend. It does not change the lowering in any way, though it seems possible that the fixed regs need not be as fixed and that there are opportunities for single instruction lowerings. It does rename `inst_common::AtomicRmwOp` to `MachAtomicRmwOp` to disambiguate with the IR enum with the same name. * x64: remove remaining hardcoded register constraints for `atomic_rmw` * x64: use `SyntheticAmode` in `AtomicRmwSeq` * review: add missing reg collector for amode * review: collect memory registers in the 'late' phase	2022-07-06 23:58:59 +00:00
Chris Fallin	b2e28b917a	Cranelift: update to latest regalloc2: (#4324 ) - Handle call instructions' clobbers with the clobbers API, using RA2's clobbers bitmask (bytecodealliance/regalloc2#58) rather than clobbers list; - Pull in changes from bytecodealliance/regalloc2#59 for much more sane edge-case behavior w.r.t. liverange splitting.	2022-06-28 09:01:59 -07:00
Chris Fallin	5c2c285dd7	Cranelift/x64: fix register allocator metadata for 8-bit divides. (#4332 ) `idiv` on x86-64 only reads `rdx`/`edx`/`dx`/`dl` for divides with width greater than 8 bits; for an 8-bit divide, it reads the whole 16-bit divisor from `ax`, as our CISC ancestors intended. This PR fixes the metadata to avoid a regalloc panic (due to undefined `rdx`) in this case. Does not affect Wasmtime or other Wasm-frontend embedders.	2022-06-27 12:31:06 -07:00
Alex Crichton	8bb07523e2	x64: Fix codegen for the `select` instruction with v128 (#4317 ) This commit fixes a bug in the previous codegen for the `select` instruction when the operations of the `select` were of the `v128` type. Previously the `XmmCmove` instruction only stored an `OperandSize` of 32 or 64 for a 64 or 32-bit move, but this was also used for these 128-bit types which meant that when used the wrong move instruction was generated. The fix applied here is to store the whole `Type` being moved so the 128-bit variant can be selected as well.	2022-06-27 11:02:40 -07:00
Benjamin Bouvier	6e828df632	Remove unused `SourceLoc` in many `Mach` data structures (#4180 ) * Remove unused srcloc in MachReloc * Remove unused srcloc in MachTrap * Use `into_iter` on array in bench code to suppress a warning * Remove unused srcloc in MachCallSite	2022-05-23 09:27:28 -07:00
Chris Fallin	eb435f3057	x64: use constant pool for u64 constants rather than `movabs`. (#4088 ) * Allow emitting u64 constants into constant pool. * Use constant pool for constants on x64 that do not fit in a simm32 and are needed as a RegMem or RegMemImm. * Fix rip-relative addressing bug in pinsrd emission.	2022-05-10 09:21:05 -07:00
Chris Fallin	61dc38c065	Implement Spectre mitigations for table accesses and br_tables. (#4092 ) Currently, we have partial Spectre mitigation: we protect heap accesses with dynamic bounds checks. Specifically, we guard against errant accesses on the misspeculated path beyond the bounds-check conditional branch by adding a conditional move that is also dependent on the bounds-check condition. This data dependency on the condition is not speculated and thus will always pick the "safe" value (in the heap case, a NULL address) on the misspeculated path, until the pipeline flushes and recovers onto the correct path. This PR uses the same technique both for table accesses -- used to implement Wasm tables -- and for jumptables, used to implement Wasm `br_table` instructions. In the case of Wasm tables, the cmove picks the table base address on the misspeculated path. This is equivalent to reading the first table entry. This prevents loads of arbitrary data addresses on the misspeculated path. In the case of `br_table`, the cmove picks index 0 on the misspeculated path. This is safer than allowing a branch to an address loaded from an index under misspeculation (i.e., it preserves control-flow integrity even under misspeculation). The table mitigation is controlled by a Cranelift setting, on by default. The br_table mitigation is always on, because it is part of the single lowering pseudoinstruction. In both cases, the impact should be minimal: a single extra cmove in a (relatively) rarely-used operation. The table mitigation is architecture-independent (happens during legalization); the br_table mitigation has been implemented for both x64 and aarch64. (I don't know enough about s390x to implement this confidently there, but would happily review a PR to do the same on that platform.)	2022-05-02 11:19:16 -07:00
Chris Fallin	dd45f44511	x64 backend: add lowerings with load-op-store fusion. (#4071 ) x64 backend: add lowerings with load-op-store fusion. These lowerings use the `OP [mem], reg` forms (or in AT&T syntax, `OP %reg, (mem)`) -- i.e., x86 instructions that load from memory, perform an ALU operation, and store the result, all in one instruction. Using these instruction forms, we can merge three CLIF ops together: a load, an arithmetic operation, and a store.	2022-04-26 18:58:26 -07:00
Chris Fallin	a0318f36f0	Switch Cranelift over to regalloc2. (#3989 ) This PR switches Cranelift over to the new register allocator, regalloc2. See [this document](https://gist.github.com/cfallin/08553421a91f150254fe878f67301801) for a summary of the design changes. This switchover has implications for core VCode/MachInst types and the lowering pass. Overall, this change brings improvements to both compile time and speed of generated code (runtime), as reported in #3942: ``` Benchmark Compilation (wallclock) Execution (wallclock) blake3-scalar 25% faster 28% faster blake3-simd no diff no diff meshoptimizer 19% faster 17% faster pulldown-cmark 17% faster no diff bz2 15% faster no diff SpiderMonkey, 21% faster 2% faster fib(30) clang.wasm 42% faster N/A ```	2022-04-14 10:28:21 -07:00
Chris Fallin	cd173cfe8e	ISLE: port fmin, fmax, fmin_pseudo, fmax_pseudo on x64. (#3856 )	2022-02-28 14:40:26 -08:00
Chris Fallin	24f145cd1e	Migrate clz, ctz, popcnt, bitrev, is_null, is_invalid on x64 to ISLE. (#3848 )	2022-02-28 09:45:13 -08:00
Andrew Brown	f87c61176a	x64: port select to ISLE (#3682 ) * x64: port `select` using an FP comparison to ISLE This change includes quite a few interlocking parts, required mainly by the current x64 conventions in ISLE: - it adds a way to emit a `cmove` with multiple OR-ing conditions; because x64 ISLE cannot currently safely emit a comparison followed by several jumps, this adds `MachInst::CmoveOr` and `MachInst::XmmCmoveOr` macro instructions. Unfortunately, these macro instructions hide the multi-instruction sequence in `lower.isle` - to properly keep track of what instructions consume and produce flags, @cfallin added a way to pass around variants of `ConsumesFlags` and `ProducesFlags`--these changes affect all backends - then, to lower the `fcmp + select` CLIF, this change adds several `cmove*_from_values` helpers that perform all of the awkward conversions between `Value`, `ValueReg`, `Reg`, and `Gpr/Xmm`; one upside is that now these lowerings have much-improved documentation explaining why the various `FloatCC` and `CC` choices are made the way they are. Co-authored-by: Chris Fallin <chris@cfallin.org>	2022-02-23 10:03:16 -08:00
Nick Fitzgerald	dc86e7a6dc	cranelift: Use GPR newtypes extensively in x64 lowering (#3798 ) We already defined the `Gpr` newtype and used it in a few places, and we already defined the `Xmm` newtype and used it extensively. This finishes the transition to using the newtypes extensively in lowering by making use of `Gpr` in more places. Fixes #3685	2022-02-14 12:54:41 -08:00
Nick Fitzgerald	795b0aaf9a	cranelift: Add newtype wrappers for x64 register classes The primary motivation of this large commit (apologies for its size!) is to introduce `Gpr` and `Xmm` newtypes over `Reg`. This should help catch difficult-to-diagnose register class mixup bugs in x64 lowerings. But having a newtype for `Gpr` and `Xmm` themselves isn't enough to catch all of our operand-with-wrong-register-class bugs, because about 50% of operands on x64 aren't just a register, but a register or memory address or even an immediate! So we have `{Gpr,Xmm}Mem[Imm]` newtypes as well. Unfortunately, `GprMem` et al can't be `enum`s and are therefore a little bit noisier to work with from ISLE. They need to maintain the invariant that their registers really are of the claimed register class, so they need to encapsulate the inner data. If they exposed the underlying `enum` variants, then anyone could just change register classes or construct a `GprMem` that holds an XMM register, defeating the whole point of these newtypes. So when working with these newtypes from ISLE, we rely on external constructors like `(gpr_to_gpr_mem my_gpr)` instead of `(GprMem.Gpr my_gpr)`. A few extra lines of code are included to add support for register mapping for all of these newtypes as well. Ultimately this is all a bit wordier than I'd hoped it would be when I first started authoring this commit, but I think it is all worth it nonetheless! In the process of adding these newtypes, I didn't want to have to update both the ISLE `extern` type definition of `MInst` and the Rust definition, so I moved the definition fully into ISLE, similar to aarch64. Finally, this process isn't complete. I've introduced the newtypes here, and I've made most XMM-using instructions switch from `Reg` to `Xmm`, as well as register class-converting instructions, but I haven't moved all of the GPR-using instructions over to the newtypes yet. 
I figured this commit was big enough as it was, and I can continue the adoption of these newtypes in follow up commits. Part of #3685.	2022-02-03 14:08:08 -08:00
bjorn3	17c3c1813f	Remove MachInstEmitInfo	2022-01-04 18:06:01 +01:00
Nick Fitzgerald	d377b665c6	Initial ISLE integration with the x64 backend On the build side, this commit introduces two things: 1. The automatic generation of various ISLE definitions for working with CLIF. Specifically, it generates extern type definitions for clif opcodes and the clif instruction data `enum`, as well as extractors for matching each clif instruction. This happens inside the `cranelift-codegen-meta` crate. 2. The compilation of ISLE DSL sources to Rust code, that can be included in the main `cranelift-codegen` compilation. Next, this commit introduces the integration glue code required to get ISLE-generated Rust code hooked up in clif-to-x64 lowering. When lowering a clif instruction, we first try to use the ISLE code path. If it succeeds, then we are done lowering this instruction. If it fails, then we proceed along the existing hand-written code path for lowering. Finally, this commit ports many lowering rules over from hand-written, open-coded Rust to ISLE. In the process of supporting ISLE, this commit also makes the x64 `Inst` capable of expressing SSA by supporting 3-operand forms for all of the existing instructions that only have a 2-operand form encoding: dst = src1 op src2 Rather than only the typical x86-64 2-operand form: dst = dst op src This allows `MachInst` to be in SSA form, since `dst` and `src1` are disentangled. ("3-operand" and "2-operand" are a little bit of a misnomer since not all operations are binary operations, but we do the same thing for, e.g., unary operations by disentangling the sole operand from the result.) There are two motivations for this change: 1. To allow ISLE lowering code to have value-equivalence semantics. We want ISLE lowering to translate a CLIF expression that evaluates to some value into a `MachInst` expression that evaluates to the same value. We want both the lowering itself and the resulting `MachInst` to be pure and referentially transparent. 
This is both a nice paradigm for compiler writers that are authoring and maintaining lowering rules and is a prerequisite to any sort of formal verification of our lowering rules in the future. 2. Better align `MachInst` with `regalloc2`'s API, which requires that the input be in SSA form.	2021-10-12 17:11:58 -07:00
Johnnie Birch	e373ddfe1b	Add extend-add-pairwise instructions x64	2021-07-30 15:06:58 -07:00
Johnnie Birch	500f530322	Add support for i32x4_trunc_sat_f64x2_s for x64	2021-07-26 22:24:30 -07:00
Johnnie Birch	23290f0450	Add support for i32x4_trunc_sat_f64x2_u for x64	2021-07-26 22:24:30 -07:00
Johnnie Birch	5deda27977	Add support for Saturating Rounding Q-format Multiplication for x64	2021-07-26 20:32:46 -07:00
Nick Fitzgerald	4283d2116d	cranelift: Move most debug-level logs to the trace level Cranelift crates have historically been much more verbose with debug-level logging than most other crates in the Rust ecosystem. We log things like how many parameters a basic block has, the color of virtual registers during regalloc, etc. Even for Cranelift hackers, these things are largely only useful when hacking specifically on Cranelift and looking at a particular test case, not even when using some Cranelift embedding (such as Wasmtime). Most of the time, when people want logging for their Rust programs, they do something like: RUST_LOG=debug cargo run This means that they get all that mostly not useful debug logging out of Cranelift. So they might want to disable logging for Cranelift, or change it to a higher log level: RUST_LOG=debug,cranelift=info cargo run The problem is that this is already more annoying to type than `RUST_LOG=debug`, and that Cranelift isn't one single crate, so you actually have to play whack-a-mole with naming all the Cranelift crates off the top of your head, something more like this: RUST_LOG=debug,cranelift=info,cranelift_codegen=info,cranelift_wasm=info,... Therefore, we're changing most of the `debug!` logs into `trace!` logs: anything that is very Cranelift-internal, unlikely to be useful/meaningful to the "average" Cranelift embedder, or prints a message for each instruction visited during a pass. On the other hand, things that just report a one line statistic for a whole pass, for example, are left as `debug!`. The more verbose the log messages are, the higher the bar they must clear to be `debug!` rather than `trace!`.	2021-07-26 11:50:16 -07:00
Johnnie Birch	6fbe0b72bd	Add simd_extmul_* support for x64	2021-07-15 01:07:52 -07:00
Johnnie Birch	2d676d838f	Implements f64x2.convert_low_i32x4_u for x64	2021-07-09 10:39:05 -07:00
Johnnie Birch	1770880e19	x64: add support for packed promote and demote (#2783 ) * Add support for x64 packed promote low * Add support for x64 packed floating point demote * Update vector promote low and demote by adding constraints Also does some renaming and minor refactoring	2021-06-04 15:59:20 -07:00
Andrew Brown	8dc4cc9fe3	x64: fix AVX512 flag checks Previously, the multiple flags for certain AVX512 instructions were checked using `OR`: e.g., if the CPU has AVX512VL `OR` AVX512DQ, emit `VPMULLQ`. This is incorrect--the logic should be `AND`. The Intel Software Developer Manual, vol. 1, sec. 15.4, has more information on this (notable there is the suggestion to check with `XGETBV` that the OS is allowing the use of the XMM registers--but that is a separate issue). This change switches to `AND` logic in the new backend.	2021-06-01 11:41:16 -07:00
Andrew Brown	2a9f458ea3	x64: lower i8x16.shuffle to VPERMI2B when possible When shuffling values from two different registers, the x64 lowering for `i8x16.shuffle` must first shuffle each register separately and then OR the results with SSE instructions. With `VPERMI2B`, available in AVX512VL + AVX512VBMI, this can be done in a single instruction after the shuffle mask has been moved into the destination register. This change uses `VPERMI2B` for that case when the CPU supports it.	2021-06-01 11:40:53 -07:00
Andrew Brown	459fce3467	x64: lower i8x16.popcnt to VPOPCNTB when possible When AVX512VL or AVX512BITALG are available, Wasm SIMD's `popcnt` instruction can be lowered to a single x64 instruction, `VPOPCNTB`, instead of 8+ instructions.	2021-05-25 12:16:25 -07:00
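
The AVX512 flag-check fix in 8dc4cc9fe3 comes down to replacing `OR` with `AND` when an instruction needs more than one CPU extension. A minimal Rust sketch of that difference follows. It is not the actual Cranelift code: `CpuFlags`, its field names, and both function names are hypothetical and exist only for illustration.

```rust
/// Hypothetical set of detected CPU features. The field names are
/// illustrative and are not Cranelift's real settings API.
#[derive(Clone, Copy, Debug, Default)]
pub struct CpuFlags {
    pub avx512vl: bool,
    pub avx512dq: bool,
}

/// The incorrect check described in the commit message: it accepts a CPU
/// that has only one of the two required extensions.
pub fn can_use_vpmullq_or(flags: CpuFlags) -> bool {
    flags.avx512vl || flags.avx512dq
}

/// The corrected check: both extensions must be present before the
/// lowering is allowed to emit `VPMULLQ`.
pub fn can_use_vpmullq_and(flags: CpuFlags) -> bool {
    flags.avx512vl && flags.avx512dq
}
```

With only `avx512vl` set, the `OR` form returns `true` and the `AND` form returns `false`. That is the case where the old check could select an instruction the CPU does not support.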

185 Commits