wasmtime

Author	SHA1	Message	Date
Afonso Bordado	d394edcefe	x64: Mask shift amounts for small types (#4752 ) * x64: Mask shift amounts for small types * cranelift: Disable i128 shifts in fuzzer again They are fixed. But we had a bunch of fuzzgen issues come in, and we don't want to accidentaly mark them as fixed * cranelift: Avoid masking shifts for 32 and 64 bit cases * cranelift: Add const shift tests and fix them * cranelift: Remove const `rotl` cases Now that `put_masked_in_imm8_gpr` works properly we can simplify rotl/rotr	2022-08-24 10:31:38 -07:00
Trevor Elliott	4bdfa76370	x64: Migrate get_pinned_reg, set_pinned_reg, vconst, and raw_bitcast to ISLE (#4763 ) https://github.com/bytecodealliance/wasmtime/pull/4763	2022-08-23 16:32:00 -07:00
Trevor Elliott	b5f1ab7780	x64: Lower stack_addr, udiv, sdiv, urem, srem, umulhi, smulhi in ISLE (#4741 ) Lower stack_addr, udiv, sdiv, urem, srem, umulhi, and smulhi in ISLE. For udiv, sdiv, urem, and srem I opted to move the original lowering into an extern constructor, as the interactions with rax and rdx for the div instruction didn't seem meaningful to implement in ISLE. However, I'm happy to revisit this choice and move more of the embedding into ISLE.	2022-08-23 11:22:49 -07:00
Trevor Elliott	cee4b209f3	x64: Lower fcopysign, ceil, floor, nearest, and trunc in ISLE (#4730 ) https://github.com/bytecodealliance/wasmtime/pull/4730	2022-08-22 13:57:36 -07:00
Trevor Elliott	8b6019909b	x64: Lower widening and narrowing operations in ISLE (#4722 ) Lower uwiden_high, uwiden_low, swiden_high, swiden_low, snarrow, and unarrow in ISLE.	2022-08-18 11:53:24 -07:00
Trevor Elliott	fbfceaec98	x64: Migrate iadd_pairwise to ISLE (#4718 ) * Add a test for iadd_pairwise with swiden input * Implement iadd_pairwise for swiden_{low,high} input * Add a test case for iadd_pairwise with uwiden input * Implement iadd_pairwise with uwiden	2022-08-16 12:21:06 -07:00
Nick Fitzgerald	e0d4934ef4	Cranelift: Remove the `ABICaller` trait (#4711 ) * Cranelift: Remove the `ABICaller` trait It has only one implementation: the `ABICallerImpl` struct. We can just use that directly rather than having extra, unnecessary layers of generics and abstractions. * Cranelift: Rename `ABICallerImpl` to `Caller`	2022-08-15 20:41:08 +00:00
Nick Fitzgerald	532fb22af6	Cranelift: Remove the `LowerCtx` trait (#4697 ) The trait had only one implementation: the `Lower` struct. It is easier to just use that directly, and not introduce unnecessary layers of generics and abstractions. Once upon a time, there was hope that we would have other implementations of the `LowerCtx` trait, that did things like lower CLIF to SMTLIB for verification. However, this is not practical these days given the way that the trait has evolved over time, and our verification efforts are focused on ISLE now anyways, and we're actually making some progress on that front (much more than anyone ever did on a second `LowerCtx` trait implementation!)	2022-08-11 16:54:17 -07:00
Afonso Bordado	3ea1813173	x64: Add native lowering for scalar `fma` (#4539 ) Use `vfmadd213{ss,sd}` for these lowerings.	2022-08-11 22:48:16 +00:00
Trevor Elliott	0c2e0494bd	x64: Lower fcvt_from_uint in ISLE (#4684 ) * Add a test for the existing behavior of fcvt_from_unit * Migrate the I8, I16, I32 cases of fcvt_from_uint * Implement the I64 case of fcvt_from_uint * Add a test for the existing behavior of fcvt_from_uint.f64x2 * Migrate fcvt_from_uint.f64x2 to ISLE * Lower the last case of `fcvt_from_uint` * Add a test for `fcvt_from_uint` * Finish lowering fcmp_from_uint * Format	2022-08-11 12:28:41 -07:00
Trevor Elliott	63c2d1e0c3	x64: Remove unnecessary register use when comparing against constants (#4645 ) https://github.com/bytecodealliance/wasmtime/pull/4645	2022-08-09 23:53:51 +00:00
Trevor Elliott	cd847d071d	x64: Migrate br_table to ISLE (#4615 ) https://github.com/bytecodealliance/wasmtime/pull/4615	2022-08-04 22:12:37 +00:00
Trevor Elliott	dc8362ceec	x64: Finish migrating brz and brnz to ISLE (#4614 ) https://github.com/bytecodealliance/wasmtime/pull/4614	2022-08-04 12:58:43 -07:00
Trevor Elliott	1fc11bbe51	x64: Migrate brff and I128 branching instructions to ISLE (#4599 ) https://github.com/bytecodealliance/wasmtime/pull/4599	2022-08-04 08:58:50 -07:00
Trevor Elliott	301be7438e	x64: Begin migrating branch instructions to ISLE (#4587 ) https://github.com/bytecodealliance/wasmtime/pull/4587	2022-08-03 20:28:52 +00:00
Ulrich Weigand	b9dd48e34b	[s390x, abi_impl] Support struct args using explicit pointers (#4585 ) This adds support for StructArgument on s390x. The ABI for this platform requires that the address of the buffer holding the copy of the struct argument is passed from caller to callee as hidden pointer, using a register or overflow stack slot. To implement this, I've added an optional "pointer" filed to ABIArg::StructArg, and code to handle the pointer both in common abi_impl code and the s390x back-end. One notable change necessary to make this work involved the "copy_to_arg_order" mechanism. Currently, for struct args we only need to copy the data (and that need to happen before setting up any other args), while for non-struct args we only need to set up the appropriate registers or stack slots. This order is ensured by sorting the arguments appropriately into a "copy_to_arg_order" list. However, for struct args with explicit pointers we need to both copy the data (again, before everything else), and set up a register or stack slot. Since we now need to touch the argument twice, we cannot solve the ordering problem by a simple sort. Instead, the abi_impl common code now provided two callbacks, emit_copy_regs_to_buffer and emit_copy_regs_to_arg, and expects the back end to first call copy..to_buffer for all args, and then call copy.._to_arg for all args. This required updates to all back ends. In the s390x back end, in addition to the new ABI code, I'm now adding code to actually copy the struct data, using the MVC instruction (for small buffers) or a memcpy libcall (for larger buffers). This also requires a bit of new infrastructure: - MVC is the first memory-to-memory instruction we use, which needed a bit of memory argument tweaking - We also need to set up the infrastructure to emit libcalls. (This implements the first half of issue #4565.)	2022-08-03 19:00:07 +00:00
Afonso Bordado	709716bb8e	cranelift: Implement scalar FMA on x86 (#4460 ) x86 does not have dedicated instructions for scalar FMA, lower to a libcall which seems to be what llvm does.	2022-08-03 10:29:10 -07:00
Nick Fitzgerald	42bba452a6	Cranelift: Add instructions for getting the current stack/frame/return pointers (#4573 ) * Cranelift: Add instructions for getting the current stack/frame pointers and return address This is the initial part of https://github.com/bytecodealliance/wasmtime/issues/4535 * x64: Remove `Amode::RbpOffset` and use `Amode::ImmReg` instead We just special case getting operands from `Amode`s now. * Fix s390x `get_return_address`; require `preserve_frame_pointers=true` * Assert that `Amode::ImmRegRegShift` doesn't use rbp/rsp * Handle non-allocatable registers in Amode::with_allocs * Use "stack" instead of "r15" on s390x * r14 is an allocatable register on s390x, so it shouldn't be used with `MovPReg`	2022-08-02 14:37:17 -07:00
Trevor Elliott	29d4edc76b	x64: Migrate call and call_indirect to ISLE (#4542 ) https://github.com/bytecodealliance/wasmtime/pull/4542	2022-07-28 13:10:03 -07:00
Afonso Bordado	0508932174	cranelift: Align Scalar and SIMD shift semantics (#4520 ) * cranelift: Reorganize test suite Group some SIMD operations by instruction. * cranelift: Deduplicate some shift tests Also, new tests with the mod behaviour * aarch64: Lower shifts with mod behaviour * x64: Lower shifts with mod behaviour * wasmtime: Don't mask SIMD shifts	2022-07-27 17:54:00 +00:00
Trevor Elliott	7ac6134894	x64: Shrink Inst from 72 to 48 bytes (#4514 ) https://github.com/bytecodealliance/wasmtime/pull/4514	2022-07-27 10:39:22 -07:00
Trevor Elliott	9e9e043174	x64: Migrate the return and fallthrough_return lowerings to ISLE (#4518 ) https://github.com/bytecodealliance/wasmtime/pull/4518	2022-07-25 21:28:52 +00:00
Andrew Brown	8629cbc6a4	x64: port `atomic_rmw` to ISLE (#4389 ) * x64: port `atomic_rmw` to ISLE This change ports `atomic_rmw` to ISLE for the x64 backend. It does not change the lowering in any way, though it seems possible that the fixed regs need not be as fixed and that there are opportunities for single instruction lowerings. It does rename `inst_common::AtomicRmwOp` to `MachAtomicRmwOp` to disambiguate with the IR enum with the same name. * x64: remove remaining hardcoded register constraints for `atomic_rmw` * x64: use `SyntheticAmode` in `AtomicRmwSeq` * review: add missing reg collector for amode * review: collect memory registers in the 'late' phase	2022-07-06 23:58:59 +00:00
Sam Parker	a2d49ebf27	Use u32 in Type API (#4280 ) Move from passing and returning u8 and u16 values to u32 in many of the functions. This removes a number of type conversions and gives a small compilation time speedup, around ~0.7% on my aarch64 machine. Copyright (c) 2022, Arm Limited.	2022-06-30 12:43:36 -07:00
Chris Fallin	b2e28b917a	Cranelift: update to latest regalloc2: (#4324 ) - Handle call instructions' clobbers with the clobbers API, using RA2's clobbers bitmask (bytecodealliance/regalloc2#58) rather than clobbers list; - Pull in changes from bytecodealliance/regalloc2#59 for much more sane edge-case behavior w.r.t. liverange splitting.	2022-06-28 09:01:59 -07:00
Trevor Elliott	7e0bb465d0	X64: port the rest of icmp to ISLE (#4254 ) Finish migrating icmp to ISLE for x64	2022-06-13 16:34:11 -07:00
Chris Fallin	ed9db962de	x64 backend: fix cmpxchg (don't return RealReg as result). (#4243 ) The current lowering helper for `cmpxchg` returns the literal RealReg `rax` as its result. However, this breaks a number of invariants, and eventually causes a regalloc panic if used as a blockparam arg (pinned vregs cannot be used in this way). In general we have to return regular vregs, not a RealReg, as results of instructions during lowering. However #4223 added a helper for `x64_cmpxchg` that returns a literal `rax`. Fortunately we can do the right thing here by just giving a fresh vreg to the instruction; the regalloc constraints mean that this vreg is constrained to `rax` at the instruction (at its def/late point), so the generator of the instruction need not worry about `rax` here.	2022-06-08 06:13:31 -07:00
Andrew Brown	6df56e6aa6	x64: port `atomic_cas` to ISLE (#4223 )	2022-06-06 13:20:33 -07:00
Andrew Brown	816aae6aca	x64: port some atomics to ISLE (#4212 ) * x64: port `fence` to ISLE * x64: port `atomic_load` to ISLE * x64: port `atomic_store` to ISLE	2022-06-02 14:13:10 -07:00
Chris Fallin	eb435f3057	x64: use constant pool for u64 constants rather than `movabs`. (#4088 ) * Allow emitting u64 constants into constant pool. * Use constant pool for constants on x64 that do not fit in a simm32 and are needed as a RegMem or RegMemImm. * Fix rip-relative addressing bug in pinsrd emission.	2022-05-10 09:21:05 -07:00
Chris Fallin	f85047b084	Rework x64 addressing-mode lowering to be slightly more flexible. (#4080 ) This PR refactors the x64 backend address-mode lowering to use an incremental-build approach, where it considers each node in a tree of `iadd`s that feed into a load/store address and, at each step, builds the best possible `Amode`. It will combine an arbitrary number of constant offsets (an extension beyond the current rules), and can capture a left-shifted (scaled) index in any position of the tree (another extension). This doesn't have any measurable performance improvement on our Wasm benchmarks in Sightglass, unfortunately, because the IR lowered from wasm32 will do address computation in 32 bits and then `uextend` it to add to the 64-bit heap base. We can't quite lift the 32-bit adds to 64 bits because this loses the wraparound semantics. (We could label adds as "expected not to overflow", and allow those to be lifted to 64 bit operations; wasm32 heap address computation should fit this. This is `add nuw` (no unsigned wrap) in LLVM IR terms. That's likely my next step.) Nevertheless, (i) this generalizes the cases we can handle, which should be a good thing, all other things being equal (and in this case, no compile time impact was measured); and (ii) might benefit non-Wasm frontends.	2022-05-02 16:20:39 -07:00
Chris Fallin	03793b71a7	ISLE: remove all uses of argument polarity, and remove it from the language. (#4091 ) This PR removes "argument polarity": the feature of ISLE extractors that lets them take inputs aside from the value to be matched. Cases that need this expressivity have been subsumed by #4072 with if-let clauses; we can now finally remove this misfeature of the language, which has caused significant confusion and has always felt like a bit of a hack. This PR (i) removes the feature from the ISLE compiler; (ii) removes it from the reference documentation; and (iii) refactors away all uses of the feature in our three existing backends written in ISLE.	2022-05-02 09:52:12 -07:00
Chris Fallin	164bfeaf7e	x64 backend: migrate stores, and remainder of loads (I128 case), to ISLE. (#4069 )	2022-04-26 09:50:46 -07:00
Chris Fallin	e4b7c8a737	Cranelift: fix #3953 : rework single/multiple-use logic in lowering. (#4061 ) * Cranelift: fix #3953: rework single/multiple-use logic in lowering. This PR addresses the longstanding issue with loads trying to merge into compares on x86-64, and more generally, with the lowering framework falsely recognizing "single uses" of one op by another (which would normally allow merging of side-effecting ops like loads) when there is indirect duplication. To fix this, we replace the direct `value_uses` count with a transitive notion of uniqueness (not unlike Rust's `&`/`&mut` and how a `&mut` downgrades to `&` when accessed through another `&`!). A value is used multiple times transitively if it has multiple direct uses, or is used by another op that is used multiple times transitively. The canonical example of badness is: ``` v1 := load v2 := ifcmp v1, ... v3 := selectif v2, ... v4 := selectif v2, ... ``` both `v3` and `v4` effectively merge the `ifcmp` (`v2`), so even though the use of `v1` is "unique", it is codegenned twice. This is why we ~~can't have nice things~~ can't merge loads into compares (#3953). There is quite a subtle and interesting design space around this problem and how we might solve it. See the long doc-comment on `ValueUseState` in this PR for more justification for the particular design here. In particular, this design deliberately simplifies a bit relative to an "optimal" solution: some uses can become unique depending on merging, but we don't design our data structures for such updates because that would require significant extra costly tracking (some sort of transitive refcounting). For example, in the above, if `selectif` somehow did not merge `ifcmp`, then we would only codegen the `ifcmp` once into its result register (and use that register twice); then the load is uniquely used, and could be merged. But that requires transitioning from "multiple use" back to "unique use" with careful tracking as we do pattern-matching, which I've chosen to make out-of-scope here for now. In practice, I don't think it will matter too much (and we can always improve later). With this PR, we can now re-enable load-op merging for compares. A subsequent commit does this. * Update x64 backend to allow load-op merging for `cmp`. * Update filetests. * Add test for cmp-mem merging on x64. * Comment fixes. * Rework ValueUseState analysis for better performance. * Update s390x filetest: iadd_ifcout cannot merge loads anymore because it has multiple outputs (ValueUseState limitation) * Address review comments.	2022-04-22 18:00:48 -07:00
Chris Fallin	a0318f36f0	Switch Cranelift over to regalloc2. (#3989 ) This PR switches Cranelift over to the new register allocator, regalloc2. See [this document](https://gist.github.com/cfallin/08553421a91f150254fe878f67301801) for a summary of the design changes. This switchover has implications for core VCode/MachInst types and the lowering pass. Overall, this change brings improvements to both compile time and speed of generated code (runtime), as reported in #3942: ``` Benchmark Compilation (wallclock) Execution (wallclock) blake3-scalar 25% faster 28% faster blake3-simd no diff no diff meshoptimizer 19% faster 17% faster pulldown-cmark 17% faster no diff bz2 15% faster no diff SpiderMonkey, 21% faster 2% faster fib(30) clang.wasm 42% faster N/A ```	2022-04-14 10:28:21 -07:00
Andrew Brown	f62199da8c	x64: port `load` to ISLE (#3993 ) This change moves the majority of the lowerings for CLIF's `load` instruction over to ISLE. To do so, it also migrates the previous mechanism for creating an `Amode` (`lower_to_amode`) to several ISLE rules (see `to_amode`).	2022-04-07 18:31:22 -07:00
Andrew Brown	e92cbfb283	x64: port `icmp` to ISLE (#3886 ) * x64: port GPR-held `icmp` to ISLE * x64: port equality `icmp` for i128 type * x64: port `icmp` for vector types * x64: rename from_intcc to intcc_to_cc	2022-03-18 11:22:09 -07:00
Chris Fallin	24f145cd1e	Migrate clz, ctz, popcnt, bitrev, is_null, is_invalid on x64 to ISLE. (#3848 )	2022-02-28 09:45:13 -08:00
Andrew Brown	f87c61176a	x64: port select to ISLE (#3682 ) * x64: port `select` using an FP comparison to ISLE This change includes quite a few interlocking parts, required mainly by the current x64 conventions in ISLE: - it adds a way to emit a `cmove` with multiple OR-ing conditions; because x64 ISLE cannot currently safely emit a comparison followed by several jumps, this adds `MachInst::CmoveOr` and `MachInst::XmmCmoveOr` macro instructions. Unfortunately, these macro instructions hide the multi-instruction sequence in `lower.isle` - to properly keep track of what instructions consume and produce flags, @cfallin added a way to pass around variants of `ConsumesFlags` and `ProducesFlags`--these changes affect all backends - then, to lower the `fcmp + select` CLIF, this change adds several `cmove*_from_values` helpers that perform all of the awkward conversions between `Value`, `ValueReg`, `Reg`, and `Gpr/Xmm`; one upside is that now these lowerings have much-improved documentation explaining why the various `FloatCC` and `CC` choices are made the the way they are. Co-authored-by: Chris Fallin <chris@cfallin.org>	2022-02-23 10:03:16 -08:00
Nick Fitzgerald	dc86e7a6dc	cranelift: Use GPR newtypes extensively in x64 lowering (#3798 ) We already defined the `Gpr` newtype and used it in a few places, and we already defined the `Xmm` newtype and used it extensively. This finishes the transition to using the newtypes extensively in lowering by making use of `Gpr` in more places. Fixes #3685	2022-02-14 12:54:41 -08:00
Nick Fitzgerald	bb7ae46ecd	ISLE: emit traps as safepoints on x64	2022-02-07 10:01:23 -08:00
Nick Fitzgerald	2c77cf866a	ISLE: Rename `{gpr,xmm}_mem_new` constructors to `reg_mem_to_{gpr,xmm}_mem`	2022-02-03 14:08:08 -08:00
Nick Fitzgerald	795b0aaf9a	cranelift: Add newtype wrappers for x64 register classes This primary motivation of this large commit (apologies for its size!) is to introduce `Gpr` and `Xmm` newtypes over `Reg`. This should help catch difficult-to-diagnose register class mixup bugs in x64 lowerings. But having a newtype for `Gpr` and `Xmm` themselves isn't enough to catch all of our operand-with-wrong-register-class bugs, because about 50% of operands on x64 aren't just a register, but a register or memory address or even an immediate! So we have `{Gpr,Xmm}Mem[Imm]` newtypes as well. Unfortunately, `GprMem` et al can't be `enum`s and are therefore a little bit noisier to work with from ISLE. They need to maintain the invariant that their registers really are of the claimed register class, so they need to encapsulate the inner data. If they exposed the underlying `enum` variants, then anyone could just change register classes or construct a `GprMem` that holds an XMM register, defeating the whole point of these newtypes. So when working with these newtypes from ISLE, we rely on external constructors like `(gpr_to_gpr_mem my_gpr)` instead of `(GprMem.Gpr my_gpr)`. A bit of extra lines of code are included to add support for register mapping for all of these newtypes as well. Ultimately this is all a bit wordier than I'd hoped it would be when I first started authoring this commit, but I think it is all worth it nonetheless! In the process of adding these newtypes, I didn't want to have to update both the ISLE `extern` type definition of `MInst` and the Rust definition, so I move the definition fully into ISLE, similar as aarch64. Finally, this process isn't complete. I've introduced the newtypes here, and I've made most XMM-using instructions switch from `Reg` to `Xmm`, as well as register class-converting instructions, but I haven't moved all of the GPR-using instructions over to the newtypes yet. I figured this commit was big enough as it was, and I can continue the adoption of these newtypes in follow up commits. Part of #3685.	2022-02-03 14:08:08 -08:00
Ulrich Weigand	a3e2f5c28b	Move emit and emit_safepoint to prelude.isle Even though the implementation of emit and emit_safepoint may be platform-specific, the interface ought to be common so that other code in prelude.isle may safely call these constructors. This patch moves the definition of emit (from all platforms) and emit_safepoint (s390x only) to prelude.isle. This required adding an emit_safepoint implementation to aarch64 and x64 as well - the latter is still a stub as special move mitosis handling will be required.	2022-01-31 22:54:04 +01:00
Ulrich Weigand	906f6a35cf	ISLE: Allow emitting safepoint insns Change the implementation of emitted_insts in IsleContext from a plain vector of instructions into a vector of tuples, where the second element is a boolean that indicates whether this instruction should be emitted as a safepoint. This allows targets to emit safepoint insns via ISLE.	2022-01-25 14:21:41 +01:00
Ulrich Weigand	be60a19623	ISLE standard prelude: Additional types and helpers In preparing to move the s390x back-end to ISLE, I noticed a few missing pieces in the common prelude code. This patch: - Defines the reference types $R32 / $R64. - Provides a trap_code_bad_conversion_to_integer helper. - Provides an avoid_div_traps helper. This requires passing the generic flags in addition to the ISA-specifc flags into the ISLE lowering context.	2022-01-20 17:23:31 +01:00
Nick Fitzgerald	4e34dd8239	cranelift: Port `ushr` SIMD lowerings to ISLE on x64	2022-01-13 14:39:06 -08:00
Nick Fitzgerald	a7dba81c1d	cranelift: Port `ishl` SIMD lowerings to ISLE (#3686 )	2022-01-13 09:34:37 -06:00
Nick Fitzgerald	7454f1f3af	cranelift: port `sshr` to ISLE on x64 (#3681 )	2022-01-12 09:13:58 -06:00
Chris Fallin	fd171ca063	Fix OperandSize: need clamp-to-32-bit behavior in most cases, but true-width for shifts.	2021-12-16 12:32:28 -08:00

1 2

61 Commits