wasmtime

Author	SHA1	Message	Date
Trevor Elliott	25d960f9c4	x64: Lower tlsvalue, sqmul_round_sat, and uunarrow in ISLE (#4793 ) Lower tlsvalue, sqmul_round_sat, and uunarrow in ISLE.	2022-08-26 16:33:48 -07:00
Chris Fallin	8e8dfdf5f9	AArch64: Migrate calls and returns to ISLE. (#4788 )	2022-08-26 16:26:39 -07:00
Trevor Elliott	ca6d648e37	x64: Ensure that constants are always 16 bytes for XmmMem (#4790 ) Ensure that constants generated for the memory case of XmmMem values are always 16 bytes, ensuring that we don't accidantally perform an unaligned load. Fixes #4761	2022-08-26 20:04:38 +00:00
Trevor Elliott	9386409607	x64: Lower extractlane, scalar_to_vector, and splat in ISLE (#4780 ) Lower extractlane, scalar_to_vector and splat in ISLE. This PR also makes some changes to the SinkableLoad api * change the return type of sink_load to RegMem as there are more functions available for dealing with RegMem * add reg_mem_to_reg_mem_imm and register it as an automatic conversion	2022-08-25 09:38:03 -07:00
Trevor Elliott	b8b6f2781e	x64: Lower shuffle and swizzle in ISLE (#4772 ) Lower `shuffle` and `swizzle` in ISLE. This PR surfaced a bug with the lowering of `shuffle` when avx512vl and avx512vbmi are enabled: we use `vpermi2b` as the implementation, but panic if the immediate shuffle mask contains any out-of-bounds values. The behavior when the avx512 extensions are not present is that out-of-bounds values are turned into `0` in the result. I've resolved this by detecting when the shuffle immediate has out-of-bounds indices in the avx512-enabled lowering, and generating an additional mask to zero out the lanes where those indices occur. This brings the avx512 case into line with the semantics of the `shuffle` op: `94bcbe8446/cranelift/codegen/meta/src/shared/instructions.rs (L1495-L1498)`	2022-08-24 21:49:51 +00:00
Afonso Bordado	d394edcefe	x64: Mask shift amounts for small types (#4752 ) * x64: Mask shift amounts for small types * cranelift: Disable i128 shifts in fuzzer again They are fixed. But we had a bunch of fuzzgen issues come in, and we don't want to accidentaly mark them as fixed * cranelift: Avoid masking shifts for 32 and 64 bit cases * cranelift: Add const shift tests and fix them * cranelift: Remove const `rotl` cases Now that `put_masked_in_imm8_gpr` works properly we can simplify rotl/rotr	2022-08-24 10:31:38 -07:00
Trevor Elliott	4bdfa76370	x64: Migrate get_pinned_reg, set_pinned_reg, vconst, and raw_bitcast to ISLE (#4763 ) https://github.com/bytecodealliance/wasmtime/pull/4763	2022-08-23 16:32:00 -07:00
Trevor Elliott	b5f1ab7780	x64: Lower stack_addr, udiv, sdiv, urem, srem, umulhi, smulhi in ISLE (#4741 ) Lower stack_addr, udiv, sdiv, urem, srem, umulhi, and smulhi in ISLE. For udiv, sdiv, urem, and srem I opted to move the original lowering into an extern constructor, as the interactions with rax and rdx for the div instruction didn't seem meaningful to implement in ISLE. However, I'm happy to revisit this choice and move more of the embedding into ISLE.	2022-08-23 11:22:49 -07:00
Trevor Elliott	cee4b209f3	x64: Lower fcopysign, ceil, floor, nearest, and trunc in ISLE (#4730 ) https://github.com/bytecodealliance/wasmtime/pull/4730	2022-08-22 13:57:36 -07:00
Trevor Elliott	8b6019909b	x64: Lower widening and narrowing operations in ISLE (#4722 ) Lower uwiden_high, uwiden_low, swiden_high, swiden_low, snarrow, and unarrow in ISLE.	2022-08-18 11:53:24 -07:00
Trevor Elliott	fbfceaec98	x64: Migrate iadd_pairwise to ISLE (#4718 ) * Add a test for iadd_pairwise with swiden input * Implement iadd_pairwise for swiden_{low,high} input * Add a test case for iadd_pairwise with uwiden input * Implement iadd_pairwise with uwiden	2022-08-16 12:21:06 -07:00
Nick Fitzgerald	e0d4934ef4	Cranelift: Remove the `ABICaller` trait (#4711 ) * Cranelift: Remove the `ABICaller` trait It has only one implementation: the `ABICallerImpl` struct. We can just use that directly rather than having extra, unnecessary layers of generics and abstractions. * Cranelift: Rename `ABICallerImpl` to `Caller`	2022-08-15 20:41:08 +00:00
Nick Fitzgerald	532fb22af6	Cranelift: Remove the `LowerCtx` trait (#4697 ) The trait had only one implementation: the `Lower` struct. It is easier to just use that directly, and not introduce unnecessary layers of generics and abstractions. Once upon a time, there was hope that we would have other implementations of the `LowerCtx` trait, that did things like lower CLIF to SMTLIB for verification. However, this is not practical these days given the way that the trait has evolved over time, and our verification efforts are focused on ISLE now anyways, and we're actually making some progress on that front (much more than anyone ever did on a second `LowerCtx` trait implementation!)	2022-08-11 16:54:17 -07:00
Afonso Bordado	3ea1813173	x64: Add native lowering for scalar `fma` (#4539 ) Use `vfmadd213{ss,sd}` for these lowerings.	2022-08-11 22:48:16 +00:00
Trevor Elliott	0c2e0494bd	x64: Lower fcvt_from_uint in ISLE (#4684 ) * Add a test for the existing behavior of fcvt_from_unit * Migrate the I8, I16, I32 cases of fcvt_from_uint * Implement the I64 case of fcvt_from_uint * Add a test for the existing behavior of fcvt_from_uint.f64x2 * Migrate fcvt_from_uint.f64x2 to ISLE * Lower the last case of `fcvt_from_uint` * Add a test for `fcvt_from_uint` * Finish lowering fcmp_from_uint * Format	2022-08-11 12:28:41 -07:00
Trevor Elliott	63c2d1e0c3	x64: Remove unnecessary register use when comparing against constants (#4645 ) https://github.com/bytecodealliance/wasmtime/pull/4645	2022-08-09 23:53:51 +00:00
Trevor Elliott	cd847d071d	x64: Migrate br_table to ISLE (#4615 ) https://github.com/bytecodealliance/wasmtime/pull/4615	2022-08-04 22:12:37 +00:00
Trevor Elliott	dc8362ceec	x64: Finish migrating brz and brnz to ISLE (#4614 ) https://github.com/bytecodealliance/wasmtime/pull/4614	2022-08-04 12:58:43 -07:00
Trevor Elliott	1fc11bbe51	x64: Migrate brff and I128 branching instructions to ISLE (#4599 ) https://github.com/bytecodealliance/wasmtime/pull/4599	2022-08-04 08:58:50 -07:00
Trevor Elliott	301be7438e	x64: Begin migrating branch instructions to ISLE (#4587 ) https://github.com/bytecodealliance/wasmtime/pull/4587	2022-08-03 20:28:52 +00:00
Ulrich Weigand	b9dd48e34b	[s390x, abi_impl] Support struct args using explicit pointers (#4585 ) This adds support for StructArgument on s390x. The ABI for this platform requires that the address of the buffer holding the copy of the struct argument is passed from caller to callee as hidden pointer, using a register or overflow stack slot. To implement this, I've added an optional "pointer" filed to ABIArg::StructArg, and code to handle the pointer both in common abi_impl code and the s390x back-end. One notable change necessary to make this work involved the "copy_to_arg_order" mechanism. Currently, for struct args we only need to copy the data (and that need to happen before setting up any other args), while for non-struct args we only need to set up the appropriate registers or stack slots. This order is ensured by sorting the arguments appropriately into a "copy_to_arg_order" list. However, for struct args with explicit pointers we need to both copy the data (again, before everything else), and set up a register or stack slot. Since we now need to touch the argument twice, we cannot solve the ordering problem by a simple sort. Instead, the abi_impl common code now provided two callbacks, emit_copy_regs_to_buffer and emit_copy_regs_to_arg, and expects the back end to first call copy..to_buffer for all args, and then call copy.._to_arg for all args. This required updates to all back ends. In the s390x back end, in addition to the new ABI code, I'm now adding code to actually copy the struct data, using the MVC instruction (for small buffers) or a memcpy libcall (for larger buffers). This also requires a bit of new infrastructure: - MVC is the first memory-to-memory instruction we use, which needed a bit of memory argument tweaking - We also need to set up the infrastructure to emit libcalls. (This implements the first half of issue #4565.)	2022-08-03 19:00:07 +00:00
Afonso Bordado	709716bb8e	cranelift: Implement scalar FMA on x86 (#4460 ) x86 does not have dedicated instructions for scalar FMA, lower to a libcall which seems to be what llvm does.	2022-08-03 10:29:10 -07:00
Nick Fitzgerald	42bba452a6	Cranelift: Add instructions for getting the current stack/frame/return pointers (#4573 ) * Cranelift: Add instructions for getting the current stack/frame pointers and return address This is the initial part of https://github.com/bytecodealliance/wasmtime/issues/4535 * x64: Remove `Amode::RbpOffset` and use `Amode::ImmReg` instead We just special case getting operands from `Amode`s now. * Fix s390x `get_return_address`; require `preserve_frame_pointers=true` * Assert that `Amode::ImmRegRegShift` doesn't use rbp/rsp * Handle non-allocatable registers in Amode::with_allocs * Use "stack" instead of "r15" on s390x * r14 is an allocatable register on s390x, so it shouldn't be used with `MovPReg`	2022-08-02 14:37:17 -07:00
Trevor Elliott	29d4edc76b	x64: Migrate call and call_indirect to ISLE (#4542 ) https://github.com/bytecodealliance/wasmtime/pull/4542	2022-07-28 13:10:03 -07:00
Afonso Bordado	0508932174	cranelift: Align Scalar and SIMD shift semantics (#4520 ) * cranelift: Reorganize test suite Group some SIMD operations by instruction. * cranelift: Deduplicate some shift tests Also, new tests with the mod behaviour * aarch64: Lower shifts with mod behaviour * x64: Lower shifts with mod behaviour * wasmtime: Don't mask SIMD shifts	2022-07-27 17:54:00 +00:00
Trevor Elliott	7ac6134894	x64: Shrink Inst from 72 to 48 bytes (#4514 ) https://github.com/bytecodealliance/wasmtime/pull/4514	2022-07-27 10:39:22 -07:00
Trevor Elliott	9e9e043174	x64: Migrate the return and fallthrough_return lowerings to ISLE (#4518 ) https://github.com/bytecodealliance/wasmtime/pull/4518	2022-07-25 21:28:52 +00:00
Andrew Brown	8629cbc6a4	x64: port `atomic_rmw` to ISLE (#4389 ) * x64: port `atomic_rmw` to ISLE This change ports `atomic_rmw` to ISLE for the x64 backend. It does not change the lowering in any way, though it seems possible that the fixed regs need not be as fixed and that there are opportunities for single instruction lowerings. It does rename `inst_common::AtomicRmwOp` to `MachAtomicRmwOp` to disambiguate with the IR enum with the same name. * x64: remove remaining hardcoded register constraints for `atomic_rmw` * x64: use `SyntheticAmode` in `AtomicRmwSeq` * review: add missing reg collector for amode * review: collect memory registers in the 'late' phase	2022-07-06 23:58:59 +00:00
Sam Parker	a2d49ebf27	Use u32 in Type API (#4280 ) Move from passing and returning u8 and u16 values to u32 in many of the functions. This removes a number of type conversions and gives a small compilation time speedup, around ~0.7% on my aarch64 machine. Copyright (c) 2022, Arm Limited.	2022-06-30 12:43:36 -07:00
Chris Fallin	b2e28b917a	Cranelift: update to latest regalloc2: (#4324 ) - Handle call instructions' clobbers with the clobbers API, using RA2's clobbers bitmask (bytecodealliance/regalloc2#58) rather than clobbers list; - Pull in changes from bytecodealliance/regalloc2#59 for much more sane edge-case behavior w.r.t. liverange splitting.	2022-06-28 09:01:59 -07:00
Trevor Elliott	7e0bb465d0	X64: port the rest of icmp to ISLE (#4254 ) Finish migrating icmp to ISLE for x64	2022-06-13 16:34:11 -07:00
Chris Fallin	ed9db962de	x64 backend: fix cmpxchg (don't return RealReg as result). (#4243 ) The current lowering helper for `cmpxchg` returns the literal RealReg `rax` as its result. However, this breaks a number of invariants, and eventually causes a regalloc panic if used as a blockparam arg (pinned vregs cannot be used in this way). In general we have to return regular vregs, not a RealReg, as results of instructions during lowering. However #4223 added a helper for `x64_cmpxchg` that returns a literal `rax`. Fortunately we can do the right thing here by just giving a fresh vreg to the instruction; the regalloc constraints mean that this vreg is constrained to `rax` at the instruction (at its def/late point), so the generator of the instruction need not worry about `rax` here.	2022-06-08 06:13:31 -07:00
Andrew Brown	6df56e6aa6	x64: port `atomic_cas` to ISLE (#4223 )	2022-06-06 13:20:33 -07:00
Andrew Brown	816aae6aca	x64: port some atomics to ISLE (#4212 ) * x64: port `fence` to ISLE * x64: port `atomic_load` to ISLE * x64: port `atomic_store` to ISLE	2022-06-02 14:13:10 -07:00
Chris Fallin	5d671952ee	Cranelift: do not check in generated ISLE code; regenerate on every compile. (#4143 ) This PR fixes #4066: it modifies the Cranelift `build.rs` workflow to invoke the ISLE DSL compiler on every compilation, rather than only when the user specifies a special "rebuild ISLE" feature. The main benefit of this change is that it vastly simplifies the mental model required of developers, and removes a bunch of failure modes we have tried to work around in other ways. There is now just one "source of truth", the ISLE source itself, in the repository, and so there is no need to understand a special "rebuild" step and how to handle merge errors. There is no special process needed to develop the compiler when modifying the DSL. And there is no "noise" in the git history produced by constantly-regenerated files. The two main downsides we discussed in #4066 are: - Compile time could increase, by adding more to the "meta" step before the main build; - It becomes less obvious where the source definitions are (everything becomes more "magic"), which makes exploration and debugging harder. This PR addresses each of these concerns: 1. To maintain reasonable compile time, it includes work to cut down the dependencies of the `cranelift-isle` crate to nothing (only the Rust stdlib), in the default build. It does this by putting the error-reporting bits (`miette` crate) under an optional feature, and the logging (`log` crate) under a feature-controlled macro, and manually writing an `Error` impl rather than using `thiserror`. This completely avoids proc macros and the `syn` build slowness. The user can still get nice errors out of `miette`: this is enabled by specifying a Cargo feature `--features isle-errors`. 2. To allow the user to optionally inspect the generated source, which nominally lives in a hard-to-find path inside `target/` now, this PR adds a feature `isle-in-source-tree` that, as implied by the name, moves the target for ISLE generated source into the source tree, at `cranelift/codegen/isle_generated_source/`. It seems reasonable to do this when an explicit feature (opt-in) is specified because this is how ISLE regeneration currently works as well. To prevent surprises, if the feature is not specified, the build fails if this directory exists.	2022-05-11 22:25:24 -07:00
Chris Fallin	eb435f3057	x64: use constant pool for u64 constants rather than `movabs`. (#4088 ) * Allow emitting u64 constants into constant pool. * Use constant pool for constants on x64 that do not fit in a simm32 and are needed as a RegMem or RegMemImm. * Fix rip-relative addressing bug in pinsrd emission.	2022-05-10 09:21:05 -07:00
Chris Fallin	2af8d1e93c	Cranelift/ISLE: re-apply prio-trie fix, this time with fixed fix. (#4117 ) * ISLE compiler: fix priority-trie interval bug. (#4093) This PR fixes a bug in the ISLE compiler related to rule priorities. An important note first: the bug did not affect the correctness of the Cranelift backends, either in theory (because the rules should be correct applied in any order, even contrary to the stated priorities) or in practice (because the generated code actually does not change at all with the DSL compiler fix, only with a separate minimized bug example). The issue was a simple swap of `min` for `max` (see first commit). This is the minimal fix, I think, to get a correct priority-trie with the minimized bug example in this commit. However, while debugging this, I started to convince myself that the complexity of merging multiple priority ranges using the sort of hybrid interval tree / string-matching trie data structure was unneeded. The original design was built with the assumption we might have a bunch of different priority levels, and would need the efficiency of merging where possible. But in practice we haven't used priorities this way: the vast majority of lowering rules exist at the default (priority 0), and just a few overrides are explicitly at prio 1, 2 or (rarely) 3. So, it turns out to be a lot simpler to label trie edges with (prio, symbol) rather than (prio-range, symbol), and delete the whole mess of interval-splitting logic on insertion. It's easier (IMHO) to convince oneself that the resulting insertion algorithm is correct. I was worried that this might impact the size of the generated Rust code or its runtime, but In fact, to my initial surprise (but it makes sense given the above "rarely used" factor), the generated code with this compiler fix is exactly the same. I rebuilt with `--features rebuild-isle,all-arch` but... there were no diffs to commit! This is to me the simplest evidence that we didn't really need that complexity. * Fix earlier commit from #4093: properly sort trie. This commit fixes an in-hindsight-obvious bug in #4093: the trie's edges must be sorted recursively, not just at the top level. With this fix, the generated code differs only in one cosmetic way (a let-binding moves) but otherwise is the same. This includes @fitzgen's fix to the CI (from the revert in #4102) that deletes manifests to actually check that the checked-in source is consistent with the checked-in compiler. The force-rebuild step is now in a shell script for convenience: anyone hacking on the ISLE compiler itself can use this script to more easily rebuild everything. * Add note to build.rs to remind to update force-rebuild-isle.sh	2022-05-09 16:36:48 -07:00
Chris Fallin	f85047b084	Rework x64 addressing-mode lowering to be slightly more flexible. (#4080 ) This PR refactors the x64 backend address-mode lowering to use an incremental-build approach, where it considers each node in a tree of `iadd`s that feed into a load/store address and, at each step, builds the best possible `Amode`. It will combine an arbitrary number of constant offsets (an extension beyond the current rules), and can capture a left-shifted (scaled) index in any position of the tree (another extension). This doesn't have any measurable performance improvement on our Wasm benchmarks in Sightglass, unfortunately, because the IR lowered from wasm32 will do address computation in 32 bits and then `uextend` it to add to the 64-bit heap base. We can't quite lift the 32-bit adds to 64 bits because this loses the wraparound semantics. (We could label adds as "expected not to overflow", and allow those to be lifted to 64 bit operations; wasm32 heap address computation should fit this. This is `add nuw` (no unsigned wrap) in LLVM IR terms. That's likely my next step.) Nevertheless, (i) this generalizes the cases we can handle, which should be a good thing, all other things being equal (and in this case, no compile time impact was measured); and (ii) might benefit non-Wasm frontends.	2022-05-02 16:20:39 -07:00
Chris Fallin	03793b71a7	ISLE: remove all uses of argument polarity, and remove it from the language. (#4091 ) This PR removes "argument polarity": the feature of ISLE extractors that lets them take inputs aside from the value to be matched. Cases that need this expressivity have been subsumed by #4072 with if-let clauses; we can now finally remove this misfeature of the language, which has caused significant confusion and has always felt like a bit of a hack. This PR (i) removes the feature from the ISLE compiler; (ii) removes it from the reference documentation; and (iii) refactors away all uses of the feature in our three existing backends written in ISLE.	2022-05-02 09:52:12 -07:00
Chris Fallin	eceb433b28	Remove `=x` uses from ISLE, and remove support from the DSL compiler. (#4078 ) This is a follow-up on #4074: now that we have the simplified syntax, we can remove the old, redundant syntax.	2022-04-28 11:17:08 -07:00
Sam Parker	12b4374cd5	[AArch64] Port atomic rmw to ISLE (#4021 ) Also fix and extend the current implementation: - AtomicRMWOp::Clr != AtomicRmwOp::And, as the input needs to be inverted first. - Inputs to the cmp for the RMWLoop case are sign-extended when needed. - Lower Xchg to Swp. - Lower Sub to Add with a negated input. - Added more runtests. Copyright (c) 2022, Arm Limited.	2022-04-27 13:13:59 -07:00
Chris Fallin	dd45f44511	x64 backend: add lowerings with load-op-store fusion. (#4071 ) x64 backend: add lowerings with load-op-store fusion. These lowerings use the `OP [mem], reg` forms (or in AT&T syntax, `OP %reg, (mem)`) -- i.e., x86 instructions that load from memory, perform an ALU operation, and store the result, all in one instruction. Using these instruction forms, we can merge three CLIF ops together: a load, an arithmetic operation, and a store.	2022-04-26 18:58:26 -07:00
Chris Fallin	164bfeaf7e	x64 backend: migrate stores, and remainder of loads (I128 case), to ISLE. (#4069 )	2022-04-26 09:50:46 -07:00
Chris Fallin	f384938a10	x64 backend: fix a load-op merging bug with integer min/max. (#4068 ) The recent work in #4061 introduced a notion of "unique uses" for CLIF values that both simplified the load-op merging rules and allowed loads to merge in some more places. Unfortunately there's one factor that PR didn't account for: a unique use at the CLIF level could become a multiple-use at the VCode level, when a lowering uses a value multiple times! Making this less error-prone in general is hard, because we don't know the lowering in VCode until it's emitted, so we can't ahead-of-time know that a value will be used multiple times and prevent its merging. But we can know in the lowerings themselves when we're doing this. At least we get a panic from regalloc when we get this wrong; no bad code (uninitialized register being read) should ever come from a backend bug like this. This is still a bit less than ideal, but for now the fix is: in `cmp_and_choose` in the x64 backend (which compares values, then picks one or the other with a cmove), explicitly put values in registers. Fixes #4067 (thanks @Mrmaxmeier for the report!).	2022-04-25 10:32:09 -07:00
Chris Fallin	e4b7c8a737	Cranelift: fix #3953 : rework single/multiple-use logic in lowering. (#4061 ) * Cranelift: fix #3953: rework single/multiple-use logic in lowering. This PR addresses the longstanding issue with loads trying to merge into compares on x86-64, and more generally, with the lowering framework falsely recognizing "single uses" of one op by another (which would normally allow merging of side-effecting ops like loads) when there is indirect duplication. To fix this, we replace the direct `value_uses` count with a transitive notion of uniqueness (not unlike Rust's `&`/`&mut` and how a `&mut` downgrades to `&` when accessed through another `&`!). A value is used multiple times transitively if it has multiple direct uses, or is used by another op that is used multiple times transitively. The canonical example of badness is: ``` v1 := load v2 := ifcmp v1, ... v3 := selectif v2, ... v4 := selectif v2, ... ``` both `v3` and `v4` effectively merge the `ifcmp` (`v2`), so even though the use of `v1` is "unique", it is codegenned twice. This is why we ~~can't have nice things~~ can't merge loads into compares (#3953). There is quite a subtle and interesting design space around this problem and how we might solve it. See the long doc-comment on `ValueUseState` in this PR for more justification for the particular design here. In particular, this design deliberately simplifies a bit relative to an "optimal" solution: some uses can become unique depending on merging, but we don't design our data structures for such updates because that would require significant extra costly tracking (some sort of transitive refcounting). For example, in the above, if `selectif` somehow did not merge `ifcmp`, then we would only codegen the `ifcmp` once into its result register (and use that register twice); then the load is uniquely used, and could be merged. But that requires transitioning from "multiple use" back to "unique use" with careful tracking as we do pattern-matching, which I've chosen to make out-of-scope here for now. In practice, I don't think it will matter too much (and we can always improve later). With this PR, we can now re-enable load-op merging for compares. A subsequent commit does this. * Update x64 backend to allow load-op merging for `cmp`. * Update filetests. * Add test for cmp-mem merging on x64. * Comment fixes. * Rework ValueUseState analysis for better performance. * Update s390x filetest: iadd_ifcout cannot merge loads anymore because it has multiple outputs (ValueUseState limitation) * Address review comments.	2022-04-22 18:00:48 -07:00
Johnnie Birch	6a36a1d15d	X64: Port Sqrt to ISLE (#4065 )	2022-04-22 00:42:22 -07:00
Chris Fallin	a0318f36f0	Switch Cranelift over to regalloc2. (#3989 ) This PR switches Cranelift over to the new register allocator, regalloc2. See [this document](https://gist.github.com/cfallin/08553421a91f150254fe878f67301801) for a summary of the design changes. This switchover has implications for core VCode/MachInst types and the lowering pass. Overall, this change brings improvements to both compile time and speed of generated code (runtime), as reported in #3942: ``` Benchmark Compilation (wallclock) Execution (wallclock) blake3-scalar 25% faster 28% faster blake3-simd no diff no diff meshoptimizer 19% faster 17% faster pulldown-cmark 17% faster no diff bz2 15% faster no diff SpiderMonkey, 21% faster 2% faster fib(30) clang.wasm 42% faster N/A ```	2022-04-14 10:28:21 -07:00
Andrew Brown	7a55779c6b	x64: fix miscompilation of `select.i128` (#4017 ) Issue #3963 identified a miscompilation with select in which the second in the pair of `CMOV`s (one pair per `i128` register) used the wrong flag. This change fixes the error in the x64 ISLE helper function emitting these `CMOV` instructions.	2022-04-12 09:56:57 -07:00
Andrew Brown	f62199da8c	x64: port `load` to ISLE (#3993 ) This change moves the majority of the lowerings for CLIF's `load` instruction over to ISLE. To do so, it also migrates the previous mechanism for creating an `Amode` (`lower_to_amode`) to several ISLE rules (see `to_amode`).	2022-04-07 18:31:22 -07:00
Andrew Brown	bd6fe11ca9	cranelift: remove `load_complex` and `store_complex` (#3976 ) This change removes all variants of `load_complex` and `store_complex` from Cranelift; this is a breaking change to the instructions exposed by CLIF. The complete list of instructions removed is: `load_complex`, `store_complex`, `uload8_complex`, `sload8_complex`, `istore8_complex`, `sload8_complex`, `uload16_complex`, `sload16_complex`, `istore16_complex`, `uload32_complex`, `sload32_complex`, `istore32_complex`, `uload8x8_complex`, `sload8x8_complex`, `sload16x4_complex`, `uload16x4_complex`, `uload32x2_complex`, `sload32x2_complex`. The rationale for this removal is that the Cranelift backend now has the ability to pattern-match multiple upstream additions in order to calculate the address to access. Previously, this was not possible so the `*_complex` instructions were needed. Over time, these instructions have fallen out of use in this repository, making the additional overhead of maintaining them a chore.	2022-03-31 10:05:10 -07:00

1 2 3

113 Commits