wasmtime

Author	SHA1	Message	Date
Trevor Elliott	ad09c273c6	Don't merge loads for xmm registers (#4891 ) Do not merge loads for xmm registers, as alignment requirements currently aren't satisfied with clif lowered from wasm. Fixes #4890	2022-09-12 10:14:35 -07:00
Chris Fallin	2986f6b0ff	ABI: implement register arguments with constraints. (#4858 ) * ABI: implement register arguments with constraints. Currently, Cranelift's ABI code emits a sequence of moves from physical registers into vregs at the top of the function body, one for every register-carried argument. For a number of reasons, we want to move to operand constraints instead, and remove the use of explicitly-named "pinned vregs"; this allows for better regalloc in theory, as it removes the need to "reverse-engineer" the sequence of moves. This PR alters the ABI code so that it generates a single "args" pseudo-instruction as the first instruction in the function body. This pseudo-inst defs all register arguments, and constrains them to the appropriate registers at the def-point. Subsequently the regalloc can move them wherever it needs to. Some care was taken not to have this pseudo-inst show up in post-regalloc disassemblies, but the change did cause a general regalloc "shift" in many tests, so the precise-output updates are a bit noisy. Sorry about that! A subsequent PR will handle the other half of the ABI code, namely, the callsite case, with a similar preg-to-constraint conversion. * Update based on review feedback. * Review feedback.	2022-09-08 18:03:14 -07:00
Trevor Elliott	caad14826c	Rework the ISA flag checking extractors for x64 (#4878 ) Using fallible extractors that produce no values for flag checks means that it's not possible to pattern match cases where those flags are false. This change reworks the existing flag-checking extractors to be infallible, returning the flag's boolean value from the context instead.	2022-09-07 13:49:35 -07:00
Andrew Brown	f063082474	x64: remove `Inst::XmmLoadConst` (#4876 ) This is a cherry-pick of a long-ago commit, 2d46637. The original message reads: > Now that `SyntheticAmode` can refer to constants, there is no longer a > need for a separate instruction format--standard load instructions will > work. Since then, the transition to ISLE and the use of `XmmLoadConst` in many more places makes this change a larger diff than the original. The basic idea is the same, though: the extra indirection of `Inst::XmmLoadConst` is removed and replaced by a direct use of `VCodeConstant` as a `SyntheticAmode`. This has no effect on codegen, but the CLIF output is now clearer in that the actual instruction is displayed (e.g., `movdqu`) instead of a made-up instruction (`load_const`).	2022-09-07 12:52:13 -07:00
Nick Fitzgerald	f18a1f1488	Cranelift: Deduplicate ABI signatures during lowering (#4829 ) * Cranelift: Deduplicate ABI signatures during lowering This commit creates the `SigSet` type which interns and deduplicates the ABI signatures that we create from `ir::Signature`s. The ABI signatures are now referred to indirectly via a `Sig` (which is a `cranelift_entity` ID), and we pass around a `SigSet` to anything that needs to access the actual underlying `SigData` (which is what `ABISig` used to be). I had to change a couple of methods to return a `SmallInstVec` instead of emitting directly to work around what would otherwise be shared and exclusive borrows of the lowering context overlapping. I don't expect any of these to heap allocate in practice. This does not remove the often-unnecessary allocations caused by `ensure_struct_return_ptr_is_returned`. That is left for follow-up work. This also opens the door for further shuffling of signature data into more efficient representations in the future, now that we have `SigSet` to store it all in one place and it is threaded through all the code. We could potentially move each signature's parameter and return vectors into one big vector shared between all signatures, for example, which could cut down on allocations and shrink the size of `SigData` since those `SmallVec`s have pretty large inline capacity. Overall, this refactoring gives a 1-7% speedup for compilation on `pulldown-cmark`: ``` compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm Δ = 8754213.66 ± 7526266.23 (confidence = 99%) dedupe.so is 1.01x to 1.07x faster than main.so! [191003295 234620642.20 280597986] dedupe.so [197626699 243374855.86 321816763] main.so compilation :: cycles :: benchmarks/bz2/benchmark.wasm No difference in performance. [170406200 194299792.68 253001201] dedupe.so [172071888 193230743.11 223608329] main.so compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [3870997347 4437735062.59 5216007266] dedupe.so [4019924063 4424595349.24 4965088931] main.so ``` * Use full path instead of import to avoid warnings in some build configurations Warnings will then cause CI to fail. * Move `SigSet` into `VCode`	2022-08-31 20:39:32 +00:00
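Below is a minimal sketch of the interning idea. It uses simplified stand-in types: `Signature`, `Sig`, `SigData` and `SigSet` are illustrative only and do not reproduce Cranelift's definitions, which use `cranelift_entity` IDs and real ABI data. Each distinct signature has its data computed once and is then referred to by a small copyable ID.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified signature: parameter and return types as strings.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
pub struct Signature {
    pub params: Vec<String>,
    pub returns: Vec<String>,
}

/// Small copyable ID referring to an interned signature.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct Sig(u32);

/// Stand-in for the per-signature ABI data, computed once per unique signature.
#[derive(Debug)]
pub struct SigData {
    pub arg_count: usize,
    pub ret_count: usize,
}

/// Interning table: signature -> ID, plus the data indexed by ID.
#[derive(Default)]
pub struct SigSet {
    ids: HashMap<Signature, Sig>,
    data: Vec<SigData>,
}

impl SigSet {
    /// Returns the existing ID for `sig`, or computes its data and assigns a new ID.
    pub fn intern(&mut self, sig: &Signature) -> Sig {
        if let Some(&id) = self.ids.get(sig) {
            return id;
        }
        let id = Sig(self.data.len() as u32);
        self.data.push(SigData {
            arg_count: sig.params.len(),
            ret_count: sig.returns.len(),
        });
        self.ids.insert(sig.clone(), id);
        id
    }

    /// Looks up the shared data for an ID.
    pub fn get(&self, id: Sig) -> &SigData {
        &self.data[id.0 as usize]
    }

    /// Number of distinct signatures stored.
    pub fn len(&self) -> usize {
        self.data.len()
    }
}
```

Keying a `HashMap` on the full signature keeps lookup simple. A real implementation might choose a different key or storage layout, for example the shared parameter/return vector the commit mentions.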
Chris Fallin	186c7c3b89	x64: clean up regalloc-related semantics on several instructions. (#4811 ) * x64: clean up regalloc-related semantics on several instructions. This PR removes all uses of "modify" operands on instructions in the x64 backend, and also removes all uses of "pinned vregs", or vregs that are explicitly tied to particular physical registers. In place of both of these mechanisms, which are legacies of the old regalloc design and supported via compatibility code, the backend now uses operand constraints. This is more flexible as it allows the regalloc to see the liveranges and constraints without "reverse-engineering" move instructions. Eventually, after removing all such uses (including in other backends and by the ABI code), we can remove the compatibility code in regalloc2, significantly simplifying its liverange-construction frontend and thus allowing for higher confidence in correctness as well as possibly a bit more compilation speed. Curiously, there are a few extra move instructions now; they are likely poor splitting decisions and I can try to chase these down later. * Fix cranelift-codegen tests. * Review feedback.	2022-08-30 17:21:14 -07:00
Trevor Elliott	25d960f9c4	x64: Lower tlsvalue, sqmul_round_sat, and uunarrow in ISLE (#4793 ) Lower tlsvalue, sqmul_round_sat, and uunarrow in ISLE.	2022-08-26 16:33:48 -07:00
Chris Fallin	8e8dfdf5f9	AArch64: Migrate calls and returns to ISLE. (#4788 )	2022-08-26 16:26:39 -07:00
Trevor Elliott	ca6d648e37	x64: Ensure that constants are always 16 bytes for XmmMem (#4790 ) Ensure that constants generated for the memory case of XmmMem values are always 16 bytes, ensuring that we don't accidentally perform an unaligned load. Fixes #4761	2022-08-26 20:04:38 +00:00
Trevor Elliott	9386409607	x64: Lower extractlane, scalar_to_vector, and splat in ISLE (#4780 ) Lower extractlane, scalar_to_vector and splat in ISLE. This PR also makes some changes to the SinkableLoad api * change the return type of sink_load to RegMem as there are more functions available for dealing with RegMem * add reg_mem_to_reg_mem_imm and register it as an automatic conversion	2022-08-25 09:38:03 -07:00
Trevor Elliott	b8b6f2781e	x64: Lower shuffle and swizzle in ISLE (#4772 ) Lower `shuffle` and `swizzle` in ISLE. This PR surfaced a bug with the lowering of `shuffle` when avx512vl and avx512vbmi are enabled: we use `vpermi2b` as the implementation, but panic if the immediate shuffle mask contains any out-of-bounds values. The behavior when the avx512 extensions are not present is that out-of-bounds values are turned into `0` in the result. I've resolved this by detecting when the shuffle immediate has out-of-bounds indices in the avx512-enabled lowering, and generating an additional mask to zero out the lanes where those indices occur. This brings the avx512 case into line with the semantics of the `shuffle` op: `94bcbe8446/cranelift/codegen/meta/src/shared/instructions.rs (L1495-L1498)`	2022-08-24 21:49:51 +00:00
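The out-of-bounds behaviour described in this commit can be modelled on plain byte arrays. This is a sketch, not the backend code. `permute_low5` is an assumed model of a two-table permute that reads only the low 5 bits of each index, in the style of `vpermi2b` on 128-bit vectors. All helper names are hypothetical.

```rust
/// Reference semantics as described in the commit: mask bytes 0..=15 pick from `a`,
/// 16..=31 pick from `b`, and any larger index produces 0.
pub fn shuffle_reference(a: [u8; 16], b: [u8; 16], mask: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (i, &idx) in mask.iter().enumerate() {
        out[i] = match idx {
            0..=15 => a[idx as usize],
            16..=31 => b[(idx - 16) as usize],
            _ => 0,
        };
    }
    out
}

/// Assumed model of a two-table byte permute that reads only the low 5 bits
/// of each index, so out-of-bounds indices wrap around instead of producing 0.
pub fn permute_low5(a: [u8; 16], b: [u8; 16], mask: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (i, &idx) in mask.iter().enumerate() {
        let j = idx & 0x1f;
        out[i] = if j < 16 { a[j as usize] } else { b[(j - 16) as usize] };
    }
    out
}

/// The fix-up: an extra mask that keeps in-bounds lanes (0xff) and zeroes the rest.
pub fn in_bounds_mask(mask: [u8; 16]) -> [u8; 16] {
    mask.map(|idx| if idx < 32 { 0xff } else { 0x00 })
}

/// Permute, then AND with the extra mask so out-of-bounds lanes become 0.
pub fn shuffle_with_fixup(a: [u8; 16], b: [u8; 16], mask: [u8; 16]) -> [u8; 16] {
    let permuted = permute_low5(a, b, mask);
    let keep = in_bounds_mask(mask);
    let mut out = [0u8; 16];
    for i in 0..16 {
        out[i] = permuted[i] & keep[i]; // bitwise AND, like a `pand`
    }
    out
}
```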
Afonso Bordado	d394edcefe	x64: Mask shift amounts for small types (#4752 ) * x64: Mask shift amounts for small types * cranelift: Disable i128 shifts in fuzzer again They are fixed. But we had a bunch of fuzzgen issues come in, and we don't want to accidentally mark them as fixed * cranelift: Avoid masking shifts for 32 and 64 bit cases * cranelift: Add const shift tests and fix them * cranelift: Remove const `rotl` cases Now that `put_masked_in_imm8_gpr` works properly we can simplify rotl/rotr	2022-08-24 10:31:38 -07:00
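A scalar model shows why only the small types need masking. The sketch relies on the documented x86 behaviour that shift counts are masked to 5 bits for 8-, 16- and 32-bit operands (6 bits for 64-bit operands). That masking already gives modulo-width semantics for 32- and 64-bit shifts, but not for `i8` or `i16`. The function names are hypothetical.

```rust
/// CLIF-style `ishl.i8`: the shift amount is taken modulo the 8-bit width.
pub fn clif_ishl_i8(x: u8, amt: u32) -> u8 {
    // `wrapping_shl` masks the amount to 0..=7 for `u8`.
    x.wrapping_shl(amt)
}

/// Model of x86 `shl r/m8, cl`: the hardware masks the count to 5 bits (0..=31),
/// so counts 8..=31 shift every bit out instead of wrapping modulo 8.
pub fn x86_shl8(x: u8, cl: u8) -> u8 {
    let count = u32::from(cl & 0x1f);
    if count >= 8 {
        0
    } else {
        x << count
    }
}

/// Lowering with an explicit mask of the amount to the type width (`and cl, 7`).
pub fn lowered_ishl_i8(x: u8, amt: u8) -> u8 {
    x86_shl8(x, amt & 7)
}
```

For example, shifting `1u8` by 9 gives 2 under CLIF semantics but 0 with the unmasked hardware count. The explicit `& 7` restores the CLIF result.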
Trevor Elliott	4bdfa76370	x64: Migrate get_pinned_reg, set_pinned_reg, vconst, and raw_bitcast to ISLE (#4763 ) https://github.com/bytecodealliance/wasmtime/pull/4763	2022-08-23 16:32:00 -07:00
Trevor Elliott	b5f1ab7780	x64: Lower stack_addr, udiv, sdiv, urem, srem, umulhi, smulhi in ISLE (#4741 ) Lower stack_addr, udiv, sdiv, urem, srem, umulhi, and smulhi in ISLE. For udiv, sdiv, urem, and srem I opted to move the original lowering into an extern constructor, as the interactions with rax and rdx for the div instruction didn't seem meaningful to implement in ISLE. However, I'm happy to revisit this choice and move more of the embedding into ISLE.	2022-08-23 11:22:49 -07:00
Trevor Elliott	cee4b209f3	x64: Lower fcopysign, ceil, floor, nearest, and trunc in ISLE (#4730 ) https://github.com/bytecodealliance/wasmtime/pull/4730	2022-08-22 13:57:36 -07:00
Trevor Elliott	8b6019909b	x64: Lower widening and narrowing operations in ISLE (#4722 ) Lower uwiden_high, uwiden_low, swiden_high, swiden_low, snarrow, and unarrow in ISLE.	2022-08-18 11:53:24 -07:00
Trevor Elliott	fbfceaec98	x64: Migrate iadd_pairwise to ISLE (#4718 ) * Add a test for iadd_pairwise with swiden input * Implement iadd_pairwise for swiden_{low,high} input * Add a test case for iadd_pairwise with uwiden input * Implement iadd_pairwise with uwiden	2022-08-16 12:21:06 -07:00
Nick Fitzgerald	e0d4934ef4	Cranelift: Remove the `ABICaller` trait (#4711 ) * Cranelift: Remove the `ABICaller` trait It has only one implementation: the `ABICallerImpl` struct. We can just use that directly rather than having extra, unnecessary layers of generics and abstractions. * Cranelift: Rename `ABICallerImpl` to `Caller`	2022-08-15 20:41:08 +00:00
Nick Fitzgerald	532fb22af6	Cranelift: Remove the `LowerCtx` trait (#4697 ) The trait had only one implementation: the `Lower` struct. It is easier to just use that directly, and not introduce unnecessary layers of generics and abstractions. Once upon a time, there was hope that we would have other implementations of the `LowerCtx` trait, that did things like lower CLIF to SMTLIB for verification. However, this is not practical these days given the way that the trait has evolved over time, and our verification efforts are focused on ISLE now anyways, and we're actually making some progress on that front (much more than anyone ever did on a second `LowerCtx` trait implementation!)	2022-08-11 16:54:17 -07:00
Afonso Bordado	3ea1813173	x64: Add native lowering for scalar `fma` (#4539 ) Use `vfmadd213{ss,sd}` for these lowerings.	2022-08-11 22:48:16 +00:00
Trevor Elliott	0c2e0494bd	x64: Lower fcvt_from_uint in ISLE (#4684 ) * Add a test for the existing behavior of fcvt_from_uint * Migrate the I8, I16, I32 cases of fcvt_from_uint * Implement the I64 case of fcvt_from_uint * Add a test for the existing behavior of fcvt_from_uint.f64x2 * Migrate fcvt_from_uint.f64x2 to ISLE * Lower the last case of `fcvt_from_uint` * Add a test for `fcvt_from_uint` * Finish lowering fcvt_from_uint * Format	2022-08-11 12:28:41 -07:00
Trevor Elliott	63c2d1e0c3	x64: Remove unnecessary register use when comparing against constants (#4645 ) https://github.com/bytecodealliance/wasmtime/pull/4645	2022-08-09 23:53:51 +00:00
Trevor Elliott	cd847d071d	x64: Migrate br_table to ISLE (#4615 ) https://github.com/bytecodealliance/wasmtime/pull/4615	2022-08-04 22:12:37 +00:00
Trevor Elliott	dc8362ceec	x64: Finish migrating brz and brnz to ISLE (#4614 ) https://github.com/bytecodealliance/wasmtime/pull/4614	2022-08-04 12:58:43 -07:00
Trevor Elliott	1fc11bbe51	x64: Migrate brff and I128 branching instructions to ISLE (#4599 ) https://github.com/bytecodealliance/wasmtime/pull/4599	2022-08-04 08:58:50 -07:00
Trevor Elliott	301be7438e	x64: Begin migrating branch instructions to ISLE (#4587 ) https://github.com/bytecodealliance/wasmtime/pull/4587	2022-08-03 20:28:52 +00:00
Ulrich Weigand	b9dd48e34b	[s390x, abi_impl] Support struct args using explicit pointers (#4585 ) This adds support for StructArgument on s390x. The ABI for this platform requires that the address of the buffer holding the copy of the struct argument is passed from caller to callee as a hidden pointer, using a register or overflow stack slot. To implement this, I've added an optional "pointer" field to ABIArg::StructArg, and code to handle the pointer both in common abi_impl code and the s390x back-end. One notable change necessary to make this work involved the "copy_to_arg_order" mechanism. Currently, for struct args we only need to copy the data (and that needs to happen before setting up any other args), while for non-struct args we only need to set up the appropriate registers or stack slots. This order is ensured by sorting the arguments appropriately into a "copy_to_arg_order" list. However, for struct args with explicit pointers we need to both copy the data (again, before everything else) and set up a register or stack slot. Since we now need to touch the argument twice, we cannot solve the ordering problem by a simple sort. Instead, the abi_impl common code now provides two callbacks, emit_copy_regs_to_buffer and emit_copy_regs_to_arg, and expects the back end to first call emit_copy_regs_to_buffer for all args, and then call emit_copy_regs_to_arg for all args. This required updates to all back ends. In the s390x back end, in addition to the new ABI code, I'm now adding code to actually copy the struct data, using the MVC instruction (for small buffers) or a memcpy libcall (for larger buffers). This also requires a bit of new infrastructure: - MVC is the first memory-to-memory instruction we use, which needed a bit of memory argument tweaking - We also need to set up the infrastructure to emit libcalls. (This implements the first half of issue #4565.)	2022-08-03 19:00:07 +00:00
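The two-phase ordering can be sketched with a toy argument list. First come all buffer copies, then register setup, including the hidden pointers. `Arg`, `Op` and `emit_call_args` are hypothetical stand-ins for the `emit_copy_regs_to_buffer` / `emit_copy_regs_to_arg` callbacks, and stack slots are omitted for brevity.

```rust
/// Hypothetical argument kinds.
#[derive(Debug)]
pub enum Arg {
    /// Value passed directly in a register (register number is illustrative).
    Reg { reg: u8 },
    /// Struct passed by copying `size` bytes into a buffer; with `pointer` set,
    /// the buffer's address is also passed in that register (hidden pointer).
    Struct { size: u32, pointer: Option<u8> },
}

/// Operations recorded instead of emitting real machine code.
#[derive(Debug, PartialEq)]
pub enum Op {
    CopyToBuffer { arg: usize, size: u32 },
    SetReg { reg: u8, arg: usize },
    SetPointer { reg: u8, arg: usize },
}

pub fn emit_call_args(args: &[Arg]) -> Vec<Op> {
    let mut ops = Vec::new();
    // Phase 1: copy every struct argument's data before any other argument setup.
    for (i, arg) in args.iter().enumerate() {
        if let Arg::Struct { size, .. } = arg {
            ops.push(Op::CopyToBuffer { arg: i, size: *size });
        }
    }
    // Phase 2: set up registers, touching pointer-carrying struct args a second time.
    for (i, arg) in args.iter().enumerate() {
        match arg {
            Arg::Reg { reg } => ops.push(Op::SetReg { reg: *reg, arg: i }),
            Arg::Struct { pointer: Some(reg), .. } => {
                ops.push(Op::SetPointer { reg: *reg, arg: i })
            }
            Arg::Struct { pointer: None, .. } => {}
        }
    }
    ops
}
```

A single sorted order cannot express this, because a pointer-carrying struct argument appears in both phases.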
Afonso Bordado	709716bb8e	cranelift: Implement scalar FMA on x86 (#4460 ) x86 does not have dedicated instructions for scalar FMA; lower to a libcall, which seems to be what LLVM does.	2022-08-03 10:29:10 -07:00
Nick Fitzgerald	42bba452a6	Cranelift: Add instructions for getting the current stack/frame/return pointers (#4573 ) * Cranelift: Add instructions for getting the current stack/frame pointers and return address This is the initial part of https://github.com/bytecodealliance/wasmtime/issues/4535 * x64: Remove `Amode::RbpOffset` and use `Amode::ImmReg` instead We just special case getting operands from `Amode`s now. * Fix s390x `get_return_address`; require `preserve_frame_pointers=true` * Assert that `Amode::ImmRegRegShift` doesn't use rbp/rsp * Handle non-allocatable registers in Amode::with_allocs * Use "stack" instead of "r15" on s390x * r14 is an allocatable register on s390x, so it shouldn't be used with `MovPReg`	2022-08-02 14:37:17 -07:00
Trevor Elliott	29d4edc76b	x64: Migrate call and call_indirect to ISLE (#4542 ) https://github.com/bytecodealliance/wasmtime/pull/4542	2022-07-28 13:10:03 -07:00
Afonso Bordado	0508932174	cranelift: Align Scalar and SIMD shift semantics (#4520 ) * cranelift: Reorganize test suite Group some SIMD operations by instruction. * cranelift: Deduplicate some shift tests Also, new tests with the mod behaviour * aarch64: Lower shifts with mod behaviour * x64: Lower shifts with mod behaviour * wasmtime: Don't mask SIMD shifts	2022-07-27 17:54:00 +00:00
Trevor Elliott	7ac6134894	x64: Shrink Inst from 72 to 48 bytes (#4514 ) https://github.com/bytecodealliance/wasmtime/pull/4514	2022-07-27 10:39:22 -07:00
Trevor Elliott	9e9e043174	x64: Migrate the return and fallthrough_return lowerings to ISLE (#4518 ) https://github.com/bytecodealliance/wasmtime/pull/4518	2022-07-25 21:28:52 +00:00
Andrew Brown	8629cbc6a4	x64: port `atomic_rmw` to ISLE (#4389 ) * x64: port `atomic_rmw` to ISLE This change ports `atomic_rmw` to ISLE for the x64 backend. It does not change the lowering in any way, though it seems possible that the fixed regs need not be as fixed and that there are opportunities for single instruction lowerings. It does rename `inst_common::AtomicRmwOp` to `MachAtomicRmwOp` to disambiguate with the IR enum with the same name. * x64: remove remaining hardcoded register constraints for `atomic_rmw` * x64: use `SyntheticAmode` in `AtomicRmwSeq` * review: add missing reg collector for amode * review: collect memory registers in the 'late' phase	2022-07-06 23:58:59 +00:00
Sam Parker	a2d49ebf27	Use u32 in Type API (#4280 ) Move from passing and returning u8 and u16 values to u32 in many of the functions. This removes a number of type conversions and gives a small compilation time speedup, around ~0.7% on my aarch64 machine. Copyright (c) 2022, Arm Limited.	2022-06-30 12:43:36 -07:00
Chris Fallin	b2e28b917a	Cranelift: update to latest regalloc2: (#4324 ) - Handle call instructions' clobbers with the clobbers API, using RA2's clobbers bitmask (bytecodealliance/regalloc2#58) rather than clobbers list; - Pull in changes from bytecodealliance/regalloc2#59 for much more sane edge-case behavior w.r.t. liverange splitting.	2022-06-28 09:01:59 -07:00
Trevor Elliott	7e0bb465d0	X64: port the rest of icmp to ISLE (#4254 ) Finish migrating icmp to ISLE for x64	2022-06-13 16:34:11 -07:00
Chris Fallin	ed9db962de	x64 backend: fix cmpxchg (don't return RealReg as result). (#4243 ) The current lowering helper for `cmpxchg` returns the literal RealReg `rax` as its result. However, this breaks a number of invariants, and eventually causes a regalloc panic if used as a blockparam arg (pinned vregs cannot be used in this way). In general we have to return regular vregs, not a RealReg, as results of instructions during lowering. However #4223 added a helper for `x64_cmpxchg` that returns a literal `rax`. Fortunately we can do the right thing here by just giving a fresh vreg to the instruction; the regalloc constraints mean that this vreg is constrained to `rax` at the instruction (at its def/late point), so the generator of the instruction need not worry about `rax` here.	2022-06-08 06:13:31 -07:00
Andrew Brown	6df56e6aa6	x64: port `atomic_cas` to ISLE (#4223 )	2022-06-06 13:20:33 -07:00
Andrew Brown	816aae6aca	x64: port some atomics to ISLE (#4212 ) * x64: port `fence` to ISLE * x64: port `atomic_load` to ISLE * x64: port `atomic_store` to ISLE	2022-06-02 14:13:10 -07:00
Chris Fallin	eb435f3057	x64: use constant pool for u64 constants rather than `movabs`. (#4088 ) * Allow emitting u64 constants into constant pool. * Use constant pool for constants on x64 that do not fit in a simm32 and are needed as a RegMem or RegMemImm. * Fix rip-relative addressing bug in pinsrd emission.	2022-05-10 09:21:05 -07:00
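The "does not fit in a simm32" test amounts to checking whether the 64-bit value survives a round trip through a sign-extended 32-bit immediate. Below is a minimal sketch; the names are hypothetical and not taken from the x64 backend.

```rust
/// True if `value`, read as a 64-bit two's-complement integer, equals the
/// sign extension of its low 32 bits (i.e. it can be encoded as a simm32).
pub fn fits_in_simm32(value: u64) -> bool {
    let v = value as i64;
    v == (v as i32) as i64
}

/// Hypothetical choice between an immediate operand and a constant-pool load.
#[derive(Debug, PartialEq)]
pub enum ConstOperand {
    Imm32(i32),
    ConstantPool(u64),
}

pub fn classify_const(value: u64) -> ConstOperand {
    if fits_in_simm32(value) {
        ConstOperand::Imm32(value as i64 as i32)
    } else {
        ConstOperand::ConstantPool(value)
    }
}
```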
Chris Fallin	f85047b084	Rework x64 addressing-mode lowering to be slightly more flexible. (#4080 ) This PR refactors the x64 backend address-mode lowering to use an incremental-build approach, where it considers each node in a tree of `iadd`s that feed into a load/store address and, at each step, builds the best possible `Amode`. It will combine an arbitrary number of constant offsets (an extension beyond the current rules), and can capture a left-shifted (scaled) index in any position of the tree (another extension). This doesn't have any measurable performance improvement on our Wasm benchmarks in Sightglass, unfortunately, because the IR lowered from wasm32 will do address computation in 32 bits and then `uextend` it to add to the 64-bit heap base. We can't quite lift the 32-bit adds to 64 bits because this loses the wraparound semantics. (We could label adds as "expected not to overflow", and allow those to be lifted to 64 bit operations; wasm32 heap address computation should fit this. This is `add nuw` (no unsigned wrap) in LLVM IR terms. That's likely my next step.) Nevertheless, (i) this generalizes the cases we can handle, which should be a good thing, all other things being equal (and in this case, no compile time impact was measured); and (ii) might benefit non-Wasm frontends.	2022-05-02 16:20:39 -07:00
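The incremental approach can be sketched over a toy expression tree. The sketch combines any number of constants and takes one scaled index from any position. `Expr`, `Amode` and `to_amode` are hypothetical. Real lowering operates on CLIF values in ISLE and would materialize leftover terms into a register instead of failing.

```rust
/// Toy address expression (hypothetical stand-in for CLIF `iadd`/`iconst`/`ishl`).
#[derive(Debug)]
pub enum Expr {
    Reg(u32),
    Const(i64),
    Add(Box<Expr>, Box<Expr>),
    Shl(Box<Expr>, u8),
}

/// base + (index << shift) + offset, shift in 0..=3 (scale 1, 2, 4, 8).
#[derive(Debug, Default, PartialEq)]
pub struct Amode {
    pub base: Option<u32>,
    pub index: Option<(u32, u8)>,
    pub offset: i32,
}

/// Folds one node of the add tree into `amode`; returns false if it does not fit.
fn add_to_amode(amode: &mut Amode, expr: &Expr) -> bool {
    match expr {
        Expr::Add(lhs, rhs) => add_to_amode(amode, lhs) && add_to_amode(amode, rhs),
        Expr::Const(c) => {
            // Combine constant offsets as long as the running sum fits in i32.
            let sum = match (amode.offset as i64).checked_add(*c) {
                Some(s) => s,
                None => return false,
            };
            if sum < i32::MIN as i64 || sum > i32::MAX as i64 {
                return false;
            }
            amode.offset = sum as i32;
            true
        }
        // A register shifted left by 0..=3 can become the scaled index.
        Expr::Shl(inner, shift) if *shift <= 3 && amode.index.is_none() => {
            if let Expr::Reg(r) = &**inner {
                amode.index = Some((*r, *shift));
                true
            } else {
                false
            }
        }
        // Plain registers fill the base first, then an unscaled index.
        Expr::Reg(r) => {
            if amode.base.is_none() {
                amode.base = Some(*r);
                true
            } else if amode.index.is_none() {
                amode.index = Some((*r, 0));
                true
            } else {
                false
            }
        }
        _ => false,
    }
}

pub fn to_amode(expr: &Expr) -> Option<Amode> {
    let mut amode = Amode::default();
    if add_to_amode(&mut amode, expr) {
        Some(amode)
    } else {
        None
    }
}
```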
Chris Fallin	03793b71a7	ISLE: remove all uses of argument polarity, and remove it from the language. (#4091 ) This PR removes "argument polarity": the feature of ISLE extractors that lets them take inputs aside from the value to be matched. Cases that need this expressivity have been subsumed by #4072 with if-let clauses; we can now finally remove this misfeature of the language, which has caused significant confusion and has always felt like a bit of a hack. This PR (i) removes the feature from the ISLE compiler; (ii) removes it from the reference documentation; and (iii) refactors away all uses of the feature in our three existing backends written in ISLE.	2022-05-02 09:52:12 -07:00
Chris Fallin	164bfeaf7e	x64 backend: migrate stores, and remainder of loads (I128 case), to ISLE. (#4069 )	2022-04-26 09:50:46 -07:00
Chris Fallin	e4b7c8a737	Cranelift: fix #3953 : rework single/multiple-use logic in lowering. (#4061 ) * Cranelift: fix #3953: rework single/multiple-use logic in lowering. This PR addresses the longstanding issue with loads trying to merge into compares on x86-64, and more generally, with the lowering framework falsely recognizing "single uses" of one op by another (which would normally allow merging of side-effecting ops like loads) when there is indirect duplication. To fix this, we replace the direct `value_uses` count with a transitive notion of uniqueness (not unlike Rust's `&`/`&mut` and how a `&mut` downgrades to `&` when accessed through another `&`!). A value is used multiple times transitively if it has multiple direct uses, or is used by another op that is used multiple times transitively. The canonical example of badness is: ``` v1 := load v2 := ifcmp v1, ... v3 := selectif v2, ... v4 := selectif v2, ... ``` both `v3` and `v4` effectively merge the `ifcmp` (`v2`), so even though the use of `v1` is "unique", it is codegenned twice. This is why we ~~can't have nice things~~ can't merge loads into compares (#3953). There is quite a subtle and interesting design space around this problem and how we might solve it. See the long doc-comment on `ValueUseState` in this PR for more justification for the particular design here. In particular, this design deliberately simplifies a bit relative to an "optimal" solution: some uses can become unique depending on merging, but we don't design our data structures for such updates because that would require significant extra costly tracking (some sort of transitive refcounting). For example, in the above, if `selectif` somehow did not merge `ifcmp`, then we would only codegen the `ifcmp` once into its result register (and use that register twice); then the load is uniquely used, and could be merged. But that requires transitioning from "multiple use" back to "unique use" with careful tracking as we do pattern-matching, which I've chosen to make out-of-scope here for now. In practice, I don't think it will matter too much (and we can always improve later). With this PR, we can now re-enable load-op merging for compares. A subsequent commit does this. * Update x64 backend to allow load-op merging for `cmp`. * Update filetests. * Add test for cmp-mem merging on x64. * Comment fixes. * Rework ValueUseState analysis for better performance. * Update s390x filetest: iadd_ifcout cannot merge loads anymore because it has multiple outputs (ValueUseState limitation) * Address review comments.	2022-04-22 18:00:48 -07:00
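The transitive rule from this commit can be sketched on a toy SSA graph. Under that rule, a value counts as multiply used if it has several direct uses, or if it is used by an op that is itself multiply used. This is a simplified model, not Cranelift's `ValueUseState` analysis. It assumes values are numbered in definition order, so a reverse walk visits every user before its operands.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum UseState {
    Unused,
    Once,
    Multiple,
}

/// `operands[v]` lists the values used by the instruction defining value `v`.
/// Assumes definition order: every operand index is smaller than its user's.
pub fn compute_use_states(operands: &[Vec<usize>]) -> Vec<UseState> {
    let n = operands.len();
    let mut state = vec![UseState::Unused; n];
    // Walk users before operands, so state[v] is final when v is visited.
    for v in (0..n).rev() {
        let user_is_multiple = state[v] == UseState::Multiple;
        for &op in &operands[v] {
            state[op] = match (state[op], user_is_multiple) {
                // Anything used by a multiply-used op is itself multiply used.
                (_, true) => UseState::Multiple,
                (UseState::Unused, false) => UseState::Once,
                // A second direct use.
                _ => UseState::Multiple,
            };
        }
    }
    state
}
```

In the test, the `load`/`ifcmp`/two-`selectif` example is encoded as `[[], [0], [1], [2], [2]]`, where value 0 is a hypothetical address. The load (value 1) comes out as `Multiple`, so it would not be merged. With only one `selectif`, the load becomes `Once`.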
Chris Fallin	a0318f36f0	Switch Cranelift over to regalloc2. (#3989 ) This PR switches Cranelift over to the new register allocator, regalloc2. See [this document](https://gist.github.com/cfallin/08553421a91f150254fe878f67301801) for a summary of the design changes. This switchover has implications for core VCode/MachInst types and the lowering pass. Overall, this change brings improvements to both compile time and speed of generated code (runtime), as reported in #3942: ``` Benchmark Compilation (wallclock) Execution (wallclock) blake3-scalar 25% faster 28% faster blake3-simd no diff no diff meshoptimizer 19% faster 17% faster pulldown-cmark 17% faster no diff bz2 15% faster no diff SpiderMonkey, 21% faster 2% faster fib(30) clang.wasm 42% faster N/A ```	2022-04-14 10:28:21 -07:00
Andrew Brown	f62199da8c	x64: port `load` to ISLE (#3993 ) This change moves the majority of the lowerings for CLIF's `load` instruction over to ISLE. To do so, it also migrates the previous mechanism for creating an `Amode` (`lower_to_amode`) to several ISLE rules (see `to_amode`).	2022-04-07 18:31:22 -07:00
Andrew Brown	e92cbfb283	x64: port `icmp` to ISLE (#3886 ) * x64: port GPR-held `icmp` to ISLE * x64: port equality `icmp` for i128 type * x64: port `icmp` for vector types * x64: rename from_intcc to intcc_to_cc	2022-03-18 11:22:09 -07:00
Chris Fallin	24f145cd1e	Migrate clz, ctz, popcnt, bitrev, is_null, is_invalid on x64 to ISLE. (#3848 )	2022-02-28 09:45:13 -08:00
Andrew Brown	f87c61176a	x64: port select to ISLE (#3682 ) * x64: port `select` using an FP comparison to ISLE This change includes quite a few interlocking parts, required mainly by the current x64 conventions in ISLE: - it adds a way to emit a `cmove` with multiple OR-ing conditions; because x64 ISLE cannot currently safely emit a comparison followed by several jumps, this adds `MachInst::CmoveOr` and `MachInst::XmmCmoveOr` macro instructions. Unfortunately, these macro instructions hide the multi-instruction sequence in `lower.isle` - to properly keep track of what instructions consume and produce flags, @cfallin added a way to pass around variants of `ConsumesFlags` and `ProducesFlags`--these changes affect all backends - then, to lower the `fcmp + select` CLIF, this change adds several `cmove*_from_values` helpers that perform all of the awkward conversions between `Value`, `ValueReg`, `Reg`, and `Gpr/Xmm`; one upside is that now these lowerings have much-improved documentation explaining why the various `FloatCC` and `CC` choices are made the way they are. Co-authored-by: Chris Fallin <chris@cfallin.org>	2022-02-23 10:03:16 -08:00
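One case that needs an OR of conditions can be modelled in scalar code, assuming the compare is lowered to `ucomisd`. CLIF `fcmp ne` is also true for unordered (NaN) inputs, which means "ZF clear or PF set". x86 has no single condition code for that. The sketch below is a hypothetical model, not the `CmoveOr` implementation.

```rust
/// Subset of x86 EFLAGS written by `ucomisd` (modelled, not emitted).
#[derive(Clone, Copy, Debug)]
pub struct Flags {
    pub zf: bool,
    pub pf: bool,
    pub cf: bool,
}

/// Models `ucomisd a, b`: unordered sets ZF, PF and CF; equal sets only ZF;
/// less-than sets only CF; greater-than clears all three.
pub fn ucomisd(a: f64, b: f64) -> Flags {
    if a.is_nan() || b.is_nan() {
        Flags { zf: true, pf: true, cf: true }
    } else if a == b {
        Flags { zf: true, pf: false, cf: false }
    } else if a < b {
        Flags { zf: false, pf: false, cf: true }
    } else {
        Flags { zf: false, pf: false, cf: false }
    }
}

/// `select (fcmp ne a, b), x, y`: start with `y`, then two conditional moves
/// whose conditions are effectively OR-ed together.
pub fn select_fcmp_ne(a: f64, b: f64, x: u64, y: u64) -> u64 {
    let flags = ucomisd(a, b);
    let mut dst = y;
    if !flags.zf {
        dst = x; // cmovnz
    }
    if flags.pf {
        dst = x; // cmovp
    }
    dst
}
```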

72 Commits