This avoids the set uniqueness (hashing) test, reduces memory
churn when re-mapping virtual register onto real registers, and is
generally more memory-efficient.
The InsertLane format has an ordering (`value().imm().value()`) and immediate name (`"lane"`) that make it awkward to use for other instructions. This changes the ordering (`value().value().imm()`) and uses the default name (`"imm"`) throughout the codebase.
* Encode vselect using BLEND instructions on x86
* Legalize vselect to bitselect
* Optimize bitselect to vselect for some operands
* Add run tests for bitselect-vselect optimization
* Address review feedback
* Ensure GlobalSet on vectors are cast to Cranelift's I8X16 type
This is a fix related to the decision to use Cranelift's I8X16 type to represent Wasm's V128--it requires casting to maintain type correctness. See https://github.com/bytecodealliance/wasmtime/issues/1147.
* Enable SIMD spec test: simd_lane.wast
I'm not actually sure that it's possible to write `#[test]` in a
`proc-macro` crate. Regardless I don't think it's too too conventional,
so let's disable this for now.
Closes#1775
A full Eq implementation is no needed for ReservedValue, as we only need
to check whether a value is the reserved one. For entities (defined with
`entity_impl!`) this doesn't make much difference, but for more
complicated types this avoids generating redundant `Eq`s.
I hadn't realized before that the filetest backend for `test vcode` is
doing essentially what `compile` is doing, but for new (`MachInst`)
backends: it is just getting a disassembly and running it through
filecheck. There's no reason not to reuse `test compile` for the AArch64
tests as well.
This was motivated by the desire to have "this IR compiles successfully"
tests work on both x86 and AArch64. It seems this should work fine by
adding multiple `target` directives when a test case should be
compile-tested on multiple architectures.
This commit prevents updating the XMM save unwind operation offsets when a
frame pointer is not used, even though currently Cranelift always uses a
frame pointer.
This will prevent incorrect unwind information in the future when we start
omitting frame pointers.
This commit fixes both how FPR callee-saved registers are saved and how the
shadow space allocation occurs when laying out the stack for Windows x64
calling convention.
Importantly, this commit removes the compiler limitation of stack size for
Windows x64 that was imposed because FPR saves previously couldn't always be
represented in the unwind information.
The FPR saves are now performed without using stack slots, much like how the
callee-saved GPRs are saved. The total CSR space is given to `layout_stack` so
that it is included in the frame size and to offset the layout of spills and
explicit slots.
The FPR saves are now done via an RSP offset (post adjustment) and they always
follow the GPR saves on the stack. A simpler calculation can now be made to
determine the proper offsets of the FPR saves for representing the unwind
information.
Additionally, the shadow space is no longer treated as an incoming argument,
but an explicit stack slot that gets laid out at the lowest address possible in
the local frame. This prevents `layout_stack` from putting a spill or explicit
slot in this reserved space. In the future, `layout_stack` should take
advantage of the *caller-provided* shadow space for spills, but this commit does
not attempt to address that.
The shadow space is now omitted from the local frame for leaf functions.
Fixes#1728.
Fixes#1587.
Fixes#1475.