In the `ModuleEnvironment::declare_signature` callback, also pass the original
Wasm function signature, so that consumers may associate this information with
each compiled function. This is often necessary because while each Wasm
signature gets compiled down into a single native signature, multiple Wasm
signatures might compile down into the same native signature, and in these cases
the original Wasm signature is required for dynamic type checking of calls.
About half of the `FuncEnvironment::translate_table_*` methods were using the
`TableIndex` newtype, while the other half were using raw `u32`s. This commit
makes everything use `TableIndex`.
The InsertLane format has an ordering (`value().imm().value()`) and immediate name (`"lane"`) that make it awkward to use for other instructions. This changes the ordering (`value().value().imm()`) and uses the default name (`"imm"`) throughout the codebase.
* Encode vselect using BLEND instructions on x86
* Legalize vselect to bitselect
* Optimize bitselect to vselect for some operands
* Add run tests for bitselect-vselect optimization
* Address review feedback
* Ensure GlobalSet on vectors are cast to Cranelift's I8X16 type
This is a fix related to the decision to use Cranelift's I8X16 type to represent Wasm's V128--it requires casting to maintain type correctness. See https://github.com/bytecodealliance/wasmtime/issues/1147.
* Enable SIMD spec test: simd_lane.wast
I'm not actually sure that it's possible to write `#[test]` in a
`proc-macro` crate. Regardless I don't think it's too too conventional,
so let's disable this for now.
Closes#1775
A full Eq implementation is no needed for ReservedValue, as we only need
to check whether a value is the reserved one. For entities (defined with
`entity_impl!`) this doesn't make much difference, but for more
complicated types this avoids generating redundant `Eq`s.
I hadn't realized before that the filetest backend for `test vcode` is
doing essentially what `compile` is doing, but for new (`MachInst`)
backends: it is just getting a disassembly and running it through
filecheck. There's no reason not to reuse `test compile` for the AArch64
tests as well.
This was motivated by the desire to have "this IR compiles successfully"
tests work on both x86 and AArch64. It seems this should work fine by
adding multiple `target` directives when a test case should be
compile-tested on multiple architectures.
This commit prevents updating the XMM save unwind operation offsets when a
frame pointer is not used, even though currently Cranelift always uses a
frame pointer.
This will prevent incorrect unwind information in the future when we start
omitting frame pointers.
This commit fixes both how FPR callee-saved registers are saved and how the
shadow space allocation occurs when laying out the stack for Windows x64
calling convention.
Importantly, this commit removes the compiler limitation of stack size for
Windows x64 that was imposed because FPR saves previously couldn't always be
represented in the unwind information.
The FPR saves are now performed without using stack slots, much like how the
callee-saved GPRs are saved. The total CSR space is given to `layout_stack` so
that it is included in the frame size and to offset the layout of spills and
explicit slots.
The FPR saves are now done via an RSP offset (post adjustment) and they always
follow the GPR saves on the stack. A simpler calculation can now be made to
determine the proper offsets of the FPR saves for representing the unwind
information.
Additionally, the shadow space is no longer treated as an incoming argument,
but an explicit stack slot that gets laid out at the lowest address possible in
the local frame. This prevents `layout_stack` from putting a spill or explicit
slot in this reserved space. In the future, `layout_stack` should take
advantage of the *caller-provided* shadow space for spills, but this commit does
not attempt to address that.
The shadow space is now omitted from the local frame for leaf functions.
Fixes#1728.
Fixes#1587.
Fixes#1475.
This patch fixes a subtle bug that occurred in the MachBuffer branch
optimization: in tracking labels at the current buffer tail using a
sorted-by-offset array, the code did not update this array properly when
redirecting labels. As a result, the dead-branch removal was unsafe,
because not every label pointing to a branch is guaranteed to be
redirected properly first.
Discovered while doing performance testing: bz2 silently took a wrong
branch and exited compression early. (Eek!)
To address this problem, this patch adopts a slightly simpler data
structure: we only track the labels *at the current buffer tail*, and
*at the start of each branch*, and we're careful to update these
appropriately to maintain the invariants. I'm pretty confident that this
is correct now, but we should (still) fuzz it a bunch, because wrong
control flow scares me a nonzero amount. I should probably also actually
write out a formal proof that these data-structure updates are correct.
The optimizations are important for performance (removing useless empty
blocks, and taking advantage of any fallthrough opportunities at all),
so I don't think we would want to drop them entirely.