The `bitmask.{8x16,16x8,32x4}` instructions do not map neatly to any single
AArch64 SIMD instruction, and instead need a sequence of around ten
instructions. Because of this, the patch is somewhat longer and more complex
than it would be for (e.g.) x64.
The main changes are:
* the relevant testsuite test (`simd_boolean.wast`) has been enabled on aarch64.
* at the CLIF level, add a new instruction `vhigh_bits`, into which these wasm
instructions are to be translated.
* in the wasm->CLIF translation (code_translator.rs), translate these
instructions into `vhigh_bits`. This is straightforward; a sketch follows this
list.
* in the CLIF->AArch64 translation (lower_inst.rs), translate `vhigh_bits`
into equivalent sequences of AArch64 instructions. There is a different
sequence for each of the `{8x16, 16x8, 32x4}` variants.
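For illustration, the i8x16 case of that translation is roughly the following
fragment (helper names such as `pop1_with_bitcast` are assumptions here, used
elsewhere in code_translator.rs):

```rust
// Rough sketch of the match arm in code_translator.rs: pop the v128 operand
// (bitcast to the appropriate CLIF vector type) and push the scalar result of
// the new `vhigh_bits` instruction.
Operator::I8x16Bitmask => {
    let vector = pop1_with_bitcast(state, I8X16, builder);
    state.push1(builder.ins().vhigh_bits(I32, vector));
}
```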
All other changes are AArch64-specific, and add instruction definitions needed
by the previous step:
* Add two new families of AArch64 instructions: `VecShiftImm` (vector shift by
immediate) and `VecExtract` (effectively a double-length vector shift)
* To the existing AArch64 family `VecRRR`, add a `zip1` variant. To the
`VecLanesOp` family add an `addv` variant.
* Add supporting code for the above changes to AArch64 instructions:
- getting the register uses (`aarch64_get_regs`)
- mapping the registers (`aarch64_map_regs`)
- printing instructions
- emitting instructions (`impl MachInstEmit for Inst`). The handling of
`VecShiftImm` is a bit complex.
- emission tests for new instructions and variants.
The test that triggers the rebuild of the peephole optimizers is in the
`cranelift-codegen` crate, not the umbrella `cranelift` crate. It was previously
"successfully" running zero tests and then "successfully" reporting no `git diff`,
because no peephole optimizers were ever rebuilt.
This change fixes that so we run the correct test, which actually triggers the
rebuilding of the peephole optimizers.
Similar to other data structures owned by the `Store`, there's no need
for `Instance` to have a strong `Arc` reference; instead it's sufficient
for the `Store` to hold the owning reference.
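A minimal sketch of the resulting ownership shape (types and fields here are
illustrative, not the actual wasmtime definitions):

```rust
use std::sync::Arc;

struct InstanceData; // stand-in for the real per-instance state

// Illustrative only: the `Store` keeps the single strong reference ...
struct Store {
    instances: Vec<Arc<InstanceData>>,
}

// ... so an `Instance` handle just records which slot it refers to, rather
// than carrying its own strong `Arc`.
struct Instance {
    index: usize,
}
```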
In the current translation of wasm (128-bit) SIMD into CLIF, we work around differences in the
type system models of wasm vs CLIF by inserting `bitcast` (a no-op cast) CLIF instructions before
more or less every use of a SIMD value. Unfortunately this was not being done consistently, and
even small examples with a single if-then-else diamond that produces a SIMD value could cause a
verification failure downstream. In this case, the jump out of the "else" block needed a
bitcast but didn't have one.
This patch wraps creation of CLIF jumps and conditional branches up into three functions,
`canonicalise_then_jump` and `canonicalise_then_br{z,nz}`, and uses them consistently. They
first cast the relevant block formal parameters, then generate the relevant kind of branch/jump.
Hence, provided they are also used consistently in future to generate branches/jumps in this
file, we are protected against such failures.
The patch also adds a large(ish) comment at the top explaining this in more detail.
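For illustration, the jump helper has roughly this shape (approximate; the real
signature and the canonicalising helper live in code_translator.rs):

```rust
// Approximate sketch: bitcast any V128 values whose CLIF type does not match
// the destination block's formal parameter types, then emit the jump.
fn canonicalise_then_jump(
    builder: &mut FunctionBuilder,
    destination: ir::Block,
    params: &[ir::Value],
) -> ir::Inst {
    // `canonicalise_v128_values` is assumed here as the helper that inserts
    // the required `bitcast` instructions and returns the adjusted values.
    let canonicalised = canonicalise_v128_values(builder, params);
    builder.ins().jump(destination, &canonicalised)
}
```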
There's no need to name each export since each synthetic instance we're
creating only has one export, so let's use the empty string, which
doesn't require any allocations.
Fixes the RUSTSEC-2020-0054 warning from cargo-audit/cargo-deny by following the recommendation to switch to the new, maintained `directories-next` crate fork.
This only affects the cache directory determination for the environment, and it was a simple search-and-replace to this fork, so I don't think behavior has changed.
https://rustsec.org/advisories/RUSTSEC-2020-0054
This commit adds initial (gated) support for the multi-memory wasm
proposal. This was actually quite easy since almost all of wasmtime
already expected multi-memory to be implemented one day. The only real
substantive change is to the `memory.copy` intrinsic, which now
accounts for the source and destination memories possibly being different.
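Roughly, that means the copy helper is now told which memory is which (a
sketch; the names and types here are assumptions, not wasmtime's actual
intrinsic signature):

```rust
// Sketch only: memory.copy now identifies both memories instead of assuming a
// single one, so each address is bounds-checked against its own memory.
fn memory_copy(
    dst_memory: u32, // index of the destination linear memory
    src_memory: u32, // index of the source linear memory
    dst: u64,        // destination offset
    src: u64,        // source offset
    len: u64,        // number of bytes to copy
) {
    // Look up dst_memory and src_memory separately, bounds-check dst + len and
    // src + len against their respective memories, then copy the bytes across.
}
```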
Rather than using paths from the root instruction to the instruction we are
matching against or checking if it is constant or whatever, use temporary
variables. When we successfully match an instruction's opcode, we simultaneously
define these temporaries for the instruction's operands. This is similar to how
open-coding these matches in Rust would use `match` expressions with pattern
matching to bind the operands to variables at the same time.
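For comparison, open-coding such a match in Rust binds the operands at the same
time the opcode is matched (an illustrative fragment, not code from this
change):

```rust
// The opcode check and the operand bindings happen in one pattern match,
// which is the shape the generated matcher now mirrors with temporaries.
match &func.dfg[inst] {
    InstructionData::Binary { opcode: Opcode::Iadd, args } => {
        let (lhs, rhs) = (args[0], args[1]); // operands bound as temporaries
        // ... continue matching on `lhs` and `rhs` ...
    }
    _ => {}
}
```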
This saves about 1.8% of instructions retired when Peepmatic is enabled.
A new associated type `Info` is added to `MachInstEmit`, which is the
immutable counterpart to `State`. It can't easily be constructed from an
`ABICallee`, since it would require adding an associated type to the
latter, and doing so leaks the associated type into a lot of places in
the code base and makes the code harder to read. Instead, the `EmitInfo`
state can simply be passed to the `VCode::emit` function directly.
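In outline (trait bounds and other items elided), the trait now looks something
like this sketch:

```rust
// Sketch with bounds and details elided: `Info` is built once up front and
// passed immutably to every `emit` call, while `State` stays mutable.
pub trait MachInstEmit: MachInst {
    /// Mutable per-function emission state.
    type State;
    /// Immutable emission info (e.g. flags), the counterpart to `State`.
    type Info;
    fn emit(&self, code: &mut MachBuffer<Self>, info: &Self::Info, state: &mut Self::State);
}
```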
As discussed in #2251, in order to be very confident that NaN signaling bits are correctly handled by the compiler, this switches `DataValue` to use Cranelift's `Ieee32` and `Ieee64` structures. This makes it a bit more inconvenient to interpret Cranelift FP operations, but this should change to something like `rustc_apfloat` in the future.
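In rough outline (variant list abbreviated), floats are now carried as
Cranelift's bit-pattern types:

```rust
use cranelift_codegen::ir::immediates::{Ieee32, Ieee64};

// Abbreviated sketch: storing the raw bit patterns means NaN payloads,
// including the signaling bit, survive round trips exactly.
pub enum DataValue {
    F32(Ieee32),
    F64(Ieee64),
    // ... integer and vector variants elided ...
}
```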
This change abstracts away (from the perspective of the new backend) how immediate values are stored in `InstructionData`. It gathers large immediates from necessary places (e.g. constant pool) and delegates to `InstructionData::imm_value` for the rest. This refactor only touches original users of `LowerCtx::get_immediate` but a future change could do the same for any place the new backend is accessing `InstructionData` directly to retrieve immediates.
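A rough sketch of the shape of the lookup (the accessor name is a hypothetical
placeholder, not the exact code):

```rust
// Simplified sketch: immediates that live outside InstructionData (e.g. v128
// constants in the constant pool) are gathered explicitly; everything else is
// delegated to the new `InstructionData::imm_value`.
fn get_immediate(&self, ir_inst: Inst) -> Option<DataValue> {
    match self.data(ir_inst) {
        InstructionData::UnaryConst { constant_handle, .. } => {
            self.constant_pool_value(*constant_handle) // hypothetical accessor
        }
        data => data.imm_value(),
    }
}
```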
* normalise value prior to right shifts
by first left-aligning (shift left by 32 bits)
then shifting back (respecting signedness)
* Update crates/debug/src/transform/expression.rs
Co-authored-by: bjorn3 <bjorn3@users.noreply.github.com>
* Update crates/debug/src/transform/expression.rs
* Update crates/debug/src/transform/expression.rs
* update translation of DW_OP_shr in test
* add translation test for DW_OP_shra
* explain normalisation
* optimise the expression by performing only one right shift
We assume that the expression evaluator permits collapsing
two shifts as long as they go in the same direction (see the
worked example after this list).
Review feedback.
Co-authored-by: bjorn3 <bjorn3@users.noreply.github.com>
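A worked example of the normalisation and the collapsed shift, in plain Rust
standing in for the DWARF expression ops:

```rust
fn main() {
    // A 32-bit value sitting in a 64-bit slot: only the low 32 bits matter.
    let raw: u64 = 0xdead_beef_8000_0001;
    let shift = 3u32;

    // Normalise by left-aligning (shift left by 32), then perform a single
    // arithmetic right shift by (32 + shift) instead of two separate shifts.
    let normalised = (((raw << 32) as i64) >> (32 + shift)) as u32;

    // Same result as shifting the 32-bit value directly, sign respected.
    assert_eq!(normalised as i32, (0x8000_0001_u32 as i32) >> shift);
}
```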
As found by @julian-seward1, movss/movsd aren't included in the
zero-latency move instructions section of the Intel optimization manual.
Use MOVAPS instead for those moves.
This PR updates the AArch64 ABI implementation so that it properly
respects that v8-v15 inclusive have callee-save lower halves, and
caller-save upper halves, by conservatively approximating (to full
registers) in the appropriate directions when generating prologue
saves and when informing the regalloc of clobbered regs across
callsites.
In order to prevent saving all of these vector registers in the prologue
of every non-leaf function due to the above approximation, this also
makes use of a new regalloc.rs feature to exclude call instructions'
writes from the clobber set returned by register allocation. This is
safe whenever the caller and callee have the same ABI (because anything
the callee could clobber, the caller is allowed to clobber as well
without saving it in the prologue).
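As a rough illustration of the two approximation directions described above
(these helpers are made up for this sketch, not the real abi.rs functions):

```rust
// Illustrative only: v8-v15 are half callee-saved, half caller-saved, so each
// question about them is answered by rounding to a whole register in the
// direction that is safe for that question.

/// Should the prologue save this vector register if the function writes it?
/// Saving the whole register is safe even though only the low half must be
/// preserved, so "half callee-saved" rounds up to "callee-saved".
fn save_in_prologue(vreg: u8) -> bool {
    (8..=15).contains(&vreg)
}

/// Should a call be treated as clobbering this vector register?
/// The callee may legally trash the upper half of v8-v15 (and all of the other
/// vector registers), so "half caller-saved" rounds up to "clobbered".
fn clobbered_by_call(vreg: u8) -> bool {
    vreg <= 31
}
```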
Fixes #2254.