The `bitmask.{8x16,16x8,32x4}` instructions do not map neatly to any single
AArch64 SIMD instruction, and instead need a sequence of around ten
instructions. Because of this, this patch is somewhat longer and more complex
than it would be for (eg) x64.
Main changes are:
* the relevant testsuite test (`simd_boolean.wast`) has been enabled on aarch64.
* at the CLIF level, add a new instruction `vhigh_bits`, into which these wasm
instructions are to be translated.
* in the wasm->CLIF translation (code_translator.rs), translate into
`vhigh_bits`. This is straightforward.
* in the CLIF->AArch64 translation (lower_inst.rs), translate `vhigh_bits`
into equivalent sequences of AArch64 instructions. There is a different
sequence for each of the `{8x16, 16x8, 32x4}` variants.
All other changes are AArch64-specific, and add instruction definitions needed
by the previous step:
* Add two new families of AArch64 instructions: `VecShiftImm` (vector shift by
immediate) and `VecExtract` (effectively a double-length vector shift)
* To the existing AArch64 family `VecRRR`, add a `zip1` variant. To the
`VecLanesOp` family add an `addv` variant.
* Add supporting code for the above changes to AArch64 instructions:
- getting the register uses (`aarch64_get_regs`)
- mapping the registers (`aarch64_map_regs`)
- printing instructions
- emitting instructions (`impl MachInstEmit for Inst`). The handling of
`VecShiftImm` is a bit complex.
- emission tests for new instructions and variants.
The test that triggers the rebuild of the peephole optimizers is in the
`cranelift-codegen` crate, not the umbrella cranelift crate. This was previously
successfully running zero tests, and then successfully reporting no `git diff`
because no peephole optimizers were ever rebuilt.
This change fixes it so that we run the correct test that triggers the
rebuilding of the peephole optimizers.
Similar to other data structures owned by the `Store` there's no need
for `Instance` to have a strong `Arc` reference, instead it's sufficient
for `Store` to have the owning reference.
In the current translation of wasm (128-bit) SIMD into CLIF, we work around differences in the
type system models of wasm vs CLIF by inserting `bitcast` (a no-op cast) CLIF instructions before
more or less every use of a SIMD value. Unfortunately this was not being done consistently and
even small examples with a single if-then-else diamond that produces a SIMD value, could cause a
verification failure downstream. In this case, the jump out of the "else" block needed a
bitcast, but didn't have one.
This patch wraps creation of CLIF jumps and conditional branches up into three functions,
`canonicalise_then_jump` and `canonicalise_then_br{z,nz}`, and uses them consistently. They
first cast the relevant block formal parameters, then generate the relevant kind of branch/jump.
Hence, provided they are also used consistently in future to generate branches/jumps in this
file, we are protected against such failures.
The patch also adds a large(ish) comment at the top explaining this in more detail.
There's no need to name each export since each synthetic instance we're
creating only has one export, so let's use the empty string which
doesn't require any allocations.
Fixes RUSTSEC-2020-0054 warning from cargo-audit/cargo-deny, follows the recommendation to switch to the new maintained `directories-next` crate fork
Only affects the cache directory determination for the environment and was a simple search'n'replace to this fork so don't think behavior has changed.
https://rustsec.org/advisories/RUSTSEC-2020-0054
In order to register traps for `load_splat`, several instruction formats need knowledge of `SourceLoc`s; however, since the x64 backend does not correctly and completely register traps for `RegMem::Mem` variants I opened https://github.com/bytecodealliance/wasmtime/issues/2290 to discuss and resolve this issue. In the meantime, the current behavior (i.e. remaining largely unaware of `SourceLoc`s) is retained.
It corresponds to WebAssembly's `load*_splat` operations, which
were previously represented as a combination of `Load` and `Splat`
instructions. However, there are architectures such as Armv8-A
that have a single machine instruction equivalent to the Wasm
operations. In order to generate it, it is necessary to merge the
`Load` and the `Splat` in the backend, which is not possible
because the load may have side effects. The new IR instruction
works around this limitation.
The AArch64 backend leverages the new instruction to improve code
generation.
Copyright (c) 2020, Arm Limited.
This commit adds initial (gated) support for the multi-memory wasm
proposal. This was actually quite easy since almost all of wasmtime
already expected multi-memory to be implemented one day. The only real
substantive change is the `memory.copy` intrinsic changes, which now
accounts for the source/destination memories possibly being different.
Rather than using paths from the root instruction to the instruction we are
matching against or checking if it is constant or whatever, use temporary
variables. When we successfully match an instruction's opcode, we simultaneously
define these temporaries for the instruction's operands. This is similar to how
open-coding these matches in Rust would use `match` expressions with pattern
matching to bind the operands to variables at the same time.
This saves about 1.8% of instructions retired when Peepmatic is enabled.
A new associated type Info is added to MachInstEmit, which is the
immutable counterpart to State. It can't easily be constructed from an
ABICallee, since it would require adding an associated type to the
latter, and making so leaks the associated type in a lot of places in
the code base and makes the code harder to read. Instead, the EmitInfo
state can simply be passed to the `Vcode::emit` function directly.
As discussed in #2251, in order to be very confident that NaN signaling bits are correctly handled by the compiler, this switches `DataValue` to use Cranelift's `Ieee32` and `Ieee64` structures. This makes it a bit more inconvenient to interpreter Cranelift FP operations but this should change to something like `rustc_apfloat` in the future.