`lucetc` currently *almost*, but not quite, works with the new x64
backend; the only missing piece is support for the particular
instructions emitted as part of its prologue stack-check.
We do not normally see `brff`, `brif`, or `ifcmp_sp` in CLIF generated by
`cranelift-wasm` without the old-backend legalization rules, so these
were not supported in the new x64 backend as they were not necessary for
Wasm MVP support. Using them resulted in an `unimplemented!()` panic.
This PR adds support for `brff` and `brif` analogously to how AArch64
implements them, by pattern-matching the `ifcmp` / `ffcmp` directly.
Then `ifcmp_sp` is a straightforward variant of `ifcmp`.
Along the way, this also removes the notion of "fallthrough block" from
the branch-group lowering method; instead, `fallthrough` instructions
are handled as normal branches to their explicitly-provided targets,
which (in the original CLIF) match the fallthrough block. The reason for
this is that the block reordering done as part of lowering can change
the fallthrough block. We were not using `fallthrough` instructions in
the output produced by `cranelift-wasm`, so this, too, was not
previously caught.
With these changes, the `lucetc` crate in Lucet passes all tests with
the `x64` feature-flag added to its `cranelift-codegen` dependency.
This changes the following:
mov x0, #4
ldr x0, [x1, #4]
Into:
ldr x0, [x1]
I noticed this pattern (but with #0), in a benchmark.
Copyright (c) 2020, Arm Limited.
* this requires upgrading to wasmparser 0.67.0.
* There are no CLIF side changes because the CLIF `select` instruction is
polymorphic enough.
* on aarch64, there is unfortunately no conditional-move (csel) instruction on
vectors. This patch adds a synthetic instruction `VecCSel` which *does*
behave like that. At emit time, this is emitted as an if-then-else diamond
(4 insns).
* aarch64 implementation is otherwise straightforwards.
* Make cranelift_codegen::isa::unwind::input public
* Move UnwindCode's common offset field out of the structure
* Make MachCompileResult::unwind_info more generic
* Record initial stack pointer offset
This approach suffers from memory-size bloat during compile time due to the desire to de-duplicate the constants emitted and reduce runtime memory-size. As a first step, though, this provides an end-to-end mechanism for constants to be emitted in the MachBuffer islands.
There has been some confusion over the meaning of the "sign-extend"
(`sext`) and "zero-extend" (`uext`) attributes on parameters and return
values in signatures. According to the three implemented backends, these
attributes indicate that a value narrower than a full register should
always be extended in the way specified. However, they are much more
useful if they mean "extend in this way if the ABI requires extending":
only the ABI backend knows whether or not a particular ABI (e.g., x64
SysV vs. x64 Baldrdash) requires extensions, while only the frontend
(CLIF generator) knows whether or not a value is signed, so the two have
to work in concert.
This is the result of some very helpful discussion in #2354 (thanks to
@uweigand for raising the issue and @bjorn3 for helping to reason about
it).
This change respects the extension attributes in the above way, rather
than unconditionally extending, to avoid potential performance
degradation as we introduce more extension attributes on signatures.
I don't think this has happened in awhile but I've run a `cargo update`
as well as trimming some of the duplicate/older dependencies in
`Cargo.lock` by updating some of our immediate dependencies as well.
This patch implements, for aarch64, the following wasm SIMD extensions.
v128.load32_zero and v128.load64_zero instructions
https://github.com/WebAssembly/simd/pull/237
The changes are straightforward:
* no new CLIF instructions. They are translated into an existing CLIF scalar
load followed by a CLIF `scalar_to_vector`.
* the comment/specification for CLIF `scalar_to_vector` has been changed to
match the actual intended semantics, per consulation with Andrew Brown.
* translation from `scalar_to_vector` to aarch64 `fmov` instruction. This
has been generalised slightly so as to allow both 32- and 64-bit transfers.
* special-case zero in `lower_constant_f128` in order to avoid a
potentially slow call to `Inst::load_fp_constant128`.
* Once "Allow loads to merge into other operations during instruction
selection in MachInst backends"
(https://github.com/bytecodealliance/wasmtime/issues/2340) lands,
we can use that functionality to pattern match the two-CLIF pair and
emit a single AArch64 instruction.
* A simple filetest has been added.
There is no comprehensive testcase in this commit, because that is a separate
repo. The implementation has been tested, nevertheless.
* Rewrite interpreter generically
This change re-implements the Cranelift interpreter to use generic values; this makes it possible to do abstract interpretation of Cranelift instructions. In doing so, the interpretation state is extracted from the `Interpreter` structure and is accessed via a `State` trait; this makes it possible to not only more clearly observe the interpreter's state but also to interpret using a dummy state (e.g. `ImmutableRegisterState`). This addition made it possible to implement more of the Cranelift instructions (~70%, ignoring the x86-specific instructions).
* Replace macros with closures
The immediate splitting code contained a bug causing both low and high
to be equal for i128. This is the root cause for
bjorn3/rustc_codegen_cranelift#1097 and likely the only bug preventing
cg_clif from bootstrapping rustc.
In particular, introduce initial support for the MOVI and MVNI
instructions, with 8-bit elements. Also, treat vector constants
as 32- or 64-bit floating-point numbers, if their value allows
it, by relying on the architectural zero extension. Finally,
stop generating literal loads for 32-bit constants.
Copyright (c) 2020, Arm Limited.
As discussed in #2251, in order to be very confident that NaN signaling bits are correctly handled by the compiler, this switches `DataValue` to use Cranelift's `Ieee32` and `Ieee64` structures. This makes it a bit more inconvenient to interpreter Cranelift FP operations but this should change to something like `rustc_apfloat` in the future.
This PR updates the AArch64 ABI implementation so that it (i) properly
respects that v8-v15 inclusive have callee-save lower halves, and
caller-save upper halves, by conservatively approximating (to full
registers) in the appropriate directions when generating prologue
caller-saves and when informing the regalloc of clobbered regs across
callsites.
In order to prevent saving all of these vector registers in the prologue
of every non-leaf function due to the above approximation, this also
makes use of a new regalloc.rs feature to exclude call instructions'
writes from the clobber set returned by register allocation. This is
safe whenever the caller and callee have the same ABI (because anything
the callee could clobber, the caller is allowed to clobber as well
without saving it in the prologue).
Fixes#2254.
This approach is not the best but avoids an extra instruction; perhaps at some point, as mentioned in https://github.com/bytecodealliance/wasmtime/pull/2248, we will add the extra instruction or refactor things in such a way that this `Inst` variant is unnecessary.
This also passes `fixed_frame_storage_size` (previously `total_sp_adjust`)
into `gen_clobber_save` so that it can be combined with other stack
adjustments.
Copyright (c) 2020, Arm Limited.
This adds a new feature experimental_x64 for CLIF tests.
A test is run in the new x64 backend iff:
- either the test doesn't have an x86_64 target requirement, signaling
it must be target agnostic or not run on this target.
- or the test does require the x86_64 target, and the test is marked
with the `experimental_x64` feature.
This required one workaround in the parser. The reason is that the
parser will try to use information not provided by the TargetIsa adapter
for the Mach backends, like register names. In particular, parsing test
may fail before the test runner realizes that the test must not be run.
In this case, we early return an almost-empty TestFile from the parser,
under the same conditions as above, so that the caller may filter out
the test properly.
This also copies two tests from the test suite using the new backend,
for demonstration purposes.
This commit adds arm32 code generation for some IR insts.
Floating-point instructions are not supported, because regalloc
does not allow to represent overlapping register classes,
which are needed by VFP/Neon.
There is also no support for big-endianness, I64 and I128 types.
* clif-util: do not convert `anyhow::Error`s into strings into `anyhow::Error`s
* filetests: Use the debug formatting of `anyhow::Error`s
This provides the full error context, not just the source error's message.
Previously, in #2128, we factored out a common "vanilla 64-bit ABI"
implementation from the AArch64 ABI code, with the idea that this should
be largely compatible with x64. This PR alters the new x64 backend to
make use of the shared infrastructure, removing the duplication that
existed previously. The generated code is nearly (not exactly) the same;
the only difference relates to how the clobber-save region is padded in
the prologue.
This also changes some register allocations in the aarch64 code because
call support in the shared ABI infra now passes a temp vreg in, rather
than requiring use of a fixed, non-allocable temp; tests have been
updated, and the runtime behavior is unchanged.