Commit Graph

860 Commits

Chris Fallin
3e516e784b Fix lowering instruction-sinking (load-merging) bug.
This fixes a subtle corner case exposed during fuzzing. If we have a bit
of CLIF like:

```
    v0 = load.i64 ...
    v1 = iadd.i64 v0, ...
    v2 = do_other_thing v1
    v3 = load.i64 v1
```

and if this is lowered using a machine backend that can merge loads into
ALU ops, *and* that has an addressing mode that can look through add
ops, then the following can happen:

1. We lower the load at `v3`. This looks backward at the address
   operand tree and finds that `v1` is `v0` plus other things; it has an
   addressing mode that can add `v0`'s register and the other things
   directly; so it calls `put_value_in_reg(v0)` and uses its register in
   the amode. At this point, the add producing `v1` has no references,
   so it will not (yet) be codegen'd.
2. We lower `do_other_thing`, which puts `v1` in a register and uses it.
   The `iadd` now has a reference.
3. We reach the `iadd` and, because it has a reference, lower it. Our
   machine has the ability to merge a load into an ALU operation.
   Crucially, *we think the load at `v0` is mergeable* because it has
   only one user, the add at `v1` (!). So we merge it.
4. We reach the `load` at `v0` and because it has been merged into the
   `iadd`, we do not separately codegen it. The register that holds `v0`
   is thus never written, and the use of this register by the final load
   (Step 1) will see an undefined value.

The logic error here is that in the presence of pattern matching that
looks through pure ops, we can end up with multiple uses of a value that
originally had a single use (because we allow lookthrough of pure ops in
all cases). In other words, the multiple-use-ness of `v1` "passes
through" in some sense to `v0`. However, the load sinking logic is not
aware of this.

The fix, I think, is pretty simple: we disallow an effectful instruction
from sinking/merging if it already has some other use when we look back
at it.

If we disallowed lookthrough of *any* op that had multiple uses, even
pure ones, then we would avoid this scenario; but earlier experiments
showed that to have a non-negligible performance impact, so (given that
we've worked out the logic above) I think this complexity is worth it.
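
A minimal sketch of the rule, with hypothetical names (the real
lowering-context plumbing differs):

```
/// Hypothetical helper: may an effectful instruction (e.g. a load) be
/// sunk into the consumer currently being lowered?
///
/// `uses_so_far` counts the references already recorded against the
/// instruction's result. Any use beyond the one being lowered right now
/// means another instruction expects the result in a register, so the
/// load must be codegen'd separately rather than merged.
fn may_sink_effectful(uses_so_far: u32) -> bool {
    uses_so_far <= 1
}
```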
2020-12-03 14:59:12 -08:00
Anton Kirilov
f59b274d22 Cranelift AArch64: Further vector constant improvements
Introduce support for MOVI/MVNI with 16-, 32-, and 64-bit elements,
and the vector variant of FMOV.
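
For instance (illustrative encodings, not taken from the patch):

```
    movi v0.4s, #0x7f, lsl #24   // 32-bit elements, shifted 8-bit imm
    mvni v1.8h, #0x3c            // 16-bit elements, inverted imm
    fmov v2.4s, #1.0             // vector variant of FMOV
```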

Copyright (c) 2020, Arm Limited.
2020-12-03 15:30:24 +00:00
Chris Fallin
c9a81f008d x64 backend: fix condition-code used for part of explicit heap check.
A dynamic heap address computation may create up to two conditional
branches: the usual bounds-check, but also (in some cases) an
offset-addition overflow check.

The x64 backend had reversed the condition code for this check,
resulting in an always-trapping execution for a valid offset. I'm
somewhat surprised this has existed so long, but I suppose the
particular conditions (large offset, small offset guard, dynamic heap)
have been somewhat rare in our testing so far.

Found via fuzzing in #2453.
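
For illustration (a hand-written sketch, not the backend's literal
output), the overflow check should trap when the addition carries, not
on the inverse condition:

```
    addq %rdi, %rax    # addr = base + offset; the add may wrap
    jc   trap          # correct: trap only when the addition carried
    ...                # otherwise continue to the bounds-checked access
trap:
    ud2
```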
2020-12-02 10:40:53 -08:00
Chris Fallin
d413b907b4 Merge pull request #2414 from jgouly/extend-refactor
arm64: Refactor Inst::Extend handling
2020-11-25 17:22:07 -08:00
Chris Fallin
b97f07b405 x64 backend: merge loads into ALU ops when appropriate.
This PR makes use of the support in #2366 for sinking effectful
instructions and merging them with consumers. In particular, on x86, we
want to make use of the ability of many instructions to load one operand
directly from memory. That is, instead of this:

```
    movq 0(%rdi), %rax
    addq %rax, %rbx
```

we want to generate this:

```
    addq 0(%rdi), %rax
```

As described in more detail in #2366, sinking and merging the load is
only possible under certain conditions. In particular, we need to ensure
that the use is the *only* use (otherwise the load happens more than
once), and we need to ensure that it does not move across other
effectful ops (see #2366 for how we ensure this).

This change is actually fairly simple, given that all the framework is
in place: we simply pattern-match a load on one operand of an ALU
instruction that takes an RMI (reg, mem, or immediate) operand, and
generate the mem form when we match.

Also makes a drive-by improvement in the x64 backend to use
statically-monomorphized `LowerCtx` types rather than a `&mut dyn
LowerCtx`.

On `bz2.wasm`, this results in ~1% instruction-count reduction. More is
likely possible by following up with other instructions that can merge
memory loads as well.
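
A rough sketch of the operand-selection idea, using hypothetical types
(the backend's real `RegMemImm` machinery is richer):

```
/// Hypothetical mirror of the x64 "RMI" operand: an ALU op's right-hand
/// side may be a register, a memory address, or an immediate.
enum Rmi {
    Reg(u8),
    Mem { base: u8, offset: i32 },
    Imm(u32),
}

/// If the operand is produced by a load that is safe to merge (single
/// use, no effectful op in between), fold it into the ALU instruction's
/// memory operand; otherwise put the value in a register as usual.
fn choose_rhs(mergeable_load: Option<(u8, i32)>, in_reg: u8) -> Rmi {
    match mergeable_load {
        Some((base, offset)) => Rmi::Mem { base, offset },
        None => Rmi::Reg(in_reg),
    }
}
```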
2020-11-17 11:06:46 -08:00
Chris Fallin
712ff22492 AArch64 SIMD: pattern-match load+splat into LD1R instruction. 2020-11-16 15:59:28 -08:00
Joey Gouly
70cbc4ca7c arm64: Refactor Inst::Extend handling
This refactors the handling of Inst::Extend and simplifies the lowering
of Bextend and Bmask, which allows the use of SBFX instructions for
extensions from 1-bit booleans. Other extensions use aliases of BFM,
and the code was changed to reflect that rather than hard-coding bit
patterns. Also, ImmLogic is now implemented, so another hard-coded
instruction can be removed.

As part of looking at boolean handling, `normalize_boolean_result` was
changed to `materialize_boolean_result`, such that it can use either
CSET or CSETM. Using CSETM saves an instruction (previously CSET + SUB)
for booleans bigger than 1-bit.
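
For example (illustrative condition), materializing a wider all-ones
`true` drops from two instructions to one:

```
    // before: set 0/1, then negate to get 0/-1
    cset  x0, eq
    sub   x0, xzr, x0
    // after: set 0/all-ones directly
    csetm x0, eq
```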

Copyright (c) 2020, Arm Limited.
2020-11-13 16:17:25 +00:00
Chris Fallin
113d061129 Merge pull request #2369 from akirilov-arm/move_fix
Cranelift AArch64: Various small fixes
2020-11-12 14:59:10 -08:00
Andrew Brown
bd93e69eb4 [machinst x64]: implement packed shifts 2020-11-12 14:21:45 -08:00
Chris Fallin
89dbc4590d Merge pull request #2363 from cfallin/extend-only-if-abi
Do value-extensions at ABI boundaries only when ABI requires it.
2020-11-12 12:26:20 -08:00
Chris Fallin
fd6433aaf5 Merge pull request #2395 from cfallin/lucet-x64-support
Add support for brff/brif and ifcmp_sp to new x64 backend to support Lucet.
2020-11-12 12:10:52 -08:00
Anton Kirilov
edaada3f57 Cranelift AArch64: Various small fixes
* Use FMOV to move 64-bit FP registers and SIMD vectors.
* Add support for additional vector load types.
* Fix the printing of Inst::LoadAddr.

Copyright (c) 2020, Arm Limited.
2020-11-12 13:54:05 +00:00
Chris Fallin
5df8840483 Add support for brff/brif and ifcmp_sp to new x64 backend to support Lucet.
`lucetc` currently *almost*, but not quite, works with the new x64
backend; the only missing piece is support for the particular
instructions emitted as part of its prologue stack-check.

We do not normally see `brff`, `brif`, or `ifcmp_sp` in CLIF generated by
`cranelift-wasm` without the old-backend legalization rules, so these
were not supported in the new x64 backend as they were not necessary for
Wasm MVP support. Using them resulted in an `unimplemented!()` panic.

This PR adds support for `brff` and `brif` analogously to how AArch64
implements them, by pattern-matching the `ifcmp` / `ffcmp` directly.
Then `ifcmp_sp` is a straightforward variant of `ifcmp`.
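
The CLIF shape in question looks roughly like this (illustrative
operands, condition, and block names):

```
    v1 = ifcmp_sp v0        ; compare v0 against the stack pointer
    brif uge v1, block1     ; branch directly on the integer flags
```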

Along the way, this also removes the notion of "fallthrough block" from
the branch-group lowering method; instead, `fallthrough` instructions
are handled as normal branches to their explicitly-provided targets,
which (in the original CLIF) match the fallthrough block. The reason for
this is that the block reordering done as part of lowering can change
the fallthrough block. We were not using `fallthrough` instructions in
the output produced by `cranelift-wasm`, so this, too, was not
previously caught.

With these changes, the `lucetc` crate in Lucet passes all tests with
the `x64` feature-flag added to its `cranelift-codegen` dependency.
2020-11-11 13:43:39 -08:00
Chris Fallin
997b654235 Merge pull request #2393 from jgouly/constant-addend
arm64: Fold some constants into load instructions
2020-11-11 11:23:21 -08:00
Pat Hickey
aa259ff92a Merge pull request #2390 from bjorn3/more_simplejit_refactors
More SimpleJIT refactorings
2020-11-11 11:16:04 -08:00
Joey Gouly
a5011e8212 arm64: Fold some constants into load instructions
This changes the following:
  mov x0, #4
  ldr x0, [x1, x0]

Into:
  ldr x0, [x1, #4]

I noticed this pattern (but with #0) in a benchmark.

Copyright (c) 2020, Arm Limited.
2020-11-11 18:47:43 +00:00
Julian Seward
41e87a2f99 Support wasm select instruction with V128-typed operands on AArch64.
* this requires upgrading to wasmparser 0.67.0.

* There are no CLIF side changes because the CLIF `select` instruction is
  polymorphic enough.

* on aarch64, there is unfortunately no conditional-move (csel) instruction on
  vectors.  This patch adds a synthetic instruction `VecCSel` which *does*
  behave like that.  At emit time, this is emitted as an if-then-else diamond
  (4 insns; see the sketch after this list).

* aarch64 implementation is otherwise straightforward.
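
A hand-written sketch of the diamond (not the emitter's literal output),
assuming the tested condition is "equal":

```
    b.ne  1f                // condition fails: go to the else-arm
    mov   v0.16b, v1.16b    // then-arm: take the first operand
    b     2f
1:  mov   v0.16b, v2.16b    // else-arm: take the second operand
2:
```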
2020-11-11 18:45:24 +01:00
bjorn3
b7a93c2321 Remove reloc_block
It is never called, and all reloc sinks either ignore it or panic when
it is called.
2020-11-11 12:36:17 +01:00
Andrew Brown
c9e8889d47 Update clippy annotation to use latest version (#2375) 2020-11-09 09:24:59 -06:00
Yury Delendik
f60c0f3ec3 cranelift: refactor unwind logic to accommodate multiple backends (#2357)
*    Make cranelift_codegen::isa::unwind::input public
*    Move UnwindCode's common offset field out of the structure
*    Make MachCompileResult::unwind_info more generic
*    Record initial stack pointer offset
2020-11-05 16:57:40 -06:00
Andrew Brown
83f182b390 Implement initial emission of constants
This approach suffers from memory bloat at compile time due to the desire to de-duplicate the emitted constants and reduce runtime memory size. As a first step, though, this provides an end-to-end mechanism for constants to be emitted in the MachBuffer islands.
2020-11-05 14:25:02 -08:00
Chris Fallin
a2bbb198de Do value-extensions at ABI boundaries only when ABI requires it.
There has been some confusion over the meaning of the "sign-extend"
(`sext`) and "zero-extend" (`uext`) attributes on parameters and return
values in signatures. According to the three implemented backends, these
attributes indicate that a value narrower than a full register should
always be extended in the way specified. However, they are much more
useful if they mean "extend in this way if the ABI requires extending":
only the ABI backend knows whether or not a particular ABI (e.g., x64
SysV vs. x64 Baldrdash) requires extensions, while only the frontend
(CLIF generator) knows whether or not a value is signed, so the two have
to work in concert.

This is the result of some very helpful discussion in #2354 (thanks to
@uweigand for raising the issue and @bjorn3 for helping to reason about
it).

This change respects the extension attributes in the above way, rather
than unconditionally extending, to avoid potential performance
degradation as we introduce more extension attributes on signatures.
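
For example, given a signature with extension attributes (illustrative
CLIF):

```
function %callee(i8 sext, i16 uext) -> i8 sext {
block0(v0: i8, v1: i16):
    return v0
}
```

With this change, the narrow arguments and return value are widened to
full registers only on ABIs that actually require it (e.g. x64
Baldrdash), rather than unconditionally.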
2020-11-05 11:54:35 -08:00
Alex Crichton
e4c3fc5cf2 Update immediate and transitive dependencies
I don't think this has happened in awhile but I've run a `cargo update`
as well as trimming some of the duplicate/older dependencies in
`Cargo.lock` by updating some of our immediate dependencies as well.
2020-11-05 08:34:09 -08:00
Alex Crichton
ab1958434a Bump to 0.21.0 (#2359) 2020-11-05 09:39:53 -06:00
Julian Seward
dd9bfcefaa CL/aarch64: implement the wasm SIMD v128.load{32,64}_zero instructions.
This patch implements, for aarch64, the following wasm SIMD extensions.

  v128.load32_zero and v128.load64_zero instructions
  https://github.com/WebAssembly/simd/pull/237

The changes are straightforward:

* no new CLIF instructions.  They are translated into an existing CLIF scalar
  load followed by a CLIF `scalar_to_vector` (see the sketch after this list).

* the comment/specification for CLIF `scalar_to_vector` has been changed to
  match the actual intended semantics, per consultation with Andrew Brown.

* translation from `scalar_to_vector` to aarch64 `fmov` instruction.  This
  has been generalised slightly so as to allow both 32- and 64-bit transfers.

* special-case zero in `lower_constant_f128` in order to avoid a
  potentially slow call to `Inst::load_fp_constant128`.

* Once "Allow loads to merge into other operations during instruction
  selection in MachInst backends"
  (https://github.com/bytecodealliance/wasmtime/issues/2340) lands,
  we can use that functionality to pattern match the two-CLIF pair and
  emit a single AArch64 instruction.

* A simple filetest has been added.
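
Concretely, the translation produces a pair like this (illustrative
values):

```
    v1 = load.i32 v0                  ; scalar load of the low lane
    v2 = scalar_to_vector.i32x4 v1    ; the remaining lanes are zeroed
```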

There is no comprehensive testcase in this commit, because those live in a
separate repo.  The implementation has been tested, nevertheless.
2020-11-04 20:00:04 +01:00
Andrew Brown
6d50099816 Rewrite interpreter generically (#2323)
* Rewrite interpreter generically

This change re-implements the Cranelift interpreter to use generic values, which makes abstract interpretation of Cranelift instructions possible. In doing so, the interpretation state is extracted from the `Interpreter` structure and is accessed via a `State` trait; this allows not only clearer observation of the interpreter's state but also interpretation using a dummy state (e.g. `ImmutableRegisterState`; see the sketch after this list). This made it possible to implement more of the Cranelift instructions (~70%, ignoring the x86-specific ones).

* Replace macros with closures
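
A minimal sketch of the `State` idea, with a hypothetical trait shape
(the real interface differs; the commit's `ImmutableRegisterState` plays
the dummy-state role):

```
/// Hypothetical: interpretation state behind a trait, so the same
/// instruction-interpretation code can run against a real machine state
/// or a dummy/abstract one.
trait State<V> {
    fn get(&self, slot: usize) -> Option<&V>;
    fn set(&mut self, slot: usize, value: V);
}

/// A dummy state that records nothing: enough to abstractly interpret a
/// single instruction in isolation.
struct NullState;

impl<V> State<V> for NullState {
    fn get(&self, _slot: usize) -> Option<&V> { None }
    fn set(&mut self, _slot: usize, _value: V) {}
}
```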
2020-11-02 12:28:07 -08:00
Chris Fallin
d1be8dcfc0 Merge pull request #2310 from akirilov-arm/vector_constants
Cranelift AArch64: Improve code generation for vector constants
2020-11-01 21:56:40 -08:00
bjorn3
23aafa1054 Fix icmp_imm.i128
The immediate splitting code contained a bug causing both low and high
to be equal for i128. This is the root cause for
bjorn3/rustc_codegen_cranelift#1097 and likely the only bug preventing
cg_clif from bootstrapping rustc.
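
The intended split, as a sketch (the in-tree code operates on Cranelift
immediates rather than a raw `i128`):

```
fn split_i128(imm: i128) -> (u64, u64) {
    let lo = imm as u64;         // low 64 bits
    let hi = (imm >> 64) as u64; // high 64 bits; the buggy path
                                 // effectively returned `lo` here too
    (lo, hi)
}
```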
2020-10-31 21:11:50 +01:00
Anton Kirilov
207779fe1d Cranelift AArch64: Improve code generation for vector constants
In particular, introduce initial support for the MOVI and MVNI
instructions, with 8-bit elements. Also, treat vector constants
as 32- or 64-bit floating-point numbers, if their value allows
it, by relying on the architectural zero extension. Finally,
stop generating literal loads for 32-bit constants.
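
For instance (illustrative value), an 8-bit-element constant now becomes
a single instruction instead of a literal load:

```
    movi v0.16b, #0x55    // replicate 0x55 into every byte lane
```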

Copyright (c) 2020, Arm Limited.
2020-10-30 13:16:12 +00:00
Johnnie Birch
fa66daea25 Add filetests for fcvt_from_sint.f32x4
Add portions of filetests simd-conversion-legalize.clif and simd-conversion-run.clif
that test fcvt_from_sint.f32x4
2020-10-28 13:02:50 -07:00
Yury Delendik
de4af90af6 machinst x64: New backend unwind (#2266)
Addresses unwind support for the experimental x64 backend. The preliminary code enables backtraces with the System V calling convention.
2020-10-23 15:19:41 -05:00
Yury Delendik
b10e027fef Refactor UnwindInfo codes and frame_register (#2307)
* Refactor UnwindInfo codes and frame_register

* use isa word_size

* fix filetests

* Add comment about UnwindCode::PushRegister
2020-10-22 14:52:42 -05:00
Andrew Brown
0ba35171fb [machinst x64]: port more CLIF filetests 2020-10-09 10:04:50 -07:00
Benjamin Bouvier
e8c2a1763a machinst x64: avoid emitting movzx when the input is an ALU 32-bits operation; 2020-10-09 18:49:27 +02:00
Benjamin Bouvier
3980a43cda machinst x64: use the (base,offset) addressing mode even in the presence of a uextend; 2020-10-09 18:49:27 +02:00
Andrew Brown
c8cce5d2d7 [machinst x64]: enable packed saturated arithmetic 2020-10-08 08:46:20 -07:00
Benjamin Bouvier
e32e6fb612 machinst x64: check SSE requirements for instructions against enabled features; 2020-10-08 09:21:51 +02:00
Andrew Brown
3778fa025c Switch DataValue to use Ieee32/Ieee64
As discussed in #2251, in order to be very confident that NaN signaling bits are correctly handled by the compiler, this switches `DataValue` to use Cranelift's `Ieee32` and `Ieee64` structures. This makes it a bit more inconvenient to interpret Cranelift FP operations, but this should change to something like `rustc_apfloat` in the future.
2020-10-07 12:17:17 -07:00
Andrew Brown
6f6f79ef2b refactor: move DataValue from cranelift-reader to cranelift-codegen
There is no change to functionality; the move is necessary in order to return InstructionData immediates in a structured way (see next commit).
2020-10-07 12:17:17 -07:00
Chris Fallin
71768bb6cf Fix AArch64 ABI to respect half-caller-save, half-callee-save vec regs.
This PR updates the AArch64 ABI implementation so that it properly
respects that v8-v15 inclusive have callee-save lower halves and
caller-save upper halves, by conservatively approximating (to full
registers) in the appropriate directions when generating prologue
clobber-saves and when informing the regalloc of clobbered regs across
callsites.

In order to prevent saving all of these vector registers in the prologue
of every non-leaf function due to the above approximation, this also
makes use of a new regalloc.rs feature to exclude call instructions'
writes from the clobber set returned by register allocation. This is
safe whenever the caller and callee have the same ABI (because anything
the callee could clobber, the caller is allowed to clobber as well
without saving it in the prologue).

Fixes #2254.
2020-10-06 14:44:02 -07:00
Johnnie Birch
5799fd3cc0 Add file test simd-arithmetic-run to x64 backend
Copies over simd-arithmetic-run from the old backend, adding
several run tests including for min/max. Tests not supported
are commented out.
2020-10-02 16:20:10 -07:00
Chris Fallin
b2f52910fb Merge pull request #2224 from jgouly/sp_adjust
arm64: Use SignedOffset rather than PreIndexed addressing mode for ca…
2020-10-02 09:18:00 -07:00
Andrew Brown
16a2538ecd [machinst x64]: rename Inst::XmmUninitializedValue and document
This approach is not the best but avoids an extra instruction; perhaps at some point, as mentioned in https://github.com/bytecodealliance/wasmtime/pull/2248, we will add the extra instruction or refactor things in such a way that this `Inst` variant is unnecessary.
2020-10-02 08:29:31 -07:00
Andrew Brown
3d9f3bf728 [machinst x64]: port CLIF tests related to comparison and lane operations 2020-10-02 08:29:31 -07:00
Joey Gouly
eec60c9b06 arm64: Use SignedOffset rather than PreIndexed addressing mode for callee-saved registers
This also passes `fixed_frame_storage_size` (previously `total_sp_adjust`)
into `gen_clobber_save` so that it can be combined with other stack
adjustments.
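
Illustrative difference (register choices and offsets invented for the
example):

```
    // before: each clobber save pre-indexes sp
    str  x19, [sp, #-16]!
    str  x20, [sp, #-16]!
    // after: one up-front adjustment, then plain signed offsets
    sub  sp, sp, #32
    str  x19, [sp, #16]
    str  x20, [sp]
```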

Copyright (c) 2020, Arm Limited.
2020-10-02 16:22:55 +01:00
Anton Kirilov
d18de69e5a AArch64: Add test cases for callee-saved SIMD & FP registers
Copyright (c) 2020, Arm Limited.
2020-09-30 14:19:02 +01:00
Andrew Brown
b43f4a464a refactor: move all 'filetests/vcode' tests to 'filetests/isa' 2020-09-29 09:27:39 -07:00
Andrew Brown
452d854855 [machinst x64]: demonstrate that packed register moves are elided 2020-09-29 08:48:37 -07:00
Andrew Brown
b7217d454f [machinst x64]: add lane-related CLIF filetests 2020-09-29 08:45:12 -07:00
Pat Hickey
b10beeee01 dep gardening (#2233)
* wasmtime-profiling: latest object dep is 0.21.1

* latest gimli is 0.22

* bump cargo.lock
2020-09-26 00:49:28 -05:00