Commit Graph

41 Commits

Author SHA1 Message Date
Trevor Elliott
cc073593a4 Fix block label printing in precise-output tests (#5798)
As a follow-up to #5780, disassemble the regions identified by bb_starts, falling back on disassembling the whole buffer. This ensures that instructions like br_table that introduce a lot of constants don't throw off capstone for the remainder of the function.

---------

Co-authored-by: Jamey Sharp <jamey@minilop.net>
2023-02-16 02:35:26 +00:00
Trevor Elliott
f04decc4a1 Use capstone to validate precise-output tests (#5780)
Use the capstone library to disassemble precise-output tests, in addition to pretty-printing their vcode.
2023-02-15 16:35:10 -08:00
Trevor Elliott
a5698cedf8 cranelift: Remove brz and brnz (#5630)
Remove the brz and brnz instructions, as their behavior is now redundant with brif.
2023-01-30 20:34:56 +00:00
Trevor Elliott
a5ecb5e647 x64: Share a zero in the ushr translation on x64 to free up a register (#5424)
Share a zero value in the translation of ushr for i128. This increases the lifetime of the value by a few instructions, and reduces the number of registers used in the translation by one, which seems like an acceptable trade-off.
2022-12-12 18:15:43 -08:00
Trevor Elliott
c5379051c4 Enable the ssa verifier in debug builds (#5354)
Enable regalloc2's SSA verifier in debug builds to check for any outstanding reuse of virtual registers in def constraints. As fuzzing enables debug_assertions, this will enable the SSA verifier when fuzzing as well.
2022-12-07 12:22:51 -08:00
Trevor Elliott
b077854b57 Generate SSA code from returns (#5172)
Modify return pseudo-instructions to have pairs of registers: virtual and real. This allows us to constrain the virtual registers to the real ones specified by the abi, instead of directly emitting moves to those real registers.
2022-11-08 16:00:49 -08:00
Afonso Bordado
c8791073d6 cranelift: Remove iconst.i128 (#5075)
* cranelift: Remove iconst.i128

* bugpoint: Report Changed when only one instruction is mutated

* cranelift: Fix egraph bxor rule

* cranelift: Remove some simple_preopt opts for i128
2022-10-24 12:43:28 -07:00
Trevor Elliott
32a7593c94 cranelift: Remove booleans (#5031)
Remove the boolean types from cranelift, and the associated instructions breduce, bextend, bconst, and bint. Standardize on using 1/0 for the return value from instructions that produce scalar boolean results, and -1/0 for boolean vector elements.

Fixes #3205

Co-authored-by: Afonso Bordado <afonso360@users.noreply.github.com>
Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Co-authored-by: Chris Fallin <chris@cfallin.org>
2022-10-17 16:00:27 -07:00
Chris Fallin
05cbd667c7 Cranelift: use regalloc2 constraints on caller side of ABI code. (#4892)
* Cranelift: use regalloc2 constraints on caller side of ABI code.

This PR updates the shared ABI code and backends to use register-operand
constraints rather than explicit pinned-vreg moves for register
arguments and return values.

The s390x backend was not updated, because it has its own implementation
of ABI code. Ideally we could converge back to the code shared by x64
and aarch64 (which didn't exist when s390x ported calls to ISLE, so the
current situation is underestandable, to be clear!). I'll leave this for
future work.

This PR exposed several places where regalloc2 needed to be a bit more
flexible with constraints; it requires regalloc2#74 to be merged and
pulled in.

* Update to regalloc2 0.3.3.

In addition to version bump, this required removing two asserts as
`SpillSlot`s no longer carry their class (so we can't assert that they
have the correct class).

* Review comments.

* Filetest updates.

* Add cargo-vet audit for regalloc2 0.3.2 -> 0.3.3 upgrade.

* Update to regalloc2 0.4.0.
2022-09-21 01:17:04 +00:00
Chris Fallin
2986f6b0ff ABI: implement register arguments with constraints. (#4858)
* ABI: implement register arguments with constraints.

Currently, Cranelift's ABI code emits a sequence of moves from physical
registers into vregs at the top of the function body, one for every
register-carried argument.

For a number of reasons, we want to move to operand constraints instead,
and remove the use of explicitly-named "pinned vregs"; this allows for
better regalloc in theory, as it removes the need to "reverse-engineer"
the sequence of moves.

This PR alters the ABI code so that it generates a single "args"
pseudo-instruction as the first instruction in the function body. This
pseudo-inst defs all register arguments, and constrains them to the
appropriate registers at the def-point. Subsequently the regalloc can
move them wherever it needs to.

Some care was taken not to have this pseudo-inst show up in
post-regalloc disassemblies, but the change did cause a general regalloc
"shift" in many tests, so the precise-output updates are a bit noisy.
Sorry about that!

A subsequent PR will handle the other half of the ABI code, namely, the
callsite case, with a similar preg-to-constraint conversion.

* Update based on review feedback.

* Review feedback.
2022-09-08 18:03:14 -07:00
Afonso Bordado
d394edcefe x64: Mask shift amounts for small types (#4752)
* x64: Mask shift amounts for small types

* cranelift: Disable i128 shifts in fuzzer again

They are fixed. But we had a bunch of fuzzgen issues come in, and we don't want to accidentaly mark them as fixed

* cranelift: Avoid masking shifts for 32 and 64 bit cases

* cranelift: Add const shift tests and fix them

* cranelift: Remove const `rotl` cases

Now that `put_masked_in_imm8_gpr` works properly we can simplify rotl/rotr
2022-08-24 10:31:38 -07:00
Trevor Elliott
1fc11bbe51 x64: Migrate brff and I128 branching instructions to ISLE (#4599)
https://github.com/bytecodealliance/wasmtime/pull/4599
2022-08-04 08:58:50 -07:00
Trevor Elliott
25782b527e x64: Migrate trapif and trapff to ISLE (#4545)
https://github.com/bytecodealliance/wasmtime/pull/4545
2022-08-01 11:24:11 -07:00
Trevor Elliott
29d4edc76b x64: Migrate call and call_indirect to ISLE (#4542)
https://github.com/bytecodealliance/wasmtime/pull/4542
2022-07-28 13:10:03 -07:00
Trevor Elliott
7e0bb465d0 X64: port the rest of icmp to ISLE (#4254)
Finish migrating icmp to ISLE for x64
2022-06-13 16:34:11 -07:00
Chris Fallin
8f61eb9341 Upgrade to regalloc2 version 0.2.1. (#4199)
This resolves an edge-case where mul.i128 with an input that continues
to be live after the instruction could cause an invalid regalloc
constraint (basically, the regalloc did not previously support an
instruction use and def both being constrained to the same physical reg;
and the "mul" variant used for mul.i128 on x64 was the only instance of
such operands in Cranelift).

Causes two extra move instructions in the mul.i128 filetest, but that's
the price to pay for the slightly more general (works in all cases)
handling of the constraints.
2022-06-01 13:26:20 -07:00
Chris Fallin
b830c3cf93 Pull in regalloc2 v0.2.0, with no more separate scratch registers. (#4182)
RA2 recently removed the need for a dedicated scratch register for
cyclic moves (bytecodealliance/regalloc2#51). This has moderate positive
performance impact on function bodies that were register-constrained, as
it means that one more register is available. In Sightglass, I measured
+5-8% on `blake3-scalar`, at least among current benchmarks.
2022-05-23 12:51:04 -07:00
Chris Fallin
a0318f36f0 Switch Cranelift over to regalloc2. (#3989)
This PR switches Cranelift over to the new register allocator, regalloc2.

See [this document](https://gist.github.com/cfallin/08553421a91f150254fe878f67301801)
for a summary of the design changes. This switchover has implications for
core VCode/MachInst types and the lowering pass.

Overall, this change brings improvements to both compile time and speed of
generated code (runtime), as reported in #3942:

```
Benchmark       Compilation (wallclock)     Execution (wallclock)
blake3-scalar   25% faster                  28% faster
blake3-simd     no diff                     no diff
meshoptimizer   19% faster                  17% faster
pulldown-cmark  17% faster                  no diff
bz2             15% faster                  no diff
SpiderMonkey,   21% faster                  2% faster
  fib(30)
clang.wasm      42% faster                  N/A
```
2022-04-14 10:28:21 -07:00
Andrew Brown
5fa104205d x64: improve generation of i128 icmp (#3946)
Previously, we used the flags of `AND` for `SETcc`. This change uses
`TEST` instead, which discards the AND result but sets the flags needed
for `SETcc`. This reduces register pressure slightly for this sequence.
2022-03-18 16:36:31 -07:00
Andrew Brown
e92cbfb283 x64: port icmp to ISLE (#3886)
* x64: port GPR-held `icmp` to ISLE
* x64: port equality `icmp` for i128 type
* x64: port `icmp` for vector types
* x64: rename from_intcc to intcc_to_cc
2022-03-18 11:22:09 -07:00
Chris Fallin
24f145cd1e Migrate clz, ctz, popcnt, bitrev, is_null, is_invalid on x64 to ISLE. (#3848) 2022-02-28 09:45:13 -08:00
Nick Fitzgerald
a41fdb0303 cranelift: Port rotr lowering to ISLE on x64 2022-01-13 14:59:09 -08:00
Nick Fitzgerald
7454f1f3af cranelift: port sshr to ISLE on x64 (#3681) 2022-01-12 09:13:58 -06:00
Alex Crichton
1ef0abb12c Update lots of isa/*/*.clif tests to precise-output (#3677)
* Update lots of `isa/*/*.clif` tests to `precise-output`

This commit goes through the `aarch64` and `x64` subdirectories and
subjectively changes tests from `test compile` to add `precise-output`.
This then auto-updates all the test expectations so they can be
automatically instead of manually updated in the future. Not all tests
were migrated, largely subject to the whims of myself, mainly looking to
see if the test was looking for specific instructions or just checking
the whole assembly output.

* Filter out `;;` comments from test expctations

Looks like the cranelift parser picks up all comments, not just those
trailing the function, so use a convention where `;;` is used for
human-readable-comments in test cases and `;`-prefixed comments are the
test expectation.
2022-01-10 13:38:23 -06:00
Alex Crichton
d8974ce6bc aarch64: Migrate ishl/ushr/sshr to ISLE (#3608)
* aarch64: Migrate ishl/ushr/sshr to ISLE

This commit migrates the `ishl`, `ushr`, and `sshr` instructions to
ISLE. These involve special cases for almost all types of integers
(including vectors) and helper functions for the i128 lowerings since
the i128 lowerings look to be used for other instructions as well. This
doesn't delete the i128 lowerings in the Rust code just yet because
they're still used by Rust lowerings, but they should be deletable in
due time once those lowerings are translated to ISLE.

* Use more descriptive names for i128 lowerings

* Use a with_flags-lookalike for csel

* Use existing `with_flags_*`

* Coment backwards order

* Update generated code
2021-12-16 17:37:53 -06:00
Chris Fallin
fd171ca063 Fix OperandSize: need clamp-to-32-bit behavior in most cases, but true-width for shifts. 2021-12-16 12:32:28 -08:00
Nick Fitzgerald
33fcd6b4a5 x64: special case 0 to use xor in Inst::gen_constant for i128s 2021-11-10 15:57:58 -08:00
Nick Fitzgerald
b5105c025c MachInst: always rematerialize constants, rather than assign them registers
There were a few previous code paths that attempted to handle this, but this new
check handles it for all callers.

Rematerializing constants, rather than assigning and reusing a register, allows
for lower register pressure.
2021-11-10 15:45:43 -08:00
Nick Fitzgerald
b8494822dc ISLE: finish porting imul lowering to ISLE 2021-11-05 15:41:24 -07:00
Nick Fitzgerald
d377b665c6 Initial ISLE integration with the x64 backend
On the build side, this commit introduces two things:

1. The automatic generation of various ISLE definitions for working with
CLIF. Specifically, it generates extern type definitions for clif opcodes and
the clif instruction data `enum`, as well as extractors for matching each clif
instructions. This happens inside the `cranelift-codegen-meta` crate.

2. The compilation of ISLE DSL sources to Rust code, that can be included in the
main `cranelift-codegen` compilation.

Next, this commit introduces the integration glue code required to get
ISLE-generated Rust code hooked up in clif-to-x64 lowering. When lowering a clif
instruction, we first try to use the ISLE code path. If it succeeds, then we are
done lowering this instruction. If it fails, then we proceed along the existing
hand-written code path for lowering.

Finally, this commit ports many lowering rules over from hand-written,
open-coded Rust to ISLE.

In the process of supporting ISLE, this commit also makes the x64 `Inst` capable
of expressing SSA by supporting 3-operand forms for all of the existing
instructions that only have a 2-operand form encoding:

    dst = src1 op src2

Rather than only the typical x86-64 2-operand form:

    dst = dst op src

This allows `MachInst` to be in SSA form, since `dst` and `src1` are
disentangled.

("3-operand" and "2-operand" are a little bit of a misnomer since not all
operations are binary operations, but we do the same thing for, e.g., unary
operations by disentangling the sole operand from the result.)

There are two motivations for this change:

1. To allow ISLE lowering code to have value-equivalence semantics. We want ISLE
   lowering to translate a CLIF expression that evaluates to some value into a
   `MachInst` expression that evaluates to the same value. We want both the
   lowering itself and the resulting `MachInst` to be pure and referentially
   transparent. This is both a nice paradigm for compiler writers that are
   authoring and maintaining lowering rules and is a prerequisite to any sort of
   formal verification of our lowering rules in the future.

2. Better align `MachInst` with `regalloc2`'s API, which requires that the input
   be in SSA form.
2021-10-12 17:11:58 -07:00
Chris Fallin
e9921574d7 Update to regalloc.rs 0.0.32.
It appears that some allocation heuristics have changed slightly since
0.0.31, so some of the golden-output filetests are updated as well.
Ideally we would rely more on runtests rather than golden-compilation
tests; but for now this is sufficient. (I'm not sure exactly what in
regalloc.rs changed to alter these heuristics; it's actually been almost
a year since the 0.0.31 release with several refactorings and tweaks
merged since then.)

Fixes #3441.
2021-10-20 15:28:42 -07:00
bjorn3
9e34df33b9 Remove the old x86 backend 2021-09-29 16:13:46 +02:00
Chris Fallin
efe3930215 Fix bint on x64, and make bextend consistent with bool representation.
There has been occasional confusion with the representation that we use
for bool-typed values in registers, at least when these are wider than
one bit. Does a `b8` store `true` as 1, or as all-ones (`0xff`)?

We've settled on the latter because of some use-cases where the wide
bool becomes a mask -- see #2058 for more on this.

This is fine, and transparent, to most operations within CLIF, because
the bool-typed value still has only two semantically-visible states,
namely `true` and `false`.

However, we have to be careful with bool-to-int conversions. `bint` on
aarch64 correctly masked the all-ones value down to 0 or 1, as required
by the instruction specification, but on x64 it did not. This PR fixes
that bug and makes x64 consistent with aarch64.

While staring at this code I realized that `bextend` was also not
consistent with the all-ones invariant: it should do a sign-extend, not
a zero-extend as it previously did. This is also rectified and tested.
(Aarch64 also already had this case implemented correctly.)

Fixes #3003.
2021-06-22 10:56:56 -07:00
Afonso Bordado
e021995323 Allow i128 amount operands on shift instructions in the x64 backend
Fixes #2727.
2021-05-10 18:32:20 +01:00
Chris Fallin
cb48ea406e Switch default to new x86_64 backend.
This PR switches the default backend on x86, for both the
`cranelift-codegen` crate and for Wasmtime, to the new
(`MachInst`-style, `VCode`-based) backend that has been under
development and testing for some time now.

The old backend is still available by default in builds with the
`old-x86-backend` feature, or by requesting `BackendVariant::Legacy`
from the appropriate APIs.

As part of that switch, it adds some more runtime-configurable plumbing
to the testing infrastructure so that tests can be run using the
appropriate backend. `clif-util test` is now capable of parsing a
backend selector option from filetests and instantiating the correct
backend.

CI has been updated so that the old x86 backend continues to run its
tests, just as we used to run the new x64 backend separately.

At some point, we will remove the old x86 backend entirely, once we are
satisfied that the new backend has not caused any unforeseen issues and
we do not need to revert.
2021-04-02 11:35:53 -07:00
Chris Fallin
2d5db92a9e Rework/simplify unwind infrastructure and implement Windows unwind.
Our previous implementation of unwind infrastructure was somewhat
complex and brittle: it parsed generated instructions in order to
reverse-engineer unwind info from prologues. It also relied on some
fragile linkage to communicate instruction-layout information that VCode
was not designed to provide.

A much simpler, more reliable, and easier-to-reason-about approach is to
embed unwind directives as pseudo-instructions in the prologue as we
generate it. That way, we can say what we mean and just emit it
directly.

The usual reasoning that leads to the reverse-engineering approach is
that metadata is hard to keep in sync across optimization passes; but
here, (i) prologues are generated at the very end of the pipeline, and
(ii) if we ever do a post-prologue-gen optimization, we can treat unwind
directives as black boxes with unknown side-effects, just as we do for
some other pseudo-instructions today.

It turns out that it was easier to just build this for both x64 and
aarch64 (since they share a factored-out ABI implementation), and wire
up the platform-specific unwind-info generation for Windows and SystemV.
Now we have simpler unwind on all platforms and we can delete the old
unwind infra as soon as we remove the old backend.

There were a few consequences to supporting Fastcall unwind in
particular that led to a refactor of the common ABI. Windows only
supports naming clobbered-register save locations within 240 bytes of
the frame-pointer register, whatever one chooses that to be (RSP or
RBP). We had previously saved clobbers below the fixed frame (and below
nominal-SP). The 240-byte range has to include the old RBP too, so we're
forced to place clobbers at the top of the frame, just below saved
RBP/RIP. This is fine; we always keep a frame pointer anyway because we
use it to refer to stack args. It does mean that offsets of fixed-frame
slots (spillslots, stackslots) from RBP are no longer known before we do
regalloc, so if we ever want to index these off of RBP rather than
nominal-SP because we add support for `alloca` (dynamic frame growth),
then we'll need a "nominal-BP" mode that is resolved after regalloc and
clobber-save code is generated. I added a comment to this effect in
`abi_impl.rs`.

The above refactor touched both x64 and aarch64 because of shared code.
This had a further effect in that the old aarch64 prologue generation
subtracted from `sp` once to allocate space, then used stores to `[sp,
offset]` to save clobbers. Unfortunately the offset only has 7-bit
range, so if there are enough clobbered registers (and there can be --
aarch64 has 384 bytes of registers; at least one unit test hits this)
the stores/loads will be out-of-range. I really don't want to synthesize
large-offset sequences here; better to go back to the simpler
pre-index/post-index `stp r1, r2, [sp, #-16]` form that works just like
a "push". It's likely not much worse microarchitecturally (dependence
chain on SP, but oh well) and it actually saves an instruction if
there's no other frame to allocate. As a further advantage, it's much
simpler to understand; simpler is usually better.

This PR adds the new backend on Windows to CI as well.
2021-03-11 20:03:52 -08:00
Chris Fallin
e41d882144 Merge pull request #2678 from cfallin/x64-fastcall
x86-64 Windows fastcall ABI support.
2021-03-05 10:46:47 -08:00
Chris Fallin
6c94eb82aa x86-64 Windows fastcall ABI support.
This adds support for the "fastcall" ABI, which is the native C/C++ ABI
on Windows platforms on x86-64. It is similar to but not exactly like
System V; primarily, its argument register assignments are different,
and it requires stack shadow space.

Note that this also adjusts the handling of multi-register values in the
shared ABI implementation, and with this change, adjusts handling of
`i128`s on *both* Fastcall/x64 *and* SysV/x64 platforms. This was done
to align with actual behavior by the "rustc ABI" on both platforms, as
mapped out experimentally (Compiler Explorer link in comments). This
behavior is gated under the `enable_llvm_abi_extensions` flag.

Note also that this does *not* add x64 unwind info on Windows. That will
come in a future PR (but is planned!).
2021-03-03 19:53:18 -08:00
Chris Fallin
0f3e00b25e Fix some i128 shift-related bugs in x64 backend.
This fixes #2672 and #2679, and also fixes an incorrect instruction
emission (`test` with small immediate) that we had missed earlier.

The shift-related fixes have to do with (i) shifts by 0 bits, as a
special case that must be handled; and (ii) shifts by a 128-bit amount,
which we can handle by just dropping the upper half (we only use 3--7
bits of shift amount).

This adjusts the lowerings appropriately, and also adds run-tests to
ensure that the lowerings actually execute correctly (previously we only
had compile-tests with golden lowerings; I'd like to correct this for
more ops eventually, adding run-tests beyond what the Wasm spec and
frontend covers).
2021-02-23 14:22:04 -08:00
Chris Fallin
456561f431 x64 and aarch64: allow StructArgument and StructReturn args.
The StructReturn ABI is fairly simple at the codegen/isel level: we only
need to take care to return the sret pointer as one of the return values
if that wasn't specified in the initial function signature.

Struct arguments are a little more complex. A struct argument is stored
as a chunk of memory in the stack-args space. However, the CLIF
semantics are slightly special: on the caller side, the parameter passed
in is a pointer to an arbitrary memory block, and we must memcpy this
data to the on-stack struct-argument; and on the callee side, we provide
a pointer to the passed-in struct-argument as the CLIF block param
value.

This is necessary to support various ABIs other than Wasm, such as that
of Rust (with the cg_clif codegen backend).
2021-01-17 23:11:45 -08:00
Chris Fallin
71ead6e31d x64 backend: implement 128-bit ops and misc fixes.
This implements all of the ops on I128 that are implemented by the
legacy x86 backend, and includes all that are required by at least one
major use-case (cg_clif rustc backend).

The sequences are open-coded where necessary; for e.g. the bit
operations, this can be somewhat complex, but these sequences have been
tested carefully. This PR also includes a drive-by fix of clz/ctz for 8-
and 16-bit cases where they were incorrect previously.

Also includes ridealong fixes developed while bringing up cg_clif
support, because they are difficult to completely separate due to
other refactors that occurred in this PR:

- fix REX prefix logic for some 8-bit instructions.

  When using an 8-bit register in 64-bit mode on x86-64, the REX prefix
  semantics are somewhat subtle: without the REX prefix, register numbers
  4--7 correspond to the second-to-lowest byte of the first four registers
  (AH, CH, BH, DH), whereas with the REX prefix, these register numbers
  correspond to the usual encoding (SPL, BPL, SIL, DIL). We could always
  emit a REX byte for instructions with 8-bit cases (this is harmless even
  if unneeded), but this would unnecessarily inflate code size; instead,
  the usual approach is to emit it only for these registers.

  This logic was present in some cases but missing for some other
  instructions: divide, not, negate, shifts.

  Fixes #2508.

- avoid unaligned SSE loads on some f64 ops.

  The implementations of several FP ops, such as fabs/fneg, used SSE
  instructions. This is not a problem per-se, except that load-op merging
  did not take *alignment* into account. Specifically, if an op on an f64
  loaded from memory happened to merge that load, and the instruction into
  which it was merged was an SSE instruction, then the SSE instruction
  imposes stricter (128-bit) alignment requirements than the load.f64 did.

  This PR simply forces any instruction lowerings that could use SSE
  instructions to implement non-SIMD operations to take inputs in
  registers only, and avoid load-op merging.

  Fixes #2507.

- two bugfixes exposed by cg_clif: urem/srem.i8, select.b1.

  - urem/srem.i8: the 8-bit form of the DIV instruction on x86-64 places
    the remainder in AH, not RDX, different from all the other width-forms
    of this instruction.

  - select.b1: we were not recognizing selects of boolean values as
    integer-typed operations, so we were generating XMM moves instead (!).
2021-01-14 13:45:50 -08:00