This was my first attempt at transitioning code to ISLE to originally
fix#3327 but that fix has since landed on `main`, so this is instead
now just porting a few operations to ISLE.
Closes#3336
On the build side, this commit introduces two things:
1. The automatic generation of various ISLE definitions for working with
CLIF. Specifically, it generates extern type definitions for clif opcodes and
the clif instruction data `enum`, as well as extractors for matching each clif
instructions. This happens inside the `cranelift-codegen-meta` crate.
2. The compilation of ISLE DSL sources to Rust code, that can be included in the
main `cranelift-codegen` compilation.
Next, this commit introduces the integration glue code required to get
ISLE-generated Rust code hooked up in clif-to-x64 lowering. When lowering a clif
instruction, we first try to use the ISLE code path. If it succeeds, then we are
done lowering this instruction. If it fails, then we proceed along the existing
hand-written code path for lowering.
Finally, this commit ports many lowering rules over from hand-written,
open-coded Rust to ISLE.
In the process of supporting ISLE, this commit also makes the x64 `Inst` capable
of expressing SSA by supporting 3-operand forms for all of the existing
instructions that only have a 2-operand form encoding:
dst = src1 op src2
Rather than only the typical x86-64 2-operand form:
dst = dst op src
This allows `MachInst` to be in SSA form, since `dst` and `src1` are
disentangled.
("3-operand" and "2-operand" are a little bit of a misnomer since not all
operations are binary operations, but we do the same thing for, e.g., unary
operations by disentangling the sole operand from the result.)
There are two motivations for this change:
1. To allow ISLE lowering code to have value-equivalence semantics. We want ISLE
lowering to translate a CLIF expression that evaluates to some value into a
`MachInst` expression that evaluates to the same value. We want both the
lowering itself and the resulting `MachInst` to be pure and referentially
transparent. This is both a nice paradigm for compiler writers that are
authoring and maintaining lowering rules and is a prerequisite to any sort of
formal verification of our lowering rules in the future.
2. Better align `MachInst` with `regalloc2`'s API, which requires that the input
be in SSA form.
This opcode was removed as part of the old-backend cleanup in #3446.
While this opcode will definitely go away eventually, it is
unfortunately still used today in Lucet (as we just discovered while
working to upgrade Lucet's pinned Cranelift version). Lucet is
deprecated and slated to eventually be completely sunset in favor of
Wasmtime; but until that happens, we need to keep this opcode.
This eagerly evaluates the `format!` and produces a `String` with a heap
allocation, regardless whether `foo` is `Some`/`Ok` or `None`/`Err`. Using
`foo.unwrap_or_else(|| panic!(...))` makes it so that the error message
formatting is only evaluated if `foo` is `None`/`Err`.
This also paves the way for unifying TargetIsa and MachBackend, since now they map one to one. In theory the two traits could be merged, which would be nice to limit the number of total concepts. Also they have quite different responsibilities, so it might be fine to keep them separate.
Interestingly, this PR started as removing RegInfo from the TargetIsa trait since the adapter returned a dummy value there. From the fallout, noticed that all Display implementations didn't needed an ISA anymore (since these were only used to render ISA specific registers). Also the whole family of RegInfo / ValueLoc / RegUnit was exclusively used for the old backend, and these could be removed. Notably, some IR instructions needed to be removed, because they were using RegUnit too: this was the oddball of regfill / regmove / regspill / copy_special, which were IR instructions inserted by the old regalloc. Fare thee well!
Fixes#2943, though not as optimally as may be desired. With x64 SIMD
instructions, the memory operand must be aligned--this change adds that
check. There are cases, however, where we can do better--see #3106.
Also, reorganize the AArch64-specific VCode instructions for unary
narrowing and widening vector operations, so that they are more
straightforward to use.
Copyright (c) 2021, Arm Limited.
As discussed in #3035, most backends have explicit
`unimplemented!(...)` match-arms for opcode lowering cases that are not
yet implemented; this allows the backend maintainer to easily see what
is not yet implemented, and avoiding a catch-all wildcard arm is less
error-prone as opcodes are added in the future.
However, the x64 backend was the exception: as @akirilov-arm pointed
out, it had a wildcard match arm. This fixes the issue by explicitly
listing all opcodes the x64 backend does not yet implement.
As per our tests, these opcodes are not used or need by Wasm lowering;
but, it is good to know that they exist, so that we can eventually
either support or remove them.
This was a good exercise for me as I wasn't aware of a few of these in
particular: e.g., aarch64 supports `bmask` while x64 does not, and there
isn't a good reason why x64 shouldn't, especially if others hope to use
Cranelift as a SIMD-capable general codegen in the future.
The `unimplemented!()` cases are separate from `panic!()` ones: my
convention here was to split out those that are logically just *missing*
from those that should be *impossible*, mostly due to expected removal
by legalization before we reach the lowering step.
There has been occasional confusion with the representation that we use
for bool-typed values in registers, at least when these are wider than
one bit. Does a `b8` store `true` as 1, or as all-ones (`0xff`)?
We've settled on the latter because of some use-cases where the wide
bool becomes a mask -- see #2058 for more on this.
This is fine, and transparent, to most operations within CLIF, because
the bool-typed value still has only two semantically-visible states,
namely `true` and `false`.
However, we have to be careful with bool-to-int conversions. `bint` on
aarch64 correctly masked the all-ones value down to 0 or 1, as required
by the instruction specification, but on x64 it did not. This PR fixes
that bug and makes x64 consistent with aarch64.
While staring at this code I realized that `bextend` was also not
consistent with the all-ones invariant: it should do a sign-extend, not
a zero-extend as it previously did. This is also rectified and tested.
(Aarch64 also already had this case implemented correctly.)
Fixes#3003.
* Add support for x64 packed promote low
* Add support for x64 packed floating point demote
* Update vector promote low and demote by adding constraints
Also does some renaming and minor refactoring
Previously, the multiple flags for certain AVX512 instructions were
checked using `OR`: e.g., if the CPU has AVX512VL `OR` AVX512DQ,
emit `VPMULLQ`. This is incorrect--the logic should be `AND`. The Intel
Software Developer Manual, vol. 1, sec. 15.4, has more information on
this (notable there is the suggestion to check with `XGETBV` that the OS
is allowing the use of the XMM registers--but that is a separate issue).
This change switches to `AND` logic in the new backend.
When shuffling values from two different registers, the x64 lowering for
`i8x16.shuffle` must first shuffle each register separately and then OR
the results with SSE instructions. With `VPERMI2B`, available in
AVX512VL + AVX512VBMI, this can be done in a single instruction after
the shuffle mask has been moved into the destination register. This
change uses `VPERMI2B` for that case when the CPU supports it.
When AVX512VL or AVX512BITALG are available, Wasm SIMD's `popcnt`
instruction can be lowered to a single x64 instruction, `VPOPCNTB`,
instead of 8+ instructions.
When AVX512VL and AVX512F are available, use a single instruction
(`VCVTUDQ2PS`) instead of a length 9-instruction sequence. This
optimization is a port from the legacy x86 backend.
This change implements `vselect` using SSE4.1's `BLENDVPS`, `BLENDVPD`,
and `PBLENDVB`. `vselect` is a lane-selecting instruction that is used
by
[simple_preopt.rs](fa1faf5d22/cranelift/codegen/src/simple_preopt.rs (L947-L999))
to lower `bitselect` to a single x86 instruction when the condition mask
is known to be boolean (all 1s or 0s, e.g., from a conversion). This is
better than `bitselect` in general, which lowers to 4-5 instructions.
The old backend had the `vselect` lowering; this simply introduces it to
the new backend.
This adds the machinery to encode the VPMULLQ instruction which is
available in AVX512VL and AVX512DQ. When these feature sets are
available, we use this instruction instead of a lengthy 12-instruction
sequence.
Since the lowering of `imul` complicated the other ALU operations it was
matched with and since future commits will alter the multiplication
lowering further, this change moves the `imul` lowering to its own match
block.
* x64: add EVEX encoding mechanism
Also, includes an empty stub module for the VEX encoding.
* x64: lower abs.i64x2 to VPABSQ when available
* x64: refactor EVEX encodings to use `EvexInstruction`
This change replaces the `encode_evex` function with a builder-style struct, `EvexInstruction`. This approach clarifies the code, adds documentation, and results in slight speedups when benchmarked.
* x64: rename encoding CodeSink to ByteSink
This re-factoring replaces uses of `Inst::mov_r_m` with `Inst::store` to ensure there is only one code location to troubleshoot when generating store instructions for a specific type.
* Use stable Rust on CI to test the x64 backend
This commit leverages the newly-released 1.51.0 compiler to test the
new backend on Windows and Linux with a stable compiler instead of a
nightly compiler. This isolates the nightly build to just the nightly
documentation generation and fuzzing, both of which rely on nightly for
the best results right now.
* Use updated stable in book build job
* Run rustfmt for new stable
* Silence new warnings for wasi-nn
* Allow some dead code in the x64 backend
Looks like new rustc is better about emitting some dead-code warnings
* Update rust in peepmatic job
* Fix a test in the pooling allocator
* Remove `package.metdata.docs.rs` temporarily
Needs resolution of https://github.com/rust-lang/cargo/pull/9300 first
* Fix a warning in a wasi-nn example
The codegen for div/rem ops has two modes, depending on the
`avoid_div_traps` flag: it can either do all checks for trapping
conditions explicitly, and use explicit trap instructions, then use a
hardware divide instruction that will not trap (`avoid_div_traps ==
true`); or it can run in a mode where a hardware FP fault on the divide
instruction implies a Wasm trap (`avoid_div_traps == false`). Wasmtime
uses the former while Lucet (for example) uses the latter.
It turns out that because we run all our spec tests run under Wasmtime,
we missed a spec corner case that fails in the latter: INT_MIN % -1 == 0
per the spec, but causes a trap with the x86 signed divide/remainder
instruction. Hence, in Lucet, this specific remainder computation would
incorrectly result in a Wasm trap.
This PR fixes the issue by just forcing use of the explicit-checks
implementation for `srem` even when `avoid_div_traps` is false.