* cranelift: Reorganize test suite
Group some SIMD operations by instruction.
* cranelift: Deduplicate some shift tests
Also, new tests with the mod behaviour
* aarch64: Lower shifts with mod behaviour
* x64: Lower shifts with mod behaviour
* wasmtime: Don't mask SIMD shifts
* x64: Add VEX Instruction Encoder
This uses a similar builder pattern to the EVEX Encoder.
Does not yet support memory accesses.
* x64: Add FMA Flag
* x64: Implement SIMD `fma`
* x64: Use 4 register Vex Inst
* x64: Reorder VEX pretty print args
* x64: port `atomic_rmw` to ISLE
This change ports `atomic_rmw` to ISLE for the x64 backend. It does not
change the lowering in any way, though it seems possible that the fixed
regs need not be as fixed and that there are opportunities for single
instruction lowerings. It does rename `inst_common::AtomicRmwOp` to
`MachAtomicRmwOp` to disambiguate with the IR enum with the same name.
* x64: remove remaining hardcoded register constraints for `atomic_rmw`
* x64: use `SyntheticAmode` in `AtomicRmwSeq`
* review: add missing reg collector for amode
* review: collect memory registers in the 'late' phase
x64 backend: add lowerings with load-op-store fusion.
These lowerings use the `OP [mem], reg` forms (or in AT&T syntax, `OP
%reg, (mem)`) -- i.e., x86 instructions that load from memory, perform
an ALU operation, and store the result, all in one instruction. Using
these instruction forms, we can merge three CLIF ops together: a load,
an arithmetic operation, and a store.
The recent work in #4061 introduced a notion of "unique uses" for CLIF
values that both simplified the load-op merging rules and allowed
loads to merge in some more places.
Unfortunately there's one factor that PR didn't account for: a unique
use at the CLIF level could become a multiple-use at the VCode level,
when a lowering uses a value multiple times!
Making this less error-prone in general is hard, because we don't know
the lowering in VCode until it's emitted, so we can't ahead-of-time
know that a value will be used multiple times and prevent its
merging. But we *can* know in the lowerings themselves when we're
doing this. At least we get a panic from regalloc when we get this
wrong; no bad code (uninitialized register being read) should ever
come from a backend bug like this.
This is still a bit less than ideal, but for now the fix is: in
`cmp_and_choose` in the x64 backend (which compares values, then
picks one or the other with a cmove), explicitly put values in
registers.
Fixes#4067 (thanks @Mrmaxmeier for the report!).
* Cranelift: fix#3953: rework single/multiple-use logic in lowering.
This PR addresses the longstanding issue with loads trying to merge
into compares on x86-64, and more generally, with the lowering
framework falsely recognizing "single uses" of one op by
another (which would normally allow merging of side-effecting ops like
loads) when there is *indirect* duplication.
To fix this, we replace the direct `value_uses` count with a
transitive notion of uniqueness (not unlike Rust's `&`/`&mut` and how
a `&mut` downgrades to `&` when accessed through another `&`!). A
value is used multiple times transitively if it has multiple direct
uses, or is used by another op that is used multiple times
transitively.
The canonical example of badness is:
```
v1 := load
v2 := ifcmp v1, ...
v3 := selectif v2, ...
v4 := selectif v2, ...
```
both `v3` and `v4` effectively merge the `ifcmp` (`v2`), so even
though the use of `v1` is "unique", it is codegenned twice. This is
why we ~~can't have nice things~~ can't merge loads into
compares (#3953).
There is quite a subtle and interesting design space around this
problem and how we might solve it. See the long doc-comment on
`ValueUseState` in this PR for more justification for the particular
design here. In particular, this design deliberately simplifies a bit
relative to an "optimal" solution: some uses can *become* unique
depending on merging, but we don't design our data structures for such
updates because that would require significant extra costly
tracking (some sort of transitive refcounting). For example, in the
above, if `selectif` somehow did not merge `ifcmp`, then we would only
codegen the `ifcmp` once into its result register (and use that
register twice); then the load *is* uniquely used, and could be
merged. But that requires transitioning from "multiple use" back to
"unique use" with careful tracking as we do pattern-matching, which
I've chosen to make out-of-scope here for now. In practice, I don't
think it will matter too much (and we can always improve later).
With this PR, we can now re-enable load-op merging for compares. A
subsequent commit does this.
* Update x64 backend to allow load-op merging for `cmp`.
* Update filetests.
* Add test for cmp-mem merging on x64.
* Comment fixes.
* Rework ValueUseState analysis for better performance.
* Update s390x filetest: iadd_ifcout cannot merge loads anymore because it has multiple outputs (ValueUseState limitation)
* Address review comments.
This PR switches Cranelift over to the new register allocator, regalloc2.
See [this document](https://gist.github.com/cfallin/08553421a91f150254fe878f67301801)
for a summary of the design changes. This switchover has implications for
core VCode/MachInst types and the lowering pass.
Overall, this change brings improvements to both compile time and speed of
generated code (runtime), as reported in #3942:
```
Benchmark Compilation (wallclock) Execution (wallclock)
blake3-scalar 25% faster 28% faster
blake3-simd no diff no diff
meshoptimizer 19% faster 17% faster
pulldown-cmark 17% faster no diff
bz2 15% faster no diff
SpiderMonkey, 21% faster 2% faster
fib(30)
clang.wasm 42% faster N/A
```
This change moves the majority of the lowerings for CLIF's `load`
instruction over to ISLE. To do so, it also migrates the previous
mechanism for creating an `Amode` (`lower_to_amode`) to several ISLE
rules (see `to_amode`).
Previous changes had ported the difficult "`select` based on an `fcmp`"
patterns to ISLE; this completes porting of `select` by moving over the
final two kinds of patterns:
- `select` based on an `icmp`
- `select` based on a value
* x64: port scalar `fcmp` to ISLE
Implement the CLIF lowering for the `fcmp` to ISLE. This adds a new
type-matcher, `ty_scalar_float`, for detecting uses of `F32` and `F64`.
* isle: rename `vec128` to `ty_vec12`
This refactoring changes the name of the `vec128` matcher function to
follow the `ty_*` convention of the other type matchers. It also makes
the helper an inline function call.
* x64: port vector `fcmp` to ISLE
Fuzz testing identified a lowering case for CLIF's `icmp` in which the
double use of a loaded operand resulted in a register allocation error.
This change manually adds `put_in_xmm` to avoid load-coalescing these
values and includes a CLIF filetest to trigger this issue. Closes#3951.
I opened #3953 to discuss a way in which this kind of mistake (i.e.,
forgetting to add `put_in_*` in certain situations) could be avoided.
This change is refactoring only--it should have no logic changes. As
discussed previously, prefixing all machine code instructions with
`x64_` will make it easier to identify what parts of the ISLE code
correspond to single instructions and what parts rely on helpers that
may emit more than one instruction.
Previously, we used the flags of `AND` for `SETcc`. This change uses
`TEST` instead, which discards the AND result but sets the flags needed
for `SETcc`. This reduces register pressure slightly for this sequence.
* x64: port GPR-held `icmp` to ISLE
* x64: port equality `icmp` for i128 type
* x64: port `icmp` for vector types
* x64: rename from_intcc to intcc_to_cc
In #3849, I moved uextend over to ISLE in the x64 backend. Unfortunately, the lowering patterns had a bug in the i32-to-i64 special case (when we know the generating instruction zeroes the upper 32 bits): it wasn't actually special casing for an i32 source! This meant that e.g. zero extends of the results of i8 adds did not work properly.
This PR fixes the bug and updates the runtest for extends significantly to cover the narrow-value cases.
No security impact to Wasm as Wasm does not use narrow integer types.
Thanks @bjorn3 for reporting!
This changes the output of the `lower` constructor from a
`ValueRegs` to a new `InstOutput` type, which is a vector
of `ValueRegs`.
Code in `lower_common` is updated to use this new type to
handle instructions with multiple outputs. All back-ends
are updated to use the new type.
This PR makes use of the new implicit-conversion feature of the ISLE DSL
that was introduced in #3807 in order to make the lowering rules
significantly simpler and more concise.
The basic idea is to eliminate the repetitive and mechanical use of
terms that convert from one type to another when there is only one real
way to do the conversion -- for example, to go from a `WritableReg` to a
`Reg`, the only sensible way is to use `writable_reg_to_reg`.
This PR generally takes any term of the form "A_to_B" and makes it an
automatic conversion, as well as some others that are similar in spirit.
The notable exception to the pure-value-convsion category is the
`put_in_reg` family of operations, which actually do have side-effects.
However, as noted in the doc additions in #3807, this is fine as long as
the side-effects are idempotent. And on balance, making `put_in_reg`
automatic is a significant clarity win -- together with other operand
converters, it enables rules like:
```
;; Add two registers.
(rule (lower (has_type (fits_in_64 ty)
(iadd x y)))
(add ty x y))
```
There may be other converters that we could define to make the rules
even simpler; we can make such improvements as we think of them, but
this should be a good start!
* x64: port `select` using an FP comparison to ISLE
This change includes quite a few interlocking parts, required mainly by
the current x64 conventions in ISLE:
- it adds a way to emit a `cmove` with multiple OR-ing conditions;
because x64 ISLE cannot currently safely emit a comparison followed
by several jumps, this adds `MachInst::CmoveOr` and
`MachInst::XmmCmoveOr` macro instructions. Unfortunately, these macro
instructions hide the multi-instruction sequence in `lower.isle`
- to properly keep track of what instructions consume and produce
flags, @cfallin added a way to pass around variants of
`ConsumesFlags` and `ProducesFlags`--these changes affect all
backends
- then, to lower the `fcmp + select` CLIF, this change adds several
`cmove*_from_values` helpers that perform all of the awkward
conversions between `Value`, `ValueReg`, `Reg`, and `Gpr/Xmm`; one
upside is that now these lowerings have much-improved documentation
explaining why the various `FloatCC` and `CC` choices are made the
the way they are.
Co-authored-by: Chris Fallin <chris@cfallin.org>
We already defined the `Gpr` newtype and used it in a few places, and we already
defined the `Xmm` newtype and used it extensively. This finishes the transition
to using the newtypes extensively in lowering by making use of `Gpr` in more
places.
Fixes#3685
This primary motivation of this large commit (apologies for its size!) is to
introduce `Gpr` and `Xmm` newtypes over `Reg`. This should help catch
difficult-to-diagnose register class mixup bugs in x64 lowerings.
But having a newtype for `Gpr` and `Xmm` themselves isn't enough to catch all of
our operand-with-wrong-register-class bugs, because about 50% of operands on x64
aren't just a register, but a register or memory address or even an
immediate! So we have `{Gpr,Xmm}Mem[Imm]` newtypes as well.
Unfortunately, `GprMem` et al can't be `enum`s and are therefore a little bit
noisier to work with from ISLE. They need to maintain the invariant that their
registers really are of the claimed register class, so they need to encapsulate
the inner data. If they exposed the underlying `enum` variants, then anyone
could just change register classes or construct a `GprMem` that holds an XMM
register, defeating the whole point of these newtypes. So when working with
these newtypes from ISLE, we rely on external constructors like `(gpr_to_gpr_mem
my_gpr)` instead of `(GprMem.Gpr my_gpr)`.
A bit of extra lines of code are included to add support for register mapping
for all of these newtypes as well. Ultimately this is all a bit wordier than I'd
hoped it would be when I first started authoring this commit, but I think it is
all worth it nonetheless!
In the process of adding these newtypes, I didn't want to have to update both
the ISLE `extern` type definition of `MInst` and the Rust definition, so I move
the definition fully into ISLE, similar as aarch64.
Finally, this process isn't complete. I've introduced the newtypes here, and
I've made most XMM-using instructions switch from `Reg` to `Xmm`, as well as
register class-converting instructions, but I haven't moved all of the GPR-using
instructions over to the newtypes yet. I figured this commit was big enough as
it was, and I can continue the adoption of these newtypes in follow up commits.
Part of #3685.