Also fix and extend the current implementation:
- AtomicRMWOp::Clr != AtomicRmwOp::And, as the input needs to be
inverted first.
- Inputs to the cmp for the RMWLoop case are sign-extended when
needed.
- Lower Xchg to Swp.
- Lower Sub to Add with a negated input.
- Added more runtests.
Copyright (c) 2022, Arm Limited.
x64 backend: add lowerings with load-op-store fusion.
These lowerings use the `OP [mem], reg` forms (or in AT&T syntax, `OP
%reg, (mem)`) -- i.e., x86 instructions that load from memory, perform
an ALU operation, and store the result, all in one instruction. Using
these instruction forms, we can merge three CLIF ops together: a load,
an arithmetic operation, and a store.
Issue #3963 identified a miscompilation with select in which the second
in the pair of `CMOV`s (one pair per `i128` register) used the wrong
flag. This change fixes the error in the x64 ISLE helper function
emitting these `CMOV` instructions.
In #3849, I moved uextend over to ISLE in the x64 backend. Unfortunately, the lowering patterns had a bug in the i32-to-i64 special case (when we know the generating instruction zeroes the upper 32 bits): it wasn't actually special casing for an i32 source! This meant that e.g. zero extends of the results of i8 adds did not work properly.
This PR fixes the bug and updates the runtest for extends significantly to cover the narrow-value cases.
No security impact to Wasm as Wasm does not use narrow integer types.
Thanks @bjorn3 for reporting!
* x64: port `select` using an FP comparison to ISLE
This change includes quite a few interlocking parts, required mainly by
the current x64 conventions in ISLE:
- it adds a way to emit a `cmove` with multiple OR-ing conditions;
because x64 ISLE cannot currently safely emit a comparison followed
by several jumps, this adds `MachInst::CmoveOr` and
`MachInst::XmmCmoveOr` macro instructions. Unfortunately, these macro
instructions hide the multi-instruction sequence in `lower.isle`
- to properly keep track of what instructions consume and produce
flags, @cfallin added a way to pass around variants of
`ConsumesFlags` and `ProducesFlags`--these changes affect all
backends
- then, to lower the `fcmp + select` CLIF, this change adds several
`cmove*_from_values` helpers that perform all of the awkward
conversions between `Value`, `ValueReg`, `Reg`, and `Gpr/Xmm`; one
upside is that now these lowerings have much-improved documentation
explaining why the various `FloatCC` and `CC` choices are made the
the way they are.
Co-authored-by: Chris Fallin <chris@cfallin.org>
Addresses #3809: when we are asked to create a Cranelift backend with
shared flags that indicate support for SIMD, we should check that the
ISA level needed for our SIMD lowerings is present.
This adds support for all atomic operations that were unimplemented
so far in the s390x back end:
- atomic_rmw operations xchg, nand, smin, smax, umin, umax
- $I8 and $I16 versions of atomic_rmw and atomic_cas
- little endian versions of atomic_rmw and atomic_cas
All of these have to be implemented by a compare-and-swap loop;
and for the $I8 and $I16 versions the actual atomic instruction
needs to operate on the surrounding aligned 32-bit word.
Since we cannot emit new control flow during ISLE instruction
selection, these compare-and-swap loops are emitted as a single
meta-instruction to be expanded at emit time.
However, since there is a large number of different versions of
the loop required to implement all the above operations, I've
implemented a facility to allow specifying the loop bodies
from within ISLE after all, by creating a vector of MInst
structures that will be emitted as part of the meta-instruction.
There are still restrictions, in particular instructions that
are part of the loop body may not modify any virtual register.
But even so, this approach looks preferable to doing everything
in emit.rs.
A few instructions needed in those compare-and-swap loop bodies
were added as well, in particular the RxSBG family of instructions
as well as the LOAD REVERSED in-register byte-swap instructions.
This patch also adds filetest runtests to verify the semantics
of all operations, in particular the subword and little-endian
variants (those are currently only executed on s390x).
This commit migrates these existing instructions to ISLE from the manual
lowerings implemented today. This was mostly straightforward but while I
was at it I fixed what appeared to be broken translations for I{8,16}
for `clz`, `cls`, and `ctz`. Previously the lowerings would produce
results as-if the input was 32-bits, but now I believe they all
correctly account for the bit-width.
Uncovered by @bjorn3 (thanks!): 8- and 16-bit rotates were not working
properly in recent versions of Cranelift with part of the lowering
migrated to ISLE.
This PR fixes a few issues:
- 8- and 16-bit rotate-left needs to mask a constant amount, if any,
because we use a 32-bit rotate instruction and so don't get the
appropriate shift-amount masking for free from x86 semantics.
- `operand_size_from_type` was incorrect: it only handled 32- and 64-bit
types and silently returned `OperandSize::Size32` for everything else.
Now uses the `OperandSize::from_ty(ty)` helper as the pre-ISLE code
did.
Our test coverage for narrow value types is not great; this PR adds some
runtests for rotl/rotr but more would always be better!
Cranelift shuffles require indices to be in-bounds, which the
avx512-using backend also requires via a debug assert, so this commit
fixes a test with simd shuffles to only use in-bounds indices.
This is motivated by another failure on CI where the machine we were
running on presumably had avx512 things enabled. This should fix those
failures.
Closes#3581
Implemented `SqmulRoundSat` for the Cranelift interpreter, performing
QN-format fixed point multiplication for 16 and 32-bit integers in
SIMD vectors.
Copyright (c) 2021, Arm Limited
- Add relocation handling needed after PR #3275
- Fix incorrect handling of signed constants detected by PR #3056 test
- Fix LabelUse max pos/neg ranges; fix overflow in buffers.rs
- Disable fuzzing tests that require pre-built v8 binaries
- Disable cranelift test that depends on i128
- Temporarily disable memory64 tests
Implemented `Smulhi` for the Cranelift interpreter, performing signed
integer multiplication and producing the high half of a double-length
result.
Copyright (c) 2021, Arm Limited
Implemented `WideningPairwiseDotProductS` to perform sign-extending
length-doubling multiplication on corresponding elements from two
`i16x8` SIMD vectors, performing a pairwise add on the results (thus
returning `i32x4`).
Copyright (c) 2021, Arm Limited