`fmin`/`fmax` are defined as returning -0.0 as smaller than 0.0.
This is not how the IEEE754 views these values and the interpreter was
returning the wrong value in these operations since it was just using the
standard IEEE754 comparisons.
This also tries to preserve NaN information by avoiding passing NaN's
through any operation that could canonicalize it.
* cranelift: Implement `fma` on interpreter
* cranelift: Implement `fabs` on interpreter
* cranelift: Fix `fneg` implementation on interpreter
`fneg` was implemented as `0 - x` which is not correct according to the
standard since that operation makes no guarantees on what the output
is when the input is `NaN`. However for `fneg` the output for `NaN`
inputs is fully defined.
* cranelift: Implement `fcopysign` on interpreter
* cranelift: Fix `bint` implementation on interpreter
The interpreter was returning -1 instead of 1 for positive values.
This also extends the bint test suite to cover all types.
* cranelift: Restrict `bint` to scalar values only
This change removes all variants of `load*_complex` and `store*_complex`
from Cranelift; this is a breaking change to the instructions exposed by
CLIF. The complete list of instructions removed is: `load_complex`,
`store_complex`, `uload8_complex`, `sload8_complex`, `istore8_complex`,
`sload8_complex`, `uload16_complex`, `sload16_complex`,
`istore16_complex`, `uload32_complex`, `sload32_complex`,
`istore32_complex`, `uload8x8_complex`, `sload8x8_complex`,
`sload16x4_complex`, `uload16x4_complex`, `uload32x2_complex`,
`sload32x2_complex`.
The rationale for this removal is that the Cranelift backend now has the
ability to pattern-match multiple upstream additions in order to
calculate the address to access. Previously, this was not possible so
the `*_complex` instructions were needed. Over time, these instructions
have fallen out of use in this repository, making the additional
overhead of maintaining them a chore.
- `extractlanes` will now function on a scalar value, returning the
value as a single-element array.
- `vectorizelanes` will accept a single-element array, returning the
contained value.
Existing `if !x.is_vector()` code-patterns have been simplified as a
result.
Copyright (c) 2022 Arm Limited
This opcode was removed as part of the old-backend cleanup in #3446.
While this opcode will definitely go away eventually, it is
unfortunately still used today in Lucet (as we just discovered while
working to upgrade Lucet's pinned Cranelift version). Lucet is
deprecated and slated to eventually be completely sunset in favor of
Wasmtime; but until that happens, we need to keep this opcode.
This also paves the way for unifying TargetIsa and MachBackend, since now they map one to one. In theory the two traits could be merged, which would be nice to limit the number of total concepts. Also they have quite different responsibilities, so it might be fine to keep them separate.
Interestingly, this PR started as removing RegInfo from the TargetIsa trait since the adapter returned a dummy value there. From the fallout, noticed that all Display implementations didn't needed an ISA anymore (since these were only used to render ISA specific registers). Also the whole family of RegInfo / ValueLoc / RegUnit was exclusively used for the old backend, and these could be removed. Notably, some IR instructions needed to be removed, because they were using RegUnit too: this was the oddball of regfill / regmove / regspill / copy_special, which were IR instructions inserted by the old regalloc. Fare thee well!
Implemented `SqmulRoundSat` for the Cranelift interpreter, performing
QN-format fixed point multiplication for 16 and 32-bit integers in
SIMD vectors.
Copyright (c) 2021, Arm Limited
Implemented `Smulhi` for the Cranelift interpreter, performing signed
integer multiplication and producing the high half of a double-length
result.
Copyright (c) 2021, Arm Limited
Implemented `WideningPairwiseDotProductS` to perform sign-extending
length-doubling multiplication on corresponding elements from two
`i16x8` SIMD vectors, performing a pairwise add on the results (thus
returning `i32x4`).
Copyright (c) 2021, Arm Limited
Implemented the following Opcodes for the Cranelift interpreter:
- `Unarrow` to combine two SIMD vectors into a new vector with twice
the lanes but half the width, with signed inputs which are clamped to
`0x00`.
- `Uunarrow` to perform the same operation as `Unarrow` but treating
inputs as unsigned.
- `Snarrow` to perform the same operation as `Unarrow` but treating
both inputs and outputs as signed, and saturating accordingly.
Note that all 3 instructions saturate at the type boundaries.
Copyright (c) 2021, Arm Limited
Implemented the following Opcodes for the Cranelift interpreter:
- `VhighBits` to reduce a vector to a scalar integer formed by
concatenating the MSB of each lane.
- `Vselect` to select lanes from two vectors controlled by a boolean
vector.
Copyright (c) 2021, Arm Limited
Implemented `SwidenLow` and `SwidenHigh` for the Cranelift interpreter,
doubling the width and halving the number of lanes preserving the low
and high halves respectively.
Conversions are performed using signed extension.
Copyright (c) 2021, Arm Limited
Implemented `UwidenLow` and `UwidenHigh` for the Cranelift interpreter,
doubling the width and halving the number of lanes preserving the low
and high halves respectively. Conversions are performed using unsigned
zero extension.
Copyright (c) 2021, Arm Limited
Implemented `Shuffle` for the Cranelift interpreter, to shuffle two SIMD
vectors together based on an immediate mask of 16 bytes.
Copyright (c) 2021, Arm Limited
Implemented for the Cranelift interpreter:
- `Bitrev` to reverse the order of the bits in an integer.
- `Cls` to count the leading bits which are the same as the sign bit in
an integer, yielding one less than the size of the integer for 0 and -1.
- `Clz` to count the number of leading zeros in the bitwise representation of the
integer.
- `Ctz` to count the number of trailing zeros in the bitwise representation of the
integer.
- `Popcnt` to count the number of ones in the bitwise representation of the
integer.
Copyright (c) 2021, Arm Limited
* Implement `Swizzle` and `Splat` for interpreter
Implemented for the Cranelift interpreter:
- `Swizzle` to shuffle an `i8x16` SIMD vector based
on the indices specified in another vector of the same size.
- `Splat` to create a SIMD vector with all lanes having the same value.
Copyright (c) 2021, Arm Limited
* Fix old x86 backend failing test
Copyright (c) 2021, Arm Limited
* Represent i16x8 and above as hex
Copyright (c) 2021, Arm Limited
* cranelift: Implement ZeroExtend for a bunch of types in interpreter
* cranelift: Implement VConst on interpreter
* cranelift: Implement VallTrue on interpreter
* cranelift: Implement VanyTrue on interpreter
* cranelift: Mark `v{all,any}_true` tests as machinst only
* cranelift: Disable `vany_true` tests on aarch64
The `b64x2` case produces an illegal instruction. See #3305