As @MaxGraey pointed out (thanks!) in #4397, `round` has different
behavior from `nearest`. And it looks like the native rust
implementation is still pending stabilization.
Right now we duplicate the wasmtime implementation, merged in #2171.
However, we definitely should switch to the rust native version
when it is available.
Introduce a new concept in the IR that allows a producer to create
dynamic vector types. An IR function can now contain global value(s)
that represent a dynamic scaling factor, for a given fixed-width
vector type. A dynamic type is then created by 'multiplying' the
corresponding global value with a fixed-width type. These new types
can be used just like the existing types and the type system has a
set of hard-coded dynamic types, such as I32X4XN, which the user
defined types map onto. The dynamic types are also used explicitly
to create dynamic stack slots, which have no set size like their
existing counterparts. New IR instructions are added to access these
new stack entities.
Currently, during codegen, the dynamic scaling factor has to be
lowered to a constant so the dynamic slots do eventually have a
compile-time known size, as do spill slots.
The current lowering for aarch64 just targets Neon, using a dynamic
scale of 1.
Copyright (c) 2022, Arm Limited.
`fmin`/`fmax` are defined as returning -0.0 as smaller than 0.0.
This is not how the IEEE754 views these values and the interpreter was
returning the wrong value in these operations since it was just using the
standard IEEE754 comparisons.
This also tries to preserve NaN information by avoiding passing NaN's
through any operation that could canonicalize it.
* cranelift: Implement `fma` on interpreter
* cranelift: Implement `fabs` on interpreter
* cranelift: Fix `fneg` implementation on interpreter
`fneg` was implemented as `0 - x` which is not correct according to the
standard since that operation makes no guarantees on what the output
is when the input is `NaN`. However for `fneg` the output for `NaN`
inputs is fully defined.
* cranelift: Implement `fcopysign` on interpreter
* cranelift: Fix `bint` implementation on interpreter
The interpreter was returning -1 instead of 1 for positive values.
This also extends the bint test suite to cover all types.
* cranelift: Restrict `bint` to scalar values only
This change removes all variants of `load*_complex` and `store*_complex`
from Cranelift; this is a breaking change to the instructions exposed by
CLIF. The complete list of instructions removed is: `load_complex`,
`store_complex`, `uload8_complex`, `sload8_complex`, `istore8_complex`,
`sload8_complex`, `uload16_complex`, `sload16_complex`,
`istore16_complex`, `uload32_complex`, `sload32_complex`,
`istore32_complex`, `uload8x8_complex`, `sload8x8_complex`,
`sload16x4_complex`, `uload16x4_complex`, `uload32x2_complex`,
`sload32x2_complex`.
The rationale for this removal is that the Cranelift backend now has the
ability to pattern-match multiple upstream additions in order to
calculate the address to access. Previously, this was not possible so
the `*_complex` instructions were needed. Over time, these instructions
have fallen out of use in this repository, making the additional
overhead of maintaining them a chore.
- `extractlanes` will now function on a scalar value, returning the
value as a single-element array.
- `vectorizelanes` will accept a single-element array, returning the
contained value.
Existing `if !x.is_vector()` code-patterns have been simplified as a
result.
Copyright (c) 2022 Arm Limited
This opcode was removed as part of the old-backend cleanup in #3446.
While this opcode will definitely go away eventually, it is
unfortunately still used today in Lucet (as we just discovered while
working to upgrade Lucet's pinned Cranelift version). Lucet is
deprecated and slated to eventually be completely sunset in favor of
Wasmtime; but until that happens, we need to keep this opcode.
This also paves the way for unifying TargetIsa and MachBackend, since now they map one to one. In theory the two traits could be merged, which would be nice to limit the number of total concepts. Also they have quite different responsibilities, so it might be fine to keep them separate.
Interestingly, this PR started as removing RegInfo from the TargetIsa trait since the adapter returned a dummy value there. From the fallout, noticed that all Display implementations didn't needed an ISA anymore (since these were only used to render ISA specific registers). Also the whole family of RegInfo / ValueLoc / RegUnit was exclusively used for the old backend, and these could be removed. Notably, some IR instructions needed to be removed, because they were using RegUnit too: this was the oddball of regfill / regmove / regspill / copy_special, which were IR instructions inserted by the old regalloc. Fare thee well!
Implemented `SqmulRoundSat` for the Cranelift interpreter, performing
QN-format fixed point multiplication for 16 and 32-bit integers in
SIMD vectors.
Copyright (c) 2021, Arm Limited
Implemented `Smulhi` for the Cranelift interpreter, performing signed
integer multiplication and producing the high half of a double-length
result.
Copyright (c) 2021, Arm Limited
Implemented `WideningPairwiseDotProductS` to perform sign-extending
length-doubling multiplication on corresponding elements from two
`i16x8` SIMD vectors, performing a pairwise add on the results (thus
returning `i32x4`).
Copyright (c) 2021, Arm Limited
Implemented the following Opcodes for the Cranelift interpreter:
- `Unarrow` to combine two SIMD vectors into a new vector with twice
the lanes but half the width, with signed inputs which are clamped to
`0x00`.
- `Uunarrow` to perform the same operation as `Unarrow` but treating
inputs as unsigned.
- `Snarrow` to perform the same operation as `Unarrow` but treating
both inputs and outputs as signed, and saturating accordingly.
Note that all 3 instructions saturate at the type boundaries.
Copyright (c) 2021, Arm Limited
Implemented the following Opcodes for the Cranelift interpreter:
- `VhighBits` to reduce a vector to a scalar integer formed by
concatenating the MSB of each lane.
- `Vselect` to select lanes from two vectors controlled by a boolean
vector.
Copyright (c) 2021, Arm Limited
Implemented `SwidenLow` and `SwidenHigh` for the Cranelift interpreter,
doubling the width and halving the number of lanes preserving the low
and high halves respectively.
Conversions are performed using signed extension.
Copyright (c) 2021, Arm Limited