The InsertLane format has an ordering (`value().imm().value()`) and immediate name (`"lane"`) that make it awkward to use for other instructions. This changes the ordering (`value().value().imm()`) and uses the default name (`"imm"`) throughout the codebase.
* Encode vselect using BLEND instructions on x86
* Legalize vselect to bitselect
* Optimize bitselect to vselect for some operands
* Add run tests for bitselect-vselect optimization
* Address review feedback
Certain operations (e.g. x86_packss) will have operands with types like `NxM` but will return results with types like `(N/2)x(M*2)` (halve the lane width, double the number of lanes; maintain the same number of vector bits). This is equivalent to applying two `DerivedFunction`s to the type: `DerivedFunction::HalfWidth` then `DerivedFunction::DoubleVector`. Since there is no easy way to apply multiple `DerivedFunction`s (e.g. most of the logic is one-level deep, 1d5a678124/cranelift/codegen/meta/src/gen_inst.rs (L618-L621)), I added `DerivedFunction::SplitLanes` to do the necessary type conversion.
This involves some large mask tables that may hurt code size but reduce the number of instructions. See https://github.com/WebAssembly/simd/issues/117 for a more in-depth discussion on this.
This involves some large mask tables that may hurt code size but reduce the number of instructions. See https://github.com/WebAssembly/simd/issues/117 for a more in-depth discussion on this.
- Add a `simple_legalize()` function that invokes a predetermined set of
legalizations, without depending on the details of the current
backend design. This will be used by the new backend pipeline.
- Separate out `has_side_effect()` from the DCE pass. This will be used
by the new backends' lowering code.
- Add documentation for the `Arm64Call` relocation type.
In cranelift x86 encodings, it seemed unintuitive to specialize Templates with both `infer_rex()`` and `w()`: if `w()` is specified, the REX.W bit must be set so a REX prefix is alway required--no need to infer it. This change forces us to write `rex().w()``--it's more explicit and shows more clearly what cranelift will emit. This change also modifies the tests that expected DynRex recipes.
- Convert recipes to have necessary size calculator
- Add a missing binemit function, `put_dynrexmp3`
- Modify the meta-encodings of x86 SIMD instructions to use `infer_rex()`, mostly through the `enc_both_inferred()` helper
- Fix up tests that previously always emitted a REX prefix
Until #1306 is resolved (some spilling/regalloc issue with larger FPR register banks), this removes FPR32 support. Only Wasm's `i64x2.mul` was using this register class and that instruction is predicated on AVX512 support; for the time being, that instruction will have to make do with the 16 FPR registers.
This patch updates or removes all references to the Cranelift repository. It affects links in README documents, issues that were transferred to the Wasmtime repository, CI badges, and a small bunch of sundry items.
The EVEX encoding format (e.g. in AVX-512) allows addressing 32 registers instead of 16. The FPR register class currently defines 16 registers, `%xmm0`-`%xmm15`; that class is kept as-is with this change. A larger class, FPR32, is added as a super-class of FPR using a larger bank of registers, `%xmm0`-`%xmm31`.
With this change, register banks can now be re-ordered and other components (e.g. unwinding, regalloc) will no longer break. The previous behavior assumed that GPR registers always started at `RegUnit` 0.