* ensure that all const assignments are placed at the end of the sequence.
This minimises the live ranges of their destination registers.
* for the non-const assignments, ignore self-assignments. This can
dramatically reduce the total number of moves generated, because a
self-assignment would otherwise trigger the overlap-case handling and its
double-copy behaviour in cases where it isn't necessary.
It's worth pointing out that self-assignments are common, and are not due to
deficiencies in CLIF optimisation. Rather, they occur whenever a loop
back-edge doesn't modify *all* loop-carried values. This can easily happen if
the loop has multiple "early" back-edges -- "continues" in C parlance. E.g.:
    loop_header(a, b, c, d, e, f):
        ...
        a_new = ...
        b_new = ...
        if (..) goto loop_header(a_new, b_new, c, d, e, f)
        ...
        c_new = ...
        d_new = ...
        if (..) goto loop_header(a_new, b_new, c_new, d_new, e, f)
        etc.
For functions with many live values, this can dramatically reduce the number
of spill moves we throw into the register allocator.
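Both heuristics can be sketched compactly. The following is a minimal model,
not Cranelift's actual sequencer, and it omits the overlap/cycle handling a
real parallel-move resolver also needs:

    #[derive(Clone, Copy, PartialEq, Eq, Debug)]
    enum Src {
        Reg(u32),   // copy from another register
        Const(i64), // materialise a constant
    }

    /// One parallel assignment `dst <- src`; all assignments in a group
    /// are conceptually simultaneous.
    #[derive(Clone, Copy, Debug)]
    struct Assign {
        dst: u32,
        src: Src,
    }

    /// Order a parallel move group for sequential emission.
    fn sequence(moves: &[Assign]) -> Vec<Assign> {
        let mut out = Vec::new();
        // Pass 1: reg-to-reg moves, dropping self-assignments outright so
        // they can never trigger the overlap (double-copy) handling.
        for m in moves {
            if let Src::Reg(r) = m.src {
                if r != m.dst {
                    out.push(*m);
                }
            }
        }
        // Pass 2: const assignments go last, so their destinations become
        // live as late as possible, minimising live ranges.
        for m in moves {
            if let Src::Const(_) = m.src {
                out.push(*m);
            }
        }
        out
    }

    fn main() {
        // r1 <- r0; r0 <- r0 (self, dropped); r2 <- 42 (const, emitted last).
        let seq = sequence(&[
            Assign { dst: 1, src: Src::Reg(0) },
            Assign { dst: 0, src: Src::Reg(0) },
            Assign { dst: 2, src: Src::Const(42) },
        ]);
        println!("{:?}", seq);
    }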
In terms of compilation costs, this ranges from neutral for functions that
spill little or not at all (joey_small, joey_med) to a 7.1% reduction in
insn count.
In terms of run costs, for one spill-heavy test (bz2 w/ custom timing harness),
instruction counts are reduced by 4.3%, data reads by 12.3% and data writes
by 18.5%. Note those last two figures include all reads and writes made by the
generated code, not just spills/reloads, so the proportional reduction in
spill/reload traffic must be greater.
`EncCursor` is a variant of `Cursor` that allows updating CLIF while
keeping its encodings up to date, given a particular ISA. However, new
(MachInst) backends don't use the encodings, and the `TargetIsaAdapter`
shim will panic if any encoding-related method is called. This PR avoids
those panics.
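As an illustrative model of the failure mode (the trait and method names here
are stand-ins, not Cranelift's actual API):

    trait LegacyIsa {
        // Old-style backends can answer encoding queries; MachInst
        // backends have no encodings at all.
        fn encoding_for(&self, inst: u32) -> u32;
    }

    struct TargetIsaAdapterShim; // wraps a new-style (MachInst) backend

    impl LegacyIsa for TargetIsaAdapterShim {
        fn encoding_for(&self, _inst: u32) -> u32 {
            // The real shim panics like this; the fix is to make sure no
            // encoding-related method is ever reached through the adapter.
            panic!("encodings are not supported by MachInst backends");
        }
    }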
Fixes #1809.
The `convert_i64x2_imul` custom legalization checks the ISA flags for AVX512DQ or AVX512VL support and, when present, legalizes `imul.i64x2` to an `x86_pmullq`; otherwise it uses a lengthy SSE2-compatible instruction sequence.
Without this special instruction, there would be no way to legalize to both the AVX512 instruction and the SSE instruction sequence, selecting between them by ISA flag. This extra instruction will be rendered unnecessary by the new x64 backend.
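For intuition, the SSE2 fallback assembles each 64-bit lane product from
32-bit multiplies (the real sequence operates on whole vectors with
instructions such as pmuludq; this scalar model of the per-lane arithmetic is
only a sketch):

    // Low 64 bits of a 64x64 product via 32-bit halves: the a_hi * b_hi
    // term is shifted out entirely, so three partial products remain.
    fn mul64_via_32(a: u64, b: u64) -> u64 {
        let (a_lo, a_hi) = (a & 0xFFFF_FFFF, a >> 32);
        let (b_lo, b_hi) = (b & 0xFFFF_FFFF, b >> 32);
        (a_lo * b_lo)
            .wrapping_add((a_lo * b_hi) << 32)
            .wrapping_add((a_hi * b_lo) << 32)
    }

    fn main() {
        let (a, b) = (0x1234_5678_9abc_def0_u64, 0x0fed_cba9_8765_4321_u64);
        assert_eq!(mul64_via_32(a, b), a.wrapping_mul(b));
    }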
This avoids the set uniqueness (hashing) test, reduces memory
churn when re-mapping virtual registers onto real registers, and is
generally more memory-efficient.
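As a sketch of the idea (illustrative; not the actual register-allocator
code), a sorted vector gives uniqueness without hashing and can be remapped
in place:

    /// A set of virtual-register indices, kept sorted and deduplicated.
    struct RegVec(Vec<u32>);

    impl RegVec {
        fn from_unsorted(mut regs: Vec<u32>) -> Self {
            regs.sort_unstable();
            regs.dedup(); // uniqueness without any hashing
            RegVec(regs)
        }

        /// Remap virtual registers to real registers in place: no fresh
        /// allocation and no per-element rehashing.
        fn map_to_real(&mut self, assignment: impl Fn(u32) -> u32) {
            for r in &mut self.0 {
                *r = assignment(*r);
            }
            // The mapping may merge entries; restore the invariant.
            self.0.sort_unstable();
            self.0.dedup();
        }
    }

    fn main() {
        let mut regs = RegVec::from_unsorted(vec![3, 1, 3, 2]);
        regs.map_to_real(|v| v % 2); // collapse onto two real registers
        assert_eq!(regs.0, vec![0, 1]);
    }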
In the `ModuleEnvironment::declare_signature` callback, also pass the original
Wasm function signature, so that consumers may associate this information with
each compiled function. This is often necessary because while each Wasm
signature gets compiled down into a single native signature, multiple Wasm
signatures might compile down into the same native signature, and in these cases
the original Wasm signature is required for dynamic type checking of calls.
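A sketch of a consumer recording both signatures (the types here are
illustrative stand-ins, not cranelift-wasm's actual ones; one plausible
collision is two Wasm signatures that differ only in reference type but lower
to the same native pointer type):

    // Stand-ins for the Wasm-level and native-level signature types.
    #[derive(Clone, Debug, PartialEq)]
    struct WasmSig {
        params: Vec<String>,
        returns: Vec<String>,
    }

    #[derive(Clone, Debug, PartialEq)]
    struct NativeSig(String);

    struct MyEnvironment {
        // One entry per declared signature: the original Wasm signature
        // kept alongside its lowered native form, so later dynamic type
        // checks can consult the Wasm-level one.
        signatures: Vec<(WasmSig, NativeSig)>,
    }

    impl MyEnvironment {
        // Shape of the extended callback: both signatures arrive together.
        fn declare_signature(&mut self, wasm: WasmSig, native: NativeSig) {
            self.signatures.push((wasm, native));
        }
    }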
About half of the `FuncEnvironment::translate_table_*` methods were using the
`TableIndex` newtype, while the other half were using raw `u32`s. This commit
makes everything use `TableIndex`.
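The value of the newtype is that it turns index mix-ups into type errors. A
minimal illustration (simplified; not the actual `FuncEnvironment` methods):

    #[derive(Clone, Copy, PartialEq, Eq, Debug)]
    struct TableIndex(u32);

    // Simplified stand-in for one of the translate_table_* methods.
    fn translate_table_grow(table: TableIndex, delta: u32) -> u32 {
        let TableIndex(idx) = table;
        idx + delta // placeholder body
    }

    fn main() {
        let t = TableIndex(0);
        let _ = translate_table_grow(t, 16); // ok
        // translate_table_grow(16, 0); // compile error: expected TableIndex
    }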
The InsertLane format has an operand ordering (`value().imm().value()`) and an immediate name (`"lane"`) that make it awkward to reuse for other instructions. This changes the ordering to `value().value().imm()` and uses the default name (`"imm"`) throughout the codebase.
* Encode vselect using BLEND instructions on x86
* Legalize vselect to bitselect
* Optimize bitselect to vselect for some operands (a scalar model follows this list)
* Add run tests for bitselect-vselect optimization
* Address review feedback
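Scalar model of the semantics behind the optimization (not Cranelift code):
bitselect chooses individual bits via a mask, vselect chooses whole lanes,
and the two coincide whenever each mask lane is all-ones or all-zeros, e.g.
when the mask comes from a comparison. That lane-wise form is what maps onto
a single BLEND on x86.

    // bitselect: per-bit choice between x and y according to a mask.
    fn bitselect(mask: u64, x: u64, y: u64) -> u64 {
        (x & mask) | (y & !mask)
    }

    // vselect: per-lane choice; a single 64-bit "lane" is modelled here.
    fn vselect(cond: bool, x: u64, y: u64) -> u64 {
        if cond { x } else { y }
    }

    fn main() {
        let (x, y) = (0xAAAA_AAAA_AAAA_AAAA_u64, 0x5555_5555_5555_5555);
        // With an all-ones or all-zeros mask, bitselect is lane selection.
        assert_eq!(bitselect(u64::MAX, x, y), vselect(true, x, y));
        assert_eq!(bitselect(0, x, y), vselect(false, x, y));
    }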
* Ensure vector values used by GlobalSet are cast to Cranelift's I8X16 type
This is a fix related to the decision to use Cranelift's I8X16 type to represent Wasm's V128 -- that representation requires casting to maintain type correctness. See https://github.com/bytecodealliance/wasmtime/issues/1147.
* Enable SIMD spec test: simd_lane.wast
I'm not actually sure that it's possible to write `#[test]` in a
`proc-macro` crate. Regardless, I don't think it's too conventional,
so let's disable this for now.
Closes #1775