This converts an `i32x4` of unsigned lanes into an `f32x4`, rounding where a value is not exactly representable, using either the AVX512VL/F instruction VCVTUDQ2PS or a long sequence of SSE4.1-compatible instructions.
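
As a rough scalar model of what a non-AVX512 fallback has to achieve, the sketch below converts each unsigned lane by splitting it into 16-bit halves, converting each half exactly, and recombining with a single rounding step. This is one well-known technique and is not necessarily the sequence this patch emits; the function names are hypothetical.

```rust
/// Scalar model of one lane (hypothetical helper, not Cranelift code).
/// Both halves fit in 16 bits, so each conversion to f32 is exact.
fn u32_to_f32_via_halves(x: u32) -> f32 {
    let lo = (x & 0xFFFF) as i32 as f32; // exact: value < 2^16
    let hi = (x >> 16) as i32 as f32; // exact: value < 2^16
    // Scaling by a power of two is exact, so the only rounding
    // happens in the final addition.
    hi * 65536.0 + lo
}

/// Apply the scalar model to all four lanes of a vector.
fn convert_lanes(v: [u32; 4]) -> [f32; 4] {
    [
        u32_to_f32_via_halves(v[0]),
        u32_to_f32_via_halves(v[1]),
        u32_to_f32_via_halves(v[2]),
        u32_to_f32_via_halves(v[3]),
    ]
}
```

Because only the final addition rounds, the result should agree with Rust's own `u32 as f32` cast, which makes the model easy to check against.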
When a load/store instruction needs an address of the form `v0 +
uextend(v1)` or `v0 + sextend(v1)` (or the commuted forms thereof), we
currently generate a separate zero/sign-extend operation and then use a
plain `[rA, rB]` addressing mode. This patch extends `lower_address()`
to look at both addends of an address if it has two addends and a zero
offset, recognize extension operations, and incorporate them directly
into a `[rA, rB, UXTW]` or `[rA, rB, SXTW]` form. This should improve
our performance on WebAssembly workloads, at least, because we often see
a 64-bit linear memory base indexed by a 32-bit (Wasm) pointer value.
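
The pattern match described above can be sketched roughly as follows. All types here (`Value`, `AMode`, `Extend`) and the function `lower_two_addends` are hypothetical stand-ins invented for illustration; they are not the actual `lower_address()` implementation or the backend's real data structures.

```rust
/// Hypothetical model of an address addend: either a plain 64-bit
/// register or a 32-bit register behind an extend operation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Value {
    Reg64(u8),
    UExtend32(u8),
    SExtend32(u8),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Extend {
    UXTW,
    SXTW,
}

/// Hypothetical addressing modes: `[rA, rB]` and `[rA, rB, UXTW/SXTW]`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AMode {
    RegReg { base: u8, index: u8 },
    RegExtended { base: u8, index: u8, ext: Extend },
}

/// Look at both addends (in either order) and fold an extend into the
/// addressing mode when exactly one side is an extended 32-bit value.
fn lower_two_addends(a: Value, b: Value) -> Option<AMode> {
    match (a, b) {
        (Value::Reg64(base), Value::UExtend32(index))
        | (Value::UExtend32(index), Value::Reg64(base)) => {
            Some(AMode::RegExtended { base, index, ext: Extend::UXTW })
        }
        (Value::Reg64(base), Value::SExtend32(index))
        | (Value::SExtend32(index), Value::Reg64(base)) => {
            Some(AMode::RegExtended { base, index, ext: Extend::SXTW })
        }
        (Value::Reg64(base), Value::Reg64(index)) => Some(AMode::RegReg { base, index }),
        // Both addends extended: in this sketch the caller would fall
        // back to materializing one of them in a register first.
        _ => None,
    }
}
```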
When we vendor Cranelift into Firefox, we need to be able to build with
the Firefox CI setup (unless we carry patches on top of upstream).
Unfortunately, the Firefox CI currently appears to build with a slightly
older version of Rust: I can't work out which version exactly, but one
without stable support for `matches!()`.
A recent attempt to version-bump Cranelift failed with build errors at
the two locations in this patch:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=305994046&repo=autoland&lineNumber=24829
I also see a bunch of uses of `matches!()` in Peepmatic, but those
crates are not built by Firefox, so we can leave them be for now, I
think.
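
For reference, the kind of rewrite involved looks like the sketch below: a `matches!()` call and an explicit `match` that builds on older toolchains (`matches!` was stabilized in Rust 1.42). The `Opcode` enum and function names are hypothetical and are not the code at the two locations touched by this patch.

```rust
/// Hypothetical opcode enum used only for illustration.
#[derive(Debug, Clone, Copy)]
enum Opcode {
    Load,
    Store,
    Iadd,
}

/// Version using the `matches!` macro.
fn is_mem_op_with_macro(op: Opcode) -> bool {
    matches!(op, Opcode::Load | Opcode::Store)
}

/// Equivalent version using a plain `match`, which does not need the macro.
fn is_mem_op_portable(op: Opcode) -> bool {
    match op {
        Opcode::Load | Opcode::Store => true,
        _ => false,
    }
}
```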
Adds support for addss and subss. This is the first lowering for
SSE floating-point ALU and some move operations. The changes here
rename some data structures and add a couple of new ones to support
SSE-specific operations. The work done here will likely evolve as
needed to support an efficient, intuitive, and consistent framework.
* ensure that all const assignments are placed at the end of the sequence.
This minimises live ranges.
* for the non-const assignments, ignore self-assignments. This can
dramatically reduce the total number of moves generated, because any
self-assignments trigger the overlap-case handling, hence invoking the
double-copy behaviour in cases where it's not necessary.
It's worth pointing out that self-assignments are common, and are not due to
deficiencies in CLIR optimisation. Rather, they occur whenever a loop back
edge doesn't modify *all* loop-carried values. This can easily happen if
the loop has multiple "early" back-edges -- "continues" in C parlance. Eg:
   loop_header(a, b, c, d, e, f):
      ...
      a_new = ...
      b_new = ...
      if (..) goto loop_header(a_new, b_new, c, d, e, f)
      ...
      c_new = ...
      d_new = ...
      if (..) goto loop_header(a_new, b_new, c_new, d_new, e, f)
      etc
For functions with many live values, this can dramatically reduce the number
of spill moves we throw into the register allocator.
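
The two rules above (drop self-assignments, emit constant assignments last) can be sketched as a simple filter-and-partition step. This is a simplified model with hypothetical types (`Src`, `plan_moves`); it deliberately leaves out the overlap-case (double-copy) handling that the real code performs on the remaining register moves.

```rust
/// Hypothetical source of an assignment: another value or a constant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Src {
    Reg(u32),
    Const(i64),
}

/// Order block-parameter assignments `(dst, src)`:
/// - self-assignments (`dst == src`) are dropped entirely,
/// - register-to-register moves come first,
/// - constant assignments go last to keep their live ranges short.
fn plan_moves(assignments: &[(u32, Src)]) -> Vec<(u32, Src)> {
    let mut reg_moves = Vec::new();
    let mut const_moves = Vec::new();
    for &(dst, src) in assignments {
        match src {
            Src::Reg(r) if r == dst => {} // self-assignment: nothing to emit
            Src::Reg(_) => reg_moves.push((dst, src)),
            Src::Const(_) => const_moves.push((dst, src)),
        }
    }
    reg_moves.extend(const_moves);
    reg_moves
}
```

In the loop example, the back edge that passes `c, d, e, f` unchanged would contribute four self-assignments, all of which this filter removes before any overlap analysis runs.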
In terms of compilation costs, this ranges from neutral for functions that
spill minimally or not at all (joey_small, joey_med) to a 7.1% reduction in
insn count.
In terms of run costs, for one spill-heavy test (bz2 w/ custom timing harness),
instruction counts are reduced by 4.3%, data reads by 12.3% and data writes
by 18.5%. Note those last two figures include all reads and writes made by the
generated code, not just spills/reloads, so the proportional reduction in
spill/reload traffic must be greater.
- Properly mask constant values down to the appropriate width when
generating a constant value directly in the aarch64 backend. This was a
miscompilation introduced in the new-isel refactor. In combination
with failure to respect NarrowValueMode, this resulted in a very
subtle bug when an `i32` constant was used in bit-twiddling logic.
- Add support for `iadd_ifcout` in the aarch64 backend as used in explicit
heap-check mode. With this change, we no longer fail heap-related
tests with the huge-heap-region mode disabled.
- Remove a panic that was occurring in some tests that are currently
ignored on aarch64, by simply returning empty/default information in
`value_label` functionality rather than touching unimplemented APIs.
This is not a bugfix per se, but removes confusing panic messages from
`cargo test` output that might otherwise mislead.
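
To illustrate the first item, masking a constant down to its type's width might look like the sketch below. The helper name `mask_const` is hypothetical and this is not the backend's actual code; it only shows why a sign-extended narrow constant needs its upper bits cleared before being materialized.

```rust
/// Keep only the low `bits` bits of `value` (hypothetical helper).
/// `bits` is the width of the constant's type, from 1 to 64.
fn mask_const(value: u64, bits: u32) -> u64 {
    assert!(bits >= 1 && bits <= 64);
    if bits == 64 {
        // Shifting a u64 by 64 is not allowed, so handle the full width separately.
        value
    } else {
        value & ((1u64 << bits) - 1)
    }
}
```

For example, an `i32` constant of -1 stored sign-extended in 64 bits is all ones; masked to 32 bits it becomes `0xFFFF_FFFF`, which is what bit-twiddling code operating on the 32-bit value expects.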
These libcalls are useful for 32-bit platforms.
On x86_32 in particular, commit 4ec16fa0 added support for legalizing
64-bit shifts through SIMD operations. However, that legalization
requires SIMD to be enabled and SSE 4.1 to be supported, which is not
acceptable as a hard requirement.
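
For context, a 64-bit left shift built only from 32-bit operations, which is the kind of work such a libcall has to do on a 32-bit target, can be sketched as below. The function names are hypothetical and this is not the runtime's actual libcall; the sketch also assumes the shift amount is taken modulo 64.

```rust
/// Shift the 64-bit value `(hi:lo)` left by `amt`, using only 32-bit
/// operations. Hypothetical helper; assumes `amt` wraps modulo 64.
fn ishl_i64_halves(lo: u32, hi: u32, amt: u32) -> (u32, u32) {
    let amt = amt & 63;
    if amt == 0 {
        (lo, hi)
    } else if amt < 32 {
        // Bits shifted out of the low half move into the high half.
        (lo << amt, (hi << amt) | (lo >> (32 - amt)))
    } else {
        // The low half is entirely shifted into the high half.
        (0, lo << (amt - 32))
    }
}

/// Convenience wrapper for checking against native 64-bit arithmetic.
fn shl_via_halves(x: u64, amt: u32) -> u64 {
    let (lo, hi) = ishl_i64_halves(x as u32, (x >> 32) as u32, amt);
    ((hi as u64) << 32) | (lo as u64)
}
```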
`EncCursor` is a variant of `Cursor` that allows updating CLIF while
keeping its encodings up to date, given a particular ISA. However, new
(MachInst) backends don't use the encodings, and the `TargetIsaAdapter`
shim will panic if any encoding-related method is called. This PR avoids
those panics.
Fixes #1809.
The `convert_i64x2_imul` custom legalization checks the ISA flags for AVX512DQ or AVX512VL support and legalizes `imul.i64x2` to an `x86_pmullq` in this case; if not, it uses a lengthy SSE2-compatible instruction sequence.
Without this special instruction, legalizing to the AVX512 instruction AND the SSE instruction sequence is impossible. This extra instruction would be rendered unnecessary by the x64 backend.
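
To show why the fallback is lengthy, here is a scalar sketch of a wrapping 64-bit multiply built from 32x32-bit products, which is the general idea behind synthesizing `imul.i64x2` without a native 64-bit lane multiply. The function names are hypothetical and the exact instruction sequence used by the legalization may differ.

```rust
/// Wrapping 64-bit multiply using only products of 32-bit halves
/// (hypothetical helper modelling one lane).
fn mul64_via_32bit_parts(a: u64, b: u64) -> u64 {
    let (al, ah) = (a & 0xFFFF_FFFF, a >> 32);
    let (bl, bh) = (b & 0xFFFF_FFFF, b >> 32);
    // Low halves: a full 32x32 -> 64-bit product, which cannot overflow.
    let low = al * bl;
    // Cross terms: only their low 32 bits survive the shift below,
    // so wrapping addition is sufficient.
    let cross = (al * bh).wrapping_add(ah * bl);
    // The ah * bh term is shifted out entirely (it starts at bit 64).
    low.wrapping_add(cross << 32)
}

/// Apply the scalar model to both lanes of an i64x2.
fn imul_i64x2(a: [u64; 2], b: [u64; 2]) -> [u64; 2] {
    [
        mul64_via_32bit_parts(a[0], b[0]),
        mul64_via_32bit_parts(a[1], b[1]),
    ]
}
```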
This avoids the set uniqueness (hashing) test, reduces memory
churn when re-mapping virtual registers onto real registers, and is
generally more memory-efficient.