Change the implementation of emitted_insts in IsleContext from
a plain vector of instructions into a vector of tuples, where
the second element is a boolean that indicates whether this
instruction should be emitted as a safepoint.
This allows targets to emit safepoint insns via ISLE.
Attempt to match a Jump instruction in ISLE will currently lead to the
generated files not compiling. This is because the definition of the
InstructionData enum in clif.isle does not match the actual type used
in Rust code.
Specifically, clif.isle erroneously omits the ValueList variable-length
argument entry if the format does not use a typevar operand. This is
the case for Jump and a few other formats. The problem is caused by
a bug in the gen_isle routine in meta/src/gen_inst.rs.
If a block is marked cold but has side-effect-free code that is only
used by side-effectful code in non-cold blocks, we will erroneously fail
to emit it, causing a regalloc failure.
This is due to the interaction of block ordering and lowering: we rely
on block ordering to visit uses before defs (except for backedges) so
that we can effectively do an inline liveness analysis and skip lowering
operations that are not used anywhere. This "inline DCE" is needed
because instruction lowering can pattern-match and merge one instruction
into another, removing the need to generate the source instruction.
Unfortunately, the way that I added cold-block support in #3698 was
oblivious to this -- it just changed the block sort order. For
efficiency reasons, we generate code in its final order directly, so it
would not be tenable to generate it in e.g. RPO first and then reorder
cold blocks to the bottom; we really do want to visit in the same order
as the final code.
This PR fixes the bug by moving the point at which cold blocks are sunk
to emission-time instead. This is cheaper than either trying to visit
blocks during lowering in RPO but add to VCode out-of-order, or trying
to do some expensive analysis to recover proper liveness. It's not clear
that the latter would be possible anyway -- the need to lower some
instructions depends on other instructions' isel results/merging
success, so we really do need to visit in RPO, and we can't simply lower
all instructions as side-effecting roots (some can't be toplevel nodes).
The one downside of this approach is that the VCode itself still has
cold blocks inline; so in the text format (and hence compile-tests) it's
not possible to see the sinking. This PR adds a test for cold-block
sinking that actually verifies the machine code. (The test also includes
an add-instruction in the cold path that would have been incorrectly
skipped prior to this fix.)
Fortunately this bug would not have been triggered by the one current
use of cold blocks in #3699, because there the only operation in the
cold block was an (always effectful) call instruction. The worst-case
effect of the bug in other code would be a regalloc panic; no silent
miscompilations could result.
This adds ISLE support for the s390x back-end and moves lowering
of most instructions to ISLE. The only instructions still remaining
are calls, returns, traps, and branches, most of which will need
additional support in common code.
Generated code is not intended to be (significantly) different
than before; any additional optimizations now made easier to
implement due to the ISLE layer can be added in follow-on patches.
There were a few differences in some filetests, but those are all
just simple register allocation changes (and all to the better!).
In preparing to move the s390x back-end to ISLE, I noticed a few
missing pieces in the common prelude code. This patch:
- Defines the reference types $R32 / $R64.
- Provides a trap_code_bad_conversion_to_integer helper.
- Provides an avoid_div_traps helper. This requires passing the
generic flags in addition to the ISA-specifc flags into the
ISLE lowering context.
In preparing the back-end to move to ISLE, I detected a
number of codegen bugs in the existing code, which are
fixed here:
- Fix internal compiler error with uload16/icmp corner case.
- Fix broken Cls lowering.
- Correctly mask shift count for i8/i16 shifts.
In addition, I made several changes to operand encodings
in various MInst patterns. These should not have any
functional effect, but will make the ISLE migration easier:
- Encode floating-point constants as u32/u64 in MInst patterns.
- Encode shift amounts as u8 and Reg in ShiftOp pattern.
- Use MemArg in LoadMultiple64 and StoreMultiple64 patterns.
This commit migrates these existing instructions to ISLE from the manual
lowerings implemented today. This was mostly straightforward but while I
was at it I fixed what appeared to be broken translations for I{8,16}
for `clz`, `cls`, and `ctz`. Previously the lowerings would produce
results as-if the input was 32-bits, but now I believe they all
correctly account for the bit-width.
This patch makes spillslot allocation, spilling and reloading all based
on register class only. Hence when we have a 32- or 64-bit value in a
128-bit XMM register on x86-64 or vector register on aarch64, this
results in larger spillslots and spills/restores.
Why make this change, if it results in less efficient stack-frame usage?
Simply put, it is safer: there is always a risk when allocating
spillslots or spilling/reloading that we get the wrong type and make the
spillslot or the store/load too small. This was one contributing factor
to CVE-2021-32629, and is now the source of a fuzzbug in SIMD code that
puns an arbitrary user-controlled vector constant over another
stackslot. (If this were a pointer, that could result in RCE. SIMD is
not yet on by default in a release, fortunately.
In particular, we have not been particularly careful about using moves
between values of different types, for example with `raw_bitcast` or
with certain SIMD operations, and such moves indicate to regalloc.rs
that vregs are in equivalence classes and some arbitrary vreg in the
class is provided when allocating the spillslot or spilling/reloading.
Since regalloc.rs does not track actual type, and since we haven't been
careful about moves, we can't really trust this "arbitrary vreg in
equivalence class" to provide accurate type information.
In the fix to CVE-2021-32629 we fixed this for integer registers by
always spilling/reloading 64 bits; this fix can be seen as the analogous
change for FP/vector regs.
* aarch64: Use smaller instruction helpers in ISLE
This commit moves the aarch64 backend's ISLE to be more similar to the
x64 backend's ISLE where one-liner instruction builders are used for
various forms of instructions instead of always using the
constructor-per-variant-of-`Inst`. Overall I think this change worked
out quite well and sets up some naming idioms as well for various forms
of instructions.
* rebase conflict
Fixes#3609. It turns out that `sha2` is a nontrivial dependency for
Cranelift in many contexts, partly because it pulls in a number of other
crates as well.
One option is to remove the hash check under certain circumstances, as
implemented in #3616. However, this is undesirable for other reasons:
having different dependency options in Wasmtime in particular for
crates.io vs. local builds is not really possible, and so either we
still have the higher build cost in Wasmtime, or we turn off the checks
by default, which goes against the original intent of ensuring developer
safety (no mysterious stale-source bugs).
This PR uses `SipHash` instead, which is built into the standard
library. `SipHash` is deprecated, but it's fixed and deterministic
(across runs and across Rust versions), which is what we need, unlike
the suggested replacement `std::collections::hash_map::DefaultHasher`.
The result is only 64 bits, and is not cryptographically secure, but we
never needed that; we just need a simple check to indicate when we
forget a `rebuild-isle`.
This commit translates the `rotl` and `rotr` lowerings already existing
to ISLE. The port was relatively straightforward with the biggest
changing being the instructions generated around i128 rotl/rotr
primarily due to register changes.
* aarch64: Migrate ishl/ushr/sshr to ISLE
This commit migrates the `ishl`, `ushr`, and `sshr` instructions to
ISLE. These involve special cases for almost all types of integers
(including vectors) and helper functions for the i128 lowerings since
the i128 lowerings look to be used for other instructions as well. This
doesn't delete the i128 lowerings in the Rust code just yet because
they're still used by Rust lowerings, but they should be deletable in
due time once those lowerings are translated to ISLE.
* Use more descriptive names for i128 lowerings
* Use a with_flags-lookalike for csel
* Use existing `with_flags_*`
* Coment backwards order
* Update generated code