This fixes a build issue with the most recent version of `libfuzzer-sys`
where the prior version of the `fuzz_target!` macro didn't need the
crate as an explicit dependency but the latest version does.
* Remove register class from `SpillSlot`
The register allocator was already allowing moves between spillslots and
registers of different classes, so this PR formalizes this by making
spillslots independent of register class.
This also fixes#79 by properly tracking the register class of an
`InsertedMove` with the `to_vreg` field which turns out to never
be `None` in practice. Removing the `Option` allows the register
class of the `VReg` to be used when building the per-class move lists.
Fixes#79
* Address review feedback
Fixed stack slots are treated as `PReg`s by most of the register
allocator, but need some additional handling the move resolver to avoid
generating stack-to-stack moves.
This allows a non-allocatable `PReg` to be passed on directly to the
allocations vector without any liverange tracking from the register
allocator. The main intended use case is to support ISA-specific special
registers such as a fixed zero register.
* Remove unused regalloc2-test crate
This code doesn't build, and Chris says it's "a really old harness that
existed prior to building the fuzzing and was used mainly to profile and
get stats before integration with Cranelift".
* Re-export libfuzzer/arbitrary from fuzzing module
This avoids needing to keep dependencies on `arbitrary` in sync across
the three different Cargo.toml files in this project.
However, before version 0.4.2, libfuzzer-sys only supported using its
macros if it was available at the top-level `libfuzzer_sys` path, which
breaks when re-exporting it. So I'm upgrading to that version (or the
newest patch release of it).
Upgrading libfuzzer-sys in turn brings in the 1.0 release of the
arbitrary crate, with a minor API change along the way.
Currently, regalloc2 sets aside one register per class, unconditionally,
to make move resolution possible. To solve the "parallel moves problem",
we sometimes need to conjure a cyclic permutation of data among
registers or stack slots (this can result, for example, from blockparam
flow that swaps two values on a loop backedge). This set-aside scratch
register is used when a cycle exists.
regalloc2 also uses the scratch register when needed to break down a
stack-to-stack move (which could happen due to blockparam moves on edges
when source and destination are both spilled) into a stack-to-reg move
followed by reg-to-stack, because most machines have loads and stores
but not memory-to-memory moves.
A set-aside register is certainly the simplest solution, but it is not
optimal: it means that we have one fewer register available for use by
the program, and this can be costly especially on machines with fewer
registers (e.g., 16 GPRs/XMMs on x86-64) and especially when some
registers may be set aside by our embedder for other purposes too. Every
register we can reclaim is some nontrivial performance in large function
bodies!
This PR removes this restriction and allows regalloc2 to use all
available physical registers. It then solves the two problems above,
cyclic moves and stack-to-stack moves, with a two-stage approach:
- First, it finds a location to use to resolve cycles, if any exist. If
a register is unallocated at the location of the move, we can use it.
Often we get lucky and this is the case. Otherwise, we allocate a
stackslot to use as the temp. This is perfectly fine at this stage,
even if it means that we have more stack-to-stack moves.
- Then, it resolves stack-to-stack moves into stack-to-reg /
reg-to-stack. There are two subcases here. If there is *another*
available free physical register, we opportunistically use it for this
decomposition. If not, we fall back to our last-ditch option: we pick
a victim register of the appropriate class, we allocate another
temporary stackslot, we spill the victim to that slot just for this
move, we do the move in the above way (stack-to-reg / reg-to-stack)
with the victim, then we reload the victim. So one move (original
stack-to-stack) becomes four moves, but no state is clobbered.
This PR extends the `moves` fuzz-target to exercise this functionality
as well, randomly choosing for some spare registers to exist or not, and
randomly generating {stack,reg}-to-{stack,reg} moves in the initial
parallel-move input set. The target does a simple symbolic simulation of
the sequential move sequence and ensures that the final state is
equivalent to the parallel-move semantics.
I fuzzed both the `moves` target, focusing on the new logic; as well as
the `ion_checker` target, checking the whole register allocator, and
both seem clean (~150M cases on the former, ~1M cases on the latter).
The checker was built to validate programs produced by the fuzzing
testcase generator, which was built before regalloc2 supported special
handling of moves. (In a pure-SSA world, move elision is not needed,
because moves are not needed, and blockparams are the only way of
tying together vregs.)
Due to this, the checker works great for our independent regalloc2
fuzzing setup, but when used on regalloc inputs produced by Cranelift,
cannot prove correctness.
This PR extends the checker's analysis to properly handle "program
moves", which are distinct from regalloc-inserted moves in that they
are present in the original program and hence are semantically
relevant. A program move edits all sets of symbolic vregs at all
allocs, and where the source vreg appears, it inserts the dest vreg as
well. (It also removes the dest vreg from all other sets, since the
old value becomes stale, as is done for other defs.)
Given this, and given some additional checking for moves to/from
pinned vregs, the checker can now be used to fully validate
Cranelift-sourced regalloc2 invocations.
Also requires some metadata in edit output to properly hook up the
checker in regalloc.rs to track user-moves without seeing the original
insts with operands.
The main enhancement in this commit is support for reference types and
stackmaps. This requires tracking whether each VReg is a "reference" or
"pointer". At certain instructions designated as "safepoints", the
regalloc will (i) ensure that all references are in spillslots rather
than in registers, and (ii) provide a list of exactly which spillslots
have live references at that program point. This can be used by, e.g., a
GC to trace and possibly modify pointers. The stackmap of spillslots is
precise: it includes all live references, and *only* live references.
This commit also brings in some API tweaks as part of the in-progress
Cranelift glue. In particular, it makes Allocations and Operands
mutually disjoint by using the same bitfield for the type-tag in both
and choosing non-overlapping tags. This will allow instructions to carry
an Operand for each register slot and then overwrite these in place with
Allocations. The `OperandOrAllocation` type does the necessary magic to
make this look like an enum, but staying in 32 bits.