* Remove log_enabled statements around annotate calls, which are already guarded against annotations_enabled
* Use a trace_enabled!() macro that follows the same logic as trace!() to find if additional traces have been enabled or not
Currently, regalloc2 sets aside one register per class, unconditionally,
to make move resolution possible. To solve the "parallel moves problem",
we sometimes need to conjure a cyclic permutation of data among
registers or stack slots (this can result, for example, from blockparam
flow that swaps two values on a loop backedge). This set-aside scratch
register is used when a cycle exists.
regalloc2 also uses the scratch register when needed to break down a
stack-to-stack move (which could happen due to blockparam moves on edges
when source and destination are both spilled) into a stack-to-reg move
followed by reg-to-stack, because most machines have loads and stores
but not memory-to-memory moves.
A set-aside register is certainly the simplest solution, but it is not
optimal: it means that we have one fewer register available for use by
the program, and this can be costly especially on machines with fewer
registers (e.g., 16 GPRs/XMMs on x86-64) and especially when some
registers may be set aside by our embedder for other purposes too. Every
register we can reclaim is some nontrivial performance in large function
bodies!
This PR removes this restriction and allows regalloc2 to use all
available physical registers. It then solves the two problems above,
cyclic moves and stack-to-stack moves, with a two-stage approach:
- First, it finds a location to use to resolve cycles, if any exist. If
a register is unallocated at the location of the move, we can use it.
Often we get lucky and this is the case. Otherwise, we allocate a
stackslot to use as the temp. This is perfectly fine at this stage,
even if it means that we have more stack-to-stack moves.
- Then, it resolves stack-to-stack moves into stack-to-reg /
reg-to-stack. There are two subcases here. If there is *another*
available free physical register, we opportunistically use it for this
decomposition. If not, we fall back to our last-ditch option: we pick
a victim register of the appropriate class, we allocate another
temporary stackslot, we spill the victim to that slot just for this
move, we do the move in the above way (stack-to-reg / reg-to-stack)
with the victim, then we reload the victim. So one move (original
stack-to-stack) becomes four moves, but no state is clobbered.
This PR extends the `moves` fuzz-target to exercise this functionality
as well, randomly choosing for some spare registers to exist or not, and
randomly generating {stack,reg}-to-{stack,reg} moves in the initial
parallel-move input set. The target does a simple symbolic simulation of
the sequential move sequence and ensures that the final state is
equivalent to the parallel-move semantics.
I fuzzed both the `moves` target, focusing on the new logic; as well as
the `ion_checker` target, checking the whole register allocator, and
both seem clean (~150M cases on the former, ~1M cases on the latter).
This resolves an issue seen when the source program uses multiple
regclasses (Int and Float): in some cases, the logic that grabs the
vregs and retains them (with class) in `vreg_regs` missed a register and
we had a class mismatch. This occurred because data structures were
initialized assuming `Int` regclass at first.
This PR instead removes the `vreg_regs` array, stores the class
explicitly as an `Option<RegClass>` in the `VRegData`, and provides a
`Env::vreg()` method that reconstitutes a `VReg` given its index and its
observed class. We "observe" the class of every vreg seen during the
liveness pass (and we assert that every occurrence of the vreg index has
the same class). In this way, we still have a single source-of-truth for
the vreg class (the mention of the vreg itself) and we explicitly
represent the "not observed yet" state (and panic on attempting to use
such a vreg) rather than implicitly taking the wrong class.
Support for debug-labels.
If the client adds labels to vregs across ranges of instructions in the
input program, the regalloc will provide metadata in the `Output` that
describes the `Allocation`s in which each such vreg is stored for those
ranges. This allows the client to emit debug metadata telling a debugger
where to find program values at each point in the program.
Large parts of the code in regalloc2 are currently licensed under the
Mozilla Public License (MPL) 2.0, because they derive in meaningful
ways from the register allocator in IonMonkey, which is part of
Firefox. The relevant source files are marked as such, with references
to the files in the Firefox source tree.
The intent of the regalloc2 project was to port the register allocator
from Firefox to use in Cranelift, borrowing good technology and
improving on it in the spirit of open source.
However, Several use-cases of Cranelift require, or at least strongly
prefer, the Apache-2.0 license with the LLVM exception (matching the
license of Cranelift itself, and Bytecode Alliance projects
generally). While using this license is not strictly necessary for
regalloc2 to be usable (The MPL is an excellent open-source license!),
relicensing fully under this license to harmonize with the rest of
Cranelift and Bytecode Alliance codebases significantly widens
possibilities and reduces friction; then regalloc2 is "just another
part of Cranelift" and doesn't have to be treated specially.
The source in `src/ion/` specifically began as a fairly direct port of
the algorithms in the following files in the `mozilla-central`
repository (Firefox codebase):
* The bulk of the "backtracking allocator" algorithm:
* `js/src/jit/BacktrackingAllocator.{cpp,h}`
* Helpers and definitions in the surrounding infrastructure:
* `js/src/jit/RegisterAllocator.h`
* `js/src/jit/RegisterAllocator.cpp`
* `js/src/jit/StackSlotAllocator.h`
* `js/src/jit/LIR.h`
* A few data structure implementations:
* `js/src/ds/SplayTree.h`
* `js/src/ds/PriorityQueue.h`
Subsequent work in improving regalloc2 has caused it to drift from the
direct port -- for example, it no longer uses splay trees or the
direct port of the priority queue above -- but it is of course very
clearly still a derivative work.
Analysis of the contributors to these files indicates that we need
signoff from the following folks:
* Mozilla Corp, for contributions made by Mozilla employees (the
majority of the code). Communications with Mozilla (thanks
@tschneidereit and @bholley for doing the work here!) indicate that
@ekr is able to sign off when ready here.
* Andy Wingo, specifically for the work done in [Bug
1620197](https://bugzilla.mozilla.org/show_bug.cgi?id=1620197) and
[Bug 1609057](https://bugzilla.mozilla.org/show_bug.cgi?id=1609057) to
generalize the stack allocator for a Wasm feature (multiple returns).
Additionally, since the initial port, we have had three contributions
from @Amanieu:
[#9](https://github.com/bytecodealliance/regalloc2/pull/9),
[#11](https://github.com/bytecodealliance/regalloc2/pull/11),
[#13](https://github.com/bytecodealliance/regalloc2/pull/13).
So, if everyone applicable is happy with this relicensing, this PR
removes the MPL-2.0 license in `src/ion/` and marks all files as
covered under `Apache-2.0 WITH LLVM-exception`. Please let us know if
this is OK!
Signoffs:
- [ ] @ekr, for Mozilla's contributions
- [ ] @wingo, for contributions to original code in `mozilla-central`
- [ ] @Amanieu, for the three PRs linked above
Thanks!