Commit Graph

36 Commits

Author SHA1 Message Date
Jamey Sharp
7bb83a3361 checker: Use a couple of Rust idioms (#114)
This should have no functional change, just makes the source slightly
easier to read and reason about.
2023-02-16 01:59:20 +00:00
Chris Fallin
c3e513c4cb Fix checker when empty blocks result in unchanged-from-Top entry state. (#113)
The checker works by keeping a worklist of blocks to process, and adds a
block to the worklist when its entry state changes. Every entry state is
initially `Top` (in a lattice). The entry block is explicitly added to
the worklist to kick off the processing.

In ordinary cases, the entry block has some instructions that change
state from `Top` to something else (lower in the lattice), and this is
propagated to its successors; its successors are added to the worklist;
and so on. No other state is `Top` from then on (because of
monotonicity) so every reachable block is processed.

However, if the entry block is completely empty except for the
terminating branch, the state remains `Top`; then the entry state of its
successors, even when updated, is still `Top`; and the state didn't
change so the blocks are not added to the worklist. (Nevermind that they
were not processed in the first place!) The bug is that the invariant
"has been processed already with current state" is not true initially,
when the current state is set to `Top` but nothing has been processed.

This PR makes a simple fix: it adds every block to the worklist
initially to be processed, in input order (which is usually RPO order in
practice) as a good first heuristic; then if after processing the input
state changes again, it can be reprocessed until fixpoint as always.

Fixes bytecodealliance/wasmtime#5791.
2023-02-15 17:42:51 -08:00
Chris Fallin
e09f6519a6 Remove pinned VRegs. (#108) 2023-01-24 17:31:41 -08:00
Amanieu d'Antras
520cafa129 Handle fixed stack slots in the move resolver (#78)
Fixed stack slots are treated as `PReg`s by most of the register
allocator, but need some additional handling the move resolver to avoid
generating stack-to-stack moves.
2022-09-19 12:27:24 -07:00
Amanieu d'Antras
1495c1e342 Add fixed-non-allocatable operand support (#77)
This allows a non-allocatable `PReg` to be passed on directly to the
allocations vector without any liverange tracking from the register
allocator. The main intended use case is to support ISA-specific special
registers such as a fixed zero register.
2022-09-19 12:23:53 -07:00
Chris Fallin
9733cb2227 Clobbers: use a more efficient bitmask representation in API. (#58)
* Clobbers: use a more efficient bitmask representation in API.

Currently, the `Function` trait requires a `&[PReg]` for the
clobber-list for a given instruction. In most cases where clobbers are
used, the list may be long: e.g., ABIs specify a fixed set of registers
that are clobbered and there may be ~half of all registers in this list.
What's more, the list can't be shared for e.g. all calls of a given ABI,
because actual return-values (defs) can't be clobbers. So we need to
allocate space for long, sometimes-slightly-different lists; this is
inefficient for the embedder and for us.

It's much more efficient to use a bitmask to represent a set of physical
registers. With current data structure bitpacking limitations, we can
support at most 128 physical registers; this means we can use a `u128`
bitmask. This also allows e.g. an embedder to start with a constant for
a given ABI, and mask out bits for actual return-value registers on call
instructions.

This PR makes that change, for minor but positive performance impact.

* Review comments.
2022-06-27 12:27:19 -07:00
Chris Fallin
52818a7ed6 Handle conflicting Before and After fixed-reg constraints with a copy. (#54)
* Extend fuzzer to generate cases like #53.

Currently, the fuzz testcase generator will add at most one
fixed-register constraint to an instruction per physical register. This
avoids impossible situations, such as specifying that both `v0` and `v1`
must be placed into the same `p0`.

However, it *should* be possible to say that `v0` is in `p0` before the
instruction, and `v1` is in `p0` after the instruction (i.e., at `Early`
and `Late` operand positions).

This in fact exposes a limitation in the current allocator design: when
`v0` is live downward, with the above constraints, it will result in an
impossible allocation situation because we cannot split in the middle of
an instruction. A subsequent fix will rectify this by using the
multi-fixed-reg fixup mechanism.

* Handle conflicting Before and After fixed-reg constraints with a copy.

This fixes #53. Previously, if two operands on an instruction
specified *different* vregs constrained to the same physical register
at the Before (Early) and After (Late) points of the instruction, and
the Before was live downward as well, we would panic: we can't insert
a move into the middle of an instruction, so putting the first vreg in
the preg at Early implies we have an unsolveable conflict at Late.

We can solve this issue by adding some new logic to insert a copy, and
rewrite the constraint. This reuses the multi-fixed-reg-constraint
fixup logic. While that logic handles the case where the *same* vreg
has multiple *different* fixed-reg constraints, this new logic
handles *different* vregs with the *same* fixed-reg constraints, but
at different *program points*; so the two are complementary.

This addresses the specific test case in #53, and also fuzzes cleanly
with the change to the fuzz testcase generator to generate these
cases (which also immediately found the bug).

* Add a reservation to the PReg when rewriting constraint so it is not doubly-allocated.

* Distinguish initial fixup moves from secondary moves.

* Use `trace` macro, not `log::trace`, to avoid trace output when feature is disabled.

* Rework operand rewriting to properly handle bundle-merging edge case.

When the liverange for the defined vreg with fixed constraint at Late is
*merged* with the liverange for the used vreg with fixed constraint at
Early, the strategy of putting a fixed reservation on the preg at Early
fails, because the whole bundle is minimal (if it spans just the
instruction's Early and Late and nothing else). This could happen if
e.g. the def flows into a blockparam arg that merges with a blockparam
defining the used value.

Instead we move the def one halfstep earlier, to the Early point, with
its fixed-reg constraint still in place. This has the same effect but
works when the two are merged.

* Fix checker issue: make more flexible in the presence of victim-register saves.
2022-05-31 14:01:27 -07:00
Chris Fallin
a5c48fda8a Support program moves, including pinned vregs, in the checker. (#43)
The checker was built to validate programs produced by the fuzzing
testcase generator, which was built before regalloc2 supported special
handling of moves. (In a pure-SSA world, move elision is not needed,
because moves are not needed, and blockparams are the only way of
tying together vregs.)

Due to this, the checker works great for our independent regalloc2
fuzzing setup, but when used on regalloc inputs produced by Cranelift,
cannot prove correctness.

This PR extends the checker's analysis to properly handle "program
moves", which are distinct from regalloc-inserted moves in that they
are present in the original program and hence are semantically
relevant. A program move edits all sets of symbolic vregs at all
allocs, and where the source vreg appears, it inserts the dest vreg as
well. (It also removes the dest vreg from all other sets, since the
old value becomes stale, as is done for other defs.)

Given this, and given some additional checking for moves to/from
pinned vregs, the checker can now be used to fully validate
Cranelift-sourced regalloc2 invocations.
2022-04-18 10:36:26 -07:00
Chris Fallin
a369150213 Make some improvements to clarity of checker implementation. (#37)
This PR makes two changes, both suggested by @fitzgen in #36:

1. It updates the top-level description of the analysis to more
   simply and accurately describe the analysis lattice.

2. It modifies both the `CheckerValue` and `CheckerState` types to be
   enums with separate arms for the top/universe value, and adds helpers
   as appropriate to update the values. There should be no functional
   change; this update just makes the meet-functions and updates more
   clear, and makes a bad state ("top" but with values) unrepresentable.

Closes #36.
2022-03-30 11:23:56 -07:00
Chris Fallin
b1a512dbf6 Checker analysis: change order of block processing for better efficiency. (#29)
After going through the checker with @fitzgen, we discussed the dataflow
analysis and @fitzgen noted that it would likely be more efficient to,
for example, process an inner cycle of blocks in a loop nest and
converge before returning to the outer loop. I had written a BFS-style
workqueue loop to converge the dataflow analysis without much thought,
with a FIFO workqueue. Any workqueue ordering will work and will
converge to the same fixpoint (as long as we are operating on a
lattice), but indeed some orderings will be more efficient, and a
DFS-style (LIFO stack) workqueue will give us this property of
converging inner loops first.

In measurements, there doesn't seem to be much of a difference for small
fuzz testcases, but this will likely matter more if/when we try to run
the checker to validate register allocation on large functions.
2022-03-10 10:35:08 -08:00
Chris Fallin
7ce69de5b0 Address review comments. 2022-01-20 19:45:41 -08:00
Chris Fallin
2133606366 Fix doc-tests: escape figure properly 2022-01-20 17:21:32 -08:00
Chris Fallin
ccd6b4fc2c Remove DefAlloc -- no longer needed. 2022-01-19 23:57:31 -08:00
Chris Fallin
3b037f3c9e Rework checker to not require DefAlloc by tracking all vregs on each alloc.
The symbolic checker currently works by tracking a single symbolic vreg
label for each "alloc" (physical register or stack slot). On definition
of a vreg into an alloc, the label is updated to the new vreg's name.

This worked quite well when the regalloc was simpler, but in the
presence of redundant move elimination, it started to become apparent
that the analysis has a shortcoming: when multiple vregs have the same
value, and the regalloc has deduced this, it can make use of an alloc
that is labeled with one vreg but use it as another vreg and the checker
will not validate the use.

In other words, the regalloc became smart enough to avoid emitting
unnecessary moves, but the checker was relying on those moves to know
the most up-to-date symbolic name for a value in a physical location. In
a sense, a register or stackslot can contain *both* vreg1 and vreg2, and
the regalloc can use it as either.

The stopgap measure of emitting more DefAllocs as part of the redundant
move elimination never quite sat right with me. It works, but it's
asking too much of the regalloc to prove why its moves are correct. We
should rely less on the regalloc and on complex
built-just-for-the-checker plumbing; we should instead improve the
checker so that it can prove on its own that the result is correct.

This PR modifies the checker so that its basic abstraction for an
alloc's value is a *set* of virtual register labels, rather than just
one. The transfer function is a little more complex, but manageable: a
move keeps the old label(s) and adds a new one; redefining a vreg into
one alloc needs to remove that vreg label from all other alloc's sets.

This completely removes the need for metadata from the regalloc (!); all
we need is the original program (pre-alloc, with vregs), the set of
allocations, and the set of inserted moves, and we can validate the
result. This should mean that we trust our checker-validated allocation
results more, and should result in less complexity and maintenance going
forward if we improve the allocator further.
2022-01-19 23:50:35 -08:00
Amanieu d'Antras
ee4de54240 Guard trace! behind cfg!(debug_assertions)
Even if the trace log level is disabled, the presence of the trace!
macro still has a significant impact on performance because it is
present in the inner loops of the allocator.

Removing the trace! calls at compile-time reduces instruction count by
~7%.
2022-01-11 13:30:13 +00:00
Amanieu d'Antras
693fb6a975 Only emit DefAlloc edits when the "checker" feature is enabled.
This reduces instruction counts by ~2% when disabled.
2022-01-11 13:03:24 +00:00
Amanieu d'Antras
74928b83fa Replace all assert! with debug_assert!
This results in a ~6% reduction in instruction count.
2022-01-11 03:54:08 +00:00
Amanieu d'Antras
6f59cd407b Use block_insts_and_edits in the checker 2021-12-27 22:09:07 +01:00
Amanieu d'Antras
870e4729e1 Add fixed stack slots to the fuzzer 2021-12-11 22:39:19 +00:00
Amanieu d'Antras
77e6a9e0d7 Add support for fixed stack slots
This works by allowing a PReg to be marked as being a stack location
instead of a physical register.
2021-12-11 22:31:58 +00:00
Chris Fallin
c53fbb4a5c Fix fuzzbug related to bundle priority ordering.
Changes in computation of bundle priorities during review of the initial
PR introduced a possible mis-ordering of priorities: inner-loop bundle
use weights could exceed the weights of 1_000_000 and 2_000_000 used for
minimal bundles without and with fixed uses (respectively). These two
kinds of minimal bundle are meant to be the highest-priority bundles,
evicting any other bundle they need to, because they can't be split
further. This PR introduces two special bundle weights for these two
kinds of bundles, and clamps all other bundle weights to just below
them.

Thanks to @Amanieu for reporting the issue! Fixes #19.
2021-11-30 15:36:12 -08:00
Amanieu d'Antras
a516e6d6f3 Return safepoint_slots as Allocations instead of SpillSlots
This enables us to support reftype vregs in register locations in the
future.
2021-11-16 00:47:43 +00:00
Chris Fallin
6389071e09 Address review comments. 2021-08-31 17:42:50 -07:00
Chris Fallin
b19fa4857f Rename operand positions to Early and Late, and make weights f16/f32 values. 2021-08-31 17:31:23 -07:00
Chris Fallin
3a18564e98 Addressed more review comments. 2021-08-30 17:51:55 -07:00
Chris Fallin
6d313f2b56 Address review comments: more doc comments and some minor refactorings. 2021-08-30 17:15:37 -07:00
Chris Fallin
3e1e0f39b6 Convert all log::debug to log::trace. 2021-08-12 12:05:19 -07:00
Chris Fallin
84285c26fb Rename OperandPolicy to OperandConstraint as per feedback from @julian-seward1. 2021-08-12 11:17:52 -07:00
Chris Fallin
c6bcd3c941 WIP: redundant-move elimination. 2021-06-07 21:15:32 -07:00
Chris Fallin
0eaa0fde06 Fix to checker: analyze all blocks, even if out-state of entry block is empty 2021-06-05 14:47:55 -07:00
Chris Fallin
2a5f571b80 WIP: Handle moves between realregs (pregs) and vregs somewhat specially, by converting into operand constraints
Still has a fuzzbug in interaction between R->R and V->R moves. Will
likely rework to make pinned-vreg handling more general but want to save
a checkpoint here; idea for rework:
- set allocs immediately if an Operand is a pinned vreg;
- reserve preg ranges;
- then, in rest of liveness computation / LR construction, convert
  pinned-vregs to operands with constraints, but otherwise do not
  special-case as we do in this commit.
2021-05-20 19:53:19 -07:00
Chris Fallin
b7fd53efc5 Fix checker: after moving edge-moves to prior to last branch of block (for simpler semantics for library user), we can no longer check blockparams; but this is fine because they do not exist in post-regalloc code. 2021-05-09 19:47:37 -07:00
Chris Fallin
a6e3128821 Support mod (modify) operands, for better efficiency with regalloc.rs/Cranelift shim. 2021-05-07 19:48:34 -07:00
Chris Fallin
02b6516acd Some memory-size/bitpacking optimizations 2021-05-06 20:47:17 -07:00
Chris Fallin
a08b0121a0 Add support for reftypes/stackmaps and Stack constraints, and misc API changes.
The main enhancement in this commit is support for reference types and
stackmaps. This requires tracking whether each VReg is a "reference" or
"pointer". At certain instructions designated as "safepoints", the
regalloc will (i) ensure that all references are in spillslots rather
than in registers, and (ii) provide a list of exactly which spillslots
have live references at that program point. This can be used by, e.g., a
GC to trace and possibly modify pointers. The stackmap of spillslots is
precise: it includes all live references, and *only* live references.

This commit also brings in some API tweaks as part of the in-progress
Cranelift glue. In particular, it makes Allocations and Operands
mutually disjoint by using the same bitfield for the type-tag in both
and choosing non-overlapping tags. This will allow instructions to carry
an Operand for each register slot and then overwrite these in place with
Allocations. The `OperandOrAllocation` type does the necessary magic to
make this look like an enum, but staying in 32 bits.
2021-04-17 21:29:13 -07:00
Chris Fallin
8e923b0ad9 Initial public commit of regalloc2. 2021-04-13 17:40:12 -07:00