Commit Graph

221 Commits

Author SHA1 Message Date
Chris Fallin
1379c65a6a Handle conflict-related liverange splits arising from stack constraints without falling back to spill bundle. (#49)
Currently, we unconditionally trim the ends of liveranges around a split
when we do a split, including splits due to conflicts in a
liverange/bundle's requirements (e.g., a liverange with both a register
and a stack use). These trimmed ends, if they exist, go to the spill
bundle, and the spill bundle may receive a register during second-chance
allocation or otherwise will receive a stack slot.

This was previously measured to reduce contention significantly, because
it reduces the sizes of liveranges that participate in the first-chance
competition for allocations. When a split has to occur, we might as well
relegate the "connecting pieces" to a process that comes later, with a
hint to try to get the right register if possible but no hard connection
to either end.

However, in the case of a split arising from a reg-to-stack /
stack-to-reg conflict, as happens when references are used or def'd as
registers and then cross safepoints, this extra step in the connectivity
(normal LR with register use, then spill bundle, then normal LR with
stack use) can lead to extra moves. Additionally, when one of the LRs
has a stack constraint, contention is far less important; so it doesn't
hurt to skip the trimming step. In fact, it's likely much better to put
the "connecting piece" together with the stack side of the conflict.

Ideally we would handle this with the same move-cost logic we use for
conflicts detected during backtracking, but the requirements-related
splitting happens separately and that logic would need to be generalized
further. For now, this is sufficient to eliminate redundant moves as
seen in e.g. bytecodealliance/wasmtime#3785.
2022-05-16 22:36:51 -07:00
Chris Fallin
9b83635980 Bump version to 0.1.2. (#44)
This will get #43 into a release to allow us to use the checker on
Cranelift outputs.
2022-04-18 13:20:07 -07:00
Chris Fallin
a5c48fda8a Support program moves, including pinned vregs, in the checker. (#43)
The checker was built to validate programs produced by the fuzzing
testcase generator, which was built before regalloc2 supported special
handling of moves. (In a pure-SSA world, move elision is not needed,
because moves are not needed, and blockparams are the only way of
tying together vregs.)

Due to this, the checker works great for our independent regalloc2
fuzzing setup, but when used on regalloc inputs produced by Cranelift,
cannot prove correctness.

This PR extends the checker's analysis to properly handle "program
moves", which are distinct from regalloc-inserted moves in that they
are present in the original program and hence are semantically
relevant. A program move edits all sets of symbolic vregs at all
allocs, and where the source vreg appears, it inserts the dest vreg as
well. (It also removes the dest vreg from all other sets, since the
old value becomes stale, as is done for other defs.)

Given this, and given some additional checking for moves to/from
pinned vregs, the checker can now be used to fully validate
Cranelift-sourced regalloc2 invocations.
2022-04-18 10:36:26 -07:00
Chris Fallin
f307ed170c Bump version to 0.1.1. (#41) 2022-04-13 10:37:43 -07:00
Chris Fallin
4cac1614bf Add serde support for exposed types. (#40)
This adds derived `Serialize` and `Deserialize` implementations for
exposed types that describe registers, operands, and related program
inputs; entity indices; and regalloc output types. This allows
serialization of any of the embedder's IR data types that may embed or
build upon regalloc2 types.

These implementations (and the dependency on the `serde` crate itself)
are enabled only when the non-default `enable-serde` feature is
specified.
2022-04-13 10:14:00 -07:00
Chris Fallin
94cd6c421c Bump version to 0.1.0 for release. (#39) 2022-04-04 16:37:01 -07:00
Chris Fallin
0cb08095e6 Make multiple defs of one vreg possible on one instruction. (#38)
Currently, if this is done, overlapping liveranges are created, and we
hit an assert that ensures our non-overlapping and
built-in-reverse-order invariants during liverange construction.

One could argue that multiple defs of a single vreg don't make a ton of
sense -- which def's value is valid after the instruction? -- but if
they all get the same alloc, then the answer is "whatever was put in
that alloc", and this is just a case of an instruction being a bit
over-eager when listing its registers.

This can arise in practice when operand lists come from combinations or
concatenations: for example, in Cranelift's s390x backend, there is a
"Loop" pseudo-instruction, and the operands of the Loop are the operands
of all the sub-instructions.

It seems more logically cohesive overall to say that one can state an
operand as many times as one likes; so this PR makes it so.
2022-04-04 16:22:05 -07:00
Chris Fallin
a369150213 Make some improvements to clarity of checker implementation. (#37)
This PR makes two changes, both suggested by @fitzgen in #36:

1. It updates the top-level description of the analysis to more
   simply and accurately describe the analysis lattice.

2. It modifies both the `CheckerValue` and `CheckerState` types to be
   enums with separate arms for the top/universe value, and adds helpers
   as appropriate to update the values. There should be no functional
   change; this update just makes the meet-functions and updates more
   clear, and makes a bad state ("top" but with values) unrepresentable.

Closes #36.
2022-03-30 11:23:56 -07:00
Chris Fallin
ad41f8a7a5 Record vreg classes explicitly during liverange pass. (#35)
This resolves an issue seen when the source program uses multiple
regclasses (Int and Float): in some cases, the logic that grabs the
vregs and retains them (with class) in `vreg_regs` missed a register and
we had a class mismatch. This occurred because data structures were
initialized assuming `Int` regclass at first.

This PR instead removes the `vreg_regs` array, stores the class
explicitly as an `Option<RegClass>` in the `VRegData`, and provides a
`Env::vreg()` method that reconstitutes a `VReg` given its index and its
observed class. We "observe" the class of every vreg seen during the
liveness pass (and we assert that every occurrence of the vreg index has
the same class). In this way, we still have a single source-of-truth for
the vreg class (the mention of the vreg itself) and we explicitly
represent the "not observed yet" state (and panic on attempting to use
such a vreg) rather than implicitly taking the wrong class.
2022-03-29 14:00:14 -07:00
Chris Fallin
433e8b3776 Early defs reserve a register for whole instruction. (#32)
The `Operand` abstraction allows a def to be positioned at the "early"
point of an instruction, before its effect and alongside its normal
uses. This is intended to allow the embedder to express that a def may
be written before all uses are read, so it should not conflict with the
uses.

It's also convenient to use early defs to express temporaries, which
should be available throughout a regalloc-level instruction's emitted
sequence. In such a case, the register should not be used again after
the instruction, so it is dead following the instruction.

Strictly speaking, and according to regalloc2 prior to this PR, then the
temp will *only* conflict with the uses at the early-point, and not the
defs at the late-point (after the instruction), because it's dead past
its point of definition. But for a temp we really want it to register
conflicts not just with the normal uses but with the normal defs as
well.

This PR changes the semantics so that an early def builds a liverange
that spans the early- and late-point of an instruction when the vreg is
dead flowing down from the instruction, giving the semantics we want for
temps.
2022-03-18 10:32:49 -07:00
Chris Fallin
4f1161d9e4 Generalize debug-info support a bit. (#34)
* Generalize debug-info support a bit.

Previously, debug value-label support required each vreg to have a
disjoint sequence of instruction ranges, each with one label.

Unfortunately, it's entirely possible for multiple values at the program
level to map to one vreg at the IR level, leading to multiple labels.

This PR generalizes the debug-info generation support to allow for
arbitrary (label, range, vreg) tuples, as long as they are sorted by
vreg, with no other requirements. The lookup is a little more costly
when we generate the debuginfo, but in practice we shouldn't have more
than a *few* debug value labels per vreg, so in practice the constants
should be small.

* Typo fix from Amanieu

Co-authored-by: Amanieu d'Antras <amanieu@gmail.com>

Co-authored-by: Amanieu d'Antras <amanieu@gmail.com>
2022-03-18 10:32:27 -07:00
Chris Fallin
00dc692489 Allow for reused inputs when the reused vreg is also used as other (normal) uses. (#33)
The "reused input" operand constraint allows for an instruction to have
a def-operand whose allocation is constrained to reuse the same
allocation as one of the uses.

This is useful to express constraints needed for some instruction sets,
like x86, where at the ISA level, one register serves both as an input
and the output.

Unfortunately the way that we lower the constraints to liveranges does
not work if we have the same vreg used both for the reused input and
another input -- it results in impossible-to-solve constraints. For
example, the instruction

```
    alu_op v42 use, v42 use, v43 def reuse(0)
```

would result in an impossible allocation.

This fixes liverange construction to properly handle all uses of the
vreg whose operand is reused, rather than just the one reused operand.
2022-03-18 10:14:27 -07:00
Chris Fallin
bf92e7c02f Allow pinned-vregs to be implicit liveins. (#30)
Previously, the regalloc required all liveins to be defined by a
pseudoinstruction at the start of the function body. The regalloc.rs
compatibility shim did this, but it's slightly inconvenient when using
the API directly. This change allows pinned vregs to be implicit liveins
to the function body instead.
2022-03-18 10:13:56 -07:00
Chris Fallin
b1a512dbf6 Checker analysis: change order of block processing for better efficiency. (#29)
After going through the checker with @fitzgen, we discussed the dataflow
analysis and @fitzgen noted that it would likely be more efficient to,
for example, process an inner cycle of blocks in a loop nest and
converge before returning to the outer loop. I had written a BFS-style
workqueue loop to converge the dataflow analysis without much thought,
with a FIFO workqueue. Any workqueue ordering will work and will
converge to the same fixpoint (as long as we are operating on a
lattice), but indeed some orderings will be more efficient, and a
DFS-style (LIFO stack) workqueue will give us this property of
converging inner loops first.

In measurements, there doesn't seem to be much of a difference for small
fuzz testcases, but this will likely matter more if/when we try to run
the checker to validate register allocation on large functions.
2022-03-10 10:35:08 -08:00
Chris Fallin
fe021ad6d4 Simplify pinned-vreg API: don't require slice of all pinned vregs. (#28)
Simplify pinned-vreg API: don't require slice of all pinned vregs.

Previously, we kept a bool flag `is_pinned` in the `VRegData`, and we
required a `&[VReg]` of all pinned vregs to be provided by
`Function::pinned_vregs()`. This was (I think) done for convenience, but
it turns out not to really be necessary, as we can just query
`is_pinned_vreg` where needed (and in the likely implementation, e.g. in
Cranelift, this will be a `< NUM_PINNED_VREGS` check that can be
inlined). This adds convenience for the embedder (the main benefit), and
also reduces complexity, removes some state, and avoids some work
initializing the regalloc state for a run.
2022-03-04 15:12:16 -08:00
Chris Fallin
14442df3fc Support for debug-labels. (#27)
Support for debug-labels.

If the client adds labels to vregs across ranges of instructions in the
input program, the regalloc will provide metadata in the `Output` that
describes the `Allocation`s in which each such vreg is stored for those
ranges. This allows the client to emit debug metadata telling a debugger
where to find program values at each point in the program.
2022-03-03 16:58:33 -08:00
Chris Fallin
d9d97451f8 Merge pull request #26 from cfallin/new-checker
Rework checker to not require DefAlloc by tracking all vregs on each alloc.
2022-01-25 16:18:05 -08:00
Chris Fallin
7ce69de5b0 Address review comments. 2022-01-20 19:45:41 -08:00
Chris Fallin
2133606366 Fix doc-tests: escape figure properly 2022-01-20 17:21:32 -08:00
Chris Fallin
ccd6b4fc2c Remove DefAlloc -- no longer needed. 2022-01-19 23:57:31 -08:00
Chris Fallin
3b037f3c9e Rework checker to not require DefAlloc by tracking all vregs on each alloc.
The symbolic checker currently works by tracking a single symbolic vreg
label for each "alloc" (physical register or stack slot). On definition
of a vreg into an alloc, the label is updated to the new vreg's name.

This worked quite well when the regalloc was simpler, but in the
presence of redundant move elimination, it started to become apparent
that the analysis has a shortcoming: when multiple vregs have the same
value, and the regalloc has deduced this, it can make use of an alloc
that is labeled with one vreg but use it as another vreg and the checker
will not validate the use.

In other words, the regalloc became smart enough to avoid emitting
unnecessary moves, but the checker was relying on those moves to know
the most up-to-date symbolic name for a value in a physical location. In
a sense, a register or stackslot can contain *both* vreg1 and vreg2, and
the regalloc can use it as either.

The stopgap measure of emitting more DefAllocs as part of the redundant
move elimination never quite sat right with me. It works, but it's
asking too much of the regalloc to prove why its moves are correct. We
should rely less on the regalloc and on complex
built-just-for-the-checker plumbing; we should instead improve the
checker so that it can prove on its own that the result is correct.

This PR modifies the checker so that its basic abstraction for an
alloc's value is a *set* of virtual register labels, rather than just
one. The transfer function is a little more complex, but manageable: a
move keeps the old label(s) and adds a new one; redefining a vreg into
one alloc needs to remove that vreg label from all other alloc's sets.

This completely removes the need for metadata from the regalloc (!); all
we need is the original program (pre-alloc, with vregs), the set of
allocations, and the set of inserted moves, and we can validate the
result. This should mean that we trust our checker-validated allocation
results more, and should result in less complexity and maintenance going
forward if we improve the allocator further.
2022-01-19 23:50:35 -08:00
Chris Fallin
56a8f844a8 Merge pull request #25 from Amanieu/perf
Performance improvements
2022-01-11 14:35:10 -08:00
Amanieu d'Antras
6b1a5e8b1b Address review feedback 2022-01-11 22:27:15 +00:00
Amanieu d'Antras
be61078e4e Format Cargo.toml 2022-01-11 13:34:50 +00:00
Amanieu d'Antras
ee4de54240 Guard trace! behind cfg!(debug_assertions)
Even if the trace log level is disabled, the presence of the trace!
macro still has a significant impact on performance because it is
present in the inner loops of the allocator.

Removing the trace! calls at compile-time reduces instruction count by
~7%.
2022-01-11 13:30:13 +00:00
Amanieu d'Antras
2d9d5dd82b Rearrange some struct fields to work better with u64_key/u128_key
This allows the compiler to load the whole key with 1 or 2 64-bit
accesses, assuming little-endian ordering.

Improves instruction count by ~1%.
2022-01-11 13:24:51 +00:00
Amanieu d'Antras
693fb6a975 Only emit DefAlloc edits when the "checker" feature is enabled.
This reduces instruction counts by ~2% when disabled.
2022-01-11 13:03:24 +00:00
Amanieu d'Antras
d95a9d9399 Combine sort keys into u64/u128
This allows the compiler to perform branch-less comparisons, which are
more efficient.

This results in ~5% fewer instructions executed.
2022-01-11 13:03:21 +00:00
Amanieu d'Antras
053375f049 Remove PRegData::reg and use PReg::from_index instead
Performance impact is negligible but this is a good cleanup.
2022-01-11 13:02:08 +00:00
Amanieu d'Antras
74928b83fa Replace all assert! with debug_assert!
This results in a ~6% reduction in instruction count.
2022-01-11 03:54:08 +00:00
Chris Fallin
a27f93f01e Merge pull request #23 from Amanieu/iter
Add a helper to iterate over insts and edits of a block in order
2022-01-05 10:02:35 -08:00
Amanieu d'Antras
6f59cd407b Use block_insts_and_edits in the checker 2021-12-27 22:09:07 +01:00
Amanieu d'Antras
8ab44c383e Add a helper to iterate over insts and edits of a block in order 2021-12-27 22:08:36 +01:00
Chris Fallin
8752a8c5bd Merge pull request #17 from Amanieu/fixed_stack
Add support for fixed stack slots
2021-12-12 22:15:55 -08:00
Amanieu d'Antras
51493ab03a Apply review feedback 2021-12-12 00:33:30 +00:00
Amanieu d'Antras
38ffc479c2 Simplify the internal representation of PReg 2021-12-11 22:39:19 +00:00
Amanieu d'Antras
870e4729e1 Add fixed stack slots to the fuzzer 2021-12-11 22:39:19 +00:00
Amanieu d'Antras
8f435243e0 Properly handle fixed stack slots during multi-fixed-reg fixup 2021-12-11 22:39:14 +00:00
Amanieu d'Antras
707aacd818 Split up functions in liverange.rs
This helps with profiling even if they are inlined since perf with DWARF
callgraph profiling can attribute execution time to inlined functions.
2021-12-11 22:31:58 +00:00
Amanieu d'Antras
4f8e115115 Refactor requirement computation 2021-12-11 22:31:58 +00:00
Amanieu d'Antras
77e6a9e0d7 Add support for fixed stack slots
This works by allowing a PReg to be marked as being a stack location
instead of a physical register.
2021-12-11 22:31:58 +00:00
Chris Fallin
2f433929c4 Merge pull request #21 from cfallin/fuzzbug-20211204
Fix fuzzbug: add checker metadata for new vreg on multi-fixed-reg fixup move.
2021-12-05 09:56:37 -08:00
Chris Fallin
ef6c8f3226 Fix fuzzbug: add checker metadata for new vreg on multi-fixed-reg fixup move.
When an instruction uses the same vreg constrained to multiple different
fixed registers, the allocator converts all but one of the fixed
constraints to `Any` and then records a special fixup move that copies
the value to the other fixed registers just before the instruction. This
allows the allocator to maintain the invariant that a value lives in
only one place at a time throughout most of its logic, and constrains
the complexity-fallout of this corner case to just a special last-minute
edit.

Unfortunately some recent CPU time thrown at the fuzzer has uncovered
a subtle interaction with the redundant move eliminator that confuses
the checker.

Specifically, when the correct value is *already* in the second
constrained fixed reg, because of an unrelated other move (e.g. because
of a blockparam or other vreg moved from the original), the redundant
move eliminator can delete the fixup move without telling the checker
that it has done so.

Such an optimization is perfectly valid, and the generated code is
correct; but the checker thinks that some other vreg (the one that was
copied from the original) is in the second preg, and panics.

The fix is to use the mechanism that indicates "this move defines a new
vreg" (emitting a `defalloc` checker-instruction) to force the checker
to understand that after the fixup move, the given preg actually
contains the appropriate vreg.
2021-12-04 23:30:30 -08:00
Chris Fallin
822f2bc937 Merge pull request #18 from Amanieu/blockparam
Rework the API for outgoing blockparams
2021-12-01 10:24:12 -08:00
Amanieu d'Antras
6621a57cb7 Fix liveranges for branch parameters 2021-12-01 01:43:20 +00:00
Amanieu d'Antras
0cb3a8019f Rework the API for outgoing blockparams 2021-12-01 01:43:20 +00:00
Chris Fallin
fdd9913a7a Merge pull request #20 from cfallin/fuzzbug-fix
Fix fuzzbug related to bundle priority ordering.
2021-11-30 15:45:35 -08:00
Chris Fallin
c53fbb4a5c Fix fuzzbug related to bundle priority ordering.
Changes in computation of bundle priorities during review of the initial
PR introduced a possible mis-ordering of priorities: inner-loop bundle
use weights could exceed the weights of 1_000_000 and 2_000_000 used for
minimal bundles without and with fixed uses (respectively). These two
kinds of minimal bundle are meant to be the highest-priority bundles,
evicting any other bundle they need to, because they can't be split
further. This PR introduces two special bundle weights for these two
kinds of bundles, and clamps all other bundle weights to just below
them.

Thanks to @Amanieu for reporting the issue! Fixes #19.
2021-11-30 15:36:12 -08:00
Chris Fallin
c7bc6c941c Merge pull request #15 from cfallin/relicensing
Relicense fully to Apache-2.0 WITH LLVM-exception.
2021-11-18 12:40:54 -08:00
Chris Fallin
9774e97939 Merge pull request #16 from Amanieu/misc
Various fixes & minor improvements
2021-11-15 17:50:47 -08:00