Commit Graph

11 Commits

Author SHA1 Message Date
Amanieu d'Antras
2bd03256b3 Make regalloc2 #![no_std] (#119)
* Make regalloc2 `#![no_std]`

This crate doesn't require any features from the standard library, so it
can be made `no_std` to allow it to be used in environments that can't
use the Rust standard library.

This PR mainly performs the following mechanical changes:
- `std::collections` is replaced with `alloc::collections`.
- `std::*` is replaced with `core::*`.
- `Vec`, `vec!`, `format!` and `ToString` are imported when needed since
  they are no longer in the prelude.
- `HashSet` and `HashMap` are taken from the `hashbrown` crate, which is
  the same implementation that the standard library uses.
- `FxHashSet` and `FxHashMap` are typedefs in `lib.rs` that are based on
  the `hashbrown` types.

The only functional change is that `RegAllocError` no longer implements
the `Error` trait since that is not available in `core`.

Dependencies were adjusted to not require `std` and this is tested in CI
by building against the `thumbv6m-none-eabi` target that doesn't have
`std`.

* Add the Error trait impl back under a "std" feature
2023-03-09 11:25:59 -08:00
Amanieu d'Antras
520cafa129 Handle fixed stack slots in the move resolver (#78)
Fixed stack slots are treated as `PReg`s by most of the register
allocator, but need some additional handling the move resolver to avoid
generating stack-to-stack moves.
2022-09-19 12:27:24 -07:00
Chris Fallin
52818a7ed6 Handle conflicting Before and After fixed-reg constraints with a copy. (#54)
* Extend fuzzer to generate cases like #53.

Currently, the fuzz testcase generator will add at most one
fixed-register constraint to an instruction per physical register. This
avoids impossible situations, such as specifying that both `v0` and `v1`
must be placed into the same `p0`.

However, it *should* be possible to say that `v0` is in `p0` before the
instruction, and `v1` is in `p0` after the instruction (i.e., at `Early`
and `Late` operand positions).

This in fact exposes a limitation in the current allocator design: when
`v0` is live downward, with the above constraints, it will result in an
impossible allocation situation because we cannot split in the middle of
an instruction. A subsequent fix will rectify this by using the
multi-fixed-reg fixup mechanism.

* Handle conflicting Before and After fixed-reg constraints with a copy.

This fixes #53. Previously, if two operands on an instruction
specified *different* vregs constrained to the same physical register
at the Before (Early) and After (Late) points of the instruction, and
the Before was live downward as well, we would panic: we can't insert
a move into the middle of an instruction, so putting the first vreg in
the preg at Early implies we have an unsolveable conflict at Late.

We can solve this issue by adding some new logic to insert a copy, and
rewrite the constraint. This reuses the multi-fixed-reg-constraint
fixup logic. While that logic handles the case where the *same* vreg
has multiple *different* fixed-reg constraints, this new logic
handles *different* vregs with the *same* fixed-reg constraints, but
at different *program points*; so the two are complementary.

This addresses the specific test case in #53, and also fuzzes cleanly
with the change to the fuzz testcase generator to generate these
cases (which also immediately found the bug).

* Add a reservation to the PReg when rewriting constraint so it is not doubly-allocated.

* Distinguish initial fixup moves from secondary moves.

* Use `trace` macro, not `log::trace`, to avoid trace output when feature is disabled.

* Rework operand rewriting to properly handle bundle-merging edge case.

When the liverange for the defined vreg with fixed constraint at Late is
*merged* with the liverange for the used vreg with fixed constraint at
Early, the strategy of putting a fixed reservation on the preg at Early
fails, because the whole bundle is minimal (if it spans just the
instruction's Early and Late and nothing else). This could happen if
e.g. the def flows into a blockparam arg that merges with a blockparam
defining the used value.

Instead we move the def one halfstep earlier, to the Early point, with
its fixed-reg constraint still in place. This has the same effect but
works when the two are merged.

* Fix checker issue: make more flexible in the presence of victim-register saves.
2022-05-31 14:01:27 -07:00
Chris Fallin
869c21e79c Remove an explicitly-set-aside scratch register per class. (#51)
Currently, regalloc2 sets aside one register per class, unconditionally,
to make move resolution possible. To solve the "parallel moves problem",
we sometimes need to conjure a cyclic permutation of data among
registers or stack slots (this can result, for example, from blockparam
flow that swaps two values on a loop backedge). This set-aside scratch
register is used when a cycle exists.

regalloc2 also uses the scratch register when needed to break down a
stack-to-stack move (which could happen due to blockparam moves on edges
when source and destination are both spilled) into a stack-to-reg move
followed by reg-to-stack, because most machines have loads and stores
but not memory-to-memory moves.

A set-aside register is certainly the simplest solution, but it is not
optimal: it means that we have one fewer register available for use by
the program, and this can be costly especially on machines with fewer
registers (e.g., 16 GPRs/XMMs on x86-64) and especially when some
registers may be set aside by our embedder for other purposes too. Every
register we can reclaim is some nontrivial performance in large function
bodies!

This PR removes this restriction and allows regalloc2 to use all
available physical registers. It then solves the two problems above,
cyclic moves and stack-to-stack moves, with a two-stage approach:

- First, it finds a location to use to resolve cycles, if any exist. If
  a register is unallocated at the location of the move, we can use it.
  Often we get lucky and this is the case. Otherwise, we allocate a
  stackslot to use as the temp. This is perfectly fine at this stage,
  even if it means that we have more stack-to-stack moves.

- Then, it resolves stack-to-stack moves into stack-to-reg /
  reg-to-stack. There are two subcases here. If there is *another*
  available free physical register, we opportunistically use it for this
  decomposition. If not, we fall back to our last-ditch option: we pick
  a victim register of the appropriate class, we allocate another
  temporary stackslot, we spill the victim to that slot just for this
  move, we do the move in the above way (stack-to-reg / reg-to-stack)
  with the victim, then we reload the victim. So one move (original
  stack-to-stack) becomes four moves, but no state is clobbered.

This PR extends the `moves` fuzz-target to exercise this functionality
as well, randomly choosing for some spare registers to exist or not, and
randomly generating {stack,reg}-to-{stack,reg} moves in the initial
parallel-move input set. The target does a simple symbolic simulation of
the sequential move sequence and ensures that the final state is
equivalent to the parallel-move semantics.

I fuzzed both the `moves` target, focusing on the new logic; as well as
the `ion_checker` target, checking the whole register allocator, and
both seem clean (~150M cases on the former, ~1M cases on the latter).
2022-05-23 10:48:37 -07:00
Amanieu d'Antras
d95a9d9399 Combine sort keys into u64/u128
This allows the compiler to perform branch-less comparisons, which are
more efficient.

This results in ~5% fewer instructions executed.
2022-01-11 13:03:21 +00:00
Amanieu d'Antras
74928b83fa Replace all assert! with debug_assert!
This results in a ~6% reduction in instruction count.
2022-01-11 03:54:08 +00:00
Chris Fallin
3a18564e98 Addressed more review comments. 2021-08-30 17:51:55 -07:00
Chris Fallin
69ad31f013 Replace remaining instances of use of debug feature with debug_assertions.
Also fix some code that did not build in debug mode anymore (d'oh!) in
`src/ion/merges.rs`, as exposed by this change.
2021-08-12 17:35:55 -07:00
Chris Fallin
37fa3ec763 Improve prog-move handling: no use/def records, just directly connect the LRs.
Also requires some metadata in edit output to properly hook up the
checker in regalloc.rs to track user-moves without seeing the original
insts with operands.
2021-05-11 23:59:12 -07:00
Chris Fallin
940c1b719d Changes from review comments. 2021-04-18 13:12:33 -07:00
Chris Fallin
8e923b0ad9 Initial public commit of regalloc2. 2021-04-13 17:40:12 -07:00