Commit Graph

20 Commits

Author SHA1 Message Date
Amanieu d'Antras
2bd03256b3 Make regalloc2 #![no_std] (#119)
* Make regalloc2 `#![no_std]`

This crate doesn't require any features from the standard library, so it
can be made `no_std` to allow it to be used in environments that can't
use the Rust standard library.

This PR mainly performs the following mechanical changes:
- `std::collections` is replaced with `alloc::collections`.
- `std::*` is replaced with `core::*`.
- `Vec`, `vec!`, `format!` and `ToString` are imported when needed since
  they are no longer in the prelude.
- `HashSet` and `HashMap` are taken from the `hashbrown` crate, which is
  the same implementation that the standard library uses.
- `FxHashSet` and `FxHashMap` are typedefs in `lib.rs` that are based on
  the `hashbrown` types.

The only functional change is that `RegAllocError` no longer implements
the `Error` trait since that is not available in `core`.

Dependencies were adjusted to not require `std` and this is tested in CI
by building against the `thumbv6m-none-eabi` target that doesn't have
`std`.

* Add the Error trait impl back under a "std" feature
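
The mechanical changes listed above can be illustrated with a minimal sketch. It is not code from the crate: `AllocError` is a hypothetical stand-in for `RegAllocError`, `BTreeMap` stands in for the `hashbrown` maps so the example has no external dependencies, and the `std` feature name is taken from the commit message. A real `no_std` crate would also carry `#![no_std]` at the crate root, which is left out here so the sketch builds anywhere.

```rust
// Sketch only: names below are illustrative, not the crate's actual items.
extern crate alloc;

use alloc::collections::BTreeMap;
use alloc::format;
use alloc::string::{String, ToString};
use alloc::vec::Vec;
use core::fmt;

/// Hypothetical stand-in for `RegAllocError`.
#[derive(Debug, PartialEq)]
pub enum AllocError {
    TooManyLiveRegs,
}

impl fmt::Display for AllocError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AllocError::TooManyLiveRegs => f.write_str("too many live registers"),
        }
    }
}

// The `Error` trait impl is only compiled when the "std" feature is
// enabled, so the rest of the crate never needs `std`.
#[cfg(feature = "std")]
impl std::error::Error for AllocError {}

/// `format!` comes from `alloc` once `std` is out of the prelude.
pub fn describe(err: &AllocError) -> String {
    format!("regalloc failed: {}", err)
}

/// `Vec` and `ToString` are imported explicitly as well.
pub fn messages(errs: &[AllocError]) -> Vec<String> {
    errs.iter().map(|e| e.to_string()).collect()
}

/// A collection from `alloc::collections` instead of `std::collections`.
pub fn count_uses(vregs: &[u32]) -> BTreeMap<u32, usize> {
    let mut counts = BTreeMap::new();
    for &v in vregs {
        *counts.entry(v).or_insert(0) += 1;
    }
    counts
}
```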
2023-03-09 11:25:59 -08:00
Amanieu d'Antras
7354cfedde Remove support for program moves (#118) 2023-03-04 16:38:05 -08:00
Chris Fallin
e09f6519a6 Remove pinned VRegs. (#108) 2023-01-24 17:31:41 -08:00
Chris Fallin
1b38a71e38 Some fixes to allow for call instructions to name args, returns, and clobbers with constraints. (#74)
* Some fixes to allow for call instructions to name args, returns, and clobbers with constraints.

- Allow early-pos uses with fixed regs that conflict with
  clobbers (which happen at late-pos), in addition to the
  existing logic for conflicts with late-pos defs with fixed
  regs.

  This is a pretty subtle issue that was uncovered in #53 for the def
  case, and the fix here is the mirror of that fix for clobbers. The
  root cause for all this complexity is that we can't split in the
  middle of an instruction (because there's no way to insert a move
  there!) so if a use is live-downward, we can't let it live in preg A
  at early-pos and preg B != A at late-pos; instead we need to rewrite
  the constraints and use a fixup move.

  The earlier change to fix #53 was actually a bit too conservative in
  that it always applied when such conflicts existed, even if the
  downward arg was not live. This PR fixes that (it's fine for the
  early-use and late-def to be fixed to the same reg if the use's
  liverange ends after early-pos) and adapts the same flexibility to
  the clobbers case as well.

- Reworks the fixups for the def case mentioned above to not shift the
  def to the Early point. Doing so causes issues when the def is to a
  reffy vreg: it can then be falsely included in a stackmap if the
  instruction containing this operand is a safepoint.

- Fixes the last-resort split-bundle-into-minimal-pieces logic from
  #59 to properly limit a minimal bundle piece to end after the
  early-pos, rather than cover the entire instruction. This was causing
  artificial overlaps between args that end after early-pos and defs
  that start at late-pos when one of the vregs hit the fallback split
  behavior.
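
The condition described in the first bullet can be sketched as a small predicate. This is an illustration under assumptions, not the allocator's code: `PReg`, `FixedEarlyUse`, and `needs_fixup_move` are hypothetical names, and liveness is reduced to a single flag.

```rust
/// Hypothetical physical-register handle for this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PReg(pub u8);

/// A use constrained to a fixed register at the instruction's early point.
pub struct FixedEarlyUse {
    pub preg: PReg,
    /// True if the used value stays live past the early point
    /// (the "live-downward" case in the message above).
    pub live_past_early: bool,
}

/// Returns true when the use must be rewritten and fed by a fixup move.
///
/// `late_fixed` holds the registers taken at the late point by clobbers
/// or fixed-register defs. A conflict only matters when the used value
/// is still live there, because the value cannot change registers in the
/// middle of an instruction.
pub fn needs_fixup_move(use_: &FixedEarlyUse, late_fixed: &[PReg]) -> bool {
    use_.live_past_early && late_fixed.contains(&use_.preg)
}
```

With this shape, an early use of `p0` whose liverange ends after early-pos can share `p0` with a late clobber, while the same use with a longer liverange triggers the fixup path.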

* Fix fuzzbug: do not merge when a liverange has a fixed-reg def.

This can create impossible situations: e.g., if a vreg is constrained
to p0 as a late-def, and another, completely different vreg is
constrained to p0 as an early-use on the same instruction, and the
instruction also has a third vreg (early-use), we do not want to merge
the liverange for the third vreg with the first, because it would
result in an unsolvable conflict for p0 at the early-point.

* Review comments.
2022-09-20 15:58:20 -07:00
Chris Fallin
4eb2a2528b Limit split count per original bundle with fallback 1-to-N split. (#59)
* Limit split count per original bundle with fallback 1-to-N split.

Right now, splitting a bundle produces two halves. Furthermore, it has
cost linear in the length of the bundle, because the resulting
half-bundles have their requirements recomputed with a new scan, and
because we copy half the use-list over to the tail end sub-bundle.

This works fine when a bundle has a handful of splits overall, but not
when an input has a systematic pattern of conflicts that will require
O(|bundle|) splits (e.g., every Use is constrained to a different fixed
register than the last one). In such a case, we get quadratic behavior.

This PR adds a per-spillset (so, per-original-bundle) counter for
splits, and when it reaches a preset threshold (10 for now), we instead
split directly into minimal bundles along the whole length of the
bundle, putting the regions without uses in the spill bundle.

This basically approximates what a non-splitting allocator would do: it
"spills" the whole bundle to possibly a stackslot, or a second-chance
register allocation at best, via the spill bundle; and then does minimal
reservations of registers just at uses/defs and moves the "spilled"
value into/out of them immediately.
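
A rough sketch of the counter and the fallback split follows. All names (`SpillSet`, `decide_split`, `minimal_pieces`) are hypothetical, positions are plain integers, and the exact comparison against the threshold is an assumption; only the threshold value of 10 comes from the message above.

```rust
/// Threshold from the commit message; the comparison used below is assumed.
pub const MAX_SPLITS_PER_SPILLSET: u32 = 10;

#[derive(Debug, PartialEq, Eq)]
pub enum SplitDecision {
    /// Ordinary split of a bundle into two halves.
    InHalf,
    /// Fallback: cut the whole bundle into minimal pieces at once.
    IntoMinimalPieces,
}

/// One counter per spillset, i.e. per original bundle.
pub struct SpillSet {
    pub splits: u32,
}

pub fn decide_split(spillset: &mut SpillSet) -> SplitDecision {
    spillset.splits += 1;
    if spillset.splits >= MAX_SPLITS_PER_SPILLSET {
        SplitDecision::IntoMinimalPieces
    } else {
        SplitDecision::InHalf
    }
}

/// Cuts the half-open `range` into one-position pieces at each use.
/// The regions without uses are returned separately; they are what
/// would go into the spill bundle. `uses` must be sorted, distinct,
/// and inside `range`.
pub fn minimal_pieces(
    range: (u32, u32),
    uses: &[u32],
) -> (Vec<(u32, u32)>, Vec<(u32, u32)>) {
    let mut pieces = Vec::new();
    let mut gaps = Vec::new();
    let mut cursor = range.0;
    for &u in uses {
        if u > cursor {
            gaps.push((cursor, u));
        }
        pieces.push((u, u + 1));
        cursor = u + 1;
    }
    if cursor < range.1 {
        gaps.push((cursor, range.1));
    }
    (pieces, gaps)
}
```

The fallback does a single linear pass over the uses, which is what avoids the quadratic behavior of repeated two-way splits.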

Together with another small optimization, this PR results in a 4x
compilation speedup and 24x memory use reduction on one particularly bad
case with alternating conflicting requirements on a vreg (see
bytecodealliance/wasmtime#4291 for details).

* Review comments.
2022-06-27 13:23:09 -07:00
Chris Fallin
ad41f8a7a5 Record vreg classes explicitly during liverange pass. (#35)
This resolves an issue seen when the source program uses multiple
regclasses (Int and Float): in some cases, the logic that grabs the
vregs and retains them (with class) in `vreg_regs` missed a register and
we had a class mismatch. This occurred because data structures were
initialized assuming `Int` regclass at first.

This PR instead removes the `vreg_regs` array, stores the class
explicitly as an `Option<RegClass>` in the `VRegData`, and provides a
`Env::vreg()` method that reconstitutes a `VReg` given its index and its
observed class. We "observe" the class of every vreg seen during the
liveness pass (and we assert that every occurrence of the vreg index has
the same class). In this way, we still have a single source-of-truth for
the vreg class (the mention of the vreg itself) and we explicitly
represent the "not observed yet" state (and panic on attempting to use
such a vreg) rather than implicitly taking the wrong class.
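
The "observe, then reconstitute" scheme might look roughly like the sketch below. The type and method names mirror the ones mentioned above (`VRegData`, `Env::vreg()`), but the field layout, the `observe_vreg_class` helper, and the constructor are assumptions made for illustration.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RegClass {
    Int,
    Float,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct VReg {
    pub index: usize,
    pub class: RegClass,
}

/// `None` explicitly represents the "not observed yet" state.
#[derive(Default)]
pub struct VRegData {
    pub class: Option<RegClass>,
}

pub struct Env {
    pub vregs: Vec<VRegData>,
}

impl Env {
    pub fn new(num_vregs: usize) -> Self {
        Env {
            vregs: (0..num_vregs).map(|_| VRegData::default()).collect(),
        }
    }

    /// Called for every vreg mention seen during the liveness pass.
    pub fn observe_vreg_class(&mut self, vreg: VReg) {
        let old = self.vregs[vreg.index].class.replace(vreg.class);
        // Every occurrence of the same index must agree on the class.
        assert!(old.is_none() || old == Some(vreg.class));
    }

    /// Rebuilds a `VReg` from its index; panics if never observed.
    pub fn vreg(&self, index: usize) -> VReg {
        let class = self.vregs[index]
            .class
            .expect("vreg class not observed yet");
        VReg { index, class }
    }
}
```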
2022-03-29 14:00:14 -07:00
Chris Fallin
fe021ad6d4 Simplify pinned-vreg API: don't require slice of all pinned vregs. (#28)
Previously, we kept a bool flag `is_pinned` in the `VRegData`, and we
required a `&[VReg]` of all pinned vregs to be provided by
`Function::pinned_vregs()`. This was (I think) done for convenience, but
it turns out not to really be necessary, as we can just query
`is_pinned_vreg` where needed (and in the likely implementation, e.g. in
Cranelift, this will be a `< NUM_PINNED_VREGS` check that can be
inlined). This adds convenience for the embedder (the main benefit), and
also reduces complexity, removes some state, and avoids some work
initializing the regalloc state for a run.
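
As a sketch of the "likely implementation" mentioned above, an embedder that numbers its pinned vregs first could answer the query with one comparison. `Embedder`, the value of `NUM_PINNED_VREGS`, and the `bool` return type are assumptions for illustration; the trait's actual signature is not shown in this message.

```rust
/// Hypothetical vreg handle: just an index.
#[derive(Clone, Copy)]
pub struct VReg(pub usize);

/// Assumed: the embedder reserves the lowest indices for pinned vregs.
pub const NUM_PINNED_VREGS: usize = 4;

pub struct Embedder;

impl Embedder {
    /// A single comparison that the compiler can inline, replacing
    /// both the up-front slice of pinned vregs and the per-vreg flag.
    #[inline]
    pub fn is_pinned_vreg(&self, vreg: VReg) -> bool {
        vreg.0 < NUM_PINNED_VREGS
    }
}
```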
2022-03-04 15:12:16 -08:00
Amanieu d'Antras
ee4de54240 Guard trace! behind cfg!(debug_assertions)
Even if the trace log level is disabled, the presence of the trace!
macro still has a significant impact on performance because it is
present in the inner loops of the allocator.

Removing the trace! calls at compile-time reduces instruction count by
~7%.
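
One way to get this effect is a wrapper macro whose body sits behind `cfg!(debug_assertions)`, so the branch is constant-folded away in release builds. The sketch below is an assumption about the general shape, with `eprintln!` standing in for the `log` crate's `trace!` to keep it dependency-free; `sum_with_tracing` is a hypothetical caller.

```rust
/// Tracing wrapper: `cfg!(debug_assertions)` expands to a literal
/// `true` or `false`, so release builds drop the whole branch.
macro_rules! trace {
    ($($arg:tt)*) => {
        if cfg!(debug_assertions) {
            // Stand-in for forwarding to a real logging macro.
            eprintln!($($arg)*);
        }
    };
}

/// Hypothetical hot loop with a trace point inside it.
pub fn sum_with_tracing(values: &[u32]) -> u32 {
    let mut total = 0;
    for &v in values {
        trace!("adding {}", v);
        total += v;
    }
    total
}
```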
2022-01-11 13:30:13 +00:00
Amanieu d'Antras
2d9d5dd82b Rearrange some struct fields to work better with u64_key/u128_key
This allows the compiler to load the whole key with 1 or 2 64-bit
accesses, assuming little-endian ordering.

Improves instruction count by ~1%.
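
The layout idea can be sketched with a two-field struct. The names `Entry`, `minor`, and `major` are hypothetical; the point is only that, on a little-endian target, placing the less significant field first makes the struct's bytes match the combined 64-bit key, so the shifts and ORs below can compile down to a single load.

```rust
/// `#[repr(C)]` keeps the declared field order. On little-endian
/// targets the first field occupies the low 32 bits of the 8 bytes.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Entry {
    /// Less significant half of the sort key.
    pub minor: u32,
    /// More significant half of the sort key.
    pub major: u32,
}

impl Entry {
    /// Combined key: ordering by it equals ordering by (major, minor).
    #[inline]
    pub fn key(&self) -> u64 {
        (u64::from(self.major) << 32) | u64::from(self.minor)
    }
}

pub fn sort_entries(entries: &mut [Entry]) {
    entries.sort_unstable_by_key(Entry::key);
}
```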
2022-01-11 13:24:51 +00:00
Amanieu d'Antras
74928b83fa Replace all assert! with debug_assert!
This results in a ~6% reduction in instruction count.
2022-01-11 03:54:08 +00:00
Amanieu d'Antras
51493ab03a Apply review feedback 2021-12-12 00:33:30 +00:00
Amanieu d'Antras
4f8e115115 Refactor requirement computation 2021-12-11 22:31:58 +00:00
Amanieu d'Antras
77e6a9e0d7 Add support for fixed stack slots
This works by allowing a PReg to be marked as being a stack location
instead of a physical register.
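
A minimal sketch of the idea, assuming the set of stack-backed registers is kept as a list that is consulted when classifying a location. The names `MachineEnv`, `fixed_stack_slots`, `Location`, and `classify` are used here for illustration only and may not match the crate's actual definitions.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PReg(pub u8);

#[derive(Debug, PartialEq, Eq)]
pub enum Location {
    Register(PReg),
    StackSlot(PReg),
}

/// Assumed shape: a list of `PReg`s that denote stack locations.
pub struct MachineEnv {
    pub fixed_stack_slots: Vec<PReg>,
}

impl MachineEnv {
    /// A `PReg` in the list is treated as a stack location; any other
    /// `PReg` is an ordinary physical register.
    pub fn classify(&self, preg: PReg) -> Location {
        if self.fixed_stack_slots.contains(&preg) {
            Location::StackSlot(preg)
        } else {
            Location::Register(preg)
        }
    }
}
```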
2021-12-11 22:31:58 +00:00
Chris Fallin
cf0d515709 Relicense fully to Apache-2.0 WITH LLVM-exception.
Large parts of the code in regalloc2 are currently licensed under the
Mozilla Public License (MPL) 2.0, because they derive in meaningful
ways from the register allocator in IonMonkey, which is part of
Firefox. The relevant source files are marked as such, with references
to the files in the Firefox source tree.

The intent of the regalloc2 project was to port the register allocator
from Firefox to use in Cranelift, borrowing good technology and
improving on it in the spirit of open source.

However, several use-cases of Cranelift require, or at least strongly
prefer, the Apache-2.0 license with the LLVM exception (matching the
license of Cranelift itself, and Bytecode Alliance projects
generally). While using this license is not strictly necessary for
regalloc2 to be usable (the MPL is an excellent open-source license!),
relicensing fully under this license to harmonize with the rest of
Cranelift and Bytecode Alliance codebases significantly widens
possibilities and reduces friction; then regalloc2 is "just another
part of Cranelift" and doesn't have to be treated specially.

The source in `src/ion/` specifically began as a fairly direct port of
the algorithms in the following files in the `mozilla-central`
repository (Firefox codebase):

* The bulk of the "backtracking allocator" algorithm:
  * `js/src/jit/BacktrackingAllocator.{cpp,h}`
* Helpers and definitions in the surrounding infrastructure:
  * `js/src/jit/RegisterAllocator.h`
  * `js/src/jit/RegisterAllocator.cpp`
  * `js/src/jit/StackSlotAllocator.h`
  * `js/src/jit/LIR.h`
* A few data structure implementations:
  * `js/src/ds/SplayTree.h`
  * `js/src/ds/PriorityQueue.h`

Subsequent work in improving regalloc2 has caused it to drift from the
direct port -- for example, it no longer uses splay trees or the
direct port of the priority queue above -- but it is of course very
clearly still a derivative work.

Analysis of the contributors to these files indicates that we need
signoff from the following folks:

* Mozilla Corp, for contributions made by Mozilla employees (the
  majority of the code). Communications with Mozilla (thanks
  @tschneidereit and @bholley for doing the work here!) indicate that
  @ekr is able to sign off when ready here.

* Andy Wingo, specifically for the work done in [Bug
  1620197](https://bugzilla.mozilla.org/show_bug.cgi?id=1620197) and
  [Bug 1609057](https://bugzilla.mozilla.org/show_bug.cgi?id=1609057) to
  generalize the stack allocator for a Wasm feature (multiple returns).

Additionally, since the initial port, we have had three contributions
from @Amanieu:
[#9](https://github.com/bytecodealliance/regalloc2/pull/9),
[#11](https://github.com/bytecodealliance/regalloc2/pull/11),
[#13](https://github.com/bytecodealliance/regalloc2/pull/13).

So, if everyone applicable is happy with this relicensing, this PR
removes the MPL-2.0 license in `src/ion/` and marks all files as
covered under `Apache-2.0 WITH LLVM-exception`. Please let us know if
this is OK!

Signoffs:

- [ ] @ekr, for Mozilla's contributions
- [ ] @wingo, for contributions to original code in `mozilla-central`
- [ ] @Amanieu, for the three PRs linked above

Thanks!
2021-11-10 10:54:28 -08:00
Chris Fallin
6f0893d69d Address review comments. 2021-08-31 17:56:06 -07:00
Chris Fallin
6d313f2b56 Address review comments: more doc comments and some minor refactorings. 2021-08-30 17:15:37 -07:00
Chris Fallin
69ad31f013 Replace remaining instances of use of debug feature with debug_assertions.
Also fix some code that did not build in debug mode anymore (d'oh!) in
`src/ion/merges.rs`, as exposed by this change.
2021-08-12 17:35:55 -07:00
Chris Fallin
3e1e0f39b6 Convert all log::debug to log::trace. 2021-08-12 12:05:19 -07:00
Chris Fallin
84285c26fb Rename OperandPolicy to OperandConstraint as per feedback from @julian-seward1. 2021-08-12 11:17:52 -07:00
Chris Fallin
b36a563d69 Cleanup: split allocator implementation into 11 files of more reasonable size. 2021-06-18 16:51:41 -07:00