9 Commits

Author SHA1 Message Date
Nick Fitzgerald
eb0a8fd22f Bump to version 0.4.2 (#99)
* Bump to version 0.4.2

* Remove take-and-replace dance

There are no conflicting borrows of `self` anymore.
2022-11-01 10:30:30 -07:00
Nick Fitzgerald
fdcf7b694f Avoid indexing and use iteration (#98)
...while still appeasing the borrow checker by taking spillsets out of `self`
and then putting them back in again when we're done.

I was just doing this to enable playing with some data structures in follow-up
commits, but decided to benchmark this commit as-is and found 2-4% speedups to
Cranelift compilation!

```
compilation :: instructions-retired :: benchmarks/bz2/benchmark.wasm

  Δ = 39946528.13 ± 38398.29 (confidence = 99%)

  no-index.so is 1.04x to 1.04x faster than main.so!

  [985704952 985984130.24 986180413] main.so
  [945649144 946037602.11 946262076] no-index.so

compilation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 48413802.56 ± 34288.05 (confidence = 99%)

  no-index.so is 1.03x to 1.03x faster than main.so!

  [1593663899 1593926801.92 1594246604] main.so
  [1545196678 1545512999.36 1545802144] no-index.so

compilation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 841028066.56 ± 253404.59 (confidence = 99%)

  no-index.so is 1.02x to 1.02x faster than main.so!

  [34798712681 34801346430.28 34802786661] main.so
  [33958847844 33960318363.72 33962177143] no-index.so
```
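
The take-and-put-back pattern described in this commit message can be sketched as follows. This is a minimal illustration, not regalloc2's actual code: `Env`, `SpillSet`, `record`, and `sum_sizes` are hypothetical names chosen for the example.

```rust
use std::mem;

// Hypothetical stand-ins; these are NOT regalloc2's real types.
struct SpillSet {
    size: u32,
}

struct Env {
    spillsets: Vec<SpillSet>,
    total: u32,
}

impl Env {
    // Any helper that needs `&mut self` while we walk `self.spillsets`.
    fn record(&mut self, size: u32) {
        self.total += size;
    }

    fn sum_sizes(&mut self) {
        // Move the vector out of `self`, leaving an empty Vec behind, so the
        // loop below does not hold a borrow of `self`.
        let spillsets = mem::take(&mut self.spillsets);
        for spillset in &spillsets {
            // Iteration instead of `self.spillsets[i]` indexing.
            self.record(spillset.size);
        }
        // Put the vector back once we are done.
        self.spillsets = spillsets;
    }
}
```

Iterating directly over `&self.spillsets` while calling a `&mut self` method would be rejected by the borrow checker; moving the vector out for the duration of the loop sidesteps that at the cost of two cheap moves.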
2022-11-01 10:00:09 -07:00
Amanieu d'Antras
906a053208 Remove register class from SpillSlot (#80)
* Remove register class from `SpillSlot`

The register allocator was already allowing moves between spillslots and
registers of different classes, so this PR formalizes this by making
spillslots independent of register class.

This also fixes #79 by properly tracking the register class of an
`InsertedMove` with the `to_vreg` field, which turns out to never
be `None` in practice. Removing the `Option` allows the register
class of the `VReg` to be used when building the per-class move lists.

Fixes #79

* Address review feedback
2022-09-20 14:05:23 -07:00
Chris Fallin
4eb2a2528b Limit split count per original bundle with fallback 1-to-N split. (#59)
* Limit split count per original bundle with fallback 1-to-N split.

Right now, splitting a bundle produces two halves. Furthermore, it has
cost linear in the length of the bundle, because the resulting
half-bundles have their requirements recomputed with a new scan, and
because we copy half the use-list over to the tail end sub-bundle.

This works fine when a bundle has a handful of splits overall, but not
when an input has a systematic pattern of conflicts that will require
O(|bundle|) splits (e.g., every Use is constrained to a different fixed
register than the last one). In such a case, we get quadratic behavior.

This PR adds a per-spillset (so, per-original-bundle) counter for
splits, and when it reaches a preset threshold (10 for now), we instead
split directly into minimal bundles along the whole length of the
bundle, putting the regions without uses in the spill bundle.

This basically approximates what a non-splitting allocator would do: it
"spills" the whole bundle to possibly a stackslot, or a second-chance
register allocation at best, via the spill bundle; and then does minimal
reservations of registers just at uses/defs and moves the "spilled"
value into/out of them immediately.

Together with another small optimization, this PR results in a 4x
compilation speedup and 24x memory use reduction on one particularly bad
case with alternating conflicting requirements on a vreg (see
bytecodealliance/wasmtime#4291 for details).
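
A rough sketch of the threshold fallback described above, under heavy simplification. All names here (`MAX_SPLITS_PER_SPILLSET`, `Piece`, `SplitDecision`, `split`) are hypothetical, a bundle is modeled as a half-open range of program points with a sorted list of distinct use positions, and the exact comparison against the threshold is an assumption:

```rust
// Threshold value taken from the commit message; the name is an assumption.
const MAX_SPLITS_PER_SPILLSET: u32 = 10;

// Hypothetical per-spillset state: only the split counter is modeled.
struct SpillSet {
    splits: u32,
}

#[derive(Debug, PartialEq)]
enum Piece {
    // A minimal bundle covering exactly one use at this program point.
    Minimal(u32),
    // A use-free region [from, to) that is handed to the spill bundle.
    Spill(u32, u32),
}

#[derive(Debug, PartialEq)]
enum SplitDecision {
    // Ordinary split into two halves (details omitted in this sketch).
    InTwo,
    // Fallback: split along the whole length of the bundle at once.
    OneToN(Vec<Piece>),
}

// `uses` is assumed sorted, distinct, and inside [start, end).
fn split(spillset: &mut SpillSet, start: u32, end: u32, uses: &[u32]) -> SplitDecision {
    if spillset.splits < MAX_SPLITS_PER_SPILLSET {
        spillset.splits += 1;
        return SplitDecision::InTwo;
    }
    let mut pieces = Vec::new();
    let mut cursor = start;
    for &u in uses {
        if u > cursor {
            pieces.push(Piece::Spill(cursor, u));
        }
        pieces.push(Piece::Minimal(u));
        cursor = u + 1;
    }
    if cursor < end {
        pieces.push(Piece::Spill(cursor, end));
    }
    SplitDecision::OneToN(pieces)
}
```

Because the fallback does a single pass over the use list, a bundle that would otherwise need O(|bundle|) two-way splits is handled in linear time once the counter hits the threshold.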

* Review comments.
2022-06-27 13:23:09 -07:00
Chris Fallin
427e041f1c Fix spillslot allocation to actually reuse spillslots. (#56)
* Fix spillslot allocation to actually reuse spillslots.

The old logic, which did some linked-list rearranging to try to probe
more-likely-to-be-free slots first and which was inherited straight from
the original IonMonkey allocator, was slightly broken (error in
translation and not in IonMonkey, to be clear): it did not get the
list-splicing right, so quite often dropped a slot on the floor and
failed to consider it for further reuse.

On some experimentation, it seems to work just as well to keep a
SmallVec of spillslot indices per size class instead, and save the last
probe-point in order to spread load throughout the allocated slots while
limiting the number of probes (to bound quadratic behavior).
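
The per-size-class probing scheme might look roughly like the sketch below. It is an illustration under stated assumptions, not the actual implementation: `Slot`, `SizeClassSlots`, `allocate`, and `MAX_PROBES` are hypothetical, a plain `Vec` stands in for the `SmallVec`, and occupancy is reduced to a single `busy_until` point, which only works if ranges are allocated in increasing start order.

```rust
// Hypothetical probe limit; the real bound is not stated in the message.
const MAX_PROBES: usize = 10;

// Simplified occupancy: the slot is free for any range starting at or
// after `busy_until`.
struct Slot {
    busy_until: u32,
}

#[derive(Default)]
struct SizeClassSlots {
    // Slots of one size class (a Vec here; the commit uses a SmallVec).
    slots: Vec<Slot>,
    // Where the next search starts, so load spreads across the slots.
    probe_start: usize,
}

impl SizeClassSlots {
    // Returns the index of a slot that can hold the range [from, to).
    fn allocate(&mut self, from: u32, to: u32) -> usize {
        let n = self.slots.len();
        // Probe at most MAX_PROBES existing slots, wrapping around.
        for i in 0..n.min(MAX_PROBES) {
            let idx = (self.probe_start + i) % n;
            if self.slots[idx].busy_until <= from {
                self.slots[idx].busy_until = to;
                self.probe_start = (idx + 1) % n;
                return idx;
            }
        }
        // No free slot found within the probe budget: allocate a new one.
        self.slots.push(Slot { busy_until: to });
        self.slots.len() - 1
    }
}
```

Capping the number of probes keeps each allocation constant-time, and remembering the last probe point avoids rescanning the same busy slots at the front of the list on every call.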

This change moves the maximum slot count from 285 to 92 in `python.wasm`
from bytecodealliance/wasmtime#4214, and the maximum frame size from
2384 bytes to 752 bytes.
2022-06-03 16:01:10 -07:00
Amanieu d'Antras
ee4de54240 Guard trace! behind cfg!(debug_assertions)
Even if the trace log level is disabled, the presence of the trace!
macro still has a significant impact on performance because it is
present in the inner loops of the allocator.

Removing the trace! calls at compile-time reduces instruction count by
~7%.
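
One way to express such a guard is a wrapper macro that checks `cfg!(debug_assertions)`, which is a compile-time constant, so the branch can be optimized away entirely in release builds. The sketch below is an assumption about the general shape, not the macro from this commit; it writes to stderr instead of the `log` crate to stay dependency-free, and `TRACES_EMITTED` and `hot_loop` exist only for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Illustration-only counter so the effect of the guard is observable.
static TRACES_EMITTED: AtomicUsize = AtomicUsize::new(0);

// Hypothetical wrapper: the body is only live when debug assertions are on.
macro_rules! trace {
    ($($arg:tt)*) => {
        if cfg!(debug_assertions) {
            TRACES_EMITTED.fetch_add(1, Ordering::Relaxed);
            eprintln!($($arg)*);
        }
    };
}

// Stand-in for an allocator inner loop with tracing in it.
fn hot_loop(n: u32) -> u32 {
    let mut sum = 0;
    for i in 0..n {
        trace!("visiting {}", i);
        sum += i;
    }
    sum
}
```

With an ordinary logging macro, each call still performs a runtime level check even when tracing is disabled; guarding on a compile-time constant removes that check from the loop in optimized builds.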
2022-01-11 13:30:13 +00:00
Chris Fallin
cf0d515709 Relicense fully to Apache-2.0 WITH LLVM-exception.
Large parts of the code in regalloc2 are currently licensed under the
Mozilla Public License (MPL) 2.0, because they derive in meaningful
ways from the register allocator in IonMonkey, which is part of
Firefox. The relevant source files are marked as such, with references
to the files in the Firefox source tree.

The intent of the regalloc2 project was to port the register allocator
from Firefox to use in Cranelift, borrowing good technology and
improving on it in the spirit of open source.

However, several use-cases of Cranelift require, or at least strongly
prefer, the Apache-2.0 license with the LLVM exception (matching the
license of Cranelift itself, and Bytecode Alliance projects
generally). While using this license is not strictly necessary for
regalloc2 to be usable (the MPL is an excellent open-source license!),
relicensing fully under this license to harmonize with the rest of
Cranelift and Bytecode Alliance codebases significantly widens
possibilities and reduces friction; then regalloc2 is "just another
part of Cranelift" and doesn't have to be treated specially.

The source in `src/ion/` specifically began as a fairly direct port of
the algorithms in the following files in the `mozilla-central`
repository (Firefox codebase):

* The bulk of the "backtracking allocator" algorithm:
  * `js/src/jit/BacktrackingAllocator.{cpp,h}`
* Helpers and definitions in the surrounding infrastructure:
  * `js/src/jit/RegisterAllocator.h`
  * `js/src/jit/RegisterAllocator.cpp`
  * `js/src/jit/StackSlotAllocator.h`
  * `js/src/jit/LIR.h`
* A few data structure implementations:
  * `js/src/ds/SplayTree.h`
  * `js/src/ds/PriorityQueue.h`

Subsequent work in improving regalloc2 has caused it to drift from the
direct port -- for example, it no longer uses splay trees or the
direct port of the priority queue above -- but it is of course very
clearly still a derivative work.

Analysis of the contributors to these files indicates that we need
signoff from the following folks:

* Mozilla Corp, for contributions made by Mozilla employees (the
  majority of the code). Communications with Mozilla (thanks
  @tschneidereit and @bholley for doing the work here!) indicate that
  @ekr is able to sign off when ready here.

* Andy Wingo, specifically for the work done in [Bug
  1620197](https://bugzilla.mozilla.org/show_bug.cgi?id=1620197) and
  [Bug 1609057](https://bugzilla.mozilla.org/show_bug.cgi?id=1609057) to
  generalize the stack allocator for a Wasm feature (multiple returns).

Additionally, since the initial port, we have had three contributions
from @Amanieu:
[#9](https://github.com/bytecodealliance/regalloc2/pull/9),
[#11](https://github.com/bytecodealliance/regalloc2/pull/11),
[#13](https://github.com/bytecodealliance/regalloc2/pull/13).

So, if everyone applicable is happy with this relicensing, this PR
removes the MPL-2.0 license in `src/ion/` and marks all files as
covered under `Apache-2.0 WITH LLVM-exception`. Please let us know if
this is OK!

Signoffs:

- [ ] @ekr, for Mozilla's contributions
- [ ] @wingo, for contributions to original code in `mozilla-central`
- [ ] @Amanieu, for the three PRs linked above

Thanks!
2021-11-10 10:54:28 -08:00
Chris Fallin
3e1e0f39b6 Convert all log::debug to log::trace. 2021-08-12 12:05:19 -07:00
Chris Fallin
b36a563d69 Cleanup: split allocator implementation into 11 files of more reasonable size. 2021-06-18 16:51:41 -07:00