Commit Graph

263 Commits

Author SHA1 Message Date
Chris Fallin
80cdd0c5ac Properly handle multiple same-fixed-reg constraints to the same vreg in one inst. 2021-05-06 01:01:27 -07:00
Chris Fallin
ab828b6c86 MachineEnv fields are public 2021-05-05 23:14:04 -07:00
Chris Fallin
48fbc235ea BitVec::get() takes immutable self 2021-05-05 23:08:19 -07:00
Chris Fallin
15ed2d6522 Allow multiple defs per vreg (i.e., accept non-SSA code).
This generalizes the allocator to accept multiple defs by making defs
just another type of "use" (uses are now perhaps more properly called
"mentions", but for now we abuse the terminology slightly).

It turns out that this actually was not terribly hard, because we don't
rely on the properties that a strict SSA requirement otherwise might
allow us to: e.g., defs always at exactly the start of a vreg's ranges.
Because we already accepted arbitrary block order and irreducible CFGs,
and approximated live-ranges with the single-pass algorithm, we are
robust in our "stitching" (move insertion) and so all we really care
about is computing some superset of the actual live-ranges and then a
non-interfering coloring of (split pieces of) those ranges. Multiple
defs don't change that, as long as we compute the ranges properly.

We still have blockparams in this design, so the client *can* provide
SSA directly, and everything will work as before. But a client that
produces non-SSA need not use them at all; it can just happily reassign
to vregs and everything will Just Work.

This is part of the effort to port Cranelift over to regalloc2; I have
decided that it may be easier to build a compatibility shim that matches
regalloc.rs's interface than to continue boiling the ocean and
converting all of the lowering sequences to SSA. It then becomes a
separable piece of work (and simply further performance improvements and
simplifications) to remove the need for this shim.
2021-05-05 22:49:45 -07:00
Chris Fallin
9e7021cfd0 Derive Ord/hash on OperandOrAllocation. 2021-05-03 19:18:19 -07:00
Chris Fallin
49c54b6144 Misc usability and functionality enhancements:
- Support preferred and non-preferred subsets of a register class. This
  allows allocating, e.g., caller-saved registers before callee-saved
  registers.
- Allow branch blockparam args to start an a certain offset in branch
  operands; this allows branches to have other operands too (e.g.,
  conditional-branch inputs).
- Allow `OperandOrAllocation` to be constructed from an `Allocation` and
  `OperandKind` as well (i.e., an allocation with an use/def bit).
2021-04-30 21:14:09 -07:00
Chris Fallin
414f3f828d Factored out test program and fuzzing features; core crate now only depends on smallvec and log. 2021-04-18 14:19:32 -07:00
Chris Fallin
34ab744f4f Add GitHub CI config. 2021-04-18 13:18:18 -07:00
Chris Fallin
940c1b719d Changes from review comments. 2021-04-18 13:12:33 -07:00
Chris Fallin
a08b0121a0 Add support for reftypes/stackmaps and Stack constraints, and misc API changes.
The main enhancement in this commit is support for reference types and
stackmaps. This requires tracking whether each VReg is a "reference" or
"pointer". At certain instructions designated as "safepoints", the
regalloc will (i) ensure that all references are in spillslots rather
than in registers, and (ii) provide a list of exactly which spillslots
have live references at that program point. This can be used by, e.g., a
GC to trace and possibly modify pointers. The stackmap of spillslots is
precise: it includes all live references, and *only* live references.

This commit also brings in some API tweaks as part of the in-progress
Cranelift glue. In particular, it makes Allocations and Operands
mutually disjoint by using the same bitfield for the type-tag in both
and choosing non-overlapping tags. This will allow instructions to carry
an Operand for each register slot and then overwrite these in place with
Allocations. The `OperandOrAllocation` type does the necessary magic to
make this look like an enum, but staying in 32 bits.
2021-04-17 21:29:13 -07:00
Chris Fallin
33ac6cb41d Heuristic improvement: reg-scan offset by inst location.
We currently use a heuristic that our scan for an available PReg
starts at an index into the register list that rotates with the bundle
index. This is a simple way to distribute contention across the whole
register file more evenly and avoid repeating less-likely-to-succeed
reg-map probes to lower-numbered registers for every bundle.

After some experimentation with different options (queue that
dynamically puts registers at end after allocating, various
ways of mixing/hashing indices, etc.), adding the *instruction offset*
(of the start of the first range in the bundle) as well gave the best
results. This is very simple and gives us a likely better-than-random
conflict avoidance because ranges tend to be local, so rotating
through registers as we scan down the list of instructions seems like
a very natural strategy.

On the tests used by our `cargo bench` benchmark, this reduces regfile
probes for the largest (459 instruction) benchmark from 1538 to 829,
i.e., approximately by half, and results in an 11% allocation speedup.
2021-04-13 23:31:34 -07:00
Chris Fallin
8e923b0ad9 Initial public commit of regalloc2. 2021-04-13 17:40:12 -07:00
Chris Fallin
41841996c8 Initial empty commit. 2021-04-13 17:31:06 -07:00