Commit Graph

49 Commits

Author SHA1 Message Date
Chris Fallin
b7fd53efc5 Fix checker: after moving edge-moves to prior to last branch of block (for simpler semantics for library user), we can no longer check blockparams; but this is fine because they do not exist in post-regalloc code. 2021-05-09 19:47:37 -07:00
Chris Fallin
4f26b1c78f Properly handle prog-moves with fixed srcs or dests 2021-05-09 13:35:38 -07:00
Chris Fallin
5c5ea4cb9b bugfix 2021-05-09 04:11:30 -07:00
Chris Fallin
34421fcc6b fix to prog-move handling: happens in middle of inst; and insert uses to make later move-insertion happy with this 2021-05-09 03:51:10 -07:00
Chris Fallin
8d7530d3fa Edge moves always before jumps, never after; semantics are too subtle otherwise (client needs to handle specially) 2021-05-09 02:20:38 -07:00
Chris Fallin
c380b0d979 assert fix: RegClass doesn't need to match for spillslots (can be reused across classes) 2021-05-09 01:51:00 -07:00
Chris Fallin
9fdc69edde fuzzbug fix in range-summary iter 2021-05-09 01:48:16 -07:00
Chris Fallin
509c5dc2fd Remove sanity-check logic in range summary construction -- zero-length
ranges make this somewhat fickle to verify, and fuzzing will catch any
issues.
2021-05-09 01:15:34 -07:00
Chris Fallin
095a883814 Fix crit-edge detection logic in CFGInfo analysis 2021-05-09 01:06:59 -07:00
Chris Fallin
b9e89885c4 Error checking: properly signal a crit-edge requirement failure (used for regalloc.rs fuzzer) 2021-05-08 21:48:58 -07:00
Chris Fallin
f1fc9a8f7e Fix related to move handling 2021-05-08 19:04:16 -07:00
Chris Fallin
00c64f680a Handle moves by joining LRs at inst boundary, not middle of move inst 2021-05-08 17:45:24 -07:00
Chris Fallin
3d0d760c70 Bugfix: process parallel moves separately for Int and Float classes 2021-05-08 16:16:30 -07:00
Chris Fallin
179ef0e534 Bugfix: Mod with dead def spans both Before and After positions 2021-05-08 15:47:38 -07:00
Chris Fallin
ed339ab4d8 Some minor opts: inlining, and smallvec reuse 2021-05-07 20:54:27 -07:00
Chris Fallin
e41b0101a8 Struct-of-array transform: pull LiveRangeHot out of LiveRange with just range and next-in-bundle link 2021-05-07 20:41:33 -07:00
Chris Fallin
4185eab441 More efficient live-range creation re: uses 2021-05-07 20:12:40 -07:00
Chris Fallin
a6e3128821 Support mod (modify) operands, for better efficiency with regalloc.rs/Cranelift shim. 2021-05-07 19:48:34 -07:00
Chris Fallin
d2cc4f1ac2 More efficient queue_bundles (saves 18% on clang.wasm) 2021-05-07 19:20:28 -07:00
Chris Fallin
040c3c838c Some structure packing: Use now fits in three u32s. 2021-05-07 19:05:20 -07:00
Chris Fallin
bfe1c632c9 Some preallocation and removal of one u32 from LiveRange struct 2021-05-07 18:52:46 -07:00
Chris Fallin
a453501ebb sort_unstable (quicksort) everywhere 2021-05-07 18:17:13 -07:00
Chris Fallin
df59b5ede4 Inline all the things (ProgPoint edition) 2021-05-07 17:55:04 -07:00
Chris Fallin
4f6346768e Pinned-VReg mechanism. 2021-05-07 17:45:51 -07:00
Chris Fallin
3ddcf05fea Optimizations: (i) range-summary array; (ii) early exit from btree probe loop (one conflict bundle is enough, empirically) 2021-05-07 17:03:44 -07:00
Chris Fallin
0f3454b4d7 Inlining on btree commitment map comparators for a 10% win 2021-05-07 01:51:40 -07:00
Chris Fallin
3713d6131e Replace approximate liveness with true iterative liveness; turns out it is better to improve accuracy so that later stages of the allocator have less wasted work/interference 2021-05-07 01:22:12 -07:00
Chris Fallin
42582e0c6f Some stats for loop effects on liveins: 487k loop set-unions (441 loops) in one func in bz2 -- fix TBD 2021-05-07 00:19:41 -07:00
Chris Fallin
2ba518517d Fuzzbugfix: actually do need eager liveness computation; must uphold invariant that all earlier-in-postorder blocks have full livein sets. 2021-05-06 23:29:59 -07:00
Chris Fallin
2ff02b50a3 Some perf opts in liveness computation and set impl:
- Cache the most recent u64 chunk in the set to avoid some hashmap
  lookups;
- Defer the live-set union'ing over the loop body until query time
  (remember the set that would have been union'd in instead), and lazily
  propagate the liveness bit at that query time, union-find style;
- Do n-1 rather than n union operations for n successors (first is a
  clone instead);
- Don't union in liveness sets from blocks we haven't visited yet (the
  loop-body/backedge handling handles these).
2021-05-06 22:46:16 -07:00
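
The "n-1 rather than n union operations" point above can be illustrated with a small sketch. This is not the allocator's code: `union_liveins` is a hypothetical helper, and `BTreeSet<u32>` stands in for the crate's own bit-set type. The first successor's set is cloned as the starting value and only the remaining sets are unioned in.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper (not from the commit): combine the live-in sets of a
/// block's successors. The first set is cloned and the remaining n-1 sets are
/// unioned into the clone, instead of unioning all n sets into an empty set.
fn union_liveins(succ_liveins: &[BTreeSet<u32>]) -> BTreeSet<u32> {
    let mut iter = succ_liveins.iter();
    // Clone the first successor's set; with no successors the result is empty.
    let mut out = match iter.next() {
        Some(first) => first.clone(),
        None => return BTreeSet::new(),
    };
    // Union in the remaining n-1 sets.
    for set in iter {
        out.extend(set.iter().copied());
    }
    out
}
```

With a bit-set representation the clone is a plain memory copy, so skipping one union per block can matter when the sets are large.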
Chris Fallin
a148dccac3 Parameterize adaptive-map size in BitVec. 2021-05-06 22:14:46 -07:00
Chris Fallin
02b6516acd Some memory-size/bitpacking optimizations 2021-05-06 20:47:17 -07:00
Chris Fallin
07a5a88972 BitVec perf: use adaptive hybrid chunked small-array + FxHashMap. 2021-05-06 20:03:44 -07:00
Chris Fallin
e2beb471c4 Handle moves specially with move-insertion logic rather than ordinary operand/inst handling 2021-05-06 18:49:23 -07:00
Chris Fallin
747c56c2c3 Some micro-optimizations in BitVec. 2021-05-06 16:19:38 -07:00
Chris Fallin
1a7a0c5e3d Some performance tweaks -- try to reduce register probe count with some more hints. Also fix spill-weight caching. 2021-05-06 16:09:39 -07:00
Chris Fallin
80cdd0c5ac Properly handle multiple same-fixed-reg constraints to the same vreg in one inst. 2021-05-06 01:01:27 -07:00
Chris Fallin
ab828b6c86 MachineEnv fields are public 2021-05-05 23:14:04 -07:00
Chris Fallin
48fbc235ea BitVec::get() takes immutable self 2021-05-05 23:08:19 -07:00
Chris Fallin
15ed2d6522 Allow multiple defs per vreg (i.e., accept non-SSA code).
This generalizes the allocator to accept multiple defs by making defs
just another type of "use" (uses are now perhaps more properly called
"mentions", but for now we abuse the terminology slightly).

It turns out that this actually was not terribly hard, because we don't
rely on the properties that a strict SSA requirement otherwise might
allow us to: e.g., defs always at exactly the start of a vreg's ranges.
Because we already accepted arbitrary block order and irreducible CFGs,
and approximated live-ranges with the single-pass algorithm, we are
robust in our "stitching" (move insertion) and so all we really care
about is computing some superset of the actual live-ranges and then a
non-interfering coloring of (split pieces of) those ranges. Multiple
defs don't change that, as long as we compute the ranges properly.

We still have blockparams in this design, so the client *can* provide
SSA directly, and everything will work as before. But a client that
produces non-SSA need not use them at all; it can just happily reassign
to vregs and everything will Just Work.

This is part of the effort to port Cranelift over to regalloc2; I have
decided that it may be easier to build a compatibility shim that matches
regalloc.rs's interface than to continue boiling the ocean and
converting all of the lowering sequences to SSA. It then becomes a
separable piece of work (and simply further performance improvements and
simplifications) to remove the need for this shim.
2021-05-05 22:49:45 -07:00
Chris Fallin
9e7021cfd0 Derive Ord/hash on OperandOrAllocation. 2021-05-03 19:18:19 -07:00
Chris Fallin
49c54b6144 Misc usability and functionality enhancements:
- Support preferred and non-preferred subsets of a register class. This
  allows allocating, e.g., caller-saved registers before callee-saved
  registers.
- Allow branch blockparam args to start at a certain offset in branch
  operands; this allows branches to have other operands too (e.g.,
  conditional-branch inputs).
- Allow `OperandOrAllocation` to be constructed from an `Allocation` and
  `OperandKind` as well (i.e., an allocation with a use/def bit).
2021-04-30 21:14:09 -07:00
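
A minimal sketch of the preferred/non-preferred idea from the first bullet above, assuming registers are plain `u8` indices and availability is given by a caller-supplied predicate. `pick_reg` and its parameters are hypothetical names for illustration, not the crate's API.

```rust
/// Hypothetical helper (not the crate's API): return the first free register,
/// scanning the preferred subset before the non-preferred subset. For example,
/// caller-saved registers could be listed as preferred and callee-saved
/// registers as non-preferred.
fn pick_reg(
    preferred: &[u8],
    non_preferred: &[u8],
    is_free: impl Fn(u8) -> bool,
) -> Option<u8> {
    preferred
        .iter()
        .chain(non_preferred.iter())
        .copied()
        .find(|&reg| is_free(reg))
}
```

A non-preferred register is only returned once every preferred register has been rejected by the predicate.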
Chris Fallin
414f3f828d Factored out test program and fuzzing features; core crate now only depends on smallvec and log. 2021-04-18 14:19:32 -07:00
Chris Fallin
34ab744f4f Add GitHub CI config. 2021-04-18 13:18:18 -07:00
Chris Fallin
940c1b719d Changes from review comments. 2021-04-18 13:12:33 -07:00
Chris Fallin
a08b0121a0 Add support for reftypes/stackmaps and Stack constraints, and misc API changes.
The main enhancement in this commit is support for reference types and
stackmaps. This requires tracking whether each VReg is a "reference" or
"pointer". At certain instructions designated as "safepoints", the
regalloc will (i) ensure that all references are in spillslots rather
than in registers, and (ii) provide a list of exactly which spillslots
have live references at that program point. This can be used by, e.g., a
GC to trace and possibly modify pointers. The stackmap of spillslots is
precise: it includes all live references, and *only* live references.

This commit also brings in some API tweaks as part of the in-progress
Cranelift glue. In particular, it makes Allocations and Operands
mutually disjoint by using the same bitfield for the type-tag in both
and choosing non-overlapping tags. This will allow instructions to carry
an Operand for each register slot and then overwrite these in place with
Allocations. The `OperandOrAllocation` type does the necessary magic to
make this look like an enum, but staying in 32 bits.
2021-04-17 21:29:13 -07:00
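
The disjoint-tag packing described above can be sketched as follows. The bit layout here (one tag bit at the top of the word, 31 payload bits) and the names `Slot` and `Decoded` are assumptions made for illustration; the commit does not state the actual layout, and real operands and allocations carry more structure than a bare integer payload.

```rust
/// What a 32-bit slot currently holds (hypothetical names, for illustration).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Decoded {
    Operand(u32),
    Allocation(u32),
}

// Assumed layout: the top bit is the type tag, the low 31 bits are payload.
const TAG_SHIFT: u32 = 31;
const PAYLOAD_MASK: u32 = (1 << TAG_SHIFT) - 1;
const TAG_OPERAND: u32 = 0;
const TAG_ALLOCATION: u32 = 1;

/// A single 32-bit word that is either an operand or an allocation.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Slot(u32);

impl Slot {
    fn operand(payload: u32) -> Slot {
        assert!(payload <= PAYLOAD_MASK);
        Slot((TAG_OPERAND << TAG_SHIFT) | payload)
    }

    fn allocation(payload: u32) -> Slot {
        assert!(payload <= PAYLOAD_MASK);
        Slot((TAG_ALLOCATION << TAG_SHIFT) | payload)
    }

    /// Present the packed word as an enum by inspecting the tag bit.
    fn decode(self) -> Decoded {
        let payload = self.0 & PAYLOAD_MASK;
        match self.0 >> TAG_SHIFT {
            TAG_OPERAND => Decoded::Operand(payload),
            _ => Decoded::Allocation(payload),
        }
    }
}
```

Because the tags do not overlap, a slot that starts as an operand can be overwritten in place with an allocation and still decode unambiguously, without growing past 32 bits.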
Chris Fallin
33ac6cb41d Heuristic improvement: reg-scan offset by inst location.
We currently use a heuristic that our scan for an available PReg
starts at an index into the register list that rotates with the bundle
index. This is a simple way to distribute contention across the whole
register file more evenly and avoid repeating less-likely-to-succeed
reg-map probes to lower-numbered registers for every bundle.

After some experimentation with different options (queue that
dynamically puts registers at end after allocating, various
ways of mixing/hashing indices, etc.), adding the *instruction offset*
(of the start of the first range in the bundle) as well gave the best
results. This is very simple and gives us a likely better-than-random
conflict avoidance because ranges tend to be local, so rotating
through registers as we scan down the list of instructions seems like
a very natural strategy.

On the tests used by our `cargo bench` benchmark, this reduces regfile
probes for the largest (459 instruction) benchmark from 1538 to 829,
i.e., approximately by half, and results in an 11% allocation speedup.
2021-04-13 23:31:34 -07:00
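
The rotating scan start described above might look roughly like the sketch below. It assumes the bundle index and the instruction offset are simply added and reduced modulo the register count. The commit does not spell out the exact combination, so `probe_order` and this arithmetic are illustrative only.

```rust
/// Hypothetical helper (not the crate's code): the order in which to probe
/// `num_regs` registers for a bundle. The starting index rotates with both the
/// bundle index and the instruction offset of the bundle's first range, so
/// successive bundles begin their scans at different registers.
fn probe_order(num_regs: usize, bundle_index: usize, inst_offset: usize) -> Vec<usize> {
    if num_regs == 0 {
        return Vec::new();
    }
    // Assumed combination: sum of the two indices, wrapped to the register count.
    let start = bundle_index.wrapping_add(inst_offset) % num_regs;
    (0..num_regs).map(|i| (start + i) % num_regs).collect()
}
```

Every register is still visited exactly once per scan; only the starting point changes, which is what spreads contention across the register file.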
Chris Fallin
8e923b0ad9 Initial public commit of regalloc2. 2021-04-13 17:40:12 -07:00
Chris Fallin
41841996c8 Initial empty commit. 2021-04-13 17:31:06 -07:00