regalloc2

Author	SHA1	Message	Date
Chris Fallin	c380b0d979	assert fix: RegClass doesn't need to match for spillslots (can be reused across classes)	2021-05-09 01:51:00 -07:00
Chris Fallin	9fdc69edde	fuzzbug fix in range-summary iter	2021-05-09 01:48:16 -07:00
Chris Fallin	509c5dc2fd	Remove sanity-check logic in range summary construction -- zero-length ranges make this somewhat fickle to verify, and fuzzing will catch any issues.	2021-05-09 01:15:34 -07:00
Chris Fallin	095a883814	Fix crit-edge detection logic in CFGInfo analysis	2021-05-09 01:06:59 -07:00
Chris Fallin	b9e89885c4	Error checking: properly signal a crit-edge requirement failure (used for regalloc.rs fuzzer)	2021-05-08 21:48:58 -07:00
Chris Fallin	f1fc9a8f7e	Fix related to move handling	2021-05-08 19:04:16 -07:00
Chris Fallin	00c64f680a	Handle moves by joining LRs at inst boundary, not middle of move inst	2021-05-08 17:45:24 -07:00
Chris Fallin	3d0d760c70	Bugfix: process parallel moves separately for Int and Float classes	2021-05-08 16:16:30 -07:00
Chris Fallin	179ef0e534	Bugfix: Mod with dead def spans both Before and After positions	2021-05-08 15:47:38 -07:00
Chris Fallin	ed339ab4d8	Some minor opts: inlining, and smallvec reuse	2021-05-07 20:54:27 -07:00
Chris Fallin	e41b0101a8	Struct-of-array transform: pull LiveRangeHot out of LiveRange with just range and next-in-bundle link	2021-05-07 20:41:33 -07:00
Chris Fallin	4185eab441	More efficient live-range creation re: uses	2021-05-07 20:12:40 -07:00
Chris Fallin	a6e3128821	Support `mod` (modify) operands, for better efficiency with regalloc.rs/Cranelift shim.	2021-05-07 19:48:34 -07:00
Chris Fallin	d2cc4f1ac2	More efficient queue_bundles (saves 18% on clang.wasm)	2021-05-07 19:20:28 -07:00
Chris Fallin	040c3c838c	Some structure packing: Use now fits in three u32s.	2021-05-07 19:05:20 -07:00
Chris Fallin	bfe1c632c9	Some preallocation and removal of one u32 from LiveRange struct	2021-05-07 18:52:46 -07:00
Chris Fallin	a453501ebb	sort_unstable (quicksort) everywhere	2021-05-07 18:17:13 -07:00
Chris Fallin	df59b5ede4	Inline all the things (ProgPoint edition)	2021-05-07 17:55:04 -07:00
Chris Fallin	4f6346768e	Pinned-VReg mechanism.	2021-05-07 17:45:51 -07:00
Chris Fallin	3ddcf05fea	Optimizations: (i) range-summary array; (ii) early exit from btree probe loop (one conflict bundle is enough, empirically)	2021-05-07 17:03:44 -07:00
Chris Fallin	0f3454b4d7	Inlining on btree commitment map comparators for a 10% win	2021-05-07 01:51:40 -07:00
Chris Fallin	3713d6131e	Replace approximate liveness with true iterative liveness; turns out it is better to improve accuracy so that later stages of the allocator have less wasted work/interference	2021-05-07 01:22:12 -07:00
Chris Fallin	42582e0c6f	Some stats for loop effects on liveins: 487k loop set-unions (441 loops) in one func in bz2 -- fix TBD	2021-05-07 00:19:41 -07:00
Chris Fallin	2ba518517d	Fuzzbugfix: actually do need eager liveness computation; must uphold invariant that all earlier-in-postorder blocks have full livein sets.	2021-05-06 23:29:59 -07:00
Chris Fallin	2ff02b50a3	Some perf opts in liveness computation and set impl: - Cache the most recent u64 chunk in the set to avoid some hashmap lookups; - Defer the live-set union'ing over the loop body until query time (remember the set that would have been union'd in instead), and lazily propagate the liveness bit at that query time, union-find style; - Do n-1 rather than n union operations for n successors (first is a clone instead); - Don't union in liveness sets from blocks we haven't visited yet (the loop-body/backedge handling handles these).	2021-05-06 22:46:16 -07:00
Chris Fallin	a148dccac3	Parameterize adaptive-map size in BitVec.	2021-05-06 22:14:46 -07:00
Chris Fallin	02b6516acd	Some memory-size/bitpacking optimizations	2021-05-06 20:47:17 -07:00
Chris Fallin	07a5a88972	BitVec perf: use adaptive hybrid chunked small-array + FxHashMap.	2021-05-06 20:03:44 -07:00
Chris Fallin	e2beb471c4	Handle moves specially with move-insertion logic rather than ordinary operand/inst handling	2021-05-06 18:49:23 -07:00
Chris Fallin	747c56c2c3	Some micro-optimizations in BitVec.	2021-05-06 16:19:38 -07:00
Chris Fallin	1a7a0c5e3d	Some performance tweaks -- try to reduce register probe count with some more hints. Also fix spill-weight caching.	2021-05-06 16:09:39 -07:00
Chris Fallin	80cdd0c5ac	Properly handle multiple same-fixed-reg constraints to the same vreg in one inst.	2021-05-06 01:01:27 -07:00
Chris Fallin	ab828b6c86	MachineEnv fields are public	2021-05-05 23:14:04 -07:00
Chris Fallin	48fbc235ea	BitVec::get() takes immutable self	2021-05-05 23:08:19 -07:00
Chris Fallin	15ed2d6522	Allow multiple defs per vreg (i.e., accept non-SSA code). This generalizes the allocator to accept multiple defs by making defs just another type of "use" (uses are now perhaps more properly called "mentions", but for now we abuse the terminology slightly). It turns out that this actually was not terribly hard, because we don't rely on the properties that a strict SSA requirement otherwise might allow us to: e.g., defs always at exactly the start of a vreg's ranges. Because we already accepted arbitrary block order and irreducible CFGs, and approximated live-ranges with the single-pass algorithm, we are robust in our "stitching" (move insertion) and so all we really care about is computing some superset of the actual live-ranges and then a non-interfering coloring of (split pieces of) those ranges. Multiple defs don't change that, as long as we compute the ranges properly. We still have blockparams in this design, so the client can provide SSA directly, and everything will work as before. But a client that produces non-SSA need not use them at all; it can just happily reassign to vregs and everything will Just Work. This is part of the effort to port Cranelift over to regalloc2; I have decided that it may be easier to build a compatibility shim that matches regalloc.rs's interface than to continue boiling the ocean and converting all of the lowering sequences to SSA. It then becomes a separable piece of work (and simply further performance improvements and simplifications) to remove the need for this shim.	2021-05-05 22:49:45 -07:00
Chris Fallin	9e7021cfd0	Derive Ord/hash on OperandOrAllocation.	2021-05-03 19:18:19 -07:00
Chris Fallin	49c54b6144	Misc usability and functionality enhancements: - Support preferred and non-preferred subsets of a register class. This allows allocating, e.g., caller-saved registers before callee-saved registers. - Allow branch blockparam args to start at a certain offset in branch operands; this allows branches to have other operands too (e.g., conditional-branch inputs). - Allow `OperandOrAllocation` to be constructed from an `Allocation` and `OperandKind` as well (i.e., an allocation with a use/def bit).	2021-04-30 21:14:09 -07:00
Chris Fallin	414f3f828d	Factored out test program and fuzzing features; core crate now only depends on smallvec and log.	2021-04-18 14:19:32 -07:00
Chris Fallin	34ab744f4f	Add GitHub CI config.	2021-04-18 13:18:18 -07:00
Chris Fallin	940c1b719d	Changes from review comments.	2021-04-18 13:12:33 -07:00
Chris Fallin	a08b0121a0	Add support for reftypes/stackmaps and Stack constraints, and misc API changes. The main enhancement in this commit is support for reference types and stackmaps. This requires tracking whether each VReg is a "reference" or "pointer". At certain instructions designated as "safepoints", the regalloc will (i) ensure that all references are in spillslots rather than in registers, and (ii) provide a list of exactly which spillslots have live references at that program point. This can be used by, e.g., a GC to trace and possibly modify pointers. The stackmap of spillslots is precise: it includes all live references, and only live references. This commit also brings in some API tweaks as part of the in-progress Cranelift glue. In particular, it makes Allocations and Operands mutually disjoint by using the same bitfield for the type-tag in both and choosing non-overlapping tags. This will allow instructions to carry an Operand for each register slot and then overwrite these in place with Allocations. The `OperandOrAllocation` type does the necessary magic to make this look like an enum, but staying in 32 bits.	2021-04-17 21:29:13 -07:00
Chris Fallin	33ac6cb41d	Heuristic improvement: reg-scan offset by inst location. We currently use a heuristic that our scan for an available PReg starts at an index into the register list that rotates with the bundle index. This is a simple way to distribute contention across the whole register file more evenly and avoid repeating less-likely-to-succeed reg-map probes to lower-numbered registers for every bundle. After some experimentation with different options (queue that dynamically puts registers at end after allocating, various ways of mixing/hashing indices, etc.), adding the instruction offset (of the start of the first range in the bundle) as well gave the best results. This is very simple and gives us a likely better-than-random conflict avoidance because ranges tend to be local, so rotating through registers as we scan down the list of instructions seems like a very natural strategy. On the tests used by our `cargo bench` benchmark, this reduces regfile probes for the largest (459 instruction) benchmark from 1538 to 829, i.e., approximately by half, and results in an 11% allocation speedup.	2021-04-13 23:31:34 -07:00
Chris Fallin	8e923b0ad9	Initial public commit of regalloc2.	2021-04-13 17:40:12 -07:00
Chris Fallin	41841996c8	Initial empty commit.	2021-04-13 17:31:06 -07:00
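
The register-scan heuristic described in commit 33ac6cb41d (start the scan for a free PReg at an index that rotates with the bundle index plus the instruction offset of the bundle's first range) can be illustrated roughly as follows. This is a minimal sketch, not the crate's actual code: the function name `probe_order` and its parameters are hypothetical, and the register list is modeled as plain indices.

```rust
/// Hypothetical helper (not part of regalloc2's API): returns the order in
/// which indices into a register list of length `num_regs` would be probed.
/// The starting index rotates with both the bundle index and the instruction
/// offset of the start of the bundle's first range, then wraps around.
fn probe_order(num_regs: usize, bundle_index: usize, first_inst_offset: usize) -> Vec<usize> {
    if num_regs == 0 {
        // Nothing to probe in an empty register list.
        return Vec::new();
    }
    // Assumption: the two values are simply added and reduced modulo the
    // register count; the commit message does not state the exact mixing.
    let start = bundle_index.wrapping_add(first_inst_offset) % num_regs;
    (0..num_regs).map(|i| (start + i) % num_regs).collect()
}
```

With four registers, bundle index 1 and instruction offset 5, the scan would start at index 2 and visit 2, 3, 0, 1. Bundles that begin at nearby instructions then start their scans at different registers, which is the contention-spreading effect the commit message describes.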

294 Commits