regalloc2

T0b1/regalloc2

Commit Graph

Author	SHA1	Message	Date

Chris Fallin

33ac6cb41d

Heuristic improvement: reg-scan offset by inst location.

We currently use a heuristic in which the scan for an available PReg
starts at an index into the register list that rotates with the bundle
index. This is a simple way to distribute contention more evenly across
the whole register file and to avoid repeating less-likely-to-succeed
reg-map probes of lower-numbered registers for every bundle.
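
As an illustration of the rotation described above, the following is a
minimal sketch, not code from regalloc2. The `PReg` struct and the
`scan_order` function are hypothetical stand-ins, and it assumes the
rotation is a plain modulo of the bundle index by the register count.

```rust
/// Hypothetical stand-in for a physical register; not regalloc2's `PReg`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct PReg(usize);

/// Yields every register in `regs` exactly once, starting at index
/// `bundle_index % regs.len()` and wrapping around to the front.
fn scan_order(regs: &[PReg], bundle_index: usize) -> impl Iterator<Item = PReg> + '_ {
    let n = regs.len();
    // Guard against an empty register list so we never take `% 0`.
    let start = if n == 0 { 0 } else { bundle_index % n };
    (0..n).map(move |i| regs[(start + i) % n])
}

fn main() {
    let regs: Vec<PReg> = (0..4).map(PReg).collect();
    // Consecutive bundles begin probing at different registers.
    for bundle_index in 0..3 {
        let order: Vec<usize> = scan_order(&regs, bundle_index).map(|r| r.0).collect();
        println!("bundle {bundle_index}: probe order {order:?}");
    }
}
```

Every register is still probed in the worst case; only the starting
point moves, so neighboring bundles do not all contend for the same
low-numbered registers first.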

After some experimentation with different options (a queue that
dynamically moves registers to the end after allocating, various
ways of mixing/hashing indices, etc.), also adding the *instruction
offset* (of the start of the first range in the bundle) gave the best
results. This is very simple and likely gives us better-than-random
conflict avoidance: ranges tend to be local, so rotating through
registers as we scan down the list of instructions is a very natural
strategy.
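
A rough sketch of how the starting offset might be computed and used
is shown here. It is not the actual regalloc2 code: `scan_start`,
`find_free`, and the `is_free` predicate are hypothetical names, and
the sketch assumes the two indices are simply summed and reduced
modulo the register count.

```rust
/// Hypothetical helper: index at which to begin scanning a register list
/// of length `num_regs`. Parameter names are illustrative only.
fn scan_start(bundle_index: usize, first_range_start_inst: usize, num_regs: usize) -> usize {
    assert!(num_regs > 0, "register list must not be empty");
    bundle_index.wrapping_add(first_range_start_inst) % num_regs
}

/// Probes registers in rotated order. Returns the first free register and
/// the number of probes made, or `None` if no register is free.
/// `is_free` stands in for the real conflict check.
fn find_free(
    num_regs: usize,
    start: usize,
    is_free: impl Fn(usize) -> bool,
) -> Option<(usize, usize)> {
    (0..num_regs)
        .map(|i| (start + i) % num_regs)
        .enumerate()
        .find(|&(_, reg)| is_free(reg))
        .map(|(i, reg)| (reg, i + 1))
}

fn main() {
    let num_regs = 8;
    // Pretend registers 0..=4 are already occupied around this bundle.
    let is_free = |reg: usize| reg >= 5;
    let start = scan_start(3, 10, num_regs); // (3 + 10) % 8 = 5
    println!("rotated scan:  {:?}", find_free(num_regs, start, is_free));
    println!("scan from r0:  {:?}", find_free(num_regs, 0, is_free));
}
```

In this contrived setup the rotated scan succeeds on its first probe,
while a scan from register 0 needs six. Real savings depend on how
live ranges are distributed.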

On the tests used by our `cargo bench` benchmark, this reduces regfile
probes for the largest (459-instruction) benchmark from 1538 to 829,
i.e., by roughly 46% (nearly half), and results in an 11% allocation
speedup.

2021-04-13 23:31:34 -07:00

Chris Fallin

8e923b0ad9

Initial public commit of regalloc2.

2021-04-13 17:40:12 -07:00

2 Commits