Use a better algorithm for resolving interferences in virtual registers.
This improves code quality by generating far fewer copies in some
complicated functions.
After the initial union-find phase, the check_vreg() function uses a
Budimlic forest to check for interference between the values in the
virtual registers, as before. Interference-free vregs need no further work.
Others are passed to synthesize_vreg() which dissolves the vreg and then
attempts to rebuild one or more vregs from the contained values.
The pairwise interference checks use *virtual copies* to make sure that
any future conflicts can be resolved by inserting a copy instruction.
The old coalescer lacked this technique, which caused some
correctness issues.
This coalescing algorithm generates much better code, but it is generally a
bit slower than before. Some of the slowdown is recovered because the
following passes run faster when they have less code to process.
Example 1, the Python interpreter, which contains a very large function
with a lot of variables. In the timing tables, the two numeric columns
are the total time and the self time of each pass.
Before:
15.664 0.011 Register allocation
1.535 1.535 RA liveness analysis
2.872 1.911 RA coalescing CSSA
4.436 4.436 RA spilling
2.610 2.598 RA reloading
4.200 4.199 RA coloring
After:
9.795 0.013 Register allocation
1.372 1.372 RA liveness analysis
6.231 6.227 RA coalescing CSSA
0.712 0.712 RA spilling
0.598 0.598 RA reloading
0.869 0.869 RA coloring
Coalescing is more than twice as slow, but because of the vastly better
code quality, overall register allocation time is improved by 37%.
Example 2, the clang compiler.
Before:
57.148 0.035 Register allocation
9.630 9.630 RA liveness analysis
7.210 7.169 RA coalescing CSSA
9.972 9.972 RA spilling
11.602 11.572 RA reloading
18.698 18.672 RA coloring
After:
64.792 0.042 Register allocation
8.630 8.630 RA liveness analysis
22.937 22.928 RA coalescing CSSA
8.684 8.684 RA spilling
9.559 9.551 RA reloading
14.939 14.936 RA coloring
Here coalescing is 3x slower, but overall regalloc time only regresses
by 13%.
Most examples are less extreme than these two. They just get better code
at about the same compile time.
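
The figures quoted for the two examples follow from the total times in
the tables above. A minimal sketch of the arithmetic, where the helper
names percent_change() and slowdown() are made up for illustration:

```rust
/// Relative change from `before` to `after`, in percent.
/// A negative result means the pass got faster, a positive one slower.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

/// Slowdown factor of a single pass (values above 1.0 mean slower).
fn slowdown(before: f64, after: f64) -> f64 {
    after / before
}
```

For example, percent_change(15.664, 9.795) is about -37.5 and
percent_change(57.148, 64.792) is about +13.4, while
slowdown(7.210, 22.937) is about 3.2.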
The initial phase of computing virtual registers can now be implemented
with a textbook union-find algorithm using a disjoint set forest
complete with rank and path compression optimizations.
The disjoint set forest is converted to virtual register value lists in
a single linear scan implemented in finish_union_find().
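
As an illustration, a textbook disjoint set forest with union by rank and
path compression could look roughly like the sketch below. This is not
the code from this change: the DisjointSets type, the use of plain usize
indices instead of SSA values, and the into_lists() helper (a stand-in
for what finish_union_find() is described as doing) are all assumptions.
Unlike a real coalescer, the sketch also emits singleton sets.

```rust
use std::cmp::Ordering;

/// Disjoint set forest over dense indices `0..n` (hypothetical type).
struct DisjointSets {
    parent: Vec<usize>,
    rank: Vec<u8>,
}

impl DisjointSets {
    fn new(n: usize) -> Self {
        Self { parent: (0..n).collect(), rank: vec![0; n] }
    }

    /// Find the leader of the set containing `x`, compressing the path.
    fn find(&mut self, x: usize) -> usize {
        // First pass: walk up to the root.
        let mut root = x;
        while self.parent[root] != root {
            root = self.parent[root];
        }
        // Second pass: point every node on the path directly at the root.
        let mut cur = x;
        while self.parent[cur] != root {
            let next = self.parent[cur];
            self.parent[cur] = root;
            cur = next;
        }
        root
    }

    /// Merge the sets containing `a` and `b`, using union by rank.
    fn union(&mut self, a: usize, b: usize) {
        let (a, b) = (self.find(a), self.find(b));
        if a == b {
            return;
        }
        match self.rank[a].cmp(&self.rank[b]) {
            Ordering::Less => self.parent[a] = b,
            Ordering::Greater => self.parent[b] = a,
            Ordering::Equal => {
                self.parent[b] = a;
                self.rank[a] += 1;
            }
        }
    }

    /// Convert the forest into one member list per set in a single pass
    /// over the indices. Members appear in increasing index order.
    fn into_lists(mut self) -> Vec<Vec<usize>> {
        let n = self.parent.len();
        // slot[leader] is the position of that leader's list in `lists`.
        let mut slot: Vec<Option<usize>> = vec![None; n];
        let mut lists: Vec<Vec<usize>> = Vec::new();
        for v in 0..n {
            let leader = self.find(v);
            let idx = match slot[leader] {
                Some(i) => i,
                None => {
                    lists.push(Vec::new());
                    slot[leader] = Some(lists.len() - 1);
                    lists.len() - 1
                }
            };
            lists[idx].push(v);
        }
        lists
    }
}
```

After union(0, 1), union(2, 3) and union(1, 3) on six elements,
into_lists() yields one list holding 0, 1, 2, 3 and two singleton lists.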
This union-find algorithm will soon be used by the coalescer.
Instead of requiring the values in a virtual register to be sorted
according to the domtree.rpo_cmp() order, just require any topological
ordering w.r.t. dominance.
The coalescer will stop using the RPO shortly.
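
To illustrate the relaxed requirement, here is a small sketch of what
"topologically ordered w.r.t. dominance" means. It is deliberately
simplified: nodes are plain block indices with one value each, the
dominator tree is given as an idom array, and the function names are
invented for this example. The check is quadratic and is only meant to
state the property, not to be an efficient verifier.

```rust
/// `idom[b]` is the immediate dominator of block `b`, or `None` for the
/// entry block. Returns true if `a` strictly dominates `b`.
fn strictly_dominates(idom: &[Option<usize>], a: usize, b: usize) -> bool {
    let mut cur = idom[b];
    while let Some(d) = cur {
        if d == a {
            return true;
        }
        cur = idom[d];
    }
    false
}

/// True if no element of `order` is strictly dominated by an element
/// that appears later in the list.
fn is_topo_ordered(idom: &[Option<usize>], order: &[usize]) -> bool {
    for (i, &early) in order.iter().enumerate() {
        for &late in &order[i + 1..] {
            if strictly_dominates(idom, late, early) {
                return false;
            }
        }
    }
    true
}
```

With a dominator tree where block 0 is the entry, blocks 1 and 2 are its
children, and block 3 is a child of 1, both [0, 1, 2, 3] and [0, 2, 1, 3]
are accepted. Any order that lists a dominator before the nodes it
dominates is acceptable, not just one specific RPO.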
The new PrimaryMap replaces the primary EntityMap and the PrimaryEntityData
marker trait, which was causing some confusion. We now have a clear
division between the two types of maps:
- PrimaryMap is used to assign entity numbers to the primary data for an
entity.
- EntityMap is a secondary mapping adding additional info.
The split also means that the secondary EntityMap can now behave as if
all keys have a default value. This means that we can get rid of the
annoying ensure() and get_or_default() methods that were used everywhere
instead of indexing. Just use normal indexing now; non-existent keys
will return the default value.
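
The division of labor can be sketched with two simplified stand-in
types. These are not the real PrimaryMap and EntityMap: the names
SimplePrimaryMap and SimpleSecondaryMap, the plain usize keys, and the
method set are assumptions made for this example.

```rust
use std::ops::{Index, IndexMut};

/// Stand-in for a primary map: it owns the entity data and hands out
/// entity numbers.
struct SimplePrimaryMap<V> {
    elems: Vec<V>,
}

impl<V> SimplePrimaryMap<V> {
    fn new() -> Self {
        Self { elems: Vec::new() }
    }

    /// Store `v` and return the newly allocated entity number.
    fn push(&mut self, v: V) -> usize {
        self.elems.push(v);
        self.elems.len() - 1
    }
}

impl<V> Index<usize> for SimplePrimaryMap<V> {
    type Output = V;
    fn index(&self, k: usize) -> &V {
        &self.elems[k]
    }
}

/// Stand-in for a secondary map: every key behaves as if it already
/// holds `V::default()`.
struct SimpleSecondaryMap<V: Clone + Default> {
    elems: Vec<V>,
    default: V,
}

impl<V: Clone + Default> SimpleSecondaryMap<V> {
    fn new() -> Self {
        Self { elems: Vec::new(), default: V::default() }
    }
}

impl<V: Clone + Default> Index<usize> for SimpleSecondaryMap<V> {
    type Output = V;
    /// Reading a key that was never written returns the default value.
    fn index(&self, k: usize) -> &V {
        self.elems.get(k).unwrap_or(&self.default)
    }
}

impl<V: Clone + Default> IndexMut<usize> for SimpleSecondaryMap<V> {
    /// Writing to a key grows the storage on demand.
    fn index_mut(&mut self, k: usize) -> &mut V {
        if k >= self.elems.len() {
            self.elems.resize(k + 1, self.default.clone());
        }
        &mut self.elems[k]
    }
}
```

With these types, a use counter can be bumped with plain indexing, as in
uses[inst] += 1, and reading uses[other] for a key that was never
written yields 0 with no ensure() call beforehand.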
Add a VirtRegs collection which tracks virtual registers.
A virtual register is a set of related SSA values whose live ranges
don't interfere. It is advantageous to use the same register or spill
slot for all the values in a virtual register. It reduces copies for EBB
arguments.
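
A rough sketch of the kind of bookkeeping such a collection needs is
shown below. It is not the VirtRegs type added here: SimpleVirtRegs,
the integer Value and VirtReg aliases, and the method names are
assumptions for illustration, and the sketch trusts the caller to pass
only values whose live ranges don't interfere.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real entity references.
type Value = u32;
type VirtReg = usize;

/// Tracks which values belong to which virtual register.
#[derive(Default)]
struct SimpleVirtRegs {
    /// Member values of each virtual register.
    vregs: Vec<Vec<Value>>,
    /// Reverse mapping from a value to its virtual register, if any.
    value_vreg: HashMap<Value, VirtReg>,
}

impl SimpleVirtRegs {
    /// Group `values` into a new virtual register and return it.
    /// Panics if a value already belongs to a virtual register.
    fn unify(&mut self, values: &[Value]) -> VirtReg {
        let vreg = self.vregs.len();
        for &v in values {
            assert!(!self.value_vreg.contains_key(&v));
            self.value_vreg.insert(v, vreg);
        }
        self.vregs.push(values.to_vec());
        vreg
    }

    /// The virtual register containing `v`, if there is one.
    fn get(&self, v: Value) -> Option<VirtReg> {
        self.value_vreg.get(&v).copied()
    }

    /// All values in `vreg`.
    fn values(&self, vreg: VirtReg) -> &[Value] {
        &self.vregs[vreg]
    }
}
```

A register allocator could then look up get(v) for an EBB argument and
try to give every value in that virtual register the same register or
spill slot.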