Pairwise virtual register coalescing.

Use a better algorithm for resolving interferences in virtual registers.
This improves code quality by generating far fewer copies in some
complicated functions.

After the initial union-find phase, the check_vreg() function still uses
a Budimlic forest to check for interference between the values in each
virtual register. Interference-free vregs need no further work. The
others are passed to synthesize_vreg(), which dissolves the vreg and then
attempts to rebuild one or more vregs from the values it contained.
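
The three phases above can be sketched as follows. This is a minimal
illustration under loud assumptions, not the code in this commit: values
are plain indices, a live range is a half-open interval in one linear
block, interference is interval overlap checked pairwise (the real
check_vreg() uses a Budimlic forest), and the rebuild step is a simple
greedy first-fit. LiveRange, UnionFind, and coalesce() are hypothetical
names; only check_vreg and synthesize_vreg are borrowed from the text.

```rust
/// Hypothetical live range: half-open interval [def, end) in one linear block.
#[derive(Clone, Copy, Debug)]
struct LiveRange {
    def: u32,
    end: u32,
}

impl LiveRange {
    /// Two ranges interfere when their intervals overlap.
    fn interferes(self, other: LiveRange) -> bool {
        self.def < other.end && other.def < self.end
    }
}

/// Minimal union-find over value indices, with path compression.
struct UnionFind {
    parent: Vec<usize>,
}

impl UnionFind {
    fn new(n: usize) -> Self {
        UnionFind { parent: (0..n).collect() }
    }

    fn find(&mut self, x: usize) -> usize {
        let p = self.parent[x];
        if p == x {
            return x;
        }
        let root = self.find(p);
        self.parent[x] = root;
        root
    }

    fn union(&mut self, a: usize, b: usize) {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra != rb {
            self.parent[rb] = ra;
        }
    }
}

/// Stand-in for the interference check: true if no two members interfere.
fn check_vreg(members: &[usize], ranges: &[LiveRange]) -> bool {
    members.iter().enumerate().all(|(i, &a)| {
        members[i + 1..]
            .iter()
            .all(|&b| !ranges[a].interferes(ranges[b]))
    })
}

/// Stand-in for the rebuild: dissolve the vreg, then place each value in
/// the first rebuilt vreg it does not interfere with (greedy first-fit).
fn synthesize_vreg(members: &[usize], ranges: &[LiveRange]) -> Vec<Vec<usize>> {
    let mut rebuilt: Vec<Vec<usize>> = Vec::new();
    for &v in members {
        let slot = rebuilt
            .iter()
            .position(|g| g.iter().all(|&m| !ranges[v].interferes(ranges[m])));
        match slot {
            Some(i) => rebuilt[i].push(v),
            None => rebuilt.push(vec![v]),
        }
    }
    rebuilt
}

/// Union values related by `copies`, then check each candidate vreg and
/// rebuild only the ones that contain interference.
fn coalesce(ranges: &[LiveRange], copies: &[(usize, usize)]) -> Vec<Vec<usize>> {
    let mut uf = UnionFind::new(ranges.len());
    for &(a, b) in copies {
        uf.union(a, b);
    }
    // Group values by their union-find root.
    let mut groups: Vec<Vec<usize>> = vec![Vec::new(); ranges.len()];
    for v in 0..ranges.len() {
        let root = uf.find(v);
        groups[root].push(v);
    }
    let mut vregs = Vec::new();
    for g in groups.into_iter().filter(|g| !g.is_empty()) {
        if check_vreg(&g, ranges) {
            vregs.push(g);
        } else {
            vregs.extend(synthesize_vreg(&g, ranges));
        }
    }
    vregs
}
```

With four values where only the first and third overlap, and copies
relating the first three, the union-find phase proposes one vreg of three
values, the check rejects it, and the rebuild splits off the interfering
value. A value left out of a vreg this way is where a copy instruction
would be needed.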

The pairwise interference checks use *virtual copies* to make sure that
any future conflicts can be resolved by inserting a copy instruction.
The old coalescer lacked this technique, which caused some correctness
issues.

This coalescing algorithm produces much better code, but it is generally
a bit slower than before. Some of the slowdown is recovered because the
following passes run faster when they have less code to process.

Example 1: the Python interpreter, which contains a very large function
with a lot of variables.

Before:
  15.664    0.011  Register allocation
   1.535    1.535  RA liveness analysis
   2.872    1.911  RA coalescing CSSA
   4.436    4.436  RA spilling
   2.610    2.598  RA reloading
   4.200    4.199  RA coloring

After:
   9.795    0.013  Register allocation
   1.372    1.372  RA liveness analysis
   6.231    6.227  RA coalescing CSSA
   0.712    0.712  RA spilling
   0.598    0.598  RA reloading
   0.869    0.869  RA coloring

Coalescing is more than twice as slow, but because of the vastly better
code quality, overall register allocation time is improved by 37%.

Example 2: the clang compiler.

Before:
  57.148    0.035  Register allocation
   9.630    9.630  RA liveness analysis
   7.210    7.169  RA coalescing CSSA
   9.972    9.972  RA spilling
  11.602   11.572  RA reloading
  18.698   18.672  RA coloring

After:
  64.792    0.042  Register allocation
   8.630    8.630  RA liveness analysis
  22.937   22.928  RA coalescing CSSA
   8.684    8.684  RA spilling
   9.559    9.551  RA reloading
  14.939   14.936  RA coloring

Here coalescing is 3x slower, but overall regalloc time only regresses
by 13%.
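
The percentages and slowdown factors quoted for both examples can be
re-derived from the first column of the tables. A small sketch of that
arithmetic (the helper names percent_change and slowdown are
illustrative, not part of this commit):

```rust
/// Relative change from `before` to `after`, in percent.
/// Negative means the pass got faster.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

/// How many times slower `after` is compared to `before`.
fn slowdown(before: f64, after: f64) -> f64 {
    after / before
}

/// (register allocation change in %, coalescing slowdown factor)
/// for the two examples, using the first-column times from the tables.
fn summary() -> [(f64, f64); 2] {
    [
        // Example 1: Python interpreter.
        (percent_change(15.664, 9.795), slowdown(2.872, 6.231)),
        // Example 2: clang.
        (percent_change(57.148, 64.792), slowdown(7.210, 22.937)),
    ]
}
```

Rounded, this gives about -37% and 2.2x for example 1, and about +13% and
3.2x for example 2, matching the figures in the text.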

Most examples are less extreme than these two. They just get better code
at about the same compile time.
Jakob Stoklund Olesen
2018-01-22 12:27:04 -08:00
parent b124eaf77d
commit 416b21c18d
4 changed files with 664 additions and 192 deletions

@@ -51,6 +51,16 @@ pub enum ExpandedProgramPoint {
    Ebb(Ebb),
}

impl ExpandedProgramPoint {
    /// Get the instruction we know is inside.
    pub fn unwrap_inst(self) -> Inst {
        match self {
            ExpandedProgramPoint::Inst(x) => x,
            ExpandedProgramPoint::Ebb(x) => panic!("expected inst: {}", x),
        }
    }
}

impl From<Inst> for ExpandedProgramPoint {
    fn from(inst: Inst) -> ExpandedProgramPoint {
        ExpandedProgramPoint::Inst(inst)