Pairwise virtual register coalescing.

Use a better algorithm for resolving interferences in virtual registers.
This improves code quality by generating far fewer copies in some
complicated functions.

After the initial union-find phase, the check_vreg() function uses a
Budimlic forest to check for interference between the values in the
virtual registers, as before. All the interference-free vregs are done.
The others are passed to synthesize_vreg(), which dissolves the vreg and
then attempts to rebuild one or more vregs from the contained values.
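
As a rough sketch of this flow (not the actual code: only check_vreg()
and synthesize_vreg() are names from this commit; the driver, the stub
types, and union_find_phase() are illustrative assumptions):

  // Hypothetical driver loop for the pass structure described above.
  struct VirtRegs; // stand-in for the real virtual register container
  #[derive(Clone, Copy)]
  struct VirtReg(u32); // stand-in for a virtual register handle

  // Phase 1: optimistically union copy-related values into candidate vregs.
  fn union_find_phase(_vregs: &mut VirtRegs) -> Vec<VirtReg> {
      unimplemented!()
  }

  // Phase 2: Budimlic forest check. Values are visited in dominator-tree
  // pre-order, so each value only needs to be compared against its
  // nearest dominating value in the same vreg.
  fn check_vreg(_vregs: &VirtRegs, _vreg: VirtReg) -> bool {
      unimplemented!()
  }

  // Phase 3: dissolve a conflicting vreg into singleton values and
  // greedily rebuild one or more interference-free vregs from them.
  fn synthesize_vreg(_vregs: &mut VirtRegs, _vreg: VirtReg) {
      unimplemented!()
  }

  fn coalesce(vregs: &mut VirtRegs) {
      for vreg in union_find_phase(vregs) {
          if check_vreg(vregs, vreg) {
              continue; // interference-free: this vreg is done
          }
          synthesize_vreg(vregs, vreg);
      }
  }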

The pairwise interference checks use *virtual copies* to make sure that
any future conflicts can be resolved by inserting a copy instruction.
This technique was not present in the old coalescer, and its absence
caused some correctness issues.
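
The commit does not spell out the data structures, but the idea can be
illustrated roughly like this (all names below are assumptions, not the
actual implementation):

  // A pairwise check that only accepts a merge when any conflict could
  // later be resolved by inserting a real copy instruction.
  #[derive(Clone, Copy)]
  struct Value(u32);
  #[derive(Clone, Copy)]
  struct Inst(u32);

  struct VirtualCopy {
      value: Value,       // value that would be copied
      insert_point: Inst, // where a real copy instruction could go
  }

  fn interferes(_a: Value, _b: Value) -> bool {
      unimplemented!()
  }

  fn copy_insertion_point(_a: Value, _b: Value) -> Option<Inst> {
      unimplemented!()
  }

  fn try_merge(a: Value, b: Value, copies: &mut Vec<VirtualCopy>) -> bool {
      if !interferes(a, b) {
          return true; // no conflict; merge directly
      }
      match copy_insertion_point(a, b) {
          // Record a virtual copy: the merge is accepted only because a
          // real copy can still be inserted here if the conflict bites.
          Some(point) => {
              copies.push(VirtualCopy { value: b, insert_point: point });
              true
          }
          // No legal copy point: the conflict could not be undone later,
          // so reject the merge up front.
          None => false,
      }
  }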

This coalescing algorithm produces much better code, but it is generally
a bit slower than before. Some of the slowdown is recovered because the
following passes run faster when they have less code to process.

Example 1: the Python interpreter, which contains a very large function
with many variables.

Before:
  15.664    0.011  Register allocation
   1.535    1.535  RA liveness analysis
   2.872    1.911  RA coalescing CSSA
   4.436    4.436  RA spilling
   2.610    2.598  RA reloading
   4.200    4.199  RA coloring

After:
   9.795    0.013  Register allocation
   1.372    1.372  RA liveness analysis
   6.231    6.227  RA coalescing CSSA
   0.712    0.712  RA spilling
   0.598    0.598  RA reloading
   0.869    0.869  RA coloring

Coalescing itself is more than twice as slow, but because of the vastly
better code quality, overall register allocation time improves by 37%
(15.664 s -> 9.795 s).

Example 2: the clang compiler.

Before:
  57.148    0.035  Register allocation
   9.630    9.630  RA liveness analysis
   7.210    7.169  RA coalescing CSSA
   9.972    9.972  RA spilling
  11.602   11.572  RA reloading
  18.698   18.672  RA coloring

After:
  64.792    0.042  Register allocation
   8.630    8.630  RA liveness analysis
  22.937   22.928  RA coalescing CSSA
   8.684    8.684  RA spilling
   9.559    9.551  RA reloading
  14.939   14.936  RA coloring

Here coalescing is about 3x slower, but overall register allocation time
regresses by only 13% (57.148 s -> 64.792 s).

Most examples are less extreme than these two: they simply get better
code at about the same compile time.
commit 416b21c18d
parent b124eaf77d
Author: Jakob Stoklund Olesen
Date:   2018-01-22 12:27:04 -08:00

4 changed files with 664 additions and 192 deletions


@@ -12,10 +12,11 @@
 //! memory-to-memory copies when a spilled value is passed as an EBB argument.
 
 use dbg::DisplayList;
+use dominator_tree::DominatorTreePreorder;
 use entity::{EntityList, ListPool};
 use entity::{PrimaryMap, EntityMap, Keys};
 use entity::EntityRef;
-use ir::Value;
+use ir::{Value, Function};
 use packed_option::PackedOption;
 use ref_slice::ref_slice;
 use std::cmp::Ordering;
@@ -112,22 +113,55 @@ impl VirtRegs {
         }
     }
 
-    /// Sort the values in `vreg` according to `compare`.
-    ///
-    /// If the ordering defined by `compare` is not total, value numbers are used as a last resort
-    /// tie-breaker. This makes it possible to use an unstable sorting algorithm which can be
-    /// faster because it doesn't allocate memory.
+    /// Sort the values in `vreg` according to the dominator tree pre-order.
     ///
     /// Returns the slice of sorted values which `values(vreg)` will also return from now on.
-    pub fn sort_values<F>(&mut self, vreg: VirtReg, mut compare: F) -> &[Value]
-    where
-        F: FnMut(Value, Value) -> Ordering,
-    {
+    pub fn sort_values(
+        &mut self,
+        vreg: VirtReg,
+        func: &Function,
+        preorder: &DominatorTreePreorder,
+    ) -> &[Value] {
         let s = self.vregs[vreg].as_mut_slice(&mut self.pool);
-        s.sort_unstable_by(|&a, &b| compare(a, b).then(a.cmp(&b)));
+        s.sort_unstable_by(|&a, &b| preorder.pre_cmp_def(a, b, func));
         s
     }
 
+    /// Insert a single value into a sorted virtual register.
+    ///
+    /// It is assumed that the virtual register containing `big` is already sorted by
+    /// `sort_values()`, and that `single` does not already belong to a virtual register.
+    ///
+    /// If `big` is not part of a virtual register, one will be created.
+    pub fn insert_single(
+        &mut self,
+        big: Value,
+        single: Value,
+        func: &Function,
+        preorder: &DominatorTreePreorder,
+    ) -> VirtReg {
+        assert_eq!(self.get(single), None, "Expected singleton {}", single);
+
+        // Make sure `big` has a vreg.
+        let vreg = self.get(big).unwrap_or_else(|| {
+            let vr = self.alloc();
+            self.vregs[vr].push(big, &mut self.pool);
+            self.value_vregs[big] = vr.into();
+            vr
+        });
+
+        // Determine the insertion position for `single`.
+        let index = match self.values(vreg).binary_search_by(
+            |&v| preorder.pre_cmp_def(v, single, func),
+        ) {
+            Ok(_) => panic!("{} already in {}", single, vreg),
+            Err(i) => i,
+        };
+        self.vregs[vreg].insert(index, single, &mut self.pool);
+        self.value_vregs[single] = vreg.into();
+        vreg
+    }
+
     /// Remove a virtual register.
     ///
     /// The values in `vreg` become singletons, and the virtual register number may be reused in