Pairwise virtual register coalescing.
Use a better algorithm for resolving interferences in virtual registers. This improves code quality by generating far fewer copies in some complicated functions.

After the initial union-find phase, the check_vreg() function uses a Budimlic forest to check for interference between the values in the virtual registers, as before. All interference-free vregs are done. The others are passed to synthesize_vreg(), which dissolves the vreg and then attempts to rebuild one or more vregs from the contained values. The pairwise interference checks use *virtual copies* to make sure that any future conflict can be resolved by inserting a copy instruction. This technique was not present in the old coalescer, which caused some correctness issues.

This coalescing algorithm produces much better code, but it is generally a bit slower than before. Some of the slowdown is made up by the following passes being faster because they have less code to process.

Example 1: the Python interpreter, which contains a very large function with a lot of variables.

Before:
  15.664  0.011  Register allocation
   1.535  1.535  RA liveness analysis
   2.872  1.911  RA coalescing CSSA
   4.436  4.436  RA spilling
   2.610  2.598  RA reloading
   4.200  4.199  RA coloring

After:
   9.795  0.013  Register allocation
   1.372  1.372  RA liveness analysis
   6.231  6.227  RA coalescing CSSA
   0.712  0.712  RA spilling
   0.598  0.598  RA reloading
   0.869  0.869  RA coloring

Coalescing is more than twice as slow, but because of the vastly better code quality, overall register allocation time improves by 37%.

Example 2: the clang compiler.

Before:
  57.148  0.035  Register allocation
   9.630  9.630  RA liveness analysis
   7.210  7.169  RA coalescing CSSA
   9.972  9.972  RA spilling
  11.602 11.572  RA reloading
  18.698 18.672  RA coloring

After:
  64.792  0.042  Register allocation
   8.630  8.630  RA liveness analysis
  22.937 22.928  RA coalescing CSSA
   8.684  8.684  RA spilling
   9.559  9.551  RA reloading
  14.939 14.936  RA coloring

Here coalescing is 3x slower, but overall regalloc time only regresses by 13%. Most examples are less extreme than these two; they simply get better code at about the same compile time.
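The dominator-forest interference check that check_vreg() performs can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the actual Cranelift code: the toy `Value` struct, its pre-order interval fields, and the `interferes` oracle are hypothetical stand-ins for the real dominator tree and liveness data. The key property is that once the values are sorted in dominator-tree pre-order, each value only needs to be tested against its nearest dominating ancestor, kept on a stack, so one linear scan covers the whole vreg.

```rust
// Sketch of a Budimlic-style dominator-forest interference check.
// Assumes each value carries the pre-order interval of its defining
// block: `a` dominates `b` iff b's pre-order number lies in a's interval.

#[derive(Clone, Copy, PartialEq, Debug)]
struct Value {
    id: u32,
    pre: u32,  // pre-order number of the defining block (toy data)
    last: u32, // highest pre-order number in that block's subtree
}

fn dominates(a: Value, b: Value) -> bool {
    a.pre <= b.pre && b.pre <= a.last
}

/// Check a candidate vreg for interference. `values` must be sorted by
/// dominator-tree pre-order. Returns true if the vreg is interference-free.
fn check_vreg(values: &[Value], interferes: impl Fn(Value, Value) -> bool) -> bool {
    let mut stack: Vec<Value> = Vec::new();
    for &v in values {
        // Pop values whose dominator subtree we have left.
        while let Some(&top) = stack.last() {
            if dominates(top, v) {
                break;
            }
            stack.pop();
        }
        // Only the nearest dominating ancestor can interfere with `v`.
        if let Some(&parent) = stack.last() {
            if interferes(parent, v) {
                return false;
            }
        }
        stack.push(v);
    }
    true
}

fn main() {
    // Three values in a small dominator tree: v0 dominates v1 and v2.
    let v0 = Value { id: 0, pre: 0, last: 3 };
    let v1 = Value { id: 1, pre: 1, last: 1 };
    let v2 = Value { id: 2, pre: 2, last: 3 };
    // Toy liveness oracle: v0 is still live where v1 is defined.
    let interferes = |a: Value, b: Value| a.id == 0 && b.id == 1;
    assert!(!check_vreg(&[v0, v1, v2], &interferes)); // v0/v1 clash
    assert!(check_vreg(&[v0, v2], &interferes)); // interference-free
    println!("ok");
}
```

A vreg that fails this check is what gets handed to synthesize_vreg() for dissolution and rebuilding.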
@@ -12,10 +12,11 @@
 //! memory-to-memory copies when a spilled value is passed as an EBB argument.

 use dbg::DisplayList;
+use dominator_tree::DominatorTreePreorder;
 use entity::{EntityList, ListPool};
 use entity::{PrimaryMap, EntityMap, Keys};
 use entity::EntityRef;
-use ir::Value;
+use ir::{Value, Function};
 use packed_option::PackedOption;
 use ref_slice::ref_slice;
 use std::cmp::Ordering;
@@ -112,22 +113,55 @@ impl VirtRegs {
         }
     }

-    /// Sort the values in `vreg` according to `compare`.
-    ///
-    /// If the ordering defined by `compare` is not total, value numbers are used as a last resort
-    /// tie-breaker. This makes it possible to use an unstable sorting algorithm which can be
-    /// faster because it doesn't allocate memory.
+    /// Sort the values in `vreg` according to the dominator tree pre-order.
     ///
     /// Returns the slice of sorted values which `values(vreg)` will also return from now on.
-    pub fn sort_values<F>(&mut self, vreg: VirtReg, mut compare: F) -> &[Value]
-    where
-        F: FnMut(Value, Value) -> Ordering,
-    {
+    pub fn sort_values(
+        &mut self,
+        vreg: VirtReg,
+        func: &Function,
+        preorder: &DominatorTreePreorder,
+    ) -> &[Value] {
         let s = self.vregs[vreg].as_mut_slice(&mut self.pool);
-        s.sort_unstable_by(|&a, &b| compare(a, b).then(a.cmp(&b)));
+        s.sort_unstable_by(|&a, &b| preorder.pre_cmp_def(a, b, func));
         s
     }

+    /// Insert a single value into a sorted virtual register.
+    ///
+    /// It is assumed that the virtual register containing `big` is already sorted by
+    /// `sort_values()`, and that `single` does not already belong to a virtual register.
+    ///
+    /// If `big` is not part of a virtual register, one will be created.
+    pub fn insert_single(
+        &mut self,
+        big: Value,
+        single: Value,
+        func: &Function,
+        preorder: &DominatorTreePreorder,
+    ) -> VirtReg {
+        assert_eq!(self.get(single), None, "Expected singleton {}", single);
+
+        // Make sure `big` has a vreg.
+        let vreg = self.get(big).unwrap_or_else(|| {
+            let vr = self.alloc();
+            self.vregs[vr].push(big, &mut self.pool);
+            self.value_vregs[big] = vr.into();
+            vr
+        });
+
+        // Determine the insertion position for `single`.
+        let index = match self.values(vreg).binary_search_by(
+            |&v| preorder.pre_cmp_def(v, single, func),
+        ) {
+            Ok(_) => panic!("{} already in {}", single, vreg),
+            Err(i) => i,
+        };
+        self.vregs[vreg].insert(index, single, &mut self.pool);
+        self.value_vregs[single] = vreg.into();
+        vreg
+    }
+
     /// Remove a virtual register.
     ///
     /// The values in `vreg` become singletons, and the virtual register number may be reused in
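The insertion logic above keeps the vreg sorted in dominator-tree pre-order, so a binary search with the same comparator finds where a singleton belongs, and an `Ok` result means the value is already present. A minimal sketch of that pattern, with a hypothetical `pre_of` table standing in for the ordering that `DominatorTreePreorder::pre_cmp_def` provides:

```rust
// Sketch of binary-search insertion into a pre-order-sorted vreg.
// `pre_of` is a toy table mapping value ids to pre-order numbers of
// their definitions; it is a stand-in, not a real Cranelift structure.

fn main() {
    // Pre-order number of each value's definition, indexed by value id.
    let pre_of = [0u32, 5, 2, 7, 3];

    // A vreg holding values 0, 2, 4, 1 -- already sorted by pre-order
    // (their pre-order numbers are 0, 2, 3, 5).
    let mut vreg: Vec<usize> = vec![0, 2, 4, 1];

    // Insert singleton value 3 (pre-order number 7).
    let single = 3usize;
    match vreg.binary_search_by(|&v| pre_of[v].cmp(&pre_of[single])) {
        // An exact match would mean the value is already in the vreg.
        Ok(_) => panic!("value {} already in vreg", single),
        Err(i) => vreg.insert(i, single),
    }
    assert_eq!(vreg, vec![0, 2, 4, 1, 3]);
    println!("{:?}", vreg);
}
```

Keeping the order invariant at every insertion is what lets both `sort_values()` and `insert_single()` share one comparator and keeps the later interference scan linear.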