Files
wasmtime/cranelift/codegen/src/loop_analysis.rs
Chris Fallin f980defe17 egraph support: rewrite to work in terms of CLIF data structures. (#5382)
* egraph support: rewrite to work in terms of CLIF data structures.

This work rewrites the "egraph"-based optimization framework in
Cranelift to operate on aegraphs (acyclic egraphs) represented in the
CLIF itself rather than as a separate data structure to which and from
which we translate the CLIF.

The basic idea is to add a new kind of value, a "union", that is like an
alias but refers to two other values rather than one.  This allows us to
represent an eclass of enodes (values) as a tree. The union node allows
for a value to have *multiple representations*: either constituent value
could be used, and (in well-formed CLIF produced by correct
optimization rules) they must be equivalent.

Like the old egraph infrastructure, we take advantage of acyclicity and
eager rule application to do optimization in a single pass. Like before,
we integrate GVN (during the optimization pass) and LICM (during
elaboration).

Unlike the old egraph infrastructure, everything stays in the
DataFlowGraph. "Pure" enodes are represented as instructions that have
values attached, but that are not placed into the function layout. When
entering "egraph" form, we remove them from the layout while optimizing.
When leaving "egraph" form, during elaboration, we can place an
instruction back into the layout the first time we elaborate the enode;
if we elaborate it more than once, we clone the instruction.

The implementation performs two passes overall:

- One, a forward pass in RPO (to see defs before uses), that (i) removes
  "pure" instructions from the layout and (ii) optimizes as it goes. As
  before, we eagerly optimize, so we form the entire union of optimized
  forms of a value before we see any uses of that value. This lets us
  rewrite uses to use the most "up-to-date" form of the value and
  canonicalize and optimize that form.

  The eager rewriting and acyclic representation make each other work
  (we could not eagerly rewrite if there were cycles; and acyclicity
  does not miss optimization opportunities only because the first time
  we introduce a value, we immediately produce its "best" form). This
  design choice is also what allows us to avoid the "parent pointers"
  and fixpoint loop of traditional egraphs.

  This forward optimization pass keeps a scoped hashmap to "intern"
  nodes (thus performing GVN), and also interleaves on a per-instruction
  level with alias analysis. The interleaving with alias analysis allows
  alias analysis to see the most optimized form of each address (so it
  can see equivalences), and allows the next value to see any
  equivalences (reuses of loads or stored values) that alias analysis
  uncovers.

- Two, a forward pass in domtree preorder, that "elaborates" pure enodes
  back into the layout, possibly in multiple places if needed. This
  tracks the loop nest and hoists nodes as needed, performing LICM as it
  goes. Note that by doing this in forward order, we avoid the
  "fixpoint" that traditional LICM needs: we hoist a def before its
  uses, so when we place a node, we place it in the right place the
  first time rather than moving later.

This PR replaces the old (a)egraph implementation. It removes both the
cranelift-egraph crate and the logic in cranelift-codegen that uses it.

On `spidermonkey.wasm` running a simple recursive Fibonacci
microbenchmark, this work shows 5.5% compile-time reduction and 7.7%
runtime improvement (speedup).

Most of this implementation was done in (very productive) pair
programming sessions with Jamey Sharp, thus:

Co-authored-by: Jamey Sharp <jsharp@fastly.com>

* Review feedback.

* Review feedback.

* Review feedback.

* Bugfix: cprop rule: `(x + k1) - k2` becomes `x - (k2 - k1)`, not `x - (k1 - k2)`.

Co-authored-by: Jamey Sharp <jsharp@fastly.com>
2022-12-06 14:58:57 -08:00

444 lines
17 KiB
Rust

//! A loop analysis represented as mappings of loops to their header Block
//! and parent in the loop tree.
use crate::dominator_tree::DominatorTree;
use crate::entity::entity_impl;
use crate::entity::SecondaryMap;
use crate::entity::{Keys, PrimaryMap};
use crate::flowgraph::{BlockPredecessor, ControlFlowGraph};
use crate::ir::{Block, Function, Layout};
use crate::packed_option::PackedOption;
use crate::timing;
use alloc::vec::Vec;
use smallvec::{smallvec, SmallVec};
/// A opaque reference to a code loop.
#[derive(Copy, Clone, PartialEq, Eq, Hash)]
pub struct Loop(u32);
entity_impl!(Loop, "loop");
/// Loop tree information for a single function.
///
/// Loops are referenced by the Loop object, and for each loop you can access its header block,
/// its eventual parent in the loop tree and all the block belonging to the loop.
pub struct LoopAnalysis {
loops: PrimaryMap<Loop, LoopData>,
block_loop_map: SecondaryMap<Block, PackedOption<Loop>>,
valid: bool,
}
struct LoopData {
header: Block,
parent: PackedOption<Loop>,
level: LoopLevel,
}
/// A level in a loop nest.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct LoopLevel(u8);
impl LoopLevel {
const INVALID: u8 = u8::MAX;
/// Get the root level (no loop).
pub fn root() -> Self {
Self(0)
}
/// Get the loop level.
pub fn level(self) -> usize {
self.0 as usize
}
/// Invalid loop level.
pub fn invalid() -> Self {
Self(Self::INVALID)
}
/// One loop level deeper.
pub fn inc(self) -> Self {
if self.0 == (Self::INVALID - 1) {
self
} else {
Self(self.0 + 1)
}
}
/// A clamped loop level from a larger-width (usize) depth.
pub fn clamped(level: usize) -> Self {
Self(
u8::try_from(std::cmp::min(level, (Self::INVALID as usize) - 1))
.expect("Clamped value must always convert"),
)
}
}
impl std::default::Default for LoopLevel {
fn default() -> Self {
LoopLevel::invalid()
}
}
impl LoopData {
/// Creates a `LoopData` object with the loop header and its eventual parent in the loop tree.
pub fn new(header: Block, parent: Option<Loop>) -> Self {
Self {
header,
parent: parent.into(),
level: LoopLevel::invalid(),
}
}
}
/// Methods for querying the loop analysis.
impl LoopAnalysis {
/// Allocate a new blank loop analysis struct. Use `compute` to compute the loop analysis for
/// a function.
pub fn new() -> Self {
Self {
valid: false,
loops: PrimaryMap::new(),
block_loop_map: SecondaryMap::new(),
}
}
/// Returns all the loops contained in a function.
pub fn loops(&self) -> Keys<Loop> {
self.loops.keys()
}
/// Returns the header block of a particular loop.
///
/// The characteristic property of a loop header block is that it dominates some of its
/// predecessors.
pub fn loop_header(&self, lp: Loop) -> Block {
self.loops[lp].header
}
/// Return the eventual parent of a loop in the loop tree.
pub fn loop_parent(&self, lp: Loop) -> Option<Loop> {
self.loops[lp].parent.expand()
}
/// Return the innermost loop for a given block.
pub fn innermost_loop(&self, block: Block) -> Option<Loop> {
self.block_loop_map[block].expand()
}
/// Determine if a Block is a loop header. If so, return the loop.
pub fn is_loop_header(&self, block: Block) -> Option<Loop> {
self.innermost_loop(block)
.filter(|&lp| self.loop_header(lp) == block)
}
/// Determine if a Block belongs to a loop by running a finger along the loop tree.
///
/// Returns `true` if `block` is in loop `lp`.
pub fn is_in_loop(&self, block: Block, lp: Loop) -> bool {
let block_loop = self.block_loop_map[block];
match block_loop.expand() {
None => false,
Some(block_loop) => self.is_child_loop(block_loop, lp),
}
}
/// Determines if a loop is contained in another loop.
///
/// `is_child_loop(child,parent)` returns `true` if and only if `child` is a child loop of
/// `parent` (or `child == parent`).
pub fn is_child_loop(&self, child: Loop, parent: Loop) -> bool {
let mut finger = Some(child);
while let Some(finger_loop) = finger {
if finger_loop == parent {
return true;
}
finger = self.loop_parent(finger_loop);
}
false
}
/// Returns the loop-nest level of a given block.
pub fn loop_level(&self, block: Block) -> LoopLevel {
self.innermost_loop(block)
.map_or(LoopLevel(0), |lp| self.loops[lp].level)
}
}
impl LoopAnalysis {
/// Detects the loops in a function. Needs the control flow graph and the dominator tree.
pub fn compute(&mut self, func: &Function, cfg: &ControlFlowGraph, domtree: &DominatorTree) {
let _tt = timing::loop_analysis();
self.loops.clear();
self.block_loop_map.clear();
self.block_loop_map.resize(func.dfg.num_blocks());
self.find_loop_headers(cfg, domtree, &func.layout);
self.discover_loop_blocks(cfg, domtree, &func.layout);
self.assign_loop_levels();
self.valid = true;
}
/// Check if the loop analysis is in a valid state.
///
/// Note that this doesn't perform any kind of validity checks. It simply checks if the
/// `compute()` method has been called since the last `clear()`. It does not check that the
/// loop analysis is consistent with the CFG.
pub fn is_valid(&self) -> bool {
self.valid
}
/// Clear all the data structures contained in the loop analysis. This will leave the
/// analysis in a similar state to a context returned by `new()` except that allocated
/// memory be retained.
pub fn clear(&mut self) {
self.loops.clear();
self.block_loop_map.clear();
self.valid = false;
}
// Traverses the CFG in reverse postorder and create a loop object for every block having a
// back edge.
fn find_loop_headers(
&mut self,
cfg: &ControlFlowGraph,
domtree: &DominatorTree,
layout: &Layout,
) {
// We traverse the CFG in reverse postorder
for &block in domtree.cfg_postorder().iter().rev() {
for BlockPredecessor {
inst: pred_inst, ..
} in cfg.pred_iter(block)
{
// If the block dominates one of its predecessors it is a back edge
if domtree.dominates(block, pred_inst, layout) {
// This block is a loop header, so we create its associated loop
let lp = self.loops.push(LoopData::new(block, None));
self.block_loop_map[block] = lp.into();
break;
// We break because we only need one back edge to identify a loop header.
}
}
}
}
// Intended to be called after `find_loop_headers`. For each detected loop header,
// discovers all the block belonging to the loop and its inner loops. After a call to this
// function, the loop tree is fully constructed.
fn discover_loop_blocks(
&mut self,
cfg: &ControlFlowGraph,
domtree: &DominatorTree,
layout: &Layout,
) {
let mut stack: Vec<Block> = Vec::new();
// We handle each loop header in reverse order, corresponding to a pseudo postorder
// traversal of the graph.
for lp in self.loops().rev() {
for BlockPredecessor {
block: pred,
inst: pred_inst,
} in cfg.pred_iter(self.loops[lp].header)
{
// We follow the back edges
if domtree.dominates(self.loops[lp].header, pred_inst, layout) {
stack.push(pred);
}
}
while let Some(node) = stack.pop() {
let continue_dfs: Option<Block>;
match self.block_loop_map[node].expand() {
None => {
// The node hasn't been visited yet, we tag it as part of the loop
self.block_loop_map[node] = PackedOption::from(lp);
continue_dfs = Some(node);
}
Some(node_loop) => {
// We copy the node_loop into a mutable reference passed along the while
let mut node_loop = node_loop;
// The node is part of a loop, which can be lp or an inner loop
let mut node_loop_parent_option = self.loops[node_loop].parent;
while let Some(node_loop_parent) = node_loop_parent_option.expand() {
if node_loop_parent == lp {
// We have encountered lp so we stop (already visited)
break;
} else {
//
node_loop = node_loop_parent;
// We lookup the parent loop
node_loop_parent_option = self.loops[node_loop].parent;
}
}
// Now node_loop_parent is either:
// - None and node_loop is an new inner loop of lp
// - Some(...) and the initial node_loop was a known inner loop of lp
match node_loop_parent_option.expand() {
Some(_) => continue_dfs = None,
None => {
if node_loop != lp {
self.loops[node_loop].parent = lp.into();
continue_dfs = Some(self.loops[node_loop].header)
} else {
// If lp is a one-block loop then we make sure we stop
continue_dfs = None
}
}
}
}
}
// Now we have handled the popped node and need to continue the DFS by adding the
// predecessors of that node
if let Some(continue_dfs) = continue_dfs {
for BlockPredecessor { block: pred, .. } in cfg.pred_iter(continue_dfs) {
stack.push(pred)
}
}
}
}
}
fn assign_loop_levels(&mut self) {
let mut stack: SmallVec<[Loop; 8]> = smallvec![];
for lp in self.loops.keys() {
if self.loops[lp].level == LoopLevel::invalid() {
stack.push(lp);
while let Some(&lp) = stack.last() {
if let Some(parent) = self.loops[lp].parent.into() {
if self.loops[parent].level != LoopLevel::invalid() {
self.loops[lp].level = self.loops[parent].level.inc();
stack.pop();
} else {
stack.push(parent);
}
} else {
self.loops[lp].level = LoopLevel::root().inc();
stack.pop();
}
}
}
}
}
}
#[cfg(test)]
mod tests {
use crate::cursor::{Cursor, FuncCursor};
use crate::dominator_tree::DominatorTree;
use crate::flowgraph::ControlFlowGraph;
use crate::ir::{types, Function, InstBuilder};
use crate::loop_analysis::{Loop, LoopAnalysis};
use alloc::vec::Vec;
#[test]
fn nested_loops_detection() {
let mut func = Function::new();
let block0 = func.dfg.make_block();
let block1 = func.dfg.make_block();
let block2 = func.dfg.make_block();
let block3 = func.dfg.make_block();
let cond = func.dfg.append_block_param(block0, types::I32);
{
let mut cur = FuncCursor::new(&mut func);
cur.insert_block(block0);
cur.ins().jump(block1, &[]);
cur.insert_block(block1);
cur.ins().jump(block2, &[]);
cur.insert_block(block2);
cur.ins().brnz(cond, block1, &[]);
cur.ins().jump(block3, &[]);
cur.insert_block(block3);
cur.ins().brnz(cond, block0, &[]);
}
let mut loop_analysis = LoopAnalysis::new();
let mut cfg = ControlFlowGraph::new();
let mut domtree = DominatorTree::new();
cfg.compute(&func);
domtree.compute(&func, &cfg);
loop_analysis.compute(&func, &cfg, &domtree);
let loops = loop_analysis.loops().collect::<Vec<Loop>>();
assert_eq!(loops.len(), 2);
assert_eq!(loop_analysis.loop_header(loops[0]), block0);
assert_eq!(loop_analysis.loop_header(loops[1]), block1);
assert_eq!(loop_analysis.loop_parent(loops[1]), Some(loops[0]));
assert_eq!(loop_analysis.loop_parent(loops[0]), None);
assert_eq!(loop_analysis.is_in_loop(block0, loops[0]), true);
assert_eq!(loop_analysis.is_in_loop(block0, loops[1]), false);
assert_eq!(loop_analysis.is_in_loop(block1, loops[1]), true);
assert_eq!(loop_analysis.is_in_loop(block1, loops[0]), true);
assert_eq!(loop_analysis.is_in_loop(block2, loops[1]), true);
assert_eq!(loop_analysis.is_in_loop(block2, loops[0]), true);
assert_eq!(loop_analysis.is_in_loop(block3, loops[0]), true);
assert_eq!(loop_analysis.is_in_loop(block0, loops[1]), false);
assert_eq!(loop_analysis.loop_level(block0).level(), 1);
assert_eq!(loop_analysis.loop_level(block1).level(), 2);
assert_eq!(loop_analysis.loop_level(block2).level(), 2);
assert_eq!(loop_analysis.loop_level(block3).level(), 1);
}
#[test]
fn complex_loop_detection() {
let mut func = Function::new();
let block0 = func.dfg.make_block();
let block1 = func.dfg.make_block();
let block2 = func.dfg.make_block();
let block3 = func.dfg.make_block();
let block4 = func.dfg.make_block();
let block5 = func.dfg.make_block();
let cond = func.dfg.append_block_param(block0, types::I32);
{
let mut cur = FuncCursor::new(&mut func);
cur.insert_block(block0);
cur.ins().brnz(cond, block1, &[]);
cur.ins().jump(block3, &[]);
cur.insert_block(block1);
cur.ins().jump(block2, &[]);
cur.insert_block(block2);
cur.ins().brnz(cond, block1, &[]);
cur.ins().jump(block5, &[]);
cur.insert_block(block3);
cur.ins().jump(block4, &[]);
cur.insert_block(block4);
cur.ins().brnz(cond, block3, &[]);
cur.ins().jump(block5, &[]);
cur.insert_block(block5);
cur.ins().brnz(cond, block0, &[]);
}
let mut loop_analysis = LoopAnalysis::new();
let mut cfg = ControlFlowGraph::new();
let mut domtree = DominatorTree::new();
cfg.compute(&func);
domtree.compute(&func, &cfg);
loop_analysis.compute(&func, &cfg, &domtree);
let loops = loop_analysis.loops().collect::<Vec<Loop>>();
assert_eq!(loops.len(), 3);
assert_eq!(loop_analysis.loop_header(loops[0]), block0);
assert_eq!(loop_analysis.loop_header(loops[1]), block1);
assert_eq!(loop_analysis.loop_header(loops[2]), block3);
assert_eq!(loop_analysis.loop_parent(loops[1]), Some(loops[0]));
assert_eq!(loop_analysis.loop_parent(loops[2]), Some(loops[0]));
assert_eq!(loop_analysis.loop_parent(loops[0]), None);
assert_eq!(loop_analysis.is_in_loop(block0, loops[0]), true);
assert_eq!(loop_analysis.is_in_loop(block1, loops[1]), true);
assert_eq!(loop_analysis.is_in_loop(block2, loops[1]), true);
assert_eq!(loop_analysis.is_in_loop(block3, loops[2]), true);
assert_eq!(loop_analysis.is_in_loop(block4, loops[2]), true);
assert_eq!(loop_analysis.is_in_loop(block5, loops[0]), true);
assert_eq!(loop_analysis.loop_level(block0).level(), 1);
assert_eq!(loop_analysis.loop_level(block1).level(), 2);
assert_eq!(loop_analysis.loop_level(block2).level(), 2);
assert_eq!(loop_analysis.loop_level(block3).level(), 2);
assert_eq!(loop_analysis.loop_level(block4).level(), 2);
assert_eq!(loop_analysis.loop_level(block5).level(), 1);
}
}