egraph-based midend: draw the rest of the owl (productionized). (#4953)

* egraph-based midend: draw the rest of the owl.

* Rename `egg` submodule of cranelift-codegen to `egraph`.

* Apply some feedback from @jsharp during code walkthrough.

* Remove recursion from find_best_node by doing a single pass.

Rather than recursively computing the lowest-cost node for a given
eclass and memoizing the answer at each eclass node, we can do a single
forward pass; because every eclass node refers only to earlier nodes,
one pass is sufficient. The result may differ slightly from the earlier
behavior because we can no longer short-circuit a node's cost to zero
once it has been elaborated, but in practice this should not matter.
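
As an illustration only (simplified standalone types, not the real eclass IDs or `Cost` values, and without the per-eclass minimum over alternative nodes), a minimal sketch of why a single forward pass suffices when every node's arguments come earlier in the array:

```rust
/// Illustrative stand-in for an eclass node: its own cost plus the
/// indices of its argument nodes, which always precede it in the array.
struct Node {
    own_cost: u32,
    args: Vec<usize>,
}

/// Single forward pass: by the time we reach node `i`, every argument's
/// best cost is already final, so no recursion or memoization is needed.
fn best_costs(nodes: &[Node]) -> Vec<u32> {
    let mut best = vec![0u32; nodes.len()];
    for (i, node) in nodes.iter().enumerate() {
        let arg_cost: u32 = node.args.iter().map(|&a| best[a]).sum();
        best[i] = node.own_cost.saturating_add(arg_cost);
    }
    best
}

fn main() {
    let nodes = vec![
        Node { own_cost: 0, args: vec![] },     // e.g. a constant
        Node { own_cost: 2, args: vec![0] },    // uses node 0
        Node { own_cost: 3, args: vec![0, 1] }, // uses nodes 0 and 1
    ];
    assert_eq!(best_costs(&nodes), vec![0, 2, 5]);
}
```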

* Make elaboration non-recursive.

Use an explicit stack instead (with `ElabStackEntry` entries,
alongside a result stack).
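
As an illustration only (a toy expression type and made-up entry names, not the real `ElabStackEntry` or egraph types), a minimal sketch of the pattern: a `Start` entry pushes a "resume" entry plus its children onto the work stack, and the resume entry later pops the children's results off a separate result stack.

```rust
/// Illustrative expression tree standing in for eclass nodes.
enum Expr {
    Leaf(i64),
    Add(Box<Expr>, Box<Expr>),
}

enum StackEntry<'a> {
    /// Next action: visit this node and enqueue its children.
    Start(&'a Expr),
    /// Children already enqueued; combine their results when reached.
    PendingAdd,
}

fn eval(root: &Expr) -> i64 {
    let mut work = vec![StackEntry::Start(root)];
    let mut results: Vec<i64> = vec![];
    while let Some(entry) = work.pop() {
        match entry {
            StackEntry::Start(Expr::Leaf(v)) => results.push(*v),
            StackEntry::Start(Expr::Add(a, b)) => {
                // Push the "resume" entry first, then the children, so the
                // children are processed (and their results pushed) before
                // we come back to combine them.
                work.push(StackEntry::PendingAdd);
                work.push(StackEntry::Start(&**b));
                work.push(StackEntry::Start(&**a));
            }
            StackEntry::PendingAdd => {
                let b = results.pop().unwrap();
                let a = results.pop().unwrap();
                results.push(a + b);
            }
        }
    }
    results.pop().unwrap()
}

fn main() {
    let e = Expr::Add(Box::new(Expr::Leaf(2)), Box::new(Expr::Leaf(3)));
    assert_eq!(eval(&e), 5);
}
```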

* Make elaboration traversal of the domtree non-recursive/stack-safe.

* Rework the analysis logic in the Cranelift-side egraph glue into a general analysis framework in cranelift-egraph.

* Apply static recursion limit to rule application.

* Fix aarch64 dynamic-vector support, broken by a rebase.

* Topo-sort cranelift-egraph before cranelift-codegen in publish script, like the comment instructs me to!

* Fix multi-result call testcase.

* Include `cranelift-egraph` in `PUBLISHED_CRATES`.

* Fix atomic_rmw: not really a load.

* Remove now-unnecessary PartialOrd/Ord derivations.

* Address some code-review comments.

* Review feedback.

* Review feedback.

* No overlap in mid-end rules, because we are defining a multi-constructor.

* rustfmt

* Review feedback.

* Review feedback.

* Review feedback.

* Review feedback.

* Remove redundant `mut`.

* Add comment noting what rules can do.

* Review feedback.

* Clarify comment wording.

* Update `has_memory_fence_semantics`.

* Apply @jameysharp's improved loop-level computation.

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fix suggestion commit.

* Fix off-by-one in new loop-nest analysis.

* Review feedback.

* Review feedback.

* Review feedback.

* Use `Default`, not `std::default::Default`, as per @fitzgen

Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>

* Apply @fitzgen's comment elaboration to a doc-comment.

Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>

* Add stat for hitting the rewrite-depth limit.

* Some code motion in split prelude to make the diff a little clearer wrt `main`.

* Take @jameysharp's suggested `try_into()` usage for blockparam indices.

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Take @jameysharp's suggestion to avoid double-match on load op.

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fix suggestion (add import).

* Review feedback.

* Fix stack_load handling.

* Remove redundant can_store case.

* Take @jameysharp's suggested improvement to FuncEGraph::build() logic

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Tweaks to FuncEGraph::build() on top of suggestion.

* Take @jameysharp's suggested clarified condition

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Clean up after suggestion (unused variable).

* Fix loop analysis.

* Add loop-level asserts.

* Revert constant-space loop analysis -- edge cases were incorrect, so let's go with the simple thing for now.

* Take @jameysharp's suggestion re: result_tys

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fix up after suggestion

* Take @jameysharp's suggestion to use fold rather than reduce

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fixup after suggestion

* Take @jameysharp's suggestion to remove elaborate_eclass_use's return value.

* Clarifying comment in terminator insts.

Co-authored-by: Jamey Sharp <jamey@minilop.net>
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
Author: Chris Fallin
Date: 2022-10-11 18:15:53 -07:00
Committed by: GitHub
Parent: e2f1ced0b6
Commit: 2be12a5167
59 changed files with 5125 additions and 1580 deletions


@@ -0,0 +1,69 @@
//! Extended domtree with various traversal support.
use crate::dominator_tree::DominatorTree;
use crate::ir::{Block, Function};
use cranelift_entity::{packed_option::PackedOption, SecondaryMap};
#[derive(Clone, Debug)]
pub(crate) struct DomTreeWithChildren {
nodes: SecondaryMap<Block, DomTreeNode>,
root: Block,
}
#[derive(Clone, Copy, Debug, Default)]
struct DomTreeNode {
children: PackedOption<Block>,
next: PackedOption<Block>,
}
impl DomTreeWithChildren {
pub(crate) fn new(func: &Function, domtree: &DominatorTree) -> DomTreeWithChildren {
let mut nodes: SecondaryMap<Block, DomTreeNode> =
SecondaryMap::with_capacity(func.dfg.num_blocks());
for block in func.layout.blocks() {
let idom_inst = match domtree.idom(block) {
Some(idom_inst) => idom_inst,
None => continue,
};
let idom = func
.layout
.inst_block(idom_inst)
.expect("Dominating instruction should be part of a block");
nodes[block].next = nodes[idom].children;
nodes[idom].children = block.into();
}
let root = func.layout.entry_block().unwrap();
Self { nodes, root }
}
pub(crate) fn root(&self) -> Block {
self.root
}
pub(crate) fn children<'a>(&'a self, block: Block) -> DomTreeChildIter<'a> {
let block = self.nodes[block].children;
DomTreeChildIter {
domtree: self,
block,
}
}
}
pub(crate) struct DomTreeChildIter<'a> {
domtree: &'a DomTreeWithChildren,
block: PackedOption<Block>,
}
impl<'a> Iterator for DomTreeChildIter<'a> {
type Item = Block;
fn next(&mut self) -> Option<Block> {
self.block.expand().map(|block| {
self.block = self.domtree.nodes[block].next;
block
})
}
}


@@ -0,0 +1,612 @@
//! Elaboration phase: lowers EGraph back to sequences of operations
//! in CFG nodes.
use super::domtree::DomTreeWithChildren;
use super::node::{op_cost, Cost, Node, NodeCtx};
use super::Analysis;
use super::Stats;
use crate::dominator_tree::DominatorTree;
use crate::fx::FxHashSet;
use crate::ir::{Block, Function, Inst, Opcode, RelSourceLoc, Type, Value, ValueList};
use crate::loop_analysis::LoopAnalysis;
use crate::scoped_hash_map::ScopedHashMap;
use crate::trace;
use alloc::vec::Vec;
use cranelift_egraph::{EGraph, Id, Language, NodeKey};
use cranelift_entity::{packed_option::PackedOption, SecondaryMap};
use smallvec::{smallvec, SmallVec};
use std::ops::Add;
type LoopDepth = u32;
pub(crate) struct Elaborator<'a> {
func: &'a mut Function,
domtree: &'a DominatorTree,
loop_analysis: &'a LoopAnalysis,
node_ctx: &'a NodeCtx,
egraph: &'a EGraph<NodeCtx, Analysis>,
id_to_value: ScopedHashMap<Id, IdValue>,
id_to_best_cost_and_node: SecondaryMap<Id, (Cost, Id)>,
/// Stack of blocks and loops in current elaboration path.
loop_stack: SmallVec<[LoopStackEntry; 8]>,
cur_block: Option<Block>,
first_branch: SecondaryMap<Block, PackedOption<Inst>>,
remat_ids: &'a FxHashSet<Id>,
/// Explicitly-unrolled value elaboration stack.
elab_stack: Vec<ElabStackEntry>,
elab_result_stack: Vec<IdValue>,
/// Explicitly-unrolled block elaboration stack.
block_stack: Vec<BlockStackEntry>,
stats: &'a mut Stats,
}
#[derive(Clone, Debug)]
struct LoopStackEntry {
/// The hoist point: a block that immediately dominates this
/// loop. May not be an immediate predecessor, but will be a valid
/// point to place all loop-invariant ops: they must depend only
/// on inputs that dominate the loop, so are available at (the end
/// of) this block.
hoist_block: Block,
/// The depth in the scope map.
scope_depth: u32,
}
#[derive(Clone, Debug)]
enum ElabStackEntry {
/// Next action is to resolve this id into a node and elaborate
/// args.
Start { id: Id },
/// Args have been pushed; waiting for results.
PendingNode {
canonical: Id,
node_key: NodeKey,
remat: bool,
num_args: usize,
},
/// Waiting for a result to return one projected value of a
/// multi-value result.
PendingProjection { canonical: Id, index: usize },
}
#[derive(Clone, Debug)]
enum BlockStackEntry {
Elaborate { block: Block, idom: Option<Block> },
Pop,
}
#[derive(Clone, Debug)]
enum IdValue {
/// A single value.
Value {
depth: LoopDepth,
block: Block,
value: Value,
},
/// Multiple results; indices in `node_args`.
Values {
depth: LoopDepth,
block: Block,
values: ValueList,
},
}
impl IdValue {
fn block(&self) -> Block {
match self {
IdValue::Value { block, .. } | IdValue::Values { block, .. } => *block,
}
}
}
impl<'a> Elaborator<'a> {
pub(crate) fn new(
func: &'a mut Function,
domtree: &'a DominatorTree,
loop_analysis: &'a LoopAnalysis,
egraph: &'a EGraph<NodeCtx, Analysis>,
node_ctx: &'a NodeCtx,
remat_ids: &'a FxHashSet<Id>,
stats: &'a mut Stats,
) -> Self {
let num_blocks = func.dfg.num_blocks();
let mut id_to_best_cost_and_node =
SecondaryMap::with_default((Cost::infinity(), Id::invalid()));
id_to_best_cost_and_node.resize(egraph.classes.len());
Self {
func,
domtree,
loop_analysis,
egraph,
node_ctx,
id_to_value: ScopedHashMap::with_capacity(egraph.classes.len()),
id_to_best_cost_and_node,
loop_stack: smallvec![],
cur_block: None,
first_branch: SecondaryMap::with_capacity(num_blocks),
remat_ids,
elab_stack: vec![],
elab_result_stack: vec![],
block_stack: vec![],
stats,
}
}
fn cur_loop_depth(&self) -> LoopDepth {
self.loop_stack.len() as LoopDepth
}
fn start_block(&mut self, idom: Option<Block>, block: Block, block_params: &[(Id, Type)]) {
trace!(
"start_block: block {:?} with idom {:?} at loop depth {} scope depth {}",
block,
idom,
self.cur_loop_depth(),
self.id_to_value.depth()
);
// Note that if the *entry* block is a loop header, we will
// not make note of the loop here because it will not have an
// immediate dominator. We must disallow this case because we
// will skip adding the `LoopStackEntry` here but our
// `LoopAnalysis` will otherwise still make note of this loop
// and loop depths will not match.
if let Some(idom) = idom {
if self.loop_analysis.is_loop_header(block).is_some() {
self.loop_stack.push(LoopStackEntry {
// Any code hoisted out of this loop will have code
// placed in `idom`, and will have def mappings
// inserted in to the scoped hashmap at that block's
// level.
hoist_block: idom,
scope_depth: (self.id_to_value.depth() - 1) as u32,
});
trace!(
" -> loop header, pushing; depth now {}",
self.loop_stack.len()
);
}
} else {
debug_assert!(
self.loop_analysis.is_loop_header(block).is_none(),
"Entry block (domtree root) cannot be a loop header!"
);
}
self.cur_block = Some(block);
for &(id, ty) in block_params {
let value = self.func.dfg.append_block_param(block, ty);
trace!(" -> block param id {:?} value {:?}", id, value);
self.id_to_value.insert_if_absent(
id,
IdValue::Value {
depth: self.cur_loop_depth(),
block,
value,
},
);
}
}
fn add_node(&mut self, node: &Node, args: &[Value], to_block: Block) -> ValueList {
let (instdata, result_tys) = match node {
Node::Pure { op, types, .. } | Node::Inst { op, types, .. } => (
op.with_args(args, &mut self.func.dfg.value_lists),
types.as_slice(&self.node_ctx.types),
),
Node::Load { op, ty, .. } => (
op.with_args(args, &mut self.func.dfg.value_lists),
std::slice::from_ref(ty),
),
_ => panic!("Cannot `add_node()` on block param or projection"),
};
let srcloc = match node {
Node::Inst { srcloc, .. } | Node::Load { srcloc, .. } => *srcloc,
_ => RelSourceLoc::default(),
};
let opcode = instdata.opcode();
// Is this instruction either an actual terminator (an
// instruction that must end the block), or at least in the
// group of branches at the end (including conditional
// branches that may be followed by an actual terminator)? We
// call this the "terminator group", and we record the first
// inst in this group (`first_branch` below) so that we do not
// insert instructions needed only by args of later
// instructions in the terminator group in the middle of the
// terminator group.
//
// E.g., for the original sequence
// v1 = op ...
// brnz vCond, block1
// jump block2(v1)
//
// elaboration would naively produce
//
// brnz vCond, block1
// v1 = op ...
// jump block2(v1)
//
// but we use the `first_branch` mechanism below to ensure
// that once we've emitted at least one branch, all other
// elaborated insts have to go before that. So we emit brnz
// first, then as we elaborate the jump, we find we need the
// `op`; we `insert_inst` it *before* the brnz (which is the
// `first_branch`).
let is_terminator_group_inst =
opcode.is_branch() || opcode.is_return() || opcode == Opcode::Trap;
let inst = self.func.dfg.make_inst(instdata);
self.func.srclocs[inst] = srcloc;
for &ty in result_tys {
self.func.dfg.append_result(inst, ty);
}
if is_terminator_group_inst {
self.func.layout.append_inst(inst, to_block);
if self.first_branch[to_block].is_none() {
self.first_branch[to_block] = Some(inst).into();
}
} else if let Some(branch) = self.first_branch[to_block].into() {
self.func.layout.insert_inst(inst, branch);
} else {
self.func.layout.append_inst(inst, to_block);
}
self.func.dfg.inst_results_list(inst)
}
fn compute_best_nodes(&mut self) {
let best = &mut self.id_to_best_cost_and_node;
for (eclass_id, eclass) in &self.egraph.classes {
trace!("computing best for eclass {:?}", eclass_id);
if let Some(child1) = eclass.child1() {
trace!(" -> child {:?}", child1);
best[eclass_id] = best[child1];
}
if let Some(child2) = eclass.child2() {
trace!(" -> child {:?}", child2);
if best[child2].0 < best[eclass_id].0 {
best[eclass_id] = best[child2];
}
}
if let Some(node_key) = eclass.get_node() {
let node = node_key.node(&self.egraph.nodes);
trace!(" -> eclass {:?}: node {:?}", eclass_id, node);
let (cost, id) = match node {
Node::Param { .. }
| Node::Inst { .. }
| Node::Load { .. }
| Node::Result { .. } => (Cost::zero(), eclass_id),
Node::Pure { op, .. } => {
let args_cost = self
.node_ctx
.children(node)
.iter()
.map(|&arg_id| {
trace!(" -> arg {:?}", arg_id);
best[arg_id].0
})
// Can't use `.sum()` for `Cost` types; do
// an explicit reduce instead.
.fold(Cost::zero(), Cost::add);
let level = self.egraph.analysis_value(eclass_id).loop_level;
let cost = op_cost(op).at_level(level) + args_cost;
(cost, eclass_id)
}
};
if cost < best[eclass_id].0 {
best[eclass_id] = (cost, id);
}
}
debug_assert_ne!(best[eclass_id].0, Cost::infinity());
debug_assert_ne!(best[eclass_id].1, Id::invalid());
trace!("best for eclass {:?}: {:?}", eclass_id, best[eclass_id]);
}
}
fn elaborate_eclass_use(&mut self, id: Id) {
self.elab_stack.push(ElabStackEntry::Start { id });
self.process_elab_stack();
debug_assert_eq!(self.elab_result_stack.len(), 1);
self.elab_result_stack.clear();
}
fn process_elab_stack(&mut self) {
while let Some(entry) = self.elab_stack.last() {
match entry {
&ElabStackEntry::Start { id } => {
// We always replace the Start entry, so pop it now.
self.elab_stack.pop();
self.stats.elaborate_visit_node += 1;
let canonical = self.egraph.canonical_id(id);
trace!("elaborate: id {}", id);
let remat = if let Some(val) = self.id_to_value.get(&canonical) {
// Look at the defined block, and determine whether this
// node kind allows rematerialization if the value comes
// from another block. If so, ignore the hit and recompute
// below.
let remat = val.block() != self.cur_block.unwrap()
&& self.remat_ids.contains(&canonical);
if !remat {
trace!("elaborate: id {} -> {:?}", id, val);
self.stats.elaborate_memoize_hit += 1;
self.elab_result_stack.push(val.clone());
continue;
}
trace!("elaborate: id {} -> remat", id);
self.stats.elaborate_memoize_miss_remat += 1;
// The op is pure at this point, so it is always valid to
// remove from this map.
self.id_to_value.remove(&canonical);
true
} else {
self.remat_ids.contains(&canonical)
};
self.stats.elaborate_memoize_miss += 1;
// Get the best option; we use `id` (latest id) here so we
// have a full view of the eclass.
let (_, best_node_eclass) = self.id_to_best_cost_and_node[id];
debug_assert_ne!(best_node_eclass, Id::invalid());
trace!(
"elaborate: id {} -> best {} -> eclass node {:?}",
id,
best_node_eclass,
self.egraph.classes[best_node_eclass]
);
let node_key = self.egraph.classes[best_node_eclass].get_node().unwrap();
let node = node_key.node(&self.egraph.nodes);
trace!(" -> enode {:?}", node);
// Is the node a block param? We should never get here if so
// (they are inserted when first visiting the block).
if matches!(node, Node::Param { .. }) {
unreachable!("Param nodes should already be inserted");
}
// Is the node a result projection? If so, resolve
// the value we are projecting a part of, then
// eventually return here (saving state with a
// PendingProjection).
if let Node::Result { value, result, .. } = node {
trace!(" -> result; pushing arg value {}", value);
self.elab_stack.push(ElabStackEntry::PendingProjection {
index: *result,
canonical,
});
self.elab_stack.push(ElabStackEntry::Start { id: *value });
continue;
}
// We're going to need to emit this
// operator. First, enqueue all args to be
// elaborated. Push state to receive the results
// and later elab this node.
let num_args = self.node_ctx.children(&node).len();
self.elab_stack.push(ElabStackEntry::PendingNode {
canonical,
node_key,
remat,
num_args,
});
// Push args in reverse order so we process the
// first arg first.
for &arg_id in self.node_ctx.children(&node).iter().rev() {
self.elab_stack.push(ElabStackEntry::Start { id: arg_id });
}
}
&ElabStackEntry::PendingNode {
canonical,
node_key,
remat,
num_args,
} => {
self.elab_stack.pop();
let node = node_key.node(&self.egraph.nodes);
// We should have all args resolved at this point.
let arg_idx = self.elab_result_stack.len() - num_args;
let args = &self.elab_result_stack[arg_idx..];
// Gather the individual output-CLIF `Value`s.
let arg_values: SmallVec<[Value; 8]> = args
.iter()
.map(|idvalue| match idvalue {
IdValue::Value { value, .. } => *value,
IdValue::Values { .. } => {
panic!("enode depends directly on multi-value result")
}
})
.collect();
// Compute max loop depth.
let max_loop_depth = args
.iter()
.map(|idvalue| match idvalue {
IdValue::Value { depth, .. } => *depth,
IdValue::Values { .. } => unreachable!(),
})
.max()
.unwrap_or(0);
// Remove args from result stack.
self.elab_result_stack.truncate(arg_idx);
// Determine the location at which we emit it. This is the
// current block *unless* we hoist above a loop when all args
// are loop-invariant (and this op is pure).
let (loop_depth, scope_depth, block) = if node.is_non_pure() {
// Non-pure op: always at the current location.
(
self.cur_loop_depth(),
self.id_to_value.depth(),
self.cur_block.unwrap(),
)
} else if max_loop_depth == self.cur_loop_depth() || remat {
// Pure op, but depends on some value at the current loop
// depth, or remat forces it here: as above.
(
self.cur_loop_depth(),
self.id_to_value.depth(),
self.cur_block.unwrap(),
)
} else {
// Pure op, and does not depend on any args at current
// loop depth: hoist out of loop.
self.stats.elaborate_licm_hoist += 1;
let data = &self.loop_stack[max_loop_depth as usize];
(max_loop_depth, data.scope_depth as usize, data.hoist_block)
};
// Loop scopes are a subset of all scopes.
debug_assert!(scope_depth >= loop_depth as usize);
// This is an actual operation; emit the node in sequence now.
let results = self.add_node(node, &arg_values[..], block);
let results_slice = results.as_slice(&self.func.dfg.value_lists);
// Build the result and memoize in the id-to-value map.
let result = if results_slice.len() == 1 {
IdValue::Value {
depth: loop_depth,
block,
value: results_slice[0],
}
} else {
IdValue::Values {
depth: loop_depth,
block,
values: results,
}
};
self.id_to_value.insert_if_absent_with_depth(
canonical,
result.clone(),
scope_depth,
);
// Push onto the elab-results stack.
self.elab_result_stack.push(result)
}
&ElabStackEntry::PendingProjection { index, canonical } => {
self.elab_stack.pop();
// Grab the input from the elab-result stack.
let value = self.elab_result_stack.pop().expect("Should have result");
let (depth, block, values) = match value {
IdValue::Values {
depth,
block,
values,
..
} => (depth, block, values),
IdValue::Value { .. } => {
unreachable!("Projection nodes should not be used on single results");
}
};
let values = values.as_slice(&self.func.dfg.value_lists);
let value = IdValue::Value {
depth,
block,
value: values[index],
};
self.id_to_value.insert_if_absent(canonical, value.clone());
self.elab_result_stack.push(value);
}
}
}
}
fn elaborate_block<'b, PF: Fn(Block) -> &'b [(Id, Type)], SEF: Fn(Block) -> &'b [Id]>(
&mut self,
idom: Option<Block>,
block: Block,
block_params_fn: &PF,
block_side_effects_fn: &SEF,
) {
let blockparam_ids_tys = (block_params_fn)(block);
self.start_block(idom, block, blockparam_ids_tys);
for &id in (block_side_effects_fn)(block) {
self.elaborate_eclass_use(id);
}
}
fn elaborate_domtree<'b, PF: Fn(Block) -> &'b [(Id, Type)], SEF: Fn(Block) -> &'b [Id]>(
&mut self,
block_params_fn: &PF,
block_side_effects_fn: &SEF,
domtree: &DomTreeWithChildren,
) {
let root = domtree.root();
self.block_stack.push(BlockStackEntry::Elaborate {
block: root,
idom: None,
});
while let Some(top) = self.block_stack.pop() {
match top {
BlockStackEntry::Elaborate { block, idom } => {
self.block_stack.push(BlockStackEntry::Pop);
self.id_to_value.increment_depth();
self.elaborate_block(idom, block, block_params_fn, block_side_effects_fn);
// Push children. We are doing a preorder
// traversal so we do this after processing this
// block above.
let block_stack_end = self.block_stack.len();
for child in domtree.children(block) {
self.block_stack.push(BlockStackEntry::Elaborate {
block: child,
idom: Some(block),
});
}
// Reverse what we just pushed so we elaborate in
// original block order. (The domtree iter is a
// single-ended iter over a singly-linked list so
// we can't `.rev()` above.)
self.block_stack[block_stack_end..].reverse();
}
BlockStackEntry::Pop => {
self.id_to_value.decrement_depth();
if let Some(innermost_loop) = self.loop_stack.last() {
if innermost_loop.scope_depth as usize == self.id_to_value.depth() {
self.loop_stack.pop();
}
}
}
}
}
}
fn clear_func_body(&mut self) {
// Clear all instructions and args/results from the DFG. We
// rebuild them entirely during elaboration. (TODO: reuse the
// existing inst for the *first* copy of a given node.)
self.func.dfg.clear_insts();
// Clear the instructions in every block, but leave the list
// of blocks and their layout unmodified.
self.func.layout.clear_insts();
self.func.srclocs.clear();
}
pub(crate) fn elaborate<'b, PF: Fn(Block) -> &'b [(Id, Type)], SEF: Fn(Block) -> &'b [Id]>(
&mut self,
block_params_fn: PF,
block_side_effects_fn: SEF,
) {
let domtree = DomTreeWithChildren::new(self.func, self.domtree);
self.stats.elaborate_func += 1;
self.stats.elaborate_func_pre_insts += self.func.dfg.num_insts() as u64;
self.clear_func_body();
self.compute_best_nodes();
self.elaborate_domtree(&block_params_fn, &block_side_effects_fn, &domtree);
self.stats.elaborate_func_post_insts += self.func.dfg.num_insts() as u64;
}
}


@@ -0,0 +1,376 @@
//! Node definition for EGraph representation.
use super::MemoryState;
use crate::ir::{Block, DataFlowGraph, Inst, InstructionImms, Opcode, RelSourceLoc, Type};
use crate::loop_analysis::LoopLevel;
use cranelift_egraph::{BumpArena, BumpSlice, CtxEq, CtxHash, Id, Language, UnionFind};
use cranelift_entity::{EntityList, ListPool};
use std::hash::{Hash, Hasher};
#[derive(Debug)]
pub enum Node {
/// A blockparam. Effectively an input/root; does not refer to
/// predecessors' branch arguments, because this would create
/// cycles.
Param {
/// CLIF block this param comes from.
block: Block,
/// Index of blockparam within block.
index: u32,
/// Type of the value.
ty: Type,
/// The loop level of this Param.
loop_level: LoopLevel,
},
/// A CLIF instruction that is pure (has no side-effects). Not
/// tied to any location; we will compute a set of locations at
/// which to compute this node during lowering back out of the
/// egraph.
Pure {
/// The instruction data, without SSA values.
op: InstructionImms,
/// eclass arguments to the operator.
args: EntityList<Id>,
/// Types of results.
types: BumpSlice<Type>,
},
/// A CLIF instruction that has side-effects or is otherwise not
/// representable by `Pure`.
Inst {
/// The instruction data, without SSA values.
op: InstructionImms,
/// eclass arguments to the operator.
args: EntityList<Id>,
/// Types of results.
types: BumpSlice<Type>,
/// The index of the original instruction. We include this so
/// that the `Inst`s are not deduplicated: every instance is a
/// logically separate and unique side-effect. However,
/// because we clear the DataFlowGraph before elaboration,
/// this `Inst` is *not* valid to fetch any details from the
/// original instruction.
inst: Inst,
/// The source location to preserve.
srcloc: RelSourceLoc,
/// The loop level of this Inst.
loop_level: LoopLevel,
},
/// A projection of one result of an `Inst` or `Pure`.
Result {
/// `Inst` or `Pure` node.
value: Id,
/// Index of the result we want.
result: usize,
/// Type of the value.
ty: Type,
},
/// A load instruction. Nominally a side-effecting `Inst` (and
/// included in the list of side-effecting roots so it will always
/// be elaborated), but represented as a distinct kind of node so
/// that we can leverage deduplication to do
/// redundant-load-elimination for free (and make store-to-load
/// forwarding much easier).
Load {
// -- identity depends on:
/// The original load operation. Must have one argument, the
/// address.
op: InstructionImms,
/// The type of the load result.
ty: Type,
/// Address argument. Actual address has an offset, which is
/// included in `op` (and thus already considered as part of
/// the key).
addr: Id,
/// The abstract memory state that this load accesses.
mem_state: MemoryState,
// -- not included in dedup key:
/// The `Inst` we will use for a trap location for this
/// load. Excluded from Eq/Hash so that loads that are
/// identical except for the specific instance will dedup on
/// top of each other.
inst: Inst,
/// Source location, for traps. Not included in Eq/Hash.
srcloc: RelSourceLoc,
},
}
impl Node {
pub(crate) fn is_non_pure(&self) -> bool {
match self {
Node::Inst { .. } | Node::Load { .. } => true,
_ => false,
}
}
}
/// Shared pools for type and id lists in nodes.
pub struct NodeCtx {
/// Arena for result-type arrays.
pub types: BumpArena<Type>,
/// Arena for arg eclass-ID lists.
pub args: ListPool<Id>,
}
impl NodeCtx {
pub(crate) fn with_capacity_for_dfg(dfg: &DataFlowGraph) -> Self {
let n_types = dfg.num_values();
let n_args = dfg.value_lists.capacity();
Self {
types: BumpArena::arena_with_capacity(n_types),
args: ListPool::with_capacity(n_args),
}
}
}
impl NodeCtx {
fn ids_eq(&self, a: &EntityList<Id>, b: &EntityList<Id>, uf: &mut UnionFind) -> bool {
let a = a.as_slice(&self.args);
let b = b.as_slice(&self.args);
a.len() == b.len() && a.iter().zip(b.iter()).all(|(&a, &b)| uf.equiv_id_mut(a, b))
}
fn hash_ids<H: Hasher>(&self, a: &EntityList<Id>, hash: &mut H, uf: &mut UnionFind) {
let a = a.as_slice(&self.args);
for &id in a {
uf.hash_id_mut(hash, id);
}
}
}
impl CtxEq<Node, Node> for NodeCtx {
fn ctx_eq(&self, a: &Node, b: &Node, uf: &mut UnionFind) -> bool {
match (a, b) {
(
&Node::Param {
block,
index,
ty,
loop_level: _,
},
&Node::Param {
block: other_block,
index: other_index,
ty: other_ty,
loop_level: _,
},
) => block == other_block && index == other_index && ty == other_ty,
(
&Node::Result { value, result, ty },
&Node::Result {
value: other_value,
result: other_result,
ty: other_ty,
},
) => uf.equiv_id_mut(value, other_value) && result == other_result && ty == other_ty,
(
&Node::Pure {
ref op,
ref args,
ref types,
},
&Node::Pure {
op: ref other_op,
args: ref other_args,
types: ref other_types,
},
) => {
*op == *other_op
&& self.ids_eq(args, other_args, uf)
&& types.as_slice(&self.types) == other_types.as_slice(&self.types)
}
(
&Node::Inst { inst, ref args, .. },
&Node::Inst {
inst: other_inst,
args: ref other_args,
..
},
) => inst == other_inst && self.ids_eq(args, other_args, uf),
(
&Node::Load {
ref op,
ty,
addr,
mem_state,
..
},
&Node::Load {
op: ref other_op,
ty: other_ty,
addr: other_addr,
mem_state: other_mem_state,
// Explicitly exclude: `inst` and `srcloc`. We
// want loads to merge if identical in
// opcode/offset, address expression, and last
// store (this does implicit
// redundant-load-elimination.)
//
// Note however that we *do* include `ty` (the
// type) and match on that: we otherwise would
// have no way of disambiguating loads of
// different widths to the same address.
..
},
) => {
op == other_op
&& ty == other_ty
&& uf.equiv_id_mut(addr, other_addr)
&& mem_state == other_mem_state
}
_ => false,
}
}
}
impl CtxHash<Node> for NodeCtx {
fn ctx_hash(&self, value: &Node, uf: &mut UnionFind) -> u64 {
let mut state = crate::fx::FxHasher::default();
std::mem::discriminant(value).hash(&mut state);
match value {
&Node::Param {
block,
index,
ty: _,
loop_level: _,
} => {
block.hash(&mut state);
index.hash(&mut state);
}
&Node::Result {
value,
result,
ty: _,
} => {
uf.hash_id_mut(&mut state, value);
result.hash(&mut state);
}
&Node::Pure {
ref op,
ref args,
types: _,
} => {
op.hash(&mut state);
self.hash_ids(args, &mut state, uf);
// Don't hash `types`: it requires an indirection
// (hence cache misses), and result type *should* be
// fully determined by op and args.
}
&Node::Inst { inst, ref args, .. } => {
inst.hash(&mut state);
self.hash_ids(args, &mut state, uf);
}
&Node::Load {
ref op,
ty,
addr,
mem_state,
..
} => {
op.hash(&mut state);
ty.hash(&mut state);
uf.hash_id_mut(&mut state, addr);
mem_state.hash(&mut state);
}
}
state.finish()
}
}
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub(crate) struct Cost(u32);
impl Cost {
pub(crate) fn at_level(&self, loop_level: LoopLevel) -> Cost {
let loop_level = std::cmp::min(2, loop_level.level());
let multiplier = 1u32 << ((10 * loop_level) as u32);
Cost(self.0.saturating_mul(multiplier)).finite()
}
pub(crate) fn infinity() -> Cost {
// 2^32 - 1 is, uh, pretty close to infinite... (we use `Cost`
// only for heuristics and always saturate so this suffices!)
Cost(u32::MAX)
}
pub(crate) fn zero() -> Cost {
Cost(0)
}
/// Clamp this cost at a "finite" value. Can be used in
/// conjunction with saturating ops to avoid saturating into
/// `infinity()`.
fn finite(self) -> Cost {
Cost(std::cmp::min(u32::MAX - 1, self.0))
}
}
impl std::default::Default for Cost {
fn default() -> Cost {
Cost::zero()
}
}
impl std::ops::Add<Cost> for Cost {
type Output = Cost;
fn add(self, other: Cost) -> Cost {
Cost(self.0.saturating_add(other.0)).finite()
}
}
pub(crate) fn op_cost(op: &InstructionImms) -> Cost {
match op.opcode() {
// Constants.
Opcode::Iconst | Opcode::F32const | Opcode::F64const | Opcode::Bconst => Cost(0),
// Extends/reduces.
Opcode::Bextend
| Opcode::Breduce
| Opcode::Uextend
| Opcode::Sextend
| Opcode::Ireduce
| Opcode::Iconcat
| Opcode::Isplit => Cost(1),
// "Simple" arithmetic.
Opcode::Iadd
| Opcode::Isub
| Opcode::Band
| Opcode::BandNot
| Opcode::Bor
| Opcode::BorNot
| Opcode::Bxor
| Opcode::BxorNot
| Opcode::Bnot => Cost(2),
// Everything else.
_ => Cost(3),
}
}
impl Language for NodeCtx {
type Node = Node;
fn children<'a>(&'a self, node: &'a Node) -> &'a [Id] {
match node {
Node::Param { .. } => &[],
Node::Pure { args, .. } | Node::Inst { args, .. } => args.as_slice(&self.args),
Node::Load { addr, .. } => std::slice::from_ref(addr),
Node::Result { value, .. } => std::slice::from_ref(value),
}
}
fn children_mut<'a>(&'a mut self, node: &'a mut Node) -> &'a mut [Id] {
match node {
Node::Param { .. } => &mut [],
Node::Pure { args, .. } | Node::Inst { args, .. } => args.as_mut_slice(&mut self.args),
Node::Load { addr, .. } => std::slice::from_mut(addr),
Node::Result { value, .. } => std::slice::from_mut(value),
}
}
fn needs_dedup(&self, node: &Node) -> bool {
match node {
Node::Pure { .. } | Node::Load { .. } => true,
_ => false,
}
}
}


@@ -0,0 +1,266 @@
//! Last-store tracking via alias analysis.
//!
//! We partition memory state into several *disjoint pieces* of
//! "abstract state". There are a finite number of such pieces:
//! currently, we call them "heap", "table", "vmctx", and "other". Any
//! given address in memory belongs to exactly one disjoint piece.
//!
//! One never tracks which piece a concrete address belongs to at
//! runtime; this is a purely static concept. Instead, all
//! memory-accessing instructions (loads and stores) are labeled with
//! one of these four categories in the `MemFlags`. It is forbidden
//! for a load or store to access memory under one category and a
//! later load or store to access the same memory under a different
//! category. This is ensured to be true by construction during
//! frontend translation into CLIF and during legalization.
//!
//! Given that this non-aliasing property is ensured by the producer
//! of CLIF, we can compute a *may-alias* property: one load or store
//! may-alias another load or store if both access the same category
//! of abstract state.
//!
//! The "last store" pass helps to compute this aliasing: we perform a
//! fixpoint analysis to track the last instruction that *might have*
//! written to a given part of abstract state. We also track the block
//! containing this store.
//!
//! We can't say for sure that the "last store" *did* actually write
//! that state, but we know for sure that no instruction *later* than
//! it (up to the current instruction) did. However, we can get a
//! must-alias property from this: if at a given load or store, we
//! look backward to the "last store", *AND* we find that it has
//! exactly the same address expression and value type, then we know
//! that the current instruction's access *must* be to the same memory
//! location.
//!
//! To get this must-alias property, we leverage the node
//! hashconsing. We design the Eq/Hash (node identity relation
//! definition) of the `Node` struct so that all loads with (i) the
//! same "last store", and (ii) the same address expression, and (iii)
//! the same opcode-and-offset, will deduplicate (the first will be
//! computed, and the later ones will use the same value). Furthermore
//! we have an optimization that rewrites a load into the stored value
//! of the last store *if* the last store has the same address
//! expression and constant offset.
//!
//! This gives us two optimizations, "redundant load elimination" and
//! "store-to-load forwarding".
//!
//! In theory we could also do *dead-store elimination*, where if a
//! store overwrites a value earlier written by another store, *and*
//! if no other load/store to the abstract state category occurred,
//! *and* no other trapping instruction occurred (at which point we
//! need an up-to-date memory state because post-trap-termination
//! memory state can be observed), *and* we can prove the original
//! store could not have trapped, then we can eliminate the original
//! store. Because this is so complex, and the conditions for doing it
//! correctly when post-trap state must be correct likely reduce the
//! potential benefit, we don't yet do this.
use crate::flowgraph::ControlFlowGraph;
use crate::fx::{FxHashMap, FxHashSet};
use crate::inst_predicates::has_memory_fence_semantics;
use crate::ir::{Block, Function, Inst, InstructionData, MemFlags, Opcode};
use crate::trace;
use cranelift_entity::SecondaryMap;
use smallvec::{smallvec, SmallVec};
/// For a given program point, the vector of last-store instruction
/// indices for each disjoint category of abstract state.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct LastStores {
heap: MemoryState,
table: MemoryState,
vmctx: MemoryState,
other: MemoryState,
}
/// State of memory seen by a load.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord, Hash, Default)]
pub enum MemoryState {
/// State at function entry: nothing is known (but it is one
/// consistent value, so two loads from "entry" state at the same
/// address will still provide the same result).
#[default]
Entry,
/// State just after a store by the given instruction. The
/// instruction is a store from which we can forward.
Store(Inst),
/// State just before the given instruction. Used for abstract
/// value merges at merge-points when we cannot name a single
/// producing site.
BeforeInst(Inst),
/// State just after the given instruction. Used when the
/// instruction may update the associated state, but is not a
/// store whose value we can cleanly forward. (E.g., perhaps a
/// barrier of some sort.)
AfterInst(Inst),
}
impl LastStores {
fn update(&mut self, func: &Function, inst: Inst) {
let opcode = func.dfg[inst].opcode();
if has_memory_fence_semantics(opcode) {
self.heap = MemoryState::AfterInst(inst);
self.table = MemoryState::AfterInst(inst);
self.vmctx = MemoryState::AfterInst(inst);
self.other = MemoryState::AfterInst(inst);
} else if opcode.can_store() {
if let Some(memflags) = func.dfg[inst].memflags() {
*self.for_flags(memflags) = MemoryState::Store(inst);
} else {
self.heap = MemoryState::AfterInst(inst);
self.table = MemoryState::AfterInst(inst);
self.vmctx = MemoryState::AfterInst(inst);
self.other = MemoryState::AfterInst(inst);
}
}
}
fn for_flags(&mut self, memflags: MemFlags) -> &mut MemoryState {
if memflags.heap() {
&mut self.heap
} else if memflags.table() {
&mut self.table
} else if memflags.vmctx() {
&mut self.vmctx
} else {
&mut self.other
}
}
fn meet_from(&mut self, other: &LastStores, loc: Inst) {
let meet = |a: MemoryState, b: MemoryState| -> MemoryState {
match (a, b) {
(a, b) if a == b => a,
_ => MemoryState::BeforeInst(loc),
}
};
self.heap = meet(self.heap, other.heap);
self.table = meet(self.table, other.table);
self.vmctx = meet(self.vmctx, other.vmctx);
self.other = meet(self.other, other.other);
}
}
/// An alias-analysis pass.
pub struct AliasAnalysis {
/// Last-store instruction (or none) for a given load. Use a hash map
/// instead of a `SecondaryMap` because this is sparse.
load_mem_state: FxHashMap<Inst, MemoryState>,
}
impl AliasAnalysis {
/// Perform an alias analysis pass.
pub fn new(func: &Function, cfg: &ControlFlowGraph) -> AliasAnalysis {
log::trace!("alias analysis: input is:\n{:?}", func);
let block_input = Self::compute_block_input_states(func, cfg);
let load_mem_state = Self::compute_load_last_stores(func, block_input);
AliasAnalysis { load_mem_state }
}
fn compute_block_input_states(
func: &Function,
cfg: &ControlFlowGraph,
) -> SecondaryMap<Block, Option<LastStores>> {
let mut block_input = SecondaryMap::with_capacity(func.dfg.num_blocks());
let mut worklist: SmallVec<[Block; 8]> = smallvec![];
let mut worklist_set = FxHashSet::default();
let entry = func.layout.entry_block().unwrap();
worklist.push(entry);
worklist_set.insert(entry);
block_input[entry] = Some(LastStores::default());
while let Some(block) = worklist.pop() {
worklist_set.remove(&block);
let state = block_input[block].clone().unwrap();
trace!("alias analysis: input to {} is {:?}", block, state);
let state = func
.layout
.block_insts(block)
.fold(state, |mut state, inst| {
state.update(func, inst);
trace!("after {}: state is {:?}", inst, state);
state
});
for succ in cfg.succ_iter(block) {
let succ_first_inst = func.layout.first_inst(succ).unwrap();
let succ_state = &mut block_input[succ];
let old = succ_state.clone();
if let Some(succ_state) = succ_state.as_mut() {
succ_state.meet_from(&state, succ_first_inst);
} else {
*succ_state = Some(state);
};
let updated = *succ_state != old;
if updated && worklist_set.insert(succ) {
worklist.push(succ);
}
}
}
block_input
}
fn compute_load_last_stores(
func: &Function,
block_input: SecondaryMap<Block, Option<LastStores>>,
) -> FxHashMap<Inst, MemoryState> {
let mut load_mem_state = FxHashMap::default();
for block in func.layout.blocks() {
let mut state = block_input[block].clone().unwrap();
for inst in func.layout.block_insts(block) {
trace!(
"alias analysis: scanning at {} with state {:?} ({:?})",
inst,
state,
func.dfg[inst],
);
// N.B.: we match `Load` specifically, and not any
// other kinds of loads (or any opcode such that
// `opcode.can_load()` returns true), because some
// "can load" instructions actually have very
// different semantics (are not just a load of a
// particularly-typed value). For example, atomic
// (load/store, RMW, CAS) instructions "can load" but
// definitely should not participate in store-to-load
// forwarding or redundant-load elimination. Our goal
// here is to provide a `MemoryState` just for plain
// old loads whose semantics we can completely reason
// about.
if let InstructionData::Load {
opcode: Opcode::Load,
flags,
..
} = func.dfg[inst]
{
let mem_state = *state.for_flags(flags);
trace!(
"alias analysis: at {}: load with mem_state {:?}",
inst,
mem_state,
);
load_mem_state.insert(inst, mem_state);
}
state.update(func, inst);
}
}
load_mem_state
}
/// Get the state seen by a load, if any.
pub fn get_state_for_load(&self, inst: Inst) -> Option<MemoryState> {
self.load_mem_state.get(&inst).copied()
}
}