egraph-based midend: draw the rest of the owl (productionized). (#4953)
* egraph-based midend: draw the rest of the owl. * Rename `egg` submodule of cranelift-codegen to `egraph`. * Apply some feedback from @jsharp during code walkthrough. * Remove recursion from find_best_node by doing a single pass. Rather than recursively computing the lowest-cost node for a given eclass and memoizing the answer at each eclass node, we can do a single forward pass; because every eclass node refers only to earlier nodes, this is sufficient. The behavior may slightly differ from the earlier behavior because we cannot short-circuit costs to zero once a node is elaborated; but in practice this should not matter. * Make elaboration non-recursive. Use an explicit stack instead (with `ElabStackEntry` entries, alongside a result stack). * Make elaboration traversal of the domtree non-recursive/stack-safe. * Work analysis logic in Cranelift-side egraph glue into a general analysis framework in cranelift-egraph. * Apply static recursion limit to rule application. * Fix aarch64 wrt dynamic-vector support -- broken rebase. * Topo-sort cranelift-egraph before cranelift-codegen in publish script, like the comment instructs me to! * Fix multi-result call testcase. * Include `cranelift-egraph` in `PUBLISHED_CRATES`. * Fix atomic_rmw: not really a load. * Remove now-unnecessary PartialOrd/Ord derivations. * Address some code-review comments. * Review feedback. * Review feedback. * No overlap in mid-end rules, because we are defining a multi-constructor. * rustfmt * Review feedback. * Review feedback. * Review feedback. * Review feedback. * Remove redundant `mut`. * Add comment noting what rules can do. * Review feedback. * Clarify comment wording. * Update `has_memory_fence_semantics`. * Apply @jameysharp's improved loop-level computation. Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fix suggestion commit. * Fix off-by-one in new loop-nest analysis. * Review feedback. * Review feedback. * Review feedback. * Use `Default`, not `std::default::Default`, as per @fitzgen Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com> * Apply @fitzgen's comment elaboration to a doc-comment. Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com> * Add stat for hitting the rewrite-depth limit. * Some code motion in split prelude to make the diff a little clearer wrt `main`. * Take @jameysharp's suggested `try_into()` usage for blockparam indices. Co-authored-by: Jamey Sharp <jamey@minilop.net> * Take @jameysharp's suggestion to avoid double-match on load op. Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fix suggestion (add import). * Review feedback. * Fix stack_load handling. * Remove redundant can_store case. * Take @jameysharp's suggested improvement to FuncEGraph::build() logic Co-authored-by: Jamey Sharp <jamey@minilop.net> * Tweaks to FuncEGraph::build() on top of suggestion. * Take @jameysharp's suggested clarified condition Co-authored-by: Jamey Sharp <jamey@minilop.net> * Clean up after suggestion (unused variable). * Fix loop analysis. * loop level asserts * Revert constant-space loop analysis -- edge cases were incorrect, so let's go with the simple thing for now. * Take @jameysharp's suggestion re: result_tys Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fix up after suggestion * Take @jameysharp's suggestion to use fold rather than reduce Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fixup after suggestion * Take @jameysharp's suggestion to remove elaborate_eclass_use's return value. * Clarifying comment in terminator insts. Co-authored-by: Jamey Sharp <jamey@minilop.net> Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
This commit is contained in:
414
cranelift/codegen/src/egraph.rs
Normal file
414
cranelift/codegen/src/egraph.rs
Normal file
@@ -0,0 +1,414 @@
|
||||
//! Egraph-based mid-end optimization framework.
|
||||
|
||||
use crate::dominator_tree::DominatorTree;
|
||||
use crate::flowgraph::ControlFlowGraph;
|
||||
use crate::loop_analysis::{LoopAnalysis, LoopLevel};
|
||||
use crate::trace;
|
||||
use crate::{
|
||||
fx::{FxHashMap, FxHashSet},
|
||||
inst_predicates::has_side_effect,
|
||||
ir::{Block, Function, Inst, InstructionData, InstructionImms, Opcode, Type},
|
||||
};
|
||||
use alloc::vec::Vec;
|
||||
use core::ops::Range;
|
||||
use cranelift_egraph::{EGraph, Id, Language, NewOrExisting};
|
||||
use cranelift_entity::EntityList;
|
||||
use cranelift_entity::SecondaryMap;
|
||||
|
||||
mod domtree;
|
||||
mod elaborate;
|
||||
mod node;
|
||||
mod stores;
|
||||
|
||||
use elaborate::Elaborator;
|
||||
pub use node::{Node, NodeCtx};
|
||||
pub use stores::{AliasAnalysis, MemoryState};
|
||||
|
||||
pub struct FuncEGraph<'a> {
|
||||
/// Dominator tree, used for elaboration pass.
|
||||
domtree: &'a DominatorTree,
|
||||
/// Loop analysis results, used for built-in LICM during elaboration.
|
||||
loop_analysis: &'a LoopAnalysis,
|
||||
/// Last-store tracker for integrated alias analysis during egraph build.
|
||||
alias_analysis: AliasAnalysis,
|
||||
/// The egraph itself.
|
||||
pub(crate) egraph: EGraph<NodeCtx, Analysis>,
|
||||
/// "node context", containing arenas for node data.
|
||||
pub(crate) node_ctx: NodeCtx,
|
||||
/// Ranges in `side_effect_ids` for sequences of side-effecting
|
||||
/// eclasses per block.
|
||||
side_effects: SecondaryMap<Block, Range<u32>>,
|
||||
side_effect_ids: Vec<Id>,
|
||||
/// Map from store instructions to their nodes; used for store-to-load forwarding.
|
||||
pub(crate) store_nodes: FxHashMap<Inst, (Type, Id)>,
|
||||
/// Ranges in `blockparam_ids_tys` for sequences of blockparam
|
||||
/// eclass IDs and types per block.
|
||||
blockparams: SecondaryMap<Block, Range<u32>>,
|
||||
blockparam_ids_tys: Vec<(Id, Type)>,
|
||||
/// Which canonical node IDs do we want to rematerialize in each
|
||||
/// block where they're used?
|
||||
pub(crate) remat_ids: FxHashSet<Id>,
|
||||
/// Which canonical node IDs have an enode whose value subsumes
|
||||
/// all others it's unioned with?
|
||||
pub(crate) subsume_ids: FxHashSet<Id>,
|
||||
/// Statistics recorded during the process of building,
|
||||
/// optimizing, and lowering out of this egraph.
|
||||
pub(crate) stats: Stats,
|
||||
/// Current rewrite-recursion depth. Used to enforce a finite
|
||||
/// limit on rewrite rule application so that we don't get stuck
|
||||
/// in an infinite chain.
|
||||
pub(crate) rewrite_depth: usize,
|
||||
}
|
||||
|
||||
#[derive(Clone, Debug, Default)]
|
||||
pub(crate) struct Stats {
|
||||
pub(crate) node_created: u64,
|
||||
pub(crate) node_param: u64,
|
||||
pub(crate) node_result: u64,
|
||||
pub(crate) node_pure: u64,
|
||||
pub(crate) node_inst: u64,
|
||||
pub(crate) node_load: u64,
|
||||
pub(crate) node_dedup_query: u64,
|
||||
pub(crate) node_dedup_hit: u64,
|
||||
pub(crate) node_dedup_miss: u64,
|
||||
pub(crate) node_ctor_created: u64,
|
||||
pub(crate) node_ctor_deduped: u64,
|
||||
pub(crate) node_union: u64,
|
||||
pub(crate) node_subsume: u64,
|
||||
pub(crate) store_map_insert: u64,
|
||||
pub(crate) side_effect_nodes: u64,
|
||||
pub(crate) rewrite_rule_invoked: u64,
|
||||
pub(crate) rewrite_depth_limit: u64,
|
||||
pub(crate) store_to_load_forward: u64,
|
||||
pub(crate) elaborate_visit_node: u64,
|
||||
pub(crate) elaborate_memoize_hit: u64,
|
||||
pub(crate) elaborate_memoize_miss: u64,
|
||||
pub(crate) elaborate_memoize_miss_remat: u64,
|
||||
pub(crate) elaborate_licm_hoist: u64,
|
||||
pub(crate) elaborate_func: u64,
|
||||
pub(crate) elaborate_func_pre_insts: u64,
|
||||
pub(crate) elaborate_func_post_insts: u64,
|
||||
}
|
||||
|
||||
impl<'a> FuncEGraph<'a> {
|
||||
/// Create a new EGraph for the given function. Requires the
|
||||
/// domtree to be precomputed as well; the domtree is used for
|
||||
/// scheduling when lowering out of the egraph.
|
||||
pub fn new(
|
||||
func: &Function,
|
||||
domtree: &'a DominatorTree,
|
||||
loop_analysis: &'a LoopAnalysis,
|
||||
cfg: &ControlFlowGraph,
|
||||
) -> FuncEGraph<'a> {
|
||||
let node_count_estimate = func.dfg.num_values() * 2;
|
||||
let alias_analysis = AliasAnalysis::new(func, cfg);
|
||||
let mut this = Self {
|
||||
domtree,
|
||||
loop_analysis,
|
||||
alias_analysis,
|
||||
egraph: EGraph::with_capacity(node_count_estimate, Some(Analysis)),
|
||||
node_ctx: NodeCtx::with_capacity_for_dfg(&func.dfg),
|
||||
side_effects: SecondaryMap::default(),
|
||||
side_effect_ids: vec![],
|
||||
store_nodes: FxHashMap::default(),
|
||||
blockparams: SecondaryMap::default(),
|
||||
blockparam_ids_tys: vec![],
|
||||
remat_ids: FxHashSet::default(),
|
||||
subsume_ids: FxHashSet::default(),
|
||||
stats: Default::default(),
|
||||
rewrite_depth: 0,
|
||||
};
|
||||
this.build(func);
|
||||
this
|
||||
}
|
||||
|
||||
fn build(&mut self, func: &Function) {
|
||||
// Mapping of SSA `Value` to eclass ID.
|
||||
let mut value_to_id = FxHashMap::default();
|
||||
|
||||
// For each block in RPO, create an enode for block entry, for
|
||||
// each block param, and for each instruction.
|
||||
for &block in self.domtree.cfg_postorder().iter().rev() {
|
||||
let loop_level = self.loop_analysis.loop_level(block);
|
||||
let blockparam_start =
|
||||
u32::try_from(self.blockparam_ids_tys.len()).expect("Overflow in blockparam count");
|
||||
for (i, &value) in func.dfg.block_params(block).iter().enumerate() {
|
||||
let ty = func.dfg.value_type(value);
|
||||
let param = self
|
||||
.egraph
|
||||
.add(
|
||||
Node::Param {
|
||||
block,
|
||||
index: i
|
||||
.try_into()
|
||||
.expect("blockparam index should fit in Node::Param"),
|
||||
ty,
|
||||
loop_level,
|
||||
},
|
||||
&mut self.node_ctx,
|
||||
)
|
||||
.get();
|
||||
value_to_id.insert(value, param);
|
||||
self.blockparam_ids_tys.push((param, ty));
|
||||
self.stats.node_created += 1;
|
||||
self.stats.node_param += 1;
|
||||
}
|
||||
let blockparam_end =
|
||||
u32::try_from(self.blockparam_ids_tys.len()).expect("Overflow in blockparam count");
|
||||
self.blockparams[block] = blockparam_start..blockparam_end;
|
||||
|
||||
let side_effect_start =
|
||||
u32::try_from(self.side_effect_ids.len()).expect("Overflow in side-effect count");
|
||||
for inst in func.layout.block_insts(block) {
|
||||
// Build args from SSA values.
|
||||
let args = EntityList::from_iter(
|
||||
func.dfg.inst_args(inst).iter().map(|&arg| {
|
||||
let arg = func.dfg.resolve_aliases(arg);
|
||||
*value_to_id
|
||||
.get(&arg)
|
||||
.expect("Must have seen def before this use")
|
||||
}),
|
||||
&mut self.node_ctx.args,
|
||||
);
|
||||
|
||||
let results = func.dfg.inst_results(inst);
|
||||
|
||||
let types = self
|
||||
.node_ctx
|
||||
.types
|
||||
.from_iter(results.iter().map(|&val| func.dfg.value_type(val)));
|
||||
let types = types.freeze(&mut self.node_ctx.types);
|
||||
|
||||
let load_mem_state = self.alias_analysis.get_state_for_load(inst);
|
||||
let is_readonly_load = match func.dfg[inst] {
|
||||
InstructionData::Load {
|
||||
opcode: Opcode::Load,
|
||||
flags,
|
||||
..
|
||||
} => flags.readonly() && flags.notrap(),
|
||||
_ => false,
|
||||
};
|
||||
|
||||
// Create the egraph node.
|
||||
let op = InstructionImms::from(&func.dfg[inst]);
|
||||
let opcode = op.opcode();
|
||||
let srcloc = func.srclocs[inst];
|
||||
|
||||
let node = if is_readonly_load {
|
||||
self.stats.node_created += 1;
|
||||
self.stats.node_pure += 1;
|
||||
Node::Pure { op, args, types }
|
||||
} else if let Some(load_mem_state) = load_mem_state {
|
||||
let addr = args.as_slice(&self.node_ctx.args)[0];
|
||||
let ty = types.as_slice(&self.node_ctx.types)[0];
|
||||
trace!("load at inst {} has mem state {:?}", inst, load_mem_state);
|
||||
self.stats.node_created += 1;
|
||||
self.stats.node_load += 1;
|
||||
Node::Load {
|
||||
op,
|
||||
ty,
|
||||
inst,
|
||||
addr,
|
||||
mem_state: load_mem_state,
|
||||
srcloc,
|
||||
}
|
||||
} else if has_side_effect(func, inst) || opcode.can_load() {
|
||||
self.stats.node_created += 1;
|
||||
self.stats.node_inst += 1;
|
||||
Node::Inst {
|
||||
op,
|
||||
inst,
|
||||
args,
|
||||
types,
|
||||
srcloc,
|
||||
loop_level,
|
||||
}
|
||||
} else {
|
||||
self.stats.node_created += 1;
|
||||
self.stats.node_pure += 1;
|
||||
Node::Pure { op, args, types }
|
||||
};
|
||||
let dedup_needed = self.node_ctx.needs_dedup(&node);
|
||||
let is_pure = matches!(node, Node::Pure { .. });
|
||||
|
||||
let mut id = self.egraph.add(node, &mut self.node_ctx);
|
||||
|
||||
if dedup_needed {
|
||||
self.stats.node_dedup_query += 1;
|
||||
match id {
|
||||
NewOrExisting::New(_) => {
|
||||
self.stats.node_dedup_miss += 1;
|
||||
}
|
||||
NewOrExisting::Existing(_) => {
|
||||
self.stats.node_dedup_hit += 1;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if opcode == Opcode::Store {
|
||||
let store_data_ty = func.dfg.value_type(func.dfg.inst_args(inst)[0]);
|
||||
self.store_nodes.insert(inst, (store_data_ty, id.get()));
|
||||
self.stats.store_map_insert += 1;
|
||||
}
|
||||
|
||||
// Loads that did not already merge into an existing
|
||||
// load: try to forward from a store (store-to-load
|
||||
// forwarding).
|
||||
if let NewOrExisting::New(new_id) = id {
|
||||
if load_mem_state.is_some() {
|
||||
let opt_id = crate::opts::store_to_load(new_id, self);
|
||||
trace!("store_to_load: {} -> {}", new_id, opt_id);
|
||||
if opt_id != new_id {
|
||||
id = NewOrExisting::Existing(opt_id);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Now either optimize (for new pure nodes), or add to
|
||||
// the side-effecting list (for all other new nodes).
|
||||
let id = match id {
|
||||
NewOrExisting::Existing(id) => id,
|
||||
NewOrExisting::New(id) if is_pure => {
|
||||
// Apply all optimization rules immediately; the
|
||||
// aegraph (acyclic egraph) works best when we do
|
||||
// this so all uses pick up the eclass with all
|
||||
// possible enodes.
|
||||
crate::opts::optimize_eclass(id, self)
|
||||
}
|
||||
NewOrExisting::New(id) => {
|
||||
self.side_effect_ids.push(id);
|
||||
self.stats.side_effect_nodes += 1;
|
||||
id
|
||||
}
|
||||
};
|
||||
|
||||
// Create results and save in Value->Id map.
|
||||
match results {
|
||||
&[] => {}
|
||||
&[one_result] => {
|
||||
trace!("build: value {} -> id {}", one_result, id);
|
||||
value_to_id.insert(one_result, id);
|
||||
}
|
||||
many_results => {
|
||||
debug_assert!(many_results.len() > 1);
|
||||
for (i, &result) in many_results.iter().enumerate() {
|
||||
let ty = func.dfg.value_type(result);
|
||||
let projection = self
|
||||
.egraph
|
||||
.add(
|
||||
Node::Result {
|
||||
value: id,
|
||||
result: i,
|
||||
ty,
|
||||
},
|
||||
&mut self.node_ctx,
|
||||
)
|
||||
.get();
|
||||
self.stats.node_created += 1;
|
||||
self.stats.node_result += 1;
|
||||
trace!("build: value {} -> id {}", result, projection);
|
||||
value_to_id.insert(result, projection);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
let side_effect_end =
|
||||
u32::try_from(self.side_effect_ids.len()).expect("Overflow in side-effect count");
|
||||
let side_effect_range = side_effect_start..side_effect_end;
|
||||
self.side_effects[block] = side_effect_range;
|
||||
}
|
||||
}
|
||||
|
||||
/// Scoped elaboration: compute a final ordering of op computation
|
||||
/// for each block and replace the given Func body.
|
||||
///
|
||||
/// This works in concert with the domtree. We do a preorder
|
||||
/// traversal of the domtree, tracking a scoped map from Id to
|
||||
/// (new) Value. The map's scopes correspond to levels in the
|
||||
/// domtree.
|
||||
///
|
||||
/// At each block, we iterate forward over the side-effecting
|
||||
/// eclasses, and recursively generate their arg eclasses, then
|
||||
/// emit the ops themselves.
|
||||
///
|
||||
/// To use an eclass in a given block, we first look it up in the
|
||||
/// scoped map, and get the Value if already present. If not, we
|
||||
/// need to generate it. We emit the extracted enode for this
|
||||
/// eclass after recursively generating its args. Eclasses are
|
||||
/// thus computed "as late as possible", but then memoized into
|
||||
/// the Id-to-Value map and available to all dominated blocks and
|
||||
/// for the rest of this block. (This subsumes GVN.)
|
||||
pub fn elaborate(&mut self, func: &mut Function) {
|
||||
let mut elab = Elaborator::new(
|
||||
func,
|
||||
self.domtree,
|
||||
self.loop_analysis,
|
||||
&self.egraph,
|
||||
&self.node_ctx,
|
||||
&self.remat_ids,
|
||||
&mut self.stats,
|
||||
);
|
||||
elab.elaborate(
|
||||
|block| {
|
||||
let blockparam_range = self.blockparams[block].clone();
|
||||
&self.blockparam_ids_tys
|
||||
[blockparam_range.start as usize..blockparam_range.end as usize]
|
||||
},
|
||||
|block| {
|
||||
let side_effect_range = self.side_effects[block].clone();
|
||||
&self.side_effect_ids
|
||||
[side_effect_range.start as usize..side_effect_range.end as usize]
|
||||
},
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
/// State for egraph analysis that computes all needed properties.
|
||||
pub(crate) struct Analysis;
|
||||
|
||||
/// Analysis results for each eclass id.
|
||||
#[derive(Clone, Debug)]
|
||||
pub(crate) struct AnalysisValue {
|
||||
pub(crate) loop_level: LoopLevel,
|
||||
}
|
||||
|
||||
impl Default for AnalysisValue {
|
||||
fn default() -> Self {
|
||||
Self {
|
||||
loop_level: LoopLevel::root(),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl cranelift_egraph::Analysis for Analysis {
|
||||
type L = NodeCtx;
|
||||
type Value = AnalysisValue;
|
||||
|
||||
fn for_node(
|
||||
&self,
|
||||
ctx: &NodeCtx,
|
||||
n: &Node,
|
||||
values: &SecondaryMap<Id, AnalysisValue>,
|
||||
) -> AnalysisValue {
|
||||
let loop_level = match n {
|
||||
&Node::Pure { ref args, .. } => args
|
||||
.as_slice(&ctx.args)
|
||||
.iter()
|
||||
.map(|&arg| values[arg].loop_level)
|
||||
.max()
|
||||
.unwrap_or(LoopLevel::root()),
|
||||
&Node::Load { addr, .. } => values[addr].loop_level,
|
||||
&Node::Result { value, .. } => values[value].loop_level,
|
||||
&Node::Inst { loop_level, .. } | &Node::Param { loop_level, .. } => loop_level,
|
||||
};
|
||||
|
||||
AnalysisValue { loop_level }
|
||||
}
|
||||
|
||||
fn meet(&self, _ctx: &NodeCtx, v1: &AnalysisValue, v2: &AnalysisValue) -> AnalysisValue {
|
||||
AnalysisValue {
|
||||
loop_level: std::cmp::max(v1.loop_level, v2.loop_level),
|
||||
}
|
||||
}
|
||||
}
|
||||
Reference in New Issue
Block a user