wasmtime/cranelift/codegen/src/context.rs
Chris Fallin f980defe17 egraph support: rewrite to work in terms of CLIF data structures. (#5382)
* egraph support: rewrite to work in terms of CLIF data structures.

This work rewrites the "egraph"-based optimization framework in
Cranelift to operate on aegraphs (acyclic egraphs) represented in the
CLIF itself rather than as a separate data structure to which and from
which we translate the CLIF.

The basic idea is to add a new kind of value, a "union", that is like an
alias but refers to two other values rather than one.  This allows us to
represent an eclass of enodes (values) as a tree. The union node allows
for a value to have *multiple representations*: either constituent value
could be used, and (in well-formed CLIF produced by correct
optimization rules) they must be equivalent.
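
As an illustration only (a simplified sketch, not the actual `DataFlowGraph`
encoding), a value definition with such a union variant might look like:

    #[derive(Clone, Copy)]
    struct Value(u32);
    #[derive(Clone, Copy)]
    struct Inst(u32);
    #[derive(Clone, Copy)]
    struct Block(u32);

    enum ValueDef {
        // The i'th result of an instruction.
        Result(Inst, usize),
        // The i'th parameter of a block.
        Param(Block, usize),
        // New: a union of two equivalent values. An eclass is then a binary
        // tree of unions over its enodes, and either side may be chosen when
        // code is finally emitted.
        Union(Value, Value),
    }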

Like the old egraph infrastructure, we take advantage of acyclicity and
eager rule application to do optimization in a single pass. Like before,
we integrate GVN (during the optimization pass) and LICM (during
elaboration).

Unlike the old egraph infrastructure, everything stays in the
DataFlowGraph. "Pure" enodes are represented as instructions that have
values attached, but that are not placed into the function layout. When
entering "egraph" form, we remove them from the layout while optimizing.
When leaving "egraph" form, during elaboration, we can place an
instruction back into the layout the first time we elaborate the enode;
if we elaborate it more than once, we clone the instruction.

The implementation performs two passes overall (a toy sketch of this two-pass
shape follows the list):

- One, a forward pass in RPO (to see defs before uses), that (i) removes
  "pure" instructions from the layout and (ii) optimizes as it goes. As
  before, we eagerly optimize, so we form the entire union of optimized
  forms of a value before we see any uses of that value. This lets us
  rewrite uses to use the most "up-to-date" form of the value and
  canonicalize and optimize that form.

  The eager rewriting and acyclic representation make each other work
  (we could not eagerly rewrite if there were cycles; and acyclicity
  does not cost us optimization opportunities only because, the first
  time we introduce a value, we immediately produce its "best" form).
  This design choice is also what allows us to avoid the "parent
  pointers" and fixpoint loop of traditional egraphs.

  This forward optimization pass keeps a scoped hashmap to "intern"
  nodes (thus performing GVN), and also interleaves on a per-instruction
  level with alias analysis. The interleaving with alias analysis allows
  alias analysis to see the most optimized form of each address (so it
  can see equivalences), and allows the next value to see any
  equivalences (reuses of loads or stored values) that alias analysis
  uncovers.

- Two, a forward pass in domtree preorder, that "elaborates" pure enodes
  back into the layout, possibly in multiple places if needed. This
  tracks the loop nest and hoists nodes as needed, performing LICM as it
  goes. Note that by doing this in forward order, we avoid the
  "fixpoint" that traditional LICM needs: we hoist a def before its
  uses, so when we place a node, we place it in the right place the
  first time rather than moving later.
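
A toy, self-contained sketch of this two-pass shape (all types and names here
are invented for illustration; none of this is real CLIF/Cranelift code): a
forward pass rewrites and interns pure nodes, then an "elaboration" pass
materializes each canonical node once, operands first. Union values, LICM, and
alias analysis are omitted.

    use std::collections::HashMap;

    #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
    enum Node {
        Const(i64),
        Add(usize, usize), // operands are indices of earlier nodes (acyclic)
    }

    // "Elaboration": place a canonical node the first time it is needed,
    // operands first. (The real pass walks the domtree in preorder, hoists
    // loop-invariant nodes, and clones a node needed in several blocks.)
    fn elaborate(rep: usize, rewritten: &[Node], placed: &mut Vec<usize>) {
        if placed.contains(&rep) {
            return;
        }
        if let Node::Add(a, b) = rewritten[rep] {
            elaborate(a, rewritten, placed);
            elaborate(b, rewritten, placed);
        }
        placed.push(rep);
    }

    fn main() {
        // Pure nodes in def-before-use order, plus the roots we must keep.
        let program = vec![
            Node::Const(2),
            Node::Const(2),
            Node::Add(0, 1),
            Node::Add(1, 0), // same value as node 2 once canonicalized
        ];
        let roots = [2usize, 3usize];

        // Pass 1 (forward, defs before uses): rewrite each node's operands to
        // their latest canonical forms, apply rules eagerly, and intern the
        // result. The interning map plays the role of GVN (the real pass uses
        // a *scoped* map keyed on dominance).
        let mut canon: Vec<usize> = Vec::new();
        let mut rewritten: Vec<Node> = Vec::new();
        let mut interned: HashMap<Node, usize> = HashMap::new();
        for (i, node) in program.iter().enumerate() {
            let mut n = match *node {
                Node::Const(c) => Node::Const(c),
                Node::Add(a, b) => Node::Add(canon[a], canon[b]),
            };
            // A trivial rewrite rule: canonicalize commutative adds.
            if let Node::Add(a, b) = n {
                if a > b {
                    n = Node::Add(b, a);
                }
            }
            let rep = *interned.entry(n).or_insert(i);
            canon.push(rep);
            rewritten.push(n);
        }

        // Pass 2: elaborate from the roots.
        let mut placed: Vec<usize> = Vec::new();
        for &root in &roots {
            elaborate(canon[root], &rewritten, &mut placed);
        }
        println!("canonical representatives: {:?}", canon); // [0, 0, 2, 2]
        println!("materialized nodes: {:?}", placed);       // [0, 2]
    }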

This PR replaces the old (a)egraph implementation. It removes both the
cranelift-egraph crate and the logic in cranelift-codegen that uses it.

On `spidermonkey.wasm` running a simple recursive Fibonacci
microbenchmark, this work shows 5.5% compile-time reduction and 7.7%
runtime improvement (speedup).

Most of this implementation was done in (very productive) pair
programming sessions with Jamey Sharp, thus:

Co-authored-by: Jamey Sharp <jsharp@fastly.com>

* Review feedback.

* Review feedback.

* Review feedback.

* Bugfix: cprop rule: `(x + k1) - k2` becomes `x - (k2 - k1)`, not `x - (k1 - k2)`.
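
  (Check: `(x + k1) - k2 = x + (k1 - k2) = x - (k2 - k1)`, whereas the old form
  `x - (k1 - k2)` expands to `x - k1 + k2`, which is wrong.)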

Co-authored-by: Jamey Sharp <jsharp@fastly.com>
2022-12-06 14:58:57 -08:00

//! Cranelift compilation context and main entry point.
//!
//! When compiling many small functions, it is important to avoid repeatedly allocating and
//! deallocating the data structures needed for compilation. The `Context` struct is used to hold
//! on to memory allocations between function compilations.
//!
//! The context does not hold a `TargetIsa` instance, which has to be provided as an argument
//! instead. This is because an ISA instance is immutable and can be used by multiple compilation
//! contexts concurrently. Typically, you would have one context per compilation thread and only a
//! single ISA instance.
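//!
//! A minimal usage sketch (hypothetical caller code; the `isa` and `func` bindings are assumed
//! to have been created elsewhere, e.g. an ISA built via the `isa::lookup` machinery):
//!
//! ```ignore
//! let mut ctx = Context::for_function(func);
//! let mut code: Vec<u8> = Vec::new();
//! let compiled = ctx.compile_and_emit(&*isa, &mut code)?;
//! // Relocations, disassembly, etc. are available from the returned `CompiledCode`.
//! // Reuse the same context (and its allocations) for the next function:
//! ctx.clear();
//! ```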
use crate::alias_analysis::AliasAnalysis;
use crate::dce::do_dce;
use crate::dominator_tree::DominatorTree;
use crate::egraph::EgraphPass;
use crate::flowgraph::ControlFlowGraph;
use crate::ir::Function;
use crate::isa::TargetIsa;
use crate::legalizer::simple_legalize;
use crate::licm::do_licm;
use crate::loop_analysis::LoopAnalysis;
use crate::machinst::{CompiledCode, CompiledCodeStencil};
use crate::nan_canonicalization::do_nan_canonicalization;
use crate::remove_constant_phis::do_remove_constant_phis;
use crate::result::{CodegenResult, CompileResult};
use crate::settings::{FlagsOrIsa, OptLevel};
use crate::simple_gvn::do_simple_gvn;
use crate::simple_preopt::do_preopt;
use crate::trace;
use crate::unreachable_code::eliminate_unreachable_code;
use crate::verifier::{verify_context, VerifierErrors, VerifierResult};
use crate::{timing, CompileError};
#[cfg(feature = "souper-harvest")]
use alloc::string::String;
use alloc::vec::Vec;
#[cfg(feature = "souper-harvest")]
use crate::souper_harvest::do_souper_harvest;

/// Persistent data structures and compilation pipeline.
pub struct Context {
    /// The function we're compiling.
    pub func: Function,

    /// The control flow graph of `func`.
    pub cfg: ControlFlowGraph,

    /// Dominator tree for `func`.
    pub domtree: DominatorTree,

    /// Loop analysis of `func`.
    pub loop_analysis: LoopAnalysis,

    /// Result of MachBackend compilation, if computed.
    pub(crate) compiled_code: Option<CompiledCode>,

    /// Flag: do we want a disassembly with the CompiledCode?
    pub want_disasm: bool,
}

impl Context {
    /// Allocate a new compilation context.
    ///
    /// The returned instance should be reused for compiling multiple functions in order to avoid
    /// needless allocator thrashing.
    pub fn new() -> Self {
        Self::for_function(Function::new())
    }

    /// Allocate a new compilation context with an existing Function.
    ///
    /// The returned instance should be reused for compiling multiple functions in order to avoid
    /// needless allocator thrashing.
    pub fn for_function(func: Function) -> Self {
        Self {
            func,
            cfg: ControlFlowGraph::new(),
            domtree: DominatorTree::new(),
            loop_analysis: LoopAnalysis::new(),
            compiled_code: None,
            want_disasm: false,
        }
    }

    /// Clear all data structures in this context.
    pub fn clear(&mut self) {
        self.func.clear();
        self.cfg.clear();
        self.domtree.clear();
        self.loop_analysis.clear();
        self.compiled_code = None;
        self.want_disasm = false;
    }

    /// Returns the compilation result for this function, available after any `compile` function
    /// has been called.
    pub fn compiled_code(&self) -> Option<&CompiledCode> {
        self.compiled_code.as_ref()
    }

    /// Set the flag to request a disassembly when compiling with a
    /// `MachBackend` backend.
    pub fn set_disasm(&mut self, val: bool) {
        self.want_disasm = val;
    }

    /// Compile the function, and emit machine code into a `Vec<u8>`.
    ///
    /// Run the function through all the passes necessary to generate
    /// code for the target ISA represented by `isa`, as well as the
    /// final step of emitting machine code into a `Vec<u8>`. The
    /// machine code is not relocated. Instead, any relocations can be
    /// obtained from `compiled_code()`.
    ///
    /// Performs any optimizations that are enabled, unless
    /// `optimize()` was already invoked.
    ///
    /// This function calls `compile`, taking care to resize `mem` as
    /// needed.
    ///
    /// Returns information about the function's code and read-only
    /// data.
    pub fn compile_and_emit(
        &mut self,
        isa: &dyn TargetIsa,
        mem: &mut Vec<u8>,
    ) -> CompileResult<&CompiledCode> {
        let compiled_code = self.compile(isa)?;
        mem.extend_from_slice(compiled_code.code_buffer());
        Ok(compiled_code)
    }

    /// Internally compiles the function into a stencil.
    ///
    /// Public only for testing and fuzzing purposes.
    pub fn compile_stencil(&mut self, isa: &dyn TargetIsa) -> CodegenResult<CompiledCodeStencil> {
        let _tt = timing::compile();
        self.verify_if(isa)?;
        self.optimize(isa)?;
        isa.compile_function(&self.func, self.want_disasm)
    }

    /// Optimize the function, performing all compilation steps up to
    /// but not including machine-code lowering and register
    /// allocation.
    ///
    /// Public only for testing purposes.
    pub fn optimize(&mut self, isa: &dyn TargetIsa) -> CodegenResult<()> {
        log::debug!(
            "Number of CLIF instructions to optimize: {}",
            self.func.dfg.num_insts()
        );
        log::debug!(
            "Number of CLIF blocks to optimize: {}",
            self.func.dfg.num_blocks()
        );
        let opt_level = isa.flags().opt_level();
        crate::trace!(
            "Optimizing (opt level {:?}):\n{}",
            opt_level,
            self.func.display()
        );
        self.compute_cfg();
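        // The pre-legalization rewrites here (and the separate LICM/GVN passes below)
        // only run in the legacy, non-egraph pipeline.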
        if !isa.flags().use_egraphs() && opt_level != OptLevel::None {
            self.preopt(isa)?;
        }
        if isa.flags().enable_nan_canonicalization() {
            self.canonicalize_nans(isa)?;
        }
        self.legalize(isa)?;
        if !isa.flags().use_egraphs() && opt_level != OptLevel::None {
            self.compute_domtree();
            self.compute_loop_analysis();
            self.licm(isa)?;
            self.simple_gvn(isa)?;
        }
        self.compute_domtree();
        self.eliminate_unreachable_code(isa)?;
        if isa.flags().use_egraphs() || opt_level != OptLevel::None {
            self.dce(isa)?;
        }
        self.remove_constant_phis(isa)?;
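        // Either run the unified egraph pass (which performs GVN, constant-propagation
        // rewrites, alias analysis, and LICM during elaboration), or fall back to the
        // remaining legacy passes.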
        if isa.flags().use_egraphs() {
            self.egraph_pass()?;
        } else if opt_level != OptLevel::None && isa.flags().enable_alias_analysis() {
            self.replace_redundant_loads()?;
            self.simple_gvn(isa)?;
        }
        Ok(())
    }

    /// Compile the function.
    ///
    /// Run the function through all the passes necessary to generate code for the target ISA
    /// represented by `isa`. This does not include the final step of emitting machine code into a
    /// code sink.
    ///
    /// Returns information about the function's code and read-only data.
    pub fn compile(&mut self, isa: &dyn TargetIsa) -> CompileResult<&CompiledCode> {
        let _tt = timing::compile();
        let stencil = self.compile_stencil(isa).map_err(|error| CompileError {
            inner: error,
            func: &self.func,
        })?;
        Ok(self
            .compiled_code
            .insert(stencil.apply_params(&self.func.params)))
    }

    /// If available, return information about the code layout in the
    /// final machine code: the offsets (in bytes) of each basic-block
    /// start, and all basic-block edges.
    #[deprecated = "use CompiledCode::get_code_bb_layout"]
    pub fn get_code_bb_layout(&self) -> Option<(Vec<usize>, Vec<(usize, usize)>)> {
        self.compiled_code().map(CompiledCode::get_code_bb_layout)
    }

    /// Creates unwind information for the function.
    ///
    /// Returns `None` if the function has no unwind information.
    #[cfg(feature = "unwind")]
    #[deprecated = "use CompiledCode::create_unwind_info"]
    pub fn create_unwind_info(
        &self,
        isa: &dyn TargetIsa,
    ) -> CodegenResult<Option<crate::isa::unwind::UnwindInfo>> {
        self.compiled_code().unwrap().create_unwind_info(isa)
    }

    /// Run the verifier on the function.
    ///
    /// Also check that the dominator tree and control flow graph are consistent with the function.
    pub fn verify<'a, FOI: Into<FlagsOrIsa<'a>>>(&self, fisa: FOI) -> VerifierResult<()> {
        let mut errors = VerifierErrors::default();
        let _ = verify_context(&self.func, &self.cfg, &self.domtree, fisa, &mut errors);
        if errors.is_empty() {
            Ok(())
        } else {
            Err(errors)
        }
    }

    /// Run the verifier only if the `enable_verifier` setting is true.
    pub fn verify_if<'a, FOI: Into<FlagsOrIsa<'a>>>(&self, fisa: FOI) -> CodegenResult<()> {
        let fisa = fisa.into();
        if fisa.flags.enable_verifier() {
            self.verify(fisa)?;
        }
        Ok(())
    }

    /// Perform dead-code elimination on the function.
    pub fn dce<'a, FOI: Into<FlagsOrIsa<'a>>>(&mut self, fisa: FOI) -> CodegenResult<()> {
        do_dce(&mut self.func, &mut self.domtree);
        self.verify_if(fisa)?;
        Ok(())
    }

    /// Perform constant-phi removal on the function.
    pub fn remove_constant_phis<'a, FOI: Into<FlagsOrIsa<'a>>>(
        &mut self,
        fisa: FOI,
    ) -> CodegenResult<()> {
        do_remove_constant_phis(&mut self.func, &mut self.domtree);
        self.verify_if(fisa)?;
        Ok(())
    }

    /// Perform pre-legalization rewrites on the function.
    pub fn preopt(&mut self, isa: &dyn TargetIsa) -> CodegenResult<()> {
        do_preopt(&mut self.func, &mut self.cfg, isa);
        self.verify_if(isa)?;
        Ok(())
    }

    /// Perform NaN canonicalizing rewrites on the function.
    pub fn canonicalize_nans(&mut self, isa: &dyn TargetIsa) -> CodegenResult<()> {
        do_nan_canonicalization(&mut self.func);
        self.verify_if(isa)
    }

    /// Run the legalizer for `isa` on the function.
    pub fn legalize(&mut self, isa: &dyn TargetIsa) -> CodegenResult<()> {
        // Legalization invalidates the domtree and loop_analysis by mutating the CFG.
        // TODO: Avoid doing this when legalization doesn't actually mutate the CFG.
        self.domtree.clear();
        self.loop_analysis.clear();
        // Run some specific legalizations only.
        simple_legalize(&mut self.func, &mut self.cfg, isa);
        self.verify_if(isa)
    }

    /// Compute the control flow graph.
    pub fn compute_cfg(&mut self) {
        self.cfg.compute(&self.func)
    }

    /// Compute dominator tree.
    pub fn compute_domtree(&mut self) {
        self.domtree.compute(&self.func, &self.cfg)
    }

    /// Compute the loop analysis.
    pub fn compute_loop_analysis(&mut self) {
        self.loop_analysis
            .compute(&self.func, &self.cfg, &self.domtree)
    }

    /// Compute the control flow graph and dominator tree.
    pub fn flowgraph(&mut self) {
        self.compute_cfg();
        self.compute_domtree()
    }

    /// Perform simple GVN on the function.
    pub fn simple_gvn<'a, FOI: Into<FlagsOrIsa<'a>>>(&mut self, fisa: FOI) -> CodegenResult<()> {
        do_simple_gvn(&mut self.func, &mut self.domtree);
        self.verify_if(fisa)
    }

    /// Perform LICM on the function.
    pub fn licm(&mut self, isa: &dyn TargetIsa) -> CodegenResult<()> {
        do_licm(
            &mut self.func,
            &mut self.cfg,
            &mut self.domtree,
            &mut self.loop_analysis,
        );
        self.verify_if(isa)
    }

    /// Perform unreachable code elimination.
    pub fn eliminate_unreachable_code<'a, FOI>(&mut self, fisa: FOI) -> CodegenResult<()>
    where
        FOI: Into<FlagsOrIsa<'a>>,
    {
        eliminate_unreachable_code(&mut self.func, &mut self.cfg, &self.domtree);
        self.verify_if(fisa)
    }

    /// Replace all redundant loads with the known values in
    /// memory. These are loads whose values were already loaded by
    /// other loads earlier, as well as loads whose values were stored
    /// by an earlier store instruction to the same address (so-called
    /// "store-to-load forwarding").
    pub fn replace_redundant_loads(&mut self) -> CodegenResult<()> {
        let mut analysis = AliasAnalysis::new(&self.func, &self.domtree);
        analysis.compute_and_update_aliases(&mut self.func);
        Ok(())
    }

    /// Harvest candidate left-hand sides for superoptimization with Souper.
    #[cfg(feature = "souper-harvest")]
    pub fn souper_harvest(
        &mut self,
        out: &mut std::sync::mpsc::Sender<String>,
    ) -> CodegenResult<()> {
        do_souper_harvest(&self.func, out);
        Ok(())
    }

    /// Run optimizations via the egraph infrastructure.
    pub fn egraph_pass(&mut self) -> CodegenResult<()> {
        trace!(
            "About to optimize with egraph phase:\n{}",
            self.func.display()
        );
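        // Loop analysis feeds LICM during elaboration, and alias analysis is handed to
        // the egraph pass so that redundant-load removal is interleaved with rewriting.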
        self.compute_loop_analysis();
        let mut alias_analysis = AliasAnalysis::new(&self.func, &self.domtree);
        let mut pass = EgraphPass::new(
            &mut self.func,
            &self.domtree,
            &self.loop_analysis,
            &mut alias_analysis,
        );
        pass.run();
        log::info!("egraph stats: {:?}", pass.stats);
        trace!("After egraph optimization:\n{}", self.func.display());
        Ok(())
    }
}