wasmtime/cranelift/codegen/src/lib.rs
Chris Fallin f980defe17 egraph support: rewrite to work in terms of CLIF data structures. (#5382)
* egraph support: rewrite to work in terms of CLIF data structures.

This work rewrites the "egraph"-based optimization framework in
Cranelift to operate on aegraphs (acyclic egraphs) represented in the
CLIF itself rather than as a separate data structure that we translate
the CLIF to and from.

The basic idea is to add a new kind of value, a "union", that is like an
alias but refers to two other values rather than one.  This allows us to
represent an eclass of enodes (values) as a tree. The union node allows
for a value to have *multiple representations*: either constituent value
could be used, and (in well-formed CLIF produced by correct
optimization rules) they must be equivalent.
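
To make the tree-of-unions idea concrete, here is a minimal sketch. The
`Value`, `ValueDef`, `cheapest`, and `example_defs` names, the flat `defs`
table, and the per-node `cost` field are all assumptions for illustration
and are not Cranelift's actual definitions. It only shows how one value can
stand for several equivalent forms and how a consumer could pick one of them.

```rust
/// Hypothetical value handle: an index into the `defs` table below.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Value(pub u32);

/// Hypothetical, simplified value definitions.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ValueDef {
    /// The result of a pure instruction (an enode), with an illustrative cost.
    Result { op: &'static str, cost: u32 },
    /// Either constituent may be used; both must be equivalent.
    Union(Value, Value),
}

/// Walk the union tree rooted at `v` and return the cheapest enode found.
/// On a tie, the left (earlier) constituent wins.
pub fn cheapest(defs: &[ValueDef], v: Value) -> (Value, u32) {
    match &defs[v.0 as usize] {
        ValueDef::Result { cost, .. } => (v, *cost),
        ValueDef::Union(a, b) => {
            let (va, ca) = cheapest(defs, *a);
            let (vb, cb) = cheapest(defs, *b);
            if cb < ca { (vb, cb) } else { (va, ca) }
        }
    }
}

/// Three equivalent forms of "x times two", joined by two union nodes.
pub fn example_defs() -> Vec<ValueDef> {
    vec![
        ValueDef::Result { op: "imul x, 2", cost: 3 }, // v0
        ValueDef::Result { op: "ishl x, 1", cost: 1 }, // v1
        ValueDef::Union(Value(0), Value(1)),           // v2 = v0 | v1
        ValueDef::Result { op: "iadd x, x", cost: 1 }, // v3
        ValueDef::Union(Value(2), Value(3)),           // v4 = v2 | v3
    ]
}
```

Here `v4` is the root of an eclass with three members. Because unions only
refer to values defined earlier, the structure stays acyclic and the walk
always terminates.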

Like the old egraph infrastructure, we take advantage of acyclicity and
eager rule application to do optimization in a single pass. Like before,
we integrate GVN (during the optimization pass) and LICM (during
elaboration).

Unlike the old egraph infrastructure, everything stays in the
DataFlowGraph. "Pure" enodes are represented as instructions that have
values attached, but that are not placed into the function layout. When
entering "egraph" form, we remove them from the layout while optimizing.
When leaving "egraph" form, during elaboration, we can place an
instruction back into the layout the first time we elaborate the enode;
if we elaborate it more than once, we clone the instruction.

The implementation performs two passes overall:

- One, a forward pass in RPO (to see defs before uses), that (i) removes
  "pure" instructions from the layout and (ii) optimizes as it goes. As
  before, we eagerly optimize, so we form the entire union of optimized
  forms of a value before we see any uses of that value. This lets us
  rewrite uses to use the most "up-to-date" form of the value and
  canonicalize and optimize that form.

  Eager rewriting and the acyclic representation depend on each other:
  we could not eagerly rewrite if there were cycles, and acyclicity
  misses no optimization opportunities only because we produce a
  value's "best" form immediately, the first time we introduce it. This
  design choice is also what allows us to avoid the "parent pointers"
  and fixpoint loop of traditional egraphs.

  This forward optimization pass keeps a scoped hashmap to "intern"
  nodes (thus performing GVN), and also interleaves on a per-instruction
  level with alias analysis. The interleaving with alias analysis allows
  alias analysis to see the most optimized form of each address (so it
  can see equivalences), and allows the next value to see any
  equivalences (reuses of loads or stored values) that alias analysis
  uncovers.

- Two, a forward pass in domtree preorder, that "elaborates" pure enodes
  back into the layout, possibly in multiple places if needed. This
  tracks the loop nest and hoists nodes as needed, performing LICM as it
  goes. Note that by doing this in forward order, we avoid the
  "fixpoint" that traditional LICM needs: we hoist a def before its
  uses, so when we place a node, we place it in the right place the
  first time rather than moving later.
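
The interning step of the first pass can be sketched as follows. This is a
simplified illustration, not the `scoped_hash_map` module in this crate:
`NodeKey`, `ScopedMap`, and the use of plain `u32` value numbers are
hypothetical, and the sketch assumes that scopes are pushed and popped so
that an entry is visible only where its definition dominates.

```rust
use std::collections::HashMap;

/// Hypothetical key for a pure node: an opcode plus argument value numbers.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct NodeKey {
    pub op: &'static str,
    pub args: Vec<u32>,
}

/// A stack of maps, one level per open scope.
pub struct ScopedMap {
    levels: Vec<HashMap<NodeKey, u32>>,
}

impl ScopedMap {
    pub fn new() -> Self {
        Self { levels: vec![HashMap::new()] }
    }

    /// Enter a nested scope; entries added from now on vanish at `pop_scope`.
    pub fn push_scope(&mut self) {
        self.levels.push(HashMap::new());
    }

    /// Leave the innermost scope (the outermost scope is never popped).
    pub fn pop_scope(&mut self) {
        if self.levels.len() > 1 {
            self.levels.pop();
        }
    }

    /// Return the value already recorded for `key` in any enclosing scope,
    /// or record `fresh` in the innermost scope and return it.
    pub fn intern(&mut self, key: NodeKey, fresh: u32) -> u32 {
        for level in self.levels.iter().rev() {
            if let Some(&v) = level.get(&key) {
                return v;
            }
        }
        self.levels
            .last_mut()
            .expect("at least one scope is always open")
            .insert(key, fresh);
        fresh
    }
}
```

Interning the same key twice returns the first value, which is the GVN
effect described above; a key first seen in an inner scope is forgotten
once that scope is popped.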

This PR replaces the old (a)egraph implementation. It removes both the
cranelift-egraph crate and the logic in cranelift-codegen that uses it.

On `spidermonkey.wasm` running a simple recursive Fibonacci
microbenchmark, this work shows a 5.5% compile-time reduction and a 7.7%
runtime improvement (speedup).

Most of this implementation was done in (very productive) pair
programming sessions with Jamey Sharp, thus:

Co-authored-by: Jamey Sharp <jsharp@fastly.com>

* Review feedback.

* Review feedback.

* Review feedback.

* Bugfix: cprop rule: `(x + k1) - k2` becomes `x - (k2 - k1)`, not `x - (k1 - k2)`.
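
The sign in that bugfix can be checked with a small standalone sketch. The
function names are made up for illustration, and wrapping 64-bit arithmetic
is assumed here as a stand-in for CLIF integer semantics.

```rust
/// The expression being rewritten: `(x + k1) - k2`.
pub fn original(x: u64, k1: u64, k2: u64) -> u64 {
    x.wrapping_add(k1).wrapping_sub(k2)
}

/// The corrected rewrite: `x - (k2 - k1)`.
pub fn fixed(x: u64, k1: u64, k2: u64) -> u64 {
    x.wrapping_sub(k2.wrapping_sub(k1))
}

/// The incorrect rewrite: `x - (k1 - k2)`.
pub fn buggy(x: u64, k1: u64, k2: u64) -> u64 {
    x.wrapping_sub(k1.wrapping_sub(k2))
}
```

With `x = 10`, `k1 = 3`, `k2 = 5`, both `original` and `fixed` give 8,
while `buggy` gives 12.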

Co-authored-by: Jamey Sharp <jsharp@fastly.com>
2022-12-06 14:58:57 -08:00


//! Cranelift code generation library.
#![deny(missing_docs, trivial_numeric_casts, unused_extern_crates)]
#![warn(unused_import_braces)]
#![cfg_attr(feature = "std", deny(unstable_features))]
#![cfg_attr(feature = "clippy", plugin(clippy(conf_file = "../../clippy.toml")))]
#![cfg_attr(feature="cargo-clippy", allow(
    // Produces only a false positive:
    clippy::while_let_loop,
    // Produces many false positives, but did produce some valid lints, now fixed:
    clippy::needless_lifetimes,
    // Generated code makes some style transgressions, but readability doesn't suffer much:
    clippy::many_single_char_names,
    clippy::identity_op,
    clippy::needless_borrow,
    clippy::cast_lossless,
    clippy::unreadable_literal,
    clippy::assign_op_pattern,
    clippy::empty_line_after_outer_attr,
    // Hard to avoid in generated code:
    clippy::cognitive_complexity,
    clippy::too_many_arguments,
    // Code generator doesn't have a way to collapse identical arms:
    clippy::match_same_arms,
    // These are relatively minor style issues, but would be easy to fix:
    clippy::new_without_default,
    clippy::should_implement_trait,
    clippy::len_without_is_empty))]
#![cfg_attr(
    feature = "cargo-clippy",
    warn(
        clippy::float_arithmetic,
        clippy::mut_mut,
        clippy::nonminimal_bool,
        clippy::map_unwrap_or,
        clippy::unicode_not_nfc,
        clippy::use_self
    )
)]
#![no_std]
// Various bits and pieces of this crate might only be used for one platform or
// another, but it's not really too useful to learn about that all the time. On
// CI we build at least one version of this crate with `--features all-arch`
// which means we'll always detect truly dead code, otherwise if this is only
// built for one platform we don't have to worry too much about trimming
// everything down.
#![cfg_attr(not(feature = "all-arch"), allow(dead_code))]
#[allow(unused_imports)] // #[macro_use] is required for no_std
#[macro_use]
extern crate alloc;
#[cfg(feature = "std")]
#[macro_use]
extern crate std;
#[cfg(not(feature = "std"))]
use hashbrown::{hash_map, HashMap, HashSet};
#[cfg(feature = "std")]
use std::collections::{hash_map, HashMap, HashSet};
pub use crate::context::Context;
pub use crate::value_label::{ValueLabelsRanges, ValueLocRange};
pub use crate::verifier::verify_function;
pub use crate::write::write_function;
pub use cranelift_bforest as bforest;
pub use cranelift_entity as entity;
#[cfg(feature = "unwind")]
pub use gimli;
#[macro_use]
mod machinst;
pub mod binemit;
pub mod cfg_printer;
pub mod cursor;
pub mod data_value;
pub mod dbg;
pub mod dominator_tree;
pub mod flowgraph;
pub mod ir;
pub mod isa;
pub mod loop_analysis;
pub mod print_errors;
pub mod settings;
pub mod timing;
pub mod verifier;
pub mod write;
pub use crate::entity::packed_option;
pub use crate::machinst::buffer::{MachCallSite, MachReloc, MachSrcLoc, MachStackMap, MachTrap};
pub use crate::machinst::{CompiledCode, TextSectionBuilder};
mod alias_analysis;
mod bitset;
mod constant_hash;
mod context;
mod ctxhash;
mod dce;
mod divconst_magic_numbers;
mod egraph;
mod fx;
mod inst_predicates;
mod isle_prelude;
mod iterators;
mod legalizer;
mod licm;
mod nan_canonicalization;
mod opts;
mod remove_constant_phis;
mod result;
mod scoped_hash_map;
mod simple_gvn;
mod simple_preopt;
mod unionfind;
mod unreachable_code;
mod value_label;
#[cfg(feature = "souper-harvest")]
mod souper_harvest;
pub use crate::result::{CodegenError, CodegenResult, CompileError};
#[cfg(feature = "incremental-cache")]
pub mod incremental_cache;
/// Even when trace logging is disabled, the trace macro has a significant performance cost so we
/// disable it by default.
#[macro_export]
macro_rules! trace {
    ($($tt:tt)*) => {
        if cfg!(feature = "trace-log") {
            ::log::trace!($($tt)*);
        }
    };
}
include!(concat!(env!("OUT_DIR"), "/version.rs"));