This patch includes:
- A complete rework of the way that CLIF blocks and edge blocks are
lowered into VCode blocks. The new mechanism in `BlockLoweringOrder`
computes reverse postorder (RPO) over the CFG, but with a twist: it merges edge blocks into
heads or tails of original CLIF blocks wherever possible, and it does
this without ever actually materializing the full nodes-plus-edges
graph first. The backend driver lowers blocks in final order, so
there's no need to reshuffle them later.
- A new `MachBuffer` that replaces the `MachSection`. This is a special
version of a code-sink that is far more than a humble `Vec<u8>`. In
particular, it keeps a record of label definitions and label uses,
with a machine-pluggable `LabelUse` trait that defines various types
of fixups (basically internal relocations).
Importantly, it implements some simple peephole-style branch rewrites
*inline in the emission pass*, without any separate traversals over
the code to use fallthroughs, swap taken/not-taken arms, etc. It
tracks branches at the tail of the buffer and can (i) remove blocks
that are just unconditional branches (by redirecting the label), (ii)
recognize a conditional/unconditional branch pair and swap the conditional
polarity when it's helpful, and (iii) remove branches that target
the fallthrough PC.
The `MachBuffer` also implements branch-island support. On
architectures like AArch64, this is needed because conditional
branches have a limited range (+/- 1MB on AArch64 specifically)
that large functions can plausibly exceed. It also does this inline
while streaming through the
emission, without any sort of fixpoint algorithm or later moving of
code, by simply tracking outstanding references and "deadlines" and
emitting an island just-in-time when we're in danger of going out of
range.
- A rework of the instruction selector driver. This is largely following
the same algorithm as before, but is cleaned up significantly, in
particular in the API: the machine backend can ask for an input argument
in any of three forms (constant, register, or producing
instruction), indicating whether it needs the value in a register or can merge the
constant or producing instruction as appropriate. This new driver
takes special care to emit constants right at use-sites (and at phi
inputs), minimizing their live-ranges, and also special-cases the
"pinned register" to avoid superfluous moves.
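For readers unfamiliar with RPO: the first item says `BlockLoweringOrder` computes reverse postorder over the CFG. A minimal sketch of plain RPO over a CFG given as successor lists follows. It is not Cranelift's code, the function name `reverse_postorder` is hypothetical, and it deliberately omits the edge-block merging that the item describes.

```rust
/// Computes reverse postorder (RPO) over a CFG given as successor lists.
/// Blocks are plain indices; blocks unreachable from `entry` are omitted.
/// Hypothetical helper, not Cranelift's `BlockLoweringOrder`.
fn reverse_postorder(succs: &[Vec<usize>], entry: usize) -> Vec<usize> {
    let mut visited = vec![false; succs.len()];
    let mut postorder = Vec::with_capacity(succs.len());
    // Explicit DFS stack of (block, index of the next successor to visit),
    // so deep CFGs cannot overflow the call stack.
    let mut stack = vec![(entry, 0usize)];
    visited[entry] = true;
    while let Some(top) = stack.last_mut() {
        let block = top.0;
        if let Some(&succ) = succs[block].get(top.1) {
            top.1 += 1;
            if !visited[succ] {
                visited[succ] = true;
                stack.push((succ, 0));
            }
        } else {
            // Every successor has been explored: `block` is finished.
            postorder.push(block);
            stack.pop();
        }
    }
    // Reversing postorder places each block before its successors,
    // except along loop back-edges.
    postorder.reverse();
    postorder
}
```

On the diamond CFG `0 -> {1, 2}`, `1 -> 3`, `2 -> 3`, this returns `[0, 2, 1, 3]`: both arms come before the join block, which is the property that lets a backend lower blocks once, in final order.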
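The second item describes `MachBuffer` recording label definitions and label uses, with a pluggable `LabelUse` trait defining kinds of fixups. The sketch below shows only that bookkeeping idea in miniature. `ToyBuffer` and `ToyLabelUse` are hypothetical names. The single fixup kind (a 32-bit little-endian offset measured from the start of the field) is an assumption. All fixups are resolved in one pass at the end, with no branch rewriting or islands.

```rust
/// Kinds of label use (fixup) this toy buffer understands. Hypothetical:
/// a real backend would define several, one per branch/address encoding.
#[derive(Clone, Copy, Debug)]
enum ToyLabelUse {
    /// 32-bit little-endian offset from the start of the field to the label.
    PcRel32,
}

impl ToyLabelUse {
    /// Writes the resolved offset into the placeholder bytes at `use_offset`.
    fn patch(self, code: &mut [u8], use_offset: usize, label_offset: usize) {
        match self {
            ToyLabelUse::PcRel32 => {
                let delta = (label_offset as i64 - use_offset as i64) as i32;
                code[use_offset..use_offset + 4].copy_from_slice(&delta.to_le_bytes());
            }
        }
    }
}

#[derive(Default)]
struct ToyBuffer {
    code: Vec<u8>,
    /// Offset at which each label was bound; `None` until bound.
    label_offsets: Vec<Option<usize>>,
    /// Pending fixups: (offset of the placeholder field, label, kind of use).
    fixups: Vec<(usize, usize, ToyLabelUse)>,
}

impl ToyBuffer {
    fn new_label(&mut self) -> usize {
        self.label_offsets.push(None);
        self.label_offsets.len() - 1
    }

    /// Defines `label` as the current end of the code.
    fn bind_label(&mut self, label: usize) {
        self.label_offsets[label] = Some(self.code.len());
    }

    /// Emits a zeroed 4-byte placeholder and records that it refers to `label`,
    /// which may be bound before (backward) or after (forward) this point.
    fn use_label_pcrel32(&mut self, label: usize) {
        self.fixups.push((self.code.len(), label, ToyLabelUse::PcRel32));
        self.code.extend_from_slice(&[0; 4]);
    }

    /// Resolves every fixup and returns the finished code.
    /// Panics if some used label was never bound.
    fn finish(mut self) -> Vec<u8> {
        for (use_offset, label, kind) in std::mem::take(&mut self.fixups) {
            let target = self.label_offsets[label].expect("label used but never bound");
            kind.patch(&mut self.code, use_offset, target);
        }
        self.code
    }
}
```

Keeping the fixup kind as a small value with its own `patch` method is what makes the scheme machine-pluggable: the buffer only tracks offsets and labels, while each kind knows how to encode a resolved distance.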
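Rewrites (ii) and (iii) from the second item can be modeled at the instruction level instead of on bytes. In this hypothetical sketch, emitted code is a `Vec<Inst>`, and label bindings are zero-width `Label` entries. Whenever a label is bound, branches sitting at the very tail that target it are simplified. Rewrite (i), which redirects the labels of blocks that contain only an unconditional branch, is not modeled. If a label is already bound between the branches, the sketch conservatively stops.

```rust
/// A toy instruction stream for illustrating tail-branch rewrites (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Inst {
    /// Any non-branch instruction.
    Op(u32),
    /// Branch to `target` if the condition holds (or fails, when `invert`).
    CondBr { invert: bool, target: usize },
    /// Unconditional branch to `target`.
    Jump { target: usize },
    /// Zero-width marker: label bound at this point.
    Label(usize),
}

#[derive(Default)]
struct TailBranches {
    insts: Vec<Inst>,
}

impl TailBranches {
    fn emit(&mut self, inst: Inst) {
        self.insts.push(inst);
    }

    /// Binds `label` at the current end, first simplifying tail branches
    /// that become redundant once `label` is known to be the fallthrough.
    fn bind_label(&mut self, label: usize) {
        loop {
            let n = self.insts.len();
            match self.insts.as_slice() {
                // (iii) A branch to the next instruction does nothing: drop it.
                [.., Inst::Jump { target }] | [.., Inst::CondBr { target, .. }]
                    if *target == label =>
                {
                    self.insts.pop();
                }
                // (ii) `if c goto L; goto M; L:` becomes `if !c goto M; L:`.
                [.., Inst::CondBr { invert, target }, Inst::Jump { target: other }]
                    if *target == label =>
                {
                    let flipped = Inst::CondBr { invert: !*invert, target: *other };
                    self.insts.truncate(n - 2);
                    self.insts.push(flipped);
                }
                _ => break,
            }
        }
        self.insts.push(Inst::Label(label));
    }
}
```

Binding label 10 right after `if c goto 10; goto 20` leaves the single inverted branch `if !c goto 20`. Binding label 20 right after a `goto 20` simply drops that jump. Both happen as a side effect of emission, with no separate pass over the code.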
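The branch-island idea in the second item can be illustrated with deadline tracking alone. The hypothetical `IslandTracker` below works under these assumptions:

- It tracks offsets only, with no bytes.
- Each forward conditional branch records the last offset its target may occupy. AArch64 `B.cond` encodes a signed 19-bit word offset, so it reaches at most 1 MiB minus 4 bytes forward.
- A veneer is one 4-byte unconditional branch.

Before emitting each block, the tracker conservatively checks whether an island placed after that block would still be reachable by every pending branch. If not, it emits the island first. Patching each original branch to its veneer, tracking the veneers' own longer-range deadlines, and getting execution past the island are all left out.

```rust
/// Farthest forward reach of an AArch64 conditional branch (B.cond): a signed
/// 19-bit word offset, i.e. 1 MiB minus 4 bytes.
const COND_BR_MAX_FWD: u32 = (1 << 20) - 4;
/// Assumed size of one island veneer (a single unconditional branch).
const VENEER_SIZE: u32 = 4;

/// A forward conditional-branch reference whose label is not yet bound.
struct Pending {
    label: usize,
    /// Last offset the branch can reach.
    deadline: u32,
}

/// Tracks offsets only (no bytes); hypothetical, for illustration.
#[derive(Default)]
struct IslandTracker {
    cur_offset: u32,
    pending: Vec<Pending>,
    /// Emitted islands: (start offset, labels that received a veneer).
    islands: Vec<(u32, Vec<usize>)>,
}

impl IslandTracker {
    /// Emits a 4-byte conditional branch to a not-yet-bound `label`.
    fn cond_branch_to(&mut self, label: usize) {
        self.pending.push(Pending { label, deadline: self.cur_offset + COND_BR_MAX_FWD });
        self.cur_offset += 4;
    }

    /// Binding a label in time resolves its pending references.
    fn bind_label(&mut self, label: usize) {
        self.pending.retain(|p| p.label != label);
    }

    /// Emits a straight-line block of `size` bytes, first emitting an island if
    /// an island placed after the block could end past some pending deadline.
    fn emit_block(&mut self, size: u32) {
        let island_size = self.pending.len() as u32 * VENEER_SIZE;
        let island_end_if_deferred = self.cur_offset + size + island_size;
        if self.pending.iter().any(|p| island_end_if_deferred > p.deadline) {
            self.emit_island();
        }
        self.cur_offset += size;
    }

    /// Emits one veneer per pending reference; each veneer is an unconditional
    /// branch with a much longer range that carries control to the real target.
    fn emit_island(&mut self) {
        let start = self.cur_offset;
        let labels: Vec<usize> = self.pending.drain(..).map(|p| p.label).collect();
        self.cur_offset += labels.len() as u32 * VENEER_SIZE;
        self.islands.push((start, labels));
    }
}
```

Take a branch at offset 0 followed by two 512 KiB blocks. The first block fits, but deferring the island past the second would overshoot the deadline, so the island lands between the blocks at offset 524,292. No code is ever moved after it has been emitted.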
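The three input forms in the third item suggest a backend lowering rule shaped like the sketch below. `InputSource`, `MachInst`, `Ctx`, `put_in_reg`, and `lower_iadd` are hypothetical names, and the target rule is AArch64-flavored. `ADD` (immediate) takes a 12-bit unsigned immediate (the optional shift by 12 is ignored here), so a small constant is merged into the instruction. Anything else is forced into a register, and constants are materialized right at the use site. A producing instruction is represented only by its result register; a real rule could instead pattern-match and merge the producer.

```rust
/// How an instruction input is presented to a lowering rule (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq)]
enum InputSource {
    /// The input is a known constant.
    Const(u64),
    /// The input already lives in a (virtual) register.
    Reg(u32),
    /// The input is produced by another instruction, whose result register is
    /// given here; a rule may merge that producer instead of using the register.
    Inst { result: u32 },
}

/// A tiny AArch64-flavored output instruction set (hypothetical).
#[derive(Debug, PartialEq)]
enum MachInst {
    AddImm { rd: u32, rn: u32, imm12: u16 },
    AddReg { rd: u32, rn: u32, rm: u32 },
    LoadConst { rd: u32, value: u64 },
}

struct Ctx {
    next_vreg: u32,
    out: Vec<MachInst>,
}

impl Ctx {
    fn alloc_vreg(&mut self) -> u32 {
        let r = self.next_vreg;
        self.next_vreg += 1;
        r
    }

    /// Forces an input into a register. Constants are materialized here, at
    /// the use site, which keeps their live ranges short.
    fn put_in_reg(&mut self, input: InputSource) -> u32 {
        match input {
            InputSource::Reg(r) | InputSource::Inst { result: r } => r,
            InputSource::Const(value) => {
                let rd = self.alloc_vreg();
                self.out.push(MachInst::LoadConst { rd, value });
                rd
            }
        }
    }
}

/// Lowers `rd = a + b`, merging `b` as an immediate when it fits in 12 bits.
fn lower_iadd(ctx: &mut Ctx, rd: u32, a: InputSource, b: InputSource) {
    let rn = ctx.put_in_reg(a);
    match b {
        InputSource::Const(c) if c < (1 << 12) => {
            ctx.out.push(MachInst::AddImm { rd, rn, imm12: c as u16 });
        }
        other => {
            let rm = ctx.put_in_reg(other);
            ctx.out.push(MachInst::AddReg { rd, rn, rm });
        }
    }
}
```

Adding the constant 42 yields a single `AddImm`. Adding `1 << 20` yields a `LoadConst` immediately followed by an `AddReg`, so the materialized constant is live only across that one instruction.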
Overall, on `bz2.wasm`, the results are:

    wasmtime full run (compile + runtime) of bz2:
    baseline:   9774M insns, 9742M cycles, 3.918s
    w/ changes: 7012M insns, 6888M cycles, 2.958s (24.5% faster, 28.3% fewer insns)

    clif-util wasm compile bz2:
    baseline:   2633M insns, 3278M cycles, 1.034s
    w/ changes: 2366M insns, 2920M cycles, 0.923s (10.7% faster, 10.1% fewer insns)

All numbers are averages of two runs on an Ampere eMAG.
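The quoted percentages are relative reductions against the baseline, (baseline - new) / baseline, rather than speedup ratios. By the ratio measure, the full run would be about 1.32x as fast (3.918 / 2.958). The small check below recomputes the quoted figures from the raw numbers. `reduction_pct` and `MEASUREMENTS` are names introduced only for this illustration.

```rust
/// Relative reduction of `new` versus `baseline`, in percent.
fn reduction_pct(baseline: f64, new: f64) -> f64 {
    (baseline - new) / baseline * 100.0
}

/// (metric, baseline, with changes), copied from the results above.
const MEASUREMENTS: [(&str, f64, f64); 4] = [
    ("wasmtime full run: time (s)", 3.918, 2.958),
    ("wasmtime full run: insns (M)", 9774.0, 7012.0),
    ("clif-util compile: time (s)", 1.034, 0.923),
    ("clif-util compile: insns (M)", 2633.0, 2366.0),
];
```

Formatting each reduction with `{:.1}` gives 24.5, 28.3, 10.7, and 10.1, matching the figures above.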
//! Cranelift code generation library.

#![deny(missing_docs, trivial_numeric_casts, unused_extern_crates)]
#![warn(unused_import_braces)]
#![cfg_attr(feature = "std", deny(unstable_features))]
#![cfg_attr(feature = "clippy", plugin(clippy(conf_file = "../../clippy.toml")))]
#![cfg_attr(feature = "cargo-clippy", allow(
    // Produces only a false positive:
    clippy::while_let_loop,
    // Produces many false positives, but did produce some valid lints, now fixed:
    clippy::needless_lifetimes,
    // Generated code makes some style transgressions, but readability doesn't suffer much:
    clippy::many_single_char_names,
    clippy::identity_op,
    clippy::needless_borrow,
    clippy::cast_lossless,
    clippy::unreadable_literal,
    clippy::assign_op_pattern,
    clippy::empty_line_after_outer_attr,
    // Hard to avoid in generated code:
    clippy::cognitive_complexity,
    clippy::too_many_arguments,
    // Code generator doesn't have a way to collapse identical arms:
    clippy::match_same_arms,
    // These are relatively minor style issues, but would be easy to fix:
    clippy::new_without_default,
    clippy::should_implement_trait,
    clippy::len_without_is_empty))]
#![cfg_attr(
    feature = "cargo-clippy",
    warn(
        clippy::float_arithmetic,
        clippy::mut_mut,
        clippy::nonminimal_bool,
        clippy::option_map_unwrap_or,
        clippy::option_map_unwrap_or_else,
        clippy::unicode_not_nfc,
        clippy::use_self
    )
)]
#![no_std]
// Various bits and pieces of this crate might only be used for one platform or
// another, but it's not really too useful to learn about that all the time. On
// CI we build at least one version of this crate with `--features all-arch`
// which means we'll always detect truly dead code, otherwise if this is only
// built for one platform we don't have to worry too much about trimming
// everything down.
#![cfg_attr(not(feature = "all-arch"), allow(dead_code))]

#[allow(unused_imports)] // #[macro_use] is required for no_std
#[macro_use]
extern crate alloc;

#[cfg(feature = "std")]
#[macro_use]
extern crate std;

#[cfg(not(feature = "std"))]
use hashbrown::{hash_map, HashMap, HashSet};
#[cfg(feature = "std")]
use std::collections::{hash_map, HashMap, HashSet};

pub use crate::context::Context;
pub use crate::legalizer::legalize_function;
pub use crate::value_label::{ValueLabelsRanges, ValueLocRange};
pub use crate::verifier::verify_function;
pub use crate::write::write_function;

pub use cranelift_bforest as bforest;
pub use cranelift_entity as entity;

pub mod binemit;
pub mod cfg_printer;
pub mod cursor;
pub mod dbg;
pub mod dominator_tree;
pub mod flowgraph;
pub mod ir;
pub mod isa;
pub mod loop_analysis;
pub mod machinst;
pub mod print_errors;
pub mod settings;
pub mod timing;
pub mod verifier;
pub mod write;

pub use crate::entity::packed_option;

mod abi;
mod bitset;
mod constant_hash;
mod context;
mod dce;
mod divconst_magic_numbers;
mod fx;
mod inst_predicates;
mod iterators;
mod legalizer;
mod licm;
mod nan_canonicalization;
mod partition_slice;
mod postopt;
mod predicates;
mod redundant_reload_remover;
mod regalloc;
mod remove_constant_phis;
mod result;
mod scoped_hash_map;
mod simple_gvn;
mod simple_preopt;
mod stack_layout;
mod topo_order;
mod unreachable_code;
mod value_label;

#[cfg(feature = "enable-peepmatic")]
mod peepmatic;

pub use crate::result::{CodegenError, CodegenResult};

/// Version number of this crate.
pub const VERSION: &str = env!("CARGO_PKG_VERSION");