Add a basic alias analysis with redundant-load elim and store-to-load forwarding opts. (#4163)
This PR adds a basic *alias analysis* and optimizations that use it. This is a "mid-end optimization": it operates on CLIF, the machine-independent IR, before lowering occurs. The alias analysis (or maybe more properly, a sort of memory-value analysis) determines when it can prove that a particular memory location holds a value equal to a given SSA value, and when it can, it replaces any loads of that location with that SSA value. This subsumes two common optimizations:
* Redundant load elimination: when the same memory address is loaded twice, and it can be proven that no intervening operation writes to that memory, then the second load is *redundant* and its result must be the same as the first. We can use the first load's result and remove the second load.
* Store-to-load forwarding: when a load can be proven to access exactly the memory written by a preceding store, we can replace the load's result with the store's data operand and remove the load.
Both of these optimizations rely on a "last store" analysis, a sort of coloring mechanism split across disjoint categories of abstract state. The basic idea is that every memory-accessing operation is put into one of N disjoint categories; memory must never be accessed by an op in one category and later accessed by an op in another category. (The frontend must ensure this.) Given this, we scan the code and determine, for each memory-accessing op, which single prior instruction is the last store to the same category. This "colors" the instruction: the last store is, in a sense, a static name for that version of memory. This analysis provides an important invariant: if two operations access memory with the same last store, then *no other store can alias* in the time between that last store and these operations.
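The following minimal sketch makes the per-category "coloring" concrete for straight-line code. It is not the implementation in this commit. `Category`, `Op`, `LastStores`, and `color` are hypothetical names, and the `Other` category is an assumption. The sketch also assumes that calls, fences, and atomics may write every category, which is the behavior fence.clif and simple-alias.clif in this commit check for.

```rust
use std::collections::HashMap;

/// Hypothetical disjoint memory categories (illustrative, not Cranelift's types).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Category {
    Heap,
    Table,
    Vmctx,
    Other,
}

const ALL: [Category; 4] = [Category::Heap, Category::Table, Category::Vmctx, Category::Other];

/// Hypothetical, simplified memory-relevant instructions.
#[derive(Clone, Copy, Debug)]
enum Op {
    Load(Category),
    Store(Category),
    /// A call, fence, or atomic: assumed to possibly write every category.
    Fence,
}

/// Index of the last store (or fence) per category; a missing entry means
/// "no store since function entry".
#[derive(Debug, Default)]
struct LastStores(HashMap<Category, usize>);

impl LastStores {
    fn get(&self, cat: Category) -> Option<usize> {
        self.0.get(&cat).copied()
    }

    /// Advance the state past instruction `idx`.
    fn update(&mut self, idx: usize, op: Op) {
        match op {
            Op::Load(_) => {}
            Op::Store(cat) => {
                self.0.insert(cat, idx);
            }
            Op::Fence => {
                for cat in ALL {
                    self.0.insert(cat, idx);
                }
            }
        }
    }
}

/// Colors each load/store with the last store in its category, returned as
/// `(instruction index, last store)` pairs. Straight-line code only.
fn color(ops: &[Op]) -> Vec<(usize, Option<usize>)> {
    let mut state = LastStores::default();
    let mut colors = Vec::new();
    for (idx, &op) in ops.iter().enumerate() {
        if let Op::Load(cat) | Op::Store(cat) = op {
            // The color is the state *before* this instruction executes.
            colors.push((idx, state.get(cat)));
        }
        state.update(idx, op);
    }
    colors
}

fn main() {
    let ops = [
        Op::Store(Category::Heap),  // #0
        Op::Store(Category::Table), // #1
        Op::Load(Category::Heap),   // #2: sees #0, unaffected by the table store
        Op::Load(Category::Table),  // #3: sees #1
        Op::Fence,                  // #4: clobbers every category
        Op::Load(Category::Heap),   // #5: sees #4
    ];
    let colors = color(&ops);
    assert_eq!(colors[2], (2, Some(0)));
    assert_eq!(colors[3], (3, Some(1)));
    assert_eq!(colors[4], (5, Some(4)));
}
```

Two loads that receive the same color, with no other write in their category between them, are the candidates the location table then compares.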
This must-not-alias property lets us prove that the results are the same. It must be combined with a check that the accessed address is *exactly the same* (same SSA value and offset) and that the other attributes of the access (type, extension mode) match.
Given last-store info, we scan the instructions and build a table from a "memory location" key (last store, address, offset, type, extension) to the SSA value known to be stored in that location. A store inserts a new mapping. A load may also insert a new mapping if one does not already exist. Then, when a load occurs and an entry already exists for its location, we can reuse the value. This is either RLE or store-to-load forwarding, depending on where the value came from.
Note that this *does* work across basic blocks. The last-store analysis is a full iterative dataflow pass, and we are careful to check that a previously defined value dominates a potentially redundant load before making the load's result an alias of it. So we correctly leave a "partially redundant" load in place (one whose location was already loaded, but in only one predecessor block). We also correctly reuse a value if there is a store or load above a loop and a redundant load of that value within the loop, as long as no potentially-aliasing stores happen within the loop.
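The location table can be sketched as a hash map keyed on everything that must match. This is a simplified straight-line model, not the code in this commit. `Key`, `Inst`, and `forward` are hypothetical. Types and extension modes are plain strings, last stores are taken as already computed by the last-store analysis, and the cross-block dominance check is omitted.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical SSA value number.
type Value = u32;

/// Everything that must match for two accesses to observe the same bits.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Key {
    /// Last store in the access's category (`None` = function entry).
    /// For a store, this is the store itself, since later accesses of the
    /// location will see it as their last store.
    last_store: Option<usize>,
    /// Base address SSA value and immediate offset.
    addr: Value,
    offset: i32,
    /// Access type and extension mode, e.g. ("i32", "uload8").
    ty: &'static str,
    ext: &'static str,
}

/// Simplified straight-line memory ops with precomputed keys.
enum Inst {
    Load { result: Value, key: Key },
    Store { data: Value, key: Key },
}

/// Returns `(load result, replacement value)` pairs, i.e. value aliases.
fn forward(insts: &[Inst]) -> Vec<(Value, Value)> {
    let mut known: HashMap<Key, Value> = HashMap::new();
    let mut aliases = Vec::new();
    for inst in insts {
        match *inst {
            // A store defines the location's contents for its version of memory.
            Inst::Store { data, key } => {
                known.insert(key, data);
            }
            Inst::Load { result, key } => match known.entry(key) {
                // RLE or store-to-load forwarding: reuse the known value.
                Entry::Occupied(e) => aliases.push((result, *e.get())),
                // First access of this location: remember the loaded value.
                Entry::Vacant(e) => {
                    e.insert(result);
                }
            },
        }
    }
    aliases
}

fn main() {
    // Every access uses address v2 at offset 8, loosely modeled on the filetests.
    fn key(last_store: Option<usize>, ty: &'static str, ext: &'static str) -> Key {
        Key { last_store, addr: 2, offset: 8, ty, ext }
    }
    let insts = [
        Inst::Load { result: 3, key: key(None, "i32", "none") },    // #0: v3 = load.i32
        Inst::Load { result: 5, key: key(None, "i32", "none") },    // #1: redundant
        Inst::Load { result: 4, key: key(None, "i32", "uload8") },  // #2: other extension
        Inst::Store { data: 1, key: key(Some(3), "i32", "none") },  // #3: store.i32 v1
        Inst::Load { result: 6, key: key(Some(3), "i32", "none") }, // #4: sees store #3
    ];
    assert_eq!(forward(&insts), vec![(5, 3), (6, 1)]);
}
```

The type and extension mode are part of the key, so the `uload8` access in the example does not match the earlier `load.i32`. This is the property extends.clif checks.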
cranelift/filetests/filetests/alias/categories.clif
@@ -0,0 +1,22 @@
test alias-analysis
set opt_level=speed
target aarch64

;; Check that aliasing properly respects the last store in each
;; "category" separately.

function %f0(i64, i64) -> i32, i32 {

block0(v0: i64, v1: i64):
    v2 = iconst.i32 42
    v3 = iconst.i32 43
    store.i32 heap v2, v0+8
    store.i32 table v3, v1+8

    v4 = load.i32 heap v0+8
    v5 = load.i32 table v1+8
    ; check: v4 -> v2
    ; check: v5 -> v3

    return v4, v5
}
cranelift/filetests/filetests/alias/extends.clif
@@ -0,0 +1,44 @@
test alias-analysis
set opt_level=speed
target aarch64

;; Test that extension modes are properly accounted for when deciding
;; whether loads alias.

function %f0(i64 vmctx, i32) -> i32, i32, i32, i64, i64, i64 {
    gv0 = vmctx
    gv1 = load.i64 notrap readonly aligned gv0+8
    heap0 = static gv1, bound 0x1_0000_0000, offset_guard 0x8000_0000, index_type i32

block0(v0: i64, v1: i32):
    v2 = heap_addr.i64 heap0, v1, 0

    ;; Initial load. This will not be reused by anything below, even
    ;; though it does access the same address.
    v3 = load.i32 v2+8

    ;; These loads must remain (must not be removed as redundant).
    v4 = uload8.i32 v2+8
    ; check: v4 = uload8.i32 v2+8
    v5 = sload8.i32 v2+8
    ; check: v5 = sload8.i32 v2+8
    v6 = load.i64 v2+8
    ; check: v6 = load.i64 v2+8

    ;; 8-bit store only partially overwrites the address.
    istore8 v6, v2+8

    ;; This must not pick up the store data.
    v7 = load.i64 v2+8
    ; check: v7 = load.i64 v2+8

    ;; Another store, this one non-truncating but actually using an
    ;; `i8` value.
    v8 = iconst.i8 123
    store.i8 v8, v2+8

    v9 = load.i64 v2+8
    ; check: v9 = load.i64 v2+8

    return v3, v4, v5, v6, v7, v9
}
cranelift/filetests/filetests/alias/fence.clif
@@ -0,0 +1,45 @@
test alias-analysis
set opt_level=speed
target aarch64

;; Test that certain instructions act as fences that inhibit alias
;; analysis to move accesses across them.

function %f0(i64 vmctx, i32) -> i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 {
    gv0 = vmctx
    gv1 = load.i64 notrap readonly aligned gv0+8
    heap0 = static gv1, bound 0x1_0000_0000, offset_guard 0x8000_0000, index_type i32

block0(v0: i64, v1: i32):
    v2 = heap_addr.i64 heap0, v1, 0

    v3 = load.i32 v2+8
    v4 = load.i32 vmctx v0+16

    atomic_store.i32 v1, v0

    v5 = load.i32 vmctx v0+16
    ; check: v5 = load.i32 vmctx v0+16

    v6 = atomic_cas.i32 v0, v1, v1

    v7 = load.i32 vmctx v0+16
    ; check: v7 = load.i32 vmctx v0+16

    fence

    v8 = load.i32 vmctx v0+16
    ; check: v8 = load.i32 vmctx v0+16

    v9 = atomic_rmw.i32 add v0, v1

    v10 = load.i32 vmctx v0+16
    ; check: v10 = load.i32 vmctx v0+16

    v11 = atomic_load.i32 v0

    v12 = load.i32 vmctx v0+16
    ; check: v12 = load.i32 vmctx v0+16

    return v3, v4, v5, v6, v7, v8, v9, v10, v11, v12
}
cranelift/filetests/filetests/alias/multiple-blocks.clif
@@ -0,0 +1,29 @@
test alias-analysis
set opt_level=speed
target aarch64

;; Check RLE across basic blocks.

function %f0(i64 vmctx, i32) -> i32 {
    gv0 = vmctx
    gv1 = load.i64 notrap readonly aligned gv0+8
    heap0 = static gv1, bound 0x1_0000_0000, offset_guard 0x8000_0000, index_type i32


block0(v0: i64, v1: i32):
    v2 = heap_addr.i64 heap0, v1, 0
    v3 = load.i32 v2+8
    brz v2, block1
    jump block2

block1:
    v4 = load.i32 v2+8
    ; check: v4 -> v3
    jump block3(v4)

block2:
    jump block3(v3)

block3(v5: i32):
    return v5
}
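The cross-block reuse checked above relies on the last-store analysis being an iterative dataflow pass. Below is a minimal worklist sketch for a single memory category. The CFG representation, the block numbering, and the `Merge` marker for blocks whose predecessors disagree are all illustrative assumptions, not taken from this commit.

```rust
use std::collections::VecDeque;

/// Last store reaching a program point, for a single memory category.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Last {
    /// No store on any path since function entry.
    Entry,
    /// A specific store instruction (hypothetical numbering).
    Inst(usize),
    /// Predecessors disagree: the merge block itself names this memory version.
    Merge(usize),
}

/// Hypothetical CFG node: successor blocks plus the stores it contains, in order.
struct Block {
    succs: Vec<usize>,
    stores: Vec<usize>,
}

fn block(succs: &[usize], stores: &[usize]) -> Block {
    Block { succs: succs.to_vec(), stores: stores.to_vec() }
}

/// Meets an incoming state into block `b`'s current entry state.
fn meet(current: Option<Last>, incoming: Last, b: usize) -> Last {
    match current {
        None => incoming,
        Some(x) if x == incoming => x,
        Some(_) => Last::Merge(b),
    }
}

/// Computes the last store on entry to each block (block 0 is the entry
/// block; `None` means the block was never reached).
fn last_store_at_entry(blocks: &[Block]) -> Vec<Option<Last>> {
    let mut entry: Vec<Option<Last>> = vec![None; blocks.len()];
    entry[0] = Some(Last::Entry);
    let mut work = VecDeque::from(vec![0]);
    while let Some(b) = work.pop_front() {
        // Transfer function: the block's own last store, if any, wins.
        let out = match blocks[b].stores.last() {
            Some(&s) => Last::Inst(s),
            None => entry[b].expect("queued blocks have an entry state"),
        };
        for &succ in &blocks[b].succs {
            let new = meet(entry[succ], out, succ);
            if entry[succ] != Some(new) {
                entry[succ] = Some(new);
                work.push_back(succ);
            }
        }
    }
    entry
}

fn main() {
    // Diamond shaped like this test: no stores anywhere.
    let diamond = [block(&[1, 2], &[]), block(&[3], &[]), block(&[3], &[]), block(&[], &[])];
    assert!(last_store_at_entry(&diamond).iter().all(|e| *e == Some(Last::Entry)));

    // Store #0 before a loop (header 1, body 2, exit 3), no store inside the loop.
    let clean = [block(&[1], &[0]), block(&[2, 3], &[]), block(&[1], &[]), block(&[], &[])];
    assert_eq!(last_store_at_entry(&clean)[2], Some(Last::Inst(0)));

    // Same loop, but the body stores (#5): the header becomes a merge point.
    let dirty = [block(&[1], &[0]), block(&[2, 3], &[]), block(&[1], &[5]), block(&[], &[])];
    assert_eq!(last_store_at_entry(&dirty)[1], Some(Last::Merge(1)));
}
```

The sketch meets new information into each block's existing entry state rather than recomputing from all predecessors. This keeps it short and guarantees termination, at the cost of sometimes treating blocks as merge points more conservatively than necessary.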
cranelift/filetests/filetests/alias/partial-redundancy.clif
@@ -0,0 +1,35 @@
test alias-analysis
set opt_level=speed
target aarch64

;; A test of partial redundancy: we should *not* RLE when an earlier
;; load to the location is only in one predecessor of multiple.

function %f0(i64 vmctx, i32) -> i32, i32 {
    gv0 = vmctx
    gv1 = load.i64 notrap readonly aligned gv0+8
    heap0 = static gv1, bound 0x1_0000_0000, offset_guard 0x8000_0000, index_type i32
    fn0 = %g(i64 vmctx)

block0(v0: i64, v1: i32):
    brz v1, block1
    jump block2

block1:
    v2 = heap_addr.i64 heap0, v1, 0
    v3 = load.i32 v2+64
    jump block3(v3)

block2:
    v4 = heap_addr.i64 heap0, v1, 0
    v5 = load.i32 v4+128
    jump block3(v5)

block3(v6: i32):
    v7 = heap_addr.i64 heap0, v1, 0
    v8 = load.i32 v7+64
    ;; load should survive:
    ; check: v8 = load.i32 v7+64
    return v6, v8

}
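Per the commit message, a previously defined value is reused only if it dominates the potentially redundant load. In this test, `v3` is defined in `block1`, which does not dominate `block3`. The sketch below computes dominator sets with the textbook iterative set-intersection algorithm and applies that check to a CFG shaped like this test. The algorithm was chosen for brevity and is not taken from Cranelift, which maintains its own dominator tree. `dominators` and `can_reuse` are hypothetical helpers.

```rust
use std::collections::BTreeSet;

/// Dominator sets via the classic iterative formulation:
/// dom(entry) = {entry}; dom(b) = {b} plus the intersection of dom(p) over predecessors p.
fn dominators(succs: &[Vec<usize>]) -> Vec<BTreeSet<usize>> {
    let n = succs.len();
    let mut preds = vec![Vec::new(); n];
    for (b, ss) in succs.iter().enumerate() {
        for &s in ss {
            preds[s].push(b);
        }
    }
    // Start every non-entry block at "dominated by everything" and shrink.
    let all: BTreeSet<usize> = (0..n).collect();
    let mut dom = vec![all; n];
    dom[0] = BTreeSet::from([0]);
    let mut changed = true;
    while changed {
        changed = false;
        for b in 1..n {
            let mut new = preds[b]
                .iter()
                .map(|&p| dom[p].clone())
                .reduce(|a, c| a.intersection(&c).copied().collect())
                .unwrap_or_default();
            new.insert(b);
            if new != dom[b] {
                dom[b] = new;
                changed = true;
            }
        }
    }
    dom
}

/// A known value may replace a load only if its defining block
/// dominates the block containing the load.
fn can_reuse(dom: &[BTreeSet<usize>], def_block: usize, load_block: usize) -> bool {
    dom[load_block].contains(&def_block)
}

fn main() {
    // CFG of this test: block0 -> {block1, block2}; block1 -> block3; block2 -> block3.
    let succs = vec![vec![1, 2], vec![3], vec![3], vec![]];
    let dom = dominators(&succs);
    // v3 comes from block1, which does not dominate block3: keep the load.
    assert!(!can_reuse(&dom, 1, 3));
    // A value from block0 (as in multiple-blocks.clif) may be reused in block1.
    assert!(can_reuse(&dom, 0, 1));
}
```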
cranelift/filetests/filetests/alias/simple-alias.clif
@@ -0,0 +1,54 @@
test alias-analysis
set opt_level=speed
target aarch64

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Redundant-load elimination
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

function %f0(i64 vmctx, i32) -> i32, i32, i32, i32 {
    gv0 = vmctx
    gv1 = load.i64 notrap readonly aligned gv0+8
    heap0 = static gv1, bound 0x1_0000_0000, offset_guard 0x8000_0000, index_type i32
    fn0 = %g(i64 vmctx)

block0(v0: i64, v1: i32):
    v2 = heap_addr.i64 heap0, v1, 0
    v3 = load.i32 v2+8
    ;; This should reuse the load above.
    v4 = heap_addr.i64 heap0, v1, 0
    v5 = load.i32 v4+8
    ; check: v5 -> v3

    call fn0(v0)

    ;; The second load is redundant wrt the first, but the call above
    ;; is a barrier that prevents reusing v3 or v5.
    v6 = load.i32 v4+8
    v7 = load.i32 v4+8
    ; check: v7 -> v6

    return v3, v5, v6, v7
}

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Store-to-load forwarding
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

function %f1(i64 vmctx, i32) -> i32 {
    gv0 = vmctx
    gv1 = load.i64 notrap readonly aligned gv0+8
    heap0 = static gv1, bound 0x1_0000_0000, offset_guard 0x8000_0000, index_type i32
    fn0 = %g(i64 vmctx)

block0(v0: i64, v1: i32):
    v2 = heap_addr.i64 heap0, v1, 0
    store.i32 v1, v2+8

    ;; This load should pick up the store above.
    v3 = heap_addr.i64 heap0, v1, 0
    v4 = load.i32 v3+8
    ; check: v4 -> v1

    return v4
}
@@ -37,6 +37,7 @@ mod runone;
mod runtest_environment;
mod subtest;

mod test_alias_analysis;
mod test_cat;
mod test_compile;
mod test_dce;
@@ -111,6 +112,7 @@ pub fn run_passes(
/// a `.clif` test file.
fn new_subtest(parsed: &TestCommand) -> anyhow::Result<Box<dyn subtest::SubTest>> {
    match parsed.command {
        "alias-analysis" => test_alias_analysis::subtest(parsed),
        "cat" => test_cat::subtest(parsed),
        "compile" => test_compile::subtest(parsed),
        "dce" => test_dce::subtest(parsed),
cranelift/filetests/src/test_alias_analysis.rs
@@ -0,0 +1,48 @@
//! Test command for testing the alias analysis pass.
//!
//! The `alias-analysis` test command runs each function through GVN
//! and then alias analysis after ensuring that all instructions are
//! legal for the target.
//!
//! The resulting function is sent to `filecheck`.

use crate::subtest::{run_filecheck, Context, SubTest};
use cranelift_codegen;
use cranelift_codegen::ir::Function;
use cranelift_reader::TestCommand;
use std::borrow::Cow;

struct TestAliasAnalysis;

pub fn subtest(parsed: &TestCommand) -> anyhow::Result<Box<dyn SubTest>> {
    assert_eq!(parsed.command, "alias-analysis");
    if !parsed.options.is_empty() {
        anyhow::bail!("No options allowed on {}", parsed);
    }
    Ok(Box::new(TestAliasAnalysis))
}

impl SubTest for TestAliasAnalysis {
    fn name(&self) -> &'static str {
        "alias-analysis"
    }

    fn is_mutating(&self) -> bool {
        true
    }

    fn run(&self, func: Cow<Function>, context: &Context) -> anyhow::Result<()> {
        let mut comp_ctx = cranelift_codegen::Context::for_function(func.into_owned());

        comp_ctx.flowgraph();
        comp_ctx
            .simple_gvn(context.flags_or_isa())
            .map_err(|e| crate::pretty_anyhow_error(&comp_ctx.func, Into::into(e)))?;
        comp_ctx
            .replace_redundant_loads()
            .map_err(|e| crate::pretty_anyhow_error(&comp_ctx.func, Into::into(e)))?;

        let text = comp_ctx.func.display().to_string();
        run_filecheck(&text, context)
    }
}