egraphs: fix accidental remat of call. (#5726)
In the provided test case in #5716, the result of a call was then added to 0. We have a rewrite rule that sets the remat-bit on any add of a value and a constant, because these frequently appear (e.g. from address offset calculations) and this can frequently reduce register pressure (one long-lived base vs. many long-lived base+offset values). Separately, we have an algebraic rule that `x+0` rewrites to `x`. The result of this was that we had an eclass with the remat bit set on the add, but the add was also union'd into the call. We pick the latter during extraction, because it's cheaper not to do the add at all; but we still get the remat bit, and try to remat a call (!), which blows up later. This PR fixes the logic to look up the "best value" for a value (i.e., whatever extraction determined), and look up the remat bit on *that* node, not the canonical node. (Why did the canonical node become the iadd and not the call? Because the former had a lower value-number, as an accident of IR construction; we don't impose any requirements on the input CLIF's value-number ordering, and I don't think this breaks any of the important acyclic properties, even though there is technically a dependence from a lower-numbered to a higher-numbered node. In essence one can think of them as having "virtual numbers" in any true topologically-sorted order, and the only place the actual integer indices matter should be in choosing the "canonical ID", which is just used for dedup'ing, modulo this bug.) Fixes #5716.
This commit is contained in:
@@ -278,6 +278,14 @@ impl<'a> Elaborator<'a> {
|
||||
before
|
||||
);
|
||||
|
||||
// Get the best option; we use `value` (latest
|
||||
// value) here so we have a full view of the
|
||||
// eclass.
|
||||
trace!("looking up best value for {}", value);
|
||||
let (_, best_value) = self.value_to_best_value[value];
|
||||
debug_assert_ne!(best_value, Value::reserved_value());
|
||||
trace!("elaborate: value {} -> best {}", value, best_value);
|
||||
|
||||
let remat = if let Some(elab_val) =
|
||||
self.value_to_elaborated_value.get(&canonical_value)
|
||||
{
|
||||
@@ -287,7 +295,7 @@ impl<'a> Elaborator<'a> {
|
||||
// from another block. If so, ignore the hit
|
||||
// and recompute below.
|
||||
let remat = elab_val.in_block != self.cur_block
|
||||
&& self.remat_values.contains(&canonical_value);
|
||||
&& self.remat_values.contains(&best_value);
|
||||
if !remat {
|
||||
trace!("elaborate: value {} -> {:?}", value, elab_val);
|
||||
self.stats.elaborate_memoize_hit += 1;
|
||||
@@ -304,20 +312,12 @@ impl<'a> Elaborator<'a> {
|
||||
// Value not available; but still look up
|
||||
// whether it's been flagged for remat because
|
||||
// this affects placement.
|
||||
let remat = self.remat_values.contains(&canonical_value);
|
||||
let remat = self.remat_values.contains(&best_value);
|
||||
trace!(" -> not present in map; remat = {}", remat);
|
||||
remat
|
||||
};
|
||||
self.stats.elaborate_memoize_miss += 1;
|
||||
|
||||
// Get the best option; we use `value` (latest
|
||||
// value) here so we have a full view of the
|
||||
// eclass.
|
||||
trace!("looking up best value for {}", value);
|
||||
let (_, best_value) = self.value_to_best_value[value];
|
||||
debug_assert_ne!(best_value, Value::reserved_value());
|
||||
trace!("elaborate: value {} -> best {}", value, best_value,);
|
||||
|
||||
// Now resolve the value to its definition to see
|
||||
// how we can compute it.
|
||||
let (inst, result_idx) = match self.func.dfg.value_def(best_value) {
|
||||
|
||||
41
cranelift/filetests/filetests/egraph/issue-5716.clif
Normal file
41
cranelift/filetests/filetests/egraph/issue-5716.clif
Normal file
@@ -0,0 +1,41 @@
|
||||
;; This tests that a call does not get rematerialized, even if a remat flag is
|
||||
;; set on a different node in its eclass.
|
||||
;;
|
||||
;; Below, `v97` is an add of `v238` (the call's first return value) and a
|
||||
;; constant 0; a mid-end rule rewrites this to just `v238` (i.e., `v97` is unioned
|
||||
;; in). Separately, a rule states that an add of a value and a constant always
|
||||
;; gets rematerialized at use. When `v97` is used in a later block, it would have
|
||||
;; rematerialized the add; except, if we instead use the result of the call
|
||||
;; directly, we should *not* remat the call. If we do, a compile error results
|
||||
;; later.
|
||||
|
||||
test compile
|
||||
set opt_level=speed_and_size
|
||||
target aarch64
|
||||
|
||||
function u0:33() system_v {
|
||||
ss0 = explicit_slot 32
|
||||
sig0 = (i64, i64, i64, i64, i64) -> i64, i64 system_v
|
||||
fn0 = colocated u0:0 sig0
|
||||
jt0 = jump_table [block36, block38]
|
||||
block0:
|
||||
v80 = iconst.i32 0
|
||||
v91 = iconst.i64 0
|
||||
v92 = iconst.i64 0
|
||||
v96 = iconst.i64 0
|
||||
v235 = iconst.i64 0
|
||||
v236 = iconst.i64 0
|
||||
v237 = iconst.i64 0
|
||||
v238, v239 = call fn0(v236, v237, v91, v92, v235) ; v236 = 0, v237 = 0, v91 = 0, v92 = 0, v235 = 0
|
||||
v97 = iadd v238, v96 ; v96 = 0
|
||||
br_table v80, block37, jt0 ; v80 = 0
|
||||
block36:
|
||||
trap user0
|
||||
block37:
|
||||
trap unreachable
|
||||
block38:
|
||||
v98 = load.i8 notrap v97
|
||||
v99 = fcvt_from_uint.f64 v98
|
||||
stack_store v99, ss0
|
||||
trap user0
|
||||
}
|
||||
Reference in New Issue
Block a user