x64 bugfix: prevent load-op fusion of cmp because it could be emitted multiple times.
On x64, the new backend generates `cmp` instructions at their use-sites
when possible (when the icmp that generates a boolean is known) so that
the condition flows directly through flags rather than a materialized
boolean. E.g., both `bint` (boolean to int) and `select` (conditional
select) instruction lowerings invoke `emit_cmp()` to do so.
Load-op fusion in `emit_cmp()` nominally allowed `cmp` to use its `cmp
reg, mem` form.
However, the mergeable-load condition (load has only single use) was not
adequately checked. Consider the sequence:
```
v2 = load.i64 v1
v3 = icmp eq v0, v2
v4 = bint.i64 v3
v5 = select.i64 v3, v0, v1
```
The load `v2` is only used in the `icmp` at `v3`. However, the cmp will
be separately codegen'd twice, once for the `bint` and once for the
`select`.
Prior to this fix, the above example would result in the load at `v2`
sinking to the `cmp` just above the `select`; we then emit another `cmp`
for the `bint`, but the load has already been used once so we do not
allow merging. We thus (i) expect the register for `v2` to contain the
loaded value, but (ii) skip the codegen for the load because it has been
sunk. This results in a regalloc error (unexpected livein) as the
unfilled register is upward-exposed to the entry point.
Because of this, we need to accept only the reg, reg form in
`emit_cmp()` (and the FP equivalent). We could get marginally better
code by tracking whether the `cmp` we are emitting comes from an
`icmp`/`fcmp` with only one use; but IMHO simplicity is a better rule
here when subtle interactions occur.
This commit is contained in:
49
cranelift/filetests/filetests/isa/x64/cmp-mem-bug.clif
Normal file
49
cranelift/filetests/filetests/isa/x64/cmp-mem-bug.clif
Normal file
@@ -0,0 +1,49 @@
|
||||
test compile
|
||||
target x86_64
|
||||
feature "experimental_x64"
|
||||
|
||||
function %f0(i64, i64) -> i64, i64 {
|
||||
block0(v0: i64, v1: i64):
|
||||
v2 = load.i64 v1
|
||||
; check: movq 0(%rsi), %rax
|
||||
|
||||
v3 = icmp eq v0, v2
|
||||
|
||||
v4 = bint.i64 v3
|
||||
; nextln: cmpq %rax, %rdi
|
||||
; nextln: setz %cl
|
||||
; nextln: movzbq %cl, %rcx
|
||||
|
||||
v5 = select.i64 v3, v0, v1
|
||||
; nextln: cmpq %rax, %rdi
|
||||
; nextln: cmovzq %rdi, %rsi
|
||||
|
||||
return v4, v5
|
||||
; nextln: movq %rcx, %rax
|
||||
; nextln: movq %rsi, %rdx
|
||||
}
|
||||
|
||||
function %f1(f64, i64) -> i64, f64 {
|
||||
block0(v0: f64, v1: i64):
|
||||
v2 = load.f64 v1
|
||||
; check: movsd 0(%rdi), %xmm1
|
||||
|
||||
v3 = fcmp eq v0, v2
|
||||
|
||||
v4 = bint.i64 v3
|
||||
; nextln: ucomisd %xmm1, %xmm0
|
||||
; nextln: setnp %dil
|
||||
; nextln: setz %sil
|
||||
; nextln: andl %edi, %esi
|
||||
; nextln: movzbq %sil, %rsi
|
||||
|
||||
v5 = select.f64 v3, v0, v0
|
||||
; nextln: ucomisd %xmm1, %xmm0
|
||||
; nextln: movaps %xmm0, %xmm1
|
||||
; nextln: jnp $$next; movsd %xmm0, %xmm1; $$next:
|
||||
; nextln: jz $$next; movsd %xmm0, %xmm1; $$next:
|
||||
|
||||
return v4, v5
|
||||
; nextln: movq %rsi, %rax
|
||||
; nextln: movaps %xmm1, %xmm0
|
||||
}
|
||||
Reference in New Issue
Block a user