x64 bugfix: prevent load-op fusion of cmp because it could be emitted multiple times.

On x64, the new backend generates `cmp` instructions at their use-sites
when possible (i.e., when the `icmp` that produces a boolean is known),
so that the condition flows directly through the flags rather than
through a materialized boolean. E.g., both the `bint` (boolean-to-int)
and `select` (conditional select) instruction lowerings invoke
`emit_cmp()` to do so.
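
For illustration, here is a toy sketch of that shape (made-up helper
names; this is not the actual Cranelift lowering code): each flag
consumer calls `emit_cmp()` itself, so the same compare can be emitted
more than once.

```rust
// Toy model for illustration only (not the Cranelift lowering API; all
// names here are made up). Each flag consumer re-emits the compare
// immediately before itself, so the condition reaches it via EFLAGS
// rather than via a materialized 0/1 value.
fn emit_cmp(asm: &mut Vec<String>, lhs: &str, rhs: &str) {
    asm.push(format!("cmpq {rhs}, {lhs}"));
}

fn lower_bint(asm: &mut Vec<String>, lhs: &str, rhs: &str, dst: &str) {
    emit_cmp(asm, lhs, rhs); // compare feeding the setcc
    asm.push(format!("setz {dst}")); // materialize the boolean as 0/1
}

fn lower_select(asm: &mut Vec<String>, lhs: &str, rhs: &str, src: &str, dst: &str) {
    emit_cmp(asm, lhs, rhs); // the same compare, re-emitted at this use
    asm.push(format!("cmovzq {src}, {dst}")); // conditional move on flags
}

fn main() {
    let mut asm = Vec::new();
    lower_select(&mut asm, "%rdi", "%rax", "%rdi", "%rsi");
    lower_bint(&mut asm, "%rdi", "%rax", "%cl");
    println!("{}", asm.join("\n")); // note: `cmpq` appears twice
}
```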

Load-op fusion in `emit_cmp()` allowed the compare to use its `cmp
reg, mem` form, sinking a mergeable load directly into the compare.

However, the mergeable-load condition (the load has only a single use)
was not adequate here. Consider the sequence:

```
    v2 = load.i64 v1
    v3 = icmp eq v0, v2
    v4 = bint.i64 v3
    v5 = select.i64 v3, v0, v1
```

The load `v2` is only used by the `icmp` at `v3`. However, because `v3`
itself has two uses, the `cmp` will be codegen'd twice: once for the
`bint` and once for the `select`.

Prior to this fix, the above example would result in the load at `v2`
sinking into the `cmp` just above the `select`; we then emit another
`cmp` for the `bint`, but the load has already been merged once, so we
do not allow merging again. We thus (i) expect the register for `v2` to
contain the loaded value, but (ii) skip codegen for the load because it
has been sunk. The result is a regalloc error (an unexpected livein),
because the never-filled register is upward-exposed all the way to the
entry point.
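
A minimal, self-contained model of that broken bookkeeping
(hypothetical structure and names, not the real lowering code) makes
the failure mode concrete:

```rust
use std::collections::HashSet;

// Hypothetical model of the pre-fix bookkeeping; not Cranelift's types.
struct Lowering {
    sunk_loads: HashSet<&'static str>,   // loads merged into a consumer
    defined_regs: HashSet<&'static str>, // vregs that actually get filled
}

impl Lowering {
    // Mimics emit_cmp() with load-op fusion: the rhs comes from `load`.
    fn emit_cmp(&mut self, load: &'static str) {
        if self.sunk_loads.insert(load) {
            // First emission (the `select`'s cmp): the load has a single
            // IR use, so we sink it into the compare.
            println!("cmpq 0(%rsi), %rdi  ; {load} sunk into this cmp");
        } else {
            // Second emission (the `bint`'s cmp): merging is refused, so
            // we read the load's register -- but its codegen was skipped
            // when it was sunk, so the register is never defined.
            println!("cmpq %{load}, %rdi  ; expects {load} in a register");
            assert!(
                self.defined_regs.contains(load),
                "regalloc error: {load} is read but never defined"
            );
        }
    }
}

fn main() {
    let mut l = Lowering {
        sunk_loads: HashSet::new(),
        defined_regs: HashSet::new(), // v2's load is skipped: it was sunk
    };
    l.emit_cmp("v2"); // cmp for the `select`: sinks the load
    l.emit_cmp("v2"); // cmp for the `bint`: panics, undefined vreg
}
```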

Because of this, we accept only the `cmp reg, reg` form in `emit_cmp()`
(and in its FP equivalent). We could get marginally better code by
tracking whether the `cmp` we are emitting comes from an `icmp`/`fcmp`
with only one use, but IMHO simplicity is the better rule here, given
how subtle the interactions are.
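
In sketch form (again a toy model, not the real `emit_cmp()`
signature), the post-fix behavior: the load keeps its single, normal
codegen site, and re-emitting the compare becomes harmless.

```rust
// Toy sketch of the post-fix behavior (names are hypothetical). The
// compare accepts only a register rhs, so a load feeding it is lowered
// once at its own position and the cmp can be re-emitted freely.
fn emit_cmp_reg_reg(asm: &mut Vec<String>, lhs: &str, rhs_reg: &str) {
    // reg, reg form only: no `cmp reg, mem`, hence no load sinking.
    asm.push(format!("cmpq {rhs_reg}, {lhs}"));
}

fn main() {
    let mut asm = Vec::new();
    asm.push("movq 0(%rsi), %rax".to_string()); // the load, emitted once
    emit_cmp_reg_reg(&mut asm, "%rdi", "%rax"); // cmp for the `select`
    asm.push("cmovzq %rdi, %rsi".to_string());
    emit_cmp_reg_reg(&mut asm, "%rdi", "%rax"); // cmp for the `bint`: safe
    asm.push("setz %cl".to_string());
    println!("{}", asm.join("\n"));
}
```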
Author: Chris Fallin
Date:   2021-01-12 15:37:53 -08:00
Parent: 7ed7c088a4
Commit: 4638de673c

2 changed files with 62 additions and 4 deletions


```
@@ -0,0 +1,49 @@
test compile
target x86_64
feature "experimental_x64"

function %f0(i64, i64) -> i64, i64 {
block0(v0: i64, v1: i64):
    v2 = load.i64 v1
    ; check: movq 0(%rsi), %rax
    v3 = icmp eq v0, v2
    v4 = bint.i64 v3
    ; nextln: cmpq %rax, %rdi
    ; nextln: setz %cl
    ; nextln: movzbq %cl, %rcx
    v5 = select.i64 v3, v0, v1
    ; nextln: cmpq %rax, %rdi
    ; nextln: cmovzq %rdi, %rsi
    return v4, v5
    ; nextln: movq %rcx, %rax
    ; nextln: movq %rsi, %rdx
}

function %f1(f64, i64) -> i64, f64 {
block0(v0: f64, v1: i64):
    v2 = load.f64 v1
    ; check: movsd 0(%rdi), %xmm1
    v3 = fcmp eq v0, v2
    v4 = bint.i64 v3
    ; nextln: ucomisd %xmm1, %xmm0
    ; nextln: setnp %dil
    ; nextln: setz %sil
    ; nextln: andl %edi, %esi
    ; nextln: movzbq %sil, %rsi
    v5 = select.f64 v3, v0, v0
    ; nextln: ucomisd %xmm1, %xmm0
    ; nextln: movaps %xmm0, %xmm1
    ; nextln: jnp $$next; movsd %xmm0, %xmm1; $$next:
    ; nextln: jz $$next; movsd %xmm0, %xmm1; $$next:
    return v4, v5
    ; nextln: movq %rsi, %rax
    ; nextln: movaps %xmm1, %xmm0
}
```