x64 backend: merge loads into ALU ops when appropriate.

This PR builds on the support added in #2366 for sinking effectful instructions and merging them with their consumers. In particular, on x86, we want to make use of the ability of many instructions to load one operand directly from memory. That is, instead of this:

```
    movq 0(%rdi), %rax
    addq %rax, %rbx
```

we want to generate this:

```
    addq 0(%rdi), %rbx
```

As described in more detail in #2366, sinking and merging the load is only possible under certain conditions. In particular, we need to ensure that the use is the *only* use (otherwise the load happens more than once), and we need to ensure that the load does not move across other effectful ops (see #2366 for how we ensure this).

This change is actually fairly simple, given that all the framework is in place: we pattern-match a load on one operand of an ALU instruction that takes an RMI (reg, mem, or immediate) operand, and generate the mem form when we match.

This PR also makes a drive-by improvement in the x64 backend to use statically-monomorphized `LowerCtx` types rather than a `&mut dyn LowerCtx`.

On `bz2.wasm`, this results in a ~1% instruction-count reduction. More is likely possible by following up with other instructions that can merge memory loads as well.
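To make the merge conditions concrete, here is a minimal sketch in Rust. It is not the code from this PR. The `Load`, `Operand`, `mergeable`, and `pick_merge` names are hypothetical, and the `crosses_effect` flag only stands in for the analysis that #2366 performs. The width check assumes that a load is folded only when it reads exactly as many bits as the ALU instruction would read from memory.

```rust
/// Hypothetical description of a load feeding an ALU op (not Cranelift's types).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Load {
    base: &'static str,   // base address register, e.g. "%rdi"
    offset: i32,          // displacement from the base
    bits: u8,             // width of the memory access
    uses: u32,            // number of uses of the loaded value
    crosses_effect: bool, // stand-in for the effect analysis in #2366
}

/// Hypothetical operand of a two-input ALU op.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Operand {
    Reg(&'static str),
    Load(Load),
}

/// Returns the load if it may be folded into an ALU op that reads
/// `op_bits` bits from its memory operand.
fn mergeable(op: &Operand, op_bits: u8) -> Option<Load> {
    match *op {
        // Only use, no effectful op in between, and same access width.
        Operand::Load(l) if l.uses == 1 && !l.crosses_effect && l.bits == op_bits => Some(l),
        _ => None,
    }
}

/// For a commutative op such as add, tries either operand as the memory
/// side. Returns the load to fold and the operand that stays in a register.
fn pick_merge(lhs: Operand, rhs: Operand, op_bits: u8) -> Option<(Load, Operand)> {
    if let Some(l) = mergeable(&rhs, op_bits) {
        return Some((l, lhs));
    }
    if let Some(l) = mergeable(&lhs, op_bits) {
        // Swapping the operands is valid only because the op is commutative.
        return Some((l, rhs));
    }
    None
}
```

In this sketch a 32-bit load with a single use is folded regardless of which side of the add it appears on, while an 8-bit load feeding a 32-bit `addl` is rejected, since the merged instruction would read 32 bits from memory.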
2020-11-09 19:05:40 -08:00
parent 9e511ec0c0
commit b97f07b405
2 changed files with 166 additions and 42 deletions
--- a/cranelift/filetests/filetests/isa/x64/load-op.clif
+++ b/cranelift/filetests/filetests/isa/x64/load-op.clif
@@ -0,0 +1,46 @@
+test compile
+target x86_64
+feature "experimental_x64"
+
+function %add_from_mem_u32_1(i64, i32) -> i32 {
+block0(v0: i64, v1: i32):
+  v2 = load.i32 v0
+  v3 = iadd.i32 v2, v1
+  ; check: addl    0(%rdi), %r12d
+  return v3
+}
+
+function %add_from_mem_u32_2(i64, i32) -> i32 {
+block0(v0: i64, v1: i32):
+  v2 = load.i32 v0
+  v3 = iadd.i32 v1, v2
+  ; check: addl    0(%rdi), %r12d
+  return v3
+}
+
+function %add_from_mem_u64_1(i64, i64) -> i64 {
+block0(v0: i64, v1: i64):
+  v2 = load.i64 v0
+  v3 = iadd.i64 v2, v1
+  ; check: addq    0(%rdi), %r12
+  return v3
+}
+
+function %add_from_mem_u64_2(i64, i64) -> i64 {
+block0(v0: i64, v1: i64):
+  v2 = load.i64 v0
+  v3 = iadd.i64 v1, v2
+  ; check: addq    0(%rdi), %r12
+  return v3
+}
+
+; test narrow loads: 8-bit load should not merge because the `addl` is 32 bits
+; and would load 32 bits from memory, which may go beyond the end of the heap.
+function %add_from_mem_not_narrow(i64, i8) -> i8 {
+block0(v0: i64, v1: i8):
+  v2 = load.i8 v0
+  v3 = iadd.i8 v2, v1
+  ; check: movzbq  0(%rdi), %r12
+  ; nextln: addl    %esi, %r12d
+  return v3
+}