Files
wasmtime/cranelift/filetests/filetests/vcode/aarch64/condops.clif
Chris Fallin 21dac670f0 Aarch64: handle csel with icmp/fcmp source without materializing the bool.
Previously, we simply compared the input bool to 0, which forced the
value into a register (usually via a cmp and cset), zero-extended it,
etc. This patch performs the same pattern-matching that branches do to
directly perform the cmp and use its flag results with the csel.

On the `bz2` benchmark, the runtime is affected as follows (measuring
with `perf stat`, using wasmtime with its cache enabled, and taking the
second run after the first compiles and populates the cache):

pre:

       1117.232000      task-clock (msec)         #    1.000 CPUs utilized
               133      context-switches          #    0.119 K/sec
                 1      cpu-migrations            #    0.001 K/sec
             5,041      page-faults               #    0.005 M/sec
     3,511,615,100      cycles                    #    3.143 GHz
     4,272,427,772      instructions              #    1.22  insn per cycle
   <not supported>      branches
        27,980,906      branch-misses

       1.117299838 seconds time elapsed

post:

       1003.738075      task-clock (msec)         #    1.000 CPUs utilized
               121      context-switches          #    0.121 K/sec
                 0      cpu-migrations            #    0.000 K/sec
             5,052      page-faults               #    0.005 M/sec
     3,224,875,393      cycles                    #    3.213 GHz
     4,000,838,686      instructions              #    1.24  insn per cycle
   <not supported>      branches
        27,928,232      branch-misses

       1.003440004 seconds time elapsed

In other words, with this change, on `bz2`, we see a 6.3% reduction in
executed instructions.
2020-07-17 21:10:21 -07:00

55 lines
910 B
Plaintext

test compile
target aarch64
function %f(i8, i64, i64) -> i64 {
block0(v0: i8, v1: i64, v2: i64):
v3 = iconst.i8 42
v4 = ifcmp v0, v3
v5 = selectif.i64 eq v4, v1, v2
return v5
}
; check: subs wzr
; check: csel x0, $(=x[0-9]+, x[0-9]+), eq
function %g(i8) -> b1 {
block0(v0: i8):
v3 = iconst.i8 42
v4 = ifcmp v0, v3
v5 = trueif eq v4
return v5
}
; check: subs wzr
; check: cset x0, eq
function %h(i8, i8, i8) -> i8 {
block0(v0: i8, v1: i8, v2: i8):
v3 = bitselect.i8 v0, v1, v2
return v3
}
; check: and
; nextln: bic
; nextln: orr
function %i(b1, i8, i8) -> i8 {
block0(v0: b1, v1: i8, v2: i8):
v3 = select.i8 v0, v1, v2
return v3
}
; check: subs wzr
; nextln: csel
function %i(i32, i8, i8) -> i8 {
block0(v0: i32, v1: i8, v2: i8):
v3 = iconst.i32 42
v4 = icmp.i32 eq v0, v3
v5 = select.i8 v4, v1, v2
return v5
}
; check: subs wzr, w0, #42
; nextln: csel x0, x1, x2, eq