Aarch64: handle csel with icmp/fcmp source without materializing the bool.

Previously, we simply compared the input bool to 0. That forced the
value into a register (usually via a cmp and cset) and zero-extended it
before the csel could test it. This patch performs the same
pattern-matching that branches do: when the bool comes from an icmp or
fcmp, we perform the cmp directly and use its flag results with the csel.
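
As a rough illustration of the difference, here is a toy model of the two
lowering strategies. It is only a sketch: `CondSource`,
`lower_select_materialized`, `lower_select_fused`, and the `w16` scratch
register are hypothetical names chosen for this example, not Cranelift's
actual lowering code, and the zero-extension step is left out for brevity.

```rust
// Toy model of select lowering; all names are hypothetical, not Cranelift's API.

/// Where the condition operand of a `select` comes from.
enum CondSource {
    /// Result of an integer compare whose operands are still visible.
    Icmp { cond: &'static str, lhs: &'static str, rhs: &'static str },
    /// An opaque bool that already lives in a register.
    Bool { reg: &'static str },
}

/// Old strategy: always test a materialized bool against zero.
fn lower_select_materialized(src: &CondSource, rd: &str, rn: &str, rm: &str) -> Vec<String> {
    let mut insts = Vec::new();
    let bool_reg = match src {
        CondSource::Icmp { cond, lhs, rhs } => {
            // Materialize the compare result into a scratch register first.
            insts.push(format!("subs wzr, {}, {}", lhs, rhs));
            insts.push(format!("cset w16, {}", cond));
            "w16"
        }
        CondSource::Bool { reg } => *reg,
    };
    insts.push(format!("subs wzr, {}, #0", bool_reg));
    insts.push(format!("csel {}, {}, {}, ne", rd, rn, rm));
    insts
}

/// New strategy: if the bool comes from a compare, reuse that compare's flags.
fn lower_select_fused(src: &CondSource, rd: &str, rn: &str, rm: &str) -> Vec<String> {
    match src {
        CondSource::Icmp { cond, lhs, rhs } => vec![
            format!("subs wzr, {}, {}", lhs, rhs),
            format!("csel {}, {}, {}, {}", rd, rn, rm, cond),
        ],
        // No compare to fuse with: fall back to testing the bool itself.
        CondSource::Bool { .. } => lower_select_materialized(src, rd, rn, rm),
    }
}

fn main() {
    // Mirrors the shape of the new filetest: select on `icmp eq v0, 42`.
    let icmp = CondSource::Icmp { cond: "eq", lhs: "w0", rhs: "#42" };
    for inst in lower_select_fused(&icmp, "x0", "x1", "x2") {
        println!("{}", inst);
    }
    // A plain bool input still takes the compare-against-zero path.
    let plain = CondSource::Bool { reg: "w0" };
    for inst in lower_select_fused(&plain, "x0", "x1", "x2") {
        println!("{}", inst);
    }
}
```

In this model the fused path emits two instructions for an icmp-fed select,
where the materializing path emits four.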

On the `bz2` benchmark, the runtime is affected as follows (measuring
with `perf stat`, using wasmtime with its cache enabled, and taking the
second run after the first compiles and populates the cache):

pre:

       1117.232000      task-clock (msec)         #    1.000 CPUs utilized
               133      context-switches          #    0.119 K/sec
                 1      cpu-migrations            #    0.001 K/sec
             5,041      page-faults               #    0.005 M/sec
     3,511,615,100      cycles                    #    3.143 GHz
     4,272,427,772      instructions              #    1.22  insn per cycle
   <not supported>      branches
        27,980,906      branch-misses

       1.117299838 seconds time elapsed

post:

       1003.738075      task-clock (msec)         #    1.000 CPUs utilized
               121      context-switches          #    0.121 K/sec
                 0      cpu-migrations            #    0.000 K/sec
             5,052      page-faults               #    0.005 M/sec
     3,224,875,393      cycles                    #    3.213 GHz
     4,000,838,686      instructions              #    1.24  insn per cycle
   <not supported>      branches
        27,928,232      branch-misses

       1.003440004 seconds time elapsed

In other words, with this change, on `bz2`, we see roughly a 6.4%
reduction in executed instructions, an 8.2% reduction in cycles, and a
10.2% reduction in elapsed time.
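
The relative changes can be re-derived from the raw counters. The snippet
below is a small sketch for checking the arithmetic; the helper name
`reduction_percent` is made up for this example, and the inputs are copied
from the two `perf stat` runs above.

```rust
/// Percentage by which `post` is smaller than `pre`.
/// (Hypothetical helper, used only to check the arithmetic.)
fn reduction_percent(pre: f64, post: f64) -> f64 {
    (pre - post) / pre * 100.0
}

fn main() {
    // Counter values copied from the pre/post `perf stat` output.
    let instructions = reduction_percent(4_272_427_772.0, 4_000_838_686.0);
    let cycles = reduction_percent(3_511_615_100.0, 3_224_875_393.0);
    let task_clock = reduction_percent(1117.232000, 1003.738075);

    println!("instructions: {:.2}% fewer", instructions);
    println!("cycles:       {:.2}% fewer", cycles);
    println!("task-clock:   {:.2}% less", task_clock);
}
```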

Chris Fallin

2020-07-17 16:30:45 -07:00

parent 8dd4ab2f1e

commit 21dac670f0

2 changed files with 53 additions and 11 deletions

cranelift/filetests/filetests/vcode/aarch64/condops.clif

@@ -41,3 +41,14 @@ block0(v0: b1, v1: i8, v2: i8):
 ; check: subs wzr
 ; nextln: csel
 function %i(i32, i8, i8) -> i8 {
 block0(v0: i32, v1: i8, v2: i8):
   v3 = iconst.i32 42
   v4 = icmp.i32 eq v0, v3
   v5 = select.i8 v4, v1, v2
   return v5
 }
 ; check: subs wzr, w0, #42
 ; nextln: csel x0, x1, x2, eq
