Files
wasmtime/cranelift/codegen
Chris Fallin 21dac670f0 Aarch64: handle csel with icmp/fcmp source without materializing the bool.
Previously, we simply compared the input bool to 0, which forced the
value into a register (usually via a cmp and cset), zero-extended it,
etc. This patch performs the same pattern-matching that branches do to
directly perform the cmp and use its flag results with the csel.

On the `bz2` benchmark, the runtime is affected as follows (measuring
with `perf stat`, using wasmtime with its cache enabled, and taking the
second run after the first compiles and populates the cache):

pre:

       1117.232000      task-clock (msec)         #    1.000 CPUs utilized
               133      context-switches          #    0.119 K/sec
                 1      cpu-migrations            #    0.001 K/sec
             5,041      page-faults               #    0.005 M/sec
     3,511,615,100      cycles                    #    3.143 GHz
     4,272,427,772      instructions              #    1.22  insn per cycle
   <not supported>      branches
        27,980,906      branch-misses

       1.117299838 seconds time elapsed

post:

       1003.738075      task-clock (msec)         #    1.000 CPUs utilized
               121      context-switches          #    0.121 K/sec
                 0      cpu-migrations            #    0.000 K/sec
             5,052      page-faults               #    0.005 M/sec
     3,224,875,393      cycles                    #    3.213 GHz
     4,000,838,686      instructions              #    1.24  insn per cycle
   <not supported>      branches
        27,928,232      branch-misses

       1.003440004 seconds time elapsed

In other words, with this change, on `bz2`, we see a 6.3% reduction in
executed instructions.
2020-07-17 21:10:21 -07:00
..
2020-07-17 12:03:17 +02:00

This crate contains the core Cranelift code generator. It translates code from an intermediate representation into executable machine code.