Commit Graph

1790 Commits

Author SHA1 Message Date
Yury Delendik
399ee0a54c Serialize and deserialize compilation artifacts. (#2020)
* Serialize and deserialize Module
* Use bincode to serialize
* Add wasm_module_serialize; docs
* Simple tests
2020-07-21 15:05:50 -05:00
Chris Fallin
96ef2f1a1b Fix u8::MAX -> std::u8::MAX. (#2047)
As per Carlo Kok on Zulip #cranelift, this breaks builds with stable
Rust pre-1.43, as the associated constant `u8::MAX` was only stabilized
then. We'd like to support older versions if we can easily do so.

This PR also adds `cranelift-tools` to the crates checked on CI with
Rust 1.41.0, which pulls in all backends (including `aarch64`).
2020-07-20 14:59:15 -05:00
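For commit 96ef2f1a1b above, a minimal sketch of the two spellings, not taken from the patch itself. The module-level constant `std::u8::MAX` compiles on older toolchains such as Rust 1.41, while the associated constant `u8::MAX` needs Rust 1.43 or later. The helper name `saturating_to_u8` is hypothetical.

```rust
/// Clamp a `u32` into the `u8` range (hypothetical helper, for illustration only).
#[allow(deprecated)] // newer toolchains may prefer the associated constant
pub fn saturating_to_u8(x: u32) -> u8 {
    // Module-level constant: available on old toolchains, including Rust 1.41.
    let max = std::u8::MAX as u32;
    // On Rust 1.43 and later this could instead be written `u8::MAX as u32`.
    if x > max {
        std::u8::MAX
    } else {
        x as u8
    }
}
```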
Chris Fallin
784e2f1480 Merge pull request #2038 from jgouly/arith2
arm64: Enable arith2 tests
2020-07-20 09:00:10 -07:00
Chris Fallin
1b3b2dbfd0 Merge pull request #2043 from cfallin/csel-opt
Aarch64: handle csel with icmp/fcmp source without materializing the bool.
2020-07-18 19:33:47 -07:00
Chris Fallin
ea894c0eeb Merge pull request #2042 from cfallin/aarch64-fix-regshift-mask
Aarch64: mask shift-amounts incorporated into reg-reg-shift ALU insts.
2020-07-18 19:33:35 -07:00
Chris Fallin
21dac670f0 Aarch64: handle csel with icmp/fcmp source without materializing the bool.
Previously, we simply compared the input bool to 0, which forced the
value into a register (usually via a cmp and cset), zero-extended it,
etc. This patch performs the same pattern-matching that branches do to
directly perform the cmp and use its flag results with the csel.

On the `bz2` benchmark, the runtime is affected as follows (measuring
with `perf stat`, using wasmtime with its cache enabled, and taking the
second run after the first compiles and populates the cache):

pre:

       1117.232000      task-clock (msec)         #    1.000 CPUs utilized
               133      context-switches          #    0.119 K/sec
                 1      cpu-migrations            #    0.001 K/sec
             5,041      page-faults               #    0.005 M/sec
     3,511,615,100      cycles                    #    3.143 GHz
     4,272,427,772      instructions              #    1.22  insn per cycle
   <not supported>      branches
        27,980,906      branch-misses

       1.117299838 seconds time elapsed

post:

       1003.738075      task-clock (msec)         #    1.000 CPUs utilized
               121      context-switches          #    0.121 K/sec
                 0      cpu-migrations            #    0.000 K/sec
             5,052      page-faults               #    0.005 M/sec
     3,224,875,393      cycles                    #    3.213 GHz
     4,000,838,686      instructions              #    1.24  insn per cycle
   <not supported>      branches
        27,928,232      branch-misses

       1.003440004 seconds time elapsed

In other words, with this change, on `bz2`, we see a 6.3% reduction in
executed instructions.
2020-07-17 21:10:21 -07:00
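The reductions implied by the `perf stat` numbers in commit 21dac670f0 above can be checked with a small helper. This is only an arithmetic sketch; the function name `percent_reduction` is an assumption, not something from the repository.

```rust
/// Percentage reduction when a counter goes from `pre` to `post`.
pub fn percent_reduction(pre: f64, post: f64) -> f64 {
    (pre - post) / pre * 100.0
}
```

Fed with the counters from the log, `percent_reduction(4_272_427_772.0, 4_000_838_686.0)` gives about 6.36 for instructions (in line with the 6.3% quoted), `percent_reduction(3_511_615_100.0, 3_224_875_393.0)` gives about 8.2 for cycles, and `percent_reduction(1.117299838, 1.003440004)` gives about 10.2 for elapsed time.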
Nick Fitzgerald
ee5982fd16 peepmatic: Be generic over the operator type
This lets us avoid the overhead of converting `cranelift_codegen::ir::Opcode`
to `peepmatic_runtime::Operator`, and paves the way for
allowing Peepmatic to support non-clif optimizations (e.g. vcode optimizations).

Rather than defining our own `peepmatic::Operator` type like we used to, now the
whole `peepmatic` crate is effectively generic over a `TOperator` type
parameter. For the Cranelift integration, we use `cranelift_codegen::ir::Opcode`
as the concrete type for our `TOperator` type parameter. For testing, we also
define a `TestOperator` type, so that we can test Peepmatic code without
building all of Cranelift, and we can keep them somewhat isolated from each
other.

The methods that `peepmatic::Operator` had are now translated into trait bounds
on the `TOperator` type. These traits need to be shared between all of
`peepmatic`, `peepmatic-runtime`, and `cranelift-codegen`'s Peepmatic
integration. Therefore, these new traits live in a new crate:
`peepmatic-traits`. This crate acts as a header file of sorts for shared
trait/type/macro definitions.

Additionally, the `peepmatic-runtime` crate no longer depends on the
`peepmatic-macro` procedural macro crate, which should lead to faster build
times for Cranelift when it is using pre-built peephole optimizers.
2020-07-17 16:16:49 -07:00
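The shape described in commit ee5982fd16 above, code that is generic over a `TOperator` parameter with the old methods turned into trait bounds, might look roughly like the following. This is a hedged sketch: the trait `OperatorLike`, its `arity` method, and the `Matcher` type are hypothetical and are not the actual `peepmatic-traits` API. Only the names `TOperator` and `TestOperator` come from the commit message.

```rust
use std::fmt::Debug;
use std::hash::Hash;

/// Hypothetical stand-in for the shared bounds that live in a traits crate.
pub trait OperatorLike: Copy + Debug + Eq + Hash {
    /// Number of operands this operator takes.
    fn arity(self) -> usize;
}

/// Small operator type for tests, so nothing from Cranelift is needed.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum TestOperator {
    Bnot,
    Iadd,
}

impl OperatorLike for TestOperator {
    fn arity(self) -> usize {
        match self {
            TestOperator::Bnot => 1,
            TestOperator::Iadd => 2,
        }
    }
}

/// A matcher that works for any operator type satisfying the bounds.
pub struct Matcher<TOperator> {
    expected: TOperator,
}

impl<TOperator: OperatorLike> Matcher<TOperator> {
    pub fn new(expected: TOperator) -> Self {
        Matcher { expected }
    }

    /// True if `op` is the expected operator and has the right operand count.
    pub fn matches(&self, op: TOperator, num_args: usize) -> bool {
        op == self.expected && num_args == op.arity()
    }
}
```

Because the matcher never names a concrete operator type, no conversion step between two operator enums is needed at the call site.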
Chris Fallin
9bd9c628aa Aarch64: mask shift-amounts incorporated into reg-reg-shift ALU insts.
We had previously fixed a bug in which constant shift amounts should be
masked modulo the number of bits in the operand; however, we did not
fix the analogous case for shifts incorporated into the second register
argument of ALU instructions that support integrated shifts.  This
failure to mask resulted in illegal instructions being generated, e.g.
in https://bugzilla.mozilla.org/show_bug.cgi?id=1653502. This PR fixes
the issue by masking the amount, as the shift semantics require.
2020-07-17 14:55:23 -07:00
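The masking that commit 9bd9c628aa above refers to can be sketched as follows. This models only the arithmetic, assuming a power-of-two operand width; the function name `mask_shift_amount` is hypothetical and is not the backend's code.

```rust
/// Reduce a shift amount modulo the operand width in bits.
/// Assumes `bits` is a power of two (e.g. 32 or 64).
pub fn mask_shift_amount(amount: u64, bits: u32) -> u8 {
    debug_assert!(bits.is_power_of_two());
    // For a power-of-two width, `x % bits` equals `x & (bits - 1)`.
    (amount & u64::from(bits - 1)) as u8
}
```

For example, a shift by 65 on a 64-bit operand becomes a shift by 1, which always fits the instruction's shift-amount field.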
Nick Fitzgerald
ae95ad8733 cranelift: Don't build peepmatic-based optimizations in build.rs
Instead, when the `rebuild-peephole-optimizers` feature is enabled, rebuild them
the first time they are used. This allows peepmatic to run when Cranelift's
`Opcode` is defined and available, which paves the way forward for:

* merging `peepmatic_runtime::operator::Operator` and Cranelift's `Opcode` (we
  are wasting a bunch of cycles converting between the two of them), and

* supporting vcode optimizations in `peepmatic`.
2020-07-17 14:35:16 -07:00
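The "rebuild the first time they are used" idea in commit ae95ad8733 above is a lazy-initialization pattern. A minimal sketch, assuming a toolchain with `std::sync::OnceLock` (Rust 1.70 or later) and using made-up names (`build_optimizer`, `optimizer`, `build_count`) plus a dummy byte vector in place of a real optimizer:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Counts how many times the (pretend) expensive build has run.
static BUILDS: AtomicUsize = AtomicUsize::new(0);
// Holds the built artifact once the first caller has triggered the build.
static OPTIMIZER: OnceLock<Vec<u8>> = OnceLock::new();

/// Stand-in for an expensive build step; the bytes are dummy data.
fn build_optimizer() -> Vec<u8> {
    BUILDS.fetch_add(1, Ordering::SeqCst);
    vec![1, 2, 3]
}

/// Returns the optimizer, building it on first use only.
pub fn optimizer() -> &'static [u8] {
    OPTIMIZER.get_or_init(build_optimizer)
}

/// Number of times the build step has actually run.
pub fn build_count() -> usize {
    BUILDS.load(Ordering::SeqCst)
}
```

Deferring the build to first use means it happens after the crate's own types are compiled, rather than in `build.rs` where they are not yet available.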
Johnnie Birch
a7cedf3100 Add support for 32 bit and 64 bit fcmp for the new backend
Implements comiss and comisd.
2020-07-17 13:46:54 -07:00
Nick Fitzgerald
8dd4ab2f1e Merge pull request #2022 from MaxGraey/peepmatic-bnot
peepmatic: Add bnot operation
2020-07-17 09:39:38 -07:00
Nikolay Volf
4f4edc7aef Remove spam from "do_remove_constant_phis" 2020-07-17 18:14:16 +02:00
Joey Gouly
40473dffed arm64: Enable arith2 tests
Copyright (c) 2020, Arm Limited.
2020-07-17 15:58:16 +01:00
Benjamin Bouvier
ead8a835c4 machinst x64: add more FP support 2020-07-17 15:56:44 +02:00
bjorn3
5c5a30f76c Fix review comments 2020-07-17 12:03:17 +02:00
bjorn3
7b7b1f4997 Rename sarg__ to sarg_t 2020-07-17 12:03:17 +02:00
bjorn3
4971d9ee80 Merge {make_incoming,get_outgoing}_{,struct_}arg 2020-07-17 12:03:17 +02:00
bjorn3
0d4fa6d32a Fix review comments 2020-07-17 12:03:17 +02:00
bjorn3
4431ac1108 Implement SystemV struct argument passing 2020-07-17 12:03:17 +02:00
MaxGraey
c653c563dd Merge branch 'main' into peepmatic-bnot 2020-07-16 22:01:18 +03:00
Benjamin Bouvier
bab337fc32 Address review comments; 2020-07-16 18:21:06 +02:00
Benjamin Bouvier
5a55646fc3 machinst x64: support out-of-bounds memory accesses; 2020-07-16 18:21:06 +02:00
Benjamin Bouvier
ea33ce9116 machinst x64: basic support for baldrdash
+ fix multi-value support
2020-07-16 18:21:06 +02:00
Benjamin Bouvier
00b38c91f6 machinst x64: fix generation of RegMemImm immediate operands; 2020-07-16 18:21:06 +02:00
Benjamin Bouvier
1430c5e436 machinst x64: fix index handling of jump table;
The index should be truncated to 32 bits before being used for the jump
table entry computation.
2020-07-16 18:21:06 +02:00
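For commit 1430c5e436 above, a rough model of why truncating the index matters. This is a sketch of the semantics only, with a hypothetical `jump_target` function and a slice standing in for the jump table; it is not the emitted x64 sequence.

```rust
/// Look up a jump-table entry using only the low 32 bits of `index`,
/// falling back to `default` when the truncated index is out of range.
pub fn jump_target(table: &[u32], default: u32, index: u64) -> u32 {
    // Truncate first: garbage in the upper 32 bits must not affect the lookup.
    let idx = index as u32;
    table.get(idx as usize).copied().unwrap_or(default)
}
```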
Benjamin Bouvier
55b9059954 machinst x64: remove spurious assertion requiring the FP offset to be 16-byte aligned 2020-07-16 18:21:06 +02:00
Benjamin Bouvier
3905a1b17b machinst x64: implement SymbolValue and FuncAddr with a movabsq+reloc; 2020-07-16 18:21:06 +02:00
Benjamin Bouvier
cfa0a0c4e8 machinst x64: lower resumable_trap as trap; 2020-07-16 18:21:06 +02:00
Benjamin Bouvier
311027869b machinst x64: implement popcnt.i64 2020-07-16 18:21:06 +02:00
Benjamin Bouvier
d9310e8d90 machinst x64: fix checked div sequence
- it should mark as clobbering (def) rdx, not modifying it
- the signed-div check requires a temporary to compare against int64_min
2020-07-16 18:21:06 +02:00
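The checks that a checked signed division has to perform, as touched on in commit d9310e8d90 above, can be modeled in plain Rust. This is a sketch of the semantics, not the emitted instruction sequence; the `Trap` enum and `checked_sdiv` are hypothetical names.

```rust
/// Reasons a checked division can trap (hypothetical names).
#[derive(Debug, PartialEq, Eq)]
pub enum Trap {
    DivisionByZero,
    IntegerOverflow,
}

/// Signed 64-bit division with explicit trap conditions.
pub fn checked_sdiv(x: i64, y: i64) -> Result<i64, Trap> {
    if y == 0 {
        return Err(Trap::DivisionByZero);
    }
    // i64::MIN / -1 does not fit in an i64, so compare against int64_min.
    if x == i64::MIN && y == -1 {
        return Err(Trap::IntegerOverflow);
    }
    Ok(x / y)
}
```

In machine code the comparison against `int64_min` needs the constant in a register, which is why the commit mentions a temporary.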
Benjamin Bouvier
f932bccaf8 machinst x64: fix sign-extension at boundary 2020-07-16 18:21:06 +02:00
Benjamin Bouvier
6f5403a94b machinst x64: lower Ctz using the Bsf x86 instruction 2020-07-16 18:21:06 +02:00
Benjamin Bouvier
33e0d05645 machinst x64: have cmov modify its destination operand;
This is tricky: the control flow implied by the operand means that the
output register may be undefined if we mark it only as a "def". Make it
a "mod" instead, which matches our usage in the codebase and will make
it crash if the output operand isn't unconditionally defined before the
instruction.
2020-07-16 18:21:06 +02:00
Benjamin Bouvier
aa7db7fd7b machinst x64: fix JmpUnknown register mapping; 2020-07-16 18:21:06 +02:00
Benjamin Bouvier
fe7dd41435 machinst x64: fix iconst emission 2020-07-16 18:21:06 +02:00
Benjamin Bouvier
ec2209665a machinst x64: implement bsr and lower Clz; 2020-07-16 18:21:06 +02:00
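For commit ec2209665a above, one way to express count-leading-zeros in terms of a bit-scan-reverse result. This is a software model under the assumption that `bsr` yields the index of the highest set bit and has no defined result for zero; it is not necessarily the lowering used in the patch. The names `bsr64` and `clz64` are hypothetical.

```rust
/// Software model of `bsr`: index of the highest set bit, `None` for zero.
pub fn bsr64(x: u64) -> Option<u32> {
    if x == 0 {
        return None;
    }
    let mut idx = 63;
    // Walk down from the top bit until a set bit is found.
    while (x >> idx) & 1 == 0 {
        idx -= 1;
    }
    Some(idx)
}

/// Count leading zeros via `bsr`, handling the zero input separately.
pub fn clz64(x: u64) -> u32 {
    match bsr64(x) {
        Some(i) => 63 - i,
        None => 64,
    }
}
```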
Benjamin Bouvier
eda2d143ed machinst x64: add support for umulhi/smulhi; 2020-07-16 18:21:06 +02:00
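For commit eda2d143ed above, the meaning of `umulhi` and `smulhi` (the high half of a full-width product) can be written with 128-bit arithmetic. This is a reference model only, assuming 64-bit operands; it says nothing about how the backend emits the instructions.

```rust
/// High 64 bits of the unsigned 128-bit product of `a` and `b`.
pub fn umulhi(a: u64, b: u64) -> u64 {
    ((u128::from(a) * u128::from(b)) >> 64) as u64
}

/// High 64 bits of the signed 128-bit product of `a` and `b`.
pub fn smulhi(a: i64, b: i64) -> i64 {
    // The shift on i128 is arithmetic, so the sign is preserved.
    ((i128::from(a) * i128::from(b)) >> 64) as i64
}
```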
Benjamin Bouvier
571061fe4c machinst x64: add support for rotations; 2020-07-16 18:21:06 +02:00
Benjamin Bouvier
22892466e7 machinst x64: fix implementation of *reduce;
They should just generate a plain move, since the high bits are then
ignored, and not an extended move.
2020-07-16 18:21:06 +02:00
MaxGraey
4564c396d2 Merge branch 'main' into peepmatic-bnot 2020-07-16 16:13:28 +03:00
MaxGraey
657aea5286 remove rule and tests 2020-07-16 14:56:11 +03:00
Andrew Brown
c8ddf8a34c Encode [u|s]widen_low for x86 2020-07-15 11:32:08 -07:00
Andrew Brown
fafef7db77 Add x86_palignr instructions
This instruction is necessary for implementing `[s|u]widen_high`.
2020-07-15 11:32:08 -07:00
Andrew Brown
0e5e8a62c8 Add DerivedFunction for doubling lane widths and halving the number of lanes (i.e. merging)
Certain operations (e.g. widening) will have operands with types like `NxM` but will return results with types like `(N*2)x(M/2)` (double the lane width, halve the number of lanes; maintain the same number of vector bits). This is equivalent to applying two `DerivedFunction`s to the type: `DerivedFunction::DoubleWidth` then `DerivedFunction::HalfVector`. Since there is no easy way to apply multiple `DerivedFunction`s (e.g. most of the logic is one-level deep, 1d5a678124/cranelift/codegen/meta/src/gen_inst.rs (L618-L621)), I added `DerivedFunction::MergeLanes` to do the necessary type conversion.
2020-07-15 11:32:08 -07:00
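The type conversion described in commit 0e5e8a62c8 above, from `NxM` to `(N*2)x(M/2)`, can be sketched with a small value type. The `VectorType` struct and its methods are hypothetical illustrations and do not mirror Cranelift's meta-language types; only the idea of composing a "double width" step with a "half vector" step comes from the commit message.

```rust
/// A vector type with `lanes` lanes of `lane_bits` bits each (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct VectorType {
    pub lane_bits: u32,
    pub lanes: u32,
}

impl VectorType {
    /// Total number of bits in the vector.
    pub fn total_bits(self) -> u32 {
        self.lane_bits * self.lanes
    }

    /// Double the lane width, keeping the lane count.
    pub fn double_width(self) -> Self {
        VectorType { lane_bits: self.lane_bits * 2, lanes: self.lanes }
    }

    /// Halve the lane count, if it is even and at least two.
    pub fn half_vector(self) -> Option<Self> {
        if self.lanes >= 2 && self.lanes % 2 == 0 {
            Some(VectorType { lane_bits: self.lane_bits, lanes: self.lanes / 2 })
        } else {
            None
        }
    }

    /// Both steps at once: the total vector width stays the same.
    pub fn merge_lanes(self) -> Option<Self> {
        self.double_width().half_vector()
    }
}
```

For instance, a vector of 8 lanes of 16 bits merges into 4 lanes of 32 bits, still 128 bits in total.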
Chris Fallin
12a31c88d7 Merge pull request #2021 from akirilov-arm/VectorSize
AArch64: Introduce an enum to specify vector instruction operand sizes
2020-07-15 09:43:18 -07:00
MaxGraey
67b785d241 refactor: use different sections for this rule 2020-07-15 17:11:27 +03:00
Benjamin Bouvier
abf157bd69 machinst x64: Only use the feature flag to enable the x64 new backend;
Before this patch, running the x64 new backend would require both
compiling with --features experimental_x64 and running with
`use_new_backend`.

This patch changes this behavior so that the runtime flag is not
needed anymore: using the feature flag will enforce usage of the new
backend everywhere, making using and testing it much simpler:

    cargo run --features experimental_x64 ;; other CLI options/flags

This also gives a hint at what the meta language generation would look
like after switching to the new backend.

Compiling only with the x64 codegen flag gives a nice compile time speedup.
2020-07-15 13:11:28 +02:00
MaxGraey
5b38857e7f add bnot to peepmatic + transform rule 2020-07-15 13:46:25 +03:00
Anton Kirilov
95b0b05af2 AArch64: Introduce an enum to specify vector instruction operand sizes
Copyright (c) 2020, Arm Limited.
2020-07-14 21:37:44 +01:00
Anton Kirilov
400639245c AArch64: Remove show_freg_sized()
It provides the same functionality as show_vreg_scalar().

Copyright (c) 2020, Arm Limited.
2020-07-14 11:27:46 -07:00