1136 Commits

Author SHA1 Message Date
Nick Fitzgerald
5ad82de3c5 Bump Wasmtime to 0.22.0; Cranelift to 0.69.0 2021-01-07 14:51:12 -08:00
Chris Fallin
6eea015d6c Multi-register value support: framework for Values wider than machine regs.
This will allow support for `I128` values everywhere, and for `I64`
values on 32-bit targets (e.g., ARM32 and x86-32). It does not alter the
machine backends to build such support; it just adds the framework for
the MachInst backends to *reason* about a `Value` residing in more than
one register.
2021-01-05 17:45:02 -08:00
Nick Fitzgerald
6317290a1d Merge pull request #2548 from cfallin/fix-aarch64-sp
aarch64: fix reg/imm `sub` insts that read `SP`, not the zero register.
2021-01-05 16:38:25 -08:00
Chris Fallin
aac3751025 aarch64: fix reg/imm sub insts that read SP, not the zero register.
On AArch64, the zero register (xzr) and the stack pointer (xsp) are
alternately named by the same index `31` in machine code depending on
context. In particular, in the reg-reg-immediate ALU instruction form,
add/subtract will use the stack pointer, not the zero register, if index
31 is given for the first (register) source arg.
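
To make the context dependence concrete, here is a minimal Rust sketch. It is not Cranelift's encoder: the names `Reg31`, `AluForm`, and `meaning_of_index_31` are hypothetical, and only the shifted-register and register-immediate add/sub forms are modelled.

```rust
/// What register index 31 denotes as the first source operand.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Reg31 {
    ZeroRegister,
    StackPointer,
}

/// Hypothetical labels for two add/sub instruction forms.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AluForm {
    /// add/sub rd, rn, rm (shifted-register form)
    RegRegReg,
    /// add/sub rd, rn, #imm (register-immediate form)
    RegRegImm,
}

/// Index 31 is the zero register in the shifted-register form, but the
/// stack pointer in the register-immediate form.
fn meaning_of_index_31(form: AluForm) -> Reg31 {
    match form {
        AluForm::RegRegReg => Reg31::ZeroRegister,
        AluForm::RegRegImm => Reg31::StackPointer,
    }
}

/// True if `sub rd, <index 31>, operand` really computes `0 - operand`.
fn sub_from_index_31_is_negate(form: AluForm) -> bool {
    meaning_of_index_31(form) == Reg31::ZeroRegister
}
```

Under this model, a negate built as "subtract from index 31" is only valid when the second operand stays in a register.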

In a few places, we were emitting subtract instructions with the zero
register as an argument and a reg/immediate as the second argument. When
an immediate could be incorporated directly (we have the `iconst`
definition visible), this would result in incorrect code being
generated.

This issue was found in `ineg` and in the sequence for vector
right-shifts.

Reported by Ian Cullinan; thanks!
2021-01-05 15:48:07 -08:00
Chris Fallin
dbd2241b60 x64: handle tests of b1 values correctly (only LSB is defined).
Previously, `select` and `brz`/`brnz` instructions, when given a `b1`
boolean argument, would test whether that boolean argument was nonzero,
rather than whether its LSB was nonzero. Since our invariant for mapping
CLIF state to machine state is that bits beyond the width of a value are
undefined, the proper lowering is to test only the LSB.

(aarch64 does not have the same issue because its `Extend` pseudoinst
already properly handles masking of b1 values when a zero-extend is
requested, as it is for select/brz/brnz.)
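
The difference between the two tests can be sketched in plain Rust. This is an illustration only, not the backend's lowering code; both function names are hypothetical and a `u64` stands in for the machine register holding the `b1`.

```rust
/// Old behaviour: any nonzero bit in the register counts as "true".
/// This is wrong when the bits above the LSB hold undefined garbage.
fn b1_is_true_whole_register(reg: u64) -> bool {
    reg != 0
}

/// Fixed behaviour: only the least significant bit is defined for a `b1`,
/// so only that bit is tested.
fn b1_is_true_lsb_only(reg: u64) -> bool {
    (reg & 1) != 0
}
```

For a register holding `0xFFFF_FFFF_FFFF_FFFE` (a false `b1` with garbage upper bits), the first test reports true and the second reports false.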

Found by Nathan Ringo on Zulip [1] (thanks!).

[1] https://bytecodealliance.zulipchat.com/#narrow/stream/217117-cranelift/topic/bnot.20on.20b1s
2021-01-05 14:45:46 -08:00
Yury Delendik
2964023a77 [SIMD][x86_64] Add encoding for PMADDWD (#2530)
* [SIMD][x86_64] Add encoding for PMADDWD

* also for "experimental_x64"
2020-12-24 07:52:50 -06:00
Johnnie Birch
f705a72aeb Refactor packed moves to use xmm_mov instead of xmm_rm_r
Refactors previous packed move implementation to use xmm_mov
instead of xmm_rm_r which looks to simplify register accounting
during lowering.
2020-12-16 17:13:27 -08:00
Johnnie Birch
51973aefbb Implements x64 SIMD loads for the new backend. 2020-12-16 17:13:27 -08:00
Chris Fallin
e91987c43c Allow both x86 backends to be included, selected with a "variant" flag. (#2514)
This PR adds a new `isa::lookup_variant()` that takes a `BackendVariant`
(`Legacy`, `MachInst` or `Any`), and exposes both x86 backends as
separate variants if both are compiled into the build.

This will allow some new use-cases that require both backends in the
same process: for example, differential fuzzing between old and new
backends, or perhaps allowing for dynamic feature-flag selection between
the backends.
2020-12-16 09:56:04 -06:00
Chris Fallin
743529b4eb Merge pull request #2492 from uweigand/endian-memory-v5
Support explicit endianness in Cranelift IR MemFlags
2020-12-14 13:59:08 -08:00
Ulrich Weigand
467a1af83a Support explicit endianness in Cranelift IR MemFlags
WebAssembly memory operations are by definition little-endian even on
big-endian target platforms.  However, other memory accesses will require
native target endianness (e.g. to access parts of the VMContext that are
also accessed by VM native code).  This means on big-endian targets,
the code generator will have to handle both little- and big-endian
memory accesses.  However, there is currently no way to encode that
distinction into the Cranelift IR that describes memory accesses.

This patch provides such a way by adding an (optional) explicit
endianness marker to an instance of MemFlags.  Since each Cranelift IR
instruction that describes memory accesses already has an instance of
MemFlags attached, this can now be used to provide endianness
information.

Note that by default, memory accesses will continue to use the native
target ISA endianness.  To override this to specify an explicit
endianness, a MemFlags value that was built using the set_endianness
routine must be used.  This patch does so for accesses that implement
WebAssembly memory operations.
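
The idea of an optional endianness marker with a native default can be sketched using only the Rust standard library. This does not use the real `MemFlags` type; the `Endianness` enum and `load_u32` function below are hypothetical stand-ins.

```rust
/// Hypothetical stand-in for an explicit endianness marker.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Endianness {
    Little,
    Big,
}

/// Load a 32-bit value from `mem` at `offset`.
/// `None` means "no explicit marker", which falls back to native endianness.
fn load_u32(mem: &[u8], offset: usize, endianness: Option<Endianness>) -> u32 {
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(&mem[offset..offset + 4]);
    match endianness {
        Some(Endianness::Little) => u32::from_le_bytes(bytes),
        Some(Endianness::Big) => u32::from_be_bytes(bytes),
        None => u32::from_ne_bytes(bytes),
    }
}
```

In this model, a WebAssembly memory access would always pass `Some(Endianness::Little)`, while an access to host-side data would pass `None`.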

This patch addresses issue #2124.
2020-12-14 20:15:37 +01:00
bjorn3
8f7f8ee0b4 Fix iconst.i8 0 miscompilation 2020-12-12 09:44:05 +01:00
Alex Crichton
25000afe69 Enable fuzzing the module linking implementation
This commit updates all the wasm-tools crates that we use and enables
fuzzing of the module linking proposal in our various fuzz targets. This
also refactors some of the dummy value generation logic to not be
fallible and to always succeed, the thinking being that we don't want to
accidentally hide errors while fuzzing. Additionally, instantiation is
only allowed to fail with a `Trap`; other failure reasons are unwrapped.
2020-12-11 08:36:52 -08:00
Chris Fallin
267d4a8bdb Merge pull request #2490 from cfallin/fix-popcnt-load-width
x64 lowering fix: i32.popcnt should not merge load and make it 64-bit.
2020-12-08 22:28:41 -08:00
Y-Nak
855a6374dd Fix missing modification of jump table in licm 2020-12-09 11:13:33 +09:00
Chris Fallin
6632c45c01 x64 lowering fix: i32.popcnt should not merge load and make it 64-bit.
As a subtle consequence of the recent load-op fusion, popcnt of a
value that came from a load.i32 was compiling into a 64-bit load. This
is a result of the way in which x86 infers the width of loads: it is a
consequence of the instruction containing the memory reference, not the
memory reference itself. So the `input_to_reg_mem()` helper (convert an
instruction input into a register or memory reference) was providing the
appropriate memory reference for the result of a load.i32, but never
encoded the assumption that it would only be used in a 32-bit
instruction. It turns out that popcnt.i32 uses a 64-bit instruction to
load this RM op, hence widening a 32-bit to 64-bit load (which is
problematic when the offset is (memory_length - 4)).
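
Why the widened load is a problem at that offset can be shown with a small bounds check. This is a sketch, not the backend's code; `load_in_bounds` is a hypothetical helper and the 64 KiB memory length in the usage note is an assumed example value.

```rust
/// True if a load of `width_bytes` starting at `offset` stays inside a
/// linear memory of `memory_length` bytes.
fn load_in_bounds(offset: u64, width_bytes: u64, memory_length: u64) -> bool {
    match offset.checked_add(width_bytes) {
        Some(end) => end <= memory_length,
        // The end address does not even fit in 64 bits.
        None => false,
    }
}
```

With a 65536-byte memory, a 4-byte load at offset 65532 is in bounds, but the same load widened to 8 bytes reads 4 bytes past the end.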

Separately, popcnt was using the RM operand twice, resulting in two
loads if we merged a load. This isn't a correctness bug in practice
because only a racy sequence (store interleaving between the loads)
would produce incorrect results, but we decided earlier to treat loads
as effectful for now, neither reordering nor duplicating them, to
deliberately reduce complexity.

Because of the second issue, the fix is just to force the operand into a
register always, so any source load will not be merged.

Discovered via fuzzing with oss-fuzz.
2020-12-08 12:24:34 -08:00
Chris Fallin
2cec20aa57 Merge pull request #2486 from cfallin/fix-probestack
Two Lucet-related fixes to stack overflow handling.
2020-12-07 16:47:37 -08:00
Chris Fallin
3a01d14712 Two Lucet-related fixes to stack overflow handling.
Lucet uses stack probes rather than explicit stack limit checks as
Wasmtime does. In bytecodealliance/lucet#616, I have discovered that I
previously was not running some Lucet runtime tests with the new
backend, so was missing some test failures due to missing pieces in the
new backend.

This PR adds (i) calls to probestack, when enabled, in the prologue of
every function with a stack frame larger than one page (configurable via
flags); and (ii) trap metadata for every instruction on x86-64 that can
access the stack, hence be the first point at which a stack overflow is
detected when the stack pointer is decremented.
2020-12-07 16:08:53 -08:00
Chris Fallin
1dddba649a x64 regalloc register order: put caller-saves (volatiles) first.
The x64 backend currently builds the `RealRegUniverse` in a way that
is generating somewhat suboptimal code. In many blocks, we see uses of
callee-save (non-volatile) registers (r12, r13, r14, rbx) first, even in
very short leaf functions where there are plenty of volatiles to use.
This is leading to unnecessary spills/reloads.

On one (local) test program, a medium-sized C benchmark compiled to Wasm
and run on Wasmtime, I am seeing a ~10% performance improvement with
this change; it will be less pronounced in programs with high register
pressure (there we are likely to use all registers regardless, so the
prologue/epilogue will save/restore all callee-saves), or in programs
with fewer calls, but this is a clear win for small functions and in
many cases removes prologue/epilogue clobber-saves altogether.
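
The effect of the ordering can be sketched with a toy model in which an allocator simply hands out registers from the front of a preference list. This is not regalloc.rs; the types, the two short register lists, and `callee_saves_used` are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SaveKind {
    CallerSaved,
    CalleeSaved,
}

/// Old-style order: callee-saves appear first.
const CALLEE_SAVES_FIRST: [(&str, SaveKind); 6] = [
    ("r12", SaveKind::CalleeSaved),
    ("r13", SaveKind::CalleeSaved),
    ("rbx", SaveKind::CalleeSaved),
    ("rax", SaveKind::CallerSaved),
    ("rcx", SaveKind::CallerSaved),
    ("rdx", SaveKind::CallerSaved),
];

/// New-style order: caller-saves (volatiles) appear first.
const CALLER_SAVES_FIRST: [(&str, SaveKind); 6] = [
    ("rax", SaveKind::CallerSaved),
    ("rcx", SaveKind::CallerSaved),
    ("rdx", SaveKind::CallerSaved),
    ("r12", SaveKind::CalleeSaved),
    ("r13", SaveKind::CalleeSaved),
    ("rbx", SaveKind::CalleeSaved),
];

/// Number of callee-saved registers touched (each needs a prologue save and
/// an epilogue restore) when `values` registers are taken from the front.
fn callee_saves_used(order: &[(&str, SaveKind)], values: usize) -> usize {
    order
        .iter()
        .take(values)
        .filter(|(_, kind)| *kind == SaveKind::CalleeSaved)
        .count()
}
```

A small leaf function needing two registers touches two callee-saves under the first order and none under the second.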

Separately, I think the RA's coalescing is tripping up a bit in some
cases; see e.g. the filetest touched by this commit that loads a value
into %rsi then moves to %rax and returns immediately. This is an
orthogonal issue, though, and should be addressed (if worthwhile) in
regalloc.rs.
2020-12-06 22:37:43 -08:00
Nick Fitzgerald
502b39606f Merge pull request #2364 from fitzgen/peepmatic-rebuild-test
peepmatic: Make the test-we-can-get-and-rebuild peephole optimizers test work on arm64
2020-12-04 10:32:39 -08:00
Julian Seward
8f34d2dc59 aarch64 isel: collect_address_addends: correctly handle ExtendOp::UXTW(negative immediate).
The current code mishandles the case where the source of an `ExtendOp::UXTW`
is a constant-producing insn that produces a negative (32-bit) value. The
value is incorrectly sign-extended to 64 bits (in fact, this has already been
done by `ctx.get_constant(insn)`), whereas it needs to be zero-extended. The
obvious fix, done here, is just to force bits 63:32 of the extension to zero,
hence zero-extending it.
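
The masking step can be illustrated in a few lines of Rust. This is a sketch of the arithmetic only, with a hypothetical function name; it is not the code in `collect_address_addends`.

```rust
/// Given a 32-bit constant that has already been sign-extended to 64 bits,
/// recover the zero-extended value by clearing bits 63:32.
fn zero_extend_from_sign_extended(sign_extended: u64) -> u64 {
    sign_extended & 0xFFFF_FFFF
}
```

For the 32-bit constant -1, the sign-extended form is `0xFFFF_FFFF_FFFF_FFFF`, and masking yields `0x0000_0000_FFFF_FFFF`.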
2020-12-04 19:21:40 +01:00
Chris Fallin
bfd10512c1 Merge pull request #2473 from cfallin/fix-lowering
Fix lowering instruction-sinking (load-merging) bug.
2020-12-03 15:56:50 -08:00
Nick Fitzgerald
76d3fb61c3 Merge pull request #2293 from fitzgen/souper-no-assign-constant
souper-harvest: Do not generate assignments of constants
2020-12-03 15:20:41 -08:00
Chris Fallin
3e516e784b Fix lowering instruction-sinking (load-merging) bug.
This fixes a subtle corner case exposed during fuzzing. If we have a bit
of CLIF like:

```
    v0 = load.i64 ...
    v1 = iadd.i64 v0, ...
    v2 = do_other_thing v1
    v3 = load.i64 v1
```

and if this is lowered using a machine backend that can merge loads into
ALU ops, *and* that has an addressing mode that can look through add
ops, then the following can happen:

1. We lower the load at `v3`. This looks backward at the address
   operand tree and finds that `v1` is `v0` plus other things; it has an
   addressing mode that can add `v0`'s register and the other things
   directly; so it calls `put_value_in_reg(v0)` and uses its register in
   the amode. At this point, the add producing `v1` has no references,
   so it will not (yet) be codegen'd.
2. We lower `do_other_thing`, which puts `v1` in a register and uses it.
   The `iadd` now has a reference.
3. We reach the `iadd` and, because it has a reference, lower it. Our
   machine has the ability to merge a load into an ALU operation.
   Crucially, *we think the load at `v0` is mergeable* because it has
   only one user, the add at `v1` (!). So we merge it.
4. We reach the `load` at `v0` and because it has been merged into the
   `iadd`, we do not separately codegen it. The register that holds `v0`
   is thus never written, and the use of this register by the final load
   (Step 1) will see an undefined value.

The logic error here is that in the presence of pattern matching that
looks through pure ops, we can end up with multiple uses of a value that
originally had a single use (because we allow lookthrough of pure ops in
all cases). In other words, the multiple-use-ness of `v1` "passes
through" in some sense to `v0`. However, the load sinking logic is not
aware of this.

The fix, I think, is pretty simple: we disallow an effectful instruction
from sinking/merging if it already has some other use when we look back
at it.
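
The rule can be sketched with a heavily simplified model. This is not the real lowering framework: `LoweringState`, `can_sink_load`, and `load_v0_may_be_merged` are hypothetical, and values are identified by name only.

```rust
use std::collections::HashSet;

/// Hypothetical model of the state kept while lowering backward.
struct LoweringState {
    /// Values that an already-lowered consumer has requested in a register.
    values_in_regs: HashSet<&'static str>,
}

impl LoweringState {
    fn new() -> Self {
        Self { values_in_regs: HashSet::new() }
    }

    /// Record that some consumer uses `value` directly from a register.
    fn put_value_in_reg(&mut self, value: &'static str) {
        self.values_in_regs.insert(value);
    }

    /// A load may be sunk into its consumer only if it has a single use in
    /// the IR *and* nothing lowered so far has already used its result.
    fn can_sink_load(&self, load_result: &'static str, ir_use_count: usize) -> bool {
        ir_use_count == 1 && !self.values_in_regs.contains(load_result)
    }
}

/// Replays the scenario above: `v0` has one IR use (the `iadd`), but the
/// final load's addressing mode may have looked through the `iadd` to `v0`.
fn load_v0_may_be_merged(amode_looked_through_iadd: bool) -> bool {
    let mut state = LoweringState::new();
    if amode_looked_through_iadd {
        state.put_value_in_reg("v0"); // Step 1
    }
    state.can_sink_load("v0", 1) // Step 3
}
```

When the addressing mode has already used `v0`, the check refuses the merge, so the load is emitted separately and its register is written.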

If we disallowed lookthrough of *any* op that had multiple uses, even
pure ones, then we would avoid this scenario; but earlier experiments
showed that to have a non-negligible performance impact, so (given that
we've worked out the logic above) I think this complexity is worth it.
2020-12-03 14:59:12 -08:00
Nick Fitzgerald
1efdf10ca7 souper-harvest: ensure that select conditions are of type i1 2020-12-03 14:43:28 -08:00
Nick Fitzgerald
f7cf771ee6 souper-harvest: Do not generate assignments of constants
It turns out that Souper does not allow a constant to be assigned to a variable;
constants may only be used as operands. The 2.0.0 version of the `souper-ir` crate
correctly reflects this. In the `cranelift_codegen::souper_harvest` module, we
need to modify our Souper IR harvester so that it delays converting `iconst` and
`bconst` into Souper IR until their values are used as operands. Finally, some
unit tests in the `peepmatic-souper` crate need some small updates as well.
2020-12-03 14:43:14 -08:00
Pat Hickey
0f1dc9a735 Merge pull request #2403 from bjorn3/simplejit_hot_swapping
SimpleJIT hot code swapping
2020-12-03 13:36:32 -08:00
Chris Fallin
8e0e44b802 Merge pull request #2413 from akirilov-arm/fmov_vector
Cranelift AArch64: Further vector constant improvements
2020-12-03 09:43:24 -08:00
Alex Crichton
f003388ec7 Implement imported/exported modules/instances (#2461)
* Implement imported/exported modules/instances

This commit implements the final piece of the module linking proposal
which is to flesh out the support for importing/exporting instances and
modules. This ended up having a few changes:

* Two more `PrimaryMap` instances are now stored in an `Instance`. The value
  for instances is `InstanceHandle` (pretty easy) and for modules it's
  `Box<dyn Any>` (less easy).

* The custom host state for `InstanceHandle` for `wasmtime` is now
  `Arc<TypeTables>` to be able to fully reconstruct an instance's types
  just from its instance.

* Type matching for imports now has been updated to take
  instances/modules into account.

One of the main downsides of this implementation is that type matching
of imports is duplicated between wasmparser and wasmtime, leading to
possible bugs especially in the subtleties of module linking. I'm not
sure how best to unify these two pieces of validation, however, and it
may be more trouble than it's worth.

cc #2094

* Update wat/wast/wasmparser

* Review comments

* Fix a bug in publish script to vendor the right witx

Currently there are two witx binaries in our repository given the two wasi
spec submodules, so this updates the publication script to vendor the
right one.
2020-12-03 10:15:42 -06:00
Anton Kirilov
f59b274d22 Cranelift AArch64: Further vector constant improvements
Introduce support for MOVI/MVNI with 16-, 32-, and 64-bit elements,
and the vector variant of FMOV.

Copyright (c) 2020, Arm Limited.
2020-12-03 15:30:24 +00:00
Johnnie Birch
a548516f97 Enable SIMD spec tests for f32x4_rounding and f64x2_rounding.
Also address some review comments pointing out minor issues.
2020-12-02 13:44:51 -08:00
Johnnie Birch
a33e755cb2 Adds x86 SIMD support for Ceil, Floor, Trunc, and Nearest 2020-12-02 13:44:51 -08:00
Chris Fallin
dcbc4768c9 Merge pull request #2463 from cfallin/fix-heap-bounds-check-x64
x64 backend: fix condition-code used for part of explicit heap check.
2020-12-02 11:24:54 -08:00
Chris Fallin
60d7f7de0a Debug info: two fixes in x64 backend.
- Sort by generated-code offset to maintain invariant and avoid gimli
  panic.
- Fix srcloc interaction with branch peephole optimization in
  MachBuffer: if a srcloc range overlaps with a branch that is
  truncated, remove that srcloc range.

These issues were found while fuzzing the new backend (#2453); I suspect
that they arise with the new backend because we can sink instructions
(e.g. loads or extends) in more interesting ways than before, but I'm
not entirely sure.

Test coverage will be via the fuzz corpus once #2453 lands.
2020-12-02 10:41:14 -08:00
Chris Fallin
c9a81f008d x64 backend: fix condition-code used for part of explicit heap check.
A dynamic heap address computation may create up to two conditional
branches: the usual bounds-check, but also (in some cases) an
offset-addition overflow check.

The x64 backend had reversed the condition code for this check,
resulting in an always-trapping execution for a valid offset. I'm
somewhat surprised this has existed so long, but I suppose the
particular conditions (large offset, small offset guard, dynamic heap)
have been somewhat rare in our testing so far.
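
The two checks can be sketched as follows. This is an assumed, simplified model rather than the backend's actual sequence: `HeapCheck` and `check_dynamic_heap_access` are hypothetical, and 64-bit unsigned arithmetic is used throughout.

```rust
#[derive(Debug, PartialEq, Eq)]
enum HeapCheck {
    /// The access is allowed; carries the computed end offset.
    InBounds(u64),
    Trap,
}

/// Model of a dynamic heap access check with two possible trap points.
fn check_dynamic_heap_access(index: u64, offset: u64, bound: u64) -> HeapCheck {
    // Check 1: the offset addition itself must not overflow.
    // Trapping when it does *not* overflow (the reversed condition)
    // would make every valid access trap.
    let end = match index.checked_add(offset) {
        Some(end) => end,
        None => return HeapCheck::Trap,
    };
    // Check 2: the usual bounds check against the current heap bound.
    if end > bound {
        HeapCheck::Trap
    } else {
        HeapCheck::InBounds(end)
    }
}
```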

Found via fuzzing in #2453.
2020-12-02 10:40:53 -08:00
Chris Fallin
4bf2c15014 Merge pull request #2440 from jlb6740/remaining_simd_conversions
Adds support for i32x4.trunc_sat_f32x4_u
2020-11-30 22:53:59 -08:00
Johnnie Birch
09f3d4e331 Refactor convert from float to unsigned int and add comments 2020-11-29 00:04:24 -08:00
Chris Fallin
d413b907b4 Merge pull request #2414 from jgouly/extend-refactor
arm64: Refactor Inst::Extend handling
2020-11-25 17:22:07 -08:00
Nick Fitzgerald
93c199363f Merge pull request #2449 from bytecodealliance/cfallin/add-pre-host-hooks
Add FuncEnvironment hooks to generate prologue and epilogue code.
2020-11-24 17:48:29 -08:00
Chris Fallin
4300c2c075 Add FuncEnvironment hooks to generate prologue and epilogue code.
In some cases, it is useful to do some work at entry to or exit from a
Cranelift function translated from WebAssembly. This PR adds two
optional methods to the `FuncEnvironment` trait to do just this,
analogous to the pre/post-hooks on operators that already exist.

This PR also includes a drive-by compilation fix due to the latest
nightly wherein `.is_empty()` on a `Range` ambiguously refers to either
the `Range` impl or the `ExactSizeIterator` impl and can't resolve.
2020-11-24 16:36:15 -08:00
Johnnie Birch
ade9f12c72 Add support for X86_64 SIMD narrow instructions for vcode backend
Adds lowering support for:
i8x16.narrow_i16x8_s
i8x16.narrow_i16x8_u
i16x8.narrow_i32x4_s
i16x8.narrow_i32x4_u
2020-11-23 09:58:39 -08:00
Johnnie Birch
2cc501427e Add remaining X86_64 support for pack w/ signed/unsigned saturation
Adds lowering for packssdw, packusdw, packuswb
2020-11-22 23:14:29 -08:00
Johnnie Birch
258013cff1 Add support for SwidenHigh and UwidenHigh for the X86_64 vcode backend
Support is based on SSE4.1
2020-11-22 22:14:19 -08:00
Johnnie Birch
124096735b Add support for palignr for X86_64 vcode backend 2020-11-22 22:14:02 -08:00
Johnnie Birch
f9937575d6 Add support for SwidenLow and UwidenLow for the X86_64 vcode backend
Adds support using lowerings compatible with SSE4.1
2020-11-22 21:38:53 -08:00
Johnnie Birch
615a575da1 Add support for x86_64 packed move lowering for the vcode backend 2020-11-22 20:23:00 -08:00
Johnnie Birch
b6d783a120 Adds support for i32x4.trunc_sat_f32x4_u 2020-11-22 12:00:54 -08:00
Alex Crichton
4d64c68b05 Run rustfmt 1.48
Run rustfmt over wasmtime with the new stable release which looks like
it wants to reformat a few lines.
2020-11-19 11:12:30 -08:00
Chris Fallin
073c727a74 x64 and aarch64: carry MemFlags on loads/stores; don't emit trap info unless an op can trap.
This end result was previously enacted by carrying a `SourceLoc` on
every load/store, which was somewhat cumbersome, and only indirectly
encoded metadata about a memory reference (can it trap) by its presence
or absence. We have a type for this -- `MemFlags` -- that tells us
everything we might want to know about a load or store, and we should
plumb it through to code emission instead.

This PR attaches a `MemFlags` to an `Amode` on x64, and puts it on load
and store `Inst` variants on aarch64. These two choices seem to factor
things out in the nicest way: there are relatively few load/store insts
on aarch64 but many addressing modes, while the opposite is true on x64.
2020-11-17 11:43:06 -08:00
Chris Fallin
b97f07b405 x64 backend: merge loads into ALU ops when appropriate.
This PR makes use of the support in #2366 for sinking effectful
instructions and merging them with consumers. In particular, on x86, we
want to make use of the ability of many instructions to load one operand
directly from memory. That is, instead of this:

```
    movq 0(%rdi), %rax
    addq %rax, %rbx
```

we want to generate this:

```
    addq 0(%rdi), %rbx
```

As described in more detail in #2366, sinking and merging the load is
only possible under certain conditions. In particular, we need to ensure
that the use is the *only* use (otherwise the load happens more than
once), and we need to ensure that it does not move across other
effectful ops (see #2366 for how we ensure this).

This change is actually fairly simple, given that all the framework is
in place: we simply pattern-match a load on one operand of an ALU
instruction that takes an RMI (reg, mem, or immediate) operand, and
generate the mem form when we match.
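
The pattern match can be sketched with a toy operand model. The types and the `input_to_rmi` function below are hypothetical and are not the backend's real helpers; the two boolean fields stand in for the conditions established by the sinking framework.

```rust
/// A reg, mem, or immediate operand of an ALU instruction.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RegMemImm {
    Reg(u8),
    Mem { base: u8, offset: i32 },
    Imm(u32),
}

/// Hypothetical description of where an instruction input comes from.
#[derive(Debug, Clone, Copy)]
enum Input {
    Constant(u32),
    InReg(u8),
    Load {
        base: u8,
        offset: i32,
        /// Register that holds the result if the load is emitted separately.
        result_reg: u8,
        /// This ALU op is the only user of the loaded value.
        only_use: bool,
        /// Sinking would move the load across another effectful op.
        crosses_effectful_op: bool,
    },
}

/// Choose the operand form, using the mem form only when merging is safe.
fn input_to_rmi(input: Input) -> RegMemImm {
    match input {
        Input::Constant(c) => RegMemImm::Imm(c),
        Input::InReg(r) => RegMemImm::Reg(r),
        Input::Load { base, offset, result_reg, only_use, crosses_effectful_op } => {
            if only_use && !crosses_effectful_op {
                RegMemImm::Mem { base, offset }
            } else {
                RegMemImm::Reg(result_reg)
            }
        }
    }
}
```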

Also makes a drive-by improvement in the x64 backend to use
statically-monomorphized `LowerCtx` types rather than a `&mut dyn
LowerCtx`.

On `bz2.wasm`, this results in ~1% instruction-count reduction. More is
likely possible by following up with other instructions that can merge
memory loads as well.
2020-11-17 11:06:46 -08:00