Commit Graph

1090 Commits

Author SHA1 Message Date
Chris Fallin
d413b907b4 Merge pull request #2414 from jgouly/extend-refactor
arm64: Refactor Inst::Extend handling
2020-11-25 17:22:07 -08:00
Nick Fitzgerald
93c199363f Merge pull request #2449 from bytecodealliance/cfallin/add-pre-host-hooks
Add FuncEnvironment hooks to generate prologue and epilogue code.
2020-11-24 17:48:29 -08:00
Chris Fallin
4300c2c075 Add FuncEnvironment hooks to generate prologue and epilogue code.
In some cases, it is useful to do some work at entry to or exit from a
Cranelift function translated from WebAssembly. This PR adds two
optional methods to the `FuncEnvironment` trait to do just this,
analogous to the pre/post-hooks on operators that already exist.

This PR also includes a drive-by compilation fix due to the latest
nightly wherein `.is_empty()` on a `Range` ambiguously refers to either
the `Range` impl or the `ExactSizeIterator` impl and can't resolve.
2020-11-24 16:36:15 -08:00
Alex Crichton
4d64c68b05 Run rustfmt 1.48
Run rustfmt over wasmtime with the new stable release which looks like
it wants to reformat a few lines.
2020-11-19 11:12:30 -08:00
Chris Fallin
073c727a74 x64 and aarch64: carry MemFlags on loads/stores; don't emit trap info unless an op can trap.
This end result was previously enacted by carrying a `SourceLoc` on
every load/store, which was somewhat cumbersome, and only indirectly
encoded metadata about a memory reference (can it trap) by its presence
or absence. We have a type for this -- `MemFlags` -- that tells us
everything we might want to know about a load or store, and we should
plumb it through to code emission instead.

This PR attaches a `MemFlags` to an `Amode` on x64, and puts it on load
and store `Inst` variants on aarch64. These two choices seem to factor
things out in the nicest way: there are relatively few load/store insts
on aarch64 but many addressing modes, while the opposite is true on x64.
2020-11-17 11:43:06 -08:00
Chris Fallin
b97f07b405 x64 backend: merge loads into ALU ops when appropriate.
This PR makes use of the support in #2366 for sinking effectful
instructions and merging them with consumers. In particular, on x86, we
want to make use of the ability of many instructions to load one operand
directly from memory. That is, instead of this:

```
    movq 0(%rdi), %rax
    addq %rax, %rbx
```

we want to generate this:

```
    addq 0(%rdi), %rax
```

As described in more detail in #2366, sinking and merging the load is
only possible under certain conditions. In particular, we need to ensure
that the use is the *only* use (otherwise the load happens more than
once), and we need to ensure that it does not move across other
effectful ops (see #2366 for how we ensure this).

This change is actually fairly simple, given that all the framework is
in place: we simply pattern-match a load on one operand of an ALU
instruction that takes an RMI (reg, mem, or immediate) operand, and
generate the mem form when we match.

Also makes a drive-by improvement in the x64 backend to use
statically-monomorphized `LowerCtx` types rather than a `&mut dyn
LowerCtx`.

On `bz2.wasm`, this results in ~1% instruction-count reduction. More is
likely possible by following up with other instructions that can merge
memory loads as well.
2020-11-17 11:06:46 -08:00
Chris Fallin
712ff22492 AArch64 SIMD: pattern-match load+splat into LD1R instruction. 2020-11-16 15:59:28 -08:00
Chris Fallin
39b5736727 Remove LoadSplat opcode, in preparation for pattern-matching Load+Splat.
This was added as an incremental step to improve AArch64 code quality in
PR #2278. At the time, we did not have a way to pattern-match the load +
splat opcode sequence that the relevant Wasm opcodes lowered to.
However, now with PR #2366, we can merge effectful instructions such as
loads into other ops, and so we can do this pattern matching directly.
The pattern-matching update will come in a subsequent commit.
2020-11-16 15:31:56 -08:00
Chris Fallin
3c8cb7b908 MachInst lowering logic: allow effectful instructions to merge.
This PR updates the "coloring" scheme that accounts for side-effects in
the MachInst lowering logic. As a result, the new backends will now be
able to merge effectful operations (such as memory loads) *into* other
operations; previously, only the other way (pure ops merged into
effectful ops) was possible. This will allow, for example, a load+ALU-op
combination, as is common on x86. It should even allow a load + ALU-op +
store sequence to merge into one lowered instruction.

The scheme arose from many fruitful discussions with @julian-seward1
(thanks!); significant credit is due to him for the insights here.

The first insight is that given the right basic conditions, i.e.  that
the root instruction is the only use of an effectful instruction's
result, all we need is that the "color" of the effectful instruction is
*one less* than the color of the current instruction. It's easier to
think about colors on the program points between instructions: if the
color coming *out* of the first (effectful def) instruction and *in* to
the second (effectful or effect-free use) instruction are the same, then
they can merge. Basically the color denotes a version of global state;
if the same, then no other effectful ops happened in the meantime.

The second insight is that we can keep state as we scan, tracking the
"current color", and *update* this when we sink (merge) an op. Hence
when we sink a load into another op, we effectively *re-color* every
instruction it moved over; this may allow further sinks.

Consider the example (and assume that we consider loads effectful in
order to conservatively ensure a strong memory model; otherwise, replace
with other effectful value-producing insts):

```
  v0 = load x
  v1 = load y
  v2 = add v0, 1
  v3 = add v1, 1
```

Scanning from bottom to top, we first see the add producing `v3` and we
can sink the load producing `v1` into it, producing a load + ALU-op
machine instruction. This is legal because `v1` moves over only `v2`,
which is a pure instruction. Consider, though, `v2`: under a simple
scheme that has no other context, `v0` could not sink to `v2` because it
would move over `v1`, another load. But because we already sunk `v1`
down to `v3`, we are free to sink `v0` to `v2`; the update of the
"current color" during the scan allows this.

This PR also cleans up the `LowerCtx` interface a bit at the same time:
whereas previously it always gave some subset of (constant, mergeable
inst, register) directly from `LowerCtx::get_input()`, it now returns
zero or more of (constant, mergable inst) from
`LowerCtx::maybe_get_input_as_source_or_const()`, and returns the
register only from `LowerCtx::put_input_in_reg()`. This removes the need
to explicitly denote uses of the register, so it's a little safer.

Note that this PR does not actually make use of the new ability to merge
loads into other ops; that will come in future PRs, especially to
optimize the `x64` backend by using direct-memory operands.
2020-11-16 14:53:45 -08:00
Chris Fallin
7b9d870030 Merge pull request #2410 from cfallin/x64-gc
Fix and enable GC on new x64 backend.
2020-11-13 09:40:05 -08:00
Chris Fallin
88fce766b0 Merge pull request #2411 from cfallin/x86-backend-cfg
Don't run old x86 backend-specific tests with new x64 backend.
2020-11-13 09:29:16 -08:00
Joey Gouly
70cbc4ca7c arm64: Refactor Inst::Extend handling
This refactors the handling of Inst::Extend and simplifies the lowering
of Bextend and Bmask, which allows the use of SBFX instructions for
extensions from 1-bit booleans. Other extensions use aliases of BFM,
and the code was changed to reflect that, rather than hard coding bit
patterns. Also ImmLogic is now implemented, so another hard coded
instruction can be removed.

As part of looking at boolean handling, `normalize_boolean_result` was
changed to `materialize_boolean_result`, such that it can use either
CSET or CSETM. Using CSETM saves an instruction (previously CSET + SUB)
for booleans bigger than 1-bit.

Copyright (c) 2020, Arm Limited.
2020-11-13 16:17:25 +00:00
Chris Fallin
0d703c12ed Don't run old x86 backend-specific tests with new x64 backend.
Some of the test failures tracked by #2079 are in unwind tests that are
specific to the old x86 backend: namely, these tests invoke the unwind
implementation that is paired with the old backend, rather than generic
over all backends. It thus doesn't make sense to try to run these tests
with the new backend. (The new backend's unwind code should have
analogous tests written/ported over eventually.)

It seems that we were actually building *both* x86 backends when the
`x64` feature was enabled, except that the old x86 backend would never
be instantiated by the usual ISA-lookup logic because a `x86-64` target
triple unconditionally resolves to the new one.

This PR resolves both of the issues by tweaking the feature-config
directives to exclude the `x86` backend when `x64` is enabled.
2020-11-12 20:44:53 -08:00
Chris Fallin
01b60e81b0 Fix and enable GC on new x64 backend.
One critical bit of plumbing was missing: the `StackMapSink` passed to
`compile_and_emit` was not actually receiving stackmaps. This seemingly
very basic issue was not caught because the other major user of reftype
support, SpiderMonkey, extracts stackmaps with a lower-level API. The
SM integration was built this way to avoid an awkward API quirk when
passing stackmaps through a `CodeSink` that proxies them to a
`StackMapSink`: the `CodeSink` wants `Value`s for each reference slot,
while the actual `StackMapSink` does not require these. This PR tweaks
the plumbing in a slightly different way to make `wasmtime` GC tests,
and presumably other consumers of stack-map info from the top-level
Cranelift interface, happy.
2020-11-12 16:55:18 -08:00
Chris Fallin
113d061129 Merge pull request #2369 from akirilov-arm/move_fix
Cranelift AArch64: Various small fixes
2020-11-12 14:59:10 -08:00
Andrew Brown
bd93e69eb4 [machinst x64]: implement packed shifts 2020-11-12 14:21:45 -08:00
Andrew Brown
8ba92853be [machinst x64]: add punpack[hl]bw instructions 2020-11-12 14:21:45 -08:00
Andrew Brown
8131b15921 [machinst x64]: allow addressing of constants 2020-11-12 14:21:45 -08:00
Chris Fallin
89dbc4590d Merge pull request #2363 from cfallin/extend-only-if-abi
Do value-extensions at ABI boundaries only when ABI requires it.
2020-11-12 12:26:20 -08:00
Chris Fallin
fd6433aaf5 Merge pull request #2395 from cfallin/lucet-x64-support
Add support for brff/brif and icmp_sp to new x64 backend to support Lucet.
2020-11-12 12:10:52 -08:00
Julian Seward
cbce34af0a aarch64/inst/unwind.rs: handle zero-length prologues correctly. 2020-11-12 17:41:21 +01:00
Anton Kirilov
edaada3f57 Cranelift AArch64: Various small fixes
* Use FMOV to move 64-bit FP registers and SIMD vectors.
* Add support for additional vector load types.
* Fix the printing of Inst::LoadAddr.

Copyright (c) 2020, Arm Limited.
2020-11-12 13:54:05 +00:00
Chris Fallin
19640367db Merge pull request #2394 from cfallin/no-size-asserts
Remove size-of-struct asserts that break with some Rust versions.
2020-11-11 18:04:34 -08:00
Chris Fallin
5e5e520654 Remove size-of-struct asserts that break with some Rust versions.
The asserts on the sizes of the VCode constant-table data structures
introduced in PR #2328 are dependent on the size of data structures such
as `HashMap` in the standard library, which can change. In particular,
on Rust 1.46 (which is not current, but could be e.g. pinned by a
project using Cranelift), it appears that these asserts fail. We
shouldn't depend on stdlib internals; IMHO the asserts on our own struct
sizes are enough to catch accidental size blowups.
2020-11-11 17:13:28 -08:00
Chris Fallin
5df8840483 Add support for brff/brif and icmp_sp to new x64 backend to support Lucet.
`lucetc` currently *almost*, but not quite, works with the new x64
backend; the only missing piece is support for the particular
instructions emitted as part of its prologue stack-check.

We do not normally see `brff`, `brif`, or `ifcmp_sp` in CLIF generated by
`cranelift-wasm` without the old-backend legalization rules, so these
were not supported in the new x64 backend as they were not necessary for
Wasm MVP support. Using them resulted in an `unimplemented!()` panic.

This PR adds support for `brff` and `brif` analogously to how AArch64
implements them, by pattern-matching the `ifcmp` / `ffcmp` directly.
Then `ifcmp_sp` is a straightforward variant of `ifcmp`.

Along the way, this also removes the notion of "fallthrough block" from
the branch-group lowering method; instead, `fallthrough` instructions
are handled as normal branches to their explicitly-provided targets,
which (in the original CLIF) match the fallthrough block. The reason for
this is that the block reordering done as part of lowering can change
the fallthrough block. We were not using `fallthrough` instructions in
the output produced by `cranelift-wasm`, so this, too, was not
previously caught.

With these changes, the `lucetc` crate in Lucet passes all tests with
the `x64` feature-flag added to its `cranelift-codegen` dependency.
2020-11-11 13:43:39 -08:00
Chris Fallin
997b654235 Merge pull request #2393 from jgouly/constant-addend
arm64: Fold some constants into load instructions
2020-11-11 11:23:21 -08:00
Pat Hickey
aa259ff92a Merge pull request #2390 from bjorn3/more_simplejit_refactors
More SimpleJIT refactorings
2020-11-11 11:16:04 -08:00
Joey Gouly
a5011e8212 arm64: Fold some constants into load instructions
This changes the following:
  mov x0, #4
  ldr x0, [x1, #4]

Into:
  ldr x0, [x1]

I noticed this pattern (but with #0), in a benchmark.

Copyright (c) 2020, Arm Limited.
2020-11-11 18:47:43 +00:00
Julian Seward
41e87a2f99 Support wasm select instruction with V128-typed operands on AArch64.
* this requires upgrading to wasmparser 0.67.0.

* There are no CLIF side changes because the CLIF `select` instruction is
  polymorphic enough.

* on aarch64, there is unfortunately no conditional-move (csel) instruction on
  vectors.  This patch adds a synthetic instruction `VecCSel` which *does*
  behave like that.  At emit time, this is emitted as an if-then-else diamond
  (4 insns).

* aarch64 implementation is otherwise straightforwards.
2020-11-11 18:45:24 +01:00
bjorn3
b7a93c2321 Remove reloc_block
It isn't called and all reloc sinks either ignore it or panic when it is
called.
2020-11-11 12:36:17 +01:00
Chris Fallin
4dce51096d MachInst backends: handle SourceLocs out-of-band, not in Insts.
In existing MachInst backends, many instructions -- any that can trap or
result in a relocation -- carry `SourceLoc` values in order to propagate
the location-in-original-source to use to describe resulting traps or
relocation errors.

This is quite tedious, and also error-prone: it is likely that the
necessary plumbing will be missed in some cases, and in any case, it's
unnecessarily verbose.

This PR factors out the `SourceLoc` handling so that it is tracked
during emission as part of the `EmitState`, and plumbed through
automatically by the machine-independent framework. Instruction emission
code that directly emits trap or relocation records can query the
current location as necessary. Then we only need to ensure that memory
references and trap instructions, at their (one) emission point rather
than their (many) lowering/generation points, are wired up correctly.

This does have the side-effect that some loads and stores that do not
correspond directly to user code's heap accesses will have unnecessary
but harmless trap metadata. For example, the load that fetches a code
offset from a jump table will have a 'heap out of bounds' trap record
attached to it; but because it is bounds-checked, and will never
actually trap if the lowering is correct, this should be harmless.  The
simplicity improvement here seemed more worthwhile to me than plumbing
through a "corresponds to user-level load/store" bit, because the latter
is a bit complex when we allow for op merging.

Closes #2290: though it does not implement a full "metadata" scheme as
described in that issue, this seems simpler overall.
2020-11-10 15:46:53 -08:00
Alex Crichton
8dd091219a Update wasm-tools dependencies
Brings in fixes for some assorted wast issues.
2020-11-09 08:50:03 -08:00
Andrew Brown
c9e8889d47 Update clippy annotation to use latest version (#2375) 2020-11-09 09:24:59 -06:00
Yury Delendik
b2b7bc10e2 machinst aarch64: New backend unwind (#2313)
* Unwind information for aarch64 backend.
2020-11-06 08:02:45 -06:00
Yury Delendik
f60c0f3ec3 cranelift: refactor unwind logic to accommodate multiple backends (#2357)
*    Make cranelift_codegen::isa::unwind::input public
*    Move UnwindCode's common offset field out of the structure
*    Make MachCompileResult::unwind_info more generic
*    Record initial stack pointer offset
2020-11-05 16:57:40 -06:00
Andrew Brown
df59ffb1b6 Align island's worst case size 2020-11-05 14:25:02 -08:00
Andrew Brown
83f182b390 Implement initial emission of constants
This approach suffers from memory-size bloat during compile time due to the desire to de-duplicate the constants emitted and reduce runtime memory-size. As a first step, though, this provides an end-to-end mechanism for constants to be emitted in the MachBuffer islands.
2020-11-05 14:25:02 -08:00
Chris Fallin
a2bbb198de Do value-extensions at ABI boundaries only when ABI requires it.
There has been some confusion over the meaning of the "sign-extend"
(`sext`) and "zero-extend" (`uext`) attributes on parameters and return
values in signatures. According to the three implemented backends, these
attributes indicate that a value narrower than a full register should
always be extended in the way specified. However, they are much more
useful if they mean "extend in this way if the ABI requires extending":
only the ABI backend knows whether or not a particular ABI (e.g., x64
SysV vs. x64 Baldrdash) requires extensions, while only the frontend
(CLIF generator) knows whether or not a value is signed, so the two have
to work in concert.

This is the result of some very helpful discussion in #2354 (thanks to
@uweigand for raising the issue and @bjorn3 for helping to reason about
it).

This change respects the extension attributes in the above way, rather
than unconditionally extending, to avoid potential performance
degradation as we introduce more extension attributes on signatures.
2020-11-05 11:54:35 -08:00
Alex Crichton
e4c3fc5cf2 Update immediate and transitive dependencies
I don't think this has happened in awhile but I've run a `cargo update`
as well as trimming some of the duplicate/older dependencies in
`Cargo.lock` by updating some of our immediate dependencies as well.
2020-11-05 08:34:09 -08:00
Alex Crichton
ab1958434a Bump to 0.21.0 (#2359) 2020-11-05 09:39:53 -06:00
Julian Seward
dd9bfcefaa CL/aarch64: implement the wasm SIMD v128.load{32,64}_zero instructions.
This patch implements, for aarch64, the following wasm SIMD extensions.

  v128.load32_zero and v128.load64_zero instructions
  https://github.com/WebAssembly/simd/pull/237

The changes are straightforward:

* no new CLIF instructions.  They are translated into an existing CLIF scalar
  load followed by a CLIF `scalar_to_vector`.

* the comment/specification for CLIF `scalar_to_vector` has been changed to
  match the actual intended semantics, per consulation with Andrew Brown.

* translation from `scalar_to_vector` to aarch64 `fmov` instruction.  This
  has been generalised slightly so as to allow both 32- and 64-bit transfers.

* special-case zero in `lower_constant_f128` in order to avoid a
  potentially slow call to `Inst::load_fp_constant128`.

* Once "Allow loads to merge into other operations during instruction
  selection in MachInst backends"
  (https://github.com/bytecodealliance/wasmtime/issues/2340) lands,
  we can use that functionality to pattern match the two-CLIF pair and
  emit a single AArch64 instruction.

* A simple filetest has been added.

There is no comprehensive testcase in this commit, because that is a separate
repo.  The implementation has been tested, nevertheless.
2020-11-04 20:00:04 +01:00
Chris Fallin
75e02276be Merge pull request #2360 from jgouly/reg_map
aarch64: Fix aarch64_map_regs for FpuRRI
2020-11-04 10:37:50 -08:00
Joey Gouly
0223cb2f8c aarch64: Fix aarch64_map_regs for FpuRRI
This was wrong since I added it in 02c3f238f.

Copyright (c) 2020, Arm Limited.
2020-11-04 16:53:25 +00:00
Ulrich Weigand
80c2d70d2d machinst ABI: Support for accumulating outgoing args
When performing a function call, the platform ABI may require space
on the stack to hold outgoing arguments and/or return values.

Currently, this is supported via decrementing the stack pointer
before the call and incrementing it afterwards, using the
emit_stack_pre_adjust and emit_stack_post_adjust methods of
ABICaller.  However, on some platforms it would be preferable
to just allocate enough space for any call done in the function
in the caller's prologue instead.

This patch adds support to allow back-ends to choose that method.
Instead of calling emit_stack_pre/post_adjust around a call, they
simply call a new accumulate_outgoing_args_size method of
ABICaller instead.  This will pass on the required size to the
ABICallee structure of the calling function, which will accumulate
the maximum size required for all function calls.

That accumulated size is then passed to the gen_clobber_save
and gen_clobber_restore functions so they can include the size
in the stack allocation / deallocation that already happens in
the prologue / epilogue code.
2020-11-03 18:49:34 +01:00
Chris Fallin
5ab7b4aa7f Merge pull request #2345 from uweigand/abi-stackalign
machinst ABI: Allow back-end to define stack alignment
2020-11-03 09:02:41 -08:00
Chris Fallin
0c240991ae Merge pull request #2346 from uweigand/abi-noframepointer
machinst ABI: Pass fixed frame size to gen_clobber_restore
2020-11-03 09:00:59 -08:00
Julian Seward
5a5fb11979 CL/aarch64: implement the wasm SIMD i32x4.dot_i16x8_s instruction
This patch implements, for aarch64, the following wasm SIMD extensions

  i32x4.dot_i16x8_s instruction
  https://github.com/WebAssembly/simd/pull/127

It also updates dependencies as follows, in order that the new instruction can
be parsed, decoded, etc:

  wat          to  1.0.27
  wast         to  26.0.1
  wasmparser   to  0.65.0
  wasmprinter  to  0.2.12

The changes are straightforward:

* new CLIF instruction `widening_pairwise_dot_product_s`

* translation from wasm into `widening_pairwise_dot_product_s`

* new AArch64 instructions `smull`, `smull2` (part of the `VecRRR` group)

* translation from `widening_pairwise_dot_product_s` to `smull ; smull2 ; addv`

There is no testcase in this commit, because that is a separate repo.  The
implementation has been tested, nevertheless.
2020-11-03 14:25:04 +01:00
Ulrich Weigand
c9bc4edd08 machinst ABI: Pass fixed frame size to gen_clobber_restore
The ABI common code currently passes the fixed frame size to
the gen_clobber_save back-end routine, which is required to
emit code to allocate the required stack space in the prologue.

Similarly, the back-end needs to emit code to de-allocate the
stack in the epilogue.  However, at this point the back-end
does not have access to that fixed frame size value any more.
With targets that use a frame pointer, this does not matter,
since de-allocation can be done simply by assigning the frame
pointer back to the stack pointer.  However, on targets that
do not use a frame pointer, the frame size is required.

To allow back-ends that option, this patch changes ABI common
code to pass the fixed frame size to get_clobber_restore as
well (the same value as is passed to get_clobber_save).
2020-11-03 11:15:03 +01:00
Ulrich Weigand
d02ae3940c machinst ABI: Allow back-end to define stack alignment
The common gen_prologue code currently assumes that the stack
pointer has to be aligned to twice the word size.  While this
is true for many ABIs, it does not hold universally.

This patch adds a new callback stack_align that back-ends can
provide to define the specific stack alignment required by the
ABI on that platform.
2020-11-03 09:43:55 +01:00
Andrew Brown
6d50099816 Rewrite interpreter generically (#2323)
* Rewrite interpreter generically

This change re-implements the Cranelift interpreter to use generic values; this makes it possible to do abstract interpretation of Cranelift instructions. In doing so, the interpretation state is extracted from the `Interpreter` structure and is accessed via a `State` trait; this makes it possible to not only more clearly observe the interpreter's state but also to interpret using a dummy state (e.g. `ImmutableRegisterState`). This addition made it possible to implement more of the Cranelift instructions (~70%, ignoring the x86-specific instructions).

* Replace macros with closures
2020-11-02 12:28:07 -08:00