Commit Graph

2116 Commits

Author SHA1 Message Date
whitequark
880e692fd4 x86: add encoding for bnot.b1.
Fixes #1743.

Co-authored-by: iximeow <git@iximeow.net>
2020-05-28 08:43:25 -07:00
Andrew Brown
b017844bef Fix interpreter semantics of 'irsub_imm'
Previously it used `arg - imm` but the functionality should be a wrapping `imm - arg` (see `cranelift/codegen/meta/src/shared/instructions.rs`).
2020-05-28 16:28:27 +02:00
teapotd
fbac2e53f9 Make vconst BxN match specification 2020-05-27 09:37:13 -07:00
Andrew Brown
628a9f0eaa Print more detailed test run failures (#1764) 2020-05-27 09:04:46 -05:00
Andrew Brown
4e016afca3 Add trace-level logging to interpreter 2020-05-26 18:45:25 +02:00
Andrew Brown
ca0c24e346 Avoid recursion in Interpreter::block 2020-05-26 18:45:25 +02:00
Chris Fallin
6ead7527af Merge pull request #1748 from akirilov-arm/simd_store
Enable the wast::Cranelift::spec::simd::simd_store test for AArch64
2020-05-26 09:21:36 -07:00
Ömer Sinan Ağacan
c619136752 Remove Eq bound of ReservedValue trait
A full Eq implementation is no needed for ReservedValue, as we only need
to check whether a value is the reserved one. For entities (defined with
`entity_impl!`) this doesn't make much difference, but for more
complicated types this avoids generating redundant `Eq`s.
2020-05-26 10:27:55 +02:00
bjorn3
eeb1e141ba Add some assertions to cranelift_frontend 2020-05-26 10:17:08 +02:00
Andrew Brown
6e7276e48d Replace single use of Frame::with_parameters with Frame::set_all 2020-05-26 09:56:58 +02:00
Andrew Brown
d73cb48c29 Add logging to frame operations 2020-05-26 09:56:58 +02:00
Andrew Brown
c92917de15 Fix typo in sadd_sat instruction definition 2020-05-26 09:55:26 +02:00
Anton Kirilov
8a928830ac Enable the wast::Cranelift::spec::simd::simd_store test for AArch64
Copyright (c) 2020, Arm Limited.
2020-05-24 22:53:07 +01:00
Chris Fallin
51f9ac2150 Merge pull request #1741 from cfallin/filetest-vcode-compile
Merge `vcode` filetest mode into `compile`.
2020-05-22 18:57:21 -07:00
Chris Fallin
48573b52b2 Merge vcode filetest mode into compile.
I hadn't realized before that the filetest backend for `test vcode` is
doing essentially what `compile` is doing, but for new (`MachInst`)
backends: it is just getting a disassembly and running it through
filecheck. There's no reason not to reuse `test compile` for the AArch64
tests as well.

This was motivated by the desire to have "this IR compiles successfully"
tests work on both x86 and AArch64. It seems this should work fine by
adding multiple `target` directives when a test case should be
compile-tested on multiple architectures.
2020-05-22 17:28:48 -07:00
Chris Fallin
73537e72c0 Merge pull request #1732 from jgouly/copysign-fpu
arm64: Use FPU instrctions for Fcopysign
2020-05-22 17:25:33 -07:00
Peter Huene
f36539130b Merge pull request #1734 from peterhuene/fix-saved-fprs
Cranelift: Fix FPR saving and shadow space allocation for Windows x64.
2020-05-22 12:06:37 -07:00
whitequark
b2e8ed4dc9 cranelift: add i64.[us]{div,rem} libcalls.
These libcalls are useful for 32-bit platforms.
2020-05-22 11:41:56 +00:00
Peter Huene
ce5f3e153b Only update XMM save unwind operation offsets when using a FP.
This commit prevents updating the XMM save unwind operation offsets when a
frame pointer is not used, even though currently Cranelift always uses a
frame pointer.

This will prevent incorrect unwind information in the future when we start
omitting frame pointers.
2020-05-21 16:46:30 -07:00
Peter Huene
2cd5ed1880 Address code review feedback. 2020-05-21 15:57:11 -07:00
Joey Gouly
02c3f238f8 arm64: Use FPU instrctions for Fcopysign
Copyright (c) 2020, Arm Limited.
2020-05-21 18:14:12 +01:00
Peter Huene
78c3091e84 Fix FPR saving and shadow space allocation for Windows x64.
This commit fixes both how FPR callee-saved registers are saved and how the
shadow space allocation occurs when laying out the stack for Windows x64
calling convention.

Importantly, this commit removes the compiler limitation of stack size for
Windows x64 that was imposed because FPR saves previously couldn't always be
represented in the unwind information.

The FPR saves are now performed without using stack slots, much like how the
callee-saved GPRs are saved. The total CSR space is given to `layout_stack` so
that it is included in the frame size and to offset the layout of spills and
explicit slots.

The FPR saves are now done via an RSP offset (post adjustment) and they always
follow the GPR saves on the stack. A simpler calculation can now be made to
determine the proper offsets of the FPR saves for representing the unwind
information.

Additionally, the shadow space is no longer treated as an incoming argument,
but an explicit stack slot that gets laid out at the lowest address possible in
the local frame. This prevents `layout_stack` from putting a spill or explicit
slot in this reserved space. In the future, `layout_stack` should take
advantage of the *caller-provided* shadow space for spills, but this commit does
not attempt to address that.

The shadow space is now omitted from the local frame for leaf functions.

Fixes #1728.
Fixes #1587.
Fixes #1475.
2020-05-20 15:37:30 -07:00
Chris Fallin
c9e3b71c39 Merge pull request #1729 from cfallin/machinst-branch-opt
Fix MachBuffer branch optimization.
2020-05-20 14:43:57 -07:00
Chris Fallin
13e12908a6 MachBuffer branch opts: comments approximating a semi-formal correctness proof. 2020-05-20 14:12:19 -07:00
Chris Fallin
80ab154d04 Update from review comments. 2020-05-20 12:35:36 -07:00
Benjamin Bouvier
1f620e1b46 cranelift: bump regalloc.rs to 0.0.24 and adapt to latest API changes; 2020-05-20 15:37:15 +02:00
Chris Fallin
e11094b28b Fix MachBuffer branch optimization.
This patch fixes a subtle bug that occurred in the MachBuffer branch
optimization: in tracking labels at the current buffer tail using a
sorted-by-offset array, the code did not update this array properly when
redirecting labels. As a result, the dead-branch removal was unsafe,
because not every label pointing to a branch is guaranteed to be
redirected properly first.

Discovered while doing performance testing: bz2 silently took a wrong
branch and exited compression early. (Eek!)

To address this problem, this patch adopts a slightly simpler data
structure: we only track the labels *at the current buffer tail*, and
*at the start of each branch*, and we're careful to update these
appropriately to maintain the invariants. I'm pretty confident that this
is correct now, but we should (still) fuzz it a bunch, because wrong
control flow scares me a nonzero amount. I should probably also actually
write out a formal proof that these data-structure updates are correct.
The optimizations are important for performance (removing useless empty
blocks, and taking advantage of any fallthrough opportunities at all),
so I don't think we would want to drop them entirely.
2020-05-19 18:09:18 -07:00
Nick Fitzgerald
9d2100e54a Limit the size of automaton keys in the peepmatic_simple_automata fuzz target
Fixes https://oss-fuzz.com/testcase-detail/5742905129172992
2020-05-19 09:12:50 -07:00
Chris Fallin
d8d6fbe58c Merge pull request #1718 from cfallin/machinst-codebuffer
Rework of MachInst isel, branch fixups and lowering, and block ordering.
2020-05-19 07:17:22 -07:00
Nick Fitzgerald
28d6df0db6 Limit the size of automaton keys in the peepmatic_fst_diff fuzz target (#1724)
This should avoid timeouts caused by large keys.

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=22251
2020-05-18 21:27:00 -05:00
Chris Fallin
bdd2873c8c Address review comments. 2020-05-18 16:25:26 -07:00
Chris Fallin
687aca00fe Update x64 backend to use new lowering APIs. 2020-05-18 16:25:15 -07:00
Chris Fallin
72e6be9342 Rework of MachInst isel, branch fixups and lowering, and block ordering.
This patch includes:

- A complete rework of the way that CLIF blocks and edge blocks are
  lowered into VCode blocks. The new mechanism in `BlockLoweringOrder`
  computes RPO over the CFG, but with a twist: it merges edge blocks intto
  heads or tails of original CLIF blocks wherever possible, and it does
  this without ever actually materializing the full nodes-plus-edges
  graph first. The backend driver lowers blocks in final order so
  there's no need to reshuffle later.

- A new `MachBuffer` that replaces the `MachSection`. This is a special
  version of a code-sink that is far more than a humble `Vec<u8>`. In
  particular, it keeps a record of label definitions and label uses,
  with a machine-pluggable `LabelUse` trait that defines various types
  of fixups (basically internal relocations).

  Importantly, it implements some simple peephole-style branch rewrites
  *inline in the emission pass*, without any separate traversals over
  the code to use fallthroughs, swap taken/not-taken arms, etc. It
  tracks branches at the tail of the buffer and can (i) remove blocks
  that are just unconditional branches (by redirecting the label), (ii)
  understand a conditional/unconditional pair and swap the conditional
  polarity when it's helpful; and (iii) remove branches that branch to
  the fallthrough PC.

  The `MachBuffer` also implements branch-island support. On
  architectures like AArch64, this is needed to allow conditional
  branches within plausibly-attainable ranges (+/- 1MB on AArch64
  specifically). It also does this inline while streaming through the
  emission, without any sort of fixpoint algorithm or later moving of
  code, by simply tracking outstanding references and "deadlines" and
  emitting an island just-in-time when we're in danger of going out of
  range.

- A rework of the instruction selector driver. This is largely following
  the same algorithm as before, but is cleaned up significantly, in
  particular in the API: the machine backend can ask for an input arg
  and get any of three forms (constant, register, producing
  instruction), indicating it needs the register or can merge the
  constant or producing instruction as appropriate. This new driver
  takes special care to emit constants right at use-sites (and at phi
  inputs), minimizing their live-ranges, and also special-cases the
  "pinned register" to avoid superfluous moves.

Overall, on `bz2.wasm`, the results are:

    wasmtime full run (compile + runtime) of bz2:

    baseline:   9774M insns, 9742M cycles, 3.918s
    w/ changes: 7012M insns, 6888M cycles, 2.958s  (24.5% faster, 28.3% fewer insns)

    clif-util wasm compile bz2:

    baseline:   2633M insns, 3278M cycles, 1.034s
    w/ changes: 2366M insns, 2920M cycles, 0.923s  (10.7% faster, 10.1% fewer insns)

    All numbers are averages of two runs on an Ampere eMAG.
2020-05-16 23:08:22 -07:00
Y-Nak
0393d101b1 Fix typo in peepmatic (#1712) 2020-05-15 09:47:16 -05:00
Nick Fitzgerald
01f46d0238 Merge pull request #1692 from fitzgen/update-to-wasmparser-0.55.0
Update to using `wasmparser` 0.55.0
2020-05-14 14:00:24 -07:00
Chris Fallin
df4028749e Merge pull request #1699 from jgouly/inst-size
Reduce arm64 Inst enum size
2020-05-14 13:44:46 -07:00
Nick Fitzgerald
1a4f3fb2df Update deps and tests for anyref --> externref
* Update to using `wasmparser` 0.55.0
* Update wasmprinter to 0.2.5
* Update `wat` to 1.0.18, and `wast` to 17.0.0
2020-05-14 12:47:37 -07:00
Nick Fitzgerald
3c0b64fef7 Merge pull request #1710 from fitzgen/remove-unused-lhs-member
peepmatic: remove unused member from `PeepholeOptimizer`
2020-05-14 12:03:31 -07:00
Nick Fitzgerald
e9ef8ea3d5 peepmatic: remove unused member from PeepholeOptimizer
This is dead code, left over from an earlier design.
2020-05-14 11:08:59 -07:00
Nick Fitzgerald
fb7a690efc Merge pull request #1687 from fitzgen/sign-extend-immediates
cranelift: Sign extend `Imm64` immediates
2020-05-14 10:09:53 -07:00
Nick Fitzgerald
c093dee79e cranelift: Let lifetime elision elide lifetimes 2020-05-14 07:52:23 -07:00
Nick Fitzgerald
923a73be7b deps: Bump z3 to 0.5.1
This fixes Windows builds.
2020-05-14 07:52:23 -07:00
Nick Fitzgerald
8d7ed0fd13 deps: Update wast to 15.0.0
This also updates `wat` in the lockfile so that the SIMD spec tests are passing
again.
2020-05-14 07:52:23 -07:00
Nick Fitzgerald
22a070ed4f peepmatic: Apply some review suggestions from @bjorn3 2020-05-14 07:52:23 -07:00
Nick Fitzgerald
fd4f08e75f peepmatic: rustfmt 2020-05-14 07:52:23 -07:00
Nick Fitzgerald
52c6ece5f3 peepmatic: Make peepmatic optional to enable
Rather than outright replacing parts of our existing peephole optimizations
passes, this makes peepmatic an optional cargo feature that can be enabled. This
allows us to take a conservative approach with enabling peepmatic everywhere,
while also allowing us to get it in-tree and make it easier to collaborate on
improving it quickly.
2020-05-14 07:52:23 -07:00
Nick Fitzgerald
6e135b3aea peepmatic: Fix a failed assertion due to extra iterations after fixed point
After replacing an instruction with an alias to an earlier value, trying to
further optimize that value is unnecessary, since we've already processed it,
and also was triggering an assertion.
2020-05-14 07:52:23 -07:00
Nick Fitzgerald
eb2dab0aa4 peepmatic: Save RHS actions as a boxed slice, not vec
A boxed slice is only two words, while a vec is three words. This should cut
down on the memory size of our automata and improve cache usage.
2020-05-14 07:52:23 -07:00
Nick Fitzgerald
210b036320 peepmatic: Represent various id types with u16
These ids end up in the automaton, so making them smaller should give us better
data cache locality and also smaller serialized sizes.
2020-05-14 07:52:23 -07:00
Nick Fitzgerald
469104c4d3 peepmatic: Make the results of match operations a smaller and more cache friendly 2020-05-14 07:52:23 -07:00