Commit Graph

2089 Commits

Author SHA1 Message Date
Nick Fitzgerald
9d2100e54a Limit the size of automaton keys in the peepmatic_simple_automata fuzz target
Fixes https://oss-fuzz.com/testcase-detail/5742905129172992
2020-05-19 09:12:50 -07:00
Chris Fallin
d8d6fbe58c Merge pull request #1718 from cfallin/machinst-codebuffer
Rework of MachInst isel, branch fixups and lowering, and block ordering.
2020-05-19 07:17:22 -07:00
Nick Fitzgerald
28d6df0db6 Limit the size of automaton keys in the peepmatic_fst_diff fuzz target (#1724)
This should avoid timeouts caused by large keys.

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=22251
2020-05-18 21:27:00 -05:00
Chris Fallin
bdd2873c8c Address review comments. 2020-05-18 16:25:26 -07:00
Chris Fallin
687aca00fe Update x64 backend to use new lowering APIs. 2020-05-18 16:25:15 -07:00
Chris Fallin
72e6be9342 Rework of MachInst isel, branch fixups and lowering, and block ordering.
This patch includes:

- A complete rework of the way that CLIF blocks and edge blocks are
  lowered into VCode blocks. The new mechanism in `BlockLoweringOrder`
  computes RPO over the CFG, but with a twist: it merges edge blocks intto
  heads or tails of original CLIF blocks wherever possible, and it does
  this without ever actually materializing the full nodes-plus-edges
  graph first. The backend driver lowers blocks in final order so
  there's no need to reshuffle later.

- A new `MachBuffer` that replaces the `MachSection`. This is a special
  version of a code-sink that is far more than a humble `Vec<u8>`. In
  particular, it keeps a record of label definitions and label uses,
  with a machine-pluggable `LabelUse` trait that defines various types
  of fixups (basically internal relocations).

  Importantly, it implements some simple peephole-style branch rewrites
  *inline in the emission pass*, without any separate traversals over
  the code to use fallthroughs, swap taken/not-taken arms, etc. It
  tracks branches at the tail of the buffer and can (i) remove blocks
  that are just unconditional branches (by redirecting the label), (ii)
  understand a conditional/unconditional pair and swap the conditional
  polarity when it's helpful; and (iii) remove branches that branch to
  the fallthrough PC.

  The `MachBuffer` also implements branch-island support. On
  architectures like AArch64, this is needed to allow conditional
  branches within plausibly-attainable ranges (+/- 1MB on AArch64
  specifically). It also does this inline while streaming through the
  emission, without any sort of fixpoint algorithm or later moving of
  code, by simply tracking outstanding references and "deadlines" and
  emitting an island just-in-time when we're in danger of going out of
  range.

- A rework of the instruction selector driver. This is largely following
  the same algorithm as before, but is cleaned up significantly, in
  particular in the API: the machine backend can ask for an input arg
  and get any of three forms (constant, register, producing
  instruction), indicating it needs the register or can merge the
  constant or producing instruction as appropriate. This new driver
  takes special care to emit constants right at use-sites (and at phi
  inputs), minimizing their live-ranges, and also special-cases the
  "pinned register" to avoid superfluous moves.

Overall, on `bz2.wasm`, the results are:

    wasmtime full run (compile + runtime) of bz2:

    baseline:   9774M insns, 9742M cycles, 3.918s
    w/ changes: 7012M insns, 6888M cycles, 2.958s  (24.5% faster, 28.3% fewer insns)

    clif-util wasm compile bz2:

    baseline:   2633M insns, 3278M cycles, 1.034s
    w/ changes: 2366M insns, 2920M cycles, 0.923s  (10.7% faster, 10.1% fewer insns)

    All numbers are averages of two runs on an Ampere eMAG.
2020-05-16 23:08:22 -07:00
Y-Nak
0393d101b1 Fix typo in peepmatic (#1712) 2020-05-15 09:47:16 -05:00
Nick Fitzgerald
01f46d0238 Merge pull request #1692 from fitzgen/update-to-wasmparser-0.55.0
Update to using `wasmparser` 0.55.0
2020-05-14 14:00:24 -07:00
Chris Fallin
df4028749e Merge pull request #1699 from jgouly/inst-size
Reduce arm64 Inst enum size
2020-05-14 13:44:46 -07:00
Nick Fitzgerald
1a4f3fb2df Update deps and tests for anyref --> externref
* Update to using `wasmparser` 0.55.0
* Update wasmprinter to 0.2.5
* Update `wat` to 1.0.18, and `wast` to 17.0.0
2020-05-14 12:47:37 -07:00
Nick Fitzgerald
3c0b64fef7 Merge pull request #1710 from fitzgen/remove-unused-lhs-member
peepmatic: remove unused member from `PeepholeOptimizer`
2020-05-14 12:03:31 -07:00
Nick Fitzgerald
e9ef8ea3d5 peepmatic: remove unused member from PeepholeOptimizer
This is dead code, left over from an earlier design.
2020-05-14 11:08:59 -07:00
Nick Fitzgerald
fb7a690efc Merge pull request #1687 from fitzgen/sign-extend-immediates
cranelift: Sign extend `Imm64` immediates
2020-05-14 10:09:53 -07:00
Nick Fitzgerald
c093dee79e cranelift: Let lifetime elision elide lifetimes 2020-05-14 07:52:23 -07:00
Nick Fitzgerald
923a73be7b deps: Bump z3 to 0.5.1
This fixes Windows builds.
2020-05-14 07:52:23 -07:00
Nick Fitzgerald
8d7ed0fd13 deps: Update wast to 15.0.0
This also updates `wat` in the lockfile so that the SIMD spec tests are passing
again.
2020-05-14 07:52:23 -07:00
Nick Fitzgerald
22a070ed4f peepmatic: Apply some review suggestions from @bjorn3 2020-05-14 07:52:23 -07:00
Nick Fitzgerald
fd4f08e75f peepmatic: rustfmt 2020-05-14 07:52:23 -07:00
Nick Fitzgerald
52c6ece5f3 peepmatic: Make peepmatic optional to enable
Rather than outright replacing parts of our existing peephole optimizations
passes, this makes peepmatic an optional cargo feature that can be enabled. This
allows us to take a conservative approach with enabling peepmatic everywhere,
while also allowing us to get it in-tree and make it easier to collaborate on
improving it quickly.
2020-05-14 07:52:23 -07:00
Nick Fitzgerald
6e135b3aea peepmatic: Fix a failed assertion due to extra iterations after fixed point
After replacing an instruction with an alias to an earlier value, trying to
further optimize that value is unnecessary, since we've already processed it,
and also was triggering an assertion.
2020-05-14 07:52:23 -07:00
Nick Fitzgerald
eb2dab0aa4 peepmatic: Save RHS actions as a boxed slice, not vec
A boxed slice is only two words, while a vec is three words. This should cut
down on the memory size of our automata and improve cache usage.
2020-05-14 07:52:23 -07:00
Nick Fitzgerald
210b036320 peepmatic: Represent various id types with u16
These ids end up in the automaton, so making them smaller should give us better
data cache locality and also smaller serialized sizes.
2020-05-14 07:52:23 -07:00
Nick Fitzgerald
469104c4d3 peepmatic: Make the results of match operations a smaller and more cache friendly 2020-05-14 07:52:23 -07:00
Nick Fitzgerald
9a1f8038b7 peepmatic: Do not transplant instructions whose results are potentially used elsewhere 2020-05-14 07:52:23 -07:00
Nick Fitzgerald
090d1c2d32 cranelift: Port most of simple_preopt.rs over to the peepmatic DSL
This ports all of the identity, no-op, simplification, and canonicalization
related optimizations over from being hand-coded to the `peepmatic` DSL. This
does not handle the branch-to-branch optimizations or most of the
divide-by-constant optimizations.
2020-05-14 07:52:23 -07:00
Nick Fitzgerald
1a7670f964 peepmatic: Introduce the peepmatic-fuzzing crate
This crate contains oracles, generators, and fuzz targets for use with fuzzing
engines (e.g. libFuzzer). This doesn't contain the actual
`libfuzzer_sys::fuzz_target!` definitions (those are in the `peepmatic-fuzz`
crate) but does those definitions are one liners calling out to functions
defined in this crate.
2020-05-14 07:50:58 -07:00
Nick Fitzgerald
2828da1f56 peepmatic: Introduce the peepmatic-test crate
This crate provides testing utilities for `peepmatic`, and a test-only
instruction set we can use to check that various optimizations do or don't
apply.
2020-05-14 07:50:58 -07:00
Nick Fitzgerald
de9fc63009 peepmatic: Introduce the main peepmatic crate
Peepmatic is a DSL for peephole optimizations and compiler for generating
peephole optimizers from them. The user writes a set of optimizations in the
DSL, and then `peepmatic` compiles the set of optimizations into an efficient
peephole optimizer:

```
DSL ----peepmatic----> Peephole Optimizer
```

The generated peephole optimizer has all of its optimizations' left-hand sides
collapsed into a compact automata that makes matching candidate instruction
sequences fast.

The DSL's optimizations may be written by hand or discovered mechanically with a
superoptimizer like [Souper][]. Eventually, `peepmatic` should have a verifier
that ensures that the DSL's optimizations are sound, similar to what [Alive][]
does for LLVM optimizations.

[Souper]: https://github.com/google/souper
[Alive]: https://github.com/AliveToolkit/alive2
2020-05-14 07:50:58 -07:00
Nick Fitzgerald
197a9e88cb peepmatic: Introduce the peepmatic-runtime crate
The `peepmatic-runtime` crate contains everything required to use a
`peepmatic`-generated peephole optimizer.

In short: build times and code size.

If you are just using a peephole optimizer, you shouldn't need the functions
to construct it from scratch from the DSL (and the implied code size and
compilation time), let alone even build it at all. You should just
deserialize an already-built peephole optimizer, and then use it.

That's all that is contained here in this crate.
2020-05-14 07:50:58 -07:00
Nick Fitzgerald
0f03a97475 peepmatic: Introduce the peepmatic-macro crate
This crate provides the derive macros used by `peepmatic`, notable AST-related
derives that enumerate child AST nodes, and operator-related derives that
provide helpers for type checking.
2020-05-14 07:50:58 -07:00
Nick Fitzgerald
c82326a1ae peepmatic: Introduce the peepmatic-automata crate
The `peepmatic-automata` crate builds and queries finite-state transducer
automata.

A transducer is a type of automata that has not only an input that it
accepts or rejects, but also an output. While regular automata check whether
an input string is in the set that the automata accepts, a transducer maps
the input strings to values. A regular automata is sort of a compressed,
immutable set, and a transducer is sort of a compressed, immutable key-value
dictionary. A [trie] compresses a set of strings or map from a string to a
value by sharing prefixes of the input string. Automata and transducers can
compress even better: they can share both prefixes and suffixes. [*Index
1,600,000,000 Keys with Automata and Rust* by Andrew Gallant (aka
burntsushi)][burntsushi-blog-post] is a top-notch introduction.

If you're looking for a general-purpose transducers crate in Rust you're
probably looking for [the `fst` crate][fst-crate]. While this implementation
is fully generic and has no dependencies, its feature set is specific to
`peepmatic`'s needs:

* We need to associate extra data with each state: the match operation to
  evaluate next.

* We can't provide the full input string up front, so this crate must
  support incremental lookups. This is because the peephole optimizer is
  computing the input string incrementally and dynamically: it looks at the
  current state's match operation, evaluates it, and then uses the result as
  the next character of the input string.

* We also support incremental insertion and output when building the
  transducer. This is necessary because we don't want to emit output values
  that bind a match on an optimization's left-hand side's pattern (for
  example) until after we've succeeded in matching it, which might not
  happen until we've reached the n^th state.

* We need to support generic output values. The `fst` crate only supports
  `u64` outputs, while we need to build up an optimization's right-hand side
  instructions.

This implementation is based on [*Direct Construction of Minimal Acyclic
Subsequential Transducers* by Mihov and Maurel][paper]. That means that keys
must be inserted in lexicographic order during construction.

[trie]: https://en.wikipedia.org/wiki/Trie
[burntsushi-blog-post]: https://blog.burntsushi.net/transducers/#ordered-maps
[fst-crate]: https://crates.io/crates/fst
[paper]: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.24.3698&rep=rep1&type=pdf
2020-05-14 07:50:58 -07:00
Joey Gouly
f418b7a700 Reduce arm64 Inst enum size
This reduces the size of the Inst enum from 112 bytes to 48 bytes.

Using DHAT on a regex-rs.wasm benchmark, `valgrind --tool=dhat clif-util compile --target aarch64`

The total number of allocated bytes, drops by around 170 MB.
At t-gmax drops by 3 MB.

Using `perf stat clif-util compile --target aarch64`, the instructions count dropped by 0.6%. Cache misses dropped by 6%. Cycles dropped by 2.3%.
2020-05-14 15:45:55 +01:00
Ömer Sinan Ağacan
0592b5a995 Fix umbrella crate URL in docs/index.md (#1694) 2020-05-13 17:05:55 -07:00
Dan Gohman
fb0b9e3ae6 Change proc_exit to unwind the stack rather than exiting the host process. (#1646)
* Remove Cranelift's OutOfBounds trap, which is no longer used.

* Change proc_exit to unwind instead of exit the host process.

This implements the semantics in https://github.com/WebAssembly/WASI/pull/235.

Fixes #783.
Fixes #993.

* Fix exit-status tests on Windows.

* Revert the wiggle changes and re-introduce the wasi-common implementations.

* Move `wasi_proc_exit` into the wasmtime-wasi crate.

* Revert the spec_testsuite change.

* Remove the old proc_exit implementations.

* Make `TrapReason` an implementation detail.

* Allow exit status 2 on Windows too.

* Fix a documentation link.

* Really fix a documentation link.
2020-05-13 15:59:43 -07:00
Cerberuser
f5eab5225f Fixed links in compare-llvm.md (#1690)
Several links were broken by line-breaks between the link caption and
the link itself. This commit fixes them by moving each on its own line.

Co-authored-by: k.bagrov <k.bagrov@g.nsu.ru>
2020-05-13 11:52:36 +02:00
Benjamin Bouvier
5987cf5cda machinst: add a linear-scan checked variant too; 2020-05-13 10:56:32 +02:00
Benjamin Bouvier
07c55fa50f aarch64: suggest a scratch register that's not caller-saved;
If the scratch register is caller-saved, then it might appear in fixed
ranges because of call clobbers. Instead, use a register that's not
caller-saved and has no predefined use in the ABI.
2020-05-13 10:56:32 +02:00
Nick Fitzgerald
9b867b09c7 cranelift: Sign extend Imm64 immediates
When an instruction has an `Imm64` immediate, but operates on values of a
narrower width, we need to sign extend the value.

Fixes #1095
2020-05-12 15:44:48 -07:00
Julian Seward
94190d5724 cranelift/reader/src/parser.rs: fn parse_inst_resuts: produce the results as a
SmallVec<[Value; 1]>, not as a Vec<Value>.  This isn't a useful change for any
non-developer use of Cranelift, but it does significantly reduce the amount of
allocation "noise" seen when tuning the new backend pipeline as driven by
clif-util reading .clif files.  In one case the number of malloc calls
declined by about 20% with this change.
2020-05-11 12:27:15 +02:00
Chris Fallin
ee2f861fdd Merge pull request #1674 from cfallin/machinst-reg-universe-opt
MachInst backend: don't reallocate RealRegUniverses for each function compilation.
2020-05-09 14:10:26 -07:00
whitequark
4ec16fa057 Legalize 64 bit shifts on x86_32 using PSLLQ/PSRLQ.
Co-authored-by: iximeow <git@iximeow.net>
2020-05-09 03:28:19 -07:00
whitequark
2331403741 Extend X86 ABI to cover stack overflow checking on X86-32.
In stark contrast with every reasonable architecture, X86-32 does not
pass any parameters in registers. Because of that we have to resort
to reading arguments from stack without being able to use the stack
slot machinery.

(This wouldn't have been avoidable even by pinning a register because
there is a trampoline in wasmtime with the C ABI that Cranelift needs
to be able to call.)
2020-05-09 03:27:06 -07:00
Chris Fallin
17cef9140c MachInst backend: don't reallocate RealRegUniverses for each function
compilation.

This saves ~0.14% instruction count, ~0.18% allocated bytes, and ~1.5%
allocated blocks on a `clif-util wasm` compilation of `bz2.wasm` for
aarch64.
2020-05-08 15:35:16 -07:00
Julian Seward
0bc0503f3f Add a transformation pass which removes phi nodes to which it can demonstrate
that only one value ever flows.  Has been observed to improve generated code
run times by up to 8%.  Compilation cost increases by about 0.6%, but up to 7%
total cost has been observed to be saved; iow it can be a significant win in
terms of compilation time, overall.
2020-05-08 09:41:16 +02:00
Andrew Brown
b65bd1c8a2 Add an interpret command to clif-util 2020-05-07 16:51:09 -07:00
Andrew Brown
9cf90b836b Move iterate_files to the utils module 2020-05-07 16:51:09 -07:00
Andrew Brown
b26ca3cbdd Add test interpret support to filetests 2020-05-07 16:51:09 -07:00
Andrew Brown
8b18fc5937 Add a CLIF interpreter
This is an incomplete version of a Cranelift IR interpreter: only a small subset of instructions are implemented and (known) missing parts are marked with TODO or FIXME.
2020-05-07 16:51:09 -07:00
Andrew Brown
b4238229c2 Cast DataValues to and from native types
Also, returns a `Result` in the `RunCommand::run` helper.
2020-05-07 16:51:09 -07:00
Benjamin Bouvier
528d3c1355 machinst: Steal the used/defs Sets when emitting a call in ABICall; 2020-05-07 12:24:02 +02:00