Commit Graph

215 Commits

Author SHA1 Message Date
Dan Gohman
ec6755512f Remove maintenance badges from the Cargo.toml files. (#6286)
Several of these badges were out of date, with some crates in wide production
use marked as "experimental". Insted of trying to keep them up to date, just
remove them, since they are [no longer displayed on crates.io].

[no longer displayed on crates.io]: https://doc.rust-lang.org/cargo/reference/manifest.html#the-badges-section
2023-04-26 01:33:09 +00:00
Trevor Elliott
d9e27c5441 Bump regalloc2 to 0.7.0 (#6237)
* Bump RA2 to 0.7.0

* Certify the RA2 update

* Import the rustc-hash audit

* Updates for regalloc2

prtest:full

* Update tests
2023-04-21 00:47:58 +00:00
Remo Senekowitsch
7eb8914090 Chaos mode MVP: Skip branch optimization in MachBuffer (#6039)
* fuzz: Add chaos mode control plane

Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>

* fuzz: Skip branch optimization with chaos mode

Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>

* fuzz: Rename chaos engine -> control plane

Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>

* chaos mode: refactoring ControlPlane to be passed through the call stack by reference

Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Remo Senekowitsch <contact@remsle.dev>

* fuzz: annotate chaos todos

Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>

* fuzz: cleanup control plane

Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>

* fuzz: remove control plane from compiler context

Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>

* fuzz: move control plane into emit state

Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>

* fuzz: fix remaining compiler errors

Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>

* fix tests

* refactor emission state ctrl plane accessors

Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>

* centralize conditional compilation of chaos mode

Also cleanup a few straggling dependencies on cranelift-control
that aren't needed anymore.

Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>

* add cranelift-control to published crates

prtest:full

Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>

* add cranelift-control to public crates

Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>

---------

Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>
Co-authored-by: Remo Senekowitsch <contact@remsle.dev>
2023-04-05 19:28:46 +00:00
wasmtime-publish
bf741955f0 Bump Wasmtime to 9.0.0 (#6143)
Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>
2023-04-05 17:06:36 +00:00
Qinxuan Chen
e2dcb19099 deps: update criterion dependency to v0.4.0 (#5935) 2023-03-07 15:21:48 +00:00
wasmtime-publish
354bc48015 Bump Wasmtime to 8.0.0 (#5932)
Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>
2023-03-06 15:08:16 +00:00
Trevor Elliott
80c147d9c0 Rework br_table to use BlockCall (#5731)
Rework br_table to use BlockCall, allowing us to avoid adding new nodes during ssa construction to hold block arguments. Additionally, many places where we previously matched on InstructionData to extract branch destinations can be replaced with a use of branch_destination or branch_destination_mut.
2023-02-16 09:23:27 -08:00
Chris Fallin
c15c4ed23d Cranelift: upgrade to regalloc2 0.6.1. (#5799)
* Cranelift: upgrade to regalloc2 0.6.1.

Fixes #5791 by pulling in bytecodealliance/regalloc2#113.

* Add cargo-vet entry for regalloc2 0.6.1.
2023-02-16 03:22:58 +00:00
Trevor Elliott
f04decc4a1 Use capstone to validate precise-output tests (#5780)
Use the capstone library to disassemble precise-output tests, in addition to pretty-printing their vcode.
2023-02-15 16:35:10 -08:00
Alphyr
cb150d37ce Update dependencies (#5513) 2023-02-14 19:45:15 +00:00
Trevor Elliott
116e5a665f Bump regalloc2 to 0.6.0 (#5742)
* Bump regalloc2
* Certify regalloc2 0.6.0
2023-02-07 15:57:49 -08:00
wasmtime-publish
482f541101 Bump Wasmtime to 7.0.0 (#5712)
Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>
2023-02-06 09:10:19 -06:00
wasmtime-publish
7bfbec1b57 Bump Wasmtime to 6.0.0 (#5521)
Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>
2023-01-05 09:46:01 -06:00
Nick Fitzgerald
c0b587ac5f Remove heaps from core Cranelift, push them into cranelift-wasm (#5386)
* cranelift-wasm: translate Wasm loads into lower-level CLIF operations

Rather than using `heap_{load,store,addr}`.

* cranelift: Remove the `heap_{addr,load,store}` instructions

These are now legalized in the `cranelift-wasm` frontend.

* cranelift: Remove the `ir::Heap` entity from CLIF

* Port basic memory operation tests to .wat filetests

* Remove test for verifying CLIF heaps

* Remove `heap_addr` from replace_branching_instructions_and_cfg_predecessors.clif test

* Remove `heap_addr` from readonly.clif test

* Remove `heap_addr` from `table_addr.clif` test

* Remove `heap_addr` from the simd-fvpromote_low.clif test

* Remove `heap_addr` from simd-fvdemote.clif test

* Remove `heap_addr` from the load-op-store.clif test

* Remove the CLIF heap runtest

* Remove `heap_addr` from the global_value.clif test

* Remove `heap_addr` from fpromote.clif runtests

* Remove `heap_addr` from fdemote.clif runtests

* Remove `heap_addr` from memory.clif parser test

* Remove `heap_addr` from reject_load_readonly.clif test

* Remove `heap_addr` from reject_load_notrap.clif test

* Remove `heap_addr` from load_readonly_notrap.clif test

* Remove `static-heap-without-guard-pages.clif` test

Will be subsumed when we port `make-heap-load-store-tests.sh` to generating
`.wat` tests.

* Remove `static-heap-with-guard-pages.clif` test

Will be subsumed when we port `make-heap-load-store-tests.sh` over to `.wat`
tests.

* Remove more heap tests

These will be subsumed by porting `make-heap-load-store-tests.sh` over to `.wat`
tests.

* Remove `heap_addr` from `simple-alias.clif` test

* Remove `heap_addr` from partial-redundancy.clif test

* Remove `heap_addr` from multiple-blocks.clif test

* Remove `heap_addr` from fence.clif test

* Remove `heap_addr` from extends.clif test

* Remove runtests that rely on heaps

Heaps are not a thing in CLIF or the interpreter anymore

* Add generated load/store `.wat` tests

* Enable memory-related wasm features in `.wat` tests

* Remove CLIF heap from fcmp-mem-bug.clif test

* Add a mode for compiling `.wat` all the way to assembly in filetests

* Also generate WAT to assembly tests in `make-load-store-tests.sh`

* cargo fmt

* Reinstate `f{de,pro}mote.clif` tests without the heap bits

* Remove undefined doc link

* Remove outdated SVG and dot file from docs

* Add docs about `None` returns for base address computation helpers

* Factor out `env.heap_access_spectre_mitigation()` to a local

* Expand docs for `FuncEnvironment::heaps` trait method

* Restore f{de,pro}mote+load clif runtests with stack memory
2022-12-15 00:26:45 +00:00
Trevor Elliott
ab6c8e1a1a Bump regalloc2 to version 0.5.1 (#5387)
Bump regalloc2 to version 0.5.1.
2022-12-06 15:38:03 -08:00
Chris Fallin
f980defe17 egraph support: rewrite to work in terms of CLIF data structures. (#5382)
* egraph support: rewrite to work in terms of CLIF data structures.

This work rewrites the "egraph"-based optimization framework in
Cranelift to operate on aegraphs (acyclic egraphs) represented in the
CLIF itself rather than as a separate data structure to which and from
which we translate the CLIF.

The basic idea is to add a new kind of value, a "union", that is like an
alias but refers to two other values rather than one.  This allows us to
represent an eclass of enodes (values) as a tree. The union node allows
for a value to have *multiple representations*: either constituent value
could be used, and (in well-formed CLIF produced by correct
optimization rules) they must be equivalent.

Like the old egraph infrastructure, we take advantage of acyclicity and
eager rule application to do optimization in a single pass. Like before,
we integrate GVN (during the optimization pass) and LICM (during
elaboration).

Unlike the old egraph infrastructure, everything stays in the
DataFlowGraph. "Pure" enodes are represented as instructions that have
values attached, but that are not placed into the function layout. When
entering "egraph" form, we remove them from the layout while optimizing.
When leaving "egraph" form, during elaboration, we can place an
instruction back into the layout the first time we elaborate the enode;
if we elaborate it more than once, we clone the instruction.

The implementation performs two passes overall:

- One, a forward pass in RPO (to see defs before uses), that (i) removes
  "pure" instructions from the layout and (ii) optimizes as it goes. As
  before, we eagerly optimize, so we form the entire union of optimized
  forms of a value before we see any uses of that value. This lets us
  rewrite uses to use the most "up-to-date" form of the value and
  canonicalize and optimize that form.

  The eager rewriting and acyclic representation make each other work
  (we could not eagerly rewrite if there were cycles; and acyclicity
  does not miss optimization opportunities only because the first time
  we introduce a value, we immediately produce its "best" form). This
  design choice is also what allows us to avoid the "parent pointers"
  and fixpoint loop of traditional egraphs.

  This forward optimization pass keeps a scoped hashmap to "intern"
  nodes (thus performing GVN), and also interleaves on a per-instruction
  level with alias analysis. The interleaving with alias analysis allows
  alias analysis to see the most optimized form of each address (so it
  can see equivalences), and allows the next value to see any
  equivalences (reuses of loads or stored values) that alias analysis
  uncovers.

- Two, a forward pass in domtree preorder, that "elaborates" pure enodes
  back into the layout, possibly in multiple places if needed. This
  tracks the loop nest and hoists nodes as needed, performing LICM as it
  goes. Note that by doing this in forward order, we avoid the
  "fixpoint" that traditional LICM needs: we hoist a def before its
  uses, so when we place a node, we place it in the right place the
  first time rather than moving later.

This PR replaces the old (a)egraph implementation. It removes both the
cranelift-egraph crate and the logic in cranelift-codegen that uses it.

On `spidermonkey.wasm` running a simple recursive Fibonacci
microbenchmark, this work shows 5.5% compile-time reduction and 7.7%
runtime improvement (speedup).

Most of this implementation was done in (very productive) pair
programming sessions with Jamey Sharp, thus:

Co-authored-by: Jamey Sharp <jsharp@fastly.com>

* Review feedback.

* Review feedback.

* Review feedback.

* Bugfix: cprop rule: `(x + k1) - k2` becomes `x - (k2 - k1)`, not `x - (k1 - k2)`.

Co-authored-by: Jamey Sharp <jsharp@fastly.com>
2022-12-06 14:58:57 -08:00
wasmtime-publish
a28d4d3c89 Bump Wasmtime to 5.0.0 (#5372)
Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>
2022-12-05 08:38:57 -06:00
Trevor Elliott
f138fc0ed3 Bump regalloc2 to 0.5.0 (#5345)
* Bump the regalloc2 dependency to 0.5.0
* Replace preg_set_from_machine_env with PRegSet::from
* Vet the regalloc2 update
2022-11-29 11:25:35 -08:00
Jamey Sharp
ff5abfd993 cranelift-isle: Minor error-handling cleanups (#5338)
- Remove remaining references to Miette
- Borrow implementation of `line_starts` from codespan-reporting
- Clean up a use of `Result` that no longer conflicts with a local
  definition
- When printing plain errors, add a blank line between errors for
  readability
2022-11-29 03:07:05 +00:00
Jamey Sharp
044b57f334 cranelift-isle: Rewrite error reporting (#5318)
There were several issues with ISLE's existing error reporting
implementation.

- When using Miette for more readable error reports, it would panic if
  errors were reported from multiple files in the same run.
- Miette is pretty heavy-weight for what we're doing, with a lot of
  dependencies.
- The `Error::Errors` enum variant led to normalization steps in many
  places, to avoid using that variant to represent a single error.

This commit:
- replaces Miette with codespan-reporting
- gets rid of a bunch of cargo-vet exemptions
- replaces the `Error::Errors` variant with a new `Errors` type
- removes source info from `Error` variants so they're easy to construct
- adds source info only when formatting `Errors`
- formats `Errors` with a custom `Debug` impl
- shares common code between ISLE's callers, islec and cranelift-codegen
- includes a source snippet even with fancy-errors disabled

I tried to make this a series of smaller commits but I couldn't find any
good split points; everything was too entangled with everything else.
2022-11-23 14:20:48 -08:00
wasmtime-publish
08ef518c95 Bump Wasmtime to 4.0.0 (#5209)
Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>
2022-11-06 13:32:34 -06:00
Nick Fitzgerald
3c496d8cdc Update regalloc2 to v0.4.2 (#5169) 2022-11-01 11:18:19 -07:00
Chris Fallin
2be12a5167 egraph-based midend: draw the rest of the owl (productionized). (#4953)
* egraph-based midend: draw the rest of the owl.

* Rename `egg` submodule of cranelift-codegen to `egraph`.

* Apply some feedback from @jsharp during code walkthrough.

* Remove recursion from find_best_node by doing a single pass.

Rather than recursively computing the lowest-cost node for a given
eclass and memoizing the answer at each eclass node, we can do a single
forward pass; because every eclass node refers only to earlier nodes,
this is sufficient. The behavior may slightly differ from the earlier
behavior because we cannot short-circuit costs to zero once a node is
elaborated; but in practice this should not matter.

* Make elaboration non-recursive.

Use an explicit stack instead (with `ElabStackEntry` entries,
alongside a result stack).

* Make elaboration traversal of the domtree non-recursive/stack-safe.

* Work analysis logic in Cranelift-side egraph glue into a general analysis framework in cranelift-egraph.

* Apply static recursion limit to rule application.

* Fix aarch64 wrt dynamic-vector support -- broken rebase.

* Topo-sort cranelift-egraph before cranelift-codegen in publish script, like the comment instructs me to!

* Fix multi-result call testcase.

* Include `cranelift-egraph` in `PUBLISHED_CRATES`.

* Fix atomic_rmw: not really a load.

* Remove now-unnecessary PartialOrd/Ord derivations.

* Address some code-review comments.

* Review feedback.

* Review feedback.

* No overlap in mid-end rules, because we are defining a multi-constructor.

* rustfmt

* Review feedback.

* Review feedback.

* Review feedback.

* Review feedback.

* Remove redundant `mut`.

* Add comment noting what rules can do.

* Review feedback.

* Clarify comment wording.

* Update `has_memory_fence_semantics`.

* Apply @jameysharp's improved loop-level computation.

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fix suggestion commit.

* Fix off-by-one in new loop-nest analysis.

* Review feedback.

* Review feedback.

* Review feedback.

* Use `Default`, not `std::default::Default`, as per @fitzgen

Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>

* Apply @fitzgen's comment elaboration to a doc-comment.

Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>

* Add stat for hitting the rewrite-depth limit.

* Some code motion in split prelude to make the diff a little clearer wrt `main`.

* Take @jameysharp's suggested `try_into()` usage for blockparam indices.

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Take @jameysharp's suggestion to avoid double-match on load op.

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fix suggestion (add import).

* Review feedback.

* Fix stack_load handling.

* Remove redundant can_store case.

* Take @jameysharp's suggested improvement to FuncEGraph::build() logic

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Tweaks to FuncEGraph::build() on top of suggestion.

* Take @jameysharp's suggested clarified condition

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Clean up after suggestion (unused variable).

* Fix loop analysis.

* loop level asserts

* Revert constant-space loop analysis -- edge cases were incorrect, so let's go with the simple thing for now.

* Take @jameysharp's suggestion re: result_tys

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fix up after suggestion

* Take @jameysharp's suggestion to use fold rather than reduce

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fixup after suggestion

* Take @jameysharp's suggestion to remove elaborate_eclass_use's return value.

* Clarifying comment in terminator insts.

Co-authored-by: Jamey Sharp <jamey@minilop.net>
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
2022-10-11 18:15:53 -07:00
Benjamin Bouvier
d68ca3711b Upgrade sha2 to 0.10.2 in wasmtime (#4749) 2022-10-10 09:40:40 +00:00
wasmtime-publish
a9be4a9b56 Bump Wasmtime to 3.0.0 (#5016)
Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>
2022-10-05 09:30:55 -05:00
yuyang-ok
cdecc858b4 add riscv64 backend for cranelift. (#4271)
Add a RISC-V 64 (`riscv64`, RV64GC) backend.

Co-authored-by: yuyang <756445638@qq.com>
Co-authored-by: Chris Fallin <chris@cfallin.org>
Co-authored-by: Afonso Bordado <afonsobordado@az8.co>
2022-09-27 17:30:31 -07:00
Alex Crichton
7b311004b5 Leverage Cargo's workspace inheritance feature (#4905)
* Leverage Cargo's workspace inheritance feature

This commit is an attempt to reduce the complexity of the Cargo
manifests in this repository with Cargo's workspace-inheritance feature
becoming stable in Rust 1.64.0. This feature allows specifying fields in
the root workspace `Cargo.toml` which are then reused throughout the
workspace. For example this PR shares definitions such as:

* All of the Wasmtime-family of crates now use `version.workspace =
  true` to have a single location which defines the version number.
* All crates use `edition.workspace = true` to have one default edition
  for the entire workspace.
* Common dependencies are listed in `[workspace.dependencies]` to avoid
  typing the same version number in a lot of different places (e.g. the
  `wasmparser = "0.89.0"` is now in just one spot.

Currently the workspace-inheritance feature doesn't allow having two
different versions to inherit, so all of the Cranelift-family of crates
still manually specify their version. The inter-crate dependencies,
however, are shared amongst the root workspace.

This feature can be seen as a method of "preprocessing" of sorts for
Cargo manifests. This will help us develop Wasmtime but shouldn't have
any actual impact on the published artifacts -- everything's dependency
lists are still the same.

* Fix wasi-crypto tests
2022-09-26 11:30:01 -05:00
Jamey Sharp
bd870a9d6c Shrink all SmallVecs by 8 bytes (#4951)
We weren't using the "union" cargo feature for the smallvec crate, which
reduces the size of a SmallVec by one machine word. This feature
requires Rust 1.49 but we already require much newer versions.

When using Wasmtime to compile pulldown-cmark from Sightglass, this
saves a decent amount of memory allocations and writes. According to
`valgrind --tool=dhat`:

- 6.2MiB (3.69%) less memory allocated over the program's lifetime
- 0.5MiB (4.13%) less memory allocated at maximum heap size
- 5.5MiB (1.88%) fewer bytes written to
- 0.44% fewer instructions executed

Sightglass reports a statistically significant runtime improvement too:

compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 24379323.60 ± 20051394.04 (confidence = 99%)

  shrink-abiarg-0406da67c.so is 1.01x to 1.13x faster than main-be690a468.so!

  [227506364 355007998.78 423280514] main-be690a468.so
  [227686018 330628675.18 406025344] shrink-abiarg-0406da67c.so

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 360151622.56 ± 278294316.90 (confidence = 99%)

  shrink-abiarg-0406da67c.so is 1.01x to 1.07x faster than main-be690a468.so!

  [8709162212 8911001926.44 9535111576] main-be690a468.so
  [5058015392 8550850303.88 9282148438] shrink-abiarg-0406da67c.so

compilation :: cycles :: benchmarks/bz2/benchmark.wasm

  Δ = 6936570.28 ± 6897696.38 (confidence = 99%)

  shrink-abiarg-0406da67c.so is 1.00x to 1.08x faster than main-be690a468.so!

  [155810934 175260571.20 234737344] main-be690a468.so
  [119128240 168324000.92 257451074] shrink-abiarg-0406da67c.so
2022-09-23 16:32:13 -07:00
Chris Fallin
19bd8687ac Upgrade to regalloc2 0.4.1. (#4945)
* Upgrade to regalloc2 0.4.1.

Incorporates bytecodealliance/regalloc2#85, which fixes a fuzzbug
related to constraints and liverange splits.

* Add audit of regalloc2 upgrade.
2022-09-23 00:00:06 +00:00
Chris Fallin
05cbd667c7 Cranelift: use regalloc2 constraints on caller side of ABI code. (#4892)
* Cranelift: use regalloc2 constraints on caller side of ABI code.

This PR updates the shared ABI code and backends to use register-operand
constraints rather than explicit pinned-vreg moves for register
arguments and return values.

The s390x backend was not updated, because it has its own implementation
of ABI code. Ideally we could converge back to the code shared by x64
and aarch64 (which didn't exist when s390x ported calls to ISLE, so the
current situation is underestandable, to be clear!). I'll leave this for
future work.

This PR exposed several places where regalloc2 needed to be a bit more
flexible with constraints; it requires regalloc2#74 to be merged and
pulled in.

* Update to regalloc2 0.3.3.

In addition to version bump, this required removing two asserts as
`SpillSlot`s no longer carry their class (so we can't assert that they
have the correct class).

* Review comments.

* Filetest updates.

* Add cargo-vet audit for regalloc2 0.3.2 -> 0.3.3 upgrade.

* Update to regalloc2 0.4.0.
2022-09-21 01:17:04 +00:00
Alex Crichton
65930640f8 Bump Wasmtime to 2.0.0 (#4874)
This commit replaces #4869 and represents the actual version bump that
should have happened had I remembered to bump the in-tree version of
Wasmtime to 1.0.0 prior to the branch-cut date. Alas!
2022-09-06 13:49:56 -05:00
Nick Fitzgerald
ae7688059d Cranelift: Use bump allocation in remove_constant_phis pass (#4710)
* Cranelift: Use bump allocation in `remove_constant_phis` pass

This makes compilation 2-6% faster for Sightglass's bz2 benchmark:

```
compilation :: cycles :: benchmarks/bz2/benchmark.wasm

  Δ = 7290648.36 ± 4245152.07 (confidence = 99%)

  bump.so is 1.02x to 1.06x faster than main.so!

  [166388177 183238542.98 214732518] bump.so
  [172836648 190529191.34 217514271] main.so

compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm

  No difference in performance.

  [182220055 225793551.12 277857575] bump.so
  [193212613 227784078.61 277175335] main.so

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [3848442474 4295214144.37 4665127241] bump.so
  [3969505457 4262415290.10 4563869974] main.so
```

* Add audit for `bumpalo`

* Add an audit of `arrayvec` version 0.7.2

* Remove unnecessary `collect` into `Vec`

I wasn't able to measure any perf difference here, but its nice to do anyways.

* Use a `SecondaryMap` for keeping track of summaries
2022-08-15 21:36:01 +00:00
Benjamin Bouvier
8a9b1a9025 Implement an incremental compilation cache for Cranelift (#4551)
This is the implementation of https://github.com/bytecodealliance/wasmtime/issues/4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and trait object to provide a backend for an all-included experience in Wasmtime. 

After the suggestion of Chris, `Function` has been split into mostly two parts:

- on the one hand, `FunctionStencil` contains all the fields required during compilation, and that act as a compilation cache key: if two function stencils are the same, then the result of their compilation (`CompiledCodeBase<Stencil>`) will be the same. This makes caching trivial, as the only thing to cache is the `FunctionStencil`.
- on the other hand, `FunctionParameters` contain the... function parameters that are required to finalize the result of compilation into a `CompiledCode` (aka `CompiledCodeBase<Final>`) with proper final relocations etc., by applying fixups and so on.

Most changes are here to accomodate those requirements, in particular that `FunctionStencil` should be `Hash`able to be used as a key in the cache:

- most source locations are now relative to a base source location in the function, and as such they're encoded as `RelSourceLoc` in the `FunctionStencil`. This required changes so that there's no need to explicitly mark a `SourceLoc` as the base source location, it's automatically detected instead the first time a non-default `SourceLoc` is set.
- user-defined external names in the `FunctionStencil` (aka before this patch `ExternalName::User { namespace, index }`) are now references into an external table of `UserExternalNameRef -> UserExternalName`, present in the `FunctionParameters`, and must be explicitly declared using `Function::declare_imported_user_function`.
- some refactorings have been made for function names:
  - `ExternalName` was used as the type for a `Function`'s name; while it thus allowed `ExternalName::Libcall` in this place, this would have been quite confusing to use it there. Instead, a new enum `UserFuncName` is introduced for this name, that's either a user-defined function name (the above `UserExternalName`) or a test case name.
  - The future of `ExternalName` is likely to become a full reference into the `FunctionParameters`'s mapping, instead of being "either a handle for user-defined external names, or the thing itself for other variants". I'm running out of time to do this, and this is not trivial as it implies touching ISLE which I'm less familiar with.

The cache computes a sha256 hash of the `FunctionStencil`, and uses this as the cache key. No equality check (using `PartialEq`) is performed in addition to the hash being the same, as we hope that this is sufficient data to avoid collisions.

A basic fuzz target has been introduced that tries to do the bare minimum:

- check that a function successfully compiled and cached will be also successfully reloaded from the cache, and returns the exact same function.
- check that a trivial modification in the external mapping of `UserExternalNameRef -> UserExternalName` hits the cache, and that other modifications don't hit the cache.
  - This last check is less efficient and less likely to happen, so probably should be rethought a bit.

Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip.

Some numbers show that for a large wasm module we're using internally, this is a 20% compile-time speedup, because so many `FunctionStencil`s are the same, even within a single module. For a group of modules that have a lot of code in common, we get hit rates up to 70% when they're used together. When a single function changes in a wasm module, every other function is reloaded; that's still slower than I expect (between 10% and 50% of the overall compile time), so there's likely room for improvement. 

Fixes #4155.
2022-08-12 16:47:43 +00:00
wasmtime-publish
412fa04911 Bump Wasmtime to 0.41.0 (#4620)
Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>
2022-08-04 20:02:19 -05:00
Jamey Sharp
f69acd6187 Upgrade regalloc2 -> 0.3.2 (#4603)
Includes a modest improvement in memory usage and performance by
removing analysis that was only used during fuzzing.
2022-08-04 00:06:13 +00:00
Benjamin Bouvier
8d0224341c cranelift: Introduce a feature to enable trace logs (#4484)
* Don't use `log::trace` directly but a feature-enabled `trace` macro
* Don't emit disassembly based on the log level
2022-08-01 11:19:15 +02:00
Chris Fallin
9c72a0566e Upgrade to regalloc2 0.3.1. (#4483)
This includes some changes from @bnjbvr to the trace-logging/annotation
to reduce overhead when logging is enabled but only non-RA2 subsystems
are at `Trace` level.
2022-07-21 01:22:39 +00:00
Alex Crichton
b9e63fe77a Update miette dependency to 5.1 (#4412)
Just some dependency gardening, no other external motivation.
2022-07-07 22:20:09 +00:00
Alex Crichton
9ae060a12a Update some dependency versions used by Wasmtime (#4405)
No major motivation here, mostly just dependency gardening.
2022-07-07 18:47:39 +00:00
wasmtime-publish
7c428bbd62 Bump Wasmtime to 0.40.0 (#4378)
Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>
2022-07-05 09:10:52 -05:00
Chris Fallin
b2e28b917a Cranelift: update to latest regalloc2: (#4324)
- Handle call instructions' clobbers with the clobbers API, using RA2's
  clobbers bitmask (bytecodealliance/regalloc2#58) rather than clobbers
  list;

- Pull in changes from bytecodealliance/regalloc2#59 for much more sane
  edge-case behavior w.r.t. liverange splitting.
2022-06-28 09:01:59 -07:00
Chris Fallin
0d829a57ee Upgrade to regalloc2 v0.2.3 to get bugfix from bytecodealliance/regalloc2#60. (#4335)
* Upgrade to regalloc2 v0.2.3 to get bugfix from bytecodealliance/regalloc2#60.

* Update RELEASES.md.

* Update two compile tests based on slightly shifting regalloc output.
2022-06-27 15:58:54 -05:00
wasmtime-publish
55946704cb Bump Wasmtime to 0.39.0 (#4225)
Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>
2022-06-06 09:12:47 -05:00
Chris Fallin
ae2c84205f Upgrade to regalloc2 v0.2.2. (#4222)
Pulls in an improvement to spillslot allocation
(bytecodealliance/regalloc2#56).
2022-06-03 17:12:32 -07:00
Chris Fallin
8f61eb9341 Upgrade to regalloc2 version 0.2.1. (#4199)
This resolves an edge-case where mul.i128 with an input that continues
to be live after the instruction could cause an invalid regalloc
constraint (basically, the regalloc did not previously support an
instruction use and def both being constrained to the same physical reg;
and the "mul" variant used for mul.i128 on x64 was the only instance of
such operands in Cranelift).

Causes two extra move instructions in the mul.i128 filetest, but that's
the price to pay for the slightly more general (works in all cases)
handling of the constraints.
2022-06-01 13:26:20 -07:00
Chris Fallin
b830c3cf93 Pull in regalloc2 v0.2.0, with no more separate scratch registers. (#4182)
RA2 recently removed the need for a dedicated scratch register for
cyclic moves (bytecodealliance/regalloc2#51). This has moderate positive
performance impact on function bodies that were register-constrained, as
it means that one more register is available. In Sightglass, I measured
+5-8% on `blake3-scalar`, at least among current benchmarks.
2022-05-23 12:51:04 -07:00
Chris Fallin
02d5edc591 Upgrade to regalloc2 0.1.3. (#4157)
* Upgrade to regalloc2 0.1.3.

This pulls in bytecodealliance/regalloc2#49, which slightly improves
codegen in some cases where a safepoint (for reference-typed values)
occurs in the same liverange as a register-constrained use. For
example, in bytecodealliance/wasmtime#3785, an extra move instruction
appeared and a callee-save register was used (necessitating a more
expensive prologue) because of suboptimal splitting heuristics, which
this PR fixes. The updated RA2 heuristics appear to have no measured
downsides in existing benchmarks and improve the manually-observed
codegen issue.

* Update filetests where regalloc2 improvement altered behavior with reftypes.
2022-05-18 11:48:40 -07:00
Chris Fallin
5d671952ee Cranelift: do not check in generated ISLE code; regenerate on every compile. (#4143)
This PR fixes #4066: it modifies the Cranelift `build.rs` workflow to
invoke the ISLE DSL compiler on every compilation, rather than only
when the user specifies a special "rebuild ISLE" feature.

The main benefit of this change is that it vastly simplifies the mental
model required of developers, and removes a bunch of failure modes
we have tried to work around in other ways. There is now just one
"source of truth", the ISLE source itself, in the repository, and so there
is no need to understand a special "rebuild" step and how to handle
merge errors. There is no special process needed to develop the compiler
when modifying the DSL. And there is no "noise" in the git history produced
by constantly-regenerated files.

The two main downsides we discussed in #4066 are:
- Compile time could increase, by adding more to the "meta" step before the main build;
- It becomes less obvious where the source definitions are (everything becomes
  more "magic"), which makes exploration and debugging harder.

This PR addresses each of these concerns:

1. To maintain reasonable compile time, it includes work to cut down the
   dependencies of the `cranelift-isle` crate to *nothing* (only the Rust stdlib),
   in the default build. It does this by putting the error-reporting bits
   (`miette` crate) under an optional feature, and the logging (`log` crate) under
   a feature-controlled macro, and manually writing an `Error` impl rather than
   using `thiserror`. This completely avoids proc macros and the `syn` build slowness.

   The user can still get nice errors out of `miette`: this is enabled by specifying
   a Cargo feature `--features isle-errors`.

2. To allow the user to optionally inspect the generated source, which nominally
   lives in a hard-to-find path inside `target/` now, this PR adds a feature `isle-in-source-tree`
   that, as implied by the name, moves the target for ISLE generated source into
   the source tree, at `cranelift/codegen/isle_generated_source/`. It seems reasonable
   to do this when an explicit feature (opt-in) is specified because this is how ISLE regeneration
   currently works as well. To prevent surprises, if the feature is *not* specified, the
   build fails if this directory exists.
2022-05-11 22:25:24 -07:00
wasmtime-publish
9a6854456d Bump Wasmtime to 0.38.0 (#4103)
Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>
2022-05-05 13:43:02 -05:00
Alex Crichton
871a9d93f2 Update some dependencies in Cargo.lock (#4081)
* Run a `cargo update` over our dependencies

This'll notably fix a `cargo audit` error where we have a pinned version
of the `regex` crate which has a CVE assigned to it.

* Update to `object` and `hashbrown` crates

Prune some duplicate versions showing up from the previous `cargo update`
2022-04-28 11:12:58 -05:00