Commit Graph

37 Commits

Author SHA1 Message Date
Benjamin Bouvier
8a9b1a9025 Implement an incremental compilation cache for Cranelift (#4551)
This is the implementation of https://github.com/bytecodealliance/wasmtime/issues/4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and trait object to provide a backend for an all-included experience in Wasmtime. 

After the suggestion of Chris, `Function` has been split into mostly two parts:

- on the one hand, `FunctionStencil` contains all the fields required during compilation, and that act as a compilation cache key: if two function stencils are the same, then the result of their compilation (`CompiledCodeBase<Stencil>`) will be the same. This makes caching trivial, as the only thing to cache is the `FunctionStencil`.
- on the other hand, `FunctionParameters` contain the... function parameters that are required to finalize the result of compilation into a `CompiledCode` (aka `CompiledCodeBase<Final>`) with proper final relocations etc., by applying fixups and so on.

Most changes are here to accomodate those requirements, in particular that `FunctionStencil` should be `Hash`able to be used as a key in the cache:

- most source locations are now relative to a base source location in the function, and as such they're encoded as `RelSourceLoc` in the `FunctionStencil`. This required changes so that there's no need to explicitly mark a `SourceLoc` as the base source location, it's automatically detected instead the first time a non-default `SourceLoc` is set.
- user-defined external names in the `FunctionStencil` (aka before this patch `ExternalName::User { namespace, index }`) are now references into an external table of `UserExternalNameRef -> UserExternalName`, present in the `FunctionParameters`, and must be explicitly declared using `Function::declare_imported_user_function`.
- some refactorings have been made for function names:
  - `ExternalName` was used as the type for a `Function`'s name; while it thus allowed `ExternalName::Libcall` in this place, this would have been quite confusing to use it there. Instead, a new enum `UserFuncName` is introduced for this name, that's either a user-defined function name (the above `UserExternalName`) or a test case name.
  - The future of `ExternalName` is likely to become a full reference into the `FunctionParameters`'s mapping, instead of being "either a handle for user-defined external names, or the thing itself for other variants". I'm running out of time to do this, and this is not trivial as it implies touching ISLE which I'm less familiar with.

The cache computes a sha256 hash of the `FunctionStencil`, and uses this as the cache key. No equality check (using `PartialEq`) is performed in addition to the hash being the same, as we hope that this is sufficient data to avoid collisions.

A basic fuzz target has been introduced that tries to do the bare minimum:

- check that a function successfully compiled and cached will be also successfully reloaded from the cache, and returns the exact same function.
- check that a trivial modification in the external mapping of `UserExternalNameRef -> UserExternalName` hits the cache, and that other modifications don't hit the cache.
  - This last check is less efficient and less likely to happen, so probably should be rethought a bit.

Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip.

Some numbers show that for a large wasm module we're using internally, this is a 20% compile-time speedup, because so many `FunctionStencil`s are the same, even within a single module. For a group of modules that have a lot of code in common, we get hit rates up to 70% when they're used together. When a single function changes in a wasm module, every other function is reloaded; that's still slower than I expect (between 10% and 50% of the overall compile time), so there's likely room for improvement. 

Fixes #4155.
2022-08-12 16:47:43 +00:00
Alex Crichton
97894bc65e Add initial support for fused adapter trampolines (#4501)
* Add initial support for fused adapter trampolines

This commit lands a significant new piece of functionality to Wasmtime's
implementation of the component model in the form of the implementation
of fused adapter trampolines. Internally within a component core wasm
modules can communicate with each other by having their exports
`canon lift`'d to get `canon lower`'d into a different component. This
signifies that two components are communicating through a statically
known interface via the canonical ABI at this time. Previously Wasmtime
was able to identify that this communication was happening but it simply
panicked with `unimplemented!` upon seeing it. This commit is the
beginning of filling out this panic location with an actual
implementation.

The implementation route chosen here for fused adapters is to use a
WebAssembly module itself for the implementation. This means that, at
compile time of a component, Wasmtime is generating core WebAssembly
modules which then get recursively compiled within Wasmtime as well. The
choice to use WebAssembly itself as the implementation of fused adapters
stems from a few motivations:

* This does not represent a significant increase in the "trusted
  compiler base" of Wasmtime. Getting the Wasm -> CLIF translation
  correct once is hard enough much less for an entirely different IR to
  CLIF. By generating WebAssembly no new interactions with Cranelift are
  added which drastically reduces the possibilities for mistakes.

* Using WebAssembly means that component adapters are insulated from
  miscompilations and mistakes. If something goes wrong it's defined
  well within the WebAssembly specification how it goes wrong and what
  happens as a result. This means that the "blast zone" for a wrong
  adapter is the component instance but not the entire host itself.
  Accesses to linear memory are guaranteed to be in-bounds and otherwise
  handled via well-defined traps.

* A fully-finished fused adapter compiler is expected to be a
  significant and quite complex component of Wasmtime. Functionality
  along these lines is expected to be needed for Web-based polyfills of
  the component model and by using core WebAssembly it provides the
  opportunity to share code between Wasmtime and these polyfills for the
  component model.

* Finally the runtime implementation of managing WebAssembly modules is
  already implemented and quite easy to integrate with, so representing
  fused adapters with WebAssembly results in very little extra support
  necessary for the runtime implementation of instantiating and managing
  a component.

The compiler added in this commit is dubbed Wasmtime's Fused Adapter
Compiler of Trampolines (FACT) because who doesn't like deriving a name
from an acronym. Currently the trampoline compiler is limited in its
support for interface types and only supports a few primitives. I plan
on filing future PRs to flesh out the support here for all the variants
of `InterfaceType`. For now this PR is primarily focused on all of the
other infrastructure for the addition of a trampoline compiler.

With the choice to use core WebAssembly to implement fused adapters it
means that adapters need to be inserted into a module. Unfortunately
adapters cannot all go into a single WebAssembly module because adapters
themselves have dependencies which may be provided transitively through
instances that were instantiated with other adapters. This means that a
significant chunk of this PR (`adapt.rs`) is dedicated to determining
precisely which adapters go into precisely which adapter modules. This
partitioning process attempts to make large modules wherever it can to
cut down on core wasm instantiations but is likely not optimal as
it's just a simple heuristic today.

With all of this added together it's now possible to start writing
`*.wast` tests that internally have adapted modules communicating with
one another. A `fused.wast` test suite was added as part of this PR
which is the beginning of tests for the support of the fused adapter
compiler added in this PR. Currently this is primarily testing some
various topologies of adapters along with direct/indirect modes. This
will grow many more tests over time as more types are supported.

Overall I'm not 100% satisfied with the testing story of this PR. When a
test fails it's very difficult to debug since everything is written in
the text format of WebAssembly meaning there's no "conveniences" to
print out the state of the world when things go wrong and easily debug.
I think this will become even more apparent as more tests are written
for more types in subsequent PRs. At this time though I know of no
better alternative other than leaning pretty heavily on fuzz-testing to
ensure this is all exercised.

* Fix an unused field warning

* Fix tests in `wasmtime-runtime`

* Add some more tests for compiled trampolines

* Remap exports when injecting adapters

The exports of a component were accidentally left unmapped which meant
that they indexed the instance indexes pre-adapter module insertion.

* Fix typo

* Rebase conflicts
2022-07-25 23:13:26 +00:00
Chris Fallin
00f357c028 Cranelift: support 14-bit Type index with some bitpacking. (#4269)
* Cranelift: make `ir::Type` a `u16`.

* Cranelift: pack ValueData back into 64 bits.

After extending `Type` to a `u16`, `ValueData` became 12 bytes rather
than 8. This packs it back down to 8 bytes (64 bits) by stealing two
bits from the `Type` for the enum discriminant (leaving 14 bits for the
type itself).

Performance comparison (3-way between original (`ty-u8`), 16-bit `Type`
(`ty-u16`), and this PR (`ty-packed`)):

```
~/work/sightglass% target/release/sightglass-cli benchmark \
    -e ~/ty-u8.so -e ~/ty-u16.so -e ~/ty-packed.so \
    --iterations-per-process 10 --processes 2 \
    benchmarks-next/spidermonkey/benchmark.wasm

compilation
  benchmarks-next/spidermonkey/benchmark.wasm
    cycles
      [20654406874 21749213920.50 22958520306] /home/cfallin/ty-packed.so
      [22227738316 22584704883.90 22916433748] /home/cfallin/ty-u16.so
      [20659150490 21598675968.60 22588108428] /home/cfallin/ty-u8.so
    nanoseconds
      [5435333269 5723139427.25 6041072883] /home/cfallin/ty-packed.so
      [5848788229 5942729637.85 6030030341] /home/cfallin/ty-u16.so
      [5436002390 5683248226.10 5943626225] /home/cfallin/ty-u8.so
```

So, when compiling SpiderMonkey.wasm, making `Type` 16 bits regresses
performance by 4.5% (5.683s -> 5.723s), while this PR gets 14 bits for a 1.0%
cost (5.683s -> 5.723s). That's still not great, and we can likely do better,
but it's a start.

* Fix test failure: entities to/from u32 via `{from,to}_bits`, not `{from,to}_u32`.
2022-07-05 14:51:02 -07:00
Nick Fitzgerald
d377b665c6 Initial ISLE integration with the x64 backend
On the build side, this commit introduces two things:

1. The automatic generation of various ISLE definitions for working with
CLIF. Specifically, it generates extern type definitions for clif opcodes and
the clif instruction data `enum`, as well as extractors for matching each clif
instructions. This happens inside the `cranelift-codegen-meta` crate.

2. The compilation of ISLE DSL sources to Rust code, that can be included in the
main `cranelift-codegen` compilation.

Next, this commit introduces the integration glue code required to get
ISLE-generated Rust code hooked up in clif-to-x64 lowering. When lowering a clif
instruction, we first try to use the ISLE code path. If it succeeds, then we are
done lowering this instruction. If it fails, then we proceed along the existing
hand-written code path for lowering.

Finally, this commit ports many lowering rules over from hand-written,
open-coded Rust to ISLE.

In the process of supporting ISLE, this commit also makes the x64 `Inst` capable
of expressing SSA by supporting 3-operand forms for all of the existing
instructions that only have a 2-operand form encoding:

    dst = src1 op src2

Rather than only the typical x86-64 2-operand form:

    dst = dst op src

This allows `MachInst` to be in SSA form, since `dst` and `src1` are
disentangled.

("3-operand" and "2-operand" are a little bit of a misnomer since not all
operations are binary operations, but we do the same thing for, e.g., unary
operations by disentangling the sole operand from the result.)

There are two motivations for this change:

1. To allow ISLE lowering code to have value-equivalence semantics. We want ISLE
   lowering to translate a CLIF expression that evaluates to some value into a
   `MachInst` expression that evaluates to the same value. We want both the
   lowering itself and the resulting `MachInst` to be pure and referentially
   transparent. This is both a nice paradigm for compiler writers that are
   authoring and maintaining lowering rules and is a prerequisite to any sort of
   formal verification of our lowering rules in the future.

2. Better align `MachInst` with `regalloc2`'s API, which requires that the input
   be in SSA form.
2021-10-12 17:11:58 -07:00
katelyn martin
1b9ff6b181 🥓 rust fmt 2021-07-16 16:41:44 -04:00
katelyn martin
87726882dd add test cases for new entity_impl! form 2021-07-16 14:32:00 -04:00
katelyn martin
2e8f7bacf8 🌈 provide a new form of entity_impl!
N.B. There is likely still some light refactoring to do, so that we are
not duplicating so much code. We should also additionally introduce some
test coverage.
2021-07-16 14:32:00 -04:00
katelyn martin
d3277c005e 🔭 add simple entity tests 2021-07-16 12:40:39 -04:00
Amanieu d'Antras
76664fc73e Optimize codegen for SecondaryMap indexing (#2940)
Moves the slow path which resizes the vector out-of-line. The actual
indexing is also done in the out-of-line path which avoids the need for
a second bounds check in the fast path after a potential resize.
2021-05-27 11:09:15 -05:00
Alex Crichton
c77ea0c5c7 Add some more #[inline] annotations for trivial functions (#2817)
Looking at some profiles these or their related functions were all
showing up, so this commit adds `#[inline]` to allow cross-crate
inlining by default.
2021-04-08 12:23:54 -05:00
Peter Huene
ad9fa11d48 Code review feedback.
* Remove `once-cell` dependency.
* Remove function address `BTreeMap` from `CompiledModule` in favor of binary
  searching finished functions directly.
* Use `with_capacity` when populating `CompiledModule` finished functions and
  trampolines.
2021-04-07 16:37:04 -07:00
Amanieu d'Antras
9b1693aa72 Add EntityList::truncate 2021-03-08 18:21:02 +00:00
Amanieu d'Antras
65d0bc58d2 Add EntityList::deep_clone 2021-03-08 18:20:46 +00:00
Amanieu d'Antras
b2abe74f25 Improve codegen for remove and swap_remove on EntityList 2021-03-08 18:20:05 +00:00
bjorn3
2fc964ea35 Add serde serialization support for the full clif ir 2021-02-18 11:27:02 +01:00
Amanieu d'Antras
78f312799e Optimize EntityList::extend and add EntityList::from_iter 2021-01-29 14:09:52 +01:00
Andrew Brown
c9e8889d47 Update clippy annotation to use latest version (#2375) 2020-11-09 09:24:59 -06:00
CohenArthur
6849dc40bd Fix typo in generated documentation for entity! (#2176)
* entity: Fix typo in generated documentation

The same function documentation was used for `from_u32()` and `as_u32()`
while their behaviour is different
2020-08-31 10:40:24 +02:00
Alex Crichton
65eaca35dd Refactor where results of compilation are stored (#2086)
* Refactor where results of compilation are stored

This commit refactors the internals of compilation in Wasmtime to change
where results of individual function compilation are stored. Previously
compilation resulted in many maps being returned, and compilation
results generally held all these maps together. This commit instead
switches this to have all metadata stored in a `CompiledFunction`
instead of having a separate map for each item that can be stored.

The motivation for this is primarily to help out with future
module-linking-related PRs. What exactly "module level" is depends on
how we interpret modules and how many modules are in play, so it's a bit
easier for operations in wasmtime to work at the function level where
possible. This means that we don't have to pass around multiple
different maps and a function index, but instead just one map or just
one entry representing a compiled function.

Additionally this change updates where the parallelism of compilation
happens, pushing it into `wasmtime-jit` instead of `wasmtime-environ`.
This is another goal where `wasmtime-jit` will have more knowledge about
module-level pieces with module linking in play. User-facing-wise this
should be the same in terms of parallel compilation, though.

The ultimate goal of this refactoring is to make it easier for the
results of compilation to actually be a set of wasm modules. This means
we won't be able to have a map-per-metadata where the primary key is the
function index, because there will be many modules within one "object
file".

* Don't clear out fields, just don't store them

Persist a smaller set of fields in `CompilationArtifacts` instead of
trying to clear fields out and dynamically not accessing them.
2020-08-03 12:20:51 -05:00
Ömer Sinan Ağacan
c619136752 Remove Eq bound of ReservedValue trait
A full Eq implementation is no needed for ReservedValue, as we only need
to check whether a value is the reserved one. For entities (defined with
`entity_impl!`) this doesn't make much difference, but for more
complicated types this avoids generating redundant `Eq`s.
2020-05-26 10:27:55 +02:00
Ryan Hunt
832666c45e Mass rename Ebb and relatives to Block (#1365)
* Manually rename BasicBlock to BlockPredecessor

BasicBlock is a pair of (Ebb, Inst) that is used to represent the
basic block subcomponent of an Ebb that is a predecessor to an Ebb.

Eventually we will be able to remove this struct, but for now it
makes sense to give it a non-conflicting name so that we can start
to transition Ebb to represent a basic block.

I have not updated any comments that refer to BasicBlock, as
eventually we will remove BlockPredecessor and replace with Block,
which is a basic block, so the comments will become correct.

* Manually rename SSABuilder block types to avoid conflict

SSABuilder has its own Block and BlockData types. These along with
associated identifier will cause conflicts in a later commit, so
they are renamed to be more verbose here.

* Automatically rename 'Ebb' to 'Block' in *.rs

* Automatically rename 'EBB' to 'block' in *.rs

* Automatically rename 'ebb' to 'block' in *.rs

* Automatically rename 'extended basic block' to 'basic block' in *.rs

* Automatically rename 'an basic block' to 'a basic block' in *.rs

* Manually update comment for `Block`

`Block`'s wikipedia article required an update.

* Automatically rename 'an `Block`' to 'a `Block`' in *.rs

* Automatically rename 'extended_basic_block' to 'basic_block' in *.rs

* Automatically rename 'ebb' to 'block' in *.clif

* Manually rename clif constant that contains 'ebb' as substring to avoid conflict

* Automatically rename filecheck uses of 'EBB' to 'BB'

'regex: EBB' -> 'regex: BB'
'$EBB' -> '$BB'

* Automatically rename 'EBB' 'Ebb' to 'block' in *.clif

* Automatically rename 'an block' to 'a block' in *.clif

* Fix broken testcase when function name length increases

Test function names are limited to 16 characters. This causes
the new longer name to be truncated and fail a filecheck test. An
outdated comment was also fixed.
2020-02-07 10:46:47 -06:00
llogiq
0d8f8bc71f Fix some clippy warnings (#1277) 2019-12-07 09:47:43 -08:00
Peter Huene
9f506692c2 Fix clippy warnings.
This commit fixes the current set of (stable) clippy warnings in the repo.
2019-10-24 17:20:12 -07:00
bjorn3
bb8fa40ef0 Rustfmt 2019-10-02 11:50:44 -07:00
bjorn3
a114423d0a Remove std feature from cranelift-entity 2019-10-02 11:50:44 -07:00
Joshua Nelson
a1f6457e8a Allow building without std (#1069)
Closes https://github.com/CraneStation/cranelift/issues/1067
2019-09-26 18:00:03 +02:00
Erin Power
947fce194e Replaced instances of SparseSet with EntitySet 2019-09-23 17:20:25 +02:00
julian-seward1
92a01c816d Minor speedup tuning for SecondaryMap (#1020)
The `SecondaryMap` abstraction -- basically, resize-on-demand arrays with a
default value -- is very hot in Cranelift.  This small patch is the result of
many profiling runs.  It makes two changes:

* `fn index_mut` is changed to be `#[inline(always)]`, based on profile data.

* `fn index` and `fn index_mut` call `self.elems.resize()` directly, rather
  than via `self.resize()`.  The point of this is not to improve performance.
  Rather, it ensures that the public functions for `SecondaryMap` do not call
  each other.  When public interface functions call each other, it becomes
  difficult to interpret profiling results, because it's harder to see what
  fraction of costs for `SecondaryMap` as a whole come from outside the
  module, and what fraction is the result of "internal" calls to the external
  interface.

The overall result, for wasm_lua_binarytrees, is a 1.4% reduction in
instruction count for the compiler, and a 2.2% reduction in loads/stores.
2019-09-12 11:09:35 +02:00
Sean Stangl
4b085b9cf7 Avoid unnecessary reallocations in domtree::with_function() (#1011) 2019-09-10 08:18:06 -06:00
Pat Hickey
89d741f8ae upgrade to target-lexicon 0.8.0
* the target-lexicon crate no longer has or needs the std feature
  in cargo, so we can delete all default-features=false, any mentions
  of its std feature, and the nostd configs in many lib.rs files
* the representation of arm architectures has changed, so some case
  statements needed refactoring
2019-09-04 15:12:17 -07:00
julian-seward1
b8fb52446c Cranelift: implement redundant fill removal on tree-shaped CFG regions. Mozilla bug 1570584. (#906) 2019-08-25 19:37:34 +02:00
Artur Jamro
d3815a0399 Implement serde and equality traits for SecondaryMap 2019-08-22 15:54:10 -07:00
Artur Jamro
09ec0d4149 Derive Hash for some types 2019-07-27 06:23:03 -05:00
Artur Jamro
7a72ffefdd Add serde derive to PrimaryMap 2019-07-19 15:56:29 -07:00
Artur Jamro
9e884b4433 Add support for some serde serialization (#847)
* Add support for some serde serialization
2019-07-12 15:30:50 -07:00
Benjamin Bouvier
d7d48d5cc6 Add the dyn keyword before trait objects; 2019-06-24 11:42:26 +02:00
lazypassion
747ad3c4c5 moved crates in lib/ to src/, renamed crates, modified some files' text (#660)
moved crates in lib/ to src/, renamed crates, modified some files' text (#660)
2019-01-28 15:56:54 -08:00