Commit Graph

159 Commits

Author SHA1 Message Date
Alex Crichton
c83dee07b7 Fix a memory reservation bug in reserve_modules
This method attempted to reserve space in the `results` list of final
modules. Unfortunately `results.reserve(nmodules)` isn't enough here
because this can be called many times before a module is actually
finished and pushed onto the vector. The attempted logic to work around
this was buggy, however, and would simply trigger geometric growth on
every single reservation because it erroneously assumed that a
reservation would be exactly met.

This is fixed by avoiding looking at the vector's capacity and instead
keeping track of modules-to-be in a side field. This is the incremented
and passed to `reserve` as it represents the number of modules that will
eventually make their way into the result vector.
2020-12-14 07:57:47 -08:00
Alex Crichton
f003388ec7 Implement imported/exported modules/instances (#2461)
* Implement imported/exported modules/instances

This commit implements the final piece of the module linking proposal
which is to flesh out the support for importing/exporting instances and
modules. This ended up having a few changes:

* Two more `PrimaryMap` instances are now stored in an `Instance`. The value
  for instances is `InstanceHandle` (pretty easy) and for modules it's
  `Box<dyn Any>` (less easy).

* The custom host state for `InstanceHandle` for `wasmtime` is now
  `Arc<TypeTables` to be able to fully reconstruct an instance's types
  just from its instance.

* Type matching for imports now has been updated to take
  instances/modules into account.

One of the main downsides of this implementation is that type matching
of imports is duplicated between wasmparser and wasmtime, leading to
posssible bugs especially in the subtelties of module linking. I'm not
sure how best to unify these two pieces of validation, however, and it
may be more trouble than it's worth.

cc #2094

* Update wat/wast/wasmparser

* Review comments

* Fix a bug in publish script to vendor the right witx

Currently there's two witx binaries in our repository given the two wasi
spec submodules, so this updates the publication script to vendor the
right one.
2020-12-03 10:15:42 -06:00
Alex Crichton
9ac7d01288 Implement the module linking alias section (#2451)
This commit is intended to do almost everything necessary for processing
the alias section of module linking. Most of this is internal
refactoring, the highlights being:

* Type contents are now stored separately from a `wasmtime_env::Module`.
  Given that modules can freely alias types and have them used all over
  the place, it seemed best to have one canonical location to type
  storage which everywhere else points to (with indices). A new
  `TypeTables` structure is produced during compilation which is shared
  amongst all member modules in a wasm blob.

* Instantiation is heavily refactored to account for module linking. The
  main gotcha here is that imports are now listed as "initializers". We
  have a sort of pseudo-bytecode-interpreter which interprets the
  initialization of a module. This is more complicated than just
  matching imports at this point because in the module linking proposal
  the module, alias, import, and instance sections may all be
  interleaved. This means that imports aren't guaranteed to show up at
  the beginning of the address space for modules/instances.

Otherwise most of the changes here largely fell out from these two
design points. Aliases are recorded as initializers in this scheme.
Copying around type information and/or just knowing type information
during compilation is also pretty easy since everything is just a
pointer into a `TypeTables` and we don't have to actually copy any types
themselves. Lots of various refactorings were necessary to accomodate
these changes.

Tests are hoped to cover a breadth of functionality here, but not
necessarily a depth. There's still one more piece of the module linking
proposal missing which is exporting instances/modules, which will come
in a future PR.

It's also worth nothing that there's one large TODO which isn't
implemented in this change that I plan on opening an issue for.
With module linking when a set of modules comes back from compilation
each modules has all the trampolines for the entire set of modules. This
is quite a lot of duplicate trampolines across module-linking modules.
We'll want to refactor this at some point to instead have only one set
of trampolines per set of module linking modules and have them shared
from there. I figured it was best to separate out this change, however,
since it's purely related to resource usage, and doesn't impact
non-module-linking modules at all.

cc #2094
2020-12-02 17:24:06 -06:00
Alex Crichton
51c1d4bbd6 Provide filename/line number information in Trap (#2452)
* Provide filename/line number information in `Trap`

This commit extends the `Trap` type and `Store` to retain DWARF debug
information found in a wasm file unconditionally, if it's present. This
then enables us to print filenames and line numbers which point back to
actual source code when a trap backtrace is printed. Additionally the
`FrameInfo` type has been souped up to return filename/line number
information as well.

The implementation here is pretty simplistic currently. The meat of all
the work happens in `gimli` and `addr2line`, and otherwise wasmtime is
just schlepping around bytes of dwarf debuginfo here and there!

The general goal here is to assist with debugging when using wasmtime
because filenames and line numbers are generally orders of magnitude
better even when you already have a stack trace. Another nicety here is
that backtraces will display inlined frames (learned through debug
information), improving the experience in release mode as well.

An example of this is that with this file:

```rust
fn main() {
    panic!("hello");
}
```

we get this stack trace:

```
$ rustc foo.rs --target wasm32-wasi -g
$ cargo run foo.wasm
    Finished dev [unoptimized + debuginfo] target(s) in 0.16s
     Running `target/debug/wasmtime foo.wasm`
thread 'main' panicked at 'hello', foo.rs:2:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Error: failed to run main module `foo.wasm`

Caused by:
    0: failed to invoke command default
    1: wasm trap: unreachable
       wasm backtrace:
           0: 0x6c1c - panic_abort::__rust_start_panic::abort::h2d60298621b1ccbf
                           at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/panic_abort/src/lib.rs:77:17
                     - __rust_start_panic
                           at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/panic_abort/src/lib.rs:32:5
           1: 0x68c7 - rust_panic
                           at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panicking.rs:626:9
           2: 0x65a1 - std::panicking::rust_panic_with_hook::h2345fb0909b53e12
                           at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panicking.rs:596:5
           3: 0x1436 - std::panicking::begin_panic::{{closure}}::h106f151a6db8c8fb
                           at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panicking.rs:506:9
           4:  0xda8 - std::sys_common::backtrace::__rust_end_short_backtrace::he55aa13f22782798
                           at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/sys_common/backtrace.rs:153:18
           5: 0x1324 - std::panicking::begin_panic::h1727e7d1d719c76f
                           at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panicking.rs:505:12
           6:  0xfde - foo::main::h2db1313a64510850
                           at /Users/acrichton/code/wasmtime/foo.rs:2:5
           7: 0x11d5 - core::ops::function::FnOnce::call_once::h20ee1cc04aeff1fc
                           at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/core/src/ops/function.rs:227:5
           8:  0xddf - std::sys_common::backtrace::__rust_begin_short_backtrace::h054493e41e27e69c
                           at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/sys_common/backtrace.rs:137:18
           9: 0x1d5a - std::rt::lang_start::{{closure}}::hd83784448d3fcb42
                           at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/rt.rs:66:18
          10: 0x69d8 - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::h564d3dad35014917
                           at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/core/src/ops/function.rs:259:13
                     - std::panicking::try::do_call::hdca4832ace5a8603
                           at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panicking.rs:381:40
                     - std::panicking::try::ha8624a1a6854b456
                           at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panicking.rs:345:19
                     - std::panic::catch_unwind::h71421f57cf2bc688
                           at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panic.rs:382:14
                     - std::rt::lang_start_internal::h260050c92cd470af
                           at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/rt.rs:51:25
          11: 0x1d0c - std::rt::lang_start::h0b4bcf3c5e498224
                           at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/rt.rs:65:5
          12:  0xffc - <unknown>!__original_main
          13:  0x393 - __muloti4
                           at /cargo/registry/src/github.com-1ecc6299db9ec823/compiler_builtins-0.1.35/src/macros.rs:269
```

This is relatively noisy by default but there's filenames and line
numbers! Additionally frame 10 can be seen to have lots of frames
inlined into it. All information is always available to the embedder but
we could try to handle the `__rust_begin_short_backtrace` and
`__rust_end_short_backtrace` markers to trim the backtrace by default as
well.

The only gotcha here is that it looks like `__muloti4` is out of place.
That's because the libc that Rust ships with doesn't have dwarf
information, although I'm not sure why we land in that function for
symbolizing it...

* Add a configuration switch for debuginfo

* Control debuginfo by default with `WASM_BACKTRACE_DETAILS`

* Try cpp_demangle on demangling as well

* Rename to WASMTIME_BACKTRACE_DETAILS
2020-12-01 16:56:23 -06:00
Alex Crichton
88a8a8993a Instantiate nested modules for module linking (#2447)
This commit implements the interpretation necessary of the instance
section of the module linking proposal. Instantiating a module which
itself has nested instantiated instances will now instantiate the nested
instances properly. This isn't all that useful without the ability to
alias exports off the result, but we can at least observe the side
effects of instantiation through the `start` function.

cc #2094
2020-12-01 14:01:31 -06:00
Alex Crichton
62be6841e4 Propagate optional import names to the wasmtime/C API
With the module linking proposal the field name on imports is now
optional, and only the module is required to be specified. This commit
propagates this API change to the boundary of wasmtime's API, ensuring
consumers are aware of what's optional with module linking and what
isn't. Note that it's expected that all existing users will either
update accordingly or unwrap the result since module linking is
presumably disabled.
2020-11-23 15:26:26 -08:00
Andrew Brown
c9e8889d47 Update clippy annotation to use latest version (#2375) 2020-11-09 09:24:59 -06:00
Alex Crichton
73cda83548 Propagate module-linking types to wasmtime (#2115)
This commit adds lots of plumbing to get the type section from the
module linking proposal plumbed all the way through to the `wasmtime`
crate and the `wasmtime-c-api` crate. This isn't all that useful right
now because Wasmtime doesn't support imported/exported
modules/instances, but this is all necessary groundwork to getting that
exported at some point. I've added some light tests but I suspect the
bulk of the testing will come in a future commit.

One major change in this commit is that `SignatureIndex` no longer
follows type type index space in a wasm module. Instead a new
`TypeIndex` type is used to track that. Function signatures, still
indexed by `SignatureIndex`, are then packed together tightly.
2020-11-06 14:48:09 -06:00
Alex Crichton
77827a48a9 Start compiling module-linking modules (#2093)
This commit is intended to be the first of many in implementing the
module linking proposal. At this time this builds on #2059 so it
shouldn't land yet. The goal of this commit is to compile bare-bones
modules which use module linking, e.g. those with nested modules.

My hope with module linking is that almost everything in wasmtime only
needs mild refactorings to handle it. The goal is that all per-module
structures are still per-module and at the top level there's just a
`Vec` containing a bunch of modules. That's implemented currently where
`wasmtime::Module` contains `Arc<[CompiledModule]>` and an index of
which one it's pointing to. This should enable
serialization/deserialization of any module in a nested modules
scenario, no matter how you got it.

Tons of features of the module linking proposal are missing from this
commit. For example instantiation flat out doesn't work, nor does
import/export of modules or instances. That'll be coming as future
commits, but the purpose here is to start laying groundwork in Wasmtime
for handling lots of modules in lots of places.
2020-11-06 13:32:30 -06:00
Alex Crichton
6b137c2a3d Move native signatures out of Module (#2362)
After compilation there's actually no need to hold onto the native
signature for a wasm function type, so this commit moves out the
`ir::Signature` value from a `Module` into a separate field that's
deallocated when compilation is finished. This simplifies the
`SignatureRegistry` because it only needs to track wasm functino types
and it also means less work is done for `Func::wrap`.
2020-11-04 14:22:37 -06:00
Alex Crichton
10b5cc50c3 Further compress the in-memory representation of address maps (#2324)
This commit reduces the size of `InstructionAddressMap` from 24 bytes to
8 bytes by dropping the `code_len` field and reducing `code_offset` to
`u32` instead of `usize`. The intention is to primarily make the
in-memory version take up less space, and the hunch is that the
`code_len` is largely not necessary since most entries in this map are
always adjacent to one another. The `code_len` field is now implied by
the `code_offset` field of the next entry in the map.

This isn't as big of an improvement to serialized module size as #2321
or #2322, primarily because of the switch to variable-length encoding.
Despite this though it shaves about 10MB off the encoded size of the
module from #2318
2020-11-02 20:37:18 -06:00
Alex Crichton
3461ffa563 Remove source_loc from TrapInformation (#2325)
Turns out this wasn't needed anywhere! Additionally we can construct it
from `InstructionAddressMap` anyway. There's so many pieces of trap
information that it's best to keep these structures small as well.
2020-10-28 13:05:05 -05:00
Alex Crichton
e659d5cecd Add initial support for the multi-memory proposal (#2263)
This commit adds initial (gated) support for the multi-memory wasm
proposal. This was actually quite easy since almost all of wasmtime
already expected multi-memory to be implemented one day. The only real
substantive change is the `memory.copy` intrinsic changes, which now
accounts for the source/destination memories possibly being different.
2020-10-13 19:13:52 -05:00
Alex Crichton
2c6841041d Validate modules while translating (#2059)
* Validate modules while translating

This commit is a change to cranelift-wasm to validate each function body
as it is translated. Additionally top-level module translation functions
will perform module validation. This commit builds on changes in
wasmparser to perform module validation interwtwined with parsing and
translation. This will be necessary for future wasm features such as
module linking where the type behind a function index, for example, can
be far away in another module. Additionally this also brings a nice
benefit where parsing the binary only happens once (instead of having an
up-front serial validation step) and validation can happen in parallel
for each function.

Most of the changes in this commit are plumbing to make sure everything
lines up right. The major functional change here is that module
compilation should be faster by validating in parallel (or skipping
function validation entirely in the case of a cache hit). Otherwise from
a user-facing perspective nothing should be that different.

This commit does mean that cranelift's translation now inherently
validates the input wasm module. This means that the Spidermonkey
integration of cranelift-wasm will also be validating the function as
it's being translated with cranelift. The associated PR for wasmparser
(bytecodealliance/wasmparser#62) provides the necessary tools to create
a `FuncValidator` for Gecko, but this is something I'll want careful
review for before landing!

* Read function operators until EOF

This way we can let the validator take care of any issues with
mismatched `end` instructions and/or trailing operators/bytes.
2020-10-05 11:02:01 -05:00
Alex Crichton
693c6ea771 wasmtime: Extract cranelift/lightbeam compilers to separate crates (#2117)
This commit extracts the two implementations of `Compiler` into two
separate crates, `wasmtime-cranelfit` and `wasmtime-lightbeam`. The
`wasmtime-jit` crate then depends on these two and instantiates them
appropriately. The goal here is to start reducing the weight of the
`wasmtime-environ` crate, which currently serves as a common set of
types between all `wasmtime-*` crates. Long-term I'd like to remove the
dependency on Cranelift from `wasmtime-environ`, but that's going to
take a lot more work.

In the meantime I figure it's a good way to get started by separating
out the lightbeam/cranelift function compilers from the
`wasmtime-environ` crate. We can continue to iterate on moving things
out in the future, too.
2020-08-20 11:34:31 +02:00
Alex Crichton
08f9eb1725 Making caching support optional in Wasmtime (#2119)
This commit moves all of the caching support that currently lives in
`wasmtime-environ` into a `wasmtime-cache` crate and makes it optional. The
goal here is to slim down the `wasmtime-environ` crate and clearly separate
boundaries where caching is a standalone and optional feature, not intertwined
with other crates.
2020-08-07 15:42:40 -05:00
Nick Fitzgerald
05bf9ea3f3 Rename "Stackmap" to "StackMap"
And "stackmap" to "stack_map".

This commit is purely mechanical.
2020-08-07 10:08:44 -07:00
Alex Crichton
3d2e0e55f2 Remove the local field of Module (#2091)
This was added long ago at this point to assist with caching, but
caching has moved to a different level such that this wonky second level
of a `Module` isn't necessary. This commit removes the `ModuleLocal`
type to simplify accessors and generally make it easier to work with.
2020-08-04 12:29:16 -05:00
Julian Seward
25e31739a6 Implement Wasm Atomics for Cranelift/newBE/aarch64.
The implementation is pretty straightforward.  Wasm atomic instructions fall
into 5 groups

* atomic read-modify-write
* atomic compare-and-swap
* atomic loads
* atomic stores
* fences

and the implementation mirrors that structure, at both the CLIF and AArch64
levels.

At the CLIF level, there are five new instructions, one for each group.  Some
comments about these:

* for those that take addresses (all except fences), the address is contained
  entirely in a single `Value`; there is no offset field as there is with
  normal loads and stores.  Wasm atomics require alignment checks, and
  removing the offset makes implementation of those checks a bit simpler.

* atomic loads and stores get their own instructions, rather than reusing the
  existing load and store instructions, for two reasons:

  - per above comment, makes alignment checking simpler

  - reuse of existing loads and stores would require extension of `MemFlags`
    to indicate atomicity, which sounds semantically unclean.  For example,
    then *any* instruction carrying `MemFlags` could be marked as atomic, even
    in cases where it is meaningless or ambiguous.

* I tried to specify, in comments, the behaviour of these instructions as
  tightly as I could.  Unfortunately there is no way (per my limited CLIF
  knowledge) to enforce the constraint that they may only be used on I8, I16,
  I32 and I64 types, and in particular not on floating point or vector types.

The translation from Wasm to CLIF, in `code_translator.rs` is unremarkable.

At the AArch64 level, there are also five new instructions, one for each
group.  All of them except `::Fence` contain multiple real machine
instructions.  Atomic r-m-w and atomic c-a-s are emitted as the usual
load-linked store-conditional loops, guarded at both ends by memory fences.
Atomic loads and stores are emitted as a load preceded by a fence, and a store
followed by a fence, respectively.  The amount of fencing may be overkill, but
it reflects exactly what the SM Wasm baseline compiler for AArch64 does.

One reason to implement r-m-w and c-a-s as a single insn which is expanded
only at emission time is that we must be very careful what instructions we
allow in between the load-linked and store-conditional.  In particular, we
cannot allow *any* extra memory transactions in there, since -- particularly
on low-end hardware -- that might cause the transaction to fail, hence
deadlocking the generated code.  That implies that we can't present the LL/SC
loop to the register allocator as its constituent instructions, since it might
insert spills anywhere.  Hence we must present it as a single indivisible
unit, as we do here.  It also has the benefit of reducing the total amount of
work the RA has to do.

The only other notable feature of the r-m-w and c-a-s translations into
AArch64 code, is that they both need a scratch register internally.  Rather
than faking one up by claiming, in `get_regs` that it modifies an extra
scratch register, and having to have a dummy initialisation of it, these new
instructions (`::LLSC` and `::CAS`) simply use fixed registers in the range
x24-x28.  We rely on the RA's ability to coalesce V<-->R copies to make the
cost of the resulting extra copies zero or almost zero.  x24-x28 are chosen so
as to be call-clobbered, hence their use is less likely to interfere with long
live ranges that span calls.

One subtlety regarding the use of completely fixed input and output registers
is that we must be careful how the surrounding copy from/to of the arg/result
registers is done.  In particular, it is not safe to simply emit copies in
some arbitrary order if one of the arg registers is a real reg.  For that
reason, the arguments are first moved into virtual regs if they are not
already there, using a new method `<LowerCtx for Lower>::ensure_in_vreg`.
Again, we rely on coalescing to turn them into no-ops in the common case.

There is also a ridealong fix for the AArch64 lowering case for
`Opcode::Trapif | Opcode::Trapff`, which removes a bug in which two trap insns
in a row were generated.

In the patch as submitted there are 6 "FIXME JRS" comments, which mark things
which I believe to be correct, but for which I would appreciate a second
opinion.  Unless otherwise directed, I will remove them for the final commit
but leave the associated code/comments unchanged.
2020-08-04 09:35:50 +02:00
Alex Crichton
65eaca35dd Refactor where results of compilation are stored (#2086)
* Refactor where results of compilation are stored

This commit refactors the internals of compilation in Wasmtime to change
where results of individual function compilation are stored. Previously
compilation resulted in many maps being returned, and compilation
results generally held all these maps together. This commit instead
switches this to have all metadata stored in a `CompiledFunction`
instead of having a separate map for each item that can be stored.

The motivation for this is primarily to help out with future
module-linking-related PRs. What exactly "module level" is depends on
how we interpret modules and how many modules are in play, so it's a bit
easier for operations in wasmtime to work at the function level where
possible. This means that we don't have to pass around multiple
different maps and a function index, but instead just one map or just
one entry representing a compiled function.

Additionally this change updates where the parallelism of compilation
happens, pushing it into `wasmtime-jit` instead of `wasmtime-environ`.
This is another goal where `wasmtime-jit` will have more knowledge about
module-level pieces with module linking in play. User-facing-wise this
should be the same in terms of parallel compilation, though.

The ultimate goal of this refactoring is to make it easier for the
results of compilation to actually be a set of wasm modules. This means
we won't be able to have a map-per-metadata where the primary key is the
function index, because there will be many modules within one "object
file".

* Don't clear out fields, just don't store them

Persist a smaller set of fields in `CompilationArtifacts` instead of
trying to clear fields out and dynamically not accessing them.
2020-08-03 12:20:51 -05:00
Alex Crichton
026fb8d388 Don't re-parse wasm for debuginfo (#2085)
* Don't re-parse wasm for debuginfo

This commit updates debuginfo parsing to happen during the main
translation of the original wasm module. This avoid re-parsing the wasm
module twice (at least the section-level headers). Additionally this
ties debuginfo directly to a `ModuleTranslation` which makes it easier
to process debuginfo for nested modules in the upcoming module linking
proposal.

The changes here are summarized by taking the `read_debuginfo` function
and merging it with the main module translation that happens which is
driven by cranelift. Some new hooks were added to the module environment
trait to support this, but most of it was integrating with existing hooks.

* Fix tests in debug crate
2020-08-03 09:59:20 -05:00
Yury Delendik
42127aac4e Refactor Cache logic to include debug information (#2065)
* move caching to the CompilationArtifacts

* mv cache_config from Compiler to CompiledModule

* hash isa flags

* no cache for wasm2obj

* mv caching to wasmtime crate

* account each Compiler field when hash
2020-07-23 12:10:13 -05:00
Alex Crichton
8a04fc3cdc Refactor wasmtime's internal cache slightly (#2057)
Be more generic over the type being serialized, and remove an
intermediate struct which was only used for serialization but isn't
necessary.
2020-07-22 10:32:53 -05:00
Yury Delendik
399ee0a54c Serialize and deserialize compilation artifacts. (#2020)
* Serialize and deserialize Module
* Use bincode to serialize
* Add wasm_module_serialize; docs
* Simple tests
2020-07-21 15:05:50 -05:00
Yury Delendik
b2551bb4d0 Make wasmtime_environ::Module serializable (#2005)
* Define WasmType/WasmFuncType in the Cranelift
* Make `Module` serializable
2020-07-10 15:56:43 -05:00
Yury Delendik
091373f9b8 Removes duplicate code in src/obj.rs, crates/obj and crates/jit/object.rs (#1993)
Changes:

 -  Moves object creation code from crates/jit/object.rs to the creates/obj (as ObjectBuilder)
 -   Removes legacy crates/obj/function.rs
 -  Removes write_debugsections
2020-07-08 12:14:19 -05:00
Daiki Ueno
2ce2dd0203 wasmtime: add build-time option for parallel compilation (#1903)
When running in embedded environments, threads creation is sometimes
undesirable. This adds a feature to toggle wasmtime's internal thread
creation for parallel compilation.
2020-07-06 11:22:05 -05:00
Nick Fitzgerald
3555f97906 wasmtime: Implement table.fill
Part of #929
2020-07-02 16:59:07 -07:00
Nick Fitzgerald
bffd54c016 wasmtime: Implement global.{get,set} for externref globals (#1969)
* wasmtime: Implement `global.{get,set}` for externref globals

We use libcalls to implement these -- unlike `table.{get,set}`, for which we
create inline JIT fast paths -- because no known toolchain actually uses
externref globals.

Part of #929

* wasmtime: Enable `{extern,func}ref` globals in the API
2020-07-02 16:04:01 -05:00
Nick Fitzgerald
a2f4202800 cranelift-frontend: Add the FunctionBuilder::insert_block_after method 2020-06-30 12:00:57 -07:00
Nick Fitzgerald
8c5f59c0cf wasmtime: Implement table.get and table.set
These instructions have fast, inline JIT paths for the common cases, and only
call out to host VM functions for the slow paths. This required some changes to
`cranelift-wasm`'s `FuncEnvironment`: instead of taking a `FuncCursor` to insert
an instruction sequence within the current basic block,
`FuncEnvironment::translate_table_{get,set}` now take a `&mut FunctionBuilder`
so that they can create whole new basic blocks. This is necessary for
implementing GC read/write barriers that involve branching (e.g. checking for
null, or whether a store buffer is at capacity).

Furthermore, it required that the `load`, `load_complex`, and `store`
instructions handle loading and storing through an `r{32,64}` rather than just
`i{32,64}` addresses. This involved making `r{32,64}` types acceptable
instantiations of the `iAddr` type variable, plus a few new instruction
encodings.

Part of #929
2020-06-30 12:00:57 -07:00
Nick Fitzgerald
58bb5dd953 wasmtime: Add support for func.ref and table.grow with funcrefs
`funcref`s are implemented as `NonNull<VMCallerCheckedAnyfunc>`.

This should be more efficient than using a `VMExternRef` that points at a
`VMCallerCheckedAnyfunc` because it gets rid of an indirection, dynamic
allocation, and some reference counting.

Note that the null function reference is *NOT* a null pointer; it is a
`VMCallerCheckedAnyfunc` that has a null `func_ptr` member.

Part of #929
2020-06-24 10:08:13 -07:00
Nick Fitzgerald
ddc2ce8080 cranelift-wasm: Make FuncEnvironment::translate_ref_func take a FuncIndex
It was previously taking a raw `u32`. This change makes it more clear what index
space that index points into.
2020-06-23 16:36:10 -07:00
Nick Fitzgerald
28fccaedc4 cranelift-wasm: Pass ir::Tables into all the translate_table_* methods
This serves two purposes:

1. It ensures that we call `get_or_create_table` to ensure that the embedder
already had a chance to create the given table (although this is mostly
redundant due to validation).

2. It allows the embedder to easily get the `ir::TableData` associated with this
table, and more easily emit whatever inline JIT code to translate the table
instruction (rather than falling back to VM calls).
2020-06-23 16:36:10 -07:00
Nick Fitzgerald
bbd99c5bfa reference types: Implement the table.size and table.grow instructions (#1894)
Part of #929
2020-06-18 08:57:18 -05:00
Benjamin Bouvier
c2692ecb8a Wasmtime: allow using the experimental Cranelift x64 backend in cli;
This introduces two changes:

- first, a Cargo feature is added to make it possible to use the
Cranelift x64 backend directly from wasmtime's CLI.
- second, when passing a `cranelift-flags` parameter, and the given
parameter's name doesn't exist at the target-independent flag level, try
to set it as a target-dependent setting.

These two changes make it possible to try out the new x64 backend with:

    cargo run --features experimental_x64 -- run --cranelift-flags use_new_backend=true -- /path/to/a.wasm

Right now, this will fail because most opcodes required by the
trampolines are actually not implemented yet.
2020-06-17 17:18:46 +02:00
Nick Fitzgerald
647d2b4231 Merge pull request #1832 from fitzgen/externref-stack-maps
externref: implement stack map-based garbage collection
2020-06-15 18:26:24 -07:00
Nick Fitzgerald
7e167cae10 externref: Address review feedback 2020-06-15 15:39:26 -07:00
Nick Fitzgerald
8d671c21e2 wasmtime-runtime: Allow tables to internally hold externrefs (#1882)
This commit enables `wasmtime_runtime::Table` to internally hold elements of
either `funcref` (all that is currently supported) or `externref` (newly
introduced in this commit).

This commit updates `Table`'s API, but does NOT generally propagate those
changes outwards all the way through the Wasmtime embedding API. It only does
enough to get everything compiling and the current test suite passing. It is
expected that as we implement more of the reference types spec, we will bubble
these changes out and expose them to the embedding API.
2020-06-15 16:55:23 -05:00
Nick Fitzgerald
f30ce1fe97 externref: implement stack map-based garbage collection
For host VM code, we use plain reference counting, where cloning increments
the reference count, and dropping decrements it. We can avoid many of the
on-stack increment/decrement operations that typically plague the
performance of reference counting via Rust's ownership and borrowing system.
Moving a `VMExternRef` avoids mutating its reference count, and borrowing it
either avoids the reference count increment or delays it until if/when the
`VMExternRef` is cloned.

When passing a `VMExternRef` into compiled Wasm code, we don't want to do
reference count mutations for every compiled `local.{get,set}`, nor for
every function call. Therefore, we use a variation of **deferred reference
counting**, where we only mutate reference counts when storing
`VMExternRef`s somewhere that outlives the activation: into a global or
table. Simultaneously, we over-approximate the set of `VMExternRef`s that
are inside Wasm function activations. Periodically, we walk the stack at GC
safe points, and use stack map information to precisely identify the set of
`VMExternRef`s inside Wasm activations. Then we take the difference between
this precise set and our over-approximation, and decrement the reference
count for each of the `VMExternRef`s that are in our over-approximation but
not in the precise set. Finally, the over-approximation is replaced with the
precise set.

The `VMExternRefActivationsTable` implements the over-approximized set of
`VMExternRef`s referenced by Wasm activations. Calling a Wasm function and
passing it a `VMExternRef` moves the `VMExternRef` into the table, and the
compiled Wasm function logically "borrows" the `VMExternRef` from the
table. Similarly, `global.get` and `table.get` operations clone the gotten
`VMExternRef` into the `VMExternRefActivationsTable` and then "borrow" the
reference out of the table.

When a `VMExternRef` is returned to host code from a Wasm function, the host
increments the reference count (because the reference is logically
"borrowed" from the `VMExternRefActivationsTable` and the reference count
from the table will be dropped at the next GC).

For more general information on deferred reference counting, see *An
Examination of Deferred Reference Counting and Cycle Detection* by Quinane:
https://openresearch-repository.anu.edu.au/bitstream/1885/42030/2/hon-thesis.pdf

cc #929

Fixes #1804
2020-06-15 09:39:37 -07:00
Nick Fitzgerald
d5bdce99c7 Remove executable bit from Rust source files 2020-06-01 15:09:51 -07:00
Nick Fitzgerald
a8ee0554a9 wasmtime: Initial, partial support for externref
This is enough to get an `externref -> externref` identity function
passing.

However, `externref`s that are dropped by compiled Wasm code are (safely)
leaked. Follow up work will leverage cranelift's stack maps to resolve this
issue.
2020-06-01 15:09:51 -07:00
Nick Fitzgerald
acf8ad0df7 cranelift_wasm: expose the original Wasm function signature
In the `ModuleEnvironment::declare_signature` callback, also pass the original
Wasm function signature, so that consumers may associate this information with
each compiled function. This is often necessary because while each Wasm
signature gets compiled down into a single native signature, multiple Wasm
signatures might compile down into the same native signature, and in these cases
the original Wasm signature is required for dynamic type checking of calls.
2020-06-01 14:53:10 -07:00
Nick Fitzgerald
1898d52966 runtime: Introduce VMExternRef and VMExternData
`VMExternRef` is a reference-counted box for any kind of data that is
external and opaque to running Wasm. Sometimes it might hold a Wasmtime
thing, other times it might hold something from a Wasmtime embedder and is
opaque even to us. It is morally equivalent to `Rc<dyn Any>` in Rust, but
additionally always fits in a pointer-sized word. `VMExternRef` is
non-nullable, but `Option<VMExternRef>` is a null pointer.

The one part of `VMExternRef` that can't ever be opaque to us is the
reference count. Even when we don't know what's inside an `VMExternRef`, we
need to be able to manipulate its reference count as we add and remove
references to it. And we need to do this from compiled Wasm code, so it must
be `repr(C)`!

`VMExternRef` itself is just a pointer to an `VMExternData`, which holds the
opaque, boxed value, its reference count, and its vtable pointer.

The `VMExternData` struct is *preceded* by the dynamically-sized value boxed
up and referenced by one or more `VMExternRef`s:

```ignore
     ,-------------------------------------------------------.
     |                                                       |
     V                                                       |
    +----------------------------+-----------+-----------+   |
    | dynamically-sized value... | ref_count | value_ptr |---'
    +----------------------------+-----------+-----------+
                                 | VMExternData          |
                                 +-----------------------+
                                  ^
+-------------+                   |
| VMExternRef |-------------------+
+-------------+                   |
                                  |
+-------------+                   |
| VMExternRef |-------------------+
+-------------+                   |
                                  |
  ...                            ===
                                  |
+-------------+                   |
| VMExternRef |-------------------'
+-------------+
```

The `value_ptr` member always points backwards to the start of the
dynamically-sized value (which is also the start of the heap allocation for
this value-and-`VMExternData` pair). Because it is a `dyn` pointer, it is
fat, and also points to the value's `Any` vtable.

The boxed value and the `VMExternRef` footer are held a single heap
allocation. The layout described above is used to make satisfying the
value's alignment easy: we just need to ensure that the heap allocation used
to hold everything satisfies its alignment. It also ensures that we don't
need a ton of excess padding between the `VMExternData` and the value for
values with large alignment.
2020-06-01 14:53:10 -07:00
Nick Fitzgerald
01a92aef95 cranelift_wasm: Use the TableIndex type instead of raw u32
About half of the `FuncEnvironment::translate_table_*` methods were using the
`TableIndex` newtype, while the other half were using raw `u32`s. This commit
makes everything use `TableIndex`.
2020-06-01 14:53:10 -07:00
Chris Fallin
72e6be9342 Rework of MachInst isel, branch fixups and lowering, and block ordering.
This patch includes:

- A complete rework of the way that CLIF blocks and edge blocks are
  lowered into VCode blocks. The new mechanism in `BlockLoweringOrder`
  computes RPO over the CFG, but with a twist: it merges edge blocks intto
  heads or tails of original CLIF blocks wherever possible, and it does
  this without ever actually materializing the full nodes-plus-edges
  graph first. The backend driver lowers blocks in final order so
  there's no need to reshuffle later.

- A new `MachBuffer` that replaces the `MachSection`. This is a special
  version of a code-sink that is far more than a humble `Vec<u8>`. In
  particular, it keeps a record of label definitions and label uses,
  with a machine-pluggable `LabelUse` trait that defines various types
  of fixups (basically internal relocations).

  Importantly, it implements some simple peephole-style branch rewrites
  *inline in the emission pass*, without any separate traversals over
  the code to use fallthroughs, swap taken/not-taken arms, etc. It
  tracks branches at the tail of the buffer and can (i) remove blocks
  that are just unconditional branches (by redirecting the label), (ii)
  understand a conditional/unconditional pair and swap the conditional
  polarity when it's helpful; and (iii) remove branches that branch to
  the fallthrough PC.

  The `MachBuffer` also implements branch-island support. On
  architectures like AArch64, this is needed to allow conditional
  branches within plausibly-attainable ranges (+/- 1MB on AArch64
  specifically). It also does this inline while streaming through the
  emission, without any sort of fixpoint algorithm or later moving of
  code, by simply tracking outstanding references and "deadlines" and
  emitting an island just-in-time when we're in danger of going out of
  range.

- A rework of the instruction selector driver. This is largely following
  the same algorithm as before, but is cleaned up significantly, in
  particular in the API: the machine backend can ask for an input arg
  and get any of three forms (constant, register, producing
  instruction), indicating it needs the register or can merge the
  constant or producing instruction as appropriate. This new driver
  takes special care to emit constants right at use-sites (and at phi
  inputs), minimizing their live-ranges, and also special-cases the
  "pinned register" to avoid superfluous moves.

Overall, on `bz2.wasm`, the results are:

    wasmtime full run (compile + runtime) of bz2:

    baseline:   9774M insns, 9742M cycles, 3.918s
    w/ changes: 7012M insns, 6888M cycles, 2.958s  (24.5% faster, 28.3% fewer insns)

    clif-util wasm compile bz2:

    baseline:   2633M insns, 3278M cycles, 1.034s
    w/ changes: 2366M insns, 2920M cycles, 0.923s  (10.7% faster, 10.1% fewer insns)

    All numbers are averages of two runs on an Ampere eMAG.
2020-05-16 23:08:22 -07:00
Yury Delendik
1873c0ae46 Fix value label ranges resolution (#1572)
There was a bug how value labels were resolved, which caused some DWARF expressions not be transformed, e.g. those are in the registers.

*    Implements FIXME in expression.rs
*    Move TargetIsa from CompiledExpression structure
*    Fix expression format for GDB
*    Add tests for parsing
*    Proper logic in ValueLabelRangesBuilder::process_label
*    Tests for ValueLabelRangesBuilder
*    Refactor build_with_locals to return Iterator instead of Vec<_>
*    Misc comments and magical numbers
2020-04-30 08:07:55 -05:00
Alex Crichton
363cd2d20f Expose memory-related options in Config (#1513)
* Expose memory-related options in `Config`

This commit was initially motivated by looking more into #1501, but it
ended up balooning a bit after finding a few issues. The high-level
items in this commit are:

* New configuration options via `wasmtime::Config` are exposed to
  configure the tunable limits of how memories are allocated and such.
* The `MemoryCreator` trait has been updated to accurately reflect the
  required allocation characteristics that JIT code expects.
* A bug has been fixed in the cranelift wasm code generation where if no
  guard page was present bounds checks weren't accurately performed.

The new `Config` methods allow tuning the memory allocation
characteristics of wasmtime. Currently 64-bit platforms will reserve 6GB
chunks of memory for each linear memory, but by tweaking various config
options you can change how this is allocate, perhaps at the cost of
slower JIT code since it needs more bounds checks. The methods are
intended to be pretty thoroughly documented as to the effect they have
on the JIT code and what values you may wish to select. These new
methods have been added to the spectest fuzzer to ensure that various
configuration values for these methods don't affect correctness.

The `MemoryCreator` trait previously only allocated memories with a
`MemoryType`, but this didn't actually reflect the guarantees that JIT
code expected. JIT code is generated with an assumption about the
minimum size of the guard region, as well as whether memory is static or
dynamic (whether the base pointer can be relocated). These properties
must be upheld by custom allocation engines for JIT code to perform
correctly, so extra parameters have been added to
`MemoryCreator::new_memory` to reflect this.

Finally the fuzzing with `Config` turned up an issue where if no guard
pages present the wasm code wouldn't correctly bounds-check memory
accesses. The issue here was that with a guard page we only need to
bounds-check the first byte of access, but without a guard page we need
to bounds-check the last byte of access. This meant that the code
generation needed to account for the size of the memory operation
(load/store) and use this as the offset-to-check in the no-guard-page
scenario. I've attempted to make the various comments in cranelift a bit
more exhaustive too to hopefully make it a bit clearer for future
readers!

Closes #1501

* Review comments

* Update a comment
2020-04-29 17:10:00 -07:00
Jef
957677c6f5 Integrate Lightbeam with latest Wasmtime master (#1232)
* Implement trap info in Lightbeam

* Start using wasm-reader instead of wasmparser for parsing operators

* Update to use wasm-reader, some reductions in allocation, support source location tracking for traps, start to support multi-value

The only thing that still needs to be supported for multi-value is stack returns, but we need to make it compatible with Cranelift.

* Error when running out of registers (although we'd hope it should be impossible) instead of panicking

* WIP: Update Lightbeam to work with latest Wasmtime

* WIP: Update Lightbeam to use current wasmtime

* WIP: Migrate to new system for builtin functions

* WIP: Update Lightbeam to work with latest Wasmtime

* Remove multi_mut

* Format

* Fix some bugs around arguments, add debuginfo offset tracking

* Complete integration with new Wasmtime

* Remove commented code

* Fix formatting

* Fix warnings, remove unused dependencies

* Fix `iter` if there are too many elements, fix compilation for latest wasmtime

* Fix float arguments on stack

* Remove wasm-reader and trap info work

* Allocate stack space _before_ passing arguments, fail if we can't zero a xmm reg

* Fix stack argument offset calculation

* Fix stack arguments in Lightbeam

* Re-add WASI because it somehow got removed during rebase

* Workaround for apparent `type_alias_impl_trait`-related bug in rustdoc

* Fix breakages caused by rebase, remove module offset info as it is unrelated to wasmtime integration PR and was broken by rebase

* Add TODO comment explaining `lightbeam::ModuleContext` trait
2020-04-29 16:26:40 -07:00
Chris Fallin
b691770faa MachInst backend: pass through SourceLoc information.
This change adds SourceLoc information per instruction in a `VCode<Inst>`
container, and keeps this information up-to-date across register allocation
and branch reordering. The information is initially collected during
instruction lowering, eventually collected on the MachSection, and finally
provided to the environment that wraps the codegen crate for wasmtime.
2020-04-24 13:18:01 -07:00