* Clear affine slots when dropping a `Module`
This commit implements a resource usage optimization for Wasmtime with
the pooling instance allocator by ensuring that when a `Module` is
dropped its backing virtual memory mappings are all removed. Currently
when a `Module` is dropped it releases a strong reference to its
internal memory image but the memory image may stick around in
individual pooling instance allocator slots. When using the `Random`
allocation strategy, for example, this means that the memory images
could stick around for a long time.
While not a pressing issue this has resource usage implications for
Wasmtime. Namely removing a `Module` does not guarantee the memfd, if in
use for a memory image, is closed and deallocated within the kernel.
Unfortunately simply closing the memfd is not sufficient as well as the
mappings into the address space additionally all need to be removed for
the kernel to release the resources for the memfd. This means that to
release all kernel-level resources for a `Module` all slots which have
the memory image mapped in must have the slot reset.
This problem isn't particularly present when using the `NextAvailable`
allocation strategy since the number of lingering memfds is proportional
to the maximum concurrent size of wasm instances. With the `Random` and
`ReuseAffinity` strategies, however, it's much more prominent because
the number of lingering memfds can reach the total number of slots
available. This can appear as a leak of kernel-level memory which can
cause other system instability.
To fix this issue this commit adds necessary instrumentation to `Drop
for Module` to purge all references to the module in the pooling
instance allocator. All index allocation strategies now maintain
affinity tracking to ensure that regardless of the strategy in use a
module that is dropped will remove all its memory mappings. A new
allocation method was added to the index allocator for allocating an
index without setting affinity and only allocating affine slots. This is
used to iterate over all the affine slots without holding the global
index lock for an unnecessarily long time while mappings are removed.
* Review comments
This commit refactors the internals of `wasmtime_runtime::SharedMemory`
a bit to expose the necessary functions to invoke from the
`wasmtime::SharedMemory` layer. Notably some items are moved out of the
`RwLock` from prior, such as the type and the `VMMemoryDefinition`.
Additionally the organization around the `atomic_*` methods has been
redone to ensure that the `wasmtime`-layer abstraction has a single
method to call into which everything else uses as well.
This commit replaces `wasmtime_environ::TrapCode` with `wasmtime::Trap`.
This is possible with past refactorings which slimmed down the `Trap`
definition in the `wasmtime` crate to a simple `enum`. This means that
there's one less place that all the various trap opcodes need to be
listed in Wasmtime.
* Remove explicit `S` type parameters
This commit removes the explicit `S` type parameter on `Func::typed` and
`Instance::get_typed_func`. Historical versions of Rust required that
this be a type parameter but recent rustcs support a mixture of explicit
type parameters and `impl Trait`. This removes, at callsites, a
superfluous `, _` argument which otherwise never needs specification.
* Fix mdbook examples
This commit is an attempt at improving the safety of using the return
value of the `SharedMemory::data` method. Previously this returned
`*mut [u8]` which, while correct, is unwieldy and unsafe to work with.
The new return value of `&[UnsafeCell<u8>]` has a few advantages:
* The lifetime of the returned data is now connected to the
`SharedMemory` itself, removing the possibility for a class of errors
of accidentally using the prior `*mut [u8]` beyond its original lifetime.
* It's not possibly to safely access `.len()` as opposed to requiring an
`unsafe` dereference before.
* The data internally within the slice is now what retains the `unsafe`
bits, namely indicating that accessing any memory inside of the
contents returned is `unsafe` but addressing it is safe.
I was inspired by the `wiggle`-based discussion on #5229 and felt it
appropriate to apply a similar change here.
* Unconditionally use `MemoryImageSlot`
This commit removes the internal branching within the pooling instance
allocator to sometimes use a `MemoryImageSlot` and sometimes now.
Instead this is now unconditionally used in all situations on all
platforms. This fixes an issue where the state of a slot could get
corrupted if modules being instantiated switched from having images to
not having an image or vice versa.
The bulk of this commit is the removal of the `memory-init-cow`
compile-time feature in addition to adding Windows support to the
`cow.rs` file.
* Fix compile on Unix
* Add a stricter assertion for static memory bounds
Double-check that when a memory is allocated the configuration required
is satisfied by the pooling allocator.
I noticed this in the backtrace of something that timed out on oss-fuzz
and there's no need to include this information in trampolines, so this
removes the extra sections from being generated.
When new wasm instances are created repeatedly in high-concurrency
environments one of the largest bottlenecks is the contention on
kernel-level locks having to do with the virtual memory. It's expected
that usage in this environment is leveraging the pooling instance
allocator with the `memory-init-cow` feature enabled which means that
the kernel level VM lock is acquired in operations such as:
1. Growing a heap with `mprotect` (write lock)
2. Faulting in memory during usage (read lock)
3. Resetting a heap's contents with `madvise` (read lock)
4. Shrinking a heap with `mprotect` when reusing a slot (write lock)
Rapid usage of these operations can lead to detrimental performance
especially on otherwise heavily loaded systems, worsening the more
frequent the above operations are. This commit is aimed at addressing
the (2) case above, reducing the number of page faults that are
fulfilled by the kernel.
Currently these page faults happen for three reasons:
* When memory is first accessed after the heap is grown.
* When the initial linear memory image is accessed for the first time.
* When the initial zero'd heap contents, not part of the linear memory
image, are accessed.
This PR is attempting to address the latter of these cases, and to a
lesser extent the first case as well. Specifically this PR provides the
ability to partially reset a pooled linear memory with `memset` rather
than `madvise`. This is done to have the same effect of resetting
contents to zero but namely has a different effect on paging, notably
keeping the pages resident in memory rather than returning them to the
kernel. This means that reuse of a linear memory slot on a page that was
previously `memset` will not trigger a page fault since everything
remains paged into the process.
The end result is that any access to linear memory which has been
touched by `memset` will no longer page fault on reuse. On more recent
kernels (6.0+) this also means pages which were zero'd by `memset`, made
inaccessible with `PROT_NONE`, and then made accessible again with
`PROT_READ | PROT_WRITE` will not page fault. This can be common when a
wasm instances grows its heap slightly, uses that memory, but then it's
shrunk when the memory is reused for the next instance. Note that this
kernel optimization requires a 6.0+ kernel.
This same optimization is furthermore applied to both async stacks with
the pooling memory allocator in addition to table elements. The defaults
of Wasmtime are not changing with this PR, instead knobs are being
exposed for embedders to turn if they so desire. This is currently being
experimented with at Fastly and I may come back and alter the defaults
of Wasmtime if it seems suitable after our measurements.
This commit changes the APIs in the `wasmtime` crate for configuring the
pooling allocator. I plan on adding a few more configuration options in
the near future and the current structure was feeling unwieldy for
adding these new abstractions.
The previous `struct`-based API has been replaced with a builder-style
API in a similar shape as to `Config`. This is done to help make it
easier to add more configuration options in the future through adding
more methods as opposed to adding more field which could break prior
initializations.
This commit fixes `cargo doc -p wasmtime --no-default-features` where
previously it would fail with many broken doc links because the crate is
missing many items that links refer to. Instead they're emitted as
warnings now which while noisy should prevent the build from being
entirely usable at least.
* Return `anyhow::Error` from host functions instead of `Trap`
This commit refactors how errors are modeled when returned from host
functions and additionally refactors how custom errors work with `Trap`.
At a high level functions in Wasmtime that previously worked with
`Result<T, Trap>` now work with `Result<T>` instead where the error is
`anyhow::Error`. This includes functions such as:
* Host-defined functions in a `Linker<T>`
* `TypedFunc::call`
* Host-related callbacks like call hooks
Errors are now modeled primarily as `anyhow::Error` throughout Wasmtime.
This subsequently removes the need for `Trap` to have the ability to
represent all host-defined errors as it previously did. Consequently the
`From` implementations for any error into a `Trap` have been removed
here and the only embedder-defined way to create a `Trap` is to use
`Trap::new` with a custom string.
After this commit the distinction between a `Trap` and a host error is
the wasm backtrace that it contains. Previously all errors in host
functions would flow through a `Trap` and get a wasm backtrace attached
to them, but now this only happens if a `Trap` itself is created meaning
that arbitrary host-defined errors flowing from a host import to the
other side won't get backtraces attached. Some internals of Wasmtime
itself were updated or preserved to use `Trap::new` to capture a
backtrace where it seemed useful, such as when fuel runs out.
The main motivation for this commit is that it now enables hosts to
thread a concrete error type from a host function all the way through to
where a wasm function was invoked. Previously this could not be done
since the host error was wrapped in a `Trap` that didn't provide the
ability to get at the internals.
A consequence of this commit is that when a host error is returned that
isn't a `Trap` we'll capture a backtrace and then won't have a `Trap` to
attach it to. To avoid losing the contextual information this commit
uses the `Error::context` method to attach the backtrace as contextual
information to ensure that the backtrace is itself not lost.
This is a breaking change for likely all users of Wasmtime, but it's
hoped to be a relatively minor change to workaround. Most use cases can
likely change `-> Result<T, Trap>` to `-> Result<T>` and otherwise
explicit creation of a `Trap` is largely no longer necessary.
* Fix some doc links
* add some tests and make a backtrace type public (#55)
* Trap: avoid a trailing newline in the Display impl
which in turn ends up with three newlines between the end of the
backtrace and the `Caused by` in the anyhow Debug impl
* make BacktraceContext pub, and add tests showing downcasting behavior of anyhow::Error to traps or backtraces
* Remove now-unnecesary `Trap` downcasts in `Linker::module`
* Fix test output expectations
* Remove `Trap::i32_exit`
This commit removes special-handling in the `wasmtime::Trap` type for
the i32 exit code required by WASI. This is now instead modeled as a
specific `I32Exit` error type in the `wasmtime-wasi` crate which is
returned by the `proc_exit` hostcall. Embedders which previously tested
for i32 exits now downcast to the `I32Exit` value.
* Remove the `Trap::new` constructor
This commit removes the ability to create a trap with an arbitrary error
message. The purpose of this commit is to continue the prior trend of
leaning into the `anyhow::Error` type instead of trying to recreate it
with `Trap`. A subsequent simplification to `Trap` after this commit is
that `Trap` will simply be an `enum` of trap codes with no extra
information. This commit is doubly-motivated by the desire to always use
the new `BacktraceContext` type instead of sometimes using that and
sometimes using `Trap`.
Most of the changes here were around updating `Trap::new` calls to
`bail!` calls instead. Tests which assert particular error messages
additionally often needed to use the `:?` formatter instead of the `{}`
formatter because the prior formats the whole `anyhow::Error` and the
latter only formats the top-most error, which now contains the
backtrace.
* Merge `Trap` and `TrapCode`
With prior refactorings there's no more need for `Trap` to be opaque or
otherwise contain a backtrace. This commit parse down `Trap` to simply
an `enum` which was the old `TrapCode`. All various tests and such were
updated to handle this.
The main consequence of this commit is that all errors have a
`BacktraceContext` context attached to them. This unfortunately means
that the backtrace is printed first before the error message or trap
code, but given all the prior simplifications that seems worth it at
this time.
* Rename `BacktraceContext` to `WasmBacktrace`
This feels like a better name given how this has turned out, and
additionally this commit removes having both `WasmBacktrace` and
`BacktraceContext`.
* Soup up documentation for errors and traps
* Fix build of the C API
Co-authored-by: Pat Hickey <pat@moreproductive.org>
* Pull `Module` out of `ModuleTextBuilder`
This commit is the first in what will likely be a number towards
preparing for serializing a compiled component to bytes, a precompiled
artifact. To that end my rough plan is to merge all of the compiled
artifacts for a component into one large object file instead of having
lots of separate object files and lots of separate mmaps to manage. To
that end I plan on eventually using `ModuleTextBuilder` to build one
large text section for all core wasm modules and trampolines, meaning
that `ModuleTextBuilder` is no longer specific to one module. I've
extracted out functionality such as function name calculation as well as
relocation resolving (now a closure passed in) in preparation for this.
For now this just keeps tests passing, and the trajectory for this
should become more clear over the following commits.
* Remove component-specific object emission
This commit removes the `ComponentCompiler::emit_obj` function in favor
of `Compiler::emit_obj`, now renamed `append_code`. This involved
significantly refactoring code emission to take a flat list of functions
into `append_code` and the caller is responsible for weaving together
various "families" of functions and un-weaving them afterwards.
* Consolidate ELF parsing in `CodeMemory`
This commit moves the ELF file parsing and section iteration from
`CompiledModule` into `CodeMemory` so one location keeps track of
section ranges and such. This is in preparation for sharing much of this
code with components which needs all the same sections to get tracked
but won't be using `CompiledModule`. A small side benefit from this is
that the section parsing done in `CodeMemory` and `CompiledModule` is no
longer duplicated.
* Remove separately tracked traps in components
Previously components would generate an "always trapping" function
and the metadata around which pc was allowed to trap was handled
manually for components. With recent refactorings the Wasmtime-standard
trap section in object files is now being generated for components as
well which means that can be reused instead of custom-tracking this
metadata. This commit removes the manual tracking for the `always_trap`
functions and plumbs the necessary bits around to make components look
more like modules.
* Remove a now-unnecessary `Arc` in `Module`
Not expected to have any measurable impact on performance, but
complexity-wise this should make it a bit easier to understand the
internals since there's no longer any need to store this somewhere else
than its owner's location.
* Merge compilation artifacts of components
This commit is a large refactoring of the component compilation process
to produce a single artifact instead of multiple binary artifacts. The
core wasm compilation process is refactored as well to share as much
code as necessary with the component compilation process.
This method of representing a compiled component necessitated a few
medium-sized changes internally within Wasmtime:
* A new data structure was created, `CodeObject`, which represents
metadata about a single compiled artifact. This is then stored as an
`Arc` within a component and a module. For `Module` this is always
uniquely owned and represents a shuffling around of data from one
owner to another. For a `Component`, however, this is shared amongst
all loaded modules and the top-level component.
* The "module registry" which is used for symbolicating backtraces and
for trap information has been updated to account for a single region
of loaded code holding possibly multiple modules. This involved adding
a second-level `BTreeMap` for now. This will likely slow down
instantiation slightly but if it poses an issue in the future this
should be able to be represented with a more clever data structure.
This commit additionally solves a number of longstanding issues with
components such as compiling only one host-to-wasm trampoline per
signature instead of possibly once-per-module. Additionally the
`SignatureCollection` registration now happens once-per-component
instead of once-per-module-within-a-component.
* Fix compile errors from prior commits
* Support AOT-compiling components
This commit adds support for AOT-compiled components in the same manner
as `Module`, specifically adding:
* `Engine::precompile_component`
* `Component::serialize`
* `Component::deserialize`
* `Component::deserialize_file`
Internally the support for components looks quite similar to `Module`.
All the prior commits to this made adding the support here
(unsurprisingly) easy. Components are represented as a single object
file as are modules, and the functions for each module are all piled
into the same object file next to each other (as are areas such as data
sections). Support was also added here to quickly differentiate compiled
components vs compiled modules via the `e_flags` field in the ELF
header.
* Prevent serializing exported modules on components
The current representation of a module within a component means that the
implementation of `Module::serialize` will not work if the module is
exported from a component. The reason for this is that `serialize`
doesn't actually do anything and simply returns the underlying mmap as a
list of bytes. The mmap, however, has `.wasmtime.info` describing
component metadata as opposed to this module's metadata. While rewriting
this section could be implemented it's not so easy to do so and is
otherwise seen as not super important of a feature right now anyway.
* Fix windows build
* Fix an unused function warning
* Update crates/environ/src/compilation.rs
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
* Refactor metadata storage in AOT artifacts
This commit is a reorganization of how metadata is stored in Wasmtime's
compiled artifacts. Currently Wasmtime's ELF artifacts have data
appended after them to contain metadata about the `Engine` as well as
type information for the module itself. This extra data at the end of
the file is ignored by ELF-related utilities generally and is assembled
during the module serialization process.
In working on AOT-compiling components, though, I've discovered a number
of issues with this:
* Primarily it's possible to mistakenly change an artifact if it's
deserialized and then serialized again. This issue is probably
theoretical but the deserialized artifact records the `Engine`
configuration at time of creation but when re-serializing that it
serializes the current `Engine` state, not the original `Engine`
state.
* Additionally the serialization strategy here is tightly coupled to
`Module` and its serialization format. While this makes sense it is
not conducive for future refactorings to use a similar serialization
format for components. The engine metadata, for example, does not
necessarily need to be tied up with type information.
* The storage for this extra metadata is a bit wonky by shoving it at
the end of the ELF file. The original reason for this was to have a
compiled artifact be multiple objects concatenated with each other to
support serializing module-linking-using modules. Module linking is no
longer a thing and I have since decided that for the component model
all compilation artifacts will go into one object file to assist
debugability. This means that the extra stick-it-at-the-end is no
longer necessary.
To solve these issues this commit splits up the
`module/serialization.rs` file in two, mostly moving the logic to
`engine/serialization.rs`. The engine serialization logic now handles
everything related to `Engine` compatibility such as targets, compiler
flags, wasm features, etc. The module serialization logic is now
exclusively interested in type information.
The engine metadata and serialized type information additionally live in
sections of the final file now instead of at the end. This means that
there are three primary `bincode`-encoded sections that are parsed on
deserializing a file:
1. The `Engine`-specific metadata. This will be the same for both
modules and components.
2. The `CompiledModuleInfo` structure. For core wasm there's just one of
these but for the component model there will be multiple, one per
core wasm module.
3. The type information. For core wasm this is a `ModuleTypes` but for a
component this will be a `ComponentTypes`.
No true functional change is expected from this commit. Binary artifacts
might get inflated by a small handful of bytes due to using ELF sections
to represent this now.
A related change I made during this commit as well was the plumbing of
the `is_branch_protection_enabled` flag. This is technically
`Engine`-level metadata but I didn't want to plumb it all over the place
as was done now, so instead a new section was added to the final binary
just for this bti information. This means that it no longer needs to be
a parameter to `CodeMemory::publish` and additionally is more amenable
to a `Component`-is-just-one-object world where no single module owns
this piece of metadata.
* Exclude some functions in a cranelift-less build
* func_wrap_async typechecks
* func call async
* instantiate_async
* fixes
* async engine creation for tests
* start adding a component model test for async
* fix wrong check for async support, factor out Instance::new_started to an unchecked impl
* tests: wibbles
* component::Linker::func_wrap: replace IntoComponentFunc with directly accepting a closure
We find that this makes the Linker::func_wrap type signature much easier
to read. The IntoComponentFunc abstraction was adding a lot of weight to
"splat" a set of arguments from a tuple of types into individual
arguments to the closure. Additionally, making the StoreContextMut
argument optional, or the Result<return> optional, wasn't very
worthwhile.
* Fixes for the new style of closure required by component::Linker::func_wrap
* future of result of return
* add Linker::instantiate_async and {Typed}Func::post_return_async
* fix fuzzing generator
* note optimisation opportunity
* simplify test
* Add egraphs option to Wasmtime config, and add it to fuzzing config generation.
This PR adds a wrapper method for Cranelift's `use_egraphs` setting to
Wasmtime's `Config`, named `cranelift_use_egraphs` analogously to its
existing `cranelift_opt_level`.
Eventually this should become a no-op as egraph-based optimization
becomes the default, but until then it makes sense to expose this as
another kind of optimization option.
This PR then adds the option to the `Arbitrary`-based config generation
for fuzzing, so compilation with egraphs will be fuzzed (on its own and
against other configurations and oracles).
* Don't use `NamedTempFile` on Windows
It looks like this prevents mmap-ing since the named temporary file
holds a `File` open which conflicts with the rights we're trying to open
the file for mmap-ing. Instead use a temporary directory to try to fix
this issue.
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
* component::Linker::func_wrap: replace IntoComponentFunc with directly accepting a closure
We find that this makes the Linker::func_wrap type signature much easier
to read. The IntoComponentFunc abstraction was adding a lot of weight to
"splat" a set of arguments from a tuple of types into individual
arguments to the closure. Additionally, making the StoreContextMut
argument optional, or the Result<return> optional, wasn't very
worthwhile.
* Fixes for the new style of closure required by component::Linker::func_wrap
* fix fuzzing generator
* Plumb type exports in components around more
This commit adds some more plumbing for type exports to ensure that they
show up in the final compiled representation of a component. For now
they continued to be ignored for all purposes in the embedding API
itself but I found this useful to explore in `wit-bindgen` based tooling
which is leveraging the component parsing in Wasmtime.
* Add a field to `ModuleTranslation` to store the original wasm
This commit adds a field to be able to refer back to the original wasm
binary for a `ModuleTranslation`. This field is used in the upcoming
support for host generation in `wit-component` to "decompile" a
component into core wasm modules to get instantiated. This is used to
extract a core wasm module from the original component.
* FIx a build warning
* Add a benchmark for traps with many Wasm<-->host calls on the stack
* Add a test for expected Wasm stack traces with Wasm<--host calls on the stack when we trap
* Don't re-capture backtraces when propagating traps through host frames
This fixes some accidentally quadratic code where we would re-capture a Wasm
stack trace (takes `O(n)` time) every time we propagated a trap through a host
frame back to Wasm (can happen `O(n)` times). And `O(n) * O(n) = O(n^2)`, of
course. Whoops. After this commit, it trapping with a call stack that is `n`
frames deep of Wasm-to-host-to-Wasm calls just captures a single backtrace and
is therefore just a proper `O(n)` time operation, as it is intended to be.
Now we explicitly track whether we need to capture a Wasm backtrace or not when
raising a trap. This unfortunately isn't as straightforward as one might hope,
however, because of the split between `wasmtime::Trap` and
`wasmtime_runtime::Trap`. We need to decide whether or not to capture a Wasm
backtrace inside `wasmtime_runtime` but in order to determine whether to do that
or not we need to reflect on the `anyhow::Error` and see if it is a
`wasmtime::Trap` that already has a backtrace or not. This can't be done the
straightforward way because it would introduce a cyclic dependency between the
`wasmtime` and `wasmtime-runtime` crates. We can't merge those two `Trap`
types-- at least not without effectively merging the whole `wasmtime` and
`wasmtime-runtime` crates together, which would be a good idea in a perfect
world but would be a *ton* of ocean boiling from where we currently are --
because `wasmtime::Trap` does symbolication of stack traces which relies on
module registration information data that resides inside the `wasmtime` crate
and therefore can't be moved into `wasmtime-runtime`. We resolve this problem by
adding a boolean to `wasmtime_runtime::raise_user_trap` that controls whether we
should capture a Wasm backtrace or not, and then determine whether we need a
backtrace or not at each of that function's call sites, which are in `wasmtime`
and therefore can do the reflection to determine whether the user trap already
has a backtrace or not. Phew!
Fixes#5037
* debug assert that we don't record unnecessary backtraces for traps
* Add assertions around `needs_backtrace`
Unfortunately we can't do
debug_assert_eq!(needs_backtrace, trap.inner.backtrace.get().is_some());
because `needs_backtrace` doesn't consider whether Wasm backtraces have been
disabled via config.
* Consolidate `needs_backtrace` calculation followed by calling `raise_user_trap` into one place
* allow a ComponentTypeRef::Type to point to a component TypeDef
* component matching: don't assert exported Interface type definitions are "defined"
types may be exported by their name for consumption by some component
runtimes, but in wasmtime this doesn't matter (we lift and lower to
types, not define them) so we should ignore these.
* component-model instance tests: show that an import can export a type definition
this is meaningless, but it should be accepted. (previously rejected)
* egraph-based midend: draw the rest of the owl.
* Rename `egg` submodule of cranelift-codegen to `egraph`.
* Apply some feedback from @jsharp during code walkthrough.
* Remove recursion from find_best_node by doing a single pass.
Rather than recursively computing the lowest-cost node for a given
eclass and memoizing the answer at each eclass node, we can do a single
forward pass; because every eclass node refers only to earlier nodes,
this is sufficient. The behavior may slightly differ from the earlier
behavior because we cannot short-circuit costs to zero once a node is
elaborated; but in practice this should not matter.
* Make elaboration non-recursive.
Use an explicit stack instead (with `ElabStackEntry` entries,
alongside a result stack).
* Make elaboration traversal of the domtree non-recursive/stack-safe.
* Work analysis logic in Cranelift-side egraph glue into a general analysis framework in cranelift-egraph.
* Apply static recursion limit to rule application.
* Fix aarch64 wrt dynamic-vector support -- broken rebase.
* Topo-sort cranelift-egraph before cranelift-codegen in publish script, like the comment instructs me to!
* Fix multi-result call testcase.
* Include `cranelift-egraph` in `PUBLISHED_CRATES`.
* Fix atomic_rmw: not really a load.
* Remove now-unnecessary PartialOrd/Ord derivations.
* Address some code-review comments.
* Review feedback.
* Review feedback.
* No overlap in mid-end rules, because we are defining a multi-constructor.
* rustfmt
* Review feedback.
* Review feedback.
* Review feedback.
* Review feedback.
* Remove redundant `mut`.
* Add comment noting what rules can do.
* Review feedback.
* Clarify comment wording.
* Update `has_memory_fence_semantics`.
* Apply @jameysharp's improved loop-level computation.
Co-authored-by: Jamey Sharp <jamey@minilop.net>
* Fix suggestion commit.
* Fix off-by-one in new loop-nest analysis.
* Review feedback.
* Review feedback.
* Review feedback.
* Use `Default`, not `std::default::Default`, as per @fitzgen
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
* Apply @fitzgen's comment elaboration to a doc-comment.
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
* Add stat for hitting the rewrite-depth limit.
* Some code motion in split prelude to make the diff a little clearer wrt `main`.
* Take @jameysharp's suggested `try_into()` usage for blockparam indices.
Co-authored-by: Jamey Sharp <jamey@minilop.net>
* Take @jameysharp's suggestion to avoid double-match on load op.
Co-authored-by: Jamey Sharp <jamey@minilop.net>
* Fix suggestion (add import).
* Review feedback.
* Fix stack_load handling.
* Remove redundant can_store case.
* Take @jameysharp's suggested improvement to FuncEGraph::build() logic
Co-authored-by: Jamey Sharp <jamey@minilop.net>
* Tweaks to FuncEGraph::build() on top of suggestion.
* Take @jameysharp's suggested clarified condition
Co-authored-by: Jamey Sharp <jamey@minilop.net>
* Clean up after suggestion (unused variable).
* Fix loop analysis.
* loop level asserts
* Revert constant-space loop analysis -- edge cases were incorrect, so let's go with the simple thing for now.
* Take @jameysharp's suggestion re: result_tys
Co-authored-by: Jamey Sharp <jamey@minilop.net>
* Fix up after suggestion
* Take @jameysharp's suggestion to use fold rather than reduce
Co-authored-by: Jamey Sharp <jamey@minilop.net>
* Fixup after suggestion
* Take @jameysharp's suggestion to remove elaborate_eclass_use's return value.
* Clarifying comment in terminator insts.
Co-authored-by: Jamey Sharp <jamey@minilop.net>
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
* Update wasm-tools dependencies
This update brings in a number of features such as:
* The component model binary format and AST has been slightly adjusted
in a few locations. Names are dropped from parameters/results now in
the internal representation since they were not used anyway. At this
time the ability to bind a multi-return function has not been exposed.
* The `wasmparser` validator pass will now share allocations with prior
functions, providing what's probably a very minor speedup for Wasmtime
itself.
* The text format for many component-related tests now requires named
parameters.
* Some new relaxed-simd instructions are updated to be ignored.
I hope to have a follow-up to expose the multi-return ability to the
embedding API of components.
* Update audit information for new crates
* Leverage Cargo's workspace inheritance feature
This commit is an attempt to reduce the complexity of the Cargo
manifests in this repository with Cargo's workspace-inheritance feature
becoming stable in Rust 1.64.0. This feature allows specifying fields in
the root workspace `Cargo.toml` which are then reused throughout the
workspace. For example this PR shares definitions such as:
* All of the Wasmtime-family of crates now use `version.workspace =
true` to have a single location which defines the version number.
* All crates use `edition.workspace = true` to have one default edition
for the entire workspace.
* Common dependencies are listed in `[workspace.dependencies]` to avoid
typing the same version number in a lot of different places (e.g. the
`wasmparser = "0.89.0"` is now in just one spot.
Currently the workspace-inheritance feature doesn't allow having two
different versions to inherit, so all of the Cranelift-family of crates
still manually specify their version. The inter-crate dependencies,
however, are shared amongst the root workspace.
This feature can be seen as a method of "preprocessing" of sorts for
Cargo manifests. This will help us develop Wasmtime but shouldn't have
any actual impact on the published artifacts -- everything's dependency
lists are still the same.
* Fix wasi-crypto tests
* Initial forward-edge CFI implementation
Give the user the option to start all basic blocks that are targets
of indirect branches with the BTI instruction introduced by the
Branch Target Identification extension to the Arm instruction set
architecture.
Copyright (c) 2022, Arm Limited.
* Refactor `from_artifacts` to avoid second `make_executable` (#1)
This involves "parsing" twice but this is parsing just the header of an
ELF file so it's not a very intensive operation and should be ok to do
twice.
* Address the code review feedback
Copyright (c) 2022, Arm Limited.
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
This commit replaces #4869 and represents the actual version bump that
should have happened had I remembered to bump the in-tree version of
Wasmtime to 1.0.0 prior to the branch-cut date. Alas!
* components: Limit the recursive size of types in Wasmtime
This commit is aimed at fixing #4814 by placing a hard limit on the
maximal recursive depth a type may have in the component model. The
component model theoretically allows for infinite recursion but many
various types of operations within the component model are naturally
written as recursion over the structure of a type which can lead to
stack overflow with deeply recursive types. Some examples of recursive
operations are:
* Lifting and lowering a type - currently the recursion here is modeled
in Rust directly with `#[derive]` implementations as well as the
implementations for the `Val` type.
* Compilation of adapter trampolines which iterates over the type
structure recursively.
* Historically many various calculations like the size of a type, the
flattened representation of a type, etc, were all done recursively.
Many of these are more efficiently done via other means but it was
still natural to implement these recursively initially.
By placing a hard limit on type recursion Wasmtime won't be able to load
some otherwise-valid modules. The hope, though, is that no human-written
program is likely to ever reach this limit. This limit can be revised
and/or the locations with recursion revised if it's ever reached.
The implementation of this feature is done by generalizing the current
flattened-representation calculation which now keeps track of a type's
depth and size. The size calculation isn't used just yet but I plan to
use it in fixing #4816 and it was natural enough to write here as well.
The depth is checked after a type is translated and if it exceeds the
maximum then an error is returned.
Additionally the `Arbitrary for Type` implementation was updated to
prevent generation of a type that's too-recursive.
Closes#4814
* Remove unused size calculation
* Bump up just under the limit
We were previously implicitly assuming that all Wasm frames in a stack used the
same `VMRuntimeLimits` as the previous frame we walked, but this is not true
when Wasm in store A calls into the host which then calls into Wasm in store B:
| ... |
| Host | |
+-----------------+ | stack
| Wasm in store A | | grows
+-----------------+ | down
| Host | |
+-----------------+ |
| Wasm in store B | V
+-----------------+
Trying to walk this stack would previously result in a runtime panic.
The solution is to push the maintenance of our list of saved Wasm FP/SP/PC
registers that allow us to identify contiguous regions of Wasm frames on the
stack deeper into `CallThreadState`. The saved registers list is now maintained
whenever updating the `CallThreadState` linked list by making the
`CallThreadState::prev` field private and only accessible via a getter and
setter, where the setter always maintains our invariants.
* Upgrade wasm-tools crates, namely the component model
This commit pulls in the latest versions of all of the `wasm-tools`
family of crates. There were two major changes that happened in
`wasm-tools` in the meantime:
* bytecodealliance/wasm-tools#697 - this commit introduced a new API for
more efficiently reading binary operators from a wasm binary. The old
`Operator`-based reading was left in place, however, and continues to
be what Wasmtime uses. I hope to update Wasmtime in a future PR to use
this new API, but for now the biggest change is...
* bytecodealliance/wasm-tools#703 - this commit was a major update to
the component model AST. This commit almost entirely deals with the
fallout of this change.
The changes made to the component model were:
1. The `unit` type no longer exists. This was generally a simple change
where the `Unit` case in a few different locations were all removed.
2. The `expected` type was renamed to `result`. This similarly was
relatively lightweight and mostly just a renaming on the surface. I
took this opportunity to rename `val::Result` to `val::ResultVal` and
`types::Result` to `types::ResultType` to avoid clashing with the
standard library types. The `Option`-based types were handled with
this as well.
3. The payload type of `variant` and `result` types are now optional.
This affected many locations that calculate flat type
representations, ABI information, etc. The `#[derive(ComponentType)]`
macro now specifically handles Rust-defined `enum` types which have
no payload to the equivalent in the component model.
4. Functions can now return multiple parameters. This changed the
signature of invoking component functions because the return value is
now bound by `ComponentNamedList` (renamed from `ComponentParams`).
This had a large effect in the tests, fuzz test case generation, etc.
5. Function types with 2-or-more parameters/results must uniquely name
all parameters/results. This mostly affected the text format used
throughout the tests.
I haven't added specifically new tests for multi-return but I changed a
number of tests to use it. Additionally I've updated the fuzzers to all
exercise multi-return as well so I think we should get some good
coverage with that.
* Update version numbers
* Use crates.io
* Fix a compile error on nightly Rust
It looks like Rust nightly has gotten a bit more strict about
attributes-on-expressions and previously accepted code is no longer
accepted. This commit updates the generated code for a macro to a form
which is accepted by rustc.
* Fix a soundness issue with lowering variants
This commit fixes a soundness issue lowering variants in the component
model where host memory could be leaked to the guest module by accident.
In reviewing code recently for `Val::lower` I noticed that the variant
lowering was extending the payload with `ValRaw::u32(0)` to
appropriately fit the size of the variant. In reading this it appeared
incorrect to me due to the fact that it should be `ValRaw::u64(0)` since
up to 64-bits can be read. Additionally this implementation was also
incorrect because the lowered representation of the payload itself was
not possibly zero-extended to 64-bits to accommodate other variants.
It turned out these issues were benign because with the dynamic
surface area to the component model the arguments were all initialized
to 0 anyway. The static version of the API, however, does not initialize
arguments to 0 and I wanted to initially align these two implementations
so I updated the variant implementation of lowering for dynamic values
and removed the zero-ing of arguments.
To test this change I updated the `debug` mode of adapter module
generation to assert that the upper bits of values in wasm are always
zero when the value is casted down (during `stack_get` which only
happens with variants). I then threaded through the `debug` boolean
configuration parameter into the dynamic and static fuzzers.
To my surprise this new assertion tripped even after the fix was
applied. It turns out, though, that there was other leakage of bits
through other means that I was previously unaware of. At the primitive
level lowerings of types like `u32` will have a `Lower` representation
of `ValRaw` and the lowering is simply `dst.write(ValRaw::i32(self))`,
or the equivalent thereof. The problem, that the fuzzers detected, with
this pattern is that the `ValRaw` type is 16-bytes, and
`ValRaw::i32(X)` only initializes the first 4. This meant that all the
lowerings for all primitives were writing up to 12 bytes of garbage from
the host for the wasm module to read.
It turned out that this write of a `ValRaw` was sometimes 16 bytes and
sometimes the appropriate size depending on the number of optimizations
in play. With enough inlining for example `dst.write(ValRaw::i32(self))`
would only write 4 bytes, as expected. In debug mode though without
inlining 16 bytes would be written, including the garbage from the upper
bits.
To solve this issue I ended up taking a somewhat different approach. I
primarily updated the `ValRaw` constructors to simply always extend the
values internally to 64-bits, meaning that the low 8 bytes of a `ValRaw`
is always initialized. This prevents any undefined data from leaking
from the host into a wasm module, and means that values are also
zero-extended even if they're only used in 32-bit contexts outside of a
variant. This felt like the best fix for now, though, in terms of
not really having a performance impact while additionally not requiring
a rewrite of all lowerings.
This solution ended up also neatly removing the "zero out the entire
payload" logic that was previously require. Now after a payload is
lowered only the tail end of the payload, up to the size of the variant,
is zeroed out. This means that each lowered argument is written to at
most once which should hopefully be a small performance boost for
calling into functions as well.
It looks like Rust nightly has gotten a bit more strict about
attributes-on-expressions and previously accepted code is no longer
accepted. This commit updates the generated code for a macro to a form
which is accepted by rustc.
* Optimize flat type representation calculations
Previously calculating the flat type representation would be done
recursively for an entire type tree every time it was visited.
Additionally the flat type representation was entirely built only to be
thrown away if it was too large at the end. This chiefly presented a
source of recursion based on the type structure in the component model
which fuzzing does not like as it reports stack overflows.
This commit overhauls the representation of flat types in Wasmtime by
caching the representation for each type in the compile-time
`ComponentTypesBuilder` structure. This avoids recalculating each time
the flat representation is queried and additionally allows opportunity
to have more short-circuiting to avoid building overly-large vectors.
* Remove duplicate flat count calculation in wasmtime
Roughly share the infrastructure in the `wasmtime-environ` crate, namely
the non-recursive and memoizing nature of the calculation.
* Fix component fuzz build
* Fix example compile
A `GtU` condition needed to actually be `GeU`, as the comment right
above it stated but apparently I forgot to translate the comment to
actual code. This fixes a fuzz bug that arose from oss-fuzz over the
weekend.
* Rename `MmapVec::drain` to `split_off`
As suggested on #4609
* Fix tests
* Make MmapVec::split_off work like Vec::split_off
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
This is the implementation of https://github.com/bytecodealliance/wasmtime/issues/4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and trait object to provide a backend for an all-included experience in Wasmtime.
After the suggestion of Chris, `Function` has been split into mostly two parts:
- on the one hand, `FunctionStencil` contains all the fields required during compilation, and that act as a compilation cache key: if two function stencils are the same, then the result of their compilation (`CompiledCodeBase<Stencil>`) will be the same. This makes caching trivial, as the only thing to cache is the `FunctionStencil`.
- on the other hand, `FunctionParameters` contain the... function parameters that are required to finalize the result of compilation into a `CompiledCode` (aka `CompiledCodeBase<Final>`) with proper final relocations etc., by applying fixups and so on.
Most changes are here to accomodate those requirements, in particular that `FunctionStencil` should be `Hash`able to be used as a key in the cache:
- most source locations are now relative to a base source location in the function, and as such they're encoded as `RelSourceLoc` in the `FunctionStencil`. This required changes so that there's no need to explicitly mark a `SourceLoc` as the base source location, it's automatically detected instead the first time a non-default `SourceLoc` is set.
- user-defined external names in the `FunctionStencil` (aka before this patch `ExternalName::User { namespace, index }`) are now references into an external table of `UserExternalNameRef -> UserExternalName`, present in the `FunctionParameters`, and must be explicitly declared using `Function::declare_imported_user_function`.
- some refactorings have been made for function names:
- `ExternalName` was used as the type for a `Function`'s name; while it thus allowed `ExternalName::Libcall` in this place, this would have been quite confusing to use it there. Instead, a new enum `UserFuncName` is introduced for this name, that's either a user-defined function name (the above `UserExternalName`) or a test case name.
- The future of `ExternalName` is likely to become a full reference into the `FunctionParameters`'s mapping, instead of being "either a handle for user-defined external names, or the thing itself for other variants". I'm running out of time to do this, and this is not trivial as it implies touching ISLE which I'm less familiar with.
The cache computes a sha256 hash of the `FunctionStencil`, and uses this as the cache key. No equality check (using `PartialEq`) is performed in addition to the hash being the same, as we hope that this is sufficient data to avoid collisions.
A basic fuzz target has been introduced that tries to do the bare minimum:
- check that a function successfully compiled and cached will be also successfully reloaded from the cache, and returns the exact same function.
- check that a trivial modification in the external mapping of `UserExternalNameRef -> UserExternalName` hits the cache, and that other modifications don't hit the cache.
- This last check is less efficient and less likely to happen, so probably should be rethought a bit.
Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip.
Some numbers show that for a large wasm module we're using internally, this is a 20% compile-time speedup, because so many `FunctionStencil`s are the same, even within a single module. For a group of modules that have a lot of code in common, we get hit rates up to 70% when they're used together. When a single function changes in a wasm module, every other function is reloaded; that's still slower than I expect (between 10% and 50% of the overall compile time), so there's likely room for improvement.
Fixes#4155.
This commit fixes#4600 in a somewhat roundabout fashion. Currently the
`main` branch of Wasmtime exhibits unusual behavior:
* If `./ci/run-tests.sh` is run then the `cache_accounts_for_opt_level`
test does not fail.
* If `cargo test -p wasmtime --lib` is run, however, then the test
fails.
This test is indeed being run as part of `./ci/run-tests.sh` and it's
also passing in CI. The exact failure is that part of the debuginfo
support we have takes an existing ELF image, copies it, and then appends
some information to inform profilers/gdb about the image. This code is
all quite old at this point and not 100% optimal, but that's at least
where we're at.
The problem is that the appended `ProgramHeader64` is not aligned
correctly during `cargo test -p wasmtime --lib`, which is the panic that
happens causing the test to fail. The reason, however, that this test
passes with `./ci/run-tests.sh` is that the alignment of
`ProgramHeader64` is 1 instead of 8. The reason for that is that the
`object` crate has an `unaligned` feature which forcibly unaligns all
primitives to 1 byte instead of their natural alignment. During `cargo
test -p wasmtime --lib` this feature is not enabled but during
`./ci/run-tests.sh` this feature is enabled. The feature is currently
enabled through inclusion of the `backtrace` crate which only happens
for some tests in some crates.
The alignment issue explains why the test fails on a single crate test
but fails on the whole workspace tests. The next issue I investigated
was if this test ever passed. It turns out that on v0.39.0 this test
passed, and the regression to main was introduced during #4571. That
PR, however, has nothing to do with any of this! The reason that this
showed up as causing a "regression" however is because it changed
cranelift settings which changed the size of serialized metadata at the
end of a Wasmtime cache object.
Wasmtime compiled artifacts are ELF images with Wasmtime-specific
metadata appended after them. This appended metadata was making its way
all the way through to the gdbjit image itself which mean that while the
end of the ELF file itself was properly aligned the space after the
Wasmtime metadata was not aligned. This metadata changes in size over
time as Cranelift settings change which explains why #4571 was the
"source" of the regression.
The fix in this commit is to discard the extra Wasmtime metadata when
creating an `MmapVec` representing the underlying ELF image. This is
already supported with `MmapVec::drain` so it was relatively easy to
insert that. This means that the gdbjit image starts with just the ELF
file itself which is always aligned at the end, which gets the test
passing with/without the `unaligned` feature in the `object` crate.
This commit is an effort to reduce the amount of complexity around
managing the size/alignment calculations of types in the canonical ABI.
Previously the logic for the size/alignment of a type was spread out
across a number of locations. While each individual calculation is not
really the most complicated thing in the world having the duplication in
so many places was constantly worrying me.
I've opted in this commit to centralize all of this within the runtime
at least, and now there's only one "duplicate" of this information in
the fuzzing infrastructure which is to some degree less important to
deduplicate. This commit introduces a new `CanonicalAbiInfo` type to
house all abi size/align information for both memory32 and memory64.
This new type is then used pervasively throughout fused adapter
compilation, dynamic `Val` management, and typed functions. This type
was also able to reduce the complexity of the macro-generated code
meaning that even `wasmtime-component-macro` is performing less math
than it was before.
One other major feature of this commit is that this ABI information is
now saved within a `ComponentTypes` structure. This avoids recursive
querying of size/align information frequently and instead effectively
caching it. This was a worry I had for the fused adapter compiler which
frequently sought out size/align information and would recursively
descend each type tree each time. The `fact-valid-module` fuzzer is now
nearly 10x faster in terms of iterations/s which I suspect is due to
this caching.
The spec was expected to change to not bounds-check 0-byte lists/strings
but has since been updated to match `memory.copy` which does indeed
check the pointer for 0-byte copies.
This method configures whether native unwind information (e.g. `.eh_frame` on
Linux) is generated or not.
This helps integrate with third-party stack capturing tools, such as the system
unwinder or the `backtrace` crate. It does not affect whether Wasmtime can
capture stack traces in Wasm code that it is running or not.
Unwind info is always enabled on Windows, since the Windows ABI requires it.
This configuration option defaults to true.
Additionally, we deprecate `Config::wasm_backtrace` since we can always cheaply
capture stack traces ever since
https://github.com/bytecodealliance/wasmtime/pull/4431.
Fixes https://github.com/bytecodealliance/wasmtime/issues/4554
* Improve the `component_api` fuzzer on a few dimensions
* Update the generated component to use an adapter module. This involves
two core wasm instances communicating with each other to test that
data flows through everything correctly. The intention here is to fuzz
the fused adapter compiler. String encoding options have been plumbed
here to exercise differences in string encodings.
* Use `Cow<'static, ...>` and `static` declarations for each static test
case to try to cut down on rustc codegen time.
* Add `Copy` to derivation of fuzzed enums to make `derive(Clone)`
smaller.
* Use `Store<Box<dyn Any>>` to try to cut down on codegen by
monomorphizing fewer `Store<T>` implementation.
* Add debug logging to print out what's flowing in and what's flowing
out for debugging failures.
* Improve `Debug` representation of dynamic value types to more closely
match their Rust counterparts.
* Fix a variant issue with adapter trampolines
Previously the offset of the payload was calculated as the discriminant
aligned up to the alignment of a singular case, but instead this needs
to be aligned up to the alignment of all cases to ensure all cases start
at the same location.
* Fix a copy/paste error when copying masked integers
A 32-bit load was actually doing a 16-bit load by accident since it was
copied from the 16-bit load-and-mask case.
* Fix f32/i64 conversions in adapter modules
The adapter previously erroneously converted the f32 to f64 and then to
i64, where instead it should go from f32 to i32 to i64.
* Fix zero-sized flags in adapter modules
This commit corrects the size calculation for zero-sized flags in
adapter modules.
cc #4592
* Fix a variant size calculation bug in adapters
This fixes the same issue found with variants during normal host-side
fuzzing earlier where the size of a variant needs to align up the
summation of the discriminant and the maximum case size.
* Implement memory growth in libc bump realloc
Some fuzz-generated test cases are copying lists large enough to exceed
one page of memory so bake in a `memory.grow` to the bump allocator as
well.
* Avoid adapters of exponential size
This commit is an attempt to avoid adapters being exponentially sized
with respect to the type hierarchy of the input. Previously all
adaptation was done inline within each adapter which meant that if
something was structured as `tuple<T, T, T, T, ...>` the translation of
`T` would be inlined N times. For very deeply nested types this can
quickly create an exponentially sized adapter with types of the form:
(type $t0 (list u8))
(type $t1 (tuple $t0 $t0))
(type $t2 (tuple $t1 $t1))
(type $t3 (tuple $t2 $t2))
;; ...
where the translation of `t4` has 8 different copies of translating
`t0`.
This commit changes the translation of types through memory to almost
always go through a helper function. The hope here is that it doesn't
lose too much performance because types already reside in memory.
This can still lead to exponentially sized adapter modules to a lesser
degree where if the translation all happens on the "stack", e.g. via
`variant`s and their flat representation then many copies of one
translation could still be made. For now this commit at least gets the
problem under control for fuzzing where fuzzing doesn't trivially find
type hierarchies that take over a minute to codegen the adapter module.
One of the main tricky parts of this implementation is that when a
function is generated the index that it will be placed at in the final
module is not known at that time. To solve this the encoded form of the
`Call` instruction is saved in a relocation-style format where the
`Call` isn't encoded but instead saved into a different area for
encoding later. When the entire adapter module is encoded to wasm these
pseudo-`Call` instructions are encoded as real instructions at that
time.
* Fix some memory64 issues with string encodings
Introduced just before #4623 I had a few mistakes related to 64-bit
memories and mixing 32/64-bit memories.
* Actually insert into the `translate_mem_funcs` map
This... was the whole point of having the map!
* Assert memory growth succeeds in bump allocator
* Implement strings in adapter modules
This commit is a hefty addition to Wasmtime's support for the component
model. This implements the final remaining type (in the current type
hierarchy) unimplemented in adapter module trampolines: strings. Strings
are the most complicated type to implement in adapter trampolines
because they are highly structured chunks of data in memory (according
to specific encodings). Additionally each lift/lower operation can
choose its own encoding for strings meaning that Wasmtime, the host, may
have to convert between any pairwise ordering of string encodings.
The `CanonicalABI.md` in the component-model repo in general specifies
all the fiddly bits of string encoding so there's not a ton of wiggle
room for Wasmtime to get creative. This PR largely "just" implements
that. The high-level architecture of this implementation is:
* Fused adapters are first identified to determine src/dst string
encodings. This statically fixes what transcoding operation is being
performed.
* The generated adapter will be responsible for managing calls to
`realloc` and performing bounds checks. The adapter itself does not
perform memory copies or validation of string contents, however.
Instead each transcoding operation is modeled as an imported function
into the adapter module. This means that the adapter module
dynamically, during compile time, determines what string transcoders
are needed. Note that an imported transcoder is not only parameterized
over the transcoding operation but additionally which memory is the
source and which is the destination.
* The imported core wasm functions are modeled as a new
`CoreDef::Transcoder` structure. These transcoders end up being small
Cranelift-compiled trampolines. The Cranelift-compiled trampoline will
load the actual base pointer of memory and add it to the relative
pointers passed as function arguments. This trampoline then calls a
transcoder "libcall" which enters Rust-defined functions for actual
transcoding operations.
* Each possible transcoding operation is implemented in Rust with a
unique name and a unique signature depending on the needs of the
transcoder. I've tried to document inline what each transcoder does.
This means that the `Module::translate_string` in adapter modules is by
far the largest translation method. The main reason for this is due to
the management around calling the imported transcoder functions in the
face of validating string pointer/lengths and performing the dance of
`realloc`-vs-transcode at the right time. I've tried to ensure that each
individual case in transcoding is documented well enough to understand
what's going on as well.
Additionally in this PR is a full implementation in the host for the
`latin1+utf16` encoding which means that both lifting and lowering host
strings now works with this encoding.
Currently the implementation of each transcoder function is likely far
from optimal. Where possible I've leaned on the standard library itself
and for latin1-related things I'm leaning on the `encoding_rs` crate. I
initially tried to implement everything with `encoding_rs` but was
unable to uniformly do so easily. For now I settled on trying to get a
known-correct (even in the face of endianness) implementation for all of
these transcoders. If an when performance becomes an issue it should be
possible to implement more optimized versions of each of these
transcoding operations.
Testing this commit has been somewhat difficult and my general plan,
like with the `(list T)` type, is to rely heavily on fuzzing to cover
the various cases here. In this PR though I've added a simple test that
pushes some statically known strings through all the pairs of encodings
between source and destination. I've attempted to pick "interesting"
strings that one way or another stress the various paths in each
transcoding operation to ideally get full branch coverage there.
Additionally a suite of "negative" tests have also been added to ensure
that validity of encoding is actually checked.
* Fix a temporarily commented out case
* Fix wasmtime-runtime tests
* Update deny.toml configuration
* Add `BSD-3-Clause` for the `encoding_rs` crate
* Remove some unused licenses
* Add an exemption for `encoding_rs` for now
* Split up the `translate_string` method
Move out all the closures and package up captured state into smaller
lists of arguments.
* Test out-of-bounds for zero-length strings
* Add a dataflow-based representation of components
This commit updates the inlining phase of compiling a component to
creating a dataflow-based representation of a component instead of
creating a final `Component` with a linear list of initializers. This
dataflow graph is then linearized in a final step to create the actual
final `Component`.
The motivation for this commit stems primarily from my work implementing
strings in fused adapters. In doing this my plan is to defer most
low-level transcoding to the host itself rather than implementing that
in the core wasm adapter modules. This means that small
cranelift-generated trampolines will be used for adapter modules to call
which then call "transcoding libcalls". The cranelift-generated
trampolines will get raw pointers into linear memory and pass those to
the libcall which core wasm doesn't have access to when passing
arguments to an import.
Implementing this with the previous representation of a `Component` was
becoming too tricky to bear. The initialization of a transcoder needed
to happen at just the right time: before the adapter module which needed
it was instantiated but after the linear memories referenced had been
extracted into the `VMComponentContext`. The difficulty here is further
compounded by the current adapter module injection pass already being
quite complicated. Adapter modules are already renumbering the index
space of runtime instances and shuffling items around in the
`GlobalInitializer` list. Perhaps the worst part of this was that
memories could already be referenced by host function imports or exports
to the host, and if adapters referenced the same memory it shouldn't be
referenced twice in the component. This meant that `ExtractMemory`
initializers ideally needed to be shuffled around in the initializer
list to happen as early as possible instead of wherever they happened to
show up during translation.
Overall I did my best to implement the transcoders but everything always
came up short. I have decided to throw my hands up in the air and try a
completely different approach to this, namely the dataflow-based
representation in this commit. This makes it much easier to edit the
component after initial translation for injection of adapters, injection
of transcoders, adding dependencies on possibly-already-existing items,
etc. The adapter module partitioning pass in this commit was greatly
simplified to something which I believe is functionally equivalent but
is probably an order of magnitude easier to understand.
The biggest downside of this representation I believe is having a
duplicate representation of a component. The `component::info` was
largely duplicated into the `component::dfg` module in this commit.
Personally though I think this is a more appropriate tradeoff than
before because it's very easy to reason about "convert representation A
to B" code whereas it was very difficult to reason about shuffling
around `GlobalInitializer` items in optimal fashions. This may also have
a cost at compile-time in terms of shuffling data around, but my hope is
that we have lots of other low-hanging fruit to optimize if it ever
comes to that which allows keeping this easier-to-understand
representation.
Finally, to reiterate, the final representation of components is not
changed by this PR. To the runtime internals everything is still the
same.
* Fix compile of factc
This addresses #4307.
For the static API we generate 100 arbitrary test cases at build time, each of
which includes 0-5 parameter types, a result type, and a WAT fragment containing
an imported function and an exported function. The exported function calls the
imported function, which is implemented by the host. At runtime, the fuzz test
selects a test case at random and feeds it zero or more sets of arbitrary
parameters and results, checking that values which flow host-to-guest and
guest-to-host make the transition unchanged.
The fuzz test for the dynamic API follows a similar pattern, the only difference
being that test cases are generated at runtime.
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
Give the user the option to sign and to authenticate function
return addresses with the operations introduced by the Pointer
Authentication extension to the Arm instruction set architecture.
Copyright (c) 2021, Arm Limited.