Add support for `poll_oneoff` calls which just sleep on a relative
timeout. This fixes a bug handling code compiled with WASI libc's `sleep`
family of functions, which call `poll_oneoff` with a `CLOCK_REALTIME`
timer, which wasn't previously implemented.
This bumps target-lexicon and adds support for the AppleAarch64 calling
convention. Specifically for WebAssembly support, we only have to worry
about the new stack slots convention. Stack slots don't need to be at
least 8-bytes, they can be as small as the data type's size. For
instance, if we need stack slots for (i32, i32), they can be located at
offsets (+0, +4). Note that they still need to be properly aligned on
the data type they're containing, though, so if we need stack slots for
(i32, i64), we can't start the i64 slot at the +4 offset (it must start
at the +8 offset).
Added one test that was failing on the Mac M1, as well as other tests
stressing different yet similar situations.
* Add assert to `StackPool::deallocate` to ensure the fiber stack given to it
comes from the pool.
* Remove outdated comment about windows and stacks as the allocator now returns
fiber stacks.
* Remove conditional compilation around `stack_size` in the allocators as it
was just clutter.
This commit splits out a `FiberStack` from `Fiber`, allowing the instance
allocator trait to return `FiberStack` rather than raw stack pointers. This
keeps the stack creation mostly in `wasmtime_fiber`, but now the on-demand
instance allocator can make use of it.
The instance allocators no longer have to return a "not supported" error to
indicate that the store should allocate its own fiber stack.
This includes a bunch of cleanup in the instance allocator to scope stacks to
the new "async" feature in the runtime.
Closes#2708.
* Switch macOS to using mach ports for trap handling
This commit moves macOS to using mach ports instead of signals for
handling traps. The motivation for this is listed in #2456, namely that
once mach ports are used in a process that means traditional UNIX signal
handlers won't get used. This means that if Wasmtime is integrated with
Breakpad, for example, then Wasmtime's trap handler never fires and
traps don't work.
The `traphandlers` module is refactored as part of this commit to split
the platform-specific bits into their own files (it was growing quite a
lot for one inline `cfg_if!`). The `unix.rs` and `windows.rs` files
remain the same as they were before with a few minor tweaks for some
refactored interfaces. The `macos.rs` file is brand new and lifts almost
its entire implementation from SpiderMonkey, adapted for Wasmtime
though.
The main gotcha with mach ports is that a separate thread is what
services the exception. Some unsafe magic allows this separate thread to
read non-`Send` and temporary state from other threads, but is hoped to
be safe in this context. The unfortunate downside is that calling wasm
on macOS now involves taking a global lock and modifying a global hash
map twice-per-call. I'm not entirely sure how to get out of this cost
for now, but hopefully for any embeddings on macOS it's not the end of
the world.
Closes#2456
* Add a sketch of arm64 apple support
* store: maintain CallThreadState mapping when switching fibers
* cranelift/aarch64: generate unwind directives to disable pointer auth
Aarch64 post ARMv8.3 has a feature called pointer authentication,
designed to fight ROP/JOP attacks: some pointers may be signed using new
instructions, adding payloads to the high (previously unused) bits of
the pointers. More on this here: https://lwn.net/Articles/718888/
Unwinders on aarch64 need to know if some pointers contained on the call
frame contain an authentication code or not, to be able to properly
authenticate them or use them directly. Since native code may have
enabled it by default (as is the case on the Mac M1), and the default is
that this configuration value is inherited, we need to explicitly
disable it, for the only kind of supported pointers (return addresses).
To do so, we set the value of a non-existing dwarf pseudo register (34)
to 0, as documented in
https://github.com/ARM-software/abi-aa/blob/master/aadwarf64/aadwarf64.rst#note-8.
This is done at the function granularity, in the spirit of Cranelift
compilation model. Alternatively, a single directive could be generated
in the CIE, generating less information per module.
* Make exception handling work on Mac aarch64 too
* fibers: use a breakpoint instruction after the final call in wasmtime_fiber_start
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
* Redo the statically typed `Func` API
This commit reimplements the `Func` API with respect to statically typed
dispatch. Previously `Func` had a `getN` and `getN_async` family of
methods which were implemented for 0 to 16 parameters. The return value
of these functions was an `impl Fn(..)` closure with the appropriate
parameters and return values.
There are a number of downsides with this approach that have become
apparent over time:
* The addition of `*_async` doubled the API surface area (which is quite
large here due to one-method-per-number-of-parameters).
* The [documentation of `Func`][old-docs] are quite verbose and feel
"polluted" with all these getters, making it harder to understand the
other methods that can be used to interact with a `Func`.
* These methods unconditionally pay the cost of returning an owned `impl
Fn` with a `'static` lifetime. While cheap, this is still paying the
cost for cloning the `Store` effectively and moving data into the
closed-over environment.
* Storage of the return value into a struct, for example, always
requires `Box`-ing the returned closure since it otherwise cannot be
named.
* Recently I had the desire to implement an "unchecked" path for
invoking wasm where you unsafely assert the type signature of a wasm
function. Doing this with today's scheme would require doubling
(again) the API surface area for both async and synchronous calls,
further polluting the documentation.
The main benefit of the previous scheme is that by returning a `impl Fn`
it was quite easy and ergonomic to actually invoke the function. In
practice, though, examples would often have something akin to
`.get0::<()>()?()?` which is a lot of things to interpret all at once.
Note that `get0` means "0 parameters" yet a type parameter is passed.
There's also a double function invocation which looks like a lot of
characters all lined up in a row.
Overall, I think that the previous design is starting to show too many
cracks and deserves a rewrite. This commit is that rewrite.
The new design in this commit is to delete the `getN{,_async}` family of
functions and instead have a new API:
impl Func {
fn typed<P, R>(&self) -> Result<&Typed<P, R>>;
}
impl Typed<P, R> {
fn call(&self, params: P) -> Result<R, Trap>;
async fn call_async(&self, params: P) -> Result<R, Trap>;
}
This should entirely replace the current scheme, albeit by slightly
losing ergonomics use cases. The idea behind the API is that the
existence of `Typed<P, R>` is a "proof" that the underlying function
takes `P` and returns `R`. The `Func::typed` method peforms a runtime
type-check to ensure that types all match up, and if successful you get
a `Typed` value. Otherwise an error is returned.
Once you have a `Typed` then, like `Func`, you can either `call` or
`call_async`. The difference with a `Typed`, however, is that the
params/results are statically known and hence these calls can be much
more efficient.
This is a much smaller API surface area from before and should greatly
simplify the `Func` documentation. There's still a problem where
`Func::wrapN_async` produces a lot of functions to document, but that's
now the sole offender. It's a nice benefit that the
statically-typed-async verisons are now expressed with an `async`
function rather than a function-returning-a-future which makes it both
more efficient and easier to understand.
The type `P` and `R` are intended to either be bare types (e.g. `i32`)
or tuples of any length (including 0). At this time `R` is only allowed
to be `()` or a bare `i32`-style type because multi-value is not
supported with a native ABI (yet). The `P`, however, can be any size of
tuples of parameters. This is also where some ergonomics are lost
because instead of `f(1, 2)` you now have to write `f.call((1, 2))`
(note the double-parens). Similarly `f()` becomes `f.call(())`.
Overall I feel that this is a better tradeoff than before. While not
universally better due to the loss in ergonomics I feel that this design
is much more flexible in terms of what you can do with the return value
and also understanding the API surface area (just less to take in).
[old-docs]: https://docs.rs/wasmtime/0.24.0/wasmtime/struct.Func.html#method.get0
* Rename Typed to TypedFunc
* Implement multi-value returns through `Func::typed`
* Fix examples in docs
* Fix some more errors
* More test fixes
* Rebasing and adding `get_typed_func`
* Updating tests
* Fix typo
* More doc tweaks
* Tweak visibility on `Func::invoke`
* Fix tests again
This commit fixes a few issues around managing the thread-local state of
a wasmtime thread. We intentionally only have a singular TLS variable in
the whole world, and the problem is that when stack-switching off an
async thread we were not restoring the previous TLS state. This is
necessary in two cases:
* Futures aren't guaranteed to be polled/completed in a stack-like
fashion. If a poll sees that a future isn't ready then we may resume
execution in a previous wasm context that ends up needing the TLS
information.
* Futures can also cross threads (when the whole store crosses threads)
and we need to save/restore TLS state from the thread we're coming
from and the thread that we're going to.
The stack switching issue necessitates some more glue around suspension
and resumption of a stack to ensure we save/restore the TLS state on
both sides. The thread issue, however, also necessitates that we use
`#[inline(never)]` on TLS access functions and never have TLS borrows
live across a function which could result in running arbitrary code (as
was the case for the `tls::set` function.
* Implement defining host functions at the Config level.
This commit introduces defining host functions at the `Config` rather than with
`Func` tied to a `Store`.
The intention here is to enable a host to define all of the functions once
with a `Config` and then use a `Linker` (or directly with
`Store::get_host_func`) to use the functions when instantiating a module.
This should help improve the performance of use cases where a `Store` is
short-lived and redefining the functions at every module instantiation is a
noticeable performance hit.
This commit adds `add_to_config` to the code generation for Wasmtime's `Wasi`
type.
The new method adds the WASI functions to the given config as host functions.
This commit adds context functions to `Store`: `get` to get a context of a
particular type and `set` to set the context on the store.
For safety, `set` cannot replace an existing context value of the same type.
`Wasi::set_context` was added to set the WASI context for a `Store` when using
`Wasi::add_to_config`.
* Add `Config::define_host_func_async`.
* Make config "async" rather than store.
This commit moves the concept of "async-ness" to `Config` rather than `Store`.
Note: this is a breaking API change for anyone that's already adopted the new
async support in Wasmtime.
Now `Config::new_async` is used to create an "async" config and any `Store`
associated with that config is inherently "async".
This is needed for async shared host functions to have some sanity check during their
execution (async host functions, like "async" `Func`, need to be called with
the "async" variants).
* Update async function tests to smoke async shared host functions.
This commit updates the async function tests to also smoke the shared host
functions, plus `Func::wrap0_async`.
This also changes the "wrap async" method names on `Config` to
`wrap$N_host_func_async` to slightly better match what is on `Func`.
* Move the instance allocator into `Engine`.
This commit moves the instantiated instance allocator from `Config` into
`Engine`.
This makes certain settings in `Config` no longer order-dependent, which is how
`Config` should ideally be.
This also removes the confusing concept of the "default" instance allocator,
instead opting to construct the on-demand instance allocator when needed.
This does alter the semantics of the instance allocator as now each `Engine`
gets its own instance allocator rather than sharing a single one between all
engines created from a configuration.
* Make `Engine::new` return `Result`.
This is a breaking API change for anyone using `Engine::new`.
As creating the pooling instance allocator may fail (likely cause is not enough
memory for the provided limits), instead of panicking when creating an
`Engine`, `Engine::new` now returns a `Result`.
* Remove `Config::new_async`.
This commit removes `Config::new_async` in favor of treating "async support" as
any other setting on `Config`.
The setting is `Config::async_support`.
* Remove order dependency when defining async host functions in `Config`.
This commit removes the order dependency where async support must be enabled on
the `Config` prior to defining async host functions.
The check is now delayed to when an `Engine` is created from the config.
* Update WASI example to use shared `Wasi::add_to_config`.
This commit updates the WASI example to use `Wasi::add_to_config`.
As only a single store and instance are used in the example, it has no semantic
difference from the previous example, but the intention is to steer users
towards defining WASI on the config and only using `Wasi::add_to_linker` when
more explicit scoping of the WASI context is required.
This change makes the storage of `Table` more internally consistent.
Elements are stored as raw pointers for both static and dynamic table storage.
Explicitly storing elements as pointers removes assumptions being made by the
pooling allocator in terms of the size and default representation of the
elements.
However, care must be made to properly clone externrefs for table operations.
* Add more overflow checks in table/memory initialization.
* Comment for `with_allocation_strategy` to explain ignored `Config` options.
* Fix Wasmtime `Table` to not panic for type mismatches in `fill`/`copy`.
* Add tests for that fix.
* More use of `anyhow`.
* Change `make_accessible` into `protect_linear_memory` to better demonstrate
what it is used for; this will make the uffd implementation make a little
more sense.
* Remove `create_memory_map` in favor of just creating the `Mmap` instances in
the pooling allocator. This also removes the need for `MAP_NORESERVE` in the
uffd implementation.
* Moar comments.
* Remove `BasePointerIterator` in favor of `impl Iterator`.
* The uffd implementation now only monitors linear memory pages and will only
receive faults on pages that could potentially be accessible and never on a
statically known guard page.
* Stop allocating memory or table pools if the maximum limit of the memory or
table is 0.
* Add `anyhow` dependency to `wasmtime-runtime`.
* Revert `get_data` back to `fn`.
* Remove `DataInitializer` and box the data in `Module` translation instead.
* Improve comments on `MemoryInitialization`.
* Remove `MemoryInitialization::OutOfBounds` in favor of proper bulk memory
semantics.
* Use segmented memory initialization except for when the uffd feature is
enabled on Linux.
* Validate modules with the allocator after translation.
* Updated various functions in the runtime to return `anyhow::Result`.
* Use a slice when copying pages instead of `ptr::copy_nonoverlapping`.
* Remove unnecessary casts in `OnDemandAllocator::deallocate`.
* Better document the `uffd` feature.
* Use WebAssembly page-sized pages in the paged initialization.
* Remove the stack pool from the uffd handler and simply protect just the guard
pages.
Last minute code clean up to fix some comments and rename `address_space_size`
to `memory_reservation_size` to better describe what the option is doing.
This was originally written to support sourcing the table and memory
definitions differently for the pooling allocator.
However, both allocators do the exact same thing, so the closure arguments are
no longer necessary.
Additionally, this cleans up the code a bit to pass in the allocation request
rather than having individual parameters.
This commit implements copying paged initialization data upon a fault of a
linear memory page.
If the initialization data is "paged", then the appropriate pages are copied
into the Wasm page (or zeroed if the page is not present in the
initialization data).
If the initialization data is not "paged", the Wasm page is zeroed so that
module instantiation can initialize the pages.
As Windows uses the native fiber implementation, the stack tests should be
ignored on Windows as the implementation intentionally errors when handing out
stacks.
This commit implements the `uffd` feature which turns on support for utilizing
the `userfaultfd` system call on Linux for the pooling instance allocator.
By handling page faults in userland, we are able to detect guard page accesses
without having to constantly change memory page protections.
This should help reduce the number of syscalls as well as kernel lock
contentions when many threads are allocating and deallocating instances.
Additionally, the user fault handler can lazy initialize linear
memories of an instance (implementation to come).
This commit implements the pooling instance allocator.
The allocation strategy can be set with `Config::with_allocation_strategy`.
The pooling strategy uses the pooling instance allocator to preallocate a
contiguous region of memory for instantiating modules that adhere to various
limits.
The intention of the pooling instance allocator is to reserve as much of the
host address space needed for instantiating modules ahead of time and to reuse
committed memory pages wherever possible.
This commit implements allocating fiber stacks in an instance allocator.
The on-demand instance allocator doesn't support custom stacks, so the
implementation will use the allocation from `wasmtime-fiber` for the fiber
stacks.
In the future, the pooling instance allocator will return custom stacks to use
on Linux and macOS.
On Windows, the native fiber implementation will always be used.
With the change to artificially limit unbounded memories based on Tunables,
it's possible to hit the assert where the minimum might exceed the static
memory bound.
This commit removes the assert in favor of a check to see if the minimum also
fits within the static memory bound. It also corrects the maximum bounding to
ensure the minimum between the memory's maximum and the configured maximum is
used.
If it does not fit, the memory will be treated as dynamic. In the case of the
pooling instance allocator, the bounds will be checked again during translation
and an appropriate error will be returned as dynamic memories are not supported
for that allocator.
Handles created with `create_handle` need to be deallocated with the default
(on-demand) instance allocator.
This commit changes Store such that handles can be added with a flag that is
used to force deallocation via the default instance allocator when the Store is
dropped.
This commit changes `Instance` such that memories can be stored statically,
with just a base pointer, size, maximum, and a callback to make memory
accessible.
Previously the memories were being stored as boxed trait objects, which would
require the pooling allocator to do some unpleasant things to avoid
allocations.
With this change, the pooling allocator can simply define a memory for the
instance without using a trait object.
This commit changes how memories and tables are stored in `Instance`.
Previously, the memories and tables were stored as a `BoxedSlice`. Storing it
this way requires an allocation to change the length of the memories and
tables, which is desirable for a pooling instance allocator that is reusing an
`Instance` structure for a new instantiation.
By storing it instead as `PrimaryMap`, the memories and tables can be resized
without any allocations (the capacity of these maps will always be the
configured limits of the pooling allocator).
This commit refactors `Table` in the runtime such that it can be created from a
pointer to existing table data.
The current `Vec` backing of the `Table` is considered to be "dynamic" storage.
This will be used for the upcoming pooling allocator where table memory is
managed externally to the instance.
The `table.copy` implementation was improved to use slice primitives for doing
the copying.
Fixes#983.
This commit introduces two new methods on `InstanceAllocator`:
* `validate_module` - this method is used to validate a module after
translation but before compilation. It will be used for the upcoming pooling
allocator to ensure a module being compiled adheres to the limits of the
allocator.
* `adjust_tunables` - this method is used to adjust the `Tunables` given the
JIT compiler. The pooling allocator will use this to force all memories to
be static during compilation.
This commit refactors module instantiation in the runtime to allow for
different instance allocation strategy implementations.
It adds an `InstanceAllocator` trait with the current implementation put behind
the `OnDemandInstanceAllocator` struct.
The Wasmtime API has been updated to allow a `Config` to have an instance
allocation strategy set which will determine how instances get allocated.
This change is in preparation for an alternative *pooling* instance allocator
that can reserve all needed host process address space in advance.
This commit also makes changes to the `wasmtime_environ` crate to represent
compiled modules in a way that reduces copying at instantiation time.