Commit Graph

69 Commits

Author SHA1 Message Date
Alex Crichton
18dd82ba7d Improve signature lookup happening during instantiation (#2818)
This commit is intended to be a perf improvement for instantiation of
modules with lots of functions. Previously the `lookup_shared_signature`
callback was showing up quite high in profiles as part of instantiation.

As some background, this callback is used to translate from a module's
`SignatureIndex` to a `VMSharedSignatureIndex` which the instance
stores. This callback is called for two reasons, one is to translate all
of the module's own types into `VMSharedSignatureIndex` for the purposes
of `call_indirect` (the translation of that loads from this table to
compare indices). The second reason is that a `VMCallerCheckedAnyfunc`
is prepared for all functions and this embeds a `VMSharedSignatureIndex`
inside of it.

The slow part today is that the lookup callback was called
once-per-function and each lookup involved hashing a full
`WasmFuncType`. Albeit our hash algorithm is still Rust's default
SipHash algorithm which is quite slow, but we also shouldn't need to
re-hash each signature if we see it multiple times anyway.

The fix applied in this commit is to change this lookup callback to an
`enum` where one variant is that there's a table to lookup from. This
table is a `PrimaryMap` which means that lookup is quite fast. The only
thing we need to do is to prepare the table ahead of time. Currently
this happens on the instantiation path because in my measurments the
creation of the table is quite fast compared to the rest of
instantiation. If this becomes an issue, though, we can look into
creating the table as part of `SigRegistry::register_module` and caching
it somewhere (I'm not entirely sure where but I'm sure we can figure it
out).

There's in generally not a ton of efficiency around the `SigRegistry`
type. I'm hoping though that this fixes the next-lowest-hanging-fruit in
terms of performance without complicating the implementation too much. I
tried a few variants and this change seemed like the best balance
between simplicity and still a nice performance gain.

Locally I measured an improvement in instantiation time for a large-ish
module by reducing the time from ~3ms to ~2.6ms per instance.
2021-04-08 15:04:18 -05:00
Alex Crichton
c91e14d83f Precompute fields in VMOffsets
This commit updates the implementation of `VMOffsets` to frontload all
checked arithmetic on construction of the `VMOffsets` which allows
eliding all checked arithmetic when accessing the fields of `VMOffsets`.
For testing and such this adds a new constructor as well from a new
`VMOffsetsFields` structure which is a clone of the old definition.

This should help speed up some profile hot spots I've been seeing where
with all the checked arithmetic on field sizes this was slowing down the
various accessors during instantiation (which uses `VMOffsets` to
initialize various fields of the `VMContext`).
2021-04-08 12:46:17 -07:00
Benjamin Bouvier
7588565078 Tweaks some tests for Mac aarch64
- some tests don't pass because of bad interactions with the system's
libunwind; ignore them for now.
- the page size on mac aarch64 is 16K, not 4K; tweak some tests which
were expecting 4K or multiples of 4K pages to use a multiple of host page size
instead.
- a cranelift-native test needed an update for the new calling convention.
2021-04-07 14:54:50 +02:00
Alex Crichton
3f694ae319 Use stable Rust on CI to test the x64 backend (#2766)
* Use stable Rust on CI to test the x64 backend

This commit leverages the newly-released 1.51.0 compiler to test the
new backend on Windows and Linux with a stable compiler instead of a
nightly compiler. This isolates the nightly build to just the nightly
documentation generation and fuzzing, both of which rely on nightly for
the best results right now.

* Use updated stable in book build job

* Run rustfmt for new stable

* Silence new warnings for wasi-nn

* Allow some dead code in the x64 backend

Looks like new rustc is better about emitting some dead-code warnings

* Update rust in peepmatic job

* Fix a test in the pooling allocator

* Remove `package.metdata.docs.rs` temporarily

Needs resolution of https://github.com/rust-lang/cargo/pull/9300 first

* Fix a warning in a wasi-nn example
2021-03-25 13:18:59 -05:00
Peter Huene
e6dda413a4 Code review feedback.
* Add assert to `StackPool::deallocate` to ensure the fiber stack given to it
  comes from the pool.
* Remove outdated comment about windows and stacks as the allocator now returns
  fiber stacks.
* Remove conditional compilation around `stack_size` in the allocators as it
  was just clutter.
2021-03-20 00:05:08 -07:00
Peter Huene
f8f51afac1 Split out fiber stacks from fibers.
This commit splits out a `FiberStack` from `Fiber`, allowing the instance
allocator trait to return `FiberStack` rather than raw stack pointers. This
keeps the stack creation mostly in `wasmtime_fiber`, but now the on-demand
instance allocator can make use of it.

The instance allocators no longer have to return a "not supported" error to
indicate that the store should allocate its own fiber stack.

This includes a bunch of cleanup in the instance allocator to scope stacks to
the new "async" feature in the runtime.

Closes #2708.
2021-03-18 20:21:02 -07:00
Benjamin Bouvier
4603b3b292 Bump dependencies to get a single version of rand (#2733)
This removes a few crates in the dependencies, and a few exceptions (at
the price of a new one) in the cargo-deny configuration.
2021-03-17 09:07:50 -05:00
Peter Huene
623290d42e Use anyhow::Error in instantiation errors.
This commit updates the error enums used in instantiation errors to encapsulate
an `anyhow::Error` rather than a string.
2021-03-08 11:27:30 -08:00
Peter Huene
5fa0f8d469 Move linear memory faulted guard page tracking into Memory.
This commit moves the tracking for faulted guard pages in a linear memory into
`Memory`.
2021-03-08 11:27:25 -08:00
Peter Huene
1a0493946d Make the storage of wasmtime_runtime::Table consistent.
This change makes the storage of `Table` more internally consistent.

Elements are stored as raw pointers for both static and dynamic table storage.

Explicitly storing elements as pointers removes assumptions being made by the
pooling allocator in terms of the size and default representation of the
elements.

However, care must be made to properly clone externrefs for table operations.
2021-03-05 18:36:14 -08:00
Peter Huene
a4084db096 More feedback changes.
* Don't reexport types from `wasmtime_runtime` from the `wasmtime` crate.
* Add more comments.
2021-03-04 22:27:27 -08:00
Peter Huene
ff840b3d3b More PR feedback changes.
* More use of `anyhow`.
* Change `make_accessible` into `protect_linear_memory` to better demonstrate
  what it is used for; this will make the uffd implementation make a little
  more sense.
* Remove `create_memory_map` in favor of just creating the `Mmap` instances in
  the pooling allocator. This also removes the need for `MAP_NORESERVE` in the
  uffd implementation.
* Moar comments.
* Remove `BasePointerIterator` in favor of `impl Iterator`.
* The uffd implementation now only monitors linear memory pages and will only
  receive faults on pages that could potentially be accessible and never on a
  statically known guard page.
* Stop allocating memory or table pools if the maximum limit of the memory or
  table is 0.
2021-03-04 20:14:40 -08:00
Peter Huene
a464465e2f Code review feedback changes.
* Add `anyhow` dependency to `wasmtime-runtime`.
* Revert `get_data` back to `fn`.
* Remove `DataInitializer` and box the data in `Module` translation instead.
* Improve comments on `MemoryInitialization`.
* Remove `MemoryInitialization::OutOfBounds` in favor of proper bulk memory
  semantics.
* Use segmented memory initialization except for when the uffd feature is
  enabled on Linux.
* Validate modules with the allocator after translation.
* Updated various functions in the runtime to return `anyhow::Result`.
* Use a slice when copying pages instead of `ptr::copy_nonoverlapping`.
* Remove unnecessary casts in `OnDemandAllocator::deallocate`.
* Better document the `uffd` feature.
* Use WebAssembly page-sized pages in the paged initialization.
* Remove the stack pool from the uffd handler and simply protect just the guard
  pages.
2021-03-04 18:19:46 -08:00
Peter Huene
505437e353 Code cleanup.
Last minute code clean up to fix some comments and rename `address_space_size`
to `memory_reservation_size` to better describe what the option is doing.
2021-03-04 18:19:46 -08:00
Peter Huene
9091f13dcd Refactor initialize_vmcontext.
This was originally written to support sourcing the table and memory
definitions differently for the pooling allocator.

However, both allocators do the exact same thing, so the closure arguments are
no longer necessary.

Additionally, this cleans up the code a bit to pass in the allocation request
rather than having individual parameters.
2021-03-04 18:19:45 -08:00
Peter Huene
f5c4d87c45 Implement on-demand memory initialization for the uffd feature.
This commit implements copying paged initialization data upon a fault of a
linear memory page.

If the initialization data is "paged", then the appropriate pages are copied
into the Wasm page (or zeroed if the page is not present in the
initialization data).

If the initialization data is not "paged", the Wasm page is zeroed so that
module instantiation can initialize the pages.
2021-03-04 18:19:45 -08:00
Peter Huene
a82f1a323f Skip the stack tests on Windows.
As Windows uses the native fiber implementation, the stack tests should be
ignored on Windows as the implementation intentionally errors when handing out
stacks.
2021-03-04 18:18:52 -08:00
Peter Huene
a2c439117a Implement user fault handling with userfaultfd on Linux.
This commit implements the `uffd` feature which turns on support for utilizing
the `userfaultfd` system call on Linux for the pooling instance allocator.

By handling page faults in userland, we are able to detect guard page accesses
without having to constantly change memory page protections.

This should help reduce the number of syscalls as well as kernel lock
contentions when many threads are allocating and deallocating instances.

Additionally, the user fault handler can lazy initialize linear
memories of an instance (implementation to come).
2021-03-04 18:18:52 -08:00
Peter Huene
e71ccbf9bc Implement the pooling instance allocator.
This commit implements the pooling instance allocator.

The allocation strategy can be set with `Config::with_allocation_strategy`.

The pooling strategy uses the pooling instance allocator to preallocate a
contiguous region of memory for instantiating modules that adhere to various
limits.

The intention of the pooling instance allocator is to reserve as much of the
host address space needed for instantiating modules ahead of time and to reuse
committed memory pages wherever possible.
2021-03-04 18:18:51 -08:00