wasmtime

Author	SHA1	Message	Date
Alex Crichton	7fdc616368	Remove the `Paged` memory initialization variant (#4046 ) * Remove the `Paged` memory initialization variant This commit simplifies the `MemoryInitialization` enum by removing the `Paged` variant. The `Paged` variant was originally added for uffd, but that support has now been removed in #4040. This is no longer necessary but is still used as an intermediate step of becoming a `Static` variant of initialized memory (which copy-on-write uses). As a result this commit largely modifies the static initialization of memory steps and folds the two methods together. * Apply suggestions from code review Co-authored-by: Peter Huene <peter@huene.dev> Co-authored-by: Peter Huene <peter@huene.dev>	2022-05-05 09:44:48 -05:00
Andrew Brown	3dbdcfa220	runtime: refactor `Memory` to always use `Box<dyn RuntimeLinearMemory>` (#4086 ) While working with the runtime `Memory` object, it became clear that some refactoring was needed. In order to implement shared memory from the threads proposal, we must be able to atomically change the memory size. Previously, the split into variants, `Memory::Static` and `Memory::Dynamic`, made any attempt to lock forced us to duplicate logic in various places. This change moves `enum Memory { Static..., Dynamic... }` to simply `struct Memory(Box<dyn RuntimeLinearMemory>)`. A new type, `ExternalMemory`, takes the place of `Memory::Static` and also implements the `RuntimeLinearMemory` trait, allowing `Memory` to contain the same two options as before: `MmapMemory` for `Memory::Dynamic` and `ExternalMemory` for `Memory::Static`. To interface with the `PoolingAllocator`, this change also required the ability to downcast to the internal representation.	2022-04-29 08:12:38 -07:00
Alex Crichton	3f3afb455e	Remove support for userfaultfd (#4040 ) This commit removes support for the `userfaultfd` or "uffd" syscall on Linux. This support was originally added for users migrating from Lucet to Wasmtime, but the recent developments of kernel-supported copy-on-write support for memory initialization wound up being more appropriate for these use cases than usefaultfd. The main reason for moving to copy-on-write initialization are: * The `userfaultfd` feature was never necessarily intended for this style of use case with wasm and was susceptible to subtle and rare bugs that were extremely difficult to track down. We were never 100% certain that there were kernel bugs related to userfaultfd but the suspicion never went away. * Handling faults with userfaultfd was always slow and single-threaded. Only one thread could handle faults and traveling to user-space to handle faults is inherently slower than handling them all in the kernel. The single-threaded aspect in particular presented a significant scaling bottleneck for embeddings that want to run many wasm instances in parallel. * One of the major benefits of userfaultfd was lazy initialization of wasm linear memory which is also achieved with the copy-on-write initialization support we have right now. * One of the suspected benefits of userfaultfd was less frobbing of the kernel vma structures when wasm modules are instantiated. Currently the copy-on-write support has a mitigation where we attempt to reuse the memory images where possible to avoid changing vma structures. When comparing this to userfaultfd's performance it was found that kernel modifications of vmas aren't a worrisome bottleneck so copy-on-write is suitable for this as well. Overall there are no remaining benefits that userfaultfd gives that copy-on-write doesn't, and copy-on-write solves a major downsides of userfaultfd, the scaling issue with a single faulting thread. Additionally copy-on-write support seems much more robust in terms of kernel implementation since it's only using standard memory-management syscalls which are heavily exercised. Finally copy-on-write support provides a new bonus where read-only memory in WebAssembly can be mapped directly to the same kernel cache page, even amongst many wasm instances of the same module, which was never possible with userfaultfd. In light of all this it's expected that all users of userfaultfd should migrate to the copy-on-write initialization of Wasmtime (which is enabled by default).	2022-04-18 12:42:26 -05:00
Yang Hau	bfae6384aa	fix typo (#4030 )	2022-04-14 09:35:53 -05:00
Alex Crichton	c22033bf93	Delete historical interruptable support in Wasmtime (#3925 ) * Delete historical interruptable support in Wasmtime This commit removes the `Config::interruptable` configuration along with the `InterruptHandle` type from the `wasmtime` crate. The original support for adding interruption to WebAssembly was added pretty early on in the history of Wasmtime when there was no other method to prevent an infinite loop from the host. Nowadays, however, there are alternative methods for interruption such as fuel or epoch-based interruption. One of the major downsides of `Config::interruptable` is that even when it's not enabled it forces an atomic swap to happen when entering WebAssembly code. This technically could be a non-atomic swap if the configuration option isn't enabled but that produces even more branch-y code on entry into WebAssembly which is already something we try to optimize. Calling into WebAssembly is on the order of a dozens of nanoseconds at this time and an atomic swap, even uncontended, can add up to 5ns on some platforms. The main goal of this PR is to remove this atomic swap on entry into WebAssembly. This is done by removing the `Config::interruptable` field entirely, moving all existing consumers to epochs instead which are suitable for the same purposes. This means that the stack overflow check is no longer entangled with the interruption check and perhaps one day we could continue to optimize that further as well. Some consequences of this change are: * Epochs are now the only method of remote-thread interruption. * There are no more Wasmtime traps that produces the `Interrupted` trap code, although we may wish to move future traps to this so I left it in place. * The C API support for interrupt handles was also removed and bindings for epoch methods were added. * Function-entry checks for interruption are a tiny bit less efficient since one check is performed for the stack limit and a second is performed for the epoch as opposed to the `Config::interruptable` style of bundling the stack limit and the interrupt check in one. It's expected though that this is likely to not really be measurable. * The old `VMInterrupts` structure is renamed to `VMRuntimeLimits`.	2022-03-14 15:25:11 -05:00
Alex Crichton	15bb0c6903	Remove the `ModuleLimits` pooling configuration structure (#3837 ) * Remove the `ModuleLimits` pooling configuration structure This commit is an attempt to improve the usability of the pooling allocator by removing the need to configure a `ModuleLimits` structure. Internally this structure has limits on all forms of wasm constructs but this largely bottoms out in the size of an allocation for an instance in the instance pooling allocator. Maintaining this list of limits can be cumbersome as modules may get tweaked over time and there's otherwise no real reason to limit the number of globals in a module since the main goal is to limit the memory consumption of a `VMContext` which can be done with a memory allocation limit rather than fine-tuned control over each maximum and minimum. The new approach taken in this commit is to remove `ModuleLimits`. Some fields, such as `tables`, `table_elements` , `memories`, and `memory_pages` are moved to `InstanceLimits` since they're still enforced at runtime. A new field `size` is added to `InstanceLimits` which indicates, in bytes, the maximum size of the `VMContext` allocation. If the size of a `VMContext` for a module exceeds this value then instantiation will fail. This involved adding a few more checks to `{Table, Memory}::new_static` to ensure that the minimum size is able to fit in the allocation, since previously modules were validated at compile time of the module that everything fit and that validation no longer happens (it happens at runtime). A consequence of this commit is that Wasmtime will have no built-in way to reject modules at compile time if they'll fail to be instantiated within a particular pooling allocator configuration. Instead a module must attempt instantiation see if a failure happens. * Fix benchmark compiles * Fix some doc links * Fix a panic by ensuring modules have limited tables/memories * Review comments * Add back validation at `Module` time instantiation is possible This allows for getting an early signal at compile time that a module will never be instantiable in an engine with matching settings. * Provide a better error message when sizes are exceeded Improve the error message when an instance size exceeds the maximum by providing a breakdown of where the bytes are all going and why the large size is being requested. * Try to fix test in qemu * Flag new test as 64-bit only Sizes are all specific to 64-bit right now	2022-02-25 09:11:51 -06:00
Alex Crichton	bbd4a4a500	Enable copy-on-write heap initialization by default (#3825 ) * Enable copy-on-write heap initialization by default This commit enables the `Config::memfd` feature by default now that it's been fuzzed for a few weeks on oss-fuzz, and will continue to be fuzzed leading up to the next release of Wasmtime in early March. The documentation of the `Config` option has been updated as well as adding a CLI flag to disable the feature. * Remove ubiquitous "memfd" terminology Switch instead to forms of "memory image" or "cow" or some combination thereof. * Update new option names	2022-02-22 17:12:18 -06:00
Peter Huene	ef17a36852	Port fix for `CVE-2022-23636` to `main`. (#3818 ) * Port fix for `CVE-2022-23636` to `main`. This commit ports the fix for `CVE-2022-23636` to `main`, but performs a refactoring that makes it unnecessary for the instance itself to track if it has been initialized; such a change was not targeted enough for a security patch. The pooling allocator will now only initialize an instance if all of its associated resource creation succeeds. If the resource creation fails, no instance is dropped as none was initialized. Also updates `RELEASES.md` to include the related patch releases. * Add `Instance::new_at` to fully initialize an instance. Added `Instance::new_at` to fully initialize an instance at a given address. This will hopefully prevent the possibility that an `Instance` structure doesn't have an initialized `VMContext` when it is dropped.	2022-02-16 17:51:14 -06:00
Alex Crichton	b438617e12	Further minor optimizations to instantiation (#3791 ) * Shrink the size of `FuncData` Before this commit on a 64-bit system the `FuncData` type had a size of 88 bytes and after this commit it has a size of 32 bytes. A `FuncData` is required for all host functions in a store, including those inserted from a `Linker` into a store used during linking. This means that instantiation ends up creating a nontrivial number of these types and pushing them into the store. Looking at some profiles there were some surprisingly expensive movements of `FuncData` from the stack to a vector for moves-by-value generated by Rust. Shrinking this type enables more efficient code to be generated and additionally means less storage is needed in a store's function array. For instantiating the spidermonkey and rustpython modules this improves instantiation by 10% since they each import a fair number of host functions and the speedup here is relative to the number of items imported. * Use `ptr::copy_nonoverlapping` during initialization Prevoiusly `ptr::copy` was used for copying imports into place which translates to `memmove`, but `ptr::copy_nonoverlapping` can be used here since it's statically known these areas don't overlap. While this doesn't end up having a performance difference it's something I kept noticing while looking at the disassembly of `initialize_vmcontext` so I figured I'd go ahead and implement. * Indirect shared signature ids in the VMContext This commit is a small improvement for the instantiation time of modules by avoiding copying a list of `VMSharedSignatureIndex` entries into each `VMContext`, instead building one inside of a module and sharing that amongst all instances. This involves less lookups at instantiation time and less movement of data during instantiation. The downside is that type-checks on `call_indirect` now involve an additionally load, but I'm assuming that these are somewhat pessimized enough as-is that the runtime impact won't be much there. For instantiation performance this is a 5-10% win with rustpyhon/spidermonky instantiation. This should also reduce the size of each `VMContext` for an instantiation since signatures are no longer stored inline but shared amongst all instances with one module. Note that one subtle change here is that the array of `VMSharedSignatureIndex` was previously indexed by `TypeIndex`, and now it's indexed by `SignaturedIndex` which is a deduplicated form of `TypeIndex`. This is done because we already had a list of those lying around in `Module`, so it was easier to reuse that than to build a separate array and store it somewhere. * Reserve space in `Store<T>` with `InstancePre` This commit updates the instantiation process to reserve space in a `Store<T>` for the functions that an `InstancePre<T>`, as part of instantiation, will insert into it. Using an `InstancePre<T>` to instantiate allows pre-computing the number of host functions that will be inserted into a store, and by pre-reserving space we can avoid costly reallocations during instantiation by ensuring the function vector has enough space to fit everything during the instantiation process. Overall this makes instantiation of rustpython/spidermonkey about 8% faster locally. * Fix tests * Use checked arithmetic	2022-02-11 09:55:08 -06:00
Alex Crichton	c0c368d151	Use mmap'd `.cwasm` as a source for memory initialization images (#3787 ) Skip memfd creation with precompiled modules This commit updates the memfd support internally to not actually use a memfd if a compiled module originally came from disk via the `wasmtime::Module::deserialize_file` API. In this situation we already have a file descriptor open and there's no need to copy a module's heap image to a new file descriptor. To facilitate a new source of `mmap` the currently-memfd-specific-logic of creating a heap image is generalized to a new form of `MemoryInitialization` which is attempted for all modules at module-compile-time. This means that the serialized artifact to disk will have the memory image in its entirety waiting for us. Furthermore the memory image is ensured to be padded and aligned carefully to the target system's page size, notably meaning that the data section in the final object file is page-aligned and the size of the data section is also page aligned. This means that when a precompiled module is mapped from disk we can reuse the underlying `File` to mmap all initial memory images. This means that the offset-within-the-memory-mapped-file can differ for memfd-vs-not, but that's just another piece of state to track in the memfd implementation. In the limit this waters down the term "memfd" for this technique of quickly initializing memory because we no longer use memfd unconditionally (only when the backing file isn't available). This does however open up an avenue in the future to porting this support to other OSes because while `memfd_create` is Linux-specific both macOS and Windows support mapping a file with copy-on-write. This porting isn't done in this PR and is left for a future refactoring. Closes #3758 * Enable "memfd" support on all unix systems Cordon off the Linux-specific bits and enable the memfd support to compile and run on platforms like macOS which have a Linux-like `mmap`. This only works if a module is mapped from a precompiled module file on disk, but that's better than not supporting it at all! * Fix linux compile * Use `Arc<File>` instead of `MmapVecFileBacking` * Use a named struct instead of mysterious tuples * Comment about unsafety in `Module::deserialize_file` * Fix tests * Fix uffd compile * Always align data segments No need to have conditional alignment since their sizes are all aligned anyway * Update comment in build.rs * Use rustix, not `region` * Fix some confusing logic/names around memory indexes These functions all work with memory indexes, not specifically defined memory indexes.	2022-02-10 15:40:40 -06:00
Chris Fallin	39a52ceb4f	Implement lazy funcref table and anyfunc initialization. (#3733 ) During instance initialization, we build two sorts of arrays eagerly: - We create an "anyfunc" (a `VMCallerCheckedAnyfunc`) for every function in an instance. - We initialize every element of a funcref table with an initializer to a pointer to one of these anyfuncs. Most instances will not touch (via call_indirect or table.get) all funcref table elements. And most anyfuncs will never be referenced, because most functions are never placed in tables or used with `ref.func`. Thus, both of these initialization tasks are quite wasteful. Profiling shows that a significant fraction of the remaining instance-initialization time after our other recent optimizations is going into these two tasks. This PR implements two basic ideas: - The anyfunc array can be lazily initialized as long as we retain the information needed to do so. For now, in this PR, we just recreate the anyfunc whenever a pointer is taken to it, because doing so is fast enough; in the future we could keep some state to know whether the anyfunc has been written yet and skip this work if redundant. This technique allows us to leave the anyfunc array as uninitialized memory, which can be a significant savings. Filling it with initialized anyfuncs is very expensive, but even zeroing it is expensive: e.g. in a large module, it can be >500KB. - A funcref table can be lazily initialized as long as we retain a link to its corresponding instance and function index for each element. A zero in a table element means "uninitialized", and a slowpath does the initialization. Funcref tables are a little tricky because funcrefs can be null. We need to distinguish "element was initially non-null, but user stored explicit null later" from "element never touched" (ie the lazy init should not blow away an explicitly stored null). We solve this by stealing the LSB from every funcref (anyfunc pointer): when the LSB is set, the funcref is initialized and we don't hit the lazy-init slowpath. We insert the bit on storing to the table and mask it off after loading. We do have to set up a precomputed array of `FuncIndex`s for the table in order for this to work. We do this as part of the module compilation. This PR also refactors the way that the runtime crate gains access to information computed during module compilation. Performance effect measured with in-tree benches/instantiation.rs, using SpiderMonkey built for WASI, and with memfd enabled: ``` BEFORE: sequential/default/spidermonkey.wasm time: [68.569 us 68.696 us 68.856 us] sequential/pooling/spidermonkey.wasm time: [69.406 us 69.435 us 69.465 us] parallel/default/spidermonkey.wasm: with 1 background thread time: [69.444 us 69.470 us 69.497 us] parallel/default/spidermonkey.wasm: with 16 background threads time: [183.72 us 184.31 us 184.89 us] parallel/pooling/spidermonkey.wasm: with 1 background thread time: [69.018 us 69.070 us 69.136 us] parallel/pooling/spidermonkey.wasm: with 16 background threads time: [326.81 us 337.32 us 347.01 us] WITH THIS PR: sequential/default/spidermonkey.wasm time: [6.7821 us 6.8096 us 6.8397 us] change: [-90.245% -90.193% -90.142%] (p = 0.00 < 0.05) Performance has improved. sequential/pooling/spidermonkey.wasm time: [3.0410 us 3.0558 us 3.0724 us] change: [-95.566% -95.552% -95.537%] (p = 0.00 < 0.05) Performance has improved. parallel/default/spidermonkey.wasm: with 1 background thread time: [7.2643 us 7.2689 us 7.2735 us] change: [-89.541% -89.533% -89.525%] (p = 0.00 < 0.05) Performance has improved. parallel/default/spidermonkey.wasm: with 16 background threads time: [147.36 us 148.99 us 150.74 us] change: [-18.997% -18.081% -17.285%] (p = 0.00 < 0.05) Performance has improved. parallel/pooling/spidermonkey.wasm: with 1 background thread time: [3.1009 us 3.1021 us 3.1033 us] change: [-95.517% -95.511% -95.506%] (p = 0.00 < 0.05) Performance has improved. parallel/pooling/spidermonkey.wasm: with 16 background threads time: [49.449 us 50.475 us 51.540 us] change: [-85.423% -84.964% -84.465%] (p = 0.00 < 0.05) Performance has improved. ``` So an improvement of something like 80-95% for a very large module (7420 functions in its one funcref table, 31928 functions total).	2022-02-09 13:56:53 -08:00
Peter Huene	1b27508a42	Fix incorrect use of `MemoryIndex` in the pooling allocator. (#3782 ) This commit corrects a few places where `MemoryIndex` was used and treated like a `DefinedMemoryIndex` in the pooling instance allocator. When the unstable `multi-memory` proposal is enabled, it is possible to cause a newly allocated instance to use an incorrect base address for any defined memories by having the module being instantiated also import a memory. This requires enabling the unstable `multi-memory` proposal, configuring the use of the pooling instance allocator (not the default), and then configuring the module limits to allow imported memories (also not the default). The fix is to replace all uses of `MemoryIndex` with `DefinedMemoryIndex` in the pooling instance allocator. Several `debug_assert!` have also been updated to `assert!` to sanity check the state of the pooling allocator even in release builds.	2022-02-09 09:39:29 -06:00
Chris Fallin	4f01711d42	Pooling allocator: Default for allocation policy should use memfd feature, not memfd-allocator. (#3777 ) Thanks to @peterheune for noticing this!	2022-02-08 10:29:45 -08:00
Chris Fallin	ddd39cdb84	Patch qemu in CI to fix madvise semantics. (#3770 ) We currently skip some tests when running our qemu-based tests for aarch64 and s390x. Qemu has broken madvise(MADV_DONTNEED) semantics -- specifically, it just ignores madvise() [1]. We could continue to whack-a-mole the tests whenever we create new functionality that relies on madvise() semantics, but ideally we'd just have emulation that properly emulates! The earlier discussions on the qemu mailing list [2] had a proposed patch for this, but (i) this patch doesn't seem to apply cleanly anymore (it's 3.5 years old) and (ii) it's pretty complex due to the need to handle qemu's ability to emulate differing page sizes on host and guest. It turns out that we only really need this for CI when host and guest have the same page size (4KiB), so we could just pass the madvise()s through. I wouldn't expect such a patch to ever land upstream in qemu, but it satisfies our needs I think. So this PR modifies our CI setup to patch qemu before building it locally with a little one-off patch. [1] https://github.com/bytecodealliance/wasmtime/pull/2518#issuecomment-747280133 [2] https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg05416.html	2022-02-07 15:56:54 -08:00
Alex Crichton	04d2caea7b	Consolidate methods of memory initialization (#3766 ) * Consolidate methods of memory initialization This commit consolidates the few locations that we have which are performing memory initialization. Namely the uffd logic for creating paged memory as well as the memfd logic for creating a memory image now share an implementation to avoid duplicating bounds-checks or other validation conditions. The main purpose of this commit is to fix a fuzz-bug where a multiplication overflowed. The overflow itself was benign but it seemed better to fix the overflow in only one place instead of multiple. The overflow in question is specifically when an initializer is checked to be statically out-of-bounds and multiplies a memory's minimum size by the wasm page size, returning the result as a `u64`. For memory64-memories of size `1 << 48` this multiplication will overflow. This was actually a preexisting bug with the `try_paged_init` function which was copied for memfd, but cropped up here since memfd is used more often than paged initialization. The fix here is to skip validation of the `end` index if the size of memory is `1 << 64` since if the `end` index can be represented as a `u64` then it's in-bounds. This is somewhat of an esoteric case, though, since a memory of minimum size `1 << 64` can't ever exist (we can't even ask the os for that much memory, and even if we could it would fail). * Fix memfd test * Fix some tests * Remove InitMemory enum * Add an `is_segmented` helper method * More clear variable name * Make arguments to `init_memory` more descriptive	2022-02-04 13:17:25 -06:00
Alex Crichton	b647561c44	memfd: Some minor follow-ups (#3759 ) * Tweak memfd-related features crates This commit changes the `memfd` feature for the `wasmtime-cli` crate from an always-on feature to a default-on feature which can be disabled at compile time. Additionally the `pooling-allocator` feature is also given similar treatment. Additionally some documentation was added for the `memfd` feature on the `wasmtime` crate. * Don't store `Arc<T>` in `InstanceAllocationRequest` Instead store `&Arc<T>` to avoid having the clone that lives in `InstanceAllocationRequest` not actually going anywhere. Otherwise all instance allocation requires an extra clone to create it for the request and an extra decrement when the request goes away. Internally clones are made as necessary when creating instances. * Enable the pooling allocator by default for `wasmtime-cli` While perhaps not the most useful option since the CLI doesn't have a great way to take advantage of this it probably makes sense to at least match the features of `wasmtime` itself. * Fix some lints and issues * More compile fixes	2022-02-03 09:17:04 -06:00
Alex Crichton	8ed79c8f57	memfd: Reduce some syscalls in the on-demand case (#3757 ) * memfd: Reduce some syscalls in the on-demand case This tweaks the internal organization of the `MemFdSlot` to avoid some syscalls in the default case as well as opportunistically in the pooling case. The two cases added here are: * A `MemFdSlot` is now created with a specified initial size. For pooling this is 0 but for the on-demand case this can be non-zero. * When `instantiate` is called with no prior image and the sizes match (as will be the case for on-demand allocation) then `mprotect` is skipped entirely. * In the `clear_and_remain-ready` case the `mprotect` is skipped if the heap wasn't grown at all. This should avoid ever using `mprotect` unnecessarily and makes the ranges we `mprotect` a bit smaller as well. * Review comments * Tweak allow to apply to whole crate	2022-02-02 16:09:47 -06:00
Chris Fallin	5deb1f1fbf	Merge pull request #3738 from cfallin/pooling-affinity Pooling allocator: add a reuse-affinity policy.	2022-02-02 13:11:39 -08:00
Chris Fallin	99ed8cc9be	Merge pull request #3697 from cfallin/memfd-cow memfd/madvise-based CoW pooling allocator	2022-02-02 13:04:26 -08:00
Chris Fallin	1cbd393930	Review comments.	2022-02-02 12:25:30 -08:00
Chris Fallin	6011420557	Pooling allocator: add a reuse-affinity policy. This policy attempts to reuse the same instance slot for subsequent instantiations of the same module. This is particularly useful when using a pooling backend such as memfd that benefits from this reuse: for example, in the memfd case, instantiating the same module into the same slot allows us to avoid several calls to mmap() because the same mappings can be reused. The policy tracks a freelist per "compiled module ID", and when allocating a slot for an instance, tries these three options in order: 1. A slot from the freelist for this module (i.e., last used for another instantiation of this particular module), or 3. A slot that was last used by some other module or never before. The "victim" slot for choice 2 is randomly chosen. The data structures are carefully designed so that all updates are O(1), and there is no retry-loop in any of the random selection. This policy is now the default when the memfd backend is selected via the `memfd-allocator` feature flag.	2022-02-02 12:25:30 -08:00
Chris Fallin	01e6bb81fb	Review feedback.	2022-02-01 15:49:44 -08:00
Chris Fallin	0ff8f6ab20	Make build-config magic use memfd by default.	2022-01-31 22:39:20 -08:00
Chris Fallin	982df2f2e5	Review feedback.	2022-01-31 16:40:14 -08:00
Chris Fallin	570dee63f3	Use MemFdSlot in the on-demand allocator as well.	2022-01-31 13:59:51 -08:00
Chris Fallin	3702e81d30	Remove ftruncate-trick for heap growth with memfd backend. Testing so far with recent Wasmtime has not been able to show the need for avoiding the process-wide mmap lock in real-world use-cases. As such, the technique of using an anonymous file and ftruncate() to extend it seems unnecessary; instead, memfd can always use anonymous zeroed memory for heap backing where the CoW image is not present, and mprotect() to extend the heap limit by changing page protections.	2022-01-31 12:53:22 -08:00
Chris Fallin	b73ac83c37	Add a pooling allocator mode based on copy-on-write mappings of memfds. As first suggested by Jan on the Zulip here [1], a cheap and effective way to obtain copy-on-write semantics of a "backing image" for a Wasm memory is to mmap a file with `MAP_PRIVATE`. The `memfd` mechanism provided by the Linux kernel allows us to create anonymous, in-memory-only files that we can use for this mapping, so we can construct the image contents on-the-fly then effectively create a CoW overlay. Furthermore, and importantly, `madvise(MADV_DONTNEED, ...)` will discard the CoW overlay, returning the mapping to its original state. By itself this is almost enough for a very fast instantiation-termination loop of the same image over and over, without changing the address space mapping at all (which is expensive). The only missing bit is how to implement heap growth. But here memfds can help us again: if we create another anonymous file and map it where the extended parts of the heap would go, we can take advantage of the fact that a `mmap()` mapping can be larger than the file itself, with accesses beyond the end generating a `SIGBUS`, and the fact that we can cheaply resize the file with `ftruncate`, even after a mapping exists. So we can map the "heap extension" file once with the maximum memory-slot size and grow the memfd itself as `memory.grow` operations occur. The above CoW technique and heap-growth technique together allow us a fastpath of `madvise()` and `ftruncate()` only when we re-instantiate the same module over and over, as long as we can reuse the same slot. This fastpath avoids all whole-process address-space locks in the Linux kernel, which should mean it is highly scalable. It also avoids the cost of copying data on read, as the `uffd` heap backend does when servicing pagefaults; the kernel's own optimized CoW logic (same as used by all file mmaps) is used instead. [1] https://bytecodealliance.zulipchat.com/#narrow/stream/206238-general/topic/Copy.20on.20write.20based.20instance.20reuse/near/266657772	2022-01-31 12:53:18 -08:00
Alex Crichton	a25f7bdba5	Don't copy `VMBuiltinFunctionsArray` into each `VMContext` (#3741 ) * Don't copy `VMBuiltinFunctionsArray` into each `VMContext` This is another PR along the lines of "let's squeeze all possible performance we can out of instantiation". Before this PR we would copy, by value, the contents of `VMBuiltinFunctionsArray` into each `VMContext` allocated. This array of function pointers is modestly-sized but growing over time as we add various intrinsics. Additionally it's the exact same for all `VMContext` allocations. This PR attempts to speed up instantiation slightly by instead storing an indirection to the function array. This means that calling a builtin intrinsic is a tad bit slower since it requires two loads instead of one (one to get the base pointer, another to get the actual address). Otherwise though `VMContext` initialization is now simply setting one pointer instead of doing a `memcpy` from one location to another. With some macro-magic this commit also replaces the previous implementation with one that's more `const`-friendly which also gets us compile-time type-checks of libcalls as well as compile-time verification that all libcalls are defined. Overall, as with #3739, the win is very modest here. Locally I measured a speedup from 1.9us to 1.7us taken to instantiate an empty module with one function. While small at these scales it's still a 10% improvement! * Review comments	2022-01-28 16:24:34 -06:00
Chris Fallin	8a55b5c563	Add epoch-based interruption for cooperative async timeslicing. This PR introduces a new way of performing cooperative timeslicing that is intended to replace the "fuel" mechanism. The tradeoff is that this mechanism interrupts with less precision: not at deterministic points where fuel runs out, but rather when the Engine enters a new epoch. The generated code instrumentation is substantially faster, however, because it does not need to do as much work as when tracking fuel; it only loads the global "epoch counter" and does a compare-and-branch at backedges and function prologues. This change has been measured as ~twice as fast as fuel-based timeslicing for some workloads, especially control-flow-intensive workloads such as the SpiderMonkey JS interpreter on Wasm/WASI. The intended interface is that the embedder of the `Engine` performs an `engine.increment_epoch()` call periodically, e.g. once per millisecond. An async invocation of a Wasm guest on a `Store` can specify a number of epoch-ticks that are allowed before an async yield back to the executor's event loop. (The initial amount and automatic "refills" are configured on the `Store`, just as for fuel.) This call does only signal-safe work (it increments an `AtomicU64`) so could be invoked from a periodic signal, or from a thread that wakes up once per period.	2022-01-20 13:58:17 -08:00
Dan Gohman	ea0cb971fb	Update to rustix 0.26.2. (#3521 ) This pulls in a fix for Android, where Android's seccomp policy on older versions is to make `openat2` irrecoverably crash the process, so we have to do a version check up front rather than relying on `ENOSYS` to determine if `openat2` is supported. And it pulls in the fix for the link errors when multiple versions of rsix/rustix are linked in. And it has updates for two crate renamings: rsix has been renamed to rustix, and unsafe-io has been renamed to io-extras.	2021-11-15 10:21:13 -08:00
Peter Huene	58aab85680	Add the `pooling-allocator` feature. This commit adds the `pooling-allocator` feature to both the `wasmtime` and `wasmtime-runtime` crates. The feature controls whether or not the pooling allocator implementation is built into the runtime and exposed as a supported instance allocation strategy in the wasmtime API. The feature is on by default for the `wasmtime` crate. Closes #3513.	2021-11-10 13:25:55 -08:00
Pat Hickey	52542b6c01	mock enough of the store to pass the uffd test	2021-10-22 08:56:13 -07:00
Pat Hickey	efef0769fe	make uffd test compile, but not pass	2021-10-22 08:39:00 -07:00
Pat Hickey	6c70b81ff5	review feedback	2021-10-21 12:10:03 -07:00
Pat Hickey	67a6c27e22	pooling needs the store earlier	2021-10-21 12:10:03 -07:00
Pat Hickey	18a355e092	give sychronous ResourceLimiter an async alternative	2021-10-21 12:10:03 -07:00
Alex Crichton	5b3b459ad5	Fix some nightly dead code warnings (#3404 ) * Fix some nightly dead code warnings Looks like the "struct field not used" lint has improved on nightly and caught a few more instances of fields that were never actually read. * Fix windows	2021-10-01 14:26:30 -05:00
Dan Gohman	e5ebef1b94	Use `empty()` instead of `NONE` with rsix flags types. `empty()` is provided by all `bitflags` types, so it's more idiomatic than having `NONE` values.	2021-09-30 08:14:13 -07:00
Dan Gohman	47490b4383	Use rsix to make system calls in Wasmtime. (#3355 ) * Use rsix to make system calls in Wasmtime. `rsix` is a system call wrapper crate that we use in `wasi-common`, which can provide the following advantages in the rest of Wasmtime: - It eliminates some `unsafe` blocks in Wasmtime's code. There's still an `unsafe` block in the library, but this way, the `unsafe` is factored out and clearly scoped. - And, it makes error handling more consistent, factoring out code for checking return values and `io::Error::last_os_error()`, and code that does `errno::set_errno(0)`. This doesn't cover all system calls; `rsix` doesn't implement signal-handling APIs, and this doesn't cover calls made through `std` or crates like `userfaultfd`, `rand`, and `region`.	2021-09-17 15:28:56 -07:00
Alex Crichton	a237e73b5a	Remove some allocations in `CodeMemory` (#3253 ) * Remove some allocations in `CodeMemory` This commit removes the `FinishedFunctions` type as well as allocations associated with trampolines when allocating inside of a `CodeMemory`. The main goal of this commit is to improve the time spent in `CodeMemory` where currently today a good portion of time is spent simply parsing symbol names and trying to extract function indices from them. Instead this commit implements a new strategy (different from #3236) where compilation records offset/length information for all functions/trampolines so this doesn't need to be re-learned from the object file later. A consequence of this commit is that this offset information will be decoded/encoded through `bincode` unconditionally, but we can also optimize that later if necessary as well. Internally this involved quite a bit of refactoring since the previous map for `FinishedFunctions` was relatively heavily relied upon. * comments	2021-08-30 10:35:17 -05:00
Alex Crichton	a662f5361d	Move wasm data sections out of wasmtime_environ::Module (#3231 ) * Reduce indentation in `to_paged` Use a few early-returns from `match` to avoid lots of extra indentation. * Move wasm data sections out of `wasmtime_environ::Module` This is the first step down the road of #3230. The long-term goal is that `Module` is always `bincode`-decoded, but wasm data segments are a possibly very-large portion of this residing in modules which we don't want to shove through bincode. This refactors the internals of wasmtime to be ok with this data living separately from the `Module` itself, providing access at necessary locations. Wasm data segments are now extracted from a wasm module and concatenated directly. Data sections then describe ranges within this concatenated list of data, and passive data works the same way. This implementation does not lend itself to eventually optimizing the case where passive data is dropped and no longer needed. That's left for a future PR.	2021-08-24 14:04:03 -05:00
Alex Crichton	f5041dd362	Implement a setting for reserved dynamic memory growth (#3215 ) * Implement a setting for reserved dynamic memory growth Dynamic memories aren't really that heavily used in Wasmtime right now because for most 32-bit memories they're classified as "static" which means they reserve 4gb of address space and never move. Growth of a static memory is simply making pages accessible, so it's quite fast. With the memory64 feature, however, this is no longer true since all memory64 memories are classified as "dynamic" at this time. Previous to this commit growth of a dynamic memory unconditionally moved the entire linear memory in the host's address space, always resulting in a new `Mmap` allocation. This behavior is causing fuzzers to time out when working with 64-bit memories because incrementally growing a memory by 1 page at a time can incur a quadratic time complexity as bytes are constantly moved. This commit implements a scheme where there is now a tunable setting for memory to be reserved at the end of a dynamic memory to grow into. This means that dynamic memory growth is ideally amortized as most calls to `memory.grow` will be able to grow into the pre-reserved space. Some calls, though, will still need to copy the memory around. This helps enable a commented out test for 64-bit memories now that it's fast enough to run in debug mode. This is because the growth of memory in the test no longer needs to copy 4gb of zeros. * Test fixes & review comments * More comments	2021-08-20 10:54:23 -05:00
Alex Crichton	87c33c2969	Remove `wasmtime-environ`'s dependency on `cranelift-codegen` (#3199 ) * Move `CompiledFunction` into wasmtime-cranelift This commit moves the `wasmtime_environ::CompiledFunction` type into the `wasmtime-cranelift` crate. This type has lots of Cranelift-specific pieces of compilation and doesn't need to be generated by all Wasmtime compilers. This replaces the usage in the `Compiler` trait with a `Box<Any>` type that each compiler can select. Each compiler must still produce a `FunctionInfo`, however, which is shared information we'll deserialize for each module. The `wasmtime-debug` crate is also folded into the `wasmtime-cranelift` crate as a result of this commit. One possibility was to move the `CompiledFunction` commit into its own crate and have `wasmtime-debug` depend on that, but since `wasmtime-debug` is Cranelift-specific at this time it didn't seem like it was too too necessary to keep it separate. If `wasmtime-debug` supports other backends in the future we can recreate a new crate, perhaps with it refactored to not depend on Cranelift. * Move wasmtime_environ::reference_type This now belongs in wasmtime-cranelift and nowhere else * Remove `Type` reexport in wasmtime-environ One less dependency on `cranelift-codegen`! * Remove `types` reexport from `wasmtime-environ` Less cranelift! * Remove `SourceLoc` from wasmtime-environ Change the `srcloc`, `start_srcloc`, and `end_srcloc` fields to a custom `FilePos` type instead of `ir::SourceLoc`. These are only used in a few places so there's not much to lose from an extra abstraction for these leaf use cases outside of cranelift. * Remove wasmtime-environ's dep on cranelift's `StackMap` This commit "clones" the `StackMap` data structure in to `wasmtime-environ` to have an independent representation that that chosen by Cranelift. This allows Wasmtime to decouple this runtime dependency of stack map information and let the two evolve independently, if necessary. An alternative would be to refactor cranelift's implementation into a separate crate and have wasmtime depend on that but it seemed a bit like overkill to do so and easier to clone just a few lines for this. * Define code offsets in wasmtime-environ with `u32` Don't use Cranelift's `binemit::CodeOffset` alias to define this field type since the `wasmtime-environ` crate will be losing the `cranelift-codegen` dependency soon. * Commit to using `cranelift-entity` in Wasmtime This commit removes the reexport of `cranelift-entity` from the `wasmtime-environ` crate and instead directly depends on the `cranelift-entity` crate in all referencing crates. The original reason for the reexport was to make cranelift version bumps easier since it's less versions to change, but nowadays we have a script to do that. Otherwise this encourages crates to use whatever they want from `cranelift-entity` since we'll always depend on the whole crate. It's expected that the `cranelift-entity` crate will continue to be a lean crate in dependencies and suitable for use at both runtime and compile time. Consequently there's no need to avoid its usage in Wasmtime at runtime, since "remove Cranelift at compile time" is primarily about the `cranelift-codegen` crate. * Remove most uses of `cranelift-codegen` in `wasmtime-environ` There's only one final use remaining, which is the reexport of `TrapCode`, which will get handled later. * Limit the glob-reexport of `cranelift_wasm` This commit removes the glob reexport of `cranelift-wasm` from the `wasmtime-environ` crate. This is intended to explicitly define what we're reexporting and is a transitionary step to curtail the amount of dependencies taken on `cranelift-wasm` throughout the codebase. For example some functions used by debuginfo mapping are better imported directly from the crate since they're Cranelift-specific. Note that this is intended to be a temporary state affairs, soon this reexport will be gone entirely. Additionally this commit reduces imports from `cranelift_wasm` and also primarily imports from `crate::wasm` within `wasmtime-environ` to get a better sense of what's imported from where and what will need to be shared. * Extract types from cranelift-wasm to cranelift-wasm-types This commit creates a new crate called `cranelift-wasm-types` and extracts type definitions from the `cranelift-wasm` crate into this new crate. The purpose of this crate is to be a shared definition of wasm types that can be shared both by compilers (like Cranelift) as well as wasm runtimes (e.g. Wasmtime). This new `cranelift-wasm-types` crate doesn't depend on `cranelift-codegen` and is the final step in severing the unconditional dependency from Wasmtime to `cranelift-codegen`. The final refactoring in this commit is to then reexport this crate from `wasmtime-environ`, delete the `cranelift-codegen` dependency, and then update all `use` paths to point to these new types. The main change of substance here is that the `TrapCode` enum is mirrored from Cranelift into this `cranelift-wasm-types` crate. While this unfortunately results in three definitions (one more which is non-exhaustive in Wasmtime itself) it's hopefully not too onerous and ideally something we can patch up in the future. * Get lightbeam compiling * Remove unnecessary dependency * Fix compile with uffd * Update publish script * Fix more uffd tests * Rename cranelift-wasm-types to wasmtime-types This reflects the purpose a bit more where it's types specifically intended for Wasmtime and its support. * Fix publish script	2021-08-18 13:14:52 -05:00
Alex Crichton	0313e30d76	Remove dependency on `TargetIsa` from Wasmtime crates (#3178 ) This commit started off by deleting the `cranelift_codegen::settings` reexport in the `wasmtime-environ` crate and then basically played whack-a-mole until everything compiled again. The main result of this is that the `wasmtime-` family of crates have generally less of a dependency on the `TargetIsa` trait and type from Cranelift. While the dependency isn't entirely severed yet this is at least a significant start. This commit is intended to be largely refactorings, no functional changes are intended here. The refactorings are: A `CompilerBuilder` trait has been added to `wasmtime_environ` which server as an abstraction used to create compilers and configure them in a uniform fashion. The `wasmtime::Config` type now uses this instead of cranelift-specific settings. The `wasmtime-jit` crate exports the ability to create a compiler builder from a `CompilationStrategy`, which only works for Cranelift right now. In a cranelift-less build of Wasmtime this is expected to return a trait object that fails all requests to compile. * The `Compiler` trait in the `wasmtime_environ` crate has been souped up with a number of methods that Wasmtime and other crates needed. * The `wasmtime-debug` crate is now moved entirely behind the `wasmtime-cranelift` crate. * The `wasmtime-cranelift` crate is now only depended on by the `wasmtime-jit` crate. * Wasm types in `cranelift-wasm` no longer contain their IR type, instead they only contain the `WasmType`. This is required to get everything to align correctly but will also be required in a future refactoring where the types used by `cranelift-wasm` will be extracted to a separate crate. * I moved around a fair bit of code in `wasmtime-cranelift`. * Some gdb-specific jit-specific code has moved from `wasmtime-debug` to `wasmtime-jit`.	2021-08-16 09:55:39 -05:00
Alex Crichton	e68aa99588	Implement the memory64 proposal in Wasmtime (#3153 ) * Implement the memory64 proposal in Wasmtime This commit implements the WebAssembly [memory64 proposal][proposal] in both Wasmtime and Cranelift. In terms of work done Cranelift ended up needing very little work here since most of it was already prepared for 64-bit memories at one point or another. Most of the work in Wasmtime is largely refactoring, changing a bunch of `u32` values to something else. A number of internal and public interfaces are changing as a result of this commit, for example: * Acessors on `wasmtime::Memory` that work with pages now all return `u64` unconditionally rather than `u32`. This makes it possible to accommodate 64-bit memories with this API, but we may also want to consider `usize` here at some point since the host can't grow past `usize`-limited pages anyway. * The `wasmtime::Limits` structure is removed in favor of minimum/maximum methods on table/memory types. * Many libcall intrinsics called by jit code now unconditionally take `u64` arguments instead of `u32`. Return values are `usize`, however, since the return value, if successful, is always bounded by host memory while arguments can come from any guest. * The `heap_addr` clif instruction now takes a 64-bit offset argument instead of a 32-bit one. It turns out that the legalization of `heap_addr` already worked with 64-bit offsets, so this change was fairly trivial to make. * The runtime implementation of mmap-based linear memories has changed to largely work in `usize` quantities in its API and in bytes instead of pages. This simplifies various aspects and reflects that mmap-memories are always bound by `usize` since that's what the host is using to address things, and additionally most calculations care about bytes rather than pages except for the very edge where we're going to/from wasm. Overall I've tried to minimize the amount of `as` casts as possible, using checked `try_from` and checked arithemtic with either error handling or explicit `unwrap()` calls to tell us about bugs in the future. Most locations have relatively obvious things to do with various implications on various hosts, and I think they should all be roughly of the right shape but time will tell. I mostly relied on the compiler complaining that various types weren't aligned to figure out type-casting, and I manually audited some of the more obvious locations. I suspect we have a number of hidden locations that will panic on 32-bit hosts if 64-bit modules try to run there, but otherwise I think we should be generally ok (famous last words). In any case I wouldn't want to enable this by default naturally until we've fuzzed it for some time. In terms of the actual underlying implementation, no one should expect memory64 to be all that fast. Right now it's implemented with "dynamic" heaps which have a few consequences: * All memory accesses are bounds-checked. I'm not sure how aggressively Cranelift tries to optimize out bounds checks, but I suspect not a ton since we haven't stressed this much historically. * Heaps are always precisely sized. This means that every call to `memory.grow` will incur a `memcpy` of memory from the old heap to the new. We probably want to at least look into `mremap` on Linux and otherwise try to implement schemes where dynamic heaps have some reserved pages to grow into to help amortize the cost of `memory.grow`. The memory64 spec test suite is scheduled to now run on CI, but as with all the other spec test suites it's really not all that comprehensive. I've tried adding more tests for basic things as I've had to implement guards for them, but I wouldn't really consider the testing adequate from just this PR itself. I did try to take care in one test to actually allocate a 4gb+ heap and then avoid running that in the pooling allocator or in emulation because otherwise that may fail or take excessively long. [proposal]: https://github.com/WebAssembly/memory64/blob/master/proposals/memory64/Overview.md * Fix some tests * More test fixes * Fix wasmtime tests * Fix doctests * Revert to 32-bit immediate offsets in `heap_addr` This commit updates the generation of addresses in wasm code to always use 32-bit offsets for `heap_addr`, and if the calculated offset is bigger than 32-bits we emit a manual add with an overflow check. * Disable memory64 for spectest fuzzing * Fix wrong offset being added to heap addr * More comments! * Clarify bytes/pages	2021-08-12 09:40:20 -05:00
Alex Crichton	63a3bbbf5a	Change VMMemoryDefinition::current_length to `usize` (#3134 ) * Change VMMemoryDefinition::current_length to `usize` This commit changes the definition of `VMMemoryDefinition::current_length` to `usize` from its previous definition of `u32`. This is a pretty impactful change because it also changes the cranelift semantics of "dynamic" heaps where the bound global value specifier must now match the pointer type for the platform rather than the index type for the heap. The motivation for this change is that the `current_length` field (or bound for the heap) is intended to reflect the current size of the heap. This is bound by `usize` on the host platform rather than `u32` or` u64`. The previous choice of `u32` couldn't represent a 4GB memory because we couldn't put a number representing 4GB into the `current_length` field. By using `usize`, which reflects the host's memory allocation, this should better reflect the size of the heap and allows Wasmtime to support a full 4GB heap for a wasm program (instead of 4GB minus one page). This commit also updates the legalization of the `heap_addr` clif instruction to appropriately cast the address to the platform's pointer type, handling bounds checks along the way. The practical impact for today's targets is that a `uextend` is happening sooner than it happened before, but otherwise there is no intended impact of this change. In the future when 64-bit memories are supported there will likely need to be fancier logic which handles offsets a bit differently (especially in the case of a 64-bit memory on a 32-bit host). The clif `filetest` changes should show the differences in codegen, and the Wasmtime changes are largely removing casts here and there. Closes #3022 * Add tests for memory.size at maximum memory size * Add a dfg helper method	2021-08-02 13:09:40 -05:00
Alex Crichton	992d85ae8b	Add a type parameter to `VMOffsets` for pointer size (#3020 ) * Add a type parameter to `VMOffsets` for pointer size This commit adds a type parameter to `VMOffsets` representing the pointer size to improve computations in `wasmtime-runtime` which always use a constant value of the host's pointer size. The type parameter is `u8` for `wasmtime-cranelift`'s use case where cross-compilation may be involved. * fix lightbeam	2021-07-13 09:52:27 -05:00
Alex Crichton	324d80729a	Refactor some internal accessors of Instance (#3021 ) This commit removes some one-use methods to inline them at their use site, and otherwise adds bounds checks to other functions like `imported_function` where previously the `FuncIndex` may have been accidentally out of bounds, which would cause memory unsafety. There's no actual bug this was fixing, just trying to improve the safety of the code internally a little.	2021-06-23 12:12:38 -05:00
Ulrich Weigand	83007b79e3	Fix access to VMMemoryDefinition::current_length on big-endian (#3013 ) The current_length member is defined as "usize" in Rust code, but generated wasm code refers to it as if it were "u32". While this happens to mostly work on little-endian machines (as long as the length is < 4GB), it will always fail on big-endian machines. Fixed by making current_length "u32" in Rust as well, and ensuring the actual memory size is always less than 4GB.	2021-06-23 11:45:32 -05:00
Anton Kirilov	cb93726250	Enable more tests on AArch64 (#2994 ) Copyright (c) 2021, Arm Limited.	2021-06-21 12:26:44 -05:00

1 2

87 Commits