wasmtime

Author	SHA1	Message	Date
Alex Crichton	601e8f3094	Remove dependency on the `region` crate (#4407 ) This commit removes Wasmtime's dependency on the `region` crate. The motivation for this came about when I was updating dependencies and saw that `region` had a new major version at 3.0.0 as opposed to our currently used 2.3 track. In reviewing the use cases of `region` within Wasmtime I found two trends in particular which motivated this commit: * Some unix-specific areas of `wasmtime_runtime` use `rustix::mm::mprotect` instead of `region::protect` already. This means that the usage of `region::protect` for changing virtual memory protections was already inconsistent. * Many uses of `region::protect` were already in unix-specific regions which could make use of `rustix`. Overall I opted to remove the dependency on the `region` crate to avoid chasing its versions over time. Unix-specific changes of protections were easily changed to `rustix::mm::mprotect`. There were two locations where a windows/unix split is now required and I subjectively ruled "that seems ok". Finally removing `region` also meant that the "what is the current page size" query needed to be inlined into `wasmtime_runtime`, which I have also subjectively ruled "that seems fine". Finally one final refactoring here was that the `unix.rs` and `linux.rs` split for the pooling allocator was merged. These two files already only differed in one function so I slapped a `cfg_if!` in there to help reduce the duplication.	2022-07-07 21:28:25 +00:00
Alex Crichton	77e06213b7	Refactor the internals of traps in `wasmtime_runtime` (#4326 ) This commit is a small refactoring of `wasmtime_runtime::Trap` and various internals. The `Trap` structure is now a reason plus backtrace, and the old `Trap` enum is mostly in `TrapReason` now. Additionally all `Trap`-returning methods of `wasmtime_runtime` are changed to returning a `TrapCode` to indicate that they never capture a backtrace. Finally the `UnwindReason` internally now no longer duplicates the trap reasons, instead only having two variants of "panic" and "trap". The motivation for this commit is mostly just cleaning up trap internals and removing the need for methods like `wasmtime_runtime::Trap::insert_backtrace` to leave it only happening at the `wasmtime` layer.	2022-06-27 12:35:14 -05:00
Andrew Brown	2b52f47b83	Add shared memories (#4187 ) * Add shared memories This change adds the ability to use shared memories in Wasmtime when the [threads proposal] is enabled. Shared memories are annotated as `shared` in the WebAssembly syntax, e.g., `(memory 1 1 shared)`, and are protected from concurrent access during `memory.size` and `memory.grow`. [threads proposal]: https://github.com/WebAssembly/threads/blob/master/proposals/threads/Overview.md In order to implement this in Wasmtime, there are two main cases to cover: - a program may simply create a shared memory and possibly export it; this means that Wasmtime itself must be able to create shared memories - a user may create a shared memory externally and pass it in as an import during instantiation; this is the case when the program contains code like `(import "env" "memory" (memory 1 1 shared))`--this case is handled by a new Wasmtime API type--`SharedMemory` Because of the first case, this change allows any of the current memory-creation mechanisms to work as-is. Wasmtime can still create either static or dynamic memories in either on-demand or pooling modes, and any of these memories can be considered shared. When shared, the `Memory` runtime container will lock appropriately during `memory.size` and `memory.grow` operations; since all memories use this container, it is an ideal place for implementing the locking once and once only. The second case is covered by the new `SharedMemory` structure. It uses the same `Mmap` allocation under the hood as non-shared memories, but allows the user to perform the allocation externally to Wasmtime and share the memory across threads (via an `Arc`). The pointer address to the actual memory is carefully wired through and owned by the `SharedMemory` structure itself. This means that there are differing views of where to access the pointer (i.e., `VMMemoryDefinition`): for owned memories (the default), the `VMMemoryDefinition` is stored directly by the `VMContext`; in the `SharedMemory` case, however, this `VMContext` must point to this separate structure. To ensure that the `VMContext` can always point to the correct `VMMemoryDefinition`, this change alters the `VMContext` structure. Since a `SharedMemory` owns its own `VMMemoryDefinition`, the `defined_memories` table in the `VMContext` becomes a sequence of pointers--in the shared memory case, they point to the `VMMemoryDefinition` owned by the `SharedMemory` and in the owned memory case (i.e., not shared) they point to `VMMemoryDefinition`s stored in a new table, `owned_memories`. This change adds an additional indirection (through the `mut VMMemoryDefinition` pointer) that could add overhead. Using an imported memory as a proxy, we measured a 1-3% overhead of this approach on the `pulldown-cmark` benchmark. To avoid this, Cranelift-generated code will special-case the owned memory access (i.e., load a pointer directly to the `owned_memories` entry) for `memory.size` so that only shared memories (and imported memories, as before) incur the indirection cost. review: remove thread feature check * review: swap wasmtime-types dependency for existing wasmtime-environ use * review: remove unused VMMemoryUnion * review: reword cross-engine error message * review: improve tests * review: refactor to separate prevent Memory <-> SharedMemory conversion * review: into_shared_memory -> as_shared_memory * review: remove commented out code * review: limit shared min/max to 32 bits * review: skip imported memories * review: imported memories are not owned * review: remove TODO * review: document unsafe send + sync * review: add limiter assertion * review: remove TODO * review: improve tests * review: fix doc test * fix: fixes based on discussion with Alex This changes several key parts: - adds memory indexes to imports and exports - makes `VMMemoryDefinition::current_length` an atomic usize * review: add `Extern::SharedMemory` * review: remove TODO * review: atomically load from VMMemoryDescription in JIT-generated code * review: add test probing the last available memory slot across threads * fix: move assertion to new location due to rebase * fix: doc link * fix: add TODOs to c-api * fix: broken doc link * fix: modify pooling allocator messages in tests * review: make owned_memory_index panic instead of returning an option * review: clarify calculation of num_owned_memories * review: move 'use' to top of file * review: change 'const [u8]' to 'mut [u8]' * review: remove TODO * review: avoid hard-coding memory index * review: remove 'preallocation' parameter from 'Memory::_new' * fix: component model memory length * review: check that shared memory plans are static * review: ignore growth limits for shared memory * review: improve atomic store comment * review: add FIXME for memory growth failure * review: add comment about absence of bounds-checked 'memory.size' * review: make 'current_length()' doc comment more precise * review: more comments related to memory.size non-determinism * review: make 'vmmemory' unreachable for shared memory * review: move code around * review: thread plan through to 'wrap()' * review: disallow shared memory allocation with the pooling allocator	2022-06-08 12:13:40 -05:00
Alex Crichton	2af358dd9c	Add a `VMComponentContext` type and create it on instantiation (#4215 ) * Add a `VMComponentContext` type and create it on instantiation This commit fills out the `wasmtime-runtime` crate's support for `VMComponentContext` and creates it as part of the instantiation process. This moves a few maps that were temporarily allocated in an `InstanceData` into the `VMComponentContext` and additionally reads the canonical options data from there instead. This type still won't be used in its "full glory" until the lowering of host functions is completely implemented, however, which will be coming in a future commit. * Remove `DerefMut` implementation * Rebase conflicts	2022-06-03 13:34:50 -05:00
Alex Crichton	2a4851ad2b	Change some `VMContext` pointers to `()` pointers (#4190 ) * Change some `VMContext` pointers to `()` pointers This commit is motivated by my work on the component model implementation for imported functions. Currently all context pointers in wasm are `mut VMContext` but with the component model my plan is to make some pointers instead along the lines of `mut VMComponentContext`. In doing this though one worry I have is breaking what has otherwise been a core invariant of Wasmtime for quite some time, subtly introducing bugs by accident. To help assuage my worry I've opted here to erase knowledge of `mut VMContext` where possible. Instead where applicable a context pointer is simply known as `mut ()` and the embedder doesn't actually know anything about this context beyond the value of the pointer. This will help prevent Wasmtime from accidentally ever trying to interpret this context pointer as an actual `VMContext` when it might instead be a `VMComponentContext`. Overall this was a pretty smooth transition. The main change here is that the `VMTrampoline` (now sporting more docs) has its first argument changed to `mut ()`. The second argument, the caller context, is still configured as `mut VMContext` though because all functions are always called from wasm still. Eventually for component-to-component calls I think we'll probably "fake" the second argument as the same as the first argument, losing track of the original caller, as an intentional way of isolating components from each other. Along the way there are a few host locations which do actually assume that the first argument is indeed a `VMContext`. These are valid assumptions that are upheld from a correct implementation, but I opted to add a "magic" field to `VMContext` to assert this in debug mode. This new "magic" field is inintialized during normal vmcontext initialization and it's checked whenever a `VMContext` is reinterpreted as an `Instance` (but only in debug mode). My hope here is to catch any future accidental mistakes, if ever. * Use a VMOpaqueContext wrapper * Fix typos	2022-06-01 11:00:43 -05:00
Alex Crichton	3f9bff17c8	Support disabling backtraces at compile time (#3932 ) * Support disabling backtraces at compile time This commit adds support to Wasmtime to disable, at compile time, the gathering of backtraces on traps. The `wasmtime` crate now sports a `wasm-backtrace` feature which, when disabled, will mean that backtraces are never collected at compile time nor are unwinding tables inserted into compiled objects. The motivation for this commit stems from the fact that generating a backtrace is quite a slow operation. Currently backtrace generation is done with libunwind and `_Unwind_Backtrace` typically found in glibc or other system libraries. When thousands of modules are loaded into the same process though this means that the initial backtrace can take nearly half a second and all subsequent backtraces can take upwards of hundreds of milliseconds. Relative to all other operations in Wasmtime this is extremely expensive at this time. In the future we'd like to implement a more performant backtrace scheme but such an implementation would require coordination with Cranelift and is a big chunk of work that may take some time, so in the meantime if embedders don't need a backtrace they can still use this option to disable backtraces at compile time and avoid the performance pitfalls of collecting backtraces. In general I tried to originally make this a runtime configuration option but ended up opting for a compile-time option because `Trap::new` otherwise has no arguments and always captures a backtrace. By making this a compile-time option it was possible to configure, statically, the behavior of `Trap::new`. Additionally I also tried to minimize the amount of `#[cfg]` necessary by largely only having it at the producer and consumer sites. Also a noteworthy restriction of this implementation is that if backtrace support is disabled at compile time then reference types support will be unconditionally disabled at runtime. With backtrace support disabled there's no way to trace the stack of wasm frames which means that GC can't happen given our current implementation. * Always enable backtraces for the C API	2022-03-16 09:18:16 -05:00
Alex Crichton	c22033bf93	Delete historical interruptable support in Wasmtime (#3925 ) * Delete historical interruptable support in Wasmtime This commit removes the `Config::interruptable` configuration along with the `InterruptHandle` type from the `wasmtime` crate. The original support for adding interruption to WebAssembly was added pretty early on in the history of Wasmtime when there was no other method to prevent an infinite loop from the host. Nowadays, however, there are alternative methods for interruption such as fuel or epoch-based interruption. One of the major downsides of `Config::interruptable` is that even when it's not enabled it forces an atomic swap to happen when entering WebAssembly code. This technically could be a non-atomic swap if the configuration option isn't enabled but that produces even more branch-y code on entry into WebAssembly which is already something we try to optimize. Calling into WebAssembly is on the order of a dozens of nanoseconds at this time and an atomic swap, even uncontended, can add up to 5ns on some platforms. The main goal of this PR is to remove this atomic swap on entry into WebAssembly. This is done by removing the `Config::interruptable` field entirely, moving all existing consumers to epochs instead which are suitable for the same purposes. This means that the stack overflow check is no longer entangled with the interruption check and perhaps one day we could continue to optimize that further as well. Some consequences of this change are: * Epochs are now the only method of remote-thread interruption. * There are no more Wasmtime traps that produces the `Interrupted` trap code, although we may wish to move future traps to this so I left it in place. * The C API support for interrupt handles was also removed and bindings for epoch methods were added. * Function-entry checks for interruption are a tiny bit less efficient since one check is performed for the stack limit and a second is performed for the epoch as opposed to the `Config::interruptable` style of bundling the stack limit and the interrupt check in one. It's expected though that this is likely to not really be measurable. * The old `VMInterrupts` structure is renamed to `VMRuntimeLimits`.	2022-03-14 15:25:11 -05:00
Alex Crichton	15bb0c6903	Remove the `ModuleLimits` pooling configuration structure (#3837 ) * Remove the `ModuleLimits` pooling configuration structure This commit is an attempt to improve the usability of the pooling allocator by removing the need to configure a `ModuleLimits` structure. Internally this structure has limits on all forms of wasm constructs but this largely bottoms out in the size of an allocation for an instance in the instance pooling allocator. Maintaining this list of limits can be cumbersome as modules may get tweaked over time and there's otherwise no real reason to limit the number of globals in a module since the main goal is to limit the memory consumption of a `VMContext` which can be done with a memory allocation limit rather than fine-tuned control over each maximum and minimum. The new approach taken in this commit is to remove `ModuleLimits`. Some fields, such as `tables`, `table_elements` , `memories`, and `memory_pages` are moved to `InstanceLimits` since they're still enforced at runtime. A new field `size` is added to `InstanceLimits` which indicates, in bytes, the maximum size of the `VMContext` allocation. If the size of a `VMContext` for a module exceeds this value then instantiation will fail. This involved adding a few more checks to `{Table, Memory}::new_static` to ensure that the minimum size is able to fit in the allocation, since previously modules were validated at compile time of the module that everything fit and that validation no longer happens (it happens at runtime). A consequence of this commit is that Wasmtime will have no built-in way to reject modules at compile time if they'll fail to be instantiated within a particular pooling allocator configuration. Instead a module must attempt instantiation see if a failure happens. * Fix benchmark compiles * Fix some doc links * Fix a panic by ensuring modules have limited tables/memories * Review comments * Add back validation at `Module` time instantiation is possible This allows for getting an early signal at compile time that a module will never be instantiable in an engine with matching settings. * Provide a better error message when sizes are exceeded Improve the error message when an instance size exceeds the maximum by providing a breakdown of where the bytes are all going and why the large size is being requested. * Try to fix test in qemu * Flag new test as 64-bit only Sizes are all specific to 64-bit right now	2022-02-25 09:11:51 -06:00
Alex Crichton	bbd4a4a500	Enable copy-on-write heap initialization by default (#3825 ) * Enable copy-on-write heap initialization by default This commit enables the `Config::memfd` feature by default now that it's been fuzzed for a few weeks on oss-fuzz, and will continue to be fuzzed leading up to the next release of Wasmtime in early March. The documentation of the `Config` option has been updated as well as adding a CLI flag to disable the feature. * Remove ubiquitous "memfd" terminology Switch instead to forms of "memory image" or "cow" or some combination thereof. * Update new option names	2022-02-22 17:12:18 -06:00
bjorn3	4ed353a7e1	Extract jit_int.rs and most of jitdump_linux.rs for use outside of wasmtime (#2744 ) * Extract gdb jit_int into wasmtime-jit-debug * Move a big chunk of the jitdump code to wasmtime-jit-debug * Fix doc markdown in perf_jitdump.rs	2022-02-22 09:23:44 -08:00
Alex Crichton	b438617e12	Further minor optimizations to instantiation (#3791 ) * Shrink the size of `FuncData` Before this commit on a 64-bit system the `FuncData` type had a size of 88 bytes and after this commit it has a size of 32 bytes. A `FuncData` is required for all host functions in a store, including those inserted from a `Linker` into a store used during linking. This means that instantiation ends up creating a nontrivial number of these types and pushing them into the store. Looking at some profiles there were some surprisingly expensive movements of `FuncData` from the stack to a vector for moves-by-value generated by Rust. Shrinking this type enables more efficient code to be generated and additionally means less storage is needed in a store's function array. For instantiating the spidermonkey and rustpython modules this improves instantiation by 10% since they each import a fair number of host functions and the speedup here is relative to the number of items imported. * Use `ptr::copy_nonoverlapping` during initialization Prevoiusly `ptr::copy` was used for copying imports into place which translates to `memmove`, but `ptr::copy_nonoverlapping` can be used here since it's statically known these areas don't overlap. While this doesn't end up having a performance difference it's something I kept noticing while looking at the disassembly of `initialize_vmcontext` so I figured I'd go ahead and implement. * Indirect shared signature ids in the VMContext This commit is a small improvement for the instantiation time of modules by avoiding copying a list of `VMSharedSignatureIndex` entries into each `VMContext`, instead building one inside of a module and sharing that amongst all instances. This involves less lookups at instantiation time and less movement of data during instantiation. The downside is that type-checks on `call_indirect` now involve an additionally load, but I'm assuming that these are somewhat pessimized enough as-is that the runtime impact won't be much there. For instantiation performance this is a 5-10% win with rustpyhon/spidermonky instantiation. This should also reduce the size of each `VMContext` for an instantiation since signatures are no longer stored inline but shared amongst all instances with one module. Note that one subtle change here is that the array of `VMSharedSignatureIndex` was previously indexed by `TypeIndex`, and now it's indexed by `SignaturedIndex` which is a deduplicated form of `TypeIndex`. This is done because we already had a list of those lying around in `Module`, so it was easier to reuse that than to build a separate array and store it somewhere. * Reserve space in `Store<T>` with `InstancePre` This commit updates the instantiation process to reserve space in a `Store<T>` for the functions that an `InstancePre<T>`, as part of instantiation, will insert into it. Using an `InstancePre<T>` to instantiate allows pre-computing the number of host functions that will be inserted into a store, and by pre-reserving space we can avoid costly reallocations during instantiation by ensuring the function vector has enough space to fit everything during the instantiation process. Overall this makes instantiation of rustpython/spidermonkey about 8% faster locally. * Fix tests * Use checked arithmetic	2022-02-11 09:55:08 -06:00
Alex Crichton	c0c368d151	Use mmap'd `.cwasm` as a source for memory initialization images (#3787 ) Skip memfd creation with precompiled modules This commit updates the memfd support internally to not actually use a memfd if a compiled module originally came from disk via the `wasmtime::Module::deserialize_file` API. In this situation we already have a file descriptor open and there's no need to copy a module's heap image to a new file descriptor. To facilitate a new source of `mmap` the currently-memfd-specific-logic of creating a heap image is generalized to a new form of `MemoryInitialization` which is attempted for all modules at module-compile-time. This means that the serialized artifact to disk will have the memory image in its entirety waiting for us. Furthermore the memory image is ensured to be padded and aligned carefully to the target system's page size, notably meaning that the data section in the final object file is page-aligned and the size of the data section is also page aligned. This means that when a precompiled module is mapped from disk we can reuse the underlying `File` to mmap all initial memory images. This means that the offset-within-the-memory-mapped-file can differ for memfd-vs-not, but that's just another piece of state to track in the memfd implementation. In the limit this waters down the term "memfd" for this technique of quickly initializing memory because we no longer use memfd unconditionally (only when the backing file isn't available). This does however open up an avenue in the future to porting this support to other OSes because while `memfd_create` is Linux-specific both macOS and Windows support mapping a file with copy-on-write. This porting isn't done in this PR and is left for a future refactoring. Closes #3758 * Enable "memfd" support on all unix systems Cordon off the Linux-specific bits and enable the memfd support to compile and run on platforms like macOS which have a Linux-like `mmap`. This only works if a module is mapped from a precompiled module file on disk, but that's better than not supporting it at all! * Fix linux compile * Use `Arc<File>` instead of `MmapVecFileBacking` * Use a named struct instead of mysterious tuples * Comment about unsafety in `Module::deserialize_file` * Fix tests * Fix uffd compile * Always align data segments No need to have conditional alignment since their sizes are all aligned anyway * Update comment in build.rs * Use rustix, not `region` * Fix some confusing logic/names around memory indexes These functions all work with memory indexes, not specifically defined memory indexes.	2022-02-10 15:40:40 -06:00
Chris Fallin	39a52ceb4f	Implement lazy funcref table and anyfunc initialization. (#3733 ) During instance initialization, we build two sorts of arrays eagerly: - We create an "anyfunc" (a `VMCallerCheckedAnyfunc`) for every function in an instance. - We initialize every element of a funcref table with an initializer to a pointer to one of these anyfuncs. Most instances will not touch (via call_indirect or table.get) all funcref table elements. And most anyfuncs will never be referenced, because most functions are never placed in tables or used with `ref.func`. Thus, both of these initialization tasks are quite wasteful. Profiling shows that a significant fraction of the remaining instance-initialization time after our other recent optimizations is going into these two tasks. This PR implements two basic ideas: - The anyfunc array can be lazily initialized as long as we retain the information needed to do so. For now, in this PR, we just recreate the anyfunc whenever a pointer is taken to it, because doing so is fast enough; in the future we could keep some state to know whether the anyfunc has been written yet and skip this work if redundant. This technique allows us to leave the anyfunc array as uninitialized memory, which can be a significant savings. Filling it with initialized anyfuncs is very expensive, but even zeroing it is expensive: e.g. in a large module, it can be >500KB. - A funcref table can be lazily initialized as long as we retain a link to its corresponding instance and function index for each element. A zero in a table element means "uninitialized", and a slowpath does the initialization. Funcref tables are a little tricky because funcrefs can be null. We need to distinguish "element was initially non-null, but user stored explicit null later" from "element never touched" (ie the lazy init should not blow away an explicitly stored null). We solve this by stealing the LSB from every funcref (anyfunc pointer): when the LSB is set, the funcref is initialized and we don't hit the lazy-init slowpath. We insert the bit on storing to the table and mask it off after loading. We do have to set up a precomputed array of `FuncIndex`s for the table in order for this to work. We do this as part of the module compilation. This PR also refactors the way that the runtime crate gains access to information computed during module compilation. Performance effect measured with in-tree benches/instantiation.rs, using SpiderMonkey built for WASI, and with memfd enabled: ``` BEFORE: sequential/default/spidermonkey.wasm time: [68.569 us 68.696 us 68.856 us] sequential/pooling/spidermonkey.wasm time: [69.406 us 69.435 us 69.465 us] parallel/default/spidermonkey.wasm: with 1 background thread time: [69.444 us 69.470 us 69.497 us] parallel/default/spidermonkey.wasm: with 16 background threads time: [183.72 us 184.31 us 184.89 us] parallel/pooling/spidermonkey.wasm: with 1 background thread time: [69.018 us 69.070 us 69.136 us] parallel/pooling/spidermonkey.wasm: with 16 background threads time: [326.81 us 337.32 us 347.01 us] WITH THIS PR: sequential/default/spidermonkey.wasm time: [6.7821 us 6.8096 us 6.8397 us] change: [-90.245% -90.193% -90.142%] (p = 0.00 < 0.05) Performance has improved. sequential/pooling/spidermonkey.wasm time: [3.0410 us 3.0558 us 3.0724 us] change: [-95.566% -95.552% -95.537%] (p = 0.00 < 0.05) Performance has improved. parallel/default/spidermonkey.wasm: with 1 background thread time: [7.2643 us 7.2689 us 7.2735 us] change: [-89.541% -89.533% -89.525%] (p = 0.00 < 0.05) Performance has improved. parallel/default/spidermonkey.wasm: with 16 background threads time: [147.36 us 148.99 us 150.74 us] change: [-18.997% -18.081% -17.285%] (p = 0.00 < 0.05) Performance has improved. parallel/pooling/spidermonkey.wasm: with 1 background thread time: [3.1009 us 3.1021 us 3.1033 us] change: [-95.517% -95.511% -95.506%] (p = 0.00 < 0.05) Performance has improved. parallel/pooling/spidermonkey.wasm: with 16 background threads time: [49.449 us 50.475 us 51.540 us] change: [-85.423% -84.964% -84.465%] (p = 0.00 < 0.05) Performance has improved. ``` So an improvement of something like 80-95% for a very large module (7420 functions in its one funcref table, 31928 functions total).	2022-02-09 13:56:53 -08:00
Alex Crichton	8ed79c8f57	memfd: Reduce some syscalls in the on-demand case (#3757 ) * memfd: Reduce some syscalls in the on-demand case This tweaks the internal organization of the `MemFdSlot` to avoid some syscalls in the default case as well as opportunistically in the pooling case. The two cases added here are: * A `MemFdSlot` is now created with a specified initial size. For pooling this is 0 but for the on-demand case this can be non-zero. * When `instantiate` is called with no prior image and the sizes match (as will be the case for on-demand allocation) then `mprotect` is skipped entirely. * In the `clear_and_remain-ready` case the `mprotect` is skipped if the heap wasn't grown at all. This should avoid ever using `mprotect` unnecessarily and makes the ranges we `mprotect` a bit smaller as well. * Review comments * Tweak allow to apply to whole crate	2022-02-02 16:09:47 -06:00
Chris Fallin	01e6bb81fb	Review feedback.	2022-02-01 15:49:44 -08:00
Chris Fallin	0ff8f6ab20	Make build-config magic use memfd by default.	2022-01-31 22:39:20 -08:00
Chris Fallin	982df2f2e5	Review feedback.	2022-01-31 16:40:14 -08:00
Chris Fallin	570dee63f3	Use MemFdSlot in the on-demand allocator as well.	2022-01-31 13:59:51 -08:00
Chris Fallin	b73ac83c37	Add a pooling allocator mode based on copy-on-write mappings of memfds. As first suggested by Jan on the Zulip here [1], a cheap and effective way to obtain copy-on-write semantics of a "backing image" for a Wasm memory is to mmap a file with `MAP_PRIVATE`. The `memfd` mechanism provided by the Linux kernel allows us to create anonymous, in-memory-only files that we can use for this mapping, so we can construct the image contents on-the-fly then effectively create a CoW overlay. Furthermore, and importantly, `madvise(MADV_DONTNEED, ...)` will discard the CoW overlay, returning the mapping to its original state. By itself this is almost enough for a very fast instantiation-termination loop of the same image over and over, without changing the address space mapping at all (which is expensive). The only missing bit is how to implement heap growth. But here memfds can help us again: if we create another anonymous file and map it where the extended parts of the heap would go, we can take advantage of the fact that a `mmap()` mapping can be larger than the file itself, with accesses beyond the end generating a `SIGBUS`, and the fact that we can cheaply resize the file with `ftruncate`, even after a mapping exists. So we can map the "heap extension" file once with the maximum memory-slot size and grow the memfd itself as `memory.grow` operations occur. The above CoW technique and heap-growth technique together allow us a fastpath of `madvise()` and `ftruncate()` only when we re-instantiate the same module over and over, as long as we can reuse the same slot. This fastpath avoids all whole-process address-space locks in the Linux kernel, which should mean it is highly scalable. It also avoids the cost of copying data on read, as the `uffd` heap backend does when servicing pagefaults; the kernel's own optimized CoW logic (same as used by all file mmaps) is used instead. [1] https://bytecodealliance.zulipchat.com/#narrow/stream/206238-general/topic/Copy.20on.20write.20based.20instance.20reuse/near/266657772	2022-01-31 12:53:18 -08:00
Chris Fallin	8a55b5c563	Add epoch-based interruption for cooperative async timeslicing. This PR introduces a new way of performing cooperative timeslicing that is intended to replace the "fuel" mechanism. The tradeoff is that this mechanism interrupts with less precision: not at deterministic points where fuel runs out, but rather when the Engine enters a new epoch. The generated code instrumentation is substantially faster, however, because it does not need to do as much work as when tracking fuel; it only loads the global "epoch counter" and does a compare-and-branch at backedges and function prologues. This change has been measured as ~twice as fast as fuel-based timeslicing for some workloads, especially control-flow-intensive workloads such as the SpiderMonkey JS interpreter on Wasm/WASI. The intended interface is that the embedder of the `Engine` performs an `engine.increment_epoch()` call periodically, e.g. once per millisecond. An async invocation of a Wasm guest on a `Store` can specify a number of epoch-ticks that are allowed before an async yield back to the executor's event loop. (The initial amount and automatic "refills" are configured on the `Store`, just as for fuel.) This call does only signal-safe work (it increments an `AtomicU64`) so could be invoked from a periodic signal, or from a thread that wakes up once per period.	2022-01-20 13:58:17 -08:00
Mrmaxmeier	2afd6900f4	runtime: expose DefaultMemoryCreator (#3670 )	2022-01-18 09:17:33 -06:00
Peter Huene	58aab85680	Add the `pooling-allocator` feature. This commit adds the `pooling-allocator` feature to both the `wasmtime` and `wasmtime-runtime` crates. The feature controls whether or not the pooling allocator implementation is built into the runtime and exposed as a supported instance allocation strategy in the wasmtime API. The feature is on by default for the `wasmtime` crate. Closes #3513.	2021-11-10 13:25:55 -08:00
Pat Hickey	a1301f8dae	add table_grow_failed	2021-10-21 15:07:40 -07:00
Pat Hickey	a5007f318f	runtime: use anyhow::Error instead of Box<dyn std::error::Error...>	2021-10-21 12:10:03 -07:00
Pat Hickey	147c8f8ed7	rename	2021-10-21 12:10:03 -07:00
Pat Hickey	18a355e092	give sychronous ResourceLimiter an async alternative	2021-10-21 12:10:03 -07:00
Alex Crichton	bfdbd10a13	Add `_unchecked` variants of `Func` APIs for the C API (#3350 ) Add `_unchecked` variants of `Func` APIs for the C API This commit is what is hopefully going to be my last installment within the saga of optimizing function calls in/out of WebAssembly modules in the C API. This is yet another alternative approach to #3345 (sorry) but also contains everything necessary to make the C API fast. As in #3345 the general idea is just moving checks out of the call path in the same style of `TypedFunc`. This new strategy takes inspiration from previously learned attempts effectively "just" exposes how we previously passed `mut u128` through trampolines for arguments/results. This storage format is formalized through a new `ValRaw` union that is exposed from the `wasmtime` crate. By doing this it made it relatively easy to expose two new APIs: * `Func::new_unchecked` * `Func::call_unchecked` These are the same as their checked equivalents except that they're `unsafe` and they work with `mut ValRaw` rather than safe slices of `Val`. Working with these eschews type checks and such and requires callers/embedders to do the right thing. These two new functions are then exposed via the C API with new functions, enabling C to have a fast-path of calling/defining functions. This fast path is akin to `Func::wrap` in Rust, although that API can't be built in C due to C not having generics in the same way that Rust has. For some benchmarks, the benchmarks here are: `nop` - Call a wasm function from the host that does nothing and returns nothing. * `i64` - Call a wasm function from the host, the wasm function calls a host function, and the host function returns an `i64` all the way out to the original caller. * `many` - Call a wasm function from the host, the wasm calls host function with 5 `i32` parameters, and then an `i64` result is returned back to the original host * `i64` host - just the overhead of the wasm calling the host, so the wasm calls the host function in a loop. * `many` host - same as `i64` host, but calling the `many` host function. All numbers in this table are in nanoseconds, and this is just one measurement as well so there's bound to be some variation in the precise numbers here. \| Name \| Rust \| C (before) \| C (after) \| \|-----------\|------\|------------\|-----------\| \| nop \| 19 \| 112 \| 25 \| \| i64 \| 22 \| 207 \| 32 \| \| many \| 27 \| 189 \| 34 \| \| i64 host \| 2 \| 38 \| 5 \| \| many host \| 7 \| 75 \| 8 \| The main conclusion here is that the C API is significantly faster than before when using the `_unchecked` variants of APIs. The Rust implementation is still the ceiling (or floor I guess?) for performance The main reason that C is slower than Rust is that a little bit more has to travel through memory where on the Rust side of things we can monomorphize and inline a bit more to get rid of that. Overall though the costs are way way down from where they were originally and I don't plan on doing a whole lot more myself at this time. There's various things we theoretically could do I've considered but implementation-wise I think they'll be much more weighty. Tweak `wasmtime_externref_t` API comments	2021-09-24 14:05:45 -05:00
Alex Crichton	87c33c2969	Remove `wasmtime-environ`'s dependency on `cranelift-codegen` (#3199 ) * Move `CompiledFunction` into wasmtime-cranelift This commit moves the `wasmtime_environ::CompiledFunction` type into the `wasmtime-cranelift` crate. This type has lots of Cranelift-specific pieces of compilation and doesn't need to be generated by all Wasmtime compilers. This replaces the usage in the `Compiler` trait with a `Box<Any>` type that each compiler can select. Each compiler must still produce a `FunctionInfo`, however, which is shared information we'll deserialize for each module. The `wasmtime-debug` crate is also folded into the `wasmtime-cranelift` crate as a result of this commit. One possibility was to move the `CompiledFunction` commit into its own crate and have `wasmtime-debug` depend on that, but since `wasmtime-debug` is Cranelift-specific at this time it didn't seem like it was too too necessary to keep it separate. If `wasmtime-debug` supports other backends in the future we can recreate a new crate, perhaps with it refactored to not depend on Cranelift. * Move wasmtime_environ::reference_type This now belongs in wasmtime-cranelift and nowhere else * Remove `Type` reexport in wasmtime-environ One less dependency on `cranelift-codegen`! * Remove `types` reexport from `wasmtime-environ` Less cranelift! * Remove `SourceLoc` from wasmtime-environ Change the `srcloc`, `start_srcloc`, and `end_srcloc` fields to a custom `FilePos` type instead of `ir::SourceLoc`. These are only used in a few places so there's not much to lose from an extra abstraction for these leaf use cases outside of cranelift. * Remove wasmtime-environ's dep on cranelift's `StackMap` This commit "clones" the `StackMap` data structure in to `wasmtime-environ` to have an independent representation that that chosen by Cranelift. This allows Wasmtime to decouple this runtime dependency of stack map information and let the two evolve independently, if necessary. An alternative would be to refactor cranelift's implementation into a separate crate and have wasmtime depend on that but it seemed a bit like overkill to do so and easier to clone just a few lines for this. * Define code offsets in wasmtime-environ with `u32` Don't use Cranelift's `binemit::CodeOffset` alias to define this field type since the `wasmtime-environ` crate will be losing the `cranelift-codegen` dependency soon. * Commit to using `cranelift-entity` in Wasmtime This commit removes the reexport of `cranelift-entity` from the `wasmtime-environ` crate and instead directly depends on the `cranelift-entity` crate in all referencing crates. The original reason for the reexport was to make cranelift version bumps easier since it's less versions to change, but nowadays we have a script to do that. Otherwise this encourages crates to use whatever they want from `cranelift-entity` since we'll always depend on the whole crate. It's expected that the `cranelift-entity` crate will continue to be a lean crate in dependencies and suitable for use at both runtime and compile time. Consequently there's no need to avoid its usage in Wasmtime at runtime, since "remove Cranelift at compile time" is primarily about the `cranelift-codegen` crate. * Remove most uses of `cranelift-codegen` in `wasmtime-environ` There's only one final use remaining, which is the reexport of `TrapCode`, which will get handled later. * Limit the glob-reexport of `cranelift_wasm` This commit removes the glob reexport of `cranelift-wasm` from the `wasmtime-environ` crate. This is intended to explicitly define what we're reexporting and is a transitionary step to curtail the amount of dependencies taken on `cranelift-wasm` throughout the codebase. For example some functions used by debuginfo mapping are better imported directly from the crate since they're Cranelift-specific. Note that this is intended to be a temporary state affairs, soon this reexport will be gone entirely. Additionally this commit reduces imports from `cranelift_wasm` and also primarily imports from `crate::wasm` within `wasmtime-environ` to get a better sense of what's imported from where and what will need to be shared. * Extract types from cranelift-wasm to cranelift-wasm-types This commit creates a new crate called `cranelift-wasm-types` and extracts type definitions from the `cranelift-wasm` crate into this new crate. The purpose of this crate is to be a shared definition of wasm types that can be shared both by compilers (like Cranelift) as well as wasm runtimes (e.g. Wasmtime). This new `cranelift-wasm-types` crate doesn't depend on `cranelift-codegen` and is the final step in severing the unconditional dependency from Wasmtime to `cranelift-codegen`. The final refactoring in this commit is to then reexport this crate from `wasmtime-environ`, delete the `cranelift-codegen` dependency, and then update all `use` paths to point to these new types. The main change of substance here is that the `TrapCode` enum is mirrored from Cranelift into this `cranelift-wasm-types` crate. While this unfortunately results in three definitions (one more which is non-exhaustive in Wasmtime itself) it's hopefully not too onerous and ideally something we can patch up in the future. * Get lightbeam compiling * Remove unnecessary dependency * Fix compile with uffd * Update publish script * Fix more uffd tests * Rename cranelift-wasm-types to wasmtime-types This reflects the purpose a bit more where it's types specifically intended for Wasmtime and its support. * Fix publish script	2021-08-18 13:14:52 -05:00
Pat Hickey	8b4bdf92e2	make ResourceLimiter operate on Store data; add hooks for entering and exiting native code (#2952 ) * wasmtime_runtime: move ResourceLimiter defaults into this crate In preparation of changing wasmtime::ResourceLimiter to be a re-export of this definition, because translating between two traits was causing problems elsewhere. * wasmtime: make ResourceLimiter a re-export of wasmtime_runtime::ResourceLimiter * refactor Store internals to support ResourceLimiter as part of store's data * add hooks for entering and exiting native code to Store * wasmtime-wast, fuzz: changes to adapt ResourceLimiter API * fix tests * wrap calls into wasm with entering/exiting exit hooks as well * the most trivial test found a bug, lets write some more * store: mark some methods as #[inline] on Store, StoreInner, StoreInnerMost Co-authored-By: Alex Crichton <alex@alexcrichton.com> * improve tests for the entering/exiting native hooks Co-authored-by: Alex Crichton <alex@alexcrichton.com>	2021-06-08 09:37:00 -05:00
Pat Hickey	ff87f45604	expose eager thread-local initialization by the Engine	2021-06-04 10:47:46 -07:00
Alex Crichton	7a1b7cdf92	Implement RFC 11: Redesigning Wasmtime's APIs (#2897 ) Implement Wasmtime's new API as designed by RFC 11. This is quite a large commit which has had lots of discussion externally, so for more information it's best to read the RFC thread and the PR thread.	2021-06-03 09:10:53 -05:00
Peter Huene	f12b4c467c	Add resource limiting to the Wasmtime API. (#2736 ) * Add resource limiting to the Wasmtime API. This commit adds a `ResourceLimiter` trait to the Wasmtime API. When used in conjunction with `Store::new_with_limiter`, this can be used to monitor and prevent WebAssembly code from growing linear memories and tables. This is particularly useful when hosts need to take into account host resource usage to determine if WebAssembly code can consume more resources. A simple `StaticResourceLimiter` is also included with these changes that will simply limit the size of linear memories or tables for all instances created in the store based on static values. * Code review feedback. * Implemented `StoreLimits` and `StoreLimitsBuilder`. * Moved `max_instances`, `max_memories`, `max_tables` out of `Config` and into `StoreLimits`. * Moved storage of the limiter in the runtime into `Memory` and `Table`. * Made `InstanceAllocationRequest` use a reference to the limiter. * Updated docs. * Made `ResourceLimiterProxy` generic to remove a level of indirection. * Fixed the limiter not being used for `wasmtime::Memory` and `wasmtime::Table`. * Code review feedback and bug fix. * `Memory::new` now returns `Result<Self>` so that an error can be returned if the initial requested memory exceeds any limits placed on the store. * Changed an `Arc` to `Rc` as the `Arc` wasn't necessary. * Removed `Store` from the `ResourceLimiter` callbacks. Custom resource limiter implementations are free to capture any context they want, so no need to unnecessarily store a weak reference to `Store` from the proxy type. * Fixed a bug in the pooling instance allocator where an instance would be leaked from the pool. Previously, this would only have happened if the OS was unable to make the necessary linear memory available for the instance. With these changes, however, the instance might not be created due to limits placed on the store. We now properly deallocate the instance on error. * Added more tests, including one that covers the fix mentioned above. * Code review feedback. * Add another memory to `test_pooling_allocator_initial_limits_exceeded` to ensure a partially created instance is successfully deallocated. * Update some doc comments for better documentation of `Store` and `ResourceLimiter`.	2021-04-19 09:19:20 -05:00
Peter Huene	f8f51afac1	Split out fiber stacks from fibers. This commit splits out a `FiberStack` from `Fiber`, allowing the instance allocator trait to return `FiberStack` rather than raw stack pointers. This keeps the stack creation mostly in `wasmtime_fiber`, but now the on-demand instance allocator can make use of it. The instance allocators no longer have to return a "not supported" error to indicate that the store should allocate its own fiber stack. This includes a bunch of cleanup in the instance allocator to scope stacks to the new "async" feature in the runtime. Closes #2708.	2021-03-18 20:21:02 -07:00
Alex Crichton	918c012d00	Fix some issues around TLS management with async (#2709 ) This commit fixes a few issues around managing the thread-local state of a wasmtime thread. We intentionally only have a singular TLS variable in the whole world, and the problem is that when stack-switching off an async thread we were not restoring the previous TLS state. This is necessary in two cases: * Futures aren't guaranteed to be polled/completed in a stack-like fashion. If a poll sees that a future isn't ready then we may resume execution in a previous wasm context that ends up needing the TLS information. * Futures can also cross threads (when the whole store crosses threads) and we need to save/restore TLS state from the thread we're coming from and the thread that we're going to. The stack switching issue necessitates some more glue around suspension and resumption of a stack to ensure we save/restore the TLS state on both sides. The thread issue, however, also necessitates that we use `#[inline(never)]` on TLS access functions and never have TLS borrows live across a function which could result in running arbitrary code (as was the case for the `tls::set` function.	2021-03-11 11:32:33 -06:00
Peter Huene	e71ccbf9bc	Implement the pooling instance allocator. This commit implements the pooling instance allocator. The allocation strategy can be set with `Config::with_allocation_strategy`. The pooling strategy uses the pooling instance allocator to preallocate a contiguous region of memory for instantiating modules that adhere to various limits. The intention of the pooling instance allocator is to reserve as much of the host address space needed for instantiating modules ahead of time and to reuse committed memory pages wherever possible.	2021-03-04 18:18:51 -08:00
Peter Huene	16ca5e16d9	Implement allocating fiber stacks for an instance allocator. This commit implements allocating fiber stacks in an instance allocator. The on-demand instance allocator doesn't support custom stacks, so the implementation will use the allocation from `wasmtime-fiber` for the fiber stacks. In the future, the pooling instance allocator will return custom stacks to use on Linux and macOS. On Windows, the native fiber implementation will always be used.	2021-03-04 18:18:51 -08:00
Peter Huene	5beb81d02a	Change how Instance stores instantiated memories in the runtime. This commit changes `Instance` such that memories can be stored statically, with just a base pointer, size, maximum, and a callback to make memory accessible. Previously the memories were being stored as boxed trait objects, which would require the pooling allocator to do some unpleasant things to avoid allocations. With this change, the pooling allocator can simply define a memory for the instance without using a trait object.	2021-03-04 18:18:51 -08:00
Peter Huene	b58afbf849	Refactor module instantiation in the runtime. This commit refactors module instantiation in the runtime to allow for different instance allocation strategy implementations. It adds an `InstanceAllocator` trait with the current implementation put behind the `OnDemandInstanceAllocator` struct. The Wasmtime API has been updated to allow a `Config` to have an instance allocation strategy set which will determine how instances get allocated. This change is in preparation for an alternative pooling instance allocator that can reserve all needed host process address space in advance. This commit also makes changes to the `wasmtime_environ` crate to represent compiled modules in a way that reduces copying at instantiation time.	2021-03-04 18:18:50 -08:00
Alex Crichton	703762c49e	Update support for the module linking proposal This commit updates the various tooling used by wasmtime which has new updates to the module linking proposal. This is done primarily to sync with WebAssembly/module-linking#26. The main change implemented here is that wasmtime now supports creating instances from a set of values, nott just from instantiating a module. Additionally subtyping handling of modules with respect to imports is now properly handled by desugaring two-level imports to imports of instances. A number of small refactorings are included here as well, but most of them are in accordance with the changes to `wasmparser` and the updated binary format for module linking.	2021-01-14 10:37:39 -08:00
Alex Crichton	243ab3b542	Remove the global variable associated with traps This commit removes the global variable associated with wasm traps which stores frame information. The only purpose of this global is to help symbolicate `Trap`s created since we support creating a `Trap` without a `Store`. The global, however, is only used for wasm frames on the stack, and when wasm frames are on the stack we know that our thread local for "what was the last context" is set and configured. The change here is to hijack this thread-local some more to effectively store the `Store` inside of it. All frame information is then moved directly into `Store` and no longer lives off on the side in a global. Additionally support for registering/unregistering modules is now simplified because once a module is registered with a store it can never be unregistered. This has one slight functional change where if there are two instances of `Store` interleaving calls to wasm code on the stack we'll only be able to symbolicate one of them instead of both. That's arguably also a feature however because this is sort of a way to leak information across stores right now. Otherwise, though, this isn't intended to change any existing logic, but instead keep everything working as-is.	2020-11-12 14:33:02 -08:00
Andrew Brown	c9e8889d47	Update clippy annotation to use latest version (#2375 )	2020-11-09 09:24:59 -06:00
Alex Crichton	3887881800	Refactor how signatures/trampolines are stored in `Store` This commit refactors where trampolines and signature information is stored within a `Store`, namely moving them from `wasmtime_runtime::Instance` instead to `Store` itself. The goal here is to remove an allocation inside of an `Instance` and make them a bit cheaper to create. Additionally this should open up future possibilities like not creating duplicate trampolines for signatures already in the `Store` when using `Func::new`.	2020-11-02 07:54:18 -08:00
Nick Fitzgerald	bffd54c016	wasmtime: Implement `global.{get,set}` for externref globals (#1969 ) * wasmtime: Implement `global.{get,set}` for externref globals We use libcalls to implement these -- unlike `table.{get,set}`, for which we create inline JIT fast paths -- because no known toolchain actually uses externref globals. Part of #929 * wasmtime: Enable `{extern,func}ref` globals in the API	2020-07-02 16:04:01 -05:00
Nick Fitzgerald	58bb5dd953	wasmtime: Add support for `func.ref` and `table.grow` with `funcref`s `funcref`s are implemented as `NonNull<VMCallerCheckedAnyfunc>`. This should be more efficient than using a `VMExternRef` that points at a `VMCallerCheckedAnyfunc` because it gets rid of an indirection, dynamic allocation, and some reference counting. Note that the null function reference is NOT a null pointer; it is a `VMCallerCheckedAnyfunc` that has a null `func_ptr` member. Part of #929	2020-06-24 10:08:13 -07:00
Nick Fitzgerald	647d2b4231	Merge pull request #1832 from fitzgen/externref-stack-maps externref: implement stack map-based garbage collection	2020-06-15 18:26:24 -07:00
Nick Fitzgerald	8d671c21e2	wasmtime-runtime: Allow tables to internally hold `externref`s (#1882 ) This commit enables `wasmtime_runtime::Table` to internally hold elements of either `funcref` (all that is currently supported) or `externref` (newly introduced in this commit). This commit updates `Table`'s API, but does NOT generally propagate those changes outwards all the way through the Wasmtime embedding API. It only does enough to get everything compiling and the current test suite passing. It is expected that as we implement more of the reference types spec, we will bubble these changes out and expose them to the embedding API.	2020-06-15 16:55:23 -05:00
Nick Fitzgerald	f30ce1fe97	externref: implement stack map-based garbage collection For host VM code, we use plain reference counting, where cloning increments the reference count, and dropping decrements it. We can avoid many of the on-stack increment/decrement operations that typically plague the performance of reference counting via Rust's ownership and borrowing system. Moving a `VMExternRef` avoids mutating its reference count, and borrowing it either avoids the reference count increment or delays it until if/when the `VMExternRef` is cloned. When passing a `VMExternRef` into compiled Wasm code, we don't want to do reference count mutations for every compiled `local.{get,set}`, nor for every function call. Therefore, we use a variation of deferred reference counting, where we only mutate reference counts when storing `VMExternRef`s somewhere that outlives the activation: into a global or table. Simultaneously, we over-approximate the set of `VMExternRef`s that are inside Wasm function activations. Periodically, we walk the stack at GC safe points, and use stack map information to precisely identify the set of `VMExternRef`s inside Wasm activations. Then we take the difference between this precise set and our over-approximation, and decrement the reference count for each of the `VMExternRef`s that are in our over-approximation but not in the precise set. Finally, the over-approximation is replaced with the precise set. The `VMExternRefActivationsTable` implements the over-approximized set of `VMExternRef`s referenced by Wasm activations. Calling a Wasm function and passing it a `VMExternRef` moves the `VMExternRef` into the table, and the compiled Wasm function logically "borrows" the `VMExternRef` from the table. Similarly, `global.get` and `table.get` operations clone the gotten `VMExternRef` into the `VMExternRefActivationsTable` and then "borrow" the reference out of the table. When a `VMExternRef` is returned to host code from a Wasm function, the host increments the reference count (because the reference is logically "borrowed" from the `VMExternRefActivationsTable` and the reference count from the table will be dropped at the next GC). For more general information on deferred reference counting, see An Examination of Deferred Reference Counting and Cycle Detection by Quinane: https://openresearch-repository.anu.edu.au/bitstream/1885/42030/2/hon-thesis.pdf cc #929 Fixes #1804	2020-06-15 09:39:37 -07:00
Nick Fitzgerald	1898d52966	runtime: Introduce `VMExternRef` and `VMExternData` `VMExternRef` is a reference-counted box for any kind of data that is external and opaque to running Wasm. Sometimes it might hold a Wasmtime thing, other times it might hold something from a Wasmtime embedder and is opaque even to us. It is morally equivalent to `Rc<dyn Any>` in Rust, but additionally always fits in a pointer-sized word. `VMExternRef` is non-nullable, but `Option<VMExternRef>` is a null pointer. The one part of `VMExternRef` that can't ever be opaque to us is the reference count. Even when we don't know what's inside an `VMExternRef`, we need to be able to manipulate its reference count as we add and remove references to it. And we need to do this from compiled Wasm code, so it must be `repr(C)`! `VMExternRef` itself is just a pointer to an `VMExternData`, which holds the opaque, boxed value, its reference count, and its vtable pointer. The `VMExternData` struct is preceded by the dynamically-sized value boxed up and referenced by one or more `VMExternRef`s: ```ignore ,-------------------------------------------------------. \| \| V \| +----------------------------+-----------+-----------+ \| \| dynamically-sized value... \| ref_count \| value_ptr \|---' +----------------------------+-----------+-----------+ \| VMExternData \| +-----------------------+ ^ +-------------+ \| \| VMExternRef \|-------------------+ +-------------+ \| \| +-------------+ \| \| VMExternRef \|-------------------+ +-------------+ \| \| ... === \| +-------------+ \| \| VMExternRef \|-------------------' +-------------+ ``` The `value_ptr` member always points backwards to the start of the dynamically-sized value (which is also the start of the heap allocation for this value-and-`VMExternData` pair). Because it is a `dyn` pointer, it is fat, and also points to the value's `Any` vtable. The boxed value and the `VMExternRef` footer are held a single heap allocation. The layout described above is used to make satisfying the value's alignment easy: we just need to ensure that the heap allocation used to hold everything satisfies its alignment. It also ensures that we don't need a ton of excess padding between the `VMExternData` and the value for values with large alignment.	2020-06-01 14:53:10 -07:00
Alex Crichton	c284ffe6c0	Move trap handler initialization to per-`Store` (#1644 ) Previously we initialized trap handling (signals/etc) once-per-instance but that's a bit too granular since we only need to do this as one-time per-program initialization. This moves the initialization to `Store` instead which means that we'll call this at least once per thread, which some platforms may need (none currently do, they all only need per-program initialization, but Fuchsia will need per-thread initialization).	2020-05-01 19:55:35 -05:00
Alex Crichton	d719ec7e1c	Don't try to handle non-wasmtime segfaults (#1577 ) This commit fixes an issue in Wasmtime where Wasmtime would accidentally "handle" non-wasm segfaults while executing host imports of wasm modules. If a host import segfaulted then Wasmtime would recognize that wasm code is on the stack, so it'd longjmp out of the wasm code. This papers over real bugs though in host code and erroneously classified segfaults as wasm traps. The fix here was to add a check to our wasm signal handler for if the faulting address falls in JIT code itself. Actually threading through all the right information for that check to happen is a bit tricky, though, so this involved some refactoring: * A closure parameter to `catch_traps` was added. This closure is responsible for classifying addresses as whether or not they fall in JIT code. Anything returning `false` means that the trap won't get handled and we'll forward to the next signal handler. * To avoid passing tons of context all over the place, the start function is now no longer automatically invoked by `InstanceHandle`. This avoids the need for passing all sorts of trap-handling contextual information like the maximum stack size and "is this a jit address" closure. Instead creators of `InstanceHandle` (like wasmtime) are now responsible for invoking the start function. * To avoid excessive use of `transmute` with lifetimes since the traphandler state now has a lifetime the per-instance custom signal handler is now replaced with a per-store custom signal handler. I'm not entirely certain the purpose of the custom signal handler, though, so I'd look for feedback on this part. A new test has been added which ensures that if a host function segfaults we don't accidentally try to handle it, and instead we correctly report the segfault.	2020-04-29 14:24:54 -05:00

1 2

72 Commits