wasmtime

Author	SHA1	Message	Date
Alex Crichton	d147802d51	Update wasm-tools crates (#3997 ) * Update wasm-tools crates This commit updates the wasm-tools family of crates as used in Wasmtime. Notably this brings in the update which removes module linking support as well as a number of internal refactorings around names and such within wasmparser itself. This updates all of the wasm translation support which binds to wasmparser as appropriate. Other crates all had API-compatible changes for at least what Wasmtime used so no further changes were necessary beyond updating version requirements. * Update a test expectation	2022-04-05 14:32:33 -05:00
Alex Crichton	011d2e1faa	Refactor away the `Instantiator` type in Wasmtime (#3972 ) * Refactor away the `Instantiator` type in Wasmtime This internal type in Wasmtime was primarily used for the module linking proposal to handle instantiation of many instances and refactor out the sync and async parts to minimize duplication. With the removal of the module linking proposal, however, this type isn't really necessary any longer. In working to implement the component model proposal I was looking already to refactor this and I figured it'd be good to land that ahead of time on `main` separate of other refactorings. This commit removes the `Instantiator` type in the `instance` module. The type was already private to Wasmtime so this shouldn't have any impact on consumers. This allows simplifying various code paths to avoid another abstraction. The meat of instantiation is moved to `Instance::new_raw` which should be reusable for the component model as well. One bug is actually fixed in this commit as well where `Linker::instantiate` and `InstancePre::instantiate` failed to check that async support was disabled on a store. This means that they could have led to a panic if used with an async store and a start function called an async import (or an async resource limiter yielded). A few tests were updated with this. * Review comments	2022-04-05 10:35:00 -05:00
wasmtime-publish	78a595ac88	Bump Wasmtime to 0.37.0 (#3994 ) Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>	2022-04-05 09:24:28 -05:00
Alex Crichton	7b5176baea	Upgrade all crates to the Rust 2021 edition (#3991 ) * Upgrade all crates to the Rust 2021 edition I've personally started using the new format strings for things like `panic!("some message {foo}")` or similar and have been upgrading crates on a case-by-case basis, but I think it probably makes more sense to go ahead and blanket upgrade everything so 2021 features are always available. * Fix compile of the C API * Fix a warning * Fix another warning	2022-04-04 12:27:12 -05:00
Alex Crichton	c89dc55108	Add a two-week delay to Wasmtime's release process (#3955 ) * Bump to 0.36.0 * Add a two-week delay to Wasmtime's release process This commit is a proposal to update Wasmtime's release process with a two-week delay from branching a release until it's actually officially released. We've had two issues lately that came up which led to this proposal: * In #3915 it was realized that changes just before the 0.35.0 release weren't enough for an embedding use case, but the PR didn't meet the expectations for a full patch release. * At Fastly we were about to start rolling out a new version of Wasmtime when over the weekend the fuzz bug #3951 was found. This led to the desire internally to have a "must have been fuzzed for this long" period of time for Wasmtime changes which we felt were better reflected in the release process itself rather than something about Fastly's own integration with Wasmtime. This commit updates the automation for releases to unconditionally create a `release-X.Y.Z` branch on the 5th of every month. The actual release from this branch is then performed on the 20th of every month, roughly two weeks later. This should provide a period of time to ensure that all changes in a release are fuzzed for at least two weeks and avoid any further surprises. This should also help with any last-minute changes made just before a release if they need tweaking since backporting to a not-yet-released branch is much easier. Overall there are some new properties about Wasmtime with this proposal as well: * The `main` branch will always have a section in `RELEASES.md` which is listed as "Unreleased" for us to fill out. * The `main` branch will always be a version ahead of the latest release. For example it will be bumped pre-emptively as part of the release process on the 5th where if `release-2.0.0` was created then the `main` branch will have 3.0.0 Wasmtime. * Dates for major versions are automatically updated in the `RELEASES.md` notes. 
The associated documentation for our release process is updated and the various scripts should all be updated now as well with this commit. * Add notes on a security patch * Clarify security fixes shouldn't be previewed early on CI	2022-04-01 13:11:10 -05:00
Alex Crichton	353f1b48ab	Split wasmtime-runtime's single getter into typed getters (#3987 ) This splits the existing `lookup_by_declaration` function into a lookup-per-type-of-item. This refactor ends up cleaning up a fair bit of code in the `wasmtime` crate by removing a number of `unreachable!()` blocks which are now no longer necessary.	2022-03-31 16:24:42 -05:00
Alex Crichton	fde84aa79c	Deduplicate creation of `CompiledModule` (#3986 ) Push the creation of a module's `CompiledModule` into one location of `Module::from_parts` instead of duplicating it across two callers.	2022-03-31 15:37:16 -05:00
Alex Crichton	d1d10dc8da	Refactor the `TypeTables` type (#3971 ) * Remove duplicate `TypeTables` type This was once needed historically but it is no longer needed. * Make the internals of `TypeTables` private Instead of reaching internally for the `wasm_signatures` map an `Index` implementation now exists to indirect accesses through the type of the index being accessed. For the component model this table of types will grow a number of other tables and this'll assist in consuming sites not having to worry so much about which map they're reaching into.	2022-03-30 13:51:25 -05:00
Alex Crichton	76b82910c9	Remove the module linking implementation in Wasmtime (#3958 ) * Remove the module linking implementation in Wasmtime This commit removes the experimental implementation of the module linking WebAssembly proposal from Wasmtime. The module linking is no longer intended for core WebAssembly but is instead incorporated into the component model now at this point. This means that very large parts of Wasmtime's implementation of module linking are no longer applicable and would change greatly with an implementation of the component model. The main purpose of this is to remove Wasmtime's reliance on the support for module-linking in `wasmparser` and tooling crates. With this reliance removed we can move over to the `component-model` branch of `wasmparser` and use the updated support for the component model. Additionally given the trajectory of the component model proposal the embedding API of Wasmtime will not look like what it looks like today for WebAssembly. For example the core wasm `Instance` will not change and instead a `Component` is likely to be added instead. Some more rationale for this is in #3941, but the basic idea is that I feel that it's not going to be viable to develop support for the component model on a non-`main` branch of Wasmtime. Additionally I don't think it's viable, for the same reasons as `wasm-tools`, to support the old module linking proposal and the new component model at the same time. This commit takes a moment to not only delete the existing module linking implementation but some abstractions are also simplified. For example module serialization is a bit simpler now that there's only one module. Additionally instantiation is much simpler since the only initializers we have to deal with are imports and nothing else. Closes #3941 * Fix doc link * Update comments	2022-03-23 14:57:34 -05:00
Adam Wick	6a60e8363f	Add support for `async` call hooks (#3876 ) * Instead of simply panicking, return an error when we attempt to resume on a dying fiber. This situation should never occur in the existing code base, but can be triggered if support is added for running outside async code in a call hook. * Shift `async_cx()` to return an `Option`, reflecting if the fiber is dying. This should never happen in the existing code base, but is a nice forward-looking guard. The current implementations simply lift the trap that would eventually be produced by such an operation into a `Trap` (or similar) at the invocation of `async_cx()`. * Add support for using `async` call hooks. This retains the ability to do non-async hooks. Hooks end up being implemented as an async trait with a handler call, to get around some issues passing around async closures. This change requires some of the prior changes to handle picking up blocked tasks during fiber shutdown, to avoid some panics during timeouts and other such events. * More fully specify a doc link, to avoid a doc-building error. * Revert the use of catchable traps on cancellation of a fiber; turn them into expect()/unwrap(). The justification for this revert is that (a) these events shouldn't happen, and (b) they wouldn't be catchable by wasm anyways. * Replace a duplicated check in `async` hook evaluation with a single check. This also moves the checks inside of their respective Async variants, meaning that if you're using an async-enabled version of wasmtime but using the synchronous versions of the callbacks, you won't pay any penalty for validating the async context. * Use `match &mut ...` instead of `ref mut`. * Add some documentation on why/when `async_cx` can return None. * Add two simple test cases for async call hooks. * Fix async_cx() to check both the box and the value for current_poll_cx. 
In the prior version, we only checked that the box had not been cleared, but had not ensured that there was an actual context for us to use. This updates the check to validate both, returning None if the inner context is missing. This allows us to skip a validation check inside `block_on`, since all callers will have run through the `async_cx` check prior to arrival. * Tweak the timeout test to address PR suggestions. * Add a test about dropping async hooks while suspended Should help exercise that the check for `None` is properly handled in a few more locations. Co-authored-by: Alex Crichton <alex@alexcrichton.com>	2022-03-23 10:43:34 -05:00
Alex Crichton	41594dc5d9	Expose details for mlocking modules externally (#3944 ) This commit exposes some various details and config options for having finer-grain control over mlock-ing the memory of modules. This amounts to three different changes being present in this commit: * A new `Module::image_range` API is added to expose the range in host memory of where the compiled image resides. This enables embedders to make mlock-ing decisions independently of Wasmtime. Otherwise though there's not too much useful that can be done with this range information at this time. * A new `Config::force_memory_init_memfd` option has been added. This option is used to force the usage of `memfd_create` on Linux even when the original module comes from a file on disk. With mlock-ing the main purpose for Wasmtime is likely to be avoiding major page faults that go back to disk, so this is another major source of avoiding page faults by ensuring that the initialization contents of memory are always in RAM. * The `memory_images` field of a `Module` has gone back to being lazily created on the first instantiation, effectively reverting #3914. This enables embedders to defer the creation of the image to as late as possible to allow modules to be created from precompiled images without actually loading all the contents of the data segments from disk immediately. These changes are all somewhat low-level controls which aren't intended to be generally used by embedders. If fine-grained control is desired though it's hoped that these knobs provide what's necessary to be achieved.	2022-03-18 13:51:55 -05:00
Alex Crichton	3f9bff17c8	Support disabling backtraces at compile time (#3932 ) * Support disabling backtraces at compile time This commit adds support to Wasmtime to disable, at compile time, the gathering of backtraces on traps. The `wasmtime` crate now sports a `wasm-backtrace` feature which, when disabled, will mean that backtraces are never collected at compile time nor are unwinding tables inserted into compiled objects. The motivation for this commit stems from the fact that generating a backtrace is quite a slow operation. Currently backtrace generation is done with libunwind and `_Unwind_Backtrace` typically found in glibc or other system libraries. When thousands of modules are loaded into the same process though this means that the initial backtrace can take nearly half a second and all subsequent backtraces can take upwards of hundreds of milliseconds. Relative to all other operations in Wasmtime this is extremely expensive at this time. In the future we'd like to implement a more performant backtrace scheme but such an implementation would require coordination with Cranelift and is a big chunk of work that may take some time, so in the meantime if embedders don't need a backtrace they can still use this option to disable backtraces at compile time and avoid the performance pitfalls of collecting backtraces. In general I tried to originally make this a runtime configuration option but ended up opting for a compile-time option because `Trap::new` otherwise has no arguments and always captures a backtrace. By making this a compile-time option it was possible to configure, statically, the behavior of `Trap::new`. Additionally I also tried to minimize the amount of `#[cfg]` necessary by largely only having it at the producer and consumer sites. Also a noteworthy restriction of this implementation is that if backtrace support is disabled at compile time then reference types support will be unconditionally disabled at runtime. 
With backtrace support disabled there's no way to trace the stack of wasm frames which means that GC can't happen given our current implementation. * Always enable backtraces for the C API	2022-03-16 09:18:16 -05:00
Alex Crichton	c22033bf93	Delete historical interruptable support in Wasmtime (#3925 ) * Delete historical interruptable support in Wasmtime This commit removes the `Config::interruptable` configuration along with the `InterruptHandle` type from the `wasmtime` crate. The original support for adding interruption to WebAssembly was added pretty early on in the history of Wasmtime when there was no other method to prevent an infinite loop from the host. Nowadays, however, there are alternative methods for interruption such as fuel or epoch-based interruption. One of the major downsides of `Config::interruptable` is that even when it's not enabled it forces an atomic swap to happen when entering WebAssembly code. This technically could be a non-atomic swap if the configuration option isn't enabled but that produces even more branch-y code on entry into WebAssembly which is already something we try to optimize. Calling into WebAssembly is on the order of dozens of nanoseconds at this time and an atomic swap, even uncontended, can add up to 5ns on some platforms. The main goal of this PR is to remove this atomic swap on entry into WebAssembly. This is done by removing the `Config::interruptable` field entirely, moving all existing consumers to epochs instead which are suitable for the same purposes. This means that the stack overflow check is no longer entangled with the interruption check and perhaps one day we could continue to optimize that further as well. Some consequences of this change are: * Epochs are now the only method of remote-thread interruption. * There are no more Wasmtime traps that produce the `Interrupted` trap code, although we may wish to move future traps to this so I left it in place. * The C API support for interrupt handles was also removed and bindings for epoch methods were added. 
* Function-entry checks for interruption are a tiny bit less efficient since one check is performed for the stack limit and a second is performed for the epoch as opposed to the `Config::interruptable` style of bundling the stack limit and the interrupt check in one. It's expected though that this is likely to not really be measurable. * The old `VMInterrupts` structure is renamed to `VMRuntimeLimits`.	2022-03-14 15:25:11 -05:00
Alex Crichton	bc1170c2dd	Eagerly create `ModuleMemoryImages` (#3914 ) This commit removes the currently existing laziness-via-`OnceCell` when a `Module` is created for creating a `ModuleMemoryImages` data structure. Processing of data is now already shifted to compile time for the wasm module which means that creating a `ModuleMemoryImages` is either cheap because the module is backed by a file on disk, it's a single `write` into the kernel to a memfd, or it's cheap as it's not supported. This should help make module instantiation time more deterministic, even for the first instantiation of a module.	2022-03-10 12:23:34 -06:00
Alex Crichton	2f4419cc6c	Implement runtime checks for compilation settings (#3899 ) * Implement runtime checks for compilation settings This commit fills out a few FIXME annotations by implementing run-time checks that when a `Module` is created it has compatible codegen settings for the current host (as `Module` is proof of "this code can run"). This is done by implementing new `Engine`-level methods which validate compiler settings. These settings are validated on `Module::new` as well as when loading serialized modules. Settings are split into two categories, one for "shared" top-level settings and one for ISA-specific settings. Both categories now have allow-lists hardcoded into `Engine` which indicate the acceptable values for each setting (if applicable). ISA-specific settings are checked with the Rust standard library's `std::is_x86_feature_detected!` macro. Other macros for other platforms are not stable at this time but can be added here if necessary. Closes #3897 * Fix fall-through logic to actually be correct * Use a `OnceCell`, not an `AtomicBool` * Fix some broken tests	2022-03-09 09:46:25 -06:00
wasmtime-publish	9137b4a50e	Bump Wasmtime to 0.35.0 (#3885 ) [automatically-tag-and-release-this-commit] Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>	2022-03-07 15:18:34 -06:00
Alex Crichton	352908e960	Fix calling call hooks with `unchecked` func variants (#3881 ) This commit fixes calling the call hooks configured in a store for host functions defined with `Func::new_unchecked` or similar. I believe that this was just an accidental oversight and there's no fundamental reason to not support this.	2022-03-04 12:29:44 -06:00
Andrew Brown	a7567cb9ec	typo: fix typos in documentation comments (#3882 )	2022-03-04 10:16:37 -08:00
Alex Crichton	29ebfa4d93	Fix a nightly warning (#3863 ) Looks like this `unsafe` block is not necessary, even on stable, and nightly linting has picked it up now.	2022-02-28 17:18:37 -06:00
Alex Crichton	aeaca2062f	Decrease default wasm stack to 512k from 1M (#3861 ) This commit aims to achieve the goal of being able to run the test suite on Windows with `--test-threads 1`, or more notably allowing Wasmtime's defaults to work better with the main thread on Windows which appears to have a smaller stack by default than Linux by comparison. In decreasing the default wasm stack size a test is also updated to probe for less stack to work on Windows' main thread by default, ideally allowing the full test suite to work with `--test-threads 1` (although this isn't added to CI as it's not really critical). Closes #3857	2022-02-28 12:18:11 -06:00
Alex Crichton	2a6969d2bd	Shrink the size of the anyfunc table in `VMContext` (#3850 ) * Shrink the size of the anyfunc table in `VMContext` This commit shrinks the size of the `VMCallerCheckedAnyfunc` table allocated into a `VMContext` to be the size of the number of "escaped" functions in a module rather than the number of functions in a module. Escaped functions include exports, table elements, etc, and are typically an order of magnitude smaller than the number of functions in general. This should greatly shrink the `VMContext` for some modules which while we aren't necessarily having any problems with that today shouldn't cause any problems in the future. The original motivation for this was that this came up during the recent lazy-table-initialization work and while it no longer has a direct performance benefit since tables aren't initialized at all on instantiation it should still improve long-running instances theoretically with smaller `VMContext` allocations as well as better locality between anyfuncs. * Fix some tests * Remove redundant hash set * Use a helper for pushing function type information * Use a more descriptive `is_escaping` method * Clarify a comment * Fix condition	2022-02-28 10:11:04 -06:00
Alex Crichton	15bb0c6903	Remove the `ModuleLimits` pooling configuration structure (#3837 ) * Remove the `ModuleLimits` pooling configuration structure This commit is an attempt to improve the usability of the pooling allocator by removing the need to configure a `ModuleLimits` structure. Internally this structure has limits on all forms of wasm constructs but this largely bottoms out in the size of an allocation for an instance in the instance pooling allocator. Maintaining this list of limits can be cumbersome as modules may get tweaked over time and there's otherwise no real reason to limit the number of globals in a module since the main goal is to limit the memory consumption of a `VMContext` which can be done with a memory allocation limit rather than fine-tuned control over each maximum and minimum. The new approach taken in this commit is to remove `ModuleLimits`. Some fields, such as `tables`, `table_elements`, `memories`, and `memory_pages` are moved to `InstanceLimits` since they're still enforced at runtime. A new field `size` is added to `InstanceLimits` which indicates, in bytes, the maximum size of the `VMContext` allocation. If the size of a `VMContext` for a module exceeds this value then instantiation will fail. This involved adding a few more checks to `{Table, Memory}::new_static` to ensure that the minimum size is able to fit in the allocation, since previously modules were validated at compile time of the module that everything fit and that validation no longer happens (it happens at runtime). A consequence of this commit is that Wasmtime will have no built-in way to reject modules at compile time if they'll fail to be instantiated within a particular pooling allocator configuration. Instead a module must attempt instantiation to see if a failure happens. 
* Fix benchmark compiles * Fix some doc links * Fix a panic by ensuring modules have limited tables/memories * Review comments * Add back validation at `Module` time instantiation is possible This allows for getting an early signal at compile time that a module will never be instantiable in an engine with matching settings. * Provide a better error message when sizes are exceeded Improve the error message when an instance size exceeds the maximum by providing a breakdown of where the bytes are all going and why the large size is being requested. * Try to fix test in qemu * Flag new test as 64-bit only Sizes are all specific to 64-bit right now	2022-02-25 09:11:51 -06:00
Nick Fitzgerald	bad9a35418	`wasm-mutate` fuzz targets (#3836 ) * fuzzing: Add a custom mutator based on `wasm-mutate` * fuzz: Add a version of the `compile` fuzz target that uses `wasm-mutate` * Update `wasmparser` dependencies	2022-02-23 12:14:11 -08:00
Andrew Brown	5a5e401a9c	doc: fix typo (#3838 )	2022-02-22 22:30:32 -08:00
Alex Crichton	bbd4a4a500	Enable copy-on-write heap initialization by default (#3825 ) * Enable copy-on-write heap initialization by default This commit enables the `Config::memfd` feature by default now that it's been fuzzed for a few weeks on oss-fuzz, and will continue to be fuzzed leading up to the next release of Wasmtime in early March. The documentation of the `Config` option has been updated as well as adding a CLI flag to disable the feature. * Remove ubiquitous "memfd" terminology Switch instead to forms of "memory image" or "cow" or some combination thereof. * Update new option names	2022-02-22 17:12:18 -06:00
Alex Crichton	709f7e0c8a	Enable SSE 4.2 unconditionally (#3833 ) * Enable SSE 4.2 unconditionally Fuzzing over the weekend found that `i64x2` comparison operators require `pcmpgtq` which is an SSE 4.2 instruction. Along the lines of #3816 this commit unconditionally enables and requires SSE 4.2 for compilation and fuzzing. It will no longer be possible to create a compiler for x86_64 with simd enabled if SSE 4.2 is disabled. * Update comment	2022-02-22 13:23:51 -06:00
Chris Fallin	43d31c5bf7	memfd: make "dense image" heuristic limit configurable. (#3831 ) In #3820 we see an issue with the new heuristics that control use of memfd: it's entirely possible for a reasonable Wasm module produced by a snapshotting system to have a relatively sparse heap (less than 50% filled). A system that avoids memfd because of this would have an undesirable performance reduction on such modules. Ultimately we should try to implement a hybrid scheme where we support outlier/leftover initializers, but for now this PR makes the "always allow dense" limit configurable. This way, embedders that want to ensure that memfd is used can do so, if they have other knowledge about the maximum heap size allowed in their system. (Partially addresses #3820 but let's leave it open to track the hybrid idea)	2022-02-22 12:40:43 -06:00
Andrew Brown	c183e93b80	x64: enable VTune support by default (#3821 ) * x64: enable VTune support by default After significant work in the `ittapi-rs` crate, this dependency should build without issue on Wasmtime's supported operating systems: Windows, Linux, and macOS. The difference in the release binary is <20KB, so this change makes `vtune` a default build feature. This change upgrades `ittapi-rs` to v0.2.0 and updates the documentation. * review: add configuration for defaults in more places * review: remove OS conditional compilation, add architecture * review: do not default vtune feature in wasmtime-jit	2022-02-22 08:32:09 -08:00
Chris Fallin	1c014d129a	Cranelift: ensure ISA level needed for SIMD is present when SIMD is enabled. (#3816 ) Addresses #3809: when we are asked to create a Cranelift backend with shared flags that indicate support for SIMD, we should check that the ISA level needed for our SIMD lowerings is present.	2022-02-16 17:29:30 -08:00
Alex Crichton	b438617e12	Further minor optimizations to instantiation (#3791 ) * Shrink the size of `FuncData` Before this commit on a 64-bit system the `FuncData` type had a size of 88 bytes and after this commit it has a size of 32 bytes. A `FuncData` is required for all host functions in a store, including those inserted from a `Linker` into a store used during linking. This means that instantiation ends up creating a nontrivial number of these types and pushing them into the store. Looking at some profiles there were some surprisingly expensive movements of `FuncData` from the stack to a vector for moves-by-value generated by Rust. Shrinking this type enables more efficient code to be generated and additionally means less storage is needed in a store's function array. For instantiating the spidermonkey and rustpython modules this improves instantiation by 10% since they each import a fair number of host functions and the speedup here is relative to the number of items imported. * Use `ptr::copy_nonoverlapping` during initialization Previously `ptr::copy` was used for copying imports into place which translates to `memmove`, but `ptr::copy_nonoverlapping` can be used here since it's statically known these areas don't overlap. While this doesn't end up having a performance difference it's something I kept noticing while looking at the disassembly of `initialize_vmcontext` so I figured I'd go ahead and implement. * Indirect shared signature ids in the VMContext This commit is a small improvement for the instantiation time of modules by avoiding copying a list of `VMSharedSignatureIndex` entries into each `VMContext`, instead building one inside of a module and sharing that amongst all instances. This involves fewer lookups at instantiation time and less movement of data during instantiation. 
The downside is that type-checks on `call_indirect` now involve an additional load, but I'm assuming that these are somewhat pessimized enough as-is that the runtime impact won't be much there. For instantiation performance this is a 5-10% win with rustpython/spidermonkey instantiation. This should also reduce the size of each `VMContext` for an instantiation since signatures are no longer stored inline but shared amongst all instances with one module. Note that one subtle change here is that the array of `VMSharedSignatureIndex` was previously indexed by `TypeIndex`, and now it's indexed by `SignatureIndex` which is a deduplicated form of `TypeIndex`. This is done because we already had a list of those lying around in `Module`, so it was easier to reuse that than to build a separate array and store it somewhere. * Reserve space in `Store<T>` with `InstancePre` This commit updates the instantiation process to reserve space in a `Store<T>` for the functions that an `InstancePre<T>`, as part of instantiation, will insert into it. Using an `InstancePre<T>` to instantiate allows pre-computing the number of host functions that will be inserted into a store, and by pre-reserving space we can avoid costly reallocations during instantiation by ensuring the function vector has enough space to fit everything during the instantiation process. Overall this makes instantiation of rustpython/spidermonkey about 8% faster locally. * Fix tests * Use checked arithmetic	2022-02-11 09:55:08 -06:00
Alex Crichton	c0c368d151	Use mmap'd `.cwasm` as a source for memory initialization images (#3787 ) Skip memfd creation with precompiled modules This commit updates the memfd support internally to not actually use a memfd if a compiled module originally came from disk via the `wasmtime::Module::deserialize_file` API. In this situation we already have a file descriptor open and there's no need to copy a module's heap image to a new file descriptor. To facilitate a new source of `mmap` the currently-memfd-specific-logic of creating a heap image is generalized to a new form of `MemoryInitialization` which is attempted for all modules at module-compile-time. This means that the serialized artifact to disk will have the memory image in its entirety waiting for us. Furthermore the memory image is ensured to be padded and aligned carefully to the target system's page size, notably meaning that the data section in the final object file is page-aligned and the size of the data section is also page aligned. This means that when a precompiled module is mapped from disk we can reuse the underlying `File` to mmap all initial memory images. This means that the offset-within-the-memory-mapped-file can differ for memfd-vs-not, but that's just another piece of state to track in the memfd implementation. In the limit this waters down the term "memfd" for this technique of quickly initializing memory because we no longer use memfd unconditionally (only when the backing file isn't available). This does however open up an avenue in the future to porting this support to other OSes because while `memfd_create` is Linux-specific both macOS and Windows support mapping a file with copy-on-write. This porting isn't done in this PR and is left for a future refactoring. Closes #3758 * Enable "memfd" support on all unix systems Cordon off the Linux-specific bits and enable the memfd support to compile and run on platforms like macOS which have a Linux-like `mmap`. 
This only works if a module is mapped from a precompiled module file on disk, but that's better than not supporting it at all! * Fix linux compile * Use `Arc<File>` instead of `MmapVecFileBacking` * Use a named struct instead of mysterious tuples * Comment about unsafety in `Module::deserialize_file` * Fix tests * Fix uffd compile * Always align data segments No need to have conditional alignment since their sizes are all aligned anyway * Update comment in build.rs * Use rustix, not `region` * Fix some confusing logic/names around memory indexes These functions all work with memory indexes, not specifically defined memory indexes.	2022-02-10 15:40:40 -06:00
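The commit above relies on the memory image being padded to a multiple of the page size so it can be mapped directly from the file. A minimal sketch of that padding step, assuming hypothetical helpers `align_up` and `pad_image` and a power-of-two page size:

```rust
/// Rounds `value` up to the next multiple of `page_size`.
/// Assumes `page_size` is a power of two, as system page sizes are.
pub fn align_up(value: usize, page_size: usize) -> usize {
    assert!(page_size.is_power_of_two());
    (value + page_size - 1) & !(page_size - 1)
}

/// Pads an image with trailing zero bytes so its length is page-aligned.
pub fn pad_image(mut data: Vec<u8>, page_size: usize) -> Vec<u8> {
    let padded_len = align_up(data.len(), page_size);
    data.resize(padded_len, 0);
    data
}
```

The same rounding would apply to the image's offset within the file, since `mmap` requires page-aligned file offsets.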
Alex Crichton	520a7f26d7	Move function names out of `Module` (#3789 ) * Move function names out of `Module` This commit moves function names in a module out of the `wasmtime_environ::Module` type and into separate sections stored in the final compiled artifact. Spurred on by #3787 to look at module load times I noticed that a huge amount of time was spent in deserializing this map. The `spidermonkey.wasm` file, for example, has a 3MB name section which is a lot of unnecessary data to deserialize at module load time. The names of functions are now split out into their own dedicated section of the compiled artifact and metadata about them is stored in a more compact format at runtime by avoiding a `BTreeMap` and instead using a sorted array. Overall this improves deserialize times by up to 80% for modules with large name sections since the name section is no longer deserialized at load time and it's lazily paged in as names are actually referenced. * Fix a typo * Fix compiled module determinism Need to not only sort afterwards but also first to ensure the data of the name section is consistent.	2022-02-10 14:34:48 -06:00
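The commit above replaces a `BTreeMap` with a sorted array and keeps the name bytes in a separate section. A minimal sketch of that layout, where `FunctionName`, `build_names`, and `lookup` are hypothetical names chosen for illustration:

```rust
/// One entry of the sorted table: a function index plus the location of
/// its name inside a separate byte blob.
#[derive(Debug)]
pub struct FunctionName {
    pub idx: u32,
    pub offset: u32,
    pub len: u32,
}

/// Builds the sorted table and the name blob. Sorting happens before the
/// blob is written so the blob contents do not depend on input order.
pub fn build_names(mut entries: Vec<(u32, &str)>) -> (Vec<FunctionName>, Vec<u8>) {
    entries.sort_by_key(|&(idx, _)| idx);
    let mut blob = Vec::new();
    let mut table = Vec::with_capacity(entries.len());
    for (idx, name) in entries {
        table.push(FunctionName {
            idx,
            offset: blob.len() as u32,
            len: name.len() as u32,
        });
        blob.extend_from_slice(name.as_bytes());
    }
    (table, blob)
}

/// Binary-searches the table; name bytes are only touched on a hit.
pub fn lookup<'a>(table: &[FunctionName], blob: &'a [u8], idx: u32) -> Option<&'a str> {
    let pos = table.binary_search_by_key(&idx, |n| n.idx).ok()?;
    let entry = &table[pos];
    let bytes = blob.get(entry.offset as usize..(entry.offset + entry.len) as usize)?;
    std::str::from_utf8(bytes).ok()
}
```

Because only the small table is read up front, the blob can live in a memory-mapped section and be paged in when a name is actually requested.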
Chris Fallin	39a52ceb4f	Implement lazy funcref table and anyfunc initialization. (#3733 ) During instance initialization, we build two sorts of arrays eagerly: - We create an "anyfunc" (a `VMCallerCheckedAnyfunc`) for every function in an instance. - We initialize every element of a funcref table with an initializer to a pointer to one of these anyfuncs. Most instances will not touch (via call_indirect or table.get) all funcref table elements. And most anyfuncs will never be referenced, because most functions are never placed in tables or used with `ref.func`. Thus, both of these initialization tasks are quite wasteful. Profiling shows that a significant fraction of the remaining instance-initialization time after our other recent optimizations is going into these two tasks. This PR implements two basic ideas: - The anyfunc array can be lazily initialized as long as we retain the information needed to do so. For now, in this PR, we just recreate the anyfunc whenever a pointer is taken to it, because doing so is fast enough; in the future we could keep some state to know whether the anyfunc has been written yet and skip this work if redundant. This technique allows us to leave the anyfunc array as uninitialized memory, which can be a significant savings. Filling it with initialized anyfuncs is very expensive, but even zeroing it is expensive: e.g. in a large module, it can be >500KB. - A funcref table can be lazily initialized as long as we retain a link to its corresponding instance and function index for each element. A zero in a table element means "uninitialized", and a slowpath does the initialization. Funcref tables are a little tricky because funcrefs can be null. We need to distinguish "element was initially non-null, but user stored explicit null later" from "element never touched" (ie the lazy init should not blow away an explicitly stored null). 
We solve this by stealing the LSB from every funcref (anyfunc pointer): when the LSB is set, the funcref is initialized and we don't hit the lazy-init slowpath. We insert the bit on storing to the table and mask it off after loading. We do have to set up a precomputed array of `FuncIndex`s for the table in order for this to work. We do this as part of the module compilation. This PR also refactors the way that the runtime crate gains access to information computed during module compilation. Performance effect measured with in-tree benches/instantiation.rs, using SpiderMonkey built for WASI, and with memfd enabled: ``` BEFORE: sequential/default/spidermonkey.wasm time: [68.569 us 68.696 us 68.856 us] sequential/pooling/spidermonkey.wasm time: [69.406 us 69.435 us 69.465 us] parallel/default/spidermonkey.wasm: with 1 background thread time: [69.444 us 69.470 us 69.497 us] parallel/default/spidermonkey.wasm: with 16 background threads time: [183.72 us 184.31 us 184.89 us] parallel/pooling/spidermonkey.wasm: with 1 background thread time: [69.018 us 69.070 us 69.136 us] parallel/pooling/spidermonkey.wasm: with 16 background threads time: [326.81 us 337.32 us 347.01 us] WITH THIS PR: sequential/default/spidermonkey.wasm time: [6.7821 us 6.8096 us 6.8397 us] change: [-90.245% -90.193% -90.142%] (p = 0.00 < 0.05) Performance has improved. sequential/pooling/spidermonkey.wasm time: [3.0410 us 3.0558 us 3.0724 us] change: [-95.566% -95.552% -95.537%] (p = 0.00 < 0.05) Performance has improved. parallel/default/spidermonkey.wasm: with 1 background thread time: [7.2643 us 7.2689 us 7.2735 us] change: [-89.541% -89.533% -89.525%] (p = 0.00 < 0.05) Performance has improved. parallel/default/spidermonkey.wasm: with 16 background threads time: [147.36 us 148.99 us 150.74 us] change: [-18.997% -18.081% -17.285%] (p = 0.00 < 0.05) Performance has improved. 
parallel/pooling/spidermonkey.wasm: with 1 background thread time: [3.1009 us 3.1021 us 3.1033 us] change: [-95.517% -95.511% -95.506%] (p = 0.00 < 0.05) Performance has improved. parallel/pooling/spidermonkey.wasm: with 16 background threads time: [49.449 us 50.475 us 51.540 us] change: [-85.423% -84.964% -84.465%] (p = 0.00 < 0.05) Performance has improved. ``` So an improvement of something like 80-95% for a very large module (7420 functions in its one funcref table, 31928 functions total).	2022-02-09 13:56:53 -08:00
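The LSB-tagging scheme from the commit above can be sketched as follows. This is a simplified model, not Wasmtime's implementation: plain `usize` addresses stand in for anyfunc pointers, and `LazyTable`, `INIT_BIT`, and the `lazy_inits` counter are hypothetical names.

```rust
/// Low bit marks "this slot has been initialized". Real pointers are
/// aligned, so the bit is otherwise always zero.
const INIT_BIT: usize = 1;

pub struct LazyTable {
    /// Tagged values; a raw 0 means "never touched".
    elems: Vec<usize>,
    /// Precomputed initial value for each slot (None = initially null).
    initial: Vec<Option<usize>>,
    /// Counts how often the slow path ran (for illustration only).
    pub lazy_inits: usize,
}

impl LazyTable {
    pub fn new(initial: Vec<Option<usize>>) -> Self {
        // Stand-in addresses must be non-zero with a free low bit.
        assert!(initial.iter().flatten().all(|&p| p != 0 && p & INIT_BIT == 0));
        LazyTable { elems: vec![0; initial.len()], initial, lazy_inits: 0 }
    }

    /// Loads a slot, running the lazy-init slow path only for untouched slots.
    pub fn get(&mut self, i: usize) -> Option<usize> {
        let raw = self.elems[i];
        let ptr = if raw == 0 {
            self.lazy_inits += 1;
            let ptr = self.initial[i].unwrap_or(0);
            self.elems[i] = ptr | INIT_BIT;
            ptr
        } else {
            // Mask the tag off after loading.
            raw & !INIT_BIT
        };
        if ptr == 0 { None } else { Some(ptr) }
    }

    /// Stores a value (None = explicit null), always setting the tag bit.
    pub fn set(&mut self, i: usize, value: Option<usize>) {
        self.elems[i] = value.unwrap_or(0) | INIT_BIT;
    }
}
```

An explicitly stored null is held as `0 | INIT_BIT`, so it stays distinguishable from an untouched slot and is not overwritten by lazy initialization.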
Peter Huene	1b27508a42	Fix incorrect use of `MemoryIndex` in the pooling allocator. (#3782 ) This commit corrects a few places where `MemoryIndex` was used and treated like a `DefinedMemoryIndex` in the pooling instance allocator. When the unstable `multi-memory` proposal is enabled, it is possible to cause a newly allocated instance to use an incorrect base address for any defined memories by having the module being instantiated also import a memory. This requires enabling the unstable `multi-memory` proposal, configuring the use of the pooling instance allocator (not the default), and then configuring the module limits to allow imported memories (also not the default). The fix is to replace all uses of `MemoryIndex` with `DefinedMemoryIndex` in the pooling instance allocator. Several `debug_assert!` have also been updated to `assert!` to sanity check the state of the pooling allocator even in release builds.	2022-02-09 09:39:29 -06:00
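The bug class fixed above comes from mixing two index spaces. A minimal sketch of keeping them apart with newtypes, where the struct definitions and the free function `defined_memory_index` are simplified assumptions rather than Wasmtime's actual definitions:

```rust
/// Index into all memories of a module (imports first, then definitions).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct MemoryIndex(pub u32);

/// Index into only the memories the module itself defines.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct DefinedMemoryIndex(pub u32);

/// Converts a module-wide index into a defined index, or returns None if
/// the index refers to an imported memory.
pub fn defined_memory_index(
    num_imported_memories: u32,
    index: MemoryIndex,
) -> Option<DefinedMemoryIndex> {
    index
        .0
        .checked_sub(num_imported_memories)
        .map(DefinedMemoryIndex)
}
```

With one imported memory, `MemoryIndex(1)` is the first defined memory, so using it directly as a slot index would select the wrong base address.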
Alex Crichton	5cd97c054d	Update memfd support with a runtime toggle (#3778 ) This commit updates the `memfd` support in Wasmtime to have a runtime toggle as to whether it's used or not. The compile-time feature gating `memfd` support is now also re-enabled by default, but the new runtime switch is still disabled-by-default. Additionally this commit updates our fuzz oracle to turn on/off the memfd flag to re-enable fuzzing with memfd on oss-fuzz.	2022-02-08 15:47:13 -06:00
wasmtime-publish	39b88e4e9e	Release Wasmtime 0.34.0 (#3768 ) * Bump Wasmtime to 0.34.0 [automatically-tag-and-release-this-commit] * Add release notes for 0.34.0 * Update release date to today Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com> Co-authored-by: Alex Crichton <alex@alexcrichton.com>	2022-02-07 19:16:26 -06:00
Chris Fallin	88b53b12aa	Turn off memfd by default, at least for this upcoming release. (#3774 ) Since memfd support just landed, and has had only ~0.5 weeks to bake with fuzzing, we want to make release 0.34.0 of Wasmtime without it enabled by default. This PR disables memfd by default; it can be enabled by specifying the `memfd` feature for the `wasmtime` crate, or when building the commandline binary. We plan to explicitly add memfd-enabled fuzzing targets, let that go for a while, then probably re-enable memfd in the subsequent release if no issues come up.	2022-02-07 15:44:53 -06:00
Chris Fallin	2a24a0fbde	Make memfd image creation lazy (on first instantiation). As a followup to the recent memfd allocator work, this PR makes the memfd image creation occur on the first instantiation, rather than immediately when the `Module` is loaded. This shaves off a potentially surprising cost spike that would have otherwise occurred: prior to the memfd work, no allocator eagerly read the module's initial heap state into RAM. The behavior should now more closely resemble what happened before (and the improvements in overall instantiation time and performance, as compared to either eager init with pure-mmap memory or user-mode pagefault handling with uffd, remain).	2022-02-03 12:46:34 -08:00
Nick Fitzgerald	605c79fd05	Merge pull request #3756 from alexcrichton/update-wasm-tools Update wasm-tools crates	2022-02-03 11:19:55 -08:00
Alex Crichton	b647561c44	memfd: Some minor follow-ups (#3759 ) * Tweak memfd-related features crates This commit changes the `memfd` feature for the `wasmtime-cli` crate from an always-on feature to a default-on feature which can be disabled at compile time. Additionally the `pooling-allocator` feature is also given similar treatment. Additionally some documentation was added for the `memfd` feature on the `wasmtime` crate. * Don't store `Arc<T>` in `InstanceAllocationRequest` Instead store `&Arc<T>` to avoid having the clone that lives in `InstanceAllocationRequest` not actually going anywhere. Otherwise all instance allocation requires an extra clone to create it for the request and an extra decrement when the request goes away. Internally clones are made as necessary when creating instances. * Enable the pooling allocator by default for `wasmtime-cli` While perhaps not the most useful option since the CLI doesn't have a great way to take advantage of this it probably makes sense to at least match the features of `wasmtime` itself. * Fix some lints and issues * More compile fixes	2022-02-03 09:17:04 -06:00
Chris Fallin	5deb1f1fbf	Merge pull request #3738 from cfallin/pooling-affinity Pooling allocator: add a reuse-affinity policy.	2022-02-02 13:11:39 -08:00
Chris Fallin	99ed8cc9be	Merge pull request #3697 from cfallin/memfd-cow memfd/madvise-based CoW pooling allocator	2022-02-02 13:04:26 -08:00
Chris Fallin	6011420557	Pooling allocator: add a reuse-affinity policy. This policy attempts to reuse the same instance slot for subsequent instantiations of the same module. This is particularly useful when using a pooling backend such as memfd that benefits from this reuse: for example, in the memfd case, instantiating the same module into the same slot allows us to avoid several calls to mmap() because the same mappings can be reused. The policy tracks a freelist per "compiled module ID", and when allocating a slot for an instance, tries these two options in order: 1. A slot from the freelist for this module (i.e., last used for another instantiation of this particular module), or 2. A slot that was last used by some other module or never before. The "victim" slot for choice 2 is randomly chosen. The data structures are carefully designed so that all updates are O(1), and there is no retry-loop in any of the random selection. This policy is now the default when the memfd backend is selected via the `memfd-allocator` feature flag.	2022-02-02 12:25:30 -08:00
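A deliberately simplified sketch of a reuse-affinity policy like the one described above. `AffinityAllocator` is a hypothetical type; unlike the commit, it scans the free list linearly (not O(1)) and picks the last free slot as the victim instead of a random one.

```rust
use std::collections::HashMap;

pub struct AffinityAllocator {
    /// All currently free slots.
    free: Vec<usize>,
    /// For each slot that has been used, the module that used it last.
    last_module: HashMap<usize, u64>,
}

impl AffinityAllocator {
    pub fn new(slots: usize) -> Self {
        AffinityAllocator { free: (0..slots).collect(), last_module: HashMap::new() }
    }

    /// Allocates a slot for `module`, preferring one it used before.
    pub fn alloc(&mut self, module: u64) -> Option<usize> {
        let pos = self
            .free
            .iter()
            // Prefer a free slot last used by this same module.
            .position(|slot| self.last_module.get(slot) == Some(&module))
            // Otherwise fall back to any free slot as the victim
            // (the last one here; the commit chooses randomly).
            .or_else(|| self.free.len().checked_sub(1))?;
        let slot = self.free.swap_remove(pos);
        self.last_module.insert(slot, module);
        Some(slot)
    }

    /// Returns a slot to the free list, remembering who used it last.
    pub fn release(&mut self, slot: usize) {
        self.free.push(slot);
    }
}
```

Reusing the same slot for the same module is what lets a backend keep its existing mappings in place instead of remapping.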
Alex Crichton	65486a0680	Update wasm-tools crates Nothing major here, just a routine update with a few extra things to handle here-and-there.	2022-02-02 09:50:08 -08:00
Alex Crichton	c83968575a	Lazily populate a store's trampoline map (#3742 ) * Lazily populate a store's trampoline map This commit is another installment of "how fast can we make instantiation". Currently when instantiating a module with many function imports each function, typically from the host, is inserted into the store. This insertion process stores the `VMTrampoline` for the host function in a side table so it can be looked up later if the host function is called through the `Func` interface. This insertion process, however, involves a hash map insertion which can be relatively expensive at the scale of the rest of the instantiation process. The optimization implemented in this commit is to avoid inserting trampolines into the store at `Func`-insertion-time (aka instantiation time) and instead only lazily populate the map of trampolines when needed. The theory behind this is that almost all `Func` instances that are called indirectly from the host are actually wasm functions, not host-defined functions. This means that they already don't need to go through the map of host trampolines and can instead be looked up from the module they're defined in. With the assumed rarity of host functions making `lookup_trampoline` a bit slower seems ok. The `lookup_trampoline` function will now, on a miss from the wasm modules and `host_trampolines` map, lazily iterate over the functions within the store and insert trampolines into the `host_trampolines` map. This process will eventually reach something which matches the function provided because it should at least hit the same host function. The relevant `lookup_trampoline` now sports a new documentation block explaining all this as well for future readers. Concretely this commit speeds up instantiation of an empty module with 100 imports and ~80 unique signatures from 10.6us to 6.4us, a 40% improvement. * Review comments * Remove debug assert	2022-02-02 09:43:29 -06:00
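The lazy trampoline map from the commit above can be sketched roughly like this. `Store`, `insert_func`, and `indexed` here are hypothetical simplifications: signatures are plain `u32` ids and trampolines are string stand-ins.

```rust
use std::collections::HashMap;

#[derive(Default)]
pub struct Store {
    /// Host functions in insertion order: (signature id, trampoline stand-in).
    funcs: Vec<(u32, &'static str)>,
    /// Lazily populated map from signature id to trampoline.
    host_trampolines: HashMap<u32, &'static str>,
    /// How many entries of `funcs` have been indexed into the map so far.
    scanned: usize,
}

impl Store {
    /// Cheap insertion: only a vector push, no hash-map work.
    pub fn insert_func(&mut self, sig: u32, trampoline: &'static str) {
        self.funcs.push((sig, trampoline));
    }

    /// Looks up a trampoline; on a miss, indexes not-yet-scanned functions
    /// until the requested signature is found.
    pub fn lookup_trampoline(&mut self, sig: u32) -> Option<&'static str> {
        if let Some(t) = self.host_trampolines.get(&sig) {
            return Some(*t);
        }
        for &(s, t) in &self.funcs[self.scanned..] {
            self.host_trampolines.insert(s, t);
            self.scanned += 1;
            if s == sig {
                return Some(t);
            }
        }
        None
    }

    /// Number of map entries populated so far (for illustration).
    pub fn indexed(&self) -> usize {
        self.host_trampolines.len()
    }
}
```

Instantiation pays only for the pushes; the hash-map cost is deferred to the first lookup that actually needs a host trampoline.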
Chris Fallin	01e6bb81fb	Review feedback.	2022-02-01 15:49:44 -08:00
Chris Fallin	0ff8f6ab20	Make build-config magic use memfd by default.	2022-01-31 22:39:20 -08:00
Chris Fallin	570dee63f3	Use MemFdSlot in the on-demand allocator as well.	2022-01-31 13:59:51 -08:00
Chris Fallin	b73ac83c37	Add a pooling allocator mode based on copy-on-write mappings of memfds. As first suggested by Jan on the Zulip here [1], a cheap and effective way to obtain copy-on-write semantics of a "backing image" for a Wasm memory is to mmap a file with `MAP_PRIVATE`. The `memfd` mechanism provided by the Linux kernel allows us to create anonymous, in-memory-only files that we can use for this mapping, so we can construct the image contents on-the-fly then effectively create a CoW overlay. Furthermore, and importantly, `madvise(MADV_DONTNEED, ...)` will discard the CoW overlay, returning the mapping to its original state. By itself this is almost enough for a very fast instantiation-termination loop of the same image over and over, without changing the address space mapping at all (which is expensive). The only missing bit is how to implement heap growth. But here memfds can help us again: if we create another anonymous file and map it where the extended parts of the heap would go, we can take advantage of the fact that a `mmap()` mapping can be larger than the file itself, with accesses beyond the end generating a `SIGBUS`, and the fact that we can cheaply resize the file with `ftruncate`, even after a mapping exists. So we can map the "heap extension" file once with the maximum memory-slot size and grow the memfd itself as `memory.grow` operations occur. The above CoW technique and heap-growth technique together allow us a fastpath of `madvise()` and `ftruncate()` only when we re-instantiate the same module over and over, as long as we can reuse the same slot. This fastpath avoids all whole-process address-space locks in the Linux kernel, which should mean it is highly scalable. It also avoids the cost of copying data on read, as the `uffd` heap backend does when servicing pagefaults; the kernel's own optimized CoW logic (same as used by all file mmaps) is used instead. 
[1] https://bytecodealliance.zulipchat.com/#narrow/stream/206238-general/topic/Copy.20on.20write.20based.20instance.20reuse/near/266657772	2022-01-31 12:53:18 -08:00
Alex Crichton	7928a3ffb4	Lazily load types into `Func` (#3727 ) * Lazily load types into `Func` This commit changes the construction of a `Func` to lazily load the type information if required instead of always loading the type information at `Func`-construction time. The main purpose of this change is to accelerate instantiation of instances which have many imports. Currently in the fast way of doing this the instantiation loop looks like: let mut store = Store::new(&engine, ...); let instance = instance_pre.instantiate(&mut store); In this situation the `instance_pre` will typically load host-defined functions (defined via `Linker` APIs) into the `Store` as individual `Func` items and then perform the instantiation process. The operation of loading a `HostFunc` into a `Store` however currently involves two expensive operations: * First a read-only lock is taken on the `RwLock` around engine signatures. * Next a clone of the wasm type is made to pull it out of the engine signature registry. Neither of these is actually necessary for imported functions. The `FuncType` for imported functions is never used since all comparisons happen with the intern'd indices instead. The only time a `FuncType` is used typically is for exported functions when using `Func::typed` or similar APIs which need type information. This commit makes this path faster by storing `Option<FuncType>` instead of `FuncType` within a `Func`. This means that it starts out as `None` and is only filled in on-demand as necessary. This means that when instantiating a module with many imports no clones/locks are done. On a simple microbenchmark where a module with 100 imports is instantiated this PR improves instantiation time by ~35%. Due to the rwlock used here and the general inefficiency of pthreads rwlocks the effect is even more profound when many threads are performing the same instantiation process. 
On x86_64 with 8 threads performing instantiation this PR improves instantiation time by 80% and on arm64 it improves by 97% (wow read-contended glibc rwlocks on arm64 are slow). Note that much of the improvement here is also from memory allocations/deallocations no longer being performed because dropping functions within a store no longer requires deallocating the `FuncType` if it's not present. A downside of this PR is that `Func::ty` now unconditionally takes an rwlock (via the engine) if the type hasn't already been filled in. If this is an issue in the future though we can investigate at that time using something like a `Once` to lazily fill in even when mutable access to the store isn't available. * Review comments	2022-01-26 14:23:37 -06:00
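The `Option<FuncType>` change described above can be sketched as follows. `Registry` and this `Func` are hypothetical simplifications: a `String` stands in for `FuncType`, and a counter records how often the lock-taking path runs.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::RwLock;

/// Stand-in for an engine-wide signature registry behind an rwlock.
pub struct Registry {
    types: RwLock<HashMap<u32, String>>,
    /// Number of locked lookups performed (for illustration only).
    pub lookups: AtomicUsize,
}

impl Registry {
    pub fn new(types: HashMap<u32, String>) -> Self {
        Registry { types: RwLock::new(types), lookups: AtomicUsize::new(0) }
    }

    /// Takes the read lock and clones the type out of the registry.
    /// Panics if `index` is unknown, which is acceptable for a sketch.
    fn lookup(&self, index: u32) -> String {
        self.lookups.fetch_add(1, Ordering::Relaxed);
        self.types.read().unwrap()[&index].clone()
    }
}

pub struct Func {
    sig_index: u32,
    /// Starts as None; filled in only when the type is first requested.
    ty: Option<String>,
}

impl Func {
    /// Construction touches neither the lock nor the allocator.
    pub fn new(sig_index: u32) -> Self {
        Func { sig_index, ty: None }
    }

    /// Returns the type, loading it from the registry on first use.
    pub fn ty(&mut self, registry: &Registry) -> &str {
        let index = self.sig_index;
        self.ty.get_or_insert_with(|| registry.lookup(index)).as_str()
    }
}
```

Functions whose type is never queried, such as most imports, never pay for the lock or the clone.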

407 Commits