wasmtime

Author	SHA1	Message	Date
Alex Crichton	4ba3404ca3	Fix MemFd's allocated memory for dynamic memories (#3763) This fixes a bug in the memfd-related management of a linear memory where for dynamic memories memfd wasn't informed of the extra room that the dynamic memory could grow into, only the actual size of linear memory, which ended up tripping an assert once the memory was grown. The fix here is pretty simple which is to factor in this extra space when passing the allocation size to the creation of the `MemFdSlot`.	2022-02-03 11:56:16 -06:00
Alex Crichton	b647561c44	memfd: Some minor follow-ups (#3759) * Tweak memfd-related features crates This commit changes the `memfd` feature for the `wasmtime-cli` crate from an always-on feature to a default-on feature which can be disabled at compile time. Additionally the `pooling-allocator` feature is also given similar treatment. Additionally some documentation was added for the `memfd` feature on the `wasmtime` crate. * Don't store `Arc<T>` in `InstanceAllocationRequest` Instead store `&Arc<T>` to avoid having the clone that lives in `InstanceAllocationRequest` not actually going anywhere. Otherwise all instance allocation requires an extra clone to create it for the request and an extra decrement when the request goes away. Internally clones are made as necessary when creating instances. * Enable the pooling allocator by default for `wasmtime-cli` While perhaps not the most useful option since the CLI doesn't have a great way to take advantage of this it probably makes sense to at least match the features of `wasmtime` itself. * Fix some lints and issues * More compile fixes	2022-02-03 09:17:04 -06:00
Alex Crichton	8ed79c8f57	memfd: Reduce some syscalls in the on-demand case (#3757) * memfd: Reduce some syscalls in the on-demand case This tweaks the internal organization of the `MemFdSlot` to avoid some syscalls in the default case as well as opportunistically in the pooling case. The changes here are: * A `MemFdSlot` is now created with a specified initial size. For pooling this is 0 but for the on-demand case this can be non-zero. * When `instantiate` is called with no prior image and the sizes match (as will be the case for on-demand allocation) then `mprotect` is skipped entirely. * In the `clear_and_remain_ready` case the `mprotect` is skipped if the heap wasn't grown at all. This should avoid ever using `mprotect` unnecessarily and makes the ranges we `mprotect` a bit smaller as well. * Review comments * Tweak allow to apply to whole crate	2022-02-02 16:09:47 -06:00
Chris Fallin	5deb1f1fbf	Merge pull request #3738 from cfallin/pooling-affinity Pooling allocator: add a reuse-affinity policy.	2022-02-02 13:11:39 -08:00
Chris Fallin	99ed8cc9be	Merge pull request #3697 from cfallin/memfd-cow memfd/madvise-based CoW pooling allocator	2022-02-02 13:04:26 -08:00
Chris Fallin	1cbd393930	Review comments.	2022-02-02 12:25:30 -08:00
Chris Fallin	6011420557	Pooling allocator: add a reuse-affinity policy. This policy attempts to reuse the same instance slot for subsequent instantiations of the same module. This is particularly useful when using a pooling backend such as memfd that benefits from this reuse: for example, in the memfd case, instantiating the same module into the same slot allows us to avoid several calls to mmap() because the same mappings can be reused. The policy tracks a freelist per "compiled module ID", and when allocating a slot for an instance, tries these options in order: 1. A slot from the freelist for this module (i.e., last used for another instantiation of this particular module), or 2. A slot that was last used by some other module or never before. The "victim" slot for choice 2 is randomly chosen. The data structures are carefully designed so that all updates are O(1), and there is no retry-loop in any of the random selection. This policy is now the default when the memfd backend is selected via the `memfd-allocator` feature flag.	2022-02-02 12:25:30 -08:00
Chris Fallin	9880eba2a8	Skip memfd tests when on qemu, due to differing madvise semantics.	2022-02-02 12:25:20 -08:00
Chris Fallin	d7b04f5ced	Review comments.	2022-02-02 11:41:31 -08:00
Chris Fallin	0ec45d3ae4	Add additional tests for MemFdSlot.	2022-02-02 11:33:05 -08:00
Alex Crichton	3f5cbddab5	Fix a text format test expectation	2022-02-02 10:17:18 -08:00
Chris Fallin	94410a8d4b	Review comments.	2022-02-02 10:03:31 -08:00
Alex Crichton	9d1e517615	Update some more version reqs	2022-02-02 09:51:27 -08:00
Alex Crichton	65486a0680	Update wasm-tools crates Nothing major here, just a routine update with a few extra things to handle here-and-there.	2022-02-02 09:50:08 -08:00
Alex Crichton	c83968575a	Lazily populate a store's trampoline map (#3742) * Lazily populate a store's trampoline map This commit is another installment of "how fast can we make instantiation". Currently when instantiating a module with many function imports each function, typically from the host, is inserted into the store. This insertion process stores the `VMTrampoline` for the host function in a side table so it can be looked up later if the host function is called through the `Func` interface. This insertion process, however, involves a hash map insertion which can be relatively expensive at the scale of the rest of the instantiation process. The optimization implemented in this commit is to avoid inserting trampolines into the store at `Func`-insertion-time (aka instantiation time) and instead only lazily populate the map of trampolines when needed. The theory behind this is that almost all `Func` instances that are called indirectly from the host are actually wasm functions, not host-defined functions. This means that they already don't need to go through the map of host trampolines and can instead be looked up from the module they're defined in. With the assumed rarity of host functions making `lookup_trampoline` a bit slower seems ok. The `lookup_trampoline` function will now, on a miss from the wasm modules and `host_trampolines` map, lazily iterate over the functions within the store and insert trampolines into the `host_trampolines` map. This process will eventually reach something which matches the function provided because it should at least hit the same host function. The relevant `lookup_trampoline` now sports a new documentation block explaining all this as well for future readers. Concretely this commit speeds up instantiation of an empty module with 100 imports and ~80 unique signatures from 10.6us to 6.4us, a 40% improvement. * Review comments * Remove debug assert	2022-02-02 09:43:29 -06:00
Dan Gohman	ffa9fe32aa	Use is-terminal instead of atty. Following up on #3696, use the new is-terminal crate to test for a tty rather than having platform-specific logic in Wasmtime. The is-terminal crate has a platform-independent API which takes a handle. This also updates the tree to cap-std 0.24 etc., to avoid depending on multiple versions of io-lifetimes at once, as enforced by the cargo deny check.	2022-02-01 17:48:49 -08:00
Chris Fallin	84a8368e88	Fix to the optimization: mprotect(NONE) sometimes needed after skipping the initial mmap.	2022-02-01 16:34:06 -08:00
Chris Fallin	01e6bb81fb	Review feedback.	2022-02-01 15:49:44 -08:00
Nick Fitzgerald	491e98233e	Merge pull request #3750 from bytecodealliance/pch/fix_3749 fix #3749: returns count should count the returns, not the params.	2022-02-01 10:29:17 -08:00
Pat Hickey	aa4c81a4e7	fix #3749: returns count should count the returns, not the params.	2022-02-01 09:46:46 -08:00
Chris Fallin	0ff8f6ab20	Make build-config magic use memfd by default.	2022-01-31 22:39:20 -08:00
Chris Fallin	ccfa245261	Optimization: only mprotect the new bit of heap, not all of it. (This was not a correctness bug, but is an obvious performance bug...)	2022-01-31 21:25:40 -08:00
Chris Fallin	982df2f2e5	Review feedback.	2022-01-31 16:40:14 -08:00
Harald Hoyer	853a025613	Implement `sock_accept` With the addition of `sock_accept()` in `wasi-0.11.0`, wasmtime can now implement basic networking for pre-opened sockets. For Windows `AsHandle` was replaced with `AsRawHandleOrSocket` to cope with the duality of Handles and Sockets. For Unix a `wasi_cap_std_sync::net::Socket` enum was created to handle the {Tcp,Unix}{Listener,Stream} more efficiently in `WasiCtxBuilder::preopened_socket()`. The addition of that many `WasiFile` implementors was mainly necessary because of the difference in the `num_ready_bytes()` function. A known issue is Windows now busy polling on sockets, because except for `stdin`, nothing is querying the status of Windows handles/sockets. Another known issue on Windows is that there is no crate providing support for `fcntl(fd, F_GETFL, 0)` on a socket. Signed-off-by: Harald Hoyer <harald@profian.com>	2022-01-31 16:25:11 -08:00
Chris Fallin	570dee63f3	Use MemFdSlot in the on-demand allocator as well.	2022-01-31 13:59:51 -08:00
Chris Fallin	3702e81d30	Remove ftruncate-trick for heap growth with memfd backend. Testing so far with recent Wasmtime has not been able to show the need for avoiding the process-wide mmap lock in real-world use-cases. As such, the technique of using an anonymous file and ftruncate() to extend it seems unnecessary; instead, memfd can always use anonymous zeroed memory for heap backing where the CoW image is not present, and mprotect() to extend the heap limit by changing page protections.	2022-01-31 12:53:22 -08:00
Chris Fallin	b73ac83c37	Add a pooling allocator mode based on copy-on-write mappings of memfds. As first suggested by Jan on the Zulip here [1], a cheap and effective way to obtain copy-on-write semantics of a "backing image" for a Wasm memory is to mmap a file with `MAP_PRIVATE`. The `memfd` mechanism provided by the Linux kernel allows us to create anonymous, in-memory-only files that we can use for this mapping, so we can construct the image contents on-the-fly then effectively create a CoW overlay. Furthermore, and importantly, `madvise(MADV_DONTNEED, ...)` will discard the CoW overlay, returning the mapping to its original state. By itself this is almost enough for a very fast instantiation-termination loop of the same image over and over, without changing the address space mapping at all (which is expensive). The only missing bit is how to implement heap growth. But here memfds can help us again: if we create another anonymous file and map it where the extended parts of the heap would go, we can take advantage of the fact that a `mmap()` mapping can be larger than the file itself, with accesses beyond the end generating a `SIGBUS`, and the fact that we can cheaply resize the file with `ftruncate`, even after a mapping exists. So we can map the "heap extension" file once with the maximum memory-slot size and grow the memfd itself as `memory.grow` operations occur. The above CoW technique and heap-growth technique together allow us a fastpath of `madvise()` and `ftruncate()` only when we re-instantiate the same module over and over, as long as we can reuse the same slot. This fastpath avoids all whole-process address-space locks in the Linux kernel, which should mean it is highly scalable. It also avoids the cost of copying data on read, as the `uffd` heap backend does when servicing pagefaults; the kernel's own optimized CoW logic (same as used by all file mmaps) is used instead. 
[1] https://bytecodealliance.zulipchat.com/#narrow/stream/206238-general/topic/Copy.20on.20write.20based.20instance.20reuse/near/266657772	2022-01-31 12:53:18 -08:00
Alex Crichton	a25f7bdba5	Don't copy `VMBuiltinFunctionsArray` into each `VMContext` (#3741) * Don't copy `VMBuiltinFunctionsArray` into each `VMContext` This is another PR along the lines of "let's squeeze all possible performance we can out of instantiation". Before this PR we would copy, by value, the contents of `VMBuiltinFunctionsArray` into each `VMContext` allocated. This array of function pointers is modestly-sized but growing over time as we add various intrinsics. Additionally it's the exact same for all `VMContext` allocations. This PR attempts to speed up instantiation slightly by instead storing an indirection to the function array. This means that calling a builtin intrinsic is a tad bit slower since it requires two loads instead of one (one to get the base pointer, another to get the actual address). Otherwise though `VMContext` initialization is now simply setting one pointer instead of doing a `memcpy` from one location to another. With some macro-magic this commit also replaces the previous implementation with one that's more `const`-friendly which also gets us compile-time type-checks of libcalls as well as compile-time verification that all libcalls are defined. Overall, as with #3739, the win is very modest here. Locally I measured a speedup from 1.9us to 1.7us taken to instantiate an empty module with one function. While small at these scales it's still a 10% improvement! * Review comments	2022-01-28 16:24:34 -06:00
Alex Crichton	2f494240f8	Lazily allocate the bump-alloc chunk in the externref table (#3739) This commit updates the allocation of a `VMExternRefActivationsTable` structure to perform zero malloc memory allocations. Previously it would allocate a page-size of `chunk` plus some space in hash sets for future insertions. The main trick implemented here is that after the first gc during the slow path the fast chunk allocation is allocated and configured. The motivation for this PR is that given our recent work to further refine and optimize the instantiation process this allocation started to show up in a nontrivial fashion. Most modules today never touch this table anyway as almost none of them use reference types, so the time spent allocating and deallocating the table per-store was largely wasted time. Concretely on a microbenchmark this PR speeds up instantiation of a module with one function by 30%, decreasing the instantiation cost from 1.8us to 1.2us. Overall a pretty minor win but when the instantiation times we're measuring start being in the single-digit microseconds this win ends up getting magnified!	2022-01-28 16:10:05 -06:00
Nick Fitzgerald	19f8d94959	Expand on activations table invariants comment in `libcalls.rs`	2022-01-28 09:47:05 -08:00
Nick Fitzgerald	f292ff55cf	Do another GC after running Wasm in the `table_ops` test oracle	2022-01-27 16:53:10 -08:00
Nick Fitzgerald	cc8d7778e2	Make the `table_ops` test case generator use globals as well This will make it generate `table.set`s that come from `global.get`s and `global.get`s that come from `table.set`s. Ultimately, it should give us much more fuzzing coverage of `externref` globals, their barriers, and passing `externref`s into and out of Wasm to get or set globals.	2022-01-27 16:53:10 -08:00
Nick Fitzgerald	cbc6f6071f	Fix a debug assertion in `externref` garbage collections When we GC, we assert the invariant that all `externref`s we find on the stack have a corresponding entry in the `VMExternRefActivationsTable`. However, we also might be in code that is in the process of fixing up this invariant and adding an entry to the table, but the table's bump chunk is full, and so we do a GC and then add the entry into the table. This will ultimately maintain our desired invariant, but there is a moment in time when we are doing the GC where the invariant is relaxed which is okay because the reference will be in the table before we return to Wasm or do anything else. This isn't a possible UAF, in other words. To make it so that the assertion won't trip, we explicitly insert the reference into the table before we GC, so that the invariant is not relaxed across a possibly-GCing operation (even though it would be safe in this particular case).	2022-01-27 14:46:01 -08:00
Alex Crichton	7928a3ffb4	Lazily load types into `Func` (#3727) * Lazily load types into `Func` This commit changes the construction of a `Func` to lazily load the type information if required instead of always loading the type information at `Func`-construction time. The main purpose of this change is to accelerate instantiation of instances which have many imports. Currently in the fast way of doing this the instantiation loop looks like: `let mut store = Store::new(&engine, ...);` `let instance = instance_pre.instantiate(&mut store);` In this situation the `instance_pre` will typically load host-defined functions (defined via `Linker` APIs) into the `Store` as individual `Func` items and then perform the instantiation process. The operation of loading a `HostFunc` into a `Store` however currently involves two expensive operations: * First a read-only lock is taken on the `RwLock` around engine signatures. * Next a clone of the wasm type is made to pull it out of the engine signature registry. Neither of these is actually necessary for imported functions. The `FuncType` for imported functions is never used since all comparisons happen with the intern'd indices instead. The only time a `FuncType` is used typically is for exported functions when using `Func::typed` or similar APIs which need type information. This commit makes this path faster by storing `Option<FuncType>` instead of `FuncType` within a `Func`. This means that it starts out as `None` and is only filled in on-demand as necessary. This means that when instantiating a module with many imports no clones/locks are done. On a simple microbenchmark where a module with 100 imports is instantiated this PR improves instantiation time by ~35%. Due to the rwlock used here and the general inefficiency of pthreads rwlocks the effect is even more profound when many threads are performing the same instantiation process. 
On x86_64 with 8 threads performing instantiation this PR improves instantiation time by 80% and on arm64 it improves by 97% (wow read-contended glibc rwlocks on arm64 are slow). Note that much of the improvement here is also from memory allocations/deallocations no longer being performed because dropping functions within a store no longer requires deallocating the `FuncType` if it's not present. A downside of this PR is that `Func::ty` is now unconditionally taking an rwlock if the type hasn't already been filled in. (it uses the engine). If this is an issue in the future though we can investigate at that time using something like a `Once` to lazily fill in even when mutable access to the store isn't available. * Review comments	2022-01-26 14:23:37 -06:00
Dan Gohman	5fc01bafc7	Fix `isatty` in WASI. (#3696) WASI doesn't have an `isatty` ioctl or syscall, so wasi-libc's `isatty` implementation uses the file descriptor type and rights to determine if the file descriptor is likely to be a tty. The real fix here will be to add an `isatty` call to WASI. But for now, have Wasmtime set the filetype and rights for file descriptors so that wasi-libc's `isatty` works as expected.	2022-01-24 11:45:16 -08:00
Dan Gohman	881c19473d	Use `ptr::cast` instead of `as` casts in several places. (#3507) `ptr::cast` has the advantage of being unable to silently cast `const T` to `mut T`. This turned up several places that were performing such casts, which this PR also fixes.	2022-01-21 13:03:17 -08:00
Alex Crichton	2b04466955	Document epoch interruption in async documentation (#3712) I, after-the-fact, now recall that we document fuel and other interruption schemes in the `Config::async_support` documentation so I've re-worded that section to mention epoch-based interruption as well.	2022-01-21 11:16:27 -06:00
Chris Fallin	8a55b5c563	Add epoch-based interruption for cooperative async timeslicing. This PR introduces a new way of performing cooperative timeslicing that is intended to replace the "fuel" mechanism. The tradeoff is that this mechanism interrupts with less precision: not at deterministic points where fuel runs out, but rather when the Engine enters a new epoch. The generated code instrumentation is substantially faster, however, because it does not need to do as much work as when tracking fuel; it only loads the global "epoch counter" and does a compare-and-branch at backedges and function prologues. This change has been measured as ~twice as fast as fuel-based timeslicing for some workloads, especially control-flow-intensive workloads such as the SpiderMonkey JS interpreter on Wasm/WASI. The intended interface is that the embedder of the `Engine` performs an `engine.increment_epoch()` call periodically, e.g. once per millisecond. An async invocation of a Wasm guest on a `Store` can specify a number of epoch-ticks that are allowed before an async yield back to the executor's event loop. (The initial amount and automatic "refills" are configured on the `Store`, just as for fuel.) This call does only signal-safe work (it increments an `AtomicU64`) so could be invoked from a periodic signal, or from a thread that wakes up once per period.	2022-01-20 13:58:17 -08:00
Benjamin Bouvier	2649d2352c	Support vtune profiling of trampolines too (#3687) * Provide helpers for demangling function names * Profile trampolines in vtune too * get rid of mapping * avoid code duplication with jitdump_linux * maintain previous default display name for wasm functions * no dash, grrr * Remove unused profiling error type	2022-01-19 09:49:23 -06:00
Mrmaxmeier	2afd6900f4	runtime: expose DefaultMemoryCreator (#3670)	2022-01-18 09:17:33 -06:00
Alex Crichton	46ade3dab3	Try to fix CI for Rust 1.58 (#3689) PATH lookup for Windows command execution was tweaked slightly to not search the cwd, so let's see if this fixes things...	2022-01-13 16:38:32 -06:00
Chris Fallin	13f17db297	Merge pull request #3680 from bjorn3/remove_code_sink Remove the CodeSink interface in favor of MachBufferFinalized	2022-01-12 10:47:23 -08:00
Benjamin Bouvier	e53f213ac4	Try demangling names before forwarding them to the profiler Before this PR, each profiler (perf/vtune, at the moment) had to have a demangler for each of the programming languages that could have been compiled to wasm and fed into wasmtime. With this, wasmtime now demangles names before even forwarding them to the underlying profiler, which makes for a unified representation in profilers, and avoids incorrect demangling in profilers.	2022-01-12 19:17:42 +01:00
bjorn3	17021bc77a	Extract helper functions	2022-01-12 17:19:34 +01:00
bjorn3	f0e821b9e0	Remove all Sink traits	2022-01-11 19:03:10 +01:00
bjorn3	b803514d55	Remove sink arguments from compile_and_emit The data can be accessed after the fact using context.mach_compile_result	2022-01-11 18:17:29 +01:00
Nick Fitzgerald	2e2620c07f	Merge pull request #3671 from alexcrichton/table-ops-terminate fuzz: Fix infinite loops in table_ops fuzzers	2022-01-10 09:34:07 -08:00
Alex Crichton	fd2636c25a	wasmtime: Make rustix an unconditional dependency (#3675) This is unconditionally used on aarch64 and otherwise trying to manage the precise clause for making it conditional vs unconditional probably isn't worth it. Closes #3672	2022-01-10 10:47:35 -06:00
Piotr Sikora	642102e699	Fix build with clang on s390x. (#3673) Signed-off-by: Piotr Sikora <piotrsikora@google.com>	2022-01-10 09:27:27 -06:00
Alex Crichton	2459776424	fuzz: Fix infinite loops in table_ops fuzzers I forgot in the recent refactoring to add back in fuel support to the `table_ops` fuzzer. This commit re-adds the previously existent logic to always use fuel to cancel execution of the table_ops fuzzer.	2022-01-08 17:19:00 -08:00
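
Commit 8a55b5c563 above describes epoch-based interruption: generated code loads a global epoch counter and does a compare-and-branch at backedges and function prologues, while the embedder ticks the counter periodically. A minimal standard-library sketch of that check might look like the following. `EpochCounter`, `Deadline`, and `run_guest_loop` are hypothetical names used only for illustration; they are not Wasmtime APIs, and the real check lives in generated code rather than in a Rust loop.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the engine-wide epoch counter.
struct EpochCounter(AtomicU64);

impl EpochCounter {
    fn new() -> Self {
        EpochCounter(AtomicU64::new(0))
    }

    /// Embedder side: advance the epoch. This is a single atomic add,
    /// which is why it could be driven from a timer thread.
    fn increment(&self) {
        self.0.fetch_add(1, Ordering::Relaxed);
    }

    fn current(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Hypothetical per-store state: the epoch at which the guest should
/// yield, plus the number of ticks granted on each refill.
struct Deadline {
    yield_at: u64,
    refill: u64,
}

impl Deadline {
    fn new(counter: &EpochCounter, ticks: u64) -> Self {
        Deadline { yield_at: counter.current() + ticks, refill: ticks }
    }

    /// The check modeled on a loop backedge: one load, one compare.
    /// Returns true when the guest should yield, and refills the deadline.
    fn should_yield(&mut self, counter: &EpochCounter) -> bool {
        let now = counter.current();
        if now >= self.yield_at {
            self.yield_at = now + self.refill;
            true
        } else {
            false
        }
    }
}

/// Simulates a guest loop in which the "embedder" ticks the epoch every
/// `tick_every` iterations. Returns how many times the guest yielded.
fn run_guest_loop(iterations: u64, tick_every: u64, ticks_per_slice: u64) -> u64 {
    let counter = EpochCounter::new();
    let mut deadline = Deadline::new(&counter, ticks_per_slice);
    let mut yields = 0;
    for i in 1..=iterations {
        if i % tick_every == 0 {
            counter.increment();
        }
        if deadline.should_yield(&counter) {
            yields += 1;
        }
    }
    yields
}
```

In this simulation, calling `run_guest_loop(100, 10, 2)` ticks the epoch ten times and yields once for every two ticks. The trade-off named in the commit message shows up here as well: the yield point depends on when ticks arrive, not on a deterministic instruction count.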

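
Commit 6011420557 above describes a reuse-affinity policy: prefer a free slot last used by the same module, otherwise fall back to some other free slot. The sketch below illustrates only that preference order. `AffinityPool`, `ModuleId`, and `SlotId` are hypothetical names, and this version uses a linear scan over a `HashMap`, so it does not reproduce the O(1) updates or the random victim selection that the commit message mentions.

```rust
use std::collections::HashMap;

/// Hypothetical identifiers, for illustration only.
type ModuleId = u64;
type SlotId = usize;

struct AffinityPool {
    /// Free slots, each tagged with the module that last used it (if any).
    free: HashMap<SlotId, Option<ModuleId>>,
}

impl AffinityPool {
    /// Creates a pool in which every slot is free and has never been used.
    fn new(slots: usize) -> Self {
        AffinityPool { free: (0..slots).map(|s| (s, None)).collect() }
    }

    /// Picks a slot for `module`, or returns `None` if the pool is full.
    fn alloc(&mut self, module: ModuleId) -> Option<SlotId> {
        // First choice: a free slot last used by this same module.
        let preferred = self
            .free
            .iter()
            .find(|(_, last)| **last == Some(module))
            .map(|(slot, _)| *slot);
        // Fallback: any free slot, whether used by another module or never used.
        let chosen = preferred.or_else(|| self.free.keys().next().copied())?;
        self.free.remove(&chosen);
        Some(chosen)
    }

    /// Returns `slot` to the pool, remembering which module used it.
    fn release(&mut self, slot: SlotId, module: ModuleId) {
        self.free.insert(slot, Some(module));
    }
}
```

Releasing a slot and then allocating again for the same module hands back the same slot, which is the property that lets a memfd-style backend reuse existing mappings.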

2159 Commits