wasmtime

Author	SHA1	Message	Date
Nick Fitzgerald	31e2d6b21c	Merge pull request #3769 from cfallin/fix-debuginfo-cold-blocks Cranelift: fix debuginfo wrt cold blocks and non-monotonic layout.	2022-02-07 08:58:50 -08:00
Jonas Kruckenberg	79af8cd9ce	chore: update zstd (#3771 )	2022-02-07 09:38:12 -06:00
Chris Fallin	2cf3069b6b	Extend cold-blocks test to test debuginfo as well.	2022-02-04 23:15:16 -08:00
Chris Fallin	d9d6469422	Cranelift: fix debuginfo wrt cold blocks and non-monotonic layout. The debuginfo analyses are written with the assumption that the order of instructions in the VCode is the order of instructions in the final machine code. This was previously a strong invariant, until we introduced support for cold blocks. Cold blocks are implemented by reordering during emission, because the VCode ordering has other requirements related to lowering (respecting def-use dependencies in the reverse pass), so it is much simpler to reorder instructions at the last moment. Unfortunately, this causes the breakage we now see. This commit fixes the issue by skipping all cold instructions when emitting value-label ranges (which are translated into debuginfo). This means that variables defined in cold blocks will not have DWARF metadata. But cold blocks are usually compiler-inserted slowpaths, not user code, so this is probably OK. Debuginfo is always best-effort, so in any case this does not violate any correctness constraints.	2022-02-04 23:15:04 -08:00
Chris Fallin	04269355ca	Merge pull request #3767 from avanhatt/patch-1 Add item to Cranelift 02-07 meeting agenda	2022-02-04 12:02:52 -08:00
Alexa VanHattum	f016a1d266	Add item to 02-07 meeting agenda	2022-02-04 14:58:39 -05:00
Alex Crichton	04d2caea7b	Consolidate methods of memory initialization (#3766 ) * Consolidate methods of memory initialization This commit consolidates the few locations that we have which are performing memory initialization. Namely the uffd logic for creating paged memory as well as the memfd logic for creating a memory image now share an implementation to avoid duplicating bounds-checks or other validation conditions. The main purpose of this commit is to fix a fuzz-bug where a multiplication overflowed. The overflow itself was benign but it seemed better to fix the overflow in only one place instead of multiple. The overflow in question is specifically when an initializer is checked to be statically out-of-bounds and multiplies a memory's minimum size by the wasm page size, returning the result as a `u64`. For memory64-memories of size `1 << 48` this multiplication will overflow. This was actually a preexisting bug with the `try_paged_init` function which was copied for memfd, but cropped up here since memfd is used more often than paged initialization. The fix here is to skip validation of the `end` index if the size of memory is `1 << 64` since if the `end` index can be represented as a `u64` then it's in-bounds. This is somewhat of an esoteric case, though, since a memory of minimum size `1 << 64` can't ever exist (we can't even ask the os for that much memory, and even if we could it would fail). * Fix memfd test * Fix some tests * Remove InitMemory enum * Add an `is_segmented` helper method * More clear variable name * Make arguments to `init_memory` more descriptive	2022-02-04 13:17:25 -06:00
Nick Fitzgerald	a519e5ab64	Merge pull request #3752 from fitzgen/newtypes-for-register-classes cranelift: Add newtype wrappers for x64 register classes	2022-02-03 14:56:57 -08:00
Nick Fitzgerald	2c77cf866a	ISLE: Rename `{gpr,xmm}_mem_new` constructors to `reg_mem_to_{gpr,xmm}_mem`	2022-02-03 14:08:08 -08:00
Nick Fitzgerald	795b0aaf9a	cranelift: Add newtype wrappers for x64 register classes The primary motivation of this large commit (apologies for its size!) is to introduce `Gpr` and `Xmm` newtypes over `Reg`. This should help catch difficult-to-diagnose register class mixup bugs in x64 lowerings. But having a newtype for `Gpr` and `Xmm` themselves isn't enough to catch all of our operand-with-wrong-register-class bugs, because about 50% of operands on x64 aren't just a register, but a register or memory address or even an immediate! So we have `{Gpr,Xmm}Mem[Imm]` newtypes as well. Unfortunately, `GprMem` et al can't be `enum`s and are therefore a little bit noisier to work with from ISLE. They need to maintain the invariant that their registers really are of the claimed register class, so they need to encapsulate the inner data. If they exposed the underlying `enum` variants, then anyone could just change register classes or construct a `GprMem` that holds an XMM register, defeating the whole point of these newtypes. So when working with these newtypes from ISLE, we rely on external constructors like `(gpr_to_gpr_mem my_gpr)` instead of `(GprMem.Gpr my_gpr)`. A few extra lines of code are included to add support for register mapping for all of these newtypes as well. Ultimately this is all a bit wordier than I'd hoped it would be when I first started authoring this commit, but I think it is all worth it nonetheless! In the process of adding these newtypes, I didn't want to have to update both the ISLE `extern` type definition of `MInst` and the Rust definition, so I moved the definition fully into ISLE, similar to aarch64. Finally, this process isn't complete. I've introduced the newtypes here, and I've made most XMM-using instructions switch from `Reg` to `Xmm`, as well as register class-converting instructions, but I haven't moved all of the GPR-using instructions over to the newtypes yet.
I figured this commit was big enough as it was, and I can continue the adoption of these newtypes in follow up commits. Part of #3685.	2022-02-03 14:08:08 -08:00
Nick Fitzgerald	e1f4e29efe	ISLE: Add a `nodebug` type attribute to disable `derive(Debug)` I need this to move the x64 `Inst` definition into ISLE without losing its custom `Debug` implementation that prints the assembly for the `Inst`.	2022-02-03 14:08:08 -08:00
Chris Fallin	b3b83efdbe	Merge pull request #3760 from cfallin/memfd-lazy-create Make memfd image creation lazy (on first instantiation).	2022-02-03 13:20:24 -08:00
Chris Fallin	2a24a0fbde	Make memfd image creation lazy (on first instantiation). As a followup to the recent memfd allocator work, this PR makes the memfd image creation occur on the first instantiation, rather than immediately when the `Module` is loaded. This shaves off a potentially surprising cost spike that would have otherwise occurred: prior to the memfd work, no allocator eagerly read the module's initial heap state into RAM. The behavior should now more closely resemble what happened before (and the improvements in overall instantiation time and performance, as compared to either eager init with pure-mmap memory or user-mode pagefault handling with uffd, remain).	2022-02-03 12:46:34 -08:00
Nick Fitzgerald	605c79fd05	Merge pull request #3756 from alexcrichton/update-wasm-tools Update wasm-tools crates	2022-02-03 11:19:55 -08:00
Chris Fallin	43de2dca1f	Merge pull request #3765 from cfallin/cranelift-isle-license Add LICENSE file to cranelift/isle/.	2022-02-03 10:39:19 -08:00
Alex Crichton	4ba3404ca3	Fix MemFd's allocated memory for dynamic memories (#3763 ) This fixes a bug in the memfd-related management of a linear memory where for dynamic memories memfd wasn't informed of the extra room that the dynamic memory could grow into, only the actual size of linear memory, which ended up tripping an assert once the memory was grown. The fix here is pretty simple which is to factor in this extra space when passing the allocation size to the creation of the `MemFdSlot`.	2022-02-03 11:56:16 -06:00
Chris Fallin	695c64f2b2	Add LICENSE file to cranelift/isle/. This is the same Apache 2.0 license file that is in all of our other crates; omitting it here was just an oversight. Fixes #3761.	2022-02-03 09:54:58 -08:00
Andrew Brown	31d4d2cbe0	meeting: add notes (#3764 )	2022-02-03 09:46:17 -08:00
Chris Fallin	8fb7cbae9e	Merge pull request #3762 from cfallin/meeting-20220203 Add item to Wasmtime meeting	2022-02-03 08:54:55 -08:00
Chris Fallin	aacf563e38	Add item to Wasmtime meeting	2022-02-03 08:53:37 -08:00
Alex Crichton	b647561c44	memfd: Some minor follow-ups (#3759 ) * Tweak memfd-related features crates This commit changes the `memfd` feature for the `wasmtime-cli` crate from an always-on feature to a default-on feature which can be disabled at compile time. Additionally the `pooling-allocator` feature is also given similar treatment. Additionally some documentation was added for the `memfd` feature on the `wasmtime` crate. * Don't store `Arc<T>` in `InstanceAllocationRequest` Instead store `&Arc<T>` to avoid having the clone that lives in `InstanceAllocationRequest` not actually going anywhere. Otherwise all instance allocation requires an extra clone to create it for the request and an extra decrement when the request goes away. Internally clones are made as necessary when creating instances. * Enable the pooling allocator by default for `wasmtime-cli` While perhaps not the most useful option since the CLI doesn't have a great way to take advantage of this it probably makes sense to at least match the features of `wasmtime` itself. * Fix some lints and issues * More compile fixes	2022-02-03 09:17:04 -06:00
Alex Crichton	8ed79c8f57	memfd: Reduce some syscalls in the on-demand case (#3757 ) * memfd: Reduce some syscalls in the on-demand case This tweaks the internal organization of the `MemFdSlot` to avoid some syscalls in the default case as well as opportunistically in the pooling case. The two cases added here are: * A `MemFdSlot` is now created with a specified initial size. For pooling this is 0 but for the on-demand case this can be non-zero. * When `instantiate` is called with no prior image and the sizes match (as will be the case for on-demand allocation) then `mprotect` is skipped entirely. * In the `clear_and_remain_ready` case the `mprotect` is skipped if the heap wasn't grown at all. This should avoid ever using `mprotect` unnecessarily and makes the ranges we `mprotect` a bit smaller as well. * Review comments * Tweak allow to apply to whole crate	2022-02-02 16:09:47 -06:00
Chris Fallin	5deb1f1fbf	Merge pull request #3738 from cfallin/pooling-affinity Pooling allocator: add a reuse-affinity policy.	2022-02-02 13:11:39 -08:00
Chris Fallin	99ed8cc9be	Merge pull request #3697 from cfallin/memfd-cow memfd/madvise-based CoW pooling allocator	2022-02-02 13:04:26 -08:00
Chris Fallin	1cbd393930	Review comments.	2022-02-02 12:25:30 -08:00
Chris Fallin	6011420557	Pooling allocator: add a reuse-affinity policy. This policy attempts to reuse the same instance slot for subsequent instantiations of the same module. This is particularly useful when using a pooling backend such as memfd that benefits from this reuse: for example, in the memfd case, instantiating the same module into the same slot allows us to avoid several calls to mmap() because the same mappings can be reused. The policy tracks a freelist per "compiled module ID", and when allocating a slot for an instance, tries these three options in order: 1. A slot from the freelist for this module (i.e., last used for another instantiation of this particular module), or 3. A slot that was last used by some other module or never before. The "victim" slot for choice 2 is randomly chosen. The data structures are carefully designed so that all updates are O(1), and there is no retry-loop in any of the random selection. This policy is now the default when the memfd backend is selected via the `memfd-allocator` feature flag.	2022-02-02 12:25:30 -08:00
Chris Fallin	9880eba2a8	Skip memfd tests when on qemu, due to differing madvise semantics.	2022-02-02 12:25:20 -08:00
Chris Fallin	d7b04f5ced	Review comments.	2022-02-02 11:41:31 -08:00
Chris Fallin	0ec45d3ae4	Add additional tests for MemFdSlot.	2022-02-02 11:33:05 -08:00
Alex Crichton	3f5cbddab5	Fix a text format test expectation	2022-02-02 10:17:18 -08:00
Chris Fallin	94410a8d4b	Review comments.	2022-02-02 10:03:31 -08:00
Alex Crichton	9d1e517615	Update some more version reqs	2022-02-02 09:51:27 -08:00
Alex Crichton	65486a0680	Update wasm-tools crates Nothing major here, just a routine update with a few extra things to handle here-and-there.	2022-02-02 09:50:08 -08:00
Alex Crichton	c83968575a	Lazily populate a store's trampoline map (#3742 ) * Lazily populate a store's trampoline map This commit is another installment of "how fast can we make instantiation". Currently when instantiating a module with many function imports each function, typically from the host, is inserted into the store. This insertion process stores the `VMTrampoline` for the host function in a side table so it can be looked up later if the host function is called through the `Func` interface. This insertion process, however, involves a hash map insertion which can be relatively expensive at the scale of the rest of the instantiation process. The optimization implemented in this commit is to avoid inserting trampolines into the store at `Func`-insertion-time (aka instantiation time) and instead only lazily populate the map of trampolines when needed. The theory behind this is that almost all `Func` instances that are called indirectly from the host are actually wasm functions, not host-defined functions. This means that they already don't need to go through the map of host trampolines and can instead be looked up from the module they're defined in. With the assumed rarity of host functions making `lookup_trampoline` a bit slower seems ok. The `lookup_trampoline` function will now, on a miss from the wasm modules and `host_trampolines` map, lazily iterate over the functions within the store and insert trampolines into the `host_trampolines` map. This process will eventually reach something which matches the function provided because it should at least hit the same host function. The relevant `lookup_trampoline` now sports a new documentation block explaining all this as well for future readers. Concretely this commit speeds up instantiation of an empty module with 100 imports and ~80 unique signatures from 10.6us to 6.4us, a 40% improvement. * Review comments * Remove debug assert	2022-02-02 09:43:29 -06:00
Dan Gohman	ffa9fe32aa	Use is-terminal instead of atty. Following up on #3696, use the new is-terminal crate to test for a tty rather than having platform-specific logic in Wasmtime. The is-terminal crate has a platform-independent API which takes a handle. This also updates the tree to cap-std 0.24 etc., to avoid depending on multiple versions of io-lifetimes at once, as enforced by the cargo deny check.	2022-02-01 17:48:49 -08:00
Chris Fallin	84a8368e88	Fix to the optimization: mprotect(NONE) sometimes needed after skipping the initial mmap.	2022-02-01 16:34:06 -08:00
Chris Fallin	01e6bb81fb	Review feedback.	2022-02-01 15:49:44 -08:00
Nick Fitzgerald	491e98233e	Merge pull request #3750 from bytecodealliance/pch/fix_3749 fix #3749: returns count should count the returns, not the params.	2022-02-01 10:29:17 -08:00
Pat Hickey	aa4c81a4e7	fix #3749 : returns count should count the returns, not the params.	2022-02-01 09:46:46 -08:00
Chris Fallin	0ff8f6ab20	Make build-config magic use memfd by default.	2022-01-31 22:39:20 -08:00
Chris Fallin	ccfa245261	Optimization: only mprotect the new bit of heap, not all of it. (This was not a correctness bug, but is an obvious performance bug...)	2022-01-31 21:25:40 -08:00
Chris Fallin	982df2f2e5	Review feedback.	2022-01-31 16:40:14 -08:00
Harald Hoyer	853a025613	Implement `sock_accept` With the addition of `sock_accept()` in `wasi-0.11.0`, wasmtime can now implement basic networking for pre-opened sockets. For Windows `AsHandle` was replaced with `AsRawHandleOrSocket` to cope with the duality of Handles and Sockets. For Unix a `wasi_cap_std_sync::net::Socket` enum was created to handle the {Tcp,Unix}{Listener,Stream} more efficiently in `WasiCtxBuilder::preopened_socket()`. The addition of that many `WasiFile` implementors was mainly necessary because of the difference in the `num_ready_bytes()` function. A known issue is that Windows now busy-polls on sockets, because except for `stdin`, nothing is querying the status of Windows handles/sockets. Another known issue on Windows is that there is no crate providing support for `fcntl(fd, F_GETFL, 0)` on a socket. Signed-off-by: Harald Hoyer <harald@profian.com>	2022-01-31 16:25:11 -08:00
Nick Fitzgerald	8ba3294881	Merge pull request #3745 from uweigand/move-emit Move emit and emit_safepoint to prelude.isle	2022-01-31 14:43:36 -08:00
Chris Fallin	570dee63f3	Use MemFdSlot in the on-demand allocator as well.	2022-01-31 13:59:51 -08:00
Ulrich Weigand	a3e2f5c28b	Move emit and emit_safepoint to prelude.isle Even though the implementation of emit and emit_safepoint may be platform-specific, the interface ought to be common so that other code in prelude.isle may safely call these constructors. This patch moves the definition of emit (from all platforms) and emit_safepoint (s390x only) to prelude.isle. This required adding an emit_safepoint implementation to aarch64 and x64 as well - the latter is still a stub as special move mitosis handling will be required.	2022-01-31 22:54:04 +01:00
Nick Fitzgerald	3c2f695dd3	Merge pull request #3724 from uweigand/s390x-isle-branchtrap s390x: Migrate branches and traps to ISLE	2022-01-31 13:42:49 -08:00
Chris Fallin	3702e81d30	Remove ftruncate-trick for heap growth with memfd backend. Testing so far with recent Wasmtime has not been able to show the need for avoiding the process-wide mmap lock in real-world use-cases. As such, the technique of using an anonymous file and ftruncate() to extend it seems unnecessary; instead, memfd can always use anonymous zeroed memory for heap backing where the CoW image is not present, and mprotect() to extend the heap limit by changing page protections.	2022-01-31 12:53:22 -08:00
Chris Fallin	b73ac83c37	Add a pooling allocator mode based on copy-on-write mappings of memfds. As first suggested by Jan on the Zulip here [1], a cheap and effective way to obtain copy-on-write semantics of a "backing image" for a Wasm memory is to mmap a file with `MAP_PRIVATE`. The `memfd` mechanism provided by the Linux kernel allows us to create anonymous, in-memory-only files that we can use for this mapping, so we can construct the image contents on-the-fly then effectively create a CoW overlay. Furthermore, and importantly, `madvise(MADV_DONTNEED, ...)` will discard the CoW overlay, returning the mapping to its original state. By itself this is almost enough for a very fast instantiation-termination loop of the same image over and over, without changing the address space mapping at all (which is expensive). The only missing bit is how to implement heap growth. But here memfds can help us again: if we create another anonymous file and map it where the extended parts of the heap would go, we can take advantage of the fact that a `mmap()` mapping can be larger than the file itself, with accesses beyond the end generating a `SIGBUS`, and the fact that we can cheaply resize the file with `ftruncate`, even after a mapping exists. So we can map the "heap extension" file once with the maximum memory-slot size and grow the memfd itself as `memory.grow` operations occur. The above CoW technique and heap-growth technique together allow us a fastpath of `madvise()` and `ftruncate()` only when we re-instantiate the same module over and over, as long as we can reuse the same slot. This fastpath avoids all whole-process address-space locks in the Linux kernel, which should mean it is highly scalable. It also avoids the cost of copying data on read, as the `uffd` heap backend does when servicing pagefaults; the kernel's own optimized CoW logic (same as used by all file mmaps) is used instead. 
[1] https://bytecodealliance.zulipchat.com/#narrow/stream/206238-general/topic/Copy.20on.20write.20based.20instance.20reuse/near/266657772	2022-01-31 12:53:18 -08:00
Alex Crichton	a25f7bdba5	Don't copy `VMBuiltinFunctionsArray` into each `VMContext` (#3741 ) * Don't copy `VMBuiltinFunctionsArray` into each `VMContext` This is another PR along the lines of "let's squeeze all possible performance we can out of instantiation". Before this PR we would copy, by value, the contents of `VMBuiltinFunctionsArray` into each `VMContext` allocated. This array of function pointers is modestly-sized but growing over time as we add various intrinsics. Additionally it's the exact same for all `VMContext` allocations. This PR attempts to speed up instantiation slightly by instead storing an indirection to the function array. This means that calling a builtin intrinsic is a tad bit slower since it requires two loads instead of one (one to get the base pointer, another to get the actual address). Otherwise though `VMContext` initialization is now simply setting one pointer instead of doing a `memcpy` from one location to another. With some macro-magic this commit also replaces the previous implementation with one that's more `const`-friendly which also gets us compile-time type-checks of libcalls as well as compile-time verification that all libcalls are defined. Overall, as with #3739, the win is very modest here. Locally I measured a speedup from 1.9us to 1.7us taken to instantiate an empty module with one function. While small at these scales it's still a 10% improvement! * Review comments	2022-01-28 16:24:34 -06:00
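
The bounds check described in the message of commit 04d2caea7b above can be sketched roughly as follows. This is only an illustration of the arithmetic, not the code from that commit: the function name `segment_in_bounds`, its parameters, and the constant name `WASM_PAGE_SIZE` are hypothetical. It assumes the 64 KiB wasm page size, so a memory64 memory with a minimum of `1 << 48` pages has a byte size of `1 << 64`, which does not fit in a `u64`.

```rust
/// Size of a WebAssembly page in bytes (64 KiB). The constant name is
/// an assumption made for this sketch.
const WASM_PAGE_SIZE: u64 = 1 << 16;

/// Hypothetical helper: returns `true` if the initializer covering
/// `[offset, offset + len)` is statically in bounds for a memory whose
/// minimum size is `min_pages` wasm pages.
fn segment_in_bounds(min_pages: u64, offset: u64, len: u64) -> bool {
    // If the end index itself cannot be represented as a `u64`,
    // treat the segment as out of bounds.
    let end = match offset.checked_add(len) {
        Some(end) => end,
        None => return false,
    };
    match min_pages.checked_mul(WASM_PAGE_SIZE) {
        // Normal case: compare the end index against the memory's byte size.
        Some(size_in_bytes) => end <= size_in_bytes,
        // The byte size overflowed `u64` (for example `1 << 48` pages):
        // any `end` representable as a `u64` is in bounds, so the
        // comparison is skipped.
        None => true,
    }
}
```

Under these assumptions, `segment_in_bounds(1, 0, 65536)` is accepted, `segment_in_bounds(1, 1, 65536)` is rejected, and `segment_in_bounds(1 << 48, u64::MAX - 1, 1)` is accepted without the multiplication ever wrapping.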


9738 Commits