wasmtime

Author	SHA1	Message	Date
Alex Crichton	b438617e12	Further minor optimizations to instantiation (#3791 ) * Shrink the size of `FuncData` Before this commit on a 64-bit system the `FuncData` type had a size of 88 bytes and after this commit it has a size of 32 bytes. A `FuncData` is required for all host functions in a store, including those inserted from a `Linker` into a store used during linking. This means that instantiation ends up creating a nontrivial number of these types and pushing them into the store. Looking at some profiles there were some surprisingly expensive movements of `FuncData` from the stack to a vector for moves-by-value generated by Rust. Shrinking this type enables more efficient code to be generated and additionally means less storage is needed in a store's function array. For instantiating the spidermonkey and rustpython modules this improves instantiation by 10% since they each import a fair number of host functions and the speedup here is relative to the number of items imported. * Use `ptr::copy_nonoverlapping` during initialization Previously `ptr::copy` was used for copying imports into place which translates to `memmove`, but `ptr::copy_nonoverlapping` can be used here since it's statically known these areas don't overlap. While this doesn't end up having a performance difference it's something I kept noticing while looking at the disassembly of `initialize_vmcontext` so I figured I'd go ahead and implement. * Indirect shared signature ids in the VMContext This commit is a small improvement for the instantiation time of modules by avoiding copying a list of `VMSharedSignatureIndex` entries into each `VMContext`, instead building one inside of a module and sharing that amongst all instances. This involves fewer lookups at instantiation time and less movement of data during instantiation.
The downside is that type-checks on `call_indirect` now involve an additional load, but I'm assuming that these are somewhat pessimized enough as-is that the runtime impact won't be much there. For instantiation performance this is a 5-10% win with rustpython/spidermonkey instantiation. This should also reduce the size of each `VMContext` for an instantiation since signatures are no longer stored inline but shared amongst all instances with one module. Note that one subtle change here is that the array of `VMSharedSignatureIndex` was previously indexed by `TypeIndex`, and now it's indexed by `SignatureIndex` which is a deduplicated form of `TypeIndex`. This is done because we already had a list of those lying around in `Module`, so it was easier to reuse that than to build a separate array and store it somewhere. * Reserve space in `Store<T>` with `InstancePre` This commit updates the instantiation process to reserve space in a `Store<T>` for the functions that an `InstancePre<T>`, as part of instantiation, will insert into it. Using an `InstancePre<T>` to instantiate allows pre-computing the number of host functions that will be inserted into a store, and by pre-reserving space we can avoid costly reallocations during instantiation by ensuring the function vector has enough space to fit everything during the instantiation process. Overall this makes instantiation of rustpython/spidermonkey about 8% faster locally. * Fix tests * Use checked arithmetic	2022-02-11 09:55:08 -06:00
Alex Crichton	c0c368d151	Use mmap'd `.cwasm` as a source for memory initialization images (#3787 ) Skip memfd creation with precompiled modules This commit updates the memfd support internally to not actually use a memfd if a compiled module originally came from disk via the `wasmtime::Module::deserialize_file` API. In this situation we already have a file descriptor open and there's no need to copy a module's heap image to a new file descriptor. To facilitate a new source of `mmap` the currently-memfd-specific-logic of creating a heap image is generalized to a new form of `MemoryInitialization` which is attempted for all modules at module-compile-time. This means that the serialized artifact to disk will have the memory image in its entirety waiting for us. Furthermore the memory image is ensured to be padded and aligned carefully to the target system's page size, notably meaning that the data section in the final object file is page-aligned and the size of the data section is also page aligned. This means that when a precompiled module is mapped from disk we can reuse the underlying `File` to mmap all initial memory images. This means that the offset-within-the-memory-mapped-file can differ for memfd-vs-not, but that's just another piece of state to track in the memfd implementation. In the limit this waters down the term "memfd" for this technique of quickly initializing memory because we no longer use memfd unconditionally (only when the backing file isn't available). This does however open up an avenue in the future to porting this support to other OSes because while `memfd_create` is Linux-specific both macOS and Windows support mapping a file with copy-on-write. This porting isn't done in this PR and is left for a future refactoring. Closes #3758 * Enable "memfd" support on all unix systems Cordon off the Linux-specific bits and enable the memfd support to compile and run on platforms like macOS which have a Linux-like `mmap`. 
This only works if a module is mapped from a precompiled module file on disk, but that's better than not supporting it at all! * Fix linux compile * Use `Arc<File>` instead of `MmapVecFileBacking` * Use a named struct instead of mysterious tuples * Comment about unsafety in `Module::deserialize_file` * Fix tests * Fix uffd compile * Always align data segments No need to have conditional alignment since their sizes are all aligned anyway * Update comment in build.rs * Use rustix, not `region` * Fix some confusing logic/names around memory indexes These functions all work with memory indexes, not specifically defined memory indexes.	2022-02-10 15:40:40 -06:00
Alex Crichton	1cb08d4e67	Minor instantiation benchmark updates (#3790 ) This commit has a few minor updates and some improvements to the instantiation benchmark harness: * A `once_cell::unsync::Lazy` type is now used to guard creation of modules/engines/etc. This makes running a single benchmark much faster since the benchmark no longer compiles all other benchmarks that are filtered out. Unfortunately I couldn't find a way in criterion to test whether a `BenchmarkId` is filtered out or not so we rely on the runtime laziness to initialize on the first run for benchmarks that do so. * All files located in `benches/instantiation` are now loaded for benchmarking instead of a hardcoded list. This makes it a bit easier to throw files into the directory and have them benchmarked instead of having to recompile when working with new files. * Finally a module deserialization benchmark was added to measure the time it takes to deserialize a precompiled module from disk (inspired by discussion on #3787). While I was at it I also upped some limits to be able to instantiate cfallin's `spidermonkey.wasm`.	2022-02-10 15:40:30 -06:00
Alex Crichton	520a7f26d7	Move function names out of `Module` (#3789 ) * Move function names out of `Module` This commit moves function names in a module out of the `wasmtime_environ::Module` type and into separate sections stored in the final compiled artifact. Spurred on by #3787 to look at module load times I noticed that a huge amount of time was spent in deserializing this map. The `spidermonkey.wasm` file, for example, has a 3MB name section which is a lot of unnecessary data to deserialize at module load time. The names of functions are now split out into their own dedicated section of the compiled artifact and metadata about them is stored in a more compact format at runtime by avoiding a `BTreeMap` and instead using a sorted array. Overall this improves deserialize times by up to 80% for modules with large name sections since the name section is no longer deserialized at load time and it's lazily paged in as names are actually referenced. * Fix a typo * Fix compiled module determinism Need to not only sort afterwards but also first to ensure the data of the name section is consistent.	2022-02-10 14:34:48 -06:00
Peter Huene	41eb225765	Add the instance allocation strategy to generated fuzzing configs. (#3780 ) * Add the instance allocation strategy to generated fuzzing configs. This commit adds support for generating configs with arbitrary instance allocation strategies. With this, the pooling allocator will be fuzzed as part of the existing fuzz targets. * Refine maximum constants for arbitrary module limits. * Add an `instantiate-many` fuzz target. This commit adds a new `instantiate-many` fuzz target that will attempt to instantiate and terminate modules in an arbitrary order. It generates up to 5 modules, from which a random sequence of instances will be created. The primary beneficiary of this fuzz target is the pooling instance allocator. * Allow no aliasing in generated modules when using the pooling allocator. This commit prevents aliases in the generated modules as they might count against the configured import limits of the pooling allocator. As the existing module linking proposal implementation will eventually be deprecated in favor of the component model proposal, it isn't very important that we test aliases in generated modules with the pooling allocator. * Improve distribution of memory config in fuzzing. The previous commit attempted to provide a 32-bit upper bound to 64-bit arbitrary values, which skewed the distribution heavily in favor of the upper bound. This commit removes the constraint and instead uses arbitrary 32-bit values that are converted to 64-bit values in the `Arbitrary` implementation.	2022-02-10 11:55:44 -08:00
Alex Crichton	027dea549a	Fuzz using precompiled modules on CI (#3788 ) In working on #3787 I see now that our coverage of loading precompiled files specifically is somewhat lacking, so this adds a config option to the fuzzers which, if enabled, will round-trip all compiled modules through the filesystem to test out the mmapped-file case.	2022-02-10 11:55:18 -06:00
Dan Gohman	f2bf254a79	Update to cap-std 0.24.1, fixing compilation on Rust nightly. (#3786 ) Other than doc updates, this just contains bytecodealliance/cap-std#235, a fix for compilation errors on Rust nightly that look like this: ``` error[E0308]: mismatched types --> cap-primitives/src/fs/via_parent/rename.rs:22:58 \| 22 \| let (old_dir, old_basename) = open_parent(old_start, &old_path)?; \| ^^^^^^^^^ expected struct `Path`, found opaque type \| ::: cap-primitives/src/rustix/fs/dir_utils.rs:67:48 \| 67 \| pub(crate) fn strip_dir_suffix(path: &Path) -> impl Deref<Target = Path> + '_ { \| ------------------------------ the found opaque type \| = note: expected struct `Path` found opaque type `impl Deref<Target = Path>` ```	2022-02-09 16:22:05 -08:00
Chris Fallin	39a52ceb4f	Implement lazy funcref table and anyfunc initialization. (#3733 ) During instance initialization, we build two sorts of arrays eagerly: - We create an "anyfunc" (a `VMCallerCheckedAnyfunc`) for every function in an instance. - We initialize every element of a funcref table with an initializer to a pointer to one of these anyfuncs. Most instances will not touch (via call_indirect or table.get) all funcref table elements. And most anyfuncs will never be referenced, because most functions are never placed in tables or used with `ref.func`. Thus, both of these initialization tasks are quite wasteful. Profiling shows that a significant fraction of the remaining instance-initialization time after our other recent optimizations is going into these two tasks. This PR implements two basic ideas: - The anyfunc array can be lazily initialized as long as we retain the information needed to do so. For now, in this PR, we just recreate the anyfunc whenever a pointer is taken to it, because doing so is fast enough; in the future we could keep some state to know whether the anyfunc has been written yet and skip this work if redundant. This technique allows us to leave the anyfunc array as uninitialized memory, which can be a significant savings. Filling it with initialized anyfuncs is very expensive, but even zeroing it is expensive: e.g. in a large module, it can be >500KB. - A funcref table can be lazily initialized as long as we retain a link to its corresponding instance and function index for each element. A zero in a table element means "uninitialized", and a slowpath does the initialization. Funcref tables are a little tricky because funcrefs can be null. We need to distinguish "element was initially non-null, but user stored explicit null later" from "element never touched" (ie the lazy init should not blow away an explicitly stored null). 
We solve this by stealing the LSB from every funcref (anyfunc pointer): when the LSB is set, the funcref is initialized and we don't hit the lazy-init slowpath. We insert the bit on storing to the table and mask it off after loading. We do have to set up a precomputed array of `FuncIndex`s for the table in order for this to work. We do this as part of the module compilation. This PR also refactors the way that the runtime crate gains access to information computed during module compilation. Performance effect measured with in-tree benches/instantiation.rs, using SpiderMonkey built for WASI, and with memfd enabled: ``` BEFORE: sequential/default/spidermonkey.wasm time: [68.569 us 68.696 us 68.856 us] sequential/pooling/spidermonkey.wasm time: [69.406 us 69.435 us 69.465 us] parallel/default/spidermonkey.wasm: with 1 background thread time: [69.444 us 69.470 us 69.497 us] parallel/default/spidermonkey.wasm: with 16 background threads time: [183.72 us 184.31 us 184.89 us] parallel/pooling/spidermonkey.wasm: with 1 background thread time: [69.018 us 69.070 us 69.136 us] parallel/pooling/spidermonkey.wasm: with 16 background threads time: [326.81 us 337.32 us 347.01 us] WITH THIS PR: sequential/default/spidermonkey.wasm time: [6.7821 us 6.8096 us 6.8397 us] change: [-90.245% -90.193% -90.142%] (p = 0.00 < 0.05) Performance has improved. sequential/pooling/spidermonkey.wasm time: [3.0410 us 3.0558 us 3.0724 us] change: [-95.566% -95.552% -95.537%] (p = 0.00 < 0.05) Performance has improved. parallel/default/spidermonkey.wasm: with 1 background thread time: [7.2643 us 7.2689 us 7.2735 us] change: [-89.541% -89.533% -89.525%] (p = 0.00 < 0.05) Performance has improved. parallel/default/spidermonkey.wasm: with 16 background threads time: [147.36 us 148.99 us 150.74 us] change: [-18.997% -18.081% -17.285%] (p = 0.00 < 0.05) Performance has improved. 
parallel/pooling/spidermonkey.wasm: with 1 background thread time: [3.1009 us 3.1021 us 3.1033 us] change: [-95.517% -95.511% -95.506%] (p = 0.00 < 0.05) Performance has improved. parallel/pooling/spidermonkey.wasm: with 16 background threads time: [49.449 us 50.475 us 51.540 us] change: [-85.423% -84.964% -84.465%] (p = 0.00 < 0.05) Performance has improved. ``` So an improvement of something like 80-95% for a very large module (7420 functions in its one funcref table, 31928 functions total).	2022-02-09 13:56:53 -08:00
Peter Huene	1b27508a42	Fix incorrect use of `MemoryIndex` in the pooling allocator. (#3782 ) This commit corrects a few places where `MemoryIndex` was used and treated like a `DefinedMemoryIndex` in the pooling instance allocator. When the unstable `multi-memory` proposal is enabled, it is possible to cause a newly allocated instance to use an incorrect base address for any defined memories by having the module being instantiated also import a memory. This requires enabling the unstable `multi-memory` proposal, configuring the use of the pooling instance allocator (not the default), and then configuring the module limits to allow imported memories (also not the default). The fix is to replace all uses of `MemoryIndex` with `DefinedMemoryIndex` in the pooling instance allocator. Several `debug_assert!` have also been updated to `assert!` to sanity check the state of the pooling allocator even in release builds.	2022-02-09 09:39:29 -06:00
Ulrich Weigand	10198553c7	ISLE: Common accessors for some insn data fields (#3781 ) Add accessors to prelude.isle to access data fields of `func_addr` and `symbol_value` instructions. These are based on similar versions I had added to the s390x back-end, but are a bit more straightforward to use. - func_ref_data: Extract SigRef, ExternalName, and RelocDistance fields given a FuncRef. - symbol_value_data: Extract ExternalName, RelocDistance, and offset fields given a GlobalValue representing a Symbol. - reloc_distance_near: Test for RelocDistance::Near. The s390x back-end is changed to use these common versions. Note that this exposed a bug in common isle code: This extractor: (extractor (load_sym inst) (and inst (load _ (def_inst (symbol_value (symbol_value_data _ (reloc_distance_near) offset))) (i64_from_offset (memarg_symbol_offset_sum <offset _))))) would raise an assertion in sema.rs due to a supposed cycle in extractor definitions. But there was no actual cycle, it was simply that the extractor tree refers twice to the `insn_data` extractor (once via the `load` and once via the `symbol_value` extractor). Fixed by checking for pre-existing definitions only along one path in the tree, not across the whole tree.	2022-02-08 17:57:27 -08:00
Ulrich Weigand	9c5c872b3b	s390x: Add support for all remaining atomic operations (#3746 ) This adds support for all atomic operations that were unimplemented so far in the s390x back end: - atomic_rmw operations xchg, nand, smin, smax, umin, umax - $I8 and $I16 versions of atomic_rmw and atomic_cas - little endian versions of atomic_rmw and atomic_cas All of these have to be implemented by a compare-and-swap loop; and for the $I8 and $I16 versions the actual atomic instruction needs to operate on the surrounding aligned 32-bit word. Since we cannot emit new control flow during ISLE instruction selection, these compare-and-swap loops are emitted as a single meta-instruction to be expanded at emit time. However, since there is a large number of different versions of the loop required to implement all the above operations, I've implemented a facility to allow specifying the loop bodies from within ISLE after all, by creating a vector of MInst structures that will be emitted as part of the meta-instruction. There are still restrictions, in particular instructions that are part of the loop body may not modify any virtual register. But even so, this approach looks preferable to doing everything in emit.rs. A few instructions needed in those compare-and-swap loop bodies were added as well, in particular the RxSBG family of instructions as well as the LOAD REVERSED in-register byte-swap instructions. This patch also adds filetest runtests to verify the semantics of all operations, in particular the subword and little-endian variants (those are currently only executed on s390x).	2022-02-08 13:48:44 -08:00
Alex Crichton	5cd97c054d	Update memfd support with a runtime toggle (#3778 ) This commit updates the `memfd` support in Wasmtime to have a runtime toggle as to whether it's used or not. The compile-time feature gating `memfd` support is now also re-enabled by default, but the new runtime switch is still disabled-by-default. Additionally this commit updates our fuzz oracle to turn on/off the memfd flag to re-enable fuzzing with memfd on oss-fuzz.	2022-02-08 15:47:13 -06:00
Chris Fallin	4f01711d42	Pooling allocator: Default for allocation policy should use memfd feature, not memfd-allocator. (#3777 ) Thanks to @peterhuene for noticing this!	2022-02-08 10:29:45 -08:00
Sam Parker	fbeb806b50	Meeting notes 02/07/22 (#3776 )	2022-02-08 06:38:43 -08:00
wasmtime-publish	39b88e4e9e	Release Wasmtime 0.34.0 (#3768 ) * Bump Wasmtime to 0.34.0 [automatically-tag-and-release-this-commit] * Add release notes for 0.34.0 * Update release date to today Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com> Co-authored-by: Alex Crichton <alex@alexcrichton.com>	2022-02-07 19:16:26 -06:00
Chris Fallin	ddd39cdb84	Patch qemu in CI to fix madvise semantics. (#3770 ) We currently skip some tests when running our qemu-based tests for aarch64 and s390x. Qemu has broken madvise(MADV_DONTNEED) semantics -- specifically, it just ignores madvise() [1]. We could continue to whack-a-mole the tests whenever we create new functionality that relies on madvise() semantics, but ideally we'd just have emulation that properly emulates! The earlier discussions on the qemu mailing list [2] had a proposed patch for this, but (i) this patch doesn't seem to apply cleanly anymore (it's 3.5 years old) and (ii) it's pretty complex due to the need to handle qemu's ability to emulate differing page sizes on host and guest. It turns out that we only really need this for CI when host and guest have the same page size (4KiB), so we could just pass the madvise()s through. I wouldn't expect such a patch to ever land upstream in qemu, but it satisfies our needs I think. So this PR modifies our CI setup to patch qemu before building it locally with a little one-off patch. [1] https://github.com/bytecodealliance/wasmtime/pull/2518#issuecomment-747280133 [2] https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg05416.html	2022-02-07 15:56:54 -08:00
Alex Crichton	43b37944ff	Tweak parallelism and the instantiation benchmark (#3775 ) Currently the "sequential" and "parallel" benchmarks report somewhat different timings. For sequential it's time-to-instantiate, but for parallel it's time-to-instantiate-10k instances. The parallelism in the parallel benchmark can also theoretically be affected by rayon's work-stealing. For example if rayon doesn't actually do any work stealing at all then this ends up being a sequential test again. Otherwise though it's possible for some threads to finish much earlier as rayon isn't guaranteed to keep threads busy. This commit applies a few updates to the benchmark: * First an `InstancePre<T>` is now used instead of a `Linker<T>` to front-load type-checking and avoid that on each instantiation (and this is generally the fastest path to instantiate right now). * Next the instantiation benchmark is changed to measure one instantiation-per-iteration to measure per-instance instantiation to better compare with sequential numbers. * Finally rayon is removed in favor of manually creating background threads that infinitely do work until we tell them to stop. These background threads are guaranteed to be working for the entire time the benchmark is executing and should theoretically exhibit the situation where there are N units of work all happening at once. I also applied some minor updates here such as having the parallel instantiation defined conditionally for multiple modules as well as upping the limits of the pooling allocator to handle a large module (rustpython.wasm) that I threw at it.	2022-02-07 15:55:38 -08:00
Harald Hoyer	fa889b4fd2	wasmtime: add CLI options for pre-opened TCP listen sockets (#3729 ) This patch implements CLI options to insert pre-opened sockets. `--listenfd` : Inherit environment variables and file descriptors following the systemd listen fd specification (UNIX only). `--tcplisten <SOCKET ADDRESS>`: Grant access to the given TCP listen socket. Signed-off-by: Harald Hoyer <harald@profian.com>	2022-02-07 14:26:38 -08:00
Chris Fallin	88b53b12aa	Turn off memfd by default, at least for this upcoming release. (#3774 ) Since memfd support just landed, and has had only ~0.5 weeks to bake with fuzzing, we want to make release 0.34.0 of Wasmtime without it enabled by default. This PR disables memfd by default; it can be enabled by specifying the `memfd` feature for the `wasmtime` crate, or when building the commandline binary. We plan to explicitly add memfd-enabled fuzzing targets, let that go for a while, then probably re-enable memfd in the subsequent release if no issues come up.	2022-02-07 15:44:53 -06:00
Nick Fitzgerald	ff622667f7	Merge pull request #3773 from fitzgen/x64-traps-safepoints ISLE: emit traps as safepoints on x64	2022-02-07 10:57:27 -08:00
Nick Fitzgerald	bb7ae46ecd	ISLE: emit traps as safepoints on x64	2022-02-07 10:01:23 -08:00
Nick Fitzgerald	31e2d6b21c	Merge pull request #3769 from cfallin/fix-debuginfo-cold-blocks Cranelift: fix debuginfo wrt cold blocks and non-monotonic layout.	2022-02-07 08:58:50 -08:00
Jonas Kruckenberg	79af8cd9ce	chore: update zstd (#3771 )	2022-02-07 09:38:12 -06:00
Chris Fallin	2cf3069b6b	Extend cold-blocks test to test debuginfo as well.	2022-02-04 23:15:16 -08:00
Chris Fallin	d9d6469422	Cranelift: fix debuginfo wrt cold blocks and non-monotonic layout. The debuginfo analyses are written with the assumption that the order of instructions in the VCode is the order of instructions in the final machine code. This was previously a strong invariant, until we introduced support for cold blocks. Cold blocks are implemented by reordering during emission, because the VCode ordering has other requirements related to lowering (respecting def-use dependencies in the reverse pass), so it is much simpler to reorder instructions at the last moment. Unfortunately, this causes the breakage we now see. This commit fixes the issue by skipping all cold instructions when emitting value-label ranges (which are translated into debuginfo). This means that variables defined in cold blocks will not have DWARF metadata. But cold blocks are usually compiler-inserted slowpaths, not user code, so this is probably OK. Debuginfo is always best-effort, so in any case this does not violate any correctness constraints.	2022-02-04 23:15:04 -08:00
Chris Fallin	04269355ca	Merge pull request #3767 from avanhatt/patch-1 Add item to Cranelift 02-07 meeting agenda	2022-02-04 12:02:52 -08:00
Alexa VanHattum	f016a1d266	Add item to 02-07 meeting agenda	2022-02-04 14:58:39 -05:00
Alex Crichton	04d2caea7b	Consolidate methods of memory initialization (#3766 ) * Consolidate methods of memory initialization This commit consolidates the few locations that we have which are performing memory initialization. Namely the uffd logic for creating paged memory as well as the memfd logic for creating a memory image now share an implementation to avoid duplicating bounds-checks or other validation conditions. The main purpose of this commit is to fix a fuzz-bug where a multiplication overflowed. The overflow itself was benign but it seemed better to fix the overflow in only one place instead of multiple. The overflow in question is specifically when an initializer is checked to be statically out-of-bounds and multiplies a memory's minimum size by the wasm page size, returning the result as a `u64`. For memory64-memories of size `1 << 48` this multiplication will overflow. This was actually a preexisting bug with the `try_paged_init` function which was copied for memfd, but cropped up here since memfd is used more often than paged initialization. The fix here is to skip validation of the `end` index if the size of memory is `1 << 64` since if the `end` index can be represented as a `u64` then it's in-bounds. This is somewhat of an esoteric case, though, since a memory of minimum size `1 << 64` can't ever exist (we can't even ask the os for that much memory, and even if we could it would fail). * Fix memfd test * Fix some tests * Remove InitMemory enum * Add an `is_segmented` helper method * More clear variable name * Make arguments to `init_memory` more descriptive	2022-02-04 13:17:25 -06:00
Nick Fitzgerald	a519e5ab64	Merge pull request #3752 from fitzgen/newtypes-for-register-classes cranelift: Add newtype wrappers for x64 register classes	2022-02-03 14:56:57 -08:00
Nick Fitzgerald	2c77cf866a	ISLE: Rename `{gpr,xmm}_mem_new` constructors to `reg_mem_to_{gpr,xmm}_mem`	2022-02-03 14:08:08 -08:00
Nick Fitzgerald	795b0aaf9a	cranelift: Add newtype wrappers for x64 register classes The primary motivation of this large commit (apologies for its size!) is to introduce `Gpr` and `Xmm` newtypes over `Reg`. This should help catch difficult-to-diagnose register class mixup bugs in x64 lowerings. But having a newtype for `Gpr` and `Xmm` themselves isn't enough to catch all of our operand-with-wrong-register-class bugs, because about 50% of operands on x64 aren't just a register, but a register or memory address or even an immediate! So we have `{Gpr,Xmm}Mem[Imm]` newtypes as well. Unfortunately, `GprMem` et al can't be `enum`s and are therefore a little bit noisier to work with from ISLE. They need to maintain the invariant that their registers really are of the claimed register class, so they need to encapsulate the inner data. If they exposed the underlying `enum` variants, then anyone could just change register classes or construct a `GprMem` that holds an XMM register, defeating the whole point of these newtypes. So when working with these newtypes from ISLE, we rely on external constructors like `(gpr_to_gpr_mem my_gpr)` instead of `(GprMem.Gpr my_gpr)`. A few extra lines of code are included to add support for register mapping for all of these newtypes as well. Ultimately this is all a bit wordier than I'd hoped it would be when I first started authoring this commit, but I think it is all worth it nonetheless! In the process of adding these newtypes, I didn't want to have to update both the ISLE `extern` type definition of `MInst` and the Rust definition, so I move the definition fully into ISLE, similar to aarch64. Finally, this process isn't complete. I've introduced the newtypes here, and I've made most XMM-using instructions switch from `Reg` to `Xmm`, as well as register class-converting instructions, but I haven't moved all of the GPR-using instructions over to the newtypes yet.
I figured this commit was big enough as it was, and I can continue the adoption of these newtypes in follow up commits. Part of #3685.	2022-02-03 14:08:08 -08:00
Nick Fitzgerald	e1f4e29efe	ISLE: Add a `nodebug` type attribute to disable `derive(Debug)` I need this to move the x64 `Inst` definition into ISLE without losing its custom `Debug` implementation that prints the assembly for the `Inst`.	2022-02-03 14:08:08 -08:00
Chris Fallin	b3b83efdbe	Merge pull request #3760 from cfallin/memfd-lazy-create Make memfd image creation lazy (on first instantiation).	2022-02-03 13:20:24 -08:00
Chris Fallin	2a24a0fbde	Make memfd image creation lazy (on first instantiation). As a followup to the recent memfd allocator work, this PR makes the memfd image creation occur on the first instantiation, rather than immediately when the `Module` is loaded. This shaves off a potentially surprising cost spike that would have otherwise occurred: prior to the memfd work, no allocator eagerly read the module's initial heap state into RAM. The behavior should now more closely resemble what happened before (and the improvements in overall instantiation time and performance, as compared to either eager init with pure-mmap memory or user-mode pagefault handling with uffd, remain).	2022-02-03 12:46:34 -08:00
Nick Fitzgerald	605c79fd05	Merge pull request #3756 from alexcrichton/update-wasm-tools Update wasm-tools crates	2022-02-03 11:19:55 -08:00
Chris Fallin	43de2dca1f	Merge pull request #3765 from cfallin/cranelift-isle-license Add LICENSE file to cranelift/isle/.	2022-02-03 10:39:19 -08:00
Alex Crichton	4ba3404ca3	Fix MemFd's allocated memory for dynamic memories (#3763 ) This fixes a bug in the memfd-related management of a linear memory where for dynamic memories memfd wasn't informed of the extra room that the dynamic memory could grow into, only the actual size of linear memory, which ended up tripping an assert once the memory was grown. The fix here is pretty simple which is to factor in this extra space when passing the allocation size to the creation of the `MemFdSlot`.	2022-02-03 11:56:16 -06:00
Chris Fallin	695c64f2b2	Add LICENSE file to cranelift/isle/. This is the same Apache 2.0 license file that is in all of our other crates; omitting it here was just an oversight. Fixes #3761.	2022-02-03 09:54:58 -08:00
Andrew Brown	31d4d2cbe0	meeting: add notes (#3764)	2022-02-03 09:46:17 -08:00
Chris Fallin	8fb7cbae9e	Merge pull request #3762 from cfallin/meeting-20220203 Add item to Wasmtime meeting	2022-02-03 08:54:55 -08:00
Chris Fallin	aacf563e38	Add item to Wasmtime meeting	2022-02-03 08:53:37 -08:00
Alex Crichton	b647561c44	memfd: Some minor follow-ups (#3759) * Tweak memfd-related features crates This commit changes the `memfd` feature for the `wasmtime-cli` crate from an always-on feature to a default-on feature which can be disabled at compile time. Additionally the `pooling-allocator` feature is also given similar treatment. Additionally some documentation was added for the `memfd` feature on the `wasmtime` crate. * Don't store `Arc<T>` in `InstanceAllocationRequest` Instead store `&Arc<T>` to avoid having the clone that lives in `InstanceAllocationRequest` not actually going anywhere. Otherwise all instance allocation requires an extra clone to create it for the request and an extra decrement when the request goes away. Internally clones are made as necessary when creating instances. * Enable the pooling allocator by default for `wasmtime-cli` While perhaps not the most useful option since the CLI doesn't have a great way to take advantage of this it probably makes sense to at least match the features of `wasmtime` itself. * Fix some lints and issues * More compile fixes	2022-02-03 09:17:04 -06:00
Alex Crichton	8ed79c8f57	memfd: Reduce some syscalls in the on-demand case (#3757) * memfd: Reduce some syscalls in the on-demand case This tweaks the internal organization of the `MemFdSlot` to avoid some syscalls in the default case as well as opportunistically in the pooling case. The cases added here are: * A `MemFdSlot` is now created with a specified initial size. For pooling this is 0 but for the on-demand case this can be non-zero. * When `instantiate` is called with no prior image and the sizes match (as will be the case for on-demand allocation) then `mprotect` is skipped entirely. * In the `clear_and_remain_ready` case the `mprotect` is skipped if the heap wasn't grown at all. This should avoid ever using `mprotect` unnecessarily and makes the ranges we `mprotect` a bit smaller as well. * Review comments * Tweak allow to apply to whole crate	2022-02-02 16:09:47 -06:00
Chris Fallin	5deb1f1fbf	Merge pull request #3738 from cfallin/pooling-affinity Pooling allocator: add a reuse-affinity policy.	2022-02-02 13:11:39 -08:00
Chris Fallin	99ed8cc9be	Merge pull request #3697 from cfallin/memfd-cow memfd/madvise-based CoW pooling allocator	2022-02-02 13:04:26 -08:00
Chris Fallin	1cbd393930	Review comments.	2022-02-02 12:25:30 -08:00
Chris Fallin	6011420557	Pooling allocator: add a reuse-affinity policy. This policy attempts to reuse the same instance slot for subsequent instantiations of the same module. This is particularly useful when using a pooling backend such as memfd that benefits from this reuse: for example, in the memfd case, instantiating the same module into the same slot allows us to avoid several calls to mmap() because the same mappings can be reused. The policy tracks a freelist per "compiled module ID", and when allocating a slot for an instance, tries these three options in order: 1. A slot from the freelist for this module (i.e., last used for another instantiation of this particular module), or 3. A slot that was last used by some other module or never before. The "victim" slot for choice 2 is randomly chosen. The data structures are carefully designed so that all updates are O(1), and there is no retry-loop in any of the random selection. This policy is now the default when the memfd backend is selected via the `memfd-allocator` feature flag.	2022-02-02 12:25:30 -08:00
Chris Fallin	9880eba2a8	Skip memfd tests when on qemu, due to differing madvise semantics.	2022-02-02 12:25:20 -08:00
Chris Fallin	d7b04f5ced	Review comments.	2022-02-02 11:41:31 -08:00
Chris Fallin	0ec45d3ae4	Add additional tests for MemFdSlot.	2022-02-02 11:33:05 -08:00
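
The reuse-affinity policy described in commit 6011420557 above (a free list per compiled module ID, a randomly chosen victim otherwise, O(1) updates, no retry loop) can be illustrated with a minimal sketch. This is not Wasmtime's implementation: the names `AffinityPool`, `SlotState`, `alloc`, and `free` are hypothetical, and a small xorshift generator stands in for whatever random source the real allocator uses.

```rust
use std::collections::HashMap;

/// Per-slot bookkeeping (hypothetical layout, for illustration only).
struct SlotState {
    /// Module that last used this slot, if any.
    last_module: Option<u64>,
    /// Index of this slot in the global free list (valid only while free).
    global_pos: usize,
    /// Index in its module's free list (valid only while free and
    /// `last_module` is `Some`).
    module_pos: usize,
}

struct AffinityPool {
    slots: Vec<SlotState>,
    /// Every free slot, regardless of which module last used it.
    free: Vec<usize>,
    /// Free slots grouped by the module that last used them.
    per_module: HashMap<u64, Vec<usize>>,
    /// State for a toy xorshift PRNG (assumption: any RNG would do).
    rng: u64,
}

impl AffinityPool {
    fn new(n: usize) -> Self {
        let slots = (0..n)
            .map(|i| SlotState { last_module: None, global_pos: i, module_pos: 0 })
            .collect();
        AffinityPool {
            slots,
            free: (0..n).collect(),
            per_module: HashMap::new(),
            rng: 0x9E37_79B9_7F4A_7C15,
        }
    }

    fn next_random(&mut self) -> u64 {
        self.rng ^= self.rng << 13;
        self.rng ^= self.rng >> 7;
        self.rng ^= self.rng << 17;
        self.rng
    }

    /// O(1) removal from the global free list via swap_remove.
    fn remove_global(&mut self, slot: usize) {
        let pos = self.slots[slot].global_pos;
        self.free.swap_remove(pos);
        if let Some(&moved) = self.free.get(pos) {
            self.slots[moved].global_pos = pos;
        }
    }

    /// O(1) removal from the free list of the module that last used the slot.
    fn remove_from_module(&mut self, slot: usize) {
        if let Some(module) = self.slots[slot].last_module {
            let pos = self.slots[slot].module_pos;
            let list = self.per_module.get_mut(&module).expect("list exists");
            list.swap_remove(pos);
            if let Some(&moved) = list.get(pos) {
                self.slots[moved].module_pos = pos;
            }
        }
    }

    /// Prefer a slot last used by `module`; otherwise pick one random victim
    /// from the global free list. No retry loop is needed in either case.
    fn alloc(&mut self, module: u64) -> Option<usize> {
        if self.free.is_empty() {
            return None;
        }
        let affine = self.per_module.get(&module).and_then(|l| l.last().copied());
        let slot = match affine {
            Some(s) => s,
            None => {
                let i = (self.next_random() % self.free.len() as u64) as usize;
                self.free[i]
            }
        };
        self.remove_global(slot);
        self.remove_from_module(slot);
        Some(slot)
    }

    /// Return a slot to both free lists, remembering which module used it.
    fn free(&mut self, slot: usize, module: u64) {
        self.slots[slot].global_pos = self.free.len();
        self.free.push(slot);
        let list = self.per_module.entry(module).or_default();
        self.slots[slot].module_pos = list.len();
        self.slots[slot].last_module = Some(module);
        list.push(slot);
    }
}
```

Each slot records its index in both lists, so a victim can be removed from another module's free list with `swap_remove` rather than a linear search; that index bookkeeping is what keeps every update constant-time in this sketch.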


9709 Commits