wasmtime

Author	SHA1	Message	Date
Alex Crichton	51d82aebfd	Store the `ValRaw` type in little-endian format (#4035 ) * Store the `ValRaw` type in little-endian format This commit changes the internal representation of the `ValRaw` type to an unconditionally little-endian format instead of its current native-endian format. The documentation and various accessors here have been updated as well as the associated trampolines that read `ValRaw` to always work with little-endian values, converting to the host endianness as necessary. The motivation for this change originally comes from the implementation of the component model that I'm working on. One aspect of the component model's canonical ABI is how variants are passed to functions as immediate arguments. For example for a component model function: ``` foo: function(x: expected<i32, f64>) ``` This translates to a core wasm function: ```wasm (module (func (export "foo") (param i32 i64) ;; ... ) ) ``` The first `i32` parameter to the core wasm function is the discriminant of whether the result is an "ok" or an "err". The second `i64`, however, is the "join" operation on the `i32` and `f64` payloads. Essentially these two types are unioned into one type to get passed into the function. Currently in the implementation of the component model my plan is to construct a `mut [ValRaw]` to pass through to WebAssembly, always invoking component exports through host trampolines. This means that the implementation for `Result<T, E>` needs to do the correct "join" operation here when encoding a particular case into the corresponding `ValRaw`. I personally found this particularly tricky to do structurally. The solution that I settled on with fitzgen was that if `ValRaw` was always stored in a little endian format then we could employ a trick where when encoding a variant we first set all the `ValRaw` slots to zero, then the associated case we have is encoding. Afterwards the `ValRaw` values are already encoded into the correct format as if they'd been "join"ed. For example if we were to encode `Ok(1i32)` then this would produce `ValRaw { i32: 1 }`, which memory-wise is equivalent to `ValRaw { i64: 1 }` if the other bytes in the `ValRaw` are guaranteed to be zero. Similarly storing `ValRaw { f64 }` is equivalent to the storage required for `ValRaw { i64 }` here in the join operation. Note, though, that this equivalence relies on everything being little-endian. Otherwise the in-memory representations of `ValRaw { i32: 1 }` and `ValRaw { i64: 1 }` are different. That motivation is what leads to this change. It's expected that this is a low-to-zero cost change in the sense that little-endian platforms will see no change and big-endian platforms are already required to efficiently byte-swap loads/stores as WebAssembly requires that. Additionally the `ValRaw` type is an esoteric niche use case primarily used for accelerating the C API right now, so it's expected that not many users will have to update for this change. Track down some more endianness conversions	2022-04-14 13:09:32 -05:00
Alex Crichton	d147802d51	Update wasm-tools crates (#3997 ) * Update wasm-tools crates This commit updates the wasm-tools family of crates as used in Wasmtime. Notably this brings in the update which removes module linking support as well as a number of internal refactorings around names and such within wasmparser itself. This updates all of the wasm translation support which binds to wasmparser as appropriate. Other crates all had API-compatible changes for at least what Wasmtime used so no further changes were necessary beyond updating version requirements. * Update a test expectation	2022-04-05 14:32:33 -05:00
wasmtime-publish	78a595ac88	Bump Wasmtime to 0.37.0 (#3994 ) Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>	2022-04-05 09:24:28 -05:00
Alex Crichton	c89dc55108	Add a two-week delay to Wasmtime's release process (#3955 ) * Bump to 0.36.0 * Add a two-week delay to Wasmtime's release process This commit is a proposal to update Wasmtime's release process with a two-week delay from branching a release until it's actually officially released. We've had two issues lately that came up which led to this proposal: * In #3915 it was realized that changes just before the 0.35.0 release weren't enough for an embedding use case, but the PR didn't meet the expectations for a full patch release. * At Fastly we were about to start rolling out a new version of Wasmtime when over the weekend the fuzz bug #3951 was found. This led to the desire internally to have a "must have been fuzzed for this long" period of time for Wasmtime changes which we felt were better reflected in the release process itself rather than something about Fastly's own integration with Wasmtime. This commit updates the automation for releases to unconditionally create a `release-X.Y.Z` branch on the 5th of every month. The actual release from this branch is then performed on the 20th of every month, roughly two weeks later. This should provide a period of time to ensure that all changes in a release are fuzzed for at least two weeks and avoid any further surprises. This should also help with any last-minute changes made just before a release if they need tweaking since backporting to a not-yet-released branch is much easier. Overall there are some new properties about Wasmtime with this proposal as well: * The `main` branch will always have a section in `RELEASES.md` which is listed as "Unreleased" for us to fill out. * The `main` branch will always be a version ahead of the latest release. For example it will be bump pre-emptively as part of the release process on the 5th where if `release-2.0.0` was created then the `main` branch will have 3.0.0 Wasmtime. * Dates for major versions are automatically updated in the `RELEASES.md` notes. The associated documentation for our release process is updated and the various scripts should all be updated now as well with this commit. * Add notes on a security patch * Clarify security fixes shouldn't be previewed early on CI	2022-04-01 13:11:10 -05:00
Alex Crichton	d1d10dc8da	Refactor the `TypeTables` type (#3971 ) * Remove duplicate `TypeTables` type This was once needed historically but it is no longer needed. * Make the internals of `TypeTables` private Instead of reaching internally for the `wasm_signatures` map an `Index` implementation now exists to indirect accesses through the type of the index being accessed. For the component model this table of types will grow a number of other tables and this'll assist in consuming sites not having to worry so much about which map they're reaching into.	2022-03-30 13:51:25 -05:00
Alex Crichton	3f9bff17c8	Support disabling backtraces at compile time (#3932 ) * Support disabling backtraces at compile time This commit adds support to Wasmtime to disable, at compile time, the gathering of backtraces on traps. The `wasmtime` crate now sports a `wasm-backtrace` feature which, when disabled, will mean that backtraces are never collected at compile time nor are unwinding tables inserted into compiled objects. The motivation for this commit stems from the fact that generating a backtrace is quite a slow operation. Currently backtrace generation is done with libunwind and `_Unwind_Backtrace` typically found in glibc or other system libraries. When thousands of modules are loaded into the same process though this means that the initial backtrace can take nearly half a second and all subsequent backtraces can take upwards of hundreds of milliseconds. Relative to all other operations in Wasmtime this is extremely expensive at this time. In the future we'd like to implement a more performant backtrace scheme but such an implementation would require coordination with Cranelift and is a big chunk of work that may take some time, so in the meantime if embedders don't need a backtrace they can still use this option to disable backtraces at compile time and avoid the performance pitfalls of collecting backtraces. In general I tried to originally make this a runtime configuration option but ended up opting for a compile-time option because `Trap::new` otherwise has no arguments and always captures a backtrace. By making this a compile-time option it was possible to configure, statically, the behavior of `Trap::new`. Additionally I also tried to minimize the amount of `#[cfg]` necessary by largely only having it at the producer and consumer sites. Also a noteworthy restriction of this implementation is that if backtrace support is disabled at compile time then reference types support will be unconditionally disabled at runtime. With backtrace support disabled there's no way to trace the stack of wasm frames which means that GC can't happen given our current implementation. * Always enable backtraces for the C API	2022-03-16 09:18:16 -05:00
Alex Crichton	c22033bf93	Delete historical interruptable support in Wasmtime (#3925 ) * Delete historical interruptable support in Wasmtime This commit removes the `Config::interruptable` configuration along with the `InterruptHandle` type from the `wasmtime` crate. The original support for adding interruption to WebAssembly was added pretty early on in the history of Wasmtime when there was no other method to prevent an infinite loop from the host. Nowadays, however, there are alternative methods for interruption such as fuel or epoch-based interruption. One of the major downsides of `Config::interruptable` is that even when it's not enabled it forces an atomic swap to happen when entering WebAssembly code. This technically could be a non-atomic swap if the configuration option isn't enabled but that produces even more branch-y code on entry into WebAssembly which is already something we try to optimize. Calling into WebAssembly is on the order of a dozens of nanoseconds at this time and an atomic swap, even uncontended, can add up to 5ns on some platforms. The main goal of this PR is to remove this atomic swap on entry into WebAssembly. This is done by removing the `Config::interruptable` field entirely, moving all existing consumers to epochs instead which are suitable for the same purposes. This means that the stack overflow check is no longer entangled with the interruption check and perhaps one day we could continue to optimize that further as well. Some consequences of this change are: * Epochs are now the only method of remote-thread interruption. * There are no more Wasmtime traps that produces the `Interrupted` trap code, although we may wish to move future traps to this so I left it in place. * The C API support for interrupt handles was also removed and bindings for epoch methods were added. * Function-entry checks for interruption are a tiny bit less efficient since one check is performed for the stack limit and a second is performed for the epoch as opposed to the `Config::interruptable` style of bundling the stack limit and the interrupt check in one. It's expected though that this is likely to not really be measurable. * The old `VMInterrupts` structure is renamed to `VMRuntimeLimits`.	2022-03-14 15:25:11 -05:00
Alex Crichton	884ca1f75b	Remove more dead relocation handling code (#3924 ) Forgotten from #3905 I now realized.	2022-03-14 12:29:45 -05:00
Alex Crichton	4d404c90b4	Ensure functions are aligned properly on AArch64 (#3908 ) Previously (as in an hour ago) #3905 landed a new ability for fuzzing to arbitrarily insert padding between functions. Running some fuzzers locally though this instantly hit a lot of problems on AArch64 because the arbitrary padding isn't aligned to 4 bytes like all other functions are. To fix this issue appending functions now correctly aligns the output as appropriate for the platform. The alignment argument for appending was switched to `None` where `None` means "use the platform default" and otherwise and explicit alignment can be specified for inserting other data (like arbitrary padding or Windows unwind tables).	2022-03-09 15:45:30 -06:00
Alex Crichton	f21aa98ccb	Fuzz-code-coverage motivated improvements (#3905 ) * fuzz: Fuzz padding between compiled functions This commit hooks up the custom `wasmtime_linkopt_padding_between_functions` configuration option to the cranelift compiler into the fuzz configuration, enabling us to ensure that randomly inserting a moderate amount of padding between functions shouldn't tamper with any results. * fuzz: Fuzz the `Config::generate_address_map` option This commit adds fuzz configuration where `generate_address_map` is either enabled or disabled, unlike how it's always enabled for fuzzing today. * Remove unnecessary handling of relocations This commit removes a number of bits and pieces all related to handling relocations in JIT code generated by Wasmtime. None of this is necessary nowadays that the "old backend" has been removed (quite some time ago) and relocations are no longer expected to be in the JIT code at all. Additionally with the minimum x86_64 features required to run wasm code it should be expected that no libcalls are required either for Wasmtime-based JIT code.	2022-03-09 10:58:27 -08:00
wasmtime-publish	9137b4a50e	Bump Wasmtime to 0.35.0 (#3885 ) [automatically-tag-and-release-this-commit] Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>	2022-03-07 15:18:34 -06:00
Alex Crichton	2a6969d2bd	Shrink the size of the anyfunc table in `VMContext` (#3850 ) * Shrink the size of the anyfunc table in `VMContext` This commit shrinks the size of the `VMCallerCheckedAnyfunc` table allocated into a `VMContext` to be the size of the number of "escaped" functions in a module rather than the number of functions in a module. Escaped functions include exports, table elements, etc, and are typically an order of magnitude smaller than the number of functions in general. This should greatly shrink the `VMContext` for some modules which while we aren't necessarily having any problems with that today shouldn't cause any problems in the future. The original motivation for this was that this came up during the recent lazy-table-initialization work and while it no longer has a direct performance benefit since tables aren't initialized at all on instantiation it should still improve long-running instances theoretically with smaller `VMContext` allocations as well as better locality between anyfuncs. * Fix some tests * Remove redundant hash set * Use a helper for pushing function type information * Use a more descriptive `is_escaping` method * Clarify a comment * Fix condition	2022-02-28 10:11:04 -06:00
Nick Fitzgerald	bad9a35418	`wasm-mutate` fuzz targets (#3836 ) * fuzzing: Add a custom mutator based on `wasm-mutate` * fuzz: Add a version of the `compile` fuzz target that uses `wasm-mutate` * Update `wasmparser` dependencies	2022-02-23 12:14:11 -08:00
bjorn3	bbd52772de	Make VMOffset calculation more readable (#3793 ) * Fix typo * Move vmoffset field size and field name together The previous code was quite confusing about what applied to which field. The new code also makes it easier to move fields around and insert and delete fields. * Move builtin_functions before all variable sized fields This allows the offset to be calculated at compile time * Add cadd and cmul convenience functions * Remove comment * Change fields! syntax as per review * Add implicit u32::from to fields!	2022-02-22 09:48:53 -06:00
Chris Fallin	1c014d129a	Cranelift: ensure ISA level needed for SIMD is present when SIMD is enabled. (#3816 ) Addresses #3809: when we are asked to create a Cranelift backend with shared flags that indicate support for SIMD, we should check that the ISA level needed for our SIMD lowerings is present.	2022-02-16 17:29:30 -08:00
Alex Crichton	b438617e12	Further minor optimizations to instantiation (#3791 ) * Shrink the size of `FuncData` Before this commit on a 64-bit system the `FuncData` type had a size of 88 bytes and after this commit it has a size of 32 bytes. A `FuncData` is required for all host functions in a store, including those inserted from a `Linker` into a store used during linking. This means that instantiation ends up creating a nontrivial number of these types and pushing them into the store. Looking at some profiles there were some surprisingly expensive movements of `FuncData` from the stack to a vector for moves-by-value generated by Rust. Shrinking this type enables more efficient code to be generated and additionally means less storage is needed in a store's function array. For instantiating the spidermonkey and rustpython modules this improves instantiation by 10% since they each import a fair number of host functions and the speedup here is relative to the number of items imported. * Use `ptr::copy_nonoverlapping` during initialization Prevoiusly `ptr::copy` was used for copying imports into place which translates to `memmove`, but `ptr::copy_nonoverlapping` can be used here since it's statically known these areas don't overlap. While this doesn't end up having a performance difference it's something I kept noticing while looking at the disassembly of `initialize_vmcontext` so I figured I'd go ahead and implement. * Indirect shared signature ids in the VMContext This commit is a small improvement for the instantiation time of modules by avoiding copying a list of `VMSharedSignatureIndex` entries into each `VMContext`, instead building one inside of a module and sharing that amongst all instances. This involves less lookups at instantiation time and less movement of data during instantiation. The downside is that type-checks on `call_indirect` now involve an additionally load, but I'm assuming that these are somewhat pessimized enough as-is that the runtime impact won't be much there. For instantiation performance this is a 5-10% win with rustpyhon/spidermonky instantiation. This should also reduce the size of each `VMContext` for an instantiation since signatures are no longer stored inline but shared amongst all instances with one module. Note that one subtle change here is that the array of `VMSharedSignatureIndex` was previously indexed by `TypeIndex`, and now it's indexed by `SignaturedIndex` which is a deduplicated form of `TypeIndex`. This is done because we already had a list of those lying around in `Module`, so it was easier to reuse that than to build a separate array and store it somewhere. * Reserve space in `Store<T>` with `InstancePre` This commit updates the instantiation process to reserve space in a `Store<T>` for the functions that an `InstancePre<T>`, as part of instantiation, will insert into it. Using an `InstancePre<T>` to instantiate allows pre-computing the number of host functions that will be inserted into a store, and by pre-reserving space we can avoid costly reallocations during instantiation by ensuring the function vector has enough space to fit everything during the instantiation process. Overall this makes instantiation of rustpython/spidermonkey about 8% faster locally. * Fix tests * Use checked arithmetic	2022-02-11 09:55:08 -06:00
Alex Crichton	c0c368d151	Use mmap'd `.cwasm` as a source for memory initialization images (#3787 ) Skip memfd creation with precompiled modules This commit updates the memfd support internally to not actually use a memfd if a compiled module originally came from disk via the `wasmtime::Module::deserialize_file` API. In this situation we already have a file descriptor open and there's no need to copy a module's heap image to a new file descriptor. To facilitate a new source of `mmap` the currently-memfd-specific-logic of creating a heap image is generalized to a new form of `MemoryInitialization` which is attempted for all modules at module-compile-time. This means that the serialized artifact to disk will have the memory image in its entirety waiting for us. Furthermore the memory image is ensured to be padded and aligned carefully to the target system's page size, notably meaning that the data section in the final object file is page-aligned and the size of the data section is also page aligned. This means that when a precompiled module is mapped from disk we can reuse the underlying `File` to mmap all initial memory images. This means that the offset-within-the-memory-mapped-file can differ for memfd-vs-not, but that's just another piece of state to track in the memfd implementation. In the limit this waters down the term "memfd" for this technique of quickly initializing memory because we no longer use memfd unconditionally (only when the backing file isn't available). This does however open up an avenue in the future to porting this support to other OSes because while `memfd_create` is Linux-specific both macOS and Windows support mapping a file with copy-on-write. This porting isn't done in this PR and is left for a future refactoring. Closes #3758 * Enable "memfd" support on all unix systems Cordon off the Linux-specific bits and enable the memfd support to compile and run on platforms like macOS which have a Linux-like `mmap`. This only works if a module is mapped from a precompiled module file on disk, but that's better than not supporting it at all! * Fix linux compile * Use `Arc<File>` instead of `MmapVecFileBacking` * Use a named struct instead of mysterious tuples * Comment about unsafety in `Module::deserialize_file` * Fix tests * Fix uffd compile * Always align data segments No need to have conditional alignment since their sizes are all aligned anyway * Update comment in build.rs * Use rustix, not `region` * Fix some confusing logic/names around memory indexes These functions all work with memory indexes, not specifically defined memory indexes.	2022-02-10 15:40:40 -06:00
Alex Crichton	520a7f26d7	Move function names out of `Module` (#3789 ) * Move function names out of `Module` This commit moves function names in a module out of the `wasmtime_environ::Module` type and into separate sections stored in the final compiled artifact. Spurred on by #3787 to look at module load times I noticed that a huge amount of time was spent in deserializing this map. The `spidermonkey.wasm` file, for example, has a 3MB name section which is a lot of unnecessary data to deserialize at module load time. The names of functions are now split out into their own dedicated section of the compiled artifact and metadata about them is stored in a more compact format at runtime by avoiding a `BTreeMap` and instead using a sorted array. Overall this improves deserialize times by up to 80% for modules with large name sections since the name section is no longer deserialized at load time and it's lazily paged in as names are actually referenced. * Fix a typo * Fix compiled module determinism Need to not only sort afterwards but also first to ensure the data of the name section is consistent.	2022-02-10 14:34:48 -06:00
Chris Fallin	39a52ceb4f	Implement lazy funcref table and anyfunc initialization. (#3733 ) During instance initialization, we build two sorts of arrays eagerly: - We create an "anyfunc" (a `VMCallerCheckedAnyfunc`) for every function in an instance. - We initialize every element of a funcref table with an initializer to a pointer to one of these anyfuncs. Most instances will not touch (via call_indirect or table.get) all funcref table elements. And most anyfuncs will never be referenced, because most functions are never placed in tables or used with `ref.func`. Thus, both of these initialization tasks are quite wasteful. Profiling shows that a significant fraction of the remaining instance-initialization time after our other recent optimizations is going into these two tasks. This PR implements two basic ideas: - The anyfunc array can be lazily initialized as long as we retain the information needed to do so. For now, in this PR, we just recreate the anyfunc whenever a pointer is taken to it, because doing so is fast enough; in the future we could keep some state to know whether the anyfunc has been written yet and skip this work if redundant. This technique allows us to leave the anyfunc array as uninitialized memory, which can be a significant savings. Filling it with initialized anyfuncs is very expensive, but even zeroing it is expensive: e.g. in a large module, it can be >500KB. - A funcref table can be lazily initialized as long as we retain a link to its corresponding instance and function index for each element. A zero in a table element means "uninitialized", and a slowpath does the initialization. Funcref tables are a little tricky because funcrefs can be null. We need to distinguish "element was initially non-null, but user stored explicit null later" from "element never touched" (ie the lazy init should not blow away an explicitly stored null). We solve this by stealing the LSB from every funcref (anyfunc pointer): when the LSB is set, the funcref is initialized and we don't hit the lazy-init slowpath. We insert the bit on storing to the table and mask it off after loading. We do have to set up a precomputed array of `FuncIndex`s for the table in order for this to work. We do this as part of the module compilation. This PR also refactors the way that the runtime crate gains access to information computed during module compilation. Performance effect measured with in-tree benches/instantiation.rs, using SpiderMonkey built for WASI, and with memfd enabled: ``` BEFORE: sequential/default/spidermonkey.wasm time: [68.569 us 68.696 us 68.856 us] sequential/pooling/spidermonkey.wasm time: [69.406 us 69.435 us 69.465 us] parallel/default/spidermonkey.wasm: with 1 background thread time: [69.444 us 69.470 us 69.497 us] parallel/default/spidermonkey.wasm: with 16 background threads time: [183.72 us 184.31 us 184.89 us] parallel/pooling/spidermonkey.wasm: with 1 background thread time: [69.018 us 69.070 us 69.136 us] parallel/pooling/spidermonkey.wasm: with 16 background threads time: [326.81 us 337.32 us 347.01 us] WITH THIS PR: sequential/default/spidermonkey.wasm time: [6.7821 us 6.8096 us 6.8397 us] change: [-90.245% -90.193% -90.142%] (p = 0.00 < 0.05) Performance has improved. sequential/pooling/spidermonkey.wasm time: [3.0410 us 3.0558 us 3.0724 us] change: [-95.566% -95.552% -95.537%] (p = 0.00 < 0.05) Performance has improved. parallel/default/spidermonkey.wasm: with 1 background thread time: [7.2643 us 7.2689 us 7.2735 us] change: [-89.541% -89.533% -89.525%] (p = 0.00 < 0.05) Performance has improved. parallel/default/spidermonkey.wasm: with 16 background threads time: [147.36 us 148.99 us 150.74 us] change: [-18.997% -18.081% -17.285%] (p = 0.00 < 0.05) Performance has improved. parallel/pooling/spidermonkey.wasm: with 1 background thread time: [3.1009 us 3.1021 us 3.1033 us] change: [-95.517% -95.511% -95.506%] (p = 0.00 < 0.05) Performance has improved. parallel/pooling/spidermonkey.wasm: with 16 background threads time: [49.449 us 50.475 us 51.540 us] change: [-85.423% -84.964% -84.465%] (p = 0.00 < 0.05) Performance has improved. ``` So an improvement of something like 80-95% for a very large module (7420 functions in its one funcref table, 31928 functions total).	2022-02-09 13:56:53 -08:00
wasmtime-publish	39b88e4e9e	Release Wasmtime 0.34.0 (#3768 ) * Bump Wasmtime to 0.34.0 [automatically-tag-and-release-this-commit] * Add release notes for 0.34.0 * Update release date to today Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com> Co-authored-by: Alex Crichton <alex@alexcrichton.com>	2022-02-07 19:16:26 -06:00
Alex Crichton	65486a0680	Update wasm-tools crates Nothing major here, just a routine update with a few extra things to handle here-and-there.	2022-02-02 09:50:08 -08:00
Alex Crichton	a25f7bdba5	Don't copy `VMBuiltinFunctionsArray` into each `VMContext` (#3741 ) * Don't copy `VMBuiltinFunctionsArray` into each `VMContext` This is another PR along the lines of "let's squeeze all possible performance we can out of instantiation". Before this PR we would copy, by value, the contents of `VMBuiltinFunctionsArray` into each `VMContext` allocated. This array of function pointers is modestly-sized but growing over time as we add various intrinsics. Additionally it's the exact same for all `VMContext` allocations. This PR attempts to speed up instantiation slightly by instead storing an indirection to the function array. This means that calling a builtin intrinsic is a tad bit slower since it requires two loads instead of one (one to get the base pointer, another to get the actual address). Otherwise though `VMContext` initialization is now simply setting one pointer instead of doing a `memcpy` from one location to another. With some macro-magic this commit also replaces the previous implementation with one that's more `const`-friendly which also gets us compile-time type-checks of libcalls as well as compile-time verification that all libcalls are defined. Overall, as with #3739, the win is very modest here. Locally I measured a speedup from 1.9us to 1.7us taken to instantiate an empty module with one function. While small at these scales it's still a 10% improvement! * Review comments	2022-01-28 16:24:34 -06:00
Chris Fallin	8a55b5c563	Add epoch-based interruption for cooperative async timeslicing. This PR introduces a new way of performing cooperative timeslicing that is intended to replace the "fuel" mechanism. The tradeoff is that this mechanism interrupts with less precision: not at deterministic points where fuel runs out, but rather when the Engine enters a new epoch. The generated code instrumentation is substantially faster, however, because it does not need to do as much work as when tracking fuel; it only loads the global "epoch counter" and does a compare-and-branch at backedges and function prologues. This change has been measured as ~twice as fast as fuel-based timeslicing for some workloads, especially control-flow-intensive workloads such as the SpiderMonkey JS interpreter on Wasm/WASI. The intended interface is that the embedder of the `Engine` performs an `engine.increment_epoch()` call periodically, e.g. once per millisecond. An async invocation of a Wasm guest on a `Store` can specify a number of epoch-ticks that are allowed before an async yield back to the executor's event loop. (The initial amount and automatic "refills" are configured on the `Store`, just as for fuel.) This call does only signal-safe work (it increments an `AtomicU64`) so could be invoked from a periodic signal, or from a thread that wakes up once per period.	2022-01-20 13:58:17 -08:00
bjorn3	17021bc77a	Extract helper functions	2022-01-12 17:19:34 +01:00
bjorn3	f0e821b9e0	Remove all Sink traits	2022-01-11 19:03:10 +01:00
bjorn3	b803514d55	Remove sink arguments from compile_and_emit The data can be accessed after the fact using context.mach_compile_result	2022-01-11 18:17:29 +01:00
bjorn3	58c25d9e24	Add text_section_builder method to TargetIsa	2022-01-06 14:39:50 +01:00
wasmtime-publish	8043c1f919	Release Wasmtime 0.33.0 (#3648 ) * Bump Wasmtime to 0.33.0 [automatically-tag-and-release-this-commit] * Update relnotes for 0.33.0 * Wordsmithing relnotes Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com> Co-authored-by: Alex Crichton <alex@alexcrichton.com>	2022-01-05 13:26:50 -06:00
bjorn3	e98a85e1e2	Make get_mach_backend non-optional	2022-01-04 15:48:19 +01:00
Alex Crichton	f1225dfd93	Add a compilation section to disable address maps (#3598 ) * Add a compilation section to disable address maps This commit adds a new `Config::generate_address_map` compilation setting which is used to disable emission of the `.wasmtime.addrmap` section of compiled artifacts. This section is currently around the size of the entire `.text` section itself unfortunately and for size reasons may wish to be omitted. Functionality-wise all that is lost is knowing the precise wasm module offset address of a faulting instruction or in a backtrace of instructions. This also means that if the module has DWARF debugging information available with it Wasmtime isn't able to produce a filename and line number in the backtrace. This option remains enabled by default. This option may not be needed in the future with #3547 perhaps, but in the meantime it seems reasonable enough to support a configuration mode where the section is entirely omitted if the smallest module possible is desired. * Fix some CI issues * Update tests/all/traps.rs Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com> * Do less work in compilation for address maps But only when disabled Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>	2021-12-13 13:48:05 -06:00
wasmtime-publish	c1c4c59670	Release Wasmtime 0.32.0 (#3589 ) * Bump Wasmtime to 0.32.0 [automatically-tag-and-release-this-commit] * Update release notes for 0.32.0 Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com> Co-authored-by: Alex Crichton <alex@alexcrichton.com>	2021-12-13 13:47:30 -06:00
Alex Crichton	0e90d4b903	Update addr2line and gimli deps (#3580 ) Just a routine update, figured it was good to stay close to their most recent versions	2021-12-01 15:48:36 -06:00
Adam Bratschi-Kaye	12bfbdfaca	Skip generating DWARF info for dead code (#3498 ) When encountering a subprogram that is dead code (as indicated by the dead code proposal https://dwarfstd.org/ShowIssue.php?issue=200609.1), don't generate debug output for the subprogram or any of its children.	2021-11-08 09:31:04 -06:00
wasmtime-publish	c1a6a0523d	Release Wasmtime 0.31.0 (#3489 ) * Bump Wasmtime to 0.31.0 [automatically-tag-and-release-this-commit] * Update 0.31.0 release notes Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com> Co-authored-by: Alex Crichton <alex@alexcrichton.com>	2021-10-29 09:09:35 -05:00
Alex Crichton	490d49a768	Adjust dependency directives between crates (#3420 ) * Adjust dependency directives between crates This commit is a preparation for the release process for Wasmtime. The specific changes here are to delineate which crates are "public", and all version requirements on non-public crates will now be done with `=A.B.C` version requirements instead of today's `A.B.C` version requirements. The purpose for doing this is to assist with patch releases that might happen in the future. Patch releases of wasmtime are already required to not break the APIs of "public" crates, but no such guarantee is given about "internal" crates. This means that a patch release runs the risk, for example, of breaking an internal API. In doing so though we would also need to release a new major version of the internal crate, but we wouldn't have a great hole in the number scheme of major versions to do so. By using `=A.B.C` requirements for internal crates it means we can safely ignore strict semver-compatibility between releases of internal crates for patch releases, since the only consumers of the crate will be the corresponding patch release of the `wasmtime` crate itself (or other public crates). The `publish.rs` script has been updated with a check to verify that dependencies on internal crates are all specified with an `=` dependency, and dependnecies on all public crates are without a `=` dependency. This will hopefully make it so we don't have to worry about what to use where, we just let CI tell us what to do. Using this modification all version dependency declarations have been updated. Note that some crates were adjusted to simply remove their `version` requirement in cases such as the crate wasn't published anyway (`publish = false` was specified) or it's in the `dev-dependencies` section which doesn't need version specifiers for path dependencies. * Switch to normal sever deps for cranelift dependencies These crates will now all be considered "public" where in patch releases they will be guaranteed to not have breaking changes.	2021-10-26 09:06:03 -05:00
Alex Crichton	e2a724ce18	Update the `object` crate to 0.27.0 (#3465 ) Mostly just keeping us up to date with changes there since we somewhat heavily rely on it now.	2021-10-20 10:52:06 -05:00
Alex Crichton	9c6884e28d	Update the spec reference testsuite submodule (#3450 ) * Update the spec reference testsuite submodule This commit brings in recent updates to the spec test suite. Most of the changes here were already fixed in `wasmparser` with some tweaks to esoteric modules, but Wasmtime also gets a bug fix where where import matching for the size of tables/memories is based on the current runtime size of the table/memory rather than the original type of the table/memory. This means that during type matching the actual value is consulted for its size rather than using the minimum size listed in its type. * Fix now-missing directories in build script	2021-10-13 16:14:12 -05:00
Alex Crichton	713ce07d35	Add some debug logging for timing in module compiles (#3417 ) * Add some debug logging for timing in module compiles This is sometimes helpful when debugging slow compiles from fuzz bugs or similar. * Fix total duration calculation to not double-count	2021-10-11 12:50:15 -05:00
bjorn3	d78f436daf	Remove reloc_constant It is no longer used by the new backends	2021-10-10 14:43:55 +02:00
bjorn3	2db3b5b9df	Remove code offsets from Function (#3412 ) * Remove code offsets from Function * Remove reloc_jt and fix wasmtime-cranelift	2021-10-07 15:54:00 +02:00
Benjamin Bouvier	43a86f14d5	Remove more old backend ISA concepts (#3402 ) This also paves the way for unifying TargetIsa and MachBackend, since now they map one to one. In theory the two traits could be merged, which would be nice to limit the number of total concepts. Also they have quite different responsibilities, so it might be fine to keep them separate. Interestingly, this PR started as removing RegInfo from the TargetIsa trait since the adapter returned a dummy value there. From the fallout, noticed that all Display implementations didn't needed an ISA anymore (since these were only used to render ISA specific registers). Also the whole family of RegInfo / ValueLoc / RegUnit was exclusively used for the old backend, and these could be removed. Notably, some IR instructions needed to be removed, because they were using RegUnit too: this was the oddball of regfill / regmove / regspill / copy_special, which were IR instructions inserted by the old regalloc. Fare thee well!	2021-10-04 10:36:12 +02:00
Benjamin Bouvier	bae4ec6427	Remove ancient register allocation (#3401 )	2021-09-30 21:27:23 +02:00
bjorn3	9e34df33b9	Remove the old x86 backend	2021-09-29 16:13:46 +02:00
Nick Fitzgerald	a1f4b46f64	Bump Wasmtime to version 0.30.0; cranelift to 0.77.0	2021-09-17 10:33:50 -07:00
Nick Fitzgerald	d2ce1ac753	Fix a use-after-free bug when passing `ExternRef`s to Wasm We _must not_ trigger a GC when moving refs from host code into Wasm (e.g. returned from a host function or passed as arguments to a Wasm function). After insertion into the table, this reference is no longer rooted. If multiple references are being sent from the host into Wasm and we allowed GCs during insertion, then the following events could happen: * Reference A is inserted into the activations table. This does not trigger a GC, but does fill the table to capacity. * The caller's reference to A is removed. Now the only reference to A is from the activations table. * Reference B is inserted into the activations table. Because the table is at capacity, a GC is triggered. * A is reclaimed because the only reference keeping it alive was the activation table's reference (it isn't inside any Wasm frames on the stack yet, so stack scanning and stack maps don't increment its reference count). * We transfer control to Wasm, giving it A and B. Wasm uses A. That's a use after free. To prevent uses after free, we cannot GC when moving refs into the `VMExternRefActivationsTable` because we are passing them from the host to Wasm. On the other hand, when we are cloning -- as opposed to moving -- refs from the host to Wasm, then it is fine to GC while inserting into the activations table, because the original referent that we are cloning from is still alive and rooting the ref.	2021-09-14 14:23:42 -07:00
Nick Fitzgerald	4b256ab968	Place unwind info directly after the text section, even when debug info is enabled When debug info was enabled, we would put the debug info sections in between the text section and the unwind info section. But the unwind info is encoded in a position-independent manner (so that we don't need relocs for it) that relies on it directly following the text section. The result of the misplacement was some crashes inside the unwinder.	2021-09-09 13:39:30 -07:00
Nick Fitzgerald	0499cca2fa	Name unwind info `.eh_frame` in the Wasmtime's compiled ELF artifact We were previously using `_wasmtime_eh_frame` but there is no good reason to add the prefix Wasmtime-specific prefix. Using the standard name allows for better inspection with standard tools like `dwarfdump`.	2021-09-09 12:54:49 -07:00
Nick Fitzgerald	dd0bc3237e	Do not write a DWARF section if it is empty There is no point in writing an empty DWARF section, and this will make our ELF files a tiny bit smaller.	2021-09-09 12:54:13 -07:00
Benjamin Bouvier	fb94b81538	Use 16K code pages on Mac M1 Fixes #3278.	2021-09-02 09:16:34 +02:00
Alex Crichton	1532516a36	Use relative `call` instructions between wasm functions (#3275 ) * Use relative `call` instructions between wasm functions This commit is a relatively major change to the way that Wasmtime generates code for Wasm modules and how functions call each other. Prior to this commit all function calls between functions, even if they were defined in the same module, were done indirectly through a register. To implement this the backend would emit an absolute 8-byte relocation near all function calls, load that address into a register, and then call it. While this technique is simple to implement and easy to get right, it has two primary downsides associated with it: * Function calls are always indirect which means they are more difficult to predict, resulting in worse performance. * Generating a relocation-per-function call requires expensive relocation resolution at module-load time, which can be a large contributing factor to how long it takes to load a precompiled module. To fix these issues, while also somewhat compromising on the previously simple implementation technique, this commit switches wasm calls within a module to using the `colocated` flag enabled in Cranelift-speak, which basically means that a relative call instruction is used with a relocation that's resolved relative to the pc of the call instruction itself. When switching the `colocated` flag to `true` this commit is also then able to move much of the relocation resolution from `wasmtime_jit::link` into `wasmtime_cranelift::obj` during object-construction time. This frontloads all relocation work which means that there's actually no relocations related to function calls in the final image, solving both of our points above. The main gotcha in implementing this technique is that there are hardware limitations to relative function calls which mean we can't simply blindly use them. AArch64, for example, can only go +/- 64 MB from the `bl` instruction to the target, which means that if the function we're calling is a greater distance away then we would fail to resolve that relocation. On x86_64 the limits are +/- 2GB which are much larger, but theoretically still feasible to hit. Consequently the main increase in implementation complexity is fixing this issue. This issue is actually already present in Cranelift itself, and is internally one of the invariants handled by the `MachBuffer` type. When generating a function relative jumps between basic blocks have similar restrictions. This commit adds new methods for the `MachBackend` trait and updates the implementation of `MachBuffer` to account for all these new branches. Specifically the changes to `MachBuffer` are: * For AAarch64 the `LabelUse::Branch26` value now supports veneers, and AArch64 calls use this to resolve relocations. * The `emit_island` function has been rewritten internally to handle some cases which previously didn't come up before, such as: * When emitting an island the deadline is now recalculated, where previously it was always set to infinitely in the future. This was ok prior since only a `Branch19` supported veneers and once it was promoted no veneers were supported, so without multiple layers of promotion the lack of a new deadline was ok. * When emitting an island all pending fixups had veneers forced if their branch target wasn't known yet. This was generally ok for 19-bit fixups since the only kind getting a veneer was a 19-bit fixup, but with mixed kinds it's a bit odd to force veneers for a 26-bit fixup just because a nearby 19-bit fixup needed a veneer. Instead fixups are now re-enqueued unless they're known to be out-of-bounds. This may run the risk of generating more islands for 19-bit branches but it should also reduce the number of islands for between-function calls. * Otherwise the internal logic was tweaked to ideally be a bit more simple, but that's a pretty subjective criteria in compilers... I've added some simple testing of this for now. A synthetic compiler option was create to simply add padded 0s between functions and test cases implement various forms of calls that at least need veneers. A test is also included for x86_64, but it is unfortunately pretty slow because it requires generating 2GB of output. I'm hoping for now it's not too bad, but we can disable the test if it's prohibitive and otherwise just comment the necessary portions to be sure to run the ignored test if these parts of the code have changed. The final end-result of this commit is that for a large module I'm working with the number of relocations dropped to zero, meaning that nothing actually needs to be done to the text section when it's loaded into memory (yay!). I haven't run final benchmarks yet but this is the last remaining source of significant slowdown when loading modules, after I land a number of other PRs both active and ones that I only have locally for now. * Fix arm32 * Review comments	2021-09-01 13:27:38 -05:00

1 2 3

110 Commits