wasmtime

Author	SHA1	Message	Date
Anton Kirilov	a897742593	Initial back-edge CFI implementation (#3606 ) Give the user the option to sign and to authenticate function return addresses with the operations introduced by the Pointer Authentication extension to the Arm instruction set architecture. Copyright (c) 2021, Arm Limited.	2022-08-03 11:08:29 -07:00
Afonso Bordado	709716bb8e	cranelift: Implement scalar FMA on x86 (#4460 ) x86 does not have dedicated instructions for scalar FMA, lower to a libcall which seems to be what llvm does.	2022-08-03 10:29:10 -07:00
Alex Crichton	ff6082c0af	Improve readability of memory64 compat in `fact` (#4581 ) This commit aims to improve the readability of supporting the memory64 proposal in the `fact` adapter trampoline compiler. Previously there were a few sprinkled blocks that used `if` to generate different instructions inline, but as I've worked on support for strings this has become pretty unwieldy as strings do far more memory manipulation than other type conversions. A pattern that's easier to read is to have small instruction helpers that take the pointer width as an argument and internally dispatch to the correct instruction. This keeps the main translation code branch-free and a bit easier to follow. Additionally for more complicated branching logic it allows for deduplicating the main translation path by having lots of little branches instead of one large branch with everything duplicated on both halves.	2022-08-03 17:12:00 +00:00
Alex Crichton	9f82644cc3	Some minor cleanups/refactorings in components (#4582 ) This is a collection of some minor renamings, refactorings, sharing of code, etc. This was all discovered during my addition of string support to adapter functions and I figured it'd be best to frontload this and land it ahead of the full patch since it's getting complex.	2022-08-03 11:21:55 -05:00
Alex Crichton	0a6baeddf4	Improve some support in `factc`: (#4580 ) * Support CLI parameters for string encoding * Fix `--skip-validate` * Fix printing binary to stdout	2022-08-03 11:21:30 -05:00
Alex Crichton	f587b10eb9	Reduce wasm invocations in the `stacks` fuzzer (#4595 ) On oss-fuzz a test case has been found that executes 30k iterations of a wasm trap which with a 60s timeout leaves 2ms for each invocation which under fuzzing instrumentation is a bit of a stretch with a ~20x slowdown. This commit places a limit on the number of inputs to the fuzzer at 200 to keep it reasonably sized.	2022-08-03 16:08:36 +00:00
Nick Fitzgerald	55215bbd1e	Use a `SmallVec` for `ABIArgSlot`s (#4586 ) These are always length 1 for Wasm benchmarks. <h3>Sightglass Benchmark Results</h3> ``` compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm Δ = 328624015.86 ± 40274677.93 (confidence = 99%) main.so is 0.88x to 0.91x faster than slots-smallvec.so! slots-smallvec.so is 1.10x to 1.13x faster than main.so! [3070752447 3203778792.55 3446269274] main.so [2503544039 2875154776.69 3197966713] slots-smallvec.so compilation :: nanoseconds :: benchmarks/pulldown-cmark/benchmark.wasm Δ = 9685705.06 ± 3221286.87 (confidence = 99%) main.so is 0.91x to 0.96x faster than slots-smallvec.so! slots-smallvec.so is 1.05x to 1.09x faster than main.so! [129356493 145594942.79 165038803] main.so [118555011 135909237.73 188780619] slots-smallvec.so compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm No difference in performance. [79080493 86757564.46 112649639] main.so [78083384 85934125.69 94992743] slots-smallvec.so ```	2022-08-02 17:40:36 -07:00
Nick Fitzgerald	ab1cf3df2d	Use a `SmallVec` for `ABIArg`s (#4584 ) Instead of a regular `Vec`. These vectors are usually very small, for example here is the histogram of sizes when running Sightglass's `pulldown-cmark` benchmark: ``` ;; Number of samples = 10332 ;; Min = 0 ;; Max = 11 ;; ;; Mean = 2.496128532713901 ;; Standard deviation = 2.2859559855427243 ;; Variance = 5.225594767838607 ;; ;; Each ∎ is a count of 62 ;; 0 .. 1 [ 3134 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 1 .. 2 [ 2032 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 2 .. 3 [ 159 ]: ∎∎ 3 .. 4 [ 838 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎ 4 .. 5 [ 970 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 5 .. 6 [ 2566 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 6 .. 7 [ 303 ]: ∎∎∎∎ 7 .. 8 [ 272 ]: ∎∎∎∎ 8 .. 9 [ 40 ]: 9 .. 10 [ 18 ]: ``` By using a `SmallVec` with capacity of 6 we avoid the vast majority of heap allocations and get some nice benchmark wins of up to ~1.11x faster compilation. <h3>Sightglass Benchmark Results</h3> ``` compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm Δ = 340361395.90 ± 63384608.15 (confidence = 99%) main.so is 0.88x to 0.92x faster than smallvec.so! smallvec.so is 1.09x to 1.13x faster than main.so! [3101467423 3425524333.41 4060621653] main.so [2820915877 3085162937.51 3375167352] smallvec.so compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm Δ = 988446098.59 ± 184075718.89 (confidence = 99%) main.so is 0.88x to 0.92x faster than smallvec.so! smallvec.so is 1.09x to 1.13x faster than main.so! [9006994951 9948091070.66 11792481990] main.so [8192243090 8959644972.07 9801848982] smallvec.so compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm Δ = 7854567.87 ± 2215491.16 (confidence = 99%) main.so is 0.89x to 0.94x faster than smallvec.so! smallvec.so is 1.07x to 1.12x faster than main.so! [80354527 93864666.76 119789198] main.so [77554917 86010098.89 94726994] smallvec.so compilation :: cycles :: benchmarks/bz2/benchmark.wasm Δ = 22810509.85 ± 6434024.63 (confidence = 99%) main.so is 0.89x to 0.94x faster than smallvec.so! smallvec.so is 1.07x to 1.12x faster than main.so! [233358190 272593088.57 347880715] main.so [225227821 249782578.72 275097380] smallvec.so compilation :: nanoseconds :: benchmarks/pulldown-cmark/benchmark.wasm Δ = 10849521.41 ± 4324757.85 (confidence = 99%) main.so is 0.90x to 0.96x faster than smallvec.so! smallvec.so is 1.04x to 1.10x faster than main.so! [133875427 156859544.47 222455440] main.so [126073854 146010023.06 181611647] smallvec.so compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm Δ = 31508176.97 ± 12559561.91 (confidence = 99%) main.so is 0.90x to 0.96x faster than smallvec.so! smallvec.so is 1.04x to 1.10x faster than main.so! [388788638 455536988.31 646034523] main.so [366132033 424028811.34 527419755] smallvec.so ```	2022-08-02 15:53:44 -07:00
Nick Fitzgerald	edf7f9f2bb	wasmtime: Add lots of logging for `externref`s and `table_ops` fuzz target (#4583 ) I essentially add these same logs back in every time I'm debugging something related to this fuzz target or `externref`s in general. Probably like 5 times I've added roughly these logs. We should just make them available whenever we need them via `RUST_LOG=wasmtime_runtime=trace`. This also changes a couple `if let`s to `unwrap`s that are now infallible after	2022-08-02 15:06:44 -07:00
Nick Fitzgerald	42bba452a6	Cranelift: Add instructions for getting the current stack/frame/return pointers (#4573 ) * Cranelift: Add instructions for getting the current stack/frame pointers and return address This is the initial part of https://github.com/bytecodealliance/wasmtime/issues/4535 * x64: Remove `Amode::RbpOffset` and use `Amode::ImmReg` instead We just special case getting operands from `Amode`s now. * Fix s390x `get_return_address`; require `preserve_frame_pointers=true` * Assert that `Amode::ImmRegRegShift` doesn't use rbp/rsp * Handle non-allocatable registers in Amode::with_allocs * Use "stack" instead of "r15" on s390x * r14 is an allocatable register on s390x, so it shouldn't be used with `MovPReg`	2022-08-02 14:37:17 -07:00
Ulrich Weigand	6b4e6523f7	[abi_impl] Respect extension for incoming stack arguments (#4576 ) The gen_copy_arg_to_regs routine currently ignores argument extension flags when loading incoming arguments. This causes a problem with stack arguments on big-endian systems, since the argument address points to the word on the stack as extended by the caller, but the generated code only loads the inner type from the address, causing it to receive an incorrect value. (This happens to work on little- endian systems.) Fixed by loading extended arguments as full words.	2022-08-02 13:54:13 -07:00
Alex Crichton	ee5b192d35	Re-enable component model `.wast` tests (#4577 ) Re-enable component model `.wast` tests These accidentally stopped running as part of #4556 on CI since I forgot one more location to touch a feature gate. Enable logging in component tests This is a small convenience to get log messages during testing for components by default.	2022-08-02 15:43:33 -05:00
Peter Huene	43125aa994	components: fix trampoline compilation for lists. (#4579 ) This commit fixes trampoline compilation for lists where the loop condition would only branch if the amount remaining was 0 instead of not 0. It resulted in only the first element of the list being copied.	2022-08-02 20:28:43 +00:00
Chris Fallin	8dddd6f1f7	Cranelift: Remove `ifcmp_sp` opcode. (#4578 ) This was temporarily added back in #3502 due to a need from Lucet; now that Lucet is EOL, the opcode is no longer needed and we can remove it.	2022-08-02 13:15:39 -07:00
Chris Fallin	43f1765272	Cranellift: remove Baldrdash support and related features. (#4571 ) * Cranellift: remove Baldrdash support and related features. As noted in Mozilla's bugzilla bug 1781425 [1], the SpiderMonkey team has recently determined that their current form of integration with Cranelift is too hard to maintain, and they have chosen to remove it from their codebase. If and when they decide to build updated support for Cranelift, they will adopt different approaches to several details of the integration. In the meantime, after discussion with the SpiderMonkey folks, they agree that it makes sense to remove the bits of Cranelift that exist to support the integration ("Baldrdash"), as they will not need them. Many of these bits are difficult-to-maintain special cases that are not actually tested in Cranelift proper: for example, the Baldrdash integration required Cranelift to emit function bodies without prologues/epilogues, and instead communicate very precise information about the expected frame size and layout, then stitched together something post-facto. This was brittle and caused a lot of incidental complexity ("fallthrough returns", the resulting special logic in block-ordering); this is just one example. As another example, one particular Baldrdash ABI variant processed stack args in reverse order, so our ABI code had to support both traversal orders. We had a number of other Baldrdash-specific settings as well that did various special things. This PR removes Baldrdash ABI support, the `fallthrough_return` instruction, and pulls some threads to remove now-unused bits as a result of those two, with the understanding that the SpiderMonkey folks will build new functionality as needed in the future and we can perhaps find cleaner abstractions to make it all work. [1] https://bugzilla.mozilla.org/show_bug.cgi?id=1781425 * Review feedback. * Fix (?) DWARF debug tests: add `--disable-cache` to wasmtime invocations. The debugger tests invoke `wasmtime` from within each test case under the control of a debugger (gdb or lldb). Some of these tests started to inexplicably fail in CI with unrelated changes, and the failures were only inconsistently reproducible locally. It seems to be cache related: if we disable cached compilation on the nested `wasmtime` invocations, the tests consistently pass. * Review feedback.	2022-08-02 19:37:56 +00:00
Benjamin Bouvier	ff37c9d8a4	[cranelift] Rejigger the `compile` API (#4540 ) * Move `emit_to_memory` to `MachCompileResult` This small refactoring makes it clearer to me that emitting to memory doesn't require anything else from the compilation `Context`. While it's a trivial change, it's a small public API change that shouldn't cause too much trouble, and doesn't seem RFC-worthy. Happy to hear different opinions about this, though! * hide the MachCompileResult behind a method * Add a `CompileError` wrapper type that references a `Function` * Rename MachCompileResult to CompiledCode * Additionally remove the last unsafe API in cranelift-codegen	2022-08-02 12:05:40 -07:00
Sam Parker	37cd96beff	[AArch64] i64x2 support for min/max (#4575 ) Also added interpreter support for vector min/max. Copyright (c) 2022, Arm Limited.	2022-08-02 11:42:05 -07:00
Nick Fitzgerald	c77bec4dcb	Cranelift: don't `emit` inside lowering rules for aarch64 (#4572 ) * Cranelift: Don't `emit` inside lowering rules in aarch64 The lowering rules should be "pure" and side-effect free, using helpers defined in `inst.isle` to perform actual side effects like emitting instructions. * Cranelift: use 80 width for section separators in aarch64 lowering rules	2022-08-01 16:43:42 -07:00
Alex Crichton	fb59de15af	Implement fused adapters for `(list T)` types (#4558 ) * Implement fused adapters for `(list T)` types This commit implements one of the two remaining types for adapter fusion, lists. This implementation is particularly tricky for a number of reasons: * Lists have a number of validity checks which need to be carefully implemented. For example the byte length of the list passed to allocation in the destination module could overflow the 32-bit index space. Additionally lists in 32-bit memories need a check that their final address is in-bounds in the address space. * In the effort to go ahead and support memory64 at the lowest layers this is where much of the magic happens. Lists are naturally always stored in memory and shifting between 64/32-bit address spaces is done here. This notably required plumbing an `Options` around during flattening/size/alignment calculations due to the size/types of lists changing depending on the memory configuration. I've also added a small `factc` program in this commit which should hopefully assist in exploring and debugging adapter modules. This takes as input a component (text or binary format) and then generates an adapter module for all component function signatures found internally. This commit notably does not include tests for lists. I tried to figure out a good way to add these but I felt like there were too many cases to test and the tests would otherwise be extremely verbose. Instead I think the best testing strategy for this commit will be through #4537 which should be relatively extensible to testing adapters between modules in addition to host-based lifting/lowering. * Improve handling of lists of 0-size types * Skip overflow checks on byte sizes for 0-size types * Skip the copy loop entirely when src/dst are both 0 * Skip the increments of src/dst pointers if either is 0-size * Update semantics for zero-sized lists/strings When a list/string has a 0-byte-size the base pointer is no longer verified to be in-bounds to match the supposedly desired adapter semantics where no trap happens because no turn of the loop happens.	2022-08-01 17:02:08 -05:00
Trevor Elliott	586ec95c11	ISLE: Allow shadowing in let expressions (#4562 ) * Support shadowing in isle * Re-run the isle build.rs if the examples change * Print error messages when isle tests fail * Move run tests * Refactor `let` uses that don't need to introduce unique names	2022-08-01 21:10:28 +00:00
Trevor Elliott	25782b527e	x64: Migrate trapif and trapff to ISLE (#4545 ) https://github.com/bytecodealliance/wasmtime/pull/4545	2022-08-01 11:24:11 -07:00
Anton Kirilov	a47a82d2e5	Cranelift AArch64: Harden the Spectre mitigations (#4555 ) Use the `CSDB` instruction following Arm's recommendation. Copyright (c) 2022, Arm Limited.	2022-08-01 10:20:48 -07:00
Alex Crichton	893fadb485	components: Fix support for 0-sized flags (#4560 ) This commit goes through and updates support in the various argument passing routines to support 0-sized flags. A bit of a degenerate case but clarified in WebAssembly/component-model#76 as intentional.	2022-08-01 16:05:09 +00:00
Alex Crichton	05e6abf2f6	Fix the `stacks` fuzzer in the face of stack overflow (#4557 ) When the `stacks` fuzzer hits a stack overflow the trace generated by Wasmtime will have one more frame than the trace generated by the wasm itself. This comes about due to the wasm not actually pushing the final frame when it stack overflows. The host, however, will still see the final frame that triggered the stack overflow. In this situation the fuzzer asserts that the host has one extra frame and then discards the frame.	2022-08-01 11:03:23 -05:00
Alex Crichton	04631ad0af	Unconditionally enable component-model tests (#4556 ) * Unconditionally enable component-model tests * Remove an outdated test that wasn't previously being compiled * Fix a component model doc test * Try to decrease memory usage in qemu	2022-08-01 15:43:37 +00:00
Benjamin Bouvier	8d0224341c	cranelift: Introduce a feature to enable `trace` logs (#4484 ) * Don't use `log::trace` directly but a feature-enabled `trace` macro * Don't emit disassembly based on the log level	2022-08-01 11:19:15 +02:00
Chris Fallin	8e9e9c52a1	ISLE: support more flexible integer constants. (#4559 ) The ISLE language's lexer previously used a very primitive `i64::from_str_radix` call to parse integer constants, allowing values in the range -2^63..2^63 only. Also, underscores to separate digits (as is allwoed in Rust) were not supported. Finally, 128-bit constants were not supported at all. This PR addresses all issues above: - Integer constants are internally stored as 128-bit values. - Parsing supports either signed (-2^127..2^127) or unsigned (0..2^128) range. Negation works independently of that, so one can write `-0xffff..ffff` (128 bits wide, i.e., -(2^128-1)) to get a `1`. - Underscores are supported to separate groups of digits, so one can write `0xffff_ffff`. - A minor oversight was fixed: hex constants can start with `0X` (uppercase) as well as `0x`, for consistency with Rust and C. This PR also adds a new kind of ISLE test that actually runs a driver linked to compiled ISLE code; we previously didn't have any such tests, but it is now quite useful to assert correct interpretation of constant values.	2022-07-29 21:52:14 +00:00
Alex Crichton	b1273548fb	Try using `windows-latest` CI (#4553 ) See if our `windows-2019` woes are solved now that backtraces are using frame pointers instead of native APIs.	2022-07-29 12:15:54 -05:00
Alex Crichton	6d6e7e0f6a	Add definitions of tiers-of-support for Wasmtime (#4479 ) * Add definitions of tiers-of-support for Wasmtime This commit adds documentation of a Tiers-based system for classifying how supported a component is within Wasmtime. This was somewhat pioneered in the [Wasmtime 1.0 RFC][rfc] but the documentation here is expanded to include more than just API stability but additionally other components. Inspiration for this is drawn from Rust's definition of [support tiers][rust] as well. The motivation for this is to help clarify what exactly it means to live at each tier and what is expected. For example one thing this document clarifies is the requirements necessary for landing new major changes in Wasmtime at all. Additionally this helps clarify what it means to have the highest level of support vs "otherwise well supported". [rfc]: https://github.com/bytecodealliance/rfcs/blob/main/accepted/wasmtime-one-dot-oh.md#tier-1---api-stable-production-quality [rust]: https://doc.rust-lang.org/rustc/target-tier-policy.html * Review comments * Review comments	2022-07-29 15:11:16 +00:00
Afonso Bordado	1f058a02c0	cranelift: Add MinGW `fma` regression tests (#4517 ) * cranelift: Add MinGW `fma` regression tests * cranelift: Fix FMA in interpreter * cranelift: Add separate `fma` test suite for the interpreter The interpreter can run `fma.clif` on most platforms, however on `x86_64-pc-windows-gnu` we use libm which has issues with some inputs. We should delete `fma-interpreter.clif` and enable the interpreter on the main `fma.clif` file once those are fixed.	2022-07-29 09:09:37 -05:00
Nick Fitzgerald	46782b18c2	`wasmtime`: Implement fast Wasm stack walking (#4431 ) * Always preserve frame pointers in Wasmtime This allows us to efficiently and simply capture Wasm stacks without maintaining and synchronizing any safety-critical side tables between the compiler and the runtime. * wasmtime: Implement fast Wasm stack walking Why do we want Wasm stack walking to be fast? Because we capture stacks whenever there is a trap and traps actually happen fairly frequently with short-lived programs and WASI's `exit`. Previously, we would rely on generating the system unwind info (e.g. `.eh_frame`) and using the system unwinder (via the `backtrace`crate) to walk the full stack and filter out any non-Wasm stack frames. This can, unfortunately, be slow for two primary reasons: 1. The system unwinder is doing `O(all-kinds-of-frames)` work rather than `O(wasm-frames)` work. 2. System unwind info and the system unwinder need to be much more general than a purpose-built stack walker for Wasm needs to be. It has to handle any kind of stack frame that any compiler might emit where as our Wasm frames are emitted by Cranelift and always have frame pointers. This translates into implementation complexity and general overhead. There can also be unnecessary-for-our-use-cases global synchronization and locks involved, further slowing down stack walking in the presence of multiple threads trying to capture stacks in parallel. This commit introduces a purpose-built stack walker for traversing just our Wasm frames. To find all the sequences of Wasm-to-Wasm stack frames, and ignore non-Wasm stack frames, we keep a linked list of `(entry stack pointer, exit frame pointer)` pairs. This linked list is maintained via Wasm-to-host and host-to-Wasm trampolines. Within a sequence of Wasm-to-Wasm calls, we can use frame pointers (which Cranelift preserves) to find the next older Wasm frame on the stack, and we keep doing this until we reach the entry stack pointer, meaning that the next older frame will be a host frame. The trampolines need to avoid a couple stumbling blocks. First, they need to be compiled ahead of time, since we may not have access to a compiler at runtime (e.g. if the `cranelift` feature is disabled) but still want to be able to call functions that have already been compiled and get stack traces for those functions. Usually this means we would compile the appropriate trampolines inside `Module::new` and the compiled module object would hold the trampolines. However, we also need to support calling host functions that are wrapped into `wasmtime::Func`s and there doesn't exist any ahead-of-time compiled module object to hold the appropriate trampolines: ```rust // Define a host function. let func_type = wasmtime::FuncType::new( vec![wasmtime::ValType::I32], vec![wasmtime::ValType::I32], ); let func = Func::new(&mut store, func_type, \|_, params, results\| { // ... Ok(()) }); // Call that host function. let mut results = vec![wasmtime::Val::I32(0)]; func.call(&[wasmtime::Val::I32(0)], &mut results)?; ``` Therefore, we define one host-to-Wasm trampoline and one Wasm-to-host trampoline in assembly that work for all Wasm and host function signatures. These trampolines are careful to only use volatile registers, avoid touching any register that is an argument in the calling convention ABI, and tail call to the target callee function. This allows forwarding any set of arguments and any returns to and from the callee, while also allowing us to maintain our linked list of Wasm stack and frame pointers before transferring control to the callee. These trampolines are not used in Wasm-to-Wasm calls, only when crossing the host-Wasm boundary, so they do not impose overhead on regular calls. (And if using one trampoline for all host-Wasm boundary crossing ever breaks branch prediction enough in the CPU to become any kind of bottleneck, we can do fun things like have multiple copies of the same trampoline and choose a random copy for each function, sharding the functions across branch predictor entries.) Finally, this commit also ends the use of a synthetic `Module` and allocating a stubbed out `VMContext` for host functions. Instead, we define a `VMHostFuncContext` with its own magic value, similar to `VMComponentContext`, specifically for host functions. <h2>Benchmarks</h2> <h3>Traps and Stack Traces</h3> Large improvements to taking stack traces on traps, ranging from shaving off 64% to 99.95% of the time it used to take. <details> ``` multi-threaded-traps/0 time: [2.5686 us 2.5808 us 2.5934 us] thrpt: [0.0000 elem/s 0.0000 elem/s 0.0000 elem/s] change: time: [-85.419% -85.153% -84.869%] (p = 0.00 < 0.05) thrpt: [+560.90% +573.56% +585.84%] Performance has improved. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe multi-threaded-traps/1 time: [2.9021 us 2.9167 us 2.9322 us] thrpt: [341.04 Kelem/s 342.86 Kelem/s 344.58 Kelem/s] change: time: [-91.455% -91.294% -91.096%] (p = 0.00 < 0.05) thrpt: [+1023.1% +1048.6% +1070.3%] Performance has improved. Found 6 outliers among 100 measurements (6.00%) 1 (1.00%) high mild 5 (5.00%) high severe multi-threaded-traps/2 time: [2.9996 us 3.0145 us 3.0295 us] thrpt: [660.18 Kelem/s 663.47 Kelem/s 666.76 Kelem/s] change: time: [-94.040% -93.910% -93.762%] (p = 0.00 < 0.05) thrpt: [+1503.1% +1542.0% +1578.0%] Performance has improved. Found 5 outliers among 100 measurements (5.00%) 5 (5.00%) high severe multi-threaded-traps/4 time: [5.5768 us 5.6052 us 5.6364 us] thrpt: [709.68 Kelem/s 713.63 Kelem/s 717.25 Kelem/s] change: time: [-93.193% -93.121% -93.052%] (p = 0.00 < 0.05) thrpt: [+1339.2% +1353.6% +1369.1%] Performance has improved. multi-threaded-traps/8 time: [8.6408 us 9.1212 us 9.5438 us] thrpt: [838.24 Kelem/s 877.08 Kelem/s 925.84 Kelem/s] change: time: [-94.754% -94.473% -94.202%] (p = 0.00 < 0.05) thrpt: [+1624.7% +1709.2% +1806.1%] Performance has improved. multi-threaded-traps/16 time: [10.152 us 10.840 us 11.545 us] thrpt: [1.3858 Melem/s 1.4760 Melem/s 1.5761 Melem/s] change: time: [-97.042% -96.823% -96.577%] (p = 0.00 < 0.05) thrpt: [+2821.5% +3048.1% +3281.1%] Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild many-modules-registered-traps/1 time: [2.6278 us 2.6361 us 2.6447 us] thrpt: [378.11 Kelem/s 379.35 Kelem/s 380.55 Kelem/s] change: time: [-85.311% -85.108% -84.909%] (p = 0.00 < 0.05) thrpt: [+562.65% +571.51% +580.76%] Performance has improved. Found 9 outliers among 100 measurements (9.00%) 3 (3.00%) high mild 6 (6.00%) high severe many-modules-registered-traps/8 time: [2.6294 us 2.6460 us 2.6623 us] thrpt: [3.0049 Melem/s 3.0235 Melem/s 3.0425 Melem/s] change: time: [-85.895% -85.485% -85.022%] (p = 0.00 < 0.05) thrpt: [+567.63% +588.95% +608.95%] Performance has improved. Found 8 outliers among 100 measurements (8.00%) 3 (3.00%) high mild 5 (5.00%) high severe many-modules-registered-traps/64 time: [2.6218 us 2.6329 us 2.6452 us] thrpt: [24.195 Melem/s 24.308 Melem/s 24.411 Melem/s] change: time: [-93.629% -93.551% -93.470%] (p = 0.00 < 0.05) thrpt: [+1431.4% +1450.6% +1469.5%] Performance has improved. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild many-modules-registered-traps/512 time: [2.6569 us 2.6737 us 2.6923 us] thrpt: [190.17 Melem/s 191.50 Melem/s 192.71 Melem/s] change: time: [-99.277% -99.268% -99.260%] (p = 0.00 < 0.05) thrpt: [+13417% +13566% +13731%] Performance has improved. Found 4 outliers among 100 measurements (4.00%) 4 (4.00%) high mild many-modules-registered-traps/4096 time: [2.7258 us 2.7390 us 2.7535 us] thrpt: [1.4876 Gelem/s 1.4955 Gelem/s 1.5027 Gelem/s] change: time: [-99.956% -99.955% -99.955%] (p = 0.00 < 0.05) thrpt: [+221417% +223380% +224881%] Performance has improved. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe many-stack-frames-traps/1 time: [1.4658 us 1.4719 us 1.4784 us] thrpt: [676.39 Kelem/s 679.38 Kelem/s 682.21 Kelem/s] change: time: [-90.368% -89.947% -89.586%] (p = 0.00 < 0.05) thrpt: [+860.23% +894.72% +938.21%] Performance has improved. Found 8 outliers among 100 measurements (8.00%) 5 (5.00%) high mild 3 (3.00%) high severe many-stack-frames-traps/8 time: [2.4772 us 2.4870 us 2.4973 us] thrpt: [3.2034 Melem/s 3.2167 Melem/s 3.2294 Melem/s] change: time: [-85.550% -85.370% -85.199%] (p = 0.00 < 0.05) thrpt: [+575.65% +583.51% +592.03%] Performance has improved. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe many-stack-frames-traps/64 time: [10.109 us 10.171 us 10.236 us] thrpt: [6.2525 Melem/s 6.2925 Melem/s 6.3309 Melem/s] change: time: [-78.144% -77.797% -77.336%] (p = 0.00 < 0.05) thrpt: [+341.22% +350.38% +357.55%] Performance has improved. Found 7 outliers among 100 measurements (7.00%) 5 (5.00%) high mild 2 (2.00%) high severe many-stack-frames-traps/512 time: [126.16 us 126.54 us 126.96 us] thrpt: [4.0329 Melem/s 4.0461 Melem/s 4.0583 Melem/s] change: time: [-65.364% -64.933% -64.453%] (p = 0.00 < 0.05) thrpt: [+181.32% +185.17% +188.71%] Performance has improved. Found 4 outliers among 100 measurements (4.00%) 4 (4.00%) high severe ``` </details> <h3>Calls</h3> There is, however, a small regression in raw Wasm-to-host and host-to-Wasm call performance due the new trampolines. It seems to be on the order of about 2-10 nanoseconds per call, depending on the benchmark. I believe this regression is ultimately acceptable because 1. this overhead will be vastly dominated by whatever work a non-nop callee actually does, 2. we will need these trampolines, or something like them, when implementing the Wasm exceptions proposal to do things like translate Wasm's exceptions into Rust's `Result`s, 3. and because the performance improvements to trapping and capturing stack traces are of such a larger magnitude than this call regressions. <details> ``` sync/no-hook/host-to-wasm - typed - nop time: [28.683 ns 28.757 ns 28.844 ns] change: [+16.472% +17.183% +17.904%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 1 (1.00%) low mild 4 (4.00%) high mild 5 (5.00%) high severe sync/no-hook/host-to-wasm - untyped - nop time: [42.515 ns 42.652 ns 42.841 ns] change: [+12.371% +14.614% +17.462%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 1 (1.00%) high mild 10 (10.00%) high severe sync/no-hook/host-to-wasm - unchecked - nop time: [33.936 ns 34.052 ns 34.179 ns] change: [+25.478% +26.938% +28.369%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 7 (7.00%) high mild 2 (2.00%) high severe sync/no-hook/host-to-wasm - typed - nop-params-and-results time: [34.290 ns 34.388 ns 34.502 ns] change: [+40.802% +42.706% +44.526%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 5 (5.00%) high mild 8 (8.00%) high severe sync/no-hook/host-to-wasm - untyped - nop-params-and-results time: [62.546 ns 62.721 ns 62.919 ns] change: [+2.5014% +3.6319% +4.8078%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 2 (2.00%) high mild 10 (10.00%) high severe sync/no-hook/host-to-wasm - unchecked - nop-params-and-results time: [42.609 ns 42.710 ns 42.831 ns] change: [+20.966% +22.282% +23.475%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 4 (4.00%) high mild 7 (7.00%) high severe sync/hook-sync/host-to-wasm - typed - nop time: [29.546 ns 29.675 ns 29.818 ns] change: [+20.693% +21.794% +22.836%] (p = 0.00 < 0.05) Performance has regressed. Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) high mild 2 (2.00%) high severe sync/hook-sync/host-to-wasm - untyped - nop time: [45.448 ns 45.699 ns 45.961 ns] change: [+17.204% +18.514% +19.590%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 4 (4.00%) high mild 10 (10.00%) high severe sync/hook-sync/host-to-wasm - unchecked - nop time: [34.334 ns 34.437 ns 34.558 ns] change: [+23.225% +24.477% +25.886%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 5 (5.00%) high mild 7 (7.00%) high severe sync/hook-sync/host-to-wasm - typed - nop-params-and-results time: [36.594 ns 36.763 ns 36.974 ns] change: [+41.967% +47.261% +52.086%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 3 (3.00%) high mild 9 (9.00%) high severe sync/hook-sync/host-to-wasm - untyped - nop-params-and-results time: [63.541 ns 63.831 ns 64.194 ns] change: [-4.4337% -0.6855% +2.7134%] (p = 0.73 > 0.05) No change in performance detected. Found 8 outliers among 100 measurements (8.00%) 6 (6.00%) high mild 2 (2.00%) high severe sync/hook-sync/host-to-wasm - unchecked - nop-params-and-results time: [43.968 ns 44.169 ns 44.437 ns] change: [+18.772% +21.802% +24.623%] (p = 0.00 < 0.05) Performance has regressed. Found 15 outliers among 100 measurements (15.00%) 3 (3.00%) high mild 12 (12.00%) high severe async/no-hook/host-to-wasm - typed - nop time: [4.9612 us 4.9743 us 4.9889 us] change: [+9.9493% +11.911% +13.502%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 6 (6.00%) high mild 4 (4.00%) high severe async/no-hook/host-to-wasm - untyped - nop time: [5.0030 us 5.0211 us 5.0439 us] change: [+10.841% +11.873% +12.977%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) high mild 7 (7.00%) high severe async/no-hook/host-to-wasm - typed - nop-params-and-results time: [4.9273 us 4.9468 us 4.9700 us] change: [+4.7381% +6.8445% +8.8238%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 5 (5.00%) high mild 9 (9.00%) high severe async/no-hook/host-to-wasm - untyped - nop-params-and-results time: [5.1151 us 5.1338 us 5.1555 us] change: [+9.5335% +11.290% +13.044%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 3 (3.00%) high mild 13 (13.00%) high severe async/hook-sync/host-to-wasm - typed - nop time: [4.9330 us 4.9394 us 4.9467 us] change: [+10.046% +11.038% +12.035%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 5 (5.00%) high mild 7 (7.00%) high severe async/hook-sync/host-to-wasm - untyped - nop time: [5.0073 us 5.0183 us 5.0310 us] change: [+9.3828% +10.565% +11.752%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 3 (3.00%) high mild 5 (5.00%) high severe async/hook-sync/host-to-wasm - typed - nop-params-and-results time: [4.9610 us 4.9839 us 5.0097 us] change: [+9.0857% +11.513% +14.359%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 7 (7.00%) high mild 6 (6.00%) high severe async/hook-sync/host-to-wasm - untyped - nop-params-and-results time: [5.0995 us 5.1272 us 5.1617 us] change: [+9.3600% +11.506% +13.809%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 6 (6.00%) high mild 4 (4.00%) high severe async-pool/no-hook/host-to-wasm - typed - nop time: [2.4242 us 2.4316 us 2.4396 us] change: [+7.8756% +8.8803% +9.8346%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 5 (5.00%) high mild 3 (3.00%) high severe async-pool/no-hook/host-to-wasm - untyped - nop time: [2.5102 us 2.5155 us 2.5210 us] change: [+12.130% +13.194% +14.270%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 4 (4.00%) high mild 8 (8.00%) high severe async-pool/no-hook/host-to-wasm - typed - nop-params-and-results time: [2.4203 us 2.4310 us 2.4440 us] change: [+4.0380% +6.3623% +8.7534%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 5 (5.00%) high mild 9 (9.00%) high severe async-pool/no-hook/host-to-wasm - untyped - nop-params-and-results time: [2.5501 us 2.5593 us 2.5700 us] change: [+8.8802% +10.976% +12.937%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 5 (5.00%) high mild 11 (11.00%) high severe async-pool/hook-sync/host-to-wasm - typed - nop time: [2.4135 us 2.4190 us 2.4254 us] change: [+8.3640% +9.3774% +10.435%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 6 (6.00%) high mild 5 (5.00%) high severe async-pool/hook-sync/host-to-wasm - untyped - nop time: [2.5172 us 2.5248 us 2.5357 us] change: [+11.543% +12.750% +13.982%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 1 (1.00%) high mild 7 (7.00%) high severe async-pool/hook-sync/host-to-wasm - typed - nop-params-and-results time: [2.4214 us 2.4353 us 2.4532 us] change: [+1.5158% +5.0872% +8.6765%] (p = 0.00 < 0.05) Performance has regressed. Found 15 outliers among 100 measurements (15.00%) 2 (2.00%) high mild 13 (13.00%) high severe async-pool/hook-sync/host-to-wasm - untyped - nop-params-and-results time: [2.5499 us 2.5607 us 2.5748 us] change: [+10.146% +12.459% +14.919%] (p = 0.00 < 0.05) Performance has regressed. Found 18 outliers among 100 measurements (18.00%) 3 (3.00%) high mild 15 (15.00%) high severe sync/no-hook/wasm-to-host - nop - typed time: [6.6135 ns 6.6288 ns 6.6452 ns] change: [+37.927% +38.837% +39.869%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) high mild 5 (5.00%) high severe sync/no-hook/wasm-to-host - nop-params-and-results - typed time: [15.930 ns 15.993 ns 16.067 ns] change: [+3.9583% +5.6286% +7.2430%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 11 (11.00%) high mild 1 (1.00%) high severe sync/no-hook/wasm-to-host - nop - untyped time: [20.596 ns 20.640 ns 20.690 ns] change: [+4.3293% +5.2047% +6.0935%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 5 (5.00%) high mild 5 (5.00%) high severe sync/no-hook/wasm-to-host - nop-params-and-results - untyped time: [42.659 ns 42.882 ns 43.159 ns] change: [-2.1466% -0.5079% +1.2554%] (p = 0.58 > 0.05) No change in performance detected. Found 15 outliers among 100 measurements (15.00%) 1 (1.00%) high mild 14 (14.00%) high severe sync/no-hook/wasm-to-host - nop - unchecked time: [10.671 ns 10.691 ns 10.713 ns] change: [+83.911% +87.620% +92.062%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) high mild 7 (7.00%) high severe sync/no-hook/wasm-to-host - nop-params-and-results - unchecked time: [11.136 ns 11.190 ns 11.263 ns] change: [-29.719% -28.446% -27.029%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 4 (4.00%) high mild 10 (10.00%) high severe sync/hook-sync/wasm-to-host - nop - typed time: [6.7964 ns 6.8087 ns 6.8226 ns] change: [+21.531% +24.206% +27.331%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 4 (4.00%) high mild 10 (10.00%) high severe sync/hook-sync/wasm-to-host - nop-params-and-results - typed time: [15.865 ns 15.921 ns 15.985 ns] change: [+4.8466% +6.3330% +7.8317%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 3 (3.00%) high mild 13 (13.00%) high severe sync/hook-sync/wasm-to-host - nop - untyped time: [21.505 ns 21.587 ns 21.677 ns] change: [+8.0908% +9.1943% +10.254%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe sync/hook-sync/wasm-to-host - nop-params-and-results - untyped time: [44.018 ns 44.128 ns 44.261 ns] change: [-1.4671% -0.0458% +1.2443%] (p = 0.94 > 0.05) No change in performance detected. Found 14 outliers among 100 measurements (14.00%) 5 (5.00%) high mild 9 (9.00%) high severe sync/hook-sync/wasm-to-host - nop - unchecked time: [11.264 ns 11.326 ns 11.387 ns] change: [+80.225% +81.659% +83.068%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe sync/hook-sync/wasm-to-host - nop-params-and-results - unchecked time: [11.816 ns 11.865 ns 11.920 ns] change: [-29.152% -28.040% -26.957%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 8 (8.00%) high mild 6 (6.00%) high severe async/no-hook/wasm-to-host - nop - typed time: [6.6221 ns 6.6385 ns 6.6569 ns] change: [+43.618% +44.755% +45.965%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 6 (6.00%) high mild 7 (7.00%) high severe async/no-hook/wasm-to-host - nop-params-and-results - typed time: [15.884 ns 15.929 ns 15.983 ns] change: [+3.5987% +5.2053% +6.7846%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 3 (3.00%) high mild 13 (13.00%) high severe async/no-hook/wasm-to-host - nop - untyped time: [20.615 ns 20.702 ns 20.821 ns] change: [+6.9799% +8.1212% +9.2819%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) high mild 8 (8.00%) high severe async/no-hook/wasm-to-host - nop-params-and-results - untyped time: [41.956 ns 42.207 ns 42.521 ns] change: [-4.3057% -2.7730% -1.2428%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 3 (3.00%) high mild 11 (11.00%) high severe async/no-hook/wasm-to-host - nop - unchecked time: [10.440 ns 10.474 ns 10.513 ns] change: [+83.959% +85.826% +87.541%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 5 (5.00%) high mild 6 (6.00%) high severe async/no-hook/wasm-to-host - nop-params-and-results - unchecked time: [11.476 ns 11.512 ns 11.554 ns] change: [-29.857% -28.383% -26.978%] (p = 0.00 < 0.05) Performance has improved. Found 12 outliers among 100 measurements (12.00%) 1 (1.00%) low mild 6 (6.00%) high mild 5 (5.00%) high severe async/no-hook/wasm-to-host - nop - async-typed time: [26.427 ns 26.478 ns 26.532 ns] change: [+6.5730% +7.4676% +8.3983%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) high mild 7 (7.00%) high severe async/no-hook/wasm-to-host - nop-params-and-results - async-typed time: [28.557 ns 28.693 ns 28.880 ns] change: [+1.9099% +3.7332% +5.9731%] (p = 0.00 < 0.05) Performance has regressed. Found 15 outliers among 100 measurements (15.00%) 1 (1.00%) high mild 14 (14.00%) high severe async/hook-sync/wasm-to-host - nop - typed time: [6.7488 ns 6.7630 ns 6.7784 ns] change: [+19.935% +22.080% +23.683%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 4 (4.00%) high mild 5 (5.00%) high severe async/hook-sync/wasm-to-host - nop-params-and-results - typed time: [15.928 ns 16.031 ns 16.149 ns] change: [+5.5188% +6.9567% +8.3839%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 9 (9.00%) high mild 2 (2.00%) high severe async/hook-sync/wasm-to-host - nop - untyped time: [21.930 ns 22.114 ns 22.296 ns] change: [+4.6674% +7.7588% +10.375%] (p = 0.00 < 0.05) Performance has regressed. Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) high mild 1 (1.00%) high severe async/hook-sync/wasm-to-host - nop-params-and-results - untyped time: [42.684 ns 42.858 ns 43.081 ns] change: [-5.2957% -3.4693% -1.6217%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 2 (2.00%) high mild 12 (12.00%) high severe async/hook-sync/wasm-to-host - nop - unchecked time: [11.026 ns 11.053 ns 11.086 ns] change: [+70.751% +72.378% +73.961%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 5 (5.00%) high mild 5 (5.00%) high severe async/hook-sync/wasm-to-host - nop-params-and-results - unchecked time: [11.840 ns 11.900 ns 11.982 ns] change: [-27.977% -26.584% -24.887%] (p = 0.00 < 0.05) Performance has improved. Found 18 outliers among 100 measurements (18.00%) 3 (3.00%) high mild 15 (15.00%) high severe async/hook-sync/wasm-to-host - nop - async-typed time: [27.601 ns 27.709 ns 27.882 ns] change: [+8.1781% +9.1102% +10.030%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 2 (2.00%) low mild 3 (3.00%) high mild 6 (6.00%) high severe async/hook-sync/wasm-to-host - nop-params-and-results - async-typed time: [28.955 ns 29.174 ns 29.413 ns] change: [+1.1226% +3.0366% +5.1126%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 7 (7.00%) high mild 6 (6.00%) high severe async-pool/no-hook/wasm-to-host - nop - typed time: [6.5626 ns 6.5733 ns 6.5851 ns] change: [+40.561% +42.307% +44.514%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 5 (5.00%) high mild 4 (4.00%) high severe async-pool/no-hook/wasm-to-host - nop-params-and-results - typed time: [15.820 ns 15.886 ns 15.969 ns] change: [+4.1044% +5.7928% +7.7122%] (p = 0.00 < 0.05) Performance has regressed. Found 17 outliers among 100 measurements (17.00%) 4 (4.00%) high mild 13 (13.00%) high severe async-pool/no-hook/wasm-to-host - nop - untyped time: [20.481 ns 20.521 ns 20.566 ns] change: [+6.7962% +7.6950% +8.7612%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 6 (6.00%) high mild 5 (5.00%) high severe async-pool/no-hook/wasm-to-host - nop-params-and-results - untyped time: [41.834 ns 41.998 ns 42.189 ns] change: [-3.8185% -2.2687% -0.7541%] (p = 0.01 < 0.05) Change within noise threshold. Found 13 outliers among 100 measurements (13.00%) 3 (3.00%) high mild 10 (10.00%) high severe async-pool/no-hook/wasm-to-host - nop - unchecked time: [10.353 ns 10.380 ns 10.414 ns] change: [+82.042% +84.591% +87.205%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 4 (4.00%) high mild 3 (3.00%) high severe async-pool/no-hook/wasm-to-host - nop-params-and-results - unchecked time: [11.123 ns 11.168 ns 11.228 ns] change: [-30.813% -29.285% -27.874%] (p = 0.00 < 0.05) Performance has improved. Found 12 outliers among 100 measurements (12.00%) 11 (11.00%) high mild 1 (1.00%) high severe async-pool/no-hook/wasm-to-host - nop - async-typed time: [27.442 ns 27.528 ns 27.638 ns] change: [+7.5215% +9.9795% +12.266%] (p = 0.00 < 0.05) Performance has regressed. Found 18 outliers among 100 measurements (18.00%) 3 (3.00%) high mild 15 (15.00%) high severe async-pool/no-hook/wasm-to-host - nop-params-and-results - async-typed time: [29.014 ns 29.148 ns 29.312 ns] change: [+2.0227% +3.4722% +4.9047%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 6 (6.00%) high mild 1 (1.00%) high severe async-pool/hook-sync/wasm-to-host - nop - typed time: [6.7916 ns 6.8116 ns 6.8325 ns] change: [+20.937% +22.050% +23.281%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 5 (5.00%) high mild 6 (6.00%) high severe async-pool/hook-sync/wasm-to-host - nop-params-and-results - typed time: [15.917 ns 15.975 ns 16.051 ns] change: [+4.6404% +6.4217% +8.3075%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 5 (5.00%) high mild 11 (11.00%) high severe async-pool/hook-sync/wasm-to-host - nop - untyped time: [21.558 ns 21.612 ns 21.679 ns] change: [+8.1158% +9.1409% +10.217%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) high mild 7 (7.00%) high severe async-pool/hook-sync/wasm-to-host - nop-params-and-results - untyped time: [42.475 ns 42.614 ns 42.775 ns] change: [-6.3613% -4.4709% -2.7647%] (p = 0.00 < 0.05) Performance has improved. Found 18 outliers among 100 measurements (18.00%) 3 (3.00%) high mild 15 (15.00%) high severe async-pool/hook-sync/wasm-to-host - nop - unchecked time: [11.150 ns 11.195 ns 11.247 ns] change: [+74.424% +77.056% +79.811%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 3 (3.00%) high mild 11 (11.00%) high severe async-pool/hook-sync/wasm-to-host - nop-params-and-results - unchecked time: [11.639 ns 11.695 ns 11.760 ns] change: [-30.212% -29.023% -27.954%] (p = 0.00 < 0.05) Performance has improved. Found 15 outliers among 100 measurements (15.00%) 7 (7.00%) high mild 8 (8.00%) high severe async-pool/hook-sync/wasm-to-host - nop - async-typed time: [27.480 ns 27.712 ns 27.984 ns] change: [+2.9764% +6.5061% +9.8914%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 6 (6.00%) high mild 2 (2.00%) high severe async-pool/hook-sync/wasm-to-host - nop-params-and-results - async-typed time: [29.218 ns 29.380 ns 29.600 ns] change: [+5.2283% +7.7247% +10.822%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 2 (2.00%) high mild 14 (14.00%) high severe ``` </details> * Add s390x support for frame pointer-based stack walking * wasmtime: Allow `Caller::get_export` to get all exports * fuzzing: Add a fuzz target to check that our stack traces are correct We generate Wasm modules that keep track of their own stack as they call and return between functions, and then we periodically check that if the host captures a backtrace, it matches what the Wasm module has recorded. * Remove VM offsets for `VMHostFuncContext` since it isn't used by JIT code * Add doc comment with stack walking implementation notes * Document the extra state that can be passed to `wasmtime_runtime::Backtrace` methods * Add extensive comments for stack walking function * Factor architecture-specific bits of stack walking out into modules * Initialize store-related fields in a vmctx to null when there is no store yet Rather than leaving them as uninitialized data. * Use `set_callee` instead of manually setting the vmctx field * Use a more informative compile error message for unsupported architectures * Document unsafety of `prepare_host_to_wasm_trampoline` * Use `bti c` instead of `hint #34` in inline aarch64 assembly * Remove outdated TODO comment * Remove setting of `last_wasm_exit_fp` in `set_jit_trap` This is no longer needed as the value is plumbed through to the backtrace code directly now. * Only set the stack limit once, in the face of re-entrancy into Wasm * Add comments for s390x-specific stack walking bits * Use the helper macro for all libcalls If we forget to use it, and then trigger a GC from the libcall, that means we could miss stack frames when walking the stack, fail to find live GC refs, and then get use after free bugs. Much less risky to always use the helper macro that takes care of all of that for us. * Use the `asm_sym!` macro in Wasm-to-libcall trampolines This macro handles the macOS-specific underscore prefix stuff for us. * wasmtime: add size and align to `externref` assertion error message * Extend the `stacks` fuzzer to have host frames in between Wasm frames This way we get one or more contiguous sequences of Wasm frames on the stack, instead of exactly one. * Add documentation for aarch64-specific backtrace helpers * Clarify that we only support little-endian aarch64 in trampoline comment * Use `.machine z13` in s390x assembly file Since apparently our CI machines have pretty old assemblers that don't have `.machine z14`. This should be fine though since these trampolines don't make use of anything that is introduced in z14. * Fix aarch64 build * Fix macOS build * Document the `asm_sym!` macro * Add windows support to the `wasmtime-asm-macros` crate * Add windows support to host<--->Wasm trampolines * Fix trap handler build on windows * Run `rustfmt` on s390x trampoline source file * Temporarily disable some assertions about a trap's backtrace in the component model tests Follow up to re-enable this and fix the associated issue: https://github.com/bytecodealliance/wasmtime/issues/4535 * Refactor libcall definitions with less macros This refactors the `libcall!` macro to use the `foreach_builtin_function!` macro to define all of the trampolines. Additionally the macro surrounding each libcall itself is no longer necessary and helps avoid too many macros. * Use `VMOpaqueContext::from_vm_host_func_context` in `VMHostFuncContext::new` * Move `backtrace` module to be submodule of `traphandlers` This avoids making some things `pub(crate)` in `traphandlers` that really shouldn't be. * Fix macOS aarch64 build * Use "i64" instead of "word" in aarch64-specific file * Save/restore entry SP and exit FP/return pointer in the face of panicking imported host functions Also clean up assertions surrounding our saved entry/exit registers. * Put "typed" vs "untyped" in the same position of call benchmark names Regardless if we are doing wasm-to-host or host-to-wasm * Fix stacks test case generator build for new `wasm-encoder` * Fix build for s390x * Expand libcalls in s390x asm * Disable more parts of component tests now that backtrace assertions are a bit tighter * Remove assertion that can maybe fail on s390x Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com> Co-authored-by: Alex Crichton <alex@alexcrichton.com>	2022-07-28 15:46:14 -07:00
Damian Heaton	5e3bb588a8	Port `Fence`, `IsNull`/`IsInvalid` & `Debugtrap` to ISLE (AArch64) (#4548 ) Ported the existing implementation of the following Opcodes for AArch64 to ISLE: - `Fence` - `IsNull` - `IsInvalid` - `Debugtrap` Copyright (c) 2022 Arm Limited	2022-07-28 15:36:13 -07:00
Trevor Elliott	29d4edc76b	x64: Migrate call and call_indirect to ISLE (#4542 ) https://github.com/bytecodealliance/wasmtime/pull/4542	2022-07-28 13:10:03 -07:00
Alex Crichton	32979b2714	Implement flags in fused adapters (#4549 ) This implements the `flags` type for fused adapters and converting between modules. The main logic here is handling the variable size of flags in addition to the masking which happens to ignore unrelated bits when the values pass through the canonical ABI.	2022-07-28 14:56:32 -05:00
Pure White	fb7d51033c	doc: fix c set strategy doc (#4550 ) In #4252 I changed the signature and doc of the rust code, but forgot to change the doc for c. Set strategy no longer returns error. This commit fixes that.	2022-07-28 18:01:03 +00:00
Alex Crichton	ce7bbef24d	Implement other variant-like types in adapter fusion (#4547 ) This commit fills out the adapter fusion compiler for the `union`, `enum`, `option,` and `result` types. The preexisting support for `variant` types was refactored slightly to be extensible to all of these other types and they all now call into the same common translation code.	2022-07-28 11:47:15 -05:00
Alex Crichton	e1148e43be	Implement `char` type in adapter fusion (#4544 ) This commit implements the translation of `char` which validates that it's in the valid range of unicode scalar values. The precise validation here is lifted from LLVM in the hopes that it's probably better than whatever I would concoct by hand.	2022-07-28 11:47:01 -05:00
Andrew Brown	8137432e67	x64: only enable VTune dependencies on x86_64 targets (#4533 ) As described in #4523, the `ittapi` dependency necessary for Wasmtime's VTune support does not compile on `aarch64-linux-android`. There are several incompatible parts here: though `ittapi` supports all OS combinations that Wasmtime does and builds on all CPU targets, VTune is not primarily intended for aarch64 profiling and `linux-android` is not a high priority platform for the library. We could conditionally depend on `ittapi` for Wasmtime's supported OS combinations, but I think a better answer is to limit it to x86_64 targets, since this more clearly shows why the `ittapi`/VTune support is present and also fixes #4523.	2022-07-28 09:22:27 -05:00
Jamey Sharp	ad050e6fb2	cranelift-wasm: Only allocate if vectors need bitcasts (#4543 ) For wasm programs using SIMD vector types, the type known at function entry or exit may not match the type used within the body of the function, so we have to bitcast them. We have to check all calls and returns for this condition, which involves comparing a subset of a function's signature with the CLIF types we're trying to use. Currently, this check heap-allocates a short-lived Vec for the appropriate subset of the signature. But most of the time none of the values need a bitcast. So this patch avoids allocating unless at least one bitcast is actually required, and only saves the pointers to values that need fixing up. I leaned heavily on iterators to keep space usage constant until we discover allocation is necessary after all. I don't think it's possible to eliminate the allocation entirely, because the function signature we're examining is borrowed from the FuncBuilder, but we need to mutably borrow the FuncBuilder to insert the bitcast instructions. Fortunately, the FromIterator implementation for Vec doesn't reserve any heap space if the iterator is empty, so we can unconditionally collect into a Vec and still avoid unnecessary allocations. Since the relationship between a function signature and a list of CLIF values is somewhat complicated, I refactored all the uses of `bitcast_arguments` and `wasm_param_types`. Instead there's `bitcast_wasm_params` and `bitcast_wasm_returns` which share a helper that combines the previous pair of functions into one. According to DHAT, when compiling the Sightglass Spidermonkey benchmark, this avoids 52k allocations averaging about 9 bytes each, which are freed on average within 2k instructions. Most Sightglass benchmarks, including Spidermonkey, show no performance difference with this change. Only one has a slowdown, and it's small: compilation :: nanoseconds :: benchmarks/shootout-matrix/benchmark.wasm Δ = 689373.34 ± 593720.78 (confidence = 99%) lazy-bitcast.so is 0.94x to 1.00x faster than main-e121c209f.so! main-e121c209f.so is 1.00x to 1.06x faster than lazy-bitcast.so! [19741713 21375844.46 32976047] lazy-bitcast.so [19345471 20686471.12 30872267] main-e121c209f.so But several Sightglass benchmarks have notable speed-ups, with smaller improvements for shootout-ed25519, meshoptimizer, and pulldown-cmark, and larger ones as follows: compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm Δ = 20071545.47 ± 2950014.92 (confidence = 99%) lazy-bitcast.so is 1.26x to 1.36x faster than main-e121c209f.so! main-e121c209f.so is 0.73x to 0.80x faster than lazy-bitcast.so! [55995164 64849257.68 89083031] lazy-bitcast.so [79382460 84920803.15 98016185] main-e121c209f.so compilation :: nanoseconds :: benchmarks/blake3-scalar/benchmark.wasm Δ = 16620780.61 ± 5395162.13 (confidence = 99%) lazy-bitcast.so is 1.14x to 1.28x faster than main-e121c209f.so! main-e121c209f.so is 0.77x to 0.88x faster than lazy-bitcast.so! [54604352 79877776.35 103666598] lazy-bitcast.so [94011835 96498556.96 106200091] main-e121c209f.so compilation :: nanoseconds :: benchmarks/intgemm-simd/benchmark.wasm Δ = 36891254.34 ± 12403663.50 (confidence = 99%) lazy-bitcast.so is 1.12x to 1.24x faster than main-e121c209f.so! main-e121c209f.so is 0.79x to 0.90x faster than lazy-bitcast.so! [131610845 201289587.88 247341883] lazy-bitcast.so [232065032 238180842.22 250957563] main-e121c209f.so	2022-07-27 16:41:30 -07:00
Alex Crichton	174b60dcf7	Add `.wast` support for invoking components (#4526 ) This commit builds on bytecodealliance/wasm-tools#690 to add support to testing of the component model to execute functions when running `.wast` files. This support is all built on #4442 as functions are invoked through a "dynamic" API. Right now the testing and integration is fairly crude but I'm hoping that we can try to improve it over time as necessary. For now this should provide a hopefully more convenient syntax for unit tests and the like.	2022-07-27 21:02:16 +00:00
Afonso Bordado	0508932174	cranelift: Align Scalar and SIMD shift semantics (#4520 ) * cranelift: Reorganize test suite Group some SIMD operations by instruction. * cranelift: Deduplicate some shift tests Also, new tests with the mod behaviour * aarch64: Lower shifts with mod behaviour * x64: Lower shifts with mod behaviour * wasmtime: Don't mask SIMD shifts	2022-07-27 17:54:00 +00:00
Afonso Bordado	e121c209fc	cranelift: Fix `urem`/`srem` in interpreter (#4532 )	2022-07-27 10:47:08 -07:00
Trevor Elliott	7ac6134894	x64: Shrink Inst from 72 to 48 bytes (#4514 ) https://github.com/bytecodealliance/wasmtime/pull/4514	2022-07-27 10:39:22 -07:00
Alex Crichton	285bc5ce24	Implement variant translation in fused adapters (#4534 ) * Implement variant translation in fused adapters This commit implements the most general case of variants for fused adapter trampolines. Additionally a number of other primitive types are filled out here to assist with testing variants. The implementation internally was relatively straightforward given the shape of variants, but there's room for future optimization as necessary especially around converting locals to various types. This commit also introduces a "one off" fuzzer for adapters to ensure that the generated adapter is valid. I hope to extend this fuzz generator as more types are implemented to assist in various corner cases that might arise. For now the fuzzer simply tests that the output wasm module is valid, not that it actually executes correctly. I hope to integrate with a fuzzer along the lines of #4307 one day to test the run-time-correctness of the generated adapters as well, at which point this fuzzer would become obsolete. Finally this commit also fixes an issue with `u8` translation where upper bits weren't zero'd out and were passed raw across modules. Instead smaller-than-32 types now all mask out their upper bits and do sign-extension as appropriate for unsigned/signed variants. * Fuzz memory64 in the new trampoline fuzzer Currently memory64 isn't supported elsewhere in the component model implementation of Wasmtime but the trampoline compiler seems as good a place as any to ensure that it at least works in isolation. This plumbs through fuzz input into a `memory64` boolean which gets fed into compilation. Some miscellaneous bugs were fixed as a result to ensure that memory64 trampolines all validate correctly. * Tweak manifest for doc build	2022-07-27 09:14:43 -05:00
Jamey Sharp	799e8919fe	Don't allocate in DataFlowGraph::block_param_types (#4538 ) DHAT reports that when compiling the Spidermonkey Sightglass benchmark, there are over 100k of these Vec allocations, averaging less than 4 bytes, and with an average lifetime of only about 500 instructions. This function is only called from one place, which immediately converts it into an iterator. So this commit just returns the iterator that was previously being collected into a Vec. The iterator has to borrow from the DataFlowGraph, so this would change borrow-check results, but in the one caller that turns out to be okay. (That sole caller is in cranelift/codegen/src/machinst/lower.rs, in Lower::lower().) According to Sightglass, this is a compile-time improvement of between 2% and 12% on the Spidermonkey benchmark: instantiation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm Δ = 14628.76 ± 10318.59 (confidence = 99%) main-0e6ffd024.so is 0.87x to 0.98x faster than no-small-vecs.so! no-small-vecs.so is 1.02x to 1.14x faster than main-0e6ffd024.so! [142023 187464.24 301522] main-0e6ffd024.so [103742 172835.48 263917] no-small-vecs.so compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm Δ = 362392705.93 ± 267070467.06 (confidence = 99%) main-0e6ffd024.so is 0.89x to 0.98x faster than no-small-vecs.so! no-small-vecs.so is 1.02x to 1.12x faster than main-0e6ffd024.so! [3655734131 5522594697.83 6471126699] main-0e6ffd024.so [3278129811 5160201991.90 5810600015] no-small-vecs.so	2022-07-27 01:59:18 +00:00
Nick Fitzgerald	50b9195882	cranelift-frontend: Reuse visited block sets in `SSABuilder::can_optimize_var_lookup` (#4536 ) First, we switch from a `BTreeSet` to a `HashSet` because clearing a `BTreeSet` will deallocate the btree's nodes but clearing a `HashSet` will not deallocate the backing hash table, saving the space to reuse for future insertions. Then, we reuse the same set (and therefore the same allocation) across every call to `can_optimize_var_lookup`. This results in a 1.22x to 1.32x speed up on various Sightglass benchmarks: ``` compilation :: nanoseconds :: benchmarks/pulldown-cmark/benchmark.wasm Δ = 39478181.76 ± 3441880.32 (confidence = 99%) main.so is 0.75x to 0.79x faster than reuse-set.so! reuse-set.so is 1.27x to 1.32x faster than main.so! [160128343 172174751.09 213325968] main.so [115055695 132696569.33 200782128] reuse-set.so compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm Δ = 22576954.88 ± 1830771.68 (confidence = 99%) main.so is 0.77x to 0.81x faster than reuse-set.so! reuse-set.so is 1.25x to 1.29x faster than main.so! [100449245 106820149.65 118628066] main.so [77039172 84243194.77 128168647] reuse-set.so compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm Δ = 664533554.97 ± 22109170.05 (confidence = 99%) main.so is 0.81x to 0.82x faster than reuse-set.so! reuse-set.so is 1.22x to 1.23x faster than main.so! [3549762523 3640587103.35 3798662501] main.so [2793335181 2976053548.38 3192950484] reuse-set.so ```	2022-07-26 18:38:24 -07:00
Dan Gohman	0e6ffd0243	Don't try to report file size or timestamps for stdio streams. (#4531 ) * Don't try to report file size or timestamps for stdio streams. Calling `File::metadata()` on a stdio stream handle fails on Windows, where the stdio streams are not files. This `File::metadata()` call was effectively only being used to add file size and timestamps to the result of `filestat_get`. It's common for users to redirect stdio streams to interesting places, and applications generally shouldn't change their behavior depending on the size or timestamps of the file, if the streams are redirected to a file, so just leave these fields to 0, which is commonly understood to represent "unknown". Fixes #4497.	2022-07-26 15:53:17 -07:00
Anton Kirilov	ead6edb0c5	Cranelift AArch64: Migrate Splat to ISLE (#4521 ) Copyright (c) 2022, Arm Limited.	2022-07-26 17:57:15 +00:00
Alex Crichton	1321c234e5	Remove dependency on `more-asserts` (#4408 ) * Remove dependency on `more-asserts` In my recent adventures to do a bit of gardening on our dependencies I noticed that there's a new major version for the `more-asserts` crate. Instead of updating to this though I've opted to instead remove the dependency since I don't think we heavily lean on this crate and otherwise one-off prints are probably sufficient to avoid the need for pulling in a whole crate for this. * Remove exemption for `more-asserts`	2022-07-26 16:47:33 +00:00
Afonso Bordado	1183191d7d	fuzzgen: Add i128 support (#4529 )	2022-07-26 09:40:12 -07:00

... 2 3 4 5 6 ...

10242 Commits