wasmtime

Author	SHA1	Message	Date
Nick Fitzgerald	a2f846f124	Don't re-capture backtraces when propagating traps through host frames (#5049 ) * Add a benchmark for traps with many Wasm<-->host calls on the stack * Add a test for expected Wasm stack traces with Wasm<--host calls on the stack when we trap * Don't re-capture backtraces when propagating traps through host frames This fixes some accidentally quadratic code where we would re-capture a Wasm stack trace (takes `O(n)` time) every time we propagated a trap through a host frame back to Wasm (can happen `O(n)` times). And `O(n) * O(n) = O(n^2)`, of course. Whoops. After this commit, it trapping with a call stack that is `n` frames deep of Wasm-to-host-to-Wasm calls just captures a single backtrace and is therefore just a proper `O(n)` time operation, as it is intended to be. Now we explicitly track whether we need to capture a Wasm backtrace or not when raising a trap. This unfortunately isn't as straightforward as one might hope, however, because of the split between `wasmtime::Trap` and `wasmtime_runtime::Trap`. We need to decide whether or not to capture a Wasm backtrace inside `wasmtime_runtime` but in order to determine whether to do that or not we need to reflect on the `anyhow::Error` and see if it is a `wasmtime::Trap` that already has a backtrace or not. This can't be done the straightforward way because it would introduce a cyclic dependency between the `wasmtime` and `wasmtime-runtime` crates. We can't merge those two `Trap` types-- at least not without effectively merging the whole `wasmtime` and `wasmtime-runtime` crates together, which would be a good idea in a perfect world but would be a ton of ocean boiling from where we currently are -- because `wasmtime::Trap` does symbolication of stack traces which relies on module registration information data that resides inside the `wasmtime` crate and therefore can't be moved into `wasmtime-runtime`. We resolve this problem by adding a boolean to `wasmtime_runtime::raise_user_trap` that controls whether we should capture a Wasm backtrace or not, and then determine whether we need a backtrace or not at each of that function's call sites, which are in `wasmtime` and therefore can do the reflection to determine whether the user trap already has a backtrace or not. Phew! Fixes #5037 * debug assert that we don't record unnecessary backtraces for traps * Add assertions around `needs_backtrace` Unfortunately we can't do debug_assert_eq!(needs_backtrace, trap.inner.backtrace.get().is_some()); because `needs_backtrace` doesn't consider whether Wasm backtraces have been disabled via config. * Consolidate `needs_backtrace` calculation followed by calling `raise_user_trap` into one place	2022-10-13 07:22:46 -07:00
Nick Fitzgerald	ff0e84ecf4	Wasmtime: fix stack walking across frames from different stores (#4779 ) We were previously implicitly assuming that all Wasm frames in a stack used the same `VMRuntimeLimits` as the previous frame we walked, but this is not true when Wasm in store A calls into the host which then calls into Wasm in store B: \| ... \| \| Host \| \| +-----------------+ \| stack \| Wasm in store A \| \| grows +-----------------+ \| down \| Host \| \| +-----------------+ \| \| Wasm in store B \| V +-----------------+ Trying to walk this stack would previously result in a runtime panic. The solution is to push the maintenance of our list of saved Wasm FP/SP/PC registers that allow us to identify contiguous regions of Wasm frames on the stack deeper into `CallThreadState`. The saved registers list is now maintained whenever updating the `CallThreadState` linked list by making the `CallThreadState::prev` field private and only accessible via a getter and setter, where the setter always maintains our invariants.	2022-08-30 18:28:00 +00:00
Benjamin Bouvier	a40b253792	Uncomment unwind stack frame tests that now pass on aarch64 (#4687 ) Thanks to #4431 and @fitzgen who implemented it!	2022-08-11 15:09:04 +00:00
Nick Fitzgerald	ec47335b9c	wasmtime: Add a `Config::native_unwind_info` method (#4643 ) This method configures whether native unwind information (e.g. `.eh_frame` on Linux) is generated or not. This helps integrate with third-party stack capturing tools, such as the system unwinder or the `backtrace` crate. It does not affect whether Wasmtime can capture stack traces in Wasm code that it is running or not. Unwind info is always enabled on Windows, since the Windows ABI requires it. This configuration option defaults to true. Additionally, we deprecate `Config::wasm_backtrace` since we can always cheaply capture stack traces ever since https://github.com/bytecodealliance/wasmtime/pull/4431. Fixes https://github.com/bytecodealliance/wasmtime/issues/4554	2022-08-08 13:54:51 -07:00
Nick Fitzgerald	46782b18c2	`wasmtime`: Implement fast Wasm stack walking (#4431 ) * Always preserve frame pointers in Wasmtime This allows us to efficiently and simply capture Wasm stacks without maintaining and synchronizing any safety-critical side tables between the compiler and the runtime. * wasmtime: Implement fast Wasm stack walking Why do we want Wasm stack walking to be fast? Because we capture stacks whenever there is a trap and traps actually happen fairly frequently with short-lived programs and WASI's `exit`. Previously, we would rely on generating the system unwind info (e.g. `.eh_frame`) and using the system unwinder (via the `backtrace`crate) to walk the full stack and filter out any non-Wasm stack frames. This can, unfortunately, be slow for two primary reasons: 1. The system unwinder is doing `O(all-kinds-of-frames)` work rather than `O(wasm-frames)` work. 2. System unwind info and the system unwinder need to be much more general than a purpose-built stack walker for Wasm needs to be. It has to handle any kind of stack frame that any compiler might emit where as our Wasm frames are emitted by Cranelift and always have frame pointers. This translates into implementation complexity and general overhead. There can also be unnecessary-for-our-use-cases global synchronization and locks involved, further slowing down stack walking in the presence of multiple threads trying to capture stacks in parallel. This commit introduces a purpose-built stack walker for traversing just our Wasm frames. To find all the sequences of Wasm-to-Wasm stack frames, and ignore non-Wasm stack frames, we keep a linked list of `(entry stack pointer, exit frame pointer)` pairs. This linked list is maintained via Wasm-to-host and host-to-Wasm trampolines. Within a sequence of Wasm-to-Wasm calls, we can use frame pointers (which Cranelift preserves) to find the next older Wasm frame on the stack, and we keep doing this until we reach the entry stack pointer, meaning that the next older frame will be a host frame. The trampolines need to avoid a couple stumbling blocks. First, they need to be compiled ahead of time, since we may not have access to a compiler at runtime (e.g. if the `cranelift` feature is disabled) but still want to be able to call functions that have already been compiled and get stack traces for those functions. Usually this means we would compile the appropriate trampolines inside `Module::new` and the compiled module object would hold the trampolines. However, we also need to support calling host functions that are wrapped into `wasmtime::Func`s and there doesn't exist any ahead-of-time compiled module object to hold the appropriate trampolines: ```rust // Define a host function. let func_type = wasmtime::FuncType::new( vec![wasmtime::ValType::I32], vec![wasmtime::ValType::I32], ); let func = Func::new(&mut store, func_type, \|_, params, results\| { // ... Ok(()) }); // Call that host function. let mut results = vec![wasmtime::Val::I32(0)]; func.call(&[wasmtime::Val::I32(0)], &mut results)?; ``` Therefore, we define one host-to-Wasm trampoline and one Wasm-to-host trampoline in assembly that work for all Wasm and host function signatures. These trampolines are careful to only use volatile registers, avoid touching any register that is an argument in the calling convention ABI, and tail call to the target callee function. This allows forwarding any set of arguments and any returns to and from the callee, while also allowing us to maintain our linked list of Wasm stack and frame pointers before transferring control to the callee. These trampolines are not used in Wasm-to-Wasm calls, only when crossing the host-Wasm boundary, so they do not impose overhead on regular calls. (And if using one trampoline for all host-Wasm boundary crossing ever breaks branch prediction enough in the CPU to become any kind of bottleneck, we can do fun things like have multiple copies of the same trampoline and choose a random copy for each function, sharding the functions across branch predictor entries.) Finally, this commit also ends the use of a synthetic `Module` and allocating a stubbed out `VMContext` for host functions. Instead, we define a `VMHostFuncContext` with its own magic value, similar to `VMComponentContext`, specifically for host functions. <h2>Benchmarks</h2> <h3>Traps and Stack Traces</h3> Large improvements to taking stack traces on traps, ranging from shaving off 64% to 99.95% of the time it used to take. <details> ``` multi-threaded-traps/0 time: [2.5686 us 2.5808 us 2.5934 us] thrpt: [0.0000 elem/s 0.0000 elem/s 0.0000 elem/s] change: time: [-85.419% -85.153% -84.869%] (p = 0.00 < 0.05) thrpt: [+560.90% +573.56% +585.84%] Performance has improved. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe multi-threaded-traps/1 time: [2.9021 us 2.9167 us 2.9322 us] thrpt: [341.04 Kelem/s 342.86 Kelem/s 344.58 Kelem/s] change: time: [-91.455% -91.294% -91.096%] (p = 0.00 < 0.05) thrpt: [+1023.1% +1048.6% +1070.3%] Performance has improved. Found 6 outliers among 100 measurements (6.00%) 1 (1.00%) high mild 5 (5.00%) high severe multi-threaded-traps/2 time: [2.9996 us 3.0145 us 3.0295 us] thrpt: [660.18 Kelem/s 663.47 Kelem/s 666.76 Kelem/s] change: time: [-94.040% -93.910% -93.762%] (p = 0.00 < 0.05) thrpt: [+1503.1% +1542.0% +1578.0%] Performance has improved. Found 5 outliers among 100 measurements (5.00%) 5 (5.00%) high severe multi-threaded-traps/4 time: [5.5768 us 5.6052 us 5.6364 us] thrpt: [709.68 Kelem/s 713.63 Kelem/s 717.25 Kelem/s] change: time: [-93.193% -93.121% -93.052%] (p = 0.00 < 0.05) thrpt: [+1339.2% +1353.6% +1369.1%] Performance has improved. multi-threaded-traps/8 time: [8.6408 us 9.1212 us 9.5438 us] thrpt: [838.24 Kelem/s 877.08 Kelem/s 925.84 Kelem/s] change: time: [-94.754% -94.473% -94.202%] (p = 0.00 < 0.05) thrpt: [+1624.7% +1709.2% +1806.1%] Performance has improved. multi-threaded-traps/16 time: [10.152 us 10.840 us 11.545 us] thrpt: [1.3858 Melem/s 1.4760 Melem/s 1.5761 Melem/s] change: time: [-97.042% -96.823% -96.577%] (p = 0.00 < 0.05) thrpt: [+2821.5% +3048.1% +3281.1%] Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild many-modules-registered-traps/1 time: [2.6278 us 2.6361 us 2.6447 us] thrpt: [378.11 Kelem/s 379.35 Kelem/s 380.55 Kelem/s] change: time: [-85.311% -85.108% -84.909%] (p = 0.00 < 0.05) thrpt: [+562.65% +571.51% +580.76%] Performance has improved. Found 9 outliers among 100 measurements (9.00%) 3 (3.00%) high mild 6 (6.00%) high severe many-modules-registered-traps/8 time: [2.6294 us 2.6460 us 2.6623 us] thrpt: [3.0049 Melem/s 3.0235 Melem/s 3.0425 Melem/s] change: time: [-85.895% -85.485% -85.022%] (p = 0.00 < 0.05) thrpt: [+567.63% +588.95% +608.95%] Performance has improved. Found 8 outliers among 100 measurements (8.00%) 3 (3.00%) high mild 5 (5.00%) high severe many-modules-registered-traps/64 time: [2.6218 us 2.6329 us 2.6452 us] thrpt: [24.195 Melem/s 24.308 Melem/s 24.411 Melem/s] change: time: [-93.629% -93.551% -93.470%] (p = 0.00 < 0.05) thrpt: [+1431.4% +1450.6% +1469.5%] Performance has improved. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild many-modules-registered-traps/512 time: [2.6569 us 2.6737 us 2.6923 us] thrpt: [190.17 Melem/s 191.50 Melem/s 192.71 Melem/s] change: time: [-99.277% -99.268% -99.260%] (p = 0.00 < 0.05) thrpt: [+13417% +13566% +13731%] Performance has improved. Found 4 outliers among 100 measurements (4.00%) 4 (4.00%) high mild many-modules-registered-traps/4096 time: [2.7258 us 2.7390 us 2.7535 us] thrpt: [1.4876 Gelem/s 1.4955 Gelem/s 1.5027 Gelem/s] change: time: [-99.956% -99.955% -99.955%] (p = 0.00 < 0.05) thrpt: [+221417% +223380% +224881%] Performance has improved. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe many-stack-frames-traps/1 time: [1.4658 us 1.4719 us 1.4784 us] thrpt: [676.39 Kelem/s 679.38 Kelem/s 682.21 Kelem/s] change: time: [-90.368% -89.947% -89.586%] (p = 0.00 < 0.05) thrpt: [+860.23% +894.72% +938.21%] Performance has improved. Found 8 outliers among 100 measurements (8.00%) 5 (5.00%) high mild 3 (3.00%) high severe many-stack-frames-traps/8 time: [2.4772 us 2.4870 us 2.4973 us] thrpt: [3.2034 Melem/s 3.2167 Melem/s 3.2294 Melem/s] change: time: [-85.550% -85.370% -85.199%] (p = 0.00 < 0.05) thrpt: [+575.65% +583.51% +592.03%] Performance has improved. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe many-stack-frames-traps/64 time: [10.109 us 10.171 us 10.236 us] thrpt: [6.2525 Melem/s 6.2925 Melem/s 6.3309 Melem/s] change: time: [-78.144% -77.797% -77.336%] (p = 0.00 < 0.05) thrpt: [+341.22% +350.38% +357.55%] Performance has improved. Found 7 outliers among 100 measurements (7.00%) 5 (5.00%) high mild 2 (2.00%) high severe many-stack-frames-traps/512 time: [126.16 us 126.54 us 126.96 us] thrpt: [4.0329 Melem/s 4.0461 Melem/s 4.0583 Melem/s] change: time: [-65.364% -64.933% -64.453%] (p = 0.00 < 0.05) thrpt: [+181.32% +185.17% +188.71%] Performance has improved. Found 4 outliers among 100 measurements (4.00%) 4 (4.00%) high severe ``` </details> <h3>Calls</h3> There is, however, a small regression in raw Wasm-to-host and host-to-Wasm call performance due the new trampolines. It seems to be on the order of about 2-10 nanoseconds per call, depending on the benchmark. I believe this regression is ultimately acceptable because 1. this overhead will be vastly dominated by whatever work a non-nop callee actually does, 2. we will need these trampolines, or something like them, when implementing the Wasm exceptions proposal to do things like translate Wasm's exceptions into Rust's `Result`s, 3. and because the performance improvements to trapping and capturing stack traces are of such a larger magnitude than this call regressions. <details> ``` sync/no-hook/host-to-wasm - typed - nop time: [28.683 ns 28.757 ns 28.844 ns] change: [+16.472% +17.183% +17.904%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 1 (1.00%) low mild 4 (4.00%) high mild 5 (5.00%) high severe sync/no-hook/host-to-wasm - untyped - nop time: [42.515 ns 42.652 ns 42.841 ns] change: [+12.371% +14.614% +17.462%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 1 (1.00%) high mild 10 (10.00%) high severe sync/no-hook/host-to-wasm - unchecked - nop time: [33.936 ns 34.052 ns 34.179 ns] change: [+25.478% +26.938% +28.369%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 7 (7.00%) high mild 2 (2.00%) high severe sync/no-hook/host-to-wasm - typed - nop-params-and-results time: [34.290 ns 34.388 ns 34.502 ns] change: [+40.802% +42.706% +44.526%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 5 (5.00%) high mild 8 (8.00%) high severe sync/no-hook/host-to-wasm - untyped - nop-params-and-results time: [62.546 ns 62.721 ns 62.919 ns] change: [+2.5014% +3.6319% +4.8078%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 2 (2.00%) high mild 10 (10.00%) high severe sync/no-hook/host-to-wasm - unchecked - nop-params-and-results time: [42.609 ns 42.710 ns 42.831 ns] change: [+20.966% +22.282% +23.475%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 4 (4.00%) high mild 7 (7.00%) high severe sync/hook-sync/host-to-wasm - typed - nop time: [29.546 ns 29.675 ns 29.818 ns] change: [+20.693% +21.794% +22.836%] (p = 0.00 < 0.05) Performance has regressed. Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) high mild 2 (2.00%) high severe sync/hook-sync/host-to-wasm - untyped - nop time: [45.448 ns 45.699 ns 45.961 ns] change: [+17.204% +18.514% +19.590%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 4 (4.00%) high mild 10 (10.00%) high severe sync/hook-sync/host-to-wasm - unchecked - nop time: [34.334 ns 34.437 ns 34.558 ns] change: [+23.225% +24.477% +25.886%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 5 (5.00%) high mild 7 (7.00%) high severe sync/hook-sync/host-to-wasm - typed - nop-params-and-results time: [36.594 ns 36.763 ns 36.974 ns] change: [+41.967% +47.261% +52.086%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 3 (3.00%) high mild 9 (9.00%) high severe sync/hook-sync/host-to-wasm - untyped - nop-params-and-results time: [63.541 ns 63.831 ns 64.194 ns] change: [-4.4337% -0.6855% +2.7134%] (p = 0.73 > 0.05) No change in performance detected. Found 8 outliers among 100 measurements (8.00%) 6 (6.00%) high mild 2 (2.00%) high severe sync/hook-sync/host-to-wasm - unchecked - nop-params-and-results time: [43.968 ns 44.169 ns 44.437 ns] change: [+18.772% +21.802% +24.623%] (p = 0.00 < 0.05) Performance has regressed. Found 15 outliers among 100 measurements (15.00%) 3 (3.00%) high mild 12 (12.00%) high severe async/no-hook/host-to-wasm - typed - nop time: [4.9612 us 4.9743 us 4.9889 us] change: [+9.9493% +11.911% +13.502%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 6 (6.00%) high mild 4 (4.00%) high severe async/no-hook/host-to-wasm - untyped - nop time: [5.0030 us 5.0211 us 5.0439 us] change: [+10.841% +11.873% +12.977%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) high mild 7 (7.00%) high severe async/no-hook/host-to-wasm - typed - nop-params-and-results time: [4.9273 us 4.9468 us 4.9700 us] change: [+4.7381% +6.8445% +8.8238%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 5 (5.00%) high mild 9 (9.00%) high severe async/no-hook/host-to-wasm - untyped - nop-params-and-results time: [5.1151 us 5.1338 us 5.1555 us] change: [+9.5335% +11.290% +13.044%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 3 (3.00%) high mild 13 (13.00%) high severe async/hook-sync/host-to-wasm - typed - nop time: [4.9330 us 4.9394 us 4.9467 us] change: [+10.046% +11.038% +12.035%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 5 (5.00%) high mild 7 (7.00%) high severe async/hook-sync/host-to-wasm - untyped - nop time: [5.0073 us 5.0183 us 5.0310 us] change: [+9.3828% +10.565% +11.752%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 3 (3.00%) high mild 5 (5.00%) high severe async/hook-sync/host-to-wasm - typed - nop-params-and-results time: [4.9610 us 4.9839 us 5.0097 us] change: [+9.0857% +11.513% +14.359%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 7 (7.00%) high mild 6 (6.00%) high severe async/hook-sync/host-to-wasm - untyped - nop-params-and-results time: [5.0995 us 5.1272 us 5.1617 us] change: [+9.3600% +11.506% +13.809%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 6 (6.00%) high mild 4 (4.00%) high severe async-pool/no-hook/host-to-wasm - typed - nop time: [2.4242 us 2.4316 us 2.4396 us] change: [+7.8756% +8.8803% +9.8346%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 5 (5.00%) high mild 3 (3.00%) high severe async-pool/no-hook/host-to-wasm - untyped - nop time: [2.5102 us 2.5155 us 2.5210 us] change: [+12.130% +13.194% +14.270%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 4 (4.00%) high mild 8 (8.00%) high severe async-pool/no-hook/host-to-wasm - typed - nop-params-and-results time: [2.4203 us 2.4310 us 2.4440 us] change: [+4.0380% +6.3623% +8.7534%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 5 (5.00%) high mild 9 (9.00%) high severe async-pool/no-hook/host-to-wasm - untyped - nop-params-and-results time: [2.5501 us 2.5593 us 2.5700 us] change: [+8.8802% +10.976% +12.937%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 5 (5.00%) high mild 11 (11.00%) high severe async-pool/hook-sync/host-to-wasm - typed - nop time: [2.4135 us 2.4190 us 2.4254 us] change: [+8.3640% +9.3774% +10.435%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 6 (6.00%) high mild 5 (5.00%) high severe async-pool/hook-sync/host-to-wasm - untyped - nop time: [2.5172 us 2.5248 us 2.5357 us] change: [+11.543% +12.750% +13.982%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 1 (1.00%) high mild 7 (7.00%) high severe async-pool/hook-sync/host-to-wasm - typed - nop-params-and-results time: [2.4214 us 2.4353 us 2.4532 us] change: [+1.5158% +5.0872% +8.6765%] (p = 0.00 < 0.05) Performance has regressed. Found 15 outliers among 100 measurements (15.00%) 2 (2.00%) high mild 13 (13.00%) high severe async-pool/hook-sync/host-to-wasm - untyped - nop-params-and-results time: [2.5499 us 2.5607 us 2.5748 us] change: [+10.146% +12.459% +14.919%] (p = 0.00 < 0.05) Performance has regressed. Found 18 outliers among 100 measurements (18.00%) 3 (3.00%) high mild 15 (15.00%) high severe sync/no-hook/wasm-to-host - nop - typed time: [6.6135 ns 6.6288 ns 6.6452 ns] change: [+37.927% +38.837% +39.869%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) high mild 5 (5.00%) high severe sync/no-hook/wasm-to-host - nop-params-and-results - typed time: [15.930 ns 15.993 ns 16.067 ns] change: [+3.9583% +5.6286% +7.2430%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 11 (11.00%) high mild 1 (1.00%) high severe sync/no-hook/wasm-to-host - nop - untyped time: [20.596 ns 20.640 ns 20.690 ns] change: [+4.3293% +5.2047% +6.0935%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 5 (5.00%) high mild 5 (5.00%) high severe sync/no-hook/wasm-to-host - nop-params-and-results - untyped time: [42.659 ns 42.882 ns 43.159 ns] change: [-2.1466% -0.5079% +1.2554%] (p = 0.58 > 0.05) No change in performance detected. Found 15 outliers among 100 measurements (15.00%) 1 (1.00%) high mild 14 (14.00%) high severe sync/no-hook/wasm-to-host - nop - unchecked time: [10.671 ns 10.691 ns 10.713 ns] change: [+83.911% +87.620% +92.062%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) high mild 7 (7.00%) high severe sync/no-hook/wasm-to-host - nop-params-and-results - unchecked time: [11.136 ns 11.190 ns 11.263 ns] change: [-29.719% -28.446% -27.029%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 4 (4.00%) high mild 10 (10.00%) high severe sync/hook-sync/wasm-to-host - nop - typed time: [6.7964 ns 6.8087 ns 6.8226 ns] change: [+21.531% +24.206% +27.331%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 4 (4.00%) high mild 10 (10.00%) high severe sync/hook-sync/wasm-to-host - nop-params-and-results - typed time: [15.865 ns 15.921 ns 15.985 ns] change: [+4.8466% +6.3330% +7.8317%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 3 (3.00%) high mild 13 (13.00%) high severe sync/hook-sync/wasm-to-host - nop - untyped time: [21.505 ns 21.587 ns 21.677 ns] change: [+8.0908% +9.1943% +10.254%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe sync/hook-sync/wasm-to-host - nop-params-and-results - untyped time: [44.018 ns 44.128 ns 44.261 ns] change: [-1.4671% -0.0458% +1.2443%] (p = 0.94 > 0.05) No change in performance detected. Found 14 outliers among 100 measurements (14.00%) 5 (5.00%) high mild 9 (9.00%) high severe sync/hook-sync/wasm-to-host - nop - unchecked time: [11.264 ns 11.326 ns 11.387 ns] change: [+80.225% +81.659% +83.068%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe sync/hook-sync/wasm-to-host - nop-params-and-results - unchecked time: [11.816 ns 11.865 ns 11.920 ns] change: [-29.152% -28.040% -26.957%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 8 (8.00%) high mild 6 (6.00%) high severe async/no-hook/wasm-to-host - nop - typed time: [6.6221 ns 6.6385 ns 6.6569 ns] change: [+43.618% +44.755% +45.965%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 6 (6.00%) high mild 7 (7.00%) high severe async/no-hook/wasm-to-host - nop-params-and-results - typed time: [15.884 ns 15.929 ns 15.983 ns] change: [+3.5987% +5.2053% +6.7846%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 3 (3.00%) high mild 13 (13.00%) high severe async/no-hook/wasm-to-host - nop - untyped time: [20.615 ns 20.702 ns 20.821 ns] change: [+6.9799% +8.1212% +9.2819%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) high mild 8 (8.00%) high severe async/no-hook/wasm-to-host - nop-params-and-results - untyped time: [41.956 ns 42.207 ns 42.521 ns] change: [-4.3057% -2.7730% -1.2428%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 3 (3.00%) high mild 11 (11.00%) high severe async/no-hook/wasm-to-host - nop - unchecked time: [10.440 ns 10.474 ns 10.513 ns] change: [+83.959% +85.826% +87.541%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 5 (5.00%) high mild 6 (6.00%) high severe async/no-hook/wasm-to-host - nop-params-and-results - unchecked time: [11.476 ns 11.512 ns 11.554 ns] change: [-29.857% -28.383% -26.978%] (p = 0.00 < 0.05) Performance has improved. Found 12 outliers among 100 measurements (12.00%) 1 (1.00%) low mild 6 (6.00%) high mild 5 (5.00%) high severe async/no-hook/wasm-to-host - nop - async-typed time: [26.427 ns 26.478 ns 26.532 ns] change: [+6.5730% +7.4676% +8.3983%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) high mild 7 (7.00%) high severe async/no-hook/wasm-to-host - nop-params-and-results - async-typed time: [28.557 ns 28.693 ns 28.880 ns] change: [+1.9099% +3.7332% +5.9731%] (p = 0.00 < 0.05) Performance has regressed. Found 15 outliers among 100 measurements (15.00%) 1 (1.00%) high mild 14 (14.00%) high severe async/hook-sync/wasm-to-host - nop - typed time: [6.7488 ns 6.7630 ns 6.7784 ns] change: [+19.935% +22.080% +23.683%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 4 (4.00%) high mild 5 (5.00%) high severe async/hook-sync/wasm-to-host - nop-params-and-results - typed time: [15.928 ns 16.031 ns 16.149 ns] change: [+5.5188% +6.9567% +8.3839%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 9 (9.00%) high mild 2 (2.00%) high severe async/hook-sync/wasm-to-host - nop - untyped time: [21.930 ns 22.114 ns 22.296 ns] change: [+4.6674% +7.7588% +10.375%] (p = 0.00 < 0.05) Performance has regressed. Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) high mild 1 (1.00%) high severe async/hook-sync/wasm-to-host - nop-params-and-results - untyped time: [42.684 ns 42.858 ns 43.081 ns] change: [-5.2957% -3.4693% -1.6217%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 2 (2.00%) high mild 12 (12.00%) high severe async/hook-sync/wasm-to-host - nop - unchecked time: [11.026 ns 11.053 ns 11.086 ns] change: [+70.751% +72.378% +73.961%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 5 (5.00%) high mild 5 (5.00%) high severe async/hook-sync/wasm-to-host - nop-params-and-results - unchecked time: [11.840 ns 11.900 ns 11.982 ns] change: [-27.977% -26.584% -24.887%] (p = 0.00 < 0.05) Performance has improved. Found 18 outliers among 100 measurements (18.00%) 3 (3.00%) high mild 15 (15.00%) high severe async/hook-sync/wasm-to-host - nop - async-typed time: [27.601 ns 27.709 ns 27.882 ns] change: [+8.1781% +9.1102% +10.030%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 2 (2.00%) low mild 3 (3.00%) high mild 6 (6.00%) high severe async/hook-sync/wasm-to-host - nop-params-and-results - async-typed time: [28.955 ns 29.174 ns 29.413 ns] change: [+1.1226% +3.0366% +5.1126%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 7 (7.00%) high mild 6 (6.00%) high severe async-pool/no-hook/wasm-to-host - nop - typed time: [6.5626 ns 6.5733 ns 6.5851 ns] change: [+40.561% +42.307% +44.514%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 5 (5.00%) high mild 4 (4.00%) high severe async-pool/no-hook/wasm-to-host - nop-params-and-results - typed time: [15.820 ns 15.886 ns 15.969 ns] change: [+4.1044% +5.7928% +7.7122%] (p = 0.00 < 0.05) Performance has regressed. Found 17 outliers among 100 measurements (17.00%) 4 (4.00%) high mild 13 (13.00%) high severe async-pool/no-hook/wasm-to-host - nop - untyped time: [20.481 ns 20.521 ns 20.566 ns] change: [+6.7962% +7.6950% +8.7612%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 6 (6.00%) high mild 5 (5.00%) high severe async-pool/no-hook/wasm-to-host - nop-params-and-results - untyped time: [41.834 ns 41.998 ns 42.189 ns] change: [-3.8185% -2.2687% -0.7541%] (p = 0.01 < 0.05) Change within noise threshold. Found 13 outliers among 100 measurements (13.00%) 3 (3.00%) high mild 10 (10.00%) high severe async-pool/no-hook/wasm-to-host - nop - unchecked time: [10.353 ns 10.380 ns 10.414 ns] change: [+82.042% +84.591% +87.205%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 4 (4.00%) high mild 3 (3.00%) high severe async-pool/no-hook/wasm-to-host - nop-params-and-results - unchecked time: [11.123 ns 11.168 ns 11.228 ns] change: [-30.813% -29.285% -27.874%] (p = 0.00 < 0.05) Performance has improved. Found 12 outliers among 100 measurements (12.00%) 11 (11.00%) high mild 1 (1.00%) high severe async-pool/no-hook/wasm-to-host - nop - async-typed time: [27.442 ns 27.528 ns 27.638 ns] change: [+7.5215% +9.9795% +12.266%] (p = 0.00 < 0.05) Performance has regressed. Found 18 outliers among 100 measurements (18.00%) 3 (3.00%) high mild 15 (15.00%) high severe async-pool/no-hook/wasm-to-host - nop-params-and-results - async-typed time: [29.014 ns 29.148 ns 29.312 ns] change: [+2.0227% +3.4722% +4.9047%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 6 (6.00%) high mild 1 (1.00%) high severe async-pool/hook-sync/wasm-to-host - nop - typed time: [6.7916 ns 6.8116 ns 6.8325 ns] change: [+20.937% +22.050% +23.281%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 5 (5.00%) high mild 6 (6.00%) high severe async-pool/hook-sync/wasm-to-host - nop-params-and-results - typed time: [15.917 ns 15.975 ns 16.051 ns] change: [+4.6404% +6.4217% +8.3075%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 5 (5.00%) high mild 11 (11.00%) high severe async-pool/hook-sync/wasm-to-host - nop - untyped time: [21.558 ns 21.612 ns 21.679 ns] change: [+8.1158% +9.1409% +10.217%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) high mild 7 (7.00%) high severe async-pool/hook-sync/wasm-to-host - nop-params-and-results - untyped time: [42.475 ns 42.614 ns 42.775 ns] change: [-6.3613% -4.4709% -2.7647%] (p = 0.00 < 0.05) Performance has improved. Found 18 outliers among 100 measurements (18.00%) 3 (3.00%) high mild 15 (15.00%) high severe async-pool/hook-sync/wasm-to-host - nop - unchecked time: [11.150 ns 11.195 ns 11.247 ns] change: [+74.424% +77.056% +79.811%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 3 (3.00%) high mild 11 (11.00%) high severe async-pool/hook-sync/wasm-to-host - nop-params-and-results - unchecked time: [11.639 ns 11.695 ns 11.760 ns] change: [-30.212% -29.023% -27.954%] (p = 0.00 < 0.05) Performance has improved. Found 15 outliers among 100 measurements (15.00%) 7 (7.00%) high mild 8 (8.00%) high severe async-pool/hook-sync/wasm-to-host - nop - async-typed time: [27.480 ns 27.712 ns 27.984 ns] change: [+2.9764% +6.5061% +9.8914%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 6 (6.00%) high mild 2 (2.00%) high severe async-pool/hook-sync/wasm-to-host - nop-params-and-results - async-typed time: [29.218 ns 29.380 ns 29.600 ns] change: [+5.2283% +7.7247% +10.822%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 2 (2.00%) high mild 14 (14.00%) high severe ``` </details> * Add s390x support for frame pointer-based stack walking * wasmtime: Allow `Caller::get_export` to get all exports * fuzzing: Add a fuzz target to check that our stack traces are correct We generate Wasm modules that keep track of their own stack as they call and return between functions, and then we periodically check that if the host captures a backtrace, it matches what the Wasm module has recorded. * Remove VM offsets for `VMHostFuncContext` since it isn't used by JIT code * Add doc comment with stack walking implementation notes * Document the extra state that can be passed to `wasmtime_runtime::Backtrace` methods * Add extensive comments for stack walking function * Factor architecture-specific bits of stack walking out into modules * Initialize store-related fields in a vmctx to null when there is no store yet Rather than leaving them as uninitialized data. * Use `set_callee` instead of manually setting the vmctx field * Use a more informative compile error message for unsupported architectures * Document unsafety of `prepare_host_to_wasm_trampoline` * Use `bti c` instead of `hint #34` in inline aarch64 assembly * Remove outdated TODO comment * Remove setting of `last_wasm_exit_fp` in `set_jit_trap` This is no longer needed as the value is plumbed through to the backtrace code directly now. * Only set the stack limit once, in the face of re-entrancy into Wasm * Add comments for s390x-specific stack walking bits * Use the helper macro for all libcalls If we forget to use it, and then trigger a GC from the libcall, that means we could miss stack frames when walking the stack, fail to find live GC refs, and then get use after free bugs. Much less risky to always use the helper macro that takes care of all of that for us. * Use the `asm_sym!` macro in Wasm-to-libcall trampolines This macro handles the macOS-specific underscore prefix stuff for us. * wasmtime: add size and align to `externref` assertion error message * Extend the `stacks` fuzzer to have host frames in between Wasm frames This way we get one or more contiguous sequences of Wasm frames on the stack, instead of exactly one. * Add documentation for aarch64-specific backtrace helpers * Clarify that we only support little-endian aarch64 in trampoline comment * Use `.machine z13` in s390x assembly file Since apparently our CI machines have pretty old assemblers that don't have `.machine z14`. This should be fine though since these trampolines don't make use of anything that is introduced in z14. * Fix aarch64 build * Fix macOS build * Document the `asm_sym!` macro * Add windows support to the `wasmtime-asm-macros` crate * Add windows support to host<--->Wasm trampolines * Fix trap handler build on windows * Run `rustfmt` on s390x trampoline source file * Temporarily disable some assertions about a trap's backtrace in the component model tests Follow up to re-enable this and fix the associated issue: https://github.com/bytecodealliance/wasmtime/issues/4535 * Refactor libcall definitions with less macros This refactors the `libcall!` macro to use the `foreach_builtin_function!` macro to define all of the trampolines. Additionally the macro surrounding each libcall itself is no longer necessary and helps avoid too many macros. * Use `VMOpaqueContext::from_vm_host_func_context` in `VMHostFuncContext::new` * Move `backtrace` module to be submodule of `traphandlers` This avoids making some things `pub(crate)` in `traphandlers` that really shouldn't be. * Fix macOS aarch64 build * Use "i64" instead of "word" in aarch64-specific file * Save/restore entry SP and exit FP/return pointer in the face of panicking imported host functions Also clean up assertions surrounding our saved entry/exit registers. * Put "typed" vs "untyped" in the same position of call benchmark names Regardless if we are doing wasm-to-host or host-to-wasm * Fix stacks test case generator build for new `wasm-encoder` * Fix build for s390x * Expand libcalls in s390x asm * Disable more parts of component tests now that backtrace assertions are a bit tighter * Remove assertion that can maybe fail on s390x Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com> Co-authored-by: Alex Crichton <alex@alexcrichton.com>	2022-07-28 15:46:14 -07:00
Pat Hickey	bffce37050	make backtrace collection a Config field rather than a cargo feature (#4183 ) * sorta working in runtime * wasmtime-runtime: get rid of wasm-backtrace feature * wasmtime: factor to make backtraces recording optional. not configurable yet * get rid of wasm-backtrace features * trap tests: now a Trap optionally contains backtrace * eliminate wasm-backtrace feature * code review fixes * ci: no more wasm-backtrace feature * c_api: backtraces always enabled * config: unwind required by backtraces and ref types * plumbed * test that disabling backtraces works * code review comments * fuzzing generator: wasm_backtrace is a runtime config now * doc fix	2022-05-25 12:25:50 -07:00
Alex Crichton	f1225dfd93	Add a compilation section to disable address maps (#3598 ) * Add a compilation section to disable address maps This commit adds a new `Config::generate_address_map` compilation setting which is used to disable emission of the `.wasmtime.addrmap` section of compiled artifacts. This section is currently around the size of the entire `.text` section itself unfortunately and for size reasons may wish to be omitted. Functionality-wise all that is lost is knowing the precise wasm module offset address of a faulting instruction or in a backtrace of instructions. This also means that if the module has DWARF debugging information available with it Wasmtime isn't able to produce a filename and line number in the backtrace. This option remains enabled by default. This option may not be needed in the future with #3547 perhaps, but in the meantime it seems reasonable enough to support a configuration mode where the section is entirely omitted if the smallest module possible is desired. * Fix some CI issues * Update tests/all/traps.rs Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com> * Do less work in compilation for address maps But only when disabled Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>	2021-12-13 13:48:05 -06:00
Dan Gohman	42b23dac4a	Make the trap name for `unreachable` traps more descriptive. (#3568 ) Following up on WebAssembly/wasi-sdk#210, this makes the trap message for `unreachable` traps more descriptive of what actually caused the trap, so that it doesn't sound like maybe Wasmtime itself executed a `unreachable!()` macro in Rust. Before: ``` wasm trap: unreachable wasm backtrace: [...] ``` After: ``` wasm trap: wasm `unreachable` instruction executed wasm backtrace: [...] ```	2021-11-29 15:55:10 -08:00
Alex Crichton	bcf3544924	Optimize `Func::call` and its C API (#3319 ) * Optimize `Func::call` and its C API This commit is an alternative to #3298 which achieves effectively the same goal of optimizing the `Func::call` API as well as its C API sibling of `wasmtime_func_call`. The strategy taken here is different than #3298 though where a new API isn't created, rather a small tweak to an existing API is done. Specifically this commit handles the major sources of slowness with `Func::call` with: * Looking up the type of a function, to typecheck the arguments with and use to guide how the results should be loaded, no longer hits the rwlock in the `Engine` but instead each `Func` contains its own `FuncType`. This can be an unnecessary allocation for funcs not used with `Func::call`, so this is a downside of this implementation relative to #3298. A mitigating factor, though, is that instance exports are loaded lazily into the `Store` and in theory not too many funcs are active in the store as `Func` objects. * Temporary storage is amortized with a long-lived `Vec` in the `Store` rather than allocating a new vector on each call. This is basically the same strategy as #3294 only applied to different types in different places. Specifically `wasmtime::Store` now retains a `Vec<u128>` for `Func::call`, and the C API retains a `Vec<Val>` for calling `Func::call`. * Finally, an API breaking change is made to `Func::call` and its type signature (as well as `Func::call_async`). Instead of returning `Box<[Val]>` as it did before this function now takes a `results: &mut [Val]` parameter. This allows the caller to manage the allocation and we can amortize-remove it in `wasmtime_func_call` by using space after the parameters in the `Vec<Val>` we're passing in. This change is naturally a breaking change and we'll want to consider it carefully, but mitigating factors are that most embeddings are likely using `TypedFunc::call` instead and this signature taking a mutable slice better aligns with `Func::new` which receives a mutable slice for the results. Overall this change, in the benchmark of "call a nop function from the C API" is not quite as good as #3298. It's still a bit slower, on the order of 15ns, because there's lots of capacity checks around vectors and the type checks are slightly less optimized than before. Overall though this is still significantly better than today because allocations and the rwlock to acquire the type information are both avoided. I personally feel that this change is the best to do because it has less of an API impact than #3298. * Rebase issues	2021-09-21 14:07:05 -05:00
Alex Crichton	3da677796b	Reword env var hint for dwarf debug info (#3081 ) * Reword env var hint for dwarf debug info Try not to declare that more information will indeed be displayed, instead suggest that the output may improve if the env var is set since dwarf debug info wasn't parsed. cc bytecodealliance/wasmtime-go#90 * Fix test assertion	2021-07-15 16:33:47 -05:00
Alex Crichton	7a1b7cdf92	Implement RFC 11: Redesigning Wasmtime's APIs (#2897 ) Implement Wasmtime's new API as designed by RFC 11. This is quite a large commit which has had lots of discussion externally, so for more information it's best to read the RFC thread and the PR thread.	2021-06-03 09:10:53 -05:00
Pat Hickey	0f5bdc6497	only wasi_cap_std_sync and wasi_tokio need to define WasiCtxBuilders (#2917 ) * wasmtime-wasi: re-exporting this WasiCtxBuilder was shadowing the right one wasi-common's WasiCtxBuilder is really only useful wasi_cap_std_sync and wasi_tokio to implement their own Builder on top of. This re-export of wasi-common's is 1. not useful and 2. shadow's the re-export of the right one in sync::. wasi-common: eliminate WasiCtxBuilder, make the builder methods on WasiCtx instead * delete wasi-common::WasiCtxBuilder altogether just put those methods directly on &mut WasiCtx. As a bonus, the sync and tokio WasiCtxBuilder::build functions are no longer fallible! * bench fixes * more test fixes	2021-05-21 12:59:39 -05:00
Alex Crichton	7ec073cef1	Bring back per-thread lazy initialization (#2863 ) * Bring back per-thread lazy initialization Platforms Wasmtime supports may have per-thread initialization that needs to run before WebAssembly. For example Unix needs to setup a sigaltstack and macOS needs to set up mach ports. In #2757 this per-thread setup was moved out of the invocation of a wasm function, relying on the lack of Send for Store to initialize the thread at Store creation time and never worry about it later. This conflicted with [wasmtime's desired multithreading story](https://github.com/bytecodealliance/wasmtime/pull/2812) so a new [`Store::notify_switched_thread` was added](https://github.com/bytecodealliance/wasmtime/pull/2822) to explicitly indicate a Store has moved to another thread (if it unsafely did so). It turns out though that it's not always easy to determine when a `Store` moves to a new thread. For example the Go bindings for Wasmtime are generally unaware when a goroutine switches OS threads. This led to https://github.com/bytecodealliance/wasmtime-go/issues/74 where a SIGILL was left uncaught, making it appear that traps aren't working properly. This commit revisits the decision in #2757 and moves per-thread initialization back into the path of calling into WebAssembly. This is differently from before, though, where there's still only one TLS access on the path of calling into WebAssembly, unlike before where it was a separate access. This allows us to get the speed benefits of #2757 as well as the flexibility benefits of not having to explicitly move a store between threads. With this new ability this commit deletes the recently added `Store::notify_switched_thread` method since it's no longer necessary. * Fix a test compiling	2021-04-28 12:08:27 -05:00
Benjamin Bouvier	ba73b458b8	Introduce a new API that allows notifying that a Store has moved to a new thread (#2822 ) * Introduce a new API that allows notifying that a Store has moved to a new thread * Add backlink to documentation, and mention the new API in the multithreading doc;	2021-04-16 11:15:35 -05:00
Benjamin Bouvier	7588565078	Tweaks some tests for Mac aarch64 - some tests don't pass because of bad interactions with the system's libunwind; ignore them for now. - the page size on mac aarch64 is 16K, not 4K; tweak some tests which were expecting 4K or multiples of 4K pages to use a multiple of host page size instead. - a cranelift-native test needed an update for the new calling convention.	2021-04-07 14:54:50 +02:00
Chris Fallin	cb48ea406e	Switch default to new x86_64 backend. This PR switches the default backend on x86, for both the `cranelift-codegen` crate and for Wasmtime, to the new (`MachInst`-style, `VCode`-based) backend that has been under development and testing for some time now. The old backend is still available by default in builds with the `old-x86-backend` feature, or by requesting `BackendVariant::Legacy` from the appropriate APIs. As part of that switch, it adds some more runtime-configurable plumbing to the testing infrastructure so that tests can be run using the appropriate backend. `clif-util test` is now capable of parsing a backend selector option from filetests and instantiating the correct backend. CI has been updated so that the old x86 backend continues to run its tests, just as we used to run the new x64 backend separately. At some point, we will remove the old x86 backend entirely, once we are satisfied that the new backend has not caused any unforeseen issues and we do not need to revert.	2021-04-02 11:35:53 -07:00
Pat Hickey	ae4c5a9d7f	fixes in tests and examples	2021-03-26 15:37:57 -07:00
Alex Crichton	2697a18d2f	Redo the statically typed `Func` API (#2719 ) * Redo the statically typed `Func` API This commit reimplements the `Func` API with respect to statically typed dispatch. Previously `Func` had a `getN` and `getN_async` family of methods which were implemented for 0 to 16 parameters. The return value of these functions was an `impl Fn(..)` closure with the appropriate parameters and return values. There are a number of downsides with this approach that have become apparent over time: * The addition of `_async` doubled the API surface area (which is quite large here due to one-method-per-number-of-parameters). The [documentation of `Func`][old-docs] are quite verbose and feel "polluted" with all these getters, making it harder to understand the other methods that can be used to interact with a `Func`. * These methods unconditionally pay the cost of returning an owned `impl Fn` with a `'static` lifetime. While cheap, this is still paying the cost for cloning the `Store` effectively and moving data into the closed-over environment. * Storage of the return value into a struct, for example, always requires `Box`-ing the returned closure since it otherwise cannot be named. * Recently I had the desire to implement an "unchecked" path for invoking wasm where you unsafely assert the type signature of a wasm function. Doing this with today's scheme would require doubling (again) the API surface area for both async and synchronous calls, further polluting the documentation. The main benefit of the previous scheme is that by returning a `impl Fn` it was quite easy and ergonomic to actually invoke the function. In practice, though, examples would often have something akin to `.get0::<()>()?()?` which is a lot of things to interpret all at once. Note that `get0` means "0 parameters" yet a type parameter is passed. There's also a double function invocation which looks like a lot of characters all lined up in a row. Overall, I think that the previous design is starting to show too many cracks and deserves a rewrite. This commit is that rewrite. The new design in this commit is to delete the `getN{,_async}` family of functions and instead have a new API: impl Func { fn typed<P, R>(&self) -> Result<&Typed<P, R>>; } impl Typed<P, R> { fn call(&self, params: P) -> Result<R, Trap>; async fn call_async(&self, params: P) -> Result<R, Trap>; } This should entirely replace the current scheme, albeit by slightly losing ergonomics use cases. The idea behind the API is that the existence of `Typed<P, R>` is a "proof" that the underlying function takes `P` and returns `R`. The `Func::typed` method peforms a runtime type-check to ensure that types all match up, and if successful you get a `Typed` value. Otherwise an error is returned. Once you have a `Typed` then, like `Func`, you can either `call` or `call_async`. The difference with a `Typed`, however, is that the params/results are statically known and hence these calls can be much more efficient. This is a much smaller API surface area from before and should greatly simplify the `Func` documentation. There's still a problem where `Func::wrapN_async` produces a lot of functions to document, but that's now the sole offender. It's a nice benefit that the statically-typed-async verisons are now expressed with an `async` function rather than a function-returning-a-future which makes it both more efficient and easier to understand. The type `P` and `R` are intended to either be bare types (e.g. `i32`) or tuples of any length (including 0). At this time `R` is only allowed to be `()` or a bare `i32`-style type because multi-value is not supported with a native ABI (yet). The `P`, however, can be any size of tuples of parameters. This is also where some ergonomics are lost because instead of `f(1, 2)` you now have to write `f.call((1, 2))` (note the double-parens). Similarly `f()` becomes `f.call(())`. Overall I feel that this is a better tradeoff than before. While not universally better due to the loss in ergonomics I feel that this design is much more flexible in terms of what you can do with the return value and also understanding the API surface area (just less to take in). [old-docs]: https://docs.rs/wasmtime/0.24.0/wasmtime/struct.Func.html#method.get0 * Rename Typed to TypedFunc * Implement multi-value returns through `Func::typed` * Fix examples in docs * Fix some more errors * More test fixes * Rebasing and adding `get_typed_func` * Updating tests * Fix typo * More doc tweaks * Tweak visibility on `Func::invoke` * Fix tests again	2021-03-11 14:43:34 -06:00
Peter Huene	54c07d8f16	Implement shared host functions. (#2625 ) * Implement defining host functions at the Config level. This commit introduces defining host functions at the `Config` rather than with `Func` tied to a `Store`. The intention here is to enable a host to define all of the functions once with a `Config` and then use a `Linker` (or directly with `Store::get_host_func`) to use the functions when instantiating a module. This should help improve the performance of use cases where a `Store` is short-lived and redefining the functions at every module instantiation is a noticeable performance hit. This commit adds `add_to_config` to the code generation for Wasmtime's `Wasi` type. The new method adds the WASI functions to the given config as host functions. This commit adds context functions to `Store`: `get` to get a context of a particular type and `set` to set the context on the store. For safety, `set` cannot replace an existing context value of the same type. `Wasi::set_context` was added to set the WASI context for a `Store` when using `Wasi::add_to_config`. * Add `Config::define_host_func_async`. * Make config "async" rather than store. This commit moves the concept of "async-ness" to `Config` rather than `Store`. Note: this is a breaking API change for anyone that's already adopted the new async support in Wasmtime. Now `Config::new_async` is used to create an "async" config and any `Store` associated with that config is inherently "async". This is needed for async shared host functions to have some sanity check during their execution (async host functions, like "async" `Func`, need to be called with the "async" variants). * Update async function tests to smoke async shared host functions. This commit updates the async function tests to also smoke the shared host functions, plus `Func::wrap0_async`. This also changes the "wrap async" method names on `Config` to `wrap$N_host_func_async` to slightly better match what is on `Func`. * Move the instance allocator into `Engine`. This commit moves the instantiated instance allocator from `Config` into `Engine`. This makes certain settings in `Config` no longer order-dependent, which is how `Config` should ideally be. This also removes the confusing concept of the "default" instance allocator, instead opting to construct the on-demand instance allocator when needed. This does alter the semantics of the instance allocator as now each `Engine` gets its own instance allocator rather than sharing a single one between all engines created from a configuration. * Make `Engine::new` return `Result`. This is a breaking API change for anyone using `Engine::new`. As creating the pooling instance allocator may fail (likely cause is not enough memory for the provided limits), instead of panicking when creating an `Engine`, `Engine::new` now returns a `Result`. * Remove `Config::new_async`. This commit removes `Config::new_async` in favor of treating "async support" as any other setting on `Config`. The setting is `Config::async_support`. * Remove order dependency when defining async host functions in `Config`. This commit removes the order dependency where async support must be enabled on the `Config` prior to defining async host functions. The check is now delayed to when an `Engine` is created from the config. * Update WASI example to use shared `Wasi::add_to_config`. This commit updates the WASI example to use `Wasi::add_to_config`. As only a single store and instance are used in the example, it has no semantic difference from the previous example, but the intention is to steer users towards defining WASI on the config and only using `Wasi::add_to_linker` when more explicit scoping of the WASI context is required.	2021-03-11 10:14:03 -06:00
Pat Hickey	8e015ca9e7	traps needed stdio	2021-02-01 15:50:05 -08:00
Pat Hickey	8313a62e46	fix test	2021-01-29 19:54:27 -08:00
Alex Crichton	51c1d4bbd6	Provide filename/line number information in `Trap` (#2452 ) * Provide filename/line number information in `Trap` This commit extends the `Trap` type and `Store` to retain DWARF debug information found in a wasm file unconditionally, if it's present. This then enables us to print filenames and line numbers which point back to actual source code when a trap backtrace is printed. Additionally the `FrameInfo` type has been souped up to return filename/line number information as well. The implementation here is pretty simplistic currently. The meat of all the work happens in `gimli` and `addr2line`, and otherwise wasmtime is just schlepping around bytes of dwarf debuginfo here and there! The general goal here is to assist with debugging when using wasmtime because filenames and line numbers are generally orders of magnitude better even when you already have a stack trace. Another nicety here is that backtraces will display inlined frames (learned through debug information), improving the experience in release mode as well. An example of this is that with this file: ```rust fn main() { panic!("hello"); } ``` we get this stack trace: ``` $ rustc foo.rs --target wasm32-wasi -g $ cargo run foo.wasm Finished dev [unoptimized + debuginfo] target(s) in 0.16s Running `target/debug/wasmtime foo.wasm` thread 'main' panicked at 'hello', foo.rs:2:5 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace Error: failed to run main module `foo.wasm` Caused by: 0: failed to invoke command default 1: wasm trap: unreachable wasm backtrace: 0: 0x6c1c - panic_abort::__rust_start_panic::abort::h2d60298621b1ccbf at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/panic_abort/src/lib.rs:77:17 - __rust_start_panic at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/panic_abort/src/lib.rs:32:5 1: 0x68c7 - rust_panic at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panicking.rs:626:9 2: 0x65a1 - std::panicking::rust_panic_with_hook::h2345fb0909b53e12 at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panicking.rs:596:5 3: 0x1436 - std::panicking::begin_panic::{{closure}}::h106f151a6db8c8fb at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panicking.rs:506:9 4: 0xda8 - std::sys_common::backtrace::__rust_end_short_backtrace::he55aa13f22782798 at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/sys_common/backtrace.rs:153:18 5: 0x1324 - std::panicking::begin_panic::h1727e7d1d719c76f at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panicking.rs:505:12 6: 0xfde - foo::main::h2db1313a64510850 at /Users/acrichton/code/wasmtime/foo.rs:2:5 7: 0x11d5 - core::ops::function::FnOnce::call_once::h20ee1cc04aeff1fc at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/core/src/ops/function.rs:227:5 8: 0xddf - std::sys_common::backtrace::__rust_begin_short_backtrace::h054493e41e27e69c at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/sys_common/backtrace.rs:137:18 9: 0x1d5a - std::rt::lang_start::{{closure}}::hd83784448d3fcb42 at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/rt.rs:66:18 10: 0x69d8 - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::h564d3dad35014917 at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/core/src/ops/function.rs:259:13 - std::panicking::try::do_call::hdca4832ace5a8603 at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panicking.rs:381:40 - std::panicking::try::ha8624a1a6854b456 at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panicking.rs:345:19 - std::panic::catch_unwind::h71421f57cf2bc688 at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panic.rs:382:14 - std::rt::lang_start_internal::h260050c92cd470af at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/rt.rs:51:25 11: 0x1d0c - std::rt::lang_start::h0b4bcf3c5e498224 at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/rt.rs:65:5 12: 0xffc - <unknown>!__original_main 13: 0x393 - __muloti4 at /cargo/registry/src/github.com-1ecc6299db9ec823/compiler_builtins-0.1.35/src/macros.rs:269 ``` This is relatively noisy by default but there's filenames and line numbers! Additionally frame 10 can be seen to have lots of frames inlined into it. All information is always available to the embedder but we could try to handle the `__rust_begin_short_backtrace` and `__rust_end_short_backtrace` markers to trim the backtrace by default as well. The only gotcha here is that it looks like `__muloti4` is out of place. That's because the libc that Rust ships with doesn't have dwarf information, although I'm not sure why we land in that function for symbolizing it... * Add a configuration switch for debuginfo * Control debuginfo by default with `WASM_BACKTRACE_DETAILS` * Try cpp_demangle on demangling as well * Rename to WASMTIME_BACKTRACE_DETAILS	2020-12-01 16:56:23 -06:00
Yury Delendik	b2b7bc10e2	machinst aarch64: New backend unwind (#2313 ) * Unwind information for aarch64 backend.	2020-11-06 08:02:45 -06:00
Alex Crichton	a277cf5ee4	Store `WasmFuncType` in `FuncType` (#2365 ) This commit updates `wasmtime::FuncType` to exactly store an internal `WasmFuncType` from the cranelift crates. This allows us to remove a translation layer when we are given a `FuncType` and want to get an internal cranelift type out as a result. The other major change from this commit was changing the constructor and accessors of `FuncType` to be iterator-based instead of exposing implementation details.	2020-11-05 08:49:03 -06:00
Leonardo Yvens	bde9555793	Add Trap::trap_code (#2309 ) * add Trap::trap_code * Add non-exhaustive wasmtime::TrapCode * wasmtime: Better document TrapCode * move and refactor test	2020-10-27 16:30:45 -05:00
Yury Delendik	de4af90af6	machinst x64: New backend unwind (#2266 ) Addresses unwind for experimental x64 backend. The preliminary code enables backtrace on SystemV call convension.	2020-10-23 15:19:41 -05:00
Benjamin Bouvier	79abcdb035	machinst x64: add testing to the CI;	2020-07-30 10:32:00 +02:00
Yury Delendik	15c68f2cc1	Disconnects `Store` state fields from `Compiler` (#1761 ) * Moves CodeMemory, VMInterrupts and SignatureRegistry from Compiler * CompiledModule holds CodeMemory and GdbJitImageRegistration * Store keeps track of its JIT code * Makes "jit_int.rs" stuff Send+Sync * Adds the threads example.	2020-06-02 13:44:39 -05:00
Nick Fitzgerald	a8ee0554a9	wasmtime: Initial, partial support for `externref` This is enough to get an `externref -> externref` identity function passing. However, `externref`s that are dropped by compiled Wasm code are (safely) leaked. Follow up work will leverage cranelift's stack maps to resolve this issue.	2020-06-01 15:09:51 -07:00
Leonardo Yvens	0b3b9c298e	impl From<anyhow::Error> for Trap (#1753 ) * From<anyhow::Error> for Trap * Add TrapReason::Error * wasmtime: Improve Error to Trap conversion * Remove Trap::message	2020-05-29 15:24:12 -05:00
Alex Crichton	db92dcd990	Update test ignore annotations for aarch64 (#1643 ) Looks like everything is in general passing now so it's probably time to close #1521 and all other remaining tests that are failing are classified under new more focused issues. Closes #1521	2020-05-01 11:24:53 -05:00
Alex Crichton	d1aa86f91a	Add AArch64 tests to CI (#1526 ) * Add AArch64 tests to CI This commit enhances our CI with an AArch64 builder. Currently we have no physical hardware to run on so for now we run all tests in an emulator. The AArch64 build is cross-compiled from x86_64 from Linux. Tests all happen in release mode with a recent version of QEMU (recent version because it's so much faster, and in release mode because debug mode tests take quite a long time in an emulator). The goal here was not to get all tests passing on CI, but rather to get AArch64 running on CI and get it green at the same time. To achieve that goal many tests are now ignored on aarch64 platforms. Many tests fail due to unimplemented functionality in the aarch64 backend (#1521), and all wasmtime tests involving compilation are also disabled due to panicking attempting to generate generate instruction offset information for trap symbolication (#1523). Despite this, though, all Cranelift tests and other wasmtime tests should be runnin on AArch64 through QEMU with this PR. Additionally we'll have an AArch64 binary release of Wasmtime for Linux, although it won't be too useful just yet since it will panic on almost all wasm modules. * Review comments	2020-04-22 12:56:54 -05:00
Dan Gohman	9364eb1d98	Refactor (#1524 ) * Compute instance exports on demand. Instead having instances eagerly compute a Vec of Externs, and bumping the refcount for each Extern, compute Externs on demand. This also enables `Instance::get_export` to avoid doing a linear search. This also means that the closure returned by `get0` and friends now holds an `InstanceHandle` to dynamically hold the instance live rather than being scoped to a lifetime. * Compute module imports and exports on demand too. And compute Extern::ty on demand too. * Add a utility function for computing an ExternType. * Add a utility function for looking up a function's signature. * Add a utility function for computing the ValType of a Global. * Rename wasmtime_environ::Export to EntityIndex. This helps differentiate it from other Export types in the tree, and describes what it is. * Fix a typo in a comment. * Simplify module imports and exports. * Make `Instance::exports` return the export names. This significantly simplifies the public API, as it's relatively common to need the names, and this avoids the need to do a zip with `Module::exports`. This also changes `ImportType` and `ExportType` to have public members instead of private members and accessors, as I find that simplifies the usage particularly in cases where there are temporary instances. * Remove `Instance::module`. This doesn't quite remove `Instance`'s `module` member, it gets a step closer. * Use a InstanceHandle utility function. * Don't consume self in the `Func::get` methods. Instead, just create a closure containing the instance handle and the export for them to call. Use `ExactSizeIterator` to avoid needing separate `num_` methods. Rename `Extern::func()` etc. to `into_func()` etc. * Revise examples to avoid using `nth`. * Add convenience methods to instance for getting specific extern types. * Use the convenience functions in more tests and examples. * Avoid cloning strings for `ImportType` and `ExportType`. * Remove more obviated clone() calls. * Simplify `Func`'s closure state. * Make wasmtime::Export's fields private. This makes them more consistent with ExportType. * Fix compilation error. * Make a lifetime parameter explicit, and use better lifetime names. Instead of 'me, use 'instance and 'module to make it clear what the lifetime is. * More lifetime cleanups.	2020-04-20 15:55:33 -05:00
Alex Crichton	4c82da440a	Move most wasmtime tests into one test suite (#1544 ) * Move most wasmtime tests into one test suite This commit moves most wasmtime tests into a single test suite which gets compiled into one executable instead of having lots of test executables. The goal here is to reduce disk space on CI, and this should be achieved by having fewer executables which means fewer copies of `libwasmtime.rlib` linked across binaries on the system. More importantly though this means that DWARF debug information should only be in one executable rather than duplicated across many. * Share more build caches Globally set `RUSTFLAGS` to `-Dwarnings` instead of individually so all build steps share the same value. * Allow some dead code in cranelift-codegen Prevents having to fix all warnings for all possible feature combinations, only the main ones which come up. * Update some debug file paths	2020-04-17 17:22:12 -05:00

34 Commits