c4fd6a95da582a4a21802d981062c9202746ef19
62 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
46782b18c2 |
wasmtime: Implement fast Wasm stack walking (#4431)
* Always preserve frame pointers in Wasmtime
This allows us to efficiently and simply capture Wasm stacks without maintaining
and synchronizing any safety-critical side tables between the compiler and the
runtime.
* wasmtime: Implement fast Wasm stack walking
Why do we want Wasm stack walking to be fast? Because we capture stacks whenever
there is a trap and traps actually happen fairly frequently with short-lived
programs and WASI's `exit`.
Previously, we would rely on generating the system unwind info (e.g.
`.eh_frame`) and using the system unwinder (via the `backtrace`crate) to walk
the full stack and filter out any non-Wasm stack frames. This can,
unfortunately, be slow for two primary reasons:
1. The system unwinder is doing `O(all-kinds-of-frames)` work rather than
`O(wasm-frames)` work.
2. System unwind info and the system unwinder need to be much more general than
a purpose-built stack walker for Wasm needs to be. It has to handle any kind of
stack frame that any compiler might emit where as our Wasm frames are emitted by
Cranelift and always have frame pointers. This translates into implementation
complexity and general overhead. There can also be unnecessary-for-our-use-cases
global synchronization and locks involved, further slowing down stack walking in
the presence of multiple threads trying to capture stacks in parallel.
This commit introduces a purpose-built stack walker for traversing just our Wasm
frames. To find all the sequences of Wasm-to-Wasm stack frames, and ignore
non-Wasm stack frames, we keep a linked list of `(entry stack pointer, exit
frame pointer)` pairs. This linked list is maintained via Wasm-to-host and
host-to-Wasm trampolines. Within a sequence of Wasm-to-Wasm calls, we can use
frame pointers (which Cranelift preserves) to find the next older Wasm frame on
the stack, and we keep doing this until we reach the entry stack pointer,
meaning that the next older frame will be a host frame.
The trampolines need to avoid a couple stumbling blocks. First, they need to be
compiled ahead of time, since we may not have access to a compiler at
runtime (e.g. if the `cranelift` feature is disabled) but still want to be able
to call functions that have already been compiled and get stack traces for those
functions. Usually this means we would compile the appropriate trampolines
inside `Module::new` and the compiled module object would hold the
trampolines. However, we *also* need to support calling host functions that are
wrapped into `wasmtime::Func`s and there doesn't exist *any* ahead-of-time
compiled module object to hold the appropriate trampolines:
```rust
// Define a host function.
let func_type = wasmtime::FuncType::new(
vec![wasmtime::ValType::I32],
vec![wasmtime::ValType::I32],
);
let func = Func::new(&mut store, func_type, |_, params, results| {
// ...
Ok(())
});
// Call that host function.
let mut results = vec![wasmtime::Val::I32(0)];
func.call(&[wasmtime::Val::I32(0)], &mut results)?;
```
Therefore, we define one host-to-Wasm trampoline and one Wasm-to-host trampoline
in assembly that work for all Wasm and host function signatures. These
trampolines are careful to only use volatile registers, avoid touching any
register that is an argument in the calling convention ABI, and tail call to the
target callee function. This allows forwarding any set of arguments and any
returns to and from the callee, while also allowing us to maintain our linked
list of Wasm stack and frame pointers before transferring control to the
callee. These trampolines are not used in Wasm-to-Wasm calls, only when crossing
the host-Wasm boundary, so they do not impose overhead on regular calls. (And if
using one trampoline for all host-Wasm boundary crossing ever breaks branch
prediction enough in the CPU to become any kind of bottleneck, we can do fun
things like have multiple copies of the same trampoline and choose a random copy
for each function, sharding the functions across branch predictor entries.)
Finally, this commit also ends the use of a synthetic `Module` and allocating a
stubbed out `VMContext` for host functions. Instead, we define a
`VMHostFuncContext` with its own magic value, similar to `VMComponentContext`,
specifically for host functions.
<h2>Benchmarks</h2>
<h3>Traps and Stack Traces</h3>
Large improvements to taking stack traces on traps, ranging from shaving off 64%
to 99.95% of the time it used to take.
<details>
```
multi-threaded-traps/0 time: [2.5686 us 2.5808 us 2.5934 us]
thrpt: [0.0000 elem/s 0.0000 elem/s 0.0000 elem/s]
change:
time: [-85.419% -85.153% -84.869%] (p = 0.00 < 0.05)
thrpt: [+560.90% +573.56% +585.84%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
multi-threaded-traps/1 time: [2.9021 us 2.9167 us 2.9322 us]
thrpt: [341.04 Kelem/s 342.86 Kelem/s 344.58 Kelem/s]
change:
time: [-91.455% -91.294% -91.096%] (p = 0.00 < 0.05)
thrpt: [+1023.1% +1048.6% +1070.3%]
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) high mild
5 (5.00%) high severe
multi-threaded-traps/2 time: [2.9996 us 3.0145 us 3.0295 us]
thrpt: [660.18 Kelem/s 663.47 Kelem/s 666.76 Kelem/s]
change:
time: [-94.040% -93.910% -93.762%] (p = 0.00 < 0.05)
thrpt: [+1503.1% +1542.0% +1578.0%]
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
5 (5.00%) high severe
multi-threaded-traps/4 time: [5.5768 us 5.6052 us 5.6364 us]
thrpt: [709.68 Kelem/s 713.63 Kelem/s 717.25 Kelem/s]
change:
time: [-93.193% -93.121% -93.052%] (p = 0.00 < 0.05)
thrpt: [+1339.2% +1353.6% +1369.1%]
Performance has improved.
multi-threaded-traps/8 time: [8.6408 us 9.1212 us 9.5438 us]
thrpt: [838.24 Kelem/s 877.08 Kelem/s 925.84 Kelem/s]
change:
time: [-94.754% -94.473% -94.202%] (p = 0.00 < 0.05)
thrpt: [+1624.7% +1709.2% +1806.1%]
Performance has improved.
multi-threaded-traps/16 time: [10.152 us 10.840 us 11.545 us]
thrpt: [1.3858 Melem/s 1.4760 Melem/s 1.5761 Melem/s]
change:
time: [-97.042% -96.823% -96.577%] (p = 0.00 < 0.05)
thrpt: [+2821.5% +3048.1% +3281.1%]
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
many-modules-registered-traps/1
time: [2.6278 us 2.6361 us 2.6447 us]
thrpt: [378.11 Kelem/s 379.35 Kelem/s 380.55 Kelem/s]
change:
time: [-85.311% -85.108% -84.909%] (p = 0.00 < 0.05)
thrpt: [+562.65% +571.51% +580.76%]
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
3 (3.00%) high mild
6 (6.00%) high severe
many-modules-registered-traps/8
time: [2.6294 us 2.6460 us 2.6623 us]
thrpt: [3.0049 Melem/s 3.0235 Melem/s 3.0425 Melem/s]
change:
time: [-85.895% -85.485% -85.022%] (p = 0.00 < 0.05)
thrpt: [+567.63% +588.95% +608.95%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) high mild
5 (5.00%) high severe
many-modules-registered-traps/64
time: [2.6218 us 2.6329 us 2.6452 us]
thrpt: [24.195 Melem/s 24.308 Melem/s 24.411 Melem/s]
change:
time: [-93.629% -93.551% -93.470%] (p = 0.00 < 0.05)
thrpt: [+1431.4% +1450.6% +1469.5%]
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
many-modules-registered-traps/512
time: [2.6569 us 2.6737 us 2.6923 us]
thrpt: [190.17 Melem/s 191.50 Melem/s 192.71 Melem/s]
change:
time: [-99.277% -99.268% -99.260%] (p = 0.00 < 0.05)
thrpt: [+13417% +13566% +13731%]
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild
many-modules-registered-traps/4096
time: [2.7258 us 2.7390 us 2.7535 us]
thrpt: [1.4876 Gelem/s 1.4955 Gelem/s 1.5027 Gelem/s]
change:
time: [-99.956% -99.955% -99.955%] (p = 0.00 < 0.05)
thrpt: [+221417% +223380% +224881%]
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
many-stack-frames-traps/1
time: [1.4658 us 1.4719 us 1.4784 us]
thrpt: [676.39 Kelem/s 679.38 Kelem/s 682.21 Kelem/s]
change:
time: [-90.368% -89.947% -89.586%] (p = 0.00 < 0.05)
thrpt: [+860.23% +894.72% +938.21%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
many-stack-frames-traps/8
time: [2.4772 us 2.4870 us 2.4973 us]
thrpt: [3.2034 Melem/s 3.2167 Melem/s 3.2294 Melem/s]
change:
time: [-85.550% -85.370% -85.199%] (p = 0.00 < 0.05)
thrpt: [+575.65% +583.51% +592.03%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
many-stack-frames-traps/64
time: [10.109 us 10.171 us 10.236 us]
thrpt: [6.2525 Melem/s 6.2925 Melem/s 6.3309 Melem/s]
change:
time: [-78.144% -77.797% -77.336%] (p = 0.00 < 0.05)
thrpt: [+341.22% +350.38% +357.55%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
5 (5.00%) high mild
2 (2.00%) high severe
many-stack-frames-traps/512
time: [126.16 us 126.54 us 126.96 us]
thrpt: [4.0329 Melem/s 4.0461 Melem/s 4.0583 Melem/s]
change:
time: [-65.364% -64.933% -64.453%] (p = 0.00 < 0.05)
thrpt: [+181.32% +185.17% +188.71%]
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high severe
```
</details>
<h3>Calls</h3>
There is, however, a small regression in raw Wasm-to-host and host-to-Wasm call
performance due the new trampolines. It seems to be on the order of about 2-10
nanoseconds per call, depending on the benchmark.
I believe this regression is ultimately acceptable because
1. this overhead will be vastly dominated by whatever work a non-nop callee
actually does,
2. we will need these trampolines, or something like them, when implementing the
Wasm exceptions proposal to do things like translate Wasm's exceptions into
Rust's `Result`s,
3. and because the performance improvements to trapping and capturing stack
traces are of such a larger magnitude than this call regressions.
<details>
```
sync/no-hook/host-to-wasm - typed - nop
time: [28.683 ns 28.757 ns 28.844 ns]
change: [+16.472% +17.183% +17.904%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low mild
4 (4.00%) high mild
5 (5.00%) high severe
sync/no-hook/host-to-wasm - untyped - nop
time: [42.515 ns 42.652 ns 42.841 ns]
change: [+12.371% +14.614% +17.462%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) high mild
10 (10.00%) high severe
sync/no-hook/host-to-wasm - unchecked - nop
time: [33.936 ns 34.052 ns 34.179 ns]
change: [+25.478% +26.938% +28.369%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
7 (7.00%) high mild
2 (2.00%) high severe
sync/no-hook/host-to-wasm - typed - nop-params-and-results
time: [34.290 ns 34.388 ns 34.502 ns]
change: [+40.802% +42.706% +44.526%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
5 (5.00%) high mild
8 (8.00%) high severe
sync/no-hook/host-to-wasm - untyped - nop-params-and-results
time: [62.546 ns 62.721 ns 62.919 ns]
change: [+2.5014% +3.6319% +4.8078%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
2 (2.00%) high mild
10 (10.00%) high severe
sync/no-hook/host-to-wasm - unchecked - nop-params-and-results
time: [42.609 ns 42.710 ns 42.831 ns]
change: [+20.966% +22.282% +23.475%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
4 (4.00%) high mild
7 (7.00%) high severe
sync/hook-sync/host-to-wasm - typed - nop
time: [29.546 ns 29.675 ns 29.818 ns]
change: [+20.693% +21.794% +22.836%] (p = 0.00 < 0.05)
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
sync/hook-sync/host-to-wasm - untyped - nop
time: [45.448 ns 45.699 ns 45.961 ns]
change: [+17.204% +18.514% +19.590%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
4 (4.00%) high mild
10 (10.00%) high severe
sync/hook-sync/host-to-wasm - unchecked - nop
time: [34.334 ns 34.437 ns 34.558 ns]
change: [+23.225% +24.477% +25.886%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) high mild
7 (7.00%) high severe
sync/hook-sync/host-to-wasm - typed - nop-params-and-results
time: [36.594 ns 36.763 ns 36.974 ns]
change: [+41.967% +47.261% +52.086%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
3 (3.00%) high mild
9 (9.00%) high severe
sync/hook-sync/host-to-wasm - untyped - nop-params-and-results
time: [63.541 ns 63.831 ns 64.194 ns]
change: [-4.4337% -0.6855% +2.7134%] (p = 0.73 > 0.05)
No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) high mild
2 (2.00%) high severe
sync/hook-sync/host-to-wasm - unchecked - nop-params-and-results
time: [43.968 ns 44.169 ns 44.437 ns]
change: [+18.772% +21.802% +24.623%] (p = 0.00 < 0.05)
Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
3 (3.00%) high mild
12 (12.00%) high severe
async/no-hook/host-to-wasm - typed - nop
time: [4.9612 us 4.9743 us 4.9889 us]
change: [+9.9493% +11.911% +13.502%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
6 (6.00%) high mild
4 (4.00%) high severe
async/no-hook/host-to-wasm - untyped - nop
time: [5.0030 us 5.0211 us 5.0439 us]
change: [+10.841% +11.873% +12.977%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
3 (3.00%) high mild
7 (7.00%) high severe
async/no-hook/host-to-wasm - typed - nop-params-and-results
time: [4.9273 us 4.9468 us 4.9700 us]
change: [+4.7381% +6.8445% +8.8238%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
5 (5.00%) high mild
9 (9.00%) high severe
async/no-hook/host-to-wasm - untyped - nop-params-and-results
time: [5.1151 us 5.1338 us 5.1555 us]
change: [+9.5335% +11.290% +13.044%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
3 (3.00%) high mild
13 (13.00%) high severe
async/hook-sync/host-to-wasm - typed - nop
time: [4.9330 us 4.9394 us 4.9467 us]
change: [+10.046% +11.038% +12.035%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) high mild
7 (7.00%) high severe
async/hook-sync/host-to-wasm - untyped - nop
time: [5.0073 us 5.0183 us 5.0310 us]
change: [+9.3828% +10.565% +11.752%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) high mild
5 (5.00%) high severe
async/hook-sync/host-to-wasm - typed - nop-params-and-results
time: [4.9610 us 4.9839 us 5.0097 us]
change: [+9.0857% +11.513% +14.359%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
7 (7.00%) high mild
6 (6.00%) high severe
async/hook-sync/host-to-wasm - untyped - nop-params-and-results
time: [5.0995 us 5.1272 us 5.1617 us]
change: [+9.3600% +11.506% +13.809%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
6 (6.00%) high mild
4 (4.00%) high severe
async-pool/no-hook/host-to-wasm - typed - nop
time: [2.4242 us 2.4316 us 2.4396 us]
change: [+7.8756% +8.8803% +9.8346%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
async-pool/no-hook/host-to-wasm - untyped - nop
time: [2.5102 us 2.5155 us 2.5210 us]
change: [+12.130% +13.194% +14.270%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
4 (4.00%) high mild
8 (8.00%) high severe
async-pool/no-hook/host-to-wasm - typed - nop-params-and-results
time: [2.4203 us 2.4310 us 2.4440 us]
change: [+4.0380% +6.3623% +8.7534%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
5 (5.00%) high mild
9 (9.00%) high severe
async-pool/no-hook/host-to-wasm - untyped - nop-params-and-results
time: [2.5501 us 2.5593 us 2.5700 us]
change: [+8.8802% +10.976% +12.937%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
5 (5.00%) high mild
11 (11.00%) high severe
async-pool/hook-sync/host-to-wasm - typed - nop
time: [2.4135 us 2.4190 us 2.4254 us]
change: [+8.3640% +9.3774% +10.435%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
6 (6.00%) high mild
5 (5.00%) high severe
async-pool/hook-sync/host-to-wasm - untyped - nop
time: [2.5172 us 2.5248 us 2.5357 us]
change: [+11.543% +12.750% +13.982%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) high mild
7 (7.00%) high severe
async-pool/hook-sync/host-to-wasm - typed - nop-params-and-results
time: [2.4214 us 2.4353 us 2.4532 us]
change: [+1.5158% +5.0872% +8.6765%] (p = 0.00 < 0.05)
Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
2 (2.00%) high mild
13 (13.00%) high severe
async-pool/hook-sync/host-to-wasm - untyped - nop-params-and-results
time: [2.5499 us 2.5607 us 2.5748 us]
change: [+10.146% +12.459% +14.919%] (p = 0.00 < 0.05)
Performance has regressed.
Found 18 outliers among 100 measurements (18.00%)
3 (3.00%) high mild
15 (15.00%) high severe
sync/no-hook/wasm-to-host - nop - typed
time: [6.6135 ns 6.6288 ns 6.6452 ns]
change: [+37.927% +38.837% +39.869%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) high mild
5 (5.00%) high severe
sync/no-hook/wasm-to-host - nop-params-and-results - typed
time: [15.930 ns 15.993 ns 16.067 ns]
change: [+3.9583% +5.6286% +7.2430%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
11 (11.00%) high mild
1 (1.00%) high severe
sync/no-hook/wasm-to-host - nop - untyped
time: [20.596 ns 20.640 ns 20.690 ns]
change: [+4.3293% +5.2047% +6.0935%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
5 (5.00%) high mild
5 (5.00%) high severe
sync/no-hook/wasm-to-host - nop-params-and-results - untyped
time: [42.659 ns 42.882 ns 43.159 ns]
change: [-2.1466% -0.5079% +1.2554%] (p = 0.58 > 0.05)
No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
1 (1.00%) high mild
14 (14.00%) high severe
sync/no-hook/wasm-to-host - nop - unchecked
time: [10.671 ns 10.691 ns 10.713 ns]
change: [+83.911% +87.620% +92.062%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) high mild
7 (7.00%) high severe
sync/no-hook/wasm-to-host - nop-params-and-results - unchecked
time: [11.136 ns 11.190 ns 11.263 ns]
change: [-29.719% -28.446% -27.029%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
4 (4.00%) high mild
10 (10.00%) high severe
sync/hook-sync/wasm-to-host - nop - typed
time: [6.7964 ns 6.8087 ns 6.8226 ns]
change: [+21.531% +24.206% +27.331%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
4 (4.00%) high mild
10 (10.00%) high severe
sync/hook-sync/wasm-to-host - nop-params-and-results - typed
time: [15.865 ns 15.921 ns 15.985 ns]
change: [+4.8466% +6.3330% +7.8317%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
3 (3.00%) high mild
13 (13.00%) high severe
sync/hook-sync/wasm-to-host - nop - untyped
time: [21.505 ns 21.587 ns 21.677 ns]
change: [+8.0908% +9.1943% +10.254%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
sync/hook-sync/wasm-to-host - nop-params-and-results - untyped
time: [44.018 ns 44.128 ns 44.261 ns]
change: [-1.4671% -0.0458% +1.2443%] (p = 0.94 > 0.05)
No change in performance detected.
Found 14 outliers among 100 measurements (14.00%)
5 (5.00%) high mild
9 (9.00%) high severe
sync/hook-sync/wasm-to-host - nop - unchecked
time: [11.264 ns 11.326 ns 11.387 ns]
change: [+80.225% +81.659% +83.068%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe
sync/hook-sync/wasm-to-host - nop-params-and-results - unchecked
time: [11.816 ns 11.865 ns 11.920 ns]
change: [-29.152% -28.040% -26.957%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
8 (8.00%) high mild
6 (6.00%) high severe
async/no-hook/wasm-to-host - nop - typed
time: [6.6221 ns 6.6385 ns 6.6569 ns]
change: [+43.618% +44.755% +45.965%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
6 (6.00%) high mild
7 (7.00%) high severe
async/no-hook/wasm-to-host - nop-params-and-results - typed
time: [15.884 ns 15.929 ns 15.983 ns]
change: [+3.5987% +5.2053% +6.7846%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
3 (3.00%) high mild
13 (13.00%) high severe
async/no-hook/wasm-to-host - nop - untyped
time: [20.615 ns 20.702 ns 20.821 ns]
change: [+6.9799% +8.1212% +9.2819%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
2 (2.00%) high mild
8 (8.00%) high severe
async/no-hook/wasm-to-host - nop-params-and-results - untyped
time: [41.956 ns 42.207 ns 42.521 ns]
change: [-4.3057% -2.7730% -1.2428%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
3 (3.00%) high mild
11 (11.00%) high severe
async/no-hook/wasm-to-host - nop - unchecked
time: [10.440 ns 10.474 ns 10.513 ns]
change: [+83.959% +85.826% +87.541%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe
async/no-hook/wasm-to-host - nop-params-and-results - unchecked
time: [11.476 ns 11.512 ns 11.554 ns]
change: [-29.857% -28.383% -26.978%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low mild
6 (6.00%) high mild
5 (5.00%) high severe
async/no-hook/wasm-to-host - nop - async-typed
time: [26.427 ns 26.478 ns 26.532 ns]
change: [+6.5730% +7.4676% +8.3983%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) high mild
7 (7.00%) high severe
async/no-hook/wasm-to-host - nop-params-and-results - async-typed
time: [28.557 ns 28.693 ns 28.880 ns]
change: [+1.9099% +3.7332% +5.9731%] (p = 0.00 < 0.05)
Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
1 (1.00%) high mild
14 (14.00%) high severe
async/hook-sync/wasm-to-host - nop - typed
time: [6.7488 ns 6.7630 ns 6.7784 ns]
change: [+19.935% +22.080% +23.683%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
4 (4.00%) high mild
5 (5.00%) high severe
async/hook-sync/wasm-to-host - nop-params-and-results - typed
time: [15.928 ns 16.031 ns 16.149 ns]
change: [+5.5188% +6.9567% +8.3839%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
9 (9.00%) high mild
2 (2.00%) high severe
async/hook-sync/wasm-to-host - nop - untyped
time: [21.930 ns 22.114 ns 22.296 ns]
change: [+4.6674% +7.7588% +10.375%] (p = 0.00 < 0.05)
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
async/hook-sync/wasm-to-host - nop-params-and-results - untyped
time: [42.684 ns 42.858 ns 43.081 ns]
change: [-5.2957% -3.4693% -1.6217%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
2 (2.00%) high mild
12 (12.00%) high severe
async/hook-sync/wasm-to-host - nop - unchecked
time: [11.026 ns 11.053 ns 11.086 ns]
change: [+70.751% +72.378% +73.961%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
5 (5.00%) high mild
5 (5.00%) high severe
async/hook-sync/wasm-to-host - nop-params-and-results - unchecked
time: [11.840 ns 11.900 ns 11.982 ns]
change: [-27.977% -26.584% -24.887%] (p = 0.00 < 0.05)
Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
3 (3.00%) high mild
15 (15.00%) high severe
async/hook-sync/wasm-to-host - nop - async-typed
time: [27.601 ns 27.709 ns 27.882 ns]
change: [+8.1781% +9.1102% +10.030%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
2 (2.00%) low mild
3 (3.00%) high mild
6 (6.00%) high severe
async/hook-sync/wasm-to-host - nop-params-and-results - async-typed
time: [28.955 ns 29.174 ns 29.413 ns]
change: [+1.1226% +3.0366% +5.1126%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
7 (7.00%) high mild
6 (6.00%) high severe
async-pool/no-hook/wasm-to-host - nop - typed
time: [6.5626 ns 6.5733 ns 6.5851 ns]
change: [+40.561% +42.307% +44.514%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) high mild
4 (4.00%) high severe
async-pool/no-hook/wasm-to-host - nop-params-and-results - typed
time: [15.820 ns 15.886 ns 15.969 ns]
change: [+4.1044% +5.7928% +7.7122%] (p = 0.00 < 0.05)
Performance has regressed.
Found 17 outliers among 100 measurements (17.00%)
4 (4.00%) high mild
13 (13.00%) high severe
async-pool/no-hook/wasm-to-host - nop - untyped
time: [20.481 ns 20.521 ns 20.566 ns]
change: [+6.7962% +7.6950% +8.7612%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
6 (6.00%) high mild
5 (5.00%) high severe
async-pool/no-hook/wasm-to-host - nop-params-and-results - untyped
time: [41.834 ns 41.998 ns 42.189 ns]
change: [-3.8185% -2.2687% -0.7541%] (p = 0.01 < 0.05)
Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
3 (3.00%) high mild
10 (10.00%) high severe
async-pool/no-hook/wasm-to-host - nop - unchecked
time: [10.353 ns 10.380 ns 10.414 ns]
change: [+82.042% +84.591% +87.205%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe
async-pool/no-hook/wasm-to-host - nop-params-and-results - unchecked
time: [11.123 ns 11.168 ns 11.228 ns]
change: [-30.813% -29.285% -27.874%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
11 (11.00%) high mild
1 (1.00%) high severe
async-pool/no-hook/wasm-to-host - nop - async-typed
time: [27.442 ns 27.528 ns 27.638 ns]
change: [+7.5215% +9.9795% +12.266%] (p = 0.00 < 0.05)
Performance has regressed.
Found 18 outliers among 100 measurements (18.00%)
3 (3.00%) high mild
15 (15.00%) high severe
async-pool/no-hook/wasm-to-host - nop-params-and-results - async-typed
time: [29.014 ns 29.148 ns 29.312 ns]
change: [+2.0227% +3.4722% +4.9047%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
6 (6.00%) high mild
1 (1.00%) high severe
async-pool/hook-sync/wasm-to-host - nop - typed
time: [6.7916 ns 6.8116 ns 6.8325 ns]
change: [+20.937% +22.050% +23.281%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe
async-pool/hook-sync/wasm-to-host - nop-params-and-results - typed
time: [15.917 ns 15.975 ns 16.051 ns]
change: [+4.6404% +6.4217% +8.3075%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
5 (5.00%) high mild
11 (11.00%) high severe
async-pool/hook-sync/wasm-to-host - nop - untyped
time: [21.558 ns 21.612 ns 21.679 ns]
change: [+8.1158% +9.1409% +10.217%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) high mild
7 (7.00%) high severe
async-pool/hook-sync/wasm-to-host - nop-params-and-results - untyped
time: [42.475 ns 42.614 ns 42.775 ns]
change: [-6.3613% -4.4709% -2.7647%] (p = 0.00 < 0.05)
Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
3 (3.00%) high mild
15 (15.00%) high severe
async-pool/hook-sync/wasm-to-host - nop - unchecked
time: [11.150 ns 11.195 ns 11.247 ns]
change: [+74.424% +77.056% +79.811%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
3 (3.00%) high mild
11 (11.00%) high severe
async-pool/hook-sync/wasm-to-host - nop-params-and-results - unchecked
time: [11.639 ns 11.695 ns 11.760 ns]
change: [-30.212% -29.023% -27.954%] (p = 0.00 < 0.05)
Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
7 (7.00%) high mild
8 (8.00%) high severe
async-pool/hook-sync/wasm-to-host - nop - async-typed
time: [27.480 ns 27.712 ns 27.984 ns]
change: [+2.9764% +6.5061% +9.8914%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) high mild
2 (2.00%) high severe
async-pool/hook-sync/wasm-to-host - nop-params-and-results - async-typed
time: [29.218 ns 29.380 ns 29.600 ns]
change: [+5.2283% +7.7247% +10.822%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
2 (2.00%) high mild
14 (14.00%) high severe
```
</details>
* Add s390x support for frame pointer-based stack walking
* wasmtime: Allow `Caller::get_export` to get all exports
* fuzzing: Add a fuzz target to check that our stack traces are correct
We generate Wasm modules that keep track of their own stack as they call and
return between functions, and then we periodically check that if the host
captures a backtrace, it matches what the Wasm module has recorded.
* Remove VM offsets for `VMHostFuncContext` since it isn't used by JIT code
* Add doc comment with stack walking implementation notes
* Document the extra state that can be passed to `wasmtime_runtime::Backtrace` methods
* Add extensive comments for stack walking function
* Factor architecture-specific bits of stack walking out into modules
* Initialize store-related fields in a vmctx to null when there is no store yet
Rather than leaving them as uninitialized data.
* Use `set_callee` instead of manually setting the vmctx field
* Use a more informative compile error message for unsupported architectures
* Document unsafety of `prepare_host_to_wasm_trampoline`
* Use `bti c` instead of `hint #34` in inline aarch64 assembly
* Remove outdated TODO comment
* Remove setting of `last_wasm_exit_fp` in `set_jit_trap`
This is no longer needed as the value is plumbed through to the backtrace code
directly now.
* Only set the stack limit once, in the face of re-entrancy into Wasm
* Add comments for s390x-specific stack walking bits
* Use the helper macro for all libcalls
If we forget to use it, and then trigger a GC from the libcall, that means we
could miss stack frames when walking the stack, fail to find live GC refs, and
then get use after free bugs. Much less risky to always use the helper macro
that takes care of all of that for us.
* Use the `asm_sym!` macro in Wasm-to-libcall trampolines
This macro handles the macOS-specific underscore prefix stuff for us.
* wasmtime: add size and align to `externref` assertion error message
* Extend the `stacks` fuzzer to have host frames in between Wasm frames
This way we get one or more contiguous sequences of Wasm frames on the stack,
instead of exactly one.
* Add documentation for aarch64-specific backtrace helpers
* Clarify that we only support little-endian aarch64 in trampoline comment
* Use `.machine z13` in s390x assembly file
Since apparently our CI machines have pretty old assemblers that don't have
`.machine z14`. This should be fine though since these trampolines don't make
use of anything that is introduced in z14.
* Fix aarch64 build
* Fix macOS build
* Document the `asm_sym!` macro
* Add windows support to the `wasmtime-asm-macros` crate
* Add windows support to host<--->Wasm trampolines
* Fix trap handler build on windows
* Run `rustfmt` on s390x trampoline source file
* Temporarily disable some assertions about a trap's backtrace in the component model tests
Follow up to re-enable this and fix the associated issue:
https://github.com/bytecodealliance/wasmtime/issues/4535
* Refactor libcall definitions with less macros
This refactors the `libcall!` macro to use the
`foreach_builtin_function!` macro to define all of the trampolines.
Additionally the macro surrounding each libcall itself is no longer
necessary and helps avoid too many macros.
* Use `VMOpaqueContext::from_vm_host_func_context` in `VMHostFuncContext::new`
* Move `backtrace` module to be submodule of `traphandlers`
This avoids making some things `pub(crate)` in `traphandlers` that really
shouldn't be.
* Fix macOS aarch64 build
* Use "i64" instead of "word" in aarch64-specific file
* Save/restore entry SP and exit FP/return pointer in the face of panicking imported host functions
Also clean up assertions surrounding our saved entry/exit registers.
* Put "typed" vs "untyped" in the same position of call benchmark names
Regardless if we are doing wasm-to-host or host-to-wasm
* Fix stacks test case generator build for new `wasm-encoder`
* Fix build for s390x
* Expand libcalls in s390x asm
* Disable more parts of component tests now that backtrace assertions are a bit tighter
* Remove assertion that can maybe fail on s390x
Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
|
||
|
|
82a31680d6 |
Use a StoreOpaque during backtraces for metadata (#4325)
Previous to this commit Wasmtime would use the `GlobalModuleRegistry` when learning information about a trap such as its trap code, the symbols for each frame, etc. This has a downside though of holding a global read-write lock for the duration of this operation which hinders registration of new modules in parallel. In addition there was a fair amount of internal duplication between this "global module registry" and the store-local module registry. Finally relying on global state for information like this gets a bit more brittle over time as it seems best to scope global queries to precisely what's necessary rather than holding extra information. With the refactoring in wasm backtraces done in #4183 it's now possible to always have a `StoreOpaque` reference when a backtrace is collected for symbolication and otherwise Trap-identification purposes. This commit adds a `StoreOpaque` parameter to the `Trap::from_runtime` constructor and then plumbs that everywhere. Note that while doing this I changed the internal `traphandlers::lazy_per_thread_init` function to no longer return a `Result` and instead just `panic!` on Unix if memory couldn't be allocated for a stack. This removed quite a lot of error-handling code for a case that's expected to quite rarely happen. If necessary in the future we can add a fallible initialization point but this feels like a better default balance for the code here. With a `StoreOpaque` in use when a trap is being symbolicated that means we have a `ModuleRegistry` which can be used for queries and such. This meant that the `GlobalModuleRegistry` state could largely be dismantled and moved to per-`Store` state (within the `ModuleRegistry`, mostly just moving methods around). The final state is that the global rwlock is not exclusively scoped around insertions/deletions/`is_wasm_trap_pc` which is just a lookup and atomic add. Otherwise symbolication for a backtrace exclusively uses store-local state now (as intended). The original motivation for this commit was that frame information lookup and pieces were looking to get somewhat complicated with the addition of components which are a new vector of traps coming out of Cranelift-generated code. My hope is that by having a `Store` around for more operations it's easier to plumb all this through. |
||
|
|
77e06213b7 |
Refactor the internals of traps in wasmtime_runtime (#4326)
This commit is a small refactoring of `wasmtime_runtime::Trap` and various internals. The `Trap` structure is now a reason plus backtrace, and the old `Trap` enum is mostly in `TrapReason` now. Additionally all `Trap`-returning methods of `wasmtime_runtime` are changed to returning a `TrapCode` to indicate that they never capture a backtrace. Finally the `UnwindReason` internally now no longer duplicates the trap reasons, instead only having two variants of "panic" and "trap". The motivation for this commit is mostly just cleaning up trap internals and removing the need for methods like `wasmtime_runtime::Trap::insert_backtrace` to leave it only happening at the `wasmtime` layer. |
||
|
|
7d3639522e |
Capture unresolved backtraces on traps (#4193)
I was running tests recently and was surprised that the `--test all` test was taking more than a minute to run when I didn't recall it ever taking more than a minute historically. A bisection pointed out #4183 as the cause and after re-reviewing I realized I forgot that we capture unresolved backtraces by default (and don't actually resolve them anywhere yet but that's a problem for another day) rather than resolved backtraces. This means that it's intended that we use `Backtrace::new_unresolved` instead of `Backtrace::new` in the traphandlers crate. The reason that tests were running so slowly is that the tests which deal with deep stacks (e.g. stack overflow) would take forever in testing as the Rust-based decoding of DWARF information is egregiously slow in unoptimized mode. I did discover independently that optimizing these dependencies makes the tests ~6x faster, but that's irrelevant if we're not symbolicating in the first place. |
||
|
|
bffce37050 |
make backtrace collection a Config field rather than a cargo feature (#4183)
* sorta working in runtime * wasmtime-runtime: get rid of wasm-backtrace feature * wasmtime: factor to make backtraces recording optional. not configurable yet * get rid of wasm-backtrace features * trap tests: now a Trap optionally contains backtrace * eliminate wasm-backtrace feature * code review fixes * ci: no more wasm-backtrace feature * c_api: backtraces always enabled * config: unwind required by backtraces and ref types * plumbed * test that disabling backtraces works * code review comments * fuzzing generator: wasm_backtrace is a runtime config now * doc fix |
||
|
|
3f9bff17c8 |
Support disabling backtraces at compile time (#3932)
* Support disabling backtraces at compile time This commit adds support to Wasmtime to disable, at compile time, the gathering of backtraces on traps. The `wasmtime` crate now sports a `wasm-backtrace` feature which, when disabled, will mean that backtraces are never collected at compile time nor are unwinding tables inserted into compiled objects. The motivation for this commit stems from the fact that generating a backtrace is quite a slow operation. Currently backtrace generation is done with libunwind and `_Unwind_Backtrace` typically found in glibc or other system libraries. When thousands of modules are loaded into the same process though this means that the initial backtrace can take nearly half a second and all subsequent backtraces can take upwards of hundreds of milliseconds. Relative to all other operations in Wasmtime this is extremely expensive at this time. In the future we'd like to implement a more performant backtrace scheme but such an implementation would require coordination with Cranelift and is a big chunk of work that may take some time, so in the meantime if embedders don't need a backtrace they can still use this option to disable backtraces at compile time and avoid the performance pitfalls of collecting backtraces. In general I tried to originally make this a runtime configuration option but ended up opting for a compile-time option because `Trap::new` otherwise has no arguments and always captures a backtrace. By making this a compile-time option it was possible to configure, statically, the behavior of `Trap::new`. Additionally I also tried to minimize the amount of `#[cfg]` necessary by largely only having it at the producer and consumer sites. Also a noteworthy restriction of this implementation is that if backtrace support is disabled at compile time then reference types support will be unconditionally disabled at runtime. With backtrace support disabled there's no way to trace the stack of wasm frames which means that GC can't happen given our current implementation. * Always enable backtraces for the C API |
||
|
|
c22033bf93 |
Delete historical interruptable support in Wasmtime (#3925)
* Delete historical interruptable support in Wasmtime This commit removes the `Config::interruptable` configuration along with the `InterruptHandle` type from the `wasmtime` crate. The original support for adding interruption to WebAssembly was added pretty early on in the history of Wasmtime when there was no other method to prevent an infinite loop from the host. Nowadays, however, there are alternative methods for interruption such as fuel or epoch-based interruption. One of the major downsides of `Config::interruptable` is that even when it's not enabled it forces an atomic swap to happen when entering WebAssembly code. This technically could be a non-atomic swap if the configuration option isn't enabled but that produces even more branch-y code on entry into WebAssembly which is already something we try to optimize. Calling into WebAssembly is on the order of a dozens of nanoseconds at this time and an atomic swap, even uncontended, can add up to 5ns on some platforms. The main goal of this PR is to remove this atomic swap on entry into WebAssembly. This is done by removing the `Config::interruptable` field entirely, moving all existing consumers to epochs instead which are suitable for the same purposes. This means that the stack overflow check is no longer entangled with the interruption check and perhaps one day we could continue to optimize that further as well. Some consequences of this change are: * Epochs are now the only method of remote-thread interruption. * There are no more Wasmtime traps that produces the `Interrupted` trap code, although we may wish to move future traps to this so I left it in place. * The C API support for interrupt handles was also removed and bindings for epoch methods were added. * Function-entry checks for interruption are a tiny bit less efficient since one check is performed for the stack limit and a second is performed for the epoch as opposed to the `Config::interruptable` style of bundling the stack limit and the interrupt check in one. It's expected though that this is likely to not really be measurable. * The old `VMInterrupts` structure is renamed to `VMRuntimeLimits`. |
||
|
|
62a6a7ab6c |
Use const-initialized thread locals (#3923)
This was a relatively recent feature added to the Rust standard library which should help accelerate calls into WebAssembly slightly. |
||
|
|
2f2c5231b4 | Add Alex's solution for null handling in TlsRestore | ||
|
|
a5007f318f | runtime: use anyhow::Error instead of Box<dyn std::error::Error...> | ||
|
|
c8f55ed688 |
Optimize codegen slightly calling wasm functions
Currently wasm-calls work with `Result<T, Trap>` internally but `Trap` is an enum defined in `wasmtime-runtime` which is actually quite large. Since traps are supposed to be rare this commit changes these functions to return a `Box<Trap>` which is un-boxed later up in the `wasmtime` crate within a `#[cold]` function. |
||
|
|
87c33c2969 |
Remove wasmtime-environ's dependency on cranelift-codegen (#3199)
* Move `CompiledFunction` into wasmtime-cranelift This commit moves the `wasmtime_environ::CompiledFunction` type into the `wasmtime-cranelift` crate. This type has lots of Cranelift-specific pieces of compilation and doesn't need to be generated by all Wasmtime compilers. This replaces the usage in the `Compiler` trait with a `Box<Any>` type that each compiler can select. Each compiler must still produce a `FunctionInfo`, however, which is shared information we'll deserialize for each module. The `wasmtime-debug` crate is also folded into the `wasmtime-cranelift` crate as a result of this commit. One possibility was to move the `CompiledFunction` commit into its own crate and have `wasmtime-debug` depend on that, but since `wasmtime-debug` is Cranelift-specific at this time it didn't seem like it was too too necessary to keep it separate. If `wasmtime-debug` supports other backends in the future we can recreate a new crate, perhaps with it refactored to not depend on Cranelift. * Move wasmtime_environ::reference_type This now belongs in wasmtime-cranelift and nowhere else * Remove `Type` reexport in wasmtime-environ One less dependency on `cranelift-codegen`! * Remove `types` reexport from `wasmtime-environ` Less cranelift! * Remove `SourceLoc` from wasmtime-environ Change the `srcloc`, `start_srcloc`, and `end_srcloc` fields to a custom `FilePos` type instead of `ir::SourceLoc`. These are only used in a few places so there's not much to lose from an extra abstraction for these leaf use cases outside of cranelift. * Remove wasmtime-environ's dep on cranelift's `StackMap` This commit "clones" the `StackMap` data structure in to `wasmtime-environ` to have an independent representation that that chosen by Cranelift. This allows Wasmtime to decouple this runtime dependency of stack map information and let the two evolve independently, if necessary. An alternative would be to refactor cranelift's implementation into a separate crate and have wasmtime depend on that but it seemed a bit like overkill to do so and easier to clone just a few lines for this. * Define code offsets in wasmtime-environ with `u32` Don't use Cranelift's `binemit::CodeOffset` alias to define this field type since the `wasmtime-environ` crate will be losing the `cranelift-codegen` dependency soon. * Commit to using `cranelift-entity` in Wasmtime This commit removes the reexport of `cranelift-entity` from the `wasmtime-environ` crate and instead directly depends on the `cranelift-entity` crate in all referencing crates. The original reason for the reexport was to make cranelift version bumps easier since it's less versions to change, but nowadays we have a script to do that. Otherwise this encourages crates to use whatever they want from `cranelift-entity` since we'll always depend on the whole crate. It's expected that the `cranelift-entity` crate will continue to be a lean crate in dependencies and suitable for use at both runtime and compile time. Consequently there's no need to avoid its usage in Wasmtime at runtime, since "remove Cranelift at compile time" is primarily about the `cranelift-codegen` crate. * Remove most uses of `cranelift-codegen` in `wasmtime-environ` There's only one final use remaining, which is the reexport of `TrapCode`, which will get handled later. * Limit the glob-reexport of `cranelift_wasm` This commit removes the glob reexport of `cranelift-wasm` from the `wasmtime-environ` crate. This is intended to explicitly define what we're reexporting and is a transitionary step to curtail the amount of dependencies taken on `cranelift-wasm` throughout the codebase. For example some functions used by debuginfo mapping are better imported directly from the crate since they're Cranelift-specific. Note that this is intended to be a temporary state affairs, soon this reexport will be gone entirely. Additionally this commit reduces imports from `cranelift_wasm` and also primarily imports from `crate::wasm` within `wasmtime-environ` to get a better sense of what's imported from where and what will need to be shared. * Extract types from cranelift-wasm to cranelift-wasm-types This commit creates a new crate called `cranelift-wasm-types` and extracts type definitions from the `cranelift-wasm` crate into this new crate. The purpose of this crate is to be a shared definition of wasm types that can be shared both by compilers (like Cranelift) as well as wasm runtimes (e.g. Wasmtime). This new `cranelift-wasm-types` crate doesn't depend on `cranelift-codegen` and is the final step in severing the unconditional dependency from Wasmtime to `cranelift-codegen`. The final refactoring in this commit is to then reexport this crate from `wasmtime-environ`, delete the `cranelift-codegen` dependency, and then update all `use` paths to point to these new types. The main change of substance here is that the `TrapCode` enum is mirrored from Cranelift into this `cranelift-wasm-types` crate. While this unfortunately results in three definitions (one more which is non-exhaustive in Wasmtime itself) it's hopefully not too onerous and ideally something we can patch up in the future. * Get lightbeam compiling * Remove unnecessary dependency * Fix compile with uffd * Update publish script * Fix more uffd tests * Rename cranelift-wasm-types to wasmtime-types This reflects the purpose a bit more where it's types specifically intended for Wasmtime and its support. * Fix publish script |
||
|
|
f08491eeca |
Restore POSIX signal handling on MacOS behind a feature flag (#3063)
* Restore POSIX signal handling on MacOS behind a feature flag As described in Issue #3052, the switch to Mach Exception handling removed `unix::StoreExt` from the public API of crate on MacOS. That is a breaking change and makes it difficult for some application to upgrade to the current stable Wasmtime. As a workaround this PR introduces a feature flag called `posix-signals-on-macos` that restores the old behaviour on MacOS. The flag is disabled by default. * Fix test guard * Fix formatting in the test |
||
|
|
724beaf0f3 |
Connect helper C symbols to their static library
This commit adds a `#[link]` annotation to the block defining symbols coming from a native static library that we build and link. This is required by rustc to get symbols to get exported correctly when linking wasmtime into a Rust dynamic library instead of always as an rlib. While I was at it I went ahead and renamed the symbols now that they're no longer in C++ and they're doing setjmp/longjmp and not much else. Closes #3006 |
||
|
|
1136917192 | golf | ||
|
|
ff87f45604 | expose eager thread-local initialization by the Engine | ||
|
|
7a1b7cdf92 |
Implement RFC 11: Redesigning Wasmtime's APIs (#2897)
Implement Wasmtime's new API as designed by RFC 11. This is quite a large commit which has had lots of discussion externally, so for more information it's best to read the RFC thread and the PR thread. |
||
|
|
7ec073cef1 |
Bring back per-thread lazy initialization (#2863)
* Bring back per-thread lazy initialization Platforms Wasmtime supports may have per-thread initialization that needs to run before WebAssembly. For example Unix needs to setup a sigaltstack and macOS needs to set up mach ports. In #2757 this per-thread setup was moved out of the invocation of a wasm function, relying on the lack of Send for Store to initialize the thread at Store creation time and never worry about it later. This conflicted with [wasmtime's desired multithreading story](https://github.com/bytecodealliance/wasmtime/pull/2812) so a new [`Store::notify_switched_thread` was added](https://github.com/bytecodealliance/wasmtime/pull/2822) to explicitly indicate a Store has moved to another thread (if it unsafely did so). It turns out though that it's not always easy to determine when a `Store` moves to a new thread. For example the Go bindings for Wasmtime are generally unaware when a goroutine switches OS threads. This led to https://github.com/bytecodealliance/wasmtime-go/issues/74 where a SIGILL was left uncaught, making it appear that traps aren't working properly. This commit revisits the decision in #2757 and moves per-thread initialization back into the path of calling into WebAssembly. This is differently from before, though, where there's still only one TLS access on the path of calling into WebAssembly, unlike before where it was a separate access. This allows us to get the speed benefits of #2757 as well as the flexibility benefits of not having to explicitly move a store between threads. With this new ability this commit deletes the recently added `Store::notify_switched_thread` method since it's no longer necessary. * Fix a test compiling |
||
|
|
d4b54ee0a8 |
More optimizations for calling into WebAssembly (#2759)
* Combine stack-based cleanups for faster wasm calls This commit is an extension of #2757 where the goal is to optimize entry into WebAssembly. Currently wasmtime has two stack-based cleanups when entering wasm, one for the externref activation table and another for stack limits getting reset. This commit fuses these two cleanups together into one and moves some code around which enables less captures for fewer closures and such to speed up calls in to wasm a bit more. Overall this drops the execution time from 88ns to 80ns locally for me. This also updates the atomic orderings when updating the stack limit from `SeqCst` to `Relaxed`. While `SeqCst` is a reasonable starting point the usage here should be safe to use `Relaxed` since we're not using the atomics to actually protect any memory, it's simply receiving signals from other threads. * Determine whether a pc is wasm via a global map The macOS implementation of traps recently changed to using mach ports for handlers instead of signal handlers. This means that a previously relied upon invariant, each thread fixes its own trap, was broken. The macOS implementation worked around this by maintaining a global map from thread id to thread local information, however, to solve the problem. This global map is quite slow though. It involves taking a lock and updating a hash map on all calls into WebAssembly. In my local testing this accounts for >70% of the overhead of calling into WebAssembly on macOS. Naturally it'd be great to remove this! This commit fixes this issue and removes the global lock/map that is updated on all calls into WebAssembly. The fix is to maintain a global map of wasm modules and their trap addresses in the `wasmtime` crate. Doing so is relatively simple since we're already tracking this information at the `Store` level. Once we've got a global map then the macOS implementation can use this from a foreign thread and everything works out. Locally this brings the overhead, on macOS specifically, of calling into wasm from 80ns to ~20ns. * Fix compiles * Review comments |
||
|
|
c95971ab59 |
Optimize calling a WebAssembly function (#2757)
This commit implements a few optimizations, mainly inlining, that should improve the performance of calling a WebAssembly function. This code path can be quite hot depending on the embedding case and we hadn't really put much effort into optimizing the nitty gritty. The predominant optimization here is adding `#[inline]` to trivial functions so performance is improved without having to compile with LTO. Another optimization is to call `lazy_per_thread_init` when traps are initialized per-thread (when a `Store` is created) rather than each time a function is called. The next optimization is to change the unwind reason in the `CallThreadState` to `MaybeUninit` to avoid extra checks in the default case about whether we need to drop its variants (since in the happy path we never need to drop it). The final optimization is to optimize out a few checks when `async` support is disabled for a small speed boost. In a small benchmark where wasmtime calls a simple wasm function my macOS computer dropped from 110ns to 86ns overhead, a 20% decrease. The macOS overhead is still largely dominated by the global lock acquisition and hash table management for traps right now, but I suspect the Linux overhead is much better (should be on the order of ~30 or so ns). We still have a long way to go to compete with SpiderMonkey which, in testing, seem to have ~6ns overhead in calling the same wasm function on my computer. |
||
|
|
5fecdfa491 |
Mach ports continued + support aarch64-apple unwinding (#2723)
* Switch macOS to using mach ports for trap handling This commit moves macOS to using mach ports instead of signals for handling traps. The motivation for this is listed in #2456, namely that once mach ports are used in a process that means traditional UNIX signal handlers won't get used. This means that if Wasmtime is integrated with Breakpad, for example, then Wasmtime's trap handler never fires and traps don't work. The `traphandlers` module is refactored as part of this commit to split the platform-specific bits into their own files (it was growing quite a lot for one inline `cfg_if!`). The `unix.rs` and `windows.rs` files remain the same as they were before with a few minor tweaks for some refactored interfaces. The `macos.rs` file is brand new and lifts almost its entire implementation from SpiderMonkey, adapted for Wasmtime though. The main gotcha with mach ports is that a separate thread is what services the exception. Some unsafe magic allows this separate thread to read non-`Send` and temporary state from other threads, but is hoped to be safe in this context. The unfortunate downside is that calling wasm on macOS now involves taking a global lock and modifying a global hash map twice-per-call. I'm not entirely sure how to get out of this cost for now, but hopefully for any embeddings on macOS it's not the end of the world. Closes #2456 * Add a sketch of arm64 apple support * store: maintain CallThreadState mapping when switching fibers * cranelift/aarch64: generate unwind directives to disable pointer auth Aarch64 post ARMv8.3 has a feature called pointer authentication, designed to fight ROP/JOP attacks: some pointers may be signed using new instructions, adding payloads to the high (previously unused) bits of the pointers. More on this here: https://lwn.net/Articles/718888/ Unwinders on aarch64 need to know if some pointers contained on the call frame contain an authentication code or not, to be able to properly authenticate them or use them directly. Since native code may have enabled it by default (as is the case on the Mac M1), and the default is that this configuration value is inherited, we need to explicitly disable it, for the only kind of supported pointers (return addresses). To do so, we set the value of a non-existing dwarf pseudo register (34) to 0, as documented in https://github.com/ARM-software/abi-aa/blob/master/aadwarf64/aadwarf64.rst#note-8. This is done at the function granularity, in the spirit of Cranelift compilation model. Alternatively, a single directive could be generated in the CIE, generating less information per module. * Make exception handling work on Mac aarch64 too * fibers: use a breakpoint instruction after the final call in wasmtime_fiber_start Co-authored-by: Alex Crichton <alex@alexcrichton.com> |
||
|
|
918c012d00 |
Fix some issues around TLS management with async (#2709)
This commit fixes a few issues around managing the thread-local state of a wasmtime thread. We intentionally only have a singular TLS variable in the whole world, and the problem is that when stack-switching off an async thread we were not restoring the previous TLS state. This is necessary in two cases: * Futures aren't guaranteed to be polled/completed in a stack-like fashion. If a poll sees that a future isn't ready then we may resume execution in a previous wasm context that ends up needing the TLS information. * Futures can also cross threads (when the whole store crosses threads) and we need to save/restore TLS state from the thread we're coming from and the thread that we're going to. The stack switching issue necessitates some more glue around suspension and resumption of a stack to ensure we save/restore the TLS state on both sides. The thread issue, however, also necessitates that we use `#[inline(never)]` on TLS access functions and never have TLS borrows live across a function which could result in running arbitrary code (as was the case for the `tls::set` function. |
||
|
|
54c07d8f16 |
Implement shared host functions. (#2625)
* Implement defining host functions at the Config level. This commit introduces defining host functions at the `Config` rather than with `Func` tied to a `Store`. The intention here is to enable a host to define all of the functions once with a `Config` and then use a `Linker` (or directly with `Store::get_host_func`) to use the functions when instantiating a module. This should help improve the performance of use cases where a `Store` is short-lived and redefining the functions at every module instantiation is a noticeable performance hit. This commit adds `add_to_config` to the code generation for Wasmtime's `Wasi` type. The new method adds the WASI functions to the given config as host functions. This commit adds context functions to `Store`: `get` to get a context of a particular type and `set` to set the context on the store. For safety, `set` cannot replace an existing context value of the same type. `Wasi::set_context` was added to set the WASI context for a `Store` when using `Wasi::add_to_config`. * Add `Config::define_host_func_async`. * Make config "async" rather than store. This commit moves the concept of "async-ness" to `Config` rather than `Store`. Note: this is a breaking API change for anyone that's already adopted the new async support in Wasmtime. Now `Config::new_async` is used to create an "async" config and any `Store` associated with that config is inherently "async". This is needed for async shared host functions to have some sanity check during their execution (async host functions, like "async" `Func`, need to be called with the "async" variants). * Update async function tests to smoke async shared host functions. This commit updates the async function tests to also smoke the shared host functions, plus `Func::wrap0_async`. This also changes the "wrap async" method names on `Config` to `wrap$N_host_func_async` to slightly better match what is on `Func`. * Move the instance allocator into `Engine`. This commit moves the instantiated instance allocator from `Config` into `Engine`. This makes certain settings in `Config` no longer order-dependent, which is how `Config` should ideally be. This also removes the confusing concept of the "default" instance allocator, instead opting to construct the on-demand instance allocator when needed. This does alter the semantics of the instance allocator as now each `Engine` gets its own instance allocator rather than sharing a single one between all engines created from a configuration. * Make `Engine::new` return `Result`. This is a breaking API change for anyone using `Engine::new`. As creating the pooling instance allocator may fail (likely cause is not enough memory for the provided limits), instead of panicking when creating an `Engine`, `Engine::new` now returns a `Result`. * Remove `Config::new_async`. This commit removes `Config::new_async` in favor of treating "async support" as any other setting on `Config`. The setting is `Config::async_support`. * Remove order dependency when defining async host functions in `Config`. This commit removes the order dependency where async support must be enabled on the `Config` prior to defining async host functions. The check is now delayed to when an `Engine` is created from the config. * Update WASI example to use shared `Wasi::add_to_config`. This commit updates the WASI example to use `Wasi::add_to_config`. As only a single store and instance are used in the example, it has no semantic difference from the previous example, but the intention is to steer users towards defining WASI on the config and only using `Wasi::add_to_linker` when more explicit scoping of the WASI context is required. |
||
|
|
707f83d413 |
Fix preservation of the sigaltstack on macOS (#2676)
* Fix preservation of the sigaltstack on macOS This commit fixes an issue discovered in the wasmtime-go bindings when the Go runtime was crashing on macOS only when running wasm code that trapped. It turns out that our switch to `siglongjmp` from `longjmp` actually broke macOS! This breakage happens because all subsequent signals after the first signal are all delivered on the main stack, not the sigaltstack, even if the sigaltstack is configured. This causes the Go runtime to crash since it expects to run on the sigaltstack. The fix in this commit is to actually return from the signal handler to trigger the kernel's updating of the sigaltstack no longer being in use. Before we return, however, we configure the register context to return to to call some custom code which immediately does the unwind we would otherwise have done. This works around the issue on macOS hopefully without adding too many portability problems. Ideally this will all go away as well with #2632 as well. * Fix compile warning |
||
|
|
0e41861662 |
Implement limiting WebAssembly execution with fuel (#2611)
* Consume fuel during function execution This commit adds codegen infrastructure necessary to instrument wasm code to consume fuel as it executes. Currently nothing is really done with the fuel, but that'll come in later commits. The focus of this commit is to implement the codegen infrastructure necessary to consume fuel and account for fuel consumed correctly. * Periodically check remaining fuel in wasm JIT code This commit enables wasm code to periodically check to see if fuel has run out. When fuel runs out an intrinsic is called which can do what it needs to do in the result of fuel running out. For now a trap is thrown to have at least some semantics in synchronous stores, but another planned use for this feature is for asynchronous stores to periodically yield back to the host based on fuel running out. Checks for remaining fuel happen in the same locations as interrupt checks, which is to say the start of the function as well as loop headers. * Improve codegen by caching `*const VMInterrupts` The location of the shared interrupt value and fuel value is through a double-indirection on the vmctx (load through the vmctx and then load through that pointer). The second pointer in this chain, however, never changes, so we can alter codegen to account for this and remove some extraneous load instructions and hopefully reduce some register pressure even maybe. * Add tests fuel can abort infinite loops * More fuzzing with fuel Use fuel to time out modules in addition to time, using fuzz input to figure out which. * Update docs on trapping instructions * Fix doc links * Fix a fuzz test * Change setting fuel to adding fuel * Fix a doc link * Squelch some rustdoc warnings |
||
|
|
39f677d2dd |
Only handle signals at pcs with trap information
Previously wasmtime would handle any signal originating from wasm JIT code. This would, however, handle bugs in JIT code as-if they were wasm traps. Instead this commit switches signal handling to specifically check for whether the precise program counter is expected to be a trap. This way if a program counter traps and it's not expected to trap the signal isn't handled and the process is aborted (presumably leading to further debugging of whomever happens to work on the JIT at that time). |
||
|
|
f8268b2139 |
Bugfix of issue #2575
Bugfix of issue #2575. Use libc 0.2.82 on aarch64-apple-darwin Apple Silicon, and local test passes. |
||
|
|
243ab3b542 |
Remove the global variable associated with traps
This commit removes the global variable associated with wasm traps which stores frame information. The only purpose of this global is to help symbolicate `Trap`s created since we support creating a `Trap` without a `Store`. The global, however, is only used for wasm frames on the stack, and when wasm frames are on the stack we know that our thread local for "what was the last context" is set and configured. The change here is to hijack this thread-local some more to effectively store the `Store` inside of it. All frame information is then moved directly into `Store` and no longer lives off on the side in a global. Additionally support for registering/unregistering modules is now simplified because once a module is registered with a store it can never be unregistered. This has one slight functional change where if there are two instances of `Store` interleaving calls to wasm code on the stack we'll only be able to symbolicate one of them instead of both. That's arguably also a feature however because this is sort of a way to leak information across stores right now. Otherwise, though, this isn't intended to change any existing logic, but instead keep everything working as-is. |
||
|
|
ea3306e74c |
Use the psm crate to figure out the current stack pointer (#2358)
Currently the runtime needs to acquire the current stack pointer so it can set a limit for where if the wasm stack goes below that point it will abort the wasm code. Acquiring the stack pointer is done in a brittle way right now which involves looking at the address of what we hope is an on-stack structure. This turns out to not work at all with ASan as well. Instead this commit switches to the `psm` crate which is used by the Rust compiler team for stack manipulation, namely a coarse version of segmented stacks to avoid stack overflow in the compiler. We don't need most of the implementation of `psm`, just the `stack_pointer` function, but it shouldn't be a burden to bring in! Closes #2344 |
||
|
|
04e85b044e |
Don't store Arc<VMInterrupts> in instances
Similar to other data structures owned by the `Store` there's no need for `Instance` to have a strong `Arc` reference, instead it's sufficient for `Store` to have the owning reference. |
||
|
|
d91f0c3933 |
get pc for freebsd (#2270)
* get pc for freebsd * whitespace :| * fix; i386 to x86 * remove x86 since uc_mcontext isn't yet in libc * freebsd build of rust uses libcc/unwind |
||
|
|
0387169934 | Allow android compilation (#2002) | ||
|
|
7ab5f2a869 |
Remove custom signal handler restrictions (#1843)
* remove custom signal handler origin restriction * add a test for handling signals from a hostcall * cargo fmt |
||
|
|
26ee986c2f | runtime: handle traps on Windows x32. (#1740) | ||
|
|
962f057c8a |
Remove no-longer-needed C shims (#1686)
The published version of `libc` now has all that's necessary to natively read these fields! |
||
|
|
b3a9882466 | Add X86-32 support to wasm-runtime trap handler. | ||
|
|
c284ffe6c0 |
Move trap handler initialization to per-Store (#1644)
Previously we initialized trap handling (signals/etc) once-per-instance but that's a bit too granular since we only need to do this as one-time per-program initialization. This moves the initialization to `Store` instead which means that we'll call this at least once per thread, which some platforms may need (none currently do, they all only need per-program initialization, but Fuchsia will need per-thread initialization). |
||
|
|
d719ec7e1c |
Don't try to handle non-wasmtime segfaults (#1577)
This commit fixes an issue in Wasmtime where Wasmtime would accidentally "handle" non-wasm segfaults while executing host imports of wasm modules. If a host import segfaulted then Wasmtime would recognize that wasm code is on the stack, so it'd longjmp out of the wasm code. This papers over real bugs though in host code and erroneously classified segfaults as wasm traps. The fix here was to add a check to our wasm signal handler for if the faulting address falls in JIT code itself. Actually threading through all the right information for that check to happen is a bit tricky, though, so this involved some refactoring: * A closure parameter to `catch_traps` was added. This closure is responsible for classifying addresses as whether or not they fall in JIT code. Anything returning `false` means that the trap won't get handled and we'll forward to the next signal handler. * To avoid passing tons of context all over the place, the start function is now no longer automatically invoked by `InstanceHandle`. This avoids the need for passing all sorts of trap-handling contextual information like the maximum stack size and "is this a jit address" closure. Instead creators of `InstanceHandle` (like wasmtime) are now responsible for invoking the start function. * To avoid excessive use of `transmute` with lifetimes since the traphandler state now has a lifetime the per-instance custom signal handler is now replaced with a per-store custom signal handler. I'm not entirely certain the purpose of the custom signal handler, though, so I'd look for feedback on this part. A new test has been added which ensures that if a host function segfaults we don't accidentally try to handle it, and instead we correctly report the segfault. |
||
|
|
c9a0ba81a0 |
Implement interrupting wasm code, reimplement stack overflow (#1490)
* Implement interrupting wasm code, reimplement stack overflow This commit is a relatively large change for wasmtime with two main goals: * Primarily this enables interrupting executing wasm code with a trap, preventing infinite loops in wasm code. Note that resumption of the wasm code is not a goal of this commit. * Additionally this commit reimplements how we handle stack overflow to ensure that host functions always have a reasonable amount of stack to run on. This fixes an issue where we might longjmp out of a host function, skipping destructors. Lots of various odds and ends end up falling out in this commit once the two goals above were implemented. The strategy for implementing this was also lifted from Spidermonkey and existing functionality inside of Cranelift. I've tried to write up thorough documentation of how this all works in `crates/environ/src/cranelift.rs` where gnarly-ish bits are. A brief summary of how this works is that each function and each loop header now checks to see if they're interrupted. Interrupts and the stack overflow check are actually folded into one now, where function headers check to see if they've run out of stack and the sentinel value used to indicate an interrupt, checked in loop headers, tricks functions into thinking they're out of stack. An interrupt is basically just writing a value to a location which is read by JIT code. When interrupts are delivered and what triggers them has been left up to embedders of the `wasmtime` crate. The `wasmtime::Store` type has a method to acquire an `InterruptHandle`, where `InterruptHandle` is a `Send` and `Sync` type which can travel to other threads (or perhaps even a signal handler) to get notified from. It's intended that this provides a good degree of flexibility when interrupting wasm code. Note though that this does have a large caveat where interrupts don't work when you're interrupting host code, so if you've got a host import blocking for a long time an interrupt won't actually be received until the wasm starts running again. Some fallout included from this change is: * Unix signal handlers are no longer registered with `SA_ONSTACK`. Instead they run on the native stack the thread was already using. This is possible since stack overflow isn't handled by hitting the guard page, but rather it's explicitly checked for in wasm now. Native stack overflow will continue to abort the process as usual. * Unix sigaltstack management is now no longer necessary since we don't use it any more. * Windows no longer has any need to reset guard pages since we no longer try to recover from faults on guard pages. * On all targets probestack intrinsics are disabled since we use a different mechanism for catching stack overflow. * The C API has been updated with interrupts handles. An example has also been added which shows off how to interrupt a module. Closes #139 Closes #860 Closes #900 * Update comment about magical interrupt value * Store stack limit as a global value, not a closure * Run rustfmt * Handle review comments * Add a comment about SA_ONSTACK * Use `usize` for type of `INTERRUPTED` * Parse human-readable durations * Bring back sigaltstack handling Allows libstd to print out stack overflow on failure still. * Add parsing and emission of stack limit-via-preamble * Fix new example for new apis * Fix host segfault test in release mode * Fix new doc example |
||
|
|
7da6101732 |
Merge pull request #1494 from cfallin/arm64-merge
Add new `MachInst` backend and ARM64 support. |
||
|
|
48cf2c2f50 |
Address review comments:
- Undo temporary changes to default features (`all-arch`) and a
signal-handler test.
- Remove `SIGTRAP` handler: no longer needed now that we've found an
"undefined opcode" option on ARM64.
- Rename pp.rs to pretty_print.rs in machinst/.
- Only use empty stack-probe on non-x86. As per a comment in
rust-lang/compiler-builtins [1], LLVM only supports stack probes on
x86 and x86-64. Thus, on any other CPU architecture, we cannot refer
to `__rust_probestack`, because it does not exist.
- Rename arm64 to aarch64.
- Use `target` directive in vcode filetests.
- Run the flags verifier, but without encinfo, when using new backends.
- Clean up warning overrides.
- Fix up use of casts: use u32::from(x) and siblings when possible,
u32::try_from(x).unwrap() when not, to avoid silent truncation.
- Take immutable `Function` borrows as input; we don't actually
mutate the input IR.
- Lots of other miscellaneous cleanups.
[1]
|
||
|
|
be85242a3f |
Expose precise offset information in wasmtime::FrameInfo (#1495)
* Consolidate trap/frame information This commit removes `TrapRegistry` in favor of consolidating this information in the `FRAME_INFO` we already have in the `wasmtime` crate. This allows us to keep information generally in one place and have one canonical location for "map this PC to some original wasm stuff". The intent for this is to next update with enough information to go from a program counter to a position in the original wasm file. * Expose module offset information in `FrameInfo` This commit implements functionality for `FrameInfo`, the wasm stack trace of a `Trap`, to return the module/function offset. This allows knowing the precise wasm location of each stack frame, instead of only the main trap itself. The intention here is to provide more visibility into the wasm source when something traps, so you know precisely where calls were and where traps were, in order to assist in debugging. Eventually we might use this information for mapping back to native source languages as well (given sufficient debug information). This change makes a previously-optional artifact of compilation always computed on the cranelift side of things. This `ModuleAddressMap` is then propagated to the same store of information other frame information is stored within. This also removes the need for passing a `SourceLoc` with wasm traps or to wasm trap creation, since the backtrace's wasm frames will be able to infer their own `SourceLoc` from the relevant program counters. |
||
|
|
bab0c79c31 |
ARM64 backend, part 9 / 11: wasmtime support.
This commit adds a few odds and ends required to build wasmtime on ARM64 with the new backend. In particular, it adds: - Support for the `Arm64Call` relocation type. - Support for fetching the trap PC when a signal is received. - A hook for `SIGTRAP`, which is sent by the `brk` opcode (in contrast to x86's `SIGILL`). With the patch sequence up to and including this patch applied, `wasmtime` can now compile and successfully execute code on arm64. Not all tests pass yet, but basic Wasm/WASI tests work correctly. |
||
|
|
9ca3bf532e |
Increase the size of the sigaltstack. (#1315)
* Increase the size of the sigaltstack. Rust's stack overflow handler installs a sigaltstack stack with size SIGSTKSZ, which is too small for some of the things we do in signal handlers, and as of this writing lacks a guard page. Install bigger sigaltstack stacks so that we have enough space, and have a guard page. |
||
|
|
f700efeb03 |
Remove C++ dependency from wasmtime (#1365)
* Remove C++ dependency from `wasmtime` This commit removes the last wads of C++ that we have in wasmtime, meaning that building wasmtime no longer requires a C++ compiler. It still does require a C toolchain for some minor purposes, but hopefully we can remove that over time too! The motivation for doing this is to consolidate all our signal-handling code into one location in one language so you don't have to keep crossing back and forth when understanding what's going on. This also allows us to remove some extra cruft that wasn't necessary from the C++ original implementation. Additionally this should also make building wasmtime a bit more portable since it's often easier to acquire a C toolchain than it is to acquire a C++ toolchain. (e.g. if you're cross-compiling to a musl target) * Typos |
||
|
|
afd980b4f6 |
Refactor the internals of Func to remove layers of indirection (#1363)
* Remove `WrappedCallable` indirection At this point `Func` has evolved quite a bit since inception and the `WrappedCallable` trait I don't believe is needed any longer. This should help clean up a few entry points by having fewer traits in play. * Remove the `Callable` trait This commit removes the `wasmtime::Callable` trait, changing the signature of `Func::new` to take an appropriately typed `Fn`. Additionally the function now always takes `&Caller` like `Func::wrap` optionally can, to empower `Func::new` to have the same capabilities of `Func::wrap`. * Add a test for an already-fixed issue Closes #849 * rustfmt * Update more locations for `Callable` * rustfmt * Remove a stray leading borrow * Review feedback * Remove unneeded `wasmtime_call_trampoline` shim |
||
|
|
3e2be43502 |
Pre-generate trampoline functions (#957)
* Refactor wasmtime_runtime::Export Instead of an enumeration with variants that have data fields have an enumeration where each variant has a struct, and each struct has the data fields. This allows us to store the structs in the `wasmtime` API and avoid lots of `panic!` calls and various extraneous matches. * Pre-generate trampoline functions The `wasmtime` crate supports calling arbitrary function signatures in wasm code, and to do this it generates "trampoline functions" which have a known ABI that then internally convert to a particular signature's ABI and call it. These trampoline functions are currently generated on-the-fly and are cached in the global `Store` structure. This, however, is suboptimal for a few reasons: * Due to how code memory is managed each trampoline resides in its own 64kb allocation of memory. This means if you have N trampolines you're using N * 64kb of memory, which is quite a lot of overhead! * Trampolines are never free'd, even if the referencing module goes away. This is similar to #925. * Trampolines are a source of shared state which prevents `Store` from being easily thread safe. This commit refactors how trampolines are managed inside of the `wasmtime` crate and jit/runtime internals. All trampolines are now allocated in the same pass of `CodeMemory` that the main module is allocated into. A trampoline is generated per-signature in a module as well, instead of per-function. This cache of trampolines is stored directly inside of an `Instance`. Trampolines are stored based on `VMSharedSignatureIndex` so they can be looked up from the internals of the `ExportFunction` value. The `Func` API has been updated with various bits and pieces to ensure the right trampolines are registered in the right places. Overall this should ensure that all trampolines necessary are generated up-front rather than lazily. This allows us to remove the trampoline cache from the `Compiler` type, and move one step closer to making `Compiler` threadsafe for usage across multiple threads. Note that as one small caveat the `Func::wrap*` family of functions don't need to generate a trampoline at runtime, they actually generate the trampoline at compile time which gets passed in. Also in addition to shuffling a lot of code around this fixes one minor bug found in `code_memory.rs`, where `self.position` was loaded before allocation, but the allocation may push a new chunk which would cause `self.position` to be zero instead. * Pass the `SignatureRegistry` as an argument to where it's needed. This avoids the need for storing it in an `Arc`. * Ignore tramoplines for functions with lots of arguments Co-authored-by: Dan Gohman <sunfish@mozilla.com> |
||
|
|
ef0cabf8b4 | Address review feedback | ||
|
|
33b4a37bcb |
Add support for table.copy
This adds support for the `table.copy` instruction from the bulk memory proposal. It also supports multiple tables, which were introduced by the reference types proposal. Part of #928 |
||
|
|
80b095f2e2 |
Add API to statically assert signature of a Func (#955)
* Add API to statically assert signature of a `Func` This commit add a family of APIs to `Func` named `getN` where `N` is the number of arguments. Each function will attempt to statically assert the signature of a `Func` and, if matching, returns a corresponding closure which can be used to invoke the underlying function. The purpose of this commit is to add a highly optimized way to enter a wasm module, performing type checks up front and avoiding all the costs of boxing and unboxing arguments within a `Val`. In general this should be much more optimized than the previous `call` API for entering a wasm module, if the signature is statically known. * rustfmt * Remove stray debugging |