af226d37c24d22d5a1c49d3a2c1e0c07a8c89675
46 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
46782b18c2 |
wasmtime: Implement fast Wasm stack walking (#4431)
* Always preserve frame pointers in Wasmtime
This allows us to efficiently and simply capture Wasm stacks without maintaining
and synchronizing any safety-critical side tables between the compiler and the
runtime.
* wasmtime: Implement fast Wasm stack walking
Why do we want Wasm stack walking to be fast? Because we capture stacks whenever
there is a trap and traps actually happen fairly frequently with short-lived
programs and WASI's `exit`.
Previously, we would rely on generating the system unwind info (e.g.
`.eh_frame`) and using the system unwinder (via the `backtrace`crate) to walk
the full stack and filter out any non-Wasm stack frames. This can,
unfortunately, be slow for two primary reasons:
1. The system unwinder is doing `O(all-kinds-of-frames)` work rather than
`O(wasm-frames)` work.
2. System unwind info and the system unwinder need to be much more general than
a purpose-built stack walker for Wasm needs to be. It has to handle any kind of
stack frame that any compiler might emit where as our Wasm frames are emitted by
Cranelift and always have frame pointers. This translates into implementation
complexity and general overhead. There can also be unnecessary-for-our-use-cases
global synchronization and locks involved, further slowing down stack walking in
the presence of multiple threads trying to capture stacks in parallel.
This commit introduces a purpose-built stack walker for traversing just our Wasm
frames. To find all the sequences of Wasm-to-Wasm stack frames, and ignore
non-Wasm stack frames, we keep a linked list of `(entry stack pointer, exit
frame pointer)` pairs. This linked list is maintained via Wasm-to-host and
host-to-Wasm trampolines. Within a sequence of Wasm-to-Wasm calls, we can use
frame pointers (which Cranelift preserves) to find the next older Wasm frame on
the stack, and we keep doing this until we reach the entry stack pointer,
meaning that the next older frame will be a host frame.
The trampolines need to avoid a couple stumbling blocks. First, they need to be
compiled ahead of time, since we may not have access to a compiler at
runtime (e.g. if the `cranelift` feature is disabled) but still want to be able
to call functions that have already been compiled and get stack traces for those
functions. Usually this means we would compile the appropriate trampolines
inside `Module::new` and the compiled module object would hold the
trampolines. However, we *also* need to support calling host functions that are
wrapped into `wasmtime::Func`s and there doesn't exist *any* ahead-of-time
compiled module object to hold the appropriate trampolines:
```rust
// Define a host function.
let func_type = wasmtime::FuncType::new(
vec![wasmtime::ValType::I32],
vec![wasmtime::ValType::I32],
);
let func = Func::new(&mut store, func_type, |_, params, results| {
// ...
Ok(())
});
// Call that host function.
let mut results = vec![wasmtime::Val::I32(0)];
func.call(&[wasmtime::Val::I32(0)], &mut results)?;
```
Therefore, we define one host-to-Wasm trampoline and one Wasm-to-host trampoline
in assembly that work for all Wasm and host function signatures. These
trampolines are careful to only use volatile registers, avoid touching any
register that is an argument in the calling convention ABI, and tail call to the
target callee function. This allows forwarding any set of arguments and any
returns to and from the callee, while also allowing us to maintain our linked
list of Wasm stack and frame pointers before transferring control to the
callee. These trampolines are not used in Wasm-to-Wasm calls, only when crossing
the host-Wasm boundary, so they do not impose overhead on regular calls. (And if
using one trampoline for all host-Wasm boundary crossing ever breaks branch
prediction enough in the CPU to become any kind of bottleneck, we can do fun
things like have multiple copies of the same trampoline and choose a random copy
for each function, sharding the functions across branch predictor entries.)
Finally, this commit also ends the use of a synthetic `Module` and allocating a
stubbed out `VMContext` for host functions. Instead, we define a
`VMHostFuncContext` with its own magic value, similar to `VMComponentContext`,
specifically for host functions.
<h2>Benchmarks</h2>
<h3>Traps and Stack Traces</h3>
Large improvements to taking stack traces on traps, ranging from shaving off 64%
to 99.95% of the time it used to take.
<details>
```
multi-threaded-traps/0 time: [2.5686 us 2.5808 us 2.5934 us]
thrpt: [0.0000 elem/s 0.0000 elem/s 0.0000 elem/s]
change:
time: [-85.419% -85.153% -84.869%] (p = 0.00 < 0.05)
thrpt: [+560.90% +573.56% +585.84%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
multi-threaded-traps/1 time: [2.9021 us 2.9167 us 2.9322 us]
thrpt: [341.04 Kelem/s 342.86 Kelem/s 344.58 Kelem/s]
change:
time: [-91.455% -91.294% -91.096%] (p = 0.00 < 0.05)
thrpt: [+1023.1% +1048.6% +1070.3%]
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) high mild
5 (5.00%) high severe
multi-threaded-traps/2 time: [2.9996 us 3.0145 us 3.0295 us]
thrpt: [660.18 Kelem/s 663.47 Kelem/s 666.76 Kelem/s]
change:
time: [-94.040% -93.910% -93.762%] (p = 0.00 < 0.05)
thrpt: [+1503.1% +1542.0% +1578.0%]
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
5 (5.00%) high severe
multi-threaded-traps/4 time: [5.5768 us 5.6052 us 5.6364 us]
thrpt: [709.68 Kelem/s 713.63 Kelem/s 717.25 Kelem/s]
change:
time: [-93.193% -93.121% -93.052%] (p = 0.00 < 0.05)
thrpt: [+1339.2% +1353.6% +1369.1%]
Performance has improved.
multi-threaded-traps/8 time: [8.6408 us 9.1212 us 9.5438 us]
thrpt: [838.24 Kelem/s 877.08 Kelem/s 925.84 Kelem/s]
change:
time: [-94.754% -94.473% -94.202%] (p = 0.00 < 0.05)
thrpt: [+1624.7% +1709.2% +1806.1%]
Performance has improved.
multi-threaded-traps/16 time: [10.152 us 10.840 us 11.545 us]
thrpt: [1.3858 Melem/s 1.4760 Melem/s 1.5761 Melem/s]
change:
time: [-97.042% -96.823% -96.577%] (p = 0.00 < 0.05)
thrpt: [+2821.5% +3048.1% +3281.1%]
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
many-modules-registered-traps/1
time: [2.6278 us 2.6361 us 2.6447 us]
thrpt: [378.11 Kelem/s 379.35 Kelem/s 380.55 Kelem/s]
change:
time: [-85.311% -85.108% -84.909%] (p = 0.00 < 0.05)
thrpt: [+562.65% +571.51% +580.76%]
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
3 (3.00%) high mild
6 (6.00%) high severe
many-modules-registered-traps/8
time: [2.6294 us 2.6460 us 2.6623 us]
thrpt: [3.0049 Melem/s 3.0235 Melem/s 3.0425 Melem/s]
change:
time: [-85.895% -85.485% -85.022%] (p = 0.00 < 0.05)
thrpt: [+567.63% +588.95% +608.95%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) high mild
5 (5.00%) high severe
many-modules-registered-traps/64
time: [2.6218 us 2.6329 us 2.6452 us]
thrpt: [24.195 Melem/s 24.308 Melem/s 24.411 Melem/s]
change:
time: [-93.629% -93.551% -93.470%] (p = 0.00 < 0.05)
thrpt: [+1431.4% +1450.6% +1469.5%]
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
many-modules-registered-traps/512
time: [2.6569 us 2.6737 us 2.6923 us]
thrpt: [190.17 Melem/s 191.50 Melem/s 192.71 Melem/s]
change:
time: [-99.277% -99.268% -99.260%] (p = 0.00 < 0.05)
thrpt: [+13417% +13566% +13731%]
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild
many-modules-registered-traps/4096
time: [2.7258 us 2.7390 us 2.7535 us]
thrpt: [1.4876 Gelem/s 1.4955 Gelem/s 1.5027 Gelem/s]
change:
time: [-99.956% -99.955% -99.955%] (p = 0.00 < 0.05)
thrpt: [+221417% +223380% +224881%]
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
many-stack-frames-traps/1
time: [1.4658 us 1.4719 us 1.4784 us]
thrpt: [676.39 Kelem/s 679.38 Kelem/s 682.21 Kelem/s]
change:
time: [-90.368% -89.947% -89.586%] (p = 0.00 < 0.05)
thrpt: [+860.23% +894.72% +938.21%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
many-stack-frames-traps/8
time: [2.4772 us 2.4870 us 2.4973 us]
thrpt: [3.2034 Melem/s 3.2167 Melem/s 3.2294 Melem/s]
change:
time: [-85.550% -85.370% -85.199%] (p = 0.00 < 0.05)
thrpt: [+575.65% +583.51% +592.03%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
many-stack-frames-traps/64
time: [10.109 us 10.171 us 10.236 us]
thrpt: [6.2525 Melem/s 6.2925 Melem/s 6.3309 Melem/s]
change:
time: [-78.144% -77.797% -77.336%] (p = 0.00 < 0.05)
thrpt: [+341.22% +350.38% +357.55%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
5 (5.00%) high mild
2 (2.00%) high severe
many-stack-frames-traps/512
time: [126.16 us 126.54 us 126.96 us]
thrpt: [4.0329 Melem/s 4.0461 Melem/s 4.0583 Melem/s]
change:
time: [-65.364% -64.933% -64.453%] (p = 0.00 < 0.05)
thrpt: [+181.32% +185.17% +188.71%]
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high severe
```
</details>
<h3>Calls</h3>
There is, however, a small regression in raw Wasm-to-host and host-to-Wasm call
performance due the new trampolines. It seems to be on the order of about 2-10
nanoseconds per call, depending on the benchmark.
I believe this regression is ultimately acceptable because
1. this overhead will be vastly dominated by whatever work a non-nop callee
actually does,
2. we will need these trampolines, or something like them, when implementing the
Wasm exceptions proposal to do things like translate Wasm's exceptions into
Rust's `Result`s,
3. and because the performance improvements to trapping and capturing stack
traces are of such a larger magnitude than this call regressions.
<details>
```
sync/no-hook/host-to-wasm - typed - nop
time: [28.683 ns 28.757 ns 28.844 ns]
change: [+16.472% +17.183% +17.904%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low mild
4 (4.00%) high mild
5 (5.00%) high severe
sync/no-hook/host-to-wasm - untyped - nop
time: [42.515 ns 42.652 ns 42.841 ns]
change: [+12.371% +14.614% +17.462%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) high mild
10 (10.00%) high severe
sync/no-hook/host-to-wasm - unchecked - nop
time: [33.936 ns 34.052 ns 34.179 ns]
change: [+25.478% +26.938% +28.369%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
7 (7.00%) high mild
2 (2.00%) high severe
sync/no-hook/host-to-wasm - typed - nop-params-and-results
time: [34.290 ns 34.388 ns 34.502 ns]
change: [+40.802% +42.706% +44.526%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
5 (5.00%) high mild
8 (8.00%) high severe
sync/no-hook/host-to-wasm - untyped - nop-params-and-results
time: [62.546 ns 62.721 ns 62.919 ns]
change: [+2.5014% +3.6319% +4.8078%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
2 (2.00%) high mild
10 (10.00%) high severe
sync/no-hook/host-to-wasm - unchecked - nop-params-and-results
time: [42.609 ns 42.710 ns 42.831 ns]
change: [+20.966% +22.282% +23.475%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
4 (4.00%) high mild
7 (7.00%) high severe
sync/hook-sync/host-to-wasm - typed - nop
time: [29.546 ns 29.675 ns 29.818 ns]
change: [+20.693% +21.794% +22.836%] (p = 0.00 < 0.05)
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
sync/hook-sync/host-to-wasm - untyped - nop
time: [45.448 ns 45.699 ns 45.961 ns]
change: [+17.204% +18.514% +19.590%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
4 (4.00%) high mild
10 (10.00%) high severe
sync/hook-sync/host-to-wasm - unchecked - nop
time: [34.334 ns 34.437 ns 34.558 ns]
change: [+23.225% +24.477% +25.886%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) high mild
7 (7.00%) high severe
sync/hook-sync/host-to-wasm - typed - nop-params-and-results
time: [36.594 ns 36.763 ns 36.974 ns]
change: [+41.967% +47.261% +52.086%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
3 (3.00%) high mild
9 (9.00%) high severe
sync/hook-sync/host-to-wasm - untyped - nop-params-and-results
time: [63.541 ns 63.831 ns 64.194 ns]
change: [-4.4337% -0.6855% +2.7134%] (p = 0.73 > 0.05)
No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) high mild
2 (2.00%) high severe
sync/hook-sync/host-to-wasm - unchecked - nop-params-and-results
time: [43.968 ns 44.169 ns 44.437 ns]
change: [+18.772% +21.802% +24.623%] (p = 0.00 < 0.05)
Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
3 (3.00%) high mild
12 (12.00%) high severe
async/no-hook/host-to-wasm - typed - nop
time: [4.9612 us 4.9743 us 4.9889 us]
change: [+9.9493% +11.911% +13.502%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
6 (6.00%) high mild
4 (4.00%) high severe
async/no-hook/host-to-wasm - untyped - nop
time: [5.0030 us 5.0211 us 5.0439 us]
change: [+10.841% +11.873% +12.977%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
3 (3.00%) high mild
7 (7.00%) high severe
async/no-hook/host-to-wasm - typed - nop-params-and-results
time: [4.9273 us 4.9468 us 4.9700 us]
change: [+4.7381% +6.8445% +8.8238%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
5 (5.00%) high mild
9 (9.00%) high severe
async/no-hook/host-to-wasm - untyped - nop-params-and-results
time: [5.1151 us 5.1338 us 5.1555 us]
change: [+9.5335% +11.290% +13.044%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
3 (3.00%) high mild
13 (13.00%) high severe
async/hook-sync/host-to-wasm - typed - nop
time: [4.9330 us 4.9394 us 4.9467 us]
change: [+10.046% +11.038% +12.035%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) high mild
7 (7.00%) high severe
async/hook-sync/host-to-wasm - untyped - nop
time: [5.0073 us 5.0183 us 5.0310 us]
change: [+9.3828% +10.565% +11.752%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) high mild
5 (5.00%) high severe
async/hook-sync/host-to-wasm - typed - nop-params-and-results
time: [4.9610 us 4.9839 us 5.0097 us]
change: [+9.0857% +11.513% +14.359%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
7 (7.00%) high mild
6 (6.00%) high severe
async/hook-sync/host-to-wasm - untyped - nop-params-and-results
time: [5.0995 us 5.1272 us 5.1617 us]
change: [+9.3600% +11.506% +13.809%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
6 (6.00%) high mild
4 (4.00%) high severe
async-pool/no-hook/host-to-wasm - typed - nop
time: [2.4242 us 2.4316 us 2.4396 us]
change: [+7.8756% +8.8803% +9.8346%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
async-pool/no-hook/host-to-wasm - untyped - nop
time: [2.5102 us 2.5155 us 2.5210 us]
change: [+12.130% +13.194% +14.270%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
4 (4.00%) high mild
8 (8.00%) high severe
async-pool/no-hook/host-to-wasm - typed - nop-params-and-results
time: [2.4203 us 2.4310 us 2.4440 us]
change: [+4.0380% +6.3623% +8.7534%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
5 (5.00%) high mild
9 (9.00%) high severe
async-pool/no-hook/host-to-wasm - untyped - nop-params-and-results
time: [2.5501 us 2.5593 us 2.5700 us]
change: [+8.8802% +10.976% +12.937%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
5 (5.00%) high mild
11 (11.00%) high severe
async-pool/hook-sync/host-to-wasm - typed - nop
time: [2.4135 us 2.4190 us 2.4254 us]
change: [+8.3640% +9.3774% +10.435%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
6 (6.00%) high mild
5 (5.00%) high severe
async-pool/hook-sync/host-to-wasm - untyped - nop
time: [2.5172 us 2.5248 us 2.5357 us]
change: [+11.543% +12.750% +13.982%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) high mild
7 (7.00%) high severe
async-pool/hook-sync/host-to-wasm - typed - nop-params-and-results
time: [2.4214 us 2.4353 us 2.4532 us]
change: [+1.5158% +5.0872% +8.6765%] (p = 0.00 < 0.05)
Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
2 (2.00%) high mild
13 (13.00%) high severe
async-pool/hook-sync/host-to-wasm - untyped - nop-params-and-results
time: [2.5499 us 2.5607 us 2.5748 us]
change: [+10.146% +12.459% +14.919%] (p = 0.00 < 0.05)
Performance has regressed.
Found 18 outliers among 100 measurements (18.00%)
3 (3.00%) high mild
15 (15.00%) high severe
sync/no-hook/wasm-to-host - nop - typed
time: [6.6135 ns 6.6288 ns 6.6452 ns]
change: [+37.927% +38.837% +39.869%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) high mild
5 (5.00%) high severe
sync/no-hook/wasm-to-host - nop-params-and-results - typed
time: [15.930 ns 15.993 ns 16.067 ns]
change: [+3.9583% +5.6286% +7.2430%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
11 (11.00%) high mild
1 (1.00%) high severe
sync/no-hook/wasm-to-host - nop - untyped
time: [20.596 ns 20.640 ns 20.690 ns]
change: [+4.3293% +5.2047% +6.0935%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
5 (5.00%) high mild
5 (5.00%) high severe
sync/no-hook/wasm-to-host - nop-params-and-results - untyped
time: [42.659 ns 42.882 ns 43.159 ns]
change: [-2.1466% -0.5079% +1.2554%] (p = 0.58 > 0.05)
No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
1 (1.00%) high mild
14 (14.00%) high severe
sync/no-hook/wasm-to-host - nop - unchecked
time: [10.671 ns 10.691 ns 10.713 ns]
change: [+83.911% +87.620% +92.062%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) high mild
7 (7.00%) high severe
sync/no-hook/wasm-to-host - nop-params-and-results - unchecked
time: [11.136 ns 11.190 ns 11.263 ns]
change: [-29.719% -28.446% -27.029%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
4 (4.00%) high mild
10 (10.00%) high severe
sync/hook-sync/wasm-to-host - nop - typed
time: [6.7964 ns 6.8087 ns 6.8226 ns]
change: [+21.531% +24.206% +27.331%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
4 (4.00%) high mild
10 (10.00%) high severe
sync/hook-sync/wasm-to-host - nop-params-and-results - typed
time: [15.865 ns 15.921 ns 15.985 ns]
change: [+4.8466% +6.3330% +7.8317%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
3 (3.00%) high mild
13 (13.00%) high severe
sync/hook-sync/wasm-to-host - nop - untyped
time: [21.505 ns 21.587 ns 21.677 ns]
change: [+8.0908% +9.1943% +10.254%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
sync/hook-sync/wasm-to-host - nop-params-and-results - untyped
time: [44.018 ns 44.128 ns 44.261 ns]
change: [-1.4671% -0.0458% +1.2443%] (p = 0.94 > 0.05)
No change in performance detected.
Found 14 outliers among 100 measurements (14.00%)
5 (5.00%) high mild
9 (9.00%) high severe
sync/hook-sync/wasm-to-host - nop - unchecked
time: [11.264 ns 11.326 ns 11.387 ns]
change: [+80.225% +81.659% +83.068%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe
sync/hook-sync/wasm-to-host - nop-params-and-results - unchecked
time: [11.816 ns 11.865 ns 11.920 ns]
change: [-29.152% -28.040% -26.957%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
8 (8.00%) high mild
6 (6.00%) high severe
async/no-hook/wasm-to-host - nop - typed
time: [6.6221 ns 6.6385 ns 6.6569 ns]
change: [+43.618% +44.755% +45.965%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
6 (6.00%) high mild
7 (7.00%) high severe
async/no-hook/wasm-to-host - nop-params-and-results - typed
time: [15.884 ns 15.929 ns 15.983 ns]
change: [+3.5987% +5.2053% +6.7846%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
3 (3.00%) high mild
13 (13.00%) high severe
async/no-hook/wasm-to-host - nop - untyped
time: [20.615 ns 20.702 ns 20.821 ns]
change: [+6.9799% +8.1212% +9.2819%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
2 (2.00%) high mild
8 (8.00%) high severe
async/no-hook/wasm-to-host - nop-params-and-results - untyped
time: [41.956 ns 42.207 ns 42.521 ns]
change: [-4.3057% -2.7730% -1.2428%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
3 (3.00%) high mild
11 (11.00%) high severe
async/no-hook/wasm-to-host - nop - unchecked
time: [10.440 ns 10.474 ns 10.513 ns]
change: [+83.959% +85.826% +87.541%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe
async/no-hook/wasm-to-host - nop-params-and-results - unchecked
time: [11.476 ns 11.512 ns 11.554 ns]
change: [-29.857% -28.383% -26.978%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low mild
6 (6.00%) high mild
5 (5.00%) high severe
async/no-hook/wasm-to-host - nop - async-typed
time: [26.427 ns 26.478 ns 26.532 ns]
change: [+6.5730% +7.4676% +8.3983%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) high mild
7 (7.00%) high severe
async/no-hook/wasm-to-host - nop-params-and-results - async-typed
time: [28.557 ns 28.693 ns 28.880 ns]
change: [+1.9099% +3.7332% +5.9731%] (p = 0.00 < 0.05)
Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
1 (1.00%) high mild
14 (14.00%) high severe
async/hook-sync/wasm-to-host - nop - typed
time: [6.7488 ns 6.7630 ns 6.7784 ns]
change: [+19.935% +22.080% +23.683%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
4 (4.00%) high mild
5 (5.00%) high severe
async/hook-sync/wasm-to-host - nop-params-and-results - typed
time: [15.928 ns 16.031 ns 16.149 ns]
change: [+5.5188% +6.9567% +8.3839%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
9 (9.00%) high mild
2 (2.00%) high severe
async/hook-sync/wasm-to-host - nop - untyped
time: [21.930 ns 22.114 ns 22.296 ns]
change: [+4.6674% +7.7588% +10.375%] (p = 0.00 < 0.05)
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
async/hook-sync/wasm-to-host - nop-params-and-results - untyped
time: [42.684 ns 42.858 ns 43.081 ns]
change: [-5.2957% -3.4693% -1.6217%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
2 (2.00%) high mild
12 (12.00%) high severe
async/hook-sync/wasm-to-host - nop - unchecked
time: [11.026 ns 11.053 ns 11.086 ns]
change: [+70.751% +72.378% +73.961%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
5 (5.00%) high mild
5 (5.00%) high severe
async/hook-sync/wasm-to-host - nop-params-and-results - unchecked
time: [11.840 ns 11.900 ns 11.982 ns]
change: [-27.977% -26.584% -24.887%] (p = 0.00 < 0.05)
Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
3 (3.00%) high mild
15 (15.00%) high severe
async/hook-sync/wasm-to-host - nop - async-typed
time: [27.601 ns 27.709 ns 27.882 ns]
change: [+8.1781% +9.1102% +10.030%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
2 (2.00%) low mild
3 (3.00%) high mild
6 (6.00%) high severe
async/hook-sync/wasm-to-host - nop-params-and-results - async-typed
time: [28.955 ns 29.174 ns 29.413 ns]
change: [+1.1226% +3.0366% +5.1126%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
7 (7.00%) high mild
6 (6.00%) high severe
async-pool/no-hook/wasm-to-host - nop - typed
time: [6.5626 ns 6.5733 ns 6.5851 ns]
change: [+40.561% +42.307% +44.514%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) high mild
4 (4.00%) high severe
async-pool/no-hook/wasm-to-host - nop-params-and-results - typed
time: [15.820 ns 15.886 ns 15.969 ns]
change: [+4.1044% +5.7928% +7.7122%] (p = 0.00 < 0.05)
Performance has regressed.
Found 17 outliers among 100 measurements (17.00%)
4 (4.00%) high mild
13 (13.00%) high severe
async-pool/no-hook/wasm-to-host - nop - untyped
time: [20.481 ns 20.521 ns 20.566 ns]
change: [+6.7962% +7.6950% +8.7612%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
6 (6.00%) high mild
5 (5.00%) high severe
async-pool/no-hook/wasm-to-host - nop-params-and-results - untyped
time: [41.834 ns 41.998 ns 42.189 ns]
change: [-3.8185% -2.2687% -0.7541%] (p = 0.01 < 0.05)
Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
3 (3.00%) high mild
10 (10.00%) high severe
async-pool/no-hook/wasm-to-host - nop - unchecked
time: [10.353 ns 10.380 ns 10.414 ns]
change: [+82.042% +84.591% +87.205%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe
async-pool/no-hook/wasm-to-host - nop-params-and-results - unchecked
time: [11.123 ns 11.168 ns 11.228 ns]
change: [-30.813% -29.285% -27.874%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
11 (11.00%) high mild
1 (1.00%) high severe
async-pool/no-hook/wasm-to-host - nop - async-typed
time: [27.442 ns 27.528 ns 27.638 ns]
change: [+7.5215% +9.9795% +12.266%] (p = 0.00 < 0.05)
Performance has regressed.
Found 18 outliers among 100 measurements (18.00%)
3 (3.00%) high mild
15 (15.00%) high severe
async-pool/no-hook/wasm-to-host - nop-params-and-results - async-typed
time: [29.014 ns 29.148 ns 29.312 ns]
change: [+2.0227% +3.4722% +4.9047%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
6 (6.00%) high mild
1 (1.00%) high severe
async-pool/hook-sync/wasm-to-host - nop - typed
time: [6.7916 ns 6.8116 ns 6.8325 ns]
change: [+20.937% +22.050% +23.281%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe
async-pool/hook-sync/wasm-to-host - nop-params-and-results - typed
time: [15.917 ns 15.975 ns 16.051 ns]
change: [+4.6404% +6.4217% +8.3075%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
5 (5.00%) high mild
11 (11.00%) high severe
async-pool/hook-sync/wasm-to-host - nop - untyped
time: [21.558 ns 21.612 ns 21.679 ns]
change: [+8.1158% +9.1409% +10.217%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) high mild
7 (7.00%) high severe
async-pool/hook-sync/wasm-to-host - nop-params-and-results - untyped
time: [42.475 ns 42.614 ns 42.775 ns]
change: [-6.3613% -4.4709% -2.7647%] (p = 0.00 < 0.05)
Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
3 (3.00%) high mild
15 (15.00%) high severe
async-pool/hook-sync/wasm-to-host - nop - unchecked
time: [11.150 ns 11.195 ns 11.247 ns]
change: [+74.424% +77.056% +79.811%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
3 (3.00%) high mild
11 (11.00%) high severe
async-pool/hook-sync/wasm-to-host - nop-params-and-results - unchecked
time: [11.639 ns 11.695 ns 11.760 ns]
change: [-30.212% -29.023% -27.954%] (p = 0.00 < 0.05)
Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
7 (7.00%) high mild
8 (8.00%) high severe
async-pool/hook-sync/wasm-to-host - nop - async-typed
time: [27.480 ns 27.712 ns 27.984 ns]
change: [+2.9764% +6.5061% +9.8914%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) high mild
2 (2.00%) high severe
async-pool/hook-sync/wasm-to-host - nop-params-and-results - async-typed
time: [29.218 ns 29.380 ns 29.600 ns]
change: [+5.2283% +7.7247% +10.822%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
2 (2.00%) high mild
14 (14.00%) high severe
```
</details>
* Add s390x support for frame pointer-based stack walking
* wasmtime: Allow `Caller::get_export` to get all exports
* fuzzing: Add a fuzz target to check that our stack traces are correct
We generate Wasm modules that keep track of their own stack as they call and
return between functions, and then we periodically check that if the host
captures a backtrace, it matches what the Wasm module has recorded.
* Remove VM offsets for `VMHostFuncContext` since it isn't used by JIT code
* Add doc comment with stack walking implementation notes
* Document the extra state that can be passed to `wasmtime_runtime::Backtrace` methods
* Add extensive comments for stack walking function
* Factor architecture-specific bits of stack walking out into modules
* Initialize store-related fields in a vmctx to null when there is no store yet
Rather than leaving them as uninitialized data.
* Use `set_callee` instead of manually setting the vmctx field
* Use a more informative compile error message for unsupported architectures
* Document unsafety of `prepare_host_to_wasm_trampoline`
* Use `bti c` instead of `hint #34` in inline aarch64 assembly
* Remove outdated TODO comment
* Remove setting of `last_wasm_exit_fp` in `set_jit_trap`
This is no longer needed as the value is plumbed through to the backtrace code
directly now.
* Only set the stack limit once, in the face of re-entrancy into Wasm
* Add comments for s390x-specific stack walking bits
* Use the helper macro for all libcalls
If we forget to use it, and then trigger a GC from the libcall, that means we
could miss stack frames when walking the stack, fail to find live GC refs, and
then get use after free bugs. Much less risky to always use the helper macro
that takes care of all of that for us.
* Use the `asm_sym!` macro in Wasm-to-libcall trampolines
This macro handles the macOS-specific underscore prefix stuff for us.
* wasmtime: add size and align to `externref` assertion error message
* Extend the `stacks` fuzzer to have host frames in between Wasm frames
This way we get one or more contiguous sequences of Wasm frames on the stack,
instead of exactly one.
* Add documentation for aarch64-specific backtrace helpers
* Clarify that we only support little-endian aarch64 in trampoline comment
* Use `.machine z13` in s390x assembly file
Since apparently our CI machines have pretty old assemblers that don't have
`.machine z14`. This should be fine though since these trampolines don't make
use of anything that is introduced in z14.
* Fix aarch64 build
* Fix macOS build
* Document the `asm_sym!` macro
* Add windows support to the `wasmtime-asm-macros` crate
* Add windows support to host<--->Wasm trampolines
* Fix trap handler build on windows
* Run `rustfmt` on s390x trampoline source file
* Temporarily disable some assertions about a trap's backtrace in the component model tests
Follow up to re-enable this and fix the associated issue:
https://github.com/bytecodealliance/wasmtime/issues/4535
* Refactor libcall definitions with less macros
This refactors the `libcall!` macro to use the
`foreach_builtin_function!` macro to define all of the trampolines.
Additionally the macro surrounding each libcall itself is no longer
necessary and helps avoid too many macros.
* Use `VMOpaqueContext::from_vm_host_func_context` in `VMHostFuncContext::new`
* Move `backtrace` module to be submodule of `traphandlers`
This avoids making some things `pub(crate)` in `traphandlers` that really
shouldn't be.
* Fix macOS aarch64 build
* Use "i64" instead of "word" in aarch64-specific file
* Save/restore entry SP and exit FP/return pointer in the face of panicking imported host functions
Also clean up assertions surrounding our saved entry/exit registers.
* Put "typed" vs "untyped" in the same position of call benchmark names
Regardless if we are doing wasm-to-host or host-to-wasm
* Fix stacks test case generator build for new `wasm-encoder`
* Fix build for s390x
* Expand libcalls in s390x asm
* Disable more parts of component tests now that backtrace assertions are a bit tighter
* Remove assertion that can maybe fail on s390x
Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
|
||
|
|
6e099720af |
wasmtime-runtime: Remove old libcalls that aren't used anymore (#4470)
These were for x86 (32-bit) where the ISA didn't have instructions for these things, but now that we don't support that, and always have SSE2 for x86_64, we never need or use these libcalls anymore. |
||
|
|
77e06213b7 |
Refactor the internals of traps in wasmtime_runtime (#4326)
This commit is a small refactoring of `wasmtime_runtime::Trap` and various internals. The `Trap` structure is now a reason plus backtrace, and the old `Trap` enum is mostly in `TrapReason` now. Additionally all `Trap`-returning methods of `wasmtime_runtime` are changed to returning a `TrapCode` to indicate that they never capture a backtrace. Finally the `UnwindReason` internally now no longer duplicates the trap reasons, instead only having two variants of "panic" and "trap". The motivation for this commit is mostly just cleaning up trap internals and removing the need for methods like `wasmtime_runtime::Trap::insert_backtrace` to leave it only happening at the `wasmtime` layer. |
||
|
|
2b52f47b83 |
Add shared memories (#4187)
* Add shared memories This change adds the ability to use shared memories in Wasmtime when the [threads proposal] is enabled. Shared memories are annotated as `shared` in the WebAssembly syntax, e.g., `(memory 1 1 shared)`, and are protected from concurrent access during `memory.size` and `memory.grow`. [threads proposal]: https://github.com/WebAssembly/threads/blob/master/proposals/threads/Overview.md In order to implement this in Wasmtime, there are two main cases to cover: - a program may simply create a shared memory and possibly export it; this means that Wasmtime itself must be able to create shared memories - a user may create a shared memory externally and pass it in as an import during instantiation; this is the case when the program contains code like `(import "env" "memory" (memory 1 1 shared))`--this case is handled by a new Wasmtime API type--`SharedMemory` Because of the first case, this change allows any of the current memory-creation mechanisms to work as-is. Wasmtime can still create either static or dynamic memories in either on-demand or pooling modes, and any of these memories can be considered shared. When shared, the `Memory` runtime container will lock appropriately during `memory.size` and `memory.grow` operations; since all memories use this container, it is an ideal place for implementing the locking once and once only. The second case is covered by the new `SharedMemory` structure. It uses the same `Mmap` allocation under the hood as non-shared memories, but allows the user to perform the allocation externally to Wasmtime and share the memory across threads (via an `Arc`). The pointer address to the actual memory is carefully wired through and owned by the `SharedMemory` structure itself. This means that there are differing views of where to access the pointer (i.e., `VMMemoryDefinition`): for owned memories (the default), the `VMMemoryDefinition` is stored directly by the `VMContext`; in the `SharedMemory` case, however, this `VMContext` must point to this separate structure. To ensure that the `VMContext` can always point to the correct `VMMemoryDefinition`, this change alters the `VMContext` structure. Since a `SharedMemory` owns its own `VMMemoryDefinition`, the `defined_memories` table in the `VMContext` becomes a sequence of pointers--in the shared memory case, they point to the `VMMemoryDefinition` owned by the `SharedMemory` and in the owned memory case (i.e., not shared) they point to `VMMemoryDefinition`s stored in a new table, `owned_memories`. This change adds an additional indirection (through the `*mut VMMemoryDefinition` pointer) that could add overhead. Using an imported memory as a proxy, we measured a 1-3% overhead of this approach on the `pulldown-cmark` benchmark. To avoid this, Cranelift-generated code will special-case the owned memory access (i.e., load a pointer directly to the `owned_memories` entry) for `memory.size` so that only shared memories (and imported memories, as before) incur the indirection cost. * review: remove thread feature check * review: swap wasmtime-types dependency for existing wasmtime-environ use * review: remove unused VMMemoryUnion * review: reword cross-engine error message * review: improve tests * review: refactor to separate prevent Memory <-> SharedMemory conversion * review: into_shared_memory -> as_shared_memory * review: remove commented out code * review: limit shared min/max to 32 bits * review: skip imported memories * review: imported memories are not owned * review: remove TODO * review: document unsafe send + sync * review: add limiter assertion * review: remove TODO * review: improve tests * review: fix doc test * fix: fixes based on discussion with Alex This changes several key parts: - adds memory indexes to imports and exports - makes `VMMemoryDefinition::current_length` an atomic usize * review: add `Extern::SharedMemory` * review: remove TODO * review: atomically load from VMMemoryDescription in JIT-generated code * review: add test probing the last available memory slot across threads * fix: move assertion to new location due to rebase * fix: doc link * fix: add TODOs to c-api * fix: broken doc link * fix: modify pooling allocator messages in tests * review: make owned_memory_index panic instead of returning an option * review: clarify calculation of num_owned_memories * review: move 'use' to top of file * review: change '*const [u8]' to '*mut [u8]' * review: remove TODO * review: avoid hard-coding memory index * review: remove 'preallocation' parameter from 'Memory::_new' * fix: component model memory length * review: check that shared memory plans are static * review: ignore growth limits for shared memory * review: improve atomic store comment * review: add FIXME for memory growth failure * review: add comment about absence of bounds-checked 'memory.size' * review: make 'current_length()' doc comment more precise * review: more comments related to memory.size non-determinism * review: make 'vmmemory' unreachable for shared memory * review: move code around * review: thread plan through to 'wrap()' * review: disallow shared memory allocation with the pooling allocator |
||
|
|
bffce37050 |
make backtrace collection a Config field rather than a cargo feature (#4183)
* sorta working in runtime * wasmtime-runtime: get rid of wasm-backtrace feature * wasmtime: factor to make backtraces recording optional. not configurable yet * get rid of wasm-backtrace features * trap tests: now a Trap optionally contains backtrace * eliminate wasm-backtrace feature * code review fixes * ci: no more wasm-backtrace feature * c_api: backtraces always enabled * config: unwind required by backtraces and ref types * plumbed * test that disabling backtraces works * code review comments * fuzzing generator: wasm_backtrace is a runtime config now * doc fix |
||
|
|
3f9bff17c8 |
Support disabling backtraces at compile time (#3932)
* Support disabling backtraces at compile time This commit adds support to Wasmtime to disable, at compile time, the gathering of backtraces on traps. The `wasmtime` crate now sports a `wasm-backtrace` feature which, when disabled, will mean that backtraces are never collected at compile time nor are unwinding tables inserted into compiled objects. The motivation for this commit stems from the fact that generating a backtrace is quite a slow operation. Currently backtrace generation is done with libunwind and `_Unwind_Backtrace` typically found in glibc or other system libraries. When thousands of modules are loaded into the same process though this means that the initial backtrace can take nearly half a second and all subsequent backtraces can take upwards of hundreds of milliseconds. Relative to all other operations in Wasmtime this is extremely expensive at this time. In the future we'd like to implement a more performant backtrace scheme but such an implementation would require coordination with Cranelift and is a big chunk of work that may take some time, so in the meantime if embedders don't need a backtrace they can still use this option to disable backtraces at compile time and avoid the performance pitfalls of collecting backtraces. In general I tried to originally make this a runtime configuration option but ended up opting for a compile-time option because `Trap::new` otherwise has no arguments and always captures a backtrace. By making this a compile-time option it was possible to configure, statically, the behavior of `Trap::new`. Additionally I also tried to minimize the amount of `#[cfg]` necessary by largely only having it at the producer and consumer sites. Also a noteworthy restriction of this implementation is that if backtrace support is disabled at compile time then reference types support will be unconditionally disabled at runtime. With backtrace support disabled there's no way to trace the stack of wasm frames which means that GC can't happen given our current implementation. * Always enable backtraces for the C API |
||
|
|
39a52ceb4f |
Implement lazy funcref table and anyfunc initialization. (#3733)
During instance initialization, we build two sorts of arrays eagerly:
- We create an "anyfunc" (a `VMCallerCheckedAnyfunc`) for every function
in an instance.
- We initialize every element of a funcref table with an initializer to
a pointer to one of these anyfuncs.
Most instances will not touch (via call_indirect or table.get) all
funcref table elements. And most anyfuncs will never be referenced,
because most functions are never placed in tables or used with
`ref.func`. Thus, both of these initialization tasks are quite wasteful.
Profiling shows that a significant fraction of the remaining
instance-initialization time after our other recent optimizations is
going into these two tasks.
This PR implements two basic ideas:
- The anyfunc array can be lazily initialized as long as we retain the
information needed to do so. For now, in this PR, we just recreate the
anyfunc whenever a pointer is taken to it, because doing so is fast
enough; in the future we could keep some state to know whether the
anyfunc has been written yet and skip this work if redundant.
This technique allows us to leave the anyfunc array as uninitialized
memory, which can be a significant savings. Filling it with
initialized anyfuncs is very expensive, but even zeroing it is
expensive: e.g. in a large module, it can be >500KB.
- A funcref table can be lazily initialized as long as we retain a link
to its corresponding instance and function index for each element. A
zero in a table element means "uninitialized", and a slowpath does the
initialization.
Funcref tables are a little tricky because funcrefs can be null. We need
to distinguish "element was initially non-null, but user stored explicit
null later" from "element never touched" (ie the lazy init should not
blow away an explicitly stored null). We solve this by stealing the LSB
from every funcref (anyfunc pointer): when the LSB is set, the funcref
is initialized and we don't hit the lazy-init slowpath. We insert the
bit on storing to the table and mask it off after loading.
We do have to set up a precomputed array of `FuncIndex`s for the table
in order for this to work. We do this as part of the module compilation.
This PR also refactors the way that the runtime crate gains access to
information computed during module compilation.
Performance effect measured with in-tree benches/instantiation.rs, using
SpiderMonkey built for WASI, and with memfd enabled:
```
BEFORE:
sequential/default/spidermonkey.wasm
time: [68.569 us 68.696 us 68.856 us]
sequential/pooling/spidermonkey.wasm
time: [69.406 us 69.435 us 69.465 us]
parallel/default/spidermonkey.wasm: with 1 background thread
time: [69.444 us 69.470 us 69.497 us]
parallel/default/spidermonkey.wasm: with 16 background threads
time: [183.72 us 184.31 us 184.89 us]
parallel/pooling/spidermonkey.wasm: with 1 background thread
time: [69.018 us 69.070 us 69.136 us]
parallel/pooling/spidermonkey.wasm: with 16 background threads
time: [326.81 us 337.32 us 347.01 us]
WITH THIS PR:
sequential/default/spidermonkey.wasm
time: [6.7821 us 6.8096 us 6.8397 us]
change: [-90.245% -90.193% -90.142%] (p = 0.00 < 0.05)
Performance has improved.
sequential/pooling/spidermonkey.wasm
time: [3.0410 us 3.0558 us 3.0724 us]
change: [-95.566% -95.552% -95.537%] (p = 0.00 < 0.05)
Performance has improved.
parallel/default/spidermonkey.wasm: with 1 background thread
time: [7.2643 us 7.2689 us 7.2735 us]
change: [-89.541% -89.533% -89.525%] (p = 0.00 < 0.05)
Performance has improved.
parallel/default/spidermonkey.wasm: with 16 background threads
time: [147.36 us 148.99 us 150.74 us]
change: [-18.997% -18.081% -17.285%] (p = 0.00 < 0.05)
Performance has improved.
parallel/pooling/spidermonkey.wasm: with 1 background thread
time: [3.1009 us 3.1021 us 3.1033 us]
change: [-95.517% -95.511% -95.506%] (p = 0.00 < 0.05)
Performance has improved.
parallel/pooling/spidermonkey.wasm: with 16 background threads
time: [49.449 us 50.475 us 51.540 us]
change: [-85.423% -84.964% -84.465%] (p = 0.00 < 0.05)
Performance has improved.
```
So an improvement of something like 80-95% for a very large module (7420
functions in its one funcref table, 31928 functions total).
|
||
|
|
a25f7bdba5 |
Don't copy VMBuiltinFunctionsArray into each VMContext (#3741)
* Don't copy `VMBuiltinFunctionsArray` into each `VMContext` This is another PR along the lines of "let's squeeze all possible performance we can out of instantiation". Before this PR we would copy, by value, the contents of `VMBuiltinFunctionsArray` into each `VMContext` allocated. This array of function pointers is modestly-sized but growing over time as we add various intrinsics. Additionally it's the exact same for all `VMContext` allocations. This PR attempts to speed up instantiation slightly by instead storing an indirection to the function array. This means that calling a builtin intrinsic is a tad bit slower since it requires two loads instead of one (one to get the base pointer, another to get the actual address). Otherwise though `VMContext` initialization is now simply setting one pointer instead of doing a `memcpy` from one location to another. With some macro-magic this commit also replaces the previous implementation with one that's more `const`-friendly which also gets us compile-time type-checks of libcalls as well as compile-time verification that all libcalls are defined. Overall, as with #3739, the win is very modest here. Locally I measured a speedup from 1.9us to 1.7us taken to instantiate an empty module with one function. While small at these scales it's still a 10% improvement! * Review comments |
||
|
|
19f8d94959 |
Expand on activations table invariants comment in libcalls.rs
|
||
|
|
cbc6f6071f |
Fix a debug assertion in externref garbage collections
When we GC, we assert the invariant that all `externref`s we find on the stack have a corresponding entry in the `VMExternRefActivationsTable`. However, we also might be in code that is in the process of fixing up this invariant and adding an entry to the table, but the table's bump chunk is full, and so we do a GC and then add the entry into the table. This will ultimately maintain our desired invariant, but there is a moment in time when we are doing the GC where the invariant is relaxed which is okay because the reference will be in the table before we return to Wasm or do anything else. This isn't a possible UAF, in other words. To make it so that the assertion won't trip, we explicitly insert the reference into the table *before* we GC, so that the invariant is not relaxed across a possibly-GCing operation (even though it would be safe in this particular case). |
||
|
|
8a55b5c563 |
Add epoch-based interruption for cooperative async timeslicing.
This PR introduces a new way of performing cooperative timeslicing that is intended to replace the "fuel" mechanism. The tradeoff is that this mechanism interrupts with less precision: not at deterministic points where fuel runs out, but rather when the Engine enters a new epoch. The generated code instrumentation is substantially faster, however, because it does not need to do as much work as when tracking fuel; it only loads the global "epoch counter" and does a compare-and-branch at backedges and function prologues. This change has been measured as ~twice as fast as fuel-based timeslicing for some workloads, especially control-flow-intensive workloads such as the SpiderMonkey JS interpreter on Wasm/WASI. The intended interface is that the embedder of the `Engine` performs an `engine.increment_epoch()` call periodically, e.g. once per millisecond. An async invocation of a Wasm guest on a `Store` can specify a number of epoch-ticks that are allowed before an async yield back to the executor's event loop. (The initial amount and automatic "refills" are configured on the `Store`, just as for fuel.) This call does only signal-safe work (it increments an `AtomicU64`) so could be invoked from a periodic signal, or from a thread that wakes up once per period. |
||
|
|
b00d811e83 | code review | ||
|
|
0370d5c1a2 | code review suggestion | ||
|
|
5aef8f47c8 | catch panic in libcalls for memory and table grow | ||
|
|
a5007f318f | runtime: use anyhow::Error instead of Box<dyn std::error::Error...> | ||
|
|
87c33c2969 |
Remove wasmtime-environ's dependency on cranelift-codegen (#3199)
* Move `CompiledFunction` into wasmtime-cranelift This commit moves the `wasmtime_environ::CompiledFunction` type into the `wasmtime-cranelift` crate. This type has lots of Cranelift-specific pieces of compilation and doesn't need to be generated by all Wasmtime compilers. This replaces the usage in the `Compiler` trait with a `Box<Any>` type that each compiler can select. Each compiler must still produce a `FunctionInfo`, however, which is shared information we'll deserialize for each module. The `wasmtime-debug` crate is also folded into the `wasmtime-cranelift` crate as a result of this commit. One possibility was to move the `CompiledFunction` commit into its own crate and have `wasmtime-debug` depend on that, but since `wasmtime-debug` is Cranelift-specific at this time it didn't seem like it was too too necessary to keep it separate. If `wasmtime-debug` supports other backends in the future we can recreate a new crate, perhaps with it refactored to not depend on Cranelift. * Move wasmtime_environ::reference_type This now belongs in wasmtime-cranelift and nowhere else * Remove `Type` reexport in wasmtime-environ One less dependency on `cranelift-codegen`! * Remove `types` reexport from `wasmtime-environ` Less cranelift! * Remove `SourceLoc` from wasmtime-environ Change the `srcloc`, `start_srcloc`, and `end_srcloc` fields to a custom `FilePos` type instead of `ir::SourceLoc`. These are only used in a few places so there's not much to lose from an extra abstraction for these leaf use cases outside of cranelift. * Remove wasmtime-environ's dep on cranelift's `StackMap` This commit "clones" the `StackMap` data structure in to `wasmtime-environ` to have an independent representation that that chosen by Cranelift. This allows Wasmtime to decouple this runtime dependency of stack map information and let the two evolve independently, if necessary. An alternative would be to refactor cranelift's implementation into a separate crate and have wasmtime depend on that but it seemed a bit like overkill to do so and easier to clone just a few lines for this. * Define code offsets in wasmtime-environ with `u32` Don't use Cranelift's `binemit::CodeOffset` alias to define this field type since the `wasmtime-environ` crate will be losing the `cranelift-codegen` dependency soon. * Commit to using `cranelift-entity` in Wasmtime This commit removes the reexport of `cranelift-entity` from the `wasmtime-environ` crate and instead directly depends on the `cranelift-entity` crate in all referencing crates. The original reason for the reexport was to make cranelift version bumps easier since it's less versions to change, but nowadays we have a script to do that. Otherwise this encourages crates to use whatever they want from `cranelift-entity` since we'll always depend on the whole crate. It's expected that the `cranelift-entity` crate will continue to be a lean crate in dependencies and suitable for use at both runtime and compile time. Consequently there's no need to avoid its usage in Wasmtime at runtime, since "remove Cranelift at compile time" is primarily about the `cranelift-codegen` crate. * Remove most uses of `cranelift-codegen` in `wasmtime-environ` There's only one final use remaining, which is the reexport of `TrapCode`, which will get handled later. * Limit the glob-reexport of `cranelift_wasm` This commit removes the glob reexport of `cranelift-wasm` from the `wasmtime-environ` crate. This is intended to explicitly define what we're reexporting and is a transitionary step to curtail the amount of dependencies taken on `cranelift-wasm` throughout the codebase. For example some functions used by debuginfo mapping are better imported directly from the crate since they're Cranelift-specific. Note that this is intended to be a temporary state affairs, soon this reexport will be gone entirely. Additionally this commit reduces imports from `cranelift_wasm` and also primarily imports from `crate::wasm` within `wasmtime-environ` to get a better sense of what's imported from where and what will need to be shared. * Extract types from cranelift-wasm to cranelift-wasm-types This commit creates a new crate called `cranelift-wasm-types` and extracts type definitions from the `cranelift-wasm` crate into this new crate. The purpose of this crate is to be a shared definition of wasm types that can be shared both by compilers (like Cranelift) as well as wasm runtimes (e.g. Wasmtime). This new `cranelift-wasm-types` crate doesn't depend on `cranelift-codegen` and is the final step in severing the unconditional dependency from Wasmtime to `cranelift-codegen`. The final refactoring in this commit is to then reexport this crate from `wasmtime-environ`, delete the `cranelift-codegen` dependency, and then update all `use` paths to point to these new types. The main change of substance here is that the `TrapCode` enum is mirrored from Cranelift into this `cranelift-wasm-types` crate. While this unfortunately results in three definitions (one more which is non-exhaustive in Wasmtime itself) it's hopefully not too onerous and ideally something we can patch up in the future. * Get lightbeam compiling * Remove unnecessary dependency * Fix compile with uffd * Update publish script * Fix more uffd tests * Rename cranelift-wasm-types to wasmtime-types This reflects the purpose a bit more where it's types specifically intended for Wasmtime and its support. * Fix publish script |
||
|
|
0313e30d76 |
Remove dependency on TargetIsa from Wasmtime crates (#3178)
This commit started off by deleting the `cranelift_codegen::settings` reexport in the `wasmtime-environ` crate and then basically played whack-a-mole until everything compiled again. The main result of this is that the `wasmtime-*` family of crates have generally less of a dependency on the `TargetIsa` trait and type from Cranelift. While the dependency isn't entirely severed yet this is at least a significant start. This commit is intended to be largely refactorings, no functional changes are intended here. The refactorings are: * A `CompilerBuilder` trait has been added to `wasmtime_environ` which server as an abstraction used to create compilers and configure them in a uniform fashion. The `wasmtime::Config` type now uses this instead of cranelift-specific settings. The `wasmtime-jit` crate exports the ability to create a compiler builder from a `CompilationStrategy`, which only works for Cranelift right now. In a cranelift-less build of Wasmtime this is expected to return a trait object that fails all requests to compile. * The `Compiler` trait in the `wasmtime_environ` crate has been souped up with a number of methods that Wasmtime and other crates needed. * The `wasmtime-debug` crate is now moved entirely behind the `wasmtime-cranelift` crate. * The `wasmtime-cranelift` crate is now only depended on by the `wasmtime-jit` crate. * Wasm types in `cranelift-wasm` no longer contain their IR type, instead they only contain the `WasmType`. This is required to get everything to align correctly but will also be required in a future refactoring where the types used by `cranelift-wasm` will be extracted to a separate crate. * I moved around a fair bit of code in `wasmtime-cranelift`. * Some gdb-specific jit-specific code has moved from `wasmtime-debug` to `wasmtime-jit`. |
||
|
|
e68aa99588 |
Implement the memory64 proposal in Wasmtime (#3153)
* Implement the memory64 proposal in Wasmtime This commit implements the WebAssembly [memory64 proposal][proposal] in both Wasmtime and Cranelift. In terms of work done Cranelift ended up needing very little work here since most of it was already prepared for 64-bit memories at one point or another. Most of the work in Wasmtime is largely refactoring, changing a bunch of `u32` values to something else. A number of internal and public interfaces are changing as a result of this commit, for example: * Acessors on `wasmtime::Memory` that work with pages now all return `u64` unconditionally rather than `u32`. This makes it possible to accommodate 64-bit memories with this API, but we may also want to consider `usize` here at some point since the host can't grow past `usize`-limited pages anyway. * The `wasmtime::Limits` structure is removed in favor of minimum/maximum methods on table/memory types. * Many libcall intrinsics called by jit code now unconditionally take `u64` arguments instead of `u32`. Return values are `usize`, however, since the return value, if successful, is always bounded by host memory while arguments can come from any guest. * The `heap_addr` clif instruction now takes a 64-bit offset argument instead of a 32-bit one. It turns out that the legalization of `heap_addr` already worked with 64-bit offsets, so this change was fairly trivial to make. * The runtime implementation of mmap-based linear memories has changed to largely work in `usize` quantities in its API and in bytes instead of pages. This simplifies various aspects and reflects that mmap-memories are always bound by `usize` since that's what the host is using to address things, and additionally most calculations care about bytes rather than pages except for the very edge where we're going to/from wasm. Overall I've tried to minimize the amount of `as` casts as possible, using checked `try_from` and checked arithemtic with either error handling or explicit `unwrap()` calls to tell us about bugs in the future. Most locations have relatively obvious things to do with various implications on various hosts, and I think they should all be roughly of the right shape but time will tell. I mostly relied on the compiler complaining that various types weren't aligned to figure out type-casting, and I manually audited some of the more obvious locations. I suspect we have a number of hidden locations that will panic on 32-bit hosts if 64-bit modules try to run there, but otherwise I think we should be generally ok (famous last words). In any case I wouldn't want to enable this by default naturally until we've fuzzed it for some time. In terms of the actual underlying implementation, no one should expect memory64 to be all that fast. Right now it's implemented with "dynamic" heaps which have a few consequences: * All memory accesses are bounds-checked. I'm not sure how aggressively Cranelift tries to optimize out bounds checks, but I suspect not a ton since we haven't stressed this much historically. * Heaps are always precisely sized. This means that every call to `memory.grow` will incur a `memcpy` of memory from the old heap to the new. We probably want to at least look into `mremap` on Linux and otherwise try to implement schemes where dynamic heaps have some reserved pages to grow into to help amortize the cost of `memory.grow`. The memory64 spec test suite is scheduled to now run on CI, but as with all the other spec test suites it's really not all that comprehensive. I've tried adding more tests for basic things as I've had to implement guards for them, but I wouldn't really consider the testing adequate from just this PR itself. I did try to take care in one test to actually allocate a 4gb+ heap and then avoid running that in the pooling allocator or in emulation because otherwise that may fail or take excessively long. [proposal]: https://github.com/WebAssembly/memory64/blob/master/proposals/memory64/Overview.md * Fix some tests * More test fixes * Fix wasmtime tests * Fix doctests * Revert to 32-bit immediate offsets in `heap_addr` This commit updates the generation of addresses in wasm code to always use 32-bit offsets for `heap_addr`, and if the calculated offset is bigger than 32-bits we emit a manual add with an overflow check. * Disable memory64 for spectest fuzzing * Fix wrong offset being added to heap addr * More comments! * Clarify bytes/pages |
||
|
|
85f16f488d |
Consolidate address calculations for atomics (#3143)
* Consolidate address calculations for atomics This commit consolidates all calcuations of guest addresses into one `prepare_addr` function. This notably remove the atomics-specifics paths as well as the `prepare_load` function (now renamed to `prepare_addr` and folded into `get_heap_addr`). The goal of this commit is to simplify how addresses are managed in the code generator for atomics to use all the shared infrastrucutre of other loads/stores as well. This additionally fixes #3132 via the use of `heap_addr` in clif for all operations. I also added a number of tests for loads/stores with varying alignments. Originally I was going to allow loads/stores to not be aligned since that's what the current formal specification says, but the overview of the threads proposal disagrees with the formal specification, so I figured I'd leave it as-is but adding tests probably doesn't hurt. Closes #3132 * Fix old backend * Guarantee misalignment checks happen before out-of-bounds |
||
|
|
a273add815 |
Simplify the list of builtin intrinsics Wasmtime needs
This commit slims down the list of builtin intrinsics. It removes the duplicated intrinsics for imported and locally defined items, instead always using one intrinsic for both. This was previously inconsistently applied where some intrinsics got two copies (one for imported one for local) and other intrinsics got only one copy. This does add an extra branch in intrinsics since they need to determine whether something is local or not, but that's generally much lower cost than the intrinsics themselves. This also removes the `memory32_size` intrinsic, instead inlining the codegen directly into the clif IR. This matches what the `table.size` instruction does and removes the need for a few functions on a `wasmtime_runtime::Instance`. |
||
|
|
7a1b7cdf92 |
Implement RFC 11: Redesigning Wasmtime's APIs (#2897)
Implement Wasmtime's new API as designed by RFC 11. This is quite a large commit which has had lots of discussion externally, so for more information it's best to read the RFC thread and the PR thread. |
||
|
|
b775b68cfb |
Make module information lookup from runtime safe.
This commit uses a two-phase lookup of stack map information from modules rather than giving back raw pointers to stack maps. First the runtime looks up information about a module from a pc value, which returns an `Arc` it keeps a reference on while completing the stack map lookup. Second it then queries the module information for the stack map from a pc value, getting a reference to the stack map (which is now safe because of the `Arc` held by the runtime). |
||
|
|
ea72c621f0 |
Remove the stack map registry.
This commit removes the stack map registry and instead uses the existing information from the store's module registry to lookup stack maps. A trait is now used to pass the lookup context to the runtime, implemented by `Store` to do the lookup. With this change, module registration in `Store` is now entirely limited to inserting the module into the module registry. |
||
|
|
0e41861662 |
Implement limiting WebAssembly execution with fuel (#2611)
* Consume fuel during function execution This commit adds codegen infrastructure necessary to instrument wasm code to consume fuel as it executes. Currently nothing is really done with the fuel, but that'll come in later commits. The focus of this commit is to implement the codegen infrastructure necessary to consume fuel and account for fuel consumed correctly. * Periodically check remaining fuel in wasm JIT code This commit enables wasm code to periodically check to see if fuel has run out. When fuel runs out an intrinsic is called which can do what it needs to do in the result of fuel running out. For now a trap is thrown to have at least some semantics in synchronous stores, but another planned use for this feature is for asynchronous stores to periodically yield back to the host based on fuel running out. Checks for remaining fuel happen in the same locations as interrupt checks, which is to say the start of the function as well as loop headers. * Improve codegen by caching `*const VMInterrupts` The location of the shared interrupt value and fuel value is through a double-indirection on the vmctx (load through the vmctx and then load through that pointer). The second pointer in this chain, however, never changes, so we can alter codegen to account for this and remove some extraneous load instructions and hopefully reduce some register pressure even maybe. * Add tests fuel can abort infinite loops * More fuzzing with fuel Use fuel to time out modules in addition to time, using fuzz input to figure out which. * Update docs on trapping instructions * Fix doc links * Fix a fuzz test * Change setting fuel to adding fuel * Fix a doc link * Squelch some rustdoc warnings |
||
|
|
3580205f12 |
[Cranelift][Atomics] Add address folding for atomic notify/wait. (#2556)
* fold address in wasm wait and notify ops * add atomics addr folding tests |
||
|
|
e659d5cecd |
Add initial support for the multi-memory proposal (#2263)
This commit adds initial (gated) support for the multi-memory wasm proposal. This was actually quite easy since almost all of wasmtime already expected multi-memory to be implemented one day. The only real substantive change is the `memory.copy` intrinsic changes, which now accounts for the source/destination memories possibly being different. |
||
|
|
04cf94cea3 |
runtime: fix nearest for NaN inputs;
According to wasm's spec, nearest must do the following, for NaN inputs: - when the input is a canonical NaN, return a canonical NaN; - when the input is a non-canonical NaN, return an arithmetic NaN. This patch adds checks when the exponent is all ones if the input was a NaN, and will set the significand's most significant bit in that case. It works both for canonical inputs (which already had the bit set) and makes other NaN inputs canonical. |
||
|
|
a8f7041296 |
new implementations for nearest lib calls
use approach with copysign for handling negative zero format refactor for better branch prediction move copysign back to internal branch format fix use abs instead branches better comments switch arms for better branch prediction |
||
|
|
3555f97906 |
wasmtime: Implement table.fill
Part of #929 |
||
|
|
bffd54c016 |
wasmtime: Implement global.{get,set} for externref globals (#1969)
* wasmtime: Implement `global.{get,set}` for externref globals
We use libcalls to implement these -- unlike `table.{get,set}`, for which we
create inline JIT fast paths -- because no known toolchain actually uses
externref globals.
Part of #929
* wasmtime: Enable `{extern,func}ref` globals in the API
|
||
|
|
8c5f59c0cf |
wasmtime: Implement table.get and table.set
These instructions have fast, inline JIT paths for the common cases, and only
call out to host VM functions for the slow paths. This required some changes to
`cranelift-wasm`'s `FuncEnvironment`: instead of taking a `FuncCursor` to insert
an instruction sequence within the current basic block,
`FuncEnvironment::translate_table_{get,set}` now take a `&mut FunctionBuilder`
so that they can create whole new basic blocks. This is necessary for
implementing GC read/write barriers that involve branching (e.g. checking for
null, or whether a store buffer is at capacity).
Furthermore, it required that the `load`, `load_complex`, and `store`
instructions handle loading and storing through an `r{32,64}` rather than just
`i{32,64}` addresses. This involved making `r{32,64}` types acceptable
instantiations of the `iAddr` type variable, plus a few new instruction
encodings.
Part of #929
|
||
|
|
58bb5dd953 |
wasmtime: Add support for func.ref and table.grow with funcrefs
`funcref`s are implemented as `NonNull<VMCallerCheckedAnyfunc>`. This should be more efficient than using a `VMExternRef` that points at a `VMCallerCheckedAnyfunc` because it gets rid of an indirection, dynamic allocation, and some reference counting. Note that the null function reference is *NOT* a null pointer; it is a `VMCallerCheckedAnyfunc` that has a null `func_ptr` member. Part of #929 |
||
|
|
bbd99c5bfa |
reference types: Implement the table.size and table.grow instructions (#1894)
Part of #929 |
||
|
|
bc555468a7 |
cranelift: add i64.{ishl,ushr,ashr} libcalls.
These libcalls are useful for 32-bit platforms.
On x86_32 in particular, commit
|
||
|
|
b2e8ed4dc9 |
cranelift: add i64.[us]{div,rem} libcalls.
These libcalls are useful for 32-bit platforms. |
||
|
|
be85242a3f |
Expose precise offset information in wasmtime::FrameInfo (#1495)
* Consolidate trap/frame information This commit removes `TrapRegistry` in favor of consolidating this information in the `FRAME_INFO` we already have in the `wasmtime` crate. This allows us to keep information generally in one place and have one canonical location for "map this PC to some original wasm stuff". The intent for this is to next update with enough information to go from a program counter to a position in the original wasm file. * Expose module offset information in `FrameInfo` This commit implements functionality for `FrameInfo`, the wasm stack trace of a `Trap`, to return the module/function offset. This allows knowing the precise wasm location of each stack frame, instead of only the main trap itself. The intention here is to provide more visibility into the wasm source when something traps, so you know precisely where calls were and where traps were, in order to assist in debugging. Eventually we might use this information for mapping back to native source languages as well (given sufficient debug information). This change makes a previously-optional artifact of compilation always computed on the cranelift side of things. This `ModuleAddressMap` is then propagated to the same store of information other frame information is stored within. This also removes the need for passing a `SourceLoc` with wasm traps or to wasm trap creation, since the backtrace's wasm frames will be able to infer their own `SourceLoc` from the relevant program counters. |
||
|
|
674a6208d8 |
Implement data.drop and memory.init and get the rest of the bulk memory spec tests passing (#1264)
* Enable the already-passing `bulk-memoryoperations/imports.wast` test * Implement support for the `memory.init` instruction and passive data This adds support for passive data segments and the `memory.init` instruction from the bulk memory operations proposal. Passive data segments are stored on the Wasm module and then `memory.init` instructions copy their contents into memory. * Implement the `data.drop` instruction This allows wasm modules to deallocate passive data segments that it doesn't need anymore. We keep track of which segments have not been dropped on an `Instance` and when dropping them, remove the entry from the instance's hash map. The module always needs all of the segments for new instantiations. * Enable final bulk memory operations spec test This requires special casing an expected error message for an `assert_trap`, since the expected error message contains the index of an uninitialized table element, but our trap implementation doesn't save that diagnostic information and shepherd it out. |
||
|
|
8597930eed |
rename PassiveElemIndex to ElemIndex and same for PassiveDataIndex (#1188)
* rename PassiveElemIndex to ElemIndex and same for PassiveDataIndex (#1411) * rename PassiveDataIndex to DataIndex * rename PassiveElemIndex to ElemIndex * Apply renamings to wasmtime as well * Run rustfmt Co-authored-by: csmoe <csmoe@msn.com> |
||
|
|
235833ab97 | Ignore a doc test | ||
|
|
ef0cabf8b4 | Address review feedback | ||
|
|
44c28612fb |
Implement the memory.fill instruction from the bulk memory proposal
|
||
|
|
98ecef1700 |
Implement the memory.copy instruction from the bulk memory proposal
|
||
|
|
cb97e4ec8e |
Implement table.init and elem.drop from the bulk memory proposal
|
||
|
|
33b4a37bcb |
Add support for table.copy
This adds support for the `table.copy` instruction from the bulk memory proposal. It also supports multiple tables, which were introduced by the reference types proposal. Part of #928 |
||
|
|
cc6e8e1af2 |
Move cranelift dependencies to wasmtime-environ (#669)
Groups all CL data structures into single dependency to be used accross wasmtime project. |
||
|
|
22641de629 |
Initial reorg.
This is largely the same as #305, but updated for the current tree. |