70 Commits (at 7e947042644cd2956b2e963b0724abe3e16e2112)

b14551d7ca
Refactor configuration for the pooling allocator (#5205)
This commit changes the APIs in the `wasmtime` crate for configuring the pooling allocator. I plan on adding a few more configuration options in the near future, and the current structure was feeling unwieldy for these new additions. The previous `struct`-based API has been replaced with a builder-style API similar in shape to `Config`. This makes it easier to add configuration options in the future by adding more methods, as opposed to adding more fields, which could break existing initializations.
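A minimal sketch of what builder-style configuration of the pooling allocator might look like in an embedding; the `PoolingAllocationConfig` method names below are illustrative assumptions rather than the exact API added in this commit:

```rust
use wasmtime::{Config, Engine, InstanceAllocationStrategy, PoolingAllocationConfig};

fn main() -> anyhow::Result<()> {
    // Builder-style: each limit is a method call, so future options can be
    // added as new methods without breaking existing struct initializations.
    let mut pooling = PoolingAllocationConfig::default();
    pooling
        .instance_count(1_000)        // assumed method name: max concurrent instances
        .instance_memory_pages(160);  // assumed method name: max pages per linear memory

    let mut config = Config::new();
    config.allocation_strategy(InstanceAllocationStrategy::Pooling(pooling));
    let _engine = Engine::new(&config)?;
    Ok(())
}
```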

5ec92d59d2
[fuzz] Add a meta-differential fuzz target (#4515)
* [fuzz] Add `Module` enum, refactor `ModuleConfig`. This change adds a way to create either a single-instruction module or a regular (big) `wasm-smith` module. It has some slight refactorings in preparation for the use of this new code.
* [fuzz] Add `DiffValue` for differential evaluation. In order to evaluate functions with randomly-generated values, we needed a common way to generate these values. Using the Wasmtime `Val` type is not great because we would like to be able to implement various traits on the new value type, e.g., to convert `Into` and `From` boxed values of other engines we differentially fuzz against. This new type, `DiffValue`, gives us a common ground for all the conversions and comparisons between the other engine types.
* [fuzz] Add interface for differential engines. In order to randomly choose an engine to fuzz against, we expect all of the engines to meet a common interface (a sketch of such an interface follows this list). The traits in this commit allow us to instantiate a module from its binary form, evaluate exported functions, and (possibly) hash the exported items of the instance. This change has some missing pieces, though:
  - the `wasm-spec-interpreter` needs some work to be able to create instances, evaluate a function by name, and expose exported items
  - the `v8` engine is not implemented yet due to the complexity of its Rust lifetimes
* [fuzz] Use `ModuleFeatures` instead of existing configuration. When attempting to use both wasm-smith and single-instruction modules, there is a mismatch in how we communicate what an engine must be able to support. In the first case, we could use the `ModuleConfig`, a wrapper for wasm-smith's `SwarmConfig`, but single-instruction modules do not have a `SwarmConfig`--the many options simply don't apply. Here, we instead add `ModuleFeatures` and adapt a `ModuleConfig` to that. `ModuleFeatures` then becomes the way to communicate what features an engine must support to evaluate functions in a module.
* [fuzz] Add a new fuzz target using the meta-differential oracle. This change adds the `differential_meta` target to the list of fuzz targets. I expect that sometime soon this could replace the other `differential*` targets, as it almost checks all the things those check. The major missing piece is that currently it only chooses single-instruction modules instead of also generating arbitrary modules using `wasm-smith`. Also, this change adds the concept of an ignorable error: some differential engines will choke on certain inputs (e.g., `wasmi` might have an old opcode mapping) which we do not want to flag as fuzz bugs. Here we wrap those errors in `DiffIgnoreError` and then use a new helper trait, `DiffIgnorable`, to downcast and inspect the `anyhow` error and only panic on non-ignorable errors; the ignorable errors are converted to one of the `arbitrary::Error` variants, which we already ignore.
* [fuzz] Compare `DiffValue` NaNs more leniently. Because arithmetic NaNs can contain arbitrary payload bits, checking that two differential executions produce the same result requires relaxing the comparison of the `F32` and `F64` types (and eventually `V128` as well... TODO). This change adds several considerations, however, so that in the future we can make the comparison a bit stricter, e.g., re: canonical NaNs. For now it just matches the current logic used by other fuzz targets.
* review: allow hashing to mutate the instance state. @alexcrichton requested that the interface be adapted to accommodate Wasmtime's API, in which even reading from an instance could trigger mutation of the store.
* review: refactor where configurations are made compatible. See @alexcrichton's [suggestion](https://github.com/bytecodealliance/wasmtime/pull/4515#discussion_r928974376).
* review: convert `DiffValueType` using `TryFrom`. See @alexcrichton's [comment](https://github.com/bytecodealliance/wasmtime/pull/4515#discussion_r928962394).
* review: adapt target implementation to Wasmtime-specific RHS. This change is joint work with @alexcrichton to adapt the structure of the fuzz target to his comments [here](https://github.com/bytecodealliance/wasmtime/pull/4515#pullrequestreview-1073247791). This change:
  - removes `ModuleFeatures` and the `Module` enum (for big and small modules)
  - upgrades `SingleInstModule` to filter out cases that are not valid for a given `ModuleConfig`
  - adds `DiffEngine::name()`
  - constructs each `DiffEngine` using a `ModuleConfig`, eliminating `DiffIgnoreError` completely
  - prints an execution rate to the `differential_meta` target

  Still TODO:
  - `get_exported_function_signatures` could be re-written in terms of the Wasmtime API instead of `wasmparser`
  - the fuzzer crashes eventually, we think due to signal handler interference between OCaml and Wasmtime
  - the spec interpreter has several cases that we skip for now but could be fuzzed with further work

  Co-authored-by: Alex Crichton <alex@alexcrichton.com>
* fix: avoid SIGSEGV by explicitly initializing OCaml runtime first
* review: use Wasmtime's API to retrieve exported functions. Co-authored-by: Alex Crichton <alex@alexcrichton.com>
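A rough sketch of the kind of engine interface described above; only `DiffEngine::name()` and `DiffValue` are named in the commit, so the other trait and method names are illustrative assumptions, not the actual definitions in the fuzzing crate:

```rust
use std::collections::hash_map::DefaultHasher;

/// Engine-agnostic values used to call exported functions and compare results.
#[derive(Debug, Clone, PartialEq)]
pub enum DiffValue {
    I32(i32),
    I64(i64),
    F32(u32), // stored as bit patterns so NaN payloads can be compared leniently
    F64(u64),
}

/// An engine we can differentially fuzz against.
pub trait DiffEngine {
    /// Name of the engine, used when reporting discrepancies.
    fn name(&self) -> &'static str;

    /// Instantiate a module from its binary encoding.
    fn instantiate(&mut self, wasm: &[u8]) -> anyhow::Result<Box<dyn DiffInstance>>;
}

/// A single instantiation of a module in some engine.
pub trait DiffInstance {
    /// Evaluate an exported function with the given arguments.
    fn evaluate(&mut self, name: &str, args: &[DiffValue]) -> anyhow::Result<Vec<DiffValue>>;

    /// Hash exported items (globals, memories, ...), returning `false` if the
    /// engine cannot expose them. This takes `&mut self` because even reading
    /// from a Wasmtime instance can mutate the store.
    fn hash(&mut self, state: &mut DefaultHasher) -> anyhow::Result<bool>;
}
```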

ed8908efcf
implement fuzzing for component types (#4537)
This addresses #4307. For the static API we generate 100 arbitrary test cases at build time, each of which includes 0-5 parameter types, a result type, and a WAT fragment containing an imported function and an exported function. The exported function calls the imported function, which is implemented by the host. At runtime, the fuzz test selects a test case at random and feeds it zero or more sets of arbitrary parameters and results, checking that values which flow host-to-guest and guest-to-host make the transition unchanged. The fuzz test for the dynamic API follows a similar pattern, the only difference being that test cases are generated at runtime. Signed-off-by: Joel Dice <joel.dice@fermyon.com>

46782b18c2
wasmtime: Implement fast Wasm stack walking (#4431)
* Always preserve frame pointers in Wasmtime
This allows us to efficiently and simply capture Wasm stacks without maintaining
and synchronizing any safety-critical side tables between the compiler and the
runtime.
* wasmtime: Implement fast Wasm stack walking
Why do we want Wasm stack walking to be fast? Because we capture stacks whenever
there is a trap and traps actually happen fairly frequently with short-lived
programs and WASI's `exit`.
Previously, we would rely on generating the system unwind info (e.g.
`.eh_frame`) and using the system unwinder (via the `backtrace` crate) to walk
the full stack and filter out any non-Wasm stack frames. This can,
unfortunately, be slow for two primary reasons:
1. The system unwinder is doing `O(all-kinds-of-frames)` work rather than
`O(wasm-frames)` work.
2. System unwind info and the system unwinder need to be much more general than
a purpose-built stack walker for Wasm needs to be. It has to handle any kind of
stack frame that any compiler might emit, whereas our Wasm frames are emitted by
Cranelift and always have frame pointers. This translates into implementation
complexity and general overhead. There can also be unnecessary-for-our-use-cases
global synchronization and locks involved, further slowing down stack walking in
the presence of multiple threads trying to capture stacks in parallel.
This commit introduces a purpose-built stack walker for traversing just our Wasm
frames. To find all the sequences of Wasm-to-Wasm stack frames, and ignore
non-Wasm stack frames, we keep a linked list of `(entry stack pointer, exit
frame pointer)` pairs. This linked list is maintained via Wasm-to-host and
host-to-Wasm trampolines. Within a sequence of Wasm-to-Wasm calls, we can use
frame pointers (which Cranelift preserves) to find the next older Wasm frame on
the stack, and we keep doing this until we reach the entry stack pointer,
meaning that the next older frame will be a host frame.
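A simplified sketch of that walk, assuming the conventional `[saved FP, return address]` frame layout that Cranelift preserves; the names and exact termination condition here are illustrative, not Wasmtime's actual internals:

```rust
/// One contiguous sequence of Wasm frames, recorded by the trampolines.
struct Activation {
    entry_sp: usize,               // stack pointer when the host last entered Wasm
    exit_fp: usize,                // frame pointer of the youngest Wasm frame on exit to the host
    prev: Option<Box<Activation>>, // next-older sequence of Wasm frames
}

/// Visit the return address of every Wasm frame, skipping host frames entirely.
fn walk_wasm_frames(mut activation: Option<&Activation>, mut visit: impl FnMut(usize)) {
    while let Some(a) = activation {
        let mut fp = a.exit_fp;
        // Stacks grow down, so older frames live at higher addresses; stop once
        // the frame pointer reaches the recorded entry SP, i.e. the next older
        // frame belongs to the host rather than to Wasm.
        while fp != 0 && fp < a.entry_sp {
            unsafe {
                // Conventional frame layout: *fp = saved FP, *(fp + word) = return address.
                let return_addr = *((fp + std::mem::size_of::<usize>()) as *const usize);
                visit(return_addr);
                fp = *(fp as *const usize);
            }
        }
        activation = a.prev.as_deref();
    }
}
```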
The trampolines need to avoid a couple stumbling blocks. First, they need to be
compiled ahead of time, since we may not have access to a compiler at
runtime (e.g. if the `cranelift` feature is disabled) but still want to be able
to call functions that have already been compiled and get stack traces for those
functions. Usually this means we would compile the appropriate trampolines
inside `Module::new` and the compiled module object would hold the
trampolines. However, we *also* need to support calling host functions that are
wrapped into `wasmtime::Func`s and there doesn't exist *any* ahead-of-time
compiled module object to hold the appropriate trampolines:
```rust
// Define a host function.
let func_type = wasmtime::FuncType::new(
    vec![wasmtime::ValType::I32],
    vec![wasmtime::ValType::I32],
);
let func = Func::new(&mut store, func_type, |_, params, results| {
    // ...
    Ok(())
});

// Call that host function.
let mut results = vec![wasmtime::Val::I32(0)];
func.call(&mut store, &[wasmtime::Val::I32(0)], &mut results)?;
```
Therefore, we define one host-to-Wasm trampoline and one Wasm-to-host trampoline
in assembly that work for all Wasm and host function signatures. These
trampolines are careful to only use volatile registers, avoid touching any
register that is an argument in the calling convention ABI, and tail call to the
target callee function. This allows forwarding any set of arguments and any
returns to and from the callee, while also allowing us to maintain our linked
list of Wasm stack and frame pointers before transferring control to the
callee. These trampolines are not used in Wasm-to-Wasm calls, only when crossing
the host-Wasm boundary, so they do not impose overhead on regular calls. (And if
using one trampoline for all host-Wasm boundary crossing ever breaks branch
prediction enough in the CPU to become any kind of bottleneck, we can do fun
things like have multiple copies of the same trampoline and choose a random copy
for each function, sharding the functions across branch predictor entries.)
Finally, this commit also ends the use of a synthetic `Module` and allocating a
stubbed out `VMContext` for host functions. Instead, we define a
`VMHostFuncContext` with its own magic value, similar to `VMComponentContext`,
specifically for host functions.
<h2>Benchmarks</h2>
<h3>Traps and Stack Traces</h3>
Large improvements to taking stack traces on traps, ranging from shaving off 64%
to 99.95% of the time it used to take.
<details>
```
multi-threaded-traps/0 time: [2.5686 us 2.5808 us 2.5934 us]
thrpt: [0.0000 elem/s 0.0000 elem/s 0.0000 elem/s]
change:
time: [-85.419% -85.153% -84.869%] (p = 0.00 < 0.05)
thrpt: [+560.90% +573.56% +585.84%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
multi-threaded-traps/1 time: [2.9021 us 2.9167 us 2.9322 us]
thrpt: [341.04 Kelem/s 342.86 Kelem/s 344.58 Kelem/s]
change:
time: [-91.455% -91.294% -91.096%] (p = 0.00 < 0.05)
thrpt: [+1023.1% +1048.6% +1070.3%]
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) high mild
5 (5.00%) high severe
multi-threaded-traps/2 time: [2.9996 us 3.0145 us 3.0295 us]
thrpt: [660.18 Kelem/s 663.47 Kelem/s 666.76 Kelem/s]
change:
time: [-94.040% -93.910% -93.762%] (p = 0.00 < 0.05)
thrpt: [+1503.1% +1542.0% +1578.0%]
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
5 (5.00%) high severe
multi-threaded-traps/4 time: [5.5768 us 5.6052 us 5.6364 us]
thrpt: [709.68 Kelem/s 713.63 Kelem/s 717.25 Kelem/s]
change:
time: [-93.193% -93.121% -93.052%] (p = 0.00 < 0.05)
thrpt: [+1339.2% +1353.6% +1369.1%]
Performance has improved.
multi-threaded-traps/8 time: [8.6408 us 9.1212 us 9.5438 us]
thrpt: [838.24 Kelem/s 877.08 Kelem/s 925.84 Kelem/s]
change:
time: [-94.754% -94.473% -94.202%] (p = 0.00 < 0.05)
thrpt: [+1624.7% +1709.2% +1806.1%]
Performance has improved.
multi-threaded-traps/16 time: [10.152 us 10.840 us 11.545 us]
thrpt: [1.3858 Melem/s 1.4760 Melem/s 1.5761 Melem/s]
change:
time: [-97.042% -96.823% -96.577%] (p = 0.00 < 0.05)
thrpt: [+2821.5% +3048.1% +3281.1%]
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
many-modules-registered-traps/1
time: [2.6278 us 2.6361 us 2.6447 us]
thrpt: [378.11 Kelem/s 379.35 Kelem/s 380.55 Kelem/s]
change:
time: [-85.311% -85.108% -84.909%] (p = 0.00 < 0.05)
thrpt: [+562.65% +571.51% +580.76%]
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
3 (3.00%) high mild
6 (6.00%) high severe
many-modules-registered-traps/8
time: [2.6294 us 2.6460 us 2.6623 us]
thrpt: [3.0049 Melem/s 3.0235 Melem/s 3.0425 Melem/s]
change:
time: [-85.895% -85.485% -85.022%] (p = 0.00 < 0.05)
thrpt: [+567.63% +588.95% +608.95%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) high mild
5 (5.00%) high severe
many-modules-registered-traps/64
time: [2.6218 us 2.6329 us 2.6452 us]
thrpt: [24.195 Melem/s 24.308 Melem/s 24.411 Melem/s]
change:
time: [-93.629% -93.551% -93.470%] (p = 0.00 < 0.05)
thrpt: [+1431.4% +1450.6% +1469.5%]
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
many-modules-registered-traps/512
time: [2.6569 us 2.6737 us 2.6923 us]
thrpt: [190.17 Melem/s 191.50 Melem/s 192.71 Melem/s]
change:
time: [-99.277% -99.268% -99.260%] (p = 0.00 < 0.05)
thrpt: [+13417% +13566% +13731%]
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild
many-modules-registered-traps/4096
time: [2.7258 us 2.7390 us 2.7535 us]
thrpt: [1.4876 Gelem/s 1.4955 Gelem/s 1.5027 Gelem/s]
change:
time: [-99.956% -99.955% -99.955%] (p = 0.00 < 0.05)
thrpt: [+221417% +223380% +224881%]
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
many-stack-frames-traps/1
time: [1.4658 us 1.4719 us 1.4784 us]
thrpt: [676.39 Kelem/s 679.38 Kelem/s 682.21 Kelem/s]
change:
time: [-90.368% -89.947% -89.586%] (p = 0.00 < 0.05)
thrpt: [+860.23% +894.72% +938.21%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
many-stack-frames-traps/8
time: [2.4772 us 2.4870 us 2.4973 us]
thrpt: [3.2034 Melem/s 3.2167 Melem/s 3.2294 Melem/s]
change:
time: [-85.550% -85.370% -85.199%] (p = 0.00 < 0.05)
thrpt: [+575.65% +583.51% +592.03%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
many-stack-frames-traps/64
time: [10.109 us 10.171 us 10.236 us]
thrpt: [6.2525 Melem/s 6.2925 Melem/s 6.3309 Melem/s]
change:
time: [-78.144% -77.797% -77.336%] (p = 0.00 < 0.05)
thrpt: [+341.22% +350.38% +357.55%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
5 (5.00%) high mild
2 (2.00%) high severe
many-stack-frames-traps/512
time: [126.16 us 126.54 us 126.96 us]
thrpt: [4.0329 Melem/s 4.0461 Melem/s 4.0583 Melem/s]
change:
time: [-65.364% -64.933% -64.453%] (p = 0.00 < 0.05)
thrpt: [+181.32% +185.17% +188.71%]
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high severe
```
</details>
<h3>Calls</h3>
There is, however, a small regression in raw Wasm-to-host and host-to-Wasm call
performance due to the new trampolines. It seems to be on the order of about 2-10
nanoseconds per call, depending on the benchmark.
I believe this regression is ultimately acceptable because
1. this overhead will be vastly dominated by whatever work a non-nop callee
actually does,
2. we will need these trampolines, or something like them, when implementing the
Wasm exceptions proposal to do things like translate Wasm's exceptions into
Rust's `Result`s,
3. and because the performance improvements to trapping and capturing stack
traces are of a much larger magnitude than these call regressions.
<details>
```
sync/no-hook/host-to-wasm - typed - nop
time: [28.683 ns 28.757 ns 28.844 ns]
change: [+16.472% +17.183% +17.904%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low mild
4 (4.00%) high mild
5 (5.00%) high severe
sync/no-hook/host-to-wasm - untyped - nop
time: [42.515 ns 42.652 ns 42.841 ns]
change: [+12.371% +14.614% +17.462%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) high mild
10 (10.00%) high severe
sync/no-hook/host-to-wasm - unchecked - nop
time: [33.936 ns 34.052 ns 34.179 ns]
change: [+25.478% +26.938% +28.369%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
7 (7.00%) high mild
2 (2.00%) high severe
sync/no-hook/host-to-wasm - typed - nop-params-and-results
time: [34.290 ns 34.388 ns 34.502 ns]
change: [+40.802% +42.706% +44.526%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
5 (5.00%) high mild
8 (8.00%) high severe
sync/no-hook/host-to-wasm - untyped - nop-params-and-results
time: [62.546 ns 62.721 ns 62.919 ns]
change: [+2.5014% +3.6319% +4.8078%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
2 (2.00%) high mild
10 (10.00%) high severe
sync/no-hook/host-to-wasm - unchecked - nop-params-and-results
time: [42.609 ns 42.710 ns 42.831 ns]
change: [+20.966% +22.282% +23.475%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
4 (4.00%) high mild
7 (7.00%) high severe
sync/hook-sync/host-to-wasm - typed - nop
time: [29.546 ns 29.675 ns 29.818 ns]
change: [+20.693% +21.794% +22.836%] (p = 0.00 < 0.05)
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
sync/hook-sync/host-to-wasm - untyped - nop
time: [45.448 ns 45.699 ns 45.961 ns]
change: [+17.204% +18.514% +19.590%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
4 (4.00%) high mild
10 (10.00%) high severe
sync/hook-sync/host-to-wasm - unchecked - nop
time: [34.334 ns 34.437 ns 34.558 ns]
change: [+23.225% +24.477% +25.886%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) high mild
7 (7.00%) high severe
sync/hook-sync/host-to-wasm - typed - nop-params-and-results
time: [36.594 ns 36.763 ns 36.974 ns]
change: [+41.967% +47.261% +52.086%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
3 (3.00%) high mild
9 (9.00%) high severe
sync/hook-sync/host-to-wasm - untyped - nop-params-and-results
time: [63.541 ns 63.831 ns 64.194 ns]
change: [-4.4337% -0.6855% +2.7134%] (p = 0.73 > 0.05)
No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) high mild
2 (2.00%) high severe
sync/hook-sync/host-to-wasm - unchecked - nop-params-and-results
time: [43.968 ns 44.169 ns 44.437 ns]
change: [+18.772% +21.802% +24.623%] (p = 0.00 < 0.05)
Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
3 (3.00%) high mild
12 (12.00%) high severe
async/no-hook/host-to-wasm - typed - nop
time: [4.9612 us 4.9743 us 4.9889 us]
change: [+9.9493% +11.911% +13.502%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
6 (6.00%) high mild
4 (4.00%) high severe
async/no-hook/host-to-wasm - untyped - nop
time: [5.0030 us 5.0211 us 5.0439 us]
change: [+10.841% +11.873% +12.977%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
3 (3.00%) high mild
7 (7.00%) high severe
async/no-hook/host-to-wasm - typed - nop-params-and-results
time: [4.9273 us 4.9468 us 4.9700 us]
change: [+4.7381% +6.8445% +8.8238%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
5 (5.00%) high mild
9 (9.00%) high severe
async/no-hook/host-to-wasm - untyped - nop-params-and-results
time: [5.1151 us 5.1338 us 5.1555 us]
change: [+9.5335% +11.290% +13.044%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
3 (3.00%) high mild
13 (13.00%) high severe
async/hook-sync/host-to-wasm - typed - nop
time: [4.9330 us 4.9394 us 4.9467 us]
change: [+10.046% +11.038% +12.035%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) high mild
7 (7.00%) high severe
async/hook-sync/host-to-wasm - untyped - nop
time: [5.0073 us 5.0183 us 5.0310 us]
change: [+9.3828% +10.565% +11.752%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) high mild
5 (5.00%) high severe
async/hook-sync/host-to-wasm - typed - nop-params-and-results
time: [4.9610 us 4.9839 us 5.0097 us]
change: [+9.0857% +11.513% +14.359%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
7 (7.00%) high mild
6 (6.00%) high severe
async/hook-sync/host-to-wasm - untyped - nop-params-and-results
time: [5.0995 us 5.1272 us 5.1617 us]
change: [+9.3600% +11.506% +13.809%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
6 (6.00%) high mild
4 (4.00%) high severe
async-pool/no-hook/host-to-wasm - typed - nop
time: [2.4242 us 2.4316 us 2.4396 us]
change: [+7.8756% +8.8803% +9.8346%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
async-pool/no-hook/host-to-wasm - untyped - nop
time: [2.5102 us 2.5155 us 2.5210 us]
change: [+12.130% +13.194% +14.270%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
4 (4.00%) high mild
8 (8.00%) high severe
async-pool/no-hook/host-to-wasm - typed - nop-params-and-results
time: [2.4203 us 2.4310 us 2.4440 us]
change: [+4.0380% +6.3623% +8.7534%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
5 (5.00%) high mild
9 (9.00%) high severe
async-pool/no-hook/host-to-wasm - untyped - nop-params-and-results
time: [2.5501 us 2.5593 us 2.5700 us]
change: [+8.8802% +10.976% +12.937%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
5 (5.00%) high mild
11 (11.00%) high severe
async-pool/hook-sync/host-to-wasm - typed - nop
time: [2.4135 us 2.4190 us 2.4254 us]
change: [+8.3640% +9.3774% +10.435%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
6 (6.00%) high mild
5 (5.00%) high severe
async-pool/hook-sync/host-to-wasm - untyped - nop
time: [2.5172 us 2.5248 us 2.5357 us]
change: [+11.543% +12.750% +13.982%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) high mild
7 (7.00%) high severe
async-pool/hook-sync/host-to-wasm - typed - nop-params-and-results
time: [2.4214 us 2.4353 us 2.4532 us]
change: [+1.5158% +5.0872% +8.6765%] (p = 0.00 < 0.05)
Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
2 (2.00%) high mild
13 (13.00%) high severe
async-pool/hook-sync/host-to-wasm - untyped - nop-params-and-results
time: [2.5499 us 2.5607 us 2.5748 us]
change: [+10.146% +12.459% +14.919%] (p = 0.00 < 0.05)
Performance has regressed.
Found 18 outliers among 100 measurements (18.00%)
3 (3.00%) high mild
15 (15.00%) high severe
sync/no-hook/wasm-to-host - nop - typed
time: [6.6135 ns 6.6288 ns 6.6452 ns]
change: [+37.927% +38.837% +39.869%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) high mild
5 (5.00%) high severe
sync/no-hook/wasm-to-host - nop-params-and-results - typed
time: [15.930 ns 15.993 ns 16.067 ns]
change: [+3.9583% +5.6286% +7.2430%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
11 (11.00%) high mild
1 (1.00%) high severe
sync/no-hook/wasm-to-host - nop - untyped
time: [20.596 ns 20.640 ns 20.690 ns]
change: [+4.3293% +5.2047% +6.0935%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
5 (5.00%) high mild
5 (5.00%) high severe
sync/no-hook/wasm-to-host - nop-params-and-results - untyped
time: [42.659 ns 42.882 ns 43.159 ns]
change: [-2.1466% -0.5079% +1.2554%] (p = 0.58 > 0.05)
No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
1 (1.00%) high mild
14 (14.00%) high severe
sync/no-hook/wasm-to-host - nop - unchecked
time: [10.671 ns 10.691 ns 10.713 ns]
change: [+83.911% +87.620% +92.062%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) high mild
7 (7.00%) high severe
sync/no-hook/wasm-to-host - nop-params-and-results - unchecked
time: [11.136 ns 11.190 ns 11.263 ns]
change: [-29.719% -28.446% -27.029%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
4 (4.00%) high mild
10 (10.00%) high severe
sync/hook-sync/wasm-to-host - nop - typed
time: [6.7964 ns 6.8087 ns 6.8226 ns]
change: [+21.531% +24.206% +27.331%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
4 (4.00%) high mild
10 (10.00%) high severe
sync/hook-sync/wasm-to-host - nop-params-and-results - typed
time: [15.865 ns 15.921 ns 15.985 ns]
change: [+4.8466% +6.3330% +7.8317%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
3 (3.00%) high mild
13 (13.00%) high severe
sync/hook-sync/wasm-to-host - nop - untyped
time: [21.505 ns 21.587 ns 21.677 ns]
change: [+8.0908% +9.1943% +10.254%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
sync/hook-sync/wasm-to-host - nop-params-and-results - untyped
time: [44.018 ns 44.128 ns 44.261 ns]
change: [-1.4671% -0.0458% +1.2443%] (p = 0.94 > 0.05)
No change in performance detected.
Found 14 outliers among 100 measurements (14.00%)
5 (5.00%) high mild
9 (9.00%) high severe
sync/hook-sync/wasm-to-host - nop - unchecked
time: [11.264 ns 11.326 ns 11.387 ns]
change: [+80.225% +81.659% +83.068%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe
sync/hook-sync/wasm-to-host - nop-params-and-results - unchecked
time: [11.816 ns 11.865 ns 11.920 ns]
change: [-29.152% -28.040% -26.957%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
8 (8.00%) high mild
6 (6.00%) high severe
async/no-hook/wasm-to-host - nop - typed
time: [6.6221 ns 6.6385 ns 6.6569 ns]
change: [+43.618% +44.755% +45.965%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
6 (6.00%) high mild
7 (7.00%) high severe
async/no-hook/wasm-to-host - nop-params-and-results - typed
time: [15.884 ns 15.929 ns 15.983 ns]
change: [+3.5987% +5.2053% +6.7846%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
3 (3.00%) high mild
13 (13.00%) high severe
async/no-hook/wasm-to-host - nop - untyped
time: [20.615 ns 20.702 ns 20.821 ns]
change: [+6.9799% +8.1212% +9.2819%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
2 (2.00%) high mild
8 (8.00%) high severe
async/no-hook/wasm-to-host - nop-params-and-results - untyped
time: [41.956 ns 42.207 ns 42.521 ns]
change: [-4.3057% -2.7730% -1.2428%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
3 (3.00%) high mild
11 (11.00%) high severe
async/no-hook/wasm-to-host - nop - unchecked
time: [10.440 ns 10.474 ns 10.513 ns]
change: [+83.959% +85.826% +87.541%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe
async/no-hook/wasm-to-host - nop-params-and-results - unchecked
time: [11.476 ns 11.512 ns 11.554 ns]
change: [-29.857% -28.383% -26.978%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low mild
6 (6.00%) high mild
5 (5.00%) high severe
async/no-hook/wasm-to-host - nop - async-typed
time: [26.427 ns 26.478 ns 26.532 ns]
change: [+6.5730% +7.4676% +8.3983%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) high mild
7 (7.00%) high severe
async/no-hook/wasm-to-host - nop-params-and-results - async-typed
time: [28.557 ns 28.693 ns 28.880 ns]
change: [+1.9099% +3.7332% +5.9731%] (p = 0.00 < 0.05)
Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
1 (1.00%) high mild
14 (14.00%) high severe
async/hook-sync/wasm-to-host - nop - typed
time: [6.7488 ns 6.7630 ns 6.7784 ns]
change: [+19.935% +22.080% +23.683%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
4 (4.00%) high mild
5 (5.00%) high severe
async/hook-sync/wasm-to-host - nop-params-and-results - typed
time: [15.928 ns 16.031 ns 16.149 ns]
change: [+5.5188% +6.9567% +8.3839%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
9 (9.00%) high mild
2 (2.00%) high severe
async/hook-sync/wasm-to-host - nop - untyped
time: [21.930 ns 22.114 ns 22.296 ns]
change: [+4.6674% +7.7588% +10.375%] (p = 0.00 < 0.05)
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
async/hook-sync/wasm-to-host - nop-params-and-results - untyped
time: [42.684 ns 42.858 ns 43.081 ns]
change: [-5.2957% -3.4693% -1.6217%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
2 (2.00%) high mild
12 (12.00%) high severe
async/hook-sync/wasm-to-host - nop - unchecked
time: [11.026 ns 11.053 ns 11.086 ns]
change: [+70.751% +72.378% +73.961%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
5 (5.00%) high mild
5 (5.00%) high severe
async/hook-sync/wasm-to-host - nop-params-and-results - unchecked
time: [11.840 ns 11.900 ns 11.982 ns]
change: [-27.977% -26.584% -24.887%] (p = 0.00 < 0.05)
Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
3 (3.00%) high mild
15 (15.00%) high severe
async/hook-sync/wasm-to-host - nop - async-typed
time: [27.601 ns 27.709 ns 27.882 ns]
change: [+8.1781% +9.1102% +10.030%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
2 (2.00%) low mild
3 (3.00%) high mild
6 (6.00%) high severe
async/hook-sync/wasm-to-host - nop-params-and-results - async-typed
time: [28.955 ns 29.174 ns 29.413 ns]
change: [+1.1226% +3.0366% +5.1126%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
7 (7.00%) high mild
6 (6.00%) high severe
async-pool/no-hook/wasm-to-host - nop - typed
time: [6.5626 ns 6.5733 ns 6.5851 ns]
change: [+40.561% +42.307% +44.514%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) high mild
4 (4.00%) high severe
async-pool/no-hook/wasm-to-host - nop-params-and-results - typed
time: [15.820 ns 15.886 ns 15.969 ns]
change: [+4.1044% +5.7928% +7.7122%] (p = 0.00 < 0.05)
Performance has regressed.
Found 17 outliers among 100 measurements (17.00%)
4 (4.00%) high mild
13 (13.00%) high severe
async-pool/no-hook/wasm-to-host - nop - untyped
time: [20.481 ns 20.521 ns 20.566 ns]
change: [+6.7962% +7.6950% +8.7612%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
6 (6.00%) high mild
5 (5.00%) high severe
async-pool/no-hook/wasm-to-host - nop-params-and-results - untyped
time: [41.834 ns 41.998 ns 42.189 ns]
change: [-3.8185% -2.2687% -0.7541%] (p = 0.01 < 0.05)
Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
3 (3.00%) high mild
10 (10.00%) high severe
async-pool/no-hook/wasm-to-host - nop - unchecked
time: [10.353 ns 10.380 ns 10.414 ns]
change: [+82.042% +84.591% +87.205%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe
async-pool/no-hook/wasm-to-host - nop-params-and-results - unchecked
time: [11.123 ns 11.168 ns 11.228 ns]
change: [-30.813% -29.285% -27.874%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
11 (11.00%) high mild
1 (1.00%) high severe
async-pool/no-hook/wasm-to-host - nop - async-typed
time: [27.442 ns 27.528 ns 27.638 ns]
change: [+7.5215% +9.9795% +12.266%] (p = 0.00 < 0.05)
Performance has regressed.
Found 18 outliers among 100 measurements (18.00%)
3 (3.00%) high mild
15 (15.00%) high severe
async-pool/no-hook/wasm-to-host - nop-params-and-results - async-typed
time: [29.014 ns 29.148 ns 29.312 ns]
change: [+2.0227% +3.4722% +4.9047%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
6 (6.00%) high mild
1 (1.00%) high severe
async-pool/hook-sync/wasm-to-host - nop - typed
time: [6.7916 ns 6.8116 ns 6.8325 ns]
change: [+20.937% +22.050% +23.281%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe
async-pool/hook-sync/wasm-to-host - nop-params-and-results - typed
time: [15.917 ns 15.975 ns 16.051 ns]
change: [+4.6404% +6.4217% +8.3075%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
5 (5.00%) high mild
11 (11.00%) high severe
async-pool/hook-sync/wasm-to-host - nop - untyped
time: [21.558 ns 21.612 ns 21.679 ns]
change: [+8.1158% +9.1409% +10.217%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) high mild
7 (7.00%) high severe
async-pool/hook-sync/wasm-to-host - nop-params-and-results - untyped
time: [42.475 ns 42.614 ns 42.775 ns]
change: [-6.3613% -4.4709% -2.7647%] (p = 0.00 < 0.05)
Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
3 (3.00%) high mild
15 (15.00%) high severe
async-pool/hook-sync/wasm-to-host - nop - unchecked
time: [11.150 ns 11.195 ns 11.247 ns]
change: [+74.424% +77.056% +79.811%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
3 (3.00%) high mild
11 (11.00%) high severe
async-pool/hook-sync/wasm-to-host - nop-params-and-results - unchecked
time: [11.639 ns 11.695 ns 11.760 ns]
change: [-30.212% -29.023% -27.954%] (p = 0.00 < 0.05)
Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
7 (7.00%) high mild
8 (8.00%) high severe
async-pool/hook-sync/wasm-to-host - nop - async-typed
time: [27.480 ns 27.712 ns 27.984 ns]
change: [+2.9764% +6.5061% +9.8914%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) high mild
2 (2.00%) high severe
async-pool/hook-sync/wasm-to-host - nop-params-and-results - async-typed
time: [29.218 ns 29.380 ns 29.600 ns]
change: [+5.2283% +7.7247% +10.822%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
2 (2.00%) high mild
14 (14.00%) high severe
```
</details>
* Add s390x support for frame pointer-based stack walking
* wasmtime: Allow `Caller::get_export` to get all exports
* fuzzing: Add a fuzz target to check that our stack traces are correct
We generate Wasm modules that keep track of their own stack as they call and
return between functions, and then we periodically check that if the host
captures a backtrace, it matches what the Wasm module has recorded.
* Remove VM offsets for `VMHostFuncContext` since it isn't used by JIT code
* Add doc comment with stack walking implementation notes
* Document the extra state that can be passed to `wasmtime_runtime::Backtrace` methods
* Add extensive comments for stack walking function
* Factor architecture-specific bits of stack walking out into modules
* Initialize store-related fields in a vmctx to null when there is no store yet
Rather than leaving them as uninitialized data.
* Use `set_callee` instead of manually setting the vmctx field
* Use a more informative compile error message for unsupported architectures
* Document unsafety of `prepare_host_to_wasm_trampoline`
* Use `bti c` instead of `hint #34` in inline aarch64 assembly
* Remove outdated TODO comment
* Remove setting of `last_wasm_exit_fp` in `set_jit_trap`
This is no longer needed as the value is plumbed through to the backtrace code
directly now.
* Only set the stack limit once, in the face of re-entrancy into Wasm
* Add comments for s390x-specific stack walking bits
* Use the helper macro for all libcalls
If we forget to use it, and then trigger a GC from the libcall, that means we
could miss stack frames when walking the stack, fail to find live GC refs, and
then get use-after-free bugs. It's much less risky to always use the helper macro
that takes care of all of that for us.
* Use the `asm_sym!` macro in Wasm-to-libcall trampolines
This macro handles the macOS-specific underscore prefix stuff for us.
* wasmtime: add size and align to `externref` assertion error message
* Extend the `stacks` fuzzer to have host frames in between Wasm frames
This way we get one or more contiguous sequences of Wasm frames on the stack,
instead of exactly one.
* Add documentation for aarch64-specific backtrace helpers
* Clarify that we only support little-endian aarch64 in trampoline comment
* Use `.machine z13` in s390x assembly file
Since apparently our CI machines have pretty old assemblers that don't have
`.machine z14`. This should be fine though since these trampolines don't make
use of anything that is introduced in z14.
* Fix aarch64 build
* Fix macOS build
* Document the `asm_sym!` macro
* Add windows support to the `wasmtime-asm-macros` crate
* Add windows support to host<--->Wasm trampolines
* Fix trap handler build on windows
* Run `rustfmt` on s390x trampoline source file
* Temporarily disable some assertions about a trap's backtrace in the component model tests
Follow up to re-enable this and fix the associated issue:
https://github.com/bytecodealliance/wasmtime/issues/4535
* Refactor libcall definitions with less macros
This refactors the `libcall!` macro to use the
`foreach_builtin_function!` macro to define all of the trampolines.
Additionally, the macro surrounding each libcall itself is no longer
necessary, which helps avoid having too many macros.
* Use `VMOpaqueContext::from_vm_host_func_context` in `VMHostFuncContext::new`
* Move `backtrace` module to be submodule of `traphandlers`
This avoids making some things `pub(crate)` in `traphandlers` that really
shouldn't be.
* Fix macOS aarch64 build
* Use "i64" instead of "word" in aarch64-specific file
* Save/restore entry SP and exit FP/return pointer in the face of panicking imported host functions
Also clean up assertions surrounding our saved entry/exit registers.
* Put "typed" vs "untyped" in the same position of call benchmark names
Regardless if we are doing wasm-to-host or host-to-wasm
* Fix stacks test case generator build for new `wasm-encoder`
* Fix build for s390x
* Expand libcalls in s390x asm
* Disable more parts of component tests now that backtrace assertions are a bit tighter
* Remove assertion that can maybe fail on s390x
Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Co-authored-by: Alex Crichton <alex@alexcrichton.com>

a8ce7f123b
fuzz: add a single instruction module generator (#4409)
* fuzz: add a single instruction module generator. As proposed by @cfallin in #3251, this change adds a way to generate a Wasm module for a single instruction (a toy sketch follows this list). It captures the necessary parameter and result types so that fuzzing can not only choose which instruction to check but also generate values to pass to the instruction. Not all instructions are available yet, but a significant portion of scalar instructions are implemented in this change. This does not wire the generator up to any fuzz targets.
* review: use raw string in test
* review: remove once_cell, use slices
* review: refactor macros to use valtype!
* review: avoid cloning when choosing a SingleInstModule
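A toy sketch of the idea with made-up names; it only shows how an instruction plus its parameter and result types can be wrapped into a complete module:

```rust
/// A single instruction under test, with the types it consumes and produces.
struct SingleInstModule {
    instruction: &'static str,
    parameters: &'static [&'static str],
    results: &'static [&'static str],
}

impl SingleInstModule {
    /// Wrap the instruction in a module that exports one "test" function.
    fn to_wat(&self) -> String {
        let gets = (0..self.parameters.len())
            .map(|i| format!("(local.get {i})"))
            .collect::<Vec<_>>()
            .join(" ");
        format!(
            "(module (func (export \"test\") (param {}) (result {}) {} {}))",
            self.parameters.join(" "),
            self.results.join(" "),
            gets,
            self.instruction,
        )
    }
}

fn main() {
    let module = SingleInstModule {
        instruction: "i32.add",
        parameters: &["i32", "i32"],
        results: &["i32"],
    };
    // Fuzzing can now pick an instruction *and* generate arguments for it.
    println!("{}", module.to_wat());
}
```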

c227063ff6
fuzz: refactor fuzz generators (#4404)
Previously, much of the logic for generating the various objects needed for fuzzing was concentrated primarily in `generators.rs`. In trying to piece together what code does what, the size of the file and the light documentation make it hard to discern what each part does. Since several generator structures had been split out as separate modules in the `generators/` directory, this change takes that refactoring further by moving the structures in `generators.rs` to their own modules. No logic changes were made, only the addition of documentation in a few places.

52ad76ed7c
Update differential fuzzing configuration (#4386)
* Update differential fuzzing configuration. This uses some new features of `wasm-smith` and additionally tweaks the existing fuzz configuration:
  * More than one function is now allowed to be generated. There's no particular reason to limit differential execution to just one, and we may want to explore other interesting module shapes.
  * More than one function type is now allowed, to possibly allow more interesting `block` types.
  * Memories are now allowed to grow beyond one page, but still stay small by staying under 10 pages.
  * Tables are now always limited in their growth to ensure consistent behavior across engines (e.g. with the pooling allocator vs v8).
  * The `export_everything` feature is used instead of specifying a min/max number of exports. The `wasmi` differential fuzzer was updated to still work if memory is exported; the v8 differential fuzzer already worked if a function was exported but a memory wasn't. Both fuzzers continue to execute only the first exported function.

  Also notable from this update is that the `SwarmConfig` from `wasm-smith` will now include an arbitrary `allowed_instructions` configuration, which may help explore the space of interesting modules more effectively.
* Fix typos

c8414cfca8
Fix guard size configuration when fuzzing (#4321)
Fuzzers weren't updated to account for #4262 where guard sizes are now validated rather than automatically sanitized. I'm not sure why oss-fuzz hasn't filed a bug about this yet because it's definitely crashing a lot on oss-fuzz...

22e13fee1d
fuzz: allow generating shared memories (#4266)
`wasm-smith` v0.11 has support for generating shared memories when the `threads_enabled` configuration flag is set. This change turns on that flag occasionally. This also upgrades `wasm-smith` to v0.11.1 to always generate shared memory with a known maximum.

258dc9de42
fix(wasmtime): Config methods should be idempotent (#4252)
This commit refactors `Config` to use a separate `CompilerConfig` field instead of operating on `CompilerBuilder` directly, making all of its methods idempotent. Fixes #4189

bffce37050
make backtrace collection a Config field rather than a cargo feature (#4183)
* sorta working in runtime
* wasmtime-runtime: get rid of wasm-backtrace feature
* wasmtime: factor to make backtraces recording optional. not configurable yet
* get rid of wasm-backtrace features
* trap tests: now a Trap optionally contains backtrace
* eliminate wasm-backtrace feature
* code review fixes
* ci: no more wasm-backtrace feature
* c_api: backtraces always enabled
* config: unwind required by backtraces and ref types
* plumbed
* test that disabling backtraces works
* code review comments
* fuzzing generator: wasm_backtrace is a runtime config now
* doc fix

6cf4c95585
Ensure simd is enabled for spectest fuzzing (#4172)
This is required now that the simd specification has been merged into the upstream specification, so to run the spec tests this must always be enabled instead of being left to the whims of the fuzzer about whether to enable it or not.

a1e4b4b521
Enable AArch64 processor feature detection unconditionally (#4034)
std::arch::is_aarch64_feature_detected!() is now part of stable Rust, so we can always use it. Copyright (c) 2022, Arm Limited.

76b82910c9
Remove the module linking implementation in Wasmtime (#3958)
* Remove the module linking implementation in Wasmtime. This commit removes the experimental implementation of the module linking WebAssembly proposal from Wasmtime. Module linking is no longer intended for core WebAssembly but is instead incorporated into the component model at this point. This means that very large parts of Wasmtime's implementation of module linking are no longer applicable and would change greatly with an implementation of the component model. The main purpose of this is to remove Wasmtime's reliance on the support for module-linking in `wasmparser` and tooling crates. With this reliance removed we can move over to the `component-model` branch of `wasmparser` and use the updated support for the component model. Additionally, given the trajectory of the component model proposal, the embedding API of Wasmtime will not look like it does today for WebAssembly. For example, the core wasm `Instance` will not change; instead a `Component` is likely to be added. Some more rationale for this is in #3941, but the basic idea is that I feel it's not going to be viable to develop support for the component model on a non-`main` branch of Wasmtime. Additionally, I don't think it's viable, for the same reasons as `wasm-tools`, to support the old module linking proposal and the new component model at the same time. This commit not only deletes the existing module linking implementation but also simplifies some abstractions. For example, module serialization is a bit simpler now that there's only one module. Additionally, instantiation is much simpler since the only initializers we have to deal with are imports and nothing else. Closes #3941
* Fix doc link
* Update comments

8cfb552090
Increase instance size limit when fuzzing (#3943)
In today's installment of the Wondrous Adventures of What Are the Actual Limits on the Pooling Allocator Required to Run the Spec Tests a fuzz bug was found where the instance size wasn't big enough to run `names.wast`. Today's episode is similar to prior episodes where a limit is bumped until the test passes.

fdf063df98
Increase minimum instance threshold for spectests fuzzer (#3935)
Looks like `const.wast` needs a lot of instances

c22033bf93
Delete historical interruptable support in Wasmtime (#3925)
* Delete historical interruptable support in Wasmtime. This commit removes the `Config::interruptable` configuration along with the `InterruptHandle` type from the `wasmtime` crate. The original support for interrupting WebAssembly was added pretty early on in the history of Wasmtime, when there was no other method to prevent an infinite loop from the host. Nowadays, however, there are alternative methods for interruption such as fuel or epoch-based interruption. One of the major downsides of `Config::interruptable` is that even when it's not enabled it forces an atomic swap to happen when entering WebAssembly code. This technically could be a non-atomic swap if the configuration option isn't enabled, but that produces even more branch-y code on entry into WebAssembly, which is already something we try to optimize. Calling into WebAssembly is on the order of dozens of nanoseconds at this time, and an atomic swap, even uncontended, can add up to 5ns on some platforms. The main goal of this PR is to remove this atomic swap on entry into WebAssembly. This is done by removing the `Config::interruptable` field entirely, moving all existing consumers to epochs instead, which are suitable for the same purposes (a usage sketch follows this list). This means that the stack overflow check is no longer entangled with the interruption check, and perhaps one day we could continue to optimize that further as well. Some consequences of this change are:
  * Epochs are now the only method of remote-thread interruption.
  * There are no more Wasmtime traps that produce the `Interrupted` trap code, although we may wish to move future traps to this so I left it in place.
  * The C API support for interrupt handles was also removed and bindings for epoch methods were added.
  * Function-entry checks for interruption are a tiny bit less efficient, since one check is performed for the stack limit and a second for the epoch, as opposed to the `Config::interruptable` style of bundling the stack limit and the interrupt check in one. It's expected, though, that this is unlikely to be measurable.
  * The old `VMInterrupts` structure is renamed to `VMRuntimeLimits`.
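For reference, a minimal sketch of epoch-based interruption as the replacement for the old interrupt handles; this assumes the current `wasmtime` embedding API (with the default `wat` feature) rather than anything specific to this commit:

```rust
use wasmtime::{Config, Engine, Instance, Module, Store};

fn main() -> anyhow::Result<()> {
    let mut config = Config::new();
    config.epoch_interruption(true);
    let engine = Engine::new(&config)?;

    // An infinite loop that previously would have needed an `InterruptHandle`.
    let module = Module::new(&engine, r#"(module (func (export "run") (loop (br 0))))"#)?;
    let mut store = Store::new(&engine, ());
    store.set_epoch_deadline(1); // trap once the engine's epoch advances by one

    // Another thread plays the role of the old interrupt handle.
    let engine2 = engine.clone();
    std::thread::spawn(move || {
        std::thread::sleep(std::time::Duration::from_millis(10));
        engine2.increment_epoch();
    });

    let instance = Instance::new(&mut store, &module, &[])?;
    let run = instance.get_typed_func::<(), ()>(&mut store, "run")?;
    assert!(run.call(&mut store, ()).is_err()); // the loop is interrupted with a trap
    Ok(())
}
```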

44351cc4af
fuzz: Increase more limits when spectest fuzzing (#3926)
When using the pooling instance allocator for running spec tests we have to make sure that the configured limits of the allocator aren't so low as to cause the spec tests to fail due to resource exhaustion issues or similar. This commit adds minimum thresholds for instance size as well as instance count. While here, it also refactors everything to look similar.

f21aa98ccb
Fuzz-code-coverage motivated improvements (#3905)
* fuzz: Fuzz padding between compiled functions. This commit hooks the custom `wasmtime_linkopt_padding_between_functions` Cranelift configuration option up to the fuzz configuration, enabling us to ensure that randomly inserting a moderate amount of padding between functions doesn't tamper with any results.
* fuzz: Fuzz the `Config::generate_address_map` option. This commit adds fuzz configuration where `generate_address_map` is either enabled or disabled, unlike today where it's always enabled for fuzzing.
* Remove unnecessary handling of relocations. This commit removes a number of bits and pieces related to handling relocations in JIT code generated by Wasmtime. None of this is necessary now that the "old backend" has been removed (quite some time ago) and relocations are no longer expected to be in the JIT code at all. Additionally, with the minimum x86_64 features required to run wasm code, it should be expected that no libcalls are required for Wasmtime-based JIT code either.

29298b1f88
fuzz: Raise memory pages limit for spectest fuzzing (#3903)
A recent fuzz failure showed that at least `call.wast` requires a memory larger than 10 pages, so increase the minimum number of pages that can be used for executing spec tests.

dbe797447d
Fix table element limits in spectest fuzzer (#3888)
Ensure that there are enough table elements allowed to execute the spec tests, since some tests have a required minimum.

8861d0cc42
fuzz: Update pooling allocator limits on tables (#3880)
Another instance similar to #3879, where the pooling allocator configuration needs to be updated when doing differential tests to allow for a possible table.

38d0d426f2
fuzz: Bump table limit with spectest fuzzing (#3878)
Spec tests need multiple tables so increase the limits on the pooling allocator when enabled for spec tests to ensure that all the spec tests can run.

15940d071f
Force enable multi-value for spec tests in fuzzing (#3869)
Spec tests require multi-value to be enabled and wasm-smith recently made this a fuzz-input option, so override the fuzz input as we do for other features and force-enable multi-value.

98ef18a22a
Fuzzing against verified fork of spec interpreter (#3843)
* Revert "Remove spec interpreter fuzz target temporarily (#3399)"
This reverts commit

15bb0c6903
Remove the ModuleLimits pooling configuration structure (#3837)
* Remove the `ModuleLimits` pooling configuration structure
This commit is an attempt to improve the usability of the pooling
allocator by removing the need to configure a `ModuleLimits` structure.
Internally this structure has limits on all forms of wasm constructs but
this largely bottoms out in the size of an allocation for an instance in
the instance pooling allocator. Maintaining this list of limits can be
cumbersome as modules may get tweaked over time and there's otherwise no
real reason to limit the number of globals in a module since the main
goal is to limit the memory consumption of a `VMContext` which can be
done with a memory allocation limit rather than fine-tuned control over
each maximum and minimum.
The new approach taken in this commit is to remove `ModuleLimits`. Some
fields, such as `tables`, `table_elements`, `memories`, and
`memory_pages` are moved to `InstanceLimits` since they're still
enforced at runtime. A new field `size` is added to `InstanceLimits`
which indicates, in bytes, the maximum size of the `VMContext`
allocation. If the size of a `VMContext` for a module exceeds this value
then instantiation will fail.
This involved adding a few more checks to `{Table, Memory}::new_static`
to ensure that the minimum size is able to fit in the allocation, since
previously it was validated at module compile time that everything fit, and
that validation no longer happens there (it happens at runtime).
A consequence of this commit is that Wasmtime will have no built-in way
to reject modules at compile time if they'll fail to be instantiated
within a particular pooling allocator configuration. Instead a module must
attempt instantiation to see whether a failure happens (a configuration
sketch follows these notes).
* Fix benchmark compiles
* Fix some doc links
* Fix a panic by ensuring modules have limited tables/memories
* Review comments
* Add back validation at `Module` time that instantiation is possible
This allows for getting an early signal at compile time that a module
will never be instantiable in an engine with matching settings.
* Provide a better error message when sizes are exceeded
Improve the error message when an instance size exceeds the maximum by
providing a breakdown of where the bytes are all going and why the large
size is being requested.
* Try to fix test in qemu
* Flag new test as 64-bit only
Sizes are all specific to 64-bit right now
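A configuration sketch of the shape described above; the field names come from the commit text, but the exact struct and enum definitions here are assumptions about that era's API, not a verified snippet:

```rust
use wasmtime::{Config, InstanceAllocationStrategy, InstanceLimits, PoolingAllocationStrategy};

fn pooling_config() -> Config {
    // Limits formerly split across `ModuleLimits` now live on `InstanceLimits`,
    // with a single `size` cap (in bytes) on each instance's `VMContext`.
    let limits = InstanceLimits {
        count: 1_000,
        size: 1 << 20,
        tables: 1,
        table_elements: 10_000,
        memories: 1,
        memory_pages: 160,
        ..Default::default()
    };
    let mut config = Config::new();
    config.allocation_strategy(InstanceAllocationStrategy::Pooling {
        strategy: PoolingAllocationStrategy::NextAvailable,
        instance_limits: limits,
    });
    config
}
```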

4e26c13bbe
Add basic epoch-interruption config to fuzzing options. (#3844)
Without async fuzzing, we won't be able to test the most interesting aspects of epoch interruption, namely the interrupt/update-deadline/resume flow. However, the "trap on epoch change" behavior works even for synchronous stores, so we can fuzz with this the same way we fuzz with the interrupt flag.

01e567ca05
Downgrade a cpu feature log message (#3842)
It looks like `error!` is printed by default as it's showing up in oss-fuzz logs, so downgrade this to `warn!` to avoid printing while fuzzing.

bbd4a4a500
Enable copy-on-write heap initialization by default (#3825)
* Enable copy-on-write heap initialization by default. This commit enables the `Config::memfd` feature by default now that it's been fuzzed for a few weeks on oss-fuzz, and it will continue to be fuzzed leading up to the next release of Wasmtime in early March. The documentation of the `Config` option has been updated, and a CLI flag was added to disable the feature.
* Remove ubiquitous "memfd" terminology. Switch instead to forms of "memory image" or "cow" or some combination thereof.
* Update new option names

593f8d96aa
Update wasm-{smith,encoder} (#3835)
Ended up being a routine update but seemed good to go ahead and hook up updates. While I was at it I went ahead and hooked up multi-value swarm fuzzing as well now that wasm-smith implements it.

709f7e0c8a
Enable SSE 4.2 unconditionally (#3833)
* Enable SSE 4.2 unconditionally. Fuzzing over the weekend found that `i64x2` comparison operators require `pcmpgtq`, which is an SSE 4.2 instruction. Along the lines of #3816, this commit unconditionally enables and requires SSE 4.2 for compilation and fuzzing. It will no longer be possible to create a compiler for x86_64 with simd enabled if SSE 4.2 is disabled.
* Update comment

43d31c5bf7
memfd: make "dense image" heuristic limit configurable. (#3831)
In #3820 we see an issue with the new heuristics that control use of memfd: it's entirely possible for a reasonable Wasm module produced by a snapshotting system to have a relatively sparse heap (less than 50% filled). A system that avoids memfd because of this would have an undesirable performance reduction on such modules. Ultimately we should try to implement a hybrid scheme where we support outlier/leftover initializers, but for now this PR makes the "always allow dense" limit configurable. This way, embedders that want to ensure that memfd is used can do so, if they have other knowledge about the maximum heap size allowed in their system. (Partially addresses #3820 but let's leave it open to track the hybrid idea)
||
|
|
084452acab |
Fix max memory pages for spectests fuzz target. (#3829)
This commit fixes the spectests fuzz target to set a lower bound on the arbitrary pooling allocator configurations of 10 memory pages so that the limit doesn't interfere with what's required in the spec tests. |
||
|
|
37b0fd482d |
Improve platform compatibility of fuzz test cases (#3824)
In #3800 I added support to consume fuzz input as selection of whether or not target features should be enabled. This was done in a platform-specific manner, however, which means that I can no longer reliably take the fuzz reproducer cases from oss-fuzz and reproduce them locally on an aarch64 machine. This commit fixes this problem by unconditionally pulling bytes from the input for fuzz features, irrespective of the host platform. Features are then discarded if they're not applicable. |
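A sketch of the pattern described (the helper name is illustrative, not the actual fuzzer code): always consume the bytes, then drop requests that do not apply to the current target.

```rust
use arbitrary::{Result, Unstructured};

// Always pull a byte for the feature so the same fuzz input decodes the same
// way on every host, then discard the request if it isn't applicable here.
fn maybe_enable_feature(u: &mut Unstructured<'_>, applicable_on_this_target: bool) -> Result<bool> {
    let requested = u.arbitrary::<bool>()?;
    Ok(requested && applicable_on_this_target)
}
```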
||
|
|
498c592b19 |
Unconditionally enable sse3, ssse3, and sse4.1 when fuzzing (#3814)
* Unconditionally enable sse3, ssse3, and sse4.1 when fuzzing This commit unconditionally enables some x86_64 instructions when fuzzing because the cranelift backend is known to not work if these features are disabled. From discussion on the wasm simd proposal the assumed general baseline for running simd code is SSE4.1 anyway. At this time I haven't added any sort of checks in Wasmtime itself. Wasmtime by default uses the native architecture and when explicitly enabling features this still needs to be explicitly specified. Closes #3809 * Update crates/fuzzing/src/generators.rs Co-authored-by: Andrew Brown <andrew.brown@intel.com> Co-authored-by: Andrew Brown <andrew.brown@intel.com> |
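For illustration, forcing these baselines on an explicitly-configured target looks something like the following (the flag names are Cranelift x86 settings; `cranelift_flag_enable` is `unsafe` because enabling an instruction set the host lacks is unsound, and its exact signature has varied between wasmtime versions):

```rust
use wasmtime::{Config, Engine};

fn main() -> anyhow::Result<()> {
    let mut config = Config::new();
    // SAFETY: only do this on hosts known to support these extensions.
    unsafe {
        config.cranelift_flag_enable("has_sse3");
        config.cranelift_flag_enable("has_ssse3");
        config.cranelift_flag_enable("has_sse41");
        config.cranelift_flag_enable("has_sse42");
    }
    let _engine = Engine::new(&config)?;
    Ok(())
}
```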
||
|
|
6ffcd4ead9 |
Improve stability for fuzz targets. (#3804)
This commit improves the stability of the fuzz targets by ensuring the generated configs and modules are congruent, especially when the pooling allocator is being used. For the `differential` target, this means both configurations must use the same allocation strategy for now as one side generates the module that might not be compatible with another arbitrary config now that we fuzz the pooling allocator. These changes also ensure that constraints put on the config are more consistently applied, especially when using a fuel-based timeout. |
||
|
|
0b4263333b |
Fuzz cranelift cpu flag settings with Wasmtime (#3800)
* Fuzz cranelift cpu flag settings with Wasmtime This commit updates the `Config` fuzz-generator to consume some of the input as configuration settings for codegen flags we pass to cranelift. This should ideally allow for some more coverage where settings are disabled or enabled, finding possible bugs in feature-specific implementations or generic implementations that are rarely used since the feature-specific ones almost always take precedence. The technique used in this commit is to weight selection of codegen settings less frequently than using the native settings. Afterwards each listed feature is individually enabled or disabled depending on the input fuzz data, and if a feature is enabled but the host doesn't actually support it then the fuzz input is rejected with a log message. The goal here is to still have many fuzz inputs accepted but also ensure determinism across hosts. If there's a bug specifically related to enabling a flag then running it on a host without the flag should indicate that the flag isn't supported rather than silently leaving it disabled and reporting the fuzz case as a success. * Use built-in `Unstructured::ratio` method * Tweak macro * Bump arbitrary dep version |
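The `Unstructured::ratio` helper mentioned in the follow-ups is the piece doing the weighting; a minimal sketch (the 1-in-10 weighting and function name are illustrative):

```rust
use arbitrary::{Result, Unstructured};

// Decide whether this fuzz input gets explicit codegen settings at all.
// Roughly 1 in 10 inputs do; the rest keep the native host settings.
fn fuzz_codegen_flags(u: &mut Unstructured<'_>) -> Result<bool> {
    u.ratio(1u32, 10u32)
}
```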
||
|
|
da539255a5 |
Use a much lower memory page limit for pooling allocator fuzzing. (#3795)
This commit makes it such that the pooling allocator will be configured with a much lower upper bound for memory pages, which will greatly reduce the likelihood that the fuzzer memory limits will be hit from having too many memories from too many instances committed. |
||
|
|
41eb225765 |
Add the instance allocation strategy to generated fuzzing configs. (#3780)
* Add the instance allocation strategy to generated fuzzing configs. This commit adds support for generating configs with arbitrary instance allocation strategies. With this, the pooling allocator will be fuzzed as part of the existing fuzz targets. * Refine maximum constants for arbitrary module limits. * Add an `instantiate-many` fuzz target. This commit adds a new `instantiate-many` fuzz target that will attempt to instantiate and terminate modules in an arbitrary order. It generates up to 5 modules, from which a random sequence of instances will be created. The primary beneficiary of this fuzz target is the pooling instance allocator. * Allow no aliasing in generated modules when using the pooling allocator. This commit prevents aliases in the generated modules as they might count against the configured import limits of the pooling allocator. As the existing module linking proposal implementation will eventually be deprecated in favor of the component model proposal, it isn't very important that we test aliases in generated modules with the pooling allocator. * Improve distribution of memory config in fuzzing. The previous commit attempted to provide a 32-bit upper bound to 64-bit arbitrary values, which skewed the distribution heavily in favor of the upper bound. This commit removes the constraint and instead uses arbitrary 32-bit values that are converted to 64-bit values in the `Arbitrary` implementation. |
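The distribution fix in the last point amounts to drawing a `u32` and widening it rather than clamping a `u64`; a sketch (the type and field below are hypothetical, for illustration only):

```rust
use arbitrary::{Arbitrary, Unstructured};

#[derive(Debug)]
struct MemoryLimits {
    // Hypothetical limit, used only to illustrate the technique.
    max_pages: u64,
}

impl<'a> Arbitrary<'a> for MemoryLimits {
    fn arbitrary(u: &mut Unstructured<'a>) -> arbitrary::Result<Self> {
        // Clamping an arbitrary u64 to a 32-bit bound pushes most values onto
        // the bound itself; drawing a u32 and widening keeps things uniform.
        Ok(MemoryLimits { max_pages: u64::from(u.arbitrary::<u32>()?) })
    }
}
```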
||
|
|
027dea549a |
Fuzz using precompiled modules on CI (#3788)
In working on #3787 I see now that our coverage of loading precompiled files specifically is somewhat lacking, so this adds a config option to the fuzzers which, if enabled, will round-trip all compiled modules through the filesystem to test out the mmapped-file case. |
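A rough sketch of the round-trip (the temp path and helper name are arbitrary; `Module::deserialize_file` is `unsafe` because it trusts the bytes on disk):

```rust
use wasmtime::{Engine, Module};

// Write the compiled artifact out and load it back through the mmapped-file
// path, mirroring what the fuzz config option exercises.
fn roundtrip_through_disk(engine: &Engine, module: &Module) -> anyhow::Result<Module> {
    let bytes = module.serialize()?;
    let path = std::env::temp_dir().join("fuzz-roundtrip.cwasm");
    std::fs::write(&path, &bytes)?;
    unsafe { Module::deserialize_file(engine, &path) }
}
```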
||
|
|
5cd97c054d |
Update memfd support with a runtime toggle (#3778)
This commit updates the `memfd` support in Wasmtime to have a runtime toggle as to whether it's used or not. The compile-time feature gating `memfd` support is now also re-enabled by default, but the new runtime switch is still disabled-by-default. Additionally this commit updates our fuzz oracle to turn on/off the memfd flag to re-enable fuzzing with memfd on oss-fuzz. |
||
|
|
2459776424 |
fuzz: Fix infinite loops in table_ops fuzzers
I forgot in the recent refactoring to add back in fuel support to the `table_ops` fuzzer. This commit re-adds the previously existing logic to always use fuel to cancel execution of the table_ops fuzzer. |
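The fuel mechanism being re-added looks roughly like this with the `Store` API of this era (`add_fuel` has since been replaced by `set_fuel` in newer wasmtime; the fuel amount below is arbitrary):

```rust
use wasmtime::{Config, Engine, Store};

fn main() -> anyhow::Result<()> {
    let mut config = Config::new();
    config.consume_fuel(true);
    let engine = Engine::new(&config)?;
    let mut store = Store::new(&engine, ());
    // Once this much fuel is burned, the next fuel check traps, cancelling a
    // table_ops program that would otherwise loop forever.
    store.add_fuel(10_000)?;
    Ok(())
}
```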
||
|
|
ab1d845ac1 |
Refactor fuzzing configuration and sometimes disable debug verifier. (#3664)
* fuzz: Refactor Wasmtime's fuzz targets
A recent fuzz bug found is related to timing out when compiling a
module. This timeout, however, is predominately because Cranelift's
debug verifier is enabled and taking up over half the compilation time.
I wanted to fix this by disabling the verifier when input modules might
have a lot of functions, but this was pretty difficult to implement.
Over time we've grown a number of various fuzzers. Most are
`wasm-smith`-based at this point but there's various entry points for
configuring the wasm-smith module, the wasmtime configuration, etc. I've
historically gotten quite lost in trying to change defaults and feeling
like I have to touch a lot of different places. This is the motivation
for this commit, simplifying fuzzer default configuration.
This commit removes the ability to create a default `Config` for
fuzzing, instead only supporting generating a configuration via
`Arbitrary`. This then involved refactoring all targets and fuzzers to
ensure that configuration is generated through `Arbitrary`. This should
actually expand the coverage of some existing fuzz targets since
`Arbitrary for Config` will tweak options that don't affect runtime,
such as memory configuration or jump veneers.
All existing fuzz targets are refactored to use this new method of
configuration. Some fuzz targets were also shuffled around or
reimplemented:
* `compile` - this now directly calls `Module::new` to skip all the
fuzzing infrastructure. This is mostly done because this fuzz target
isn't too interesting and is largely just seeing what happens when
things are thrown at the wall for Wasmtime.
* `instantiate-maybe-invalid` - this fuzz target now skips instantiation
and instead simply goes into `Module::new` like the `compile` target.
The rationale behind this is that most modules won't instantiate
anyway and this fuzz target is primarily fuzzing the compiler. This
skips having to generate arbitrary configuration since
wasm-smith-generated-modules (or valid ones at least) aren't used
here.
* `instantiate` - this fuzz target was removed. In general this fuzz
target isn't too interesting in isolation. Almost everything it deals
with likely won't pass compilation and is covered by the `compile`
fuzz target, and otherwise interesting modules being instantiated can
all theoretically be created by `wasm-smith` anyway.
* `instantiate-wasm-smith` and `instantiate-swarm` - these were both merged
into a new `instantiate` target (replacing the old one from above).
There wasn't really much need to keep these separate since they really
only differed at this point in methods of timeout. Otherwise we much
more heavily use `SwarmConfig` than wasm-smith's built-in options.
The intention is that we should still have basically the same coverage
of fuzzing as before, if not better because configuration is now
possible on some targets. Additionally there is one centralized point of
configuration for fuzzing for wasmtime, `Arbitrary for ModuleConfig`.
This internally creates an arbitrary `SwarmConfig` from `wasm-smith` and
then further tweaks it for Wasmtime's needs, such as enabling various
wasm proposals by default. In the future enabling a wasm proposal on
fuzzing should largely just be modifying this one trait implementation.
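As a sketch of that one central point of configuration (the wrapper type and the specific proposals toggled are illustrative; the `SwarmConfig` fields are from wasm-smith of this era):

```rust
use arbitrary::{Arbitrary, Unstructured};
use wasm_smith::SwarmConfig;

// Illustrative stand-in for the ModuleConfig described above.
#[derive(Debug)]
struct ModuleConfig {
    config: SwarmConfig,
}

impl<'a> Arbitrary<'a> for ModuleConfig {
    fn arbitrary(u: &mut Unstructured<'a>) -> arbitrary::Result<Self> {
        let mut config = SwarmConfig::arbitrary(u)?;
        // One central place to turn on proposals Wasmtime implements, so
        // enabling a new proposal for fuzzing is a one-line change here.
        config.simd_enabled = true;
        config.multi_value_enabled = true;
        Ok(ModuleConfig { config })
    }
}
```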
* fuzz: Sometimes disable the cranelift debug verifier
This commit disables the cranelift debug verifier if the input wasm
module might be "large", defined here as more than 10 functions.
While fuzzing we disable threads (set them to 1) and enable the
cranelift debug verifier. Coupled with a 20-30x slowdown this means that
a module with the maximum number of functions, 100, gives:
60s / 100 functions / 30x slowdown = 20ms
With only 20 milliseconds per function this is even further halved by
the `differential` fuzz target compiling a module twice, which means
that, when compiling with a normal release mode Wasmtime, if any
function takes more than 10ms to compile then it's a candidate for
timing out while fuzzing. Given that the cranelift debug verifier can
more than double compilation time in fuzzing mode this actually means
that the real time budget for function compilation is more like 4ms.
The `wasm-smith` crate can pretty easily generate a large function that
takes 4ms to compile, and then when that function is multiplied 100x in
the `differential` fuzz target we trivially time out the fuzz target.
The hope of this commit is to buy back half our budget by disabling the
debug verifier for modules that may have many functions. Further
refinements can be implemented in the future such as limiting functions
for just the differential target as well.
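In `Config` terms the heuristic is just a conditional toggle along these lines (the function-count argument is illustrative; `cranelift_debug_verifier` is the real wasmtime knob):

```rust
use wasmtime::Config;

// Only pay for the Cranelift debug verifier on modules small enough that the
// extra compile time fits the fuzz timeout budget (threshold from above: 10).
fn configure_verifier(config: &mut Config, max_funcs_in_module: usize) {
    config.cranelift_debug_verifier(max_funcs_in_module <= 10);
}
```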
* Fix the single-function-module fuzz configuration
* Tweak how features work in differential fuzzing
* Disable everything for baseline differential fuzzing
* Enable selectively for each engine afterwards
* Also forcibly enable reference types and bulk memory for spec tests
* Log wasms when compiling
* Add reference types support to v8 fuzzer
* Fix timeouts via fuel
The default store has "infinite" fuel so that needs to be consumed
before fuel is added back in.
* Remove fuzzing-specific tests
These no longer compile and also haven't been added to in a long time.
Most of the time a reduced form of the original fuzz test case is added
when a fuzz bug is fixed.
|
||
|
|
e08bcd6aad |
Revert "Temporarily disable SIMD fuzzing on CI" (#3555)
This reverts commit 95e8723d0767556f0ddbc9151bce269464852bb1. |
||
|
|
ff1af20479 |
Add a fuzz mode to stress unaligned wasm addresses (#3516)
Alignment on all memory instructions in wasm is currently best-effort and not actually required, meaning that whatever wasm actually uses as an address should work regardless of whether the address is aligned or not. This is theoretically tested in the fuzzers via wasm-smith-generated code, but wasm-smith doesn't today have too too high of a chance of generating an actual successful load/store. This commit adds a new configuration option to the `Config` generator for fuzzing which forces usage of a custom linear memory implementation which is backed by Rust's `Vec<u8>` and forces the base address of linear memory to be off-by-one relative to the base address of the `Vec<u8>` itself. This should theoretically force host addresses to almost always be unaligned, even if wasm addresses are otherwise aligned. The main interesting fuzz coverage here is likely to be in the existing `differential` target which compares running the same module in wasmtime with two different `Config` values to ensure the same results are produced. This probably won't increase coverage all that much in the near future due to wasm-smith rarely generating successful loads/stores, but in the meantime by hooking this up into `Config` it also means that we'll be running in comparison against v8 and also ensuring that all spec tests succeed if misalignment is forced at the hardware level. As a side effect this commit also cleans up the fuzzers slightly: * The `DifferentialConfig` struct is removed and folded into `Config` * The `init_hang_limit` processing is removed since we don't use `-ttf`-generated modules from binaryen any more. * Traps are now asserted to have the same trap code, otherwise differential fuzzing fails. * Some more debug logging was added to the differential fuzzer |
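A simplified illustration of the off-by-one trick (this is not the actual `LinearMemory` trait implementation, just the core idea in plain Rust):

```rust
// Back the "linear memory" with a Vec<u8> but hand out a base pointer offset
// by one byte, so wasm addresses that look aligned end up unaligned on the host.
struct UnalignedMemory {
    storage: Vec<u8>,
}

impl UnalignedMemory {
    fn new(byte_size: usize) -> Self {
        // One spare byte so the shifted base still covers `byte_size` bytes.
        UnalignedMemory { storage: vec![0; byte_size + 1] }
    }

    fn base_ptr(&mut self) -> *mut u8 {
        // SAFETY: the allocation is byte_size + 1 bytes, so base + 1 stays in bounds.
        unsafe { self.storage.as_mut_ptr().add(1) }
    }
}
```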
||
|
|
1ee2af0098 |
Remove the lightbeam backend (#3390)
This commit removes the Lightbeam backend from Wasmtime as per [RFC 14]. This backend hasn't received maintenance in quite some time, and as [RFC 14] indicates this doesn't meet the threshold for keeping the code in-tree, so this commit removes it. A fast "baseline" compiler may still be added in the future. The addition of such a backend should be in line with [RFC 14], though, with the principles we now have for stable releases of Wasmtime. I'll close out Lightbeam-related issues once this is merged. [RFC 14]: https://github.com/bytecodealliance/rfcs/pull/14 |
||
|
|
fc6328ae06 |
Temporarily disable SIMD fuzzing on CI (#3376)
We've got a large crop of fuzz-bugs from fuzzing with enabled-with-SIMD on oss-fuzz but at this point the fuzz stats from oss-fuzz say that the fuzzers like v8 are spending less than 50% of its time actually fuzzing and presumably mostly hitting crashes and such. While we fix the other issues this disables simd for fuzzing with v8 so we can try to see if we can weed out other issues. |
||
|
|
1532516a36 |
Use relative call instructions between wasm functions (#3275)
* Use relative `call` instructions between wasm functions
This commit is a relatively major change to the way that Wasmtime
generates code for Wasm modules and how functions call each other.
Prior to this commit all function calls between functions, even if they
were defined in the same module, were done indirectly through a
register. To implement this the backend would emit an absolute 8-byte
relocation near all function calls, load that address into a register,
and then call it. While this technique is simple to implement and easy
to get right, it has two primary downsides associated with it:
* Function calls are always indirect which means they are more difficult
to predict, resulting in worse performance.
* Generating a relocation-per-function call requires expensive
relocation resolution at module-load time, which can be a large
contributing factor to how long it takes to load a precompiled module.
To fix these issues, while also somewhat compromising on the previously
simple implementation technique, this commit switches wasm calls within
a module to using the `colocated` flag enabled in Cranelift-speak, which
basically means that a relative call instruction is used with a
relocation that's resolved relative to the pc of the call instruction
itself.
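Roughly speaking, the Cranelift-facing change is just flipping this flag when declaring the callee (field and constructor names from `cranelift-codegen` of this era; the namespace/index and signature handle are placeholders):

```rust
use cranelift_codegen::ir::{ExtFuncData, ExternalName, SigRef};

// With `colocated: true` the backend emits a pc-relative call relocation for
// this callee instead of loading an absolute 8-byte address into a register.
fn colocated_callee(func_index: u32, signature: SigRef) -> ExtFuncData {
    ExtFuncData {
        name: ExternalName::user(0, func_index),
        signature,
        colocated: true,
    }
}
```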
When switching the `colocated` flag to `true` this commit is also then
able to move much of the relocation resolution from `wasmtime_jit::link`
into `wasmtime_cranelift::obj` during object-construction time. This
frontloads all relocation work which means that there's actually no
relocations related to function calls in the final image, solving both
of our points above.
The main gotcha in implementing this technique is that there are
hardware limitations to relative function calls which mean we can't
simply blindly use them. AArch64, for example, can only go +/- 64 MB
from the `bl` instruction to the target, which means that if the
function we're calling is a greater distance away then we would fail to
resolve that relocation. On x86_64 the limits are +/- 2GB which are much
larger, but theoretically still feasible to hit. Consequently the main
increase in implementation complexity is fixing this issue.
This issue is actually already present in Cranelift itself, and is
internally one of the invariants handled by the `MachBuffer` type. When
generating a function, relative jumps between basic blocks have similar
restrictions. This commit adds new methods for the `MachBackend` trait
and updates the implementation of `MachBuffer` to account for all these
new branches. Specifically the changes to `MachBuffer` are:
* For AArch64 the `LabelUse::Branch26` value now supports veneers, and
AArch64 calls use this to resolve relocations.
* The `emit_island` function has been rewritten internally to handle
some cases which previously didn't come up before, such as:
* When emitting an island the deadline is now recalculated, where
previously it was always set to infinitely far in the future. This was ok
prior since only a `Branch19` supported veneers and once it was
promoted no veneers were supported, so without multiple layers of
promotion the lack of a new deadline was ok.
* When emitting an island all pending fixups had veneers forced if
their branch target wasn't known yet. This was generally ok for
19-bit fixups since the only kind getting a veneer was a 19-bit
fixup, but with mixed kinds it's a bit odd to force veneers for a
26-bit fixup just because a nearby 19-bit fixup needed a veneer.
Instead fixups are now re-enqueued unless they're known to be
out-of-bounds. This may run the risk of generating more islands for
19-bit branches but it should also reduce the number of islands for
between-function calls.
* Otherwise the internal logic was tweaked to ideally be a bit more
simple, but that's a pretty subjective criteria in compilers...
I've added some simple testing of this for now. A synthetic compiler
option was created to simply add padded 0s between functions, and test
cases implement various forms of calls that at least need veneers. A
test is also included for x86_64, but it is unfortunately pretty slow
because it requires generating 2GB of output. I'm hoping for now it's
not too bad, but we can disable the test if it's prohibitive and
otherwise just comment the necessary portions to be sure to run the
ignored test if these parts of the code have changed.
The final end-result of this commit is that for a large module I'm
working with the number of relocations dropped to zero, meaning that
nothing actually needs to be done to the text section when it's loaded
into memory (yay!). I haven't run final benchmarks yet but this is the
last remaining source of significant slowdown when loading modules,
after I land a number of other PRs both active and ones that I only have
locally for now.
* Fix arm32
* Review comments
|
||
|
|
e68aa99588 |
Implement the memory64 proposal in Wasmtime (#3153)
Implement the memory64 proposal in Wasmtime (#3153) * Implement the memory64 proposal in Wasmtime This commit implements the WebAssembly [memory64 proposal][proposal] in both Wasmtime and Cranelift. In terms of work done Cranelift ended up needing very little work here since most of it was already prepared for 64-bit memories at one point or another. Most of the work in Wasmtime is largely refactoring, changing a bunch of `u32` values to something else. A number of internal and public interfaces are changing as a result of this commit, for example: * Accessors on `wasmtime::Memory` that work with pages now all return `u64` unconditionally rather than `u32`. This makes it possible to accommodate 64-bit memories with this API, but we may also want to consider `usize` here at some point since the host can't grow past `usize`-limited pages anyway. * The `wasmtime::Limits` structure is removed in favor of minimum/maximum methods on table/memory types. * Many libcall intrinsics called by jit code now unconditionally take `u64` arguments instead of `u32`. Return values are `usize`, however, since the return value, if successful, is always bounded by host memory while arguments can come from any guest. * The `heap_addr` clif instruction now takes a 64-bit offset argument instead of a 32-bit one. It turns out that the legalization of `heap_addr` already worked with 64-bit offsets, so this change was fairly trivial to make. * The runtime implementation of mmap-based linear memories has changed to largely work in `usize` quantities in its API and in bytes instead of pages. This simplifies various aspects and reflects that mmap-memories are always bound by `usize` since that's what the host is using to address things, and additionally most calculations care about bytes rather than pages except for the very edge where we're going to/from wasm. Overall I've tried to minimize the number of `as` casts as much as possible, using checked `try_from` and checked arithmetic with either error handling or explicit `unwrap()` calls to tell us about bugs in the future. Most locations have relatively obvious things to do with various implications on various hosts, and I think they should all be roughly of the right shape but time will tell. I mostly relied on the compiler complaining that various types weren't aligned to figure out type-casting, and I manually audited some of the more obvious locations. I suspect we have a number of hidden locations that will panic on 32-bit hosts if 64-bit modules try to run there, but otherwise I think we should be generally ok (famous last words). In any case I wouldn't want to enable this by default naturally until we've fuzzed it for some time. In terms of the actual underlying implementation, no one should expect memory64 to be all that fast. Right now it's implemented with "dynamic" heaps which have a few consequences: * All memory accesses are bounds-checked. I'm not sure how aggressively Cranelift tries to optimize out bounds checks, but I suspect not a ton since we haven't stressed this much historically. * Heaps are always precisely sized. This means that every call to `memory.grow` will incur a `memcpy` of memory from the old heap to the new. We probably want to at least look into `mremap` on Linux and otherwise try to implement schemes where dynamic heaps have some reserved pages to grow into to help amortize the cost of `memory.grow`. The memory64 spec test suite is scheduled to now run on CI, but as with all the other spec test suites it's really not all that comprehensive.
I've tried adding more tests for basic things as I've had to implement guards for them, but I wouldn't really consider the testing adequate from just this PR itself. I did try to take care in one test to actually allocate a 4gb+ heap and then avoid running that in the pooling allocator or in emulation because otherwise that may fail or take excessively long. [proposal]: https://github.com/WebAssembly/memory64/blob/master/proposals/memory64/Overview.md * Fix some tests * More test fixes * Fix wasmtime tests * Fix doctests * Revert to 32-bit immediate offsets in `heap_addr` This commit updates the generation of addresses in wasm code to always use 32-bit offsets for `heap_addr`, and if the calculated offset is bigger than 32-bits we emit a manual add with an overflow check. * Disable memory64 for spectest fuzzing * Fix wrong offset being added to heap addr * More comments! * Clarify bytes/pages |
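As a quick illustration of the user-facing result (the export name is arbitrary; note the page-count accessor returning `u64`):

```rust
use wasmtime::{Config, Engine, Instance, Module, Store};

fn main() -> anyhow::Result<()> {
    let mut config = Config::new();
    config.wasm_memory64(true);
    let engine = Engine::new(&config)?;
    // A 64-bit linear memory with a one-page minimum.
    let module = Module::new(&engine, r#"(module (memory (export "m") i64 1))"#)?;
    let mut store = Store::new(&engine, ());
    let instance = Instance::new(&mut store, &module, &[])?;
    let memory = instance.get_memory(&mut store, "m").unwrap();
    let pages: u64 = memory.size(&store); // page accessors are u64 now
    println!("initial pages: {pages}");
    Ok(())
}
```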
||
|
|
bb85366a3b |
Enable simd fuzzing on oss-fuzz (#3152)
* Enable simd fuzzing on oss-fuzz This commit generally enables the simd feature while fuzzing, which should affect almost all fuzzers. For fuzzers that just throw random data at the wall and see what sticks, this means that they'll now be able to throw simd-shaped data at the wall and have it stick. For wasm-smith-based fuzzers this commit also updates wasm-smith to 0.6.0 which allows further configuring the `SwarmConfig` after generation, notably allowing `instantiate-swarm` to generate modules using simd via `wasm-smith`. This should much more reliably feed simd-related things into the fuzzers. Finally, this commit updates wasmtime to avoid usage of the general `wasm_smith::Module` generator and instead use a Wasmtime-specific custom default configuration which enables various features we have implemented. * Allow dummy table creation to fail Table creation for imports may exceed the memory limit on the store, which we'll want to gracefully recover from and not fail the fuzzers. |