wasmtime

Author	SHA1	Message	Date
Alex Crichton	5add267b87	Fix a soundness issue with lowering variants (#4723 ) * Fix a compile error on nightly Rust It looks like Rust nightly has gotten a bit more strict about attributes-on-expressions and previously accepted code is no longer accepted. This commit updates the generated code for a macro to a form which is accepted by rustc. * Fix a soundness issue with lowering variants This commit fixes a soundness issue lowering variants in the component model where host memory could be leaked to the guest module by accident. In reviewing code recently for `Val::lower` I noticed that the variant lowering was extending the payload with `ValRaw::u32(0)` to appropriately fit the size of the variant. In reading this it appeared incorrect to me due to the fact that it should be `ValRaw::u64(0)` since up to 64-bits can be read. Additionally this implementation was also incorrect because the lowered representation of the payload itself was not possibly zero-extended to 64-bits to accommodate other variants. It turned out these issues were benign because with the dynamic surface area to the component model the arguments were all initialized to 0 anyway. The static version of the API, however, does not initialize arguments to 0 and I wanted to initially align these two implementations so I updated the variant implementation of lowering for dynamic values and removed the zero-ing of arguments. To test this change I updated the `debug` mode of adapter module generation to assert that the upper bits of values in wasm are always zero when the value is casted down (during `stack_get` which only happens with variants). I then threaded through the `debug` boolean configuration parameter into the dynamic and static fuzzers. To my surprise this new assertion tripped even after the fix was applied. It turns out, though, that there was other leakage of bits through other means that I was previously unaware of. At the primitive level lowerings of types like `u32` will have a `Lower` representation of `ValRaw` and the lowering is simply `dst.write(ValRaw::i32(self))`, or the equivalent thereof. The problem, that the fuzzers detected, with this pattern is that the `ValRaw` type is 16-bytes, and `ValRaw::i32(X)` only initializes the first 4. This meant that all the lowerings for all primitives were writing up to 12 bytes of garbage from the host for the wasm module to read. It turned out that this write of a `ValRaw` was sometimes 16 bytes and sometimes the appropriate size depending on the number of optimizations in play. With enough inlining for example `dst.write(ValRaw::i32(self))` would only write 4 bytes, as expected. In debug mode though without inlining 16 bytes would be written, including the garbage from the upper bits. To solve this issue I ended up taking a somewhat different approach. I primarily updated the `ValRaw` constructors to simply always extend the values internally to 64-bits, meaning that the low 8 bytes of a `ValRaw` is always initialized. This prevents any undefined data from leaking from the host into a wasm module, and means that values are also zero-extended even if they're only used in 32-bit contexts outside of a variant. This felt like the best fix for now, though, in terms of not really having a performance impact while additionally not requiring a rewrite of all lowerings. This solution ended up also neatly removing the "zero out the entire payload" logic that was previously require. Now after a payload is lowered only the tail end of the payload, up to the size of the variant, is zeroed out. This means that each lowered argument is written to at most once which should hopefully be a small performance boost for calling into functions as well.	2022-08-16 22:33:24 +00:00
Alex Crichton	650979ae40	Implement strings in adapter modules (#4623 ) * Implement strings in adapter modules This commit is a hefty addition to Wasmtime's support for the component model. This implements the final remaining type (in the current type hierarchy) unimplemented in adapter module trampolines: strings. Strings are the most complicated type to implement in adapter trampolines because they are highly structured chunks of data in memory (according to specific encodings). Additionally each lift/lower operation can choose its own encoding for strings meaning that Wasmtime, the host, may have to convert between any pairwise ordering of string encodings. The `CanonicalABI.md` in the component-model repo in general specifies all the fiddly bits of string encoding so there's not a ton of wiggle room for Wasmtime to get creative. This PR largely "just" implements that. The high-level architecture of this implementation is: * Fused adapters are first identified to determine src/dst string encodings. This statically fixes what transcoding operation is being performed. * The generated adapter will be responsible for managing calls to `realloc` and performing bounds checks. The adapter itself does not perform memory copies or validation of string contents, however. Instead each transcoding operation is modeled as an imported function into the adapter module. This means that the adapter module dynamically, during compile time, determines what string transcoders are needed. Note that an imported transcoder is not only parameterized over the transcoding operation but additionally which memory is the source and which is the destination. * The imported core wasm functions are modeled as a new `CoreDef::Transcoder` structure. These transcoders end up being small Cranelift-compiled trampolines. The Cranelift-compiled trampoline will load the actual base pointer of memory and add it to the relative pointers passed as function arguments. This trampoline then calls a transcoder "libcall" which enters Rust-defined functions for actual transcoding operations. * Each possible transcoding operation is implemented in Rust with a unique name and a unique signature depending on the needs of the transcoder. I've tried to document inline what each transcoder does. This means that the `Module::translate_string` in adapter modules is by far the largest translation method. The main reason for this is due to the management around calling the imported transcoder functions in the face of validating string pointer/lengths and performing the dance of `realloc`-vs-transcode at the right time. I've tried to ensure that each individual case in transcoding is documented well enough to understand what's going on as well. Additionally in this PR is a full implementation in the host for the `latin1+utf16` encoding which means that both lifting and lowering host strings now works with this encoding. Currently the implementation of each transcoder function is likely far from optimal. Where possible I've leaned on the standard library itself and for latin1-related things I'm leaning on the `encoding_rs` crate. I initially tried to implement everything with `encoding_rs` but was unable to uniformly do so easily. For now I settled on trying to get a known-correct (even in the face of endianness) implementation for all of these transcoders. If an when performance becomes an issue it should be possible to implement more optimized versions of each of these transcoding operations. Testing this commit has been somewhat difficult and my general plan, like with the `(list T)` type, is to rely heavily on fuzzing to cover the various cases here. In this PR though I've added a simple test that pushes some statically known strings through all the pairs of encodings between source and destination. I've attempted to pick "interesting" strings that one way or another stress the various paths in each transcoding operation to ideally get full branch coverage there. Additionally a suite of "negative" tests have also been added to ensure that validity of encoding is actually checked. * Fix a temporarily commented out case * Fix wasmtime-runtime tests * Update deny.toml configuration * Add `BSD-3-Clause` for the `encoding_rs` crate * Remove some unused licenses * Add an exemption for `encoding_rs` for now * Split up the `translate_string` method Move out all the closures and package up captured state into smaller lists of arguments. * Test out-of-bounds for zero-length strings	2022-08-08 16:01:57 +00:00
Nick Fitzgerald	70ce288dc7	Save exit Wasm FP and PC in component-to-host trampolines (#4601 ) * Wasmtime: Add a pointer to `VMRuntimeLimits` in component contexts * Save exit Wasm FP and PC in component-to-host trampolines Fixes #4535 * Add comment about why we deref the trampoline's FP * Update some tests to use new `vmruntime_limits_*` methods	2022-08-04 10:27:30 -05:00
Nick Fitzgerald	46782b18c2	`wasmtime`: Implement fast Wasm stack walking (#4431 ) * Always preserve frame pointers in Wasmtime This allows us to efficiently and simply capture Wasm stacks without maintaining and synchronizing any safety-critical side tables between the compiler and the runtime. * wasmtime: Implement fast Wasm stack walking Why do we want Wasm stack walking to be fast? Because we capture stacks whenever there is a trap and traps actually happen fairly frequently with short-lived programs and WASI's `exit`. Previously, we would rely on generating the system unwind info (e.g. `.eh_frame`) and using the system unwinder (via the `backtrace`crate) to walk the full stack and filter out any non-Wasm stack frames. This can, unfortunately, be slow for two primary reasons: 1. The system unwinder is doing `O(all-kinds-of-frames)` work rather than `O(wasm-frames)` work. 2. System unwind info and the system unwinder need to be much more general than a purpose-built stack walker for Wasm needs to be. It has to handle any kind of stack frame that any compiler might emit where as our Wasm frames are emitted by Cranelift and always have frame pointers. This translates into implementation complexity and general overhead. There can also be unnecessary-for-our-use-cases global synchronization and locks involved, further slowing down stack walking in the presence of multiple threads trying to capture stacks in parallel. This commit introduces a purpose-built stack walker for traversing just our Wasm frames. To find all the sequences of Wasm-to-Wasm stack frames, and ignore non-Wasm stack frames, we keep a linked list of `(entry stack pointer, exit frame pointer)` pairs. This linked list is maintained via Wasm-to-host and host-to-Wasm trampolines. Within a sequence of Wasm-to-Wasm calls, we can use frame pointers (which Cranelift preserves) to find the next older Wasm frame on the stack, and we keep doing this until we reach the entry stack pointer, meaning that the next older frame will be a host frame. The trampolines need to avoid a couple stumbling blocks. First, they need to be compiled ahead of time, since we may not have access to a compiler at runtime (e.g. if the `cranelift` feature is disabled) but still want to be able to call functions that have already been compiled and get stack traces for those functions. Usually this means we would compile the appropriate trampolines inside `Module::new` and the compiled module object would hold the trampolines. However, we also need to support calling host functions that are wrapped into `wasmtime::Func`s and there doesn't exist any ahead-of-time compiled module object to hold the appropriate trampolines: ```rust // Define a host function. let func_type = wasmtime::FuncType::new( vec![wasmtime::ValType::I32], vec![wasmtime::ValType::I32], ); let func = Func::new(&mut store, func_type, \|_, params, results\| { // ... Ok(()) }); // Call that host function. let mut results = vec![wasmtime::Val::I32(0)]; func.call(&[wasmtime::Val::I32(0)], &mut results)?; ``` Therefore, we define one host-to-Wasm trampoline and one Wasm-to-host trampoline in assembly that work for all Wasm and host function signatures. These trampolines are careful to only use volatile registers, avoid touching any register that is an argument in the calling convention ABI, and tail call to the target callee function. This allows forwarding any set of arguments and any returns to and from the callee, while also allowing us to maintain our linked list of Wasm stack and frame pointers before transferring control to the callee. These trampolines are not used in Wasm-to-Wasm calls, only when crossing the host-Wasm boundary, so they do not impose overhead on regular calls. (And if using one trampoline for all host-Wasm boundary crossing ever breaks branch prediction enough in the CPU to become any kind of bottleneck, we can do fun things like have multiple copies of the same trampoline and choose a random copy for each function, sharding the functions across branch predictor entries.) Finally, this commit also ends the use of a synthetic `Module` and allocating a stubbed out `VMContext` for host functions. Instead, we define a `VMHostFuncContext` with its own magic value, similar to `VMComponentContext`, specifically for host functions. <h2>Benchmarks</h2> <h3>Traps and Stack Traces</h3> Large improvements to taking stack traces on traps, ranging from shaving off 64% to 99.95% of the time it used to take. <details> ``` multi-threaded-traps/0 time: [2.5686 us 2.5808 us 2.5934 us] thrpt: [0.0000 elem/s 0.0000 elem/s 0.0000 elem/s] change: time: [-85.419% -85.153% -84.869%] (p = 0.00 < 0.05) thrpt: [+560.90% +573.56% +585.84%] Performance has improved. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe multi-threaded-traps/1 time: [2.9021 us 2.9167 us 2.9322 us] thrpt: [341.04 Kelem/s 342.86 Kelem/s 344.58 Kelem/s] change: time: [-91.455% -91.294% -91.096%] (p = 0.00 < 0.05) thrpt: [+1023.1% +1048.6% +1070.3%] Performance has improved. Found 6 outliers among 100 measurements (6.00%) 1 (1.00%) high mild 5 (5.00%) high severe multi-threaded-traps/2 time: [2.9996 us 3.0145 us 3.0295 us] thrpt: [660.18 Kelem/s 663.47 Kelem/s 666.76 Kelem/s] change: time: [-94.040% -93.910% -93.762%] (p = 0.00 < 0.05) thrpt: [+1503.1% +1542.0% +1578.0%] Performance has improved. Found 5 outliers among 100 measurements (5.00%) 5 (5.00%) high severe multi-threaded-traps/4 time: [5.5768 us 5.6052 us 5.6364 us] thrpt: [709.68 Kelem/s 713.63 Kelem/s 717.25 Kelem/s] change: time: [-93.193% -93.121% -93.052%] (p = 0.00 < 0.05) thrpt: [+1339.2% +1353.6% +1369.1%] Performance has improved. multi-threaded-traps/8 time: [8.6408 us 9.1212 us 9.5438 us] thrpt: [838.24 Kelem/s 877.08 Kelem/s 925.84 Kelem/s] change: time: [-94.754% -94.473% -94.202%] (p = 0.00 < 0.05) thrpt: [+1624.7% +1709.2% +1806.1%] Performance has improved. multi-threaded-traps/16 time: [10.152 us 10.840 us 11.545 us] thrpt: [1.3858 Melem/s 1.4760 Melem/s 1.5761 Melem/s] change: time: [-97.042% -96.823% -96.577%] (p = 0.00 < 0.05) thrpt: [+2821.5% +3048.1% +3281.1%] Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild many-modules-registered-traps/1 time: [2.6278 us 2.6361 us 2.6447 us] thrpt: [378.11 Kelem/s 379.35 Kelem/s 380.55 Kelem/s] change: time: [-85.311% -85.108% -84.909%] (p = 0.00 < 0.05) thrpt: [+562.65% +571.51% +580.76%] Performance has improved. Found 9 outliers among 100 measurements (9.00%) 3 (3.00%) high mild 6 (6.00%) high severe many-modules-registered-traps/8 time: [2.6294 us 2.6460 us 2.6623 us] thrpt: [3.0049 Melem/s 3.0235 Melem/s 3.0425 Melem/s] change: time: [-85.895% -85.485% -85.022%] (p = 0.00 < 0.05) thrpt: [+567.63% +588.95% +608.95%] Performance has improved. Found 8 outliers among 100 measurements (8.00%) 3 (3.00%) high mild 5 (5.00%) high severe many-modules-registered-traps/64 time: [2.6218 us 2.6329 us 2.6452 us] thrpt: [24.195 Melem/s 24.308 Melem/s 24.411 Melem/s] change: time: [-93.629% -93.551% -93.470%] (p = 0.00 < 0.05) thrpt: [+1431.4% +1450.6% +1469.5%] Performance has improved. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild many-modules-registered-traps/512 time: [2.6569 us 2.6737 us 2.6923 us] thrpt: [190.17 Melem/s 191.50 Melem/s 192.71 Melem/s] change: time: [-99.277% -99.268% -99.260%] (p = 0.00 < 0.05) thrpt: [+13417% +13566% +13731%] Performance has improved. Found 4 outliers among 100 measurements (4.00%) 4 (4.00%) high mild many-modules-registered-traps/4096 time: [2.7258 us 2.7390 us 2.7535 us] thrpt: [1.4876 Gelem/s 1.4955 Gelem/s 1.5027 Gelem/s] change: time: [-99.956% -99.955% -99.955%] (p = 0.00 < 0.05) thrpt: [+221417% +223380% +224881%] Performance has improved. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe many-stack-frames-traps/1 time: [1.4658 us 1.4719 us 1.4784 us] thrpt: [676.39 Kelem/s 679.38 Kelem/s 682.21 Kelem/s] change: time: [-90.368% -89.947% -89.586%] (p = 0.00 < 0.05) thrpt: [+860.23% +894.72% +938.21%] Performance has improved. Found 8 outliers among 100 measurements (8.00%) 5 (5.00%) high mild 3 (3.00%) high severe many-stack-frames-traps/8 time: [2.4772 us 2.4870 us 2.4973 us] thrpt: [3.2034 Melem/s 3.2167 Melem/s 3.2294 Melem/s] change: time: [-85.550% -85.370% -85.199%] (p = 0.00 < 0.05) thrpt: [+575.65% +583.51% +592.03%] Performance has improved. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe many-stack-frames-traps/64 time: [10.109 us 10.171 us 10.236 us] thrpt: [6.2525 Melem/s 6.2925 Melem/s 6.3309 Melem/s] change: time: [-78.144% -77.797% -77.336%] (p = 0.00 < 0.05) thrpt: [+341.22% +350.38% +357.55%] Performance has improved. Found 7 outliers among 100 measurements (7.00%) 5 (5.00%) high mild 2 (2.00%) high severe many-stack-frames-traps/512 time: [126.16 us 126.54 us 126.96 us] thrpt: [4.0329 Melem/s 4.0461 Melem/s 4.0583 Melem/s] change: time: [-65.364% -64.933% -64.453%] (p = 0.00 < 0.05) thrpt: [+181.32% +185.17% +188.71%] Performance has improved. Found 4 outliers among 100 measurements (4.00%) 4 (4.00%) high severe ``` </details> <h3>Calls</h3> There is, however, a small regression in raw Wasm-to-host and host-to-Wasm call performance due the new trampolines. It seems to be on the order of about 2-10 nanoseconds per call, depending on the benchmark. I believe this regression is ultimately acceptable because 1. this overhead will be vastly dominated by whatever work a non-nop callee actually does, 2. we will need these trampolines, or something like them, when implementing the Wasm exceptions proposal to do things like translate Wasm's exceptions into Rust's `Result`s, 3. and because the performance improvements to trapping and capturing stack traces are of such a larger magnitude than this call regressions. <details> ``` sync/no-hook/host-to-wasm - typed - nop time: [28.683 ns 28.757 ns 28.844 ns] change: [+16.472% +17.183% +17.904%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 1 (1.00%) low mild 4 (4.00%) high mild 5 (5.00%) high severe sync/no-hook/host-to-wasm - untyped - nop time: [42.515 ns 42.652 ns 42.841 ns] change: [+12.371% +14.614% +17.462%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 1 (1.00%) high mild 10 (10.00%) high severe sync/no-hook/host-to-wasm - unchecked - nop time: [33.936 ns 34.052 ns 34.179 ns] change: [+25.478% +26.938% +28.369%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 7 (7.00%) high mild 2 (2.00%) high severe sync/no-hook/host-to-wasm - typed - nop-params-and-results time: [34.290 ns 34.388 ns 34.502 ns] change: [+40.802% +42.706% +44.526%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 5 (5.00%) high mild 8 (8.00%) high severe sync/no-hook/host-to-wasm - untyped - nop-params-and-results time: [62.546 ns 62.721 ns 62.919 ns] change: [+2.5014% +3.6319% +4.8078%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 2 (2.00%) high mild 10 (10.00%) high severe sync/no-hook/host-to-wasm - unchecked - nop-params-and-results time: [42.609 ns 42.710 ns 42.831 ns] change: [+20.966% +22.282% +23.475%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 4 (4.00%) high mild 7 (7.00%) high severe sync/hook-sync/host-to-wasm - typed - nop time: [29.546 ns 29.675 ns 29.818 ns] change: [+20.693% +21.794% +22.836%] (p = 0.00 < 0.05) Performance has regressed. Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) high mild 2 (2.00%) high severe sync/hook-sync/host-to-wasm - untyped - nop time: [45.448 ns 45.699 ns 45.961 ns] change: [+17.204% +18.514% +19.590%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 4 (4.00%) high mild 10 (10.00%) high severe sync/hook-sync/host-to-wasm - unchecked - nop time: [34.334 ns 34.437 ns 34.558 ns] change: [+23.225% +24.477% +25.886%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 5 (5.00%) high mild 7 (7.00%) high severe sync/hook-sync/host-to-wasm - typed - nop-params-and-results time: [36.594 ns 36.763 ns 36.974 ns] change: [+41.967% +47.261% +52.086%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 3 (3.00%) high mild 9 (9.00%) high severe sync/hook-sync/host-to-wasm - untyped - nop-params-and-results time: [63.541 ns 63.831 ns 64.194 ns] change: [-4.4337% -0.6855% +2.7134%] (p = 0.73 > 0.05) No change in performance detected. Found 8 outliers among 100 measurements (8.00%) 6 (6.00%) high mild 2 (2.00%) high severe sync/hook-sync/host-to-wasm - unchecked - nop-params-and-results time: [43.968 ns 44.169 ns 44.437 ns] change: [+18.772% +21.802% +24.623%] (p = 0.00 < 0.05) Performance has regressed. Found 15 outliers among 100 measurements (15.00%) 3 (3.00%) high mild 12 (12.00%) high severe async/no-hook/host-to-wasm - typed - nop time: [4.9612 us 4.9743 us 4.9889 us] change: [+9.9493% +11.911% +13.502%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 6 (6.00%) high mild 4 (4.00%) high severe async/no-hook/host-to-wasm - untyped - nop time: [5.0030 us 5.0211 us 5.0439 us] change: [+10.841% +11.873% +12.977%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) high mild 7 (7.00%) high severe async/no-hook/host-to-wasm - typed - nop-params-and-results time: [4.9273 us 4.9468 us 4.9700 us] change: [+4.7381% +6.8445% +8.8238%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 5 (5.00%) high mild 9 (9.00%) high severe async/no-hook/host-to-wasm - untyped - nop-params-and-results time: [5.1151 us 5.1338 us 5.1555 us] change: [+9.5335% +11.290% +13.044%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 3 (3.00%) high mild 13 (13.00%) high severe async/hook-sync/host-to-wasm - typed - nop time: [4.9330 us 4.9394 us 4.9467 us] change: [+10.046% +11.038% +12.035%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 5 (5.00%) high mild 7 (7.00%) high severe async/hook-sync/host-to-wasm - untyped - nop time: [5.0073 us 5.0183 us 5.0310 us] change: [+9.3828% +10.565% +11.752%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 3 (3.00%) high mild 5 (5.00%) high severe async/hook-sync/host-to-wasm - typed - nop-params-and-results time: [4.9610 us 4.9839 us 5.0097 us] change: [+9.0857% +11.513% +14.359%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 7 (7.00%) high mild 6 (6.00%) high severe async/hook-sync/host-to-wasm - untyped - nop-params-and-results time: [5.0995 us 5.1272 us 5.1617 us] change: [+9.3600% +11.506% +13.809%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 6 (6.00%) high mild 4 (4.00%) high severe async-pool/no-hook/host-to-wasm - typed - nop time: [2.4242 us 2.4316 us 2.4396 us] change: [+7.8756% +8.8803% +9.8346%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 5 (5.00%) high mild 3 (3.00%) high severe async-pool/no-hook/host-to-wasm - untyped - nop time: [2.5102 us 2.5155 us 2.5210 us] change: [+12.130% +13.194% +14.270%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 4 (4.00%) high mild 8 (8.00%) high severe async-pool/no-hook/host-to-wasm - typed - nop-params-and-results time: [2.4203 us 2.4310 us 2.4440 us] change: [+4.0380% +6.3623% +8.7534%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 5 (5.00%) high mild 9 (9.00%) high severe async-pool/no-hook/host-to-wasm - untyped - nop-params-and-results time: [2.5501 us 2.5593 us 2.5700 us] change: [+8.8802% +10.976% +12.937%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 5 (5.00%) high mild 11 (11.00%) high severe async-pool/hook-sync/host-to-wasm - typed - nop time: [2.4135 us 2.4190 us 2.4254 us] change: [+8.3640% +9.3774% +10.435%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 6 (6.00%) high mild 5 (5.00%) high severe async-pool/hook-sync/host-to-wasm - untyped - nop time: [2.5172 us 2.5248 us 2.5357 us] change: [+11.543% +12.750% +13.982%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 1 (1.00%) high mild 7 (7.00%) high severe async-pool/hook-sync/host-to-wasm - typed - nop-params-and-results time: [2.4214 us 2.4353 us 2.4532 us] change: [+1.5158% +5.0872% +8.6765%] (p = 0.00 < 0.05) Performance has regressed. Found 15 outliers among 100 measurements (15.00%) 2 (2.00%) high mild 13 (13.00%) high severe async-pool/hook-sync/host-to-wasm - untyped - nop-params-and-results time: [2.5499 us 2.5607 us 2.5748 us] change: [+10.146% +12.459% +14.919%] (p = 0.00 < 0.05) Performance has regressed. Found 18 outliers among 100 measurements (18.00%) 3 (3.00%) high mild 15 (15.00%) high severe sync/no-hook/wasm-to-host - nop - typed time: [6.6135 ns 6.6288 ns 6.6452 ns] change: [+37.927% +38.837% +39.869%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) high mild 5 (5.00%) high severe sync/no-hook/wasm-to-host - nop-params-and-results - typed time: [15.930 ns 15.993 ns 16.067 ns] change: [+3.9583% +5.6286% +7.2430%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 11 (11.00%) high mild 1 (1.00%) high severe sync/no-hook/wasm-to-host - nop - untyped time: [20.596 ns 20.640 ns 20.690 ns] change: [+4.3293% +5.2047% +6.0935%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 5 (5.00%) high mild 5 (5.00%) high severe sync/no-hook/wasm-to-host - nop-params-and-results - untyped time: [42.659 ns 42.882 ns 43.159 ns] change: [-2.1466% -0.5079% +1.2554%] (p = 0.58 > 0.05) No change in performance detected. Found 15 outliers among 100 measurements (15.00%) 1 (1.00%) high mild 14 (14.00%) high severe sync/no-hook/wasm-to-host - nop - unchecked time: [10.671 ns 10.691 ns 10.713 ns] change: [+83.911% +87.620% +92.062%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) high mild 7 (7.00%) high severe sync/no-hook/wasm-to-host - nop-params-and-results - unchecked time: [11.136 ns 11.190 ns 11.263 ns] change: [-29.719% -28.446% -27.029%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 4 (4.00%) high mild 10 (10.00%) high severe sync/hook-sync/wasm-to-host - nop - typed time: [6.7964 ns 6.8087 ns 6.8226 ns] change: [+21.531% +24.206% +27.331%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 4 (4.00%) high mild 10 (10.00%) high severe sync/hook-sync/wasm-to-host - nop-params-and-results - typed time: [15.865 ns 15.921 ns 15.985 ns] change: [+4.8466% +6.3330% +7.8317%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 3 (3.00%) high mild 13 (13.00%) high severe sync/hook-sync/wasm-to-host - nop - untyped time: [21.505 ns 21.587 ns 21.677 ns] change: [+8.0908% +9.1943% +10.254%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe sync/hook-sync/wasm-to-host - nop-params-and-results - untyped time: [44.018 ns 44.128 ns 44.261 ns] change: [-1.4671% -0.0458% +1.2443%] (p = 0.94 > 0.05) No change in performance detected. Found 14 outliers among 100 measurements (14.00%) 5 (5.00%) high mild 9 (9.00%) high severe sync/hook-sync/wasm-to-host - nop - unchecked time: [11.264 ns 11.326 ns 11.387 ns] change: [+80.225% +81.659% +83.068%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe sync/hook-sync/wasm-to-host - nop-params-and-results - unchecked time: [11.816 ns 11.865 ns 11.920 ns] change: [-29.152% -28.040% -26.957%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 8 (8.00%) high mild 6 (6.00%) high severe async/no-hook/wasm-to-host - nop - typed time: [6.6221 ns 6.6385 ns 6.6569 ns] change: [+43.618% +44.755% +45.965%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 6 (6.00%) high mild 7 (7.00%) high severe async/no-hook/wasm-to-host - nop-params-and-results - typed time: [15.884 ns 15.929 ns 15.983 ns] change: [+3.5987% +5.2053% +6.7846%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 3 (3.00%) high mild 13 (13.00%) high severe async/no-hook/wasm-to-host - nop - untyped time: [20.615 ns 20.702 ns 20.821 ns] change: [+6.9799% +8.1212% +9.2819%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) high mild 8 (8.00%) high severe async/no-hook/wasm-to-host - nop-params-and-results - untyped time: [41.956 ns 42.207 ns 42.521 ns] change: [-4.3057% -2.7730% -1.2428%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 3 (3.00%) high mild 11 (11.00%) high severe async/no-hook/wasm-to-host - nop - unchecked time: [10.440 ns 10.474 ns 10.513 ns] change: [+83.959% +85.826% +87.541%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 5 (5.00%) high mild 6 (6.00%) high severe async/no-hook/wasm-to-host - nop-params-and-results - unchecked time: [11.476 ns 11.512 ns 11.554 ns] change: [-29.857% -28.383% -26.978%] (p = 0.00 < 0.05) Performance has improved. Found 12 outliers among 100 measurements (12.00%) 1 (1.00%) low mild 6 (6.00%) high mild 5 (5.00%) high severe async/no-hook/wasm-to-host - nop - async-typed time: [26.427 ns 26.478 ns 26.532 ns] change: [+6.5730% +7.4676% +8.3983%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) high mild 7 (7.00%) high severe async/no-hook/wasm-to-host - nop-params-and-results - async-typed time: [28.557 ns 28.693 ns 28.880 ns] change: [+1.9099% +3.7332% +5.9731%] (p = 0.00 < 0.05) Performance has regressed. Found 15 outliers among 100 measurements (15.00%) 1 (1.00%) high mild 14 (14.00%) high severe async/hook-sync/wasm-to-host - nop - typed time: [6.7488 ns 6.7630 ns 6.7784 ns] change: [+19.935% +22.080% +23.683%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 4 (4.00%) high mild 5 (5.00%) high severe async/hook-sync/wasm-to-host - nop-params-and-results - typed time: [15.928 ns 16.031 ns 16.149 ns] change: [+5.5188% +6.9567% +8.3839%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 9 (9.00%) high mild 2 (2.00%) high severe async/hook-sync/wasm-to-host - nop - untyped time: [21.930 ns 22.114 ns 22.296 ns] change: [+4.6674% +7.7588% +10.375%] (p = 0.00 < 0.05) Performance has regressed. Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) high mild 1 (1.00%) high severe async/hook-sync/wasm-to-host - nop-params-and-results - untyped time: [42.684 ns 42.858 ns 43.081 ns] change: [-5.2957% -3.4693% -1.6217%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 2 (2.00%) high mild 12 (12.00%) high severe async/hook-sync/wasm-to-host - nop - unchecked time: [11.026 ns 11.053 ns 11.086 ns] change: [+70.751% +72.378% +73.961%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 5 (5.00%) high mild 5 (5.00%) high severe async/hook-sync/wasm-to-host - nop-params-and-results - unchecked time: [11.840 ns 11.900 ns 11.982 ns] change: [-27.977% -26.584% -24.887%] (p = 0.00 < 0.05) Performance has improved. Found 18 outliers among 100 measurements (18.00%) 3 (3.00%) high mild 15 (15.00%) high severe async/hook-sync/wasm-to-host - nop - async-typed time: [27.601 ns 27.709 ns 27.882 ns] change: [+8.1781% +9.1102% +10.030%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 2 (2.00%) low mild 3 (3.00%) high mild 6 (6.00%) high severe async/hook-sync/wasm-to-host - nop-params-and-results - async-typed time: [28.955 ns 29.174 ns 29.413 ns] change: [+1.1226% +3.0366% +5.1126%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 7 (7.00%) high mild 6 (6.00%) high severe async-pool/no-hook/wasm-to-host - nop - typed time: [6.5626 ns 6.5733 ns 6.5851 ns] change: [+40.561% +42.307% +44.514%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 5 (5.00%) high mild 4 (4.00%) high severe async-pool/no-hook/wasm-to-host - nop-params-and-results - typed time: [15.820 ns 15.886 ns 15.969 ns] change: [+4.1044% +5.7928% +7.7122%] (p = 0.00 < 0.05) Performance has regressed. Found 17 outliers among 100 measurements (17.00%) 4 (4.00%) high mild 13 (13.00%) high severe async-pool/no-hook/wasm-to-host - nop - untyped time: [20.481 ns 20.521 ns 20.566 ns] change: [+6.7962% +7.6950% +8.7612%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 6 (6.00%) high mild 5 (5.00%) high severe async-pool/no-hook/wasm-to-host - nop-params-and-results - untyped time: [41.834 ns 41.998 ns 42.189 ns] change: [-3.8185% -2.2687% -0.7541%] (p = 0.01 < 0.05) Change within noise threshold. Found 13 outliers among 100 measurements (13.00%) 3 (3.00%) high mild 10 (10.00%) high severe async-pool/no-hook/wasm-to-host - nop - unchecked time: [10.353 ns 10.380 ns 10.414 ns] change: [+82.042% +84.591% +87.205%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 4 (4.00%) high mild 3 (3.00%) high severe async-pool/no-hook/wasm-to-host - nop-params-and-results - unchecked time: [11.123 ns 11.168 ns 11.228 ns] change: [-30.813% -29.285% -27.874%] (p = 0.00 < 0.05) Performance has improved. Found 12 outliers among 100 measurements (12.00%) 11 (11.00%) high mild 1 (1.00%) high severe async-pool/no-hook/wasm-to-host - nop - async-typed time: [27.442 ns 27.528 ns 27.638 ns] change: [+7.5215% +9.9795% +12.266%] (p = 0.00 < 0.05) Performance has regressed. Found 18 outliers among 100 measurements (18.00%) 3 (3.00%) high mild 15 (15.00%) high severe async-pool/no-hook/wasm-to-host - nop-params-and-results - async-typed time: [29.014 ns 29.148 ns 29.312 ns] change: [+2.0227% +3.4722% +4.9047%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 6 (6.00%) high mild 1 (1.00%) high severe async-pool/hook-sync/wasm-to-host - nop - typed time: [6.7916 ns 6.8116 ns 6.8325 ns] change: [+20.937% +22.050% +23.281%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 5 (5.00%) high mild 6 (6.00%) high severe async-pool/hook-sync/wasm-to-host - nop-params-and-results - typed time: [15.917 ns 15.975 ns 16.051 ns] change: [+4.6404% +6.4217% +8.3075%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 5 (5.00%) high mild 11 (11.00%) high severe async-pool/hook-sync/wasm-to-host - nop - untyped time: [21.558 ns 21.612 ns 21.679 ns] change: [+8.1158% +9.1409% +10.217%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) high mild 7 (7.00%) high severe async-pool/hook-sync/wasm-to-host - nop-params-and-results - untyped time: [42.475 ns 42.614 ns 42.775 ns] change: [-6.3613% -4.4709% -2.7647%] (p = 0.00 < 0.05) Performance has improved. Found 18 outliers among 100 measurements (18.00%) 3 (3.00%) high mild 15 (15.00%) high severe async-pool/hook-sync/wasm-to-host - nop - unchecked time: [11.150 ns 11.195 ns 11.247 ns] change: [+74.424% +77.056% +79.811%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 3 (3.00%) high mild 11 (11.00%) high severe async-pool/hook-sync/wasm-to-host - nop-params-and-results - unchecked time: [11.639 ns 11.695 ns 11.760 ns] change: [-30.212% -29.023% -27.954%] (p = 0.00 < 0.05) Performance has improved. Found 15 outliers among 100 measurements (15.00%) 7 (7.00%) high mild 8 (8.00%) high severe async-pool/hook-sync/wasm-to-host - nop - async-typed time: [27.480 ns 27.712 ns 27.984 ns] change: [+2.9764% +6.5061% +9.8914%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 6 (6.00%) high mild 2 (2.00%) high severe async-pool/hook-sync/wasm-to-host - nop-params-and-results - async-typed time: [29.218 ns 29.380 ns 29.600 ns] change: [+5.2283% +7.7247% +10.822%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 2 (2.00%) high mild 14 (14.00%) high severe ``` </details> * Add s390x support for frame pointer-based stack walking * wasmtime: Allow `Caller::get_export` to get all exports * fuzzing: Add a fuzz target to check that our stack traces are correct We generate Wasm modules that keep track of their own stack as they call and return between functions, and then we periodically check that if the host captures a backtrace, it matches what the Wasm module has recorded. * Remove VM offsets for `VMHostFuncContext` since it isn't used by JIT code * Add doc comment with stack walking implementation notes * Document the extra state that can be passed to `wasmtime_runtime::Backtrace` methods * Add extensive comments for stack walking function * Factor architecture-specific bits of stack walking out into modules * Initialize store-related fields in a vmctx to null when there is no store yet Rather than leaving them as uninitialized data. * Use `set_callee` instead of manually setting the vmctx field * Use a more informative compile error message for unsupported architectures * Document unsafety of `prepare_host_to_wasm_trampoline` * Use `bti c` instead of `hint #34` in inline aarch64 assembly * Remove outdated TODO comment * Remove setting of `last_wasm_exit_fp` in `set_jit_trap` This is no longer needed as the value is plumbed through to the backtrace code directly now. * Only set the stack limit once, in the face of re-entrancy into Wasm * Add comments for s390x-specific stack walking bits * Use the helper macro for all libcalls If we forget to use it, and then trigger a GC from the libcall, that means we could miss stack frames when walking the stack, fail to find live GC refs, and then get use after free bugs. Much less risky to always use the helper macro that takes care of all of that for us. * Use the `asm_sym!` macro in Wasm-to-libcall trampolines This macro handles the macOS-specific underscore prefix stuff for us. * wasmtime: add size and align to `externref` assertion error message * Extend the `stacks` fuzzer to have host frames in between Wasm frames This way we get one or more contiguous sequences of Wasm frames on the stack, instead of exactly one. * Add documentation for aarch64-specific backtrace helpers * Clarify that we only support little-endian aarch64 in trampoline comment * Use `.machine z13` in s390x assembly file Since apparently our CI machines have pretty old assemblers that don't have `.machine z14`. This should be fine though since these trampolines don't make use of anything that is introduced in z14. * Fix aarch64 build * Fix macOS build * Document the `asm_sym!` macro * Add windows support to the `wasmtime-asm-macros` crate * Add windows support to host<--->Wasm trampolines * Fix trap handler build on windows * Run `rustfmt` on s390x trampoline source file * Temporarily disable some assertions about a trap's backtrace in the component model tests Follow up to re-enable this and fix the associated issue: https://github.com/bytecodealliance/wasmtime/issues/4535 * Refactor libcall definitions with less macros This refactors the `libcall!` macro to use the `foreach_builtin_function!` macro to define all of the trampolines. Additionally the macro surrounding each libcall itself is no longer necessary and helps avoid too many macros. * Use `VMOpaqueContext::from_vm_host_func_context` in `VMHostFuncContext::new` * Move `backtrace` module to be submodule of `traphandlers` This avoids making some things `pub(crate)` in `traphandlers` that really shouldn't be. * Fix macOS aarch64 build * Use "i64" instead of "word" in aarch64-specific file * Save/restore entry SP and exit FP/return pointer in the face of panicking imported host functions Also clean up assertions surrounding our saved entry/exit registers. * Put "typed" vs "untyped" in the same position of call benchmark names Regardless if we are doing wasm-to-host or host-to-wasm * Fix stacks test case generator build for new `wasm-encoder` * Fix build for s390x * Expand libcalls in s390x asm * Disable more parts of component tests now that backtrace assertions are a bit tighter * Remove assertion that can maybe fail on s390x Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com> Co-authored-by: Alex Crichton <alex@alexcrichton.com>	2022-07-28 15:46:14 -07:00
Alex Crichton	1321c234e5	Remove dependency on `more-asserts` (#4408 ) * Remove dependency on `more-asserts` In my recent adventures to do a bit of gardening on our dependencies I noticed that there's a new major version for the `more-asserts` crate. Instead of updating to this though I've opted to instead remove the dependency since I don't think we heavily lean on this crate and otherwise one-off prints are probably sufficient to avoid the need for pulling in a whole crate for this. * Remove exemption for `more-asserts`	2022-07-26 16:47:33 +00:00
Alex Crichton	97894bc65e	Add initial support for fused adapter trampolines (#4501 ) * Add initial support for fused adapter trampolines This commit lands a significant new piece of functionality to Wasmtime's implementation of the component model in the form of the implementation of fused adapter trampolines. Internally within a component core wasm modules can communicate with each other by having their exports `canon lift`'d to get `canon lower`'d into a different component. This signifies that two components are communicating through a statically known interface via the canonical ABI at this time. Previously Wasmtime was able to identify that this communication was happening but it simply panicked with `unimplemented!` upon seeing it. This commit is the beginning of filling out this panic location with an actual implementation. The implementation route chosen here for fused adapters is to use a WebAssembly module itself for the implementation. This means that, at compile time of a component, Wasmtime is generating core WebAssembly modules which then get recursively compiled within Wasmtime as well. The choice to use WebAssembly itself as the implementation of fused adapters stems from a few motivations: * This does not represent a significant increase in the "trusted compiler base" of Wasmtime. Getting the Wasm -> CLIF translation correct once is hard enough much less for an entirely different IR to CLIF. By generating WebAssembly no new interactions with Cranelift are added which drastically reduces the possibilities for mistakes. * Using WebAssembly means that component adapters are insulated from miscompilations and mistakes. If something goes wrong it's defined well within the WebAssembly specification how it goes wrong and what happens as a result. This means that the "blast zone" for a wrong adapter is the component instance but not the entire host itself. Accesses to linear memory are guaranteed to be in-bounds and otherwise handled via well-defined traps. * A fully-finished fused adapter compiler is expected to be a significant and quite complex component of Wasmtime. Functionality along these lines is expected to be needed for Web-based polyfills of the component model and by using core WebAssembly it provides the opportunity to share code between Wasmtime and these polyfills for the component model. * Finally the runtime implementation of managing WebAssembly modules is already implemented and quite easy to integrate with, so representing fused adapters with WebAssembly results in very little extra support necessary for the runtime implementation of instantiating and managing a component. The compiler added in this commit is dubbed Wasmtime's Fused Adapter Compiler of Trampolines (FACT) because who doesn't like deriving a name from an acronym. Currently the trampoline compiler is limited in its support for interface types and only supports a few primitives. I plan on filing future PRs to flesh out the support here for all the variants of `InterfaceType`. For now this PR is primarily focused on all of the other infrastructure for the addition of a trampoline compiler. With the choice to use core WebAssembly to implement fused adapters it means that adapters need to be inserted into a module. Unfortunately adapters cannot all go into a single WebAssembly module because adapters themselves have dependencies which may be provided transitively through instances that were instantiated with other adapters. This means that a significant chunk of this PR (`adapt.rs`) is dedicated to determining precisely which adapters go into precisely which adapter modules. This partitioning process attempts to make large modules wherever it can to cut down on core wasm instantiations but is likely not optimal as it's just a simple heuristic today. With all of this added together it's now possible to start writing `.wast` tests that internally have adapted modules communicating with one another. A `fused.wast` test suite was added as part of this PR which is the beginning of tests for the support of the fused adapter compiler added in this PR. Currently this is primarily testing some various topologies of adapters along with direct/indirect modes. This will grow many more tests over time as more types are supported. Overall I'm not 100% satisfied with the testing story of this PR. When a test fails it's very difficult to debug since everything is written in the text format of WebAssembly meaning there's no "conveniences" to print out the state of the world when things go wrong and easily debug. I think this will become even more apparent as more tests are written for more types in subsequent PRs. At this time though I know of no better alternative other than leaning pretty heavily on fuzz-testing to ensure this is all exercised. Fix an unused field warning * Fix tests in `wasmtime-runtime` * Add some more tests for compiled trampolines * Remap exports when injecting adapters The exports of a component were accidentally left unmapped which meant that they indexed the instance indexes pre-adapter module insertion. * Fix typo * Rebase conflicts	2022-07-25 23:13:26 +00:00
Andrew Brown	2b52f47b83	Add shared memories (#4187 ) * Add shared memories This change adds the ability to use shared memories in Wasmtime when the [threads proposal] is enabled. Shared memories are annotated as `shared` in the WebAssembly syntax, e.g., `(memory 1 1 shared)`, and are protected from concurrent access during `memory.size` and `memory.grow`. [threads proposal]: https://github.com/WebAssembly/threads/blob/master/proposals/threads/Overview.md In order to implement this in Wasmtime, there are two main cases to cover: - a program may simply create a shared memory and possibly export it; this means that Wasmtime itself must be able to create shared memories - a user may create a shared memory externally and pass it in as an import during instantiation; this is the case when the program contains code like `(import "env" "memory" (memory 1 1 shared))`--this case is handled by a new Wasmtime API type--`SharedMemory` Because of the first case, this change allows any of the current memory-creation mechanisms to work as-is. Wasmtime can still create either static or dynamic memories in either on-demand or pooling modes, and any of these memories can be considered shared. When shared, the `Memory` runtime container will lock appropriately during `memory.size` and `memory.grow` operations; since all memories use this container, it is an ideal place for implementing the locking once and once only. The second case is covered by the new `SharedMemory` structure. It uses the same `Mmap` allocation under the hood as non-shared memories, but allows the user to perform the allocation externally to Wasmtime and share the memory across threads (via an `Arc`). The pointer address to the actual memory is carefully wired through and owned by the `SharedMemory` structure itself. This means that there are differing views of where to access the pointer (i.e., `VMMemoryDefinition`): for owned memories (the default), the `VMMemoryDefinition` is stored directly by the `VMContext`; in the `SharedMemory` case, however, this `VMContext` must point to this separate structure. To ensure that the `VMContext` can always point to the correct `VMMemoryDefinition`, this change alters the `VMContext` structure. Since a `SharedMemory` owns its own `VMMemoryDefinition`, the `defined_memories` table in the `VMContext` becomes a sequence of pointers--in the shared memory case, they point to the `VMMemoryDefinition` owned by the `SharedMemory` and in the owned memory case (i.e., not shared) they point to `VMMemoryDefinition`s stored in a new table, `owned_memories`. This change adds an additional indirection (through the `mut VMMemoryDefinition` pointer) that could add overhead. Using an imported memory as a proxy, we measured a 1-3% overhead of this approach on the `pulldown-cmark` benchmark. To avoid this, Cranelift-generated code will special-case the owned memory access (i.e., load a pointer directly to the `owned_memories` entry) for `memory.size` so that only shared memories (and imported memories, as before) incur the indirection cost. review: remove thread feature check * review: swap wasmtime-types dependency for existing wasmtime-environ use * review: remove unused VMMemoryUnion * review: reword cross-engine error message * review: improve tests * review: refactor to separate prevent Memory <-> SharedMemory conversion * review: into_shared_memory -> as_shared_memory * review: remove commented out code * review: limit shared min/max to 32 bits * review: skip imported memories * review: imported memories are not owned * review: remove TODO * review: document unsafe send + sync * review: add limiter assertion * review: remove TODO * review: improve tests * review: fix doc test * fix: fixes based on discussion with Alex This changes several key parts: - adds memory indexes to imports and exports - makes `VMMemoryDefinition::current_length` an atomic usize * review: add `Extern::SharedMemory` * review: remove TODO * review: atomically load from VMMemoryDescription in JIT-generated code * review: add test probing the last available memory slot across threads * fix: move assertion to new location due to rebase * fix: doc link * fix: add TODOs to c-api * fix: broken doc link * fix: modify pooling allocator messages in tests * review: make owned_memory_index panic instead of returning an option * review: clarify calculation of num_owned_memories * review: move 'use' to top of file * review: change 'const [u8]' to 'mut [u8]' * review: remove TODO * review: avoid hard-coding memory index * review: remove 'preallocation' parameter from 'Memory::_new' * fix: component model memory length * review: check that shared memory plans are static * review: ignore growth limits for shared memory * review: improve atomic store comment * review: add FIXME for memory growth failure * review: add comment about absence of bounds-checked 'memory.size' * review: make 'current_length()' doc comment more precise * review: more comments related to memory.size non-determinism * review: make 'vmmemory' unreachable for shared memory * review: move code around * review: thread plan through to 'wrap()' * review: disallow shared memory allocation with the pooling allocator	2022-06-08 12:13:40 -05:00
Alex Crichton	2af358dd9c	Add a `VMComponentContext` type and create it on instantiation (#4215 ) * Add a `VMComponentContext` type and create it on instantiation This commit fills out the `wasmtime-runtime` crate's support for `VMComponentContext` and creates it as part of the instantiation process. This moves a few maps that were temporarily allocated in an `InstanceData` into the `VMComponentContext` and additionally reads the canonical options data from there instead. This type still won't be used in its "full glory" until the lowering of host functions is completely implemented, however, which will be coming in a future commit. * Remove `DerefMut` implementation * Rebase conflicts	2022-06-03 13:34:50 -05:00
Alex Crichton	3ed6fae7b3	Add trampoline compilation support for lowered imports (#4206 ) * Add trampoline compilation support for lowered imports This commit adds support to the component model implementation for compiling trampolines suitable for calling host imports. Currently this is purely just the compilation side of things, modifying the wasmtime-cranelift crate and additionally filling out a new `VMComponentOffsets` type (similar to `VMOffsets`). The actual creation of a `VMComponentContext` is still not performed and will be a subsequent PR. Internally though some tests are actually possible with this where we at least assert that compilation of a component and creation of everything in-memory doesn't panic or trip any assertions, so some tests are added here for that as well. * Fix some test errors	2022-06-03 10:01:42 -05:00
Alex Crichton	2a4851ad2b	Change some `VMContext` pointers to `()` pointers (#4190 ) * Change some `VMContext` pointers to `()` pointers This commit is motivated by my work on the component model implementation for imported functions. Currently all context pointers in wasm are `mut VMContext` but with the component model my plan is to make some pointers instead along the lines of `mut VMComponentContext`. In doing this though one worry I have is breaking what has otherwise been a core invariant of Wasmtime for quite some time, subtly introducing bugs by accident. To help assuage my worry I've opted here to erase knowledge of `mut VMContext` where possible. Instead where applicable a context pointer is simply known as `mut ()` and the embedder doesn't actually know anything about this context beyond the value of the pointer. This will help prevent Wasmtime from accidentally ever trying to interpret this context pointer as an actual `VMContext` when it might instead be a `VMComponentContext`. Overall this was a pretty smooth transition. The main change here is that the `VMTrampoline` (now sporting more docs) has its first argument changed to `mut ()`. The second argument, the caller context, is still configured as `mut VMContext` though because all functions are always called from wasm still. Eventually for component-to-component calls I think we'll probably "fake" the second argument as the same as the first argument, losing track of the original caller, as an intentional way of isolating components from each other. Along the way there are a few host locations which do actually assume that the first argument is indeed a `VMContext`. These are valid assumptions that are upheld from a correct implementation, but I opted to add a "magic" field to `VMContext` to assert this in debug mode. This new "magic" field is inintialized during normal vmcontext initialization and it's checked whenever a `VMContext` is reinterpreted as an `Instance` (but only in debug mode). My hope here is to catch any future accidental mistakes, if ever. * Use a VMOpaqueContext wrapper * Fix typos	2022-06-01 11:00:43 -05:00
Alex Crichton	a02a609528	Make `ValRaw` fields private (#4186 ) * Make `ValRaw` fields private Force accessing to go through constructors and accessors to localize the knowledge about little-endian-ness. This is spawned since I made a mistake in #4039 about endianness. * Fix some tests * Component model changes	2022-05-24 19:14:29 -05:00
Alex Crichton	51d82aebfd	Store the `ValRaw` type in little-endian format (#4035 ) * Store the `ValRaw` type in little-endian format This commit changes the internal representation of the `ValRaw` type to an unconditionally little-endian format instead of its current native-endian format. The documentation and various accessors here have been updated as well as the associated trampolines that read `ValRaw` to always work with little-endian values, converting to the host endianness as necessary. The motivation for this change originally comes from the implementation of the component model that I'm working on. One aspect of the component model's canonical ABI is how variants are passed to functions as immediate arguments. For example for a component model function: ``` foo: function(x: expected<i32, f64>) ``` This translates to a core wasm function: ```wasm (module (func (export "foo") (param i32 i64) ;; ... ) ) ``` The first `i32` parameter to the core wasm function is the discriminant of whether the result is an "ok" or an "err". The second `i64`, however, is the "join" operation on the `i32` and `f64` payloads. Essentially these two types are unioned into one type to get passed into the function. Currently in the implementation of the component model my plan is to construct a `mut [ValRaw]` to pass through to WebAssembly, always invoking component exports through host trampolines. This means that the implementation for `Result<T, E>` needs to do the correct "join" operation here when encoding a particular case into the corresponding `ValRaw`. I personally found this particularly tricky to do structurally. The solution that I settled on with fitzgen was that if `ValRaw` was always stored in a little endian format then we could employ a trick where when encoding a variant we first set all the `ValRaw` slots to zero, then the associated case we have is encoding. Afterwards the `ValRaw` values are already encoded into the correct format as if they'd been "join"ed. For example if we were to encode `Ok(1i32)` then this would produce `ValRaw { i32: 1 }`, which memory-wise is equivalent to `ValRaw { i64: 1 }` if the other bytes in the `ValRaw` are guaranteed to be zero. Similarly storing `ValRaw { f64 }` is equivalent to the storage required for `ValRaw { i64 }` here in the join operation. Note, though, that this equivalence relies on everything being little-endian. Otherwise the in-memory representations of `ValRaw { i32: 1 }` and `ValRaw { i64: 1 }` are different. That motivation is what leads to this change. It's expected that this is a low-to-zero cost change in the sense that little-endian platforms will see no change and big-endian platforms are already required to efficiently byte-swap loads/stores as WebAssembly requires that. Additionally the `ValRaw` type is an esoteric niche use case primarily used for accelerating the C API right now, so it's expected that not many users will have to update for this change. Track down some more endianness conversions	2022-04-14 13:09:32 -05:00
Alex Crichton	453feb6f82	Remove some dead code (#3970 ) This commit removes methods that are never used between crates or trait impls like `Clone` which may have been used one day but are no longer used.	2022-03-30 13:51:34 -05:00
Alex Crichton	c22033bf93	Delete historical interruptable support in Wasmtime (#3925 ) * Delete historical interruptable support in Wasmtime This commit removes the `Config::interruptable` configuration along with the `InterruptHandle` type from the `wasmtime` crate. The original support for adding interruption to WebAssembly was added pretty early on in the history of Wasmtime when there was no other method to prevent an infinite loop from the host. Nowadays, however, there are alternative methods for interruption such as fuel or epoch-based interruption. One of the major downsides of `Config::interruptable` is that even when it's not enabled it forces an atomic swap to happen when entering WebAssembly code. This technically could be a non-atomic swap if the configuration option isn't enabled but that produces even more branch-y code on entry into WebAssembly which is already something we try to optimize. Calling into WebAssembly is on the order of a dozens of nanoseconds at this time and an atomic swap, even uncontended, can add up to 5ns on some platforms. The main goal of this PR is to remove this atomic swap on entry into WebAssembly. This is done by removing the `Config::interruptable` field entirely, moving all existing consumers to epochs instead which are suitable for the same purposes. This means that the stack overflow check is no longer entangled with the interruption check and perhaps one day we could continue to optimize that further as well. Some consequences of this change are: * Epochs are now the only method of remote-thread interruption. * There are no more Wasmtime traps that produces the `Interrupted` trap code, although we may wish to move future traps to this so I left it in place. * The C API support for interrupt handles was also removed and bindings for epoch methods were added. * Function-entry checks for interruption are a tiny bit less efficient since one check is performed for the stack limit and a second is performed for the epoch as opposed to the `Config::interruptable` style of bundling the stack limit and the interrupt check in one. It's expected though that this is likely to not really be measurable. * The old `VMInterrupts` structure is renamed to `VMRuntimeLimits`.	2022-03-14 15:25:11 -05:00
Alex Crichton	a25f7bdba5	Don't copy `VMBuiltinFunctionsArray` into each `VMContext` (#3741 ) * Don't copy `VMBuiltinFunctionsArray` into each `VMContext` This is another PR along the lines of "let's squeeze all possible performance we can out of instantiation". Before this PR we would copy, by value, the contents of `VMBuiltinFunctionsArray` into each `VMContext` allocated. This array of function pointers is modestly-sized but growing over time as we add various intrinsics. Additionally it's the exact same for all `VMContext` allocations. This PR attempts to speed up instantiation slightly by instead storing an indirection to the function array. This means that calling a builtin intrinsic is a tad bit slower since it requires two loads instead of one (one to get the base pointer, another to get the actual address). Otherwise though `VMContext` initialization is now simply setting one pointer instead of doing a `memcpy` from one location to another. With some macro-magic this commit also replaces the previous implementation with one that's more `const`-friendly which also gets us compile-time type-checks of libcalls as well as compile-time verification that all libcalls are defined. Overall, as with #3739, the win is very modest here. Locally I measured a speedup from 1.9us to 1.7us taken to instantiate an empty module with one function. While small at these scales it's still a 10% improvement! * Review comments	2022-01-28 16:24:34 -06:00
Dan Gohman	881c19473d	Use `ptr::cast` instead of `as` casts in several places. (#3507 ) `ptr::cast` has the advantage of being unable to silently cast `const T` to `mut T`. This turned up several places that were performing such casts, which this PR also fixes.	2022-01-21 13:03:17 -08:00
Chris Fallin	8a55b5c563	Add epoch-based interruption for cooperative async timeslicing. This PR introduces a new way of performing cooperative timeslicing that is intended to replace the "fuel" mechanism. The tradeoff is that this mechanism interrupts with less precision: not at deterministic points where fuel runs out, but rather when the Engine enters a new epoch. The generated code instrumentation is substantially faster, however, because it does not need to do as much work as when tracking fuel; it only loads the global "epoch counter" and does a compare-and-branch at backedges and function prologues. This change has been measured as ~twice as fast as fuel-based timeslicing for some workloads, especially control-flow-intensive workloads such as the SpiderMonkey JS interpreter on Wasm/WASI. The intended interface is that the embedder of the `Engine` performs an `engine.increment_epoch()` call periodically, e.g. once per millisecond. An async invocation of a Wasm guest on a `Store` can specify a number of epoch-ticks that are allowed before an async yield back to the executor's event loop. (The initial amount and automatic "refills" are configured on the `Store`, just as for fuel.) This call does only signal-safe work (it increments an `AtomicU64`) so could be invoked from a periodic signal, or from a thread that wakes up once per period.	2022-01-20 13:58:17 -08:00
Alex Crichton	bfdbd10a13	Add `_unchecked` variants of `Func` APIs for the C API (#3350 ) Add `_unchecked` variants of `Func` APIs for the C API This commit is what is hopefully going to be my last installment within the saga of optimizing function calls in/out of WebAssembly modules in the C API. This is yet another alternative approach to #3345 (sorry) but also contains everything necessary to make the C API fast. As in #3345 the general idea is just moving checks out of the call path in the same style of `TypedFunc`. This new strategy takes inspiration from previously learned attempts effectively "just" exposes how we previously passed `mut u128` through trampolines for arguments/results. This storage format is formalized through a new `ValRaw` union that is exposed from the `wasmtime` crate. By doing this it made it relatively easy to expose two new APIs: * `Func::new_unchecked` * `Func::call_unchecked` These are the same as their checked equivalents except that they're `unsafe` and they work with `mut ValRaw` rather than safe slices of `Val`. Working with these eschews type checks and such and requires callers/embedders to do the right thing. These two new functions are then exposed via the C API with new functions, enabling C to have a fast-path of calling/defining functions. This fast path is akin to `Func::wrap` in Rust, although that API can't be built in C due to C not having generics in the same way that Rust has. For some benchmarks, the benchmarks here are: `nop` - Call a wasm function from the host that does nothing and returns nothing. * `i64` - Call a wasm function from the host, the wasm function calls a host function, and the host function returns an `i64` all the way out to the original caller. * `many` - Call a wasm function from the host, the wasm calls host function with 5 `i32` parameters, and then an `i64` result is returned back to the original host * `i64` host - just the overhead of the wasm calling the host, so the wasm calls the host function in a loop. * `many` host - same as `i64` host, but calling the `many` host function. All numbers in this table are in nanoseconds, and this is just one measurement as well so there's bound to be some variation in the precise numbers here. \| Name \| Rust \| C (before) \| C (after) \| \|-----------\|------\|------------\|-----------\| \| nop \| 19 \| 112 \| 25 \| \| i64 \| 22 \| 207 \| 32 \| \| many \| 27 \| 189 \| 34 \| \| i64 host \| 2 \| 38 \| 5 \| \| many host \| 7 \| 75 \| 8 \| The main conclusion here is that the C API is significantly faster than before when using the `_unchecked` variants of APIs. The Rust implementation is still the ceiling (or floor I guess?) for performance The main reason that C is slower than Rust is that a little bit more has to travel through memory where on the Rust side of things we can monomorphize and inline a bit more to get rid of that. Overall though the costs are way way down from where they were originally and I don't plan on doing a whole lot more myself at this time. There's various things we theoretically could do I've considered but implementation-wise I think they'll be much more weighty. Tweak `wasmtime_externref_t` API comments	2021-09-24 14:05:45 -05:00
Alex Crichton	63a3bbbf5a	Change VMMemoryDefinition::current_length to `usize` (#3134 ) * Change VMMemoryDefinition::current_length to `usize` This commit changes the definition of `VMMemoryDefinition::current_length` to `usize` from its previous definition of `u32`. This is a pretty impactful change because it also changes the cranelift semantics of "dynamic" heaps where the bound global value specifier must now match the pointer type for the platform rather than the index type for the heap. The motivation for this change is that the `current_length` field (or bound for the heap) is intended to reflect the current size of the heap. This is bound by `usize` on the host platform rather than `u32` or` u64`. The previous choice of `u32` couldn't represent a 4GB memory because we couldn't put a number representing 4GB into the `current_length` field. By using `usize`, which reflects the host's memory allocation, this should better reflect the size of the heap and allows Wasmtime to support a full 4GB heap for a wasm program (instead of 4GB minus one page). This commit also updates the legalization of the `heap_addr` clif instruction to appropriately cast the address to the platform's pointer type, handling bounds checks along the way. The practical impact for today's targets is that a `uextend` is happening sooner than it happened before, but otherwise there is no intended impact of this change. In the future when 64-bit memories are supported there will likely need to be fancier logic which handles offsets a bit differently (especially in the case of a 64-bit memory on a 32-bit host). The clif `filetest` changes should show the differences in codegen, and the Wasmtime changes are largely removing casts here and there. Closes #3022 * Add tests for memory.size at maximum memory size * Add a dfg helper method	2021-08-02 13:09:40 -05:00
Dan Gohman	784a380e5f	Add comments about vmctx pointers in various datastructures. (#2925 ) This forward-ports the relevant parts of #1396.	2021-07-27 09:33:27 -05:00
Alex Crichton	a273add815	Simplify the list of builtin intrinsics Wasmtime needs This commit slims down the list of builtin intrinsics. It removes the duplicated intrinsics for imported and locally defined items, instead always using one intrinsic for both. This was previously inconsistently applied where some intrinsics got two copies (one for imported one for local) and other intrinsics got only one copy. This does add an extra branch in intrinsics since they need to determine whether something is local or not, but that's generally much lower cost than the intrinsics themselves. This also removes the `memory32_size` intrinsic, instead inlining the codegen directly into the clif IR. This matches what the `table.size` instruction does and removes the need for a few functions on a `wasmtime_runtime::Instance`.	2021-06-23 10:30:31 -07:00
Ulrich Weigand	83007b79e3	Fix access to VMMemoryDefinition::current_length on big-endian (#3013 ) The current_length member is defined as "usize" in Rust code, but generated wasm code refers to it as if it were "u32". While this happens to mostly work on little-endian machines (as long as the length is < 4GB), it will always fail on big-endian machines. Fixed by making current_length "u32" in Rust as well, and ensuring the actual memory size is always less than 4GB.	2021-06-23 11:45:32 -05:00
Alex Crichton	7a1b7cdf92	Implement RFC 11: Redesigning Wasmtime's APIs (#2897 ) Implement Wasmtime's new API as designed by RFC 11. This is quite a large commit which has had lots of discussion externally, so for more information it's best to read the RFC thread and the PR thread.	2021-06-03 09:10:53 -05:00
Alex Crichton	c77ea0c5c7	Add some more `#[inline]` annotations for trivial functions (#2817 ) Looking at some profiles these or their related functions were all showing up, so this commit adds `#[inline]` to allow cross-crate inlining by default.	2021-04-08 12:23:54 -05:00
Peter Huene	b58afbf849	Refactor module instantiation in the runtime. This commit refactors module instantiation in the runtime to allow for different instance allocation strategy implementations. It adds an `InstanceAllocator` trait with the current implementation put behind the `OnDemandInstanceAllocator` struct. The Wasmtime API has been updated to allow a `Config` to have an instance allocation strategy set which will determine how instances get allocated. This change is in preparation for an alternative pooling instance allocator that can reserve all needed host process address space in advance. This commit also makes changes to the `wasmtime_environ` crate to represent compiled modules in a way that reduces copying at instantiation time.	2021-03-04 18:18:50 -08:00
Alex Crichton	0e41861662	Implement limiting WebAssembly execution with fuel (#2611 ) * Consume fuel during function execution This commit adds codegen infrastructure necessary to instrument wasm code to consume fuel as it executes. Currently nothing is really done with the fuel, but that'll come in later commits. The focus of this commit is to implement the codegen infrastructure necessary to consume fuel and account for fuel consumed correctly. * Periodically check remaining fuel in wasm JIT code This commit enables wasm code to periodically check to see if fuel has run out. When fuel runs out an intrinsic is called which can do what it needs to do in the result of fuel running out. For now a trap is thrown to have at least some semantics in synchronous stores, but another planned use for this feature is for asynchronous stores to periodically yield back to the host based on fuel running out. Checks for remaining fuel happen in the same locations as interrupt checks, which is to say the start of the function as well as loop headers. * Improve codegen by caching `const VMInterrupts` The location of the shared interrupt value and fuel value is through a double-indirection on the vmctx (load through the vmctx and then load through that pointer). The second pointer in this chain, however, never changes, so we can alter codegen to account for this and remove some extraneous load instructions and hopefully reduce some register pressure even maybe. Add tests fuel can abort infinite loops * More fuzzing with fuel Use fuel to time out modules in addition to time, using fuzz input to figure out which. * Update docs on trapping instructions * Fix doc links * Fix a fuzz test * Change setting fuel to adding fuel * Fix a doc link * Squelch some rustdoc warnings	2021-01-29 08:57:17 -06:00
Yury Delendik	3580205f12	[Cranelift][Atomics] Add address folding for atomic notify/wait. (#2556 ) * fold address in wasm wait and notify ops * add atomics addr folding tests	2021-01-08 11:55:21 -06:00
Alex Crichton	3887881800	Refactor how signatures/trampolines are stored in `Store` This commit refactors where trampolines and signature information is stored within a `Store`, namely moving them from `wasmtime_runtime::Instance` instead to `Store` itself. The goal here is to remove an allocation inside of an `Instance` and make them a bit cheaper to create. Additionally this should open up future possibilities like not creating duplicate trampolines for signatures already in the `Store` when using `Func::new`.	2020-11-02 07:54:18 -08:00
Alex Crichton	e659d5cecd	Add initial support for the multi-memory proposal (#2263 ) This commit adds initial (gated) support for the multi-memory wasm proposal. This was actually quite easy since almost all of wasmtime already expected multi-memory to be implemented one day. The only real substantive change is the `memory.copy` intrinsic changes, which now accounts for the source/destination memories possibly being different.	2020-10-13 19:13:52 -05:00
Alex Crichton	3d2e0e55f2	Remove the `local` field of `Module` (#2091 ) This was added long ago at this point to assist with caching, but caching has moved to a different level such that this wonky second level of a `Module` isn't necessary. This commit removes the `ModuleLocal` type to simplify accessors and generally make it easier to work with.	2020-08-04 12:29:16 -05:00
Nick Fitzgerald	3555f97906	wasmtime: Implement `table.fill` Part of #929	2020-07-02 16:59:07 -07:00
Nick Fitzgerald	bffd54c016	wasmtime: Implement `global.{get,set}` for externref globals (#1969 ) * wasmtime: Implement `global.{get,set}` for externref globals We use libcalls to implement these -- unlike `table.{get,set}`, for which we create inline JIT fast paths -- because no known toolchain actually uses externref globals. Part of #929 * wasmtime: Enable `{extern,func}ref` globals in the API	2020-07-02 16:04:01 -05:00
Nick Fitzgerald	8c5f59c0cf	wasmtime: Implement `table.get` and `table.set` These instructions have fast, inline JIT paths for the common cases, and only call out to host VM functions for the slow paths. This required some changes to `cranelift-wasm`'s `FuncEnvironment`: instead of taking a `FuncCursor` to insert an instruction sequence within the current basic block, `FuncEnvironment::translate_table_{get,set}` now take a `&mut FunctionBuilder` so that they can create whole new basic blocks. This is necessary for implementing GC read/write barriers that involve branching (e.g. checking for null, or whether a store buffer is at capacity). Furthermore, it required that the `load`, `load_complex`, and `store` instructions handle loading and storing through an `r{32,64}` rather than just `i{32,64}` addresses. This involved making `r{32,64}` types acceptable instantiations of the `iAddr` type variable, plus a few new instruction encodings. Part of #929	2020-06-30 12:00:57 -07:00
Nick Fitzgerald	58bb5dd953	wasmtime: Add support for `func.ref` and `table.grow` with `funcref`s `funcref`s are implemented as `NonNull<VMCallerCheckedAnyfunc>`. This should be more efficient than using a `VMExternRef` that points at a `VMCallerCheckedAnyfunc` because it gets rid of an indirection, dynamic allocation, and some reference counting. Note that the null function reference is NOT a null pointer; it is a `VMCallerCheckedAnyfunc` that has a null `func_ptr` member. Part of #929	2020-06-24 10:08:13 -07:00
Nick Fitzgerald	bbd99c5bfa	reference types: Implement the `table.size` and `table.grow` instructions (#1894 ) Part of #929	2020-06-18 08:57:18 -05:00
Alex Crichton	c9a0ba81a0	Implement interrupting wasm code, reimplement stack overflow (#1490 ) * Implement interrupting wasm code, reimplement stack overflow This commit is a relatively large change for wasmtime with two main goals: * Primarily this enables interrupting executing wasm code with a trap, preventing infinite loops in wasm code. Note that resumption of the wasm code is not a goal of this commit. * Additionally this commit reimplements how we handle stack overflow to ensure that host functions always have a reasonable amount of stack to run on. This fixes an issue where we might longjmp out of a host function, skipping destructors. Lots of various odds and ends end up falling out in this commit once the two goals above were implemented. The strategy for implementing this was also lifted from Spidermonkey and existing functionality inside of Cranelift. I've tried to write up thorough documentation of how this all works in `crates/environ/src/cranelift.rs` where gnarly-ish bits are. A brief summary of how this works is that each function and each loop header now checks to see if they're interrupted. Interrupts and the stack overflow check are actually folded into one now, where function headers check to see if they've run out of stack and the sentinel value used to indicate an interrupt, checked in loop headers, tricks functions into thinking they're out of stack. An interrupt is basically just writing a value to a location which is read by JIT code. When interrupts are delivered and what triggers them has been left up to embedders of the `wasmtime` crate. The `wasmtime::Store` type has a method to acquire an `InterruptHandle`, where `InterruptHandle` is a `Send` and `Sync` type which can travel to other threads (or perhaps even a signal handler) to get notified from. It's intended that this provides a good degree of flexibility when interrupting wasm code. Note though that this does have a large caveat where interrupts don't work when you're interrupting host code, so if you've got a host import blocking for a long time an interrupt won't actually be received until the wasm starts running again. Some fallout included from this change is: * Unix signal handlers are no longer registered with `SA_ONSTACK`. Instead they run on the native stack the thread was already using. This is possible since stack overflow isn't handled by hitting the guard page, but rather it's explicitly checked for in wasm now. Native stack overflow will continue to abort the process as usual. * Unix sigaltstack management is now no longer necessary since we don't use it any more. * Windows no longer has any need to reset guard pages since we no longer try to recover from faults on guard pages. * On all targets probestack intrinsics are disabled since we use a different mechanism for catching stack overflow. * The C API has been updated with interrupts handles. An example has also been added which shows off how to interrupt a module. Closes #139 Closes #860 Closes #900 * Update comment about magical interrupt value * Store stack limit as a global value, not a closure * Run rustfmt * Handle review comments * Add a comment about SA_ONSTACK * Use `usize` for type of `INTERRUPTED` * Parse human-readable durations * Bring back sigaltstack handling Allows libstd to print out stack overflow on failure still. * Add parsing and emission of stack limit-via-preamble * Fix new example for new apis * Fix host segfault test in release mode * Fix new doc example	2020-04-21 11:03:28 -07:00
Alex Crichton	7eea5d8d43	Optimize codegen in `Func::wrap` (#1491 ) This commit optimizes the codegen of `Func::wrap` such that if you do something like `Func::wrap(&store, \|\| {})` then the shim generated contains zero code (as expected). In general this means that the extra tidbits generated by wasmtime are all eligible to be entirely optimized away so long as you don't actually rely on something.	2020-04-10 12:52:06 -05:00
Alex Crichton	3e2be43502	Pre-generate trampoline functions (#957 ) * Refactor wasmtime_runtime::Export Instead of an enumeration with variants that have data fields have an enumeration where each variant has a struct, and each struct has the data fields. This allows us to store the structs in the `wasmtime` API and avoid lots of `panic!` calls and various extraneous matches. * Pre-generate trampoline functions The `wasmtime` crate supports calling arbitrary function signatures in wasm code, and to do this it generates "trampoline functions" which have a known ABI that then internally convert to a particular signature's ABI and call it. These trampoline functions are currently generated on-the-fly and are cached in the global `Store` structure. This, however, is suboptimal for a few reasons: * Due to how code memory is managed each trampoline resides in its own 64kb allocation of memory. This means if you have N trampolines you're using N * 64kb of memory, which is quite a lot of overhead! * Trampolines are never free'd, even if the referencing module goes away. This is similar to #925. * Trampolines are a source of shared state which prevents `Store` from being easily thread safe. This commit refactors how trampolines are managed inside of the `wasmtime` crate and jit/runtime internals. All trampolines are now allocated in the same pass of `CodeMemory` that the main module is allocated into. A trampoline is generated per-signature in a module as well, instead of per-function. This cache of trampolines is stored directly inside of an `Instance`. Trampolines are stored based on `VMSharedSignatureIndex` so they can be looked up from the internals of the `ExportFunction` value. The `Func` API has been updated with various bits and pieces to ensure the right trampolines are registered in the right places. Overall this should ensure that all trampolines necessary are generated up-front rather than lazily. This allows us to remove the trampoline cache from the `Compiler` type, and move one step closer to making `Compiler` threadsafe for usage across multiple threads. Note that as one small caveat the `Func::wrap` family of functions don't need to generate a trampoline at runtime, they actually generate the trampoline at compile time which gets passed in. Also in addition to shuffling a lot of code around this fixes one minor bug found in `code_memory.rs`, where `self.position` was loaded before allocation, but the allocation may push a new chunk which would cause `self.position` to be zero instead. Pass the `SignatureRegistry` as an argument to where it's needed. This avoids the need for storing it in an `Arc`. * Ignore tramoplines for functions with lots of arguments Co-authored-by: Dan Gohman <sunfish@mozilla.com>	2020-03-12 16:17:48 -05:00
Nick Fitzgerald	674a6208d8	Implement `data.drop` and `memory.init` and get the rest of the bulk memory spec tests passing (#1264 ) * Enable the already-passing `bulk-memoryoperations/imports.wast` test * Implement support for the `memory.init` instruction and passive data This adds support for passive data segments and the `memory.init` instruction from the bulk memory operations proposal. Passive data segments are stored on the Wasm module and then `memory.init` instructions copy their contents into memory. * Implement the `data.drop` instruction This allows wasm modules to deallocate passive data segments that it doesn't need anymore. We keep track of which segments have not been dropped on an `Instance` and when dropping them, remove the entry from the instance's hash map. The module always needs all of the segments for new instantiations. * Enable final bulk memory operations spec test This requires special casing an expected error message for an `assert_trap`, since the expected error message contains the index of an uninitialized table element, but our trap implementation doesn't save that diagnostic information and shepherd it out.	2020-03-10 09:30:11 -05:00
Nick Fitzgerald	ef0cabf8b4	Address review feedback	2020-02-26 14:37:28 -08:00
Nick Fitzgerald	44c28612fb	Implement the `memory.fill` instruction from the bulk memory proposal	2020-02-26 14:35:09 -08:00
Nick Fitzgerald	98ecef1700	Implement the `memory.copy` instruction from the bulk memory proposal	2020-02-26 14:35:09 -08:00
Nick Fitzgerald	cb97e4ec8e	Implement `table.init` and `elem.drop` from the bulk memory proposal	2020-02-26 14:35:09 -08:00
Nick Fitzgerald	33b4a37bcb	Add support for `table.copy` This adds support for the `table.copy` instruction from the bulk memory proposal. It also supports multiple tables, which were introduced by the reference types proposal. Part of #928	2020-02-26 14:30:43 -08:00
Alex Crichton	c8ab1e293e	Improve robustness of cache loading/storing (#974 ) * Improve robustness of cache loading/storing Today wasmtime incorrectly loads compiled compiled modules from the global cache when toggling settings such as optimizations. For example if you execute `wasmtime foo.wasm` that will cache globally an unoptimized version of the wasm module. If you then execute `wasmtime -O foo.wasm` it would then reload the unoptimized version from cache, not realizing the compilation settings were different, and use that instead. This can lead to very surprising behavior naturally! This commit updates how the cache is managed in an attempt to make it much more robust against these sorts of issues. This takes a leaf out of rustc's playbook and models the cache with a function that looks like: fn load<T: Hash>( &self, data: T, compute: fn(T) -> CacheEntry, ) -> CacheEntry; The goal here is that it guarantees that all the `data` necessary to `compute` the result of the cache entry is hashable and stored into the hash key entry. This was previously open-coded and manually managed where items were hashed explicitly, but this construction guarantees that everything reasonable `compute` could use to compile the module is stored in `data`, which is itself hashable. This refactoring then resulted in a few workarounds and a few fixes, including the original issue: * The `Module` type was split into `Module` and `ModuleLocal` where only the latter is hashed. The previous hash function for a `Module` left out items like the `start_func` and didn't hash items like the imports of the module. Omitting the `start_func` was fine since compilation didn't actually use it, but omitting imports seemed uncomfortable because while compilation didn't use the import values it did use the number of imports, which seems like it should then be put into the cache key. The `ModuleLocal` type now derives `Hash` to guarantee that all of its contents affect the hash key. * The `ModuleTranslationState` from `cranelift-wasm` doesn't implement `Hash` which means that we have a manual wrapper to work around that. This will be fixed with an upstream implementation, since this state affects the generated wasm code. Currently this is just a map of signatures, which is present in `Module` anyway, so we should be good for the time being. * Hashing `dyn TargetIsa` was also added, where previously it was not fully hashed. Previously only the target name was used as part of the cache key, but crucially the flags of compilation were omitted (for example the optimization flags). Unfortunately the trait object itself is not hashable so we still have to manually write a wrapper to hash it, but we likely want to add upstream some utilities to hash isa objects into cranelift itself. For now though we can continue to add hashed fields as necessary. Overall the goal here was to use the compiler to expose what we're not hashing, and then make sure we organize data and write the right code to ensure everything is hashed, and nothing more. * Update crates/environ/src/module.rs Co-Authored-By: Peter Huene <peterhuene@protonmail.com> * Fix lightbeam * Fix compilation of tests * Update the expected structure of the cache * Revert "Update the expected structure of the cache" This reverts commit 2b53fee426a4e411c313d8c1e424841ba304a9cd. * Separate the cache dir a bit * Add a test the cache is busted with opt levels * rustfmt Co-authored-by: Peter Huene <peterhuene@protonmail.com>	2020-02-26 16:18:02 -06:00
Alex Crichton	47d6db0be8	Reel in unsafety around `InstanceHandle` (#856 ) * Reel in unsafety around `InstanceHandle` This commit is an attempt, or at least is targeted at being a start, at reeling in the unsafety around the `InstanceHandle` type. Currently this type represents a sort of moral `Rc<Instance>` but is a bit more specialized since the underlying memory is allocated through mmap. Additionally, though, `InstanceHandle` exposes a fundamental flaw in its safety by safetly allowing mutable access so long as you have `&mut InstanceHandle`. This type, however, is trivially created by simply cloning a `InstanceHandle` to get an owned reference. This means that `&mut InstanceHandle` does not actually provide any guarantees about uniqueness, so there's no more safety than `&InstanceHandle` itself. This commit removes all `&mut self` APIs from `InstanceHandle`, additionally removing some where `&self` was `unsafe` and `&mut self` was safe (since it was trivial to subvert this "safety"). In doing so interior mutability patterns are now used much more extensively through structures such as `Table` and `Memory`. Additionally a number of methods were refactored to be a bit clearer and use helper functions where possible. This is a relatively large commit unfortunately, but it snowballed very quickly into touching quite a few places. My hope though is that this will prevent developers working on wasmtime internals as well as developers still yet to migrate to the `wasmtime` crate from falling into trivial unsafe traps by accidentally using `&mut` when they can't. All existing users relying on `&mut` will need to migrate to some form of interior mutability, such as using `RefCell` or `Cell`. This commit also additionally marks `InstanceHandle::new` as an `unsafe` function. The rationale for this is that the `&mut`-safety is only the beginning for the safety of `InstanceHandle`. In general the wasmtime internals are extremely unsafe and haven't been audited for appropriate usage of `unsafe`. Until that's done it's hoped that we can warn users with this `unsafe` constructor and otherwise push users to the `wasmtime` crate which we know is safe. * Fix windows build * Wrap up mutable memory state in one structure Rather than having separate fields * Use `Cell::set`, not `Cell::replace`, where possible * Add a helper function for offsets from VMContext * Fix a typo from merging * rustfmt * Use try_from, not as * Tweak style of some setters	2020-01-24 14:20:35 -06:00
Dan Gohman	9a88d3d894	Replace the global-exports mechanism with a caller-vmctx mechanism. (#789 ) * Replace the global-exports mechanism with a caller-vmctx mechanism. This eliminates the global exports mechanism, and instead adds a caller-vmctx argument to wasm functions so that WASI can obtain the memory and other things from the caller rather than looking them up in a global registry. This replaces #390. * Fixup some merge conflicts * Rustfmt * Ensure VMContext is aligned to 16 bytes With the removal of `global_exports` it "just so happens" that this isn't happening naturally any more. * Fixup some bugs with double vmctx in wasmtime crate * Trampoline stub needed adjusting * Use pointer type instead of always using I64 for caller vmctx * Don't store `ir::Signature` in `Func` since we don't know the pointer size at creation time. * Skip the first 2 arguments in IR signatures since that's the two vmctx parameters. * Update cranelift to 0.56.0 * Handle more merge conflicts * Rustfmt Co-authored-by: Alex Crichton <alex@alexcrichton.com>	2020-01-21 14:50:59 -08:00
XAMPPRocky	907e7aac01	Clippy fixes (#692 )	2019-12-24 12:50:07 -08:00
Alex Crichton	39e57e3e9a	Migrate back to `std::` stylistically (#554 ) * Migrate back to `std::` stylistically This commit moves away from idioms such as `alloc::` and `core::` as imports of standard data structures and types. Instead it migrates all crates to uniformly use `std::` for importing standard data structures and types. This also removes the `std` and `core` features from all crates to and removes any conditional checking for `feature = "std"` All of this support was previously added in #407 in an effort to make wasmtime/cranelift "`no_std` compatible". Unfortunately though this change comes at a cost: * The usage of `alloc` and `core` isn't idiomatic. Especially trying to dual between types like `HashMap` from `std` as well as from `hashbrown` causes imports to be surprising in some cases. * Unfortunately there was no CI check that crates were `no_std`, so none of them actually were. Many crates still imported from `std` or depended on crates that used `std`. It's important to note, however, that this does not mean that wasmtime will not run in embedded environments. The style of the code today and idioms aren't ready in Rust to support this degree of multiplexing and makes it somewhat difficult to keep up with the style of `wasmtime`. Instead it's intended that embedded runtime support will be added as necessary. Currently only `std` is necessary to build `wasmtime`, and platforms that natively need to execute `wasmtime` will need to use a Rust target that supports `std`. Note though that not all of `std` needs to be supported, but instead much of it could be configured off to return errors, and `wasmtime` would be configured to gracefully handle errors. The goal of this PR is to move `wasmtime` back to idiomatic usage of features/`std`/imports/etc and help development in the short-term. Long-term when platform concerns arise (if any) they can be addressed by moving back to `no_std` crates (but fixing the issues mentioned above) or ensuring that the target in Rust has `std` available. * Start filling out platform support doc	2019-11-18 22:04:06 -08:00
Dan Gohman	061b453255	Remove unneeded `extern crate`, `macro_use`, and tidy `use`s.	2019-11-08 17:55:38 -08:00

1 2

52 Commits