Commit Graph

332 Commits

Author SHA1 Message Date
Chris Fallin
833ebeed76 Fix spillslot size bug in SIMD by removing type-dependent spillslot allocation.
This patch makes spillslot allocation, spilling and reloading all based
on register class only. Hence when we have a 32- or 64-bit value in a
128-bit XMM register on x86-64 or vector register on aarch64, this
results in larger spillslots and spills/restores.

Why make this change, if it results in less efficient stack-frame usage?
Simply put, it is safer: there is always a risk when allocating
spillslots or spilling/reloading that we get the wrong type and make the
spillslot or the store/load too small. This was one contributing factor
to CVE-2021-32629, and is now the source of a fuzzbug in SIMD code that
puns an arbitrary user-controlled vector constant over another
stackslot. (If this were a pointer, that could result in RCE. SIMD is
not yet on by default in a release, fortunately.

In particular, we have not been particularly careful about using moves
between values of different types, for example with `raw_bitcast` or
with certain SIMD operations, and such moves indicate to regalloc.rs
that vregs are in equivalence classes and some arbitrary vreg in the
class is provided when allocating the spillslot or spilling/reloading.
Since regalloc.rs does not track actual type, and since we haven't been
careful about moves, we can't really trust this "arbitrary vreg in
equivalence class" to provide accurate type information.

In the fix to CVE-2021-32629 we fixed this for integer registers by
always spilling/reloading 64 bits; this fix can be seen as the analogous
change for FP/vector regs.
2022-01-04 13:24:40 -08:00
Alex Crichton
f1225dfd93 Add a compilation section to disable address maps (#3598)
* Add a compilation section to disable address maps

This commit adds a new `Config::generate_address_map` compilation
setting which is used to disable emission of the `.wasmtime.addrmap`
section of compiled artifacts. This section is currently around the size
of the entire `.text` section itself unfortunately and for size reasons
may wish to be omitted. Functionality-wise all that is lost is knowing
the precise wasm module offset address of a faulting instruction or in a
backtrace of instructions. This also means that if the module has DWARF
debugging information available with it Wasmtime isn't able to produce a
filename and line number in the backtrace.

This option remains enabled by default. This option may not be needed in
the future with #3547 perhaps, but in the meantime it seems reasonable
enough to support a configuration mode where the section is entirely
omitted if the smallest module possible is desired.

* Fix some CI issues

* Update tests/all/traps.rs

Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>

* Do less work in compilation for address maps

But only when disabled

Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
2021-12-13 13:48:05 -06:00
Dan Gohman
42b23dac4a Make the trap name for unreachable traps more descriptive. (#3568)
Following up on WebAssembly/wasi-sdk#210, this makes the trap message
for `unreachable` traps more descriptive of what actually caused the
trap, so that it doesn't sound like maybe Wasmtime itself executed a
`unreachable!()` macro in Rust.

Before:
```
wasm trap: unreachable
wasm backtrace:
     [...]
```

After:
```
wasm trap: wasm `unreachable` instruction executed
wasm backtrace:
     [...]
```
2021-11-29 15:55:10 -08:00
Andrew Brown
994fe41daf x64: allow vector types in select move
As reported in #3173, the `select` instruction fails an assertion when it is given `v128` types as operands. This change relaxes the assertion to allow the same type of XMM move that occurs for the f32 and f64 types. This fixes #3173 in the old `lower.rs` code temporarily until the relatively complex `select` lowering can be ported to ISLE.
2021-11-19 11:24:30 -08:00
Alex Crichton
88e400b98c Make memory smaller 2021-11-19 08:44:22 -08:00
Alex Crichton
efa3468ff9 x64: Add test for a fixed issue
This commit adds a test from #3337 which is an issue that was fixed
in #3506 due to moving `imul` lowering rules to ISLE which fixed the
underlying issue of accidentally not falling through to the necessary
case for general `i64x2.mul` multiplication.

Closes #3337
2021-11-19 08:16:21 -08:00
Alex Crichton
352ee2b186 Move insertlane to ISLE (#3544)
This also fixes a bug where `movsd` was incorrectly used with a memory
operand for `insertlane`, causing it to actually zero the upper bits
instead of preserving them.

Note that the insertlane logic still exists in `lower.rs` because it's
used as a helper for a few other instruction lowerings which aren't
migrated to ISLE yet. This commit also adds a helper in ISLE itself for
those other lowerings to use when they get implemented.

Closes #3216
2021-11-18 13:48:11 -06:00
Alex Crichton
92394566fc x64: Migrate fabs and bnot vector operations to ISLE
This was my first attempt at transitioning code to ISLE to originally
fix #3327 but that fix has since landed on `main`, so this is instead
now just porting a few operations to ISLE.

Closes #3336
2021-11-16 07:36:49 -08:00
Johnnie Birch
5d5629de60 Fix for issue 3327. Updates Bnot to handle case for NaN float 2021-11-15 18:47:23 -08:00
Dan Gohman
ea0cb971fb Update to rustix 0.26.2. (#3521)
This pulls in a fix for Android, where Android's seccomp policy on older
versions is to make `openat2` irrecoverably crash the process, so we have
to do a version check up front rather than relying on `ENOSYS` to
determine if `openat2` is supported.

And it pulls in the fix for the link errors when multiple versions of
rsix/rustix are linked in.

And it has updates for two crate renamings: rsix has been renamed to
rustix, and unsafe-io has been renamed to io-extras.
2021-11-15 10:21:13 -08:00
Adam Bratschi-Kaye
12bfbdfaca Skip generating DWARF info for dead code (#3498)
When encountering a subprogram that is dead code (as indicated by the
dead code proposal
https://dwarfstd.org/ShowIssue.php?issue=200609.1), don't generate debug
output for the subprogram or any of its children.
2021-11-08 09:31:04 -06:00
Alex Crichton
6be0f82b96 Fix a panic with an invalid name section (#3509)
This commit fixes a panic which can happen on a module with an invalid
name section where one of the functions named has the index `u32::MAX`.
Previously Wasmtime would create a new `FuncIndex` with the indices
found in the name section but the sentinel `u32::MAX` causes a panic.

Cranelift otherwise limits the number of functions through `wasmparser`
which has a hard limit (lower than `u32::MAX`) so this commit applies a
fix of only recording function names for function indices that are
actually present in the module.
2021-11-05 15:08:58 -05:00
Alex Crichton
6bcee7f5f7 Add a configuration option to force "static" memories (#3503)
* Add a configuration option to force "static" memories

In poking around at some things earlier today I realized that one
configuration option for memories we haven't exposed from embeddings
like the CLI is to forcibly limit the size of memory growth and force
using a static memory style. This means that the CLI, for example, can't
limit memory growth by default and memories are only limited in size by
what the OS can give and the wasm's own memory type. This configuration
option means that the CLI can artificially limit the size of wasm linear
memories.

Additionally another motivation for this is for testing out various
codegen ramifications of static/dynamic memories. This is the only way
to force a static memory, by default, for wasm64 memories with no
maximum size listed for example.

* Review feedback
2021-11-03 16:50:49 -05:00
Pat Hickey
fb549c6ddb actually do some awaiting in the async limiter, which doesnt work
something tls-related is not right
2021-10-22 16:12:07 -07:00
Pat Hickey
bcbd74678a add some tests that show behavior is the same when on wasm stack 2021-10-22 11:08:09 -07:00
Pat Hickey
175e72bac4 test that the catch unwind works 2021-10-21 16:43:13 -07:00
Pat Hickey
758abe3963 add table limiting tests 2021-10-21 16:23:54 -07:00
Pat Hickey
538c19589b build out async versions of the existing limiter tests 2021-10-21 15:21:05 -07:00
Pat Hickey
252ba39c27 implement table _async methods, test passes now 2021-10-21 14:15:53 -07:00
Pat Hickey
2225722373 Memory::new_async, grow_async now work! 2021-10-21 12:10:03 -07:00
Pat Hickey
adf7521e30 introduce a failing test 2021-10-21 12:10:03 -07:00
Pat Hickey
18a355e092 give sychronous ResourceLimiter an async alternative 2021-10-21 12:10:03 -07:00
Adam Bratschi-Kaye
afd10646c9 List exports of an instance in linking error (#3456)
When there is a linking error caused by an undefined instance, list all
the instances exports in the error message. This will clarify errors for
undefined two-level imports that get desugared to one-level instance
imports under the module-linking proposal.
2021-10-20 16:31:53 -05:00
Alex Crichton
e8d3b8e3ea Fix an off-by-two condition in heap legalization (#3462)
This commit fixes an issue in Cranelift where legalization of
`heap_addr` instructions (used by wasm to represent heap accesses) could
be off-by-two where loads that should be valid were actually treated as
invalid. The bug here happened in an optimization where tests against
odd constants were being altered to tests against even constants by
subtracting one from the limit instead of adding one to the limit. The
comment around this area has been updated in accordance with a little
more math-stuff as well to help future readers.
2021-10-19 13:19:20 -05:00
Alex Crichton
9c6884e28d Update the spec reference testsuite submodule (#3450)
* Update the spec reference testsuite submodule

This commit brings in recent updates to the spec test suite. Most of the
changes here were already fixed in `wasmparser` with some tweaks to
esoteric modules, but Wasmtime also gets a bug fix where where import
matching for the size of tables/memories is based on the current runtime
size of the table/memory rather than the original type of the
table/memory. This means that during type matching the actual value is
consulted for its size rather than using the minimum size listed in its
type.

* Fix now-missing directories in build script
2021-10-13 16:14:12 -05:00
Chris Fallin
937b319e2d Merge pull request #3009 from bjorn3/bye_x86_backend
[RFC] Remove the old x86 backend
2021-09-30 09:20:14 -07:00
Dan Gohman
e5ebef1b94 Use empty() instead of NONE with rsix flags types.
`empty()` is provided by all `bitflags` types, so it's more idiomatic
than having `NONE` values.
2021-09-30 08:14:13 -07:00
bjorn3
71907184a5 Rustfmt 2021-09-29 18:21:47 +02:00
bjorn3
9e34df33b9 Remove the old x86 backend 2021-09-29 16:13:46 +02:00
Alex Crichton
bcf3544924 Optimize Func::call and its C API (#3319)
* Optimize `Func::call` and its C API

This commit is an alternative to #3298 which achieves effectively the
same goal of optimizing the `Func::call` API as well as its C API
sibling of `wasmtime_func_call`. The strategy taken here is different
than #3298 though where a new API isn't created, rather a small tweak to
an existing API is done. Specifically this commit handles the major
sources of slowness with `Func::call` with:

* Looking up the type of a function, to typecheck the arguments with and
  use to guide how the results should be loaded, no longer hits the
  rwlock in the `Engine` but instead each `Func` contains its own
  `FuncType`. This can be an unnecessary allocation for funcs not used
  with `Func::call`, so this is a downside of this implementation
  relative to #3298. A mitigating factor, though, is that instance
  exports are loaded lazily into the `Store` and in theory not too many
  funcs are active in the store as `Func` objects.

* Temporary storage is amortized with a long-lived `Vec` in the `Store`
  rather than allocating a new vector on each call. This is basically
  the same strategy as #3294 only applied to different types in
  different places. Specifically `wasmtime::Store` now retains a
  `Vec<u128>` for `Func::call`, and the C API retains a `Vec<Val>` for
  calling `Func::call`.

* Finally, an API breaking change is made to `Func::call` and its type
  signature (as well as `Func::call_async`). Instead of returning
  `Box<[Val]>` as it did before this function now takes a
  `results: &mut [Val]` parameter. This allows the caller to manage the
  allocation and we can amortize-remove it in `wasmtime_func_call` by
  using space after the parameters in the `Vec<Val>` we're passing in.
  This change is naturally a breaking change and we'll want to consider
  it carefully, but mitigating factors are that most embeddings are
  likely using `TypedFunc::call` instead and this signature taking a
  mutable slice better aligns with `Func::new` which receives a mutable
  slice for the results.

Overall this change, in the benchmark of "call a nop function from the C
API" is not quite as good as #3298. It's still a bit slower, on the
order of 15ns, because there's lots of capacity checks around vectors
and the type checks are slightly less optimized than before. Overall
though this is still significantly better than today because allocations
and the rwlock to acquire the type information are both avoided. I
personally feel that this change is the best to do because it has less
of an API impact than #3298.

* Rebase issues
2021-09-21 14:07:05 -05:00
Dan Gohman
47490b4383 Use rsix to make system calls in Wasmtime. (#3355)
* Use rsix to make system calls in Wasmtime.

`rsix` is a system call wrapper crate that we use in `wasi-common`,
which can provide the following advantages in the rest of Wasmtime:

 - It eliminates some `unsafe` blocks in Wasmtime's code. There's
   still an `unsafe` block in the library, but this way, the `unsafe`
   is factored out and clearly scoped.

 - And, it makes error handling more consistent, factoring out code for
   checking return values and `io::Error::last_os_error()`, and code that
   does `errno::set_errno(0)`.

This doesn't cover *all* system calls; `rsix` doesn't implement
signal-handling APIs, and this doesn't cover calls made through `std` or
crates like `userfaultfd`, `rand`, and `region`.
2021-09-17 15:28:56 -07:00
Alex Crichton
853b986b22 Move custom_limiter_detect_os_oom_failure to its own test (#3366)
This test uses `rlimit` which can't be executed in parallel with other
tests. Previously this used `libc::fork` but the call afterwards to
`libc::wait` was racing all other child subprocesses since it would wait
for any child instead of the specific child we were interested in. There
was also difficulty getting the output of the child on failure coming
to the parent, so this commit simplifies the situation by moving the
test to its own executable where it's the only test.
2021-09-17 15:47:47 -05:00
Chris Fallin
f851684ebc Ignore failing test on old-backend builds. 2021-09-17 11:02:12 -07:00
Nick Fitzgerald
b39f087414 Merge pull request from GHSA-q879-9g95-56mx
Add an assertion that a `HostFunc`'s `store` agrees on engines
2021-09-17 10:29:35 -07:00
Nick Fitzgerald
101998733b Merge pull request from GHSA-v4cp-h94r-m7xf
Fix a use-after-free bug when passing `ExternRef`s to Wasm
2021-09-17 10:27:29 -07:00
Pat Hickey
faa117cac4 Merge pull request #3349 from bytecodealliance/pch/limiter
add a hook to ResourceLimiter to detect memory grow failure
2021-09-15 18:04:31 -07:00
Pat Hickey
00d65eccaf does this one hang on qemu too? idk 2021-09-15 16:19:51 -07:00
Pat Hickey
12be7cd720 skip the detect_os_oom_failure test on qemu ci 2021-09-15 15:51:19 -07:00
Pat Hickey
27083e72e3 fix warning 2021-09-15 14:51:52 -07:00
Pat Hickey
bb7f58d936 add a hook to ResourceLimiter to detect memory grow failure.
* allow the ResourceLimiter to reject a memory grow before the
memory's own maximum.
* add a hook so a ResourceLimiter can detect any reason that
a memory grow fails, including if the OS denies additional memory
* add tests for this new functionality. I only took the time to
test the OS denial on Linux, it should be possible on Mac OS
as well but I don't have a test setup. I have no idea how to
do this on windows.
2021-09-15 11:50:23 -07:00
Alex Crichton
b31a4ea16b Add Store::consume_fuel to manually consume fuel (#3352)
This can be useful for host functions that want to consume fuel to
reflect their relative cost. Additionally it's a relatively easy
addition to have and someone's asking for it!

Closes #3315
2021-09-15 13:10:11 -05:00
Alex Crichton
9db418cfd9 Improve linking-related error messages (#3353)
Include more contextual information about why the link failed related to
why the types didn't match.

Closes #3172
2021-09-15 11:42:45 -05:00
Nick Fitzgerald
d2ce1ac753 Fix a use-after-free bug when passing ExternRefs to Wasm
We _must not_ trigger a GC when moving refs from host code into
Wasm (e.g. returned from a host function or passed as arguments to a Wasm
function). After insertion into the table, this reference is no longer
rooted. If multiple references are being sent from the host into Wasm and we
allowed GCs during insertion, then the following events could happen:

* Reference A is inserted into the activations table. This does not trigger a
  GC, but does fill the table to capacity.

* The caller's reference to A is removed. Now the only reference to A is from
  the activations table.

* Reference B is inserted into the activations table. Because the table is at
  capacity, a GC is triggered.

* A is reclaimed because the only reference keeping it alive was the activation
  table's reference (it isn't inside any Wasm frames on the stack yet, so stack
  scanning and stack maps don't increment its reference count).

* We transfer control to Wasm, giving it A and B. Wasm uses A. That's a use
  after free.

To prevent uses after free, we cannot GC when moving refs into the
`VMExternRefActivationsTable` because we are passing them from the host to Wasm.

On the other hand, when we are *cloning* -- as opposed to moving -- refs from
the host to Wasm, then it is fine to GC while inserting into the activations
table, because the original referent that we are cloning from is still alive and
rooting the ref.
2021-09-14 14:23:42 -07:00
Alex Crichton
4d4779b563 Restore running precompiled modules with the CLI (#3343)
* Restore running precompiled modules with the CLI

This was accidentally broken when `Module::deserialize` was split out of
`Module::new` long ago, so this adds the detection in the CLI to call
the appropriate method to load the module. This feature is gated behind
an `--allow-precompiled` flag to enable, by default, passing arbitrary
user input to the `wasmtime` command.

Closes #3338

* Fix test on Windows
2021-09-13 15:30:46 -05:00
Pat Hickey
bd19f43f84 rewrite Store::{entering,exiting}_native_code_hook into Store::call_hook (#3313)
which provides additional detail on what state transition is being made
2021-09-09 09:20:45 -05:00
Alex Crichton
bcb1dc90d6 Add an assertion that a HostFunc's store agrees on engines
This commit adds an assertion which was previously forgotten when
inserting a `HostFunc` into a `Store`. This can happen when a `Linker`
is defined with one engine but it's used to interoperate with a store
defined within a different engine.

A function contains type information that's only valid relative to the
engine that it was defined within. This means that if a function is used
within a different engine then type information may look valid when in
fact it is not. For example it's otherwise possible to insert a function
into an engine with one type and call it in a different engine with a
different type.

Similar to how `Store` misuse is a panic throughout `wasmtime`'s API
this commit also turns this behavior into panic, so there's no API
impact. Documentation has been updated accordingly to indicate that
various functions on `Linker` will panic if a `store` is provided that's
connected to a different `Engine`.
2021-09-08 14:42:02 -07:00
Alex Crichton
8ebaaf928d Remove the wasmtime wasm2obj command (#3301)
* Remove the `wasmtime wasm2obj` command

This commit removes the `wasm2obj` subcommand of the `wasmtime` CLI.
This subcommand has a very long history and dates back quite far. While
it's existed, however, it's never been documented in terms of the output
it's produced. AFAIK it's only ever been used for debugging to see the
machine code output of Wasmtime on some modules. With recent changes to
the module serialization output the output of `wasmtime compile`, the
`*.cwasm` file, is now a native ELF file which can be fed to standard
tools like `objdump`. Consequently I dont think there's any remaining
need to keep `wasm2obj` around itself, so this commit removes the
subcommand.

* More code to delete

* Try to fix debuginfo tests
2021-09-08 10:40:58 -05:00
Pat Hickey
63b7120a00 fix the tests 2021-09-02 09:31:21 -07:00
Alex Crichton
1532516a36 Use relative call instructions between wasm functions (#3275)
* Use relative `call` instructions between wasm functions

This commit is a relatively major change to the way that Wasmtime
generates code for Wasm modules and how functions call each other.
Prior to this commit all function calls between functions, even if they
were defined in the same module, were done indirectly through a
register. To implement this the backend would emit an absolute 8-byte
relocation near all function calls, load that address into a register,
and then call it. While this technique is simple to implement and easy
to get right, it has two primary downsides associated with it:

* Function calls are always indirect which means they are more difficult
  to predict, resulting in worse performance.

* Generating a relocation-per-function call requires expensive
  relocation resolution at module-load time, which can be a large
  contributing factor to how long it takes to load a precompiled module.

To fix these issues, while also somewhat compromising on the previously
simple implementation technique, this commit switches wasm calls within
a module to using the `colocated` flag enabled in Cranelift-speak, which
basically means that a relative call instruction is used with a
relocation that's resolved relative to the pc of the call instruction
itself.

When switching the `colocated` flag to `true` this commit is also then
able to move much of the relocation resolution from `wasmtime_jit::link`
into `wasmtime_cranelift::obj` during object-construction time. This
frontloads all relocation work which means that there's actually no
relocations related to function calls in the final image, solving both
of our points above.

The main gotcha in implementing this technique is that there are
hardware limitations to relative function calls which mean we can't
simply blindly use them. AArch64, for example, can only go +/- 64 MB
from the `bl` instruction to the target, which means that if the
function we're calling is a greater distance away then we would fail to
resolve that relocation. On x86_64 the limits are +/- 2GB which are much
larger, but theoretically still feasible to hit. Consequently the main
increase in implementation complexity is fixing this issue.

This issue is actually already present in Cranelift itself, and is
internally one of the invariants handled by the `MachBuffer` type. When
generating a function relative jumps between basic blocks have similar
restrictions. This commit adds new methods for the `MachBackend` trait
and updates the implementation of `MachBuffer` to account for all these
new branches. Specifically the changes to `MachBuffer` are:

* For AAarch64 the `LabelUse::Branch26` value now supports veneers, and
  AArch64 calls use this to resolve relocations.

* The `emit_island` function has been rewritten internally to handle
  some cases which previously didn't come up before, such as:

  * When emitting an island the deadline is now recalculated, where
    previously it was always set to infinitely in the future. This was ok
    prior since only a `Branch19` supported veneers and once it was
    promoted no veneers were supported, so without multiple layers of
    promotion the lack of a new deadline was ok.

  * When emitting an island all pending fixups had veneers forced if
    their branch target wasn't known yet. This was generally ok for
    19-bit fixups since the only kind getting a veneer was a 19-bit
    fixup, but with mixed kinds it's a bit odd to force veneers for a
    26-bit fixup just because a nearby 19-bit fixup needed a veneer.
    Instead fixups are now re-enqueued unless they're known to be
    out-of-bounds. This may run the risk of generating more islands for
    19-bit branches but it should also reduce the number of islands for
    between-function calls.

  * Otherwise the internal logic was tweaked to ideally be a bit more
    simple, but that's a pretty subjective criteria in compilers...

I've added some simple testing of this for now. A synthetic compiler
option was create to simply add padded 0s between functions and test
cases implement various forms of calls that at least need veneers. A
test is also included for x86_64, but it is unfortunately pretty slow
because it requires generating 2GB of output. I'm hoping for now it's
not too bad, but we can disable the test if it's prohibitive and
otherwise just comment the necessary portions to be sure to run the
ignored test if these parts of the code have changed.

The final end-result of this commit is that for a large module I'm
working with the number of relocations dropped to zero, meaning that
nothing actually needs to be done to the text section when it's loaded
into memory (yay!). I haven't run final benchmarks yet but this is the
last remaining source of significant slowdown when loading modules,
after I land a number of other PRs both active and ones that I only have
locally for now.

* Fix arm32

* Review comments
2021-09-01 13:27:38 -05:00
Alex Crichton
9e0c910023 Add a Module::deserialize_file method (#3266)
* Add a `Module::deserialize_file` method

This commit adds a new method to the `wasmtime::Module` type,
`deserialize_file`. This is intended to be the same as the `deserialize`
method except for the serialized module is present as an on-disk file.
This enables Wasmtime to internally use `mmap` to avoid copying bytes
around and generally makes loading a module much faster.

A C API is added in this commit as well for various bindings to use this
accelerated path now as well. Another option perhaps for a Rust-based
API is to have an API taking a `File` itself to allow for a custom file
descriptor in one way or another, but for now that's left for a possible
future refactoring if we find a use case.

* Fix compat with main - handle readdonly mmap

* wip

* Try to fix Windows support
2021-08-31 13:05:51 -05:00