* cranelift-isle: New IR and revised overlap checks
* Improve error reporting
* Avoid "unused argument" warnings a nicer way
* Remove unused fields
* Minimize diff and "fix" error handling
I had tried to use Miette "right" and made things worse somehow. Among
other changes, revert all my changes to unrelated parts of `error.rs`
and `error_miette.rs`.
* Review comments: Rename "unmatchable" to "unreachable"
* Review comments: newtype wrappers, not type aliases
* Review comments: more comments on overlap checks
* Review comments: Clarify `normalize_equivalence_classes`
* Review comments: use union-find instead of linked list
This saves about 50 lines of code in the trie_again module. The
union-find implementation is about twice as long as that, counting
comments and doc-tests, but that's a worth-while tradeoff.
However, this makes `normalize_equivalence_classes` slower, because now
finding all elements of an equivalence class takes time linear in the
total size of all equivalence classes. If that ever turns out to be a
problem in practice we can find some way to optimize `remove_set_of`.
* Review comments: Hide constraints HashMap
We want to enforce that consumers of this representation can't observe
non-deterministic ordering in any of its public types.
* Review comments: Normalize equivalence classes incrementally
I'm not sure whether this is a good idea. It doesn't make the logic
particularly simpler, and I think it will do more work if three or more
binding sites with enum-variant constraints get set equal to each other.
* More comments and other clarifications
* Revert "Review comments: Normalize equivalence classes incrementally"
* Even more comments
* wiggle: adapt Wiggle guest slices for `unsafe` shared use
When multiple threads can concurrently modify a WebAssembly shared
memory, the underlying data for a Wiggle `GuestSlice` and
`GuestSliceMut` could change due to access from other threads. This
breaks Rust guarantees when `&[T]` and `&mut [T]` slices are handed out.
This change modifies `GuestPtr` to make `as_slice` and `as_slice_mut`
return an `Option` which is `None` when the underlying WebAssembly
memory is shared.
But WASI implementations still need access to the underlying WebAssembly
memory, both to read to it and write from it. This change adds new APIs:
- `GuestPtr::to_vec` copies the bytes from WebAssembly memory (from
which we can safely take a `&[T]`)
- `GuestPtr::as_unsafe_slice_mut` returns a wrapper `struct` from which
we can `unsafe`-ly return a mutable slice (users must accept the
unsafety of concurrently modifying a `&mut [T]`)
This approach allows us to maintain Wiggle's borrow-checking
infrastructure, which enforces the guarantee that Wiggle will not modify
overlapping regions, e.g. This is important because the underlying
system calls may expect this. Though other threads may modify the same
underlying region, this is impossible to prevent; at least Wiggle will
not be able to do so.
Finally, the changes to Wiggle's API are propagated to all WASI
implementations in Wasmtime. For now, code locations that attempt to get
a guest slice will panic if the underlying memory is shared. Note that
Wiggle is not enabled for shared memory (that will come later in
something like #5054), but when it is, these panics will be clear
indicators of locations that must be re-implemented in a thread-safe
way.
* review: remove double cast
* review: refactor to include more logic in 'UnsafeGuestSlice'
* review: add reference to #4203
* review: link all thread-safe WASI fixups to #5235
* fix: consume 'UnsafeGuestSlice' during conversion to safe versions
* review: remove 'as_slice' and 'as_slice_mut'
* review: use 'as_unsafe_slice_mut' in 'to_vec'
* review: add `UnsafeBorrowResult`
This commit updates the index allocation performed in the pooling
allocator with a few refactorings:
* With `cfg(fuzzing)` a deterministic rng is now used to improve
reproducibility of fuzz test cases.
* The `Mutex` was pushed inside of `IndexAllocator`, renamed from
`PoolingAllocationState`.
* Randomness is now always done through a `SmallRng` stored in the
`IndexAllocator` instead of using `thread_rng`.
* The `is_empty` method has been removed in favor of an `Option`-based
return on `alloc`.
This refactoring is additionally intended to encapsulate more
implementation details of `IndexAllocator` to more easily allow for
alternate implementations in the future such as lock-free approaches
(possibly).
This commit adds the missing "out of fuel" trap code to the C API.
Without this, calls to `wasmtime_trap_code` will trigger an unreachable panic
on traps from running out of fuel.
This commit prepares the `winch` crate for updating `wasm-tools`,
notably changing a bit about how the visitation of operators works. This
moves the function body and wasm validator out of the `CodeGen`
structure and into parameters threaded into the emission of the actual
function.
Additionally the `VisitOperator` implementation was updated to remove
the explicit calls to the validator, favoring instead a macro-generated
solution to guarantee that all validation happens before any translation
proceeds. This means that the `VisitOperator for CodeGen` impl is now
infallible and the various methods have been inlined into the trait
methods as well as removing the `Result<_>`.
Finally this commit updates translation to call `validator.finish(..)`
which is required to perform the final validation steps of the function
body.
This commit updates the default random context inserted into a
`WasiCtxt` to be seeded from `thread_rng` rather than the system's
entropy. This avoids an unconditional syscall on the creation of all
`WasiCtx` structures shouldn't reduce the quality of the random numbers
produced.
* Fix CI after CVE fixes
Alas we can't run CI ahead of time so this fixes various minor build
issues from the merging of the recent CVE fixes. Note that I plan to
publish the advisories once CI issues are sorted out.
* Fix mmap/free of zero bytes
This is a common pattern in sema, so factor it out.
Since this version uses `intern` instead of `intern_mut`, it might be a
tiny bit faster when errors occur due to not writing names into maps
then. When no error occurs, ISLE should do exactly the same work with or
without this commit.
This commit is an attempt at improving the safety of using the return
value of the `SharedMemory::data` method. Previously this returned
`*mut [u8]` which, while correct, is unwieldy and unsafe to work with.
The new return value of `&[UnsafeCell<u8>]` has a few advantages:
* The lifetime of the returned data is now connected to the
`SharedMemory` itself, removing the possibility for a class of errors
of accidentally using the prior `*mut [u8]` beyond its original lifetime.
* It's not possibly to safely access `.len()` as opposed to requiring an
`unsafe` dereference before.
* The data internally within the slice is now what retains the `unsafe`
bits, namely indicating that accessing any memory inside of the
contents returned is `unsafe` but addressing it is safe.
I was inspired by the `wiggle`-based discussion on #5229 and felt it
appropriate to apply a similar change here.
* Unconditionally use `MemoryImageSlot`
This commit removes the internal branching within the pooling instance
allocator to sometimes use a `MemoryImageSlot` and sometimes now.
Instead this is now unconditionally used in all situations on all
platforms. This fixes an issue where the state of a slot could get
corrupted if modules being instantiated switched from having images to
not having an image or vice versa.
The bulk of this commit is the removal of the `memory-init-cow`
compile-time feature in addition to adding Windows support to the
`cow.rs` file.
* Fix compile on Unix
* Add a stricter assertion for static memory bounds
Double-check that when a memory is allocated the configuration required
is satisfied by the pooling allocator.
Before, we would do a `heap_addr` to translate the given Wasm memory address
into a native memory address and pass it into the libcall that implemented the
atomic operation, which would then treat the address as a Wasm memory address
and pass it to `validate_atomic_addr` to be bounds checked a second time. This
is a bit nonsensical, as we are validating a native memory address as if it were
a Wasm memory address.
Now, we no longer do a `heap_addr` to translate the Wasm memory address to a
native memory address. Instead, we pass the Wasm memory address to the libcall,
and the libcall is responsible for doing the bounds check (by calling
`validate_atomic_addr` with the correct type of memory address now).
The `is_root` flag to `translate_pattern` just determines whether the
`rule_term` argument is used, which begs a larger cleanup. But that
cleanup is less clear if `is_root` is set anywhere aside from the call
in `collect_rules`. So I wanted to get confirmation that this particular
use of that flag is incorrect first.
These two arguments (`is_root` and `rule_term`) are used to prevent
expansion of a term as an internal extractor ("macro") if:
- that term is also an internal constructor
- and it's the root term on the left-hand side of the current rule
- and the pattern we're currently translating has no parents.
I'm not sure what it should mean to use the term you're currently
defining as the root pattern on the left-hand side of an if-let in the
same rule, but I don't think it should have this particular special
treatment.
It's nice to be able to report these names after sema analysis completes
so rule authors can recognize which names they used.
This isn't used anywhere yet, but I'm planning to use it during codegen,
and the rule-verification folks wanted something like this for debugging
output.
* Cranelift: Make `heap_addr` return calculated `base + index + offset`
Rather than return just the `base + index`.
(Note: I've chosen to use the nomenclature "index" for the dynamic operand and
"offset" for the static immediate.)
This move the addition of the `offset` into `heap_addr`, instead of leaving it
for the subsequent memory operation, so that we can Spectre-guard the full
address, and not allow speculative execution to read the first 4GiB of memory.
Before this commit, we were effectively doing
load(spectre_guard(base + index) + offset)
Now we are effectively doing
load(spectre_guard(base + index + offset))
Finally, this also corrects `heap_addr`'s documented semantics to say that it
returns an address that will trap on access if `index + offset + access_size` is
out of bounds for the given heap, rather than saying that the `heap_addr` itself
will trap. This matches the implemented behavior for static memories, and after
https://github.com/bytecodealliance/wasmtime/pull/5190 lands (which is blocked
on this commit) will also match the implemented behavior for dynamic memories.
* Update heap_addr docs
* Factor out `offset + size` to a helper
While checking the call graph of extractors during semantic validation,
save `TermId` instead of `Sym`. The types are both just integer indexes,
but the `TermId` is more useful here. Saving it avoids needing to check
for failed map lookups twice, which simplifies the implementation.
Modify return pseudo-instructions to have pairs of registers: virtual and real. This allows us to constrain the virtual registers to the real ones specified by the abi, instead of directly emitting moves to those real registers.
* Wasmtime+Cranelift: strip out some dead x86-32 code.
I was recently pointed to fastly/Viceroy#200 where it seems some folks
are trying to compile Wasmtime (via Viceroy) for Windows x86-32 and the
failures may not be loud enough. I've tried to reproduce this
cross-compiling to i686-pc-windows-gnu from Linux and hit build failures
(as expected) in several places. Nevertheless, while trying to discern
what others may be attempting, I noticed some dead x86-32-specific code
in our repo, and figured it would be a good idea to clean this up.
Otherwise, it (i) sends some mixed messages -- "hey look, this codebase
does support x86-32" -- and (ii) keeps untested code around, which is
generally not great.
This PR removes x86-32-specific cases in traphandlers and unwind code,
and Cranelift's native feature detection. It adds helpful compile-error
messages in a few cases. If we ever support x86-32 (contributors
welcome! The big missing piece is Cranelift support; see #1980), these
compile errors and git history should be enough to recover any knowledge
we are now encoding in the source.
I left the x86-32 support in `wasmtime-fiber` alone because that seems
like a bit of a special case -- foundation library, separate from the
rest of Wasmtime, with specific care to provide a (presumably working)
full 32-bit version.
* Remove some extraneous compile_error!s, already covered by others.
This change is the first in a series of changes to support shared memory
in Wiggle. Since Wiggle was written under the assumption of
single-threaded guest-side access, this change introduces a `shared`
field to guest memories in order to flag when this assumption will not
be the case. This change always sets `shared` to `false`; once a few
more pieces are in place, `shared` will be set dynamically when a shared
memory is detected, e.g., in a change like #5054.
Using the `shared` field, we can now decide to load Wiggle values
differently under the new assumptions. This change makes the guest
`T::read` and `T::write` calls into `Relaxed` atomic loads and stores in
order to maintain WebAssembly's expected memory consistency guarantees.
We choose Rust's `Relaxed` here to match the `Unordered` memory
consistency described in the [memory model] section of the ECMA spec.
These relaxed accesses are done unconditionally, since we theorize that
the performance benefit of an additional branch vs a relaxed load is
not much.
[memory model]: https://tc39.es/ecma262/multipage/memory-model.html#sec-memory-model
Since 128-bit scalar types do not have `Atomic*` equivalents, we remove
their `T::read` and `T::write` implementations here. They are unused by
any WASI implementations in the project.
* Implement support for dynamic memories in the pooling allocator
This is a continuation of the thrust in #5207 for reducing page faults
and lock contention when using the pooling allocator. To that end this
commit implements support for efficient memory management in the pooling
allocator when using wasm that is instrumented with bounds checks.
The `MemoryImageSlot` type now avoids unconditionally shrinking memory
back to its initial size during the `clear_and_remain_ready` operation,
instead deferring optional resizing of memory to the subsequent call to
`instantiate` when the slot is reused. The instantiation portion then
takes the "memory style" as an argument which dictates whether the
accessible memory must be precisely fit or whether it's allowed to
exceed the maximum. This in effect enables skipping a call to `mprotect`
to shrink the heap when dynamic memory checks are enabled.
In terms of page fault and contention this should improve the situation
by:
* Fewer calls to `mprotect` since once a heap grows it stays grown and
it never shrinks. This means that a write lock is taken within the
kernel much more rarely from before (only asymptotically now, not
N-times-per-instance).
* Accessed memory after a heap growth operation will not fault if it was
previously paged in by a prior instance and set to zero with `memset`.
Unlike #5207 which requires a 6.0 kernel to see this optimization this
commit enables the optimization for any kernel.
The major cost of choosing this strategy is naturally the performance
hit of the wasm itself. This is being looked at in PRs such as #5190 to
improve Wasmtime's story here.
This commit does not implement any new configuration options for
Wasmtime but instead reinterprets existing configuration options. The
pooling allocator no longer unconditionally sets
`static_memory_bound_is_maximum` and then implements support necessary
for this memory type. This other change to this commit is that the
`Tunables::static_memory_bound` configuration option is no longer gating
on the creation of a `MemoryPool` and it will now appropriately size to
`instance_limits.memory_pages` if the `static_memory_bound` is to small.
This is done to accomodate fuzzing more easily where the
`static_memory_bound` will become small during fuzzing and otherwise the
configuration would be rejected and require manual handling. The spirit
of the `MemoryPool` is one of large virtual address space reservations
anyway so it seemed reasonable to interpret the configuration this way.
* Skip zero memory_size cases
These are causing errors to happen when fuzzing and otherwise in theory
shouldn't be too interesting to optimize for anyway since they likely
aren't used in practice.
Remove the dependency on VCode for VReg allocation. This will simplify the changes in #5172, as that PR introduces the need to allocate temporary registers from the ABI context.
This change also allows us to remove some fields from VCode: reftyped_vregs_set and have_ref_values.
Add a MemFlags operand to the bitcast instruction, where only the
`big` and `little` flags are accepted. These define the lane order
to be used when casting between types of different lane counts.
Update all users to pass an appropriate MemFlags argument.
Implement lane swaps where necessary in the s390x back-end.
This is the final part necessary to fix
https://github.com/bytecodealliance/wasmtime/issues/4566.
Previously extracting an exit code was only possibly on a `wasm_trap_t`
which will never successfully have an exit code on it, so the exit code
extractor is moved over to `wasmtime_error_t`. Additionally extracting a
wasm trace from a `wasmtime_error_t` is added since traces happen on
both traps and errors now.
I noticed this in the backtrace of something that timed out on oss-fuzz
and there's no need to include this information in trampolines, so this
removes the extra sections from being generated.
* wiggle: fix compilation with async functions when tracing is off
Fixes#5202
* switch tracing config from a boolean to a struct
This will enable more complex tracing rules in the future
* rename AsyncConfField to FunctionField
It is going to be reused for cases other than just async functions
* add support for disabling tracing per-function
This adds a `disable_for` syntax after the `tracing` boolean. For
example:
```
wiggle::from_witx!(
tracing: true disable_for {
module1::foo,
module2::{bar, baz},
}
)
```
The sample program in cranelift/filetests/src/function_runner.rs
would abort with an mprotect failure under certain circumstances,
see https://github.com/bytecodealliance/wasmtime/pull/4453#issuecomment-1303803222
Root cause was that enabling PROT_EXEC on the main process heap
may be prohibited, depending on Linux distro and version.
This only shows up in the doc test sample program because the main
clif-util is multi-threaded and therefore allocations will happen
on glibc's per-thread heap, which is allocated via mmap, and not
the main process heap.
Work around the problem by enabling the "selinux-fix" feature of
the cranelift-jit crate dependency in the filetests. Note that
this didn't compile out of the box, so a separate fix is also
required and provided as part of this PR.
Going forward, it would be preferable to always use mmap to allocate
the backing memory for JITted code.
When new wasm instances are created repeatedly in high-concurrency
environments one of the largest bottlenecks is the contention on
kernel-level locks having to do with the virtual memory. It's expected
that usage in this environment is leveraging the pooling instance
allocator with the `memory-init-cow` feature enabled which means that
the kernel level VM lock is acquired in operations such as:
1. Growing a heap with `mprotect` (write lock)
2. Faulting in memory during usage (read lock)
3. Resetting a heap's contents with `madvise` (read lock)
4. Shrinking a heap with `mprotect` when reusing a slot (write lock)
Rapid usage of these operations can lead to detrimental performance
especially on otherwise heavily loaded systems, worsening the more
frequent the above operations are. This commit is aimed at addressing
the (2) case above, reducing the number of page faults that are
fulfilled by the kernel.
Currently these page faults happen for three reasons:
* When memory is first accessed after the heap is grown.
* When the initial linear memory image is accessed for the first time.
* When the initial zero'd heap contents, not part of the linear memory
image, are accessed.
This PR is attempting to address the latter of these cases, and to a
lesser extent the first case as well. Specifically this PR provides the
ability to partially reset a pooled linear memory with `memset` rather
than `madvise`. This is done to have the same effect of resetting
contents to zero but namely has a different effect on paging, notably
keeping the pages resident in memory rather than returning them to the
kernel. This means that reuse of a linear memory slot on a page that was
previously `memset` will not trigger a page fault since everything
remains paged into the process.
The end result is that any access to linear memory which has been
touched by `memset` will no longer page fault on reuse. On more recent
kernels (6.0+) this also means pages which were zero'd by `memset`, made
inaccessible with `PROT_NONE`, and then made accessible again with
`PROT_READ | PROT_WRITE` will not page fault. This can be common when a
wasm instances grows its heap slightly, uses that memory, but then it's
shrunk when the memory is reused for the next instance. Note that this
kernel optimization requires a 6.0+ kernel.
This same optimization is furthermore applied to both async stacks with
the pooling memory allocator in addition to table elements. The defaults
of Wasmtime are not changing with this PR, instead knobs are being
exposed for embedders to turn if they so desire. This is currently being
experimented with at Fastly and I may come back and alter the defaults
of Wasmtime if it seems suitable after our measurements.
This commit changes the APIs in the `wasmtime` crate for configuring the
pooling allocator. I plan on adding a few more configuration options in
the near future and the current structure was feeling unwieldy for
adding these new abstractions.
The previous `struct`-based API has been replaced with a builder-style
API in a similar shape as to `Config`. This is done to help make it
easier to add more configuration options in the future through adding
more methods as opposed to adding more field which could break prior
initializations.
* cranelift: improve syscall error/oom handling in JIT module
The JIT module has several places where it `expect`s or `panic`s
on syscall or allocator errors. For example, `mmap` and `mprotect`
can fail if Linux `vm.max_map_count` is not high enough, and some
users may wish to handle this error rather than immediately
crashing.
This commit plumbs these errors upward as new `ModuleError`
types, so that callers of jit module functions like
`finalize_definitions` and `define_function` can handle them
(or just `unwrap()`, as desired).
* cranelift: Remove ModuleError::Syscall variant
Syscall errors can just be folded into the generic Backend error,
which is an anyhow::Error
* cranelift-jit: return io::ErrorKind::OutOfMemory for alloc failure
Just using `io::Error::last_os_error()` is not correct as global
allocator impls are not required to set errno
Change add_nan_canon_seq to use vselect instead of bitselect.
This is more straightforward and removes bitcast operations.
Codegen should be unchanged.
The simplifier was performing an optimization to replace bitselect
with vselect if the all bytes of the condition mask could be shown
to be all ones or all zeros.
This optimization only ever made any difference in codegen on the
x64 target. Therefore, move this optimization to the x64 back-end
and perform it in ISLE instead. Resulting codegen should be
unchanged, with slightly improved compile time.
This also eliminates a few endian-dependent bitcast operations.