This defines the full set of 32 128-bit vector registers on s390x.
(Note that the VRs overlap the existing FPRs.) In addition, this
adds support to use all 32 vector registers to implement floating-
point operations, by using vector floating-point instructions with
the 'W' bit set to operate only on the first element.
This part of the vector instruction set mostly matches the old FP
instruction set, with two exceptions:
- There is no vector version of the COPY SIGN instruction. Instead,
we now use a VECTOR SELECT with an appropriate bit mask to implement
the fcopysign operation (see the sketch at the end of this message).
- There are no vector versions of the float <-> int conversion
instructions where source and target differ in bit size. We use
multiple conversion steps instead, which also requires explicit
checks to implement correct overflow handling.
As a side effect, this version now also implements the i8 / i16
variants of all conversions, which had been missing so far.
For all operations except those two above, we continue to use the
old FP instruction if applicable (i.e. if all operands happen to
have been allocated to the original FP register set), and use the
vector instruction otherwise.
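The bit-select trick used for fcopysign can be modeled with a small Rust
sketch (scalar f64 bits stand in for the 128-bit vector register; this only
illustrates the mask idea, not the actual backend code):

```rust
/// Model of fcopysign via bitwise select: take the sign bit from `sign`
/// and everything else from `mag`. On s390x this is a VECTOR SELECT with
/// a mask that has only the sign bit set.
fn fcopysign64(mag: f64, sign: f64) -> f64 {
    const SIGN_MASK: u64 = 1 << 63;
    let bits = (sign.to_bits() & SIGN_MASK) | (mag.to_bits() & !SIGN_MASK);
    f64::from_bits(bits)
}

fn main() {
    assert_eq!(fcopysign64(1.5, -0.0), -1.5);
    assert_eq!(fcopysign64(-2.0, 1.0), 2.0);
}
```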
* support variant, enum, and union derives
This is the second stage of implementing #4308. It adds support for deriving
variant, enum, and union impls for `ComponentType`, `Lift`, and `Lower` (a
usage sketch follows at the end of this message). It also fixes derived record
impls for generic `struct`s, which I had intended to support in my previous
commit but forgot to test.
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
* deduplicate component-macro code
Thanks to @jameysharp for the suggestion!
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
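For reference, the derives described above are meant to be used roughly as in
the sketch below; the `#[component(...)]` attribute spellings are my
assumptions for illustration, not necessarily the final macro syntax:

```rust
use wasmtime::component::{ComponentType, Lift, Lower};

// Record support from the first stage.
#[derive(ComponentType, Lift, Lower)]
#[component(record)]
struct Point {
    x: u32,
    y: u32,
}

// Variant support added in this stage.
#[derive(ComponentType, Lift, Lower)]
#[component(variant)]
enum Shape {
    #[component(name = "circle")]
    Circle(u32),
    #[component(name = "point")]
    Point(Point),
}
```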
Move from passing and returning u8 and u16 values to u32 in many of
the functions. This removes a number of type conversions and gives
a small compile-time speedup of around 0.7% on my aarch64 machine.
Copyright (c) 2022, Arm Limited.
Now that lowering is fully done in ISLE, clean up some code remnants
in lower.rs. In particular, move code to lower/isle.rs where
possible, and inline lower_insn_to_regs into its caller and simplify.
* Wasmtime: disable unwind_info unless needed
Fixes #4350
Otherwise wasm modules will be built with unwind info even if
backtraces are disabled, which can get expensive in deeply recursive
modules. (A configuration sketch follows the commit list below.)
* Wasmtime: test that disabling backtraces disables unwind_info
* fix: make sure we have unwind_info when the engine needs it
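As a usage sketch, the public knob involved is the engine-level backtrace
toggle; with it turned off the engine no longer needs unwind info for
compiled wasm code. This assumes `Config::wasm_backtrace` is the relevant
setting:

```rust
use wasmtime::{Config, Engine};

fn main() {
    let mut config = Config::new();
    // With backtraces disabled, per-module unwind info is unnecessary and
    // can be skipped when compiling wasm.
    config.wasm_backtrace(false);
    let _engine = Engine::new(&config).expect("engine creation");
}
```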
This commit updates the management of the `may_enter` flag in line with
WebAssembly/component-model#57. Namely the `may_enter` flag is now
exclusively managed in the `canon lift` function (which is
`TypedFunc::call`) and is only unset after post-return completes
successfully. This implements semantics where if any trap happens for
any reason (lifting, lowering, execution, imports, etc.) then the
instance is considered permanently poisoned and can no longer be
entered.
Tests needed many updates to create new instances where previously the
same instance was reused after it had entered an erroneous state.
The more I read over this, the more I think that these should be
constants, to explicitly indicate that we're supposed to be able to
optimize for them. Currently I'm predicting that adding memory64 support
will probably double the surface area of each trait (e.g. `lower32` and
`lower64`) rather than have a parameter passed around. This is in the
hopes that having specialized 32 and 64-bit paths will enable better
optimizations within each path instead of having to check all bounds
everywhere.
Additionally one day I'd like to have `fn load(bytes: &[u8;
Self::SIZE32])` but that doesn't work today in Rust.
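A minimal sketch of the shape under discussion; the trait and method names
here are assumptions for illustration, not the actual Wasmtime definitions:

```rust
/// Hypothetical trait: the 32-bit size is an associated constant so each
/// implementation can be specialized and constant-folded around it.
trait Lift32 {
    const SIZE32: usize;

    // Desired eventually, but `&[u8; Self::SIZE32]` in a signature needs
    // const-generic features that stable Rust doesn't offer today:
    // fn load(bytes: &[u8; Self::SIZE32]) -> Self;
    fn load(bytes: &[u8]) -> Self
    where
        Self: Sized;
}

impl Lift32 for u32 {
    const SIZE32: usize = 4;

    fn load(bytes: &[u8]) -> Self {
        u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]])
    }
}
```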
This adds infrastructure to allow implementing call and return
instructions in ISLE, and migrates the s390x back-end.
To implement ABI details, this patch creates public accessors
for `ABISig` and makes them accessible in ISLE. All actual
code generation is then done in ISLE rules, following the
information provided by that signature.
[ Note that the s390x back end never requires multiple slots for
a single argument - the infrastructure to handle this should
already be present, however. ]
To implement loops in ISLE rules, this patch uses regular tail
recursion, employing a `Range` data structure holding a range
of integers to be looped over.
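Since ISLE has no built-in loop construct, the rules use ordinary tail
recursion over an integer range. A rough Rust analogue of the pattern (the
names are illustrative, not the actual rules):

```rust
use std::ops::Range;

/// Analogue of the tail-recursive ISLE loop: handle one index of the range,
/// then recurse on the remainder until the range is empty.
fn copy_args(range: Range<usize>, args: &[u64], out: &mut Vec<u64>) {
    if range.is_empty() {
        return;
    }
    out.push(args[range.start]); // "lower one argument"
    copy_args(range.start + 1..range.end, args, out);
}

fn main() {
    let mut out = Vec::new();
    copy_args(0..3, &[10, 20, 30], &mut out);
    assert_eq!(out, vec![10, 20, 30]);
}
```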
* Implement `canon lower` of a `canon lift` function in the same component
This commit implements the "degenerate" logic for implementing a
function within a component that is lifted and then immediately lowered
again. In this situation the lowered function will immediately generate
a trap and doesn't need to implement anything else.
The implementation in this commit is somewhat heavyweight, but I think it is
justified more by future additions to the component model than by what
exactly is here right now. It's not expected that this
"always trap" functionality will really be used all that often since it
would generally mean a buggy component, but the functionality plumbed
through here is hopefully going to be useful for implementing
component-to-component adapter trampolines.
Specifically this commit implements a strategy where the `canon.lower`'d
function is generated by Cranelift and simply has a single trap
instruction when called, doing nothing else. The main complexity comes
from juggling around all the data associated with these functions,
primarily plumbing the traps through into the `ModuleRegistry` to
ensure that the global `is_wasm_trap_pc` function returns `true` and that,
at runtime, when we look up information about the trap it's all readily
available (e.g. translating the trapping pc to a `TrapCode`).
* Fix non-component build
* Fix some offset calculations
* Only create one "always trap" per signature
Use an internal map to deduplicate during compilation.
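A minimal sketch of the per-signature deduplication, assuming a
signature-index key; the types here are placeholders rather than the actual
Wasmtime internals:

```rust
use std::collections::HashMap;

// Placeholders standing in for the real compiler data structures.
type SignatureIndex = u32;
struct CompiledTrampoline; // the compiled "always trap" stub

/// Return the single "always trap" stub for a signature, compiling it only
/// the first time that signature is seen.
fn always_trap_for(
    cache: &mut HashMap<SignatureIndex, CompiledTrampoline>,
    sig: SignatureIndex,
    compile: impl FnOnce() -> CompiledTrampoline,
) -> &CompiledTrampoline {
    cache.entry(sig).or_insert_with(compile)
}
```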
This is the first stage of implementing
https://github.com/bytecodealliance/wasmtime/issues/4308, i.e. derive macros for
`ComponentType`, `Lift`, and `Lower` for composite types in the component model.
This stage only covers records; I expect the other composite types will follow a
similar pattern.
It borrows heavily from the work Jamey Sharp did in
https://github.com/bytecodealliance/wasmtime/pull/4217. Thanks for that, and
thanks to both Jamey and Alex Crichton for their excellent review feedback.
Thanks also to Brian for pairing up on the initial draft.
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
Currently I don't know how we can reasonably implement this. Given all
the signatures of how we call functions and how functions are called on
the host, there's no feasible way that I know of to hook these two up
"seamlessly". This means that a component which reexports an imported
function can't be run in Wasmtime.
One of the main reasons for this is that when calling a component
function Wasmtime wants to lower arguments first and then have them
lifted when the host is called. With a reexport though there's not
actually anything to lower into, so we'd need something like a side table
or perhaps a linear memory, and that seems like it would get quite
complicated quite quickly for not all that much benefit. As such, for now
this simply returns a first-class error (rather than the current panic) in
situations like this.
* Implement lowered-then-lifted functions
This commit is a few features bundled into one, culminating in the
implementation of lowered-then-lifted functions for the component model.
It's probably not going to be used all that often but this is possible
within a valid component so Wasmtime needs to do something relatively
reasonable. The main things implemented in this commit are:
* Component instances are now assigned a `RuntimeComponentInstanceIndex`
to differentiate each one. This will be used in the future to detect
fusion (one instance lowering a function from another instance). For
now it's used to allocate separate `VMComponentFlags` for each
internal component instance.
* The `CoreExport<FuncIndex>` of lowered functions was changed to a
`CoreDef` since technically a lowered function can use another lowered
function as the callee. This ended up being not too difficult to plumb
through as everything else was already in place.
* A need arose to compile host-to-wasm trampolines which weren't already
present. Currently wasm in a component is always entered through a
host-to-wasm trampoline but core wasm modules are the source of all
the trampolines. In the case of a lowered-then-lifted function there
may not actually be any core wasm modules, so component objects now
contain necessary trampolines not otherwise provided by the core wasm
objects. This feature required adding a new function to the
`Compiler` trait for creating a host-to-wasm trampoline. After doing
this, core wasm compilation was also updated to leverage it, which
further enabled compiling trampolines in parallel as opposed to the
previous synchronous compilation.
* Review comments
* Migrate from `winapi` to `windows-sys`
I believe that Microsoft itself is supporting the development of
`windows-sys` and it's also used by `cap-std` now so this switches
Wasmtime's dependencies on Windows APIs from the `winapi` crate to the
`windows-sys` crate. We still have `winapi` in our dependency graph but
that may get phased out over time.
* Make windows-sys a target-specific dependency
* Improve error message for failed function compiles
Add in the wasm function index, the name if specified, and the function
offset in the original file to assist in debugging failed function compiles.
* Review comments
- Handle call instructions' clobbers with the clobbers API, using RA2's
clobbers bitmask (bytecodealliance/regalloc2#58) rather than clobbers
list;
- Pull in changes from bytecodealliance/regalloc2#59 for much more sane
edge-case behavior w.r.t. liverange splitting.
* Change how unwind information is stored on Windows
Unwind information on Windows is stored in two separate locations. The
first location is the unwind information itself which corresponds to
`UNWIND_INFO`. The second location is a list of `RUNTIME_FUNCTION`
structures which point to function bodies and `UNWIND_INFO` structures.
Currently in Wasmtime the `UNWIND_INFO` structures are stored just after
functions themselves with a somewhat cryptic comment indicating that
Windows prefers this (I'm unsure as to the provenance of this comment).
The `RUNTIME_FUNCTION` data is then stored in a separate section which has
the custom name of `_wasmtime_winx64_unwind`.
After my recent foray into trying to debug bad unwind information on
windows-2022 again, I realized that Windows actually has official
sections for these two unwind information items. The `.xdata` section is
used to store the `UNWIND_INFO` structures and the `.pdata` section
stores the `RUNTIME_FUNCTION` list. To try to be somewhat idiomatic and
perhaps one day even hook into standard Windows debugging tools I went
ahead and refactored how our unwind information is stored to match this.
Perhaps the main benefit of this is that it reduces the size of the
read/execute section of the binary. Previously the unwind information
was executable since it was stored in the `.text` section, but
unnecessarily so. Now it's in a read-only section, which in theory provides
a small amount of hardening.
Otherwise, though, I don't think this will help all that much with hooking
into standard debugging tools like `objdump`, because it's all
still stored in an ELF file rather than a COFF file.
* Review comments
This commit extends the `WasmList<T>` type to have an
`as_slice`-lookalike method (now renamed to `as_le_slice`) for all
integer types rather than just the `u8` type. With the guarantees of the
component model it's known that all lists are aligned in linear memory.
Additionally linear memories themselves are also generally guaranteed to
be aligned. This means that hosts where a primitive integer's alignment
is at most its size (which I think is basically all host platforms) can
get a raw view into the wasm linear memory for slices of
these types.
Note, though, that the remaining caveat after alignment is endianness.
Big-endian hosts need to be aware that the integers aren't stored in a
native format. Previously tools like wit-bindgen have added an `Le<T>`
wrapper but for now I've opted to instead use a method that has "le" in
the name - `as_le_slice`. I'm hoping that this is a clear enough
indicator for users to perform little-endian conversions as appropriate
when reading the values within the slice.
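A minimal sketch of the caveat: the values handed back are little-endian,
so portable code converts before using them natively (the plain `&[u32]`
here is an assumption for illustration, not the exact Wasmtime signature):

```rust
/// Sum a list of little-endian u32 values portably: `u32::from_le` is a
/// no-op on little-endian hosts and a byte swap on big-endian hosts.
fn sum_le_u32(raw: &[u32]) -> u64 {
    raw.iter().map(|&v| u32::from_le(v) as u64).sum()
}

fn main() {
    assert_eq!(sum_le_u32(&[1u32.to_le(), 2u32.to_le()]), 3);
}
```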
Turns out that `adr` doesn't work in inline assembly within LLVM on
arm macOS, or at least not how we were using it. This switches instead
to an `adrp` and `add` pair which seems to convince the linker that the
relocations should all fit. For consistency the same pattern is used on
Linux as well, only with different syntax (so much for a portable
assembler). Performance isn't really an issue here, so there's no need
to go out of our way to get the single-instruction form working.
The previous `cls` code was producing wrong results when fed with a -1 i8.
The fix here is to sign-extend instead of zero-extend, since we want
to keep the sign bit as one so that it is counted correctly by the
`cls` instruction (illustrated in the sketch below).
This also merges the interpreter-only tests now that aarch64
correctly supports this instruction.
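A small Rust illustration of the failure mode, assuming the 8-bit result is
derived from a 32-bit `cls` by subtracting 24 (that adjustment is my
assumption for the sketch, not necessarily the exact lowering):

```rust
/// Count leading sign bits of a 32-bit value, mirroring aarch64 `cls`:
/// the number of bits after the sign bit that are equal to it.
fn cls32(x: u32) -> u32 {
    if (x as i32) < 0 {
        x.leading_ones() - 1
    } else {
        x.leading_zeros() - 1
    }
}

fn main() {
    let x: i8 = -1;
    let sext = x as i32 as u32; // 0xFFFF_FFFF: sign bit preserved
    let zext = (x as u8) as u32; // 0x0000_00FF: sign bit lost
    assert_eq!(cls32(sext) - 24, 7); // correct i8 answer
    assert_eq!(cls32(zext), 23); // 23 - 24 underflows: wrong answer
}
```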
* Upgrade to regalloc2 v0.2.3 to get bugfix from bytecodealliance/regalloc2#60.
* Update RELEASES.md.
* Update two compile tests based on slightly shifting regalloc output.
Previous to this commit Wasmtime would use the `GlobalModuleRegistry`
when learning information about a trap such as its trap code, the
symbols for each frame, etc. This has a downside though of holding a
global read-write lock for the duration of this operation which hinders
registration of new modules in parallel. In addition there was a fair
amount of internal duplication between this "global module registry" and
the store-local module registry. Finally relying on global state for
information like this gets a bit more brittle over time as it seems best
to scope global queries to precisely what's necessary rather than
holding extra information.
With the refactoring in wasm backtraces done in #4183 it's now possible
to always have a `StoreOpaque` reference when a backtrace is collected
for symbolication and other trap-identification purposes. This
commit adds a `StoreOpaque` parameter to the `Trap::from_runtime`
constructor and then plumbs that everywhere. Note that while doing this
I changed the internal `traphandlers::lazy_per_thread_init` function to
no longer return a `Result` and instead just `panic!` on Unix if memory
couldn't be allocated for a stack. This removed quite a lot of
error-handling code for a case that's expected to happen quite rarely.
If necessary in the future we can add a fallible initialization point
but this feels like a better default balance for the code here.
With a `StoreOpaque` in use when a trap is being symbolicated that means
we have a `ModuleRegistry` which can be used for queries and such. This
meant that the `GlobalModuleRegistry` state could largely be dismantled
and moved to per-`Store` state (within the `ModuleRegistry`, mostly just
moving methods around).
The final state is that the global rwlock is now exclusively scoped
around insertions/deletions/`is_wasm_trap_pc`, which is just a lookup and
an atomic add. Otherwise symbolication for a backtrace exclusively uses
store-local state now (as intended).
The original motivation for this commit was that frame information
lookup and pieces were looking to get somewhat complicated with the
addition of components which are a new vector of traps coming out of
Cranelift-generated code. My hope is that by having a `Store` around for
more operations it's easier to plumb all this through.
`idiv` on x86-64 only reads `rdx`/`edx`/`dx`/`dl` for divides with width
greater than 8 bits; for an 8-bit divide, it reads the whole 16-bit
dividend from `ax`, as our CISC ancestors intended. This PR fixes the
metadata to avoid a regalloc panic (due to undefined `rdx`) in this
case. Does not affect Wasmtime or other Wasm-frontend embedders.
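For reference, a Rust model of the 8-bit form (ignoring the divide-by-zero
and overflow faults): the full 16-bit dividend comes from `ax`, the quotient
lands in `al`, the remainder in `ah`, and `rdx` is never read.

```rust
/// Model of `idiv r/m8`: dividend in AX, quotient -> AL, remainder -> AH.
/// Rust's `/` and `%` truncate toward zero, matching the hardware behavior.
fn idiv8(ax: i16, divisor: i8) -> (i8, i8) {
    let quotient = (ax / divisor as i16) as i8; // AL
    let remainder = (ax % divisor as i16) as i8; // AH
    (quotient, remainder)
}

fn main() {
    assert_eq!(idiv8(-7, 2), (-3, -1));
}
```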
Fuzzers weren't updated to account for #4262 where guard sizes are now
validated rather than automatically sanitized. I'm not sure why oss-fuzz
hasn't filed a bug about this yet because it's definitely crashing a lot
on oss-fuzz...
This commit fixes a mistake in the `Swizzle` opcode implementation in
the x64 backend of Cranelift. Previously an input register was cast to
a writable register and then modified, which I believe instructions are
not supposed to do. This was discovered as part of my investigation
into #4315.
* Use `global_asm!` instead of external assembly files
This commit moves the external assembly files of the `wasmtime-fiber`
crate into `global_asm!` blocks defined in Rust. The motivation for
doing this is not very strong at this time, but the points in favor of
this are:
* One less tool needed to cross-compile Wasmtime. A linker is still
needed but perhaps one day that will improve as well.
* A "modern" assembler, built-in to LLVM, is used instead of whatever
appears on the system.
The first point hasn't really cropped up that much and typically getting
an assembler is just as hard as getting a linker nowadays. The second
point though has us using `hint #xx` in aarch64 assembly instead of the
actual instructions for assembler compatibility, and I believe that's no
longer necessary because the LLVM assembler supports the modern
instruction names.
The x86/x86_64 assembly was also translated to Intel syntax, as opposed
to the old AT&T syntax, since that's Rust's default. Additionally, the
s390x assembly still remains in an external assembler file because
`global_asm!` is still unstable in Rust on that platform. (A minimal
sketch of the `global_asm!` mechanism follows the commit list below.)
* Simplify alignment specification
* Temporarily disable fail-fast
* Add `.cfi_def_cfa_offset 0` to fix CI
* Turn off fail-fast
* Review comments
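The sketch below illustrates the `global_asm!` mechanism itself, not the
actual `wasmtime-fiber` assembly; it assumes an x86_64 Linux target where the
unmangled ELF symbol can be referenced directly:

```rust
use std::arch::global_asm;

// A tiny function defined in assembly, written in Intel syntax (the Rust
// default) and exported under a plain symbol name.
#[cfg(all(target_arch = "x86_64", target_os = "linux"))]
global_asm!(
    ".globl example_add_one",
    "example_add_one:",
    "lea rax, [rdi + 1]", // return the first argument plus one
    "ret",
);

#[cfg(all(target_arch = "x86_64", target_os = "linux"))]
extern "C" {
    fn example_add_one(x: u64) -> u64;
}

fn main() {
    // SAFETY: the assembly above follows the System V calling convention.
    #[cfg(all(target_arch = "x86_64", target_os = "linux"))]
    assert_eq!(unsafe { example_add_one(41) }, 42);
}
```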
Currently `cargo doc` fails for a number of broken links and this commit
fixes all of them. Currently I don't think it's worth adding this to CI
because we don't actually generate docs for the component model anywhere
yet. When the component model support is compiled in by default I think
it would make sense to implement that.
This commit fixes a bug in the previous codegen for the `select`
instruction when the operands of the `select` were of the `v128` type.
Previously the `XmmCmove` instruction only stored an `OperandSize` of 32
or 64 for a 32- or 64-bit move, but this size was also used for the
128-bit types, which meant that the wrong move instruction was
generated. The fix applied here is to store the whole `Type` being moved
so the 128-bit variant can be selected as well.
This commit is a small refactoring of `wasmtime_runtime::Trap` and
various internals. The `Trap` structure is now a reason plus backtrace,
and the old `Trap` enum is mostly in `TrapReason` now. Additionally all
`Trap`-returning methods of `wasmtime_runtime` are changed to return
a `TrapCode` to indicate that they never capture a backtrace. Finally
the `UnwindReason` internally now no longer duplicates the trap reasons,
instead only having two variants of "panic" and "trap".
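A rough sketch of the resulting shape; the names here are assumptions for
illustration rather than the exact `wasmtime_runtime` definitions:

```rust
// Hypothetical shape: a trap is a reason plus an optional backtrace, and the
// old enum's variants now live on the reason type.
struct Backtrace; // placeholder for the captured wasm backtrace
enum TrapCode {
    StackOverflow,
    UnreachableCodeReached,
    // ...
}

enum TrapReason {
    /// An error raised by the embedder/host.
    User(Box<dyn std::error::Error + Send + Sync>),
    /// A trap coming out of JIT-compiled wasm code.
    Wasm(TrapCode),
}

struct Trap {
    reason: TrapReason,
    backtrace: Option<Backtrace>,
}
```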
The motivation for this commit is mostly just cleaning up trap internals
and removing the need for methods like
`wasmtime_runtime::Trap::insert_backtrace`, leaving backtrace capture to
happen only at the `wasmtime` layer.
* Add a method to Linker and flag to wasmtime-cli to trap unknown import funcs
Sometimes users have a Command module which imports functions unknown to
the wasmtime-cli but does not call them at runtime. This PR provides a
convenience method on Linker to define all unknown import functions in
a given Module as a trivial implementation which traps, and hooks this
up to a new CLI flag, --trap-unknown-imports (a usage sketch follows
below).
* add cfg guards - func_new requires compiler (naturally)
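A hedged usage sketch of the Linker convenience described above; the method
name is my assumption of the API shape rather than necessarily the exact name
added here. On the CLI the equivalent is the new `--trap-unknown-imports`
flag.

```rust
use wasmtime::{Engine, Linker, Module, Store};

fn instantiate_with_stub_imports(engine: &Engine, module: &Module) -> anyhow::Result<()> {
    let mut linker: Linker<()> = Linker::new(engine);
    // Assumed method: define every import this linker doesn't know about as
    // a function that traps if it is ever actually called.
    linker.define_unknown_imports_as_traps(module)?;
    let mut store = Store::new(engine, ());
    let _instance = linker.instantiate(&mut store, module)?;
    Ok(())
}
```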
* cranelift: Fix `bint` implementation on interpreter
The interpreter was returning -1 instead of 1 for positive values.
This also extends the bint test suite to cover all types.
* cranelift: Restrict `bint` to scalar values only
This commit implements the `post-return` feature of the canonical ABI in
the component model. This attribute is an optionally-specified function
which is to be executed after the return value has been processed by the
caller to optionally clean up the return value. This enables, for
example, returning an allocated string, which the host then knows how to
clean up, preventing memory leaks in the original module.
The API exposed in this PR changes the prior `TypedFunc::call` API in
behavior but not in its signature. Previously the `TypedFunc::call`
method would set the `may_enter` flag on the way out, but now that
operation is deferred until a new `TypedFunc::post_return` method is
called. This means that once a method on an instance is invoked,
nothing else can be done on the instance until the `post_return` method
is called. Note that `post_return` must be called irrespective of whether
the `post-return` canonical ABI option was specified or not. Internally
wasm will be invoked if necessary.
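A minimal sketch of the calling sequence this implies, eliding setup of the
engine, component, and instance (the concrete `TypedFunc` type parameters are
just for illustration):

```rust
use wasmtime::component::TypedFunc;
use wasmtime::Store;

// `func` was obtained from an instantiated component; `T` is the store data.
fn call_and_cleanup<T>(
    store: &mut Store<T>,
    func: &TypedFunc<(String,), (String,)>,
) -> anyhow::Result<String> {
    let (greeting,) = func.call(&mut *store, ("world".to_string(),))?;
    // Nothing else may be invoked on this instance until post-return runs,
    // even if the component declared no `post-return` function.
    func.post_return(&mut *store)?;
    Ok(greeting)
}
```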
This is a pretty wonky and unergonomic API to work with. For now I
couldn't think of a better alternative that improved on the ergonomics.
In the theory that the raw Wasmtime bindings for a component may not be
used all that heavily (instead `wit-bindgen` would largely be used) I'm
hoping that this isn't too much of an issue in the future.
cc #4185
This updates to rustix 0.35.6, and updates wasi-common to use cap-std 0.25 and
windows-sys (instead of winapi).
Changes include:
- Better error code mappings on Windows.
- Fixes undefined references to `utimensat` on Darwin.
- Fixes undefined references to `preadv64` and `pwritev64` on Android.
- Updates to io-lifetimes 0.7, which matches the I/O safety APIs in Rust.
- Y2038 bug fixes for 32-bit platforms.
One test case I wrote recently was to import a lowered function into a
wasm module and then immediately export it. This previously didn't work
because trampoline lookup would fail as the original
`VMCallerCheckedAnyfunc` function pointer points into the
`trampoline_obj` of a component which wasn't registered with the
`ModuleRegistry`. This plumbs through the necessary configuration to get
that all hooked up.
* Update comment on stack overflow checking
This commit moves the top-level comment in `crates/cranelift/src/lib.rs`
into the location where the global value for the stack limit is
generated. Stack overflow checking is pretty localized nowadays so
there's not much need to have it at the top of the crate and most of the
words there were just adapted to this new location.
Closes #4286
* Review comments
* Document a caveat regarding `max_wasm_stack`
Specifically, `max_wasm_stack` only limits the stack that can be consumed
by wasm; it does not guarantee that that much stack space will actually be
available. (A configuration sketch follows at the end of this section.)
* rustfmt
* Fix the claim about resetting the stack limit.
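For reference, the setting this caveat applies to is configured roughly as
in the sketch below (the 512 KiB value is only an illustrative choice, and
the builder's return value is ignored for brevity):

```rust
use wasmtime::{Config, Engine};

fn main() {
    let mut config = Config::new();
    // Limit how much stack wasm may consume; this does not reserve stack or
    // guarantee that the host thread actually has this much left.
    let _ = config.max_wasm_stack(512 * 1024);
    let _engine = Engine::new(&config).expect("engine creation");
}
```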