* Improve cranelift disassembly of stack maps
Print out extra information about stack maps such as their contents and
other related metadata available. Additionally also print out addresses
in hex to line up with the disassembly otherwise printed as well.
* Improve the `table_ops` fuzzer
* Generate more instructions by default
* Fix negative indices appearing in `table.{get,set}`
* Assert that the traps generated are expected to prevent accidental
other errors reporting a fuzzing success.
* Fix `reftype_vregs` reported to `regalloc2`
This fixes a mistake in the register allocation of Cranelift functions
where functions using reference-typed arguments incorrectly report which
virtual registers are reference-typed values if there are vreg aliases
in play. The fix here is to apply the vreg aliases to the final list of
reftyped regs which is eventually passed to `regalloc2`.
The main consequence of this fix is that functions which previously
accidentally didn't have correct stack maps should now have the missing
stack maps.
* Add a test that `table_ops` gc's eventually
* Add a comment about new alias resolution
* Update crates/fuzzing/src/oracles.rs
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
* Add some comments
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
With branch protections enabled that would otherwise mean that the PR
cannot be landed since CI is now required to run. These date-update PRs
typically come at odd off-hours for Wasmtime anyway so it should be fine
to run CI.
Preserving frame pointers -- even inside leaf functions -- makes it easy to
capture the stack of a running program, without requiring any side tables or
metadata (like `.eh_frame` sections). Many sampling profilers and similar tools
walk frame pointers to capture stacks. Enabling this option will play nice with
those tools.
These were for x86 (32-bit) where the ISA didn't have instructions for these
things, but now that we don't support that, and always have SSE2 for x86_64, we
never need or use these libcalls anymore.
Converted the existing implementations for the following Opcodes to ISLE on AArch64:
- `fadd`
- `fsub`
- `fmul`
- `fdiv`
- `fmin`
- `fmax`
- `fmin_pseudo`
- `fmax_pseudo`
Copyright (c) 2022 Arm Limited
This adds full support for all Cranelift SIMD instructions
to the s390x target. Everything is matched fully via ISLE.
In addition to adding support for many new instructions,
and the lower.isle code to match all SIMD IR patterns,
this patch also adds ABI support for vector types.
In particular, we now need to handle the fact that
vector registers 8 .. 15 are partially callee-saved,
i.e. the high parts of those registers (which correspond
to the old floating-poing registers) are callee-saved,
but the low parts are not. This is the exact same situation
that we already have on AArch64, and so this patch uses the
same solution (the is_included_in_clobbers callback).
The bulk of the changes are platform-specific, but there are
a few exceptions:
- Added ISLE extractors for the Immediate and Constant types,
to enable matching the vconst and swizzle instructions.
- Added a missing accessor for call_conv to ABISig.
- Fixed endian conversion for vector types in data_value.rs
to enable their use in runtests on the big-endian platforms.
- Enabled (nearly) all SIMD runtests on s390x. [ Two test cases
remain disabled due to vector shift count semantics, see below. ]
- Enabled all Wasmtime SIMD tests on s390x.
There are three minor issues, called out via FIXMEs below,
which should be addressed in the future, but should not be
blockers to getting this patch merged. I've opened the
following issues to track them:
- Vector shift count semantics
https://github.com/bytecodealliance/wasmtime/issues/4424
- is_included_in_clobbers vs. link register
https://github.com/bytecodealliance/wasmtime/issues/4425
- gen_constant callback
https://github.com/bytecodealliance/wasmtime/issues/4426
All tests, including all newly enabled SIMD tests, pass
on both z14 and z15 architectures.
* Implement `iabs` in ISLE (AArch64)
Converts the existing implementation of `iabs` for AArch64 into ISLE,
and fixes support for `iabs` on scalar values.
Copyright (c) 2022 Arm Limited.
* Improve scalar `iabs` implementation.
Also introduces `CSNeg` instruction.
Copyright (c) 2022 Arm Limited
* Convert `scalar_to_vector` to ISLE (AArch64)
Converted the exisiting implementation of `scalar_to_vector` for AArch64 to
ISLE.
Copyright (c) 2022 Arm Limited
* Add support for floats and fix FpuExtend
- Added rules to cover `f32 -> f32x4` and `f64 -> f64x2` for
`scalar_to_vector`
- Added tests for `scalar_to_vector` on floats.
- Corrected an invalid instruction emitted by `FpuExtend` on 64-bit
values.
Copyright (c) 2022 Arm Limited
This commit augments the current translation phase of components with
extra machinery to track the type information of component items such as
instances, components, and functions. The end goal of this commit is to
enable the `Lower` instruction to know the type of the component
function being lowered. Currently during the inlining pass where
component fusion is detected the type of the lifted function is known,
but to implement fusion entirely the type of the lowered function must
be known. Note that these two types are expected to be different to
allow for the subtyping rules specified by the component model.
For now nothing is actually done with this information other than noting
its presence in the face of a lifted-then-lowered function. My hope
though was to split this out for a separate review to avoid making a
future component-adapter-compiler-containing-PR too large.
This moves them into a new `wasmtime-asm-macros` crate that can be used not just
from the `wasmtime-fibers` crate but also from other crates (e.g. we will need
them in https://github.com/bytecodealliance/wasmtime/pull/4431).
This commit fixes an issue with the initialization of element segments
when one of the elements in the element segment is `ref.func null`.
Previously the contents of a table were accidentally initialized with
the raw value of the `*mut VMCallerCheckedAnyfunc` which bypassed the
"this is initialized" encoding of function table entries that Wasmtime
uses for lazy table initialization. The fix here was to ensure that the
encoded form is used.
The impact of this issue is that a module could panic at runtime when
accessing a table element that was initialized with an element segment
containing a `ref.null func` entry. This only happens with imported
tables in a WebAssembly module where the table itself was defined on the
host. If the table was defined in another wasm module or in the local
wasm module this bug would not occur. Additionally this bug requires
enabling the reference types proposal for WebAssembly (which is enabled
by default) due to the usage of encodings for null funcrefs in element
segments.
When parsing isa specific values we were accidentally discarding the
value of the flag, and treating it always as a boolean flag.
This would cause a `clif-util` invocation such as
`cargo run -- compile -D --set has_sse41=false --target x86_64 test.clif`
to be interpreted as `--set has_sse41` and enable that feature instead
of disabling it.
Rather than sometimes using `file-per-thread-logger`.
Also remove the debug CLI flags, so that we can always just define
`RUST_LOG=...` to get logging and don't need to also do other things.
* implement wasmtime::component::flags! per #4308
This is the last macro needed to complete #4308. It supports generating a Rust
type that represents a `flags` component type, analogous to how the [bitflags
crate](https://crates.io/crates/bitflags) operates.
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
* wrap `format_flags` output in parens
This ensures we generate non-empty output even when no flags are set. Empty
output for a `Debug` implementation would be confusing.
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
* unconditionally derive `Lift` and `Lower` in wasmtime::component::flags!
Per feedback on #4414, we now derive impls for those traits unconditionally,
which simplifies the syntax of the macro.
Also, I happened to notice an alignment bug in `LowerExpander::expand_variant`,
so I fixed that and cleaned up some related code.
Finally, I used @jameysharp's trick to calculate bit masks without looping.
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
* fix shift overflow regression in previous commit
Jamey pointed out my mistake: I didn't consider the case when the flag count was
evenly divisible by the representation size. This fixes the problem and adds
test cases to cover it.
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
* Allow using jump-tables multiple times (fixes#3347)
If there are multiple `br_table` instructions using the same jump table,
then `append_jump_argument` must not modify the jump table in-place.
When this function is called, we don't know if more `br_table`
instructions might be added later. So this patch conservatively assumes
that all jump tables might be reused. If Cranelift needs to add a block
argument to a block that's the target of some jump table, then the jump
table will be unconditionally cloned.
I'm not sure if having duplicated and unused jump tables will turn out
to be a compile-time performance issue. If it is, there's discussion in
issue #3347 about ways to determine that there can't be any more uses of
a jump table, so that it's safe to modify in-place.
* Re-enable cranelift-fuzzgen fuzz target
I've been running this fuzz target for an hour without finding new bugs.
Let's see if oss-fuzz finds anything now.
As @MaxGraey pointed out (thanks!) in #4397, `round` has different
behavior from `nearest`. And it looks like the native rust
implementation is still pending stabilization.
Right now we duplicate the wasmtime implementation, merged in #2171.
However, we definitely should switch to the rust native version
when it is available.
* fuzz: add a single instruction module generator
As proposed by @cfallin in #3251, this change adds a way to generate a
Wasm module for a single instruction. It captures the necessary
parameter and result types so that fuzzing can not only choose which
instruction to check but also generate values to pass to the
instruction. Not all instructions are available yet, but a significant
portion of scalar instructions are implemented in this change.
This does not wire the generator up to any fuzz targets.
* review: use raw string in test
* review: remove once_cell, use slices
* review: refactor macros to use valtype!
* review: avoid cloning when choosing a SingleInstModule
The compile step that cranelift-fuzzgen does also triggers IR
verification. So all bugs that cranelift-fuzzgen-verify could catch are
also caught by cranelift-fuzzgen. Removing redundant fuzzers lets us
spend limited fuzz-testing CPU time budgets better.
This commit removes Wasmtime's dependency on the `region` crate. The
motivation for this came about when I was updating dependencies and saw
that `region` had a new major version at 3.0.0 as opposed to our
currently used 2.3 track. In reviewing the use cases of `region` within
Wasmtime I found two trends in particular which motivated this commit:
* Some unix-specific areas of `wasmtime_runtime` use
`rustix::mm::mprotect` instead of `region::protect` already. This
means that the usage of `region::protect` for changing virtual memory
protections was already inconsistent.
* Many uses of `region::protect` were already in unix-specific regions
which could make use of `rustix`.
Overall I opted to remove the dependency on the `region` crate to avoid
chasing its versions over time. Unix-specific changes of protections
were easily changed to `rustix::mm::mprotect`. There were two locations
where a windows/unix split is now required and I subjectively ruled
"that seems ok". Finally removing `region` also meant that the "what is
the current page size" query needed to be inlined into
`wasmtime_runtime`, which I have also subjectively ruled "that seems
fine".
Finally one final refactoring here was that the `unix.rs` and `linux.rs`
split for the pooling allocator was merged. These two files already only
differed in one function so I slapped a `cfg_if!` in there to help
reduce the duplication.
Introduce a new concept in the IR that allows a producer to create
dynamic vector types. An IR function can now contain global value(s)
that represent a dynamic scaling factor, for a given fixed-width
vector type. A dynamic type is then created by 'multiplying' the
corresponding global value with a fixed-width type. These new types
can be used just like the existing types and the type system has a
set of hard-coded dynamic types, such as I32X4XN, which the user
defined types map onto. The dynamic types are also used explicitly
to create dynamic stack slots, which have no set size like their
existing counterparts. New IR instructions are added to access these
new stack entities.
Currently, during codegen, the dynamic scaling factor has to be
lowered to a constant so the dynamic slots do eventually have a
compile-time known size, as do spill slots.
The current lowering for aarch64 just targets Neon, using a dynamic
scale of 1.
Copyright (c) 2022, Arm Limited.
Previously, much of the logic for generating the various objects needed
for fuzzing was concentrated primarily in `generators.rs`. In trying to
piece together what code does what, the size of the file and the light
documentation make it hard to discern what each part does. Since several
generator structures had been split out as separate modules in the
`generators/` directory, this change takes that refactoring further by
moving the structures in `generators.rs` to their own modules. No logic
changes were made, only the addition of documentation in a few places.
@yuyang-ok reported via zulip that i128 overflow tests were:
1. different from the interpreter implementation
2. wrong on some of the test cases
This fixes both the tests and the aarch64 implementation and adds the
interpreter to the testsuite.
* wasmtime: Rename host->wasm trampolines
As we introduce new types of trampolines, having clear names for our existing
trampolines will be helpful.
* Fix typo in docs for `VMCOMPONENT_MAGIC`
* wasmtime: Add criterion micro benchmarks for traps
* x64: port `atomic_rmw` to ISLE
This change ports `atomic_rmw` to ISLE for the x64 backend. It does not
change the lowering in any way, though it seems possible that the fixed
regs need not be as fixed and that there are opportunities for single
instruction lowerings. It does rename `inst_common::AtomicRmwOp` to
`MachAtomicRmwOp` to disambiguate with the IR enum with the same name.
* x64: remove remaining hardcoded register constraints for `atomic_rmw`
* x64: use `SyntheticAmode` in `AtomicRmwSeq`
* review: add missing reg collector for amode
* review: collect memory registers in the 'late' phase
Additionally remove the `rlib` crate type so it's possible to build the
API with LTO options if configured (otherwise Cargo ignores LTO
configuration with an `rlib` output since it would hit an error in
rustc)