This commit fixes an incorrect usage of `func_type_at` to retrieve a defined
function signature and instead uses `function_at` to retrieve the signature.
Additionally it enhances `winch-tools` `compile` and `test` commands to handle
modules with multiple functions correctly.
* Remove trailing whitespace in `lower.isle` files
* Legalize the `band_not` instruction into simpler form
This commit legalizes the `band_not` instruction into `band`-of-`bnot`,
or two instructions. This is intended to assist with egraph-based
optimizations where the `band_not` instruction doesn't have to be
specifically included in other bit-operation-patterns.
Lowerings of the `band_not` instruction have been moved to a
specialization of the `band` instruction.
* Legalize `bor_not` into components
Same as prior commit, but for the `bor_not` instruction.
* Legalize bxor_not into bxor-of-bnot
Same as prior commits. I think this also ended up fixing a bug in the
s390x backend where `bxor_not x y` was actually translated as `bnot
(bxor x y)` by accident given the test update changes.
* Simplify not-fused operands for riscv64
Looks like some delegated-to rules have special-cases for "if this
feature is enabled use the fused instruction" so move the clause for
testing the feature up to the lowering phase to help trigger other rules
if the feature isn't enabled. This should make the riscv64 backend more
consistent with how other backends are implemented.
* Remove B{and,or,xor}Not from cost of egraph metrics
These shouldn't ever reach egraphs now that they're legalized away.
* Add an egraph optimization for `x^-1 => ~x`
This adds a simplification node to translate xor-against-minus-1 to a
`bnot` instruction. This helps trigger various other optimizations in
the egraph implementation and also various backend lowering rules for
instructions. This is chiefly useful as wasm doesn't have a `bnot`
equivalent, so it's encoded as `x^-1`.
* Add a wasm test for end-to-end bitwise lowerings
Test that end-to-end various optimizations are being applied for input
wasm modules.
* Specifically don't self-update rustup on CI
I forget why this was here originally, but this is failing on Windows
CI. In general there's no need to update rustup, so leave it as-is.
* Cleanup some aarch64 lowering rules
Previously a 32/64 split was necessary due to the `ALUOp` being different
but that's been refactored away no so there's no longer any need for
duplicate rules.
* Narrow a x64 lowering rule
This previously made more sense when it was `band_not` and rarely used,
but be more specific in the type-filter on this rule that it's only
applicable to SIMD types with lanes.
* Simplify xor-against-minus-1 rule
No need to have the commutative version since constants are already
shuffled right for egraphs
* Optimize band-of-bnot when bnot is on the left
Use some more rules in the egraph algebraic optimizations to
canonicalize band/bor/bxor with a `bnot` operand to put the operand on
the right. That way the lowerings in the backends only have to list the
rule once, with the operand on the right, to optimize both styles of
input.
* Add commutative lowering rules
* Update cranelift/codegen/src/isa/x64/lower.isle
Co-authored-by: Jamey Sharp <jamey@minilop.net>
---------
Co-authored-by: Jamey Sharp <jamey@minilop.net>
* update cap-std family and its deps, and audit them
* audit base64: append a safe-to-deploy entry
I mistakenly marked it safe-to-run not understanding that safe-to-deploy was required.
* update to fd-lock 3.0.10
eliminates duplicate dep on windows-sys
Also move these optimization rules to cprop.isle; it's where all the
other similar rules are.
Like the other cprop rules, these can subsume any other rules. We can't
do better than reducing an expression to a constant.
The new i64_sextend_imm64 and u64_uextend_imm64 constructors are useful
helpers to clean up other code. I applied them to `imm64_icmp` while I
was here, as well as using the existing `ty_mask` helper to clean up
`imm64_masked`.
This commit adds some missing conversions between Winch's x64 `Reg` type and
Cranelift's `Gpr`, `WritableGpr` and `GprMemImm`. This results in less
boilerplate. This is also a bit of groundwork in the assembler to support
the rest of the integer binary instructions.
This patch introduces basic aarch64 code generation by using
`cranelift-codegen`'s backend.
This commit *does not*:
* Change the semantics of the code generation
* Adds support for other Wasm instructions
The most notable change in this patch is how addressing modes are handled at the
MacroAssembler layer: instead of having a canonical address representation, this
patch introduces the addressing mode as an associated type in the
MacroAssembler trait. This approach has the advantage that gives each ISA enough
flexiblity to describe the addressing modes and their constraints in isolation
without having to worry on how a particular addressing mode is going to affect
other ISAs. In the case of Aarch64 this becomes useful to describe indexed
addressing modes (particularly from the stack pointer).
This patch uses the concept of a shadow stack pointer (x28) as a workaround to
Aarch64's stack pointer 16-byte alignment. This constraint is enforced by:
* Introducing specialized addressing modes when using the real stack pointer; this
enables auditing when the real stack pointer is used. As of this change, the
real stack pointer is only used in the function's prologue and epilogue.
* Asserting that the real stack pointer is not used as a base for addressing
modes.
* Ensuring that at any point during the code generation process where the stack
pointer changes (e.g. when stack space is allocated / deallocated) the value of
the real stack pointer is copied into the shadow stack pointer.
* Prefix component-bindgen-generated-functions with `call_`
This fixes clashes between Rust-native methods and the methods
themselves. For example right now `new` is a Rust-generated function for
constructing the wrapper but this can conflict with a world-exported
function called `new`.
Closes#5585
* Fix types being both shared and owned
This refactors some inherited cruft from the original `wit-bindgen`
repository to be more Wasmtime-specific and fixes a codegen case where
a type was used in both a shared and an owned context.
Closes#5688
* Remove the need to have a `Store` for an `InstancePre`
This commit relaxes a requirement of the `InstancePre` API, notably its
construction via `Linker::instantiate_pre`. Previously this function
required a `Store<T>` to be present to be able to perform type-checking
on the contents of the linker, and now this requirement has been
removed.
Items stored within a linker are either a `HostFunc`, which has type
information inside of it, or an `Extern`, which doesn't have type
information inside of it. Due to the usage of `Extern` this is why a
`Store` was required during the `InstancePre` construction process, it's
used to extract the type of an `Extern`. This commit implements a
solution where the type information of an `Extern` is stored alongside
the `Extern` itself, meaning that the `InstancePre` construction process
no longer requires a `Store<T>`.
One caveat of this implementation is that some items, such as tables and
memories, technically have a "dynamic type" where during type checking
their current size is consulted to match against the minimum size
required of an import. This no longer works when using
`Linker::instantiate_pre` as the current size used is the one when it
was inserted into the linker rather than the one available at
instantiation time. It's hoped, however, that this is a relatively
esoteric use case that doesn't impact many real-world users.
Additionally note that this is an API-breaking change. Not only is the
`Store` argument removed from `Linker::instantiate_pre`, but some other
methods such as `Linker::define` grew a `Store` argument as the type
needs to be extracted when an item is inserted into a linker.
Closes#5675
* Fix the C API
* Fix benchmark compilation
* Add C API docs
* Update crates/wasmtime/src/linker.rs
Co-authored-by: Andrew Brown <andrew.brown@intel.com>
---------
Co-authored-by: Andrew Brown <andrew.brown@intel.com>
This commit contains a small set of clean up items for x64.
Notably:
* Adds filetests
* Documents why 16 for the arg base offset abi implementation, for clarity.
* Fixes a bug in the spill implementation caught while anlyzing the
filetests results. The fix consists of emitting a load instead of a store into
the scratch register before spiiling its value.
* Remove dead code for pretty printing registers which is not needed anymore
since we now have proper disassembly.
* Cranelift: Introduce the `tail` calling convention
This is an unstable-ABI calling convention that we will eventually use to
support Wasm tail calls.
Co-Authored-By: Jamey Sharp <jsharp@fastly.com>
* Cranelift: Introduce the `return_call` and `return_call_indirect` instructions
These will be used to implement tail calls for Wasm and any other language
targeting CLIF. The `return_call_indirect` instruction differs from the Wasm
instruction of the same name by taking a native address callee rather than a
Wasm function index.
Co-Authored-By: Jamey Sharp <jsharp@fastly.com>
* Cranelift: Implement verification rules for `return_call[_indirect]`
They must:
* have the same return types between the caller and callee,
* have the same calling convention between caller and callee,
* and that calling convention must support tail calls.
Co-Authored-By: Jamey Sharp <jsharp@fastly.com>
* cargo fmt
---------
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
This rewrite was introduced in #5676 and then reverted in #5682 due to a footgun
where we accidentally weren't actually checking the `y == !z` precondition. This
commit fixes the precondition check. It also fixes the arithmetic to be
correctly masked to the value type's width.
This reverts commit 268f6bfc1d.
This commit removes the pooling of `Instance` allocations from the
pooling instance allocator. This means that the allocation of `Instance`
(and `VMContext`) memory, now always happens through the system `malloc`
and `free` instead of optionally being part of the pooling instance
allocator. Along the way this refactors the `InstanceAllocator` trait so
the pooling and on-demand allocators can share more structure with this
new property of the implementation.
The main rationale for this commit is to reduce the RSS of long-lived
programs which allocate instances with the pooling instance allocator
and aren't using the "next available" allocation strategy. In this
situation the memory for an instance is never decommitted until the end
of the program, meaning that eventually all instance slots will become
occupied and resident. This has the effect of Wasmtime slowly eating
more and more memory over time as each slot gets an instance allocated.
By switching to the system allocator this should reduce the current RSS
workload from O(used slots) to O(active slots), which is more in line
with expectations.
* Reimplement the pooling instance allocation strategy
This commit is a reimplementation of the strategy by which the pooling
instance allocator selects a slot for a module. Previously there was a
choice amongst three different algorithms: "reuse affinity", "next
available", and "random". The default was "reuse affinity" but some new
data has come to light which shows that this may not always be a good
default.
Notably the pooling allocator will retain some memory per-slot in the
pooling instance allocator, for example instance data or memory data
if-so-configured. This means that a currently unused, but previously
used, slot can contribute to the RSS usage of a program using Wasmtime.
Consequently the RSS impact here is O(max slots) which can be
counter-intuitive for embedders. This particularly affects "reuse
affinity" because the algorithm for picking a slot when there are no
affine slots is "pick a random slot", which means eventually all slots
will get used.
In discussions about possible ways to tackle this, an alternative to
"pick a strategy" arose and is now implemented in this commit.
Concretely the new allocation algorithm for a slot is now:
* First pick the most recently used affine slot, if one exists.
* Otherwise if the number of affine slots to other modules is above some
threshold N then pick the least-recently used affine slot.
* Otherwise pick a slot that's affine to nothing.
The "N" in this algorithm is configurable and setting it to 0 is the
same as the old "next available" strategy while setting it to infinity
is the same as the "reuse affinity" algorithm. Setting it to something
in the middle provides a knob to allow a modest "cache" of affine slots
while not allowing the total set of slots used to grow too much beyond
the maximal concurrent set of modules. The "random" strategy is now no
longer possible and was removed to help simplify the allocator.
* Resolve rustdoc warnings in `wasmtime-runtime` crate
* Remove `max_cold` as it duplicates the `slot_state.len()`
* More descriptive names
* Add a comment and debug assertion
* Add some list assertions
Add a display method to BlockCall that returns a std::fmt::Displayable result. Rework the display code in the write module of cranelift-codegen to use this method instead.
* Fix some warnings on nightly Rust
Cargo is warning about the usage of workspace dependencies where the
workspace declaration does not mention `default-features` but the
dependency mentions `default-features`, so this explicitly turns off
default features for `cranelift-codegen` at the workspace level and
removes the explicit `default-features = false` at the manifest levels.
* Explicitly enable default feature in wasmtime
* Enable another feature
In multi-terms, all matching rules fire. We treat the result as an
unordered set of values, so setting rule priorities is meaningless. We
want to prohibit relying on the rule match order in this case.
Also, codegen can produce invalid Rust if rules with different
priorities both match against a multi-term. We first documented this
symptom in #5647. As far as I can figure, prohibiting rule priorities
prevents all possible instances of that bug.
At some point in the future we might decide we want to carefully define
semantics for multi-term result ordering, at which point we can revisit
this.
This fixes the build issue identified in #5664 at the toolchain level
rather than working around it in our own build. The next step in fixing
this will be to remove the nightly override in the future when the
toolchain becomes stable.
Souper requires an `i1` condition value, we don't and will implicitly check
against 0. We were truncating conditions but should actually be doing the
comparison against `0`.
Remove the boolean parameters from the instruction builder functions, as they were only ever used with true. Additionally, change the returns and branches functions to imply terminates_block.
This allows the `wasmtime` binary provided in our release artifacts to
cross-compile: `wasmtime compile` can build a `.cwasm` for any platform
that Wasmtime supports, not just the host platform. This may be useful
in some deployment scenarios.
We don't turn on `all-arch` by default because it increases build time
and binary size of Wasmtime itself, and other embedders of the
`wasmtime` crate won't necessarily want this; hence, we set it only as
part of the CI build configuration.
Fixes#5655.
* Cranelift: Harvest each Souper LHS into its own file
Souper only handles one input LHS at a time, so this makes it way easier to
script. Don't need to try and parse each LHS.
* Add audit of `arrayref` version 0.3.6
* Add audit of `constant_time_eq` version 0.2.4
* Add several `WASMTIME_VERSION_*` macros to `wasmtime.h`.
* Update `scripts/publish.rs`
* To set these macros as per the new version in `./Cargo.toml` during
`./publish bump`.
* To verify the macros match the version in `./Cargo.toml` during
`./publish verify`.
Fix#5635
Fix an error introduced in #5644, where an unsigned subtraction from zero was possible with an empty Switch structure. Additionally, missing the empty case caused us to not emit a branch to the default block. This PR fixes the issue by detecting the empty Switch case early, and emitting a jump.