* Reimplement the pooling instance allocation strategy
This commit is a reimplementation of the strategy by which the pooling
instance allocator selects a slot for a module. Previously there was a
choice amongst three different algorithms: "reuse affinity", "next
available", and "random". The default was "reuse affinity" but some new
data has come to light which shows that this may not always be a good
default.
Notably the pooling allocator will retain some memory per-slot in the
pooling instance allocator, for example instance data or memory data
if-so-configured. This means that a currently unused, but previously
used, slot can contribute to the RSS usage of a program using Wasmtime.
Consequently the RSS impact here is O(max slots) which can be
counter-intuitive for embedders. This particularly affects "reuse
affinity" because the algorithm for picking a slot when there are no
affine slots is "pick a random slot", which means eventually all slots
will get used.
In discussions about possible ways to tackle this, an alternative to
"pick a strategy" arose and is now implemented in this commit.
Concretely the new allocation algorithm for a slot is now:
* First pick the most recently used affine slot, if one exists.
* Otherwise if the number of affine slots to other modules is above some
threshold N then pick the least-recently used affine slot.
* Otherwise pick a slot that's affine to nothing.
The "N" in this algorithm is configurable and setting it to 0 is the
same as the old "next available" strategy while setting it to infinity
is the same as the "reuse affinity" algorithm. Setting it to something
in the middle provides a knob to allow a modest "cache" of affine slots
while not allowing the total set of slots used to grow too much beyond
the maximal concurrent set of modules. The "random" strategy is now no
longer possible and was removed to help simplify the allocator.
* Resolve rustdoc warnings in `wasmtime-runtime` crate
* Remove `max_cold` as it duplicates the `slot_state.len()`
* More descriptive names
* Add a comment and debug assertion
* Add some list assertions
Add a display method to BlockCall that returns a std::fmt::Displayable result. Rework the display code in the write module of cranelift-codegen to use this method instead.
* Fix some warnings on nightly Rust
Cargo is warning about the usage of workspace dependencies where the
workspace declaration does not mention `default-features` but the
dependency mentions `default-features`, so this explicitly turns off
default features for `cranelift-codegen` at the workspace level and
removes the explicit `default-features = false` at the manifest levels.
* Explicitly enable default feature in wasmtime
* Enable another feature
In multi-terms, all matching rules fire. We treat the result as an
unordered set of values, so setting rule priorities is meaningless. We
want to prohibit relying on the rule match order in this case.
Also, codegen can produce invalid Rust if rules with different
priorities both match against a multi-term. We first documented this
symptom in #5647. As far as I can figure, prohibiting rule priorities
prevents all possible instances of that bug.
At some point in the future we might decide we want to carefully define
semantics for multi-term result ordering, at which point we can revisit
this.
This fixes the build issue identified in #5664 at the toolchain level
rather than working around it in our own build. The next step in fixing
this will be to remove the nightly override in the future when the
toolchain becomes stable.
Souper requires an `i1` condition value, we don't and will implicitly check
against 0. We were truncating conditions but should actually be doing the
comparison against `0`.
Remove the boolean parameters from the instruction builder functions, as they were only ever used with true. Additionally, change the returns and branches functions to imply terminates_block.
This allows the `wasmtime` binary provided in our release artifacts to
cross-compile: `wasmtime compile` can build a `.cwasm` for any platform
that Wasmtime supports, not just the host platform. This may be useful
in some deployment scenarios.
We don't turn on `all-arch` by default because it increases build time
and binary size of Wasmtime itself, and other embedders of the
`wasmtime` crate won't necessarily want this; hence, we set it only as
part of the CI build configuration.
Fixes#5655.
* Cranelift: Harvest each Souper LHS into its own file
Souper only handles one input LHS at a time, so this makes it way easier to
script. Don't need to try and parse each LHS.
* Add audit of `arrayref` version 0.3.6
* Add audit of `constant_time_eq` version 0.2.4
* Add several `WASMTIME_VERSION_*` macros to `wasmtime.h`.
* Update `scripts/publish.rs`
* To set these macros as per the new version in `./Cargo.toml` during
`./publish bump`.
* To verify the macros match the version in `./Cargo.toml` during
`./publish verify`.
Fix#5635
Fix an error introduced in #5644, where an unsigned subtraction from zero was possible with an empty Switch structure. Additionally, missing the empty case caused us to not emit a branch to the default block. This PR fixes the issue by detecting the empty Switch case early, and emitting a jump.
Rework the compilation strategy for switch to:
* use brif instead of brz and brnz
* generate tables inline, rather than delyaing them to after the decision tree has been generated
* avoid allocating new vectors by using slices into the sorted contiguous ranges
* avoid generating some unconditional jumps
* output differences in test output using the similar crate for easier debugging
Most of these optimizations are in the egraph `cprop.isle` rules now,
making a separate crate unnecessary.
Also I think the `udiv` optimizations here are straight-up wrong (doing
signed instead of unsigned division, and panicking instead of preserving
traps on division by zero) so I'm guessing this crate isn't seriously
used anywhere.
At the least, bjorn3 confirms that cg_clif doesn't use this, and I've
verified that Wasmtime doesn't either.
Closes#1090.
Improve the generated code for unordered floating point comparisons by negating the comparison and inveritng the branches. This allows us to pick the unordered versions, which generate significantly better code.
* Fix libcall relocations for precompiled modules
This commit fixes some asserts and support for relocation libcalls in
precompiled modules loaded from disk. In doing so this reworks how mmaps
are managed for files from disk. All non-file-backed `Mmap` entries are
read/write but file-backed versions were readonly. This commit changes
this such that all `Mmap` objects, even if they're file-backed, start as
read/write. The file-based versions all use copy-on-write to preserve
the private-ness of the mapping.
This is not functionally intended to change anything. Instead this
should have some more memory writable after a module is loaded but the
text section, for example, is still left as read/execute when loading is
finished. Additionally this makes modules compiled in memory more
consistent with modules loaded from disk.
* Update a comment
* Force images to become readonly during publish
This marks compiled images as entirely readonly during the
`CodeMemory::publish` step which happens just before the text section
becomes executable. This ensures that all images, no matter where they
come from, are guaranteed frozen before they start executing.
Nothing major pulled in here, but wanted to update to the latest
versions which enable tail calls by default. When used in Wasmtime,
however, the feature is disabled without the possibility of being
enabled since it's not implemented.
Add a conditional branch instruction with two targets: brif. This instruction will eventually replace brz and brnz, as it encompasses the behavior of both.
This PR also changes the InstructionData layout for instruction formats that hold BlockCall values, taking the same approach we use for Value arguments. This allows branch_destination to return a slice to the BlockCall values held in the instruction, rather than requiring that we pattern match on InstructionData to fetch the then/else blocks.
Function generation for fuzzing has been updated to generate uses of brif, and I've run the cranelift-fuzzgen target locally for hours without triggering any new failures.
ISLE's existing code-generation strategy doesn't generate the most
efficient matching order for rules. This PR completely replaces it.
With this PR applied, wasmtime compile retires 2% fewer instructions on
the pulldown-cmark and spidermonkey benchmarks from Sightglass.
A dev build of cranelift-codegen from an empty target/ directory takes
2% less time. The build script, invoking ISLE, takes a little longer,
but Rust can compile the generated code faster, so it balances out.
Ideally these pairs of CLIF instructions should emit a single x86
instruction, but they don't today. This test will tell us if somebody
fixes that.
Similar tests might make sense for imul/umulhi as well as signed
versions, but I haven't tried that.
This commit fixes a bug in the `bindgen!` macro for components where
previously the `param` and `result` properties weren't properly
calculated depending on the structure of the type and which types were
visited in which order. This is simplified to use a `LiveTypes`
structure from the `wit-parser` crate and relies on that to do necessary
recursion.
After #5587, this is on by default. We are retaining the traditional
(no-egraphs) path for now, selected by setting this option to `false`,
but we eventually plan to delete it assuming that we don't find serious
regressions or issues. This PR adds a deprecation notice to the option.
* Fix compile error on FreeBSD x64
* Fix compile on FreeBSD arm64
* Update Cargo.lock for ittapi
* vet: certify diff for ittapi libraries
Co-authored-by: Andrew Brown <andrew.brown@intel.com>
The check needs to verify that the maximum number of memories is
precisely one to ensure that multi-memory is disabled yet modules can
still have up to one memory as configured in the pooling allocator.
This doesn't fully update tokio since the update to the latest version
has quite a few changes I'd prefer to not audit at the moment, but it
updates to a patched version.