Add a `display` method to `BlockCall` that returns a value implementing `std::fmt::Display`. Rework the display code in the write module of cranelift-codegen to use this method instead.
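As a rough illustration of the pattern described above, here is a minimal sketch of a `display` method that returns a borrowing wrapper implementing `std::fmt::Display`. The `BlockCall` fields, the `DisplayBlockCall` wrapper, and the `blockN(vA, vB)` output format are assumptions made for this example, not the actual cranelift-codegen definitions.

```rust
use std::fmt;

/// Hypothetical stand-in for a block call: a target block plus its arguments.
struct BlockCall {
    block: u32,
    args: Vec<u32>,
}

/// Hypothetical wrapper returned by `display`; it borrows the call and
/// implements `fmt::Display`.
struct DisplayBlockCall<'a> {
    call: &'a BlockCall,
}

impl BlockCall {
    /// Return a value that can be used directly in `write!` or `format!`.
    fn display(&self) -> DisplayBlockCall<'_> {
        DisplayBlockCall { call: self }
    }
}

impl fmt::Display for DisplayBlockCall<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "block{}", self.call.block)?;
        // Only print the parenthesized argument list when there are arguments.
        if !self.call.args.is_empty() {
            write!(f, "(")?;
            for (i, arg) in self.call.args.iter().enumerate() {
                if i > 0 {
                    write!(f, ", ")?;
                }
                write!(f, "v{}", arg)?;
            }
            write!(f, ")")?;
        }
        Ok(())
    }
}

fn main() {
    let call = BlockCall { block: 2, args: vec![0, 1] };
    println!("jump {}", call.display());
}
```

Returning a wrapper rather than a `String` lets callers format into any writer without an intermediate allocation.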
* Fix some warnings on nightly Rust
Cargo is warning about the usage of workspace dependencies where the
workspace declaration does not mention `default-features` but the
dependency mentions `default-features`, so this explicitly turns off
default features for `cranelift-codegen` at the workspace level and
removes the explicit `default-features = false` at the manifest levels.
* Explicitly enable default feature in wasmtime
* Enable another feature
In multi-terms, all matching rules fire. We treat the result as an
unordered set of values, so setting rule priorities is meaningless. We
want to prohibit relying on the rule match order in this case.
Also, codegen can produce invalid Rust if rules with different
priorities both match against a multi-term. We first documented this
symptom in #5647. As far as I can figure, prohibiting rule priorities
prevents all possible instances of that bug.
At some point in the future we might decide we want to carefully define
semantics for multi-term result ordering, at which point we can revisit
this.
This fixes the build issue identified in #5664 at the toolchain level
rather than working around it in our own build. The next step in fixing
this will be to remove the nightly override in the future when the
toolchain becomes stable.
Souper requires an `i1` condition value; we don't, and instead implicitly
check against 0. We were truncating conditions, but should actually be
comparing against `0`.
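The difference between the two lowerings can be shown with plain integers. This is a minimal sketch, assuming that truncation to `i1` keeps only the lowest bit; the function names are hypothetical and are not part of the Souper harvester.

```rust
/// Truncating to a single bit keeps only the lowest bit of the value.
fn truncated_condition(value: u64) -> bool {
    value & 1 == 1
}

/// Comparing against zero treats every nonzero value as true.
fn nonzero_condition(value: u64) -> bool {
    value != 0
}

fn main() {
    // 2 is nonzero, so a branch on it is taken, but its lowest bit is 0.
    println!("truncated: {}", truncated_condition(2));
    println!("nonzero:   {}", nonzero_condition(2));
}
```

The two agree on 0 and 1 but disagree on any nonzero even value, which is why truncation does not preserve the branch semantics.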
Remove the boolean parameters from the instruction builder functions, as they were only ever used with true. Additionally, change the returns and branches functions to imply terminates_block.
This allows the `wasmtime` binary provided in our release artifacts to
cross-compile: `wasmtime compile` can build a `.cwasm` for any platform
that Wasmtime supports, not just the host platform. This may be useful
in some deployment scenarios.
We don't turn on `all-arch` by default because it increases build time
and binary size of Wasmtime itself, and other embedders of the
`wasmtime` crate won't necessarily want this; hence, we set it only as
part of the CI build configuration.
Fixes #5655.
* Cranelift: Harvest each Souper LHS into its own file
Souper only handles one input LHS at a time, so this makes it way easier to
script. Don't need to try and parse each LHS.
* Add audit of `arrayref` version 0.3.6
* Add audit of `constant_time_eq` version 0.2.4
* Add several `WASMTIME_VERSION_*` macros to `wasmtime.h`.
* Update `scripts/publish.rs`
  * To set these macros as per the new version in `./Cargo.toml` during
    `./publish bump`.
  * To verify the macros match the version in `./Cargo.toml` during
    `./publish verify`.
Fix #5635
Fix an error introduced in #5644, where an unsigned subtraction from zero was possible with an empty Switch structure. Additionally, missing the empty case caused us to not emit a branch to the default block. This PR fixes the issue by detecting the empty Switch case early, and emitting a jump.
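The failure mode is the usual one for `len - 1` on an unsigned length. A minimal sketch of detecting the empty case first, with a hypothetical `Plan` type and `plan_switch` function standing in for the real Switch emission code:

```rust
/// Hypothetical outcome of planning a switch: either jump straight to the
/// default block, or search the cases up to index `last`.
#[derive(Debug, PartialEq)]
enum Plan {
    JumpToDefault,
    Search { last: usize },
}

/// `checked_sub` returns `None` instead of underflowing when `cases` is
/// empty, which is exactly the case that must emit a jump to the default.
fn plan_switch(cases: &[u64]) -> Plan {
    match cases.len().checked_sub(1) {
        None => Plan::JumpToDefault,
        Some(last) => Plan::Search { last },
    }
}

fn main() {
    println!("{:?}", plan_switch(&[]));
    println!("{:?}", plan_switch(&[1, 5, 9]));
}
```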
Rework the compilation strategy for switch to:
* use brif instead of brz and brnz
* generate tables inline, rather than delaying them until after the decision tree has been generated
* avoid allocating new vectors by using slices into the sorted contiguous ranges
* avoid generating some unconditional jumps
* output differences in test output using the similar crate for easier debugging
Most of these optimizations are in the egraph `cprop.isle` rules now,
making a separate crate unnecessary.
Also I think the `udiv` optimizations here are straight-up wrong (doing
signed instead of unsigned division, and panicking instead of preserving
traps on division by zero) so I'm guessing this crate isn't seriously
used anywhere.
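To illustrate the two problems named above, here is a minimal sketch of constant-folding an unsigned division. The function names are hypothetical and this is not the code from the removed crate. Folding returns `None` for a zero divisor so that the instruction, and therefore its trap, would be left in place.

```rust
/// Fold `udiv a, b` on constants. Returns `None` when `b` is zero so the
/// caller can keep the original instruction and preserve the trap.
fn fold_udiv(a: u32, b: u32) -> Option<u32> {
    a.checked_div(b)
}

/// What happens if the same bits are divided as signed integers instead.
fn signed_div_bits(a: u32, b: u32) -> Option<u32> {
    (a as i32).checked_div(b as i32).map(|q| q as u32)
}

fn main() {
    let (a, b) = (0xFFFF_FFFE_u32, 2_u32);
    // The unsigned and signed quotients differ for the same bit patterns.
    println!("unsigned: {:#x?}", fold_udiv(a, b));
    println!("signed:   {:#x?}", signed_div_bits(a, b));
    println!("by zero:  {:?}", fold_udiv(1, 0));
}
```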
At the least, bjorn3 confirms that cg_clif doesn't use this, and I've
verified that Wasmtime doesn't either.
Closes #1090.
Improve the generated code for unordered floating point comparisons by negating the comparison and inverting the branches. This allows us to pick the unordered versions, which generate significantly better code.
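The identity behind this kind of rewrite can be checked with ordinary Rust floats. This is a minimal sketch of one instance of it (unordered-or-greater-or-equal is the negation of ordered less-than), not the backend lowering itself; the function names are hypothetical.

```rust
/// "Unordered or greater-or-equal": true if either operand is NaN or a >= b.
fn unordered_or_ge(a: f64, b: f64) -> bool {
    a.is_nan() || b.is_nan() || a >= b
}

/// The same predicate expressed as the negation of an ordered comparison.
/// `a < b` is false whenever either operand is NaN, so its negation is true.
fn negated_lt(a: f64, b: f64) -> bool {
    !(a < b)
}

fn main() {
    for (a, b) in [(1.0, 2.0), (2.0, 1.0), (1.0, 1.0), (f64::NAN, 1.0)] {
        println!("{a} vs {b}: {} {}", unordered_or_ge(a, b), negated_lt(a, b));
    }
}
```

Swapping the branch targets has the same effect as applying the negation, so one comparison can serve both forms.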
* Fix libcall relocations for precompiled modules
This commit fixes some asserts and support for relocation libcalls in
precompiled modules loaded from disk. In doing so this reworks how mmaps
are managed for files from disk. All non-file-backed `Mmap` entries are
read/write but file-backed versions were readonly. This commit changes
this such that all `Mmap` objects, even if they're file-backed, start as
read/write. The file-based versions all use copy-on-write to preserve
the private-ness of the mapping.
This is not intended to change anything functionally. Some more memory
is now writable after a module is loaded, but the text section, for
example, is still left as read/execute when loading is finished.
Additionally, this makes modules compiled in memory more consistent
with modules loaded from disk.
* Update a comment
* Force images to become readonly during publish
This marks compiled images as entirely readonly during the
`CodeMemory::publish` step which happens just before the text section
becomes executable. This ensures that all images, no matter where they
come from, are guaranteed frozen before they start executing.
Nothing major pulled in here, but wanted to update to the latest
versions which enable tail calls by default. When used in Wasmtime,
however, the feature is disabled without the possibility of being
enabled since it's not implemented.
Add a conditional branch instruction with two targets: brif. This instruction will eventually replace brz and brnz, as it encompasses the behavior of both.
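A small model shows why a two-target branch covers both one-target forms. It is a sketch only: the `Block` type and these functions are hypothetical and model control flow as plain values rather than using Cranelift's instruction builder.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
struct Block(u32);

/// brz: jump to `target` when the condition is zero, otherwise fall through.
fn brz(cond: u64, target: Block) -> Option<Block> {
    (cond == 0).then_some(target)
}

/// brnz: jump to `target` when the condition is nonzero, otherwise fall through.
fn brnz(cond: u64, target: Block) -> Option<Block> {
    (cond != 0).then_some(target)
}

/// brif: always transfers control, to `then_block` if the condition is
/// nonzero and to `else_block` otherwise.
fn brif(cond: u64, then_block: Block, else_block: Block) -> Block {
    if cond != 0 { then_block } else { else_block }
}

fn main() {
    let (a, b) = (Block(1), Block(2));
    for cond in [0, 1, 7] {
        // `brz cond, a; jump b` behaves like `brif cond, b, a`.
        assert_eq!(brz(cond, a).unwrap_or(b), brif(cond, b, a));
        // `brnz cond, a; jump b` behaves like `brif cond, a, b`.
        assert_eq!(brnz(cond, a).unwrap_or(b), brif(cond, a, b));
    }
    println!("brif covers both brz and brnz");
}
```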
This PR also changes the InstructionData layout for instruction formats that hold BlockCall values, taking the same approach we use for Value arguments. This allows branch_destination to return a slice to the BlockCall values held in the instruction, rather than requiring that we pattern match on InstructionData to fetch the then/else blocks.
Function generation for fuzzing has been updated to generate uses of brif, and I've run the cranelift-fuzzgen target locally for hours without triggering any new failures.
ISLE's existing code-generation strategy doesn't generate the most
efficient matching order for rules. This PR completely replaces it.
With this PR applied, wasmtime compile retires 2% fewer instructions on
the pulldown-cmark and spidermonkey benchmarks from Sightglass.
A dev build of cranelift-codegen from an empty target/ directory takes
2% less time. The build script, invoking ISLE, takes a little longer,
but Rust can compile the generated code faster, so it balances out.
Ideally these pairs of CLIF instructions should emit a single x86
instruction, but they don't today. This test will tell us if somebody
fixes that.
Similar tests might make sense for imul/umulhi as well as signed
versions, but I haven't tried that.
This commit fixes a bug in the `bindgen!` macro for components where
previously the `param` and `result` properties weren't properly
calculated depending on the structure of the type and which types were
visited in which order. This is simplified to use a `LiveTypes`
structure from the `wit-parser` crate and relies on that to do necessary
recursion.
After #5587, this is on by default. We are retaining the traditional
(no-egraphs) path for now, selected by setting this option to `false`,
but we eventually plan to delete it assuming that we don't find serious
regressions or issues. This PR adds a deprecation notice to the option.
* Fix compile error on FreeBSD x64
* Fix compile on FreeBSD arm64
* Update Cargo.lock for ittapi
* vet: certify diff for ittapi libraries
Co-authored-by: Andrew Brown <andrew.brown@intel.com>
The check needs to verify that the maximum number of memories is
precisely one to ensure that multi-memory is disabled yet modules can
still have up to one memory as configured in the pooling allocator.
This doesn't fully update tokio since the update to the latest version
has quite a few changes I'd prefer to not audit at the moment, but it
updates to a patched version.
This PR follows up on #5382 and #5391, which rebuilt the egraph-based optimization framework to be more performant, by enabling it by default.
Based on performance results in #5382 (my measurements on SpiderMonkey and bjorn3's independent confirmation with cg_clif), it seems that this is reasonable to enable. Now that we have been fuzzing compiler configurations with egraph opts (#5388) for 6 weeks, having fixed a few fuzzbugs that came up (#5409, #5420, #5438) and subsequently received no further reports from OSS-Fuzz, I believe it is stable enough to rely on.
This PR enables `use_egraphs`, and also normalizes its meaning: previously it forced optimization (it basically meant "turn on the egraph optimization machinery"), now it runs egraph opts if the opt level indicates (it means "use egraphs to optimize if we are going to optimize"). The conditionals in the top-level pass driver are a little subtle, but will get simpler once we can remove the non-egraph path (which we plan to do eventually!).
Fixes #5181.
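The change in meaning of `use_egraphs` described above can be summarized as two predicates. This is a minimal sketch: the `OptLevel` enum and both function names are hypothetical stand-ins for this example, and the real pass driver has more conditions than shown here.

```rust
/// Hypothetical stand-in for the optimization level setting.
#[derive(Clone, Copy, PartialEq)]
enum OptLevel {
    None,
    Speed,
    SpeedAndSize,
}

/// Old meaning: the flag alone turned on the egraph machinery,
/// regardless of the optimization level.
fn runs_egraph_opts_old(use_egraphs: bool, _opt_level: OptLevel) -> bool {
    use_egraphs
}

/// New meaning: egraphs are used only if optimization is requested at all.
fn runs_egraph_opts_new(use_egraphs: bool, opt_level: OptLevel) -> bool {
    use_egraphs && opt_level != OptLevel::None
}

fn main() {
    for level in [OptLevel::None, OptLevel::Speed, OptLevel::SpeedAndSize] {
        println!(
            "old: {}, new: {}",
            runs_egraph_opts_old(true, level),
            runs_egraph_opts_new(true, level)
        );
    }
}
```

The two predicates differ only when the flag is set and no optimization is requested, which is the case that makes enabling the flag by default safe for unoptimized builds.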
* Support mergeable-but-side-effectful (idempotent) operations in general in the egraph's GVN.
This mirrors the similar change made in #5534.
* Add tests for egraph case.