Also move these optimization rules to cprop.isle; it's where all the
other similar rules are.
Like the other cprop rules, these can subsume any other rules. We can't
do better than reducing an expression to a constant.
The new i64_sextend_imm64 and u64_uextend_imm64 constructors are useful
helpers to clean up other code. I applied them to `imm64_icmp` while I
was here, as well as using the existing `ty_mask` helper to clean up
`imm64_masked`.
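As a rough illustration of what these helpers compute, here is a standalone Rust sketch. It is only an analogue: the real helpers are ISLE constructors that take a Cranelift `Type`, while this sketch takes a bit width (assumed to be 1 to 64), and `icmp_slt`/`icmp_ult` are hypothetical names.

```rust
/// Mask covering the low `bits` bits (analogue of `ty_mask`).
fn ty_mask(bits: u32) -> u64 {
    assert!(bits >= 1 && bits <= 64);
    u64::MAX >> (64 - bits)
}

/// Zero-extend the low `bits` bits of `imm` to 64 bits.
fn u64_uextend_imm64(bits: u32, imm: i64) -> u64 {
    (imm as u64) & ty_mask(bits)
}

/// Sign-extend the low `bits` bits of `imm` to 64 bits.
fn i64_sextend_imm64(bits: u32, imm: i64) -> i64 {
    assert!(bits >= 1 && bits <= 64);
    let shift = 64 - bits;
    // Move the type's sign bit to bit 63, then arithmetic-shift back down.
    (imm << shift) >> shift
}

/// Truncate an immediate to the type width (the role of `imm64_masked`).
fn imm64_masked(bits: u32, imm: u64) -> u64 {
    imm & ty_mask(bits)
}

/// Hypothetical signed and unsigned comparisons built on the helpers.
fn icmp_slt(bits: u32, x: i64, y: i64) -> bool {
    i64_sextend_imm64(bits, x) < i64_sextend_imm64(bits, y)
}

fn icmp_ult(bits: u32, x: i64, y: i64) -> bool {
    u64_uextend_imm64(bits, x) < u64_uextend_imm64(bits, y)
}
```

For an 8-bit type, `0xFF` is `-1` when compared signed but `255` when compared unsigned, which is exactly the distinction `imm64_icmp` needs.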
* Cranelift: Introduce the `tail` calling convention
This is an unstable-ABI calling convention that we will eventually use to
support Wasm tail calls.
Co-Authored-By: Jamey Sharp <jsharp@fastly.com>
* Cranelift: Introduce the `return_call` and `return_call_indirect` instructions
These will be used to implement tail calls for Wasm and any other language
targeting CLIF. The `return_call_indirect` instruction differs from the Wasm
instruction of the same name by taking a native address callee rather than a
Wasm function index.
Co-Authored-By: Jamey Sharp <jsharp@fastly.com>
* Cranelift: Implement verification rules for `return_call[_indirect]`
They must:
* have the same return types between the caller and callee,
* have the same calling convention between caller and callee,
* and that calling convention must support tail calls.
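A minimal sketch of these three checks, using simplified stand-in types rather than Cranelift's real `Signature`/`CallConv`. The `Ty` enum, the error enum, and the assumption that only the `Tail` variant supports tail calls are all hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Ty { I32, I64 }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CallConv { SystemV, Tail }

impl CallConv {
    /// Assumption for this sketch: only `Tail` supports tail calls.
    fn supports_tail_calls(self) -> bool {
        matches!(self, CallConv::Tail)
    }
}

#[derive(Clone, Debug, PartialEq, Eq)]
struct Signature { returns: Vec<Ty>, call_conv: CallConv }

#[derive(Debug, PartialEq, Eq)]
enum TailCallError { ReturnTypeMismatch, CallConvMismatch, UnsupportedCallConv }

/// Check a `return_call` from `caller` to `callee` against the three rules.
fn verify_return_call(caller: &Signature, callee: &Signature) -> Result<(), TailCallError> {
    if caller.returns != callee.returns {
        return Err(TailCallError::ReturnTypeMismatch);
    }
    if caller.call_conv != callee.call_conv {
        return Err(TailCallError::CallConvMismatch);
    }
    // Both conventions are equal here, so checking the caller's suffices.
    if !caller.call_conv.supports_tail_calls() {
        return Err(TailCallError::UnsupportedCallConv);
    }
    Ok(())
}
```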
Co-Authored-By: Jamey Sharp <jsharp@fastly.com>
* cargo fmt
---------
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
This rewrite was introduced in #5676 and then reverted in #5682 due to a footgun
where we accidentally weren't actually checking the `y == !z` precondition. This
commit fixes the precondition check. It also fixes the arithmetic to be
correctly masked to the value type's width.
This reverts commit 268f6bfc1d.
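The rewrite itself is not spelled out here, so the following sketch only illustrates why the `y == !z` check has to be masked. It assumes constants are held in a 64-bit word regardless of the value type. `is_bnot_of` is a hypothetical helper, not the actual ISLE code.

```rust
/// Mask covering the low `bits` bits of a 64-bit word.
fn ty_mask(bits: u32) -> u64 {
    assert!(bits >= 1 && bits <= 64);
    u64::MAX >> (64 - bits)
}

/// Hypothetical check: is `y` the bitwise NOT of `z` within a `bits`-wide type?
/// Without the mask, `!z` also flips the unused high bits of the 64-bit word.
fn is_bnot_of(bits: u32, y: u64, z: u64) -> bool {
    let mask = ty_mask(bits);
    (y & mask) == (!z & mask)
}
```

For an `i8`, `0xF0` is the bitwise NOT of `0x0F`, yet an unmasked comparison against `!0x0F_u64` (`0xFFFF_FFFF_FFFF_FFF0`) fails.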
Add a display method to BlockCall that returns a value implementing `std::fmt::Display`. Rework the display code in the write module of cranelift-codegen to use this method instead.
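The usual Rust pattern for this is a small adapter struct that borrows both the value and whatever context it needs (here, the pool holding the block arguments) and implements `Display`. The sketch below uses simplified, hypothetical `BlockCall` and `ValuePool` types. It is not cranelift-codegen's actual layout.

```rust
use std::fmt;

/// Hypothetical stand-in for the pool that stores block-call arguments.
struct ValuePool { values: Vec<u32> }

/// Hypothetical BlockCall: a block number plus a range of arguments in the pool.
struct BlockCall { block: u32, args_start: usize, args_len: usize }

/// Adapter that borrows the call and the pool so it can be formatted later.
struct DisplayBlockCall<'a> { call: &'a BlockCall, pool: &'a ValuePool }

impl BlockCall {
    fn display<'a>(&'a self, pool: &'a ValuePool) -> DisplayBlockCall<'a> {
        DisplayBlockCall { call: self, pool }
    }
}

impl fmt::Display for DisplayBlockCall<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "block{}", self.call.block)?;
        let start = self.call.args_start;
        let args = &self.pool.values[start..start + self.call.args_len];
        if !args.is_empty() {
            write!(f, "(")?;
            for (i, v) in args.iter().enumerate() {
                if i > 0 {
                    write!(f, ", ")?;
                }
                write!(f, "v{}", v)?;
            }
            write!(f, ")")?;
        }
        Ok(())
    }
}
```

Because the adapter only borrows, callers can write `write!(w, "{}", call.display(&pool))` without building an intermediate `String`.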
* Fix some warnings on nightly Rust
Cargo is warning about the usage of workspace dependencies where the
workspace declaration does not mention `default-features` but the
dependency mentions `default-features`, so this explicitly turns off
default features for `cranelift-codegen` at the workspace level and
removes the explicit `default-features = false` at the manifest levels.
* Explicitly enable default feature in wasmtime
* Enable another feature
In multi-terms, all matching rules fire. We treat the result as an
unordered set of values, so setting rule priorities is meaningless. We
want to prohibit relying on the rule match order in this case.
Also, codegen can produce invalid Rust if rules with different
priorities both match against a multi-term. We first documented this
symptom in #5647. As far as I can figure, prohibiting rule priorities
prevents all possible instances of that bug.
At some point in the future we might decide we want to carefully define
semantics for multi-term result ordering, at which point we can revisit
this.
Souper requires an `i1` condition value; we don't, and instead implicitly check
against 0. We were truncating conditions but should actually be comparing them
against `0`.
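The difference is easy to see on a plain integer. Truncation keeps only the low bit, so an even nonzero condition such as `2` wrongly becomes false. Comparing against zero matches CLIF's "nonzero is true" rule. Both function names are hypothetical.

```rust
/// Old behavior: truncate the condition to one bit (keeps only bit 0).
fn cond_by_truncation(c: u64) -> bool {
    c & 1 == 1
}

/// Fixed behavior: any nonzero value counts as true, as in CLIF branches.
fn cond_by_compare_zero(c: u64) -> bool {
    c != 0
}
```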
Remove the boolean parameters from the instruction builder functions, as they were only ever used with `true`. Additionally, change the `returns` and `branches` functions to imply `terminates_block`.
* Cranelift: Harvest each Souper LHS into its own file
Souper only handles one input LHS at a time, so this makes it much easier to
script: we don't need to try to parse each LHS.
* Add audit of `arrayref` version 0.3.6
* Add audit of `constant_time_eq` version 0.2.4
Fix an error introduced in #5644, where an unsigned subtraction from zero was possible with an empty Switch structure. Additionally, missing the empty case caused us to not emit a branch to the default block. This PR fixes the issue by detecting the empty Switch case early, and emitting a jump.
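A minimal sketch of the early check, assuming switch entries are `(index, block)` pairs. The `Lowered` enum and `lower_switch` are hypothetical names. The point is that `entries.len() - 1` underflows on an empty slice (a panic in debug builds), so the empty case must be handled first and must still branch to the default block.

```rust
#[derive(Debug, PartialEq)]
enum Lowered {
    /// Unconditional jump to the given (default) block.
    Jump(u32),
    /// Build a decision tree over the entries; `last_index` is the final entry.
    DecisionTree { last_index: usize },
}

fn lower_switch(entries: &[(u64, u32)], default_block: u32) -> Lowered {
    // Handle the empty switch before any `len() - 1` arithmetic.
    if entries.is_empty() {
        return Lowered::Jump(default_block);
    }
    Lowered::DecisionTree { last_index: entries.len() - 1 }
}
```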
Rework the compilation strategy for switch to:
* use brif instead of brz and brnz
* generate tables inline, rather than delaying them until after the decision tree has been generated
* avoid allocating new vectors by using slices into the sorted contiguous ranges
* avoid generating some unconditional jumps
* output differences in test output using the `similar` crate for easier debugging
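The "slices into the sorted contiguous ranges" idea can be sketched as follows. Given entries sorted by index, each run of consecutive indices is returned as a borrowed subslice, so no per-run `Vec` is allocated. This is an illustrative helper, not the actual `Switch` code, and the `(index, block)` entry type is an assumption.

```rust
/// Split sorted `(index, block)` entries into runs of consecutive indices.
/// Each run is a subslice of `entries`; only the outer Vec is allocated.
fn contiguous_ranges(entries: &[(u64, u32)]) -> Vec<&[(u64, u32)]> {
    let mut ranges = Vec::new();
    let mut start = 0;
    for i in 1..=entries.len() {
        // A run ends at the end of input or when the next index isn't prev + 1.
        let run_ends = i == entries.len()
            || entries[i - 1].0.checked_add(1) != Some(entries[i].0);
        if run_ends {
            ranges.push(&entries[start..i]);
            start = i;
        }
    }
    ranges
}
```

Each returned run can then be emitted either as a jump table or as a short chain of `brif` comparisons, while the decision tree compares against the runs' boundary indices.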
Most of these optimizations are in the egraph `cprop.isle` rules now,
making a separate crate unnecessary.
Also I think the `udiv` optimizations here are straight-up wrong (doing
signed instead of unsigned division, and panicking instead of preserving
traps on division by zero) so I'm guessing this crate isn't seriously
used anywhere.
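For reference, a correct constant fold of `udiv` has to mask both operands to the type width, divide them as unsigned integers, and refuse to fold when the divisor is zero so the trapping instruction is preserved. The sketch below is illustrative only. `fold_udiv` is a hypothetical name and the bit width stands in for a Cranelift type.

```rust
/// Constant-fold `udiv` for a `bits`-wide integer type (1..=64).
/// Returns None for a zero divisor so the original, trapping `udiv` is kept.
fn fold_udiv(bits: u32, x: u64, y: u64) -> Option<u64> {
    assert!(bits >= 1 && bits <= 64);
    let mask = u64::MAX >> (64 - bits);
    let (x, y) = (x & mask, y & mask);
    if y == 0 {
        return None;
    }
    // Unsigned division on the masked operands.
    Some(x / y)
}
```

For an 8-bit type, `0xFF / 2` is `0x7F` when treated as unsigned, whereas signed division computes `-1 / 2 == 0`.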
At the least, bjorn3 confirms that cg_clif doesn't use this, and I've
verified that Wasmtime doesn't either.
Closes #1090.
Improve the generated code for unordered floating point comparisons by negating the comparison and inverting the branches. This allows us to pick the unordered versions, which generate significantly better code.
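The transformation relies on a simple identity. Negating an ordered comparison yields the complementary unordered one, for example `!(a < b)` is "a >= b or unordered". So a branch on one condition can be replaced by a branch on the other with the two targets swapped. The function names below are hypothetical.

```rust
/// Ordered less-than: false if either operand is NaN.
fn lt(a: f64, b: f64) -> bool {
    a < b
}

/// Unordered greater-or-equal: true if a >= b or either operand is NaN.
fn uge(a: f64, b: f64) -> bool {
    a >= b || a.is_nan() || b.is_nan()
}

/// Pick a branch target from a condition.
fn branch_target(cond: bool, then_blk: u32, else_blk: u32) -> u32 {
    if cond { then_blk } else { else_blk }
}
```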
Nothing major pulled in here, but wanted to update to the latest
versions which enable tail calls by default. When used in Wasmtime,
however, the feature is disabled without the possibility of being
enabled since it's not implemented.
Add a conditional branch instruction with two targets: brif. This instruction will eventually replace brz and brnz, as it encompasses the behavior of both.
This PR also changes the InstructionData layout for instruction formats that hold BlockCall values, taking the same approach we use for Value arguments. This allows branch_destination to return a slice to the BlockCall values held in the instruction, rather than requiring that we pattern match on InstructionData to fetch the then/else blocks.
Function generation for fuzzing has been updated to generate uses of brif, and I've run the cranelift-fuzzgen target locally for hours without triggering any new failures.
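To see why `brif` subsumes both older forms, model `brz`/`brnz` as "branch to `target` if the condition is zero/nonzero, otherwise continue at `next`" (the fallthrough `jump`). `brif c, then, else` goes to `then` when `c` is nonzero. Converting then amounts to ordering the two targets. These enums are simplified stand-ins, not Cranelift's `InstructionData`.

```rust
/// Old-style branches: taken to `target`, otherwise continue at `next`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum OldBranch {
    Brz { target: u32, next: u32 },  // taken when cond == 0
    Brnz { target: u32, next: u32 }, // taken when cond != 0
}

/// `brif`: go to `then_block` when cond != 0, else to `else_block`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Brif { then_block: u32, else_block: u32 }

fn to_brif(b: OldBranch) -> Brif {
    match b {
        OldBranch::Brz { target, next } => Brif { then_block: next, else_block: target },
        OldBranch::Brnz { target, next } => Brif { then_block: target, else_block: next },
    }
}

fn eval_old(b: OldBranch, cond: u64) -> u32 {
    match b {
        OldBranch::Brz { target, next } => if cond == 0 { target } else { next },
        OldBranch::Brnz { target, next } => if cond != 0 { target } else { next },
    }
}

fn eval_brif(b: Brif, cond: u64) -> u32 {
    if cond != 0 { b.then_block } else { b.else_block }
}
```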
ISLE's existing code-generation strategy doesn't generate the most
efficient matching order for rules. This PR completely replaces it.
With this PR applied, `wasmtime compile` retires 2% fewer instructions on
the pulldown-cmark and spidermonkey benchmarks from Sightglass.
A dev build of cranelift-codegen from an empty target/ directory takes
2% less time. The build script, invoking ISLE, takes a little longer,
but Rust can compile the generated code faster, so it balances out.
Ideally these pairs of CLIF instructions should emit a single x86
instruction, but they don't today. This test will tell us if somebody
fixes that.
Similar tests might make sense for imul/umulhi as well as signed
versions, but I haven't tried that.
This PR follows up on #5382 and #5391, which rebuilt the egraph-based optimization framework to be more performant, by enabling it by default.
Based on performance results in #5382 (my measurements on SpiderMonkey and bjorn3's independent confirmation with cg_clif), it seems that this is reasonable to enable. Now that we have been fuzzing compiler configurations with egraph opts (#5388) for 6 weeks, having fixed a few fuzzbugs that came up (#5409, #5420, #5438) and subsequently received no further reports from OSS-Fuzz, I believe it is stable enough to rely on.
This PR enables `use_egraphs`, and also normalizes its meaning: previously it forced optimization (it basically meant "turn on the egraph optimization machinery"), now it runs egraph opts if the opt level indicates (it means "use egraphs to optimize if we are going to optimize"). The conditionals in the top-level pass driver are a little subtle, but will get simpler once we can remove the non-egraph path (which we plan to do eventually!).
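The new meaning of the flag can be summarized as a small decision function. `OptLevel` mirrors Cranelift's `none`/`speed`/`speed_and_size` levels. The `Optimizer` enum and `choose_optimizer` are hypothetical names for this sketch, not the actual pass driver.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum OptLevel { None, Speed, SpeedAndSize }

#[derive(Debug, PartialEq)]
enum Optimizer { Skip, Egraph, NonEgraph }

/// `use_egraphs` now only selects *how* to optimize; the opt level decides *whether*.
fn choose_optimizer(opt_level: OptLevel, use_egraphs: bool) -> Optimizer {
    if opt_level == OptLevel::None {
        Optimizer::Skip
    } else if use_egraphs {
        Optimizer::Egraph
    } else {
        Optimizer::NonEgraph
    }
}
```

Under the old meaning, `use_egraphs` would have forced egraph optimization even at `OptLevel::None`.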
Fixes #5181.
* Support mergeable-but-side-effectful (idempotent) operations in general in the egraph's GVN.
This mirrors the similar change made in #5534.
* Add tests for egraph case.
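A rough sketch of the idea: value numbering keys instructions on `(opcode, args)`. Pure operations and idempotent side-effectful ones (here, a hypothetical trap-if-zero check) can reuse an earlier identical instruction, while ordinary side effects such as stores never merge. The sketch assumes a single straight-line block, so every earlier instruction dominates later ones. A real GVN needs a dominator-scoped map instead.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Op { Iadd, TrapIfZero, Store }

impl Op {
    fn is_pure(self) -> bool {
        matches!(self, Op::Iadd)
    }
    /// Side-effectful, but running it twice on the same inputs equals running it once.
    fn is_mergeable_side_effect(self) -> bool {
        matches!(self, Op::TrapIfZero)
    }
}

struct Inst { op: Op, args: Vec<u32> }

/// For each instruction, return the index of the instruction it can be replaced by.
fn gvn(insts: &[Inst]) -> Vec<usize> {
    let mut seen: HashMap<(Op, Vec<u32>), usize> = HashMap::new();
    let mut canon = Vec::with_capacity(insts.len());
    for (i, inst) in insts.iter().enumerate() {
        if inst.op.is_pure() || inst.op.is_mergeable_side_effect() {
            // Reuse the first identical instruction, or record this one.
            let rep = *seen.entry((inst.op, inst.args.clone())).or_insert(i);
            canon.push(rep);
        } else {
            canon.push(i); // ordinary side effects are never merged
        }
    }
    canon
}
```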
* Adding in the foundations for Winch `filetests`
This commit adds two new crates into the Winch workspace:
`filetests` and `test-macros`. The intent is to mimic the
structure of Cranelift `filetests`, but in a simpler way.
* Updates to documentation
This commit adds a high-level document outlining how to test Winch
through the `winch-tools` utility. It also updates some inline
documentation which gets propagated to the CLI.
* Updating test-macro to use a glob instead of only a flat directory