* Cranelift: remove non-egraphs optimization pipeline and `use_egraphs` option.
This PR removes the LICM, GVN, and preopt passes, and associated support
pieces, from `cranelift-codegen`. Not to worry, we still have
optimizations: the egraph framework subsumes all of these, and has been
on by default since #5181.
A few decision points:
- Filetests for the legacy LICM, GVN and simple_preopt were removed too.
As we built optimizations in the egraph framework we wrote new tests
for the equivalent functionality, and many of the old tests were
testing specific behaviors in the old implementations that may not be
relevant anymore. However if folks prefer I could take a different
approach here and try to port over all of the tests.
- The corresponding filetest modes (commands) were deleted too. The
`test alias_analysis` mode remains, but no longer invokes a separate
GVN first (since there is no separate GVN that will not also do alias
analysis) so the tests were tweaked slightly to work with that. The
egrpah testsuite also covers alias analysis.
- The `divconst_magic_numbers` module is removed since it's unused
without `simple_preopt`, though this is the one remaining optimization
we still need to build in the egraphs framework, pending #5908. The
magic numbers will live forever in git history so removing this in the
meantime is not a major issue IMHO.
- The `use_egraphs` setting itself was removed at both the Cranelift and
Wasmtime levels. It has been marked deprecated for a few releases now
(Wasmtime 6.0, 7.0, upcoming 8.0, and corresponding Cranelift
versions) so I think this is probably OK. As an alternative if anyone
feels strongly, we could leave the setting and make it a no-op.
* Update test outputs for remaining test differences.
* Restrict the types for isplit and iconcat to match backends
* Admit unimplemented bitwidths to isplit/iconcat
* Modify the NarrowInt type instead of shadowing it
* Fix filetest failures
* x64: Take SIGFPE signals for divide traps
Prior to this commit Wasmtime would configure `avoid_div_traps=true`
unconditionally for Cranelift. This, for the division-based
instructions, would change emitted code to explicitly trap on trap
conditions instead of letting the `div` x86 instruction trap.
There's no specific reason for Wasmtime, however, to specifically avoid
traps in the `div` instruction. This means that the extra generated
branches on x86 aren't necessary since the `div` and `idiv` instructions
already trap for similar conditions as wasm requires.
This commit instead disables the `avoid_div_traps` setting for
Wasmtime's usage of Cranelift. Subsequently the codegen rules were
updated slightly:
* When `avoid_div_traps=true`, traps are no longer emitted for `div`
instructions.
* The `udiv`/`urem` instructions now list their trap as divide-by-zero
instead of integer overflow.
* The lowering for `sdiv` was updated to still explicitly check for zero
but the integer overflow case is deferred to the instruction itself.
* The lowering of `srem` no longer checks for zero and the listed trap
for the `div` instruction is a divide-by-zero.
This means that the codegen for `udiv` and `urem` no longer have any
branches. The codegen for `sdiv` removes one branch but keeps the
zero-check to differentiate the two kinds of traps. The codegen for
`srem` removes one branch but keeps the -1 check since the semantics of
`srem` mismatch with the semantics of `idiv` with a -1 divisor
(specifically for INT_MIN).
This is unlikely to have really all that much of a speedup but was
something I noticed during #6008 which seemed like it'd be good to clean
up. Plus Wasmtime's signal handling was already set up to catch
`SIGFPE`, it was just never firing.
* Remove the `avoid_div_traps` cranelift setting
With no known users currently removing this should be possible and helps
simplify the x64 backend.
* x64: GC more support for avoid_div_traps
Remove the `validate_sdiv_divisor*` pseudo-instructions and clean up
some of the ISLE rules now that `div` is allowed to itself trap
unconditionally.
* x64: Store div trap code in instruction itself
* Keep divisors in registers, not in memory
Don't accidentally fold multiple traps together
* Handle EXC_ARITHMETIC on macos
* Update emit tests
* Update winch and tests
* Add narrower and wider constraints to the instruction DSL
* Add docs to narrower/wider operands
* Update cranelift/codegen/meta/src/cdsl/instructions.rs
Co-authored-by: Jamey Sharp <jamey@minilop.net>
* Fix assertion message
* Simplify upper bounds for the wider constraint
* Remove additional unnecessary cases in the verifier
* Remove unused variables
* Remove changes to is_ctrl_typevar_candidate
These changes were only necessary when the type returned by an
instruction was a variable constrained by narrow or widen. As we have
switched to requiring that constraints must appear on argument types and
not return types, these changes were not longer necessary.
---------
Co-authored-by: Jamey Sharp <jamey@minilop.net>
* Change CLIF `shuffle` to validate lane indices
Previously the CLIF `shuffle` instruction did not perform any validation
on the lane shuffle mask and specified that out-of-bounds lanes always
returned 0 as the value. This behavior though is not required by
WebAssembly which validates that lane indices are always in-bounds.
Additionally since these are static immediates even other code
generators should be able to verify that the immediates are in-bounds.
As a result this commit updates the definition of the `shuffle`
instruction to specify that all byte immediates must be in-bounds in the
range of [0, 32). The verifier has been updated and some test cases have
been removed that were testing this functionality.
Closes#5989
* Only generate valid shuffle immediates in fuzzer
We've adopted this pattern in Cranelift's instruction definitions where
we let-bind some calls to `Operand::new` and then later use them in one
or more calls to `Inst::new`.
That pattern has two problems:
- It puts the type of each operand somewhere potentially far removed
from the instruction in which it's used.
- We let-bind the same name for many different operands, compounding the
first problem by making it harder to find _which_ definition is used.
So instead this commit removes all let-bindings for operand definitions
and constructs a new `Operand` every time.
Constructing an `Operand` at every use means we duplicate some
documentation strings, but not all that many of them as it turns out.
I've left the let-bound type-sets alone, so those are currently still
shared across many instructions. They have some of the same problems and
should be reviewed as well.
* Remove the Cranelift `vselect` instruction
This instruction is documented as selecting lanes based on the "truthy"
value of the condition lane, but the current status of the
implementation of this instruction is:
* x64 - uses the high bit for `f32x4` and `f64x2` and otherwise uses the
high bit of each byte doing a byte-wise lane select rather than
whatever the controlling type is.
* AArch64 - this is the same as `bitselect` which is a bit-wise
selection rather than a lane-wise selection.
* s390x - this is the same as AArch64, a bit-wise selection rather than
lane-wise.
* interpreter - the interpreter implements the documented semantics of
selecting based on "truthy" values.
Coupled with the status of the implementation is the fact that this
instruction is not used by WebAssembly SIMD today either. The only use
of this instruction in Cranelift is the nan-canonicalization pass. By
moving nan-canonicalization to `bitselect`, since that has the desired
semantics, there's no longer any need for `vselect`.
Given this situation this commit subsqeuently removes `vselect` and all
usage of it throughout Cranelift.
Closes#5917
* Review comments
* Bring back vselect opts as bitselect opts
* Clean up vselect usage in the interpreter
* Move bitcast in nan canonicalization
* Add a comment about float optimization
* Initial support for the Relaxed SIMD proposal
This commit adds initial scaffolding and support for the Relaxed SIMD
proposal for WebAssembly. Codegen support is supported on the x64 and
AArch64 backends on this time.
The purpose of this commit is to get all the boilerplate out of the way
in terms of plumbing through a new feature, adding tests, etc. The tests
are copied from the upstream repository at this time while the
WebAssembly/testsuite repository hasn't been updated.
A summary of changes made in this commit are:
* Lowerings for all relaxed simd opcodes have been added, currently all
exhibiting deterministic behavior. This means that few lowerings are
optimal on the x86 backend, but on the AArch64 backend, for example,
all lowerings should be optimal.
* Support is added to codegen to, eventually, conditionally generate
different code based on input codegen flags. This is intended to
enable codegen to more efficient instructions on x86 by default, for
example, while still allowing embedders to force
architecture-independent semantics and behavior. One good example of
this is the `f32x4.relaxed_fmadd` instruction which when deterministic
forces the `fma` instruction, but otherwise if the backend doesn't
have support for `fma` then intermediate operations are performed
instead.
* Lowerings of `iadd_pairwise` for `i16x8` and `i32x4` were added to the
x86 backend as they're now exercised by the deterministic lowerings of
relaxed simd instructions.
* Sample codegen tests for added for x86 and aarch64 for some relaxed
simd instructions.
* Wasmtime embedder support for the relaxed-simd proposal and forcing
determinism have been added to `Config` and the CLI.
* Support has been added to the `*.wast` runtime execution for the
`(either ...)` matcher used in the relaxed-simd proposal.
* Tests for relaxed-simd are run both with a default `Engine` as well as
a "force deterministic" `Engine` to test both configurations.
* All tests from the upstream repository were copied into Wasmtime.
These tests should be deleted when WebAssembly/testsuite is updated.
* x64: Add x86-specific lowerings for relaxed simd
This commit builds on the prior commit and adds an array of `x86_*`
instructions to Cranelift which have semantics that match their
corresponding x86 equivalents. Translation for relaxed simd is then
additionally updated to conditionally generate different CLIF for
relaxed simd instructions depending on whether the target is x86 or not.
This means that for AArch64 no changes are made but for x86 most relaxed
instructions now lower to some x86-equivalent with slightly different
semantics than the "deterministic" lowering.
* Add libcall support for fma to Wasmtime
This will be required to implement the `f32x4.relaxed_madd` instruction
(and others) when an x86 host doesn't specify the `has_fma` feature.
* Ignore relaxed-simd tests on s390x and riscv64
* Enable relaxed-simd tests on s390x
* Update cranelift/codegen/meta/src/shared/instructions.rs
Co-authored-by: Andrew Brown <andrew.brown@intel.com>
* Add a FIXME from review
* Add notes about deterministic semantics
* Don't default `has_native_fma` to `true`
* Review comments and rebase fixes
---------
Co-authored-by: Andrew Brown <andrew.brown@intel.com>
This instruction is only defined with i8x16 inputs and outputs so
there's no need for a type variable, so shadow the otherwise-generic `a`
result with a concrete i8x16 type.
This was added for the wasm SIMD proposal but I've been poking around at
this recently and the instruction can instead be represented by its
component parts with the same semantics I believe. This commit removes
the instruction and instead represents it with the existing
`iadd_pairwise` instruction (among others) and updates backends to with
new pattern matches to have the same codegen as before.
This interestingly entirely removed the codegen rule with no replacement
on the AArch64 backend as the existing rules all existed to produce the
same codegen.
Rework br_table to use BlockCall, allowing us to avoid adding new nodes during ssa construction to hold block arguments. Additionally, many places where we previously matched on InstructionData to extract branch destinations can be replaced with a use of branch_destination or branch_destination_mut.
* Cranelift: Introduce the `tail` calling convention
This is an unstable-ABI calling convention that we will eventually use to
support Wasm tail calls.
Co-Authored-By: Jamey Sharp <jsharp@fastly.com>
* Cranelift: Introduce the `return_call` and `return_call_indirect` instructions
These will be used to implement tail calls for Wasm and any other language
targeting CLIF. The `return_call_indirect` instruction differs from the Wasm
instruction of the same name by taking a native address callee rather than a
Wasm function index.
Co-Authored-By: Jamey Sharp <jsharp@fastly.com>
* Cranelift: Implement verification rules for `return_call[_indirect]`
They must:
* have the same return types between the caller and callee,
* have the same calling convention between caller and callee,
* and that calling convention must support tail calls.
Co-Authored-By: Jamey Sharp <jsharp@fastly.com>
* cargo fmt
---------
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
Remove the boolean parameters from the instruction builder functions, as they were only ever used with true. Additionally, change the returns and branches functions to imply terminates_block.
Add a conditional branch instruction with two targets: brif. This instruction will eventually replace brz and brnz, as it encompasses the behavior of both.
This PR also changes the InstructionData layout for instruction formats that hold BlockCall values, taking the same approach we use for Value arguments. This allows branch_destination to return a slice to the BlockCall values held in the instruction, rather than requiring that we pattern match on InstructionData to fetch the then/else blocks.
Function generation for fuzzing has been updated to generate uses of brif, and I've run the cranelift-fuzzgen target locally for hours without triggering any new failures.
This PR follows up on #5382 and #5391, which rebuilt the egraph-based optimization framework to be more performant, by enabling it by default.
Based on performance results in #5382 (my measurements on SpiderMonkey and bjorn3's independent confirmation with cg_clif), it seems that this is reasonable to enable. Now that we have been fuzzing compiler configurations with egraph opts (#5388) for 6 weeks, having fixed a few fuzzbugs that came up (#5409, #5420, #5438) and subsequently received no further reports from OSS-Fuzz, I believe it is stable enough to rely on.
This PR enables `use_egraphs`, and also normalizes its meaning: previously it forced optimization (it basically meant "turn on the egraph optimization machinery"), now it runs egraph opts if the opt level indicates (it means "use egraphs to optimize if we are going to optimize"). The conditionals in the top-level pass driver are a little subtle, but will get simpler once we can remove the non-egraph path (which we plan to do eventually!).
Fixes#5181.
Add a new type BlockCall that represents the pair of a block name with arguments to be passed to it. (The mnemonic here is that it looks a bit like a function call.) Rework the implementation of jump, brz, and brnz to use BlockCall instead of storing the block arguments as varargs in the instruction's ValueList.
To ensure that we're processing block arguments from BlockCall values in instructions, three new functions have been introduced on DataFlowGraph that both sets of arguments:
inst_values - returns an iterator that traverses values in the instruction and block arguments
map_inst_values - applies a function to each value in the instruction and block arguments
overwrite_inst_values - overwrite all values in an instruction and block arguments with values from the iterator
Co-authored-by: Jamey Sharp <jamey@minilop.net>
* Switch duplicate loads w/ dynamic memories test to `min_size = 0`
This test was accidentally hitting a special case for bounds checks for when we
know that `offset + access_size < min_size` and can skip some steps. This
commit changes the `min_size` of the memory to zero so that we are forced to do
fully general bounds checks.
* Cranelift: Mark `uadd_overflow_trap` as okay for GVN
Although this improves our test sequence for duplicate loads with dynamic
memories, it unfortunately doesn't have any effect on sightglass benchmarks:
```
instantiation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm
No difference in performance.
[34448 35607.23 37158] gvn_uadd_overflow_trap.so
[34566 35734.05 36585] main.so
instantiation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm
No difference in performance.
[44101 60449.62 92712] gvn_uadd_overflow_trap.so
[44011 60436.37 92690] main.so
instantiation :: instructions-retired :: benchmarks/bz2/benchmark.wasm
No difference in performance.
[35595 36675.72 38153] gvn_uadd_overflow_trap.so
[35440 36670.42 37993] main.so
compilation :: instructions-retired :: benchmarks/bz2/benchmark.wasm
No difference in performance.
[17370195 17405125.62 17471222] gvn_uadd_overflow_trap.so
[17369324 17404859.43 17470725] main.so
execution :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm
No difference in performance.
[7055720520 7055886880.32 7056265930] gvn_uadd_overflow_trap.so
[7055719554 7055843809.33 7056193289] main.so
compilation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm
No difference in performance.
[683589861 683767276.00 684098366] gvn_uadd_overflow_trap.so
[683590024 683767998.02 684097885] main.so
execution :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm
No difference in performance.
[46436883 46437135.10 46437823] gvn_uadd_overflow_trap.so
[46436883 46437087.67 46437785] main.so
compilation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm
No difference in performance.
[126522461 126565812.58 126647044] gvn_uadd_overflow_trap.so
[126522176 126565757.75 126647522] main.so
execution :: instructions-retired :: benchmarks/bz2/benchmark.wasm
No difference in performance.
[653010531 653010533.03 653010544] gvn_uadd_overflow_trap.so
[653010531 653010533.18 653010537] main.so
```
* cranelift-codegen-meta: Rename `side_effects_okay_for_gvn` to `side_effects_idempotent`
* cranelift-filetests: Ensure there is a trailing newline for blessed Wasm tests
* Cranelift: Make spectre guards GVN-able
While these instructions have a side effect that is otherwise invisible to the
optimizer, the side effect in question is idempotent, so it can be de-duplicated
by GVN.
* Cranelift: Run redundant load replacement and GVN twice
This allows us to actually replace redundant Wasm loads with dynamic memories.
While this improves our hand-crafted test sequences, it doesn't seem to have any
improvement on sightglass benchmarks run with dynamic memories, however it also
isn't a hit to compilation times, so seems generally good to land anyways:
```
$ cargo run --release -- benchmark -e ~/scratch/once.so -e ~/scratch/twice.so -m insts-retired --processes 20 --iterations-per-process 3 --engine-flags="--static-memory-maximum-size 0" -- benchmarks/default.suite
compilation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm
No difference in performance.
[683595240 683768610.53 684097577] once.so
[683597068 700115966.83 1664907164] twice.so
instantiation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm
No difference in performance.
[44107 60411.07 92785] once.so
[44138 59552.32 92097] twice.so
compilation :: instructions-retired :: benchmarks/bz2/benchmark.wasm
No difference in performance.
[17369916 17404839.78 17471458] once.so
[17369935 17625713.87 30700150] twice.so
compilation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm
No difference in performance.
[126523640 126566170.80 126648265] once.so
[126523076 127174580.30 163145149] twice.so
instantiation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm
No difference in performance.
[34569 35686.25 36513] once.so
[34651 35749.97 36953] twice.so
instantiation :: instructions-retired :: benchmarks/bz2/benchmark.wasm
No difference in performance.
[35146 36639.10 37707] once.so
[34472 36580.82 38431] twice.so
execution :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm
No difference in performance.
[7055720115 7055841324.82 7056180024] once.so
[7055717681 7055877095.85 7056225217] twice.so
execution :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm
No difference in performance.
[46436881 46437081.28 46437691] once.so
[46436883 46437127.68 46437766] twice.so
execution :: instructions-retired :: benchmarks/bz2/benchmark.wasm
No difference in performance.
[653010530 653010533.27 653010539] once.so
[653010531 653010532.95 653010538] twice.so
```
* cranelift-wasm: translate Wasm loads into lower-level CLIF operations
Rather than using `heap_{load,store,addr}`.
* cranelift: Remove the `heap_{addr,load,store}` instructions
These are now legalized in the `cranelift-wasm` frontend.
* cranelift: Remove the `ir::Heap` entity from CLIF
* Port basic memory operation tests to .wat filetests
* Remove test for verifying CLIF heaps
* Remove `heap_addr` from replace_branching_instructions_and_cfg_predecessors.clif test
* Remove `heap_addr` from readonly.clif test
* Remove `heap_addr` from `table_addr.clif` test
* Remove `heap_addr` from the simd-fvpromote_low.clif test
* Remove `heap_addr` from simd-fvdemote.clif test
* Remove `heap_addr` from the load-op-store.clif test
* Remove the CLIF heap runtest
* Remove `heap_addr` from the global_value.clif test
* Remove `heap_addr` from fpromote.clif runtests
* Remove `heap_addr` from fdemote.clif runtests
* Remove `heap_addr` from memory.clif parser test
* Remove `heap_addr` from reject_load_readonly.clif test
* Remove `heap_addr` from reject_load_notrap.clif test
* Remove `heap_addr` from load_readonly_notrap.clif test
* Remove `static-heap-without-guard-pages.clif` test
Will be subsumed when we port `make-heap-load-store-tests.sh` to generating
`.wat` tests.
* Remove `static-heap-with-guard-pages.clif` test
Will be subsumed when we port `make-heap-load-store-tests.sh` over to `.wat`
tests.
* Remove more heap tests
These will be subsumed by porting `make-heap-load-store-tests.sh` over to `.wat`
tests.
* Remove `heap_addr` from `simple-alias.clif` test
* Remove `heap_addr` from partial-redundancy.clif test
* Remove `heap_addr` from multiple-blocks.clif test
* Remove `heap_addr` from fence.clif test
* Remove `heap_addr` from extends.clif test
* Remove runtests that rely on heaps
Heaps are not a thing in CLIF or the interpreter anymore
* Add generated load/store `.wat` tests
* Enable memory-related wasm features in `.wat` tests
* Remove CLIF heap from fcmp-mem-bug.clif test
* Add a mode for compiling `.wat` all the way to assembly in filetests
* Also generate WAT to assembly tests in `make-load-store-tests.sh`
* cargo fmt
* Reinstate `f{de,pro}mote.clif` tests without the heap bits
* Remove undefined doc link
* Remove outdated SVG and dot file from docs
* Add docs about `None` returns for base address computation helpers
* Factor out `env.heap_access_spectre_mitigation()` to a local
* Expand docs for `FuncEnvironment::heaps` trait method
* Restore f{de,pro}mote+load clif runtests with stack memory
All instructions using the CPU flags types (IFLAGS/FFLAGS) were already
removed. This patch completes the cleanup by removing all remaining
instructions that define values of CPU flags types, as well as the
types themselves.
Specifically, the following features are removed:
- The IFLAGS and FFLAGS types and the SpecialType category.
- Special handling of IFLAGS and FFLAGS in machinst/isle.rs and
machinst/lower.rs.
- The ifcmp, ifcmp_imm, ffcmp, iadd_ifcin, iadd_ifcout, iadd_ifcarry,
isub_ifbin, isub_ifbout, and isub_ifborrow instructions.
- The writes_cpu_flags instruction property.
- The flags verifier pass.
- Flags handling in the interpreter.
All of these features are currently unused; no functional change
intended by this patch.
This addresses https://github.com/bytecodealliance/wasmtime/issues/3249.
* Turn off probestack by default in Cranelift
The probestack feature is not implemented for the aarch64 and s390x
backends and currently the on-by-default status requires the aarch64 and
s390x implementations to be a stub. Turning off probestack by default
allows the s390x and aarch64 backends to panic with an error message to
avoid providing a false sense of security. When the probestack option is
implemented for all backends, however, it may be reasonable to
re-enable.
* aarch64: Improve codegen for AMode fallback
Currently the final fallback for finalizing an `AMode` will generate
both a constant-loading instruction as well as an `add` instruction to
the base register into the same temporary. This commit improves the
codegen by removing the `add` instruction and folding the final add into
the finalized `AMode`. This changes the `extendop` used but both
registers are 64-bit so shouldn't be affected by the extending
operation.
* aarch64: Implement inline stack probes
This commit implements inline stack probes for the aarch64 backend in
Cranelift. The support here is modeled after the x64 support where
unrolled probes are used up to a particular threshold after which a loop
is generated. The instructions here are similar in spirit to x64 except
that unlike x64 the stack pointer isn't modified during the unrolled
loop to avoid needing to re-adjust it back up at the end of the loop.
* Enable inline probestack for AArch64 and Riscv64
This commit enables inline probestacks for the AArch64 and Riscv64
architectures in the same manner that x86_64 has it enabled now. Some
more testing was additionally added since on Unix platforms we should be
guaranteed that Rust's stack overflow message is now printed too.
* Enable probestack for aarch64 in cranelift-fuzzgen
* Address review comments
* Remove implicit stack overflow traps from x64 backend
This commit removes implicit `StackOverflow` traps inserted by the x64
backend for stack-based operations. This was historically required when
stack overflow was detected with page faults but Wasmtime no longer
requires that since it's not suitable for wasm modules which call host
functions. Additionally no other backend implements this form of
implicit trap-code additions so this is intended to synchronize the
behavior of all the backends.
This fixes a test added prior for aarch64 to properly abort the process
instead of accidentally being caught by Wasmtime.
* Fix a style issue
* Cranelift: Define `heap_load` and `heap_store` instructions
* Cranelift: Implement interpreter support for `heap_load` and `heap_store`
* Cranelift: Add a suite runtests for `heap_{load,store}`
There are so many knobs we can twist for heaps and I wanted to exhaustively test
all of them, so I wrote a script to generate the tests. I've checked in the
script in case we want to make any changes in the future, but I don't think it
is worth adding this to CI to check that scripts are up to date or anything like
that.
* Review feedback
* cranelift: Cleanup `fdemote`/`fpromote` tests
* cranelift: Fix `fdemote`/`fpromote` instruction docs
The verifier fails if the input and output types are the same
for these instructions
* cranelift: Fix `fdemote`/`fpromote` in the interpreter
* fuzzgen: Add `fdemote`/`fpromote`
* Cranelift: Make `heap_addr` return calculated `base + index + offset`
Rather than return just the `base + index`.
(Note: I've chosen to use the nomenclature "index" for the dynamic operand and
"offset" for the static immediate.)
This move the addition of the `offset` into `heap_addr`, instead of leaving it
for the subsequent memory operation, so that we can Spectre-guard the full
address, and not allow speculative execution to read the first 4GiB of memory.
Before this commit, we were effectively doing
load(spectre_guard(base + index) + offset)
Now we are effectively doing
load(spectre_guard(base + index + offset))
Finally, this also corrects `heap_addr`'s documented semantics to say that it
returns an address that will trap on access if `index + offset + access_size` is
out of bounds for the given heap, rather than saying that the `heap_addr` itself
will trap. This matches the implemented behavior for static memories, and after
https://github.com/bytecodealliance/wasmtime/pull/5190 lands (which is blocked
on this commit) will also match the implemented behavior for dynamic memories.
* Update heap_addr docs
* Factor out `offset + size` to a helper
Add a MemFlags operand to the bitcast instruction, where only the
`big` and `little` flags are accepted. These define the lane order
to be used when casting between types of different lane counts.
Update all users to pass an appropriate MemFlags argument.
Implement lane swaps where necessary in the s390x back-end.
This is the final part necessary to fix
https://github.com/bytecodealliance/wasmtime/issues/4566.
This branch removes the trapif and trapff instructions, in favor of using an explicit comparison and trapnz. This moves us closer to removing iflags and fflags, but introduces the need to implement instructions like iadd_cout in the x64 and aarch64 backends.
- Allow bitcast for vectors with differing lane widths
- Remove raw_bitcast IR instruction
- Change all users of raw_bitcast to bitcast
- Implement support for no-op bitcast cases across backends
This implements the second step of the plan outlined here:
https://github.com/bytecodealliance/wasmtime/issues/4566#issuecomment-1234819394
Adds Bswap to the Cranelift IR. Implements the Bswap instruction
in the x64 and aarch64 codegen backends. Cranelift users can now:
```
builder.ins().bswap(value)
```
to get a native byteswap instruction.
* x64: implements the 32- and 64-bit bswap instruction, following
the pattern set by similar unary instrutions (Neg and Not) - it
only operates on a dst register, but is parameterized with both
a src and dst which are expected to be the same register.
As x64 bswap instruction is only for 32- or 64-bit registers,
the 16-bit swap is implemented as a rotate left by 8.
Updated x64 RexFlags type to support emitting for single-operand
instructions like bswap
* aarch64: Bswap gets emitted as aarch64 rev16, rev32,
or rev64 instruction as appropriate.
* s390x: Bswap was already supported in backend, just had to add
a bit of plumbing
* For completeness, added bswap to the interpreter as well.
* added filetests and runtests for each ISA
* added bswap to fuzzgen, thanks to afonso360 for the code there
* 128-bit swaps are not yet implemented, that can be done later
Add a new instruction uadd_overflow_trap, which is a fused version of iadd_ifcout and trapif. Adding this instruction removes a dependency on the iflags type, and would allow us to move closer to removing it entirely.
The instruction is defined for the i32 and i64 types only, and is currently only used in the legalization of heap_addr.
As discussed in the 2022/10/19 meeting, this PR removes many of the branch and select instructions that used iflags, in favor if using brz/brnz and select in their place. Additionally, it reworks selectif_spectre_guard to take an i8 input instead of an iflags input.
For reference, the removed instructions are: br_icmp, brif, brff, trueif, trueff, and selectif.
* cranelift: Remove iconst.i128
* bugpoint: Report Changed when only one instruction is mutated
* cranelift: Fix egraph bxor rule
* cranelift: Remove some simple_preopt opts for i128
* fuzzgen: Test compiler flags
* cranelift: Generate `all()` function for all enum flags
This allows a user to iterate all flags that exist.
* fuzzgen: Minimize regalloc_checker compiles
* fuzzgen: Limit the amount of test case inputs
* fuzzgen: Add egraphs flag
It's finally here! 🥳
* cranelift: Add fuzzing comment to settings
* fuzzgen: Add riscv64
* fuzzgen: Unconditionally enable some flags
Remove the boolean types from cranelift, and the associated instructions breduce, bextend, bconst, and bint. Standardize on using 1/0 for the return value from instructions that produce scalar boolean results, and -1/0 for boolean vector elements.
Fixes#3205
Co-authored-by: Afonso Bordado <afonso360@users.noreply.github.com>
Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Co-authored-by: Chris Fallin <chris@cfallin.org>