* x64: Add non-SSE4.1 lowerings of ceil/trunc/floor/nearest
This commit adds lowerings that work with SSE2 for CLIF `ceil`, `trunc`,
`floor`, and `nearest` instructions over vectors. To get these working
`insertlane` for float vectors was also implemented for non-SSE4.1
instructions as well.
Note that the goal of these lowerings is not speed but rather "it
works", so the decompose-to-call-libcalls logic for vector is probably
horrendously slow but should at least be correct.
* Skip new tests on riscv64
* Update cranelift/codegen/src/isa/x64/inst.isle
Co-authored-by: Andrew Brown <andrew.brown@intel.com>
---------
Co-authored-by: Andrew Brown <andrew.brown@intel.com>
* Optimize sign extension via shifts
This commit adds egraph optimization patterns for left-shifting a value
and then right-shifting it as a form of sign extending its lower bits.
This matches the behavior of the WebAssembly `i32.extend8_s`
instruction, for example. Note that the lowering of that WebAssembly
instruction does not use shifts, but historical versions of LLVM that
didn't support the instruction, or versions with the instruction
disabled, will use shifts instead.
A second rule for reduction-of-extend being the same as the original
value was added to keep an existing shift-related test passing as well.
* Add reference assemblies for new opts
* x64: Change `use_sse41` to a constructor
This refactors the existing `use_sse41` extractor to instead be a
`constructor` to use with `if-let`.
* x64: Gate the `pblendw` instruction on SSE4.1 being enabled
This specialization of `shuffle` isn't a base case so adding an `if-let`
here should be sufficient for gating this instruction properly on
enabled CPU features.
* x64: Gate `pmuldq` lowerings on SSE 4.1
The specialized rules using these instructions can fall back to the
standard lowerings for non-SSE 4.1 instructions.
While bringing in no major updates for Wasmtime I've taken this
opportunity to list myself for `cargo vet` with wildcard audits of this
family of crates. That means I shouldn't need to further add any more
entries in the future for updating these crates and additionally any
other organizations using these audits will automatically be able to
have audits for version that I publish.
While here I also ran `cargo vet prune` which was able to remove a
number of our exemptions.
Cranelift only has one instruction SIMD which depends on SSE4.2 so this
commit adds a lowering rule for `pcmpgtq` which doesn't use SSE4.2 and
enables lowering the baseline requirement for SIMD support from SSE4.2
to SSE4.1.
The `has_sse42` setting is no longer enabled by default for Cranelift.
Additionally `enable_simd` no longer requires `has_sse42` on x64.
Finally the fuzz-generator for Wasmtime codegen settings now enables
flipping the `has_sse42` setting instead of unconditionally setting it
to `true`.
The specific lowering for `pcmpgtq` is copied from LLVM's lowering of
this instruction.
* cranelift-native: Move riscv to separate module
* cranelift-native: Read /proc/cpuinfo to parse RISC-V extensions
* ci: Add QEMU cpuinfo emulation patch
This patch emulates the /proc/cpuinfo interface for RISC-V. This
allows us to do feature detection for the RISC-V backend.
It has been queued for QEMU 8.1 so we should remove it as soon as
that is available.
* ci: Enable QEMU RISC-V extensions
* cranelift-native: Cleanup ISA string parsing
Co-Authored-By: Jamey Sharp <jsharp@fastly.com>
* cranelift-native: Rework `/proc/cpuinfo` parsing
Co-Authored-By: Jamey Sharp <jsharp@fastly.com>
---------
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
* Handle signature() for more libcalls
This is necessary to be able to call them in the interpreter. All the
remaining libcalls which signature() doesn't handle are never used in
clif ir. Only in code compiled by a backend.
* Fix libcall declarations in cranelift-frontend
* Add function signatures
* Use correct pointer type instead of I64
* Remove ImmutableRegisterState
It was introduced for an SCCP optimization pass, but a simplified
version of this will likely use the egraph infrastructure instead.
* Replace {get,set}_value in State with current_frame{,_mut}
The outer Interpreter needs this anyway and only offering one way to
get locals simplifies things.
* Update comment
* fix typo
* add test to check that Option<EntityRef> is twice as large as EntityRef
* grammar
* grammar
* reverse snakecase -- Not sure if folks want this type of change
I noticed recently that for the `ImmRegRegShift` addressing mode
Cranelift will unconditionally emit at least a 1-byte immediate for the
offset to be added to the register addition computation, even when the
offset is zero. In this case though the instruction encoding can be
slightly more compact and remove a byte. This commit started off by
applying this optimization, which resulted in the `*.clif` test changes
in this commit.
Further reading this code, however, I personally found it quite hard to
follow what was happening with all the various branches and ModRM/SIB
bits. I reviewed these encodings in the x64 architecture manual and
attempted to improve the logic for encoding here. The new version in
this commit is intended to be functionally equivalent to the prior
version where dropping a zero-offset from the `ImmRegRegShift` variant
is the only change.
`FunctionBuilder::create_stackslot` was split into `create_sized_stack_slot`
and `create_dynamic_stack_slot`. This updates the doc in the `StackBuilder`
docstring to refer to the new methods.
Fixes#5838.
* Remove the validate_address State trait method
It isn't used anywhere
* Expose the inner Function of a Frame
This is necessary to create your own interpreter that reuses most of
cranelift-interpreter. For example to use a different State
implementation.
* Support the symbol_value and tls_value instructions in the interpreter
Instead remove the colocated flag for hotplug mode in define_function.
This prevents issues if declare_*_in_func wasn't used due to eg the
function being from a previously serialized module and now deserialized
into JITModule.
* Cranelift: remove non-egraphs optimization pipeline and `use_egraphs` option.
This PR removes the LICM, GVN, and preopt passes, and associated support
pieces, from `cranelift-codegen`. Not to worry, we still have
optimizations: the egraph framework subsumes all of these, and has been
on by default since #5181.
A few decision points:
- Filetests for the legacy LICM, GVN and simple_preopt were removed too.
As we built optimizations in the egraph framework we wrote new tests
for the equivalent functionality, and many of the old tests were
testing specific behaviors in the old implementations that may not be
relevant anymore. However if folks prefer I could take a different
approach here and try to port over all of the tests.
- The corresponding filetest modes (commands) were deleted too. The
`test alias_analysis` mode remains, but no longer invokes a separate
GVN first (since there is no separate GVN that will not also do alias
analysis) so the tests were tweaked slightly to work with that. The
egrpah testsuite also covers alias analysis.
- The `divconst_magic_numbers` module is removed since it's unused
without `simple_preopt`, though this is the one remaining optimization
we still need to build in the egraphs framework, pending #5908. The
magic numbers will live forever in git history so removing this in the
meantime is not a major issue IMHO.
- The `use_egraphs` setting itself was removed at both the Cranelift and
Wasmtime levels. It has been marked deprecated for a few releases now
(Wasmtime 6.0, 7.0, upcoming 8.0, and corresponding Cranelift
versions) so I think this is probably OK. As an alternative if anyone
feels strongly, we could leave the setting and make it a no-op.
* Update test outputs for remaining test differences.
* Remove the DataContext wrapper around DataDescription
It doesn't have much of a purpose while making it harder to for example
rewrite the function and data object declarations within it as is
necessary for deserializing a serialized module.
* Derive Debug for DataDescription
* cranelift-interpreter: Propagate traps from call's
* cranelift-interpreter: Make `unwrap_return` only available in tests
This is a footgun for normal use in the interpreter (#6156) but it
still has uses in the tests, so enable it only there.
This commit adds new lowerings to the AArch64 backend of the
element-based `fmla` and `fmls` instructions. These instructions have
one of the multiplicands as an implicit broadcast of a single lane of
another register and can help remove `shuffle` or `dup` instructions
that would otherwise be used to implement them.
While type-checking the AST for a pattern, ISLE was passing in an
`Option<TypeId>` for the expected result type of the pattern. However,
at every call we either passed `Some` type explicitly, or passed the
parent's expected type in a self-recursive call.
Therefore, by induction, `expected_ty` is never `None`. So this PR
unwraps the type everywhere. That in turn shows that a bunch of error
messages were unreachable, so this deletes a bunch of error-handling
code.
In addition, this function returned the type it computed for the
sub-pattern, but that information is already available in the
sub-pattern itself. Not only that but the type should always be equal to
`expected_ty`; when it isn't, we've reported a type error and are just
trying to check for more errors.
Most callers ignored the returned type but in some cases we used it to
try to avoid emitting useless error messages. I've preserved that
behavior for bind-patterns.
For and-patterns, the returned type looked like it was being used, but
because `expected_ty` was never `None`, the fallback of "fill in with
the sub-pattern's type" never fired. So I've deleted that fallback.
Finally, this reverts #4915 (9d99eff6f9)
which was introduced to flatten nested and-patterns, to simplify overlap
checking. However, the visitor trait used by trie_again effectively
flattens and-patterns anyway, so the current representation used for
overlap checking doesn't need this any more.
This commit adds constant-propagation optimizations for
`splat`-of-constant to produce a `vconst` node. This should help later
hoisting these constants out of loops if it shows up in wasm.
* The `vectorizelanes` function performs a check to see whether there
is a single value provided in an array, and if so returns it as a
scalar.
While elsewhere in the interpreter this behaviour is relied
upon, it yields an incorrect result when attempting to convert a
scalar to a vector.
The original `vectorizelanes` remains untouched, however, an
unconditional variant `vectorizelanes_all` was added.
* A test was added under `filetests/runtests/issue5911.clif`.
Fixes#5911
* simple_gvn: recognize commutative operators
Normalize instructions with commutative opcodes by sorting the arguments. This
means instructions like `iadd v0, v1` and `iadd v1, v0` will be considered
identical by GVN and deduplicated.
* Remove `UsubSat` and `SsubSat` from `is_commutative`
They are not actually commutative
* Remove `TODO`s
* Move InstructionData normalization into helper fn
* Add normalization of commutative instructions in the epgrah implementation
* Handle reflexive icmp/fcmps in GVN
* Change formatting of `normalize_in_place`
* suggestions from code review
* ISLE: move `icmp` rewrites to separate file.
Move `icmp`-related rewrite rules from `algebraic.isle` to `icmp.isle`.
Also move `icmp`-related tests from `algebraic.clif` to `icmp.clif`.
* Put parameterized and unparameterized `icmp` tests in separate files
* Undo refactoring of (ir)reflexivity rewrites
* Fix `icmp-parameterised.clif`
* Undo formatting/comment changes
* x64: Add AVX encodings of `vcvt{ss2sd,sd2ss}`
Additionally update the instruction helpers to take an `XmmMem` argument
to allow load sinking into the instruction.
* x64: Add AVX encoding of `sqrts{s,d}`
* x64: Add AVX support for `rounds{s,d}`
* x64: Deduplicate fcmp emission logic
The `select`-of-`fcmp` lowering duplicated a good deal of `FloatCC`
lowering logic that was already done by `emit_fcmp`, so this commit
refactors these lowering rules to instead delegate to `emit_fcmp` and
then handle that result.
* Swap order of condition codes
Shouldn't affect the correctness of this operation and it's a bit more
natural to write the lowering rule this way.
* Swap the order of comparison operands
No need to swap `a b`, only the `x y` needs swapping.
* Fix x64 printing of `XmmCmove`