Ensure that constants generated for the memory case of XmmMem values are always 16 bytes, ensuring that we don't accidantally perform an unaligned load.
Fixes#4761
* cranelift: Change test runner order
Changes the ordering of runtests to run per target and then per function.
This change doesn't do a lot by itself, but helps future refactorings of runtests.
* cranelift: Rename SingleFunctionCompiler to TestCaseCompiler
* cranelift: Skip runtests per target instead of per run
* cranelift: Deduplicate test names
With the upcoming changes to the runtest infrastructure we require unique ExtNames for all tests.
Note that for test names we have a 16 character limit on test names, and must be unique within those 16 characters.
* cranelift: Add TestFileCompiler to runtests
TestFileCompiler allows us to compile the entire file once, and then call the trampolines for each test.
The previous code was compiling the function for each invocation of a test.
* cranelift: Deduplicate ExtName for avg_round tests
* cranelift: Rename functions as they are defined.
The JIT internally only deals with User functions, and cannot link test name funcs.
This also caches trampolines by signature.
* cranelift: Preserve original name when reporting errors.
* cranelift: Rename aarch64 test functions
* cranelift: Add `call` and `call_indirect` tests!
* cranelift: Add pauth runtests for aarch64
* cranelift: Rename duplicate s390x tests
* cranelift: Delete `i128_bricmp_of` function from i128-bricmp
It looks like we forgot to delete it when it was moved to
`i128-bricmp-overflow`, and since it didn't have a run invocation
it was never compiled.
However, s390x does not support this, and panics when lowering.
* cranelift: Add `colocated` call tests
* cranelift: Rename *more* `s390x` tests
* cranelift: Add pauth + sign_return_address call tests
* cranelift: Undeduplicate test names
With the latest main changes we now support *unlimited* length test names.
This commit reverts:
52274676ff631c630f9879dd32e756566d3e700f
7989edc172493547cdf63e180bb58365e8a43a42
25c8a8395527d98976be6a34baa3b0b214776739
792e8cfa8f748077f9d80fe7ee5e958b7124e83b
* cranelift: Add LibCall tests
* cranelift: Revert more test names
These weren't auto reverted by the previous revert.
* cranelift: Disable libcall tests for aarch64
* cranelift: Runtest fibonacci tests
* cranelift: Misc cleanup
Lower extractlane, scalar_to_vector and splat in ISLE.
This PR also makes some changes to the SinkableLoad api
* change the return type of sink_load to RegMem as there are more functions available for dealing with RegMem
* add reg_mem_to_reg_mem_imm and register it as an automatic conversion
This was likely a copy-paste from the `ast::Pattern` case, but here it
is checking a term name in `ast::Expr` and so should say "... in
expression", not "... in pattern".
Lower `shuffle` and `swizzle` in ISLE.
This PR surfaced a bug with the lowering of `shuffle` when avx512vl and avx512vbmi are enabled: we use `vpermi2b` as the implementation, but panic if the immediate shuffle mask contains any out-of-bounds values. The behavior when the avx512 extensions are not present is that out-of-bounds values are turned into `0` in the result.
I've resolved this by detecting when the shuffle immediate has out-of-bounds indices in the avx512-enabled lowering, and generating an additional mask to zero out the lanes where those indices occur. This brings the avx512 case into line with the semantics of the `shuffle` op: 94bcbe8446/cranelift/codegen/meta/src/shared/instructions.rs (L1495-L1498)
* Port `Fcopysign`..``FcvtToSintSat` to ISLE (AArch64)
Ported the existing implementations of the following opcodes to ISLE on
AArch64:
- `Fcopysign`
- Also introduced missing support for `fcopysign` on vector values, as
per the docs.
- This introduces the vector encoding for the `SLI` machine
instruction.
- `FcvtToUint`
- `FcvtToSint`
- `FcvtFromUint`
- `FcvtFromSint`
- `FcvtToUintSat`
- `FcvtToSintSat`
Copyright (c) 2022 Arm Limited
* Document helpers and abstract conversion checks
* x64: Mask shift amounts for small types
* cranelift: Disable i128 shifts in fuzzer again
They are fixed. But we had a bunch of fuzzgen issues come in, and we don't want to accidentaly mark them as fixed
* cranelift: Avoid masking shifts for 32 and 64 bit cases
* cranelift: Add const shift tests and fix them
* cranelift: Remove const `rotl` cases
Now that `put_masked_in_imm8_gpr` works properly we can simplify rotl/rotr
In order to keep the `ExternalName` enum small, the `TestcaseName`
struct was limited to 17 bytes: a 1 byte length and a 16 byte buffer.
Due to alignment, that made `ExternalName` 20 bytes.
That fixed-size buffer means that the names of functions in Cranelift
filetests are truncated to fit, which limits our ability to give tests
meaningful names. And I think meaningful names are important in tests.
This patch replaces the inline `TestcaseName` buffer with a
heap-allocated slice. We don't care about performance for test names, so
an indirection out to the heap is fine in that case. But we do care
somewhat about the size of `ExternalName` when it's used during
compiles.
On 64-bit systems, `Box<[u8]>` is 16 bytes, so `TestcaseName` gets one
byte smaller. Unfortunately, its alignment is 8 bytes, so `ExternalName`
grows from 20 to 24 bytes.
According to `valgrind --tool=dhat`, this change has very little effect
on compiler performance. Building wasmtime with `--no-default-features
--release`, and compiling the pulldown-cmark benchmark from Sightglass,
I measured these differences between `main` and this patch:
- total number of allocations didn't change (`ExternalName::TestCase` is
not used in normal compiles)
- 592 more bytes allocated over the process lifetime, out of 171.5MiB
- 320 more bytes allocated at peak heap size, out of 12MiB
- 0.24% more instructions executed
- 16,987 more bytes written
- 12,120 _fewer_ bytes read
Lower stack_addr, udiv, sdiv, urem, srem, umulhi, and smulhi in ISLE.
For udiv, sdiv, urem, and srem I opted to move the original lowering into an extern constructor, as the interactions with rax and rdx for the div instruction didn't seem meaningful to implement in ISLE. However, I'm happy to revisit this choice and move more of the embedding into ISLE.
Ported the existing implementations of the following opcodes for AArch64
to ISLE, and implemented support for 64-bit vectors (per the docs):
- `SwidenLow`
- `SwidenHigh`
- `UwidenLow`
- `UwidenHigh`
Also ported `WideningPairwiseDotProductS` as-is.
Copyright (c) 2022 Arm Limited
* Port `vconst` to ISLE (AArch64)
Ported the existing implementation of `vconst` to ISLE for AArch64, and
added support for 64-bit vector constants.
Also introduced 64-bit `vconst` support to the interpreter.
Copyright (c) 2022 Arm Limited
* Replace if-chains with match statements
Copyright (c) 2022 Arm Limited
* Cranelift: extend docs on Inst to discuss `call` instructions
the docs on `Inst` note that the type is returned by non-resultful
instructions built from `InstBuilder`, but did _not_ note that it is
also returned by `call` and `call_indirect`. if you're trying to learn
and use Cranelift by following the docs, this means you'd follow a doc
link to `Inst` that implies that `call` does not return a value - this
is actively misleading, since you'd want to use the returned `Inst` to
find exactly those returned values!
so, this adds a few sentences talking about the case of call `Inst`s.
* cranelift: Add assert to prevent wrong InstFormat being used for the wrong opcode
* cranelift: Use correct instruction format when inserting opcodes in fuzzgen (fixes#4733)
* cranelift: Use debug assert on InstFormat assert
Fixes#4736
Fix lowerings that were using values as both a Reg and a RegMem, making it look like a load could be sunk while its value in a register was still being used. Also add an assert that checks that loads that are sunk are never used.
The sse_cmp_op rule had cases that would produce SseOperand values that aren't legal to use with MInst.XmmRmR, and was only used in vector_all_ones when constructing an XmmRmR value. Additionally, vector_all_ones always called sse_cmp_op with the same type, so the other cases were redundant.
The solution in this PR is to remove sse_cmp_op entirely and inline a call to x64_pcmpeqd directly in vector_all_ones, and remove the unused argument from vector_all_ones.
Also, adjust the tests that are executed on that platform. Finally,
fix a bug with obtaining backtraces when back-edge CFI is enabled.
Copyright (c) 2022, Arm Limited.
* Upgrade wasm-tools crates, namely the component model
This commit pulls in the latest versions of all of the `wasm-tools`
family of crates. There were two major changes that happened in
`wasm-tools` in the meantime:
* bytecodealliance/wasm-tools#697 - this commit introduced a new API for
more efficiently reading binary operators from a wasm binary. The old
`Operator`-based reading was left in place, however, and continues to
be what Wasmtime uses. I hope to update Wasmtime in a future PR to use
this new API, but for now the biggest change is...
* bytecodealliance/wasm-tools#703 - this commit was a major update to
the component model AST. This commit almost entirely deals with the
fallout of this change.
The changes made to the component model were:
1. The `unit` type no longer exists. This was generally a simple change
where the `Unit` case in a few different locations were all removed.
2. The `expected` type was renamed to `result`. This similarly was
relatively lightweight and mostly just a renaming on the surface. I
took this opportunity to rename `val::Result` to `val::ResultVal` and
`types::Result` to `types::ResultType` to avoid clashing with the
standard library types. The `Option`-based types were handled with
this as well.
3. The payload type of `variant` and `result` types are now optional.
This affected many locations that calculate flat type
representations, ABI information, etc. The `#[derive(ComponentType)]`
macro now specifically handles Rust-defined `enum` types which have
no payload to the equivalent in the component model.
4. Functions can now return multiple parameters. This changed the
signature of invoking component functions because the return value is
now bound by `ComponentNamedList` (renamed from `ComponentParams`).
This had a large effect in the tests, fuzz test case generation, etc.
5. Function types with 2-or-more parameters/results must uniquely name
all parameters/results. This mostly affected the text format used
throughout the tests.
I haven't added specifically new tests for multi-return but I changed a
number of tests to use it. Additionally I've updated the fuzzers to all
exercise multi-return as well so I think we should get some good
coverage with that.
* Update version numbers
* Use crates.io
When trying to read generated CLIF, it's nice to be able to see at a
glance that some of the operands are defined by `iconst` and similar
instructions, without having to go find each operand's definition
manually.
All of the `*_imm` instructions are rewritten during legalization to an
explicit `iconst` plus the general form of the operator, so backends
never see them. Therefore these ISLE rules in the x64 backend can never
match anything.
This fixes two problems: minimum symbol alignment for the LARL
instruction, and alignment requirements for LRL/LGRL etc.
The first problem is that the LARL instruction used to load a
symbol address (PC relative) requires that the target symbol
is at least 2-byte aligned. This is always guaranteed for code
symbols (all instructions must be 2-aligned anyway), but not
necessarily for data symbols.
Other s390x compilers fix this problem by ensuring that all
global symbols are always emitted with a minimum 2-byte
alignment. This patch introduces an equivalent mechanism
for cranelift:
- Add a symbol_alignment routine to TargetIsa, similar to the
existing code_section_alignment routine.
- Respect symbol_alignment as minimum alignment for all symbols
emitted in the object backend (code and data).
The second problem is that PC-relative instructions that
directly *access* data (like LRL/LGRL, STRL/STGRL etc.)
not only have the 2-byte requirement like LARL, but actually
require that their memory operand is *naturally* aligned
(i.e. alignment is at least the size of the access).
This property (natural alignment for memory accesses) is
supposed to be provided by the "aligned" flag in MemFlags;
however, this is not implemented correctly at the moment.
To fix this, this patch:
- Only emits PC-relative memory access instructions if the
"aligned" flag is set in the associated MemFlags.
- Fixes a bug in emit_small_memory_copy and emit_small_memset
which currently set the aligned flag unconditionally, ignoring
the actual alignment info passed by their caller.
Tested with wasmtime and cg_clif.
* Add a test for iadd_pairwise with swiden input
* Implement iadd_pairwise for swiden_{low,high} input
* Add a test case for iadd_pairwise with uwiden input
* Implement iadd_pairwise with uwiden
* Cranelift: Use bump allocation in `remove_constant_phis` pass
This makes compilation 2-6% faster for Sightglass's bz2 benchmark:
```
compilation :: cycles :: benchmarks/bz2/benchmark.wasm
Δ = 7290648.36 ± 4245152.07 (confidence = 99%)
bump.so is 1.02x to 1.06x faster than main.so!
[166388177 183238542.98 214732518] bump.so
[172836648 190529191.34 217514271] main.so
compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm
No difference in performance.
[182220055 225793551.12 277857575] bump.so
[193212613 227784078.61 277175335] main.so
compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm
No difference in performance.
[3848442474 4295214144.37 4665127241] bump.so
[3969505457 4262415290.10 4563869974] main.so
```
* Add audit for `bumpalo`
* Add an audit of `arrayvec` version 0.7.2
* Remove unnecessary `collect` into `Vec`
I wasn't able to measure any perf difference here, but its nice to do anyways.
* Use a `SecondaryMap` for keeping track of summaries
* Cranelift: Remove the `ABICaller` trait
It has only one implementation: the `ABICallerImpl` struct. We can just use that
directly rather than having extra, unnecessary layers of generics and abstractions.
* Cranelift: Rename `ABICallerImpl` to `Caller`
* Cranelift: Remove `ABICallee` trait
It has only one implementation: the `ABICalleeImpl` struct. By using that
directly we can avoid unnecessary layers of generics and abstractions as well as
a couple `Box`es that were previously putting the single implementation into a
`Box<dyn>`.
* Cranelift: Rename `ABICalleeImpl` to `AbiCallee`
* Fix comments as per review
* Rename `AbiCallee` to `Callee`