* cranelift-wasm: translate Wasm loads into lower-level CLIF operations
Rather than using `heap_{load,store,addr}`.
* cranelift: Remove the `heap_{addr,load,store}` instructions
These are now legalized in the `cranelift-wasm` frontend.
* cranelift: Remove the `ir::Heap` entity from CLIF
* Port basic memory operation tests to .wat filetests
* Remove test for verifying CLIF heaps
* Remove `heap_addr` from replace_branching_instructions_and_cfg_predecessors.clif test
* Remove `heap_addr` from readonly.clif test
* Remove `heap_addr` from `table_addr.clif` test
* Remove `heap_addr` from the simd-fvpromote_low.clif test
* Remove `heap_addr` from simd-fvdemote.clif test
* Remove `heap_addr` from the load-op-store.clif test
* Remove the CLIF heap runtest
* Remove `heap_addr` from the global_value.clif test
* Remove `heap_addr` from fpromote.clif runtests
* Remove `heap_addr` from fdemote.clif runtests
* Remove `heap_addr` from memory.clif parser test
* Remove `heap_addr` from reject_load_readonly.clif test
* Remove `heap_addr` from reject_load_notrap.clif test
* Remove `heap_addr` from load_readonly_notrap.clif test
* Remove `static-heap-without-guard-pages.clif` test
Will be subsumed when we port `make-heap-load-store-tests.sh` to generating
`.wat` tests.
* Remove `static-heap-with-guard-pages.clif` test
Will be subsumed when we port `make-heap-load-store-tests.sh` over to `.wat`
tests.
* Remove more heap tests
These will be subsumed by porting `make-heap-load-store-tests.sh` over to `.wat`
tests.
* Remove `heap_addr` from `simple-alias.clif` test
* Remove `heap_addr` from partial-redundancy.clif test
* Remove `heap_addr` from multiple-blocks.clif test
* Remove `heap_addr` from fence.clif test
* Remove `heap_addr` from extends.clif test
* Remove runtests that rely on heaps
Heaps are not a thing in CLIF or the interpreter anymore
* Add generated load/store `.wat` tests
* Enable memory-related wasm features in `.wat` tests
* Remove CLIF heap from fcmp-mem-bug.clif test
* Add a mode for compiling `.wat` all the way to assembly in filetests
* Also generate WAT to assembly tests in `make-load-store-tests.sh`
* cargo fmt
* Reinstate `f{de,pro}mote.clif` tests without the heap bits
* Remove undefined doc link
* Remove outdated SVG and dot file from docs
* Add docs about `None` returns for base address computation helpers
* Factor out `env.heap_access_spectre_mitigation()` to a local
* Expand docs for `FuncEnvironment::heaps` trait method
* Restore f{de,pro}mote+load clif runtests with stack memory
* cranelift-filetest: Add the ability to test `.wat` to assembly
* Make the load/store test case generator script use `.wat` tests
And generate tests that exercise both Wasm-to-CLIF lowering and Wasm all the way
to assembly.
* Remove old versions of generated load/store tests
* Add new generated load/store tests
* Fix filename reference in script
* Cranelift: implement `heap_{load,store}` instruction legalization
This does not remove `heap_addr` yet, but it does factor out the common
bounds-check-and-compute-the-native-address functionality that is shared between
all of `heap_{addr,load,store}`.
Finally, this adds a missing optimization for when we can dedupe explicit bounds
checks for static memories and Spectre mitigations.
* Cranelift: Enable `heap_load_store_*` run tests on all targets
* Turn off probestack by default in Cranelift
The probestack feature is not implemented for the aarch64 and s390x
backends and currently the on-by-default status requires the aarch64 and
s390x implementations to be a stub. Turning off probestack by default
allows the s390x and aarch64 backends to panic with an error message to
avoid providing a false sense of security. When the probestack option is
implemented for all backends, however, it may be reasonable to
re-enable.
* aarch64: Improve codegen for AMode fallback
Currently the final fallback for finalizing an `AMode` will generate
both a constant-loading instruction as well as an `add` instruction to
the base register into the same temporary. This commit improves the
codegen by removing the `add` instruction and folding the final add into
the finalized `AMode`. This changes the `extendop` used but both
registers are 64-bit so shouldn't be affected by the extending
operation.
* aarch64: Implement inline stack probes
This commit implements inline stack probes for the aarch64 backend in
Cranelift. The support here is modeled after the x64 support where
unrolled probes are used up to a particular threshold after which a loop
is generated. The instructions here are similar in spirit to x64 except
that unlike x64 the stack pointer isn't modified during the unrolled
loop to avoid needing to re-adjust it back up at the end of the loop.
* Enable inline probestack for AArch64 and Riscv64
This commit enables inline probestacks for the AArch64 and Riscv64
architectures in the same manner that x86_64 has it enabled now. Some
more testing was additionally added since on Unix platforms we should be
guaranteed that Rust's stack overflow message is now printed too.
* Enable probestack for aarch64 in cranelift-fuzzgen
* Address review comments
* Remove implicit stack overflow traps from x64 backend
This commit removes implicit `StackOverflow` traps inserted by the x64
backend for stack-based operations. This was historically required when
stack overflow was detected with page faults but Wasmtime no longer
requires that since it's not suitable for wasm modules which call host
functions. Additionally no other backend implements this form of
implicit trap-code additions so this is intended to synchronize the
behavior of all the backends.
This fixes a test added prior for aarch64 to properly abort the process
instead of accidentally being caught by Wasmtime.
* Fix a style issue
Was missing some '$' characters and so was comparing string literals against
string literals instead of variable values against string literals. Regenerated
tests to fix them and add missing tests.
* Cranelift: Define `heap_load` and `heap_store` instructions
* Cranelift: Implement interpreter support for `heap_load` and `heap_store`
* Cranelift: Add a suite of runtests for `heap_{load,store}`
There are so many knobs we can twist for heaps and I wanted to exhaustively test
all of them, so I wrote a script to generate the tests. I've checked in the
script in case we want to make any changes in the future, but I don't think it
is worth adding this to CI to check that scripts are up to date or anything like
that.
* Review feedback
* cranelift: Cleanup `fdemote`/`fpromote` tests
* cranelift: Fix `fdemote`/`fpromote` instruction docs
The verifier fails if the input and output types are the same
for these instructions
* cranelift: Fix `fdemote`/`fpromote` in the interpreter
* fuzzgen: Add `fdemote`/`fpromote`
* Cranelift: Make `heap_addr` return calculated `base + index + offset`
Rather than return just the `base + index`.
(Note: I've chosen to use the nomenclature "index" for the dynamic operand and
"offset" for the static immediate.)
This moves the addition of the `offset` into `heap_addr`, instead of leaving it
for the subsequent memory operation, so that we can Spectre-guard the full
address, and not allow speculative execution to read the first 4GiB of memory.
Before this commit, we were effectively doing
    load(spectre_guard(base + index) + offset)
Now we are effectively doing
    load(spectre_guard(base + index + offset))
Finally, this also corrects `heap_addr`'s documented semantics to say that it
returns an address that will trap on access if `index + offset + access_size` is
out of bounds for the given heap, rather than saying that the `heap_addr` itself
will trap. This matches the implemented behavior for static memories, and after
https://github.com/bytecodealliance/wasmtime/pull/5190 lands (which is blocked
on this commit) will also match the implemented behavior for dynamic memories.
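As a rough illustration of the bounds condition described above, here is a toy model in plain Rust. It is not the actual legalization code: `guarded_addr` and its parameters are hypothetical names, `None` stands in for "an address that traps on access", and speculation itself is not modeled.

```rust
/// Toy model of a bounds-checked heap address (hypothetical helper).
/// Returns the native address to access, or `None` where the real
/// code would produce an address that traps when accessed.
fn guarded_addr(
    base: u64,
    bound: u64,
    index: u64,
    offset: u64,
    access_size: u64,
) -> Option<u64> {
    // `index + offset + access_size` must not overflow and must fit in the heap.
    let end = index.checked_add(offset)?.checked_add(access_size)?;
    if end > bound {
        return None;
    }
    // The check covers the full address, including the static offset,
    // rather than only `base + index`.
    base.checked_add(index + offset)
}
```

The point of the sketch is only that the static `offset` is inside the checked expression instead of being added afterwards by the memory operation.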
* Update heap_addr docs
* Factor out `offset + size` to a helper
Add a MemFlags operand to the bitcast instruction, where only the
`big` and `little` flags are accepted. These define the lane order
to be used when casting between types of different lane counts.
Update all users to pass an appropriate MemFlags argument.
Implement lane swaps where necessary in the s390x back-end.
This is the final part necessary to fix
https://github.com/bytecodealliance/wasmtime/issues/4566.
* fuzzgen: Request only one variable for bswap
This was included by accident. Bswap only has one input, instead of two.
* cranelift: Add `bswap.i128` support
Adds support only for x86, AArch64, S390X.
RISCV does not yet have bswap.
- Allow bitcast for vectors with differing lane widths
- Remove raw_bitcast IR instruction
- Change all users of raw_bitcast to bitcast
- Implement support for no-op bitcast cases across backends
This implements the second step of the plan outlined here:
https://github.com/bytecodealliance/wasmtime/issues/4566#issuecomment-1234819394
Adds Bswap to the Cranelift IR. Implements the Bswap instruction
in the x64 and aarch64 codegen backends. Cranelift users can now:
```
builder.ins().bswap(value)
```
to get a native byteswap instruction.
* x64: implements the 32- and 64-bit bswap instruction, following
the pattern set by similar unary instructions (Neg and Not) - it
only operates on a dst register, but is parameterized with both
a src and dst which are expected to be the same register.
As x64 bswap instruction is only for 32- or 64-bit registers,
the 16-bit swap is implemented as a rotate left by 8.
Updated x64 RexFlags type to support emitting for single-operand
instructions like bswap
* aarch64: Bswap gets emitted as aarch64 rev16, rev32,
or rev64 instruction as appropriate.
* s390x: Bswap was already supported in backend, just had to add
a bit of plumbing
* For completeness, added bswap to the interpreter as well.
* added filetests and runtests for each ISA
* added bswap to fuzzgen, thanks to afonso360 for the code there
* 128-bit swaps are not yet implemented, that can be done later
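The 16-bit case mentioned in the x64 note above relies on a simple identity: swapping the two bytes of a 16-bit value is the same as rotating it by 8 bits. A minimal sketch in plain Rust (the function name is made up for illustration and is not part of the backend):

```rust
/// Byte-swap a 16-bit value using only a rotate left by 8.
fn bswap16_via_rotate(x: u16) -> u16 {
    x.rotate_left(8)
}
```

For any `u16` this agrees with the standard library's `swap_bytes`.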
* aarch64: Fix incorrect masking for small types on bmask
`bmask` was accidentally relying on the uppermost bits of the register
for small types.
This was found by fuzzgen when it generated a shift left followed by
a bmask. The shift left moved bits out of the range of the input
type (i8), but these were not automatically cleared since they
remained inside the 32 bits of the register.
That caused issues when the bmask tried to compare the whole register
instead of just the bottom bits. The solution here is to mask the upper
bits for small types.
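A small model of the failure, assuming an `i8` value held in the low 8 bits of a 32-bit register whose upper bits may contain leftovers. Both functions are hypothetical stand-ins written for this note, not the backend's lowering rules:

```rust
/// Buggy model: compares the whole register, so stale upper bits
/// leak into the result.
fn bmask_i8_buggy(reg: u32) -> i8 {
    if reg != 0 { -1 } else { 0 }
}

/// Fixed model: mask down to the 8 bits that belong to the `i8`
/// before comparing against zero.
fn bmask_i8_fixed(reg: u32) -> i8 {
    if reg & 0xff != 0 { -1 } else { 0 }
}
```

With the register holding `0x80 << 1` (that is, `0x100`), the `i8` value is zero, yet the unmasked comparison still sees a non-zero register.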
* aarch64: Emit 32bit cmp on bmask
This fixes an issue where bmask was accidentally comparing the
upper bits of the register by always using a 64bit cmp.
* riscv: Mask high bits in bmask
* riscv: Add compile tests for br{z,nz}
* riscv: Use shifts to mask 32bit values
This produces less code than the AND since that version needs to
load an immediate constant from memory.
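For reference, the two masking strategies compute the same value. The sketch below is plain Rust with illustrative function names, not the riscv64 lowering itself:

```rust
/// Clear the upper 32 bits with a shift pair; no mask constant is needed.
fn mask32_with_shifts(x: u64) -> u64 {
    (x << 32) >> 32
}

/// The same result using an AND with a 32-bit mask constant.
fn mask32_with_and(x: u64) -> u64 {
    x & 0xffff_ffff
}
```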
* cranelift: Update test input to hexadecimal values
This makes it a bit more clear what is being tested.
* riscv: Use addiw for masking 32 bit values
Co-authored-by: Trevor Elliott <telliott@fastly.com>
* aarch64: Update bmask rule priority
Co-authored-by: Trevor Elliott <telliott@fastly.com>
Add a new instruction uadd_overflow_trap, which is a fused version of iadd_ifcout and trapif. Adding this instruction removes a dependency on the iflags type, and would allow us to move closer to removing it entirely.
The instruction is defined for the i32 and i64 types only, and is currently only used in the legalization of heap_addr.
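The intended behavior can be modeled roughly as a checked add. This is only a sketch: the function below is a plain-Rust stand-in, `Err` represents the trap, and the error string is illustrative rather than a real trap code.

```rust
/// Hypothetical model of a fused "unsigned add, trap on overflow".
fn uadd_overflow_trap(a: u64, b: u64) -> Result<u64, &'static str> {
    // `checked_add` yields `None` exactly when the unsigned sum wraps.
    a.checked_add(b).ok_or("trap: unsigned add overflowed")
}
```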
As discussed in the 2022/10/19 meeting, this PR removes many of the branch and select instructions that used iflags, in favor of using brz/brnz and select in their place. Additionally, it reworks selectif_spectre_guard to take an i8 input instead of an iflags input.
For reference, the removed instructions are: br_icmp, brif, brff, trueif, trueff, and selectif.
* cranelift: Remove iconst.i128
* bugpoint: Report Changed when only one instruction is mutated
* cranelift: Fix egraph bxor rule
* cranelift: Remove some simple_preopt opts for i128
Remove the boolean types from cranelift, and the associated instructions breduce, bextend, bconst, and bint. Standardize on using 1/0 for the return value from instructions that produce scalar boolean results, and -1/0 for boolean vector elements.
Fixes #3205
Co-authored-by: Afonso Bordado <afonso360@users.noreply.github.com>
Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Co-authored-by: Chris Fallin <chris@cfallin.org>
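The convention adopted with the removal of the boolean types (1/0 for scalar results, -1/0 for vector lanes) can be sketched as follows. The function names and the concrete `i8`/`i32` types are assumptions made for this illustration only:

```rust
/// Scalar comparison result: 1 for true, 0 for false.
fn icmp_eq_scalar(a: i32, b: i32) -> i8 {
    (a == b) as i8
}

/// Vector comparison result: each lane is all ones (-1) for true, 0 for false.
fn icmp_eq_lanes(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    let mut out = [0i32; 4];
    for i in 0..4 {
        out[i] = if a[i] == b[i] { -1 } else { 0 };
    }
    out
}
```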
Ported the existing implementations of the following opcodes for AArch64
to ISLE:
- `Trueif`
- `Trueff`
- `Trapif`
- `Trapff`
- `Select`
- `Selectif`
- `SelectifSpectreGuard`
Copyright (c) 2022 Arm Limited
Improved the instruction lowering for the following opcodes on AArch64,
and introduced support for converting to integers less than 32-bits wide
as per the docs:
- `FcvtToSintSat`
- `FcvtToUintSat`
Copyright (c) 2022 Arm Limited
* Vector bitcast support (AArch64 & Interpreter)
Implemented support for `bitcast` on vector values for AArch64 and the
interpreter.
Also corrected the verifier to ensure that the size, in bits, of the input and
output types match for a `bitcast`, per the docs.
Copyright (c) 2022 Arm Limited
* `I128` same-type bitcast support
Copyright (c) 2022 Arm Limited
* Directly return input for 64-bit GPR<=>GPR bitcast
Copyright (c) 2022 Arm Limited
* Initial forward-edge CFI implementation
Give the user the option to start all basic blocks that are targets
of indirect branches with the BTI instruction introduced by the
Branch Target Identification extension to the Arm instruction set
architecture.
Copyright (c) 2022, Arm Limited.
* Refactor `from_artifacts` to avoid second `make_executable` (#1)
This involves "parsing" twice but this is parsing just the header of an
ELF file so it's not a very intensive operation and should be ok to do
twice.
* Address the code review feedback
Copyright (c) 2022, Arm Limited.
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
* cranelift: Remove of/nof overflow flags from icmp
Neither Wasmtime nor cg-clif use these flags under any circumstances.
From discussion on #3060 I see it's long been unclear what purpose these
flags served.
Fixes #3060, fixes #4406, and fixes #4875... by deleting all the code
that could have been buggy.
This changes the cranelift-fuzzgen input format by removing some IntCC
options, so I've gone ahead and enabled I128 icmp tests at the same
time. Since only the of/nof cases were failing before, I expect these to
work.
* Restore trapif tests
It's still useful to validate that iadd_ifcout's iflags result can be
forwarded correctly to trapif, and for that purpose it doesn't really
matter what condition code is checked.
* cranelift: Add `fcmp` tests
Some of these are disabled on aarch64 due to not being implemented yet.
* cranelift: Implement float PartialEq for Ieee{32,64} (fixes #4828)
Previously `PartialEq` was auto derived. This means that it was implemented in terms of `PartialEq` on the underlying u32.
This is not correct for floats because `NaN != NaN`.
PartialOrd was manually implemented in 6d50099816, but it seems like it was an oversight to leave PartialEq out until now.
The test suite depends on the previous behaviour so we adjust it to keep comparing bits instead of floats.
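The difference between bitwise and float equality can be seen in a small sketch. `Ieee32Bits` is a hypothetical stand-in written for this note, not the real `Ieee32` type:

```rust
/// Hypothetical stand-in for an IEEE-754 single stored as raw bits.
#[derive(Clone, Copy, Debug)]
struct Ieee32Bits(u32);

impl PartialEq for Ieee32Bits {
    fn eq(&self, other: &Self) -> bool {
        // Compare as floats, so NaN != NaN and +0.0 == -0.0.
        f32::from_bits(self.0) == f32::from_bits(other.0)
    }
}
```

Code that wants the old behavior, as the test suite does, can compare the raw bit patterns (`a.0 == b.0` in this sketch) instead.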
* cranelift: Disable `fcmp ord` tests on aarch64
* cranelift: Disable `fcmp ueq` tests on aarch64
* cranelift: Implement `bnot` in interpreter
* cranelift: Register all functions in test file for interpreter
* cranelift: Relax signature checking for bools and vectors
Ported the existing implementation of `fcmp` for AArch64 to ISLE.
This also ports the `lower_vector_comparison` method to ISLE.
Copyright (c) 2022 Arm Limited
Ensure that constants generated for the memory case of XmmMem values are always 16 bytes, so that we don't accidentally perform an unaligned load.
Fixes #4761
* cranelift: Change test runner order
Changes the ordering of runtests to run per target and then per function.
This change doesn't do a lot by itself, but helps future refactorings of runtests.
* cranelift: Rename SingleFunctionCompiler to TestCaseCompiler
* cranelift: Skip runtests per target instead of per run
* cranelift: Deduplicate test names
With the upcoming changes to the runtest infrastructure we require unique ExtNames for all tests.
Note that test names have a 16 character limit, and must be unique within those 16 characters.
* cranelift: Add TestFileCompiler to runtests
TestFileCompiler allows us to compile the entire file once, and then call the trampolines for each test.
The previous code was compiling the function for each invocation of a test.
* cranelift: Deduplicate ExtName for avg_round tests
* cranelift: Rename functions as they are defined.
The JIT internally only deals with User functions, and cannot link test name funcs.
This also caches trampolines by signature.
* cranelift: Preserve original name when reporting errors.
* cranelift: Rename aarch64 test functions
* cranelift: Add `call` and `call_indirect` tests!
* cranelift: Add pauth runtests for aarch64
* cranelift: Rename duplicate s390x tests
* cranelift: Delete `i128_bricmp_of` function from i128-bricmp
It looks like we forgot to delete it when it was moved to
`i128-bricmp-overflow`, and since it didn't have a run invocation
it was never compiled.
However, s390x does not support this, and panics when lowering.
* cranelift: Add `colocated` call tests
* cranelift: Rename *more* `s390x` tests
* cranelift: Add pauth + sign_return_address call tests
* cranelift: Undeduplicate test names
With the latest main changes we now support *unlimited* length test names.
This commit reverts:
52274676ff631c630f9879dd32e756566d3e700f
7989edc172493547cdf63e180bb58365e8a43a42
25c8a8395527d98976be6a34baa3b0b214776739
792e8cfa8f748077f9d80fe7ee5e958b7124e83b
* cranelift: Add LibCall tests
* cranelift: Revert more test names
These weren't auto reverted by the previous revert.
* cranelift: Disable libcall tests for aarch64
* cranelift: Runtest fibonacci tests
* cranelift: Misc cleanup
Lower `shuffle` and `swizzle` in ISLE.
This PR surfaced a bug with the lowering of `shuffle` when avx512vl and avx512vbmi are enabled: we use `vpermi2b` as the implementation, but panic if the immediate shuffle mask contains any out-of-bounds values. The behavior when the avx512 extensions are not present is that out-of-bounds values are turned into `0` in the result.
I've resolved this by detecting when the shuffle immediate has out-of-bounds indices in the avx512-enabled lowering, and generating an additional mask to zero out the lanes where those indices occur. This brings the avx512 case into line with the semantics of the `shuffle` op: 94bcbe8446/cranelift/codegen/meta/src/shared/instructions.rs (L1495-L1498)
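A reference model of the semantics being matched, in plain Rust. This is a sketch with a made-up function name; it assumes a 16-lane byte shuffle whose selector indexes the 32 bytes of `a` followed by `b`, with any selector of 32 or more counting as out of bounds:

```rust
/// Model of a 16-lane byte shuffle over the concatenation of `a` and `b`.
/// Out-of-bounds selector values produce 0 in the result lane.
fn shuffle_model(a: [u8; 16], b: [u8; 16], mask: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (lane, &sel) in mask.iter().enumerate() {
        out[lane] = match sel {
            0..=15 => a[sel as usize],
            16..=31 => b[(sel - 16) as usize],
            // Out-of-bounds index: the lane is zeroed.
            _ => 0,
        };
    }
    out
}
```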
* Port `Fcopysign`..`FcvtToSintSat` to ISLE (AArch64)
Ported the existing implementations of the following opcodes to ISLE on
AArch64:
- `Fcopysign`
- Also introduced missing support for `fcopysign` on vector values, as
per the docs.
- This introduces the vector encoding for the `SLI` machine
instruction.
- `FcvtToUint`
- `FcvtToSint`
- `FcvtFromUint`
- `FcvtFromSint`
- `FcvtToUintSat`
- `FcvtToSintSat`
Copyright (c) 2022 Arm Limited
* Document helpers and abstract conversion checks
* x64: Mask shift amounts for small types
* cranelift: Disable i128 shifts in fuzzer again
They are fixed. But we had a bunch of fuzzgen issues come in, and we don't want to accidentally mark them as fixed.
* cranelift: Avoid masking shifts for 32 and 64 bit cases
* cranelift: Add const shift tests and fix them
* cranelift: Remove const `rotl` cases
Now that `put_masked_in_imm8_gpr` works properly we can simplify rotl/rotr
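Why small types need explicit masking can be shown with a short model. It assumes the shift amount is taken modulo the bit width of the type; `ishl_i8` is a hypothetical function for illustration, not backend code:

```rust
/// Hypothetical model of an 8-bit shift left.
fn ishl_i8(x: u8, amt: u32) -> u8 {
    // Mask the amount to 0..=7 first. A 32-bit hardware shift would
    // only mask to 0..=31, which is wrong for amounts of 8 or more.
    x << (amt & 7)
}
```

For 32- and 64-bit types the hardware's own masking already matches the type width, which is presumably why the explicit mask can be skipped there.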
In order to keep the `ExternalName` enum small, the `TestcaseName`
struct was limited to 17 bytes: a 1 byte length and a 16 byte buffer.
Due to alignment, that made `ExternalName` 20 bytes.
That fixed-size buffer means that the names of functions in Cranelift
filetests are truncated to fit, which limits our ability to give tests
meaningful names. And I think meaningful names are important in tests.
This patch replaces the inline `TestcaseName` buffer with a
heap-allocated slice. We don't care about performance for test names, so
an indirection out to the heap is fine in that case. But we do care
somewhat about the size of `ExternalName` when it's used during
compiles.
On 64-bit systems, `Box<[u8]>` is 16 bytes, so `TestcaseName` gets one
byte smaller. Unfortunately, its alignment is 8 bytes, so `ExternalName`
grows from 20 to 24 bytes.
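The size arithmetic can be checked with `std::mem`. In the sketch below, `InlineName` is a hypothetical reconstruction of the old layout (it is not the real `TestcaseName`), and `sizes` is a helper made up for this note:

```rust
use std::mem::{align_of, size_of};

/// Hypothetical inline layout: a 1-byte length plus a 16-byte buffer.
#[repr(C)]
#[allow(dead_code)]
struct InlineName {
    len: u8,
    buf: [u8; 16],
}

/// Returns (size of the inline layout, size of a boxed slice,
/// alignment of a boxed slice), all in bytes.
fn sizes() -> (usize, usize, usize) {
    (
        size_of::<InlineName>(),
        size_of::<Box<[u8]>>(),
        align_of::<Box<[u8]>>(),
    )
}
```

A boxed slice is a pointer plus a length, so it is two machine words wide and aligned to one machine word. The enclosing enum inherits that alignment, which is consistent with the padding growth described above.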
According to `valgrind --tool=dhat`, this change has very little effect
on compiler performance. Building wasmtime with `--no-default-features
--release`, and compiling the pulldown-cmark benchmark from Sightglass,
I measured these differences between `main` and this patch:
- total number of allocations didn't change (`ExternalName::TestCase` is
not used in normal compiles)
- 592 more bytes allocated over the process lifetime, out of 171.5MiB
- 320 more bytes allocated at peak heap size, out of 12MiB
- 0.24% more instructions executed
- 16,987 more bytes written
- 12,120 _fewer_ bytes read