The implementation is pretty straightforward. Wasm atomic instructions fall
into 5 groups
* atomic read-modify-write
* atomic compare-and-swap
* atomic loads
* atomic stores
* fences
and the implementation mirrors that structure, at both the CLIF and AArch64
levels.
At the CLIF level, there are five new instructions, one for each group. Some
comments about these:
* for those that take addresses (all except fences), the address is contained
entirely in a single `Value`; there is no offset field as there is with
normal loads and stores. Wasm atomics require alignment checks, and
removing the offset makes implementation of those checks a bit simpler.
* atomic loads and stores get their own instructions, rather than reusing the
existing load and store instructions, for two reasons:
- per above comment, makes alignment checking simpler
- reuse of existing loads and stores would require extension of `MemFlags`
to indicate atomicity, which sounds semantically unclean. For example,
then *any* instruction carrying `MemFlags` could be marked as atomic, even
in cases where it is meaningless or ambiguous.
* I tried to specify, in comments, the behaviour of these instructions as
tightly as I could. Unfortunately there is no way (per my limited CLIF
knowledge) to enforce the constraint that they may only be used on I8, I16,
I32 and I64 types, and in particular not on floating point or vector types.
The translation from Wasm to CLIF, in `code_translator.rs` is unremarkable.
At the AArch64 level, there are also five new instructions, one for each
group. All of them except `::Fence` contain multiple real machine
instructions. Atomic r-m-w and atomic c-a-s are emitted as the usual
load-linked store-conditional loops, guarded at both ends by memory fences.
Atomic loads and stores are emitted as a load preceded by a fence, and a store
followed by a fence, respectively. The amount of fencing may be overkill, but
it reflects exactly what the SM Wasm baseline compiler for AArch64 does.
One reason to implement r-m-w and c-a-s as a single insn which is expanded
only at emission time is that we must be very careful what instructions we
allow in between the load-linked and store-conditional. In particular, we
cannot allow *any* extra memory transactions in there, since -- particularly
on low-end hardware -- that might cause the transaction to fail, hence
deadlocking the generated code. That implies that we can't present the LL/SC
loop to the register allocator as its constituent instructions, since it might
insert spills anywhere. Hence we must present it as a single indivisible
unit, as we do here. It also has the benefit of reducing the total amount of
work the RA has to do.
The only other notable feature of the r-m-w and c-a-s translations into
AArch64 code, is that they both need a scratch register internally. Rather
than faking one up by claiming, in `get_regs` that it modifies an extra
scratch register, and having to have a dummy initialisation of it, these new
instructions (`::LLSC` and `::CAS`) simply use fixed registers in the range
x24-x28. We rely on the RA's ability to coalesce V<-->R copies to make the
cost of the resulting extra copies zero or almost zero. x24-x28 are chosen so
as to be call-clobbered, hence their use is less likely to interfere with long
live ranges that span calls.
One subtlety regarding the use of completely fixed input and output registers
is that we must be careful how the surrounding copy from/to of the arg/result
registers is done. In particular, it is not safe to simply emit copies in
some arbitrary order if one of the arg registers is a real reg. For that
reason, the arguments are first moved into virtual regs if they are not
already there, using a new method `<LowerCtx for Lower>::ensure_in_vreg`.
Again, we rely on coalescing to turn them into no-ops in the common case.
There is also a ridealong fix for the AArch64 lowering case for
`Opcode::Trapif | Opcode::Trapff`, which removes a bug in which two trap insns
in a row were generated.
In the patch as submitted there are 6 "FIXME JRS" comments, which mark things
which I believe to be correct, but for which I would appreciate a second
opinion. Unless otherwise directed, I will remove them for the final commit
but leave the associated code/comments unchanged.
The InsertLane format has an ordering (`value().imm().value()`) and immediate name (`"lane"`) that make it awkward to use for other instructions. This changes the ordering (`value().value().imm()`) and uses the default name (`"imm"`) throughout the codebase.
Since we now allow constants of any size, we have to verify that `vconst` (currently the only user of the constant pool) is accessing constants that match its controlling type.
- Undo temporary changes to default features (`all-arch`) and a
signal-handler test.
- Remove `SIGTRAP` handler: no longer needed now that we've found an
"undefined opcode" option on ARM64.
- Rename pp.rs to pretty_print.rs in machinst/.
- Only use empty stack-probe on non-x86. As per a comment in
rust-lang/compiler-builtins [1], LLVM only supports stack probes on
x86 and x86-64. Thus, on any other CPU architecture, we cannot refer
to `__rust_probestack`, because it does not exist.
- Rename arm64 to aarch64.
- Use `target` directive in vcode filetests.
- Run the flags verifier, but without encinfo, when using new backends.
- Clean up warning overrides.
- Fix up use of casts: use u32::from(x) and siblings when possible,
u32::try_from(x).unwrap() when not, to avoid silent truncation.
- Take immutable `Function` borrows as input; we don't actually
mutate the input IR.
- Lots of other miscellaneous cleanups.
[1] cae3e6ea23/src/probestack.rs (L39)
This patch ties together the new backend infrastructure with the
existing Cranelift codegen APIs.
With all patches in this series up to this patch applied, the ARM64
compiler is now functional and can be used. Two uses of this
functionality -- filecheck-based tests and integration into wasmtime --
will come in subsequent patches.
* Manually rename BasicBlock to BlockPredecessor
BasicBlock is a pair of (Ebb, Inst) that is used to represent the
basic block subcomponent of an Ebb that is a predecessor to an Ebb.
Eventually we will be able to remove this struct, but for now it
makes sense to give it a non-conflicting name so that we can start
to transition Ebb to represent a basic block.
I have not updated any comments that refer to BasicBlock, as
eventually we will remove BlockPredecessor and replace with Block,
which is a basic block, so the comments will become correct.
* Manually rename SSABuilder block types to avoid conflict
SSABuilder has its own Block and BlockData types. These along with
associated identifier will cause conflicts in a later commit, so
they are renamed to be more verbose here.
* Automatically rename 'Ebb' to 'Block' in *.rs
* Automatically rename 'EBB' to 'block' in *.rs
* Automatically rename 'ebb' to 'block' in *.rs
* Automatically rename 'extended basic block' to 'basic block' in *.rs
* Automatically rename 'an basic block' to 'a basic block' in *.rs
* Manually update comment for `Block`
`Block`'s wikipedia article required an update.
* Automatically rename 'an `Block`' to 'a `Block`' in *.rs
* Automatically rename 'extended_basic_block' to 'basic_block' in *.rs
* Automatically rename 'ebb' to 'block' in *.clif
* Manually rename clif constant that contains 'ebb' as substring to avoid conflict
* Automatically rename filecheck uses of 'EBB' to 'BB'
'regex: EBB' -> 'regex: BB'
'$EBB' -> '$BB'
* Automatically rename 'EBB' 'Ebb' to 'block' in *.clif
* Automatically rename 'an block' to 'a block' in *.clif
* Fix broken testcase when function name length increases
Test function names are limited to 16 characters. This causes
the new longer name to be truncated and fail a filecheck test. An
outdated comment was also fixed.
* All: Drop 'basic-blocks' feature
This makes it so that 'basic-blocks' cannot be disabled and we can
start assuming it everywhere.
* Tests: Replace non-bb filetests with bb version
* Tests: Adapt solver-fixedconflict filetests to use basic blocks
This removes `report!`, `fatal!`, and `nonfatal!` from the verifier code and replaces them with methods on `VerifierErrors`. In order to maintain similar ease-of-use, `VerifierError` is expanded with several `From` implementations that convert a tuple to a verifier error.
The failure crate invents its own traits that don't use
std::error::Error (because failure predates certain features added to
Error); this prevents using ? on an error from failure in a function
using Error. The thiserror crate integrates with the standard Error
trait instead.
-Add resumable_trap, safepoint, isnull, and null instructions
-Add Stackmap struct and StackmapSink trait
Co-authored-by: Mir Ahmed <mirahmed753@gmail.com>
Co-authored-by: Dan Gohman <sunfish@mozilla.com>
This allows prefixing BB-specific code with "#[cfg(feature = "basic-blocks")]",
which avoids having to reference an environment variable across the codebase.
The easiest way to enable the feature locally is to add the arguments
'features = ["basic-blocks"]' to the workspace Cargo.toml, where it defines
the cranelift-codegen dependency.
This is also https://bugzilla.mozilla.org/show_bug.cgi?id=1552737.
Cranelift currently has a tendency to create redundant copies (self-copies) of
values from a stack slot back to the same stack slot. This generates a
pointless load and store and an unnecessary register use. The copies are
created by `visit_inst` in regalloc/reload.rs. They appear to occur mostly,
but not exclusively, at loop heads. It's unclear why this happens.
This patch adds a special case to `visit_inst` to find such copies. They are
converted into a new instruction, `copy_nop`, which takes and produces the
same SSA names, so as not to break any of the SSA invariants, but which has a
zero-length encoding, hence removing the copy at emission time.
`copy_nop`s source and destination operands must be stack slots and of course
the *same* stack slot. The verifier has been enhanced to check this, since
misuse of `copy_nop` will likely lead to hard-to-find incorrect-code bugs.
Attempts were made to write a standalone .clif test case. But these failed
because it appears the .clif parser accepts but ignores location hints that
are stack slots. So it's impossible to write, in clif, the exact form of
`copy` instruction that triggers the transformation.