In order to register traps for `load_splat`, several instruction formats need knowledge of `SourceLoc`s; however, since the x64 backend does not correctly and completely register traps for `RegMem::Mem` variants I opened https://github.com/bytecodealliance/wasmtime/issues/2290 to discuss and resolve this issue. In the meantime, the current behavior (i.e. remaining largely unaware of `SourceLoc`s) is retained.
This change abstracts away (from the perspective of the new backend) how immediate values are stored in InstructionData. It gathers large immediates from necessary places (e.g. constant pool) and delegates to `InstructionData::imm_value` for the rest. This refactor only touches original users of `LowerCtx::get_immediate` but a future change could do the same for any place the new backend is accessing InstructionData directly to retrieve immediates.
This PR updates the AArch64 ABI implementation so that it (i) properly
respects that v8-v15 inclusive have callee-save lower halves, and
caller-save upper halves, by conservatively approximating (to full
registers) in the appropriate directions when generating prologue
caller-saves and when informing the regalloc of clobbered regs across
callsites.
In order to prevent saving all of these vector registers in the prologue
of every non-leaf function due to the above approximation, this also
makes use of a new regalloc.rs feature to exclude call instructions'
writes from the clobber set returned by register allocation. This is
safe whenever the caller and callee have the same ABI (because anything
the callee could clobber, the caller is allowed to clobber as well
without saving it in the prologue).
Fixes#2254.
This approach is not the best but avoids an extra instruction; perhaps at some point, as mentioned in https://github.com/bytecodealliance/wasmtime/pull/2248, we will add the extra instruction or refactor things in such a way that this `Inst` variant is unnecessary.
In particular:
- try to optimize the integer emission into a 32-bit emission, when the
high bits are all zero, and stop relying on the caller of `imm_r` to
ensure this.
- rename `Inst::imm_r`/`Inst::Imm_R` to `Inst::imm`/`Inst::Imm`.
- generate a sign-extending mov 32-bit immediate to 64-bits, whenever
possible.
- fix a few places where the previous commit did introduce the
generation of zero-constants with xor, when calling `put_input_to_reg`,
thus clobbering the flags before they were read.
Eventually, we should be able to unify this function's implementation
with the aarch64 one; but the latter does much more, and this would
require abstractions brought up in another pending PR#2142.
Previously, in #2128, we factored out a common "vanilla 64-bit ABI"
implementation from the AArch64 ABI code, with the idea that this should
be largely compatible with x64. This PR alters the new x64 backend to
make use of the shared infrastructure, removing the duplication that
existed previously. The generated code is nearly (not exactly) the same;
the only difference relates to how the clobber-save region is padded in
the prologue.
This also changes some register allocations in the aarch64 code because
call support in the shared ABI infra now passes a temp vreg in, rather
than requiring use of a fixed, non-allocable temp; tests have been
updated, and the runtime behavior is unchanged.
The logic for generation of shifts-by-immediate was not quite right. The result was that even
shifts by an amount known at compile time were being done by moving the shift immediate into %cl
and then doing a variable shift by %cl. The effect is worse than it sounds, because all of
those shift constants are small and often used in multiple places, so they were GVN'd up and
often ended up at the entry block of the function. Hence these were connected to the use points
by long live ranges which got spilled. So all in all, most of the win here comes from avoiding
spilling.
The problem was caused by this line, in the `Opcode::Ishl | Opcode::Ushr ..` case:
```
let (count, rhs) = if let Some(cst) = ctx.get_constant(inputs[1].insn) {
```
`inputs[]` appears to refer to this CLIF instruction's inputs, and bizarrely `inputs[].insn` all
refer to the instruction (the shift) itself. Hence `ctx.get_constant(inputs[1].insn)` asks
"does this shift instruction produce a constant" to which the answer is always "no", so the
shift-by-unknown amount code is always generated. The fix here is to change that expression to
```
let (count, rhs) = if let Some(cst) = ctx.get_input(insn, 1).constant {
```
`get_input`'s result conveniently includes a `constant` field of type `Option<u64>`, so we just
use that instead.