wasmtime/cranelift/filetests/filetests/isa/aarch64/arithmetic.clif
Chris Fallin, commit 2d5db92a9e: Rework/simplify unwind infrastructure and implement Windows unwind.
Our previous implementation of unwind infrastructure was somewhat
complex and brittle: it parsed generated instructions in order to
reverse-engineer unwind info from prologues. It also relied on some
fragile linkage to communicate instruction-layout information that VCode
was not designed to provide.

A much simpler, more reliable, and easier-to-reason-about approach is to
embed unwind directives as pseudo-instructions in the prologue as we
generate it. That way, we can say what we mean and just emit it
directly.
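
As a sketch (the directive names here are illustrative, not the exact
pseudo-instruction spellings), prologue emission now produces something
like:

    stp fp, lr, [sp, #-16]!     ; real instruction: save the FP/LR pair
    <unwind: frame regs saved>  ; illustrative directive recorded alongside it
    mov fp, sp                  ; real instruction: establish the frame pointer
    <unwind: new frame defined> ; illustrative directive describing the FP setup

Each directive carries exactly the facts the unwind-info writer needs,
so the platform-specific formats can be produced later without parsing
the machine code.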

The usual reasoning that leads to the reverse-engineering approach is
that metadata is hard to keep in sync across optimization passes; but
here, (i) prologues are generated at the very end of the pipeline, and
(ii) if we ever do a post-prologue-gen optimization, we can treat unwind
directives as black boxes with unknown side-effects, just as we do for
some other pseudo-instructions today.

It turns out that it was easier to just build this for both x64 and
aarch64 (since they share a factored-out ABI implementation), and wire
up the platform-specific unwind-info generation for Windows and SystemV.
Now we have simpler unwind on all platforms and we can delete the old
unwind infra as soon as we remove the old backend.

There were a few consequences to supporting Fastcall unwind in
particular that led to a refactor of the common ABI. Windows only
supports naming clobbered-register save locations within 240 bytes of
the frame-pointer register, whichever register one chooses for that role (RSP or
RBP). We had previously saved clobbers below the fixed frame (and below
nominal-SP). The 240-byte range has to include the old RBP too, so we're
forced to place clobbers at the top of the frame, just below saved
RBP/RIP. This is fine; we always keep a frame pointer anyway because we
use it to refer to stack args. It does mean that offsets of fixed-frame
slots (spillslots, stackslots) from RBP are no longer known before we do
regalloc, so if we ever want to index these off of RBP rather than
nominal-SP because we add support for `alloca` (dynamic frame growth),
then we'll need a "nominal-BP" mode that is resolved after regalloc and
clobber-save code is generated. I added a comment to this effect in
`abi_impl.rs`.
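
Roughly, the frame layout this implies is (a sketch, not an exact
diagram; higher addresses at the top):

    | stack args                |
    | return address            |
    | saved RBP                 |  <- frame pointer (RBP)
    | clobbered-register saves  |  <- must stay within 240 bytes of RBP
    | spillslots / stackslots   |  (fixed frame; RBP-relative offsets
    |                           |   known only after regalloc)
                                   <- nominal-SP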

The above refactor touched both x64 and aarch64 because of shared code.
A further consequence was that the old aarch64 prologue generation
subtracted from `sp` once to allocate space, then used stores to `[sp,
offset]` to save clobbers. Unfortunately the offset only has 7-bit
range, so if there are enough clobbered registers (and there can be --
aarch64 has 384 bytes of registers; at least one unit test hits this)
the stores/loads will be out-of-range. I really don't want to synthesize
large-offset sequences here; better to go back to the simpler
pre-index/post-index `stp r1, r2, [sp, #-16]!` form that works just like
a "push". It's likely not much worse microarchitecturally (dependence
chain on SP, but oh well) and it actually saves an instruction if
there's no other frame to allocate. As a further advantage, it's much
simpler to understand; simpler is usually better.
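
For comparison (register choice and sizes illustrative), the two
clobber-save styles look like:

    ; old: one SP adjustment, then offset stores; imm7 can go out of range
    sub  sp, sp, #N
    stp  x19, x20, [sp, #offset]
    ...

    ; new: pre-index "push" in the prologue, post-index "pop" in the epilogue
    stp  x19, x20, [sp, #-16]!
    ...
    ldp  x19, x20, [sp], #16

The check lines below use the same pre-index/post-index pattern for the
FP/LR frame-record pair in every function.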

This PR adds the new backend on Windows to CI as well.
2021-03-11 20:03:52 -08:00

test compile
set unwind_info=false
target aarch64
function %f1(i64, i64) -> i64 {
block0(v0: i64, v1: i64):
v2 = iadd.i64 v0, v1
return v2
}
; check: stp fp, lr, [sp, #-16]!
; nextln: mov fp, sp
; nextln: add x0, x0, x1
; nextln: ldp fp, lr, [sp], #16
; nextln: ret
function %f2(i64, i64) -> i64 {
block0(v0: i64, v1: i64):
v2 = isub.i64 v0, v1
return v2
}
; check: stp fp, lr, [sp, #-16]!
; nextln: mov fp, sp
; nextln: sub x0, x0, x1
; nextln: ldp fp, lr, [sp], #16
; nextln: ret
function %f3(i64, i64) -> i64 {
block0(v0: i64, v1: i64):
v2 = imul.i64 v0, v1
return v2
}
; check: stp fp, lr, [sp, #-16]!
; nextln: mov fp, sp
; nextln: madd x0, x0, x1, xzr
; nextln: ldp fp, lr, [sp], #16
; nextln: ret
function %f4(i64, i64) -> i64 {
block0(v0: i64, v1: i64):
v2 = umulhi.i64 v0, v1
return v2
}
; check: stp fp, lr, [sp, #-16]!
; nextln: mov fp, sp
; nextln: umulh x0, x0, x1
; nextln: ldp fp, lr, [sp], #16
; nextln: ret
function %f5(i64, i64) -> i64 {
block0(v0: i64, v1: i64):
v2 = smulhi.i64 v0, v1
return v2
}
; check: stp fp, lr, [sp, #-16]!
; nextln: mov fp, sp
; nextln: smulh x0, x0, x1
; nextln: ldp fp, lr, [sp], #16
; nextln: ret
function %f6(i64, i64) -> i64 {
block0(v0: i64, v1: i64):
v2 = sdiv.i64 v0, v1
return v2
}
; check: stp fp, lr, [sp, #-16]!
; nextln: mov fp, sp
; nextln: sdiv x2, x0, x1
; nextln: cbnz x1, 8 ; udf
; nextln: adds xzr, x1, #1
; nextln: ccmp x0, #1, #nzcv, eq
; nextln: b.vc 8 ; udf
; nextln: mov x0, x2
; nextln: ldp fp, lr, [sp], #16
; nextln: ret
function %f7(i64) -> i64 {
block0(v0: i64):
v1 = iconst.i64 2
v2 = sdiv.i64 v0, v1
return v2
}
; check: stp fp, lr, [sp, #-16]!
; nextln: mov fp, sp
; nextln: movz x2, #2
; nextln: sdiv x1, x0, x2
; nextln: cbnz x2, 8 ; udf
; nextln: adds xzr, x2, #1
; nextln: ccmp x0, #1, #nzcv, eq
; nextln: b.vc 8 ; udf
; nextln: mov x0, x1
; nextln: ldp fp, lr, [sp], #16
; nextln: ret
function %f8(i64, i64) -> i64 {
block0(v0: i64, v1: i64):
v2 = udiv.i64 v0, v1
return v2
}
; check: stp fp, lr, [sp, #-16]!
; nextln: mov fp, sp
; nextln: udiv x0, x0, x1
; nextln: cbnz x1, 8 ; udf
; nextln: ldp fp, lr, [sp], #16
; nextln: ret
function %f9(i64) -> i64 {
block0(v0: i64):
v1 = iconst.i64 2
v2 = udiv.i64 v0, v1
return v2
}
; check: stp fp, lr, [sp, #-16]!
; nextln: mov fp, sp
; nextln: movz x1, #2
; nextln: udiv x0, x0, x1
; nextln: cbnz x1, 8 ; udf
; nextln: ldp fp, lr, [sp], #16
; nextln: ret
function %f10(i64, i64) -> i64 {
block0(v0: i64, v1: i64):
v2 = srem.i64 v0, v1
return v2
}
; check: stp fp, lr, [sp, #-16]!
; nextln: mov fp, sp
; nextln: sdiv x2, x0, x1
; nextln: cbnz x1, 8 ; udf
; nextln: msub x0, x2, x1, x0
; nextln: ldp fp, lr, [sp], #16
; nextln: ret
function %f11(i64, i64) -> i64 {
block0(v0: i64, v1: i64):
v2 = urem.i64 v0, v1
return v2
}
; check: stp fp, lr, [sp, #-16]!
; nextln: mov fp, sp
; nextln: udiv x2, x0, x1
; nextln: cbnz x1, 8 ; udf
; nextln: msub x0, x2, x1, x0
; nextln: ldp fp, lr, [sp], #16
; nextln: ret
function %f12(i32, i32) -> i32 {
block0(v0: i32, v1: i32):
v2 = sdiv.i32 v0, v1
return v2
}
; check: stp fp, lr, [sp, #-16]!
; nextln: mov fp, sp
; nextln: sxtw x3, w0
; nextln: sxtw x2, w1
; nextln: sdiv x0, x3, x2
; nextln: cbnz x2, 8 ; udf
; nextln: adds wzr, w2, #1
; nextln: ccmp w3, #1, #nzcv, eq
; nextln: b.vc 8 ; udf
; nextln: ldp fp, lr, [sp], #16
; nextln: ret
function %f13(i32) -> i32 {
block0(v0: i32):
v1 = iconst.i32 2
v2 = sdiv.i32 v0, v1
return v2
}
; check: stp fp, lr, [sp, #-16]!
; nextln: mov fp, sp
; nextln: sxtw x0, w0
; nextln: movz x1, #2
; nextln: sxtw x2, w1
; nextln: sdiv x1, x0, x2
; nextln: cbnz x2, 8 ; udf
; nextln: adds wzr, w2, #1
; nextln: ccmp w0, #1, #nzcv, eq
; nextln: b.vc 8 ; udf
; nextln: mov x0, x1
; nextln: ldp fp, lr, [sp], #16
; nextln: ret
function %f14(i32, i32) -> i32 {
block0(v0: i32, v1: i32):
v2 = udiv.i32 v0, v1
return v2
}
; check: stp fp, lr, [sp, #-16]!
; nextln: mov fp, sp
; nextln: mov w0, w0
; nextln: mov w1, w1
; nextln: udiv x0, x0, x1
; nextln: cbnz x1, 8 ; udf
; nextln: ldp fp, lr, [sp], #16
; nextln: ret
function %f15(i32) -> i32 {
block0(v0: i32):
v1 = iconst.i32 2
v2 = udiv.i32 v0, v1
return v2
}
; check: stp fp, lr, [sp, #-16]!
; nextln: mov fp, sp
; nextln: mov w0, w0
; nextln: movz x1, #2
; nextln: udiv x0, x0, x1
; nextln: cbnz x1, 8 ; udf
; nextln: ldp fp, lr, [sp], #16
; nextln: ret
function %f16(i32, i32) -> i32 {
block0(v0: i32, v1: i32):
v2 = srem.i32 v0, v1
return v2
}
; check: stp fp, lr, [sp, #-16]!
; nextln: mov fp, sp
; nextln: sxtw x0, w0
; nextln: sxtw x1, w1
; nextln: sdiv x2, x0, x1
; nextln: cbnz x1, 8 ; udf
; nextln: msub x0, x2, x1, x0
; nextln: ldp fp, lr, [sp], #16
; nextln: ret
function %f17(i32, i32) -> i32 {
block0(v0: i32, v1: i32):
v2 = urem.i32 v0, v1
return v2
}
; check: stp fp, lr, [sp, #-16]!
; nextln: mov fp, sp
; nextln: mov w0, w0
; nextln: mov w1, w1
; nextln: udiv x2, x0, x1
; nextln: cbnz x1, 8 ; udf
; nextln: msub x0, x2, x1, x0
; nextln: ldp fp, lr, [sp], #16
; nextln: ret
function %f18(i64, i64) -> i64 {
block0(v0: i64, v1: i64):
v2 = band.i64 v0, v1
return v2
}
; check: stp fp, lr, [sp, #-16]!
; nextln: mov fp, sp
; nextln: and x0, x0, x1
; nextln: ldp fp, lr, [sp], #16
; nextln: ret
function %f19(i64, i64) -> i64 {
block0(v0: i64, v1: i64):
v2 = bor.i64 v0, v1
return v2
}
; check: stp fp, lr, [sp, #-16]!
; nextln: mov fp, sp
; nextln: orr x0, x0, x1
; nextln: ldp fp, lr, [sp], #16
; nextln: ret
function %f20(i64, i64) -> i64 {
block0(v0: i64, v1: i64):
v2 = bxor.i64 v0, v1
return v2
}
; check: stp fp, lr, [sp, #-16]!
; nextln: mov fp, sp
; nextln: eor x0, x0, x1
; nextln: ldp fp, lr, [sp], #16
; nextln: ret
function %f21(i64, i64) -> i64 {
block0(v0: i64, v1: i64):
v2 = band_not.i64 v0, v1
return v2
}
; check: stp fp, lr, [sp, #-16]!
; nextln: mov fp, sp
; nextln: bic x0, x0, x1
; nextln: ldp fp, lr, [sp], #16
; nextln: ret
function %f22(i64, i64) -> i64 {
block0(v0: i64, v1: i64):
v2 = bor_not.i64 v0, v1
return v2
}
; check: stp fp, lr, [sp, #-16]!
; nextln: mov fp, sp
; nextln: orn x0, x0, x1
; nextln: ldp fp, lr, [sp], #16
; nextln: ret
function %f23(i64, i64) -> i64 {
block0(v0: i64, v1: i64):
v2 = bxor_not.i64 v0, v1
return v2
}
; check: stp fp, lr, [sp, #-16]!
; nextln: mov fp, sp
; nextln: eon x0, x0, x1
; nextln: ldp fp, lr, [sp], #16
; nextln: ret
function %f24(i64, i64) -> i64 {
block0(v0: i64, v1: i64):
v2 = bnot.i64 v0
return v2
}
; check: stp fp, lr, [sp, #-16]!
; nextln: mov fp, sp
; nextln: orn x0, xzr, x0
; nextln: ldp fp, lr, [sp], #16
; nextln: ret
function %f25(i32, i32) -> i32 {
block0(v0: i32, v1: i32):
v2 = iconst.i32 53
v3 = ishl.i32 v0, v2
v4 = isub.i32 v1, v3
return v4
}
; check: stp fp, lr, [sp, #-16]!
; nextln: mov fp, sp
; nextln: sub w0, w1, w0, LSL 21
; nextln: ldp fp, lr, [sp], #16
; nextln: ret
function %f26(i32) -> i32 {
block0(v0: i32):
v1 = iconst.i32 -1
v2 = iadd.i32 v0, v1
return v2
}
; check: stp fp, lr, [sp, #-16]!
; nextln: mov fp, sp
; nextln: sub w0, w0, #1
; nextln: ldp fp, lr, [sp], #16
; nextln: ret
function %f27(i32) -> i32 {
block0(v0: i32):
v1 = iconst.i32 -1
v2 = isub.i32 v0, v1
return v2
}
; check: stp fp, lr, [sp, #-16]!
; nextln: mov fp, sp
; nextln: add w0, w0, #1
; nextln: ldp fp, lr, [sp], #16
; nextln: ret
function %f28(i64) -> i64 {
block0(v0: i64):
v1 = iconst.i64 -1
v2 = isub.i64 v0, v1
return v2
}
; check: stp fp, lr, [sp, #-16]!
; nextln: mov fp, sp
; nextln: add x0, x0, #1
; nextln: ldp fp, lr, [sp], #16
; nextln: ret
function %f29(i64) -> i64 {
block0(v0: i64):
v1 = iconst.i64 1
v2 = ineg v1
return v2
}
; check: stp fp, lr, [sp, #-16]!
; nextln: mov fp, sp
; nextln: movz x0, #1
; nextln: sub x0, xzr, x0
; nextln: ldp fp, lr, [sp], #16
; nextln: ret
function %f30(i8x16) -> i8x16 {
block0(v0: i8x16):
v1 = iconst.i64 1
v2 = ushr.i8x16 v0, v1
return v2
}
; check: stp fp, lr, [sp, #-16]!
; nextln: mov fp, sp
; nextln: movz x0, #1
; nextln: sub w0, wzr, w0
; nextln: dup v1.16b, w0
; nextln: ushl v0.16b, v0.16b, v1.16b
; nextln: ldp fp, lr, [sp], #16
; nextln: ret