CL/aarch64: implement the wasm SIMD v128.load{32,64}_zero instructions.

This patch implements, for aarch64, the following wasm SIMD extensions.

  v128.load32_zero and v128.load64_zero instructions
  https://github.com/WebAssembly/simd/pull/237

The changes are straightforward:

* no new CLIF instructions.  They are translated into an existing CLIF scalar
  load followed by a CLIF `scalar_to_vector`.

* the comment/specification for CLIF `scalar_to_vector` has been changed to
  match the actual intended semantics, per consulation with Andrew Brown.

* translation from `scalar_to_vector` to aarch64 `fmov` instruction.  This
  has been generalised slightly so as to allow both 32- and 64-bit transfers.

* special-case zero in `lower_constant_f128` in order to avoid a
  potentially slow call to `Inst::load_fp_constant128`.

* Once "Allow loads to merge into other operations during instruction
  selection in MachInst backends"
  (https://github.com/bytecodealliance/wasmtime/issues/2340) lands,
  we can use that functionality to pattern match the two-CLIF pair and
  emit a single AArch64 instruction.

* A simple filetest has been added.

There is no comprehensive testcase in this commit, because that is a separate
repo.  The implementation has been tested, nevertheless.
This commit is contained in:
Julian Seward
2020-11-03 17:34:07 +01:00
committed by julian-seward1
parent 285edeec3e
commit dd9bfcefaa
9 changed files with 144 additions and 35 deletions

View File

@@ -0,0 +1,33 @@
test compile
target aarch64
function %f1() -> i64x2 {
block0:
v0 = iconst.i64 281474976710657
v1 = scalar_to_vector.i64x2 v0
return v1
}
; check: stp fp, lr, [sp, #-16]!
; nextln: mov fp, sp
; nextln: movz x0, #1
; nextln: movk x0, #1, LSL #48
; nextln: fmov d0, x0
; nextln: mov sp, fp
; nextln: ldp fp, lr, [sp], #16
; nextln: ret
function %f2() -> i32x4 {
block0:
v0 = iconst.i32 42679
v1 = scalar_to_vector.i32x4 v0
return v1
}
; check: stp fp, lr, [sp, #-16]!
; nextln: mov fp, sp
; nextln: movz x0, #42679
; nextln: fmov s0, w0
; nextln: mov sp, fp
; nextln: ldp fp, lr, [sp], #16
; nextln: ret