This patch implements the following wasm SIMD extension for aarch64:
v128.load32_zero and v128.load64_zero instructions
https://github.com/WebAssembly/simd/pull/237
The changes are straightforward:
* No new CLIF instructions are added. Each new wasm instruction is translated
into an existing CLIF scalar load followed by a CLIF `scalar_to_vector`.
* The comment/specification for CLIF `scalar_to_vector` has been changed to
match the actual intended semantics, per consultation with Andrew Brown.
* The translation from `scalar_to_vector` to the aarch64 `fmov` instruction
has been generalised slightly so as to allow both 32- and 64-bit transfers.
* Zero is now special-cased in `lower_constant_f128` in order to avoid a
potentially slow call to `Inst::load_fp_constant128`.
* Once "Allow loads to merge into other operations during instruction
selection in MachInst backends"
(https://github.com/bytecodealliance/wasmtime/issues/2340) lands,
we can use that functionality to pattern-match the pair of CLIF instructions
(scalar load plus `scalar_to_vector`) and emit a single AArch64 instruction.
* A simple filetest has been added.
There is no comprehensive testcase in this commit, because the comprehensive
tests live in a separate repo. The implementation has nevertheless been tested.
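
For reference, the intended semantics of the two new instructions can be
sketched as a small Rust model. This is only an illustration, not code from
this patch: the function names `load32_zero` and `load64_zero` are
hypothetical, linear memory is modelled as a byte slice, and a v128 value is
modelled as a `u128` whose low bits hold lane 0. It mirrors the lowering
described above: a little-endian scalar load, followed by placing the scalar
in the lowest lane with all remaining bits zeroed.

```rust
/// Model of `v128.load32_zero`: load 32 bits (little-endian) from `mem` at
/// `addr`, place them in the low lane, and zero the upper 96 bits.
/// Panics if the access is out of bounds (a real engine would trap).
fn load32_zero(mem: &[u8], addr: usize) -> u128 {
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(&mem[addr..addr + 4]);
    // Scalar load, then zero-extend into the 128-bit vector.
    u128::from(u32::from_le_bytes(bytes))
}

/// Model of `v128.load64_zero`: as above, but with a 64-bit scalar, so only
/// the upper 64 bits are zeroed.
fn load64_zero(mem: &[u8], addr: usize) -> u128 {
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(&mem[addr..addr + 8]);
    u128::from(u64::from_le_bytes(bytes))
}
```

In this model, zero-extending the scalar plays the role of
`scalar_to_vector`, which is why both the 32- and 64-bit widths need to be
supported by the `fmov` lowering.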