CL/aarch64 back end: implement the wasm SIMD bitmask instructions

The `bitmask.{8x16,16x8,32x4}` instructions do not map neatly to any single AArch64 SIMD instruction, and instead need a sequence of around ten instructions. Because of this, this patch is somewhat longer and more complex than it would be for (eg) x64.

Main changes are:

* the relevant testsuite test (`simd_boolean.wast`) has been enabled on aarch64.

* at the CLIF level, add a new instruction `vhigh_bits`, into which these wasm instructions are to be translated.

* in the wasm->CLIF translation (code_translator.rs), translate into `vhigh_bits`. This is straightforward.

* in the CLIF->AArch64 translation (lower_inst.rs), translate `vhigh_bits` into equivalent sequences of AArch64 instructions. There is a different sequence for each of the `{8x16, 16x8, 32x4}` variants.

All other changes are AArch64-specific, and add instruction definitions needed by the previous step:

* Add two new families of AArch64 instructions: `VecShiftImm` (vector shift by immediate) and `VecExtract` (effectively a double-length vector shift)

* To the existing AArch64 family `VecRRR`, add a `zip1` variant. To the `VecLanesOp` family add an `addv` variant.

* Add supporting code for the above changes to AArch64 instructions:
  - getting the register uses (`aarch64_get_regs`)
  - mapping the registers (`aarch64_map_regs`)
  - printing instructions
  - emitting instructions (`impl MachInstEmit for Inst`). The handling of `VecShiftImm` is a bit complex.
  - emission tests for new instructions and variants.
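
For reference, the result that `vhigh_bits` has to produce can be written as a scalar model: bit i of the result is the most significant bit of lane i, and all higher bits are zero. The sketch below is illustrative only. The function names are hypothetical and do not appear in the patch, and it says nothing about the actual AArch64 instruction sequences chosen in lower_inst.rs.

```rust
/// Scalar model of `bitmask.8x16`: collect the top bit of each of the
/// 16 byte lanes into bits 0..15 of the result.
/// (Hypothetical helper, not part of the patch.)
fn bitmask_i8x16(lanes: [i8; 16]) -> u32 {
    lanes
        .iter()
        .enumerate()
        .fold(0u32, |acc, (i, &lane)| acc | ((((lane as u8) >> 7) as u32) << i))
}

/// Scalar model of `bitmask.16x8`: top bit of each 16-bit lane into bits 0..7.
/// (Hypothetical helper, not part of the patch.)
fn bitmask_i16x8(lanes: [i16; 8]) -> u32 {
    lanes
        .iter()
        .enumerate()
        .fold(0u32, |acc, (i, &lane)| acc | ((((lane as u16) >> 15) as u32) << i))
}

/// Scalar model of `bitmask.32x4`: top bit of each 32-bit lane into bits 0..3.
/// (Hypothetical helper, not part of the patch.)
fn bitmask_i32x4(lanes: [i32; 4]) -> u32 {
    lanes
        .iter()
        .enumerate()
        .fold(0u32, |acc, (i, &lane)| acc | (((lane as u32) >> 31) << i))
}
```

A model like this could serve as an oracle when checking a lowered sequence against known inputs, since only the sign bit of each lane contributes to the result.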
2020-10-22 16:02:46 +02:00
parent b10e027fef
commit 2702942050
8 changed files with 570 additions and 5 deletions
--- a/cranelift/wasm/src/code_translator.rs
+++ b/cranelift/wasm/src/code_translator.rs
@@ -1600,6 +1600,10 @@ pub fn translate_operator<FE: FuncEnvironment + ?Sized>(
            let bool_result = builder.ins().vall_true(a);
            state.push1(builder.ins().bint(I32, bool_result))
        }
+        Operator::I8x16Bitmask | Operator::I16x8Bitmask | Operator::I32x4Bitmask => {
+            let a = pop1_with_bitcast(state, type_of(op), builder);
+            state.push1(builder.ins().vhigh_bits(I32, a));
+        }
        Operator::I8x16Eq | Operator::I16x8Eq | Operator::I32x4Eq => {
            translate_vector_icmp(IntCC::Equal, type_of(op), builder, state)
        }
@@ -1763,10 +1767,7 @@ pub fn translate_operator<FE: FuncEnvironment + ?Sized>(
        | Operator::F64x2Trunc
        | Operator::F64x2PMin
        | Operator::F64x2PMax
-        | Operator::F64x2Nearest
-        | Operator::I8x16Bitmask
-        | Operator::I16x8Bitmask
-        | Operator::I32x4Bitmask => {
+        | Operator::F64x2Nearest => {
            return Err(wasm_unsupported!("proposed SIMD operator {:?}", op));
        }