s390x: Support both big- and little-endian vector lane order (#4682)

This implements the s390x back-end portion of the solution for
https://github.com/bytecodealliance/wasmtime/issues/4566

We now support both big- and little-endian vector lane order
in code generation.  The order used for a function is determined
by the function's ABI: if it uses a Wasmtime ABI, it will use
little-endian lane order, and big-endian lane order otherwise.
(This ensures that all raw_bitcast instructions generated by
both Wasmtime and other Cranelift frontends can always be
implemented as no-ops.)
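The ABI-based selection can be sketched as follows. This is an illustrative sketch only; the enum variants and the function name `lane_order_for_call_conv` are hypothetical, not the backend's actual API:

```rust
// Sketch of choosing a vector lane order from the calling convention.
// The real s390x backend keys this off the function's ABI in a similar
// way, but these names are illustrative.

#[derive(Debug, PartialEq, Clone, Copy)]
enum CallConv {
    SystemZ,         // native s390x ABI (hypothetical variant name)
    WasmtimeSystemV, // a Wasmtime ABI (hypothetical variant name)
}

#[derive(Debug, PartialEq, Clone, Copy)]
enum LaneOrder {
    BigEndian,
    LittleEndian,
}

fn lane_order_for_call_conv(cc: CallConv) -> LaneOrder {
    match cc {
        // Wasmtime ABIs use little-endian lane order so that the
        // raw_bitcasts emitted for Wasm SIMD remain no-ops.
        CallConv::WasmtimeSystemV => LaneOrder::LittleEndian,
        // Every other ABI keeps the native big-endian lane order.
        _ => LaneOrder::BigEndian,
    }
}

fn main() {
    assert_eq!(
        lane_order_for_call_conv(CallConv::WasmtimeSystemV),
        LaneOrder::LittleEndian
    );
    assert_eq!(
        lane_order_for_call_conv(CallConv::SystemZ),
        LaneOrder::BigEndian
    );
}
```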

Lane order affects the implementation of a number of operations:
- Vector immediates
- Vector memory load / store (in big- and little-endian variants)
- Operations explicitly using lane numbers
  (insertlane, extractlane, shuffle, swizzle)
- Operations implicitly using lane numbers
  (iadd_pairwise, narrow/widen, promote/demote, fcvt_low, vhigh_bits)
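For the operations above that use explicit lane numbers, the effect of lane order amounts to remapping a CLIF lane index onto a hardware element number. A minimal sketch of that mapping (illustrative names, not the backend's actual code):

```rust
// Sketch: translate a CLIF lane index to a hardware vector element
// number on s390x. In big-endian lane order the two numberings agree;
// in little-endian lane order the lanes count from the opposite end
// of the register.
fn hw_lane(lane_order_is_le: bool, num_lanes: u8, clif_lane: u8) -> u8 {
    if lane_order_is_le {
        num_lanes - 1 - clif_lane
    } else {
        clif_lane
    }
}

fn main() {
    // Little-endian lane order: CLIF lane 0 of an i32x4 is hardware element 3.
    assert_eq!(hw_lane(true, 4, 0), 3);
    // Big-endian lane order: indices are unchanged.
    assert_eq!(hw_lane(false, 4, 0), 0);
}
```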

In addition, when calling a function using a different lane order,
we need to lane-swap all vector values passed or returned in registers.
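That lane swap is simply a reversal of the element order of each vector value crossing the call boundary; for an i64x2 value it could be sketched as below (the real backend emits a vector-permute instruction rather than operating on arrays):

```rust
// Sketch: lane-swapping a 16-byte vector viewed as two u64 lanes,
// as needed when passing it to a callee using the opposite lane order.
fn lane_swap_i64x2(v: [u64; 2]) -> [u64; 2] {
    [v[1], v[0]]
}

fn main() {
    assert_eq!(lane_swap_i64x2([1, 2]), [2, 1]);
}
```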

A small number of changes to common code were also needed:

- Ensure we always select a Wasmtime calling convention on s390x
  in crates/cranelift (func_signature).

- Fix vector immediates for filetests/runtests.  In PR #4427,
  I attempted to fix this by byte-swapping the V128 value, but
  with the new scheme, we'd instead need to perform a per-lane
  byte swap.  Since we do not know the actual type in write_to_slice
  and read_from_slice, this isn't easily possible.

  Revert this part of PR #4427 again, and instead just mark the
  memory buffer as little-endian when emitting the trampoline;
  the back-end will then emit correct code to load the constant.

- Change a runtest in simd-bitselect-to-vselect.clif to no longer
  make little-endian lane order assumptions.

- Remove runtests in simd-swizzle.clif that make little-endian
  lane order assumptions by relying on implicit type conversion
  when using a non-i16x8 swizzle result type (this feature should
  probably be removed anyway).

Tested with both wasmtime and cg_clif.
commit 67870d1518
parent c1c48b4386
Author: Ulrich Weigand
Date: 2022-08-11 21:10:46 +02:00
Committed by: GitHub

29 changed files with 6584 additions and 593 deletions


@@ -285,10 +285,16 @@ fn make_trampoline(signature: &ir::Signature, isa: &dyn TargetIsa) -> Function {
             // Calculate the type to load from memory, using integers for booleans (no encodings).
             let ty = param.value_type.coerce_bools_to_ints();
+            // We always store vector types in little-endian byte order as DataValue.
+            let mut flags = ir::MemFlags::trusted();
+            if param.value_type.is_vector() {
+                flags.set_endianness(ir::Endianness::Little);
+            }
             // Load the value.
             let loaded = builder.ins().load(
                 ty,
-                ir::MemFlags::trusted(),
+                flags,
                 values_vec_ptr_val,
                 (i * UnboxedValues::SLOT_SIZE) as i32,
             );
@@ -331,9 +337,14 @@ fn make_trampoline(signature: &ir::Signature, isa: &dyn TargetIsa) -> Function {
             } else {
                 *value
             };
+            // We always store vector types in little-endian byte order as DataValue.
+            let mut flags = ir::MemFlags::trusted();
+            if param.value_type.is_vector() {
+                flags.set_endianness(ir::Endianness::Little);
+            }
             // Store the value.
             builder.ins().store(
-                ir::MemFlags::trusted(),
+                flags,
                 value,
                 values_vec_ptr_val,
                 (i * UnboxedValues::SLOT_SIZE) as i32,
@@ -400,11 +411,11 @@ mod test {
         block0(v0: i64, v1: i64):
             v2 = load.f32 notrap aligned v1
             v3 = load.i8 notrap aligned v1+16
-            v4 = load.i64x2 notrap aligned v1+32
+            v4 = load.i64x2 notrap aligned little v1+32
             v5 = load.i8 notrap aligned v1+48
             v6 = icmp_imm ne v5, 0
             v7, v8 = call_indirect sig0, v0(v2, v3, v4, v6)
-            store notrap aligned v7, v1
+            store notrap aligned little v7, v1
             v9 = bint.i64 v8
             store notrap aligned v9, v1+16
             return
return