s390x: Support both big- and little-endian vector lane order (#4682)
This implements the s390x back-end portion of the solution for
https://github.com/bytecodealliance/wasmtime/issues/4566

We now support both big- and little-endian vector lane order
in code generation. The order used for a function is determined
by the function's ABI: if it uses a Wasmtime ABI, it will use
little-endian lane order, and big-endian lane order otherwise.
(This ensures that all raw_bitcast instructions generated by
both wasmtime and other cranelift frontends can always be
implemented as a no-op.)

Lane order affects the implementation of a number of operations:
- Vector immediates
- Vector memory load / store (in big- and little-endian variants)
- Operations explicitly using lane numbers
  (insertlane, extractlane, shuffle, swizzle)
- Operations implicitly using lane numbers
  (iadd_pairwise, narrow/widen, promote/demote, fcvt_low, vhigh_bits)

In addition, when calling a function using a different lane order,
we need to lane-swap all vector values passed or returned in registers.

A small number of changes to common code were also needed:

- Ensure we always select a Wasmtime calling convention on s390x
  in crates/cranelift (func_signature).

- Fix vector immediates for filetests/runtests. In PR #4427,
  I attempted to fix this by byte-swapping the V128 value, but
  with the new scheme, we'd instead need to perform a per-lane
  byte swap. Since we do not know the actual type in write_to_slice
  and read_from_slice, this isn't easily possible.

  Revert this part of PR #4427 again, and instead just mark the
  memory buffer as little-endian when emitting the trampoline;
  the back-end will then emit correct code to load the constant.

- Change a runtest in simd-bitselect-to-vselect.clif to no longer
  make little-endian lane order assumptions.

- Remove runtests in simd-swizzle.clif that make little-endian
  lane order assumptions by relying on implicit type conversion
  when using a non-i16x8 swizzle result type (this feature should
  probably be removed anyway).

Tested with both wasmtime and cg_clif.
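
As an illustration of the lane-order terminology above, here is a minimal
sketch (not the back-end's actual code) that models a 128-bit vector as a
byte array. It assumes that lane `i` in one lane order corresponds to lane
`lane_count - 1 - i` in the other. The helper names `flip_lane`, `lane_swap`
and `byte_swap_per_lane` are hypothetical and do not exist in Cranelift.

```rust
/// Map a lane index from one lane order to the other.
/// Assumption: the two orders simply number the lanes from opposite ends.
fn flip_lane(lane: usize, lane_count: usize) -> usize {
    assert!(lane < lane_count);
    lane_count - 1 - lane
}

/// Reverse the order of the lanes of a 128-bit value while leaving the
/// bytes inside each lane untouched. This models the "lane-swap" applied
/// to vector values crossing a call between functions that use
/// different lane orders.
fn lane_swap(bytes: [u8; 16], lane_bytes: usize) -> [u8; 16] {
    assert!(lane_bytes.is_power_of_two() && lane_bytes <= 16);
    let lane_count = 16 / lane_bytes;
    let mut out = [0u8; 16];
    for lane in 0..lane_count {
        let src = lane * lane_bytes;
        let dst = flip_lane(lane, lane_count) * lane_bytes;
        out[dst..dst + lane_bytes].copy_from_slice(&bytes[src..src + lane_bytes]);
    }
    out
}

/// Reverse the bytes inside each lane while leaving the lanes in place.
/// This needs the lane size, which is why it cannot be done without
/// knowing the vector type.
fn byte_swap_per_lane(bytes: [u8; 16], lane_bytes: usize) -> [u8; 16] {
    assert!(lane_bytes.is_power_of_two() && lane_bytes <= 16);
    let mut out = bytes;
    for lane in out.chunks_exact_mut(lane_bytes) {
        lane.reverse();
    }
    out
}
```

In this model, reversing all 16 bytes of a value is the same as a lane swap
followed by a per-lane byte swap. Once lane order is handled separately, a
whole-value byte swap is therefore no longer the right fix for vector
immediates, and the per-lane variant depends on the lane size.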
@@ -8,7 +8,7 @@ use cranelift_codegen::ir;
 use cranelift_codegen::isa::{unwind::UnwindInfo, CallConv, TargetIsa};
 use cranelift_entity::PrimaryMap;
 use cranelift_wasm::{DefinedFuncIndex, FuncIndex, WasmFuncType, WasmType};
-use target_lexicon::CallingConvention;
+use target_lexicon::{Architecture, CallingConvention};
 use wasmtime_environ::{
     FilePos, FunctionInfo, InstructionAddressMap, ModuleTranslation, ModuleTypes, TrapInformation,
 };
@@ -190,6 +190,10 @@ fn func_signature(
     // about pointer authentication usage, so we can't just use
     // `CallConv::Fast`.
     CallConv::WasmtimeAppleAarch64
+} else if isa.triple().architecture == Architecture::S390x {
+    // On S390x we need a Wasmtime calling convention to ensure
+    // we're using little-endian vector lane order.
+    wasmtime_call_conv(isa)
 } else {
     CallConv::Fast
 }