x64: Improve codegen for splats (#6025)

This commit goes through the lowerings for the CLIF `splat` instruction and improves the support for each operator. Many of these lowerings are mirrored from v8/SpiderMonkey and there are a number of improvements:

* AVX2 `v{p,}broadcast*` instructions are added and used when available.
* Float-based splats are much simpler and always a single instruction.
* Integer-based splats don't insert into an uninit xmm value and instead start out with a `movd` to move into an `xmm` register. This theoretically breaks dependencies with prior instructions since `movd` creates a fresh new value in the destination register.
* Loads are now sunk into all of the instructions. A new extractor, `sinkable_load_exact`, was added to sink the i8/i16 loads.
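
For readers unfamiliar with the pattern, the integer case described above can be modeled in plain Rust. This is only an illustrative sketch, not code from this commit: the helper names (`splat_i8x16`, `movd_model`, `pshufb_model`, `splat_via_shuffle`) are hypothetical, and pairing `movd` with a byte shuffle using an all-zero mask is an assumption about one plausible SSSE3-style sequence. The `movd` model is also simplified to place a single byte in lane 0 rather than a full 32-bit value.

```rust
/// Reference semantics of `splat` for i8x16: every lane holds the scalar.
fn splat_i8x16(x: i8) -> [i8; 16] {
    [x; 16]
}

/// Simplified model of `movd`: produces a fresh vector with the scalar in
/// lane 0 and all other lanes zeroed, so no prior register contents are read.
fn movd_model(x: i8) -> [i8; 16] {
    let mut v = [0i8; 16];
    v[0] = x;
    v
}

/// Model of a byte shuffle: each output lane selects the source lane named by
/// the low 4 bits of the mask byte, or zero if the mask byte's top bit is set.
fn pshufb_model(src: [i8; 16], mask: [u8; 16]) -> [i8; 16] {
    let mut out = [0i8; 16];
    for i in 0..16 {
        out[i] = if mask[i] & 0x80 != 0 {
            0
        } else {
            src[(mask[i] & 0x0f) as usize]
        };
    }
    out
}

/// Hypothetical two-step splat: move the scalar into a fresh vector, then
/// shuffle with an all-zero mask so every lane copies lane 0.
fn splat_via_shuffle(x: i8) -> [i8; 16] {
    pshufb_model(movd_model(x), [0u8; 16])
}
```

Because `movd_model` builds its result from scratch, the output depends only on the scalar input, which is the dependency-breaking property the message refers to.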
2023-03-15 16:33:56 -05:00
parent a10c50afe9
commit d76f7ee52e
12 changed files with 1216 additions and 82 deletions
--- a/cranelift/filetests/filetests/runtests/simd-splat.clif
+++ b/cranelift/filetests/filetests/runtests/simd-splat.clif
@@ -4,6 +4,8 @@ target aarch64
 target s390x
 set enable_simd
 target x86_64 has_sse3 has_ssse3 has_sse41
+target x86_64 has_sse3 has_ssse3 has_sse41 has_avx
+target x86_64 has_sse3 has_ssse3 has_sse41 has_avx has_avx2

 function %splat_i8x16(i8) -> i8x16 {
 block0(v0: i8):