x64: Improve codegen for splats (#6025)
This commit goes through the lowerings for the CLIF `splat` instruction
and improves the support for each operator. Many of these lowerings are
mirrored from v8/SpiderMonkey and there are a number of improvements:
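For reference, the operation being lowered is simple: CLIF `splat` copies one scalar into every lane of a vector. Below is a minimal Rust sketch of those semantics. Plain arrays stand in for vector types, and the generic `splat` helper is hypothetical, not part of Cranelift.

```rust
/// Reference semantics of CLIF `splat`: every lane of the result is a copy
/// of the scalar operand. Hypothetical helper for illustration only; arrays
/// stand in for vector types such as i8x16 or f32x4.
fn splat<T: Copy, const N: usize>(x: T) -> [T; N] {
    [x; N]
}
```

Every lowering for `splat` must produce this result. The variants differ only in which x86 instructions they use to get there.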
* AVX2 `v{p,}broadcast*` instructions are added and used when available.
* Float-based splats are much simpler and are always a single instruction.
* Integer-based splats no longer insert into an uninitialized xmm value;
instead they start with a `movd` to move the value into an `xmm` register.
This theoretically breaks dependencies on prior instructions, since `movd`
writes a fresh value to the entire destination register.
* Loads are now sunk into all of the instructions. A new extractor,
`sinkable_load_exact`, was added to sink the i8/i16 loads.
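The integer sequences can be checked with a small software model of the SSE instructions involved. This is a sketch under assumptions: an XMM register is modelled as 16 little-endian bytes, and the `movd` + `pshufb` (i8x16), `movd` + `pshuflw` + `pshufd` (i16x8) and `movd` + `pshufd` (i32x4) sequences follow the SSE patterns used by v8/SpiderMonkey. It is not Cranelift's ISLE. The model also shows why `movd` breaks dependencies: it writes all 128 bits, so its output never reads the register's old contents.

```rust
/// Model of a 128-bit XMM register as 16 little-endian bytes.
type Xmm = [u8; 16];

/// `movd xmm, r32`: the 32-bit value lands in the low lane and the upper
/// 96 bits are zeroed, so the result does not depend on the old register.
fn movd(src: u32) -> Xmm {
    let mut out = [0u8; 16];
    out[..4].copy_from_slice(&src.to_le_bytes());
    out
}

/// `pshufb xmm, mask`: byte i becomes a[mask[i] & 0x0f], or 0 if bit 7 of
/// mask[i] is set.
fn pshufb(a: Xmm, mask: Xmm) -> Xmm {
    let mut out = [0u8; 16];
    for i in 0..16 {
        out[i] = if mask[i] & 0x80 != 0 { 0 } else { a[(mask[i] & 0x0f) as usize] };
    }
    out
}

/// `pshufd xmm, xmm, imm8`: dword lane i is dword ((imm >> 2*i) & 3) of `a`.
fn pshufd(a: Xmm, imm: u8) -> Xmm {
    let mut out = [0u8; 16];
    for i in 0..4 {
        let src = ((imm >> (2 * i)) & 3) as usize;
        out[4 * i..4 * i + 4].copy_from_slice(&a[4 * src..4 * src + 4]);
    }
    out
}

/// `pshuflw xmm, xmm, imm8`: shuffles the four low words; the high 64 bits
/// pass through unchanged.
fn pshuflw(a: Xmm, imm: u8) -> Xmm {
    let mut out = a;
    for i in 0..4 {
        let src = ((imm >> (2 * i)) & 3) as usize;
        out[2 * i..2 * i + 2].copy_from_slice(&a[2 * src..2 * src + 2]);
    }
    out
}

/// i8x16: `movd`, then `pshufb` with an all-zero mask (what `pxor m, m`
/// produces), which copies byte 0 into every byte lane.
fn splat_i8x16(x: u8) -> Xmm {
    pshufb(movd(x as u32), [0u8; 16])
}

/// i16x8: `pshuflw $0` fills the low four words, then `pshufd $0` copies
/// the low dword (two copies of the word) to all four dwords.
fn splat_i16x8(x: u16) -> Xmm {
    pshufd(pshuflw(movd(x as u32), 0), 0)
}

/// i32x4: `pshufd $0` copies the low dword to every dword lane.
fn splat_i32x4(x: u32) -> Xmm {
    pshufd(movd(x), 0)
}
```

`pshufb` requires SSSE3, which matches the `has_ssse3` baseline in the test targets below.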
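Float splats can be a single instruction because the scalar already sits in lane 0 of an XMM register. The sketch below assumes `shufps` with immediate 0 for f32x4 and `movddup` (SSE3) for f64x2, the usual choices in v8/SpiderMonkey. The helpers are illustrative models, not Cranelift code, and the upper input lanes stand for whatever garbage the register held.

```rust
/// Model of `shufps dst, src, imm8`: result lanes 0-1 are selected from
/// `dst` and lanes 2-3 from `src`, each by a 2-bit field of `imm`.
fn shufps(dst: [f32; 4], src: [f32; 4], imm: u8) -> [f32; 4] {
    let sel = |i: u8| ((imm >> (2 * i)) & 3) as usize;
    [dst[sel(0)], dst[sel(1)], src[sel(2)], src[sel(3)]]
}

/// Model of `movddup dst, src`: duplicates the low 64-bit lane.
fn movddup(src: [f64; 2]) -> [f64; 2] {
    [src[0], src[0]]
}

/// f32x4 splat (assumed lowering): `shufps x, x, 0` broadcasts lane 0.
fn splat_f32x4(xmm: [f32; 4]) -> [f32; 4] {
    shufps(xmm, xmm, 0)
}

/// f64x2 splat (assumed lowering): a single `movddup`.
fn splat_f64x2(xmm: [f64; 2]) -> [f64; 2] {
    movddup(xmm)
}
```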
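The effect of AVX2 availability and load sinking on instruction choice can be sketched as a priority-ordered match for the i8x16 case. Cranelift expresses this as prioritized ISLE rules, with `sinkable_load_exact` matching the narrow load. `Operand` and `lower_splat_i8x16` below are hypothetical, and the mnemonic strings are illustrative rather than the backend's actual output.

```rust
/// Hypothetical classification of where the splat's scalar comes from.
enum Operand {
    /// The value is already in a general-purpose register.
    Gpr,
    /// An 8-bit load that can be sunk into the splat's memory operand.
    SinkableLoad,
}

/// Hypothetical selector returning the instruction sequence (as mnemonics)
/// for an i8x16 splat. More specialized cases yield shorter sequences.
fn lower_splat_i8x16(has_avx2: bool, src: Operand) -> Vec<&'static str> {
    match (has_avx2, src) {
        // AVX2 with a sunk load: one broadcast straight from memory.
        (true, Operand::SinkableLoad) => vec!["vpbroadcastb xmm, m8"],
        // AVX2 from a GPR: move into an xmm register, then broadcast.
        (true, Operand::Gpr) => vec!["movd xmm, r32", "vpbroadcastb xmm, xmm"],
        // SSE4.1 with a sunk load: insert the byte, then pshufb with a zero mask.
        (false, Operand::SinkableLoad) => {
            vec!["pinsrb xmm, m8, 0", "pxor mask, mask", "pshufb xmm, mask"]
        }
        // SSSE3 from a GPR: movd, then pshufb with a zero mask.
        (false, Operand::Gpr) => vec!["movd xmm, r32", "pxor mask, mask", "pshufb xmm, mask"],
    }
}
```

Sinking the load avoids a separate scalar load into a GPR followed by a GPR-to-XMM move.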
@@ -4,6 +4,8 @@ target aarch64
target s390x
set enable_simd
target x86_64 has_sse3 has_ssse3 has_sse41
target x86_64 has_sse3 has_ssse3 has_sse41 has_avx
target x86_64 has_sse3 has_ssse3 has_sse41 has_avx has_avx2

function %splat_i8x16(i8) -> i8x16 {
block0(v0: i8):