x64: Add shuffle cases for punpck{h,l}bw (#5905)

* x64: Add `shuffle` cases for `punpck{h,l}bw`

I noticed this difference between LLVM's and Cranelift's output for
something I was looking at recently. It's probably not all that common,
but I figured I'd add it here since it should still be somewhat useful.

* Review feedback

* Use u128 extractor instead

Alex Crichton
2023-03-01 15:49:00 -06:00
committed by GitHub
parent 6f6fcfa437
commit f05babc744
2 changed files with 64 additions and 0 deletions
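
The two immediates matched by the new rules can be checked against the instruction semantics with a small reference model. The following is a minimal sketch, not Cranelift code: `shuffle`, `punpcklbw`, `punpckhbw`, and the two `*_MASK` constants are hypothetical names introduced for illustration. It assumes each mask byte indexes the 32-byte concatenation of `a` and `b`, with lane 0 stored in the least significant byte of the `u128`.

```rust
/// The two immediates matched by the new lowering rules.
const PUNPCKLBW_MASK: u128 = 0x1707_1606_1505_1404_1303_1202_1101_1000;
const PUNPCKHBW_MASK: u128 = 0x1f0f_1e0e_1d0d_1c0c_1b0b_1a0a_1909_1808;

/// Reference model of a 16-lane byte shuffle (assumed semantics): mask byte
/// values 0..=15 select from `a`, values 16..=31 select from `b`.
fn shuffle(a: [u8; 16], b: [u8; 16], mask: u128) -> [u8; 16] {
    // Lane 0 of the mask is the least significant byte of the immediate.
    let selectors = mask.to_le_bytes();
    let mut out = [0u8; 16];
    for (lane, &idx) in selectors.iter().enumerate() {
        out[lane] = match idx {
            0..=15 => a[idx as usize],
            16..=31 => b[idx as usize - 16],
            // Out-of-range selectors never occur in the two masks above;
            // this model simply yields 0 for them.
            _ => 0,
        };
    }
    out
}

/// Reference model of `punpcklbw`: interleave the low 8 bytes of `a` and `b`.
fn punpcklbw(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..8 {
        out[2 * i] = a[i];
        out[2 * i + 1] = b[i];
    }
    out
}

/// Reference model of `punpckhbw`: interleave the high 8 bytes of `a` and `b`.
fn punpckhbw(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..8 {
        out[2 * i] = a[8 + i];
        out[2 * i + 1] = b[8 + i];
    }
    out
}
```

Under this model, `shuffle(a, b, PUNPCKLBW_MASK)` and `punpcklbw(a, b)` agree for any inputs, and likewise for the high variant, which is why a `shuffle` with either immediate can be lowered to a single instruction.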

@@ -3510,6 +3510,16 @@
;; Rules for `shuffle` ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Special case for the `punpckhbw` instruction which interleaves the upper
;; lanes of the two input registers.
(rule 4 (lower (shuffle a b (u128_from_immediate 0x1f0f_1e0e_1d0d_1c0c_1b0b_1a0a_1909_1808)))
      (x64_punpckhbw a b))
;; Special case for the `punpcklbw` instruction which interleaves the lower
;; lanes of the two input registers.
(rule 4 (lower (shuffle a b (u128_from_immediate 0x1707_1606_1505_1404_1303_1202_1101_1000)))
      (x64_punpcklbw a b))
;; If `lhs` and `rhs` are the same we can use a single PSHUFB to shuffle the XMM
;; register. We statically build `constructed_mask` to zero out any unknown lane
;; indices (may not be completely necessary: verification could fail incorrect