x64: Elide more uextend with extractlane (#6045)

* x64: Elide more uextend with extractlane

I've confirmed locally now that `pextr{b,w,d}` all zero the upper bits
of the full 64-bit destination register, which means that the
`extractlane` operation followed by a zero-extend can be elided in more
cases, including 8-to-64-bit extends as well as 32-to-64.
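As a quick illustration of the zero-extending behavior (this is not Cranelift code; the intrinsic names and lane index are just an example), Rust's `_mm_extract_epi8` intrinsic compiles to `pextrb`, and the value it returns is already zero-extended rather than sign-extended, so widening it further requires no extra `movzx`/`mov`:

```rust
fn main() {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("sse4.1") {
            // Safe to call: we just verified SSE4.1 is available.
            unsafe { demo() }
        }
    }
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse4.1")]
unsafe fn demo() {
    use std::arch::x86_64::*;

    // Every byte lane holds 0xFF (-1 as i8).
    let v = _mm_set1_epi8(-1);

    // `pextrb` zeroes the upper bits of the destination register, so the
    // extracted byte comes back as 0xFF, not a sign-extended -1.
    let b = _mm_extract_epi8::<3>(v);
    assert_eq!(b, 0xFF);

    // Widening the result all the way to 64 bits is therefore a no-op at
    // the machine level -- no additional zero-extend instruction needed.
    assert_eq!(b as u32 as u64, 0xFF);

    println!("ok");
}
```

The same reasoning applies to `pextrw` and `pextrd`, which is what lets the lowering rules below skip the explicit extend.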

This helps elide a few extra `mov`s in a loop I was looking at and
yielded a modest corresponding increase in performance (my guess is this
was mostly due to the slightly smaller code size rather than the removed
`mov`s themselves).

* Remove stray file
Alex Crichton
2023-03-17 11:18:41 -05:00
committed by GitHub
parent d939bdbd07
commit 5ebe53a351
3 changed files with 86 additions and 9 deletions

@@ -3808,6 +3808,8 @@
 (rule 0 (lower (extractlane val @ (value_type ty @ (multi_lane 16 8)) (u8_from_uimm8 lane)))
       (x64_pextrw val lane))
+;; See the note in the 8x16 case above for how this rule is connected to
+;; `extend_to_gpr`.
 (rule 0 (lower (extractlane val @ (value_type ty @ (multi_lane 32 4)) (u8_from_uimm8 lane)))
       (x64_pextrd val lane))