x64: Elide more uextend with extractlane (#6045)
* x64: Elide more uextend with extractlane
I've now confirmed locally that `pextr{b,w,d}` all zero the upper bits
of the full 64-bit destination register, which means that a zero-extend
following an `extractlane` operation can be elided in more cases,
including 8-to-64-bit extensions as well as 32-to-64-bit ones.
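As a rough illustration only (not the exact rules from this patch), a lowering
for a zero-extended `i8x16` lane extraction can emit the `pextrb` directly and
skip the extension, following the shape of the extractors in the diff below:

    ;; Illustrative sketch only: `pextrb` already clears bits 8..63 of the
    ;; destination GPR, so no separate zero-extension instruction is needed.
    (rule (lower (has_type $I64
                   (uextend (extractlane val @ (value_type (multi_lane 8 16))
                                         (u8_from_uimm8 lane)))))
          (x64_pextrb val lane))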
This helps elide a few extra `mov`s in a loop I was looking at and
yielded a modest corresponding increase in performance (my guess is that
this was mostly due to the slightly smaller code size rather than the
removed `mov`s themselves).
* Remove stray file
@@ -3808,6 +3808,8 @@
 (rule 0 (lower (extractlane val @ (value_type ty @ (multi_lane 16 8)) (u8_from_uimm8 lane)))
         (x64_pextrw val lane))
 
+;; See the note in the 8x16 case above for how this rule is connected to
+;; `extend_to_gpr`.
 (rule 0 (lower (extractlane val @ (value_type ty @ (multi_lane 32 4)) (u8_from_uimm8 lane)))
         (x64_pextrd val lane))
 