x64: Add most remaining AVX lowerings (#5819)
* x64: Add most remaining AVX lowerings

  This commit goes through `inst.isle` and adds a corresponding AVX lowering
  for most SSE lowerings. I opted to skip instructions where the SSE lowering
  didn't read/modify a register, such as `roundps`. I think AVX will still
  benefit these instructions when load-merging comes into play, since AVX
  doesn't require aligned memory operands, but I've deferred that work to a
  future PR. Otherwise, though, I believe this PR covers all (or almost all)
  of the three-operand AVX forms of instructions that have SSE counterparts.
  This should improve codegen slightly by reducing register pressure and
  removing the need for `movdqa` copies between registers.

  I've attempted to ensure that there's at least one codegen test for each of
  the new instructions. As a side note, the recent capstone integration into
  `precise-output` tests helped me catch a number of encoding bugs much
  earlier than otherwise, so I've found it incredibly useful in tests!

* Move `vpinsr*` instructions to their own variant

  Use true `XmmMem` and `GprMem` types in the instruction as well to get more
  type-level safety for what goes where.

* Remove `Inst::produces_const` accessor

  Instead of conditionally defining regalloc and various other operations,
  add dedicated `MInst` variants for operations which are intended to produce
  a constant, making their interactions with regalloc, printing, and the like
  clearer.

* Fix tests

* Register traps in `MachBuffer` for load-folding ops

  This adds a missing `add_trap` to the encoding of VEX instructions with
  memory operands, ensuring that if they cause a segfault there's appropriate
  metadata for Wasmtime to understand that the instruction can in fact trap.
  This fixes a fuzz test case found locally where v8 trapped but Wasmtime
  didn't catch the signal and crashed the fuzzer.
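For context on the three-operand point, the sketch below shows how an SSE
lowering and an AVX alternative can sit side by side as ISLE rules. The helper
names (`use_avx_simd`, `xmm_rm_r`, `xmm_rmir_vex`, `AvxOpcode.Vpaddd`), the
signatures, and the rule priority are assumptions for illustration only, not
necessarily the exact helpers this commit uses:

    ;; Sketch only: helper names and signatures below are assumed, not
    ;; copied from the PR.
    (decl x64_paddd (Xmm XmmMem) Xmm)

    ;; SSE `paddd` is destructive (the destination must alias the first
    ;; source), so regalloc may need a `movdqa` copy before the add.
    (rule (x64_paddd x y)
          (xmm_rm_r (SseOpcode.Paddd) x y))

    ;; AVX `vpaddd dst, src1, src2` is non-destructive: no copy needed,
    ;; and one fewer register stays live across the add.
    (rule 1 (x64_paddd x y)
          (if-let $true (use_avx_simd))
          (xmm_rmir_vex (AvxOpcode.Vpaddd) x y))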
@@ -1333,11 +1333,11 @@
 ;; i32x4.replace_lane
 (rule (vec_insert_lane $I32X4 vec val idx)
-      (x64_pinsrd vec val idx (OperandSize.Size32)))
+      (x64_pinsrd vec val idx))

 ;; i64x2.replace_lane
 (rule (vec_insert_lane $I64X2 vec val idx)
-      (x64_pinsrd vec val idx (OperandSize.Size64)))
+      (x64_pinsrq vec val idx))

 ;; f32x4.replace_lane
 (rule (vec_insert_lane $F32X4 vec val idx)
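The hunk above reflects the `vpinsr*` refactor: rather than one `x64_pinsrd`
constructor parameterized over `OperandSize`, each lane width gets its own
constructor. A rough sketch of the declarations implied by the rules above
(the argument types are inferred from usage plus the `GprMem` note in the
commit message, not copied from the PR):

    ;; Inferred sketch; exact types in the PR may differ.
    (decl x64_pinsrd (Xmm GprMem u8) Xmm)  ;; insert a 32-bit lane
    (decl x64_pinsrq (Xmm GprMem u8) Xmm)  ;; insert a 64-bit lane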
@@ -2982,8 +2982,8 @@

 ;; Rules for `fcvt_low_from_sint` ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

-(rule (lower (fcvt_low_from_sint a @ (value_type ty)))
-      (x64_cvtdq2pd ty a))
+(rule (lower (fcvt_low_from_sint a))
+      (x64_cvtdq2pd a))

 ;; Rules for `fcvt_from_uint` ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
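The hunk above drops a now-unneeded type argument: `fcvt_low_from_sint` only
ever converts the low two `i32x4` lanes to `f64x2`, which is exactly what
`cvtdq2pd` does, so the constructor needs no `Type` parameter. A sketch of the
implied signature change (inferred, not copied from the PR; the memory-capable
`XmmMem` operand is an assumption):

    ;; Before (inferred): (decl x64_cvtdq2pd (Type Xmm) Xmm)
    ;; After:
    (decl x64_cvtdq2pd (XmmMem) Xmm)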
@@ -3148,7 +3148,7 @@
         ;; Converting to unsigned int so if float src is negative or NaN
         ;; will first set to zero.
-        (tmp2 Xmm (x64_pxor src src)) ;; make a zero
+        (tmp2 Xmm (xmm_zero $F32X4))
         (dst Xmm (x64_maxps src tmp2))

         ;; Set tmp2 to INT_MAX+1. It is important to note here that after it looks
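This hunk and several below swap the xor-a-register-with-itself zeroing idiom
for `xmm_zero`. A `pxor src, src` produces zero regardless of what `src`
holds, so writing it as `(x64_pxor src src)` handed regalloc a phantom use of
`src`; a dedicated constant-producing variant (per the `Inst::produces_const`
removal in the commit message) avoids that. A minimal sketch of the helper's
shape, inferred from its uses in this diff:

    ;; Inferred from usage such as `(xmm_zero $F32X4)`; the definition in
    ;; the PR may differ. Taking only a `Type` (no register operand) means
    ;; regalloc sees no false dependency on a source register.
    (decl xmm_zero (Type) Xmm)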
@@ -3181,7 +3181,7 @@
         ;; that have positive overflow (based on the mask) by setting these lanes
         ;; to 0x7FFFFFFF
         (tmp1 Xmm (x64_pxor tmp1 tmp2))
-        (tmp2 Xmm (x64_pxor tmp2 tmp2)) ;; make another zero
+        (tmp2 Xmm (xmm_zero $I32X4))
         (tmp1 Xmm (x64_pmaxsd tmp1 tmp2)))

         ;; Add this second set of converted lanes to the original to properly handle
@@ -3611,7 +3611,7 @@

 (rule (lower (has_type (multi_lane 8 16) (splat src)))
       (let ((vec Xmm (vec_insert_lane $I8X16 (xmm_uninit_value) src 0))
-            (zeros Xmm (x64_pxor vec vec)))
+            (zeros Xmm (xmm_zero $I8X16)))
         ;; Shuffle the lowest byte lane to all other lanes.
         (x64_pshufb vec zeros)))
@@ -3661,7 +3661,7 @@

 (rule (lower (vall_true val @ (value_type ty)))
       (let ((src Xmm val)
-            (zeros Xmm (x64_pxor src src))
+            (zeros Xmm (xmm_zero ty))
             (cmp Xmm (x64_pcmpeq (vec_int_type ty) src zeros)))
         (with_flags (x64_ptest cmp cmp) (x64_setcc (CC.Z)))))
@@ -3752,7 +3752,7 @@
             ;; MOVAPD xmm_y, xmm_x
             ;; XORPD xmm_tmp, xmm_tmp
-            (zeros Xmm (x64_xorpd src src))
+            (zeros Xmm (xmm_zero $F64X2))
             (dst Xmm (x64_maxpd src zeros))

             (umax_mask Xmm (x64_xmm_load_const $F64X2 (uunarrow_umax_mask)))