x64: Add most remaining AVX lowerings (#5819)

* x64: Add most remaining AVX lowerings

This commit goes through `inst.isle` and adds a corresponding AVX
lowering for most SSE lowerings. I opted to skip instructions whose SSE
form doesn't read and modify its destination register, such as
`roundps`, since the 3-operand VEX form offers no register-allocation
benefit there. AVX should still benefit those instructions when a load
is merged into them, since VEX-encoded memory operands don't require
alignment, but I've deferred that work to a future PR.
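
To illustrate the shape of these changes: the typical pattern is to give
each instruction constructor a second, higher-priority rule that emits the
VEX-encoded form when AVX is available. This is only a minimal sketch; the
helper names `use_avx`, `xmm_rm_r`, and `xmm_rmr_vex` and their signatures
are assumptions for illustration, not copied from `inst.isle`.

    ;; Sketch of per-constructor SSE/AVX dispatch (helper names hypothetical).
    (decl x64_paddd (Xmm XmmMem) Xmm)

    ;; Default rule: the destructive 2-operand SSE form.
    (rule 0 (x64_paddd x y)
          (xmm_rm_r (SseOpcode.Paddd) x y))

    ;; Higher-priority rule: the 3-operand VEX form when AVX is enabled.
    (rule 1 (x64_paddd x y)
          (if-let $true (use_avx))
          (xmm_rmr_vex (AvxOpcode.Vpaddd) x y))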

Otherwise, I believe this PR adds support for all (or almost all) of the
3-operand AVX forms alongside their SSE counterparts. This should improve
codegen slightly by reducing register pressure and removing the need for
`movdqa` copies between registers. I've attempted to ensure that there's
at least one codegen test for each new instruction.
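
Concretely, the win comes from the non-destructive 3-operand VEX encoding
versus the 2-operand SSE encoding, as in this sketch (the assembly in the
comments is illustrative, not actual compiler output):

    ;; With SSE, keeping `a` live across the add requires a copy, because
    ;; the destination register is also the first source:
    ;;
    ;;     movdqa dst, a
    ;;     paddd  dst, b
    ;;
    ;; With AVX the same operation is a single non-destructive instruction:
    ;;
    ;;     vpaddd dst, a, b
    ;;
    ;; so a lowering like the one below can now avoid the `movdqa` and the
    ;; tied register it implies.
    (rule (lower (has_type (multi_lane 32 4) (iadd a b)))
          (x64_paddd a b))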

As a side note, the recent Capstone integration in the `precise-output`
tests helped me catch a number of encoding bugs much earlier than I
otherwise would have, so I've found it incredibly useful!

* Move `vpinsr*` instructions to their own variant

Use true `XmmMem` and `GprMem` types in the instruction as well to get
more type-level safety for what goes where.
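
A rough sketch of what such a dedicated variant can look like in the ISLE
`MInst` enum; the variant and field names below are hypothetical, not the
actual definition:

    (type MInst (enum
      ;; ... other variants elided ...
      ;; Hypothetical dedicated vpinsr* variant: typing `src2` as `GprMem`
      ;; and `src1` as `Xmm` lets ISLE's type checker enforce which operand
      ;; may come from a GPR or memory and which must be an XMM register.
      (XmmVexPinsr (op AvxOpcode)
                   (src1 Xmm)
                   (src2 GprMem)
                   (dst WritableXmm)
                   (imm u8))))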

* Remove `Inst::produces_const` accessor

Instead of conditionally defining register allocation and various other
operations based on a `produces_const` query, add dedicated `MInst`
variants for operations that are intended to produce a constant. This
gives them clearer interactions with register allocation, printing, and
the like.
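
For example, a constant-producing idiom might get a variant shaped roughly
like the sketch below (names are illustrative): its only operand is the
destination it writes, so register allocation sees a plain def and printing
can special-case it, with no `produces_const` query needed.

    (type MInst (enum
      ;; ... other variants elided ...
      ;; Hypothetical variant for idioms that ignore their source values,
      ;; e.g. `pxor dst, dst` for all-zeros or `pcmpeqd dst, dst` for
      ;; all-ones: regalloc only ever sees `dst` as a def.
      (XmmConstOp (op SseOpcode)
                  (dst WritableXmm))))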

* Fix tests

* Register traps in `MachBuffer` for load-folding ops

This adds a missing `add_trap` call to the encoding of VEX instructions
with memory operands, ensuring that if such an instruction segfaults there
is appropriate metadata for Wasmtime to understand that the instruction
could in fact trap. This fixes a fuzz test case found locally where v8
trapped but Wasmtime didn't catch the signal and crashed the fuzzer.
Author:    Alex Crichton
Date:      2023-02-20 09:11:52 -06:00
Committer: GitHub
Parent:    ad128b6811
Commit:    c26a65a854

16 changed files with 4145 additions and 466 deletions

@@ -1333,11 +1333,11 @@
 ;; i32x4.replace_lane
 (rule (vec_insert_lane $I32X4 vec val idx)
-      (x64_pinsrd vec val idx (OperandSize.Size32)))
+      (x64_pinsrd vec val idx))
 ;; i64x2.replace_lane
 (rule (vec_insert_lane $I64X2 vec val idx)
-      (x64_pinsrd vec val idx (OperandSize.Size64)))
+      (x64_pinsrq vec val idx))
 ;; f32x4.replace_lane
 (rule (vec_insert_lane $F32X4 vec val idx)
@@ -2982,8 +2982,8 @@
 ;; Rules for `fcvt_low_from_sint` ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
-(rule (lower (fcvt_low_from_sint a @ (value_type ty)))
-      (x64_cvtdq2pd ty a))
+(rule (lower (fcvt_low_from_sint a))
+      (x64_cvtdq2pd a))
 ;; Rules for `fcvt_from_uint` ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
@@ -3148,7 +3148,7 @@
         ;; Converting to unsigned int so if float src is negative or NaN
         ;; will first set to zero.
-        (tmp2 Xmm (x64_pxor src src)) ;; make a zero
+        (tmp2 Xmm (xmm_zero $F32X4))
         (dst Xmm (x64_maxps src tmp2))
         ;; Set tmp2 to INT_MAX+1. It is important to note here that after it looks
@@ -3181,7 +3181,7 @@
         ;; that have positive overflow (based on the mask) by setting these lanes
         ;; to 0x7FFFFFFF
         (tmp1 Xmm (x64_pxor tmp1 tmp2))
-        (tmp2 Xmm (x64_pxor tmp2 tmp2)) ;; make another zero
+        (tmp2 Xmm (xmm_zero $I32X4))
         (tmp1 Xmm (x64_pmaxsd tmp1 tmp2)))
         ;; Add this second set of converted lanes to the original to properly handle
@@ -3611,7 +3611,7 @@
 (rule (lower (has_type (multi_lane 8 16) (splat src)))
       (let ((vec Xmm (vec_insert_lane $I8X16 (xmm_uninit_value) src 0))
-            (zeros Xmm (x64_pxor vec vec)))
+            (zeros Xmm (xmm_zero $I8X16)))
         ;; Shuffle the lowest byte lane to all other lanes.
         (x64_pshufb vec zeros)))
@@ -3661,7 +3661,7 @@
 (rule (lower (vall_true val @ (value_type ty)))
       (let ((src Xmm val)
-            (zeros Xmm (x64_pxor src src))
+            (zeros Xmm (xmm_zero ty))
            (cmp Xmm (x64_pcmpeq (vec_int_type ty) src zeros)))
        (with_flags (x64_ptest cmp cmp) (x64_setcc (CC.Z)))))
@@ -3752,7 +3752,7 @@
         ;; MOVAPD xmm_y, xmm_x
         ;; XORPD xmm_tmp, xmm_tmp
-        (zeros Xmm (x64_xorpd src src))
+        (zeros Xmm (xmm_zero $F64X2))
         (dst Xmm (x64_maxpd src zeros))
         (umax_mask Xmm (x64_xmm_load_const $F64X2 (uunarrow_umax_mask)))