x64: Add most remaining AVX lowerings (#5819)

* x64: Add most remaining AVX lowerings

This commit goes through `inst.isle` and adds a corresponding AVX
lowering for most SSE lowerings. I opted to skip instructions whose SSE
form doesn't read and modify its destination register, such as
`roundps`, since the 3-operand VEX form offers no register-allocation
benefit there. AVX should still benefit those instructions when a load
is merged into them, since VEX-encoded memory operands don't require
alignment, but I've deferred that work to a future PR.
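
To illustrate the shape of these changes: the typical pattern is to give
each instruction constructor a second, higher-priority rule that emits the
VEX-encoded form when AVX is available. This is only a minimal sketch; the
helper names `use_avx`, `xmm_rm_r`, and `xmm_rmr_vex` and their signatures
are assumptions for illustration, not copied from `inst.isle`.

    ;; Sketch of per-constructor SSE/AVX dispatch (helper names hypothetical).
    (decl x64_paddd (Xmm XmmMem) Xmm)

    ;; Default rule: the destructive 2-operand SSE form.
    (rule 0 (x64_paddd x y)
          (xmm_rm_r (SseOpcode.Paddd) x y))

    ;; Higher-priority rule: the 3-operand VEX form when AVX is enabled.
    (rule 1 (x64_paddd x y)
          (if-let $true (use_avx))
          (xmm_rmr_vex (AvxOpcode.Vpaddd) x y))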

Otherwise, I believe this PR adds support for all (or almost all) of the
3-operand AVX forms alongside their SSE counterparts. This should improve
codegen slightly by reducing register pressure and removing the need for
`movdqa` copies between registers. I've attempted to ensure that there's
at least one codegen test for each new instruction.
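
Concretely, the win comes from the non-destructive 3-operand VEX encoding
versus the 2-operand SSE encoding, as in this sketch (the assembly in the
comments is illustrative, not actual compiler output):

    ;; With SSE, keeping `a` live across the add requires a copy, because
    ;; the destination register is also the first source:
    ;;
    ;;     movdqa dst, a
    ;;     paddd  dst, b
    ;;
    ;; With AVX the same operation is a single non-destructive instruction:
    ;;
    ;;     vpaddd dst, a, b
    ;;
    ;; so a lowering like the one below can now avoid the `movdqa` and the
    ;; tied register it implies.
    (rule (lower (has_type (multi_lane 32 4) (iadd a b)))
          (x64_paddd a b))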

As a side note, the recent Capstone integration in the `precise-output`
tests helped me catch a number of encoding bugs much earlier than I
otherwise would have, so I've found it incredibly useful!

* Move `vpinsr*` instructions to their own variant

Use true `XmmMem` and `GprMem` types in the instruction as well to get
more type-level safety for what goes where.
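
A rough sketch of what such a dedicated variant can look like in the ISLE
`MInst` enum; the variant and field names below are hypothetical, not the
actual definition:

    (type MInst (enum
      ;; ... other variants elided ...
      ;; Hypothetical dedicated vpinsr* variant: typing `src2` as `GprMem`
      ;; and `src1` as `Xmm` lets ISLE's type checker enforce which operand
      ;; may come from a GPR or memory and which must be an XMM register.
      (XmmVexPinsr (op AvxOpcode)
                   (src1 Xmm)
                   (src2 GprMem)
                   (dst WritableXmm)
                   (imm u8))))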

* Remove `Inst::produces_const` accessor

Instead of conditionally defining register allocation and various other
operations based on a `produces_const` query, add dedicated `MInst`
variants for operations that are intended to produce a constant. This
gives them clearer interactions with register allocation, printing, and
the like.
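
For example, a constant-producing idiom might get a variant shaped roughly
like the sketch below (names are illustrative): its only operand is the
destination it writes, so register allocation sees a plain def and printing
can special-case it, with no `produces_const` query needed.

    (type MInst (enum
      ;; ... other variants elided ...
      ;; Hypothetical variant for idioms that ignore their source values,
      ;; e.g. `pxor dst, dst` for all-zeros or `pcmpeqd dst, dst` for
      ;; all-ones: regalloc only ever sees `dst` as a def.
      (XmmConstOp (op SseOpcode)
                  (dst WritableXmm))))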

* Fix tests

* Register traps in `MachBuffer` for load-folding ops

This adds a missing `add_trap` call to the encoding of VEX instructions
with memory operands, ensuring that if such an instruction segfaults there
is appropriate metadata for Wasmtime to understand that the instruction
could in fact trap. This fixes a fuzz test case found locally where v8
trapped but Wasmtime didn't catch the signal and crashed the fuzzer.
Author:    Alex Crichton
Date:      2023-02-20 09:11:52 -06:00
Committer: GitHub
Parent:    ad128b6811
Commit:    c26a65a854

16 changed files with 4145 additions and 466 deletions

@@ -1333,11 +1333,11 @@
 ;; i32x4.replace_lane
 (rule (vec_insert_lane $I32X4 vec val idx)
-      (x64_pinsrd vec val idx (OperandSize.Size32)))
+      (x64_pinsrd vec val idx))
 ;; i64x2.replace_lane
 (rule (vec_insert_lane $I64X2 vec val idx)
-      (x64_pinsrd vec val idx (OperandSize.Size64)))
+      (x64_pinsrq vec val idx))
 ;; f32x4.replace_lane
 (rule (vec_insert_lane $F32X4 vec val idx)
@@ -2982,8 +2982,8 @@
 ;; Rules for `fcvt_low_from_sint` ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
-(rule (lower (fcvt_low_from_sint a @ (value_type ty)))
-      (x64_cvtdq2pd ty a))
+(rule (lower (fcvt_low_from_sint a))
+      (x64_cvtdq2pd a))
 ;; Rules for `fcvt_from_uint` ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
@@ -3148,7 +3148,7 @@
         ;; Converting to unsigned int so if float src is negative or NaN
         ;; will first set to zero.
-        (tmp2 Xmm (x64_pxor src src)) ;; make a zero
+        (tmp2 Xmm (xmm_zero $F32X4))
         (dst Xmm (x64_maxps src tmp2))
         ;; Set tmp2 to INT_MAX+1. It is important to note here that after it looks
@@ -3181,7 +3181,7 @@
         ;; that have positive overflow (based on the mask) by setting these lanes
         ;; to 0x7FFFFFFF
         (tmp1 Xmm (x64_pxor tmp1 tmp2))
-        (tmp2 Xmm (x64_pxor tmp2 tmp2)) ;; make another zero
+        (tmp2 Xmm (xmm_zero $I32X4))
         (tmp1 Xmm (x64_pmaxsd tmp1 tmp2)))
         ;; Add this second set of converted lanes to the original to properly handle
@@ -3611,7 +3611,7 @@
 (rule (lower (has_type (multi_lane 8 16) (splat src)))
       (let ((vec Xmm (vec_insert_lane $I8X16 (xmm_uninit_value) src 0))
-            (zeros Xmm (x64_pxor vec vec)))
+            (zeros Xmm (xmm_zero $I8X16)))
         ;; Shuffle the lowest byte lane to all other lanes.
         (x64_pshufb vec zeros)))
@@ -3661,7 +3661,7 @@
 (rule (lower (vall_true val @ (value_type ty)))
       (let ((src Xmm val)
-            (zeros Xmm (x64_pxor src src))
+            (zeros Xmm (xmm_zero ty))
            (cmp Xmm (x64_pcmpeq (vec_int_type ty) src zeros)))
        (with_flags (x64_ptest cmp cmp) (x64_setcc (CC.Z)))))
@@ -3752,7 +3752,7 @@
         ;; MOVAPD xmm_y, xmm_x
         ;; XORPD xmm_tmp, xmm_tmp
-        (zeros Xmm (x64_xorpd src src))
+        (zeros Xmm (xmm_zero $F64X2))
         (dst Xmm (x64_maxpd src zeros))
         (umax_mask Xmm (x64_xmm_load_const $F64X2 (uunarrow_umax_mask)))