[RFC] Dynamic Vector Support (#4200)

Introduce a new concept in the IR that allows a producer to create dynamic vector types. An IR function can now contain global value(s) that represent a dynamic scaling factor, for a given fixed-width vector type. A dynamic type is then created by 'multiplying' the corresponding global value with a fixed-width type. These new types can be used just like the existing types and the type system has a set of hard-coded dynamic types, such as I32X4XN, which the user defined types map onto. The dynamic types are also used explicitly to create dynamic stack slots, which have no set size like their existing counterparts. New IR instructions are added to access these new stack entities. Currently, during codegen, the dynamic scaling factor has to be lowered to a constant so the dynamic slots do eventually have a compile-time known size, as do spill slots. The current lowering for aarch64 just targets Neon, using a dynamic scale of 1. Copyright (c) 2022, Arm Limited.
2022-07-07 20:54:39 +01:00
parent 9ae060a12a
commit 9c43749dfe
69 changed files with 2422 additions and 294 deletions
--- a/cranelift/filetests/filetests/runtests/dynamic-simd-arithmetic.clif
+++ b/cranelift/filetests/filetests/runtests/dynamic-simd-arithmetic.clif
@@ -0,0 +1,197 @@
+test run
+target aarch64
+
+function %i8x16_splat_add(i8, i8) -> i8x16 {
+  gv0 = dyn_scale_target_const.i8x16
+  dt0 = i8x16*gv0
+
+block0(v0: i8, v1: i8):
+  v2 = splat.dt0 v0
+  v3 = splat.dt0 v1
+  v4 = iadd v2, v3
+  v5 = extract_vector v4, 0
+  return v5
+}
+; run: %i8x16_splat_add(1, 3) == [4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4]
+
+function %i16x8_splat_add(i16, i16) -> i16x8 {
+  gv0 = dyn_scale_target_const.i16x8
+  dt0 = i16x8*gv0
+
+block0(v0: i16, v1: i16):
+  v2 = splat.dt0 v0
+  v3 = splat.dt0 v1
+  v4 = iadd v2, v3
+  v5 = extract_vector v4, 0
+  return v5
+}
+; run: %i16x8_splat_add(255, 254) == [509 509 509 509 509 509 509 509]
+
+function %i32x4_splat_add(i32, i32) -> i32x4 {
+  gv0 = dyn_scale_target_const.i32x4
+  dt0 = i32x4*gv0
+
+block0(v0: i32, v1: i32):
+  v2 = splat.dt0 v0
+  v3 = splat.dt0 v1
+  v4 = iadd v2, v3
+  v5 = extract_vector v4, 0
+  return v5
+}
+; run: %i32sv_splat_add(1234, 8765) == [9999 9999 9999 9999]
+
+function %i64x2_splat_add(i64, i64) -> i64x2 {
+  gv0 = dyn_scale_target_const.i64x2
+  dt0 = i64x2*gv0
+
+block0(v0: i64, v1: i64):
+  v2 = splat.dt0 v0
+  v3 = splat.dt0 v1
+  v4 = iadd v2, v3
+  v5 = extract_vector v4, 0
+  return v5
+}
+; run: %i64x2_splat_add(4321, 8765) == [13086 13086]
+
+function %i8x16_splat_sub(i8, i8) -> i8x16 {
+  gv0 = dyn_scale_target_const.i8x16
+  dt0 = i8x16*gv0
+
+block0(v0: i8, v1: i8):
+  v2 = splat.dt0 v0
+  v3 = splat.dt0 v1
+  v4 = isub v2, v3
+  v5 = extract_vector v4, 0
+  return v5
+}
+; run: %i8x16_splat_sub(127, 126) == [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
+
+function %i16x8_splat_sub(i16, i16) -> i16x8 {
+  gv0 = dyn_scale_target_const.i16x8
+  dt0 = i16x8*gv0
+
+block0(v0: i16, v1: i16):
+  v2 = splat.dt0 v0
+  v3 = splat.dt0 v1
+  v4 = isub v2, v3
+  v5 = extract_vector v4, 0
+  return v5
+}
+; run: %i16x8_splat_sub(12345, 6789) == [5556 5556 5556 5556 5556 5556 5556 5556]
+
+function %i32x4_splat_sub(i32, i32) -> i32x4 {
+  gv0 = dyn_scale_target_const.i32x4
+  dt0 = i32x4*gv0
+
+block0(v0: i32, v1: i32):
+  v2 = splat.dt0 v0
+  v3 = splat.dt0 v1
+  v4 = isub v2, v3
+  v5 = extract_vector v4, 0
+  return v5
+}
+; run: %i32x4_splat_sub(1, 3) == [-2 -2 -2 -2]
+
+function %i64x2_splat_sub(i64, i64) -> i64x2 {
+  gv0 = dyn_scale_target_const.i64x2
+  dt0 = i64x2*gv0
+
+block0(v0: i64, v1: i64):
+  v2 = splat.dt0 v0
+  v3 = splat.dt0 v1
+  v4 = isub v2, v3
+  v5 = extract_vector v4, 0
+  return v5
+}
+; run: %i64x2_splat_sub(255, 65535) == [-65280 -65280]
+
+function %i8x16_splat_mul(i8, i8) -> i8x16 {
+  gv0 = dyn_scale_target_const.i8x16
+  dt0 = i8x16*gv0
+
+block0(v0: i8, v1: i8):
+  v2 = splat.dt0 v0
+  v3 = splat.dt0 v1
+  v4 = imul v2, v3
+  v5 = extract_vector v4, 0
+  return v5
+}
+; run: %i8x16_splat_mul(15, 15) == [225 225 225 225 225 225 225 225 225 225 225 225 225 225 225 225]
+
+function %i16x8_splat_mul(i16, i16) -> i16x8 {
+  gv0 = dyn_scale_target_const.i16x8
+  dt0 = i16x8*gv0
+
+block0(v0: i16, v1: i16):
+  v2 = splat.dt0 v0
+  v3 = splat.dt0 v1
+  v4 = imul v2, v3
+  v5 = extract_vector v4, 0
+  return v5
+}
+; run: %i16x8_splat_mul(135, 246) == [33210 33210 33210 33210 33210 33210 33210 33210]
+
+function %i32x4_splat_mul(i32, i32) -> i32x4 {
+  gv0 = dyn_scale_target_const.i32x4
+  dt0 = i32x4*gv0
+
+block0(v0: i32, v1: i32):
+  v2 = splat.dt0 v0
+  v3 = splat.dt0 v1
+  v4 = imul v2, v3
+  v5 = extract_vector v4, 0
+  return v5
+}
+; run: %i32x4_splat_mul(2, 3) == [6 6 6 6]
+
+function %f32x4_splat_add(f32, f32) -> f32x4 {
+  gv0 = dyn_scale_target_const.f32x4
+  dt0 = f32x4*gv0
+
+block0(v0: f32, v1: f32):
+  v2 = splat.dt0 v0
+  v3 = splat.dt0 v1
+  v4 = fadd v2, v3
+  v5 = extract_vector v4, 0
+  return v5
+}
+; run: %f32x4_splat_add(0x1.2, 0x3.4) == [0x4.6 0x4.6 0x4.6 0x4.6]
+
+function %f64x2_splat_add(f64, f64) -> f64x2 {
+  gv0 = dyn_scale_target_const.f64x2
+  dt0 = f64x2*gv0
+
+block0(v0: f64, v1: f64):
+  v2 = splat.dt0 v0
+  v3 = splat.dt0 v1
+  v4 = fadd v2, v3
+  v5 = extract_vector v4, 0
+  return v5
+}
+; run: %f64x2_splat_add(0x1.0, 0x2.0) == [0x3.0 0x3.0]
+
+function %f32x4_splat_sub(f32, f32) -> f32x4 {
+  gv0 = dyn_scale_target_const.f32x4
+  dt0 = f32x4*gv0
+
+block0(v0: f32, v1: f32):
+  v2 = splat.dt0 v0
+  v3 = splat.dt0 v1
+  v4 = fsub v2, v3
+  v5 = extract_vector v4, 0
+  return v5
+}
+; run: %f32x4_splat_sub(0x1.2, 0x3.4) == [-0x2.2 -0x2.2 -0x2.2 -0x2.2]
+
+function %f64x2_splat_sub(f64, f64) -> f64x2 {
+  gv0 = dyn_scale_target_const.f64x2
+  dt0 = f64x2*gv0
+
+block0(v0: f64, v1: f64):
+  v2 = splat.dt0 v0
+  v3 = splat.dt0 v1
+  v4 = fsub v2, v3
+  v5 = extract_vector v4, 0
+  return v5
+}
+; run: %f64x2_splat_sub(0x1.0, 0x3.0) == [-0x2.0 -0x2.0]