x64: Add more support for more AVX instructions (#5931)

* x64: Add a smattering of lowerings for `shuffle` specializations (#5930)

* x64: Add lowerings for `punpck{h,l}wd`

Add some special cases for `shuffle` for more specialized x86
instructions.

* x64: Add `shuffle` lowerings for `pshufd`

This commit adds special-cased lowerings for the x64 `shuffle`
instruction when the `pshufd` instruction alone is necessary. This is
possible when the shuffle immediate permutes 32-bit values within one of
the vector inputs of the `shuffle` instruction, but not both.

* x64: Add shuffle lowerings for `punpck{h,l}{q,}dq`

This adds specific permutations for some x86 instructions which
specifically interleave high/low bytes for 32 and 64-bit values. This
corresponds to the preexisting specific lowerings for interleaving 8 and
16-bit values.

* x64: Add `shuffle` lowerings for `shufps`

This commit adds targeted lowerings for the `shuffle` instruction that
match the pattern that `shufps` supports. The `shufps` instruction
selects two elements from the first vector and two elements from the
second vector which means while it's not generally applicable it should
still be more useful than the catch-all lowering of `shuffle`.

* x64: Add shuffle support for `pshuf{l,h}w`

This commit adds special lowering cases for these instructions which
permute 16-bit values within a 128-bit value either within the upper or
lower half of the 128-bit value.

* x64: Specialize `shuffle` with an all-zeros immediate

Instead of loading the all-zeros immediate from a rip-relative address
at the end of the function instead generate a zero with a `pxor`
instruction and then use `pshufb` to do the broadcast.

* Review comments

* x64: Add an AVX encoding for the `pshufd` instruction

This will benefit from lack of need for alignment vs the `pshufd`
instruction if working with a memory operand and additionally, as I've
just learned, this reduces dependencies between instructions because the
`v*` instructions zero the upper bits as opposed to preserving them
which could accidentally create false dependencies in the CPU between
instructions.

* x64: Add more support for AVX loads/stores

This commit adds VEX-encoded versions of instructions such as
`mov{ss,sd,upd,ups,dqu}` for load and store operations. This also
changes some signatures so the `load` helpers specifically take a
`SyntheticAmode` argument which ended up doing a small refactoring of
the `*_regmove` variant used for `insertlane 0` into f64x2 vectors.

* x64: Enable using AVX instructions for zero regs

This commit refactors the internal ISLE helpers for creating zero'd
xmm registers to leverage the AVX support for all other instructions.
This moves away from picking opcodes to instead picking instructions
with a bit of reorganization.

* x64: Remove `XmmConstOp` as an instruction

All existing users can be replaced with usage of the `xmm_uninit_value`
helper instruction so there's no longer any need for these otherwise
constant operations. This additionally reduces manual usage of opcodes
in favor of instruction helpers.

* Review comments

* Update test expectations
This commit is contained in:
Alex Crichton
2023-03-09 17:57:42 -06:00
committed by GitHub
parent 1c3a1bda6c
commit 83f21e784a
22 changed files with 635 additions and 300 deletions

View File

@@ -1976,22 +1976,6 @@ pub(crate) fn emit(
}
}
Inst::XmmConstOp { op, dst } => {
let dst = allocs.next(dst.to_reg().to_reg());
emit(
&Inst::XmmRmR {
op: *op,
dst: Writable::from_reg(Xmm::new(dst).unwrap()),
src1: Xmm::new(dst).unwrap(),
src2: Xmm::new(dst).unwrap().into(),
},
allocs,
sink,
info,
state,
);
}
Inst::XmmRmRBlend {
op,
src1,
@@ -2181,6 +2165,7 @@ pub(crate) fn emit(
AvxOpcode::Vpunpckhdq => (LP::_66, OM::_0F, 0x6A),
AvxOpcode::Vpunpcklqdq => (LP::_66, OM::_0F, 0x6C),
AvxOpcode::Vpunpckhqdq => (LP::_66, OM::_0F, 0x6D),
AvxOpcode::Vmovsd => (LP::_F2, OM::_0F, 0x10),
_ => panic!("unexpected rmir vex opcode {op:?}"),
};
VexInstruction::new()
@@ -2385,6 +2370,23 @@ pub(crate) fn emit(
AvxOpcode::Vcvtps2pd => (LegacyPrefixes::None, OpcodeMap::_0F, 0x5A),
AvxOpcode::Vcvttpd2dq => (LegacyPrefixes::_66, OpcodeMap::_0F, 0xE6),
AvxOpcode::Vcvttps2dq => (LegacyPrefixes::_F3, OpcodeMap::_0F, 0x5B),
AvxOpcode::Vmovdqu => (LegacyPrefixes::_F3, OpcodeMap::_0F, 0x6F),
AvxOpcode::Vmovups => (LegacyPrefixes::None, OpcodeMap::_0F, 0x10),
AvxOpcode::Vmovupd => (LegacyPrefixes::_66, OpcodeMap::_0F, 0x10),
// Note that for `vmov{s,d}` the `inst.isle` rules should
// statically ensure that only `Amode` operands are used here.
// Otherwise the other encodings of `vmovss` are more like
// 2-operand instructions which this unary encoding does not
// have.
AvxOpcode::Vmovss => match &src {
RegisterOrAmode::Amode(_) => (LegacyPrefixes::_F3, OpcodeMap::_0F, 0x10),
_ => unreachable!(),
},
AvxOpcode::Vmovsd => match &src {
RegisterOrAmode::Amode(_) => (LegacyPrefixes::_F2, OpcodeMap::_0F, 0x10),
_ => unreachable!(),
},
_ => panic!("unexpected rmr_imm_vex opcode {op:?}"),
};
@@ -2412,6 +2414,7 @@ pub(crate) fn emit(
AvxOpcode::Vroundpd => (LegacyPrefixes::_66, OpcodeMap::_0F3A, 0x09),
AvxOpcode::Vpshuflw => (LegacyPrefixes::_F2, OpcodeMap::_0F, 0x70),
AvxOpcode::Vpshufhw => (LegacyPrefixes::_F3, OpcodeMap::_0F, 0x70),
AvxOpcode::Vpshufd => (LegacyPrefixes::_66, OpcodeMap::_0F, 0x70),
_ => panic!("unexpected rmr_imm_vex opcode {op:?}"),
};
@@ -2426,6 +2429,28 @@ pub(crate) fn emit(
.encode(sink);
}
Inst::XmmMovRMVex { op, src, dst } => {
let src = allocs.next(*src);
let dst = dst.with_allocs(allocs).finalize(state, sink);
let (prefix, map, opcode) = match op {
AvxOpcode::Vmovdqu => (LegacyPrefixes::_F3, OpcodeMap::_0F, 0x7F),
AvxOpcode::Vmovss => (LegacyPrefixes::_F3, OpcodeMap::_0F, 0x11),
AvxOpcode::Vmovsd => (LegacyPrefixes::_F2, OpcodeMap::_0F, 0x11),
AvxOpcode::Vmovups => (LegacyPrefixes::None, OpcodeMap::_0F, 0x11),
AvxOpcode::Vmovupd => (LegacyPrefixes::_66, OpcodeMap::_0F, 0x11),
_ => unimplemented!("Opcode {:?} not implemented", op),
};
VexInstruction::new()
.length(VexVectorLength::V128)
.prefix(prefix)
.map(map)
.opcode(opcode)
.rm(dst)
.reg(src.to_real_reg().unwrap().hw_enc())
.encode(sink);
}
Inst::XmmRmREvex {
op,
src1,