This is a preparation for AVX512-FP16, where the broadcast size is not
just 32/64 bit depending solely on EVEX.W, but can also be 16 bit (with
EVEX.W=0). The broadcast size therefore needs two bits, but the evex
field only had one free bit left. Store broadcast size with the segment
for now. (This is not a good fit and is likely to change at some point.)
For byte-sized immediates, there are only two options for the operand
size: byte and the instruction's operand size. This knowledge allows to
remove the byte constraint from the set of fixed operand sizes.
This change doesn't really affect performance and is preparatory
work for AVX-512, where the memory operand size is required to compute
the compressed displacement.
This is realized by two changes: first, GP and vector operand size are
completely separated using one extra bit. If the operand size of an
instruction is derived from VEX.L (or EVEX.L'L), then the "opsize" bits
indicate how to derive a smaller vector size (half/quarter/eighth).
There is no instruction that uses an implicit register and an
VEX-encoded register at the same time. Thus, we can merge vexreg and
zeroreg in the instruction descriptor; the zeroreg value will be added
to the vex-operand (which is zero unless set by a VEX prefix).
This also frees 4 descriptor bits for use with AVX-512 (which will
probably need 1-2 additional unused bits, probably from the type).
To avoid GCC warnings when building with `-Os`:
warning: inlining failed in call to 'table_walk': call is
unlikely and code size would grow [-Winline]
I don't know if this causes a performance regression when optimizing for
speed instead of size, but perhaps there's a different way we can help
the compiler make this decision in such cases.
Where size_t is only 32 bits wide, and we end up losing the upper bits.
GCC catches this and emits a warning such as:
warning: left shift count >= width of type [-Wshift-count-overflow]
The AMD64 instructions VPERMIL2PS and VPERMIL2PD (currently not
supported) encode a fifth immediate operand in the lower bits of the
re-purposed immediate. Expose this value in any case so that no
information gets lost during decoding.
- 3DNow! instructions have a trailing immediate byte which indicates the
opcode. Decoding this with the existing table structure requires more
effort (in particular, a new lookup table after decoding ModRM would
be required). Given that AMD even removed 3DNow! over 10 years ago, it
appears unlikely that this will ever be fully supported. Adding the
RMI-encoded pseudo-instruction "3DNOW" just to support that opcode.
- FEMMS is a legacy 3DNow! instruction.
- EXTRQ/INSERTQ are instructions with an "unusual" encoding and
operation mode. This is another instance of 16-bit immediates.
- SVM (AMD's variant of VMX) and SNP instructions are AMD-only.