It is a longer standing issue that some instructions like ADD, IMUL, and
SHL have multiple mnemonics for different encoding forms. This is a
relict from a time where such information was not stored in the
instruction decoding. This, however, is no longer the case and therefore
the extra mnemonics just increase the number of cases to be handled by
users.
Some instruction opcodes have an entirely different encoding when a VEX
prefix is present. For example, 0f41 is CMOVNO without mandatory
prefixes while VEX.NP.W0.L1.0f41 is KANDW with a mandatory prefix. To
avoid collisions, the VEX prefix is better handled as a completely
separate decode tree, at the cost of a slight increase in table size.
It turns out that in x86-64 mode, address and operand size overrides are
ignored by the processor. (Tested that on a real machine.)
Even libopcodes gets this wrong...
The formatter now includes the following information:
- Segment overrides
- Address-size overrides
- REP/REPNZ prefixes
- LOCK prefix
- High-byte registers (determined using presence of REX prefix)
It is possible to configure the build process such that decoding of 32
bit and 64 bit instructions can be chosen at runtime using an additional
parameter of the decode function. The header file is now entirely
architecture-independent and no longer required any previous defines.
Decoding x86-64 still requires a 64-bit pointer size.
This allows to decode x86-32 machine code on a 64-bit platform (but
not vice versa). As a side-effect, we also get rid of pointer-size
detection for architecture selection.