103 Commits

Author SHA1 Message Date
Alexis Engelke
4abad24610 decode: Support half vector size
This is realized by two changes: first, GP and vector operand size are
completely separated using one extra bit. If the operand size of an
instruction is derived from VEX.L (or EVEX.L'L), then the "opsize" bits
indicate how to derive a smaller vector size (half/quarter/eighth).
2022-11-30 09:32:02 +01:00
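As a rough illustration of the size derivation described in 4abad24610: the sketch below assumes the vector-length field (VEX.L or EVEX.L'L) selects 16, 32 or 64 bytes, and that the descriptor's "opsize" bits act as a right-shift count. The function name, the parameter names and the shift encoding are assumptions made for this example and are not taken from the decoder source.

```python
def vector_operand_size(vec_len: int, shrink: int) -> int:
    """Return the operand size in bytes.

    vec_len: value of VEX.L / EVEX.L'L (0 = 128-bit, 1 = 256-bit, 2 = 512-bit).
    shrink:  hypothetical 2-bit descriptor field; 0 = full, 1 = half,
             2 = quarter, 3 = eighth of the vector size.
    """
    if vec_len not in (0, 1, 2):
        raise ValueError("reserved vector length")
    if shrink not in (0, 1, 2, 3):
        raise ValueError("shrink must fit in two bits")
    full_size = 16 << vec_len      # 16, 32 or 64 bytes
    return full_size >> shrink     # halve once per shrink step


# A 256-bit instruction with a half-size operand reads 16 bytes.
print(vector_operand_size(1, 1))
```

Keeping the shrink factor as a shift count means a single descriptor field works for all three vector lengths.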
Alexis Engelke
f565f09f9d instrs: Clearly separate vector and GP opsize
Now, an instruction cannot refer to the GP operand size and the vector
operand size at the same time. Doing so is never necessary: any required
distinction can also be made manually using W0/W1/66 selectors.
2022-11-27 18:07:44 +01:00
Alexis Engelke
fe6fe0ffc1 instrs: Add CMPCCXADD 2022-11-27 13:39:37 +01:00
Alexis Engelke
1c04ded9b7 decode: Merge zeroreg and vexreg
There is no instruction that uses an implicit register and a
VEX-encoded register at the same time. Thus, we can merge vexreg and
zeroreg in the instruction descriptor; the zeroreg value will be added
to the vex-operand (which is zero unless set by a VEX prefix).

This also frees 4 descriptor bits for use with AVX-512 (which will
probably need 1-2 additional unused bits, probably from the type).
2022-11-20 15:25:37 +01:00
Alexis Engelke
1f0ef1ba0c parseinstr: Remove duplicate encode mnem generator 2022-02-26 10:11:12 +01:00
Alexis Engelke
6b8c2968c1 encode2: Add new encoder API, one func per instr.
This is an *experimental* (read: unstable) API which exposes encoding
functionality as one function per instruction. This makes the encoding
process itself significantly faster, at the cost of a much larger binary
size (~1 MiB of code, no data) and much higher compilation time.
2022-02-20 17:21:04 +01:00
Alexis Engelke
2fd83903cf meson,parseinstrs: Make decode and encode optional 2022-02-20 17:15:21 +01:00
Alexis Engelke
87fe6314b8 instrs: Disambiguate instr type by mnemonic 2022-02-20 15:38:12 +01:00
Alexis Engelke
e78a89b610 instrs: Annotate 67h and segment override usage
Some instructions honor an address-size override or a segment override,
even in the absence of a directly encoded memory operand.

These annotations are not yet used, but may be used in the future to
optimize the size of encoded instructions.
2022-01-06 10:31:09 +01:00
Alexis Engelke
49984c940c parseinstrs: Encode struct in InstrDesc 2021-09-14 18:15:38 +02:00
Alexis Engelke
4193ef96bf parseinstrs: Generate more statistics 2021-09-13 17:44:00 +02:00
Alexis Engelke
e5cdc57743 parseinstrs: Minor simplification 2021-09-13 17:43:12 +02:00
Alexis Engelke
7229f13742 parseinstrs: Don't generate 16-bit version for I66 2021-09-13 17:28:49 +02:00
Alexis Engelke
effc0c7e49 parseinstrs: Fold trie layers with only one child 2021-09-13 17:27:47 +02:00
Alexis Engelke
71c0daf581 instrs: Change operand format
This changes the instruction description format:

- Use Intel/AMD terminology for describing operands (where applicable)
- Group instructions by ISA extension
- Indicate read/written status flags
2021-09-13 17:26:43 +02:00
Alexis Engelke
e41d6c26f8 parseinstrs: Make superstring function generic 2021-09-11 13:19:17 +02:00
Alexis Engelke
1fcacdeda7 parseinstrs: Optimize mnemonic compression
As the formatter no longer demands a null-terminated string, mnemonics
can arbitrarily overlap and therefore save space.

This is the shortest superstring problem, which is NP-hard. This is
currently approximated with a greedy heuristic.
2021-09-11 13:05:34 +02:00
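The greedy approximation mentioned in 1fcacdeda7 can be sketched as follows. This is a generic textbook version (repeatedly merge the pair of strings with the largest overlap), not the code used by parseinstrs; all function names are hypothetical.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_superstring(words):
    """Approximate a shortest common superstring of words."""
    unique = sorted(set(words))
    # Words contained in another word need no space of their own.
    parts = [w for w in unique if not any(w != o and w in o for o in unique)]
    while len(parts) > 1:
        # Find the ordered pair (i, j) with the largest overlap.
        best_n, best_i, best_j = -1, 0, 1
        for i, a in enumerate(parts):
            for j, b in enumerate(parts):
                if i != j:
                    n = overlap(a, b)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        merged = parts[best_i] + parts[best_j][best_n:]
        rest = [p for k, p in enumerate(parts) if k not in (best_i, best_j)]
        # Drop anything the merged string happens to contain already.
        parts = [p for p in rest if p not in merged] + [merged]
    return parts[0] if parts else ""


def build_mnemonic_table(words):
    """Return the packed string and an (offset, length) pair per word."""
    blob = greedy_superstring(words)
    return blob, {w: (blob.index(w), len(w)) for w in words}


blob, table = build_mnemonic_table(["MOV", "MOVSX", "CMOV", "SUB", "SUBPS"])
print(blob, table)
```

Because each entry stores an explicit length, a mnemonic may start and end anywhere inside the packed string, which is what makes arbitrary overlap possible once null termination is no longer required.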
Alexis Engelke
99a1fbeee1 format: Major refactoring for performance 2021-05-30 14:25:38 +02:00
Alexis Engelke
50f052488d decode: More precise register types 2021-04-02 17:20:23 +02:00
Alexis Engelke
4185d7b2d6 encode: Support FD/TD encodings 2021-04-02 11:26:12 +02:00
Alexis Engelke
2d9587bc16 encode: Fix encoding of memory operand
When a modrm_idx is used without a ModRM being present, the encoder
attempted to encode memory operands using O/OA/AO encodings.
2021-04-02 10:54:04 +02:00
Alexis Engelke
5faa90a292 encode: Support RVMR encoding 2021-03-23 12:55:43 +01:00
Alexis Engelke
4f2366afd1 instrs: Add VIA PadLock and AMD RDPRU 2021-01-23 16:47:30 +01:00
Alexis Engelke
363698db3b parseinstrs: Move decode table gen to new function 2021-01-23 13:59:59 +01:00
Alexis Engelke
85fdaa3a9b instrs: Remove incorrect NFx specifiers
The new trie implementation is more flexible and allows omitting
prefixes even with a ModRM specifier in the opcode. Use this flexibility
to simplify instruction descriptions.
2021-01-23 13:25:23 +01:00
Alexis Engelke
dc399390a4 parseinstrs: Refactor mapping of opcode to Trie 2021-01-23 13:25:23 +01:00
Alexis Engelke
13a2456458 parseinstrs: Simplify trie implementation 2021-01-23 13:25:23 +01:00
Alexis Engelke
43910a6227 parseinstrs: Avoid redundant encoding of InstrDesc 2021-01-23 13:25:23 +01:00
Alexis Engelke
09d3886577 parseinstrs: Move regtype encoding to InstrDesc 2021-01-23 13:25:23 +01:00
Alexis Engelke
d6278de812 parseinstrs: Use tuples/ints for indexing in trie
This avoids useless internal string formatting.
2021-01-23 13:25:23 +01:00
Alexis Engelke
1390bae341 parseinstr: Create optype string in descriptor
The raw encoding representation is now only used in InstrDesc.
2021-01-23 13:25:23 +01:00
Alexis Engelke
801fe4bc43 parseinstrs: Generalize immediate size computation 2021-01-23 13:25:23 +01:00
Alexis Engelke
62018556a1 parseinstrs: Simplify operand kind parsing 2021-01-23 13:25:23 +01:00
Alexis Engelke
bd611902b0 parseinstrs: Add separate ModRM indicator to desc
Some instructions have no ModRM operand and no extended opcode but still
consume a ModRM byte.
2021-01-23 13:25:23 +01:00
Alexis Engelke
80df5ff47c instrs: Add reserved NOP/PREFETCH as weak opcodes 2021-01-10 16:53:27 +01:00
Alexis Engelke
1458bf9673 encode: Support VEX-encoded instructions 2021-01-10 16:03:40 +01:00
Alexis Engelke
c050b34ff9 instrs: Add support for undocumented instructions
Undocumented instructions are not decoded by default.

- SALC: undocumented in any recent manual and unsupported by newer
  Intel CPUs. Included as listed in [1,2].
- Undocumented FPU instructions: see [2].

[1]: http://www.rcollins.org/secrets/opcodes/SALC.html
[2]: https://github.com/xoreaxeaxeax/sandsifter/issues/33
2021-01-10 15:04:37 +01:00
Alexis Engelke
862b6d285c instrs: Minor operand size fixes 2021-01-10 14:13:44 +01:00
Alexis Engelke
af9188e267 parseinstrs: Respect mem-only/reg-only encodings 2021-01-10 12:02:58 +01:00
Alexis Engelke
111769832f format: Properly output VSIB encodings 2021-01-08 10:37:13 +01:00
Alexis Engelke
3fdbd70153 encode: Fix erroneous encoding of high registers 2021-01-07 10:03:17 +01:00
Alexis Engelke
db183ee6f9 meson: Check compiler options and Python version
Thanks to William Woodruff for pointing out that -Wcast-align=strict is
a GCC-only option, which causes build errors (instead of just
complaining about an unsupported warning option).
2021-01-05 20:21:44 +01:00
Alexis Engelke
44808e7b1a format: Format instructions with Intel syntax 2021-01-03 21:18:57 +01:00
Alexis Engelke
d2bf961b77 instrs: Properly handle PUSH/POP of SEG registers 2021-01-03 20:08:34 +01:00
Alexis Engelke
3a3a284f6f parseinstrs: Improve performance 2021-01-03 20:08:34 +01:00
Alexis Engelke
5a77c0e6eb parseinstrs: Use suffix tree to reduce mnem size
This brings slight size improvements, although due to SSE/MMX
instruction name prefixes, benefits are rather small (~50 bytes).
2021-01-03 20:08:30 +01:00
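One way to read 5a77c0e6eb is that, with null-terminated mnemonic strings, a mnemonic that is a suffix of a longer one can point into the longer string instead of being stored again. The sketch below illustrates only that idea, under that assumption. A plain substring search stands in for the suffix tree named in the commit, and the function name is hypothetical.

```python
def pack_suffix_shared(mnems):
    """Pack null-terminated strings, reusing tails of longer entries."""
    table = ""
    offsets = {}
    # Longest first, so shorter mnemonics can reuse an existing tail.
    for m in sorted(set(mnems), key=lambda s: (-len(s), s)):
        entry = m + "\0"
        # A hit must end at a terminator, so it is a suffix of a stored string.
        pos = table.find(entry)
        if pos < 0:
            pos = len(table)
            table += entry
        offsets[m] = pos
    return table, offsets


table, offsets = pack_suffix_shared(["OR", "XOR", "POR", "ADD", "XADD"])
print(repr(table), offsets)
```

Only suffixes can be shared this way, since every entry has to end at a terminator.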
Alexis Engelke
fd80706f54 decode: Store instruction descriptors separately 2020-11-22 22:27:43 +01:00
Alexis Engelke
6fe5500444 instrs: Force RIP access to 64-bit and fix XBEGIN 2020-11-22 15:13:52 +01:00
Alexis Engelke
f9bba6289e instrs: Annotate only-mem and only-reg in opcode 2020-11-22 11:34:55 +01:00
Alexis Engelke
62b0420147 parseinstr: Simplify opcode naming scheme 2020-11-09 09:47:36 +01:00