From bd72439fbc1a70a1f8de166a339155ad8b18b0b0 Mon Sep 17 00:00:00 2001 From: Jakob Stoklund Olesen Date: Fri, 29 Jul 2016 14:50:19 -0700 Subject: [PATCH] Document binary encodings. Describe the meta-language data structures that are built to represent instruction encodings. Begin a metaref glossary. --- docs/metaref.rst | 127 ++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 126 insertions(+), 1 deletion(-) diff --git a/docs/metaref.rst b/docs/metaref.rst index fc9c5e4e29..058cc6459f 100644 --- a/docs/metaref.rst +++ b/docs/metaref.rst @@ -203,11 +203,99 @@ This means that the type of an input operand can either be computed from the controlling type variable, or it can vary independently of the other operands. +Encodings +========= + +Encodings describe how Cretonne instructions are mapped to binary machine code +for the target architecture. After the lealization pass, all remaining +instructions are expected to map 1-1 to native instruction encodings. Cretonne +instructions that can't be encoded for the current architecture are called +:term:`illegal instruction`\s. + +Some instruction set architectures have different :term:`CPU mode`\s with +incompatible encodings. For example, a modern ARMv8 CPU might support three +different CPU modes: *A64* where instructions are encoded in 32 bits, *A32* +where all instuctions are 32 bits, and *T32* which has a mix of 16-bit and +32-bit instruction encodings. These are incompatible encoding spaces, and while +an :cton:inst:`iadd` instruction can be encoded in 32 bits in each of them, it's +not the same 32 bits. It's a judgement call if CPU modes should be modelled as +separate targets, or as sub-modes of the same target. In the ARMv8 case, the +different register banks means that it makes sense to model A64 as a separate +target architecture, while A32 and T32 are CPU modes of the 32-bit ARM target. + +In a given CPU mode, there may be multiple valid encodings of the same +instruction. Both RISC-V and ARMv8's T32 mode have 32-bit encodings of all +instructions with 16-bit encodings available for some opcodes if certain +constraints are satisfied. + +Encodings are guarded by :term:`sub-target predicate`\s. For example, the RISC-V +"C" extension which specifies the compressed encodings may not be supported, and +a predicate would be used to disable all of the 16-bit encodings in that case. +This can also affect whether an instruction is legal. For example, x86 has a +predicate that controls the SSE 4.1 instruction encodings. When that predicate +is false, the SSE 4.1 instructions are not available. + +Encodings also have a :term:`instruction predicate` which depends on the +specific values of the instruction's immediate fields. This is used to ensure +that immediate address offsets are within range, for example. The instructions +in the base Cretonne instruction set can often represent a wider range of +immediates than any specific encoding. The fixed-size RISC-style encodings tend +to have more range limitations than CISC-style variable length encodings like +x86. + +The diagram below shows the relationship between the classes involved in +specifying instruction encodings: + +.. digraph:: encoding + + node [shape=record] + CPUMode -> Target + EncRecipe -> CPUMode + EncRecipe -> SubtargetPred + EncRecipe -> InstrFormat + EncRecipe -> InstrPred + Encoding [label="{Encoding|Opcode+TypeVars}"] + Encoding -> EncRecipe [label="+EncBits"] + Encoding -> SubtargetPred + Encoding -> InstrPred + Encoding -> Opcode + Opcode -> InstrFormat + +An :py:class:`Encoding` instance specifies the encoding of a concrete +instruction. The following properties are used to select instructions to be +encoded: + +- An opcode, i.e. :cton:inst:`iadd_imm`, that must match the instruction's + opcode. +- Values for any type variables if the opcode represents a polymorphic + instruction. +- An :term:`instruction predicate` that must be satisfied by the instruction's + immediate operands. +- A :term:`sub-target predicate` that must be satisfied by the currently active + sub-target. +- :term:`Register constraint`\s that must be satisfied by the instruction's value + operands and results. + +An encoding specifies an *encoding recipe* along with some *encoding bits* that +the recipe can use for native opcode fields etc. The encoding recipe has +additional constraints that must be satisfied: + +- The CPU mode that must be active to enable encodings. +- An :py:class:`InstructionFormat` that must match the format required by the + opcodes of any encodings that use this recipe. +- An additional :term:`instruction predicate`. +- An additional :term:`sub-target predicate`. + +The additional predicates in the :py:class:`EncRecipe` are merged with the +per-encoding predicates when generating the encoding matcher code. Often +encodings only need the recipe predicates. + + Targets ======= Cretonne can be compiled with support for multiple target instruction set -architectures. Each ISA is represented by a :py:class`cretonne.Target` instance. +architectures. Each ISA is represented by a :py:class:`cretonne.Target` instance. .. autoclass:: Target @@ -218,3 +306,40 @@ The definitions for each supported target live in a package under :members: .. automodule:: target.riscv + + +Glossary +======== + +.. glossary:: + + Illegal instruction + An instruction is considered illegal if there is no encoding available + for the current CPU mode. The legality of an instruction depends on the + value of :term:`sub-target predicate`\s, so it can't always be + determined ahead of time. + + CPU mode + Every target defines one or more CPU modes that determine how the CPU + decodes binary instructions. Some CPUs can switch modes dynamically with + a branch instruction (like ARM/Thumb), while other modes are + process-wide (like x86 32/64-bit). + + Sub-target predicate + A predicate that depends on the current sub-target configuration. + Examples are "Use SSE 4.1 instructions", "Use RISC-V compressed + encodings". Sub-target predicates can depend on both detected CPU + features and configuration settings. + + Instruction predicate + A predicate that depends on the immediate fields of an instruction. An + example is "the load address offset must be a 10-bit signed integer". + Instruction predicates do not depend on the registers selected for value + operands. + + Register constraint + Value operands and results correspond to machine registers. Encodings may + constrain operands to either a fixed register or a register class. There + may also be register constraints between operands, for example some + encodings require that the result register is one of the input + registers.