******************************** Cretonne Meta Language Reference ******************************** .. default-domain:: py .. highlight:: python .. module:: cretonne The Cretonne meta language is used to define instructions for Cretonne. It is a domain specific language embedded in Python. This document describes the Python modules that form the embedded DSL. The meta language descriptions are Python modules under the :file:`meta` top-level directory. The descriptions are processed in two steps: 1. The Python modules are imported. This has the effect of building static data structures in global variables in the modules. These static data structures use the classes in the :mod:`cretonne` module to describe instruction sets and other properties. 2. The static data structures are processed to produce Rust source code and constant tables. The main driver for this source code generation process is the :file:`meta/build.py` script which is invoked as part of the build process if anything in the :file:`meta` directory has changed since the last build. Settings ======== Settings are used by the environment embedding Cretonne to control the details of code generation. Each setting is defined in the meta language so a compact and consistent Rust representation can be generated. Shared settings are defined in the :mod:`cretonne.settings` module. Some settings are specific to a target ISA, and defined in a `settings` module under the appropriate :file:`meta/isa/*` directory. Settings can take boolean on/off values, small numbers, or explicitly enumerated symbolic values. Each type is represented by a sub-class of :class:`Setting`: .. inheritance-diagram:: Setting BoolSetting NumSetting EnumSetting :parts: 1 .. autoclass:: Setting .. autoclass:: BoolSetting .. autoclass:: NumSetting .. autoclass:: EnumSetting All settings must belong to a *group*, represented by a :class:`SettingGroup` object. .. autoclass:: SettingGroup Normally, a setting group corresponds to all settings defined in a module. Such a module looks like this:: group = SettingGroup('example') foo = BoolSetting('use the foo') bar = BoolSetting('enable bars', True) opt = EnumSetting('optimization level', 'Debug', 'Release') group.close(globals()) Instruction descriptions ======================== New instructions are defined as instances of the :class:`Instruction` class. As instruction instances are created, they are added to the currently open :class:`InstructionGroup`. .. autoclass:: InstructionGroup :members: The basic Cretonne instruction set described in :doc:`langref` is defined by the Python module :mod:`cretonne.base`. This module has a global variable :data:`cretonne.base.instructions` which is an :class:`InstructionGroup` instance containing all the base instructions. .. autoclass:: Instruction An instruction is defined with a set of distinct input and output operands which must be instances of the :class:`Operand` class. .. autoclass:: Operand Cretonne uses two separate type systems for operand kinds and SSA values. Type variables -------------- Instruction descriptions can be made polymorphic by using :class:`Operand` instances that refer to a *type variable* instead of a concrete value type. Polymorphism only works for SSA value operands. Other operands have a fixed operand kind. .. autoclass:: TypeVar :members: If multiple operands refer to the same type variable they will be required to have the same concrete type. For example, this defines an integer addition instruction:: Int = TypeVar('Int', 'A scalar or vector integer type', ints=True, simd=True) a = Operand('a', Int) x = Operand('x', Int) y = Operand('y', Int) iadd = Instruction('iadd', 'Integer addition', ins=(x, y), outs=a) The type variable `Int` is allowed to vary over all scalar and vector integer value types, but in a given instance of the `iadd` instruction, the two operands must have the same type, and the result will be the same type as the inputs. There are some practical restrictions on the use of type variables, see :ref:`restricted-polymorphism`. Immediate operands ------------------ Immediate instruction operands don't correspond to SSA values, but have values that are encoded directly in the instruction. Immediate operands don't have types from the :class:`cretonne.ValueType` type system; they often have enumerated values of a specific type. The type of an immediate operand is indicated with an instance of :class:`ImmediateKind`. .. autoclass:: ImmediateKind .. automodule:: cretonne.immediates :members: .. currentmodule:: cretonne Entity references ----------------- Instruction operands can also refer to other entties in the same function. This can be extended basic blocks, or entities declared in the function preamble. .. autoclass:: EntityRefKind .. automodule:: cretonne.entities :members: .. currentmodule:: cretonne Value types ----------- Concrete value types are represented as instances of :class:`cretonne.ValueType`. There are subclasses to represent scalar and vector types. .. autoclass:: ValueType .. inheritance-diagram:: ValueType ScalarType VectorType IntType FloatType BoolType :parts: 1 .. autoclass:: ScalarType :members: .. autoclass:: VectorType :members: .. autoclass:: IntType :members: .. autoclass:: FloatType :members: .. autoclass:: BoolType :members: .. automodule:: cretonne.types :members: .. currentmodule:: cretonne There are no predefined vector types, but they can be created as needed with the :func:`ScalarType.by` function. Instruction representation ========================== The Rust in-memory representation of instructions is derived from the instruction descriptions. Part of the representation is generated, and part is written as Rust code in the `cretonne.instructions` module. The instruction representation depends on the input operand kinds and whether the instruction can produce multiple results. .. autoclass:: OperandKind .. inheritance-diagram:: OperandKind ImmediateKind EntityRefKind Since all SSA value operands are represented as a `Value` in Rust code, value types don't affect the representation. Two special operand kinds are used to represent SSA values: .. autodata:: value .. autodata:: variable_args When an instruction description is created, it is automatically assigned a predefined instruction format which is an instance of :class:`InstructionFormat`: .. autoclass:: InstructionFormat .. _restricted-polymorphism: Restricted polymorphism ----------------------- The instruction format strictly controls the kinds of operands on an instruction, but it does not constrain value types at all. A given instruction description typically does constrain the allowed value types for its value operands. The type variables give a lot of freedom in describing the value type constraints, in practice more freedom than what is needed for normal instruction set architectures. In order to simplify the Rust representation of value type constraints, some restrictions are imposed on the use of type variables. A polymorphic instruction has a single *controlling type variable*. For a given opcode, this type variable must be the type of the first result or the type of the input value operand designated by the `typevar_operand` argument to the :py:class:`InstructionFormat` constructor. By default, this is the first value operand, which works most of the time. The value types of instruction results must be one of the following: 1. A concrete value type. 2. The controlling type variable. 3. A type variable derived from the controlling type variable. This means that all result types can be computed from the controlling type variable. Input values to the instruction are allowed a bit more freedom. Input value types must be one of: 1. A concrete value type. 2. The controlling type variable. 3. A type variable derived from the controlling type variable. 4. A free type variable that is not used by any other operands. This means that the type of an input operand can either be computed from the controlling type variable, or it can vary independently of the other operands. Encodings ========= Encodings describe how Cretonne instructions are mapped to binary machine code for the target architecture. After the lealization pass, all remaining instructions are expected to map 1-1 to native instruction encodings. Cretonne instructions that can't be encoded for the current architecture are called :term:`illegal instruction`\s. Some instruction set architectures have different :term:`CPU mode`\s with incompatible encodings. For example, a modern ARMv8 CPU might support three different CPU modes: *A64* where instructions are encoded in 32 bits, *A32* where all instuctions are 32 bits, and *T32* which has a mix of 16-bit and 32-bit instruction encodings. These are incompatible encoding spaces, and while an :cton:inst:`iadd` instruction can be encoded in 32 bits in each of them, it's not the same 32 bits. It's a judgement call if CPU modes should be modelled as separate targets, or as sub-modes of the same target. In the ARMv8 case, the different register banks means that it makes sense to model A64 as a separate target architecture, while A32 and T32 are CPU modes of the 32-bit ARM target. In a given CPU mode, there may be multiple valid encodings of the same instruction. Both RISC-V and ARMv8's T32 mode have 32-bit encodings of all instructions with 16-bit encodings available for some opcodes if certain constraints are satisfied. .. autoclass:: CPUMode Encodings are guarded by :term:`sub-target predicate`\s. For example, the RISC-V "C" extension which specifies the compressed encodings may not be supported, and a predicate would be used to disable all of the 16-bit encodings in that case. This can also affect whether an instruction is legal. For example, x86 has a predicate that controls the SSE 4.1 instruction encodings. When that predicate is false, the SSE 4.1 instructions are not available. Encodings also have a :term:`instruction predicate` which depends on the specific values of the instruction's immediate fields. This is used to ensure that immediate address offsets are within range, for example. The instructions in the base Cretonne instruction set can often represent a wider range of immediates than any specific encoding. The fixed-size RISC-style encodings tend to have more range limitations than CISC-style variable length encodings like x86. The diagram below shows the relationship between the classes involved in specifying instruction encodings: .. digraph:: encoding node [shape=record] EncRecipe -> SubtargetPred EncRecipe -> InstrFormat EncRecipe -> InstrPred Encoding [label="{Encoding|Opcode+TypeVars}"] Encoding -> EncRecipe [label="+EncBits"] Encoding -> CPUMode Encoding -> SubtargetPred Encoding -> InstrPred Encoding -> Opcode Opcode -> InstrFormat CPUMode -> Target An :py:class:`Encoding` instance specifies the encoding of a concrete instruction. The following properties are used to select instructions to be encoded: - An opcode, i.e. :cton:inst:`iadd_imm`, that must match the instruction's opcode. - Values for any type variables if the opcode represents a polymorphic instruction. - An :term:`instruction predicate` that must be satisfied by the instruction's immediate operands. - The CPU mode that must be active. - A :term:`sub-target predicate` that must be satisfied by the currently active sub-target. - :term:`Register constraint`\s that must be satisfied by the instruction's value operands and results. An encoding specifies an *encoding recipe* along with some *encoding bits* that the recipe can use for native opcode fields etc. The encoding recipe has additional constraints that must be satisfied: - An :py:class:`InstructionFormat` that must match the format required by the opcodes of any encodings that use this recipe. - An additional :term:`instruction predicate`. - An additional :term:`sub-target predicate`. The additional predicates in the :py:class:`EncRecipe` are merged with the per-encoding predicates when generating the encoding matcher code. Often encodings only need the recipe predicates. .. autoclass:: EncRecipe Targets ======= Cretonne can be compiled with support for multiple target instruction set architectures. Each ISA is represented by a :py:class:`cretonne.TargetISA` instance. .. autoclass:: TargetISA The definitions for each supported target live in a package under :file:`meta/isa`. .. automodule:: isa :members: .. automodule:: isa.riscv Glossary ======== .. glossary:: Illegal instruction An instruction is considered illegal if there is no encoding available for the current CPU mode. The legality of an instruction depends on the value of :term:`sub-target predicate`\s, so it can't always be determined ahead of time. CPU mode Every target defines one or more CPU modes that determine how the CPU decodes binary instructions. Some CPUs can switch modes dynamically with a branch instruction (like ARM/Thumb), while other modes are process-wide (like x86 32/64-bit). Sub-target predicate A predicate that depends on the current sub-target configuration. Examples are "Use SSE 4.1 instructions", "Use RISC-V compressed encodings". Sub-target predicates can depend on both detected CPU features and configuration settings. Instruction predicate A predicate that depends on the immediate fields of an instruction. An example is "the load address offset must be a 10-bit signed integer". Instruction predicates do not depend on the registers selected for value operands. Register constraint Value operands and results correspond to machine registers. Encodings may constrain operands to either a fixed register or a register class. There may also be register constraints between operands, for example some encodings require that the result register is one of the input registers.