From fd412b49e1d9325c6df62272d2322362aa88f3dc Mon Sep 17 00:00:00 2001 From: Jakob Stoklund Olesen Date: Mon, 21 Nov 2016 13:49:15 -0800 Subject: [PATCH] Start a design document for the Cretonne register allocator. --- cranelift/docs/index.rst | 1 + cranelift/docs/regalloc.rst | 232 ++++++++++++++++++++++++++++++++++++ 2 files changed, 233 insertions(+) create mode 100644 cranelift/docs/regalloc.rst diff --git a/cranelift/docs/index.rst b/cranelift/docs/index.rst index 8df4042db3..301346a9e3 100644 --- a/cranelift/docs/index.rst +++ b/cranelift/docs/index.rst @@ -9,6 +9,7 @@ Contents: langref metaref testing + regalloc compare-llvm Indices and tables diff --git a/cranelift/docs/regalloc.rst b/cranelift/docs/regalloc.rst new file mode 100644 index 0000000000..18057c2511 --- /dev/null +++ b/cranelift/docs/regalloc.rst @@ -0,0 +1,232 @@ +******************************* +Register Allocation in Cretonne +******************************* + +.. default-domain:: cton +.. highlight:: rust + +Cretonne uses a *decoupled, SSA-based* register allocator. Decoupled means that +register allocation is split into two primary phases: *spilling* and +*coloring*. SSA-based means that the code stays in SSA form throughout the +register allocator, and in fact is still in SSA form after register allocation. + +Before the register allocator is run, all instructions in the function must be +*legalized*, which means that every instruction has an entry in the +``encodings`` table. The encoding entries also provide register class +constraints on the instruction's operands that the register allocator must +satisfy. + +After the register allocator has run, the ``locations`` table provides a +register or stack slot location for all SSA values used by the function. The +register allocator may have inserted :inst:`spill`, :inst:`fill`, and +:inst:`copy` instructions to make that possible. + +SSA-based register allocation +============================= + +The phases of the SSA-based register allocator are: + +Liveness analysis + For each SSA value, determine exactly where it is live. + +Spilling + The process of deciding which SSA values go in a stack slot and which + values go in a register. The spilling phase can also split live ranges by + inserting :inst:`copy` instructions, or transform the code in other ways to + reduce the number of values kept in registers. + + After spilling, the number of live register values never exceeds the number + of available registers. + +Coloring + The process of assigning specific registers to the live values. It's a + property of SSA form that this can be done in a linear scan of the + dominator tree without causing any additional spills. + +EBB argument fixup + The coloring phase does not guarantee that EBB arguments are placed in the + correct registers and/or stack slots before jumping to the EBB. It will + try its best, but not making this guarantee is essential to the speed of + the coloring phase. (EBB arguments correspond to PHI nodes in traditional + SSA form). + + The argument fixup phase inserts 'shuffle code' before jumps and branches + to place the argument values in their expected locations. + +The contract between the spilling and coloring phases is that the number of +values in registers never exceeds the number of available registers. This +sounds simple enough in theory, but in pratice there are some complications. + +Real-world complications to SSA coloring +---------------------------------------- + +In practice, instruction set architectures don't have "K interchangable +registers", and register pressure can't be measured with a single number. There +are complications: + +Different register banks + Most ISAs separate integer registers from floating point registers, and + instructions require their operands to come from a specific bank. This is a + fairly simple problem to deal with since the register banks are completely + disjoint. We simply count the number of integer and floating-point values + that are live independently, and make sure that each number does not exceed + the size of their respective register banks. + +Instructions with fixed operands + Some instructions use a fixed register for an operand. This happens on the + Intel ISAs: + + - Dynamic shift and rotate instructions take the shift amount in CL. + - Division instructions use RAX and RDX for both input and output operands. + - Wide multiply instructions use fixed RAX and RDX registers for input and + output operands. + - A few SSE variable blend instructions use a hardwired XMM0 input operand. + +Operands constrained to register subclasses + Some instructions can only use a subset of the registers for some operands. + For example, the ARM NEON vmla (scalar) instruction requires the scalar + operand to be located in D0-15 or even D0-7, depending on the data type. + The other operands can be from the full D0-31 register set. + +ABI boundaries + Before making a function call, arguments must be placed in specific + registers and stack locations determined by the ABI, and return values + appear in fixed registers. + + Some registers can be clobbered by the call and some are saved by the + callee. In some cases, only the low bits of a register are saved by the + callee. For example, ARM64 callees save only the low 64 bits of v8-15, and + Win64 callees only save the low 128 bits of AVX registers. + + ABI boundaries also affect the location of arguments to the entry block and + return values passed to the :inst:`return` instruction. + +Aliasing registers + Different registers sometimes share the same bits in the register bank. + This can make it difficult to measure register pressure. For example, the + Intel registers RAX, EAX, AX, AL, and AH overlap. + + If only one of the aliasing registers can be used at a time, the aliasing + doesn't cause problems since the registers can simply be counted as one + unit. + +Early clobbers + Sometimes an instruction requires that the register used for an output + operand does not alias any of the input operands. This happens for inline + assembly and in some other special cases. + + +Liveness Analysis +================= + +Both spilling and coloring need to know exactly where SSA values are live. The +liveness analysis computes this information. + +The data structure representing the live range of a value uses the linear +layout of the function. All instructions and EBB headers are assigned a +*program position*. A starting point for a live range can be one of the +following: + +- The instruction where the value is defined. +- The EBB header where the value is an EBB argument. +- An EBB header where the value is live-in because it was defined in a + dominating block. + +The ending point of a live range can be: + +- The last instruction to use the value. +- A branch or jump to an EBB where the value is live-in. + +When all the EBBs in a function are laid out linearly, the live range of a +value doesn't have to be a contiguous interval, although it will be in a +majority of cases. There can be holes in the linear live range. + +The live range of an SSA value is represented as: + +- The earliest program point where the value is live. +- The latest program point where the value is live. +- A (often empty) list of holes, sorted in program order. + +Any value that is only used inside a single EBB will have a live range without +holes. Some values are live across large parts of the function, and this can +often be represented with very few holes. It is important that the live range +data structure doesn't have to grow linearly with the number of EBBs covered by +a live range. + +This representation is very similar to LLVM's ``LiveInterval`` data structure +with a few important differences: + +- The Cretonne ``LiveRange`` only covers a single SSA value, while LLVM's + ``LiveInterval`` represents the union of multiple related SSA values in a + virtual register. This makes Cretonne's representation smaller because + individual segments don't have to annotated with a value number. +- Cretonne stores the min and max program points separately from a list of + holes, while LLVM stores an array of segments. The two representations are + equivalent, but Cretonne optimizes for the common case of a single contiguous + interval. +- LLVM represents a program point as ``SlotIndex`` which holds a pointer to a + 32-byte ``IndexListEntry`` struct. The entries are organized in a double + linked list that mirrors the ordering of instructions in a basic block. This + allows 'tombstone' program points corresponding to instructions that have + been deleted. + + Cretonne uses a 32-bit program point representation that encodes an + instruction or EBB number directly. There are no 'tombstones' for deleted + instructions, and no mirrored linked list of instructions. Live ranges must + be updated when instructions are deleted. + +A consequence of Cretonne's more compact representation is that two program +points can't be compared without the context of a function layout. + + +Spilling algorithm +================== + +There is no one way of implementing spilling, and different tradeoffs between +compilation time and code quality are possible. Any spilling algorithm will +need a way of tracking the register pressure so the colorability condition can +be satisfied. + +Coloring algorithm +================== + +The SSA coloring algorithm is based on a single observation: If two SSA values +interfere, one of the values must be live where the other value is defined. + +We visit the EBBs in a topological order such that all dominating EBBs are +visited before the current EBB. The instructions in an EBB are visited in a +top-down order, and each value define by the instruction is assigned an +available register. With this iteration order, every value that is live at an +instruction has already been assigned to a register. + +This coloring algorith works if the following condition holds: + + At every instruction, consider the values live through the instruction. No + matter how the live values have been assigned to registers, there must be + available registers of the right register classes available for the values + defined by the instruction. + +We'll need to modify this condition in order to deal with the real-world +complications. + +The coloring algorithm needs to keep track of the set of live values at each +instruction. At the top of an EBB, this set can be computed as the union of: + +- The set of live values before the immediately dominating branch or jump + instruction. The topological iteration order guarantees that this set is + available. Values whose live range indicate that they are not live-in to the + current EBB should be filtered out. +- The set of arguments to the EBB. These values should all be live-in, although + it is possible that some are dead and never used anywhere. + +For each live value, we also track its kill point in the current EBB. This is +the last instruction to use the value in the EBB. Values that are live-out +through the EBB terminator don't have a kill point. Note that the kill point +can be a branch to another EBB that uses the value, so the kill instruction +doesn't have to be a use of the value. + +When advancing past an instruction, the live set is updated: + +- Any values whose kill point is the current instruction are removed. +- Any values defined by the instruction are added, unless their kill point is + the current instruction. This corresponds to a dead def which has no uses.