Use "vector" or "sequence" in lieu of "list" in DESIGN.md, except when referring to linked lists.

Based on feedback from @julian-seward1. I had used "list" in a more
generic sense, meaning an ordered sequence of elements, while in a Rust
context it can sometimes be confused with "linked list" specifically.
These alternative terms are more precise.
Chris Fallin
2021-08-12 11:23:44 -07:00
parent 84285c26fb
commit a591535fe9


The toplevel API to regalloc2 consists of a single entry point `run()`
that takes a register environment, which specifies all physical
registers, and the input program. The function returns either an error
or an `Output` struct that provides allocations for each operand and a
vector of additional instructions (moves, loads, stores) to insert.
## Register Environment
The allocator takes a `MachineEnv` which specifies, for each of the
two register classes `Int` and `Float`, a vector of `PReg`s by index. A
`PReg` is nothing more than the class and index within the class; the
allocator does not need to know anything more.
The `MachineEnv` provides a vector of preferred and non-preferred
physical registers per class. Any register not in either vector will
not be allocated. Usually, registers that do not need to be saved in
the prologue if used (i.e., caller-save registers) are given in the
"preferred" vector. The environment also provides exactly one scratch
register per class. This register must not be in the preferred or
non-preferred vectors, and is used whenever a set of moves that need
to occur logically in parallel have a cycle (for a simple example,
consider a swap `r0, r1 := r1, r0`).
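To make the cycle case concrete, here is a minimal sketch of lowering that swap with a scratch register (register names and the textual move syntax are invented for illustration; the real move resolver is described later in this document):

```rust
/// Why a scratch register is needed: the parallel move
/// `r0, r1 := r1, r0` has a cycle, so one value must be parked in a
/// scratch register while the other moves happen.
fn lower_swap(r0: &str, r1: &str, scratch: &str) -> Vec<String> {
    vec![
        format!("move {scratch} <- {r0}"), // park r0's value
        format!("move {r0} <- {r1}"),      // now safe to overwrite r0
        format!("move {r1} <- {scratch}"), // complete the cycle
    ]
}

fn main() {
    for insn in lower_swap("r0", "r1", "rscratch") {
        println!("{insn}");
    }
}
```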
With some more work, we could potentially remove the need for the
scratch register, for example via a swap operation provided by the
client ("swap"), but we have not pursued this.
## CFG and Instructions
The allocator operates on an input program that is in a standard CFG
representation: the function body is a sequence of basic blocks, and
each block has a sequence of instructions and zero or more
successors. The allocator also requires the client to provide
predecessors for each block, and these must be consistent with the
successors.
Instructions are opaque to the allocator except for a few important
bits: (1) `is_ret` (is a return instruction); (2) `is_branch` (is a
branch instruction); (3) `is_call` (is a call instruction, for
heuristic purposes only), (4) `is_move` (is a move between registers),
and (5) a vector of Operands, covered below. Every block must end in a
return or branch.
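As a rough sketch of the shape of this client interface (method names and signatures here are illustrative, not the exact regalloc2 trait; consult the crate documentation for the real API):

```rust
// Illustrative sketch only: the few things the allocator needs to know
// about the input program. Not the exact regalloc2 `Function` trait.
pub struct Block(pub u32);
pub struct Inst(pub u32);
#[derive(Clone, Copy)]
pub struct Operand(pub u64); // packed: vreg, class, policy, position

pub trait FunctionLike {
    // CFG structure.
    fn entry_block(&self) -> Block;
    fn block_insts(&self, block: Block) -> std::ops::Range<u32>;
    fn block_succs(&self, block: Block) -> &[Block];
    fn block_preds(&self, block: Block) -> &[Block]; // must match succs

    // The few bits the allocator needs per instruction.
    fn is_ret(&self, inst: Inst) -> bool;
    fn is_branch(&self, inst: Inst) -> bool;
    fn is_call(&self, inst: Inst) -> bool; // heuristic purposes only
    fn is_move(&self, inst: Inst) -> Option<(Operand, Operand)>;
    fn inst_operands(&self, inst: Inst) -> &[Operand];
}
```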
Both instructions and blocks are named by indices in contiguous index
spaces.
## Output
The allocator produces two main data structures as output: an array of
`Allocation`s and a sequence of edits. Some other data, such as
stackmap slot info, is also provided.
### Allocations
In order to implement the necessary movement of data between
allocations, the allocator needs to insert moves at various program
points.
The vector of inserted moves contains tuples that name a program point
and an "edit". The edit is either a move, from one `Allocation` to
another, or else a kind of metadata used by the checker to know which
VReg is live in a given allocation at any particular time.
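A rough sketch of the output's shape, with illustrative rather than exact type definitions:

```rust
// Illustrative sketch of the output data described above; names are
// not the exact regalloc2 public types.
pub struct Allocation(pub u32); // a preg or a stack slot, packed
pub struct ProgPoint(pub u32);  // instruction index plus before/after bit

pub enum Edit {
    // An actual move the client must emit between instructions.
    Move { from: Allocation, to: Allocation },
    // Checker-only metadata: which vreg lives in an allocation here.
    DefAlloc { alloc: Allocation, vreg: u32 },
}

pub struct Output {
    pub allocs: Vec<Allocation>,       // one per operand, in order
    pub edits: Vec<(ProgPoint, Edit)>, // sorted by program point
}
```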
Liveness sets (block liveins and liveouts) are computed with a
standard backward iterative dataflow analysis and are exact; they do
not over-approximate (this turns out to be important for performance,
and is also necessary for correctness in the case of stackmaps).
### Blockparam Vectors: Source-Side and Dest-Side
The initialization stage scans the input program and produces two
vectors that represent blockparam flows from branches to destination
blocks: `blockparam_ins` and `blockparam_outs`.
These two vectors are the first instance we will see of a recurring
pattern: the vectors contain tuples that are carefully ordered in a
way such that their sort-order is meaningful. "Build a vector lazily
then sort" is a common idiom: it batches the O(n log n) cost into one
operation that the stdlib has aggressively optimized, it provides
dense storage, and it allows for a scan in a certain order that often
lines up with a scan over the program.
In this particular case, we will build vectors of (vreg, block) points
that are meaningful either at the start or end of a block, so that
later, when we scan over a particular vreg's allocations in block
order, we can generate another vector of allocations. One side (the
"outs") also contains enough information that it can line up with the
other side (the "ins") in a later sort.
To make this work, `blockparam_ins` contains a vector of (to-vreg,
to-block, from-block) tuples, and has an entry for every blockparam of
every block. Note that we can compute this without actually observing
from-blocks; we only need to iterate over `block_preds` at any given
block.
Then, `blockparam_outs` contains a vector of (from-vreg, from-block,
to-block, to-vreg), and has an entry for every parameter on every
branch that ends a block. There is exactly one "out" tuple for every
"in" tuple. As mentioned above, we will later scan over both to
generate moves.
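A small sketch of the build-then-sort idiom on these two vectors, with plain index types standing in for the real packed types:

```rust
// Illustrative sketch: plain u32 aliases stand in for the real packed
// index types.
type VReg = u32;
type Block = u32;

fn main() {
    // (to-vreg, to-block, from-block): one entry per blockparam per pred.
    let mut blockparam_ins: Vec<(VReg, Block, Block)> = vec![];
    // (from-vreg, from-block, to-block, to-vreg): one entry per branch arg.
    let mut blockparam_outs: Vec<(VReg, Block, Block, VReg)> = vec![];

    // Entries are appended in whatever order the scan visits them...
    blockparam_ins.push((7, 3, 1));
    blockparam_outs.push((2, 1, 3, 7));

    // ...then sorted once in bulk. Afterwards a scan in (vreg, block)
    // order lines up with a scan over each vreg's allocations.
    blockparam_ins.sort_unstable();
    blockparam_outs.sort_unstable();
}
```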
### Program-Move Vectors: Source-Side and Dest-Side
Similar to blockparams, we handle moves specially. In fact, we ingest
all moves in the input program into a set of vectors -- "move sources"
and "move dests", analogous to the "ins" and "outs" blockparam vectors
described above -- and then completely ignore the moves in the program
thereafter. The semantics of the API are such that all program moves
will be recreated with regalloc-inserted edits, and should not still
be present in the final program. Formerly, moves were handled as
opaque instructions with a source and dest, and we found that there
were many redundant move-chains (A->B, B->C) that are eliminated when
everything is handled centrally.
We also construct a `prog_move_merges` vector of live-range index pairs
to attempt to merge when we reach that stage of allocation.
## Core Allocation State: Ranges, Uses, Bundles, VRegs, PRegs
A live-range is a contiguous range of program points (half-open,
i.e. including `from` and excluding `to`) for which a particular vreg
is live with a value.
A live-range contains a vector of uses. Each use contains four parts:
the Operand word (directly copied, so there is no need to dereference
it); the ProgPoint at which the use occurs; the operand slot on that
instruction, if any, that the operand comes from, and the use's
weight. New live-ranges can be created throughout allocation, and all
of this state is bulk-freed at the end.
Live-ranges are aggregated into "bundles". A bundle is a collection of
ranges that does not overlap. Each bundle carries: a vector (inline
SmallVec) of (range, live-range index) tuples, an allocation (starts
as "none"), a "spillset" (more below), and some metadata, including a
spill weight (sum of ranges' weights), a priority (sum of ranges'
lengths), and three property flags: "minimal", "contains fixed
constraints", "contains stack constraints".
VRegs also contain their vectors of live-ranges, in the same form as a
bundle does (inline SmallVec that has inline (from, to) range bounds
and range indices).
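A sketch of this layout (field names illustrative; the real structs pack these fields more tightly), showing the key point that the range bounds are stored inline so scans need no pointer-chasing:

```rust
// Illustrative sketch of the bundle layout described above.
use smallvec::SmallVec;

struct ProgPoint(u32);
struct LiveRangeIndex(u32);

struct LiveRangeListEntry {
    from: ProgPoint, // inline copy of the range bounds, so overlap
    to: ProgPoint,   // tests never dereference the LiveRange itself
    index: LiveRangeIndex, // index of the full live-range record
}

struct LiveBundle {
    ranges: SmallVec<[LiveRangeListEntry; 4]>, // sorted, non-overlapping
    allocation: Option<u32>, // starts as "none"
    spillset: u32,
    spill_weight: u32, // sum of ranges' weights
    priority: u32,     // sum of ranges' lengths
}
```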
There are two important overlap invariants: (i) no liveranges within a
bundle overlap, and (ii) no liveranges within a vreg overlap. These
are extremely important and we rely on them implicitly in many places.
The live-range vectors in bundles and vregs, and use-vectors in ranges,
have various sorting invariants as well. These invariants differ
according to the phase of the allocator's computation. First, during
live-range construction, live-ranges are placed into vregs in reverse
order (because the computation is a reverse scan) and uses into ranges
in reverse order; these are sorted into forward order at the end of
live-range computation. When bundles are first constructed, their
range vectors are sorted, and they remain so for the rest of allocation,
as we need for interference testing. However, as ranges are created
and split, sortedness of vreg ranges is *not* maintained; they are
sorted once more, in bulk, when allocation is done and we start to
resolve moves.
We would not, on our own,
create a second-chance spill bundle just for a liverange with an "Any"
use; but if it was already forced into existence by splitting and
trimming, then we might as well use it.
Note that unlike other bundles, a spill bundle's liverange vector
remains unsorted until we do the second-chance allocation. This allows
quick appends of more liveranges.
## Other: Fixups, Stats, Debug Annotations
There are a few fixup vectors that we will cover in more detail
later. Of particular note is the "multi-fixed-reg fixup vector": this
handles instructions that constrain the same input vreg to multiple,
different, fixed registers for different operands at the same program
point. The only way to satisfy such a set of constraints is to
copy the value into the additional fixed registers with extra
inserted moves.
For each block, we perform a scan with the following state:
- A liveness bitvec, initialized at the start from `liveouts`.
- A vector of live-range indices, with one entry per vreg, initially
"invalid" (this vector is allocated once and reused at each block).
- In-progress vector of live-range indices per vreg in the vreg state,
in *reverse* order (we will reverse it when we're done).
A vreg is live at the current point in the scan if its bit is set in
the liveness bitvec. Finally, we need to handle moves specially.
With the caveat that "this is a massive hack and I am very very
sorry", here is how it works. A move between two pinned vregs is easy:
we add that to the inserted-moves vector right away because we know the
Allocation on both sides. A move from a pinned vreg to a normal vreg
is the first interesting case. In this case, we (i) create a ghost def
with a fixed-register policy on the normal vreg, doing the other
## Bundle Merging
Once we have built the liverange vectors for every vreg, we can reverse
these vectors (recall, they were built in strict reverse order) and
initially assign one bundle per (non-pinned) vreg. We then try to
merge bundles together as long as we find pairs of bundles that do not
overlap and that (heuristically) make sense to merge.
We attempt several kinds of merges: a reused-input operand's bundle
with that of its corresponding output; across program moves; and
across blockparam assignments.
To merge two bundles, we traverse over both their sorted liverange
vectors at once, checking for overlaps. Note that we can do this without
pointer-chasing to the liverange data; the (from, to) range is in the
liverange vector itself.
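A sketch of the overlap test this enables, assuming half-open `(from, to)` bounds stored inline in the sorted vectors (types simplified for illustration):

```rust
// Linear walk over two sorted, non-overlapping range vectors,
// comparing half-open (from, to) bounds held inline.
fn ranges_overlap(a: &[(u32, u32)], b: &[(u32, u32)]) -> bool {
    let (mut i, mut j) = (0, 0);
    while i < a.len() && j < b.len() {
        let (af, at) = a[i];
        let (bf, bt) = b[j];
        if at <= bf {
            i += 1; // a's range ends before b's begins
        } else if bt <= af {
            j += 1; // b's range ends before a's begins
        } else {
            return true; // overlap found
        }
    }
    false
}

fn main() {
    assert!(!ranges_overlap(&[(0, 4), (10, 12)], &[(4, 10)]));
    assert!(ranges_overlap(&[(0, 4), (10, 12)], &[(3, 5)]));
}
```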
We also check whether the merged bundle would have conflicting
requirements (see below for more on requirements). We do a coarse
check by computing a requirement on both sides, merging, and checking
for Conflict (the lattice bottom
value). If no conflict, we merge.
A performance note: merging is extremely performance-sensitive, and it
turns out that a mergesort-like merge of the liverange vectors is too
expensive, partly because it requires allocating a separate result
vector (in-place merge in mergesort is infamously complex). Instead,
we simply append one vector onto the end of the other and invoke the
standard sort.
When allocating, if the bundle's requirement permits it,
then we *can* use a register (either `Any`, which accepts a register
as one of several options, or `Reg`, which must have one, or `Fixed`,
which must have a particular one).
We determine which physical registers' allocation maps we will probe,
and in what order. If a particular fixed register is required, we
probe only that register. Otherwise, we probe all registers in the
required class.
The order in which we probe, if we are not constrained to a single
register, is carefully chosen. First, if there is a hint register from
the spillset (this is set by the last allocation into a register of
any other bundle in this spillset), we probe that. Then, we probe all
preferred registers; then all non-preferred registers.
For each of the preferred and non-preferred register sequences, we
probe in an *offset* manner: we start at some index partway through
the sequence, determined by some heuristic number that is random and
well-distributed. (In practice, we use the sum of the bundle index and
the instruction index of the start of the first range in the bundle.)
We then march through the sequence and wrap around, stopping before we
hit our starting point again.
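A sketch of the offset probe order (the modulus-based wraparound is the whole trick; names are illustrative):

```rust
// Start partway through the register sequence at a deterministic but
// well-distributed offset, then wrap around, visiting each register
// exactly once.
fn probe_order(regs: &[u32], bundle_index: usize, first_inst: usize) -> Vec<u32> {
    let n = regs.len();
    let start = (bundle_index + first_inst) % n;
    (0..n).map(|i| regs[(start + i) % n]).collect()
}

fn main() {
    let prefs = [0, 1, 2, 3, 4];
    // Two different bundles contend on different registers first.
    println!("{:?}", probe_order(&prefs, 7, 100)); // starts at index 2
    println!("{:?}", probe_order(&prefs, 8, 100)); // starts at index 3
}
```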
The purpose of this offset is to distribute the contention and speed
up the allocation process. In the common case where there are enough
registers, bundles tend to probe different registers first rather than
all contending for the same ones in the same order. This has a large
allocation performance impact in practice.
For each register in probe order, we probe the allocation map, and
gather, simultaneously, several results: (i) whether the entire range
is free; (ii) if not, the vector of all conflicting bundles, *and* the
highest weight among those bundles; (iii) if not, the *first* conflict
point.
We also
track the "lowest cost split option", which is the cost (more below),
the point at which to split, and the register for this option.
For each register we probe, if there is a conflict but none of the
conflicts are fixed allocations, we receive a vector of bundles that
conflicted, and also separately, the first conflicting program
point. We update the lowest-cost eviction option if the cost (max
weight) of the conflicting bundles is less than the current best.
The actual split procedure is fairly simple. We are given a bundle and
a split-point. We create a new bundle to take on the second half
("rest") of the original. We find the point in the liverange list that
corresponds to the split, and distribute appropriately. If the
("rest") of the original. We find the point in the liverange vector
that corresponds to the split, and distribute appropriately. If the
split-point lands in the middle of a liverange, then we split that
liverange as well.
In the case that a new liverange is created, we add the liverange to
the corresponding vreg liverange vector as well. Note that, as described
above, the vreg's liverange vector is unsorted while splitting is
occurring (because we do not need to traverse it or do any lookups
during this phase); so we just append.
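A sketch of the range-distribution step, on bare `(from, to)` pairs (half-open, illustrative types):

```rust
// Partition a sorted range vector at a split point, splitting the
// range that straddles it. Returns the "rest" half.
fn split_ranges(ranges: &mut Vec<(u32, u32)>, at: u32) -> Vec<(u32, u32)> {
    // Find the first range that ends after the split point.
    let idx = ranges.partition_point(|&(_, to)| to <= at);
    let mut rest: Vec<(u32, u32)> = ranges.split_off(idx);
    if let Some(&(from, to)) = rest.first() {
        if from < at {
            // Split point lands mid-range: split that liverange too.
            ranges.push((from, at));
            rest[0] = (at, to);
        }
    }
    rest
}

fn main() {
    let mut first = vec![(0, 5), (8, 12), (20, 25)];
    let rest = split_ranges(&mut first, 10);
    assert_eq!(first, vec![(0, 5), (8, 10)]);
    assert_eq!(rest, vec![(10, 12), (20, 25)]);
}
```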
## Second-Chance Allocation: Spilled Bundles
Once the main allocation loop terminates, when all bundles have either
been allocated or punted to the "spilled bundles" vector, we do
second-chance allocation. This is a simpler loop that never evicts and
never splits. Instead, each bundle gets one second chance, in which it
can probe pregs and attempt to allocate. If it fails, it will actually
live on the stack.
This is correct because we are careful to only place bundles on the
spilled-bundles vector that are *allowed* to live on the
stack. Specifically, only the canonical spill bundles (which will
contain only empty ranges) and other bundles that have an "any" or
"unknown" requirement are placed here (but *not* "stack" requirements;
For each vreg, we scan its liveranges in order; for
each, and for each move that comes *to* or *from* this liverange, we
generate a "half-move". The key idea is that we generate a record for
each "side" of the move, and these records are keyed in a way that
after a sort, the "from" and "to" ends will be consecutive. We can
sort the vector of halfmoves once (this is expensive, but not as
expensive as many separate pointer-chasing lookups), then scan it
again to actually generate the move instructions.
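A sketch of the keying-and-sort trick, with an invented key layout (the real records pack the key fields differently):

```rust
// Records from both ends of each move are keyed so that one sort
// brings matching "source" and "dest" halves together.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Side { Source, Dest } // Source sorts before Dest

#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct HalfMove {
    edge: u32,  // CFG edge (or other connection point)
    vreg: u32,  // the value flowing across it
    side: Side,
    alloc: u32, // the allocation at this end
}

fn main() {
    let mut halves = vec![
        HalfMove { edge: 1, vreg: 7, side: Side::Dest, alloc: 3 },
        HalfMove { edge: 0, vreg: 2, side: Side::Source, alloc: 5 },
        HalfMove { edge: 1, vreg: 7, side: Side::Source, alloc: 4 },
        HalfMove { edge: 0, vreg: 2, side: Side::Dest, alloc: 6 },
    ];
    halves.sort(); // one O(n log n) sort instead of many lookups
    // Matching halves are now adjacent: scan and emit a move per pair.
    for pair in halves.chunks(2) {
        println!("move {} -> {} (edge {}, v{})",
                 pair[0].alloc, pair[1].alloc, pair[0].edge, pair[0].vreg);
    }
}
```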
At the start
of every block covered by a liverange, we can generate "dest"
half-moves for blockparams, and at the end of every block covered by a
liverange, we can generate "source" half-moves for blockparam args on
branches. Incidentally, this is the reason that `blockparam_ins` and
`blockparam_outs` are sorted tuple-vectors whose tuples begin with
(vreg, block, ...): this is the order in which we do the toplevel scan
over allocations.
All moves at a given program point semantically
happen *in parallel*. For example, if multiple vregs change
allocations between two instructions, all of those moves happen as
part of one parallel permutation. Similarly, blockparams have
parallel-assignment semantics. We thus enqueue all the moves that we
generate at program points and resolve them into sequences of
sequential moves that can actually be lowered to move instructions in
the machine code.
First, a word on *move priorities*. There are different kinds of moves
that are generated between instructions, and we have to ensure that
moves of different kinds happen in the correct relative order; each
kind is given a priority.
Every move is statically given one of these priorities by the code
that generates it.
We collect moves with (prog-point, prio) keys, and we sort by those
keys. We then have, for each such key, a set of moves that
semantically happen in parallel.
We then resolve those moves using a parallel-move resolver, as we now
describe. The fundamental problem is that some moves may overwrite
registers that other moves use as sources. We must carefully order
moves so that this does not clobber values incorrectly.
We first check if such overlap occurs. If it does not (this is
actually the most common case), the sequence of parallel moves can be
emitted as sequential moves directly. Done!
Otherwise, we have to order the moves carefully. Furthermore, if there
is a cycle, we will need the scratch register. Note that each move has
at most one other
move that overwrites its source. (This will be important in a bit!)
Our task is now to find an ordering of moves that respects these
dependencies. To do so, we perform a depth-first search on the graph
induced by the dependencies, which will generate a sequence of
sequential moves in reverse order. We keep a stack of moves; we start
with any move that has not been visited yet; in each iteration, if the
top-of-stack has no out-edge to another move (does not need to come
before any others), then push it to a result vector, followed by all
others on the stack (in popped order). If it does have an out-edge
leading to a move already on the stack, we have found a cycle, which
we break with the scratch register. Such a cycle is a simple
strongly-connected component (SCC): no other
nodes (moves) can be part of the SCC, because every node's single
out-edge is already accounted for. This is what allows us to avoid a
fully general SCC algorithm.
Once the vector of moves in-reverse has been constructed, we reverse
it and return.
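For illustration, here is a simpler worklist formulation of the same sequentialization (not the DFS described above; function name, register encoding, and the O(n^2) scan are all invented for clarity). It emits any move whose destination is no longer read, and when only cycles remain, parks one source in the scratch register:

```rust
// Moves are (src, dst) register pairs; `scratch` breaks cycles.
fn sequentialize(mut pending: Vec<(u32, u32)>, scratch: u32) -> Vec<(u32, u32)> {
    let mut out = Vec::new();
    while !pending.is_empty() {
        if let Some(i) = (0..pending.len())
            .find(|&i| !pending.iter().any(|&(src, _)| src == pending[i].1))
        {
            // This destination is no longer read by any pending move.
            out.push(pending.remove(i));
        } else {
            // Only cycles remain: park one source in the scratch
            // register and redirect that move to read from the scratch.
            let (src, dst) = pending[0];
            out.push((src, scratch));
            pending[0] = (scratch, dst);
        }
    }
    out
}

fn main() {
    // The swap `r0, r1 := r1, r0` from earlier, as (src, dst) pairs.
    let order = sequentialize(vec![(1, 0), (0, 1)], 99);
    // Prints [(1, 99), (0, 1), (99, 0)]: park r1, r1 := r0, r0 := old r1.
    println!("{order:?}");
}
```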
Note that this "move resolver" is fuzzed separately with a simple
symbolic move simulator (the `moves` fuzz-target).
## Redundant-Move Elimination
As a final step before returning the vector of program edits to the
client, we perform one optimization: redundant-move elimination.
To understand the need for this, consider what will occur when a vreg
Several notable high-level differences are:
across blocks by, when reaching one end of a control-flow edge in a
scan, doing a lookup of the allocation at the other end. This is in
principle a linear lookup (so quadratic overall). We instead
generate a list of "half-moves", keyed on the edge and from/to
generate a vector of "half-moves", keyed on the edge and from/to
vregs, with each holding one of the allocations. By sorting and then
scanning this vector, we can generate all edge moves in one linear
scan. There are a number of other examples of simplifications: for
example, we handle multiple conflicting
physical-register-constrained uses of a vreg in a single instruction
The design follows a
number of general principles:
cache-efficient. As another example, a side-effect of the precise
liveness was that we could then process operands within blocks in
actual instruction order (in reverse), which allowed us to simply
append liveranges to in-progress vreg liverange vectors and then
reverse at the end. The expensive part is a single pass; only the
bitset computation is a fixpoint loop.
We tried a number of other designs as well. Initially we used a simple
dense bitvec, but this was prohibitively expensive: O(n^2) space when
the real need is closer to O(n) (i.e., a classic sparse matrix). We
also tried a hybrid scheme that kept a vector of indices when small
and used either a bitvec or a hashset when large. This did not perform
as well because (i) it was less memory-efficient (the chunking helps
with this) and (ii) insertions are more expensive when they always
require a full hashset/hashmap insert.
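A sketch of the chunked idea (illustrative only; the real structure and its chunk storage differ in detail): each 64-bit chunk covers a contiguous slice of the index space, so storage is proportional to the set bits rather than to the universe.

```rust
// Sorted map from chunk number to one 64-bit chunk of the bitvec.
use std::collections::BTreeMap;

#[derive(Default)]
struct ChunkedBitSet {
    chunks: BTreeMap<u32, u64>,
}

impl ChunkedBitSet {
    fn insert(&mut self, index: u32) {
        *self.chunks.entry(index / 64).or_insert(0) |= 1u64 << (index % 64);
    }
    fn contains(&self, index: u32) -> bool {
        self.chunks
            .get(&(index / 64))
            .map_or(false, |c| c & (1u64 << (index % 64)) != 0)
    }
}

fn main() {
    let mut set = ChunkedBitSet::default();
    set.insert(5);
    set.insert(1_000_003);
    assert!(set.contains(5) && set.contains(1_000_003));
    assert!(!set.contains(6));
}
```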
# Appendix: Fuzzing