Use "vector" or "sequence" in lieu of "list" in DESIGN.md, except when referring to linked lists.
From feedback from @julian-seward1. I had used "list" in a more generic sense, meaning ordered sequence of elements, while in a Rust context it can sometimes be confused with "linked list" specifically. These alternative terms are more precise.
170
doc/DESIGN.md
@@ -15,23 +15,23 @@ The toplevel API to regalloc2 consists of a single entry point `run()`
 that takes a register environment, which specifies all physical
 registers, and the input program. The function returns either an error
 or an `Output` struct that provides allocations for each operand and a
-list of additional instructions (moves, loads, stores) to insert.
+vector of additional instructions (moves, loads, stores) to insert.
 
 ## Register Environment
 
 The allocator takes a `MachineEnv` which specifies, for each of the
-two register classes `Int` and `Float`, a list of `PReg`s by index. A
+two register classes `Int` and `Float`, a vector of `PReg`s by index. A
 `PReg` is nothing more than the class and index within the class; the
 allocator does not need to know anything more.
 
-The `MachineEnv` provides a list of preferred and non-preferred
-physical registers per class. Any register not on either list will not
-be allocated. Usually, registers that do not need to be saved in the
-prologue if used (i.e., caller-save registers) are given in the
-"preferred" list. The environment also provides exactly one scratch
+The `MachineEnv` provides a vector of preferred and non-preferred
+physical registers per class. Any register not in either vector will
+not be allocated. Usually, registers that do not need to be saved in
+the prologue if used (i.e., caller-save registers) are given in the
+"preferred" vector. The environment also provides exactly one scratch
 register per class. This register must not be in the preferred or
-non-preferred lists, and is used whenever a set of moves that need to
-occur logically in parallel have a cycle (for a simple example,
+non-preferred vectors, and is used whenever a set of moves that need
+to occur logically in parallel have a cycle (for a simple example,
 consider a swap `r0, r1 := r1, r0`).
 
 With some more work, we could potentially remove the need for the
@@ -41,17 +41,17 @@ the client ("swap"), but we have not pursued this.
 ## CFG and Instructions
 
 The allocator operates on an input program that is in a standard CFG
-representation: the function body is a list of basic blocks, and each
-block has a list of instructions and zero or more successors. The
-allocator also requires the client to provide predecessors for each
-block, and these must be consistent with the successor
-lists.
+representation: the function body is a sequence of basic blocks, and
+each block has a sequence of instructions and zero or more
+successors. The allocator also requires the client to provide
+predecessors for each block, and these must be consistent with the
+successors.
 
 Instructions are opaque to the allocator except for a few important
 bits: (1) `is_ret` (is a return instruction); (2) `is_branch` (is a
 branch instruction); (3) `is_call` (is a call instruction, for
 heuristic purposes only), (4) `is_move` (is a move between registers),
-and (5) a list of Operands, covered below. Every block must end in a
+and (5) a vector of Operands, covered below. Every block must end in a
 return or branch.
 
 Both instructions and blocks are named by indices in contiguous index
@@ -248,8 +248,8 @@ removed. However, it is very important for performance at the moment.
 ## Output
 
 The allocator produces two main data structures as output: an array of
-`Allocation`s and a list of edits. Some other data, such as stackmap
-slot info, is also provided.
+`Allocation`s and a sequence of edits. Some other data, such as
+stackmap slot info, is also provided.
 
 ### Allocations
 
@@ -266,7 +266,7 @@ In order to implement the necessary movement of data between
 allocations, the allocator needs to insert moves at various program
 points.
 
-The list of inserted moves contains tuples that name a program point
+The vector of inserted moves contains tuples that name a program point
 and an "edit". The edit is either a move, from one `Allocation` to
 another, or else a kind of metadata used by the checker to know which
 VReg is live in a given allocation at any particular time. The latter
@@ -304,44 +304,44 @@ standard backward iterative dataflow analysis and are exact; they do
 not over-approximate (this turns out to be important for performance,
 and is also necessary for correctness in the case of stackmaps).
 
-### Blockparam Lists: Source-Side and Dest-Side
+### Blockparam Vectors: Source-Side and Dest-Side
 
 The initialization stage scans the input program and produces two
-lists that represent blockparam flows from branches to destination
+vectors that represent blockparam flows from branches to destination
 blocks: `blockparam_ins` and `blockparam_outs`.
 
-These two lists are the first instance we will see of a recurring
-pattern: the lists contain tuples that are carefully ordered in a way
-such that their sort-order is meaningful. "Build a list lazily then
-sort" is a common idiom: it batches the O(n log n) cost into one
+These two vectors are the first instance we will see of a recurring
+pattern: the vectors contain tuples that are carefully ordered in a
+way such that their sort-order is meaningful. "Build a vector lazily
+then sort" is a common idiom: it batches the O(n log n) cost into one
 operation that the stdlib has aggressively optimized, it provides
 dense storage, and it allows for a scan in a certain order that often
 lines up with a scan over the program.
 
-In this particular case, we will build lists of (vreg, block) points
+In this particular case, we will build vectors of (vreg, block) points
 that are meaningful either at the start or end of a block, so that
 later, when we scan over a particular vreg's allocations in block
-order, we can generate another list of allocations. One side (the
+order, we can generate another vector of allocations. One side (the
 "outs") also contains enough information that it can line up with the
 other side (the "ins") in a later sort.
 
-To make this work, `blockparam_ins` contains a list of (to-vreg,
+To make this work, `blockparam_ins` contains a vector of (to-vreg,
 to-block, from-block) tuples, and has an entry for every blockparam of
 every block. Note that we can compute this without actually observing
 from-blocks; we only need to iterate over `block_preds` at any given
 block.
 
-Then, `blockparam_outs` contains a list of (from-vreg, from-block,
+Then, `blockparam_outs` contains a vector of (from-vreg, from-block,
 to-block, to-vreg), and has an entry for every parameter on every
 branch that ends a block. There is exactly one "out" tuple for every
 "in" tuple. As mentioned above, we will later scan over both to
 generate moves.
 
-### Program-Move Lists: Source-Side and Dest-Side
+### Program-Move Vectors: Source-Side and Dest-Side
 
 Similar to blockparams, we handle moves specially. In fact, we ingest
-all moves in the input program into a set of lists -- "move sources"
-and "move dests", analogous to the "ins" and "outs" blockparam lists
+all moves in the input program into a set of vectors -- "move sources"
+and "move dests", analogous to the "ins" and "outs" blockparam vectors
 described above -- and then completely ignore the moves in the program
 thereafter. The semantics of the API are such that all program moves
 will be recreated with regalloc-inserted edits, and should not still
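
As a reading aid for the hunk above, the "build a vector lazily then sort" idiom for `blockparam_ins` might look roughly like the sketch below. The `VReg` and `Block` aliases, the function name, and the slice-of-vectors inputs are hypothetical stand-ins chosen for illustration; they are not regalloc2's actual types or API.

```rust
// Hypothetical stand-ins for the allocator's index types.
type VReg = u32;
type Block = u32;

/// Collect one (to-vreg, to-block, from-block) tuple per blockparam per
/// predecessor, in whatever order the scan produces them, then sort once.
/// `block_params[b]` lists the blockparams of block `b`;
/// `block_preds[b]` lists the predecessors of block `b`.
fn build_blockparam_ins(
    block_params: &[Vec<VReg>],
    block_preds: &[Vec<Block>],
) -> Vec<(VReg, Block, Block)> {
    let mut ins = Vec::new();
    for (to_block, params) in block_params.iter().enumerate() {
        for &to_vreg in params {
            // Only the predecessor list is consulted; the from-blocks
            // themselves are never inspected.
            for &from_block in &block_preds[to_block] {
                ins.push((to_vreg, to_block as Block, from_block));
            }
        }
    }
    // Tuples compare lexicographically, so a single sort groups entries
    // by vreg first and by block second.
    ins.sort_unstable();
    ins
}
```

Because the tuples begin with (vreg, block), the sorted vector can be consumed by a cursor that only moves forward while a later pass walks each vreg's allocations in block order.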
@@ -353,7 +353,7 @@ opaque instructions with a source and dest, and we found that there
 were many redundant move-chains (A->B, B->C) that are eliminated when
 everything is handled centrally.
 
-We also construct a `prog_move_merges` list of live-range index pairs
+We also construct a `prog_move_merges` vector of live-range index pairs
 to attempt to merge when we reach that stage of allocation.
 
 ## Core Allocation State: Ranges, Uses, Bundles, VRegs, PRegs
@@ -370,7 +370,7 @@ A live-range is a contiguous range of program points (half-open,
 i.e. including `from` and excluding `to`) for which a particular vreg
 is live with a value.
 
-A live-range contains a list of uses. Each use contains four parts:
+A live-range contains a vector of uses. Each use contains four parts:
 the Operand word (directly copied, so there is no need to dereference
 it); the ProgPoint at which the use occurs; the operand slot on that
 instruction, if any, that the operand comes from, and the use's
@@ -392,14 +392,14 @@ values throughout the allocator. New live-ranges can be created
 state is bulk-freed at the end.
 
 Live-ranges are aggregated into "bundles". A bundle is a collection of
-ranges that does not overlap. Each bundle carries: a list (inline
+ranges that does not overlap. Each bundle carries: a vector (inline
 SmallVec) of (range, live-range index) tuples, an allocation (starts
 as "none"), a "spillset" (more below), and some metadata, including a
 spill weight (sum of ranges' weights), a priority (sum of ranges'
 lengths), and three property flags: "minimal", "contains fixed
 constraints", "contains stack constraints".
 
-VRegs also contain their lists of live-ranges, in the same form as a
+VRegs also contain their vectors of live-ranges, in the same form as a
 bundle does (inline SmallVec that has inline (from, to) range bounds
 and range indices).
 
@@ -407,14 +407,14 @@ There are two important overlap invariants: (i) no liveranges within a
 bundle overlap, and (ii) no liveranges within a vreg overlap. These
 are extremely important and we rely on them implicitly in many places.
 
-The live-range lists in bundles and vregs, and use-lists in ranges,
+The live-range vectors in bundles and vregs, and use-vectors in ranges,
 have various sorting invariants as well. These invariants differ
 according to the phase of the allocator's computation. First, during
 live-range construction, live-ranges are placed into vregs in reverse
 order (because the computation is a reverse scan) and uses into ranges
 in reverse order; these are sorted into forward order at the end of
 live-range computation. When bundles are first constructed, their
-range lists are sorted, and they remain so for the rest of allocation,
+range vectors are sorted, and they remain so for the rest of allocation,
 as we need for interference testing. However, as ranges are created
 and split, sortedness of vreg ranges is *not* maintained; they are
 sorted once more, in bulk, when allocation is done and we start to
@@ -464,7 +464,7 @@ create a second-chance spill bundle just for a liverange with an "Any"
 use; but if it was already forced into existence by splitting and
 trimming, then we might as well use it.
 
-Note that unlike other bundles, a spill bundle's liverange list
+Note that unlike other bundles, a spill bundle's liverange vector
 remains unsorted until we do the second-chance allocation. This allows
 quick appends of more liveranges.
 
@@ -502,8 +502,8 @@ is spilled, and we traverse to the spillset then spillslot.
 
 ## Other: Fixups, Stats, Debug Annotations
 
-There are a few fixup lists that we will cover in more detail
-later. Of particular note is the "multi-fixed-reg fixup list": this
+There are a few fixup vectors that we will cover in more detail
+later. Of particular note is the "multi-fixed-reg fixup vector": this
 handles instructions that constrain the same input vreg to multiple,
 different, fixed registers for different operands at the same program
 point. The only way to satisfy such a set of constraints is to
@@ -550,7 +550,7 @@ For each block, we perform a scan with the following state:
 - A liveness bitvec, initialized at the start from `liveouts`.
 - A vector of live-range indices, with one entry per vreg, initially
 "invalid" (this vector is allocated once and reused at each block).
-- In-progress list of live-range indices per vreg in the vreg state,
+- In-progress vector of live-range indices per vreg in the vreg state,
 in *reverse* order (we will reverse it when we're done).
 
 A vreg is live at the current point in the scan if its bit is set in
@@ -630,7 +630,7 @@ pregs' allocation maps. Finally, we need to handle moves specially.
 
 With the caveat that "this is a massive hack and I am very very
 sorry", here is how it works. A move between two pinned vregs is easy:
-we add that to the inserted-moves list right away because we know the
+we add that to the inserted-moves vector right away because we know the
 Allocation on both sides. A move from a pinned vreg to a normal vreg
 is the first interesting case. In this case, we (i) create a ghost def
 with a fixed-register policy on the normal vreg, doing the other
@@ -693,8 +693,8 @@ sees the corner case where it's necessary!)
 
 ## Bundle Merging
 
-Once we have built the liverange lists for every vreg, we can reverse
-these lists (recall, they were built in strict reverse order) and
+Once we have built the liverange vectors for every vreg, we can reverse
+these vectors (recall, they were built in strict reverse order) and
 initially assign one bundle per (non-pinned) vreg. We then try to
 merge bundles together as long as find pairs of bundles that do not
 overlap and that (heuristically) make sense to merge.
@@ -711,9 +711,9 @@ corresponding output; across program moves; and across blockparam
 assignments.
 
 To merge two bundles, we traverse over both their sorted liverange
-lists at once, checking for overlaps. Note that we can do this without
+vectors at once, checking for overlaps. Note that we can do this without
 pointer-chasing to the liverange data; the (from, to) range is in the
-liverange list itself.
+liverange vector itself.
 
 We also check whether the merged bundle would have conflicting
 requirements (see below for more on requirements). We do a coarse
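
The simultaneous traversal described in this hunk can be sketched as a two-cursor walk over two sorted vectors of half-open ranges. The `Range` struct and the `ranges_overlap` name below are hypothetical, and the sketch assumes each input is already sorted and internally non-overlapping, as the bundle invariants quoted in this diff require.

```rust
/// Half-open range of program points: includes `from`, excludes `to`.
/// A hypothetical stand-in for the inline (from, to) bounds stored in
/// each liverange-vector entry.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Range {
    from: u32,
    to: u32,
}

/// Returns true if any range in `a` overlaps any range in `b`.
/// Both slices must be sorted by `from` and internally non-overlapping.
fn ranges_overlap(a: &[Range], b: &[Range]) -> bool {
    let (mut i, mut j) = (0, 0);
    while i < a.len() && j < b.len() {
        if a[i].to <= b[j].from {
            // a[i] ends before b[j] starts: advance the `a` cursor.
            i += 1;
        } else if b[j].to <= a[i].from {
            // b[j] ends before a[i] starts: advance the `b` cursor.
            j += 1;
        } else {
            // Neither range ends before the other begins.
            return true;
        }
    }
    false
}
```

Each step advances one cursor, so the check is linear in the combined length of the two vectors and touches only the inline bounds.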
@@ -724,7 +724,7 @@ on both sides, merging, and checking for Conflict (the lattice bottom
 value). If no conflict, we merge.
 
 A performance note: merging is extremely performance-sensitive, and it
-turns out that a mergesort-like merge of the liverange lists is too
+turns out that a mergesort-like merge of the liverange vectors is too
 expensive, partly because it requires allocating a separate result
 vector (in-place merge in mergesort is infamously complex). Instead,
 we simply append one vector onto the end of the other and invoke
@@ -835,10 +835,10 @@ then we *can* use a register (either `Any`, which accepts a register
 as one of several options, or `Reg`, which must have one, or `Fixed`,
 which must have a particular one).
 
-We determine the list of physical registers whose allocation maps we
-will probe, and in what order. If a particular fixed register is
-required, we probe only that register. Otherwise, we probe all
-registers in the required class.
+We determine which physical registers whose allocation maps we will
+probe, and in what order. If a particular fixed register is required,
+we probe only that register. Otherwise, we probe all registers in the
+required class.
 
 The order in which we probe, if we are not constrained to a single
 register, is carefully chosen. First, if there is a hint register from
@@ -846,13 +846,13 @@ the spillset (this is set by the last allocation into a register of
 any other bundle in this spillset), we probe that. Then, we probe all
 preferred registers; then all non-preferred registers.
 
-For each of the preferred and non-preferred register lists, we probe
-in an *offset* manner: we start at some index partway through the
-list, determined by some heuristic number that is random and
+For each of the preferred and non-preferred register sequences, we
+probe in an *offset* manner: we start at some index partway through
+the sequence, determined by some heuristic number that is random and
 well-dstributed. (In practice, we use the sum of the bundle index and
 the instruction index of the start of the first range in the bundle.)
-We then march through the list and wrap around, stopping before we hit
-our starting point again.
+We then march through the sequence and wrap around, stopping before we
+hit our starting point again.
 
 The purpose of this offset is to distribute the contention and speed
 up the allocation process. In the common case where there are enough
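
The offset probing order described in this hunk amounts to a rotated walk over the register sequence. The helper below is a minimal sketch with a hypothetical name and signature; it assumes the caller derives `offset` from the heuristic the text mentions (bundle index plus the instruction index of the bundle's first range).

```rust
/// Yields every index in `0..len` exactly once, starting at
/// `offset % len` and wrapping around to stop just before the start.
/// An empty sequence yields nothing.
fn probe_order(len: usize, offset: usize) -> impl Iterator<Item = usize> {
    // Guard the modulo so that an empty register sequence is handled.
    let start = if len == 0 { 0 } else { offset % len };
    (0..len).map(move |i| (start + i) % len)
}
```

For example, with four registers and an offset of 6, the walk visits indices 2, 3, 0, 1. Different bundles therefore tend to begin probing at different registers.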
@@ -863,7 +863,7 @@ order. This has a large allocation performance impact in practice.
 
 For each register in probe order, we probe the allocation map, and
 gather, simultaneously, several results: (i) whether the entire range
-is free; (ii) if not, the list of all conflicting bundles, *and* the
+is free; (ii) if not, the vector of all conflicting bundles, *and* the
 highest weight among those bundles; (iii) if not, the *first* conflict
 point.
 
@@ -915,7 +915,7 @@ track the "lowest cost split option", which is the cost (more below),
 the point at which to split, and the register for this option.
 
 For each register we probe, if there is a conflict but none of the
-conflicts are fixed allocations, we receive a list of bundles that
+conflicts are fixed allocations, we receive a vector of bundles that
 conflicted, and also separately, the first conflicting program
 point. We update the lowest-cost eviction option if the cost (max
 weight) of the conflicting bundles is less than the current best. We
@@ -955,14 +955,14 @@ an inner loop).
 
 The actual split procedure is fairly simple. We are given a bundle and
 a split-point. We create a new bundle to take on the second half
-("rest") of the original. We find the point in the liverange list that
-corresponds to the split, and distribute appropriately. If the
+("rest") of the original. We find the point in the liverange vector
+that corresponds to the split, and distribute appropriately. If the
 split-point lands in the middle of a liverange, then we split that
 liverange as well.
 
 In the case that a new liverange is created, we add the liverange to
-the corresponding vreg liverange list as well. Note that, as described
-above, the vreg's liverange list is unsorted while splitting is
+the corresponding vreg liverange vector as well. Note that, as described
+above, the vreg's liverange vector is unsorted while splitting is
 occurring (because we do not need to traverse it or do any lookups
 during this phase); so we just append.
 
@@ -1010,14 +1010,14 @@ second-chance allocation).
 ## Second-Chance Allocation: Spilled Bundles
 
 Once the main allocation loop terminates, when all bundles have either
-been allocated or punted to the "spilled bundles" list, we do
+been allocated or punted to the "spilled bundles" vector, we do
 second-chance allocation. This is a simpler loop that never evicts and
 never splits. Instead, each bundle gets one second chance, in which it
 can probe pregs and attempt to allocate. If it fails, it will actually
 live on the stack.
 
 This is correct because we are careful to only place bundles on the
-spilled-bundles list that are *allowed* to live on the
+spilled-bundles vector that are *allowed* to live on the
 stack. Specifically, only the canonical spill bundles (which will
 contain only empty ranges) and other bundles that have an "any" or
 "unknown" requirement are placed here (but *not* "stack" requirements;
@@ -1107,7 +1107,7 @@ each, and for each move that comes *to* or *from* this liverange,
 generate a "half-move". The key idea is that we generate a record for
 each "side" of the move, and these records are keyed in a way that
 after a sort, the "from" and "to" ends will be consecutive. We can
-sort the list of halfmoves once (this is expensive, but not as
+sort the vector of halfmoves once (this is expensive, but not as
 expensive as many separate pointer-chasing lookups), then scan it
 again to actually generate the move instructions.
 
@@ -1124,7 +1124,7 @@ of every block covered by a liverange, we can generate "dest"
 half-moves for blockparams, and at the end of every block covered by a
 liverange, we can generate "source" half-moves for blockparam args on
 branches. Incidentally, this is the reason that `blockparam_ins` and
-`blockparam_outs` are sorted tuple-lists whose tuples begin with
+`blockparam_outs` are sorted tuple-vectors whose tuples begin with
 (vreg, block, ...): this is the order in which we do the toplevel scan
 over allocations.
 
@@ -1166,9 +1166,9 @@ happen *in parallel*. For example, if multiple vregs change
 allocations between two instructions, all of those moves happen as
 part of one parallel permutation. Similarly, blockparams have
 parallel-assignment semantics. We thus enqueue all the moves that we
-generate at program points and resolve them into lists of sequential
-moves that can actually be lowered to move instructions in the machine
-code.
+generate at program points and resolve them into sequences of
+sequential moves that can actually be lowered to move instructions in
+the machine code.
 
 First, a word on *move priorities*. There are different kinds of moves
 that are generated between instructions, and we have to ensure that
@@ -1198,7 +1198,7 @@ Every move is statically given one of these priorities by the code
 that generates it.
 
 We collect moves with (prog-point, prio) keys, and we short by those
-keys. We then have, for each such key, a list of moves that
+keys. We then have, for each such key, a set of moves that
 semantically happen in parallel.
 
 We then resolve those moves using a parallel-move resolver, as we now
@@ -1212,7 +1212,7 @@ registers that other moves use as sources. We must carefully order
 moves so that this does not clobber values incorrectly.
 
 We first check if such overlap occurs. If it does not (this is
-actually the most common case), the list of parallel moves can be
+actually the most common case), the sequence of parallel moves can be
 emitted as sequential moves directly. Done!
 
 Otherwise, we have to order the moves carefully. Furthermore, if there
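
The fast-path check described in this hunk can be illustrated as follows, reading "overlap" as "some move writes a location that a different move reads". This is a deliberately simple quadratic sketch: the `(src, dst)` pair encoding with `u32` location ids and the `has_overlap` name are assumptions made for illustration, not the resolver's actual representation.

```rust
/// Each move is a hypothetical (src, dst) pair of location ids.
/// Returns true if any move's destination is the source of a
/// different move, in which case naive in-order emission could
/// clobber a value that is still needed.
fn has_overlap(moves: &[(u32, u32)]) -> bool {
    moves.iter().enumerate().any(|(i, &(_, dst))| {
        moves
            .iter()
            .enumerate()
            .any(|(j, &(src, _))| i != j && src == dst)
    })
}
```

When this returns false, the parallel moves can be emitted one after another in any order. A swap such as `r0, r1 := r1, r0` makes it return true, which is the case that needs careful ordering.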
@@ -1229,9 +1229,9 @@ move that overwrites its source. (This will be important in a bit!)
 
 Our task is now to find an ordering of moves that respects these
 dependencies. To do so, we perform a depth-first search on the graph
-induced by the dependencies, which will generate a list of sequential
-moves in reverse order. We keep a stack of moves; we start with any
-move that has not been visited yet; in each iteration, if the
+induced by the dependencies, which will generate a sequence of
+sequential moves in reverse order. We keep a stack of moves; we start
+with any move that has not been visited yet; in each iteration, if the
 top-of-stack has no out-edge to another move (does not need to come
 before any others), then push it to a result vector, followed by all
 others on the stack (in popped order). If it does have an out-edge and
@@ -1257,8 +1257,8 @@ nodes (moves) can be part of the SCC, because every node's single
 out-edge is already accounted for. This is what allows us to avoid a
 fully general SCC algorithm.
 
-Once the list of moves in-reverse has been constructed, we reverse it
-and return.
+Once the vector of moves in-reverse has been constructed, we reverse
+it and return.
 
 Note that this "move resolver" is fuzzed separately with a simple
 symbolic move simulator (the `moves` fuzz-target).
@@ -1283,7 +1283,7 @@ extra spillslot.
 
 ## Redundant-Move Elimination
 
-As a final step before returning the list of program edits to the
+As a final step before returning the vector of program edits to the
 client, we perform one optimization: redundant-move elimination.
 
 To understand the need for this, consider what will occur when a vreg
@@ -1450,9 +1450,9 @@ Several notable high-level differences are:
 across blocks by, when reaching one end of a control-flow edge in a
 scan, doing a lookup of the allocation at the other end. This is in
 principle a linear lookup (so quadratic overall). We instead
-generate a list of "half-moves", keyed on the edge and from/to
+generate a vector of "half-moves", keyed on the edge and from/to
 vregs, with each holding one of the allocations. By sorting and then
-scanning this list, we can generate all edge moves in one linear
+scanning this vector, we can generate all edge moves in one linear
 scan. There are a number of other examples of simplifications: for
 example, we handle multiple conflicting
 physical-register-constrained uses of a vreg in a single instruction
@@ -1513,7 +1513,7 @@ number of general principles:
 cache-efficient. As another example, a side-effect of the precise
 liveness was that we could then process operands within blocks in
 actual instruction order (in reverse), which allowed us to simply
-append liveranges to in-progress vreg liverange lists and then
+append liveranges to in-progress vreg liverange vectors and then
 reverse at the end. The expensive part is a single pass; only the
 bitset computation is a fixpoint loop.
 
@@ -1551,11 +1551,11 @@ of the function; two separate chunks will cover that.
 We tried a number of other designs as well. Initially we used a simple
 dense bitvec, but this was prohibitively expensive: O(n^2) space when
 the real need is closer to O(n) (i.e., a classic sparse matrix). We
-also tried a hybrid scheme that kept a list of indices when small and
-used either a bitvec or a hashset when large. This did not perform as
-well because (i) it was less memory-efficient (the chunking helps with
-this) and (ii) insertions are more expensive when they always require
-a full hashset/hashmap insert.
+also tried a hybrid scheme that kept a vector of indices when small
+and used either a bitvec or a hashset when large. This did not perform
+as well because (i) it was less memory-efficient (the chunking helps
+with this) and (ii) insertions are more expensive when they always
+require a full hashset/hashmap insert.
 
 # Appendix: Fuzzing
 