Commit Graph

26 Commits

Author SHA1 Message Date
Dan Gohman
f04e02c0a1 Clarify comments about Conventional SSA form.
Captialize "Conventional" so that it's clear that "Conventional SSA"
is a specific concept being referenced.
2018-03-12 12:38:30 -07:00
Dan Gohman
30f8daa9d6 Replace assert! with debug_assert! in production code paths.
This allows the assertions to be disabled in release builds, so that
the code is faster and smaller, at the expense of not performing the
checks. Assertions can be re-enabled in release builds with the
debug-assertions flag in Cargo.toml, as the top-level Cargo.toml
file does.
2018-03-12 12:38:30 -07:00
Dan Gohman
bf480c341b Use https rather than http for several URLs. 2018-03-05 06:55:27 -08:00
Jakob Stoklund Olesen
d56ce9e8bf Optimize DomForest::push_node().
The dominator tree pre-order is defined at the EBB granularity, but we
are looking for dominating nodes at the instruction level. This means
that we sometimes need to look higher up the DomForest stack for a
dominating node, using DominatorTree::dominates() instead of
DominatorTreePreorder::dominates().

Each dominance check involves the domtree.last_dominator() function
scanning up the dominator tree, starting from the new node that was
pushed. We can eliminate this duplicate work by exposing the
last_dominator() function to push_node().

As we are searching through nodes on the stack, maintain a last_dom
program point representing the previous return value from
last_dominator(). This way, we're only scanning the dominator tree once.
2018-01-24 15:01:58 -08:00
Jakob Stoklund Olesen
416b21c18d Pairwise virtual register coalescing.
Use a better algorithm for resolving interferences in virtual registers.
This improves code quality by generating much fewer copies on some
complicated functions.

After the initial union-find phase, the check_vreg() function uses a
Budimlic forest to check for interference between the values in the
virtual registers, as before. All the interference-free vregs are done.
Others are passed to synthesize_vreg() which dissolves the vreg and then
attempts to rebuild one or more vregs from the contained values.

The pairwise interference checks use *virtual copies* to make sure that
any future conflicts can be resolved by inserting a copy instruction.
This technique was not present in the old coalescer which caused some
correctness issues.

This coalescing algorithm makes much better code, and it is generally a
bit slower than before. Some of the slowdown is made up by the following
passes being faster because they have to process less code.

Example 1, the Python interpreter which contains a very large function
with a lot of variables.

Before:
  15.664    0.011  Register allocation
   1.535    1.535  RA liveness analysis
   2.872    1.911  RA coalescing CSSA
   4.436    4.436  RA spilling
   2.610    2.598  RA reloading
   4.200    4.199  RA coloring

After:
   9.795    0.013  Register allocation
   1.372    1.372  RA liveness analysis
   6.231    6.227  RA coalescing CSSA
   0.712    0.712  RA spilling
   0.598    0.598  RA reloading
   0.869    0.869  RA coloring

Coalescing is more than twice as slow, but because of the vastly better
code quality, overall register allocation time is improved by 37%.

Example 2, the clang compiler.

Before:
  57.148    0.035  Register allocation
   9.630    9.630  RA liveness analysis
   7.210    7.169  RA coalescing CSSA
   9.972    9.972  RA spilling
  11.602   11.572  RA reloading
  18.698   18.672  RA coloring

After:
  64.792    0.042  Register allocation
   8.630    8.630  RA liveness analysis
  22.937   22.928  RA coalescing CSSA
   8.684    8.684  RA spilling
   9.559    9.551  RA reloading
  14.939   14.936  RA coloring

Here coalescing is 3x slower, but overall regalloc time only regresses
by 13%.

Most examples are less extreme than these two. They just get better code
at about the same compile time.
2018-01-24 14:54:32 -08:00
Jakob Stoklund Olesen
d1f236b00a Reimplement coalescer following the Budimlic paper.
The old coalescing algorithm had some algorithmic complexity issues when
dealing with large virtual registers. Reimplement to use a proper
union-find algorithm so we only need one pass through the dominator
forests for virtual registers that are interference free.

Virtual registers that do have interference are split and new registers
built.

This pass is about twice as fast as the old one when dealing with
complex virtual registers.
2018-01-16 12:32:04 -08:00
Jakob Stoklund Olesen
d617d5e0f3 Use a domtree pre-order instead of a CFG RPO for coalescing.
The stack implementation if the Budimlic dominator forest doesn't work
correctly with a CFG RPO. It needs the domtree pre-order.

Also handle EBB pre-order vs inst-level preorder. Manage the stack
according to EBB dominance. Look for a dominating value by searching the
stack. This is different from the Budimlic algorithm because we're
computing the dominator tree pre-order with EBB granularity only.

Fixes #207.
2017-12-13 16:22:01 -06:00
Jakob Stoklund Olesen
60c456c1ec Add a compilation pass timing facility.
Individual compilation passes call the corresponding timing::*()
function and hold on to their timing token while they run. This causes
nested per-pass timing information to be recorded in thread-local
storage.

The --time-passes command line option prints a pass timing report to
stdout.
2017-12-06 17:04:23 -08:00
Jakob Stoklund Olesen
feaea238bc Use bforest::Map for representing live ranges.
Get rid of the per-value Vec in the LiveRange data type and use a
bforest::Map instead to represent the live-in intervals for non-local
live ranges.

This has some advantages:

- The memory footprint of a local live range is reduced from 40 to 20
  bytes, and
- Clearing the Liveness data structure is now a constant time operation
  which doesn't call free().
- The potentially quadratic behavior when computing large live ranges is
  controlled by the logarithmic B-tree operations.
2017-12-06 14:14:21 -08:00
Jakob Stoklund Olesen
92f378de76 Expose CFG predecessors only as an iterator.
Define two public iterator types in the flowgraph module, PredIter and
SuccIter, which are by-value iterators over an EBB's predecessors and
successors respectively.

Provide matching pred_iter() and succ_iter() methods for inspecting the
CFG. Remove the get_predecessors() method which returned a slice.

Update the uses of get_predecessors(), none of which depended on it
being a slice.

This abstraction makes it possible to change the internal representation
of the CFG.
2017-11-22 09:13:04 -08:00
Dan Gohman
bfa5a72b77 Use reserve_exact instead of reserve when the max size is known. 2017-11-16 15:16:23 -08:00
Dan Gohman
4d9aedbaca Add a 'clear()' function to Context.
This includes adding `clear()` functions to its (transitive) members.
2017-11-15 11:15:30 -08:00
Dan Gohman
3ab4349c1b Use Self instead of repeating the type name. 2017-11-08 10:43:11 -08:00
Jakob Stoklund Olesen
b3fb41087e Use the term "Function parameter" instead of "argument".
Rename the ArgumentType type to AbiParam since it describes the ABI
characteristics of a parameter or return value, not just the value type.

In Signature, rename members argument_types and return_types to "params"
and "returns". Again, they are not just types.

Fix a couple lingering references to "EBB arguments".
2017-10-19 17:39:23 -07:00
Jakob Stoklund Olesen
921bcc6c25 Use the term "EBB parameter" everywhere.
Add EBB parameter and EBB argument to the langref glossary to clarify
the distinction between formal EBB parameter values and arguments passed
to branches.

- Replace "ebb_arg" with "ebb_param" in function names that deal with
  EBB parameters.
- Rename the ValueDef variants to Result and Param.
- A bunch of other small langref fixes.

No functional changes intended.
2017-10-19 16:17:09 -07:00
Jakob Stoklund Olesen
c091a695e6 Fix coalescer bug exposed by the gvn-unremovable-phi test.
When we detect interference between the values that have already been
merged into the candidate virtual register and an EBB argument, we first
try to resolve the conflict by splitting. We also check if the existing
interfering value is fundamentally incompatible with the branch
instruction so it needs to be removed from the virtual register,
restarting the merge operation.

However, this existing interfering value is not necessarily the only
interference, so the split is not guaranteed to resolve the conflict. If
it turns out that splitting didn't resolve the conflict, restart the
merge after removing this second conflicting value.
2017-10-03 11:13:46 -07:00
Dan Gohman
adfdd77311 Avoid creating a reference to a reference.
https://github.com/rust-lang-nursery/rust-clippy/wiki#needless_borrow
2017-08-31 12:47:05 -07:00
Dan Gohman
2efdc0ed37 Update rustfmt to 0.9.0. 2017-08-31 10:44:59 -07:00
Jakob Stoklund Olesen
87cee86acb Cursor library.
Add a new cursor module and define an EncCursor data type in it. An
EncCursor is a cursor that inserts instructions with a valid encoding
for the ISA. This is useful for passes generating code after
legalization.

Implement a builder interface via the new InstInserterBase trait such
that the EncCursor builders support with_result().

Use EncCursor in coalescing.rs instead of the layout cursor as a proof
of concept.
2017-08-04 14:05:14 -07:00
Jakob Stoklund Olesen
aa0c37235a Add CursorBase builder methods.
The CursorBase::at_* are convenience methods for building a cursor that
points to a specific instruction.
2017-08-04 09:08:24 -07:00
Jakob Stoklund Olesen
dba0df787c Move most Cursor methods into a CursorBase trait.
The Cursor navigation methods all just depend on the cursor's position
and layout reference. Make a CursorBase trait that provides access to
this information with methods and implement the navigation methods on
top of that.

This makes it possible to have multiple types implement the cursor
interface.
2017-08-04 09:08:24 -07:00
Dan Gohman
587f37ec6c Don't use documentation-style comments inside a function body. (#131) 2017-08-02 11:19:16 -07:00
Jakob Stoklund Olesen
2aca35a9aa Return a function pointer from TargetIsa::encode().
Replace the isa::Legalize enumeration with a function pointer. This
allows an ISA to define its own specific legalization actions instead of
relying on the default two.

Generate a LEGALIZE_ACTIONS table for each ISA which contains
legalization function pointers indexed by the legalization codes that
are already in the encoding tables. Include this table in
isa/*/enc_tables.rs.

Give the `Encodings` iterator a reference to the action table and change
its `legalize()` method to return a function pointer instead of an
ISA-specific code.

The Result<> returned from TargetIsa::encode() no longer implements
Debug, so eliminate uses of unwrap and expect on that type.
2017-07-27 17:08:00 -07:00
Jakob Stoklund Olesen
69f974ba5d Add an ISA argument to dfg.display_inst().
Include ISA-specific annotations in tracing and error messages.
2017-07-12 10:13:13 -07:00
Jakob Stoklund Olesen
0574dcdeee Don't coalesce incoming stack arguments.
A function parameter in an incoming_arg stack slot should not be
coalesced into any virtual registers. We don't want to force the whole
virtual register to spill to the incoming_arg slot.
2017-06-28 15:37:38 -07:00
Jakob Stoklund Olesen
85b624d13b Add a coalescing pass to the register allocator.
Coalescing means creating virtual registers and transforming the code
into conventional SSA form. This means that every value used as a branch
argument will belong to the same virtual register as the corresponding
EBB argument value.

Conventional SSA form makes it easy to avoid memory-memory copies when
spilling values, and the virtual registers can be used as hints when
picking registers too. This reduces the number of register moves needed
for EBB arguments.
2017-06-22 15:03:48 -07:00