The old coalescing algorithm had some algorithmic complexity issues when
dealing with large virtual registers. Reimplement to use a proper
union-find algorithm so we only need one pass through the dominator
forests for virtual registers that are interference free.
Virtual registers that do have interference are split and new registers
built.
This pass is about twice as fast as the old one when dealing with
complex virtual registers.
The initial phase of computing virtual registers can now be implemented
with a textbook union-find algorithm using a disjoint set forest
complete with rank and path compression optimizations.
The disjoint set forest is converted to virtual register value lists in
a single linear scan implemented in finish_union_find().
This union-find algorithm will soon be used by the coalescer.
- Allow the syntax "specials=True" to indicate that a type variable can
assume all special types. Use this for the unconstrained type variable
created in ast.py.
- Fix TypeSet.copy() to avoid deepcopy() which doesn't do the right
thing for the self.specials set.
- Fix TypeSet.typeset_key() to just use the name of special types
instead of the full SpecialType objects.
Ghost instructions and values are supposed to be stored as metadata
alongside the compiled program such that the ghost values can be
computed from the real register/stack values when the program is stopped
for debugging or de-optimization.
If we allow an EBB parameter to be a ghost value, we have no way of
computing its real value using ghost instructions. We would need to know
a complete execution trace of the stopped program to figure out which
values were passed to the ghost parameter.
Instead we require EBB parameters to be real values materialized in
registers or on the stack. We use the regclass_for_abi_type() TargetIsa
callback to determine the initial register class for these parameters.
They can then be spilled later if needed.
Fixes#215.
We want to disable dominance checks in unreachable code. The
is_reachable() check for EBB parameter values was checking if the
defining EBB was reachable, not the EBB using the value.
This bug showed up in fuzzing and in #213.
It looks like mypy 0.560 doesn't like when a local variable changes its
type inside a function.
Fixes introduce a new variable instead of reusing an existing one.
Add an addend field to reloc_external, and use it to move the
responsibility for accounting for the difference between the end of an
instruction (where the PC is considered to be in PC-relative on intel)
and the beginning of the immediate field into the encoding code.
Specifically, this makes IntelGOTPCRel4 directly correspond to
R_X86_64_GOTPCREL, instead of also carrying an implicit `- 4`.
When the spiller needs to make a register available for a conditional
branch instruction, it can be necessary to spill some of the EBB
arguments on the branch instruction. This is ok because EBB argument
values belong to the same virtual register as the corresponding EBB
parameter and we spill the whole virtreg to the same slot.
Also make sure free_regs() can handle values that are killed by the
current instruction *and* spilled.
The stack implementation if the Budimlic dominator forest doesn't work
correctly with a CFG RPO. It needs the domtree pre-order.
Also handle EBB pre-order vs inst-level preorder. Manage the stack
according to EBB dominance. Look for a dominating value by searching the
stack. This is different from the Budimlic algorithm because we're
computing the dominator tree pre-order with EBB granularity only.
Fixes#207.
Instead of requiring the values in a virtual register to be sorted
according to the domtree.rpo_cmp() order, just require any topological
ordering w.r.t. dominance.
The coalescer with stop using the RPO shortly.
The coalescer makes sure that matching EBB arguments and parameters are
always in the same virtual registers, and therefore also in the same
stack slot if they are spilled.
This means that the reload pass should never rewrite an EBB argument if
the argument value is spilled. This comes up in cases where the branch
instruction needs the same value in a register:
brnz v9, ebb3(v9)
If the virtual register containing v9 is spilled, the branch instruction
must be reloaded like:
v52 = fill v9
brnz v52, ebb3(v9)
The branch register argument must be rewritten, and the EBB argument
must be referring to the original stack value.
Fixes#208.
Add a DominatorTreePreorder data structure which can be initialized for
a DominatorTree and used for queries involving a pre-order of the
dominator tree.
Print out the pre-order and send it through filecheck in "test domtree"
file tests.
Add a LibCall type which represents runtime library functions that many
be synthesized by Cretonne from pure instructions.
Add a LibCall variant to ExternalName to represent one of these runtime
functions.
Individual compilation passes call the corresponding timing::*()
function and hold on to their timing token while they run. This causes
nested per-pass timing information to be recorded in thread-local
storage.
The --time-passes command line option prints a pass timing report to
stdout.
Get rid of the per-value Vec in the LiveRange data type and use a
bforest::Map instead to represent the live-in intervals for non-local
live ranges.
This has some advantages:
- The memory footprint of a local live range is reduced from 40 to 20
bytes, and
- Clearing the Liveness data structure is now a constant time operation
which doesn't call free().
- The potentially quadratic behavior when computing large live ranges is
controlled by the logarithmic B-tree operations.
This makes the whole LiveRange generic over the program order instead of
having a number of methods that are individually program order-generic.
This makes is possible to have data members that depend on the program
order, as we will shortly.
This also gives us stronger type checking on the public LiveRange
methods which now require a Layout argument, not just any program order.
Change the default value for the "enable_verifier" setting so the
verifier runs unless it is explicitly disabled.
Most projects using Cretonne are best off running the verifier always
until they start caring about compile time performance. Then they can
easily disable the verifier.
The reserved register heaps are not implemented in the Cretonne
legalizer, so IR generated by the dummy environment would trip
assertions when compiled.
Use a heap with a vmctx base address instead, and also demonstrate how
vmctx arguments are added to all signatures to achieve this.
* Use imm64 rather than offset32
* Add predicate to enforce signed 32-bit limit to imm
* Remove AdjustSpImm format
* Add encoding tests for adjust_sp_imm
* Adjust use of adjust_sp_imm in Intel prologue_epilogue to match