Emergency stack slots are a new kind of stack slot added relatively
recently. They need to be allocated a stack offset just like explicit
and spill slots.
Also, make StackSlotData's offset field an Option, to catch problems
like this in the future. Previously the value 0 was used when offsets
weren't assigned yet, which made it ambiguous whether the field meant
"not assigned yet" or "assigned the value 0".
Adds support for transforming integer division and remainder by constants
into sequences that do not involve division instructions.
* div/rem by constant powers of two are turned into right shifts, plus some
fixups for the signed cases.
* div/rem by constant non-powers of two are turned into double-length
multiplies by a magic constant, plus some fixups involving shifts,
addition and subtraction, which depend on the constant, the word size and
the signedness involved (see the sketch after this list).
* The following cases are transformed: div and rem, signed or unsigned, 32
or 64 bit. The only un-transformed cases are: unsigned div and rem by
zero, signed div and rem by zero or -1.
* This is all incorporated within a new transformation pass, "preopt", in
lib/cretonne/src/preopt.rs.
* In preopt.rs, fn do_preopt() is the main driver. It is designed to be
extensible to transformations of other kinds of instructions. Currently
it merely uses a helper to identify div/rem transformation candidates and
another helper to perform the transformation.
* In preopt.rs, fn get_div_info() pattern-matches to find candidates: both
cases where the second arg is an immediate, and cases where the second
arg is an identifier bound to an immediate at its definition point.
* In preopt.rs, fn do_divrem_transformation() does the heavy lifting of the
transformation proper. It in turn uses magic{S,U}{32,64} to calculate the
magic numbers required for the transformations.
* There are many test cases for the transformation proper:
filetests/preopt/div_by_const_non_power_of_2.cton
filetests/preopt/div_by_const_power_of_2.cton
filetests/preopt/rem_by_const_non_power_of_2.cton
filetests/preopt/rem_by_const_power_of_2.cton
filetests/preopt/div_by_const_indirect.cton
preopt.rs also contains a set of tests for magic number generation.
* The main (non-power-of-2) transformation requires instructions that return
the high word of a double-length multiply. For this, instructions umulhi
and smulhi have been added to the core instruction set. These will map
directly to single instructions on most non-intel targets.
* intel does not have an instruction exactly like that. For intel,
instructions x86_umulx and x86_smulx have been added. These map to real
instructions and return both result words. The intel legaliser will
rewrite {s,u}mulhi into x86_{s,u}mulx uses that throw away the lower half
word. Tests:
filetests/isa/intel/legalize-mulhi.cton (new file)
filetests/isa/intel/binary64.cton (added x86_{s,u}mulx encoding tests)
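As an illustration of the non-power-of-2 case mentioned above, here is a hedged, self-contained sketch of the Granlund-Montgomery style magic-number computation for unsigned 32-bit division. It is not the preopt.rs implementation and the helper name is made up; it only shows why a double-length multiply plus a right shift reproduces the quotient, with the ">> 32" part of the shift being exactly what umulhi provides.

```rust
// Sketch only, not the code in preopt.rs; magic_u32_sketch is a made-up name.
fn magic_u32_sketch(d: u32) -> (u64, u32) {
    assert!(d >= 2, "div by 0 and by 1 are not transformed");
    // Smallest l with 2^l >= d.
    let l = 32 - (d - 1).leading_zeros();
    let s = 32 + l;
    // m = ceil(2^s / d); this can need 33 bits, which is where the extra
    // add/shift fixups in the real transformation come from.
    let m = ((1u128 << s) + d as u128 - 1) / d as u128;
    (m as u64, s)
}

fn main() {
    let d = 7u32;
    let (m, s) = magic_u32_sketch(d);
    for x in [0u32, 1, 6, 7, 13, 1000, u32::MAX] {
        // x / d becomes a double-length multiply plus a right shift.
        let q = ((x as u128 * m as u128) >> s) as u32;
        assert_eq!(q, x / d);
    }
    println!("magic for /{}: {:#x}, shift {}", d, m, s);
}
```

The signed cases, and constants whose magic number does not fit in the word size, need the additional add/subtract/shift fixups described above.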
Cretonne clients don't need to know how the register allocator works.
Export the RegDiversions type from the binemit module instead. It is
used by the "test binemit" driver.
This makes it easier to debug testcases:
- the entity numbers in a .cton file match the entity numbers used
within Cretonne.
- serializing and deserializing doesn't cause indices to change.
One disadvantage is that if a .cton file uses sparse entity numbers,
deserializing to the in-memory form doesn't compact the numbering. However, the
text format is not intended to be performance-critical, so this isn't
expected to be a big burden.
This Function method can be used after the final code layout has been
computed. It returns all the instructions in an EBB along with their
encoded size and offset from the beginning of the function.
This is useful for extracting additional metadata about trapping
instructions and other things that may be needed by a VM.
When an instruction can't be encoded, provide a bit more help:
- Detect missing register assignments for input and output operands.
- List encodings that were considered and rejected.
Add an addend field to reloc_external, and use it to move the
responsibility for accounting for the difference between the end of an
instruction (where the PC is considered to be in intel PC-relative
addressing) and the beginning of the immediate field into the encoding
code.
Specifically, this makes IntelGOTPCRel4 directly correspond to
R_X86_64_GOTPCREL, instead of also carrying an implicit `- 4`.
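Illustration only, with made-up numbers and the ELF-style S + A - P arithmetic rather than the Cretonne API: passing the -4 addend explicitly is what lets the relocation itself stay a plain GOTPCREL/PC32-style computation.

```rust
// S = symbol/target address, A = addend, P = address of the relocated field.
fn pc_relative(s: i64, a: i64, p: i64) -> i64 {
    s + a - p
}

fn main() {
    let insn_start: i64 = 0x1000;
    let imm_offset: i64 = 2; // hypothetical: immediate field starts 2 bytes in
    let insn_len: i64 = 6;   // the CPU's PC points here, at the end of the insn
    let target: i64 = 0x2000;

    // The hardware computes target = end_of_insn + disp, so for a 4-byte
    // immediate field the encoder passes addend = -4 instead of baking the
    // "- 4" into the relocation kind.
    let p = insn_start + imm_offset;
    let disp = pc_relative(target, -4, p);
    assert_eq!(insn_start + insn_len + disp, target);
}
```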
Add a DominatorTreePreorder data structure which can be initialized for
a DominatorTree and used for queries involving a pre-order of the
dominator tree.
Print out the pre-order and send it through filecheck in "test domtree"
file tests.
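This is not the Cretonne data structure, just a generic sketch of the pre-order numbering idea such a structure typically builds on: number nodes in DFS order, remember the last number in each subtree, and "a dominates b" becomes two comparisons.

```rust
struct Preorder {
    pre: Vec<u32>,  // pre-order number of each node
    last: Vec<u32>, // largest pre-order number in the node's subtree
}

fn compute(children: &[Vec<usize>], root: usize) -> Preorder {
    let n = children.len();
    let mut p = Preorder { pre: vec![0; n], last: vec![0; n] };
    let mut next = 0;
    fn dfs(node: usize, children: &[Vec<usize>], p: &mut Preorder, next: &mut u32) {
        p.pre[node] = *next;
        *next += 1;
        for &c in &children[node] {
            dfs(c, children, p, next);
        }
        p.last[node] = *next - 1;
    }
    dfs(root, children, &mut p, &mut next);
    p
}

impl Preorder {
    // In a dominator tree, "a dominates b" is "a is an ancestor of b".
    fn dominates(&self, a: usize, b: usize) -> bool {
        self.pre[a] <= self.pre[b] && self.pre[b] <= self.last[a]
    }
}

fn main() {
    // Dominator tree: 0 -> {1, 2}, 1 -> {3}
    let children = vec![vec![1, 2], vec![3], vec![], vec![]];
    let p = compute(&children, 0);
    assert!(p.dominates(0, 3));
    assert!(p.dominates(1, 3));
    assert!(!p.dominates(2, 3));
}
```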
Add a "cfg_postorder:" printout to the "test domtree" file tests and use
that to check the computed CFG post-order instead of doing it manually
with Rust code.
Individual compilation passes call the corresponding timing::*()
function and hold on to their timing token while they run. This causes
nested per-pass timing information to be recorded in thread-local
storage.
The --time-passes command line option prints a pass timing report to
stdout.
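A flat, minimal sketch of the token mechanism (the real timing module records nested passes and prints a proper report): each pass holds an RAII token, and the elapsed time is pushed into thread-local storage when the token drops.

```rust
use std::cell::RefCell;
use std::time::{Duration, Instant};

thread_local! {
    static PASS_TIMES: RefCell<Vec<(&'static str, Duration)>> = RefCell::new(Vec::new());
}

struct TimingToken {
    pass: &'static str,
    start: Instant,
}

impl Drop for TimingToken {
    fn drop(&mut self) {
        let elapsed = self.start.elapsed();
        PASS_TIMES.with(|t| t.borrow_mut().push((self.pass, elapsed)));
    }
}

fn start_pass(pass: &'static str) -> TimingToken {
    TimingToken { pass, start: Instant::now() }
}

fn regalloc_pass() {
    let _tt = start_pass("regalloc"); // held for the duration of the pass
    // ... pass body ...
}

fn print_report() {
    PASS_TIMES.with(|t| {
        for (pass, d) in t.borrow().iter() {
            println!("{:<12} {:?}", pass, d);
        }
    });
}

fn main() {
    regalloc_pass();
    print_report();
}
```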
This was supposed to be Q3 + 1.5 IQR, but a braino meant we actually used
Q3 + 2/3 IQR.
Since the distribution of test case times is far from gaussian, bump the
"slow" limit up even further to Q3 + 3 IQR.
These tests were only using "test compile" because it doesn't require
any filecheck directives to be present, so just stop requiring filecheck
directives for "test regalloc" and other filecheck-based test drivers.
This is a verification pass that can be run after register allocation.
It verifies that value locations are consistent with constraints on
their uses, and that the register diversions are consistent.
Make it clear that register diversions are local to an EBB only. This
affects what branch relaxation is allowed to do.
The verify_locations() function takes an optional Liveness parameter which is
used to check that no diverted values are live across CFG edges.
When "binemit" tests encode instructions, keep track of the current set
of register diversions, and use the diverted locations to check operand
constraints.
This matches how constraints are applied during a real binemit phase.
This contains encoding details for a stack reference: The base register
and offset to use in the specific instruction encoding.
Generate StackRef objects called in_stk0 etc for the binemit recipe
code. All binemit recipes need to compute base pointer offsets for stack
references, so have the automatically generated code do it.
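A stand-in sketch, not the generated recipe code (the field names and the encoding helper are made up): all a recipe needs for an in_stk0 operand is a base register and a byte offset.

```rust
struct StackRef {
    base: u16,   // register unit of the base pointer (SP or FP)
    offset: i32, // byte offset from that base register
}

// Hypothetical recipe helper: emit a stand-in [base + disp32] memory operand.
fn put_stack_operand(stk: &StackRef, sink: &mut Vec<u8>) {
    sink.push(stk.base as u8); // stand-in for the real ModR/M/SIB encoding
    sink.extend_from_slice(&stk.offset.to_le_bytes()); // 32-bit displacement
}

fn main() {
    let in_stk0 = StackRef { base: 4, offset: 16 }; // e.g. 16 bytes above SP
    let mut sink = Vec::new();
    put_stack_operand(&in_stk0, &mut sink);
    println!("{:x?}", sink);
}
```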
Add a settings::FlagsOrIsa struct which represents a flags reference and
optionally the ISA it belongs to. Use this for passing flags/isa
information to the verifier.
The verify_function() and verify_context() functions are now generic so
they accept either a &Flags or a &TargetIsa argument.
Fix the return_at_end verifier tests which no longer require an ISA to be
specified. The single "set return_at_end" flag setting now makes it to
the verifier even when no ISA is present to carry it.
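A self-contained sketch of the pattern with stand-in Flags/TargetIsa types (not the real definitions): one struct bundles a flags reference with an optional ISA, and From conversions let the verifier entry points stay generic.

```rust
struct Flags { return_at_end: bool }
trait TargetIsa { fn flags(&self) -> &Flags; }

struct FlagsOrIsa<'a> {
    flags: &'a Flags,
    isa: Option<&'a dyn TargetIsa>,
}

impl<'a> From<&'a Flags> for FlagsOrIsa<'a> {
    fn from(flags: &'a Flags) -> Self {
        FlagsOrIsa { flags, isa: None }
    }
}

impl<'a> From<&'a dyn TargetIsa> for FlagsOrIsa<'a> {
    fn from(isa: &'a dyn TargetIsa) -> Self {
        FlagsOrIsa { flags: isa.flags(), isa: Some(isa) }
    }
}

// One generic signature covers both callers.
fn verify_function<'a, FOI: Into<FlagsOrIsa<'a>>>(foi: FOI) {
    let fisa = foi.into();
    // Flag-only checks always run; ISA-specific checks only when an ISA is present.
    let _ = fisa.flags.return_at_end;
    let _ = fisa.isa.is_some();
}

fn main() {
    let flags = Flags { return_at_end: true };
    verify_function(&flags); // no ISA needed
}
```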
* Make passes assert their dependencies consistently.
This avoids ambiguity about whose responsibility it is to compute
cfg, domtree, and loop_analysis data.
* Reset the `valid` flag in DominatorTree's `clear()`.
* Remove the redundant assert from DominatorTree::with_function.
* Remove the message strings from obvious asserts.
This avoids having them spill out into multiple lines.
* Refactor calls to `compute` on `Context` objects into helper functions.
Also, move flowgraph() calls out of filetest and into the passes that
need them so that filetest doesn't have embedded knowledge of these
dependencies.
This resolves a TODO about the way Context was running the verifier, and
it makes the Context functions and the filetest runners more transparent.
This also fixes simple_gvn to use the existing dominator tree rather
than computing its own.
The new PrimaryMap replaces the primary EntityMap and the PrimaryEntityData
marker trait which was causing some confusion. We now have a clear
division between the two types of maps:
- PrimaryMap is used to assign entity numbers to the primary data for an
entity.
- EntityMap is a secondary mapping adding additional info.
The split also means that the secondary EntityMap can now behave as if
all keys have a default value. This means that we can get rid of the
annoying ensure() and get_or_default() methods that were used everywhere
instead of indexing. Just use normal indexing now; non-existent keys
will return the default value.
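A stand-in sketch of the split (the real types are backed by dense vectors keyed by entity references, and the secondary map supports plain indexing): the primary map assigns the entity numbers, the secondary one only attaches data and reads missing keys as the default.

```rust
use std::collections::HashMap;

#[derive(Copy, Clone, PartialEq, Eq, Hash, Debug)]
struct Inst(u32);

struct PrimaryMap<V> { data: Vec<V> }

impl<V> PrimaryMap<V> {
    fn new() -> Self { PrimaryMap { data: Vec::new() } }
    // Pushing the primary data is what assigns the entity number.
    fn push(&mut self, v: V) -> Inst {
        self.data.push(v);
        Inst(self.data.len() as u32 - 1)
    }
}

struct EntityMap<V: Default> { data: HashMap<Inst, V>, default: V }

impl<V: Default> EntityMap<V> {
    fn new() -> Self { EntityMap { data: HashMap::new(), default: V::default() } }
    // Missing keys read as the default; no ensure()/get_or_default() needed.
    // (The real type implements Index, so plain indexing works.)
    fn get(&self, k: Inst) -> &V {
        self.data.get(&k).unwrap_or(&self.default)
    }
    fn set(&mut self, k: Inst, v: V) { self.data.insert(k, v); }
}

fn main() {
    let mut insts = PrimaryMap::new(); // primary: defines the numbering
    let i0 = insts.push("iconst");
    let i1 = insts.push("iadd");

    let mut offsets: EntityMap<u32> = EntityMap::new(); // secondary info
    offsets.set(i1, 4);
    assert_eq!(*offsets.get(i0), 0); // unset key reads as the default
    assert_eq!(*offsets.get(i1), 4);
}
```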
Register locations can change throughout an EBB. Make sure the
emit_inst() function considers this when encoding instructions and
updates the register diversion tracker.
This function will emit the binary machine code into contiguous raw
memory while sending relocations to a RelocSink.
Add a MemoryCodeSink for generating machine code directly into memory
efficiently. Allow the TargetIsa to provide emit_function
implementations that are specialized to the MemoryCodeSink type to avoid
needless small virtual callbacks to put1() etc.
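A sketch of the design point with stand-in types (not the real binemit API): keeping emit_function generic over the sink type lets the MemoryCodeSink case be monomorphized, so the tiny put1()/put4() calls are direct and inlinable rather than virtual.

```rust
trait CodeSink {
    fn put1(&mut self, b: u8);
    fn put4(&mut self, w: u32);
}

// Writes bytes straight into a memory buffer.
struct MemoryCodeSink<'a> { buf: &'a mut Vec<u8> }

impl<'a> CodeSink for MemoryCodeSink<'a> {
    fn put1(&mut self, b: u8) { self.buf.push(b); }
    fn put4(&mut self, w: u32) { self.buf.extend_from_slice(&w.to_le_bytes()); }
}

// Generic over the sink: with a MemoryCodeSink the calls compile to direct,
// inlinable calls instead of dyn dispatch.
fn emit_function<CS: CodeSink>(sink: &mut CS) {
    sink.put1(0xc3); // e.g. a bare `ret`
}

fn main() {
    let mut buf = Vec::new();
    emit_function(&mut MemoryCodeSink { buf: &mut buf });
    assert_eq!(buf, [0xc3]);
}
```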
This is the main entry point to the code generator. It returns the
computed size of the function's code.
Also add a 'test compile' command which runs the whole code generation
pipeline.
* LICM pass
* Uses loop analysis to detect loop tree
* For each loop (starting with the inner ones), create a pre-header and move loop-invariant instructions there
* An instruction is loop-invariant if it does not use as an argument a value defined earlier in the loop (see the sketch after this list)
* File tests to check LICM's correctness
* Optimized pre-header creation
If the loop already has a natural pre-header, we use it instead of creating a new one.
The natural pre-header of a loop is the unique predecessor of the header that the header does not dominate.
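A stand-in sketch of the invariance test mentioned above (made-up types, not the real pass):

```rust
use std::collections::HashSet;

#[derive(Copy, Clone, PartialEq, Eq, Hash)]
struct Value(u32);

// An instruction is loop-invariant when none of its arguments is a value
// still defined inside the loop.
fn is_loop_invariant(args: &[Value], defined_in_loop: &HashSet<Value>) -> bool {
    args.iter().all(|a| !defined_in_loop.contains(a))
}

fn main() {
    let mut in_loop: HashSet<Value> = [Value(7)].into_iter().collect();
    assert!(is_loop_invariant(&[Value(1), Value(2)], &in_loop)); // hoistable
    assert!(!is_loop_invariant(&[Value(7)], &in_loop));          // uses a loop value
    // Hoisting an instruction to the pre-header removes its result from the
    // loop-defined set, which can make later instructions invariant too.
    in_loop.remove(&Value(7));
    assert!(is_loop_invariant(&[Value(7)], &in_loop));
}
```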
* Skeleton simple_gvn pass.
* Basic testing infrastructure for simple-gvn.
* Add can_load and can_store flags to instructions.
* Move the replace_values function into the DataFlowGraph.
* Make InstructionData derive from Hash, PartialEq, and Eq.
* Make EntityList's hash and eq functions panic.
* Change Ieee32 and Ieee64 to store u32 and u64, respectively.
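The Hash/PartialEq/Eq derives are what a hash-based value numbering needs; here is a stand-in sketch of the idea (made-up instruction data, not the real pass, which also has to restrict itself to side-effect-free instructions and respect dominance):

```rust
use std::collections::hash_map::{Entry, HashMap};

#[derive(PartialEq, Eq, Hash)]
enum InstData {
    Iconst(i64),
    Iadd(u32, u32), // argument value numbers
}

fn main() {
    // Maps the hashable instruction data to the value it already defines.
    let mut known: HashMap<InstData, u32> = HashMap::new();
    let insts = [
        (0u32, InstData::Iconst(1)),
        (1u32, InstData::Iadd(10, 11)),
        (2u32, InstData::Iadd(10, 11)), // same data as the previous one
    ];
    for (result, data) in insts {
        match known.entry(data) {
            Entry::Occupied(e) => println!("v{} is redundant; reuse v{}", result, e.get()),
            Entry::Vacant(v) => {
                v.insert(result);
            }
        }
    }
}
```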
The test drivers can stop calling comp_ctx.verify because legalize() and
regalloc() do it themselves now.
This also makes it possible for those two passes to return other
CtonError codes in the future, not just verifier errors.
Soon, InstructionData won't have sufficient information to compute this.
Give TargetIsa::encode() an explicit ctrl_typevar argument. This
function does not require the instruction to be inserted in the DFG
tables.