Value aliases aren't instructions, so they don't have a location in the
CFG, and it's therefore not meaningful to query whether a value alias is
defined within a loop.
There appear to be underlying problems with the way Cretonne handles value
aliases, which are causing problems for LICM. Disable LICM until we have
a chance to fix the underlying issues.
Fixes #275.
* cton-util: fix some clippy unnecessary pass-by-value warnings
* clippy: ignore too many arguments / cyclomatic complexity in module
Since these functions take arguments coming from the command line, I
don't think this is actually a valid lint; morally, the arguments all
come from one structure.
* cton-util: take care of remaining clippy warnings
* cton-reader: fix all non-suspicious clippy warnings
* cton-reader: disable clippy at site of suspicious lint
* cton-frontend: disable clippy at the site of an invalid lint
* cton-frontend: fix clippy warnings, or ignore benign ones
* clippy: ignore the camelcase word WebAssembly in docs
* cton-wasm: fix clippy complaints or ignore benign ones
* cton-wasm tests: fix clippy complaints
* cretonne: starting point turns off all clippy warnings
* cretonne: clippy fixes, or lower allow() to source of problem
* cretonne: more clippy fixes
* cretonne: fix or disable needless_lifetimes lint
This lint is buggy when the declared lifetime is used in another type
constraint.
* cretonne: fix clippy complaint about Pass::NoPass
* rustfmt
* fix prev minor api changes clippy suggested
* add clippy to test-all
* cton-filetests: clippy fixes
* simplify clippy reporting in test-all
* cretonne: document clippy allows better
* cretonne: fix some more clippy lints
* cretonne: fix clippy lints (mostly doc comments)
* cretonne: allow all needless_lifetimes clippy warnings
remove overrides at the false positives
* rustfmt
When relaxing a branch, restrict the set of candidate encodings to those which
have the same input constraints as the original encoding choice. This prevents
situations where relaxation prefers a non-REX-prefixed encoding over a
REX-prefixed one because the end of the instruction can be one byte closer to the
destination, in a situation where the encoding needs to be REX-prefixed
because of one of the operand registers.
This also makes the Context class perform encoding verification after
relaxation, to catch similar problems in the future.
Fixes #256.
Adds support for transforming integer division and remainder by constants
into sequences that do not involve division instructions.
* div/rem by constant powers of two are turned into right shifts, plus some
fixups for the signed cases; see the sketch after this list.
* div/rem by constant non-powers of two are turned into double-length
multiplies by a magic constant, plus some fixups involving shifts,
addition and subtraction, which depend on the constant, the word size and
the signedness involved.
* The following cases are transformed: div and rem, signed or unsigned, 32
or 64 bit. The only un-transformed cases are: unsigned div and rem by
zero, signed div and rem by zero or -1.
* This is all incorporated within a new transformation pass, "preopt", in
lib/cretonne/src/preopt.rs.
* In preopt.rs, fn do_preopt() is the main driver. It is designed to be
extensible to transformations of other kinds of instructions. Currently
it merely uses a helper to identify div/rem transformation candidates and
another helper to perform the transformation.
* In preopt.rs, fn get_div_info() pattern matches to find candidates, both
cases where the second arg is an immediate, and cases where the second
arg is an identifier bound to an immediate at its definition point.
* In preopt.rs, fn do_divrem_transformation() does the heavy lifting of the
transformation proper. It in turn uses magic{S,U}{32,64} to calculate the
magic numbers required for the transformations.
* There are many test cases for the transformation proper:
filetests/preopt/div_by_const_non_power_of_2.cton
filetests/preopt/div_by_const_power_of_2.cton
filetests/preopt/rem_by_const_non_power_of_2.cton
filetests/preopt/rem_by_const_power_of_2.cton
filetests/preopt/div_by_const_indirect.cton
preopt.rs also contains a set of tests for magic number generation.
* The main (non-power-of-2) transformation requires instructions that return
the high word of a double-length multiply. For this, instructions umulhi
and smulhi have been added to the core instruction set. These will map
directly to single instructions on most non-Intel targets.
* Intel does not have an instruction exactly like that. For Intel,
instructions x86_umulx and x86_smulx have been added. These map to real
instructions and return both result words. The Intel legalizer will
rewrite {s,u}mulhi into x86_{s,u}mulx uses that throw away the lower half
word. Tests:
filetests/isa/intel/legalize-mulhi.cton (new file)
filetests/isa/intel/binary64.cton (added x86_{s,u}mulx encoding tests)
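As a rough illustration of the two rewrites, here is a plain-Rust sketch for
the 32-bit case; the function names and the divide-by-3 magic constant are
illustrative and not taken from preopt.rs, which emits cretonne IL instead:

```rust
/// Signed division by a power of two (here 8 = 1 << 3) without a division
/// instruction: bias negative dividends so that the arithmetic shift
/// rounds toward zero, matching the semantics of signed division.
fn sdiv_by_8(x: i32) -> i32 {
    let k = 3;
    let bias = ((x >> 31) as u32 >> (32 - k)) as i32; // 7 if x < 0, else 0
    x.wrapping_add(bias) >> k
}

/// The operation provided by the new umulhi instruction: the high half of
/// a double-length unsigned multiply.
fn umulhi32(a: u32, b: u32) -> u32 {
    ((a as u64 * b as u64) >> 32) as u32
}

/// Unsigned division by the non-power-of-two constant 3: a multiply by a
/// magic constant, keeping only the high result word, followed by a shift.
fn udiv_by_3(x: u32) -> u32 {
    umulhi32(x, 0xAAAA_AAAB) >> 1
}

fn main() {
    assert_eq!(sdiv_by_8(-9), -1); // a bare arithmetic shift would give -2
    assert_eq!(udiv_by_3(100), 33);
}
```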
Individual compilation passes call the corresponding timing::*()
function and hold on to their timing token while they run. This causes
nested per-pass timing information to be recorded in thread-local
storage.
The --time-passes command line option prints a pass timing report to
stdout.
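A minimal sketch of the token mechanism, assuming a simplified flat recorder:
`start_pass`, `TimingToken`, and `PASS_TIMES` are made-up names, and the
nesting bookkeeping of the real timing module is omitted.

```rust
use std::cell::RefCell;
use std::time::{Duration, Instant};

thread_local! {
    // Per-thread accumulation; no synchronization is needed because each
    // compilation thread records into its own storage.
    static PASS_TIMES: RefCell<Vec<(&'static str, Duration)>> =
        RefCell::new(Vec::new());
}

/// Held by a pass while it runs; the elapsed time is recorded when the
/// token is dropped, i.e. when the pass finishes.
pub struct TimingToken {
    pass: &'static str,
    start: Instant,
}

pub fn start_pass(pass: &'static str) -> TimingToken {
    TimingToken { pass, start: Instant::now() }
}

impl Drop for TimingToken {
    fn drop(&mut self) {
        let elapsed = self.start.elapsed();
        PASS_TIMES.with(|t| t.borrow_mut().push((self.pass, elapsed)));
    }
}

/// Roughly the kind of report a --time-passes option would print.
pub fn print_report() {
    PASS_TIMES.with(|t| {
        for (pass, time) in t.borrow().iter() {
            println!("{:?}  {}", time, pass);
        }
    });
}
```

A pass would then hold the token for its whole duration, e.g.
`let _timing = start_pass("regalloc");` at the top of the pass function.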
The register allocator doesn't even try to compile unreachable EBBs, so
any values defined in such blocks won't be assigned registers.
Since the dominator tree already has determined which EBBs are
reachable, we should just eliminate any unreachable blocks instead of
trying to do something with the dead code.
Note that this is not a "dead code elimination" pass, which would also
remove individual instructions whose results are not used.
This removes the `optimize` option, since the same effect is available with
`--set`, e.g. `--set opt_level=best`. It also adds an option to
print the compilation output.
In addition, it enables simple_gvn and licm for opt_level=best.
Add a settings::FlagsOrIsa struct which represents a flags reference and
optionally the ISA it belongs to. Use this for passing flags/isa
information to the verifier.
The verify_function() and verify_context() functions are now generic so
they accept either a &Flags or a &TargetIsa argument.
Fix the return_at_end verifier tests, which no longer require an ISA to be
specified. The single "set return_at_end" flag setting now makes it to
the verifier even when no ISA is present to carry it.
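A minimal sketch of the pattern with stand-in types; the real Flags and
TargetIsa live in cretonne's settings and isa modules, and the actual
verifier functions also take the function being verified, which is elided
here.

```rust
/// Stand-in for settings::Flags.
pub struct Flags {
    pub return_at_end: bool,
}

/// Stand-in for the TargetIsa trait; only the part needed here.
pub trait TargetIsa {
    fn flags(&self) -> &Flags;
}

/// A flags reference, optionally together with the ISA it belongs to.
pub struct FlagsOrIsa<'a> {
    pub flags: &'a Flags,
    pub isa: Option<&'a dyn TargetIsa>,
}

impl<'a> From<&'a Flags> for FlagsOrIsa<'a> {
    fn from(flags: &'a Flags) -> Self {
        FlagsOrIsa { flags, isa: None }
    }
}

impl<'a> From<&'a dyn TargetIsa> for FlagsOrIsa<'a> {
    fn from(isa: &'a dyn TargetIsa) -> Self {
        FlagsOrIsa { flags: isa.flags(), isa: Some(isa) }
    }
}

/// Generic over anything convertible to FlagsOrIsa, so callers can pass
/// either a &Flags or a &TargetIsa.
pub fn verify_function<'a>(fisa: impl Into<FlagsOrIsa<'a>>) {
    let fisa = fisa.into();
    if fisa.flags.return_at_end {
        // ... check that every EBB ends in a return ...
    }
    if fisa.isa.is_some() {
        // ... also run the ISA-specific checks ...
    }
}
```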
* Make passes assert their dependencies consistently.
This avoids ambiguity about whose responsibility it is to compute
cfg, domtree, and loop_analysis data.
* Reset the `valid` flag in DominatorTree's `clear()`.
* Remove the redundant assert from DominatorTree::with_function.
* Remove the message strings from obvious asserts.
This avoids having them spill out into multiple lines.
* Refactor calls to `compute` on `Context` objects into helper functions.
Also, move flowgraph() calls out of filetest and into the passes that
need them so that filetest doesn't have embedded knowledge of these
dependencies.
This resolves a TODO about the way Context was running the verifier, and
it makes the Context functions and the filetest runners more transparent.
This also fixes simple_gvn to use the existing dominator tree rather
than computing its own.
This also moves the calls to it out of Context and into the passes that
actually need it, so that Context's functions don't have any logic of
their own.
The legalizer can invalidate the dominator tree, but we don't actually
need a dominator tree during legalization, so defer the construction of
the domtree.
- Add an "invalid" state to the dominator tree along with clear() and
is_valid() methods to test it.
- Invalidate the dominator tree as part of legalization.
- Ensure that a valid dominator tree exists before the passes that need
it.
Together these features add up to a manual invalidation mechanism for
the dominator tree.
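Sketched with the per-EBB dominator data elided, the mechanism amounts to a
`valid` flag alongside the data; `compute` below is only a placeholder for
the real construction.

```rust
pub struct DominatorTree {
    // Placeholder for the real per-EBB immediate-dominator data.
    idom: Vec<Option<u32>>,
    valid: bool,
}

impl DominatorTree {
    pub fn new() -> Self {
        DominatorTree { idom: Vec::new(), valid: false }
    }

    /// Drop all data and mark the tree invalid, e.g. after the legalizer
    /// has changed the control flow graph.
    pub fn clear(&mut self) {
        self.idom.clear();
        self.valid = false;
    }

    /// Does this tree still describe the current state of the function?
    pub fn is_valid(&self) -> bool {
        self.valid
    }

    /// Recompute the tree; passes that need it call this (or assert
    /// `is_valid()`) before relying on the data.
    pub fn compute(&mut self, num_ebbs: usize) {
        self.idom.clear();
        self.idom.resize(num_ebbs, None);
        // ... the real construction walks the CFG here ...
        self.valid = true;
    }
}
```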
Future legalization patterns will have the ability to mutate the
flowgraph, so the domtree's list of RPO blocks is not a good guide for
iteration. Use the layout order instead. This will pick up any new EBBs
inserted.
This will compute the stack frame layout as appropriate for the
function's calling convention and insert prologue and epilogue code.
The default implementation is not useful; each target ISA will need to
override this function.
This function will emit the binary machine code into contiguous raw
memory while sending relocations to a RelocSink.
Add a MemoryCodeSink for generating machine code directly into memory
efficiently. Allow the TargetIsa to provide emit_function
implementations that are specialized to the MemoryCodeSink type to avoid
needless small virtual callbacks to put1() etc.
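A minimal sketch of the idea with simplified signatures: the real CodeSink
trait also has wider put2/put4 style methods and reports relocations through
a RelocSink, both omitted here, and a mutable slice stands in for the raw
memory the real sink writes to.

```rust
/// Byte-at-a-time sink that the instruction encoders write into.
pub trait CodeSink {
    fn offset(&self) -> usize;
    fn put1(&mut self, byte: u8);
}

/// Writes machine code directly into a caller-provided buffer. When the
/// per-ISA emit_function is monomorphized over this concrete type, every
/// put1 becomes a direct (and inlinable) call instead of a virtual one.
pub struct MemoryCodeSink<'a> {
    buf: &'a mut [u8],
    pos: usize,
}

impl<'a> MemoryCodeSink<'a> {
    pub fn new(buf: &'a mut [u8]) -> Self {
        MemoryCodeSink { buf, pos: 0 }
    }
}

impl<'a> CodeSink for MemoryCodeSink<'a> {
    fn offset(&self) -> usize {
        self.pos
    }

    fn put1(&mut self, byte: u8) {
        self.buf[self.pos] = byte;
        self.pos += 1;
    }
}
```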
This is the main entry point to the code generator. It returns the
computed size of the function's code.
Also add a 'test compile' command which runs the whole code generation
pipeline.
* Replace a single-character string literal with a character literal.
* Use is_some() instead of comparing with Some(_).
* Add code-quotes around type names in comments.
* Use !...is_empty() instead of len() != 0.
* Tidy up redundant returns.
* Remove redundant .clone() calls.
* Remove unnecessary explicit lifetime parameters.
* Tidy up unnecessary '&'s.
* Add parens to make operator precedence explicit.
* Use debug_assert_eq instead of debug_assert with ==.
* Replace a &Vec argument with a &[...].
* Replace `a = a op b` with `a op= b`.
* Avoid unnecessary closures.
* Avoid .iter() and .iter_mut() for iterating over containers.
* Remove unneeded qualification.
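A few of the rewrites in the list above, shown with the pattern they replace
in comments; the variable names are illustrative, not taken from the actual
diffs.

```rust
fn clippy_examples(name: &str, items: &[u32], opt: Option<u32>, mut total: u32) -> u32 {
    // is_some() instead of comparing the Option against Some(_):
    if opt.is_some() {
        // A character literal instead of a single-character string
        // literal, e.g. name.ends_with(".") becomes:
        let _ = name.ends_with('.');
    }

    // !items.is_empty() instead of items.len() != 0:
    if !items.is_empty() {
        // `a op= b` instead of `a = a op b`:
        total += items[0];
    }
    total
}
```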
Switch to the new domtree.cfg_postorder() which returns a reference to a
pre-computed post-order instead of allocating memory and computing a new
post-order.
* LICM pass
* Uses loop analysis to detect loop tree
* For each loop (starting with the inner ones), create a pre-header and move loop-invariant instructions there
* An instruction is loop-invariant if it does not use as an argument any value defined earlier in the loop (see the sketch after this list)
* File tests to check LICM's correctness
* Optimized pre-header creation
If the loop already has a natural pre-header, we use it instead of creating a new one.
The natural pre-header of a loop is the only predecessor of the header that the header does not dominate.
* Implemented in two passes
* First pass discovers the loop headers (they dominate one of their predecessors)
* Second pass traverses the blocks of each loop
* Discovers the loop tree structure
* Offers a new LoopAnalysis data structure that can be queried from outside the module
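The loop-invariance test referenced above, as a minimal sketch with stand-in
types: `Value`, `Inst`, and `defined_in_loop` are illustrative, while the
real pass works on cretonne IR values and checks where they are defined
relative to the loop.

```rust
use std::collections::HashSet;

type Value = u32;

struct Inst {
    args: Vec<Value>,
}

/// An instruction is loop-invariant when none of its arguments are values
/// defined inside the loop; values defined outside (e.g. in the pre-header
/// or as function parameters) don't block hoisting.
fn is_loop_invariant(inst: &Inst, defined_in_loop: &HashSet<Value>) -> bool {
    inst.args.iter().all(|arg| !defined_in_loop.contains(arg))
}

fn main() {
    let defined_in_loop: HashSet<Value> = [10, 11].into_iter().collect();
    // Uses only values defined outside the loop: may be hoisted.
    assert!(is_loop_invariant(&Inst { args: vec![1, 2] }, &defined_in_loop));
    // Uses value 10, defined inside the loop: must stay.
    assert!(!is_loop_invariant(&Inst { args: vec![1, 10] }, &defined_in_loop));
}
```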
* Skeleton simple_gvn pass.
* Basic testing infrastructure for simple-gvn.
* Add can_load and can_store flags to instructions.
* Move the replace_values function into the DataFlowGraph.
* Make InstructionData derive from Hash, PartialEq, and Eq.
* Make EntityList's hash and eq functions panic.
* Change Ieee32 and Ieee64 to store u32 and u64, respectively.
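A rough sketch of the value-numbering idea that motivates deriving Hash,
PartialEq, and Eq on InstructionData: identical pure computations hash to
the same key, and later duplicates are replaced by the earlier result. The
types here are stand-ins; the real pass additionally walks the dominator
tree so a result is only reused where its definition dominates the
duplicate, and presumably uses the new can_load/can_store flags to avoid
touching memory operations.

```rust
use std::collections::HashMap;

type Value = u32;

/// Stand-in for the parts of InstructionData that determine the result.
#[derive(Hash, PartialEq, Eq, Clone)]
struct InstKey {
    opcode: &'static str,
    args: Vec<Value>,
}

/// Map from redundant results to the earlier equivalent result; callers
/// would then rewrite uses accordingly (cf. replace_values).
fn simple_gvn(insts: &[(Value, InstKey)]) -> HashMap<Value, Value> {
    let mut seen: HashMap<InstKey, Value> = HashMap::new();
    let mut replace: HashMap<Value, Value> = HashMap::new();
    for (result, key) in insts {
        if let Some(&prev) = seen.get(key) {
            // Same opcode and arguments as an earlier instruction.
            replace.insert(*result, prev);
        } else {
            seen.insert(key.clone(), *result);
        }
    }
    replace
}
```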
This means that we can verify the basics with verify_context before
moving on to verifying the liveness information.
Live ranges are now verified immediately after computing them and after
register allocation is complete.
The liveness verifier will check that the live ranges are consistent
with the function. It runs as part of the register allocation pipeline
when enable_verifier is set.
The initial implementation checks the live ranges, but not the
ISA-specific constraints and affinities.
The test drivers can stop calling comp_ctx.verify because legalize() and
regalloc() do it themselves now.
This also makes it possible for those two passes to return other
CtonError codes in the future, not just verifier errors.
Run the verify_context() function after invoking the legalize() and
regalloc() context functions. This will help catch bad code produced by
these passes.
Legalizing some instructions may require modifications to the control
flow graph, and some operations need to use the CFG analysis.
The CFG reference is threaded through all the legalization functions to
reach the generated expansion functions as well as the legalizer::split
module where it will be used first.
Move the flow graph computation into a compute method which can be
called with multiple functions.
This allows us to reuse the ControlFlowGraph memory and keep an instance
in the Context.
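A minimal sketch of the reuse pattern, assuming a simplified representation;
`Function` and the successor lists here are stand-ins for the real IR and
the per-EBB predecessor/successor data.

```rust
/// Stand-in for the function being analyzed: one successor list per EBB.
pub struct Function {
    pub edges: Vec<Vec<usize>>,
}

pub struct ControlFlowGraph {
    successors: Vec<Vec<usize>>,
}

impl ControlFlowGraph {
    /// An empty CFG; kept in the Context so its allocations survive
    /// across function compilations.
    pub fn new() -> Self {
        ControlFlowGraph { successors: Vec::new() }
    }

    /// Recompute the CFG for a (possibly different) function, reusing the
    /// existing heap allocations instead of building a new graph.
    pub fn compute(&mut self, func: &Function) {
        for succ in &mut self.successors {
            succ.clear();
        }
        self.successors.resize(func.edges.len(), Vec::new());
        for (ebb, targets) in func.edges.iter().enumerate() {
            self.successors[ebb].extend_from_slice(targets);
        }
    }
}
```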
This will provide main entry points for compiling functions, and it
serves as a place for keeping data structures that should be preserved
between function compilations to reduce allocator thrashing.
So far, Context is just basic scaffolding. More to be added.