* Add a pre-opt optimization to change constants into immediates.
This converts 'iadd' + 'iconst' into 'iadd_imm', and so on.
* Optimize away redundant `bint` instructions.
Cretonne has a concept of "Testable" values, which can be either boolean
or integer. When an instruction needing a "Testable" value receives
the result of a `bint`, converting boolean to integer, eliminate the
`bint`, as it's redundant.
* Postopt: Optimize using CPU flags.
This introduces a post-legalization optimization pass which converts
compare+branch sequences to use flags values on CPUs which support it.
* Define a form of x86's `urm` that doesn't clobber FLAGS.
movzbl/movsbl/etc. don't clobber FLAGS; define a form of the `urm`
recipe that represents this.
* Implement a DCE pass.
This pass deletes instructions with no side effects and no results that
are used.
* Clarify ambiguity about "32-bit" and "64-bit" in comments.
* Add x86 encodings for icmp_imm.
* Add a testcase for postopt CPU flags optimization.
This covers the basic functionality of transforming compare+branch
sequences to use CPU flags.
* Pattern-match irsub_imm in preopt.
* cton-util: fix some clippy unnecessary pass-by-value warnings
* clippy: ignore too many arguments / cyclomatic complexity in module
Since these functions take arguments coming from the command line, I
don't think this is actually a valid lint; morally, the arguments are
all from one structure.
* cton-util: take care of remaining clippy warnings
* cton-reader: fix all non-suspicious clippy warnings
* cton-reader: disable clippy at site of suspicious lint
* cton-frontend: disable clippy at the site of an invalid lint
* cton-frontend: fix clippy warnings, or ignore benign ones
* clippy: ignore the camelcase word WebAssembly in docs
* cton-wasm: fix clippy complaints or ignore benign ones
* cton-wasm tests: fix clippy complaints
* cretonne: starting point turns off all clippy warnings
* cretonne: clippy fixes, or lower allow() to source of problem
* cretonne: more clippy fixes
* cretonne: fix or disable needless_lifetimes lint
this linter is buggy when the declared lifetime is used for another type
constraint.
* cretonne: fix clippy complaint about Pass::NoPass
* rustfmt
* fix prev minor api changes clippy suggested
* add clippy to test-all
* cton-filetests: clippy fixes
* simplify clippy reporting in test-all
* cretonne: document clippy allows better
* cretonne: fix some more clippy lints
* cretonne: fix clippy lints (mostly doc comments)
* cretonne: allow all needless_lifetimes clippy warnings
remove overrides at the false positives
* rustfmt
This allows us to run the tests via a library call rather than just
as a command execution. And, it's a step toward a broader goal, which
is to keep the code in the top-level src directory minimal, with
important functionality exposed as crates.
Refactor the filetests harness so that it can be run as part of
`cargo test`. And begin reorganizing the test harness code in preparation
for moving it out of the src directory.
- Test subcommand files are now named `test_*.rs`.
- cton-util subcommand files now just export their `run` and nothing else.
- src/filetest/mod.rs now also just exports `run` and nothing else.
- Tests are now run in release mode (with debug assertions enabled).
Adds support for transforming integer division and remainder by constants
into sequences that do not involve division instructions.
* div/rem by constant powers of two are turned into right shifts, plus some
fixups for the signed cases.
* div/rem by constant non-powers of two are turned into double length
multiplies by a magic constant, plus some fixups involving shifts,
addition and subtraction, that depend on the constant, the word size and
the signedness involved.
* The following cases are transformed: div and rem, signed or unsigned, 32
or 64 bit. The only un-transformed cases are: unsigned div and rem by
zero, signed div and rem by zero or -1.
* This is all incorporated within a new transformation pass, "preopt", in
lib/cretonne/src/preopt.rs.
* In preopt.rs, fn do_preopt() is the main driver. It is designed to be
extensible to transformations of other kinds of instructions. Currently
it merely uses a helper to identify div/rem transformation candidates and
another helper to perform the transformation.
* In preopt.rs, fn get_div_info() pattern matches to find candidates, both
cases where the second arg is an immediate, and cases where the second
arg is an identifier bound to an immediate at its definition point.
* In preopt.rs, fn do_divrem_transformation() does the heavy lifting of the
transformation proper. It in turn uses magic{S,U}{32,64} to calculate the
magic numbers required for the transformations.
* There are many test cases for the transformation proper:
filetests/preopt/div_by_const_non_power_of_2.cton
filetests/preopt/div_by_const_power_of_2.cton
filetests/preopt/rem_by_const_non_power_of_2.cton
filetests/preopt/rem_by_const_power_of_2.cton
filetests/preopt/div_by_const_indirect.cton
preopt.rs also contains a set of tests for magic number generation.
* The main (non-power-of-2) transformation requires instructions that return
the high word of a double-length multiply. For this, instructions umulhi
and smulhi have been added to the core instruction set. These will map
directly to single instructions on most non-intel targets.
* intel does not have an instruction exactly like that. For intel,
instructions x86_umulx and x86_smulx have been added. These map to real
instructions and return both result words. The intel legaliser will
rewrite {s,u}mulhi into x86_{s,u}mulx uses that throw away the lower half
word. Tests:
filetests/isa/intel/legalize-mulhi.cton (new file)
filetests/isa/intel/binary64.cton (added x86_{s,u}mulx encoding tests)
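As a rough illustration of the arithmetic behind these rewrites (this is not the code in preopt.rs), the sketch below shows in plain Rust what the transformed sequences compute for two 32-bit cases. The function names `sdiv_pow2` and `udiv3` are hypothetical. The constant 0xAAAA_AAAB with a total shift of 33 is the well-known magic pair for unsigned division by 3; the general magic-number calculation is not reproduced here.

```rust
/// Signed division by 2^k (1 <= k <= 30) using shifts only.
/// An arithmetic right shift alone rounds toward negative infinity, so a
/// bias of 2^k - 1 is added to negative inputs to round toward zero.
fn sdiv_pow2(x: i32, k: u32) -> i32 {
    debug_assert!(k >= 1 && k <= 30);
    let sign = x >> 31; // all ones if x < 0, else 0
    let bias = ((sign as u32) >> (32 - k)) as i32; // 2^k - 1 if x < 0, else 0
    x.wrapping_add(bias) >> k
}

/// Unsigned division by 3: take the high word of a double-length multiply
/// (the value a `umulhi` instruction returns), then shift right by one.
fn udiv3(x: u32) -> u32 {
    let magic: u64 = 0xAAAA_AAAB; // ceil(2^33 / 3)
    let hi = ((u64::from(x) * magic) >> 32) as u32; // umulhi(x, magic)
    hi >> 1
}
```

For example, `sdiv_pow2(-7, 1)` yields -3, matching Rust's truncating `-7 / 2`, and `udiv3(x)` agrees with `x / 3` for every `u32` input.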
Cretonne clients don't need to know how the register allocator works.
Export the RegDiversions type from the binemit module instead. It is
used by the "test binemit" driver.
Individual compilation passes call the corresponding timing::*()
function and hold on to their timing token while they run. This causes
nested per-pass timing information to be recorded in thread-local
storage.
The --time-passes command line option prints a pass timing report to
stdout.
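A minimal sketch of the token idea, assuming a much simpler design than the real timing module: a token records its start time, and dropping it adds the elapsed time to a thread-local table. The names `PASS_TIMES`, `TimingToken`, `start_pass` and `recorded_passes` are hypothetical, and nesting is not tracked specially here, so an outer pass's time simply includes the time of the passes it calls.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::time::{Duration, Instant};

thread_local! {
    // Accumulated wall-clock time per pass name for the current thread.
    static PASS_TIMES: RefCell<HashMap<&'static str, Duration>> =
        RefCell::new(HashMap::new());
}

/// Hypothetical token held by a pass while it runs.
struct TimingToken {
    pass: &'static str,
    start: Instant,
}

/// Start timing `pass`; the time is recorded when the token is dropped.
fn start_pass(pass: &'static str) -> TimingToken {
    TimingToken { pass, start: Instant::now() }
}

impl Drop for TimingToken {
    fn drop(&mut self) {
        let elapsed = self.start.elapsed();
        PASS_TIMES.with(|times| {
            *times
                .borrow_mut()
                .entry(self.pass)
                .or_insert(Duration::default()) += elapsed;
        });
    }
}

/// Sorted names of all passes recorded so far on this thread.
fn recorded_passes() -> Vec<&'static str> {
    let mut names: Vec<&'static str> =
        PASS_TIMES.with(|times| times.borrow().keys().cloned().collect());
    names.sort();
    names
}
```

A pass would write `let _tt = start_pass("dce");` as its first statement and let the token fall out of scope on return.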
Add new ordered set and map data structures based on B+-trees. These are
not general-purpose data structures like the BTreeSet and BTreeMap types
in the standard library. They are specialized for:
- Keys and values are small `Copy` types, optimized for 32-bit entities.
- Each set or map has a very small footprint, using only 32 bits of
memory when empty.
- Keys are compared using a borrowed comparator object which can provide
context for comparing tiny types that don't contain enough information
to implement `Ord`.
- A whole forest of B-trees can be cleared in constant time without
having to traverse the whole data structure.
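The borrowed-comparator point can be sketched as follows. This is only an illustration: it searches a sorted slice rather than a B+-tree, and the `Comparator` trait, `ByName` type and `search` function are hypothetical names. The keys are bare `u32` indices that cannot implement `Ord` meaningfully on their own; the comparator borrows the table that gives them an order.

```rust
use std::cmp::Ordering;

/// Hypothetical comparator trait: orders small `Copy` keys using
/// context that the keys themselves do not carry.
trait Comparator<K: Copy> {
    fn cmp(&self, a: K, b: K) -> Ordering;
}

/// Orders `u32` keys by the name they index in a borrowed table.
struct ByName<'a> {
    names: &'a [&'a str],
}

impl<'a> Comparator<u32> for ByName<'a> {
    fn cmp(&self, a: u32, b: u32) -> Ordering {
        self.names[a as usize].cmp(self.names[b as usize])
    }
}

/// Binary search in a slice that is sorted according to `comp`.
/// Returns Ok(position) if found, Err(insertion point) otherwise.
fn search<K: Copy, C: Comparator<K>>(sorted: &[K], key: K, comp: &C) -> Result<usize, usize> {
    sorted.binary_search_by(|&probe| comp.cmp(probe, key))
}
```

Because the comparator is passed in at each operation, the container itself only stores the 32-bit keys.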
The register allocator doesn't even try to compile unreachable EBBs, so
any values defined in such blocks won't be assigned registers.
Since the dominator tree already has determined which EBBs are
reachable, we should just eliminate any unreachable blocks instead of
trying to do something with the dead code.
Note that this is not a "dead code elimination" pass which would also
remove individual instructions whose results are not used.
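A minimal sketch of the idea on a toy control flow graph, assuming blocks are numbered from 0 and `succs[b]` lists the successors of block `b`. Unlike the pass described above, which reuses the reachability information already in the dominator tree, this sketch recomputes reachability with a depth-first walk. The function names are hypothetical.

```rust
/// For each block, whether it is reachable from `entry`.
fn reachable_blocks(succs: &[Vec<usize>], entry: usize) -> Vec<bool> {
    let mut seen = vec![false; succs.len()];
    let mut stack = vec![entry];
    while let Some(block) = stack.pop() {
        if seen[block] {
            continue;
        }
        seen[block] = true;
        // Queue successors that have not been visited yet.
        stack.extend(succs[block].iter().cloned().filter(|&s| !seen[s]));
    }
    seen
}

/// Blocks that can be removed because no path from `entry` reaches them.
fn unreachable_blocks(succs: &[Vec<usize>], entry: usize) -> Vec<usize> {
    reachable_blocks(succs, entry)
        .iter()
        .enumerate()
        .filter(|&(_, &reached)| !reached)
        .map(|(block, _)| block)
        .collect()
}
```

Whole blocks are dropped; instructions inside reachable blocks are left alone even if their results are unused.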
Add two new arguments:
- table_index is the WebAssembly table referenced in the indirect call.
- sig_index is the WebAssembly signature index. We still have the SigRef
that was created by make_indirect_sig(), but the WebAssembly signature
index may be needed for detecting type mismatches at runtime.
Change the insertion location to a plain FuncCursor rather than a
FunctionBuilder<Local>. The fact that cretonne-wasm uses FunctionBuilder
should be an implementation detail, and the callbacks don't need to
access WebAssembly locals, so they don't need the extended interface.
Add a FunctionBuilder::cursor() method which creates a FuncCursor for
inserting instructions in the current EBB.
Also add a FuncEnvironment::translate_call() method which allows the
environment to override direct calls the same way as indirect calls.
The new PrimaryMap replaces the primary EntityMap and the PrimaryEntityData
marker trait which was causing some confusion. We now have a clear
division between the two types of maps:
- PrimaryMap is used to assign entity numbers to the primary data for an
entity.
- EntityMap is a secondary mapping adding additional info.
The split also means that the secondary EntityMap can now behave as if
all keys have a default value. This means that we can get rid of the
annoying ensure() and get_or_default() methods that were used everywhere
instead of indexing. Just use normal indexing now; non-existent keys
will return the default value.
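The division of labor can be sketched like this. The type names mirror the text, but the fields, methods and the `Inst` key type below are simplified assumptions, not the crate's actual definitions.

```rust
use std::ops::{Index, IndexMut};

/// Hypothetical entity reference: a typed wrapper around an index.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Inst(u32);

/// Primary map: allocates entity numbers by pushing primary data.
struct PrimaryMap<V> {
    elems: Vec<V>,
}

impl<V> PrimaryMap<V> {
    fn new() -> Self {
        PrimaryMap { elems: Vec::new() }
    }
    /// Store `value` and return the freshly assigned entity number.
    fn push(&mut self, value: V) -> Inst {
        let key = Inst(self.elems.len() as u32);
        self.elems.push(value);
        key
    }
}

impl<V> Index<Inst> for PrimaryMap<V> {
    type Output = V;
    fn index(&self, key: Inst) -> &V {
        &self.elems[key.0 as usize]
    }
}

/// Secondary map: every key behaves as if it held the default value.
struct EntityMap<V: Clone> {
    elems: Vec<V>,
    default: V,
}

impl<V: Clone + Default> EntityMap<V> {
    fn new() -> Self {
        EntityMap { elems: Vec::new(), default: V::default() }
    }
}

impl<V: Clone> Index<Inst> for EntityMap<V> {
    type Output = V;
    fn index(&self, key: Inst) -> &V {
        // Reading a key that was never written yields the default.
        self.elems.get(key.0 as usize).unwrap_or(&self.default)
    }
}

impl<V: Clone> IndexMut<Inst> for EntityMap<V> {
    fn index_mut(&mut self, key: Inst) -> &mut V {
        let i = key.0 as usize;
        if i >= self.elems.len() {
            // Grow on demand, filling the gap with default values.
            self.elems.resize(i + 1, self.default.clone());
        }
        &mut self.elems[i]
    }
}
```

With this shape, `uses[inst] += 1` works on a fresh secondary map without any prior `ensure()` call.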
Add a new cursor module and define an EncCursor data type in it. An
EncCursor is a cursor that inserts instructions with a valid encoding
for the ISA. This is useful for passes generating code after
legalization.
Implement a builder interface via the new InstInserterBase trait such
that the EncCursor builders support with_result().
Use EncCursor in coalescing.rs instead of the layout cursor as a proof
of concept.
This is trying to keep algorithms out of the ir module which deals with
the intermediate representation.
Also give the layout_stack() function a Result return value so it can
report a soft error when the stack frame is too large instead of
asserting. Since local variables can be arbitrarily large, it is easy
enough to overflow the stack with even a small function.
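The soft-error approach can be sketched with checked arithmetic. This is an illustration only: `LayoutError`, `frame_size` and the single uniform alignment are assumptions and do not reflect the real layout_stack() signature or algorithm.

```rust
/// Hypothetical soft error for a frame that does not fit in 32 bits.
#[derive(Debug, PartialEq)]
enum LayoutError {
    FrameTooLarge,
}

/// Total frame size when every slot is padded to `align` bytes.
/// Overflow is reported as an error instead of tripping an assertion.
fn frame_size(slot_sizes: &[u32], align: u32) -> Result<u32, LayoutError> {
    debug_assert!(align.is_power_of_two());
    let mut total: u32 = 0;
    for &size in slot_sizes {
        // Round the slot size up to a multiple of `align`.
        let padded = size
            .checked_add(align - 1)
            .ok_or(LayoutError::FrameTooLarge)?
            & !(align - 1);
        total = total
            .checked_add(padded)
            .ok_or(LayoutError::FrameTooLarge)?;
    }
    Ok(total)
}
```

A caller can then turn `Err(LayoutError::FrameTooLarge)` into a compilation error for that one function rather than aborting the process.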
* API and data structures proposal for the SSA construction module
* Polished API and implemented trivial functions
* API more explicit, Variable now struct parameter
* Sample test written to see how the API could be used
* Implemented local value numbering for SSABuilder
* Implemented SSA within a single Ebb
* Unfinished unoptimized implementation for recursive use and seal
* Working global value numbering
The SSABuilder now creates ebb args and modifies jump instructions accordingly
* Updated doc and improved branch argument modifying.
Removed instructions::branch_arguments and instructions::branch_argument_mut
* SSA building: bugfix, asserts and new test case
Missing a key optimization to remove cycles of Phi
* SSA Building: small changes after code review
Created helper function for seal_block (which now contains sanity checks)
* Optimization: removed useless phis (ebb arguments)
Using pessimistic assumption that when using a non-def variable in an unsealed block we create an ebb argument which is removed when sealing if we detect it as useless
Using aliases to avoid rewriting variables
* Changed the semantics of remove_ebb_arg and turned it into a proper API method
* Adapted ssa branch to changes in the DFG API
* Abandoned SparseMaps for EntityMaps, added named structure for header block data.
* Created skeleton for a Cretonne IL builder frontend
* Frontend IL builder: first draft of implementation with example of instruction methods
* Working basic implementation of the frontend
Missing handling of function arguments and return values
* Interaction with function signature, sample test, more checks
* Test with function verifier, seal and fill sanity check
* Implemented python script to generate ILBuilder methods
* Added support for jump tables and stack slot
* Major API overhaul
* No longer generating rust through Python but implements InstBuilder
* No longer parametrized by user's blocks but use regular `Ebb`
* Reuse of allocated memory via distinction between ILBuilder and FunctionBuilder
* Integrate changes from StackSlot
* Improved error message
* Added support for jump arguments supplied by the user
* Added an ebb_args proxy method needed
* Adapted to Entity_ref split into a new module
* Better error messages and fixed tests
* Added method to change jump destination
* We should be able to add unreachable code
* Added inst_result proxy to frontend
* Import support
* Added optimization for SSA construction:
If multiple predecessors but agree on value don't create EBB argument
* Move unsafe and not write-only funcs apart, improved doc
* Added proxy function for append_ebb_arg
* Support for unreachable code and better layout of the Ebbs
* Fixed a bug yielding an infinite loop in SSA construction
* SSA predecessors lookup code refactoring
* Fixed bug in unreachable definition
* New sanity check and display debug function
* Fixed bug in verifier and added is_pristine method for frontend
* Extended set of characters printable in function names
To be able to print names of functions in test suite
* Fixes and improvements of SSA construction after code review
* Bugfixes for frontend code simplification
* On-the-fly critical edge splitting in case of br_table with jump arguments
* No more dangling undefined values, now attached as EBB args
* Bugfix: only split corresponding edges on demand, not all br_table edges
* Added signature retrieval method
* Bugfix for critical edge splitting not sealing the ebbs it created
* Proper handling of SSA side effects by the frontend
* Code refactoring: moving frontend and SSA to new crate
* Frontend: small changes and bugfixes after code review
* Convert TypeSet fields to sets; Add BitSet<T> type to rust; Encode ValueTypeSets using BitSet; (still need mypy cleanup)
* nits
* cleanup nits
* forgot mypy type annotations
* rustfmt fixes
* Round 1 comments: filer b2, b4; doc comments in python; move bitset in its own toplevel module; Use Into<u32>
* fixes
* Revert comment to appease rustfmt
The EntityRef trait is used by more than just the EntityMap now, so it
should live in its own module.
Also move the entity_impl! macro into the new module so it can be used
for defining new entity references anywhere.
* LICM pass
* Uses loop analysis to detect loop tree
* For each loop (starting with the inner ones), create a pre-header and move loop-invariant instructions there
* An instruction is loop-invariant if none of its arguments is a value defined earlier in the loop
* File tests to check LICM's correctness
* Optimized pre-header creation
If the loop already has a natural pre-header, we use it instead of creating a new one.
The natural pre-header of a loop is the only predecessor of the header it doesn't dominate.
* Implemented in two passes
* First pass discovers the loops headers (they dominate one of their predecessors)
* Second pass traverses the blocks of each loop
* Discovers the loop tree structure
* Offers a new LoopAnalysis data structure queried from outside the module
* Skeleton simple_gvn pass.
* Basic testing infrastructure for simple-gvn.
* Add can_load and can_store flags to instructions.
* Move the replace_values function into the DataFlowGraph.
* Make InstructionData derive from Hash, PartialEq, and Eq.
* Make EntityList's hash and eq functions panic.
* Change Ieee32 and Ieee64 to store u32 and u64, respectively.
Compute exact EBB header offsets and check that branches are in range.
Not implemented yet: Relax branches that are not in range.
Invoke the relax_branches() pass from the 'test binemit' file tests so
they can verify the proper encoding of branch instructions too.
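The two steps can be sketched as a prefix sum followed by a displacement check. The names `block_offsets` and `branch_in_range` are hypothetical, and the sketch assumes the displacement is a signed field measured from the end of the branch instruction, which varies between ISAs.

```rust
/// Byte offset of each block header, given each block's total code size.
fn block_offsets(block_sizes: &[u32]) -> Vec<u32> {
    let mut offsets = Vec::with_capacity(block_sizes.len());
    let mut offset = 0u32;
    for &size in block_sizes {
        offsets.push(offset);
        offset += size;
    }
    offsets
}

/// Whether a branch ending at `branch_end` can reach `target` with a
/// signed displacement field of `disp_bits` bits.
fn branch_in_range(branch_end: u32, target: u32, disp_bits: u32) -> bool {
    let disp = i64::from(target) - i64::from(branch_end);
    let limit = 1i64 << (disp_bits - 1);
    -limit <= disp && disp < limit
}
```

A branch that fails this check is one that a later relaxation step would have to replace with a longer encoding.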
When the CRETONNE_DBG environment variable is set, send debug messages
to a file named cretonne.dbg.*.
The trace facility is only enabled when debug assertions are on.
Add an abi module with code that is probably useful to all ISAs when
implementing this function.
Add a unit() method to RegClassData which can be used to index the
register units in a class.
This will provide main entry points for compiling functions, and it
serves as a place for keeping data structures that should be preserved
between function compilations to reduce allocator thrashing.
So far, Context is just basic scaffolding. More to be added.
This implements the classic Briggs/Torczon sparse set construct.
Adapt it to our existing EntityRef infrastructure so we can use typed
keys instead of just integers like the original paper does.
Also provide a SparseSet<T> type alias which implements a sparse set of
entity references. This is actually closer to what the original paper
describes.
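For reference, a minimal sketch of the sparse set construct over plain `u32` keys, without the EntityRef adaptation. The field and method names are assumptions. The original construct can leave the sparse array uninitialized; safe Rust zero-fills it here, and the membership test stays the same.

```rust
/// Sparse set over the keys 0..universe.
struct SparseSet {
    /// Members, packed with no gaps.
    dense: Vec<u32>,
    /// For a member k, `sparse[k]` is its position in `dense`.
    /// For a non-member the entry is stale and must be validated.
    sparse: Vec<u32>,
}

impl SparseSet {
    fn new(universe: usize) -> Self {
        SparseSet { dense: Vec::new(), sparse: vec![0; universe] }
    }

    fn len(&self) -> usize {
        self.dense.len()
    }

    /// A key is a member only if both arrays point at each other.
    fn contains(&self, key: u32) -> bool {
        let i = self.sparse[key as usize] as usize;
        i < self.dense.len() && self.dense[i] == key
    }

    /// Insert `key`; returns false if it was already present.
    fn insert(&mut self, key: u32) -> bool {
        if self.contains(key) {
            return false;
        }
        self.sparse[key as usize] = self.dense.len() as u32;
        self.dense.push(key);
        true
    }

    /// Remove `key` by moving the last member into its position.
    fn remove(&mut self, key: u32) -> bool {
        if !self.contains(key) {
            return false;
        }
        let i = self.sparse[key as usize] as usize;
        let last = self.dense.pop().unwrap();
        if last != key {
            self.dense[i] = last;
            self.sparse[last as usize] = i as u32;
        }
        true
    }

    /// Constant-time clear: stale `sparse` entries fail validation.
    fn clear(&mut self) {
        self.dense.clear();
    }
}
```

Membership, insertion, removal and clearing are all constant time, and iterating over `dense` visits only the members.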
We will track live ranges separately for each SSA value, rather than per
virtual register like LLVM does.
This is the basis for a register allocator, so place it in a new
regalloc module.
Give these crates each a more standard directory layout with sources in
a 'src' sub-directory and Cargo.toml in the top lib/foo directory.
Add license and description fields to each.
The build script for the cretonne crate now lives in
'lib/cretonne/build.rs' separating it from the normal library sources
under 'lib/cretonne/src'.