262 Commits

Author SHA1 Message Date
Dan Gohman
0b5bb313cb Mark CondCode's functions #[must_use].
It's easy to forget whether they mutate the value in place or return a
new value. Marking them #[must_use] will catch cases where they are used
incorrectly.
2018-03-27 14:16:55 -07:00
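To illustrate the `#[must_use]` change in 0b5bb313cb, here is a minimal sketch. It assumes a hypothetical `Cond` enum and is not Cretonne's actual `CondCode` code; it only shows how the attribute flags a discarded return value.

```rust
/// Hypothetical stand-in for a condition-code type (an assumption for
/// illustration, not Cretonne's real `CondCode` implementations).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Cond {
    Equal,
    NotEqual,
}

impl Cond {
    /// Returns the inverse condition as a new value; `self` is not mutated.
    /// `#[must_use]` makes the compiler warn when the result is discarded.
    #[must_use]
    pub fn inverse(self) -> Self {
        match self {
            Cond::Equal => Cond::NotEqual,
            Cond::NotEqual => Cond::Equal,
        }
    }
}
```

With this attribute, writing `c.inverse();` as a bare statement produces an `unused_must_use` warning, which catches the mistake of assuming the call mutated `c` in place.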
Dan Gohman
a661a8a9bb Factor out common ways to call encode from a dfg or func. 2018-03-27 11:53:03 -07:00
Dan Gohman
685cde98a4 Mark loads from globals aligned and notrap.
Mark loads from globals generated by cton_wasm or by legalization as
`aligned` and `notrap`, since memory for these globals should be
allocated by the runtime environment for that purpose. This reduces
the number of potentially trapping instructions, which can reduce
the amount of metadata required by embedding environments.
2018-03-26 21:21:54 -07:00
Dan Gohman
51eea8b89d Add some comments noting unimplemented features.
While use of these features will trigger an `unimplemented!()`, it is
nice to let users know in advance about features which aren't yet
implemented.
2018-03-23 12:46:23 -07:00
Pat Hickey
03ee007624 Use clippy (#276)
* cton-util: fix some clippy unnecessary pass-by-value warnings

* clippy: ignore too many arguments / cyclomatic complexity in module

Since these functions take arguments coming from the command line, I
don't think this is actually a valid lint; morally, the arguments all
come from one structure.

* cton-util: take care of remaining clippy warnings

* cton-reader: fix all non-suspicious clippy warnings

* cton-reader: disable clippy at site of suspicious lint

* cton-frontend: disable clippy at the site of an invalid lint

* cton-frontend: fix clippy warnings, or ignore benign ones

* clippy: ignore the camelcase word WebAssembly in docs

* cton-wasm: fix clippy complaints or ignore benign ones

* cton-wasm tests: fix clippy complaints

* cretonne: starting point turns off all clippy warnings

* cretonne: clippy fixes, or lower allow() to source of problem

* cretonne: more clippy fixes

* cretonne: fix or disable needless_lifetimes lint

This lint is buggy when the declared lifetime is used for another type
constraint.

* cretonne: fix clippy complaint about Pass::NoPass

* rustfmt

* fix prev minor api changes clippy suggested

* add clippy to test-all

* cton-filetests: clippy fixes

* simplify clippy reporting in test-all

* cretonne: document clippy allows better

* cretonne: fix some more clippy lints

* cretonne: fix clippy lints (mostly doc comments)

* cretonne: allow all needless_lifetimes clippy warnings

remove overrides at the false positives

* rustfmt
2018-03-22 13:10:41 -07:00
Dan Gohman
2b3df1a506 Add use declarations for std features.
Merge the `use` parts of the `no_std` branch. This reduces the diffs
between master and the `no_std` branch, making it easier to maintain.

Most of these changes are derived from patches by @lachlansneff in
https://github.com/Cretonne/cretonne/tree/no_std.
2018-03-20 14:07:05 -07:00
Dan Gohman
492fa1283c Define an "interrupt" trap code.
This is a trap code for interrupting the running code, to allow
timeouts and safepoints to be implemented. It is resumable.
2018-03-18 14:54:05 -07:00
Dan Gohman
cc8d6400f4 Rename builder.rs to inst_builder.rs.
This reflects its purpose, to define the `InstBuilder` trait.
2018-03-14 10:51:09 -07:00
Dan Gohman
3afe85ff17 Auto-generate InstructionData.
The meta description has all the information to generate the `InstructionData`
enum, so generate it rather than having a manually-maintained copy.
2018-03-14 10:51:09 -07:00
Dan Gohman
11eddafef8 Avoid using floating-point values in expand_fcvt_to_sint.
Compute the bound values for expand_fcvt_to_sint using bitwise integer
arithmetic rather than floating-point arithmetic, to avoid relying on
host floating point arithmetic.
2018-03-12 12:50:58 -07:00
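The idea in 11eddafef8 of building bound values with integer arithmetic can be sketched as follows. This is an illustration only: the helper names are hypothetical, and the exact bounds used by `expand_fcvt_to_sint` are not shown in the log. The sketch builds the IEEE 754 bit pattern of a power of two without performing any floating-point operation.

```rust
/// Bit pattern of the f32 value +/-2^exp, computed with integer ops only.
/// Hypothetical helper; valid for normal exponents (0 <= exp <= 127).
pub fn f32_pow2_bits(exp: u32, negative: bool) -> u32 {
    let sign = (negative as u32) << 31;
    // IEEE 754 single precision: exponent bias 127, 23 mantissa bits.
    let biased_exp = (127 + exp) << 23;
    sign | biased_exp
}

/// Bit pattern of the f64 value +/-2^exp, computed with integer ops only.
/// Hypothetical helper; valid for normal exponents (0 <= exp <= 1023).
pub fn f64_pow2_bits(exp: u64, negative: bool) -> u64 {
    let sign = (negative as u64) << 63;
    // IEEE 754 double precision: exponent bias 1023, 52 mantissa bits.
    let biased_exp = (1023 + exp) << 52;
    sign | biased_exp
}
```

For example, `f32_pow2_bits(31, true)` yields the bits of -2^31, the smallest value of a signed 32-bit integer, regardless of how the host rounds or formats floats.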
Dan Gohman
30f8daa9d6 Replace assert! with debug_assert! in production code paths.
This allows the assertions to be disabled in release builds, so that
the code is faster and smaller, at the expense of not performing the
checks. Assertions can be re-enabled in release builds with the
debug-assertions flag in Cargo.toml, as the top-level Cargo.toml
file does.
2018-03-12 12:38:30 -07:00
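As a small illustration of the `assert!` to `debug_assert!` change in 30f8daa9d6, the sketch below uses a hypothetical `midpoint` function (not taken from Cretonne). The precondition is checked in debug builds and compiled out when debug assertions are disabled.

```rust
/// Midpoint of a range, with a precondition that is only checked when
/// debug assertions are enabled. Hypothetical example function.
pub fn midpoint(lo: u32, hi: u32) -> u32 {
    // Compiled out in release builds unless debug assertions are turned on.
    debug_assert!(lo <= hi, "lo ({}) must not exceed hi ({})", lo, hi);
    lo + (hi - lo) / 2
}
```

Setting `debug-assertions = true` under `[profile.release]` in Cargo.toml keeps such checks active in release builds, which is the mechanism the commit message refers to.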
Dan Gohman
13b167770c Include emergency stack slots when laying out the stack.
Emergency stack slots are a new kind of stack slot added relatively
recently. They need to be allocated a stack offset just like explicit
and spill slots.

Also, make StackSlotData's offset field an Option, to catch problems
like this in the future. Previously, the value 0 was used when offsets
weren't assigned yet; however, that made it non-obvious whether the field
meant "not assigned yet" or "assigned the value 0".
2018-03-04 21:34:49 -08:00
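The `Option` offset described in 13b167770c can be sketched like this. The `Slot` type and `layout` function are hypothetical simplifications (no alignment, no slot kinds), not the real `StackSlotData` or layout pass.

```rust
/// Hypothetical, simplified stack slot: `offset` is `None` until layout runs.
pub struct Slot {
    pub size: u32,
    pub offset: Option<i32>,
}

/// Assigns each slot a negative offset below the frame base and returns
/// the total frame size. Alignment is ignored in this sketch.
pub fn layout(slots: &mut [Slot]) -> u32 {
    let mut total = 0u32;
    for slot in slots.iter_mut() {
        total += slot.size;
        slot.offset = Some(-(total as i32));
    }
    total
}
```

Because an unassigned offset is `None` rather than 0, a slot kind that the layout code forgets to visit shows up as a missing value instead of silently aliasing offset 0.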
Dan Gohman
5dc449ec9e Rename "local variables" to "explicit stack slots".
The term "local variables" predated the SSA builder in the front-end
crate, which also provides a way to implement source-language local
variables. The name "explicit stack slot" makes it clear what this
construct is.
2018-02-28 14:04:28 -08:00
Dan Gohman
a5b00b173e Don't renumber entities in the parser.
This makes it easier to debug testcases:
 - the entity numbers in a .cton file match the entity numbers used
   within Cretonne.
 - serializing and deserializing doesn't cause indices to change.

One disadvantage is that if a .cton file uses sparse entity numbers,
deserializing to the in-memory form doesn't compact it. However, the
text format is not intended to be performance-critical, so this isn't
expected to be a big burden.
2018-02-20 17:27:46 -08:00
Jakob Stoklund Olesen
b9b1d0fcd5 Add a trapff instruction.
This is the floating point equivalent of trapif: Trap when a given
condition is in the floating-point flags.

Define Intel encodings comparable to the trapif encodings.
2018-02-20 14:35:41 -08:00
Dan Gohman
21215529fe Simplify the code in make_inst_results_reusing; NFC. 2018-02-20 12:15:23 -08:00
Jakob Stoklund Olesen
11c721934c Add a trapif instruction.
This is a conditional trap controlled by integer CPU flags.
Compare to brif.
2018-02-08 14:40:46 -08:00
Julian Seward
6f8a54b6a5 Adds support for legalizing CLZ, CTZ and POPCOUNT on baseline x86_64 targets.
Changes:

* Adds a new generic instruction, SELECTIF, that does value selection (a la
  conditional move) similarly to existing SELECT, except that it is
  controlled by condition code input and flags-register inputs.

* Adds a new Intel x86_64 variant, 'baseline', that supports SSE2 and
  nothing else.

* Adds new Intel x86_64 instructions BSR and BSF.

* Implements generic CLZ, CTZ and POPCOUNT on x86_64 'baseline' targets
  using the new BSR, BSF and SELECTIF instructions.

* Implements SELECTIF on x86_64 targets using conditional-moves.

* new test filetests/isa/intel/baseline_clz_ctz_popcount.cton
  (for legalization)

* new test filetests/isa/intel/baseline_clz_ctz_popcount_encoding.cton
  (for encoding)

* Allow lib/cretonne/meta/gen_legalizer.py to generate non-snake-caseified
  Rust without rustc complaining.

Fixes #238.
2018-02-06 09:43:00 -08:00
Jakob Stoklund Olesen
e3714ddd10 Add a func.inst_offsets() iterator.
This Function method can be used after the final code layout has been
computed. It returns all the instructions in an EBB along with their
encoded size and offset from the beginning of the function.

This is useful for extracting additional metadata about trapping
instructions and other things that may be needed by a VM.
2018-02-01 17:19:05 -08:00
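The kind of data that e3714ddd10 describes (each instruction with its encoded size and its offset from the start of the function) can be sketched as below. The free function and its tuple layout are assumptions for illustration; the actual `func.inst_offsets()` signature is not shown in the log.

```rust
/// Given the encoded size of each instruction in layout order, returns
/// `(offset, instruction index, size)` triples. Hypothetical sketch; the
/// real method works on a `Function` after code layout is final.
pub fn inst_offsets(sizes: &[u32]) -> Vec<(u32, usize, u32)> {
    let mut offset = 0;
    sizes
        .iter()
        .enumerate()
        .map(|(inst, &size)| {
            let here = offset;
            offset += size; // the next instruction starts after this one
            (here, inst, size)
        })
        .collect()
}
```

A VM could scan such triples to record the code offsets of potentially trapping instructions.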
Jakob Stoklund Olesen
416b21c18d Pairwise virtual register coalescing.
Use a better algorithm for resolving interferences in virtual registers.
This improves code quality by generating far fewer copies on some
complicated functions.

After the initial union-find phase, the check_vreg() function uses a
Budimlic forest to check for interference between the values in the
virtual registers, as before. All the interference-free vregs are done.
Others are passed to synthesize_vreg() which dissolves the vreg and then
attempts to rebuild one or more vregs from the contained values.

The pairwise interference checks use *virtual copies* to make sure that
any future conflicts can be resolved by inserting a copy instruction.
This technique was not present in the old coalescer which caused some
correctness issues.

This coalescing algorithm produces much better code, but it is generally a
bit slower than before. Some of the slowdown is recovered because the
following passes run faster when they have less code to process.

Example 1, the Python interpreter which contains a very large function
with a lot of variables.

Before:
  15.664    0.011  Register allocation
   1.535    1.535  RA liveness analysis
   2.872    1.911  RA coalescing CSSA
   4.436    4.436  RA spilling
   2.610    2.598  RA reloading
   4.200    4.199  RA coloring

After:
   9.795    0.013  Register allocation
   1.372    1.372  RA liveness analysis
   6.231    6.227  RA coalescing CSSA
   0.712    0.712  RA spilling
   0.598    0.598  RA reloading
   0.869    0.869  RA coloring

Coalescing is more than twice as slow, but because of the vastly better
code quality, overall register allocation time is improved by 37%.

Example 2, the clang compiler.

Before:
  57.148    0.035  Register allocation
   9.630    9.630  RA liveness analysis
   7.210    7.169  RA coalescing CSSA
   9.972    9.972  RA spilling
  11.602   11.572  RA reloading
  18.698   18.672  RA coloring

After:
  64.792    0.042  Register allocation
   8.630    8.630  RA liveness analysis
  22.937   22.928  RA coalescing CSSA
   8.684    8.684  RA spilling
   9.559    9.551  RA reloading
  14.939   14.936  RA coloring

Here coalescing is 3x slower, but overall regalloc time only regresses
by 13%.

Most examples are less extreme than these two. They just get better code
at about the same compile time.
2018-01-24 14:54:32 -08:00
Jakob Stoklund Olesen
085e228358 Remove assertions from the hot value_def() function.
The errors caught by these assertions are also detected by the IL
verifier.

Speed up compilation by 13%.
2018-01-22 11:31:21 -08:00
Jakob Stoklund Olesen
91343f208d Fix quadratic behavior in sequence numbering.
The ir::layout module is assigning sequence numbers to all EBBs and
instructions so relative positions can be computed in constant time.
This works a lot like BASIC line numbers where we initially use numbers
10, 20, 30, ... so we can insert new instructions in the middle of the
sequence without renumbering everything.

In some cases where the coalescer is misbehaving and inserting a lot of
copy instructions, we end up having to renumber a larger and larger
number of instructions to make space in the sequence. This causes the
following reload pass to be very slow, spending most of its time
renumbering instructions.

Fix this by putting an upper limit on the number of instructions we're
willing to renumber locally. When the limit is reached, switch to a full
function renumbering with the major stride of 10. This gives us new
elasticity in the sequence numbers.

- Time to compile the Python interpreter in #229 drops from 4826 s to 15.8 s.
- Compile time for the godot benchmark in #226 drops from 1257 s to 75 s.
- The AngryBots1 benchmark does not have the coalescer misbehavior.
  Its compilation time changes from 22.9 s to 23.1 s.

It's worth noting that the sequence numbering is still technically
quadratic with this fix. The system is not designed to handle a large
number of instructions inserted in a single location. It expects a more
even distribution of new instructions.

We still need to fix the coalescer. It should not insert so many copies
in degenerate cases.
2018-01-22 09:42:26 -08:00
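The numbering scheme described in 91343f208d can be sketched as follows. This is a toy model under stated assumptions: the `Seq` type, the minor stride of 2, and the limit of 4 are hypothetical, and the real `ir::layout` code works on linked EBBs and instructions rather than a vector.

```rust
const MAJOR_STRIDE: u32 = 10; // stride from the commit message
const MINOR_STRIDE: u32 = 2; // hypothetical stride for local renumbering
const LOCAL_LIMIT: usize = 4; // hypothetical cap on local renumbering

/// Sequence numbers in layout order; always strictly increasing.
pub struct Seq {
    pub nums: Vec<u32>,
}

impl Seq {
    /// Numbers `n` items 10, 20, 30, ...
    pub fn new(n: usize) -> Self {
        Seq { nums: (1..=n as u32).map(|i| i * MAJOR_STRIDE).collect() }
    }

    /// Inserts a new item at `pos` (requires `pos <= len`). Returns true
    /// if a full renumbering of the whole sequence was needed.
    pub fn insert(&mut self, pos: usize) -> bool {
        let prev = if pos == 0 { 0 } else { self.nums[pos - 1] };
        match self.nums.get(pos).copied() {
            // Appending at the end: just take the next major stride.
            None => {
                self.nums.push(prev + MAJOR_STRIDE);
                return false;
            }
            // There is a free number between the neighbors: use the midpoint.
            Some(next) if next - prev >= 2 => {
                self.nums.insert(pos, prev + (next - prev) / 2);
                return false;
            }
            Some(_) => {}
        }
        // No room: push the following items forward, up to LOCAL_LIMIT.
        let mut last = prev + MINOR_STRIDE;
        self.nums.insert(pos, last);
        let mut i = pos + 1;
        let mut moved = 0;
        while i < self.nums.len() && self.nums[i] <= last {
            if moved == LOCAL_LIMIT {
                self.renumber_all(); // restore elasticity everywhere
                return true;
            }
            last += MINOR_STRIDE;
            self.nums[i] = last;
            i += 1;
            moved += 1;
        }
        false
    }

    fn renumber_all(&mut self) {
        for (i, n) in self.nums.iter_mut().enumerate() {
            *n = (i as u32 + 1) * MAJOR_STRIDE;
        }
    }
}
```

Repeated insertion at one position first consumes the gaps, then triggers bounded local renumbering, and finally falls back to a full renumbering once the cap is hit, which is the behavior the commit message describes.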
Jakob Stoklund Olesen
d1f236b00a Reimplement coalescer following the Budimlic paper.
The old coalescing algorithm had some algorithmic complexity issues when
dealing with large virtual registers. Reimplement to use a proper
union-find algorithm so we only need one pass through the dominator
forests for virtual registers that are interference free.

Virtual registers that do have interference are split and new registers
built.

This pass is about twice as fast as the old one when dealing with
complex virtual registers.
2018-01-16 12:32:04 -08:00
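For readers unfamiliar with the union-find structure mentioned in d1f236b00a, here is a generic sketch. It is a textbook disjoint-set with path halving, not the coalescer's actual data structure, and it omits the dominator-forest interference check entirely.

```rust
/// Generic disjoint-set (union-find) over indices 0..n. In a coalescer,
/// each index would stand for a value and each set for a virtual register
/// (that mapping is an assumption for illustration).
pub struct UnionFind {
    parent: Vec<usize>,
}

impl UnionFind {
    pub fn new(n: usize) -> Self {
        UnionFind { parent: (0..n).collect() }
    }

    /// Finds the representative of `x`, shortening the path as it goes.
    pub fn find(&mut self, mut x: usize) -> usize {
        while self.parent[x] != x {
            let grandparent = self.parent[self.parent[x]];
            self.parent[x] = grandparent; // path halving
            x = grandparent;
        }
        x
    }

    /// Merges the sets containing `a` and `b`; returns the representative.
    pub fn union(&mut self, a: usize, b: usize) -> usize {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra != rb {
            self.parent[rb] = ra;
        }
        ra
    }
}
```

Grouping values this way is cheap, so the expensive interference check only needs to run once per resulting set.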
Jakob Stoklund Olesen
af89006b09 Fix some markdown issues.
Work around some cases where the old markdown parser differs from the
new Pulldown parser for the documentation.
2018-01-08 16:19:16 -08:00
Jakob Stoklund Olesen
362a4bdc4c Add well-known names for runtime library functions.
Add a LibCall type which represents runtime library functions that may
be synthesized by Cretonne from pure instructions.

Add a LibCall variant to ExternalName to represent one of these runtime
functions.
2017-12-07 17:50:22 -08:00
Tyler McMullen
ced39f5186 Fix up adjust_sp_imm instruction.
* Use imm64 rather than offset32
* Add predicate to enforce signed 32-bit limit to imm
* Remove AdjustSpImm format
* Add encoding tests for adjust_sp_imm
* Adjust use of adjust_sp_imm in Intel prologue_epilogue to match
2017-12-05 11:49:12 -08:00
Tyler McMullen
ffab87318e Add adjust_sp_imm instruction. Note: This enables using rsp and rbp as normal registers. Which is... wrong. 2017-12-05 11:49:12 -08:00
Tyler McMullen
cdf70ccb77 Add copy_special instruction. 2017-12-05 11:49:12 -08:00
Tyler McMullen
8ed37e352e Add x86_push and x86_pop instructions. 2017-12-05 11:49:12 -08:00
Pat Hickey
9108725b74 extname: extend testcases to 16 chars
this is sufficient for all filetests to pass without truncating
2017-11-23 14:08:47 -08:00
Pat Hickey
894268233e ir::ExternalName representation and constructors rewritten 2017-11-23 14:08:47 -08:00
Jakob Stoklund Olesen
2e0b931590 Provide an fmt::Debug impl for entity references.
Instead of deriving a Debug impl: Ebb(45), use the Display version for
Debug too: ebb45.

This is more readable, and no information is lost.
2017-11-21 10:01:55 -08:00
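The approach in 2e0b931590 can be sketched with a hypothetical `Ebb` newtype. Cretonne generates its entity reference types differently, so treat this purely as an illustration of forwarding `Debug` to `Display`.

```rust
use std::fmt;

/// Hypothetical entity reference; the wrapped number is the entity index.
#[derive(Clone, Copy, PartialEq, Eq)]
pub struct Ebb(pub u32);

impl fmt::Display for Ebb {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "ebb{}", self.0)
    }
}

/// Reuse the `Display` output instead of the derived `Ebb(45)` form.
impl fmt::Debug for Ebb {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Display::fmt(self, f)
    }
}
```

Both `{}` and `{:?}` then print the same text that appears in the textual IL, so debug output can be matched directly against a .cton file.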
Dan Gohman
ced7a88ecc Use consistent formatting for module-level comments. 2017-11-08 14:49:27 -08:00
Dan Gohman
e213c2654f Fix branch_destination/analyze_branch for BranchInt/BranchFloat. 2017-11-08 10:58:03 -08:00
Dan Gohman
889b06fd16 Replace as casts with type-conversion functions. 2017-11-08 10:48:44 -08:00
Dan Gohman
3ab4349c1b Use Self instead of repeating the type name. 2017-11-08 10:43:11 -08:00
Dan Gohman
d7796cbf25 Suppress an unused-import warning on AsciiExt.
See

https://users.rust-lang.org/t/psa-dealing-with-warning-unused-import-std-ascii-asciiext-in-today-s-nightly/13726

for details.
2017-11-08 10:40:29 -08:00
Dan Gohman
5dda19035d Implement Hash for ExternalName.
This allows users to have HashMaps etc. with ExternalNames as keys.
2017-11-06 11:34:09 -08:00
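What 5dda19035d enables can be sketched as follows, assuming a hypothetical `Name` enum in place of the real `ExternalName` (whose variants are not shown in the log).

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an external name. Deriving `Hash` and `Eq`
/// is what allows it to be used as a `HashMap` key.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum Name {
    User(u32),
    Symbol(String),
}

/// Builds a toy table mapping names to made-up addresses.
pub fn address_table() -> HashMap<Name, u64> {
    let mut table = HashMap::new();
    table.insert(Name::User(0), 0x1000);
    table.insert(Name::Symbol("memcpy".to_string()), 0x2000);
    table
}
```

An embedder could keep a table like this to resolve relocations against external names.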
Dan Gohman
d9743290ea Elide elidable lifetime parameters. 2017-11-06 11:09:56 -08:00
Dan Gohman
acc6d941a3 Combine redundant match arm bodies. 2017-11-03 16:40:51 -07:00
Dan Gohman
5d063eb8bc Merge reloc_func and reloc_globalsym into reloc_external. 2017-10-31 12:26:33 -07:00
Dan Gohman
6fc45b070a Add a new kind of GlobalVar for symbolic addresses.
These addresses will allow referencing C/C++/Rust-style global variables
by name directly.
2017-10-30 13:26:56 -07:00
Dan Gohman
c2665385b1 Rename FunctionName to ExternalName. 2017-10-30 13:26:56 -07:00
Dan Gohman
bbdce7e3e0 Update the comment for 'notrap()'. 2017-10-24 10:39:01 -07:00
Jakob Stoklund Olesen
b3fb41087e Use the term "Function parameter" instead of "argument".
Rename the ArgumentType type to AbiParam since it describes the ABI
characteristics of a parameter or return value, not just the value type.

In Signature, rename members argument_types and return_types to "params"
and "returns". Again, they are not just types.

Fix a couple of lingering references to "EBB arguments".
2017-10-19 17:39:23 -07:00
Jakob Stoklund Olesen
921bcc6c25 Use the term "EBB parameter" everywhere.
Add EBB parameter and EBB argument to the langref glossary to clarify
the distinction between formal EBB parameter values and arguments passed
to branches.

- Replace "ebb_arg" with "ebb_param" in function names that deal with
  EBB parameters.
- Rename the ValueDef variants to Result and Param.
- A bunch of other small langref fixes.

No functional changes intended.
2017-10-19 16:17:09 -07:00
Jakob Stoklund Olesen
57b81a179e Move the CursorBase trait into the cursor module.
Also move the CursorPosition type into the cursor module.

Move layout::cursor into the tests module as LayoutCursor and remove its
ability to insert instructions via the dfg.ins() method. This cursor
type is only used in the layout unit tests now.

The FuncCursor and EncCursor types are the commonly used cursors now.
2017-10-19 12:15:43 -07:00
Jakob Stoklund Olesen
b948de1693 Add a verifier pass for CPU flags.
Only one CPU flags value can be live at a time, and some instructions
clobber the flags.
2017-10-18 15:07:19 -07:00
Jakob Stoklund Olesen
1f98fc491c Add instructions using CPU flags.
Add integer and floating comparison instructions that return CPU flags:
ifcmp, ifcmp_imm, and ffcmp.

Add conditional branch instructions that check CPU flags: brif, brff

Add instructions that check a condition in the CPU flags and return a
b1: trueif, trueff.
2017-10-12 19:12:28 -07:00
Jakob Stoklund Olesen
15461c1e4b Add two new value types: iflags and fflags.
These two value types represent the state of CPU flags after an integer
comparison and a floating point comparison respectively.

Instructions using these types TBD.
2017-10-12 19:05:24 -07:00