Commit Graph

1293 Commits

Author SHA1 Message Date
Jakob Stoklund Olesen
ec746c3359 Add a publish-all.sh script.
Set identical version numbers on all cretonne-* crates and print "cargo
publish" commands.

Update all crates to version 0.1.0.
2018-02-23 12:51:14 -08:00
Benjamin Bouvier
f02c8fd1ff Fixes #244: Prints the generated code size and wasm bytecode size in wasm command; 2018-02-23 08:46:09 -08:00
Dan Gohman
c7655c4928 Add minimal README.md files to published crates.
This will put descriptions on the packages' crates.io pages.
2018-02-21 20:50:03 -08:00
Dan Gohman
a55a582d4e Suppress printing of placeholder external functions.
With the change to the parser to preserve indices, it now inserts
placeholders to pad out index spaces as needed. Placeholder functions
use reserved signature indices, so skip them when writing them out,
to avoid writing them out as "sig4294967295".
2018-02-21 13:10:58 -08:00
Dan Gohman
20500913ad Change tabs to spaces, for consistency with rustfmt.
This code is in a macro, which is presumably why rustfmt itself doesn't
reformat it.
2018-02-21 12:32:26 -08:00
Jakob Stoklund Olesen
043d80f7e1 Make the bitset module private.
This is just an implementation detail.
2018-02-21 12:14:50 -08:00
Jakob Stoklund Olesen
eae8261b07 Make the regalloc module private.
Cretonne clients don't need to know how the register allocator works.

Export the RegDiversions type from the binemit module instead. It is
used by the "test binemit" driver.
2018-02-21 12:11:50 -08:00
Jakob Stoklund Olesen
2f58c371bc Make specific ISA sub-modules private.
We don't want ISA-specific details exposed in the public Cretonne APIs.
2018-02-21 12:06:58 -08:00
Jakob Stoklund Olesen
8d388b2218 Fix stack pointer offsets for outgoing arguments.
StackSlotKind::OutgoingArg stack slots have an offset that is relative
to our own stack pointer, while all other stack slot kinds have offsets
that are relative to the caller's stack pointer.

Make sure we generate the right sp-relative offsets for outgoing
arguments too.
2018-02-21 10:34:41 -08:00
Dan Gohman
69cc6c8af3 Clean up more comments referring to "source numbers". 2018-02-20 17:27:46 -08:00
Dan Gohman
a5b00b173e Don't renumber entities in the parser.
This makes it easier to debug testcases:
 - the entity numbers in a .cton file match the entity numbers used
   within Cretonne.
 - serializing and deserializing doesn't cause indices to change.

One disadvantage is that if a .cton file uses sparse entity numbers,
deserializing to the in-memory form doesn't compact it. However, the
text format is not intended to be performance-critical, so this isn't
expected to be a big burden.
2018-02-20 17:27:46 -08:00
Jakob Stoklund Olesen
c846ec1626 Catch NaN explicitly in expand_fcvt_to_uint().
When the input is a NaN, we need to generate a different trap code, so
use the new trapff instruction to generate such a trap after the first
floating point comparison.
2018-02-20 14:44:57 -08:00
Jakob Stoklund Olesen
b9b1d0fcd5 Add a trapff instruction.
This is the floating point equivalent of trapif: Trap when a given
condition is in the floating-point flags.

Define Intel encodings comparable to the trapif encodings.
2018-02-20 14:35:41 -08:00
Jakob Stoklund Olesen
1e56d44465 Remember the positive numbers in expand_fcvt_to_sint.
We can get an INT_MIN result when converting a positive number that is
too large to fit in an integer too.
2018-02-20 14:01:13 -08:00
Dan Gohman
21215529fe Simplify the code in make_inst_results_reusing; NFC. 2018-02-20 12:15:23 -08:00
Jakob Stoklund Olesen
ad896d9790 Add more legalization patterns for *_imm instructions.
When the imediate value is out of range for the legal encodings, convert
these instructions to an iconst followed by their register counterparts.
2018-02-20 10:47:46 -08:00
Jakob Stoklund Olesen
a6ab90f205 Legalize irsub_imm. 2018-02-16 15:50:36 -08:00
Jakob Stoklund Olesen
a9e799debb Add an avoid_div_traps setting.
This enables code generation that never causes a SIGFPE signal to be
raised from a division instruction. Instead, division and remainder
calculations are protected by explicit traps.
2018-02-16 13:10:29 -08:00
Pat Hickey
ed24320eda gen_settings: dont try to display a Preset descriptor in Flags (#241)
* gen_settings: dont try to display a Preset descriptor in Flags

Trying to display a preset doesnt make sense, and before this commit it
does not display anything meaningful - the printout just says e.g.
"haswell =\n".

The offset byte a preset descriptor isnt a valid offset into the
flag bytes, it is actually an offset into the PRESETS table. It will
cause a panic when the offset is out of bounds for the flag bytes,
which happens in the intel isa as of this commit.

* intel settings: test that display impl doesnt panic
2018-02-14 11:51:40 -08:00
Dan Gohman
42e1616b82 Update to wasmparser 0.14.1. 2018-02-13 20:14:54 -08:00
Jakob Stoklund Olesen
3ccc3f4f9b Add a stack_check instruction.
This instruction loads a stack limit from a global variable and compares
it to the stack pointer, trapping if the stack has grown beyond the
limit.

Also add a expand_flags transform group containing legalization patterns
for ISAs with CPU flags.

Fixes #234.
2018-02-13 10:48:06 -08:00
Jakob Stoklund Olesen
a73fcb2691 Pass an ISA argument to legalization functions.
This lets them look at the ISA flags.
2018-02-13 10:42:00 -08:00
Jakob Stoklund Olesen
60e70da0e6 Add Intel encodings for ifcmp_imm.
The instruction set has variants with 8-bit and 32-bit signed immediate
operands.

Add a TODO to use a TEST instruction for the special case ifcmp_imm x, 0.
2018-02-13 10:38:46 -08:00
Jakob Stoklund Olesen
788a78caf4 Add Intel encodings for ifcmp_sp.
Also generate an Into<RegUnit> implementation for the RU enums.
2018-02-09 14:32:29 -08:00
Jakob Stoklund Olesen
73c4c356c9 Add an ifcmp_sp instruction.
This will be used to implement the stack_check macro.
2018-02-09 13:59:49 -08:00
Jakob Stoklund Olesen
69f70fc61d Add Intel encodings for trapif.
This is implemented as a macro with a conditional jump over a ud2. This
way, we don't have to split up EBBs at every conditional trap.
2018-02-08 15:15:15 -08:00
Jakob Stoklund Olesen
11c721934c Add a trapif instruction.
This is a conditional trap controlled by integer CPU flags.
Compare to brif.
2018-02-08 14:40:46 -08:00
Julian Seward
6f8a54b6a5 Adds support for legalizing CLZ, CTZ and POPCOUNT on baseline x86_64 targets.
Changes:

* Adds a new generic instruction, SELECTIF, that does value selection (a la
  conditional move) similarly to existing SELECT, except that it is
  controlled by condition code input and flags-register inputs.

* Adds a new Intel x86_64 variant, 'baseline', that supports SSE2 and
  nothing else.

* Adds new Intel x86_64 instructions BSR and BSF.

* Implements generic CLZ, CTZ and POPCOUNT on x86_64 'baseline' targets
  using the new BSR, BSF and SELECTIF instructions.

* Implements SELECTIF on x86_64 targets using conditional-moves.

* new test filetests/isa/intel/baseline_clz_ctz_popcount.cton
  (for legalization)

* new test filetests/isa/intel/baseline_clz_ctz_popcount_encoding.cton
  (for encoding)

* Allow lib/cretonne/meta/gen_legalizer.py to generate non-snake-caseified
  Rust without rustc complaining.

Fixes #238.
2018-02-06 09:43:00 -08:00
Jakob Stoklund Olesen
e3714ddd10 Add a func.inst_offsets() iterator.
This Function method can be used after the final code layout has been
computed. It returns all the instructions in an EBB along with their
encoded size and offset from the beginning of the function.

This is useful for extracting additional metadata about trapping
instructions and other things that may be needed by a VM.
2018-02-01 17:19:05 -08:00
Jakob Stoklund Olesen
429027e2f2 Stack alignment is 16 bytes on x86_64.
Make sure that the stack frame size is always a multiple of 16 as
specified by the x86_64 ABI.

Heads up @pchickey
2018-01-31 16:19:35 -08:00
Tyler McMullen
ff16583c59 Remove RSP from deref safe register class as well. 2018-01-29 14:18:08 -08:00
Pat Hickey
099b959d57 TargetIsa implies a Display of shared and isa-specific flags 2018-01-29 13:53:08 -08:00
Tyler McMullen
21f0fc39ad Further restrict Intel register classes to prevent incorrect encoding of R12 derefs. 2018-01-29 13:42:11 -08:00
Pat Hickey
3f69581d03 cretonne::Context: add for_function constructor 2018-01-25 18:14:57 -08:00
Jakob Stoklund Olesen
d56ce9e8bf Optimize DomForest::push_node().
The dominator tree pre-order is defined at the EBB granularity, but we
are looking for dominating nodes at the instruction level. This means
that we sometimes need to look higher up the DomForest stack for a
dominating node, using DominatorTree::dominates() instead of
DominatorTreePreorder::dominates().

Each dominance check involves the domtree.last_dominator() function
scanning up the dominator tree, starting from the new node that was
pushed. We can eliminate this duplicate work by exposing the
last_dominator() function to push_node().

As we are searching through nodes on the stack, maintain a last_dom
program point representing the previous return value from
last_dominator(). This way, we're only scanning the dominator tree once.
2018-01-24 15:01:58 -08:00
Jakob Stoklund Olesen
416b21c18d Pairwise virtual register coalescing.
Use a better algorithm for resolving interferences in virtual registers.
This improves code quality by generating much fewer copies on some
complicated functions.

After the initial union-find phase, the check_vreg() function uses a
Budimlic forest to check for interference between the values in the
virtual registers, as before. All the interference-free vregs are done.
Others are passed to synthesize_vreg() which dissolves the vreg and then
attempts to rebuild one or more vregs from the contained values.

The pairwise interference checks use *virtual copies* to make sure that
any future conflicts can be resolved by inserting a copy instruction.
This technique was not present in the old coalescer which caused some
correctness issues.

This coalescing algorithm makes much better code, and it is generally a
bit slower than before. Some of the slowdown is made up by the following
passes being faster because they have to process less code.

Example 1, the Python interpreter which contains a very large function
with a lot of variables.

Before:
  15.664    0.011  Register allocation
   1.535    1.535  RA liveness analysis
   2.872    1.911  RA coalescing CSSA
   4.436    4.436  RA spilling
   2.610    2.598  RA reloading
   4.200    4.199  RA coloring

After:
   9.795    0.013  Register allocation
   1.372    1.372  RA liveness analysis
   6.231    6.227  RA coalescing CSSA
   0.712    0.712  RA spilling
   0.598    0.598  RA reloading
   0.869    0.869  RA coloring

Coalescing is more than twice as slow, but because of the vastly better
code quality, overall register allocation time is improved by 37%.

Example 2, the clang compiler.

Before:
  57.148    0.035  Register allocation
   9.630    9.630  RA liveness analysis
   7.210    7.169  RA coalescing CSSA
   9.972    9.972  RA spilling
  11.602   11.572  RA reloading
  18.698   18.672  RA coloring

After:
  64.792    0.042  Register allocation
   8.630    8.630  RA liveness analysis
  22.937   22.928  RA coalescing CSSA
   8.684    8.684  RA spilling
   9.559    9.551  RA reloading
  14.939   14.936  RA coloring

Here coalescing is 3x slower, but overall regalloc time only regresses
by 13%.

Most examples are less extreme than these two. They just get better code
at about the same compile time.
2018-01-24 14:54:32 -08:00
Jakob Stoklund Olesen
b124eaf77d Add a preorder pre_cmp_def() function.
This provides a total ordering of values according to when their
definition appears in the dominator tree pre-order.
2018-01-23 14:55:21 -08:00
Jakob Stoklund Olesen
d2e786a78a Resurrect the VirtRegs::unify() function.
The coalescer needs to be able to merge virtual registers with this
function.

Updated to recycle all virtual registers.
2018-01-23 14:21:36 -08:00
Jakob Stoklund Olesen
085e228358 Remove assertions from the hot value_def() function.
The errors caught by these assertions are also detected by the IL
verifier.

Speed up compilation by 13%.
2018-01-22 11:31:21 -08:00
Jakob Stoklund Olesen
c6bb7afa07 Dampen quadratic behavior in check_cssa().
Use a dominator tree pre-order to speed up the dominance checks and only
check for interference with the nearest dominating predecessor in a
virtual register.

This makes the CSSA verification about 2x as fast.
2018-01-22 10:48:14 -08:00
Jakob Stoklund Olesen
91343f208d Fix quadratic behavior in sequence numbering.
The ir::layout module is assigning sequence numbers to all EBBs and
instructions so relative positions can be computed in constant time.
This works a lot like BASIC line numbers where we initially use numbers
10, 20, 30, ... so we can insert new instructions in the middle of the
sequence without renumbering everything.

In some cases where the coalescer is misbehaving and inserting a lot of
copy instructions, we end up having to renumber a larger and larger
number of instructions to make space in the sequence. This causes the
following reload pass to be very slow, spending most of its time
renumbering instructions.

Fix this by putting an upper limit on the number of instructions we're
willing to renumber locally. When the limit is reached, switch to a full
function renumbering with the major stride of 10. This gives us new
elasticity in the sequence numbers.

- Time to compile the Python interpreter in #229 drops from 4826 s -> 15.8 s.
- The godot benchmark in #226 drops from 1257 s -> 75 s.
- The AngryBots1 benchmark does not have the coalescer misbehavior.
  Its compilation time changes 22.9 s -> 23.1 s.

It's worth noting that the sequence numbering is still technically
quadratic with this fix. The system is not designed to handle a large
number of instructions inserted in a single location. It expects a more
even distribution of new instructions.

We still need to fix the coalescer. It should not insert so many copies
in degenerate cases.
2018-01-22 09:42:26 -08:00
Jakob Stoklund Olesen
10845995a8 Make room for 4-digit pass timings.
(sigh)
2018-01-22 07:51:25 -08:00
Jakob Stoklund Olesen
1bbc529ef9 Improve the variable ordering used by the coloring constraint solver.
The fuzzer bugs #219 and #227 are both cases where the register
allocator coloring pass "runs out of registers". What's really happening
is that the constraint solver failed to find a solution, even when one
existed.

Suppose we have three solver variables:

    v0(GPR, out, global)
    v1(GPR, in)
    v2(GPR, in, out)

And suppose registers %r0 and %r1 are available on both input and output
sides of the instruction, but only %r1 is available for global outputs.
A valid solution would be:

    v0 -> %r1
    v1 -> %r1
    v2 -> %r0

However, the solver would pick registers for the three values in
numerical order because v1 and v2 have the same domain size (=2). This
would assign v1 -> %r0 and then fail to find a free register for v2.

Fix this by prioritizing in+out variables over single-sided variables
even when their domains are equal. This means the v2 gets assigned a
register before v1, and it gets a chance to pick a register that is
still available on both in and out sides.

Also try to avoid depending on value numbers in the solver. These bugs
were hard to reproduce because a test case invariably would have
different value numbers, causing the solver to order its variables
differently and succeed. Throw in the previous solution and original
register assignments as tie breakers which are stable and not dependent
on value numbers.

This is still not a substitute for a proper solver search algorithm that
we will probably have to write eventually.

Fixes #219
Fixes #227
2018-01-19 13:31:26 -08:00
Tyler McMullen
7826fce44f On finding an static OOB heap addr, split the Ebb and recompute the CFG. 2018-01-18 15:49:10 -08:00
Tyler McMullen
850896f05e The addend for a PLTRel4 reloc should be -4. 2018-01-18 14:23:00 -08:00
Jakob Stoklund Olesen
dcad3fa339 Fix coloring bug with combined constraints and global values.
The Intel instruction "v1 = ushr v2, v2" will implicitly fix the output
register for v2 to %rcx because the output is tied to the first input
operand and the second input operand is fixed to %rcx.

Make sure we handle this transitive constraint when checking for
interference with the globally live registers.

Fixes #218
2018-01-17 15:51:08 -08:00
Jakob Stoklund Olesen
0a6500c99a Avoid making solver variables for fixed input constraints.
When the coloring pass sees an instruction with a fixed input register
constraint that is already satisfied, make sure to tell the solver
about it anyway.

There are situations where the solver wants to convert a value to a
solver variable, and we can't allow that if the same value is also used
for a fixed register operand.

Fixes #221.
2018-01-17 15:01:00 -08:00
Jakob Stoklund Olesen
13af22b46b Track register pressure for dead EBB parameters.
The spiller wasn't tracking register pressure correctly for dead EBB
parameters in visit_ebb_header(). Make sure we free any dead EBB
parameters.

Fixes #223
2018-01-17 13:19:08 -08:00
Jakob Stoklund Olesen
5463fde203 Don't print unused pressure classes. 2018-01-17 13:19:08 -08:00
Tyler McMullen
eb85aa833c Illegalize rbp/r13 for zero-offset loads on Intel x64 (#225)
* Switch RegClass to a bitmap implementation.

* Add special RegClass to remove r13 from 'ld' recipe.

* Use MASK_LEN constant instead of magic number.

* Enforce that RegClass slicing is only valid on contiguous classes.

* Use Optional[int] for RegClass optional bitmask parameter.

* Add comment explaining use of Intel ISA's GPR_NORIP register class.
2018-01-16 20:05:53 -08:00