Commit Graph

775 Commits

Author SHA1 Message Date
Dan Gohman
db2be8ee01 Verifier: Diagnose an instruction using its own result values. 2018-03-28 21:27:05 -07:00
Dan Gohman
23ab07b54e Support legalizing bconst instructions on x86. 2018-03-28 14:11:16 -07:00
Dan Gohman
79f02e42dd Use movss/movsd rather than movd/movq for floating-point loads and stores.
While there may be CPUs that have a domain crossing penalty here,
this also helps the generated code look more like the code produced
by other compilers.
2018-03-27 11:53:59 -07:00
Dan Gohman
ffe89cdc0a Rename %eflags to %rflags.
EFLAGS is a subregister of RFLAGS. For consistency with GPRs where we
use the 64-bit names to refer to the registers, use the 64-bit name for
RFLAGS as well.
2018-03-27 11:52:57 -07:00
Dan Gohman
685cde98a4 Mark loads from globals aligned and notrap.
Mark loads from globals generated by cton_wasm or by legalization as
`aligned` and `notrap`, since memory for these globals should be
allocated by the runtime environment for that purpose. This reduces
the number of potentially trapping instructions, which can reduce
the amount of metadata required by embedding environments.
2018-03-26 21:21:54 -07:00
Pat Hickey
80d2c5d9bf Implement shift-immediate encodings for x86 (#283)
* add x86 encodings for shift-immediate instructions

implements encodings for ishl_imm, sshr_imm, and ushr_imm. uses 8-bit immediates.

added tests for the encodings to intel/binary64.cton. Canonical versions
come from llvm-mc.

* translate test to use shift-immediates

* shift immediate encodings: use enc_i32_i64

and note why the regular shift encodings can't use it above

* add additional encoding tests for shift immediates

this covers 32 bit mode, and 64 bit operations in 64 bit mode.
2018-03-26 16:48:20 -07:00
Dan Gohman
136d6f5c4b Implement ireduce, sextend, and uextend between i8/i16 and i32/i64. 2018-03-05 15:13:59 -08:00
Dan Gohman
5dc449ec9e Rename "local variables" to "explicit stack slots".
The term "local variables" predated the SSA builder in the front-end
crate, which also provides a way to implement source-language local
variables. The name "explicit stack slot" makes it clear what this
construct is.
2018-02-28 14:04:28 -08:00
Julian Seward
7054f25abb Adds support to transform integer div and rem by constants into cheaper equivalents.
Adds support for transforming integer division and remainder by constants
into sequences that do not involve division instructions.

* div/rem by constant powers of two are turned into right shifts, plus some
  fixups for the signed cases.

* div/rem by constant non-powers of two are turned into double length
  multiplies by a magic constant, plus some fixups involving shifts,
  addition and subtraction, that depend on the constant, the word size and
  the signedness involved.

* The following cases are transformed: div and rem, signed or unsigned, 32
  or 64 bit.  The only un-transformed cases are: unsigned div and rem by
  zero, signed div and rem by zero or -1.

* This is all incorporated within a new transformation pass, "preopt", in
  lib/cretonne/src/preopt.rs.

* In preopt.rs, fn do_preopt() is the main driver.  It is designed to be
  extensible to transformations of other kinds of instructions.  Currently
  it merely uses a helper to identify div/rem transformation candidates and
  another helper to perform the transformation.

* In preopt.rs, fn get_div_info() pattern matches to find candidates, both
  cases where the second arg is an immediate, and cases where the second
  arg is an identifier bound to an immediate at its definition point.

* In preopt.rs, fn do_divrem_transformation() does the heavy lifting of the
  transformation proper.  It in turn uses magic{S,U}{32,64} to calculate the
  magic numbers required for the transformations.

* There are many test cases for the transformation proper:
    filetests/preopt/div_by_const_non_power_of_2.cton
    filetests/preopt/div_by_const_power_of_2.cton
    filetests/preopt/rem_by_const_non_power_of_2.cton
    filetests/preopt/rem_by_const_power_of_2.cton
    filetests/preopt/div_by_const_indirect.cton
  preopt.rs also contains a set of tests for magic number generation.

* The main (non-power-of-2) transformation requires instructions that return
  the high word of a double-length multiply.  For this, instructions umulhi
  and smulhi have been added to the core instruction set.  These will map
  directly to single instructions on most non-intel targets.

* intel does not have an instruction exactly like that.  For intel,
  instructions x86_umulx and x86_smulx have been added.  These map to real
  instructions and return both result words.  The intel legaliser will
  rewrite {s,u}mulhi into x86_{s,u}mulx uses that throw away the lower half
  word.  Tests:
    filetests/isa/intel/legalize-mulhi.cton (new file)
    filetests/isa/intel/binary64.cton (added x86_{s,u}mulx encoding tests)
2018-02-28 11:41:36 -08:00
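The two rewrites described in 7054f25abb can be sketched in plain Rust. This is a minimal illustration, not the code in preopt.rs: the function names are hypothetical, and the magic constant 0xAAAAAAAB with a post-shift of 1 is the standard textbook pair for unsigned 32-bit division by 3, not a value taken from the commit.

```rust
/// Unsigned 32-bit division by 3 without a division instruction.
/// The magic constant and post-shift are assumptions for this sketch.
fn udiv3(x: u32) -> u32 {
    // High word of the 64-bit product, which is what a umulhi-style
    // instruction would return.
    let hi = ((x as u64 * 0xAAAA_AAAB_u64) >> 32) as u32;
    // Fixup shift that depends on the divisor.
    hi >> 1
}

/// Signed 32-bit division by 2^k (1 <= k <= 31), rounding toward zero.
fn sdiv_pow2(x: i32, k: u32) -> i32 {
    // Signed fixup: bias is 2^k - 1 when x is negative and 0 otherwise,
    // so the arithmetic right shift rounds toward zero instead of down.
    let bias = ((x >> 31) as u32) >> (32 - k);
    x.wrapping_add(bias as i32) >> k
}
```

For example, `udiv3(10)` gives 3 and `sdiv_pow2(-7, 1)` gives -3, matching `10 / 3` and `-7 / 2`.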
Dan Gohman
ab9298eafa Make the fst recipe use the deref-safe register class as well. 2018-02-28 10:12:40 -08:00
Dan Gohman
2a26b70854 Update URLs. 2018-02-23 16:16:44 -08:00
Dan Gohman
a55a582d4e Suppress printing of placeholder external functions.
With the change to the parser to preserve indices, it now inserts
placeholders to pad out index spaces as needed. Placeholder functions
use reserved signature indices, so skip them when writing them out,
to avoid writing them out as "sig4294967295".
2018-02-21 13:10:58 -08:00
Jakob Stoklund Olesen
8d388b2218 Fix stack pointer offsets for outgoing arguments.
StackSlotKind::OutgoingArg stack slots have an offset that is relative
to our own stack pointer, while all other stack slot kinds have offsets
that are relative to the caller's stack pointer.

Make sure we generate the right sp-relative offsets for outgoing
arguments too.
2018-02-21 10:34:41 -08:00
Dan Gohman
10dcfcacdb Remove support for entity variables in filecheck.
Now that the parser doesn't renumber indices, there's no need for entity
variables like $v0.
2018-02-20 17:27:46 -08:00
Dan Gohman
a5b00b173e Don't renumber entities in the parser.
This makes it easier to debug testcases:
 - the entity numbers in a .cton file match the entity numbers used
   within Cretonne.
 - serializing and deserializing doesn't cause indices to change.

One disadvantage is that if a .cton file uses sparse entity numbers,
deserializing to the in-memory form doesn't compact it. However, the
text format is not intended to be performance-critical, so this isn't
expected to be a big burden.
2018-02-20 17:27:46 -08:00
Jakob Stoklund Olesen
b9b1d0fcd5 Add a trapff instruction.
This is the floating point equivalent of trapif: Trap when a given
condition is in the floating-point flags.

Define Intel encodings comparable to the trapif encodings.
2018-02-20 14:35:41 -08:00
Jakob Stoklund Olesen
ad896d9790 Add more legalization patterns for *_imm instructions.
When the immediate value is out of range for the legal encodings, convert
these instructions to an iconst followed by their register counterparts.
2018-02-20 10:47:46 -08:00
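The pattern in ad896d9790 can be illustrated with a toy instruction enum. Everything here is hypothetical: the `Inst` type, the `legalize_iadd_imm` helper, the value numbering, and the assumption that the legal encodings accept signed 32-bit immediates.

```rust
/// Toy instruction set for this sketch only; not Cretonne's IR types.
#[derive(Debug, PartialEq)]
enum Inst {
    IaddImm { arg: u32, imm: i64 },
    Iconst { result: u32, imm: i64 },
    Iadd { lhs: u32, rhs: u32 },
}

/// Keep `iadd_imm` when the immediate fits the assumed signed 32-bit
/// range; otherwise materialize it with `iconst` into the value `fresh`
/// and use the register form.
fn legalize_iadd_imm(arg: u32, imm: i64, fresh: u32) -> Vec<Inst> {
    let fits = imm >= i64::from(i32::MIN) && imm <= i64::from(i32::MAX);
    if fits {
        vec![Inst::IaddImm { arg, imm }]
    } else {
        vec![
            Inst::Iconst { result: fresh, imm },
            Inst::Iadd { lhs: arg, rhs: fresh },
        ]
    }
}
```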
Jakob Stoklund Olesen
a9e799debb Add an avoid_div_traps setting.
This enables code generation that never causes a SIGFPE signal to be
raised from a division instruction. Instead, division and remainder
calculations are protected by explicit traps.
2018-02-16 13:10:29 -08:00
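The effect of the setting in a9e799debb can be modelled as explicit checks in front of the division. The `Trap` enum and `sdiv_checked` are hypothetical names for this sketch; the two guarded cases are the inputs for which an x86 signed divide would fault.

```rust
/// Hypothetical trap codes for this sketch.
#[derive(Debug, PartialEq)]
enum Trap {
    DivisionByZero,
    IntegerOverflow,
}

/// Signed division guarded by explicit traps, so the divide itself can
/// never fault.
fn sdiv_checked(x: i32, y: i32) -> Result<i32, Trap> {
    if y == 0 {
        return Err(Trap::DivisionByZero);
    }
    // i32::MIN / -1 is not representable in 32 bits.
    if x == i32::MIN && y == -1 {
        return Err(Trap::IntegerOverflow);
    }
    Ok(x / y)
}
```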
Jakob Stoklund Olesen
3ccc3f4f9b Add a stack_check instruction.
This instruction loads a stack limit from a global variable and compares
it to the stack pointer, trapping if the stack has grown beyond the
limit.

Also add an expand_flags transform group containing legalization patterns
for ISAs with CPU flags.

Fixes #234.
2018-02-13 10:48:06 -08:00
Jakob Stoklund Olesen
60e70da0e6 Add Intel encodings for ifcmp_imm.
The instruction set has variants with 8-bit and 32-bit signed immediate
operands.

Add a TODO to use a TEST instruction for the special case ifcmp_imm x, 0.
2018-02-13 10:38:46 -08:00
Jakob Stoklund Olesen
788a78caf4 Add Intel encodings for ifcmp_sp.
Also generate an Into<RegUnit> implementation for the RU enums.
2018-02-09 14:32:29 -08:00
Jakob Stoklund Olesen
69f70fc61d Add Intel encodings for trapif.
This is implemented as a macro with a conditional jump over a ud2. This
way, we don't have to split up EBBs at every conditional trap.
2018-02-08 15:15:15 -08:00
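The "conditional jump over a ud2" shape from 69f70fc61d can be sketched as a byte sequence. The helper name and the use of the short (rel8) jump form are assumptions; the actual recipe may emit different bytes. The sketch relies only on standard x86 facts: short Jcc is `0x70 | cc` followed by an 8-bit displacement, flipping the low bit of `cc` inverts the condition, and `ud2` is `0F 0B`.

```rust
/// Bytes for "trap if condition `cc` holds", where `cc` is the 4-bit
/// x86 condition code (for example 0x4 = equal, 0x5 = not equal).
fn trapif_bytes(cc: u8) -> [u8; 4] {
    // Jump with the inverted condition so that ud2 is skipped when the
    // trap condition does not hold.
    let jcc_inverted = 0x70 | ((cc & 0xF) ^ 1);
    // Displacement 2 skips the two-byte ud2 that follows.
    [jcc_inverted, 0x02, 0x0F, 0x0B]
}
```

With `cc = 0x4` (equal) this yields `75 02 0F 0B`: a `jne` over `ud2`, all inside one block.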
Jakob Stoklund Olesen
11c721934c Add a trapif instruction.
This is a conditional trap controlled by integer CPU flags.
Compare to brif.
2018-02-08 14:40:46 -08:00
Julian Seward
6f8a54b6a5 Adds support for legalizing CLZ, CTZ and POPCOUNT on baseline x86_64 targets.
Changes:

* Adds a new generic instruction, SELECTIF, that does value selection (a la
  conditional move) similarly to existing SELECT, except that it is
  controlled by condition code input and flags-register inputs.

* Adds a new Intel x86_64 variant, 'baseline', that supports SSE2 and
  nothing else.

* Adds new Intel x86_64 instructions BSR and BSF.

* Implements generic CLZ, CTZ and POPCOUNT on x86_64 'baseline' targets
  using the new BSR, BSF and SELECTIF instructions.

* Implements SELECTIF on x86_64 targets using conditional-moves.

* new test filetests/isa/intel/baseline_clz_ctz_popcount.cton
  (for legalization)

* new test filetests/isa/intel/baseline_clz_ctz_popcount_encoding.cton
  (for encoding)

* Allow lib/cretonne/meta/gen_legalizer.py to generate non-snake-caseified
  Rust without rustc complaining.

Fixes #238.
2018-02-06 09:43:00 -08:00
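The CLZ and CTZ expansions from 6f8a54b6a5 can be modelled roughly as follows. The function names are hypothetical. BSR and BSF are modelled as returning `None` for a zero input, where the hardware leaves the destination undefined and sets ZF, and the `match` / `unwrap_or` stands in for the flag-controlled SELECTIF. POPCOUNT is left out because the commit message does not describe its expansion.

```rust
/// Model of BSR: index of the highest set bit, None for zero input.
fn bsr32(x: u32) -> Option<u32> {
    (0..32u32).rev().find(|&i| x & (1u32 << i) != 0)
}

/// Model of BSF: index of the lowest set bit, None for zero input.
fn bsf32(x: u32) -> Option<u32> {
    (0..32u32).find(|&i| x & (1u32 << i) != 0)
}

/// CLZ built from BSR plus a select on the zero case.
fn clz32(x: u32) -> u32 {
    match bsr32(x) {
        Some(index) => 31 - index,
        None => 32,
    }
}

/// CTZ built from BSF plus a select on the zero case.
fn ctz32(x: u32) -> u32 {
    bsf32(x).unwrap_or(32)
}
```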
Jakob Stoklund Olesen
1bbc529ef9 Improve the variable ordering used by the coloring constraint solver.
The fuzzer bugs #219 and #227 are both cases where the register
allocator coloring pass "runs out of registers". What's really happening
is that the constraint solver failed to find a solution, even when one
existed.

Suppose we have three solver variables:

    v0(GPR, out, global)
    v1(GPR, in)
    v2(GPR, in, out)

And suppose registers %r0 and %r1 are available on both input and output
sides of the instruction, but only %r1 is available for global outputs.
A valid solution would be:

    v0 -> %r1
    v1 -> %r1
    v2 -> %r0

However, the solver would pick registers for the three values in
numerical order because v1 and v2 have the same domain size (=2). This
would assign v1 -> %r0 and then fail to find a free register for v2.

Fix this by prioritizing in+out variables over single-sided variables
even when their domains are equal. This means that v2 gets assigned a
register before v1, and it gets a chance to pick a register that is
still available on both in and out sides.

Also try to avoid depending on value numbers in the solver. These bugs
were hard to reproduce because a test case invariably would have
different value numbers, causing the solver to order its variables
differently and succeed. Throw in the previous solution and original
register assignments as tie breakers which are stable and not dependent
on value numbers.

This is still not a substitute for a proper solver search algorithm that
we will probably have to write eventually.

Fixes #219
Fixes #227
2018-01-19 13:31:26 -08:00
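The ordering problem in 1bbc529ef9 can be reproduced with a small greedy model. This is a sketch under simplifying assumptions, not the real solver: registers are bits in a `u8` mask (bit 0 = %r0, bit 1 = %r1), the `global` restriction is folded into the variable's domain, and `Var`, `solve`, and `prefer_two_sided` are hypothetical names.

```rust
#[derive(Clone, Copy)]
struct Var {
    id: u32,
    /// Bit mask of registers this variable may use.
    domain: u8,
    is_input: bool,
    is_output: bool,
}

/// Greedy assignment: sort the variables, then give each one the lowest
/// register that is still free on every side it occupies.
fn solve(mut vars: Vec<Var>, prefer_two_sided: bool) -> Option<Vec<(u32, u8)>> {
    vars.sort_by_key(|v| {
        let two_sided = v.is_input && v.is_output;
        // Smaller domains first; optionally in+out variables before
        // single-sided ones; value number only as the last tie breaker.
        (v.domain.count_ones(), prefer_two_sided && !two_sided, v.id)
    });
    let (mut used_in, mut used_out) = (0u8, 0u8);
    let mut assignment = Vec::new();
    for v in &vars {
        let mut free = v.domain;
        if v.is_input {
            free &= !used_in;
        }
        if v.is_output {
            free &= !used_out;
        }
        if free == 0 {
            return None; // "ran out of registers"
        }
        let reg = free.trailing_zeros() as u8;
        if v.is_input {
            used_in |= 1u8 << reg;
        }
        if v.is_output {
            used_out |= 1u8 << reg;
        }
        assignment.push((v.id, reg));
    }
    Some(assignment)
}
```

With v0 = (out, domain {%r1}), v1 = (in, domain {%r0, %r1}) and v2 = (in+out, domain {%r0, %r1}), the size-then-number order fails, while preferring two-sided variables finds v0 -> %r1, v2 -> %r0, v1 -> %r1.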
Tyler McMullen
14e39db428 Add filetest for statically out-of-bound heap addresses. 2018-01-18 15:49:10 -08:00
Tyler McMullen
df210bfdea Fix the Intel x64 PIC 'call' test, adding correct addend. 2018-01-18 14:23:00 -08:00
Jakob Stoklund Olesen
1e49431804 Add test case from #216.
The error exposed by this test case no longer happens after the
coalescer was rewritten to follow the Budimlic paper. It's still a
good coalescer test.

Fixes #216 by including the test case.
2018-01-17 16:19:51 -08:00
Jakob Stoklund Olesen
dcad3fa339 Fix coloring bug with combined constraints and global values.
The Intel instruction "v1 = ushr v2, v2" will implicitly fix the output
register for v2 to %rcx because the output is tied to the first input
operand and the second input operand is fixed to %rcx.

Make sure we handle this transitive constraint when checking for
interference with the globally live registers.

Fixes #218
2018-01-17 15:51:08 -08:00
Jakob Stoklund Olesen
0a6500c99a Avoid making solver variables for fixed input constraints.
When the coloring pass sees an instruction with a fixed input register
constraint that is already satisfied, make sure to tell the solver
about it anyway.

There are situations where the solver wants to convert a value to a
solver variable, and we can't allow that if the same value is also used
for a fixed register operand.

Fixes #221.
2018-01-17 15:01:00 -08:00
Jakob Stoklund Olesen
13af22b46b Track register pressure for dead EBB parameters.
The spiller wasn't tracking register pressure correctly for dead EBB
parameters in visit_ebb_header(). Make sure we free any dead EBB
parameters.

Fixes #223
2018-01-17 13:19:08 -08:00
Jakob Stoklund Olesen
d1f236b00a Reimplement coalescer following the Budimlic paper.
The old coalescing algorithm had some algorithmic complexity issues when
dealing with large virtual registers. Reimplement to use a proper
union-find algorithm so we only need one pass through the dominator
forests for virtual registers that are interference free.

Virtual registers that do have interference are split and new registers
built.

This pass is about twice as fast as the old one when dealing with
complex virtual registers.
2018-01-16 12:32:04 -08:00
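For readers unfamiliar with the data structure named in d1f236b00a, a generic union-find with path compression looks roughly like this. It is a textbook sketch with hypothetical names; it does not model virtual registers, dominator forests, or interference checks.

```rust
/// Disjoint-set forest over the elements 0..n.
struct UnionFind {
    parent: Vec<usize>,
}

impl UnionFind {
    fn new(n: usize) -> Self {
        // Every element starts as the root of its own set.
        UnionFind { parent: (0..n).collect() }
    }

    /// Return the representative of the set containing `x`.
    fn find(&mut self, x: usize) -> usize {
        let mut root = x;
        while self.parent[root] != root {
            root = self.parent[root];
        }
        // Path compression: point every node on the path at the root.
        let mut cur = x;
        while self.parent[cur] != root {
            let next = self.parent[cur];
            self.parent[cur] = root;
            cur = next;
        }
        root
    }

    /// Merge the sets containing `a` and `b`.
    fn union(&mut self, a: usize, b: usize) {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra != rb {
            self.parent[rb] = ra;
        }
    }
}
```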
Jakob Stoklund Olesen
cacba1a58f Don't allow EBB parameters to be ghost values.
Ghost instructions and values are supposed to be stored as metadata
alongside the compiled program such that the ghost values can be
computed from the real register/stack values when the program is stopped
for debugging or de-optimization.

If we allow an EBB parameter to be a ghost value, we have no way of
computing its real value using ghost instructions. We would need to know
a complete execution trace of the stopped program to figure out which
values were passed to the ghost parameter.

Instead we require EBB parameters to be real values materialized in
registers or on the stack. We use the regclass_for_abi_type() TargetIsa
callback to determine the initial register class for these parameters.
They can then be spilled later if needed.

Fixes #215.
2018-01-11 16:48:02 -08:00
Jakob Stoklund Olesen
5e094034d4 Fix verifier bug in unreachable code.
We want to disable dominance checks in unreachable code. The
is_reachable() check for EBB parameter values was checking if the
defining EBB was reachable, not the EBB using the value.

This bug showed up in fuzzing and in #213.
2018-01-09 10:47:49 -08:00
Dan Gohman
4f53cc1dad Align IntelGOTPCRel4 with R_X86_64_GOTPCREL.
Add an addend field to reloc_external, and use it to move the
responsibility for accounting for the difference between the end of an
instruction (where the PC is considered to be in PC-relative on intel)
and the beginning of the immediate field into the encoding code.

Specifically, this makes IntelGOTPCRel4 directly correspond to
R_X86_64_GOTPCREL, instead of also carrying an implicit `- 4`.
2017-12-15 16:17:32 -06:00
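The arithmetic behind the explicit addend in 4f53cc1dad can be written out as follows. The function name and the addresses are hypothetical. A PC-relative relocation value has the form target + addend - field address, and on x86 the displacement is applied relative to the end of the instruction, which is 4 bytes past the start of a trailing 4-byte immediate field. That difference is what an addend of -4 accounts for.

```rust
/// Value stored in a 4-byte PC-relative field: target + addend - P,
/// where P is the address of the field itself.
fn pcrel4(target: i64, field_addr: i64, addend: i64) -> i32 {
    (target + addend - field_addr) as i32
}
```

For a field at 0x1003 and a target at 0x2000, `pcrel4(0x2000, 0x1003, -4)` is 0xFF9, and 0x1003 + 4 + 0xFF9 lands back on 0x2000.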
Dan Gohman
76e31cc1ad Rename GotPCRel4 to GOTPCRel4.
This emphasizes that GOT is being used as an abbreviation rather than
the word "got".
2017-12-15 16:17:32 -06:00
Jakob Stoklund Olesen
febe8e0e51 Allow spilling of EBB arguments.
When the spiller needs to make a register available for a conditional
branch instruction, it can be necessary to spill some of the EBB
arguments on the branch instruction. This is ok because EBB argument
values belong to the same virtual register as the corresponding EBB
parameter and we spill the whole virtreg to the same slot.

Also make sure free_regs() can handle values that are killed by the
current instruction *and* spilled.
2017-12-14 13:57:13 -06:00
Jakob Stoklund Olesen
d617d5e0f3 Use a domtree pre-order instead of a CFG RPO for coalescing.
The stack implementation of the Budimlic dominator forest doesn't work
correctly with a CFG RPO. It needs the domtree pre-order.

Also handle EBB pre-order vs inst-level preorder. Manage the stack
according to EBB dominance. Look for a dominating value by searching the
stack. This is different from the Budimlic algorithm because we're
computing the dominator tree pre-order with EBB granularity only.

Fixes #207.
2017-12-13 16:22:01 -06:00
Jakob Stoklund Olesen
a825427786 Avoid reloading spilled EBB arguments.
The coalescer makes sure that matching EBB arguments and parameters are
always in the same virtual registers, and therefore also in the same
stack slot if they are spilled.

This means that the reload pass should never rewrite an EBB argument if
the argument value is spilled. This comes up in cases where the branch
instruction needs the same value in a register:

    brnz v9, ebb3(v9)

If the virtual register containing v9 is spilled, the branch instruction
must be reloaded like:

    v52 = fill v9
    brnz v52, ebb3(v9)

The branch register argument must be rewritten, and the EBB argument
must be referring to the original stack value.

Fixes #208.
2017-12-13 15:22:05 -06:00
Pat Hickey
ed81bc21be filetests: add filetests for intel PIC encodings 2017-12-12 19:29:52 -08:00
Jakob Stoklund Olesen
a888b2a6f1 Dominator tree pre-order.
Add a DominatorTreePreorder data structure which can be initialized for
a DominatorTree and used for queries involving a pre-order of the
dominator tree.

Print out the pre-order and send it through filecheck in "test domtree"
file tests.
2017-12-08 17:43:15 -08:00
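One common use of a dominator tree pre-order is answering dominance queries with two comparisons. The sketch below is an assumption about that general technique, not the layout of DominatorTreePreorder: the `Preorder` type, its fields, and the `idom` input (immediate dominator per node, `None` for the single root, all nodes reachable, at least one node) are hypothetical.

```rust
struct Preorder {
    /// Pre-order number of each node.
    pre_number: Vec<u32>,
    /// Largest pre-order number found in each node's subtree.
    pre_max: Vec<u32>,
}

impl Preorder {
    fn new(idom: &[Option<usize>]) -> Preorder {
        let n = idom.len();
        // Build child lists from the immediate-dominator array.
        let mut children: Vec<Vec<usize>> = vec![Vec::new(); n];
        let mut root = 0;
        for (node, parent) in idom.iter().enumerate() {
            match *parent {
                Some(p) => children[p].push(node),
                None => root = node,
            }
        }
        let mut pre = Preorder { pre_number: vec![0; n], pre_max: vec![0; n] };
        let mut next = 0;
        pre.visit(root, &children, &mut next);
        pre
    }

    fn visit(&mut self, node: usize, children: &[Vec<usize>], next: &mut u32) {
        self.pre_number[node] = *next;
        *next += 1;
        for &child in &children[node] {
            self.visit(child, children, next);
        }
        self.pre_max[node] = *next - 1;
    }

    /// `a` dominates `b` exactly when `b` is numbered inside `a`'s subtree.
    fn dominates(&self, a: usize, b: usize) -> bool {
        self.pre_number[a] <= self.pre_number[b] && self.pre_number[b] <= self.pre_max[a]
    }
}
```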
Jakob Stoklund Olesen
7d5f2f0404 Convert the CFG traversal tests to file tests.
Add a "cfg_postorder:" printout to the "test domtree" file tests and use
that to check the computed CFG post-order instead of doing it manually
with Rust code.
2017-12-08 13:58:18 -08:00
Jakob Stoklund Olesen
a7eb13a151 Expand unknown instructions to runtime library calls. 2017-12-08 10:37:50 -08:00
Jakob Stoklund Olesen
f03729d742 Fix generated code for ISA predicates on encoding recipes.
The generated code had syntax errors and inverted logic.

Add an SSE 4.1 requirement to the floating point rounding instructions.
2017-12-08 10:37:50 -08:00
Tyler McMullen
7988d0c54c Add 8-bit variation of adjust_sp_imm for 32-bit and 64-bit Intel. 2017-12-05 11:49:12 -08:00
Tyler McMullen
5783ea2c9a Account for return address when reserving stack space for CSRs. 2017-12-05 11:49:12 -08:00
Tyler McMullen
a75248d2cf Move the initial stack pointer adjustment to after the CSR pushes. 2017-12-05 11:49:12 -08:00
Tyler McMullen
ebcbd54f61 Add 'compile' test and confirm the pro/epilogue is added. Fix regression this revealed. 2017-12-05 11:49:12 -08:00
Tyler McMullen
ced39f5186 Fix up adjust_sp_imm instruction.
* Use imm64 rather than offset32
* Add predicate to enforce signed 32-bit limit to imm
* Remove AdjustSpImm format
* Add encoding tests for adjust_sp_imm
* Adjust use of adjust_sp_imm in Intel prologue_epilogue to match
2017-12-05 11:49:12 -08:00
Tyler McMullen
1a11c351b5 Add tests and documentation for x86_(push|pop). Fix up encoding issues revealed by tests. 2017-12-05 11:49:12 -08:00