424 Commits

Author SHA1 Message Date
Dan Gohman
b523b69c16 Make bash function syntax consistent with other scripts in the repo. 2018-03-30 13:38:30 -07:00
Dan Gohman
6606b88136 Optimize immediates and compare and branch sequences (#286)
* Add a pre-opt optimization to change constants into immediates.

This converts 'iadd' + 'iconst' into 'iadd_imm', and so on.

* Optimize away redundant `bint` instructions.

Cretonne has a concept of "Testable" values, which can be either boolean
or integer. When an instruction needing a "Testable" value receives
the result of a `bint`, converting boolean to integer, eliminate the
`bint`, as it's redundant.

* Postopt: Optimize using CPU flags.

This introduces a post-legalization optimization pass which converts
compare+branch sequences to use flags values on CPUs which support it.

* Define a form of x86's `urm` that doesn't clobber FLAGS.

movzbl/movsbl/etc. don't clobber FLAGS; define a form of the `urm`
recipe that represents this.

* Implement a DCE pass.

This pass deletes instructions with no side effects and no results that
are used.

* Clarify ambiguity about "32-bit" and "64-bit" in comments.

* Add x86 encodings for icmp_imm.

* Add a testcase for postopt CPU flags optimization.

This covers the basic functionality of transforming compare+branch
sequences to use CPU flags.

* Pattern-match irsub_imm in preopt.
2018-03-30 12:30:07 -07:00
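The DCE pass described in commit 6606b88136 can be illustrated with a minimal sketch. It assumes a toy straight-line IR in which each instruction defines at most one value; the `Inst` type and `dce` function are hypothetical stand-ins, not Cretonne's actual data structures or pass.

```rust
use std::collections::HashSet;

/// Toy instruction: defines at most one value and reads some values.
/// Hypothetical stand-in, not Cretonne's `InstructionData`.
#[derive(Debug, Clone, PartialEq)]
struct Inst {
    result: Option<u32>,
    args: Vec<u32>,
    has_side_effects: bool,
}

/// Remove instructions that have no side effects and whose result is unused.
/// Scans backwards so that every use is seen before its definition.
fn dce(insts: &[Inst]) -> Vec<Inst> {
    let mut live: HashSet<u32> = HashSet::new();
    let mut kept = Vec::new();
    for inst in insts.iter().rev() {
        let used = inst.result.map_or(false, |r| live.contains(&r));
        if inst.has_side_effects || used {
            // A kept instruction makes all of its arguments live.
            live.extend(inst.args.iter().copied());
            kept.push(inst.clone());
        }
    }
    kept.reverse();
    kept
}
```

Because the scan runs backwards, a chain of instructions that only feed a dead value is removed in a single pass.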
Tyler McMullen
951ff11f85 [WIP] Add a Trap sink to code generation (#279)
* First draft of TrapSink implementation.

* Add trap sink calls to 'trapif' and 'trapff' recipes.

* Add SourceLoc to trap sink calls, and add trap sink calls to all loads and stores.

* Add IntegerDivisionByZero trap to div recipe.

* Only emit load/store traps if 'notrap' flag is not set on the instruction.

* Update filetest machinery to add new trap sink functionality.

* Update filetests to include traps in output.

* Add a few more trap outputs to filetests.

* Add trap output to CLI tool.
2018-03-28 22:48:03 -07:00
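One way to picture the trap sink from commit 951ff11f85 is the sketch below. The trait shape, the `RecordingSink` type, the `emit_load`/`emit_div` helpers, and the `HeapOutOfBounds` code are assumptions made for illustration; only the `IntegerDivisionByZero` name and the `notrap` condition come from the log.

```rust
/// Hypothetical trap codes; only the division-by-zero name appears in the log.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TrapCode {
    HeapOutOfBounds,
    IntegerDivisionByZero,
}

/// Hypothetical sink interface: the emitter reports each potentially
/// trapping instruction with its code offset and source location.
trait TrapSink {
    fn trap(&mut self, offset: u32, srcloc: u32, code: TrapCode);
}

/// A sink that simply records everything it is told.
#[derive(Default)]
struct RecordingSink {
    traps: Vec<(u32, u32, TrapCode)>,
}

impl TrapSink for RecordingSink {
    fn trap(&mut self, offset: u32, srcloc: u32, code: TrapCode) {
        self.traps.push((offset, srcloc, code));
    }
}

/// Model of emitting a load: report a trap site unless `notrap` is set.
fn emit_load(sink: &mut dyn TrapSink, offset: u32, srcloc: u32, notrap: bool) {
    if !notrap {
        sink.trap(offset, srcloc, TrapCode::HeapOutOfBounds);
    }
    // The machine code bytes for the load would be written here.
}

/// Model of emitting a division: always a potential trap site.
fn emit_div(sink: &mut dyn TrapSink, offset: u32, srcloc: u32) {
    sink.trap(offset, srcloc, TrapCode::IntegerDivisionByZero);
    // The machine code bytes for the division would be written here.
}
```

An embedder could use the recorded offsets to map a faulting program counter back to a source location and trap reason.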
Dan Gohman
57cd69d8b4 Say "IR" instead of "IL".
While the specifics of these terms are debatable, "IR" generally
isn't incorrect in this context, and is the more widely recognized
term at this time.

See also the discussion in #267.

Fixes #267.
2018-03-28 22:07:26 -07:00
Dan Gohman
23ab07b54e Support legalizing bconst instructions on x86. 2018-03-28 14:11:16 -07:00
Dan Gohman
c3f044ff46 Note that the "widen" legalization group is not yet implemented. 2018-03-28 13:44:54 -07:00
Dan Gohman
8d560cf8ba Fix Rust syntax in generated code.
This code is not currently emitted, though it will be when there
are more legalization rules.
2018-03-28 12:58:31 -07:00
Dan Gohman
79f02e42dd Use movss/movsd rather than movd/movq for floating-point loads and stores.
While there may be CPUs that have a domain crossing penalty here,
this also helps the generated code look more like the code produced
by other compilers.
2018-03-27 11:53:59 -07:00
Dan Gohman
ffe89cdc0a Rename %eflags to %rflags.
EFLAGS is a subregister of RFLAGS. For consistency with GPRs where we
use the 64-bit names to refer to the registers, use the 64-bit name for
RFLAGS as well.
2018-03-27 11:52:57 -07:00
Dan Gohman
685cde98a4 Mark loads from globals aligned and notrap.
Mark loads from globals generated by cton_wasm or by legalization as
`aligned` and `notrap`, since memory for these globals should be
allocated by the runtime environment for that purpose. This reduces
the number of potentially trapping instructions, which can reduce
the amount of metadata required by embedding environments.
2018-03-26 21:21:54 -07:00
Pat Hickey
80d2c5d9bf Implement shift-immediate encodings for x86 (#283)
* add x86 encodings for shift-immediate instructions

implements encodings for ishl_imm, sshr_imm, and ushr_imm. uses 8-bit immediates.

added tests for the encodings to intel/binary64.cton. Canonical versions
come from llvm-mc.

* translate test to use shift-immediates

* shift immediate encodings: use enc_i32_i64

and note why the regular shift encodings can't use it above

* add additional encoding tests for shift immediates

this covers 32 bit mode, and 64 bit operations in 64 bit mode.
2018-03-26 16:48:20 -07:00
Dan Gohman
3ec7918ba1 Tidy up a redundant import. 2018-03-22 13:18:25 -07:00
Dan Gohman
e787357520 Wrap build.py's contents in a main function.
This prevents its local variables from becoming global variables.
2018-03-22 13:18:25 -07:00
Pat Hickey
03ee007624 Use clippy (#276)
* cton-util: fix some clippy unnecessary pass-by-value warnings

* clippy: ignore too many arguments / cyclomatic complexity in module

since these functions take args coming from the command line, I don't
think this is actually a valid lint; morally, the arguments are all
from one structure

* cton-util: take care of remaining clippy warnings

* cton-reader: fix all non-suspicious clippy warnings

* cton-reader: disable clippy at site of suspicious lint

* cton-frontend: disable clippy at the site of an invalid lint

* cton-frontend: fix clippy warnings, or ignore benign ones

* clippy: ignore the camelcase word WebAssembly in docs

* cton-wasm: fix clippy complaints or ignore benign ones

* cton-wasm tests: fix clippy complaints

* cretonne: starting point turns off all clippy warnings

* cretonne: clippy fixes, or lower allow() to source of problem

* cretonne: more clippy fixes

* cretonne: fix or disable needless_lifetimes lint

this linter is buggy when the declared lifetime is used for another type
constraint.

* cretonne: fix clippy complaint about Pass::NoPass

* rustfmt

* fix prev minor api changes clippy suggested

* add clippy to test-all

* cton-filetests: clippy fixes

* simplify clippy reporting in test-all

* cretonne: document clippy allows better

* cretonne: fix some more clippy lints

* cretonne: fix clippy lints (mostly doc comments)

* cretonne: allow all needless_lifetimes clippy warnings

remove overrides at the false positives

* rustfmt
2018-03-22 13:10:41 -07:00
Dan Gohman
ca4582ae82 Rename the recipes for x86 spill/fill instructions.
Both "sp" and "fi" have multiple meanings in this context, so use slightly
longer but less ambiguous names.
2018-03-20 13:28:35 -07:00
Afnan Enayet
9a49bc2ec9 Rename I32 -> X86_32 and I64 -> X86_64 (#271)
* Rename `I32` -> `X86_32` and `I64` -> `X86_64`

* Format file to pass flake8 tests

* Fix comment so lines are under 80 char limit

* Remove trailing whitespace from comment

* Renamed `enc_i64` to `enc_x86_64` as per suggestion from PR
2018-03-18 13:50:51 -07:00
Dan Gohman
b2acd457d5 Use OrderedDict rather than explicit sorting.
This reduces churn in the generated files, making it easier to inspect
changes.
2018-03-15 10:35:06 -07:00
Dan Gohman
c842b9aaa1 Code cleanup: import OrderedDict rather than collections.OrderedDict 2018-03-15 10:13:26 -07:00
Dan Gohman
272d03d8fc Add a utility for generating Rust 'match' expressions.
This makes it a little simpler to generate 'match' statements, and
it performs deduplication of identical arms. And it means I don't
have to think about as many strings like '{} {{ {}.. }} => {}'
when I'm trying to think about how instructions work :-).
2018-03-14 10:51:09 -07:00
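The idea behind the 'match' generator in commit 272d03d8fc (collect arms, merge those with identical bodies, then print) might look roughly like this. The original utility lives in the Python meta code; this Rust `MatchBuilder` is a hypothetical re-imagining, and it assumes the patterns are disjoint so that merging arms cannot change which arm matches.

```rust
/// Hypothetical builder that collects `pattern => body` arms and merges
/// arms whose bodies are identical.
struct MatchBuilder {
    scrutinee: String,
    /// (body, patterns) pairs in first-seen order.
    arms: Vec<(String, Vec<String>)>,
}

impl MatchBuilder {
    fn new(scrutinee: &str) -> Self {
        MatchBuilder {
            scrutinee: scrutinee.to_string(),
            arms: Vec::new(),
        }
    }

    /// Add an arm, folding it into an existing arm with the same body.
    fn arm(&mut self, pattern: &str, body: &str) {
        if let Some(entry) = self.arms.iter_mut().find(|entry| entry.0 == body) {
            entry.1.push(pattern.to_string());
        } else {
            self.arms.push((body.to_string(), vec![pattern.to_string()]));
        }
    }

    /// Render the collected arms as Rust source text.
    fn render(&self) -> String {
        let mut out = format!("match {} {{\n", self.scrutinee);
        for (body, patterns) in &self.arms {
            out.push_str(&format!("    {} => {},\n", patterns.join(" | "), body));
        }
        out.push_str("}\n");
        out
    }
}
```

Adding arms for `Opcode::Iadd` and `Opcode::Isub` with the same body yields a single `Opcode::Iadd | Opcode::Isub => ...` arm in the rendered text.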
Dan Gohman
d9712f5d7d Elaborate on some comments in generated source files.
Recipe names are fairly obscure, so the more context we can give
when using them the better.
2018-03-14 10:51:09 -07:00
Dan Gohman
cc8d6400f4 Rename builder.rs to inst_builder.rs.
This reflects its purpose, to define the `InstBuilder` trait.
2018-03-14 10:51:09 -07:00
Dan Gohman
3afe85ff17 Auto-generate InstructionData.
The meta description has all the information to generate the `InstructionData`
enum, so generate it rather than having a manually-maintained copy.
2018-03-14 10:51:09 -07:00
Dan Gohman
2b9229d715 Fix > 80-column lines for flake8. 2018-03-12 13:02:55 -07:00
Dan Gohman
30f8daa9d6 Replace assert! with debug_assert! in production code paths.
This allows the assertions to be disabled in release builds, so that
the code is faster and smaller, at the expense of not performing the
checks. Assertions can be re-enabled in release builds with the
debug-assertions flag in Cargo.toml, as the top-level Cargo.toml
file does.
2018-03-12 12:38:30 -07:00
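As a small illustration of the trade-off in commit 30f8daa9d6, the hypothetical helper below checks its precondition with `debug_assert!`, so the check is compiled out when debug assertions are disabled. The `align_up` function is invented for this example and is not taken from the Cretonne sources.

```rust
/// Round `offset` up to the next multiple of `align`.
/// The precondition is only checked when debug assertions are enabled.
fn align_up(offset: u32, align: u32) -> u32 {
    debug_assert!(align.is_power_of_two(), "alignment must be a power of two");
    (offset + align - 1) & !(align - 1)
}
```

With the default release profile the check costs nothing at run time; setting `debug-assertions = true` under `[profile.release]` in Cargo.toml turns it back on.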
Dan Gohman
b8a106adf0 Remove the "has_sse2" flag.
Cretonne currently requires SSE2 support pervasively, so it's not meaningful
to have a setting for it.
2018-03-12 12:38:01 -07:00
Dan Gohman
136d6f5c4b Implement ireduce, sextend, and uextend between i8/i16 and i32/i64. 2018-03-05 15:13:59 -08:00
Dan Gohman
6e94e70f30 Use an https URL rather than http.
Found by sphinx's linkcheck.
2018-03-05 06:55:27 -08:00
Dan Gohman
c59e9180de Tidy up whitespace. 2018-03-05 06:55:27 -08:00
Dan Gohman
804b56d0f2 Document that "enable_float=false" isn't implemented yet. 2018-03-04 21:34:49 -08:00
Julian Seward
7054f25abb Adds support to transform integer div and rem by constants into cheaper equivalents.
Adds support for transforming integer division and remainder by constants
into sequences that do not involve division instructions.

* div/rem by constant powers of two are turned into right shifts, plus some
  fixups for the signed cases.

* div/rem by constant non-powers of two are turned into double length
  multiplies by a magic constant, plus some fixups involving shifts,
  addition and subtraction, that depend on the constant, the word size and
  the signedness involved.

* The following cases are transformed: div and rem, signed or unsigned, 32
  or 64 bit.  The only un-transformed cases are: unsigned div and rem by
  zero, signed div and rem by zero or -1.

* This is all incorporated within a new transformation pass, "preopt", in
  lib/cretonne/src/preopt.rs.

* In preopt.rs, fn do_preopt() is the main driver.  It is designed to be
  extensible to transformations of other kinds of instructions.  Currently
  it merely uses a helper to identify div/rem transformation candidates and
  another helper to perform the transformation.

* In preopt.rs, fn get_div_info() pattern matches to find candidates, both
  cases where the second arg is an immediate, and cases where the second
  arg is an identifier bound to an immediate at its definition point.

* In preopt.rs, fn do_divrem_transformation() does the heavy lifting of the
  transformation proper.  It in turn uses magic{S,U}{32,64} to calculate the
  magic numbers required for the transformations.

* There are many test cases for the transformation proper:
    filetests/preopt/div_by_const_non_power_of_2.cton
    filetests/preopt/div_by_const_power_of_2.cton
    filetests/preopt/rem_by_const_non_power_of_2.cton
    filetests/preopt/rem_by_const_power_of_2.cton
    filetests/preopt/div_by_const_indirect.cton
  preopt.rs also contains a set of tests for magic number generation.

* The main (non-power-of-2) transformation requires instructions that return
  the high word of a double-length multiply.  For this, instructions umulhi
  and smulhi have been added to the core instruction set.  These will map
  directly to single instructions on most non-intel targets.

* intel does not have an instruction exactly like that.  For intel,
  instructions x86_umulx and x86_smulx have been added.  These map to real
  instructions and return both result words.  The intel legaliser will
  rewrite {s,u}mulhi into x86_{s,u}mulx uses that throw away the lower half
  word.  Tests:
    filetests/isa/intel/legalize-mulhi.cton (new file)
    filetests/isa/intel/binary64.cton (added x86_{s,u}mulx encoding tests)
2018-02-28 11:41:36 -08:00
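The arithmetic behind commit 7054f25abb can be sketched for a few 32-bit cases. This is an illustration of the general technique, not the code in preopt.rs: the function names are hypothetical, `umulhi` here only models a high-word multiply, and the divisor 3 with its well-known magic constant is chosen as an example rather than computed by a magic-number routine.

```rust
/// Unsigned division by 2^k (k < 32) is a logical right shift.
fn udiv_pow2(x: u32, k: u32) -> u32 {
    x >> k
}

/// Signed division by 2^k (1 <= k <= 30): add a bias of 2^k - 1 to negative
/// inputs so that the arithmetic shift rounds toward zero.
fn sdiv_pow2(x: i32, k: u32) -> i32 {
    let bias = ((x >> 31) as u32 >> (32 - k)) as i32;
    x.wrapping_add(bias) >> k
}

/// High 32 bits of the 64-bit product, modeling a `umulhi`-style instruction.
fn umulhi(a: u32, b: u32) -> u32 {
    ((a as u64 * b as u64) >> 32) as u32
}

/// Unsigned division by 3: multiply by the magic constant 0xAAAA_AAAB
/// (ceil(2^33 / 3)), keep the high word, then shift right by one.
fn udiv3(x: u32) -> u32 {
    umulhi(x, 0xAAAA_AAAB) >> 1
}
```

Each helper can be checked against the native `/` operator over a range of inputs, including the extremes `i32::MIN` and `u32::MAX`.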
Dan Gohman
ab9298eafa Make the fst recipe use the deref-safe register class as well. 2018-02-28 10:12:40 -08:00
Dan Gohman
d394ae0902 Enable "set -euo pipefail" in all bash scripts.
This enables "set -e", "set -u", and "set -o pipefail", which
catch common errors.
2018-02-27 15:32:21 -08:00
Jakob Stoklund Olesen
2f58c371bc Make specific ISA sub-modules private.
We don't want ISA-specific details exposed in the public Cretonne APIs.
2018-02-21 12:06:58 -08:00
Jakob Stoklund Olesen
b9b1d0fcd5 Add a trapff instruction.
This is the floating point equivalent of trapif: Trap when a given
condition is in the floating-point flags.

Define Intel encodings comparable to the trapif encodings.
2018-02-20 14:35:41 -08:00
Jakob Stoklund Olesen
ad896d9790 Add more legalization patterns for *_imm instructions.
When the immediate value is out of range for the legal encodings, convert
these instructions to an iconst followed by their register counterparts.
2018-02-20 10:47:46 -08:00
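The expansion described in commit ad896d9790 can be sketched on a toy IR. The `Inst` enum, the `fits_encoding` range check (a sign-extended 32-bit immediate is assumed here), and the `legalize` function are hypothetical; the real patterns are generated from the meta description.

```rust
/// Toy IR for illustration; the opcode names mirror the log, the types do not.
#[derive(Debug, PartialEq)]
enum Inst {
    /// dst = iconst imm
    Iconst { dst: u32, imm: i64 },
    /// dst = iadd lhs, rhs
    Iadd { dst: u32, lhs: u32, rhs: u32 },
    /// dst = iadd_imm lhs, imm
    IaddImm { dst: u32, lhs: u32, imm: i64 },
}

/// Assumption: the legal encodings accept a sign-extended 32-bit immediate.
fn fits_encoding(imm: i64) -> bool {
    imm >= i32::MIN as i64 && imm <= i32::MAX as i64
}

/// Expand an out-of-range `iadd_imm` into `iconst` + `iadd`; leave every
/// other instruction alone. `fresh` names the materialized constant.
fn legalize(inst: Inst, fresh: u32) -> Vec<Inst> {
    match inst {
        Inst::IaddImm { dst, lhs, imm } if !fits_encoding(imm) => vec![
            Inst::Iconst { dst: fresh, imm },
            Inst::Iadd { dst, lhs, rhs: fresh },
        ],
        other => vec![other],
    }
}
```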
Jakob Stoklund Olesen
a6ab90f205 Legalize irsub_imm. 2018-02-16 15:50:36 -08:00
Jakob Stoklund Olesen
a9e799debb Add an avoid_div_traps setting.
This enables code generation that never causes a SIGFPE signal to be
raised from a division instruction. Instead, division and remainder
calculations are protected by explicit traps.
2018-02-16 13:10:29 -08:00
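What "protected by explicit traps" in commit a9e799debb could mean for a signed division is sketched below. The `Trap` enum and `guarded_sdiv` are hypothetical, and the overflow trap name is an assumption; the point is only that the operand checks run before the divide, so the divide instruction itself never faults.

```rust
/// Hypothetical trap reasons for this sketch.
#[derive(Debug, PartialEq)]
enum Trap {
    IntegerDivisionByZero,
    IntegerOverflow,
}

/// Signed division guarded by explicit checks, so the hardware divide is
/// never executed with operands that would fault.
fn guarded_sdiv(x: i32, y: i32) -> Result<i32, Trap> {
    if y == 0 {
        return Err(Trap::IntegerDivisionByZero);
    }
    if x == i32::MIN && y == -1 {
        return Err(Trap::IntegerOverflow);
    }
    Ok(x / y)
}
```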
Pat Hickey
ed24320eda gen_settings: don't try to display a Preset descriptor in Flags (#241)
* gen_settings: don't try to display a Preset descriptor in Flags

Trying to display a preset doesn't make sense, and before this commit it
does not display anything meaningful - the printout just says e.g.
"haswell =\n".

The offset byte of a preset descriptor isn't a valid offset into the
flag bytes, it is actually an offset into the PRESETS table. It will
cause a panic when the offset is out of bounds for the flag bytes,
which happens in the intel isa as of this commit.

* intel settings: test that display impl doesn't panic
2018-02-14 11:51:40 -08:00
Jakob Stoklund Olesen
3ccc3f4f9b Add a stack_check instruction.
This instruction loads a stack limit from a global variable and compares
it to the stack pointer, trapping if the stack has grown beyond the
limit.

Also add an expand_flags transform group containing legalization patterns
for ISAs with CPU flags.

Fixes #234.
2018-02-13 10:48:06 -08:00
Jakob Stoklund Olesen
a73fcb2691 Pass an ISA argument to legalization functions.
This lets them look at the ISA flags.
2018-02-13 10:42:00 -08:00
Jakob Stoklund Olesen
60e70da0e6 Add Intel encodings for ifcmp_imm.
The instruction set has variants with 8-bit and 32-bit signed immediate
operands.

Add a TODO to use a TEST instruction for the special case ifcmp_imm x, 0.
2018-02-13 10:38:46 -08:00
Jakob Stoklund Olesen
788a78caf4 Add Intel encodings for ifcmp_sp.
Also generate an Into<RegUnit> implementation for the RU enums.
2018-02-09 14:32:29 -08:00
Jakob Stoklund Olesen
73c4c356c9 Add an ifcmp_sp instruction.
This will be used to implement the stack_check macro.
2018-02-09 13:59:49 -08:00
Jakob Stoklund Olesen
69f70fc61d Add Intel encodings for trapif.
This is implemented as a macro with a conditional jump over a ud2. This
way, we don't have to split up EBBs at every conditional trap.
2018-02-08 15:15:15 -08:00
Jakob Stoklund Olesen
11c721934c Add a trapif instruction.
This is a conditional trap controlled by integer CPU flags.
Compare to brif.
2018-02-08 14:40:46 -08:00
Julian Seward
6f8a54b6a5 Adds support for legalizing CLZ, CTZ and POPCOUNT on baseline x86_64 targets.
Changes:

* Adds a new generic instruction, SELECTIF, that does value selection (a la
  conditional move) similarly to existing SELECT, except that it is
  controlled by condition code input and flags-register inputs.

* Adds a new Intel x86_64 variant, 'baseline', that supports SSE2 and
  nothing else.

* Adds new Intel x86_64 instructions BSR and BSF.

* Implements generic CLZ, CTZ and POPCOUNT on x86_64 'baseline' targets
  using the new BSR, BSF and SELECTIF instructions.

* Implements SELECTIF on x86_64 targets using conditional-moves.

* new test filetests/isa/intel/baseline_clz_ctz_popcount.cton
  (for legalization)

* new test filetests/isa/intel/baseline_clz_ctz_popcount_encoding.cton
  (for encoding)

* Allow lib/cretonne/meta/gen_legalizer.py to generate non-snake-caseified
  Rust without rustc complaining.

Fixes #238.
2018-02-06 09:43:00 -08:00
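The relationship between bit-scan instructions and CLZ/CTZ in commit 6f8a54b6a5 can be sketched for 32-bit values. The `bsr` and `bsf` functions model the x86 instructions by returning `None` for a zero input (where the hardware result is undefined), and the `match` plays the role that a flag-controlled select would play. This is an assumed illustration, not the exact sequence the legalizer emits, and POPCOUNT is not covered.

```rust
/// Model of BSR: index of the highest set bit, or `None` for zero.
fn bsr(x: u32) -> Option<u32> {
    (0..32).rev().find(|&i| (x & (1 << i)) != 0)
}

/// Model of BSF: index of the lowest set bit, or `None` for zero.
fn bsf(x: u32) -> Option<u32> {
    (0..32).find(|&i| (x & (1 << i)) != 0)
}

/// Count leading zeros: 31 - BSR index, or 32 when the input is zero.
fn clz(x: u32) -> u32 {
    match bsr(x) {
        Some(i) => 31 - i,
        None => 32,
    }
}

/// Count trailing zeros: the BSF index, or 32 when the input is zero.
fn ctz(x: u32) -> u32 {
    match bsf(x) {
        Some(i) => i,
        None => 32,
    }
}
```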
Tyler McMullen
ff16583c59 Remove RSP from deref safe register class as well. 2018-01-29 14:18:08 -08:00
Tyler McMullen
21f0fc39ad Further restrict Intel register classes to prevent incorrect encoding of R12 derefs. 2018-01-29 13:42:11 -08:00
Tyler McMullen
850896f05e The addend for a PLTRel4 reloc should be -4. 2018-01-18 14:23:00 -08:00
Tyler McMullen
eb85aa833c Illegalize rbp/r13 for zero-offset loads on Intel x64 (#225)
* Switch RegClass to a bitmap implementation.

* Add special RegClass to remove r13 from 'ld' recipe.

* Use MASK_LEN constant instead of magic number.

* Enforce that RegClass slicing is only valid on contiguous classes.

* Use Optional[int] for RegClass optional bitmask parameter.

* Add comment explaining use of Intel ISA's GPR_NORIP register class.
2018-01-16 20:05:53 -08:00
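A bitmap-based register class, as introduced in commit eb85aa833c, might be modeled as below. The `RegClassMask` type and its methods are hypothetical, and treating r13 as register unit 13 is an assumption for the example; the sketch only shows why removing one register from a class makes it non-contiguous, which matters if slicing is restricted to contiguous classes.

```rust
/// Hypothetical register class stored as a bitmask: bit i set means
/// register unit i belongs to the class.
#[derive(Debug, Clone, Copy, PartialEq)]
struct RegClassMask(u32);

impl RegClassMask {
    /// Class covering units `first..first + count`.
    fn range(first: u32, count: u32) -> Self {
        debug_assert!(count >= 1 && first + count <= 32);
        let ones = if count == 32 { u32::MAX } else { (1u32 << count) - 1 };
        RegClassMask(ones << first)
    }

    /// True when `unit` is a member of the class.
    fn contains(self, unit: u32) -> bool {
        ((self.0 >> unit) & 1) == 1
    }

    /// Same class with one unit removed, e.g. to exclude r13.
    fn without(self, unit: u32) -> Self {
        RegClassMask(self.0 & !(1 << unit))
    }

    /// True when the set bits form a single unbroken run.
    fn is_contiguous(self) -> bool {
        let m = self.0;
        if m == 0 {
            return false;
        }
        let shifted = m >> m.trailing_zeros();
        (shifted & shifted.wrapping_add(1)) == 0
    }
}
```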