Commit Graph

1113 Commits

Author SHA1 Message Date
Lachlan Sneff
3686fc2fc7 Fix typos caused by find-and-replace 2018-06-15 18:33:30 -07:00
Lachlan Sneff
5c320a0d30 Change GlobalVar to GlobalValue 2018-06-15 18:33:30 -07:00
Sergey Pepyakin
a8923e2185 Add 64-bit versions. 2018-06-12 11:54:23 -07:00
Sergey Pepyakin
399860e2aa Encode bnot on x86. 2018-06-12 11:54:23 -07:00
Dan Gohman
3027579cbe Add a test for instruction shrinking. 2018-06-12 05:02:25 -07:00
Dan Gohman
0daa560368 Accept redundant alias definitions.
In the text format, allow aliases to be defined multiple times, as long
as they're always aliasing the same value.

write.rs is already emitting redundant aliases, because it emits them at
their uses, so this change allows the parser to be able to parse such
code.
2018-06-12 05:02:25 -07:00
Sergey Pepyakin
73b6468d25 Parsing stack_limit 2018-06-09 13:18:12 -07:00
Sergey Pepyakin
e9111d1de2 Ditch stack_check instruction 2018-06-09 13:18:12 -07:00
Dan Gohman
4e67e08efd Use the target-lexicon crate.
This switches from a custom list of architectures to use the
target-lexicon crate.

 - "set is_64bit=1; isa x86" is replaced with "target x86_64", and
   similar for other architectures, and the `is_64bit` flag is removed
   entirely.

 - The `is_compressed` flag is removed too; it's no longer being used to
   control REX prefixes on x86-64, ARM and Thumb are separate
   architectures in target-lexicon, and we can figure out how to
   select RISC-V compressed encodings when we're ready.
2018-05-30 06:13:35 -07:00
Dan Gohman
c21af29c79 Legalize libcall signatures.
Explicitly legalize signatures created for libcalls.
2018-05-29 10:21:52 -07:00
data-pup
191bab162b Optimize 0.0 floating point constants. (#344)
* Optimize 0.0 floating point constants. Rather than using the existing
process of emitting bit patterns and moving them into floating point
registers, use the `xorps` instruction to zero out the register.

* is_zero predicate function will not accept negative zero. Fixed formatting for encoding recipe and filetests.
2018-05-24 14:16:25 -07:00
Tyler McMullen
f636d795c5 load_complex and store_complex instructions (#309)
* Start adding the load_complex and store_complex instructions.

N.b.:
The text format is not correct yet. Requires changes to the lexer and parser.
I'm not sure why I needed to change the RuntimeError to Exception yet. Will fix.

* Get first few encodings of load_complex working. Still needs var args type checking.

* Clean up ModRM helper functions in binemit.

* Implement 32-bit displace for load_complex

* Use encoding helpers instead of doing them all by hand

* Initial implementation of store_complex

* Parse value list for load/store_complex with + as delimiter. Looks nice.

* Add sign/zero-extension and size variants for load_complex.

* Add size variants of store_complex.

* Add asm helper lines to load/store complex bin tests.

* Example of length-checking the instruction ValueList for an encoding. Extremely questionable implementation.

* Fix Python linting issues

* First draft of postopt pass to fold adds and loads into load_complex. Just simple loads for now.

* Optimization pass now works with all types of loads.

* Add store+add -> store_complex to postopt pass

* Put complex address optimization behind ISA flag.

* Add load/store complex for f32 and f64

* Fixes changes to lexer that broke NaN parsing.

Abstracts away the repeated checks for whether or not the characters
following a + or - are going to be parsed as a number or not.

* Fix formatting issues

* Fix register restrictions for complex addresses.

* Encoding tests for x86-32.

* Add documentation for newly added instructions, recipes, and cdsl changes.

* Fix python formatting again

* Apply value-list length predicates to all LoadComplex and StoreComplex instructions.

* Add predicate types to new encoding helpers for mypy.

* Import FieldPredicate to satisfy mypy.

* Add and fix some "asm" strings in the encoding tests.

* Line-up 'bin' comments in x86/binary64 test

* Test parsing of offset-less store_complex instruction.

* 'sNaN' not 'sNan'

* Bounds check the lookup for polymorphic typevar operand.

* Fix encodings for istore16_complex.
2018-05-09 14:07:00 -05:00
Steffen Butzer
5aa84a744b windows fastcall (x64) call convention (#314)
* initial set of work for windows fastcall (x64) call convention

- call conventions: rename `fastcall` to `windows_fastcall`
- add initial set of filetests
- ensure arguments are written after the shadow space/store (offset-wise)
  The shadow space available before the arguments (range 0..32)
  is not used as spill space yet.

* address review feedback
2018-05-09 13:18:30 -05:00
Dan Gohman
5b69930e03 Legalize bnot using xor with -1. 2018-05-05 11:21:06 -05:00
Pat Hickey
bb612af37a x86 recipes: emit StackOverflow trap for all sp-relative loads and stores (#325)
* x86 recipes: emit StackOverflow trap for all sp-relative loads and stores

* x86 recipes: emit StackOverflow trap for push and pop

* x86 binary filetests: add stk_ovf trap annotations
2018-05-03 18:09:07 -07:00
Dan Gohman
7e1f157692 Use %esp rather than %rsp in x86-32 test comments. 2018-05-02 09:41:34 -07:00
Dan Gohman
5bcfd47f3f Remove the non-REX encodings for regmove et al.
regmove, regfill, and regspill have immediates which aren't value
operands, so they aren't in the set of things that can be described by
the existing constraint system. Consequently, constraints saying that
the non-REX encodings only support registers that don't need REX
prefixes don't work. Fow now, just remove the non-REX encodings, so
that they don't get selected when they aren't valid.

This fixes the last known issue with instruction shrinking, so it can
be re-enabled.
2018-04-22 22:31:56 -07:00
Dan Gohman
3b1d805758 Stack overflow checking with stack probes.
This adds a libcall name, a calling convention, and settings for
emitting stack probes, and implements them for x86 system_v ABIs.
2018-04-22 21:52:12 -07:00
Dan Gohman
c5b15c2396 Refactor calling convention settings. (#304)
Add a calling-convention setting to the `Flags` used as part of the
`TargetIsa`. This allows Cretonne code that generates calls to use the
correct convention, such as when emitting libcalls during legalization
or when the wasm frontend is decoding functions. This setting can be
overridden per-function.

This also adds "fast", "cold", and "fastcall" conventions, with "fast"
as the new default. Note that "fast" and "cold" are not intended to be
ABI-compatible across Cretonne versions.

This will also ensure Windows users will get an `unimplemented!` rather
than silent calling-convention mismatches, which reflects the fact that
Windows calling conventions are not yet implemented.

This also renames SpiderWASM, which isn't camel-case, to Baldrdash,
which is, and which is also a more relevant name.
2018-04-22 21:35:18 -07:00
Dan Gohman
dfb24f1934 Fix x86 encoding of uextend/sextend from 8-bit inputs.
The x86-32 and non-REX encodings of movsbl and movzbl require one of
the ABCD registers as input.
2018-04-20 12:12:29 -07:00
Dan Gohman
cb3c5a1384 Revert the change to this test that accompanied the shrinking pass too. 2018-04-19 18:57:59 -07:00
Dan Gohman
bce8af97e3 Add an instruction shrinking pass.
When an instruction has multiple valid encodings, such as with and
without a REX prefix on x86-64, Cretonne typically picks the encoding
which gives the register allocator the most flexibility, which is
typically the longest encoding. This patch adds a pass that runs after
register allocation that picks the smallest encoding, working within the
constraints of the register allocator's choices. The result is smaller
and easier to read encodings.

In the future, we may want to merge this pass into the relaxation pass,
or possibly fold it into the final encoding step, however for now, a
discrete pass will suffice.
2018-04-19 17:04:56 -07:00
Dan Gohman
583ae56fd2 Use opt_level instead of is_compressed for encoding optimizations.
Choosing smaller instruction encodings on eg. x86 is an optimization,
rather than a useful discrete setting.

Use "is_compressed" only for ISAs that have an explicit compression feature
that users of the output may to be aware of, such as RISC-V's RVC or
ARM's Thumb-2.
2018-04-19 16:33:38 -07:00
Dan Gohman
d3e76b8a51 Add more tests for non-PIC calls on x86-64. 2018-04-17 22:12:41 -07:00
Dan Gohman
f43b6aca1a Use lower-case letters for github URLs.
This makes it a little more consistent; now, "cretonne" is never capitalized
in identifier, path, or URL contexts. It is capitalized in natural
language contexts when referring to the project.
2018-04-17 09:47:11 -07:00
Dan Gohman
24fa169e1f Rename the 'cretonne' crate to 'cretonne-codegen'.
This fixes the next part of #287.
2018-04-17 09:46:56 -07:00
Dan Gohman
56f11e76b4 Use PC-relative encodings for colocated functions on non-PIC.
Colocated functions are expected to be defined within the PC-relative
immediate range on x86-64, so allow this addressing for non-PIC as well
as PIC.
2018-04-16 16:27:27 -07:00
Dan Gohman
0e57f3d0ea Add a "colocated" flag to symbol references. (#298)
This adds a "colocated" flag to function and symbolic global variables which
indicates that they are defined along with the current function, so they can
use PC-relative addressing.

This also changes the function decl syntax; the name now always precedes the
signature, and the "function" keyword is no longer included.
2018-04-13 15:00:09 -07:00
Dan Gohman
1c760ab179 Rename intel to x86.
x86 is the more accurate name, as there are non-Intel x86 implementations.

Fixes #263.
2018-04-12 10:02:16 -07:00
Dan Gohman
18b2f12150 Save/restore callee-saved registers used in regmove/regfill.
The regmove and regfill instructions temporarily divert a value's
location, and these temporary diversions are not reflected in
`func.locations`. For now, make an extra scan through the instructions
of the function to find any regmove or regfill instructions in order to
find all used callee-saved registers.

This fixes #296.
2018-04-09 22:33:54 -07:00
Dan Gohman
eab57c0a40 Use large-model addressing for calls when in non-PIC mode.
The main use for non-PIC code at present is JIT code, and JIT code can
live anywhere in memory and reference other symbols defined anywhere in
memory, so it needs to use the "large" code model.

func_addr and globalsym_addr instructions were already using `movabs`
to support arbitrary 64-bit addresses, so this just makes calls be
legalized to support arbitrary 64-bit addresses also.
2018-04-08 22:37:35 -07:00
Dan Gohman
b0d414731c The addend for a PCRel4 reloc should be -4 too. 2018-04-07 06:15:33 -07:00
Tyler McMullen
775c674b38 Only save callee-saved registers that are used (#293)
* Only save callee-saved registers that are actually being used.

* Rename AllocatableSet to RegisterSet

* Style cleanup and small renames for readability.

* Adjust x86 prologue-epilogue test to account for callee-saved register optimization.

* Add more tests for prologue-epilogue optimizations.
2018-04-03 14:44:12 -07:00
Dan Gohman
e107793b68 Pre-opt: Use the correct operand in the irsub_imm pattern. 2018-03-30 21:02:26 -07:00
Dan Gohman
9e4ab7dc86 Rename CallConv::Native to CallConv::SystemV. (#291)
To keep cross-compiling straightforward, Cretonne shouldn't have any
behavior that depends on the host. This renames the "Native" calling
convention to "SystemV", which has a defined meaning for each target,
so that it's clear that the calling convention doesn't change
depending on what host Cretonne is running on.
2018-03-30 12:32:14 -07:00
Dan Gohman
6606b88136 Optimize immediates and compare and branch sequences (#286)
* Add a pre-opt optimization to change constants into immediates.

This converts 'iadd' + 'iconst' into 'iadd_imm', and so on.

* Optimize away redundant `bint` instructions.

Cretonne has a concept of "Testable" values, which can be either boolean
or integer. When the an instruction needing a "Testable" value receives
the result of a `bint`, converting boolean to integer, eliminate the
`bint`, as it's redundant.

* Postopt: Optimize using CPU flags.

This introduces a post-legalization optimization pass which converts
compare+branch sequences to use flags values on CPUs which support it.

* Define a form of x86's `urm` that doesn't clobber FLAGS.

movzbl/movsbl/etc. don't clobber FLAGS; define a form of the `urm`
recipe that represents this.

* Implement a DCE pass.

This pass deletes instructions with no side effects and no results that
are used.

* Clarify ambiguity about "32-bit" and "64-bit" in comments.

* Add x86 encodings for icmp_imm.

* Add a testcase for postopt CPU flags optimization.

This covers the basic functionality of transforming compare+branch
sequences to use CPU flags.

* Pattern-match irsub_imm in preopt.
2018-03-30 12:30:07 -07:00
Tyler McMullen
951ff11f85 [WIP] Add a Trap sink to code generation (#279)
* First draft of TrapSink implementation.

* Add trap sink calls to 'trapif' and 'trapff' recipes.

* Add SourceLoc to trap sink calls, and add trap sink calls to all loads and stores.

* Add IntegerDivisionByZero trap to div recipe.

* Only emit load/store traps if 'notrap' flag is not set on the instruction.

* Update filetest machinery to add new trap sink functionality.

* Update filetests to include traps in output.

* Add a few more trap outputs to filetests.

* Add trap output to CLI tool.
2018-03-28 22:48:03 -07:00
Dan Gohman
e5ec7242cc Fix handling of value aliases, and re-enable LICM.
Value aliases aren't instructions, so they don't have a location in the
CFG, so it's not meaningful to query whether a value alias is defined
within a loop.
2018-03-28 22:06:52 -07:00
Dan Gohman
db2be8ee01 Verifier: Diagnose an instruction using its own result values. 2018-03-28 21:27:05 -07:00
Dan Gohman
23ab07b54e Support legalizing bconst instructions on x86. 2018-03-28 14:11:16 -07:00
Dan Gohman
79f02e42dd Use movss/movsd rather than movd/movq for floating-point loads and stores.
While there may be CPUs that have a domain crossing penalty here,
this also helps the generated code look more like the code produced
by other compilers.
2018-03-27 11:53:59 -07:00
Dan Gohman
ffe89cdc0a Rename %eflags to %rflags.
EFLAGS is a subregister of RFLAGS. For consistency with GPRs where we
use the 64-bit names to refer to the registers, use the 64-bit name for
RFLAGS as well.
2018-03-27 11:52:57 -07:00
Dan Gohman
685cde98a4 Mark loads from globals aligned and notrap.
Mark loads from globals generated by cton_wasm or by legalization as
`aligned` and `notrap`, since memory for these globals should be
allocated by the runtime environment for that purpose. This reduces
the number of potentially trapping instructions, which can reduce
the amount of metadata required by embedding environments.
2018-03-26 21:21:54 -07:00
Pat Hickey
80d2c5d9bf Implement shift-immediate encodings for x86 (#283)
* add x86 encodings for shift-immediate instructions

implements encodings for ishl_imm, sshr_imm, and ushr_imm. uses 8-bit immediates.

added tests for the encodings to intel/binary64.cton. Canonical versions
come from llvm-mc.

* translate test to use shift-immediates

* shift immediate encodings: use enc_i32_i64

and note why the regular shift encodings cant use it above

* add additional encoding tests for shift immediates

this covers 32 bit mode, and 64 bit operations in 64 bit mode.
2018-03-26 16:48:20 -07:00
Dan Gohman
136d6f5c4b Implement ireduce, sextend, and uextend between i8/i16 and i32/i64. 2018-03-05 15:13:59 -08:00
Dan Gohman
5dc449ec9e Rename "local variables" to "explicit stack slots".
The term "local variables" predated the SSA builder in the front-end
crate, which also provides a way to implement source-language local
variables. The name "explicit stack slot" makes it clear what this
construct is.
2018-02-28 14:04:28 -08:00
Julian Seward
7054f25abb Adds support to transform integer div and rem by constants into cheaper equivalents.
Adds support for transforming integer division and remainder by constants
into sequences that do not involve division instructions.

* div/rem by constant powers of two are turned into right shifts, plus some
  fixups for the signed cases.

* div/rem by constant non-powers of two are turned into double length
  multiplies by a magic constant, plus some fixups involving shifts,
  addition and subtraction, that depends on the constant, the word size and
  the signedness involved.

* The following cases are transformed: div and rem, signed or unsigned, 32
  or 64 bit.  The only un-transformed cases are: unsigned div and rem by
  zero, signed div and rem by zero or -1.

* This is all incorporated within a new transformation pass, "preopt", in
  lib/cretonne/src/preopt.rs.

* In preopt.rs, fn do_preopt() is the main driver.  It is designed to be
  extensible to transformations of other kinds of instructions.  Currently
  it merely uses a helper to identify div/rem transformation candidates and
  another helper to perform the transformation.

* In preopt.rs, fn get_div_info() pattern matches to find candidates, both
  cases where the second arg is an immediate, and cases where the second
  arg is an identifier bound to an immediate at its definition point.

* In preopt.rs, fn do_divrem_transformation() does the heavy lifting of the
  transformation proper.  It in turn uses magic{S,U}{32,64} to calculate the
  magic numbers required for the transformations.

* There are many test cases for the transformation proper:
    filetests/preopt/div_by_const_non_power_of_2.cton
    filetests/preopt/div_by_const_power_of_2.cton
    filetests/preopt/rem_by_const_non_power_of_2.cton
    filetests/preopt/rem_by_const_power_of_2.cton
    filetests/preopt/div_by_const_indirect.cton
  preopt.rs also contains a set of tests for magic number generation.

* The main (non-power-of-2) transformation requires instructions that return
  the high word of a double-length multiply.  For this, instructions umulhi
  and smulhi have been added to the core instruction set.  These will map
  directly to single instructions on most non-intel targets.

* intel does not have an instruction exactly like that.  For intel,
  instructions x86_umulx and x86_smulx have been added.  These map to real
  instructions and return both result words.  The intel legaliser will
  rewrite {s,u}mulhi into x86_{s,u}mulx uses that throw away the lower half
  word.  Tests:
    filetests/isa/intel/legalize-mulhi.cton (new file)
    filetests/isa/intel/binary64.cton (added x86_{s,u}mulx encoding tests)
2018-02-28 11:41:36 -08:00
Dan Gohman
ab9298eafa Make the fst recipe use the deref-safe register class as well. 2018-02-28 10:12:40 -08:00
Dan Gohman
2a26b70854 Update URLs. 2018-02-23 16:16:44 -08:00
Dan Gohman
a55a582d4e Suppress printing of placeholder external functions.
With the change to the parser to preserve indices, it now inserts
placeholders to pad out index spaces as needed. Placeholder functions
use reserved signature indices, so skip them when writing them out,
to avoid writing them out as "sig4294967295".
2018-02-21 13:10:58 -08:00