Commit Graph

99 Commits

Author SHA1 Message Date
Julian Seward
7054f25abb Adds support to transform integer div and rem by constants into cheaper equivalents.
Adds support for transforming integer division and remainder by constants
into sequences that do not involve division instructions.

* div/rem by constant powers of two are turned into right shifts, plus some
  fixups for the signed cases.

* div/rem by constant non-powers of two are turned into double length
  multiplies by a magic constant, plus some fixups involving shifts,
  addition and subtraction, that depends on the constant, the word size and
  the signedness involved.

* The following cases are transformed: div and rem, signed or unsigned, 32
  or 64 bit.  The only un-transformed cases are: unsigned div and rem by
  zero, signed div and rem by zero or -1.

* This is all incorporated within a new transformation pass, "preopt", in
  lib/cretonne/src/preopt.rs.

* In preopt.rs, fn do_preopt() is the main driver.  It is designed to be
  extensible to transformations of other kinds of instructions.  Currently
  it merely uses a helper to identify div/rem transformation candidates and
  another helper to perform the transformation.

* In preopt.rs, fn get_div_info() pattern matches to find candidates, both
  cases where the second arg is an immediate, and cases where the second
  arg is an identifier bound to an immediate at its definition point.

* In preopt.rs, fn do_divrem_transformation() does the heavy lifting of the
  transformation proper.  It in turn uses magic{S,U}{32,64} to calculate the
  magic numbers required for the transformations.

* There are many test cases for the transformation proper:
    filetests/preopt/div_by_const_non_power_of_2.cton
    filetests/preopt/div_by_const_power_of_2.cton
    filetests/preopt/rem_by_const_non_power_of_2.cton
    filetests/preopt/rem_by_const_power_of_2.cton
    filetests/preopt/div_by_const_indirect.cton
  preopt.rs also contains a set of tests for magic number generation.

* The main (non-power-of-2) transformation requires instructions that return
  the high word of a double-length multiply.  For this, instructions umulhi
  and smulhi have been added to the core instruction set.  These will map
  directly to single instructions on most non-intel targets.

* intel does not have an instruction exactly like that.  For intel,
  instructions x86_umulx and x86_smulx have been added.  These map to real
  instructions and return both result words.  The intel legaliser will
  rewrite {s,u}mulhi into x86_{s,u}mulx uses that throw away the lower half
  word.  Tests:
    filetests/isa/intel/legalize-mulhi.cton (new file)
    filetests/isa/intel/binary64.cton (added x86_{s,u}mulx encoding tests)
2018-02-28 11:41:36 -08:00
Dan Gohman
af154655d7 Replace as casts with type-conversion functions.
https://github.com/rust-lang-nursery/rust-clippy/wiki#cast_lossless
2018-02-26 15:25:46 -08:00
Jakob Stoklund Olesen
c846ec1626 Catch NaN explicitly in expand_fcvt_to_uint().
When the input is a NaN, we need to generate a different trap code, so
use the new trapff instruction to generate such a trap after the first
floating point comparison.
2018-02-20 14:44:57 -08:00
Jakob Stoklund Olesen
1e56d44465 Remember the positive numbers in expand_fcvt_to_sint.
We can get an INT_MIN result when converting a positive number that is
too large to fit in an integer too.
2018-02-20 14:01:13 -08:00
Jakob Stoklund Olesen
a9e799debb Add an avoid_div_traps setting.
This enables code generation that never causes a SIGFPE signal to be
raised from a division instruction. Instead, division and remainder
calculations are protected by explicit traps.
2018-02-16 13:10:29 -08:00
Pat Hickey
ed24320eda gen_settings: dont try to display a Preset descriptor in Flags (#241)
* gen_settings: dont try to display a Preset descriptor in Flags

Trying to display a preset doesnt make sense, and before this commit it
does not display anything meaningful - the printout just says e.g.
"haswell =\n".

The offset byte a preset descriptor isnt a valid offset into the
flag bytes, it is actually an offset into the PRESETS table. It will
cause a panic when the offset is out of bounds for the flag bytes,
which happens in the intel isa as of this commit.

* intel settings: test that display impl doesnt panic
2018-02-14 11:51:40 -08:00
Jakob Stoklund Olesen
a73fcb2691 Pass an ISA argument to legalization functions.
This lets them look at the ISA flags.
2018-02-13 10:42:00 -08:00
Jakob Stoklund Olesen
788a78caf4 Add Intel encodings for ifcmp_sp.
Also generate an Into<RegUnit> implementation for the RU enums.
2018-02-09 14:32:29 -08:00
Jakob Stoklund Olesen
69f70fc61d Add Intel encodings for trapif.
This is implemented as a macro with a conditional jump over a ud2. This
way, we don't have to split up EBBs at every conditional trap.
2018-02-08 15:15:15 -08:00
Jakob Stoklund Olesen
429027e2f2 Stack alignment is 16 bytes on x86_64.
Make sure that the stack frame size is always a multiple of 16 as
specified by the x86_64 ABI.

Heads up @pchickey
2018-01-31 16:19:35 -08:00
Pat Hickey
099b959d57 TargetIsa implies a Display of shared and isa-specific flags 2018-01-29 13:53:08 -08:00
Pat Hickey
d444044e9e intel isa: comments to explain rip-relative addressing encoding 2017-12-12 19:29:52 -08:00
Pat Hickey
88b30ff386 refactor Reloc to an enum of every architecture's reloc types
https://github.com/stoklund/cretonne/pull/206#issuecomment-350905016
2017-12-12 13:57:10 -08:00
Jakob Stoklund Olesen
60c456c1ec Add a compilation pass timing facility.
Individual compilation passes call the corresponding timing::*()
function and hold on to their timing token while they run. This causes
nested per-pass timing information to be recorded in thread-local
storage.

The --time-passes command line option prints a pass timing report to
stdout.
2017-12-06 17:04:23 -08:00
Tyler McMullen
3b937f5917 Add separate spiderwasm prologue/epilogue to intel's abi.rs 2017-12-05 11:49:12 -08:00
Tyler McMullen
5783ea2c9a Account for return address when reserving stack space for CSRs. 2017-12-05 11:49:12 -08:00
Tyler McMullen
a75248d2cf Move the initial stack pointer adjustment to after the CSR pushes. 2017-12-05 11:49:12 -08:00
Tyler McMullen
ebcbd54f61 Add 'compile' test and confirm the pro/epilogue is added. Fix regression this revealed. 2017-12-05 11:49:12 -08:00
Tyler McMullen
694658b949 Move entirety of prologue_epilogue logic to abi module. 2017-12-05 11:49:12 -08:00
Tyler McMullen
0fb59dc589 Fix the ordering of return values. 2017-12-05 11:49:12 -08:00
Tyler McMullen
c156eb9ff7 Refactor prologue_epilogue. Break out into functions. Remove Vecs. 2017-12-05 11:49:12 -08:00
Tyler McMullen
c78a191294 Use layout.last_inst to find 'return' opcodes, rather than iterating. 2017-12-05 11:49:12 -08:00
Tyler McMullen
66eccb7859 Use opcode's is_return() rather than pattern-matching. 2017-12-05 11:49:12 -08:00
Tyler McMullen
a26d438b30 Use returned Value from append_ebb_param in prologue_epilogue. 2017-12-05 11:49:12 -08:00
Tyler McMullen
ced39f5186 Fix up adjust_sp_imm instruction.
* Use imm64 rather than offset32
* Add predicate to enforce signed 32-bit limit to imm
* Remove AdjustSpImm format
* Add encoding tests for adjust_sp_imm
* Adjust use of adjust_sp_imm in Intel prologue_epilogue to match
2017-12-05 11:49:12 -08:00
Tyler McMullen
2f3edc1bc6 Fix issue in which CSR returns were incorrectly ordered. 2017-12-05 11:49:12 -08:00
Tyler McMullen
daa7a21d05 Adjust prologue/epilogue generation to work with 32-bit Intel arch. 2017-12-05 11:49:12 -08:00
Tyler McMullen
cf9d287bea Pop the callee-saved registers in the function epilogue(s). 2017-12-05 11:49:12 -08:00
Tyler McMullen
f31a764fc8 Save all callee-saved registers in prologue. 2017-12-05 11:49:12 -08:00
Tyler McMullen
b049916d35 Move %rsp to %rbp, not the reverse.
This also takes away %rsp and %rbp from regalloc again. This may cause
tests to fail temporarily.
2017-12-05 11:49:12 -08:00
Tyler McMullen
e0c8ab49e1 Only emit an adjust_sp_imm if the stack_size is non-zero. 2017-12-05 11:49:12 -08:00
Tyler McMullen
d12587f218 Assign explicit register location to result of frame-pointer pop. 2017-12-05 11:49:12 -08:00
Tyler McMullen
544c148b25 Fix the stack slots, so the locations of local var slots are correct. 2017-12-05 11:49:12 -08:00
Tyler McMullen
c1a64a5dc7 Insert a basic epilogue. 2017-12-05 11:49:12 -08:00
Tyler McMullen
60b6bc3ab7 Sketch of prologue generation 2017-12-05 11:49:12 -08:00
Tyler McMullen
ffab87318e Add adjust_sp_imm instruction. Note: This enables using rsp and rbp as normal registers. Which is... wrong. 2017-12-05 11:49:12 -08:00
Tyler McMullen
7956084121 Treat VmContext as positional when using Native CallConv (#195)
* Treat VMContext as standard positional argument when using Native CallConv.

This requires threading the CallConv through legalize_args and into ArgAssigner.

* Stash CallConv in the intel-specific Args struct, for use ArgAssigner.
2017-11-21 17:08:31 -08:00
Dan Gohman
889b06fd16 Replace as casts with type-conversion functions. 2017-11-08 10:48:44 -08:00
Jakob Stoklund Olesen
1b71285b34 Return bools in GPR registers.
Boolean types are returned in %rax, so regclass_for_abi_type() should
return GPR.

Fixes #179.
2017-10-25 13:34:55 -07:00
Jakob Stoklund Olesen
b3fb41087e Use the term "Function parameter" instead of "argument".
Rename the ArgumentType type to AbiParam since it describes the ABI
characteristics of a parameter or return value, not just the value type.

In Signature, rename members argument_types and return_types to "params"
and "returns". Again, they are not just types.

Fix a couple lingering references to "EBB arguments".
2017-10-19 17:39:23 -07:00
Jakob Stoklund Olesen
921bcc6c25 Use the term "EBB parameter" everywhere.
Add EBB parameter and EBB argument to the langref glossary to clarify
the distinction between formal EBB parameter values and arguments passed
to branches.

- Replace "ebb_arg" with "ebb_param" in function names that deal with
  EBB parameters.
- Rename the ValueDef variants to Result and Param.
- A bunch of other small langref fixes.

No functional changes intended.
2017-10-19 16:17:09 -07:00
Jakob Stoklund Olesen
5d065c4d8f Add encodings for CPU flags instructions.
Branch on flags: brif, brff,
Compare integers to flags: ifcmp
Compare floats to flags: ffcmp
Convert flags to b1: trueif, trueff
2017-10-16 13:07:23 -07:00
Jakob Stoklund Olesen
89a24b2f13 Rename ScalarType to LaneType.
The word "scalar" is a bit vague and tends to mean "non-vector". Since
we are about to add new CPU flag value types that can't appear as vector
lanes, make the distinction clear: LaneType represents value types that
can appear as a vector lane.

Also replace the Type::is_scalar() method with an is_vector() method.
2017-10-12 10:39:12 -07:00
Jakob Stoklund Olesen
ce4d723a73 Give RegClassData a reference to its parent RegInfo.
This makes it possible to materialize new RegClass references without
requiring a RegInfo reference to be passed around.

- Move the RegInfo::toprc() method to RegClassData.
- Rename RegClassData::intersect() to intersect_index() and provide a
  new intersect() which returns a register class.
- Remove some &RegInfo parameters that are no longer needed.
2017-10-04 17:02:09 -07:00
Jakob Stoklund Olesen
711e5cd644 Handle srem INT_MIN, -1 correctly.
The x86_divmodx traps on integer overflow, but the srem instruction is
not supposed to trap with a -1 divisor.

Generate a legalization expansion for srem that special-cases the -1
divisor to simply return 0.
2017-09-29 08:53:49 -07:00
Jakob Stoklund Olesen
2888ff5bf3 Fix a corner case in fcvt_to_sint.i32.f64 legalization.
An f64 can represent multiple values in the range INT_MIN-1 < x <=
INT_MIN which all truncate to INT_MIN, so comparing the input value
against INT_MIN is not good enough.

Instead, detect overflow on x <= INT_MIN-1 when INT_MIN-1 is an exact
floating point value.
2017-09-28 14:24:39 -07:00
Jakob Stoklund Olesen
8abcdac5a1 Legalize fcvt_to_sint and fcvt_to_uint for Intel64.
We need to generate traps on NaN and overflow.
2017-09-28 12:00:38 -07:00
Jakob Stoklund Olesen
34146435e5 Legalize unsigned-to-float conversions for Intel 64.
Also make sure we generate type checks for the controlling type variable
in legalization patterns. This is not needed for encodings since the
encoding tables are already keyed on the controlling type variable.
2017-09-28 11:39:19 -07:00
Jakob Stoklund Olesen
84471a8431 Add some very basic support for the Intel32 ABI.
In 32-bit mode, all function arguments are passed on the stack, not in
registers.

This ABI support is not complete or properly tested, but at least it
doesn't try to pass arguments in r8.
2017-09-27 12:55:34 -07:00
Jakob Stoklund Olesen
b6b474a8c9 Add Intel legalization for fmin and fmax.
The native x86_fmin and x86_fmax instructions don't behave correctly for
NaN inputs and when comparing +0.0 to -0.0, so we need separate branches
for those cases.
2017-09-27 12:55:34 -07:00