Commit Graph

1601 Commits

Author SHA1 Message Date
Benjamin Bouvier
79cff73da5 machinst x64: implement loads/stores for v128 SIMD types;
This made it possible to enable more SIMD tests from the spec test suite
too.
2020-09-23 16:42:03 +02:00
Jakub Krauz
f6a140a662 arm32 codegen
This commit adds arm32 code generation for some IR insts.
Floating-point instructions are not supported, because regalloc
does not allow to represent overlapping register classes,
which are needed by VFP/Neon.

There is also no support for big-endianness, I64 and I128 types.
2020-09-22 12:49:42 +02:00
Chris Fallin
1c7fa7f785 Merge pull request #2181 from jgouly/madd-opt
arm64: Combine mul + add into madd
2020-09-15 11:52:33 -07:00
Joshua Nelson
d28abad441 Upgrade to target-lexicon 0.11
This allows downstream library users to use `CDataModel` without having
to install two different versions of target-lexicon.
2020-09-15 11:40:09 -07:00
Nick Fitzgerald
e1c8878b33 cranelift_codegen::souper_harvest: Move preopt out of Context, into clif-util
This allows for more flexibility of when/where to harvest LHS candidates. For
example, we could choose to harvest candidates that overlap with and supercede
our current preopt peepholes.

This commit also makes sure that we compute the CFG before running preopt, when
harvesting LHS candidates via `clif-util souper-harvest`.
2020-09-14 16:27:47 -07:00
Nick Fitzgerald
c87aaeeece cranelift_codegen::souper_harvest: Update TODOs to include more instructions 2020-09-14 16:27:47 -07:00
Nick Fitzgerald
b2acec1164 Harvest integer comparisons into Souper left-hand side candidates 2020-09-14 16:27:47 -07:00
Nick Fitzgerald
5a87171121 Do not use the matches! macro so we work with older rustc versions 2020-09-14 16:27:47 -07:00
Nick Fitzgerald
3a6dd832c0 Harvest left-hand side superoptimization candidates.
Given a clif function, harvest all its integer subexpressions, so that they can
be fed into [Souper](https://github.com/google/souper) as candidates for
superoptimization. For some of these candidates, Souper will successfully
synthesize a right-hand side that is equivalent but has lower cost than the
left-hand side. Then, we can combine these left- and right-hand sides into a
complete optimization, and add it to our peephole passes.

To harvest the expression that produced a given value `x`, we do a post-order
traversal of the dataflow graph starting from `x`. As we do this traversal, we
maintain a map from clif values to their translated Souper values. We stop
traversing when we reach anything that can't be translated into Souper IR: a
memory load, a float-to-int conversion, a block parameter, etc. For values
produced by these instructions, we create a Souper `var`, which is an input
variable to the optimization. For instructions that have a direct mapping into
Souper IR, we get the Souper version of each of its operands and then create the
Souper version of the instruction itself. It should now be clear why we do a
post-order traversal: we need an instruction's translated operands in order to
translate the instruction itself. Once this instruction is translated, we update
the clif-to-souper map with this new translation so that any other instruction
that uses this result as an operand has access to the translated value. When the
traversal is complete we return the translation of `x` as the root of left-hand
side candidate.
2020-09-14 16:27:47 -07:00
Johnnie Birch
07d0d32b69 Adds i64x2.mul for the new backend targeting x64 2020-09-11 13:17:42 -07:00
Joey Gouly
22369cfa0d arm64: Combine mul + add into madd
Copyright (c) 2020, Arm Limited.
2020-09-11 18:06:19 +01:00
Benjamin Bouvier
3849dc18b1 machinst x64: revamp integer immediate emission;
In particular:

- try to optimize the integer emission into a 32-bit emission, when the
high bits are all zero, and stop relying on the caller of `imm_r` to
ensure this.
- rename `Inst::imm_r`/`Inst::Imm_R` to `Inst::imm`/`Inst::Imm`.
- generate a sign-extending mov 32-bit immediate to 64-bits, whenever
possible.
- fix a few places where the previous commit did introduce the
generation of zero-constants with xor, when calling `put_input_to_reg`,
thus clobbering the flags before they were read.
2020-09-11 18:13:30 +02:00
Benjamin Bouvier
d9052d0a9c machinst x64: generate copies of constants during lowering; 2020-09-11 17:41:44 +02:00
Benjamin Bouvier
cace32746f machinst x64: pattern-match addresses that are base+cst index; 2020-09-11 17:41:44 +02:00
Benjamin Bouvier
a1bdf11602 machinst x64: fix gen_store_base_offset for multi-value returns;
The previous method assumed that this could be used only for I64 values,
but this is actually used for multi-value returns, which can have any
type.
2020-09-10 11:17:41 +02:00
Chris Fallin
bd3ba0a774 Merge pull request #2189 from bnjbvr/x64-refactor-sub
machinst x64: a few small refactorings/renamings
2020-09-09 12:40:59 -07:00
Benjamin Bouvier
b4a2dd37a4 machinst x64: rename input_to_reg to put_input_to_reg;
Eventually, we should be able to unify this function's implementation
with the aarch64 one; but the latter does much more, and this would
require abstractions brought up in another pending PR#2142.
2020-09-09 18:03:59 +02:00
Benjamin Bouvier
cb96d16ac7 machinst x64: inline helper used only once; 2020-09-09 18:03:59 +02:00
Benjamin Bouvier
7a833f442a machinst: common up some instruction data helpers; 2020-09-09 18:03:59 +02:00
Benjamin Bouvier
a835c247c0 machinst: make get_output_reg target independent; 2020-09-09 18:03:59 +02:00
Benjamin Bouvier
6a3c4fb54e machinst x64: rename output_to_reg to get_output_reg; 2020-09-09 18:03:59 +02:00
Benjamin Bouvier
9620ce6bdf machinst x64: mask shift count too; 2020-09-09 18:03:59 +02:00
Benjamin Bouvier
9c328cc64b machinst x64: Remove unfinished comment; 2020-09-09 18:03:59 +02:00
Anton Kirilov
f612e8e7b2 AArch64: Add various missing SIMD bits
In addition, improve the code for stack pointer manipulation.

Copyright (c) 2020, Arm Limited.
2020-09-09 13:37:50 +01:00
Chris Fallin
e8f772c1ac x64 new backend: port ABI implementation to shared infrastructure with AArch64.
Previously, in #2128, we factored out a common "vanilla 64-bit ABI"
implementation from the AArch64 ABI code, with the idea that this should
be largely compatible with x64. This PR alters the new x64 backend to
make use of the shared infrastructure, removing the duplication that
existed previously. The generated code is nearly (not exactly) the same;
the only difference relates to how the clobber-save region is padded in
the prologue.

This also changes some register allocations in the aarch64 code because
call support in the shared ABI infra now passes a temp vreg in, rather
than requiring use of a fixed, non-allocable temp; tests have been
updated, and the runtime behavior is unchanged.
2020-09-08 17:59:01 -07:00
Chris Fallin
3d6c4d312f Merge pull request #2187 from akirilov-arm/ALUOp3
AArch64: Introduce an enum for ternary integer operations
2020-09-08 12:57:59 -07:00
Chris Fallin
e913bcb26a Merge pull request #2179 from jgouly/mvn
arm64: Don't always materialise a 64-bit constant
2020-09-08 09:17:08 -07:00
bjorn3
9428480230 Merge SignExtendAlAh and SignExtendRaxRdx 2020-09-08 15:00:24 +02:00
bjorn3
3dcda164dc Fix nits 2020-09-08 15:00:24 +02:00
bjorn3
9999913a31 Fix sign extension
Co-authored-by: Max Graey <maxgraey@gmail.com>
2020-09-08 15:00:24 +02:00
bjorn3
067255ef45 x64: Implement rotl and rotr for small integers 2020-09-08 15:00:24 +02:00
bjorn3
4251a950ba x64: Implement ishl, ushr and sshr for small integers 2020-09-08 15:00:24 +02:00
bjorn3
cc35f1e9bb x64: Misc small integer fixes 2020-09-08 15:00:24 +02:00
bjorn3
ce033f2a0c x64: Fix udiv and sdiv for 8bit integers 2020-09-08 15:00:24 +02:00
bjorn3
74642b166f x64: Implement ineg and bnot 2020-09-08 15:00:24 +02:00
Anton Kirilov
e92f949663 AArch64: Introduce an enum for ternary integer operations
This commit performs a small cleanup in the AArch64 backend - after
the MAdd and MSub variants have been extracted, the ALUOp enum is
used purely for binary integer operations.

Also, Inst::Mov has been renamed to Inst::Mov64 for consistency.

Copyright (c) 2020, Arm Limited.
2020-09-08 13:22:22 +01:00
Johnnie Birch
a64af55cda Adds x64 packed negation for the new backend 2020-09-07 11:56:05 -07:00
Joey Gouly
650d48cd84 arm64: Don't always materialise a 64-bit constant
This improves the mov/movk/movn sequnce when the high half of the
64-bit value is all zero.

Copyright (c) 2020, Arm Limited.
2020-09-01 13:29:01 +01:00
Benjamin Bouvier
a7f7c23bf9 machinst aarch64: in baldrdash, allow returning only one value across register classes;
Baldrdash's API requires that there is at most one result in a register,
across all the possible register classes: in particular, it's not
possible to return an i64 value in a register while returning an v128
value in another register.

This patch adds a notion of "remaining register values", so this is
properly taking into account when choosing whether a return value may be
put into a register or not.
2020-08-31 12:36:26 +02:00
Julian Seward
8ac4bd1d0d CL/newBE/x64: Lowering of scalar shifts: fix shift-by-imm generation
The logic for generation of shifts-by-immediate was not quite right.  The result was that even
shifts by an amount known at compile time were being done by moving the shift immediate into %cl
and then doing a variable shift by %cl.  The effect is worse than it sounds, because all of
those shift constants are small and often used in multiple places, so they were GVN'd up and
often ended up at the entry block of the function.  Hence these were connected to the use points
by long live ranges which got spilled.  So all in all, most of the win here comes from avoiding
spilling.

The problem was caused by this line, in the `Opcode::Ishl | Opcode::Ushr ..` case:
```
   let (count, rhs) = if let Some(cst) = ctx.get_constant(inputs[1].insn) {
```
`inputs[]` appears to refer to this CLIF instruction's inputs, and bizarrely `inputs[].insn` all
refer to the instruction (the shift) itself.  Hence `ctx.get_constant(inputs[1].insn)` asks
"does this shift instruction produce a constant" to which the answer is always "no", so the
shift-by-unknown amount code is always generated.  The fix here is to change that expression to
```
   let (count, rhs) = if let Some(cst) = ctx.get_input(insn, 1).constant {
```
`get_input`'s result conveniently includes a `constant` field of type `Option<u64>`, so we just
use that instead.
2020-08-27 11:48:35 +02:00
Benjamin Bouvier
7c85654285 Address review comments. 2020-08-24 17:00:30 +02:00
Benjamin Bouvier
ee76e01efc machinst: fix the pinned reg hack;
The pinned register hack didn't work because the GetPinnedReg is marked
as having side-effects, so that GVN wouldn't try to common it out.

This commit tweaks the function used during lowering to vcode, so that
the GetPinnedReg opcode is specially handled. It's a bit lame, but it
makes the hack work again.

Also, use_input needs to be a no-op for real registers.
2020-08-24 17:00:30 +02:00
Benjamin Bouvier
efff43e769 machinst x64: fold address modes on loads/stores; 2020-08-24 17:00:30 +02:00
Benjamin Bouvier
b830ee79de machinst x64: commute operands of integer operations if one input is an immediate; 2020-08-24 17:00:30 +02:00
Benjamin Bouvier
cca10b87cb machinst x64: optimize select/brz/brnz when the input is a comparison; 2020-08-24 17:00:30 +02:00
Julian Seward
620e4b4e82 This patch fills in the missing pieces needed to support wasm atomics on newBE/x64.
It does this by providing an implementation of the CLIF instructions `AtomicRmw`, `AtomicCas`,
`AtomicLoad`, `AtomicStore` and `Fence`.

The translation is straightforward.  `AtomicCas` is translated into x64 `cmpxchg`, `AtomicLoad`
becomes a normal load because x64-TSO provides adequate sequencing, `AtomicStore` becomes a
normal store followed by `mfence`, and `Fence` becomes `mfence`.  `AtomicRmw` is the only
complex case: it becomes a normal load, followed by a loop which computes an updated value,
tries to `cmpxchg` it back to memory, and repeats if necessary.

This is a minimum-effort initial implementation.  `AtomicRmw` could be implemented more
efficiently using LOCK-prefixed integer read-modify-write instructions in the case where the old
value in memory is not required.  Subsequent work could add that, if required.

The x64 emitter has been updated to emit the new instructions, obviously.  The `LegacyPrefix`
mechanism has been revised to handle multiple prefix bytes, not just one, since it is now
sometimes necessary to emit both 0x66 (Operand Size Override) and F0 (Lock).

In the aarch64 implementation of atomics, there has been some minor renaming for the sake of
clarity, and for consistency with this x64 implementation.
2020-08-24 11:50:06 +02:00
Anton Kirilov
b895ac0e40 AArch64: Implement SIMD conversions
Copyright (c) 2020, Arm Limited.
2020-08-21 18:03:50 +01:00
Andrew Brown
2767b2efc6 machinst x64: add Inst::[move|load|store] for choosing the correct x86 instruction
This change primarily adds the ability to lower packed `[move|load|store]` instructions (the vector types were previously unimplemented), but with the addition of the utility `Inst::[move|load|store]` functions it became possible to remove duplicated code (e.g. `stack_load` and `stack_store`) and use these utility functions elsewhere (though not exhaustively).
2020-08-20 12:37:22 -07:00
Andrew Brown
cf598dc35b machinst x64: add packed moves for different vector types 2020-08-20 12:37:22 -07:00
Chris Fallin
debacec1c5 Merge pull request #2150 from jgouly/mul64s
arm64: Implement SIMD i64x2 multiply
2020-08-20 11:57:56 -07:00