Commit Graph

31 Commits

Author SHA1 Message Date
Yury Delendik
de4af90af6 machinst x64: New backend unwind (#2266)
Addresses unwind for experimental x64 backend. The preliminary code enables backtrace on SystemV call convension.
2020-10-23 15:19:41 -05:00
Nick Fitzgerald
e1c8878b33 cranelift_codegen::souper_harvest: Move preopt out of Context, into clif-util
This allows for more flexibility of when/where to harvest LHS candidates. For
example, we could choose to harvest candidates that overlap with and supercede
our current preopt peepholes.

This commit also makes sure that we compute the CFG before running preopt, when
harvesting LHS candidates via `clif-util souper-harvest`.
2020-09-14 16:27:47 -07:00
Nick Fitzgerald
3a6dd832c0 Harvest left-hand side superoptimization candidates.
Given a clif function, harvest all its integer subexpressions, so that they can
be fed into [Souper](https://github.com/google/souper) as candidates for
superoptimization. For some of these candidates, Souper will successfully
synthesize a right-hand side that is equivalent but has lower cost than the
left-hand side. Then, we can combine these left- and right-hand sides into a
complete optimization, and add it to our peephole passes.

To harvest the expression that produced a given value `x`, we do a post-order
traversal of the dataflow graph starting from `x`. As we do this traversal, we
maintain a map from clif values to their translated Souper values. We stop
traversing when we reach anything that can't be translated into Souper IR: a
memory load, a float-to-int conversion, a block parameter, etc. For values
produced by these instructions, we create a Souper `var`, which is an input
variable to the optimization. For instructions that have a direct mapping into
Souper IR, we get the Souper version of each of its operands and then create the
Souper version of the instruction itself. It should now be clear why we do a
post-order traversal: we need an instruction's translated operands in order to
translate the instruction itself. Once this instruction is translated, we update
the clif-to-souper map with this new translation so that any other instruction
that uses this result as an operand has access to the translated value. When the
traversal is complete we return the translation of `x` as the root of left-hand
side candidate.
2020-09-14 16:27:47 -07:00
Nick Fitzgerald
05bf9ea3f3 Rename "Stackmap" to "StackMap"
And "stackmap" to "stack_map".

This commit is purely mechanical.
2020-08-07 10:08:44 -07:00
Chris Fallin
72e6be9342 Rework of MachInst isel, branch fixups and lowering, and block ordering.
This patch includes:

- A complete rework of the way that CLIF blocks and edge blocks are
  lowered into VCode blocks. The new mechanism in `BlockLoweringOrder`
  computes RPO over the CFG, but with a twist: it merges edge blocks intto
  heads or tails of original CLIF blocks wherever possible, and it does
  this without ever actually materializing the full nodes-plus-edges
  graph first. The backend driver lowers blocks in final order so
  there's no need to reshuffle later.

- A new `MachBuffer` that replaces the `MachSection`. This is a special
  version of a code-sink that is far more than a humble `Vec<u8>`. In
  particular, it keeps a record of label definitions and label uses,
  with a machine-pluggable `LabelUse` trait that defines various types
  of fixups (basically internal relocations).

  Importantly, it implements some simple peephole-style branch rewrites
  *inline in the emission pass*, without any separate traversals over
  the code to use fallthroughs, swap taken/not-taken arms, etc. It
  tracks branches at the tail of the buffer and can (i) remove blocks
  that are just unconditional branches (by redirecting the label), (ii)
  understand a conditional/unconditional pair and swap the conditional
  polarity when it's helpful; and (iii) remove branches that branch to
  the fallthrough PC.

  The `MachBuffer` also implements branch-island support. On
  architectures like AArch64, this is needed to allow conditional
  branches within plausibly-attainable ranges (+/- 1MB on AArch64
  specifically). It also does this inline while streaming through the
  emission, without any sort of fixpoint algorithm or later moving of
  code, by simply tracking outstanding references and "deadlines" and
  emitting an island just-in-time when we're in danger of going out of
  range.

- A rework of the instruction selector driver. This is largely following
  the same algorithm as before, but is cleaned up significantly, in
  particular in the API: the machine backend can ask for an input arg
  and get any of three forms (constant, register, producing
  instruction), indicating it needs the register or can merge the
  constant or producing instruction as appropriate. This new driver
  takes special care to emit constants right at use-sites (and at phi
  inputs), minimizing their live-ranges, and also special-cases the
  "pinned register" to avoid superfluous moves.

Overall, on `bz2.wasm`, the results are:

    wasmtime full run (compile + runtime) of bz2:

    baseline:   9774M insns, 9742M cycles, 3.918s
    w/ changes: 7012M insns, 6888M cycles, 2.958s  (24.5% faster, 28.3% fewer insns)

    clif-util wasm compile bz2:

    baseline:   2633M insns, 3278M cycles, 1.034s
    w/ changes: 2366M insns, 2920M cycles, 0.923s  (10.7% faster, 10.1% fewer insns)

    All numbers are averages of two runs on an Ampere eMAG.
2020-05-16 23:08:22 -07:00
Julian Seward
0bc0503f3f Add a transformation pass which removes phi nodes to which it can demonstrate
that only one value ever flows.  Has been observed to improve generated code
run times by up to 8%.  Compilation cost increases by about 0.6%, but up to 7%
total cost has been observed to be saved; iow it can be a significant win in
terms of compilation time, overall.
2020-05-08 09:41:16 +02:00
Benjamin Bouvier
d50e956974 Remove an unnecessary mutable qualifier when passing a function to compile; 2020-04-21 12:12:56 +02:00
Peter Huene
f7e9f86ba9 Refactor unwind generation in Cranelift.
This commit makes the following changes to unwind information generation in
Cranelift:

* Remove frame layout change implementation in favor of processing the prologue
  and epilogue instructions when unwind information is requested.  This also
  means this work is no longer performed for Windows, which didn't utilize it.
  It also helps simplify the prologue and epilogue generation code.

* Remove the unwind sink implementation that required each unwind information
  to be represented in final form. For FDEs, this meant writing a
  complete frame table per function, which wastes 20 bytes or so for each
  function with duplicate CIEs.  This also enables Cranelift users to collect the
  unwind information and write it as a single frame table.

* For System V calling convention, the unwind information is no longer stored
  in code memory (it's only a requirement for Windows ABI to do so).  This allows
  for more compact code memory for modules with a lot of functions.

* Deletes some duplicate code relating to frame table generation.  Users can
  now simply use gimli to create a frame table from each function's unwind
  information.

Fixes #1181.
2020-04-16 11:15:32 -07:00
Chris Fallin
7da6101732 Merge pull request #1494 from cfallin/arm64-merge
Add new `MachInst` backend and ARM64 support.
2020-04-16 10:02:02 -07:00
Chris Fallin
48cf2c2f50 Address review comments:
- Undo temporary changes to default features (`all-arch`) and a
  signal-handler test.
- Remove `SIGTRAP` handler: no longer needed now that we've found an
  "undefined opcode" option on ARM64.
- Rename pp.rs to pretty_print.rs in machinst/.
- Only use empty stack-probe on non-x86. As per a comment in
  rust-lang/compiler-builtins [1], LLVM only supports stack probes on
  x86 and x86-64. Thus, on any other CPU architecture, we cannot refer
  to `__rust_probestack`, because it does not exist.
- Rename arm64 to aarch64.
- Use `target` directive in vcode filetests.
- Run the flags verifier, but without encinfo, when using new backends.
- Clean up warning overrides.
- Fix up use of casts: use u32::from(x) and siblings when possible,
  u32::try_from(x).unwrap() when not, to avoid silent truncation.
- Take immutable `Function` borrows as input; we don't actually
  mutate the input IR.
- Lots of other miscellaneous cleanups.

[1] cae3e6ea23/src/probestack.rs (L39)
2020-04-15 17:21:28 -07:00
Chris Fallin
60990aeaae ARM64 backend, part 8 / 11: integration.
This patch ties together the new backend infrastructure with the
existing Cranelift codegen APIs.

With all patches in this series up to this patch applied, the ARM64
compiler is now functional and can be used. Two uses of this
functionality -- filecheck-based tests and integration into wasmtime --
will come in subsequent patches.
2020-04-11 17:52:37 -07:00
iximeow
4cca510085 Windows FPRs preservation (#1216)
Preserve FPRs as required by the Windows fastcall calling convention.

This exposes an implementation limit due to Cranelift's approach to stack layout, which conflicts with expectations Windows makes in SEH layout - functions where the Cranelift user desires fastcall unwind information, that require preservation of an ABI-reserved FPR, that have a stack frame 240 bytes or larger, now produce an error when compiled. Several wasm spectests were disabled because they would trip this limit. This is a temporary constraint that should be fixed promptly.

Co-authored-by: bjorn3 <bjorn3@users.noreply.github.com>
2020-04-10 13:27:20 -07:00
Yury Delendik
bd88155483 Refactor unwind; add FDE support. (#1320)
* Refactor unwind

* add FDE support

* use sink directly in emit functions

* pref off all unwinding generation with feature
2020-01-13 10:32:55 -06:00
llogiq
0d8f8bc71f Fix some clippy warnings (#1277) 2019-12-07 09:47:43 -08:00
Peter Huene
8923bac7e8 Implement emitting Windows unwind information for fastcall functions. (#1155)
* Implement emitting Windows unwind information for fastcall functions.

This commit implements emitting Windows unwind information for x64 fastcall
calling convention functions.

The unwind information can be used to construct a Windows function table at
runtime for JIT'd code, enabling stack walking and unwinding by the operating
system.

* Address code review feedback.

This commit addresses code review feedback:

* Remove unnecessary unsafe code.
* Emit the unwind information always as little endian.
* Fix comments.

A dependency from cranelift-codegen to the byteorder crate was added.
The byteorder crate is a no-dependencies crate with a reasonable
abstraction for writing binary data for a specific endianness.

* Address code review feedback.

* Disable default features for the `byteorder` crate.
* Add a comment regarding the Windows ABI unwind code numerical values.
* Panic if we encounter a Windows function with a prologue greater than 256
  bytes in size.
2019-11-05 13:14:30 -08:00
bjorn3
bb8fa40ef0 Rustfmt 2019-10-02 11:50:44 -07:00
bjorn3
10e226f9ff Always use extern crate std in cranelift-codegen 2019-10-02 11:50:44 -07:00
julian-seward1
9e088e4164 Reorganise optimisation level settings, and make the insn shrink pass optional (#1044)
This patch:

* removes the "default" opt level, on the basis that it has no definition and
  is referred to nowhere in the compiler.

* renames the "fastest" level to "none".  The resulting set of transformations
  is unchanged.

* renames the "best" level to "speed_and_size".  The resulting set of
  transformations is unchanged.

* adds a new level, "speed".  This is the same as "speed_and_size" except that
  it omits transformations aimed only at reducing code size.  Currently it
  omits only the insn shrinking pass.
2019-09-19 18:51:25 +02:00
Benjamin Bouvier
cad2074587 [codegen] Don't simplify an operation if it would result in unnecessary legalization;
Converting something like iadd.i64 on a 32-bits architecture into a
iadd_imm.i64 will result in the instruction being legalized back to an
iadd.i64 later on, creating unnecessary churn.

This commit implements avoid doing so, and changes the target ISA to a
64-bits platform for tests than ran into this, as well as making sure
this won't happen on 32-bits platforms.
2019-09-10 19:50:34 +02:00
Andrew Brown
7f152611a4 Log compiled and legalized functions 2019-09-10 10:45:12 -07:00
Nicolas B. Pierron
7e398af999 Basic-block: Insert regmove instructions in new blocks dedicated to hold the diversions. 2019-09-05 14:55:35 +02:00
Nicolas B. Pierron
381578311c Split edges to have a block to add regmove & copy instructions.
When using basic block instructions cannot be added in-between jump instructions which are ending basic blocks. These changes create extra basic blocks such that extra space is available for the spilling and moving registers where they are expected.
2019-08-30 18:44:35 +02:00
julian-seward1
b8fb52446c Cranelift: implement redundant fill removal on tree-shaped CFG regions. Mozilla bug 1570584. (#906) 2019-08-25 19:37:34 +02:00
Carmen Kwan
19257f80c1 Add reference types R32 and R64
-Add resumable_trap, safepoint, isnull, and null instructions
-Add Stackmap struct and StackmapSink trait

Co-authored-by: Mir Ahmed <mirahmed753@gmail.com>
Co-authored-by: Dan Gohman <sunfish@mozilla.com>
2019-08-16 11:35:16 -07:00
Sean Stangl
c7b4b98cac Add a fold_redundant_jumps() pass to the branch relaxation phase. (#887) 2019-08-09 15:30:11 -06:00
Benjamin Bouvier
d7d48d5cc6 Add the dyn keyword before trait objects; 2019-06-24 11:42:26 +02:00
Lars T Hansen
420850adf0 Record information about sections of emitted code+data.
The result of the emitter is a vector of bytes holding machine code,
jump tables, and (in the future) other read-only data.  Some clients,
notably Firefox's Wasm compiler, needs to separate the machine code
from the data in order to insert more code directly after the code
generated by Cranelift.

To make such separation possible, we record more information about the
emitted bytes: the sizes of each of the sections of code, jump tables,
and read-only data, as well as the locations within the code that
reference (PC-relatively) the jump tables and read-only data.
2019-05-31 08:39:57 +02:00
Yury Delendik
8f95c51730 Reconstruct locations of the original source variable 2019-05-09 00:35:44 -07:00
Tyler McMullen
947130be81 Add branch order optimization to simple_preopt to encourage fallthroughs. 2019-04-17 13:15:41 +02:00
Lars T Hansen
141ccb9e9d Add a --disasm option to clif-util wasm and compile (#713)
- Both the `wasm` and `compile` commands get this new subcommand, and it defaults to false.  This means that test runs with `wasm` can request disassembly (the main reason I am doing this) while test runs with `compile` now must request it, this changes current behavior.
- Switch to using context.compile_and_emit directly, and make the reloc and trap printers just accumulate output, not print it.  This allows us to factor the printing code into the disasm module.
2019-03-27 12:57:13 +01:00
lazypassion
747ad3c4c5 moved crates in lib/ to src/, renamed crates, modified some files' text (#660)
moved crates in lib/ to src/, renamed crates, modified some files' text (#660)
2019-01-28 15:56:54 -08:00