Commit Graph

210 Commits

Author SHA1 Message Date
Chris Fallin
c84d6be6f4 Detailed debug-info (DWARF) support in new backends (initially x64).
This PR propagates "value labels" all the way from CLIF to DWARF
metadata on the emitted machine code. The key idea is as follows:

- Translate value-label metadata on the input into "value_label"
  pseudo-instructions when lowering into VCode. These
  pseudo-instructions take a register as input, denote a value label,
  and semantically are like a "move into value label" -- i.e., they
  update the current value (as seen by debugging tools) of the given
  local. These pseudo-instructions emit no machine code.

- Perform a dataflow analysis *at the machine-code level*, tracking
  value-labels that propagate into registers and into [SP+constant]
  stack storage. This is a forward dataflow fixpoint analysis where each
  storage location can contain a *set* of value labels, and each value
  label can reside in a *set* of storage locations. (Meet function is
  pairwise intersection by storage location.)

  This analysis traces value labels symbolically through loads and
  stores and reg-to-reg moves, so it will naturally handle spills and
  reloads without knowing anything special about them.

- When this analysis converges, we have, at each machine-code offset, a
  mapping from value labels to some number of storage locations; for
  each offset for each label, we choose the best location (prefer
  registers). Note that we can choose any location, as the symbolic
  dataflow analysis is sound and guarantees that the value at the
  value_label instruction propagates to all of the named locations.

- Then we can convert this mapping into a format that the DWARF
  generation code (wasmtime's debug crate) can use.

This PR also adds the new-backend variant to the gdb tests on CI.
2021-01-21 15:59:49 -08:00
bjorn3
81d248c057 Implement Mach-O TLS access for x64 newBE 2021-01-21 18:25:56 +01:00
Chris Fallin
0f563f786a Add ELF TLS support in new x64 backend.
This follows the implementation in the legacy x86 backend, including
hardcoded sequence that is compatible with what the linker expects. We
could potentially do better here, but it is likely not necessary.

Thanks to @bjorn3 for a bugfix to an earlier version of this.
2021-01-17 22:48:51 -08:00
Chris Fallin
71ead6e31d x64 backend: implement 128-bit ops and misc fixes.
This implements all of the ops on I128 that are implemented by the
legacy x86 backend, and includes all that are required by at least one
major use-case (cg_clif rustc backend).

The sequences are open-coded where necessary; for e.g. the bit
operations, this can be somewhat complex, but these sequences have been
tested carefully. This PR also includes a drive-by fix of clz/ctz for 8-
and 16-bit cases where they were incorrect previously.

Also includes ridealong fixes developed while bringing up cg_clif
support, because they are difficult to completely separate due to
other refactors that occurred in this PR:

- fix REX prefix logic for some 8-bit instructions.

  When using an 8-bit register in 64-bit mode on x86-64, the REX prefix
  semantics are somewhat subtle: without the REX prefix, register numbers
  4--7 correspond to the second-to-lowest byte of the first four registers
  (AH, CH, BH, DH), whereas with the REX prefix, these register numbers
  correspond to the usual encoding (SPL, BPL, SIL, DIL). We could always
  emit a REX byte for instructions with 8-bit cases (this is harmless even
  if unneeded), but this would unnecessarily inflate code size; instead,
  the usual approach is to emit it only for these registers.

  This logic was present in some cases but missing for some other
  instructions: divide, not, negate, shifts.

  Fixes #2508.

- avoid unaligned SSE loads on some f64 ops.

  The implementations of several FP ops, such as fabs/fneg, used SSE
  instructions. This is not a problem per-se, except that load-op merging
  did not take *alignment* into account. Specifically, if an op on an f64
  loaded from memory happened to merge that load, and the instruction into
  which it was merged was an SSE instruction, then the SSE instruction
  imposes stricter (128-bit) alignment requirements than the load.f64 did.

  This PR simply forces any instruction lowerings that could use SSE
  instructions to implement non-SIMD operations to take inputs in
  registers only, and avoid load-op merging.

  Fixes #2507.

- two bugfixes exposed by cg_clif: urem/srem.i8, select.b1.

  - urem/srem.i8: the 8-bit form of the DIV instruction on x86-64 places
    the remainder in AH, not RDX, different from all the other width-forms
    of this instruction.

  - select.b1: we were not recognizing selects of boolean values as
    integer-typed operations, so we were generating XMM moves instead (!).
2021-01-14 13:45:50 -08:00
Chris Fallin
81bc811236 Merge pull request #2558 from cfallin/pic-symbol-refs
x64: support PC-rel symbol references using the GOT when in PIC mode.
2021-01-08 10:03:10 -08:00
Chris Fallin
3ee898cb2c x64: support PC-rel symbol references using the GOT when in PIC mode. 2021-01-07 22:46:56 -08:00
Chris Fallin
dbd2241b60 x64: handle tests of b1 values correctly (only LSB is defined).
Previously, `select` and `brz`/`brnz` instructions, when given a `b1`
boolean argument, would test whether that boolean argument was nonzero,
rather than whether its LSB was nonzero. Since our invariant for mapping
CLIF state to machine state is that bits beyond the width of a value are
undefined, the proper lowering is to test only the LSB.

(aarch64 does not have the same issue because its `Extend` pseudoinst
already properly handles masking of b1 values when a zero-extend is
requested, as it is for select/brz/brnz.)

Found by Nathan Ringo on Zulip [1] (thanks!).

[1]
https://bytecodealliance.zulipchat.com/#narrow/stream/217117-cranelift/topic/bnot.20on.20b1s
2021-01-05 14:45:46 -08:00
Yury Delendik
2964023a77 [SIMD][x86_64] Add encoding for PMADDWD (#2530)
* [SIMD][x86_64] Add encoding for PMADDWD

* also for "experimental_x64"
2020-12-24 07:52:50 -06:00
Johnnie Birch
f705a72aeb Refactor packed moves to use xmm_mov instead of xmm_rm_r
Refactors previous packed move implementation to use xmm_mov
instead of xmm_rm_r which looks to simplify register accounting
during lowering.
2020-12-16 17:13:27 -08:00
Johnnie Birch
51973aefbb Implements x64 SIMD loads for the new backend. 2020-12-16 17:13:27 -08:00
Chris Fallin
2cec20aa57 Merge pull request #2486 from cfallin/fix-probestack
Two Lucet-related fixes to stack overflow handling.
2020-12-07 16:47:37 -08:00
Chris Fallin
3a01d14712 Two Lucet-related fixes to stack overflow handling.
Lucet uses stack probes rather than explicit stack limit checks as
Wasmtime does. In bytecodealliance/lucet#616, I have discovered that I
previously was not running some Lucet runtime tests with the new
backend, so was missing some test failures due to missing pieces in the
new backend.

This PR adds (i) calls to probestack, when enabled, in the prologue of
every function with a stack frame larger than one page (configurable via
flags); and (ii) trap metadata for every instruction on x86-64 that can
access the stack, hence be the first point at which a stack overflow is
detected when the stack pointer is decremented.
2020-12-07 16:08:53 -08:00
Johnnie Birch
a33e755cb2 Adds x86 SIMD support for Ceil, Floor, Trunc, and Nearest 2020-12-02 13:44:51 -08:00
Johnnie Birch
2cc501427e Add remaining X86_64 support for pack w/ signed/unsigned saturation
Adds lowering for packssdw, packusdw, packuswb
2020-11-22 23:14:29 -08:00
Johnnie Birch
124096735b Add support for palignr for X86_64 vcode backend 2020-11-22 22:14:02 -08:00
Johnnie Birch
615a575da1 Add support for x86_64 packed move lowering for the vcode backend 2020-11-22 20:23:00 -08:00
Chris Fallin
073c727a74 x64 and aarch64: carry MemFlags on loads/stores; don't emit trap info unless an op can trap.
This end result was previously enacted by carrying a `SourceLoc` on
every load/store, which was somewhat cumbersome, and only indirectly
encoded metadata about a memory reference (can it trap) by its presence
or absence. We have a type for this -- `MemFlags` -- that tells us
everything we might want to know about a load or store, and we should
plumb it through to code emission instead.

This PR attaches a `MemFlags` to an `Amode` on x64, and puts it on load
and store `Inst` variants on aarch64. These two choices seem to factor
things out in the nicest way: there are relatively few load/store insts
on aarch64 but many addressing modes, while the opposite is true on x64.
2020-11-17 11:43:06 -08:00
Andrew Brown
8ba92853be [machinst x64]: add punpack[hl]bw instructions 2020-11-12 14:21:45 -08:00
Andrew Brown
8131b15921 [machinst x64]: allow addressing of constants 2020-11-12 14:21:45 -08:00
Chris Fallin
4dce51096d MachInst backends: handle SourceLocs out-of-band, not in Insts.
In existing MachInst backends, many instructions -- any that can trap or
result in a relocation -- carry `SourceLoc` values in order to propagate
the location-in-original-source to use to describe resulting traps or
relocation errors.

This is quite tedious, and also error-prone: it is likely that the
necessary plumbing will be missed in some cases, and in any case, it's
unnecessarily verbose.

This PR factors out the `SourceLoc` handling so that it is tracked
during emission as part of the `EmitState`, and plumbed through
automatically by the machine-independent framework. Instruction emission
code that directly emits trap or relocation records can query the
current location as necessary. Then we only need to ensure that memory
references and trap instructions, at their (one) emission point rather
than their (many) lowering/generation points, are wired up correctly.

This does have the side-effect that some loads and stores that do not
correspond directly to user code's heap accesses will have unnecessary
but harmless trap metadata. For example, the load that fetches a code
offset from a jump table will have a 'heap out of bounds' trap record
attached to it; but because it is bounds-checked, and will never
actually trap if the lowering is correct, this should be harmless.  The
simplicity improvement here seemed more worthwhile to me than plumbing
through a "corresponds to user-level load/store" bit, because the latter
is a bit complex when we allow for op merging.

Closes #2290: though it does not implement a full "metadata" scheme as
described in that issue, this seems simpler overall.
2020-11-10 15:46:53 -08:00
Andrew Brown
83f182b390 Implement initial emission of constants
This approach suffers from memory-size bloat during compile time due to the desire to de-duplicate the constants emitted and reduce runtime memory-size. As a first step, though, this provides an end-to-end mechanism for constants to be emitted in the MachBuffer islands.
2020-11-05 14:25:02 -08:00
Andrew Brown
6725b6b129 [machinst x64]: implement bitmask 2020-10-28 15:16:36 -07:00
Johnnie Birch
8bbe6a25a9 Add support for packed float to signed int conversion
Implements i32x4.trunc_sat_f32x4_s
2020-10-28 13:02:50 -07:00
Chris Fallin
c35904a8bf Merge pull request #2278 from akirilov-arm/load_splat
Introduce the Cranelift IR instruction `LoadSplat`
2020-10-28 12:54:03 -07:00
Johnnie Birch
f27c0f3434 Adds support for signed packed integer conversion to float
f32x4.convert_i32x4_s
2020-10-16 14:16:53 -07:00
Andrew Brown
d990dd4c9a [machinst x64]: add source locations to more instruction formats
In order to register traps for `load_splat`, several instruction formats need knowledge of `SourceLoc`s; however, since the x64 backend does not correctly and completely register traps for `RegMem::Mem` variants I opened https://github.com/bytecodealliance/wasmtime/issues/2290 to discuss and resolve this issue. In the meantime, the current behavior (i.e. remaining largely unaware of `SourceLoc`s) is retained.
2020-10-14 09:43:33 -07:00
Andrew Brown
3c55523d40 [machinst x64]: implement packed and, and_not, xor, or 2020-10-09 10:04:50 -07:00
Andrew Brown
c8cce5d2d7 [machinst x64]: enable packed saturated arithmetic 2020-10-08 08:46:20 -07:00
Benjamin Bouvier
116acb8dcd machinst x64: emit nop of variable sizes; 2020-10-08 10:05:57 +02:00
Benjamin Bouvier
a470f1e0cd machinst x64: remove dead code and allow(dead_code) annotation;
The BranchTarget is always used as a label, so just use a plain
MachLabel in this case.
2020-10-08 10:05:57 +02:00
Benjamin Bouvier
e32e6fb612 machinst x64: check SSE requirements for instructions against enabled features; 2020-10-08 09:21:51 +02:00
Benjamin Bouvier
c5bbc87498 machinst: allow passing constant information to the instruction emitter;
A new associated type Info is added to MachInstEmit, which is the
immutable counterpart to State. It can't easily be constructed from an
ABICallee, since it would require adding an associated type to the
latter, and making so leaks the associated type in a lot of places in
the code base and makes the code harder to read. Instead, the EmitInfo
state can simply be passed to the `Vcode::emit` function directly.
2020-10-08 09:21:51 +02:00
Benjamin Bouvier
df8f85f4bc machinst x64: remove non_camel_case_types; 2020-10-05 17:44:31 +02:00
Benjamin Bouvier
4a10a78e33 machinst x64: remove non_snake_case; 2020-10-05 17:44:31 +02:00
Andrew Brown
16a2538ecd [machinst x64]: rename Inst::XmmUninitializedValue and document
This approach is not the best but avoids an extra instruction; perhaps at some point, as mentioned in https://github.com/bytecodealliance/wasmtime/pull/2248, we will add the extra instruction or refactor things in such a way that this `Inst` variant is unnecessary.
2020-10-02 08:29:31 -07:00
Andrew Brown
50b9399006 [machinst x64]: lower remaining lane operations--any_true, all_true, splat 2020-10-02 08:29:31 -07:00
Andrew Brown
0579e9f9de [machinst x64]: add packed OR 2020-10-02 08:29:31 -07:00
Andrew Brown
74226d6781 [machinst x64]: add integer comparisons 2020-10-02 08:29:31 -07:00
Andrew Brown
050f078f86 [machinst x64]: add saturating addition implementation 2020-09-29 08:45:12 -07:00
Andrew Brown
a64abf9b76 [machinst x64]: add shuffle implementation 2020-09-29 08:45:12 -07:00
Andrew Brown
f4836f9ca9 [machinst x64]: add extractlane implementation 2020-09-29 08:45:12 -07:00
Andrew Brown
29fa894790 [machinst x64]: add insertlane implementation 2020-09-29 08:45:12 -07:00
Andrew Brown
ac2bf9d246 [machinst x64]: add packed min/max implementations 2020-09-23 15:40:46 -07:00
Andrew Brown
7546d98844 [machinst x64]: add avg_round implementation 2020-09-23 15:40:46 -07:00
Andrew Brown
b202464fa0 [machinst x64]: add iabs implementation 2020-09-23 15:40:46 -07:00
Benjamin Bouvier
3849dc18b1 machinst x64: revamp integer immediate emission;
In particular:

- try to optimize the integer emission into a 32-bit emission, when the
high bits are all zero, and stop relying on the caller of `imm_r` to
ensure this.
- rename `Inst::imm_r`/`Inst::Imm_R` to `Inst::imm`/`Inst::Imm`.
- generate a sign-extending mov 32-bit immediate to 64-bits, whenever
possible.
- fix a few places where the previous commit did introduce the
generation of zero-constants with xor, when calling `put_input_to_reg`,
thus clobbering the flags before they were read.
2020-09-11 18:13:30 +02:00
bjorn3
9428480230 Merge SignExtendAlAh and SignExtendRaxRdx 2020-09-08 15:00:24 +02:00
bjorn3
067255ef45 x64: Implement rotl and rotr for small integers 2020-09-08 15:00:24 +02:00
bjorn3
ce033f2a0c x64: Fix udiv and sdiv for 8bit integers 2020-09-08 15:00:24 +02:00
bjorn3
74642b166f x64: Implement ineg and bnot 2020-09-08 15:00:24 +02:00