Commit Graph

5 Commits

Author SHA1 Message Date
Chris Fallin
456561f431 x64 and aarch64: allow StructArgument and StructReturn args.
The StructReturn ABI is fairly simple at the codegen/isel level: we only
need to take care to return the sret pointer as one of the return values
if that wasn't specified in the initial function signature.

Struct arguments are a little more complex. A struct argument is stored
as a chunk of memory in the stack-args space. However, the CLIF
semantics are slightly special: on the caller side, the parameter passed
in is a pointer to an arbitrary memory block, and we must memcpy this
data to the on-stack struct-argument; and on the callee side, we provide
a pointer to the passed-in struct-argument as the CLIF block param
value.

This is necessary to support various ABIs other than Wasm, such as that
of Rust (with the cg_clif codegen backend).
2021-01-17 23:11:45 -08:00
Chris Fallin
6eea015d6c Multi-register value support: framework for Values wider than machine regs.
This will allow for support for `I128` values everywhere, and `I64`
values on 32-bit targets (e.g., ARM32 and x86-32). It does not alter the
machine backends to build such support; it just adds the framework for
the MachInst backends to *reason* about a `Value` residing in more than
one register.
2021-01-05 17:45:02 -08:00
Chris Fallin
3c8cb7b908 MachInst lowering logic: allow effectful instructions to merge.
This PR updates the "coloring" scheme that accounts for side-effects in
the MachInst lowering logic. As a result, the new backends will now be
able to merge effectful operations (such as memory loads) *into* other
operations; previously, only the other way (pure ops merged into
effectful ops) was possible. This will allow, for example, a load+ALU-op
combination, as is common on x86. It should even allow a load + ALU-op +
store sequence to merge into one lowered instruction.

The scheme arose from many fruitful discussions with @julian-seward1
(thanks!); significant credit is due to him for the insights here.

The first insight is that given the right basic conditions, i.e.  that
the root instruction is the only use of an effectful instruction's
result, all we need is that the "color" of the effectful instruction is
*one less* than the color of the current instruction. It's easier to
think about colors on the program points between instructions: if the
color coming *out* of the first (effectful def) instruction and *in* to
the second (effectful or effect-free use) instruction are the same, then
they can merge. Basically the color denotes a version of global state;
if the same, then no other effectful ops happened in the meantime.

The second insight is that we can keep state as we scan, tracking the
"current color", and *update* this when we sink (merge) an op. Hence
when we sink a load into another op, we effectively *re-color* every
instruction it moved over; this may allow further sinks.

Consider the example (and assume that we consider loads effectful in
order to conservatively ensure a strong memory model; otherwise, replace
with other effectful value-producing insts):

```
  v0 = load x
  v1 = load y
  v2 = add v0, 1
  v3 = add v1, 1
```

Scanning from bottom to top, we first see the add producing `v3` and we
can sink the load producing `v1` into it, producing a load + ALU-op
machine instruction. This is legal because `v1` moves over only `v2`,
which is a pure instruction. Consider, though, `v2`: under a simple
scheme that has no other context, `v0` could not sink to `v2` because it
would move over `v1`, another load. But because we already sunk `v1`
down to `v3`, we are free to sink `v0` to `v2`; the update of the
"current color" during the scan allows this.

This PR also cleans up the `LowerCtx` interface a bit at the same time:
whereas previously it always gave some subset of (constant, mergeable
inst, register) directly from `LowerCtx::get_input()`, it now returns
zero or more of (constant, mergable inst) from
`LowerCtx::maybe_get_input_as_source_or_const()`, and returns the
register only from `LowerCtx::put_input_in_reg()`. This removes the need
to explicitly denote uses of the register, so it's a little safer.

Note that this PR does not actually make use of the new ability to merge
loads into other ops; that will come in future PRs, especially to
optimize the `x64` backend by using direct-memory operands.
2020-11-16 14:53:45 -08:00
Chris Fallin
5df8840483 Add support for brff/brif and icmp_sp to new x64 backend to support Lucet.
`lucetc` currently *almost*, but not quite, works with the new x64
backend; the only missing piece is support for the particular
instructions emitted as part of its prologue stack-check.

We do not normally see `brff`, `brif`, or `ifcmp_sp` in CLIF generated by
`cranelift-wasm` without the old-backend legalization rules, so these
were not supported in the new x64 backend as they were not necessary for
Wasm MVP support. Using them resulted in an `unimplemented!()` panic.

This PR adds support for `brff` and `brif` analogously to how AArch64
implements them, by pattern-matching the `ifcmp` / `ffcmp` directly.
Then `ifcmp_sp` is a straightforward variant of `ifcmp`.

Along the way, this also removes the notion of "fallthrough block" from
the branch-group lowering method; instead, `fallthrough` instructions
are handled as normal branches to their explicitly-provided targets,
which (in the original CLIF) match the fallthrough block. The reason for
this is that the block reordering done as part of lowering can change
the fallthrough block. We were not using `fallthrough` instructions in
the output produced by `cranelift-wasm`, so this, too, was not
previously caught.

With these changes, the `lucetc` crate in Lucet passes all tests with
the `x64` feature-flag added to its `cranelift-codegen` dependency.
2020-11-11 13:43:39 -08:00
Jakub Krauz
f6a140a662 arm32 codegen
This commit adds arm32 code generation for some IR insts.
Floating-point instructions are not supported, because regalloc
does not allow to represent overlapping register classes,
which are needed by VFP/Neon.

There is also no support for big-endianness, I64 and I128 types.
2020-09-22 12:49:42 +02:00