Commit Graph

9391 Commits

Benjamin Bouvier
2c1d370465 CI: use fixed version of Rust nightly following build failures in staticvec 2020-07-28 12:29:49 +02:00
Benjamin Bouvier
7f109a5198 machinst x64: use a sign-extension when loading jump table offsets;
The offset that's loaded out of the jump table may be negative (if it
refers to a point before the jump table itself), so we should
sign-extend it rather than zero-extend it.
2020-07-28 12:29:49 +02:00
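
To illustrate the difference, a minimal Rust sketch of the two extensions (illustrative values only, not the actual backend code):

```rust
fn main() {
    // A 32-bit jump-table entry holding a backward (negative) offset.
    let entry: u32 = -8i32 as u32;
    let table_base: u64 = 0x1000;

    // Zero extension (the buggy behavior) turns -8 into 0xffff_fff8.
    let zext_target = table_base.wrapping_add(entry as u64);
    // Sign extension preserves the negative offset.
    let sext_target = table_base.wrapping_add(entry as i32 as i64 as u64);

    assert_eq!(sext_target, 0x1000 - 8);
    assert_ne!(zext_target, sext_target);
}
```
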
Chris Fallin
8fd92093a4 Merge pull request #2061 from cfallin/aarch64-amode
Aarch64 codegen quality: support more general add+extend address computations.
2020-07-27 13:48:55 -07:00
Chris Fallin
f9b98f0ddc Aarch64 codegen quality: support more general add+extend computations.
Previously, our pattern-matching for generating load/store addresses was
somewhat limited. For example, it could not use a register-extend
address mode to handle the following CLIF:

```
   v2760 = uextend.i64 v985
   v2761 = load.i64 notrap aligned readonly v1
   v1018 = iadd v2761, v2760
   store v1017, v1018
```

This PR adds more general support for address expressions made up of
additions and extensions. In particular, it pattern-matches a tree of
64-bit `iadd`s, optionally with `uextend`/`sextend` from 32-bit values
at the leaves, to collect the list of all addends that form the address.
It also collects all offsets at leaves, combining them.
It then applies a series of heuristics to make the best use of the
available addressing modes, folding as much as it can into the
load/store itself (64-bit registers, zero/sign-extended 32-bit
registers, and/or an immediate offset) and computing the rest with add
instructions as necessary. It makes use of immediate forms
(add-immediate or subtract-immediate) whenever possible, and also uses
the built-in extend operators on add instructions where it can. There
are certainly cases where this is not optimal (i.e., it does not
generate the strictly shortest possible instruction sequence), but it
should be good enough for most code.
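
A simplified Rust sketch of the addend-collection step described above, using a toy expression type in place of the real CLIF data structures (all names here are illustrative):

```rust
// Toy stand-in for CLIF values feeding an address computation.
enum Expr {
    Reg(u8),                    // a 64-bit value already in a register
    Uext32(u8),                 // uextend of a 32-bit register value
    Sext32(u8),                 // sextend of a 32-bit register value
    Iadd(Box<Expr>, Box<Expr>), // 64-bit add
    Imm(i64),                   // constant addend
}

#[derive(Debug, PartialEq)]
enum Addend {
    Reg64(u8),
    ZextReg32(u8),
    SextReg32(u8),
}

// Walk the tree of 64-bit `iadd`s, gathering register addends together
// with their extension kind and folding all constant offsets into one.
fn collect(e: &Expr, addends: &mut Vec<Addend>, offset: &mut i64) {
    match e {
        Expr::Iadd(a, b) => {
            collect(a, addends, offset);
            collect(b, addends, offset);
        }
        Expr::Imm(k) => *offset = offset.wrapping_add(*k),
        Expr::Reg(r) => addends.push(Addend::Reg64(*r)),
        Expr::Uext32(r) => addends.push(Addend::ZextReg32(*r)),
        Expr::Sext32(r) => addends.push(Addend::SextReg32(*r)),
    }
}

fn main() {
    // Mirrors the CLIF above: iadd(v2761, uextend(v985)).
    let addr = Expr::Iadd(Box::new(Expr::Reg(1)), Box::new(Expr::Uext32(85)));
    let (mut addends, mut offset) = (Vec::new(), 0i64);
    collect(&addr, &mut addends, &mut offset);
    assert_eq!(addends, vec![Addend::Reg64(1), Addend::ZextReg32(85)]);
    assert_eq!(offset, 0);
}
```
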

Using `perf stat` to measure instruction count (runtime only, on
wasmtime, after populating the cache to avoid measuring compilation),
this impacts `bz2` as follows:

```
pre:

       1006.410425      task-clock (msec)         #    1.000 CPUs utilized
               113      context-switches          #    0.112 K/sec
                 1      cpu-migrations            #    0.001 K/sec
             5,036      page-faults               #    0.005 M/sec
     3,221,547,476      cycles                    #    3.201 GHz
     4,000,670,104      instructions              #    1.24  insn per cycle
   <not supported>      branches
        27,958,613      branch-misses

       1.006071348 seconds time elapsed

post:

        963.499525      task-clock (msec)         #    0.997 CPUs utilized
               117      context-switches          #    0.121 K/sec
                 0      cpu-migrations            #    0.000 K/sec
             5,081      page-faults               #    0.005 M/sec
     3,039,687,673      cycles                    #    3.155 GHz
     3,837,761,690      instructions              #    1.26  insn per cycle
   <not supported>      branches
        28,254,585      branch-misses

       0.966072682 seconds time elapsed
```

In other words, this reduces instruction count by 4.1% on `bz2`.
2020-07-27 13:10:50 -07:00
Peter Huene
9b340f27f7 Update the .NET docs for the current .NET API. (#2069) 2020-07-24 23:49:05 -05:00
Chris Fallin
bad99c93b1 Merge pull request #2051 from cfallin/aarch64-add-negative-imm
Aarch64 codegen quality: handle add-negative-imm as subtract.
2020-07-24 12:26:54 -07:00
Chris Fallin
1b80860f1f Aarch64 codegen quality: handle add-negative-imm as subtract.
We often see patterns like:

```
    mov w2, #0xffff_ffff   // uses ORR with logical immediate form
    add w0, w1, w2
```

which is just `w0 := w1 - 1`. It is much better to recognize when the
negation of an immediate fits in a 12-bit immediate field even though
the immediate itself does not, and to flip the add to a subtract (and
vice versa), so that we can instead generate:

```
    sub w0, w1, #1
```
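
A Rust sketch of the negation heuristic (helper names are illustrative, and the shifted-by-12 immediate form is ignored for brevity):

```rust
// Does a value fit AArch64's 12-bit unsigned immediate field?
fn fits_imm12(v: u64) -> bool {
    v < (1 << 12)
}

// Illustrative lowering for `add w0, w1, #imm`: flip to subtract when
// the negated immediate fits but the original does not.
fn lower_add_imm(imm: i64) -> String {
    if fits_imm12(imm as u64) {
        format!("add w0, w1, #{}", imm)
    } else if fits_imm12(imm.wrapping_neg() as u64) {
        format!("sub w0, w1, #{}", imm.wrapping_neg())
    } else {
        "materialize imm into a register, then add".to_string()
    }
}

fn main() {
    // 0xffff_ffff as a 32-bit immediate is -1, so the add becomes a sub.
    assert_eq!(lower_add_imm(-1), "sub w0, w1, #1");
}
```
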

We see this pattern in e.g. `bz2`, where this commit makes the following
difference (counting instructions with `perf stat`, populating the
wasmtime cache first and then running again so that only runtime is
measured):

pre:

```
        992.762250      task-clock (msec)         #    0.998 CPUs utilized
               109      context-switches          #    0.110 K/sec
                 0      cpu-migrations            #    0.000 K/sec
             5,035      page-faults               #    0.005 M/sec
     3,224,119,134      cycles                    #    3.248 GHz
     4,000,521,171      instructions              #    1.24  insn per cycle
   <not supported>      branches
        27,573,755      branch-misses

       0.995072322 seconds time elapsed
```

post:

```
        993.853850      task-clock (msec)         #    0.998 CPUs utilized
               123      context-switches          #    0.124 K/sec
                 1      cpu-migrations            #    0.001 K/sec
             5,072      page-faults               #    0.005 M/sec
     3,201,278,337      cycles                    #    3.221 GHz
     3,917,061,340      instructions              #    1.22  insn per cycle
   <not supported>      branches
        28,410,633      branch-misses

       0.996008047 seconds time elapsed
```

In other words, a 2.1% reduction in instruction count on `bz2`.
2020-07-24 11:41:33 -07:00
Benjamin Bouvier
35d9ab19b7 Review fixes; 2020-07-24 19:29:12 +02:00
Benjamin Bouvier
ad4a2f919f machinst x64: implement support for reference types; 2020-07-24 19:29:12 +02:00
Benjamin Bouvier
987c616bf5 machinst x64: implement support for dynamic heaps and explicit bound checks; 2020-07-24 19:29:12 +02:00
Benjamin Bouvier
2e3ad3227d machinst x64: fix encoding of movzx/movsx with non-ABCD input registers;
Using a source register for movsx/movzx that doesn't belong to the ABCD
family (al, bl, cl, dl) requires a REX prefix, even an otherwise-empty
("redundant") one, and that prefix was not being emitted.
2020-07-24 19:29:12 +02:00
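
A sketch of the x86-64 encoding rule behind this fix, modeled in Rust (the helper is illustrative, not the actual emitter):

```rust
// Hardware register encodings 0..=15; returns whether an 8-bit use of
// this register forces a REX prefix (possibly an otherwise-empty 0x40).
fn byte_reg_needs_rex(enc: u8) -> bool {
    // 0..=3 (al/cl/dl/bl): no REX needed.
    // 4..=7 (spl/bpl/sil/dil): need an empty REX; without it these
    //   encodings select ah/ch/dh/bh instead.
    // 8..=15 (r8b..r15b): already need REX for the high encoding bit.
    enc >= 4
}

fn main() {
    assert!(!byte_reg_needs_rex(0)); // al: fine without REX
    assert!(byte_reg_needs_rex(6));  // sil: movzx/movsx from it needs REX
}
```
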
Benjamin Bouvier
de4923356a machinst x64: fix fcmp comparison for NotEqual;
We have to emit checks against both the parity flag (set when the
operands are unordered) and the zero flag (which indicates equality);
otherwise the comparison returns false when comparing NaN against
itself.
2020-07-24 19:29:12 +02:00
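
The flag semantics can be modeled in Rust as a quick sanity check (a sketch of the x86 `ucomisd` behavior, not backend code):

```rust
// After `ucomisd a, b`: ZF is set when the operands are equal *or*
// unordered, and PF is set when they are unordered. Checking only
// "not ZF" therefore reports NaN != NaN as false.
fn ucomisd_flags(a: f64, b: f64) -> (bool, bool) {
    let unordered = a.is_nan() || b.is_nan();
    let zf = unordered || a == b;
    let pf = unordered;
    (zf, pf)
}

// NotEqual must OR the parity check into the non-zero check.
fn not_equal(a: f64, b: f64) -> bool {
    let (zf, pf) = ucomisd_flags(a, b);
    !zf || pf
}

fn main() {
    assert!(not_equal(f64::NAN, f64::NAN)); // true, as required
    assert!(!not_equal(1.0, 1.0));
    assert!(not_equal(1.0, 2.0));
}
```
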
Benjamin Bouvier
4b26f5b120 machinst x64: baldrdash: fix multi-value when both gpr and xmm are returned;
In baldrdash, only the first return value may live in a register, be it
an integer or a floating point value.
2020-07-24 19:29:12 +02:00
Benjamin Bouvier
aa103698d4 machinst x64: extend Copysign to work for f64 inputs too; 2020-07-24 19:29:12 +02:00
Benjamin Bouvier
694af3aec2 machinst x64: implement float Floor/Ceil/Trunc/Nearest as VM calls; 2020-07-24 19:29:12 +02:00
Benjamin Bouvier
48ec806a9d machinst x64: implement Fabs/Fneg in terms of other instructions; 2020-07-24 19:29:12 +02:00
Benjamin Bouvier
03b9e1e86a machinst x64: implement float min/max with the right semantics; 2020-07-24 19:29:12 +02:00
Benjamin Bouvier
e43310a088 machinst x64: adapt conversions for saturation behaviors; 2020-07-24 19:29:12 +02:00
Benjamin Bouvier
cd54f05efd machinst x64: implement float-to-int and int-to-float conversions; 2020-07-24 19:29:12 +02:00
Chris Fallin
37a09c4ef6 Merge pull request #2066 from cfallin/machinst-timing
Add timing for several new-backend stages.
2020-07-23 10:54:17 -07:00
Yury Delendik
42127aac4e Refactor Cache logic to include debug information (#2065)
* move caching to the CompilationArtifacts

* move cache_config from Compiler to CompiledModule

* hash ISA flags

* no cache for wasm2obj

* move caching to the wasmtime crate

* account for each Compiler field when hashing
2020-07-23 12:10:13 -05:00
Chris Fallin
2b9fefe89a Add timing for several new-backend stages.
This PR adds a bit more granularity to the output of e.g. `clif-util
compile -T`, indicating how much time is spent in VCode lowering and
various other new-backend-specific tasks.
2020-07-23 09:54:39 -07:00
Chris Fallin
87eb4392c4 Merge pull request #2063 from jgouly/vselect
arm64: Implement Vselect opcode
2020-07-22 13:35:46 -07:00
Chris Fallin
44ef8247a9 Merge pull request #2062 from akirilov-arm/extract_lane
AArch64: Improve code generation for Extractlane + Sextend / Uextend
2020-07-22 13:35:00 -07:00
Chris Fallin
d22cefd220 Merge pull request #2058 from cfallin/aarch64-fix-bool
Aarch64 codegen: represent bool `true` as -1, not 1.
2020-07-22 13:16:12 -07:00
Chris Fallin
b8f6d53a6b Aarch64 codegen: represent bool true as -1, not 1.
It seems that this is actually the correct behavior for bool types wider
than `b1`; some of the vector instruction optimizations depend on bool
lanes representing false and true as all-zeroes and all-ones
respectively. For `b8`..`b64`, this results in an extra negation after a
`cset` when a bool is produced by an `icmp`/`fcmp`, but the most common
case (`b1`) is unaffected, because an all-ones one-bit value is just
`1`.

An example of this assumption can be seen here:

399ee0a54c/cranelift/codegen/src/simple_preopt.rs (L956)

Thanks to Joey Gouly of ARM for noting this issue while implementing
SIMD support, and digging into the source (finding the above example) to
determine the correct behavior.
2020-07-22 12:30:55 -07:00
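
A small Rust model of the representation change (illustrative only):

```rust
// `cset` materializes a condition flag as 0 or 1.
fn cset(cond: bool) -> u64 {
    cond as u64
}

// Wide bool types (b8..b64) want 0 or all-ones, so the lowering adds a
// negation after the cset; this is the extra instruction mentioned above.
fn widen_bool(cond: bool) -> u64 {
    cset(cond).wrapping_neg()
}

fn main() {
    assert_eq!(widen_bool(true), u64::MAX); // all ones, i.e. -1
    assert_eq!(widen_bool(false), 0);
    // For b1 no extra step is needed: a one-bit all-ones value is just 1.
}
```
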
Alex Crichton
8a04fc3cdc Refactor wasmtime's internal cache slightly (#2057)
Be more generic over the type being serialized, and remove an
intermediate struct which was only used for serialization but isn't
necessary.
2020-07-22 10:32:53 -05:00
Joey Gouly
5355c3e3d5 arm64: Implement Vselect opcode
This is implemented the same way as Bitselect, since the controlling
vector is a boolean vector. A boolean vector in Cranelift has elements
that are either all zeroes or all ones, so it can be used to select
elements lane-wise.

Copyright (c) 2020, Arm Limited.
2020-07-22 12:50:29 +01:00
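
A Rust sketch of why this works, with 32-bit integers standing in for vector lanes (illustrative only):

```rust
// Bitwise select: take bits of `a` where `c` is 1, bits of `b` where 0.
fn bitselect(c: u32, a: u32, b: u32) -> u32 {
    (c & a) | (!c & b)
}

fn main() {
    // Two "lanes" of a boolean vector: true (all ones), false (all zeroes).
    // Because each control lane is uniform, the bitwise select picks
    // whole lanes, which is exactly what Vselect requires.
    assert_eq!(bitselect(u32::MAX, 1, 2), 1); // true lane selects from `a`
    assert_eq!(bitselect(0, 1, 2), 2);        // false lane selects from `b`
}
```
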
Anton Kirilov
420c4f06b8 AArch64: Improve code generation for Extractlane + Sextend / Uextend
Copyright (c) 2020, Arm Limited.
2020-07-22 11:47:51 +01:00
Yury Delendik
399ee0a54c Serialize and deserialize compilation artifacts. (#2020)
* Serialize and deserialize Module
* Use bincode to serialize
* Add wasm_module_serialize; docs
* Simple tests
2020-07-21 15:05:50 -05:00
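
A minimal sketch of a bincode round-trip with a stand-in struct (not the actual `CompilationArtifacts` type; assumes the `bincode` crate and serde's derive feature):

```rust
use serde::{Deserialize, Serialize};

// Stand-in for the real compilation-artifact data; illustrative only.
#[derive(Serialize, Deserialize, PartialEq, Debug)]
struct Artifacts {
    code: Vec<u8>,
    debug_info: bool,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let original = Artifacts { code: vec![0x90, 0xc3], debug_info: true };
    let bytes = bincode::serialize(&original)?;
    let roundtrip: Artifacts = bincode::deserialize(&bytes)?;
    assert_eq!(original, roundtrip);
    Ok(())
}
```
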
Nick Fitzgerald
c420f65214 Merge pull request #2052 from fitzgen/gc-function-in-c-api
Add a GC function to the C API
2020-07-21 10:37:20 -07:00
Nick Fitzgerald
17b99cc9c8 examples: Add a GC call to the externref Rust example 2020-07-21 09:43:01 -07:00
Nick Fitzgerald
56c517d265 examples: Add a GC call to the externref C example 2020-07-21 09:42:54 -07:00
Nick Fitzgerald
2efb46afd5 wasmtime-c-api: Add wasmtime_store_gc for GCing externrefs 2020-07-21 09:33:34 -07:00
Chris Fallin
96ef2f1a1b Fix u8::MAX -> std::u8::MAX. (#2047)
As per Carlo Kok on Zulip #cranelift, this breaks builds with stable
Rust pre-1.43, as the associated constant `u8::MAX` was only stabilized
then; the module constant `std::u8::MAX` works on older toolchains. We'd
like to support older versions if we can easily do so.

This PR also adds `cranelift-tools` to the crates checked on CI with
Rust 1.41.0, which pulls in all backends (including `aarch64`).
2020-07-20 14:59:15 -05:00
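
For reference, the two spellings side by side (the associated constant requires Rust 1.43 or later):

```rust
fn main() {
    let old_style = std::u8::MAX; // module constant: works on old stable Rust
    let new_style = u8::MAX;      // associated constant: stabilized in 1.43
    assert_eq!(old_style, new_style); // both are 255
}
```
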
Dan Gohman
4c15a4daf2 Use AsRef<Path> instead of AsRef<OsStr> in yanix functions (#1950)
* Use AsRef<Path> instead of AsRef<OsStr> in yanix functions.

`AsRef<Path>` makes these more consistent with `std` interfaces, making
them easier to use outside of wasi-common.

Also, refactor the conversion to `CString` into a helper function.

* Reduce clutter from fully-qualifying names.

* rustfmt
2020-07-20 10:02:45 -07:00
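
A hypothetical sketch of the resulting shape (the function and helper names are illustrative, not yanix's actual API); it takes `AsRef<Path>` like `std` does and centralizes the `CString` conversion:

```rust
use std::ffi::CString;
use std::io;
use std::os::unix::ffi::OsStrExt; // yanix is Unix-only, so this is fine
use std::path::Path;

// One helper owns the Path -> CString conversion (and its NUL check).
fn cstr(path: &Path) -> io::Result<CString> {
    CString::new(path.as_os_str().as_bytes())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "path contains a NUL byte"))
}

// A yanix-style wrapper, now generic over AsRef<Path> like std::fs APIs.
fn unlink<P: AsRef<Path>>(path: P) -> io::Result<()> {
    let c_path = cstr(path.as_ref())?;
    let _ = c_path.as_ptr(); // the raw syscall would receive this pointer
    Ok(())
}

fn main() -> io::Result<()> {
    // Callers can now pass &str, String, &Path, or PathBuf directly.
    unlink("/tmp/example")?;
    Ok(())
}
```
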
Chris Fallin
784e2f1480 Merge pull request #2038 from jgouly/arith2
arm64: Enable arith2 tests
2020-07-20 09:00:10 -07:00
Ömer Sinan Ağacan
3fcf9fcf3e Cranelift README: fix markdown link syntax (#2044) 2020-07-19 19:41:53 -05:00
Chris Fallin
1b3b2dbfd0 Merge pull request #2043 from cfallin/csel-opt
Aarch64: handle csel with icmp/fcmp source without materializing the bool.
2020-07-18 19:33:47 -07:00
Chris Fallin
ea894c0eeb Merge pull request #2042 from cfallin/aarch64-fix-regshift-mask
Aarch64: mask shift-amounts incorporated into reg-reg-shift ALU insts.
2020-07-18 19:33:35 -07:00
Chris Fallin
21dac670f0 Aarch64: handle csel with icmp/fcmp source without materializing the bool.
Previously, we simply compared the input bool to 0, which forced the
value into a register (usually via a cmp and a cset), zero-extended it,
etc. This patch performs the same pattern-matching that branch lowering
does, performing the cmp directly and using its flag results with the
csel.
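
A schematic Rust sketch of the lowering decision, with toy types and illustrative instruction strings (not the real backend code):

```rust
// Where did the select's condition come from?
enum CondSource {
    Icmp,   // produced directly by an integer compare
    Opaque, // produced by some other computation
}

// Illustrative instruction sequences for `select cond, a, b`.
fn lower_select(cond: CondSource) -> Vec<&'static str> {
    match cond {
        // New path: perform the compare and let csel consume its flags,
        // avoiding the cset/extend/re-compare of the old lowering.
        CondSource::Icmp => vec!["cmp x0, x1", "csel x2, x3, x4, eq"],
        // Fallback: materialize the bool and compare it against zero.
        CondSource::Opaque => vec!["cmp w5, #0", "csel x2, x3, x4, ne"],
    }
}

fn main() {
    assert_eq!(lower_select(CondSource::Icmp).len(), 2);
}
```
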

On the `bz2` benchmark, the runtime is affected as follows (measuring
with `perf stat`, using wasmtime with its cache enabled, and taking the
second run after the first compiles and populates the cache):

pre:

```
       1117.232000      task-clock (msec)         #    1.000 CPUs utilized
               133      context-switches          #    0.119 K/sec
                 1      cpu-migrations            #    0.001 K/sec
             5,041      page-faults               #    0.005 M/sec
     3,511,615,100      cycles                    #    3.143 GHz
     4,272,427,772      instructions              #    1.22  insn per cycle
   <not supported>      branches
        27,980,906      branch-misses

       1.117299838 seconds time elapsed
```

post:

```
       1003.738075      task-clock (msec)         #    1.000 CPUs utilized
               121      context-switches          #    0.121 K/sec
                 0      cpu-migrations            #    0.000 K/sec
             5,052      page-faults               #    0.005 M/sec
     3,224,875,393      cycles                    #    3.213 GHz
     4,000,838,686      instructions              #    1.24  insn per cycle
   <not supported>      branches
        27,928,232      branch-misses

       1.003440004 seconds time elapsed
```

In other words, with this change, on `bz2`, we see a 6.3% reduction in
executed instructions.
2020-07-17 21:10:21 -07:00
Nick Fitzgerald
b35cf7cf8e Merge pull request #1960 from fitzgen/peepmatic-generic-over-ir
peepmatic: Be generic over the IR we are optimizing
2020-07-17 17:05:55 -07:00
Nick Fitzgerald
ee5982fd16 peepmatic: Be generic over the operator type
This lets us avoid the overhead of converting
`cranelift_codegen::ir::Opcode` to `peepmatic_runtime::Operator`, and
paves the way for Peepmatic to support non-CLIF optimizations (e.g.
vcode optimizations).

Rather than defining our own `peepmatic::Operator` type like we used to, now the
whole `peepmatic` crate is effectively generic over a `TOperator` type
parameter. For the Cranelift integration, we use `cranelift_codegen::ir::Opcode`
as the concrete type for our `TOperator` type parameter. For testing, we also
define a `TestOperator` type, so that we can test Peepmatic code without
building all of Cranelift, and we can keep them somewhat isolated from each
other.

The methods that `peepmatic::Operator` had are now translated into trait bounds
on the `TOperator` type. These traits need to be shared between all of
`peepmatic`, `peepmatic-runtime`, and `cranelift-codegen`'s Peepmatic
integration. Therefore, these new traits live in a new crate:
`peepmatic-traits`. This crate acts as a header file of sorts for shared
trait/type/macro definitions.

Additionally, the `peepmatic-runtime` crate no longer depends on the
`peepmatic-macro` procedural macro crate, which should lead to faster build
times for Cranelift when it is using pre-built peephole optimizers.
2020-07-17 16:16:49 -07:00
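
A hypothetical sketch of the resulting shape; the trait and type names here (`OperatorLike`, `PeepholeOptimizer`) are illustrative, not the actual `peepmatic-traits` definitions:

```rust
use std::fmt::Debug;
use std::hash::Hash;

// Illustrative stand-in for the shared operator traits described above.
trait OperatorLike: Copy + Eq + Hash + Debug {
    fn arity(&self) -> u8;
}

// The optimizer becomes generic over the operator type...
struct PeepholeOptimizer<TOperator: OperatorLike> {
    roots: Vec<TOperator>,
}

// ...so tests can use a lightweight stand-in instead of all of Cranelift.
#[derive(Copy, Clone, PartialEq, Eq, Hash, Debug)]
enum TestOperator {
    Iadd,
    Imul,
}

impl OperatorLike for TestOperator {
    fn arity(&self) -> u8 {
        2
    }
}

fn main() {
    let opt = PeepholeOptimizer { roots: vec![TestOperator::Iadd, TestOperator::Imul] };
    assert_eq!(opt.roots.len(), 2);
}
```
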
Chris Fallin
9bd9c628aa Aarch64: mask shift-amounts incorporated into reg-reg-shift ALU insts.
We had previously fixed a bug in which constant shift amounts were not
masked (i.e., taken modulo the number of bits in the operand); however,
we did not fix the analogous case for shift amounts incorporated into
the second register argument of ALU instructions that support
integrated shifts. This failure to mask resulted in illegal
instructions being generated, e.g. in
https://bugzilla.mozilla.org/show_bug.cgi?id=1653502. This PR fixes the
issue by masking the amount, as the shift semantics require.
2020-07-17 14:55:23 -07:00
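
The masking rule itself is simple; a minimal Rust sketch:

```rust
// A shift of a `bits`-wide value only has `bits` meaningful amounts, so
// reduce the constant modulo the operand width before encoding it.
fn mask_shift_amount(amt: u64, bits: u64) -> u64 {
    debug_assert!(bits.is_power_of_two());
    amt & (bits - 1)
}

fn main() {
    assert_eq!(mask_shift_amount(65, 64), 1); // shifting 64-bit by 65 == by 1
    assert_eq!(mask_shift_amount(33, 32), 1);
}
```
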
Nick Fitzgerald
ae95ad8733 cranelift: Don't build peepmatic-based optimizations in build.rs
Instead, when the `rebuild-peephole-optimizers` feature is enabled, rebuild them
the first time they are used. This allows peepmatic to run when Cranelift's
`Opcode` is defined and available, which paves the way forward for:

* merging `peepmatic_runtime::operator::Operator` and Cranelift's `Opcode` (we
  are wasting a bunch of cycles converting between the two of them), and

* supporting vcode optimizations in `peepmatic`.
2020-07-17 14:35:16 -07:00
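
A sketch of the rebuild-on-first-use idea using std's `OnceLock` (the actual crate may use a different lazy-initialization primitive):

```rust
use std::sync::OnceLock;

struct PeepholeOptimizer; // stand-in for the real optimizer type

// Expensive step: compile the peephole optimizations against the real
// `Opcode`, which is only possible once Cranelift itself is available.
fn build_optimizer() -> PeepholeOptimizer {
    PeepholeOptimizer
}

// Rebuild on first use instead of in build.rs; later calls are cheap.
fn optimizer() -> &'static PeepholeOptimizer {
    static CELL: OnceLock<PeepholeOptimizer> = OnceLock::new();
    CELL.get_or_init(build_optimizer)
}

fn main() {
    let _first = optimizer(); // builds here
    let _again = optimizer(); // reuses the cached instance
}
```
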
Alex Crichton
978070c020 Verify crates are publish-able on CI (#2036)
This commit updates our CI to verify that all crates are publishable at
all times, on every commit. During the 0.19.0 release we found another
case where the crates as they live in this repository weren't
publishable, so the hope is that this never comes up again!

The script added in this commit also takes the liberty of removing the
existing bump/publish scripts, replacing them with one Rust script
originally sourced from wasm-bindgen. The script has three modes:

* `./publish bump` - bumps version numbers which are sent as a PR to get
  reviewed (probably with a changelog as well)

* `./publish verify` - run on CI on every commit, builds every crate we
  publish as if it's being published to crates.io, notably without raw
  access to other crates in the repository.

* `./publish publish` - publishes all crates to crates.io, passing the
  `--no-verify` flag to make this a much speedier process than it is
  today.
2020-07-17 16:19:35 -05:00
Johnnie Birch
a7cedf3100 Add support for 32-bit and 64-bit fcmp for the new backend
Implements comiss and comisd.
2020-07-17 13:46:54 -07:00
Alex Crichton
fbc05faa49 Fix wasm_val_copy for null funcref/externref (#2041)
This commit fixes `Clone for wasm_val_t` to avoid attempting to chase a
null pointer. It also fixes the implementation for `FuncRef` values by
cloning their internal `wasm_ref_t` as well.
2020-07-17 14:46:02 -05:00
Alex Crichton
3aeab23bf1 Fix leaking funcrefs in the C API (#2040)
This commit adds a case to the destructor of `wasm_val_t` to be sure to
deallocate the `Box<wasm_ref_t>`.
2020-07-17 14:45:55 -05:00
Alex Crichton
c3ff0754d4 Fix a panic with Func::new and reference types (#2039)
Currently `Func::new` will panic if one of the arguments of the function
is a reference type and the `Store` doesn't have reference types
enabled. This happens because cranelift isn't configured to enable stack
maps, but the register allocator expects them to exist when reference
types are seen.

The fix here is to always enable reference types in cranelift for our
trampoline generation and `Func::new`. This should ensure that
trampolines are generated correctly; they just won't be able to get
hooked up to an `Instance`, because validation will prevent reference
types from being used elsewhere.
2020-07-17 12:05:42 -05:00