Commit Graph

1238 Commits

Author SHA1 Message Date
Alex Crichton
c89dc55108 Add a two-week delay to Wasmtime's release process (#3955)
* Bump to 0.36.0

* Add a two-week delay to Wasmtime's release process

This commit is a proposal to update Wasmtime's release process with a
two-week delay from branching a release until it's actually officially
released. We've had two issues lately that came up which led to this proposal:

* In #3915 it was realized that changes just before the 0.35.0 release
  weren't enough for an embedding use case, but the PR didn't meet the
  expectations for a full patch release.

* At Fastly we were about to start rolling out a new version of Wasmtime
  when over the weekend the fuzz bug #3951 was found. This led to the
  desire internally to have a "must have been fuzzed for this long"
  period of time for Wasmtime changes which we felt were better
  reflected in the release process itself rather than something about
  Fastly's own integration with Wasmtime.

This commit updates the automation for releases to unconditionally
create a `release-X.Y.Z` branch on the 5th of every month. The actual
release from this branch is then performed on the 20th of every month,
roughly two weeks later. This should provide a period of time to ensure
that all changes in a release are fuzzed for at least two weeks and
avoid any further surprises. This should also help with any last-minute
changes made just before a release if they need tweaking since
backporting to a not-yet-released branch is much easier.

Overall there are some new properties about Wasmtime with this proposal
as well:

* The `main` branch will always have a section in `RELEASES.md` which is
  listed as "Unreleased" for us to fill out.
* The `main` branch will always be a version ahead of the latest
  release. For example it will be bump pre-emptively as part of the
  release process on the 5th where if `release-2.0.0` was created then
  the `main` branch will have 3.0.0 Wasmtime.
* Dates for major versions are automatically updated in the
  `RELEASES.md` notes.

The associated documentation for our release process is updated and the
various scripts should all be updated now as well with this commit.

* Add notes on a security patch

* Clarify security fixes shouldn't be previewed early on CI
2022-04-01 13:11:10 -05:00
Andrew Brown
bd6fe11ca9 cranelift: remove load_complex and store_complex (#3976)
This change removes all variants of `load*_complex` and `store*_complex`
from Cranelift; this is a breaking change to the instructions exposed by
CLIF. The complete list of instructions removed is: `load_complex`,
`store_complex`, `uload8_complex`, `sload8_complex`, `istore8_complex`,
`sload8_complex`, `uload16_complex`, `sload16_complex`,
`istore16_complex`, `uload32_complex`, `sload32_complex`,
`istore32_complex`, `uload8x8_complex`, `sload8x8_complex`,
`sload16x4_complex`, `uload16x4_complex`, `uload32x2_complex`,
`sload32x2_complex`.

The rationale for this removal is that the Cranelift backend now has the
ability to pattern-match multiple upstream additions in order to
calculate the address to access. Previously, this was not possible so
the `*_complex` instructions were needed. Over time, these instructions
have fallen out of use in this repository, making the additional
overhead of maintaining them a chore.
2022-03-31 10:05:10 -07:00
Andrew Brown
e8dd13cf87 x64: port the remainder of select to ISLE (#3973)
Previous changes had ported the difficult "`select` based on an `fcmp`"
patterns to ISLE; this completes porting of `select` by moving over the
final two kinds of patterns:
 - `select` based on an `icmp`
 - `select` based on a value
2022-03-30 13:32:26 -07:00
Andrew Brown
5d8dd648d7 x64: port fcmp to ISLE (#3967)
* x64: port scalar `fcmp` to ISLE

Implement the CLIF lowering for the `fcmp` to ISLE. This adds a new
type-matcher, `ty_scalar_float`, for detecting uses of `F32` and `F64`.

* isle: rename `vec128` to `ty_vec12`

This refactoring changes the name of the `vec128` matcher function to
follow the `ty_*` convention of the other type matchers. It also makes
the helper an inline function call.

* x64: port vector `fcmp` to ISLE
2022-03-29 15:41:49 -07:00
Andrew Brown
4d5bd5f90e x64: fix register allocation panic due to load-coalesced value (#3954)
Fuzz testing identified a lowering case for CLIF's `icmp` in which the
double use of a loaded operand resulted in a register allocation error.
This change manually adds `put_in_xmm` to avoid load-coalescing these
values and includes a CLIF filetest to trigger this issue. Closes #3951.

I opened #3953 to discuss a way in which this kind of mistake (i.e.,
forgetting to add `put_in_*` in certain situations) could be avoided.
2022-03-21 18:46:27 -07:00
Andrew Brown
5fa104205d x64: improve generation of i128 icmp (#3946)
Previously, we used the flags of `AND` for `SETcc`. This change uses
`TEST` instead, which discards the AND result but sets the flags needed
for `SETcc`. This reduces register pressure slightly for this sequence.
2022-03-18 16:36:31 -07:00
Andrew Brown
e92cbfb283 x64: port icmp to ISLE (#3886)
* x64: port GPR-held `icmp` to ISLE
* x64: port equality `icmp` for i128 type
* x64: port `icmp` for vector types
* x64: rename from_intcc to intcc_to_cc
2022-03-18 11:22:09 -07:00
Chris Fallin
58062b5efe x64 backend: fix fpcmp to avoid load-op merging. (#3934)
The `fpcmp` helper in the x64 backend uses `put_in_xmm_mem` for one of
its operands, which allows the compiler to merge a load with the compare
instruction (`ucomiss` or `ucomisd`).

Unfortunately, as we saw in #2576 for the integer-compare case, this
does not work with our lowering algorithm because compares can be
lowered more than once (unlike all other instructions) to reproduce the
flags where needed. Merging a load into an op that executes more than
once is invalid in general (the two loads may observe different values,
which violates the original program semantics because there was only one
load originally).

This does not result in a miscompilation, but instead will cause a panic
at regalloc time because the register that should have been defined by
the separate load is never written (the load is never emitted
separately).

I think this (very subtle, easy to miss) condition was unfortunately not
ported over when we moved the logic in #3682.

The existing fcmp-of-load test in `cmp-mem-bug` (from #2576) does not
seem to trigger it, for a reason I haven't fully deduced. I just added
the verbatim function body (happens to come from `clang.wasm`) that
triggers the bug as a test.

Discovered while bringing up regalloc2 support. It's pretty unlikely to
hit by chance, which is why I think none of our fuzzing has hit it yet.
2022-03-16 09:48:20 -07:00
FreddieLiardet
13b9396931 Add vector compare to 0 optims (#3887)
Signed-off-by: Freddie Liardet <frederick.liardet@arm.com>
2022-03-09 16:20:06 -08:00
Chris Fallin
26ce9a3853 Fix uextend on x64 for non-i32-source cases. (#3906)
In #3849, I moved uextend over to ISLE in the x64 backend. Unfortunately, the lowering patterns had a bug in the i32-to-i64 special case (when we know the generating instruction zeroes the upper 32 bits): it wasn't actually special casing for an i32 source! This meant that e.g. zero extends of the results of i8 adds did not work properly.

This PR fixes the bug and updates the runtest for extends significantly to cover the narrow-value cases.

No security impact to Wasm as Wasm does not use narrow integer types.

Thanks @bjorn3 for reporting!
2022-03-09 11:10:59 -08:00
wasmtime-publish
9137b4a50e Bump Wasmtime to 0.35.0 (#3885)
[automatically-tag-and-release-this-commit]

Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>
2022-03-07 15:18:34 -06:00
Chris Fallin
d9dfc44c32 ISLE: port more ops on x64 to lowering patterns. (#3855) 2022-02-28 13:28:42 -08:00
Chris Fallin
24f145cd1e Migrate clz, ctz, popcnt, bitrev, is_null, is_invalid on x64 to ISLE. (#3848) 2022-02-28 09:45:13 -08:00
Sam Parker
d307a4ab9a [AArch64] Improve AtomicRMWLoop (#3839)
Add more tests, use accurate disassembly, respect data sizes and
simplify the Xchg implementation.

Copyright (c) 2022, Arm Limited
2022-02-23 10:47:59 -08:00
Andrew Brown
f87c61176a x64: port select to ISLE (#3682)
* x64: port `select` using an FP comparison to ISLE

This change includes quite a few interlocking parts, required mainly by
the current x64 conventions in ISLE:
 - it adds a way to emit a `cmove` with multiple OR-ing conditions;
   because x64 ISLE cannot currently safely emit a comparison followed
   by several jumps, this adds `MachInst::CmoveOr` and
   `MachInst::XmmCmoveOr` macro instructions. Unfortunately, these macro
   instructions hide the multi-instruction sequence in `lower.isle`
 - to properly keep track of what instructions consume and produce
   flags, @cfallin added a way to pass around variants of
   `ConsumesFlags` and `ProducesFlags`--these changes affect all
   backends
 - then, to lower the `fcmp + select` CLIF, this change adds several
   `cmove*_from_values` helpers that perform all of the awkward
   conversions between `Value`, `ValueReg`, `Reg`, and `Gpr/Xmm`; one
   upside is that now these lowerings have much-improved documentation
   explaining why the various `FloatCC` and `CC` choices are made the
   the way they are.

Co-authored-by: Chris Fallin <chris@cfallin.org>
2022-02-23 10:03:16 -08:00
Chris Fallin
1c014d129a Cranelift: ensure ISA level needed for SIMD is present when SIMD is enabled. (#3816)
Addresses #3809: when we are asked to create a Cranelift backend with
shared flags that indicate support for SIMD, we should check that the
ISA level needed for our SIMD lowerings is present.
2022-02-16 17:29:30 -08:00
Chris Fallin
ca0e8d0a1d Remove incomplete/unmaintained ARM32 backend (for now). (#3799)
In #3721, we have been discussing what to do about the ARM32 backend in
Cranelift. Currently, this backend supports only 32-bit types, which is
insufficient for full Wasm-MVP; it's missing other critical bits, like
floating-point support; and it has only ever been exercised, AFAIK, via
the filetests for the individual CLIF instructions that are implemented.

We were very very thankful for the original contribution of this
backend, even in its partial state, and we had hoped at the time that we
could eventually mature it in-tree until it supported e.g. Wasm and
other use-cases. But that hasn't yet happened -- to the blame of no-one,
to be clear, we just haven't had a contributor with sufficient time.

Unfortunately, the existence of the backend and lack of active
maintainer now potentially pose a bit of a burden as we hope to make
continuing changes to the backend framework. For example, the ISLE
migration, and the use of regalloc2 that it will allow, would need all
of the existing lowering patterns in the hand-written ARM32 backend to
be rewritten as ISLE rules.

Given that we don't currently have the resources to do this, we think
it's probably best if we, sadly, for now remove this partial backend.
This is not in any way a statement of what we might accept in the
future, though. If, in the future, an ARM32 backend updated to our
latest codebase with an active maintainer were to appear, we'd be happy
to merge it (and likewise for any other architecture!). But for now,
this is probably the best path. Thanks again to the original contributor
@jmkrauz and we hope that this work can eventually be brought back and
reused if someone has the time to do so!
2022-02-14 15:03:52 -08:00
Mrmaxmeier
84b9c7bb8a cranelift/x64: lower min and max for <= i64 (#3748)
* cranelift/x64: lower min and max for <= `i64`

* cranelift: add runtests for integer min/max
2022-02-14 10:21:19 -08:00
Ulrich Weigand
9c5c872b3b s390x: Add support for all remaining atomic operations (#3746)
This adds support for all atomic operations that were unimplemented
so far in the s390x back end:
- atomic_rmw operations xchg, nand, smin, smax, umin, umax
- $I8 and $I16 versions of atomic_rmw and atomic_cas
- little endian versions of atomic_rmw and atomic_cas

All of these have to be implemented by a compare-and-swap loop;
and for the $I8 and $I16 versions the actual atomic instruction
needs to operate on the surrounding aligned 32-bit word.

Since we cannot emit new control flow during ISLE instruction
selection, these compare-and-swap loops are emitted as a single
meta-instruction to be expanded at emit time.

However, since there is a large number of different versions of
the loop required to implement all the above operations, I've
implemented a facility to allow specifying the loop bodies
from within ISLE after all, by creating a vector of MInst
structures that will be emitted as part of the meta-instruction.

There are still restrictions, in particular instructions that
are part of the loop body may not modify any virtual register.
But even so, this approach looks preferable to doing everything
in emit.rs.

A few instructions needed in those compare-and-swap loop bodies
were added as well, in particular the RxSBG family of instructions
as well as the LOAD REVERSED in-register byte-swap instructions.

This patch also adds filetest runtests to verify the semantics
of all operations, in particular the subword and little-endian
variants (those are currently only executed on s390x).
2022-02-08 13:48:44 -08:00
wasmtime-publish
39b88e4e9e Release Wasmtime 0.34.0 (#3768)
* Bump Wasmtime to 0.34.0

[automatically-tag-and-release-this-commit]

* Add release notes for 0.34.0

* Update release date to today

Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
2022-02-07 19:16:26 -06:00
Ulrich Weigand
cee00c6591 s390x: Refactor branch and jumptable emission
The BranchTarget abstraction is no longer needed, since all branches are
being emitted using a MachLabel target.  Remove BranchTarget and simply
use MachLabel everywhere a branch target is required.  (This brings the
s390x back-end in line with what x64 does as well.)

In addition, simplify jumptable emission by moving all instructions
that do not depend on the internal label (i.e. the conditional branch
to the default label, as well as the scaling the index register) out of
the combined JTSequence instruction.

This refactoring will make moving branch generation to ISLE easier.
2022-01-24 12:22:53 +01:00
Ulrich Weigand
a94e72b5b7 s390x: Add ISLE support
This adds ISLE support for the s390x back-end and moves lowering
of most instructions to ISLE.  The only instructions still remaining
are calls, returns, traps, and branches, most of which will need
additional support in common code.

Generated code is not intended to be (significantly) different
than before; any additional optimizations now made easier to
implement due to the ISLE layer can be added in follow-on patches.

There were a few differences in some filetests, but those are all
just simple register allocation changes (and all to the better!).
2022-01-21 19:30:56 +01:00
Ulrich Weigand
c08a013b53 s390x: Codegen fixes and preparation for ISLE migration
In preparing the back-end to move to ISLE, I detected a
number of codegen bugs in the existing code, which are
fixed here:

- Fix internal compiler error with uload16/icmp corner case.
- Fix broken Cls lowering.
- Correctly mask shift count for i8/i16 shifts.

In addition, I made several changes to operand encodings
in various MInst patterns.  These should not have any
functional effect, but will make the ISLE migration easier:

- Encode floating-point constants as u32/u64 in MInst patterns.
- Encode shift amounts as u8 and Reg in ShiftOp pattern.
- Use MemArg in LoadMultiple64 and StoreMultiple64 patterns.
2022-01-20 16:59:18 +01:00
Freddie Liardet
b5531580e7 Improve code generation for floating-point constants
Copyright (c) 2022, Arm Limited.
2022-01-18 10:39:05 +00:00
Nick Fitzgerald
a41fdb0303 cranelift: Port rotr lowering to ISLE on x64 2022-01-13 14:59:09 -08:00
Chris Fallin
13f17db297 Merge pull request #3680 from bjorn3/remove_code_sink
Remove the CodeSink interface in favor of MachBufferFinalized
2022-01-12 10:47:23 -08:00
Nick Fitzgerald
7454f1f3af cranelift: port sshr to ISLE on x64 (#3681) 2022-01-12 09:13:58 -06:00
bjorn3
b803514d55 Remove sink arguments from compile_and_emit
The data can be accessed after the fact using context.mach_compile_result
2022-01-11 18:17:29 +01:00
bjorn3
55d722db05 Remove CodeSink 2022-01-11 17:10:37 +01:00
bjorn3
a48a60f958 Remove reloc_external from CodeSink
And introduce MachBufferFinalized::relocs() in the place.
2022-01-11 16:54:27 +01:00
bjorn3
63e2360346 Remove trap from CodeSink
And introduce MachBufferFinalized::traps() in the place.
2022-01-11 16:42:52 +01:00
bjorn3
37598ad170 Remove end_codegen method from CodeSink 2022-01-11 14:52:04 +01:00
bjorn3
354c4f7bf8 Remove unused CodeSink methods 2022-01-11 14:52:04 +01:00
Alex Crichton
1ef0abb12c Update lots of isa/*/*.clif tests to precise-output (#3677)
* Update lots of `isa/*/*.clif` tests to `precise-output`

This commit goes through the `aarch64` and `x64` subdirectories and
subjectively changes tests from `test compile` to add `precise-output`.
This then auto-updates all the test expectations so they can be
automatically instead of manually updated in the future. Not all tests
were migrated, largely subject to the whims of myself, mainly looking to
see if the test was looking for specific instructions or just checking
the whole assembly output.

* Filter out `;;` comments from test expctations

Looks like the cranelift parser picks up all comments, not just those
trailing the function, so use a convention where `;;` is used for
human-readable-comments in test cases and `;`-prefixed comments are the
test expectation.
2022-01-10 13:38:23 -06:00
Alex Crichton
a8ea0ec097 cranelift: Add ability to auto-update test expectations (#3612)
* cranelift: Add ability to auto-update test expectations

One of the problems of the current `*.clif` testing is that the files
are difficult to update when widespread changes are made (such as
removing modification of the frame pointer). Additionally when changing
register allocation or similar it can cause a large number of changes in
tests but the tests themselves didn't actually break. For this reason
this commit adds the ability to automatically update test expectations.

The idea behind this commit is that tests of the form `test compile` can
also optionally be flagged with the `precise-output` flag:

    test compile precise-output

and when doing so the compiled form of each function is asserted to 100%
match the following comments and their test expectations. If a match is
not found then a `BLESS=1` environment variable can be used to
automatically rewrite the test file itself with the correct assertion.
If the environment variable isn't present and the expectation doesn't
match then the test fails.

It's hoped that, if approved, a follow-up commit can add
`precise-output` to all current `test compile` tests (or make it the
default) and all tests can be mass-updated. When developing locally test
expectations need not be written and instead tests can be run with
`BLESS=1` and the output can be manually verified. The environment
variable will not be present on CI which means that changes to the
output which don't also change the test expectation will cause CI to
fail. Furthermore this should still make updates to the test output
easily readable in review on CI because the test expectations are
intended to look the same as before.

Closes #1539

* Use raw vcode output in tests

* Fix a merge conflict

* Review comments
2022-01-10 11:59:45 -06:00
Nick Fitzgerald
95d8dd1424 cranelift: Re-add some tests that were accidentally removed 2022-01-07 11:00:58 -08:00
Nick Fitzgerald
6b5e9d8732 Merge pull request #3659 from fitzgen/vselect-isle
cranelift: Port `vselect` over to ISLE on x64
2022-01-06 14:51:33 -08:00
Nick Fitzgerald
056f7c2674 cranelift: Port vselect over to ISLE on x64 2022-01-06 14:10:57 -08:00
Alex Crichton
72e2b7fe80 aarch64: Migrate bitrev/clz/cls/ctz to ISLE (#3658)
This commit migrates these existing instructions to ISLE from the manual
lowerings implemented today. This was mostly straightforward but while I
was at it I fixed what appeared to be broken translations for I{8,16}
for `clz`, `cls`, and `ctz`. Previously the lowerings would produce
results as-if the input was 32-bits, but now I believe they all
correctly account for the bit-width.
2022-01-06 15:18:32 -06:00
Nick Fitzgerald
b60a4df2af cranelift: Move bitselect runtest file to shared runtests directory 2022-01-06 11:25:27 -08:00
Nick Fitzgerald
09aa09fd76 cranelift: Port bitselect over to ISLE on x64 2022-01-06 11:22:54 -08:00
wasmtime-publish
8043c1f919 Release Wasmtime 0.33.0 (#3648)
* Bump Wasmtime to 0.33.0

[automatically-tag-and-release-this-commit]

* Update relnotes for 0.33.0

* Wordsmithing relnotes

Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
2022-01-05 13:26:50 -06:00
Chris Fallin
e2b37a57dc Merge pull request #3639 from bjorn3/machinst_cleanups
Various cleanups around machinst
2022-01-05 10:01:27 -08:00
Chris Fallin
833ebeed76 Fix spillslot size bug in SIMD by removing type-dependent spillslot allocation.
This patch makes spillslot allocation, spilling and reloading all based
on register class only. Hence when we have a 32- or 64-bit value in a
128-bit XMM register on x86-64 or vector register on aarch64, this
results in larger spillslots and spills/restores.

Why make this change, if it results in less efficient stack-frame usage?
Simply put, it is safer: there is always a risk when allocating
spillslots or spilling/reloading that we get the wrong type and make the
spillslot or the store/load too small. This was one contributing factor
to CVE-2021-32629, and is now the source of a fuzzbug in SIMD code that
puns an arbitrary user-controlled vector constant over another
stackslot. (If this were a pointer, that could result in RCE. SIMD is
not yet on by default in a release, fortunately.

In particular, we have not been particularly careful about using moves
between values of different types, for example with `raw_bitcast` or
with certain SIMD operations, and such moves indicate to regalloc.rs
that vregs are in equivalence classes and some arbitrary vreg in the
class is provided when allocating the spillslot or spilling/reloading.
Since regalloc.rs does not track actual type, and since we haven't been
careful about moves, we can't really trust this "arbitrary vreg in
equivalence class" to provide accurate type information.

In the fix to CVE-2021-32629 we fixed this for integer registers by
always spilling/reloading 64 bits; this fix can be seen as the analogous
change for FP/vector regs.
2022-01-04 13:24:40 -08:00
bjorn3
4915162230 Remove unnecessary fields from CodeInfo 2022-01-04 18:05:45 +01:00
bjorn3
32c3afe4b3 Add regression runtests 2021-12-17 20:58:32 +01:00
Alex Crichton
e94ebc2263 aarch64: Translate rot{r,l} to ISLE (#3614)
This commit translates the `rotl` and `rotr` lowerings already existing
to ISLE. The port was relatively straightforward with the biggest
changing being the instructions generated around i128 rotl/rotr
primarily due to register changes.
2021-12-17 12:37:17 -06:00
Alex Crichton
d8974ce6bc aarch64: Migrate ishl/ushr/sshr to ISLE (#3608)
* aarch64: Migrate ishl/ushr/sshr to ISLE

This commit migrates the `ishl`, `ushr`, and `sshr` instructions to
ISLE. These involve special cases for almost all types of integers
(including vectors) and helper functions for the i128 lowerings since
the i128 lowerings look to be used for other instructions as well. This
doesn't delete the i128 lowerings in the Rust code just yet because
they're still used by Rust lowerings, but they should be deletable in
due time once those lowerings are translated to ISLE.

* Use more descriptive names for i128 lowerings

* Use a with_flags-lookalike for csel

* Use existing `with_flags_*`

* Coment backwards order

* Update generated code
2021-12-16 17:37:53 -06:00
Chris Fallin
fd171ca063 Fix OperandSize: need clamp-to-32-bit behavior in most cases, but true-width for shifts. 2021-12-16 12:32:28 -08:00
Chris Fallin
1323ae417e Fix some 16- and 8-bit behavior in x64 backend related to rotates.
Uncovered by @bjorn3 (thanks!): 8- and 16-bit rotates were not working
properly in recent versions of Cranelift with part of the lowering
migrated to ISLE.

This PR fixes a few issues:

- 8- and 16-bit rotate-left needs to mask a constant amount, if any,
  because we use a 32-bit rotate instruction and so don't get the
  appropriate shift-amount masking for free from x86 semantics.

- `operand_size_from_type` was incorrect: it only handled 32- and 64-bit
  types and silently returned `OperandSize::Size32` for everything else.
  Now uses the `OperandSize::from_ty(ty)` helper as the pre-ISLE code
  did.

Our test coverage for narrow value types is not great; this PR adds some
runtests for rotl/rotr but more would always be better!
2021-12-16 11:34:24 -08:00