Commit Graph

3643 Commits

Author SHA1 Message Date
Chris Fallin
5774e068b7 Cranelift: fix regalloc2 integration bug wrt blockparam branch args. (#4042)
Previously, the block successor accumulation and the blockparam branch
arg setup were decoupled. The lowering backend implicitly specified
the order of successor edges via its `MachTerminator` enum on the last
instruction in the block, while the `Lower` toplevel
machine-independent driver set up blockparam branch args in the edge
order seen in CLIF.

In some cases, these orders did not match -- for example, when the
conditional branch depended on an FP condition that was implemented by
swapping taken/not-taken edges and inverting the condition code.

This PR refactors the successor handling to be centralized in `Lower`
rather than flow through the terminator `MachInst`, and adds a
successor block and its blockparam args at the same time, ensuring the
orders match.
2022-04-18 09:53:57 -07:00
Chris Fallin
7cf5f05830 Cranelift: remove slow invariant validation in cfg(fuzzing) from MachBuffer. (#4038)
Following the merge of regalloc2 support, this became slower because we
are stricter about the critical-edge invariant, generating a separate
edge block for every out-edge even if two or more out-edges go to the
same successor (this is significant in cases of `br_table` with many
entries having the same target block, for example).

Many of those edge blocks are empty and end up collapsed by the
MachBuffer, which leads to a large set of aliased labels.

The invariant validation will dutifully iterate over all the data
structures at every step, validating all of our conditions. But this
gets way slower in the new context, to the point that we'll probably
have some fuzz timeouts.

This was pointed out in [1] but I missed removing this in #3989. Given
that `MachBuffer` has been around for nearly two years now, has been
fuzzed continuously with the invariant validation for that time, and
also has a correctness proof in the comments, it's probably reasonable
to remove this high (recently increased) cost from the fuzzing-specific
compilation configuration.

[1]
https://github.com/bytecodealliance/wasmtime/pull/3989#discussion_r847712263
2022-04-15 09:04:02 -05:00
Sam Parker
cf533a8041 [AArch64] Merge Fcmp32 and Fcmp64 (#4032)
Copyright (c) 2022, Arm Limited.
2022-04-14 15:39:43 -07:00
Sam Parker
682ef7b470 [AArch64] Refactor Mov instructions (#4033)
Merge Mov32 and Mov64 into a single instruction parameterized by a new
OperandSize field. Also combine the Mov[K,N,Z] into a single instruction
with a new opcode to select between the operations.

Copyright (c) 2022, Arm Limited.
2022-04-14 14:51:12 -07:00
Sam Parker
dd442a4d2f [AArch64] Merge 32- and 64-bit FPUOp1 (#4031)
Copyright (c) 2022, Arm Limited.
2022-04-14 14:00:48 -07:00
Sam Parker
7c0ea28fc8 [AArch64] Merge 32- and 64-bit FPUOp2 (#4029)
And remove the unused saturating add/sub opcodes.

Copyright (c) 2022, Arm Limited.
2022-04-14 13:07:00 -07:00
Sam Parker
e142f587a7 [AArch64] Refactor ALUOp3 (#3950)
As well as adding generic pattern for msub along with runtests
for madd and msub.

Copyright (c) 2022, Arm Limited.
2022-04-14 12:16:56 -07:00
Chris Fallin
a0318f36f0 Switch Cranelift over to regalloc2. (#3989)
This PR switches Cranelift over to the new register allocator, regalloc2.

See [this document](https://gist.github.com/cfallin/08553421a91f150254fe878f67301801)
for a summary of the design changes. This switchover has implications for
core VCode/MachInst types and the lowering pass.

Overall, this change brings improvements to both compile time and speed of
generated code (runtime), as reported in #3942:

```
Benchmark       Compilation (wallclock)     Execution (wallclock)
blake3-scalar   25% faster                  28% faster
blake3-simd     no diff                     no diff
meshoptimizer   19% faster                  17% faster
pulldown-cmark  17% faster                  no diff
bz2             15% faster                  no diff
SpiderMonkey,   21% faster                  2% faster
  fib(30)
clang.wasm      42% faster                  N/A
```
2022-04-14 10:28:21 -07:00
Nikita Baksalyar
f9cf4fe640 Fix documentation for codegen::Context::compile (#4019)
The function docs incorrectly referred to an argument that's no longer there.
2022-04-12 13:01:00 -07:00
Andrew Brown
7a55779c6b x64: fix miscompilation of select.i128 (#4017)
Issue #3963 identified a miscompilation with select in which the second
in the pair of `CMOV`s (one pair per `i128` register) used the wrong
flag. This change fixes the error in the x64 ISLE helper function
emitting these `CMOV` instructions.
2022-04-12 09:56:57 -07:00
uint256_t
057c9865c2 Delete cranelift regalloc document (#4013) 2022-04-11 13:55:55 -07:00
Mrmaxmeier
1bf0c8c220 cranelift: fix broken selinux-fix refactor (#4001) 2022-04-07 18:45:21 -07:00
Andrew Brown
f62199da8c x64: port load to ISLE (#3993)
This change moves the majority of the lowerings for CLIF's `load`
instruction over to ISLE. To do so, it also migrates the previous
mechanism for creating an `Amode` (`lower_to_amode`) to several ISLE
rules (see `to_amode`).
2022-04-07 18:31:22 -07:00
Alex Crichton
d147802d51 Update wasm-tools crates (#3997)
* Update wasm-tools crates

This commit updates the wasm-tools family of crates as used in Wasmtime.
Notably this brings in the update which removes module linking support
as well as a number of internal refactorings around names and such
within wasmparser itself. This updates all of the wasm translation
support which binds to wasmparser as appropriate.

Other crates all had API-compatible changes for at least what Wasmtime
used so no further changes were necessary beyond updating version
requirements.

* Update a test expectation
2022-04-05 14:32:33 -05:00
wasmtime-publish
78a595ac88 Bump Wasmtime to 0.37.0 (#3994)
Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>
2022-04-05 09:24:28 -05:00
Alex Crichton
7b5176baea Upgrade all crates to the Rust 2021 edition (#3991)
* Upgrade all crates to the Rust 2021 edition

I've personally started using the new format strings for things like
`panic!("some message {foo}")` or similar and have been upgrading crates
on a case-by-case basis, but I think it probably makes more sense to go
ahead and blanket upgrade everything so 2021 features are always
available.

* Fix compile of the C API

* Fix a warning

* Fix another warning
2022-04-04 12:27:12 -05:00
Alex Crichton
c89dc55108 Add a two-week delay to Wasmtime's release process (#3955)
* Bump to 0.36.0

* Add a two-week delay to Wasmtime's release process

This commit is a proposal to update Wasmtime's release process with a
two-week delay from branching a release until it's actually officially
released. We've had two issues lately that came up which led to this proposal:

* In #3915 it was realized that changes just before the 0.35.0 release
  weren't enough for an embedding use case, but the PR didn't meet the
  expectations for a full patch release.

* At Fastly we were about to start rolling out a new version of Wasmtime
  when over the weekend the fuzz bug #3951 was found. This led to the
  desire internally to have a "must have been fuzzed for this long"
  period of time for Wasmtime changes which we felt were better
  reflected in the release process itself rather than something about
  Fastly's own integration with Wasmtime.

This commit updates the automation for releases to unconditionally
create a `release-X.Y.Z` branch on the 5th of every month. The actual
release from this branch is then performed on the 20th of every month,
roughly two weeks later. This should provide a period of time to ensure
that all changes in a release are fuzzed for at least two weeks and
avoid any further surprises. This should also help with any last-minute
changes made just before a release if they need tweaking since
backporting to a not-yet-released branch is much easier.

Overall there are some new properties about Wasmtime with this proposal
as well:

* The `main` branch will always have a section in `RELEASES.md` which is
  listed as "Unreleased" for us to fill out.
* The `main` branch will always be a version ahead of the latest
  release. For example it will be bump pre-emptively as part of the
  release process on the 5th where if `release-2.0.0` was created then
  the `main` branch will have 3.0.0 Wasmtime.
* Dates for major versions are automatically updated in the
  `RELEASES.md` notes.

The associated documentation for our release process is updated and the
various scripts should all be updated now as well with this commit.

* Add notes on a security patch

* Clarify security fixes shouldn't be previewed early on CI
2022-04-01 13:11:10 -05:00
Chris Fallin
666c2554ea Merge pull request from GHSA-gwc9-348x-qwv2
* Run the GC smoketest with epoch support enabled as well.

* Handle safepoints in cold blocks properly.

Currently, the way that we find safepoint slots for a given instruction
relies on the instruction index order in the safepoint list matching the
order of instruction emission.

Previous to the introduction of cold-block support, this was trivially
satisfied by sorting the safepoint list: we emit instructions 0, 1, 2,
3, 4, ..., and so if we have safepoints at instructions 1 and 4, we will
encounter them in that order.

However, cold blocks are supported by swizzling the emission order at
the last moment (to avoid having to renumber instructions partway
through the compilation pipeline), so we actually emit instructions out
of index order when cold blocks are present.

Reference-type support in Wasm in particular uses cold blocks for
slowpaths, and has live refs and safepoints in these slowpaths, so we
can reliably "skip" a safepoint (not emit any metadata for it) in the
presence of reftype usage.

This PR fixes the emission code by building a map from instruction index
to safepoint index first, then doing lookups through this map, rather
than following along in-order as it emits instructions.
2022-03-31 14:26:01 -07:00
Andrew Brown
bd6fe11ca9 cranelift: remove load_complex and store_complex (#3976)
This change removes all variants of `load*_complex` and `store*_complex`
from Cranelift; this is a breaking change to the instructions exposed by
CLIF. The complete list of instructions removed is: `load_complex`,
`store_complex`, `uload8_complex`, `sload8_complex`, `istore8_complex`,
`sload8_complex`, `uload16_complex`, `sload16_complex`,
`istore16_complex`, `uload32_complex`, `sload32_complex`,
`istore32_complex`, `uload8x8_complex`, `sload8x8_complex`,
`sload16x4_complex`, `uload16x4_complex`, `uload32x2_complex`,
`sload32x2_complex`.

The rationale for this removal is that the Cranelift backend now has the
ability to pattern-match multiple upstream additions in order to
calculate the address to access. Previously, this was not possible so
the `*_complex` instructions were needed. Over time, these instructions
have fallen out of use in this repository, making the additional
overhead of maintaining them a chore.
2022-03-31 10:05:10 -07:00
Andrew Brown
e8dd13cf87 x64: port the remainder of select to ISLE (#3973)
Previous changes had ported the difficult "`select` based on an `fcmp`"
patterns to ISLE; this completes porting of `select` by moving over the
final two kinds of patterns:
 - `select` based on an `icmp`
 - `select` based on a value
2022-03-30 13:32:26 -07:00
Andrew Brown
5d8dd648d7 x64: port fcmp to ISLE (#3967)
* x64: port scalar `fcmp` to ISLE

Implement the CLIF lowering for the `fcmp` to ISLE. This adds a new
type-matcher, `ty_scalar_float`, for detecting uses of `F32` and `F64`.

* isle: rename `vec128` to `ty_vec12`

This refactoring changes the name of the `vec128` matcher function to
follow the `ty_*` convention of the other type matchers. It also makes
the helper an inline function call.

* x64: port vector `fcmp` to ISLE
2022-03-29 15:41:49 -07:00
Damian Heaton
6c8c94723a Scalar values in vectorizelanes & extractlanes (#3922)
- `extractlanes` will now function on a scalar value, returning the
value as a single-element array.
- `vectorizelanes` will accept a single-element array, returning the
contained value.

Existing `if !x.is_vector()` code-patterns have been simplified as a
result.

Copyright (c) 2022 Arm Limited
2022-03-28 09:32:59 -07:00
Alex Crichton
76b82910c9 Remove the module linking implementation in Wasmtime (#3958)
* Remove the module linking implementation in Wasmtime

This commit removes the experimental implementation of the module
linking WebAssembly proposal from Wasmtime. The module linking is no
longer intended for core WebAssembly but is instead incorporated into
the component model now at this point. This means that very large parts
of Wasmtime's implementation of module linking are no longer applicable
and would change greatly with an implementation of the component model.

The main purpose of this is to remove Wasmtime's reliance on the support
for module-linking in `wasmparser` and tooling crates. With this
reliance removed we can move over to the `component-model` branch of
`wasmparser` and use the updated support for the component model.
Additionally given the trajectory of the component model proposal the
embedding API of Wasmtime will not look like what it looks like today
for WebAssembly. For example the core wasm `Instance` will not change
and instead a `Component` is likely to be added instead.

Some more rationale for this is in #3941, but the basic idea is that I
feel that it's not going to be viable to develop support for the
component model on a non-`main` branch of Wasmtime. Additionaly I don't
think it's viable, for the same reasons as `wasm-tools`, to support the
old module linking proposal and the new component model at the same
time.

This commit takes a moment to not only delete the existing module
linking implementation but some abstractions are also simplified. For
example module serialization is a bit simpler that there's only one
module. Additionally instantiation is much simpler since the only
initializer we have to deal with are imports and nothing else.

Closes #3941

* Fix doc link

* Update comments
2022-03-23 14:57:34 -05:00
Andrew Brown
4d5bd5f90e x64: fix register allocation panic due to load-coalesced value (#3954)
Fuzz testing identified a lowering case for CLIF's `icmp` in which the
double use of a loaded operand resulted in a register allocation error.
This change manually adds `put_in_xmm` to avoid load-coalescing these
values and includes a CLIF filetest to trigger this issue. Closes #3951.

I opened #3953 to discuss a way in which this kind of mistake (i.e.,
forgetting to add `put_in_*` in certain situations) could be avoided.
2022-03-21 18:46:27 -07:00
Andrew Brown
3bfbb3226e x64: prefix all machine instructions with x64_ (#3947)
This change is refactoring only--it should have no logic changes. As
discussed previously, prefixing all machine code instructions with
`x64_` will make it easier to identify what parts of the ISLE code
correspond to single instructions and what parts rely on helpers that
may emit more than one instruction.
2022-03-18 17:53:15 -07:00
Andrew Brown
5fa104205d x64: improve generation of i128 icmp (#3946)
Previously, we used the flags of `AND` for `SETcc`. This change uses
`TEST` instead, which discards the AND result but sets the flags needed
for `SETcc`. This reduces register pressure slightly for this sequence.
2022-03-18 16:36:31 -07:00
Andrew Brown
e92cbfb283 x64: port icmp to ISLE (#3886)
* x64: port GPR-held `icmp` to ISLE
* x64: port equality `icmp` for i128 type
* x64: port `icmp` for vector types
* x64: rename from_intcc to intcc_to_cc
2022-03-18 11:22:09 -07:00
Chris Fallin
58062b5efe x64 backend: fix fpcmp to avoid load-op merging. (#3934)
The `fpcmp` helper in the x64 backend uses `put_in_xmm_mem` for one of
its operands, which allows the compiler to merge a load with the compare
instruction (`ucomiss` or `ucomisd`).

Unfortunately, as we saw in #2576 for the integer-compare case, this
does not work with our lowering algorithm because compares can be
lowered more than once (unlike all other instructions) to reproduce the
flags where needed. Merging a load into an op that executes more than
once is invalid in general (the two loads may observe different values,
which violates the original program semantics because there was only one
load originally).

This does not result in a miscompilation, but instead will cause a panic
at regalloc time because the register that should have been defined by
the separate load is never written (the load is never emitted
separately).

I think this (very subtle, easy to miss) condition was unfortunately not
ported over when we moved the logic in #3682.

The existing fcmp-of-load test in `cmp-mem-bug` (from #2576) does not
seem to trigger it, for a reason I haven't fully deduced. I just added
the verbatim function body (happens to come from `clang.wasm`) that
triggers the bug as a test.

Discovered while bringing up regalloc2 support. It's pretty unlikely to
hit by chance, which is why I think none of our fuzzing has hit it yet.
2022-03-16 09:48:20 -07:00
Alex Crichton
62a6a7ab6c Use const-initialized thread locals (#3923)
This was a relatively recent feature added to the Rust standard library
which should help accelerate calls into WebAssembly slightly.
2022-03-14 12:29:58 -05:00
FreddieLiardet
13b9396931 Add vector compare to 0 optims (#3887)
Signed-off-by: Freddie Liardet <frederick.liardet@arm.com>
2022-03-09 16:20:06 -08:00
Alex Crichton
4d404c90b4 Ensure functions are aligned properly on AArch64 (#3908)
Previously (as in an hour ago) #3905 landed a new ability for fuzzing to
arbitrarily insert padding between functions. Running some fuzzers
locally though this instantly hit a lot of problems on AArch64 because
the arbitrary padding isn't aligned to 4 bytes like all other functions
are. To fix this issue appending functions now correctly aligns the
output as appropriate for the platform. The alignment argument for
appending was switched to `None` where `None` means "use the platform
default" and otherwise and explicit alignment can be specified for
inserting other data (like arbitrary padding or Windows unwind tables).
2022-03-09 15:45:30 -06:00
Chris Fallin
26ce9a3853 Fix uextend on x64 for non-i32-source cases. (#3906)
In #3849, I moved uextend over to ISLE in the x64 backend. Unfortunately, the lowering patterns had a bug in the i32-to-i64 special case (when we know the generating instruction zeroes the upper 32 bits): it wasn't actually special casing for an i32 source! This meant that e.g. zero extends of the results of i8 adds did not work properly.

This PR fixes the bug and updates the runtest for extends significantly to cover the narrow-value cases.

No security impact to Wasm as Wasm does not use narrow integer types.

Thanks @bjorn3 for reporting!
2022-03-09 11:10:59 -08:00
wasmtime-publish
9137b4a50e Bump Wasmtime to 0.35.0 (#3885)
[automatically-tag-and-release-this-commit]

Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>
2022-03-07 15:18:34 -06:00
Alex Crichton
ad5ce38467 Remove all-arch from cranelift-interpreter (#3872)
I frequently notice that the fuzz build of `cranelift-codegen` takes an
extremely long time and recently realized that one issue is that when
fuzzers are built we enable all of the backends in `cranelift-codegen`
but AFAIK only the native backend is actually fuzzed. I traced the
inclusion of `all-arch` back to #2323, specifically [this comment][1]
and it looks like now that the old backend is removed this should be
able to be removed as well.

[1]: https://github.com/bytecodealliance/wasmtime/pull/2323#discussion_r515228552
2022-03-02 12:03:16 -06:00
Alexa VanHattum
ceab1e62fa Fix broken ISLE language reference link (#3865)
* Fix broken ISLE language reference link

* Update README.md

* 3rd time's a charm
2022-02-28 17:11:30 -08:00
Chris Fallin
cd173cfe8e ISLE: port fmin, fmax, fmin_pseudo, fmax_pseudo on x64. (#3856) 2022-02-28 14:40:26 -08:00
Chris Fallin
d9dfc44c32 ISLE: port more ops on x64 to lowering patterns. (#3855) 2022-02-28 13:28:42 -08:00
Chris Fallin
90a081a731 ISLE: port extend/reduce opcodes on x64. (#3849) 2022-02-28 11:49:28 -08:00
Chris Fallin
24f145cd1e Migrate clz, ctz, popcnt, bitrev, is_null, is_invalid on x64 to ISLE. (#3848) 2022-02-28 09:45:13 -08:00
Ulrich Weigand
b064e60087 ISLE: Re-implement ValueSlice (#3784)
The current definition of `ValueSlice` is not usable, since any call to
a constructor returning a `ValueSlice` will extend the mutable borrow
on the context taken by the constructor call, with the result that it
cannot be passed to any other constructor ever.

Re-implement `ValueSlice` as a pair of a `ValueList` identifer plus an
offset into the list.  This type can simply be copied without requiring
a borrow on the context.
2022-02-24 15:24:40 -08:00
Ulrich Weigand
07d615d3f7 ISLE: Lowering of multi-output instructions (#3783)
This changes the output of the `lower` constructor from a
`ValueRegs` to a new `InstOutput` type, which is a vector
of `ValueRegs`.

Code in `lower_common` is updated to use this new type to
handle instructions with multiple outputs.  All back-ends
are updated to use the new type.
2022-02-24 14:03:06 -08:00
Chris Fallin
e8881b2cc0 ISLE lowering rules: make use of implicit conversions. (#3847)
This PR makes use of the new implicit-conversion feature of the ISLE DSL
that was introduced in #3807 in order to make the lowering rules
significantly simpler and more concise.

The basic idea is to eliminate the repetitive and mechanical use of
terms that convert from one type to another when there is only one real
way to do the conversion -- for example, to go from a `WritableReg` to a
`Reg`, the only sensible way is to use `writable_reg_to_reg`.

This PR generally takes any term of the form "A_to_B" and makes it an
automatic conversion, as well as some others that are similar in spirit.

The notable exception to the pure-value-convsion category is the
`put_in_reg` family of operations, which actually do have side-effects.
However, as noted in the doc additions in #3807, this is fine as long as
the side-effects are idempotent. And on balance, making `put_in_reg`
automatic is a significant clarity win -- together with other operand
converters, it enables rules like:

```
;; Add two registers.
(rule (lower (has_type (fits_in_64 ty)
                       (iadd x y)))
      (add ty x y))
```

There may be other converters that we could define to make the rules
even simpler; we can make such improvements as we think of them, but
this should be a good start!
2022-02-23 16:14:38 -08:00
Chris Fallin
56236fb58e ISLE: fix compile fuzz target, and fix a simple error-reporting bug. (#3845)
It seems our `compile` fuzz target for ISLE has not been regularly
tested, as it was never updated for the `isle` -> `cranelift_isle` crate
renaming. This PR fixes it to compile again.

This also includes a simple fix in the typechecking: when verifying that
a term decl is valid, we might insert a term ID into the name->ID map
before fully checking that all of the types exist, and then skipping
(for error recovery purposes) the actual push onto the term-signature
vector if one of the types does have an error. This phantom TID can
later cause a panic. The fix is to avoid adding to the map until we have
fully verified the term decl.
2022-02-23 14:34:32 -08:00
Chris Fallin
9dbb8c25c5 Implicit type conversions in ISLE (#3807)
Add support for implicit type conversions to ISLE.

This feature allows the DSL user to register to the compiler that a
particular term (used as a constructor or extractor) converts from one
type to another. The compiler will then *automatically* insert this term
whenever a type mismatch involving that specific pair of types occurs.

This significantly cleans up many uses of the ISLE DSL. For example,
when defining the compiler backends, we often have newtypes like `Gpr`
around `Reg` (signifying a particular type of register); we can define
a conversion from Gpr to Reg automatically.

Conversions can also have side-effects, as long as these side-effects
are idempotent. For example, `put_value_in_reg` in a compiler backend
has the effect of marking the value as used, causing codegen to produce
it, and assigns a register to the value; but multiple invocations of
this will return the same register for the same value. Thus it is safe
to use it as an implicit conversion that may be invoked multiple times.
This is documented in the ISLE-Cranelift integration document.

This PR also adds some testing infrastructure to the ISLE compiler,
checking that "pass" tests pass through the DSL compiler, "fail" tests
do not, and "link" tests are able to generate code and link that code
with corresponding Rust code.
2022-02-23 13:15:27 -08:00
Nick Fitzgerald
bad9a35418 wasm-mutate fuzz targets (#3836)
* fuzzing: Add a custom mutator based on `wasm-mutate`

* fuzz: Add a version of the `compile` fuzz target that uses `wasm-mutate`

* Update `wasmparser` dependencies
2022-02-23 12:14:11 -08:00
Sam Parker
5b7df72bce [AArch64] Merge 32- and 64-bit BitOps (#3840)
Copyright (c) 2022, Arm Limited.
2022-02-23 11:36:23 -08:00
Sam Parker
d307a4ab9a [AArch64] Improve AtomicRMWLoop (#3839)
Add more tests, use accurate disassembly, respect data sizes and
simplify the Xchg implementation.

Copyright (c) 2022, Arm Limited
2022-02-23 10:47:59 -08:00
bjorn3
141af7523a Allow cloning DataDescriptor (#3377) 2022-02-23 10:19:54 -08:00
Andrew Brown
f87c61176a x64: port select to ISLE (#3682)
* x64: port `select` using an FP comparison to ISLE

This change includes quite a few interlocking parts, required mainly by
the current x64 conventions in ISLE:
 - it adds a way to emit a `cmove` with multiple OR-ing conditions;
   because x64 ISLE cannot currently safely emit a comparison followed
   by several jumps, this adds `MachInst::CmoveOr` and
   `MachInst::XmmCmoveOr` macro instructions. Unfortunately, these macro
   instructions hide the multi-instruction sequence in `lower.isle`
 - to properly keep track of what instructions consume and produce
   flags, @cfallin added a way to pass around variants of
   `ConsumesFlags` and `ProducesFlags`--these changes affect all
   backends
 - then, to lower the `fcmp + select` CLIF, this change adds several
   `cmove*_from_values` helpers that perform all of the awkward
   conversions between `Value`, `ValueReg`, `Reg`, and `Gpr/Xmm`; one
   upside is that now these lowerings have much-improved documentation
   explaining why the various `FloatCC` and `CC` choices are made the
   the way they are.

Co-authored-by: Chris Fallin <chris@cfallin.org>
2022-02-23 10:03:16 -08:00
Alex Crichton
709f7e0c8a Enable SSE 4.2 unconditionally (#3833)
* Enable SSE 4.2 unconditionally

Fuzzing over the weekend found that `i64x2` comparison operators
require `pcmpgtq` which is an SSE 4.2 instruction. Along the lines of #3816
this commit unconditionally enables and requires SSE 4.2 for compilation
and fuzzing. It will no longer be possible to create a compiler for
x86_64 with simd enabled if SSE 4.2 is disabled.

* Update comment
2022-02-22 13:23:51 -06:00