Commit Graph

610 Commits

Damian Heaton
db7f9ccd2b Convert scalar_to_vector to ISLE (AArch64) (#4401)
* Convert `scalar_to_vector` to ISLE (AArch64)

Converted the existing implementation of `scalar_to_vector` for AArch64 to
ISLE.

Copyright (c) 2022 Arm Limited

* Add support for floats and fix FpuExtend

- Added rules to cover `f32 -> f32x4` and `f64 -> f64x2` for
`scalar_to_vector`
- Added tests for `scalar_to_vector` on floats.
- Corrected an invalid instruction emitted by `FpuExtend` on 64-bit
values.

Copyright (c) 2022 Arm Limited
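
For illustration, a minimal CLIF sketch of the newly covered float case (hypothetical filetest input, not taken from the commit):

```
function %f32_to_vec(f32) -> f32x4 {
block0(v0: f32):
    ; place the scalar in lane 0 and zero the remaining lanes
    v1 = scalar_to_vector.f32x4 v0
    return v1
}
```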
2022-07-18 11:11:54 -07:00
Damian Heaton
6c70428735 Convert isplit / iconcat to ISLE (AArch64) (#4402)
Converted the existing implementations for `isplit` and `iconcat` for
AArch64 to ISLE.

Copyright (c) 2022 Arm Limited
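
A minimal CLIF sketch of the two ops now lowered via ISLE (illustrative only):

```
function %split_concat(i128) -> i128 {
block0(v0: i128):
    ; split an i128 into its low and high i64 halves, then reassemble it
    v1, v2 = isplit v0
    v3 = iconcat v1, v2
    return v3
}
```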
2022-07-08 17:12:42 -07:00
Sam Parker
9c43749dfe [RFC] Dynamic Vector Support (#4200)
Introduce a new concept in the IR that allows a producer to create
dynamic vector types. An IR function can now contain global value(s)
that represent a dynamic scaling factor, for a given fixed-width
vector type. A dynamic type is then created by 'multiplying' the
corresponding global value with a fixed-width type. These new types
can be used just like the existing types and the type system has a
set of hard-coded dynamic types, such as I32X4XN, which the user
defined types map onto. The dynamic types are also used explicitly
to create dynamic stack slots, which, unlike their existing
counterparts, have no fixed size. New IR instructions are added to
access these
new stack entities.

Currently, during codegen, the dynamic scaling factor has to be
lowered to a constant so the dynamic slots do eventually have a
compile-time known size, as do spill slots.

The current lowering for aarch64 just targets Neon, using a dynamic
scale of 1.

Copyright (c) 2022, Arm Limited.
2022-07-07 12:54:39 -07:00
Afonso Bordado
e9727b9d4b aarch64: Fix i128 of/nof implementations (#4403)
@yuyang-ok reported via Zulip that the i128 overflow tests were:
1. different from the interpreter implementation
2. wrong on some of the test cases

This fixes both the tests and the aarch64 implementation and adds the
interpreter to the testsuite.
2022-07-07 11:00:58 -07:00
Sam Parker
d9e0e6a6a9 [AArch64] Port min/max to ISLE (#4374)
Copyright (c) 2022, Arm Limited.
2022-07-05 09:16:45 -07:00
Afonso Bordado
38ecd3744f aarch64: Implement bmask/bextend in ISLE (#4358)
* aarch64: Implement `bmask`/`bextend` in ISLE

* cranelift: Remove vector versions of `bextend`

* aarch64: Cleanup `bmask`/`bextend` documentation
2022-07-01 09:37:18 -07:00
Ulrich Weigand
ec83144c88 s390x: use full vector register file for FP operations (#4360)
This defines the full set of 32 128-bit vector registers on s390x.
(Note that the VRs overlap the existing FPRs.)  In addition, this
adds support to use all 32 vector registers to implement floating-
point operations, by using vector floating-point instructions with
the 'W' bit set to operate only on the first element.

This part of the vector instruction set mostly matches the old FP
instruction set, with two exceptions:

- There is no vector version of the COPY SIGN instruction.  Instead,
  now use a VECTOR SELECT with an appropriate bit mask to implement
  the fcopysign operation.

- There are no vector versions of the float <-> int conversion
  instructions where source and target differ in bit size.  Use
  appropriate multiple conversion steps instead.  This also requires
  use of explicit checking to implement correct overflow handling.
  As a side effect, this version now also implements the i8 / i16
  variants of all conversions, which had been missing so far.

For all operations except those two above, we continue to use the
old FP instruction if applicable (i.e. if all operands happen to
have been allocated to the original FP register set), and use the
vector instruction otherwise.
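
As a rough CLIF-level sketch, the `fcopysign` operation whose lowering changed here (runtest-style, values assumed):

```
function %copysign(f64, f64) -> f64 {
block0(v0: f64, v1: f64):
    ; magnitude of v0, sign of v1; on s390x this now lowers to a
    ; VECTOR SELECT over the sign-bit mask rather than COPY SIGN
    v2 = fcopysign v0, v1
    return v2
}
; run: %copysign(0x1.0p0, -0x1.0p2) == -0x1.0p0
```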
2022-06-30 16:33:39 -07:00
bjorn3
d1446f767d Mark return value as define instead of clobber for TLS pseudoinstructions (#4357) 2022-06-30 10:44:51 -07:00
Ulrich Weigand
7a9479f77c ISLE: Migrate call and return instructions (#3785)
This adds infrastructure to allow implementing call and return
instructions in ISLE, and migrates the s390x back-end.

To implement ABI details, this patch creates public accessors
for `ABISig` and makes them accessible in ISLE.  All actual
code generation is then done in ISLE rules, following the
information provided by that signature.

[ Note that the s390x back end never requires multiple slots for
a single argument - the infrastructure to handle this should
already be present, however. ]

To implement loops in ISLE rules, this patch uses regular tail
recursion, employing a `Range` data structure holding a range
of integers to be looped over.
2022-06-29 14:22:50 -07:00
Afonso Bordado
42d4f97b78 cranelift: Fix cls for small types on aarch64 (#4305)
The previous `cls` code produced wrong results when fed a -1 i8 value.

The fix is to sign-extend instead of zero-extending, since we want to
keep the sign bit as one so that it is counted correctly by the `cls`
instruction.

This also merges the interpreter-only tests, now that aarch64
correctly supports this instruction
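
A CLIF runtest-style sketch of the case that was previously miscounted (illustrative, values assumed):

```
function %cls_i8(i8) -> i8 {
block0(v0: i8):
    ; count leading sign bits; -1 (0xFF) should give 7
    v1 = cls v0
    return v1
}
; run: %cls_i8(-1) == 7
```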
2022-06-27 15:55:02 -07:00
Chris Fallin
0d829a57ee Upgrade to regalloc2 v0.2.3 to get bugfix from bytecodealliance/regalloc2#60. (#4335)
* Upgrade to regalloc2 v0.2.3 to get bugfix from bytecodealliance/regalloc2#60.

* Update RELEASES.md.

* Update two compile tests based on slightly shifting regalloc output.
2022-06-27 15:58:54 -05:00
Anton Kirilov
25a588c35f Cranelift AArch64: Use an allocated encoding for Udf (#4281)
Preserve the current behaviour when code is generated for SpiderMonkey.

Copyright (c) 2022, Arm Limited.
2022-06-22 15:03:28 +01:00
Trevor Elliott
7e0bb465d0 X64: port the rest of icmp to ISLE (#4254)
Finish migrating icmp to ISLE for x64
2022-06-13 16:34:11 -07:00
Chris Fallin
ed9db962de x64 backend: fix cmpxchg (don't return RealReg as result). (#4243)
The current lowering helper for `cmpxchg` returns the literal RealReg
`rax` as its result. However, this breaks a number of invariants, and
eventually causes a regalloc panic if used as a blockparam arg (pinned
vregs cannot be used in this way).

In general we have to return regular vregs, not a RealReg, as results of
instructions during lowering. However, #4223 added a helper for
`x64_cmpxchg` that returns a literal `rax`.

Fortunately we can do the right thing here by just giving a fresh vreg
to the instruction; the regalloc constraints mean that this vreg is
constrained to `rax` at the instruction (at its def/late point), so the
generator of the instruction need not worry about `rax` here.
2022-06-08 06:13:31 -07:00
Chris Fallin
54acd8b3e2 x64 backend: fix to_amode with constant address (no registers). (#4239)
If an address expression is given to `to_amode` that is completely
constant (no registers at all), then it will produce an `Amode` that has
the resulting constant as an offset, and `(invalid_reg)` as the base.
This is a side-effect of the way we build up the amode step-by-step --
we're waiting to see a register and plug it into the base field. If we
never get a reg though, we need to generate a constant zero into a
register and use that as the base. This PR adds a `finalize_amode`
helper to do just that.

Fixes #4234.
2022-06-07 11:40:10 -07:00
Sam Parker
acfeda4d80 [AArch64] Port IaddPairwise to ISLE (#4201)
Copyright (c) 2022, Arm Limited.
2022-06-06 15:37:13 +01:00
Chris Fallin
8f61eb9341 Upgrade to regalloc2 version 0.2.1. (#4199)
This resolves an edge-case where mul.i128 with an input that continues
to be live after the instruction could cause an invalid regalloc
constraint (basically, the regalloc did not previously support an
instruction use and def both being constrained to the same physical reg;
and the "mul" variant used for mul.i128 on x64 was the only instance of
such operands in Cranelift).

Causes two extra move instructions in the mul.i128 filetest, but that's
the price to pay for the slightly more general (works in all cases)
handling of the constraints.
2022-06-01 13:26:20 -07:00
Chris Fallin
b830c3cf93 Pull in regalloc2 v0.2.0, with no more separate scratch registers. (#4182)
RA2 recently removed the need for a dedicated scratch register for
cyclic moves (bytecodealliance/regalloc2#51). This has moderate positive
performance impact on function bodies that were register-constrained, as
it means that one more register is available. In Sightglass, I measured
+5-8% on `blake3-scalar`, at least among current benchmarks.
2022-05-23 12:51:04 -07:00
Chris Fallin
32622b3e6f Cranelift: fix use of pinned reg with SysV calling convention. (#4176)
Previously, the pinned register (enabled by the `enable_pinned_reg`
Cranelift setting and used via the `get_pinned_reg` and `set_pinned_reg`
CLIF ops) was only used when Cranelift was embedded in SpiderMonkey, in
order to support a pinned heap register. SpiderMonkey has its own
calling convention in Cranelift (named after the integration layer,
"Baldrdash").

However, the feature is more general, and should be usable with the
default system calling convention too, e.g. SysV or Windows Fastcall.

This PR fixes the ABI code to properly treat the pinned register as a
globally allocated register -- and hence an implicit input and output to
every function, not saved/restored in the prologue/epilogue -- for SysV
on x86-64 and aarch64, and Fastcall on x86-64.

Fixes #4170.
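
For illustration, a small CLIF sketch using the pinned-register ops (requires the `enable_pinned_reg` setting; not taken from the PR):

```
function %bump_pinned() -> i64 {
block0:
    ; read the globally allocated pinned register, advance it, write it back
    v0 = get_pinned_reg.i64
    v1 = iadd_imm v0, 8
    set_pinned_reg v1
    return v0
}
```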
2022-05-23 09:18:51 -07:00
Chris Fallin
02d5edc591 Upgrade to regalloc2 0.1.3. (#4157)
* Upgrade to regalloc2 0.1.3.

This pulls in bytecodealliance/regalloc2#49, which slightly improves
codegen in some cases where a safepoint (for reference-typed values)
occurs in the same liverange as a register-constrained use. For
example, in bytecodealliance/wasmtime#3785, an extra move instruction
appeared and a callee-save register was used (necessitating a more
expensive prologue) because of suboptimal splitting heuristics, which
this PR fixes. The updated RA2 heuristics appear to have no measured
downsides in existing benchmarks and improve the manually-observed
codegen issue.

* Update filetests where regalloc2 improvement altered behavior with reftypes.
2022-05-18 11:48:40 -07:00
Anton Kirilov
edf07a8da6 Cranelift AArch64: Migrate Bitselect and Vselect to ISLE (#4139)
Copyright (c) 2022, Arm Limited.
2022-05-16 09:39:28 -07:00
Ulrich Weigand
0243a16679 s390x: Fix bitwise operations (#4146)
Current codegen had a number of logic errors confusing
NAND with AND WITH COMPLEMENT, and NOR with OR WITH COMPLEMENT.

Add support for the missing z15 instructions and fix logic.
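
For clarity, the distinction at the CLIF level: `band_not` is AND WITH COMPLEMENT (`x & ~y`), not NAND (`~(x & y)`). A runtest-style sketch (values assumed):

```
function %and_not(i32, i32) -> i32 {
block0(v0: i32, v1: i32):
    ; x & ~y: 0xff00 & ~0x0f0f == 0xf000
    v2 = band_not v0, v1
    return v2
}
; run: %and_not(65280, 3855) == 61440
```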
2022-05-12 10:05:22 -07:00
Chris Fallin
67eb161d04 Cranelift: fix filetest now failing after merge to main. (#4120)
This test was added between the last CI run on #4088 and its merge to
main, and the changes in #4088 (use of constants directly in an instruction
via a load from the constant pool, rather than from a register initialized by
a separate instruction) cause it to fail now.

This PR alters the test to be invariant to regalloc and argument
decisions during lowering, as the test is really checking (per the
comment) that we get two cmoves without an intervening move. As such, it
just matches the instruction opcodes, irrespective of the arguments.
2022-05-10 10:56:54 -07:00
Chris Fallin
eb435f3057 x64: use constant pool for u64 constants rather than movabs. (#4088)
* Allow emitting u64 constants into constant pool.

* Use constant pool for constants on x64 that do not fit in a simm32 and are needed as a RegMem or RegMemImm.

* Fix rip-relative addressing bug in pinsrd emission.
2022-05-10 09:21:05 -07:00
Andrew Brown
527b7a9b05 x64: add test for #3744 (#4095)
In #3744, we identified that extra `mov` instructions were inserted in
between the `cmov` instructions that CLIF's `select` lowers to. The
switch to regalloc2 resolved this, and this test checks that no
intervening `mov`s are inserted. Closes #3744.
2022-05-03 15:04:32 -07:00
Chris Fallin
019ebf47b1 x64: fix pretty-printing argument order for XmmRmR instructions. (#4094)
The pretty-printing had swapped dst and src2; this was introduced when
we moved to RA2 (sorry about that! IMHO we should do something to
automate the mapping between regalloc arg collection and pretty
printing/emission).

`src2` comes at the end because it has a variable number of register
mentions; this is in line with how many of the other inst formats work.

Actual emitted code was never incorrect, just the pretty-printing.

Updated test golden outputs look correct to me now, including the one
that we saw was incorrect in #3945.
2022-05-03 10:12:58 -07:00
Chris Fallin
f85047b084 Rework x64 addressing-mode lowering to be slightly more flexible. (#4080)
This PR refactors the x64 backend address-mode lowering to use an
incremental-build approach, where it considers each node in a tree of
`iadd`s that feed into a load/store address and, at each step, builds
the best possible `Amode`. It will combine an arbitrary number of
constant offsets (an extension beyond the current rules), and can
capture a left-shifted (scaled) index in any position of the tree
(another extension).

This doesn't have any measurable performance improvement on our Wasm
benchmarks in Sightglass, unfortunately, because the IR lowered from
wasm32 will do address computation in 32 bits and then `uextend` it to
add to the 64-bit heap base. We can't quite lift the 32-bit adds to 64
bits because this loses the wraparound semantics.

(We could label adds as "expected not to overflow", and allow *those* to
be lifted to 64 bit operations; wasm32 heap address computation should
fit this.  This is `add nuw` (no unsigned wrap) in LLVM IR terms. That's
likely my next step.)

Nevertheless, (i) this generalizes the cases we can handle, which should
be a good thing, all other things being equal (and in this case, no
compile time impact was measured); and (ii) might benefit non-Wasm
frontends.
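
As a sketch, the kind of address tree the new lowering can now fold into a single x64 addressing mode (base + scaled index + combined constant offsets); illustrative CLIF, not from the PR:

```
function %load_elem(i64, i64) -> i32 {
block0(v0: i64, v1: i64):
    v2 = ishl_imm v1, 2      ; scaled index (4-byte elements)
    v3 = iadd v0, v2         ; base + scaled index
    v4 = iadd_imm v3, 16     ; plus a constant offset
    v5 = load.i32 v4         ; can become: mov eax, [base + index*4 + 16]
    return v5
}
```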
2022-05-02 16:20:39 -07:00
Chris Fallin
61dc38c065 Implement Spectre mitigations for table accesses and br_tables. (#4092)
Currently, we have partial Spectre mitigation: we protect heap accesses
with dynamic bounds checks. Specifically, we guard against errant
accesses on the misspeculated path beyond the bounds-check conditional
branch by adding a conditional move that is also dependent on the
bounds-check condition. This data dependency on the condition is not
speculated and thus will always pick the "safe" value (in the heap case,
a NULL address) on the misspeculated path, until the pipeline flushes
and recovers onto the correct path.

This PR uses the same technique both for table accesses -- used to
implement Wasm tables -- and for jumptables, used to implement Wasm
`br_table` instructions.

In the case of Wasm tables, the cmove picks the table base address on
the misspeculated path. This is equivalent to reading the first table
entry. This prevents loads of arbitrary data addresses on the
misspeculated path.

In the case of `br_table`, the cmove picks index 0 on the misspeculated
path. This is safer than allowing a branch to an address loaded from an
index under misspeculation (i.e., it preserves control-flow integrity
even under misspeculation).

The table mitigation is controlled by a Cranelift setting, on by
default. The br_table mitigation is always on, because it is part of the
single lowering pseudoinstruction. In both cases, the impact should be
minimal: a single extra cmove in a (relatively) rarely-used operation.

The table mitigation is architecture-independent (happens during
legalization); the br_table mitigation has been implemented for both x64
and aarch64. (I don't know enough about s390x to implement this
confidently there, but would happily review a PR to do the same on that
platform.)
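
Roughly, the guarded-address pattern described above looks like this at the CLIF level (a hand-written sketch using `select_spectre_guard`; the exact legalized form may differ):

```
function %guarded_table_entry_addr(i64, i64, i64) -> i64 {
block0(v0: i64, v1: i64, v2: i64):
    ; v0 = table index, v1 = table bound, v2 = table base address
    v3 = icmp uge v0, v1                    ; out-of-bounds condition
    v4 = ishl_imm v0, 3                     ; byte offset for 8-byte entries
    v5 = iadd v2, v4                        ; computed entry address
    ; on the misspeculated (out-of-bounds) path the cmove yields the
    ; table base rather than an attacker-controlled address
    v6 = select_spectre_guard v3, v2, v5
    return v6
}
```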
2022-05-02 11:19:16 -07:00
Sam Parker
12b4374cd5 [AArch64] Port atomic rmw to ISLE (#4021)
Also fix and extend the current implementation:
- AtomicRMWOp::Clr != AtomicRmwOp::And, as the input needs to be
  inverted first.
- Inputs to the cmp for the RMWLoop case are sign-extended when
  needed.
- Lower Xchg to Swp.
- Lower Sub to Add with a negated input.
- Added more runtests.

Copyright (c) 2022, Arm Limited.
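
A minimal CLIF sketch of one of the newly ported operations (illustrative; `v0` is assumed to be a valid aligned pointer):

```
function %rmw_sub(i64, i64) -> i64 {
block0(v0: i64, v1: i64):
    ; atomically subtract v1 from the i64 at address v0, returning the old value;
    ; per the notes above, Sub is lowered as an atomic Add of the negated operand
    v2 = atomic_rmw.i64 sub v0, v1
    return v2
}
```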
2022-04-27 13:13:59 -07:00
Chris Fallin
dd45f44511 x64 backend: add lowerings with load-op-store fusion. (#4071)
x64 backend: add lowerings with load-op-store fusion.

These lowerings use the `OP [mem], reg` forms (or in AT&T syntax, `OP
%reg, (mem)`) -- i.e., x86 instructions that load from memory, perform
an ALU operation, and store the result, all in one instruction. Using
these instruction forms, we can merge three CLIF ops together: a load,
an arithmetic operation, and a store.
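
A minimal CLIF sketch of the three-op pattern that can now be fused into a single `add [mem], reg` style instruction (illustrative only):

```
function %add_to_mem(i64, i32) {
block0(v0: i64, v1: i32):
    v2 = load.i32 v0      ; load
    v3 = iadd v2, v1      ; ALU op
    store v3, v0          ; store back to the same address
    return
}
```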
2022-04-26 18:58:26 -07:00
Chris Fallin
f384938a10 x64 backend: fix a load-op merging bug with integer min/max. (#4068)
The recent work in #4061 introduced a notion of "unique uses" for CLIF
values that both simplified the load-op merging rules and allowed
loads to merge in some more places.

Unfortunately there's one factor that PR didn't account for: a unique
use at the CLIF level could become a multiple-use at the VCode level,
when a lowering uses a value multiple times!

Making this less error-prone in general is hard, because we don't know
the lowering in VCode until it's emitted, so we can't ahead-of-time
know that a value will be used multiple times and prevent its
merging. But we *can* know in the lowerings themselves when we're
doing this. At least we get a panic from regalloc when we get this
wrong; no bad code (uninitialized register being read) should ever
come from a backend bug like this.

This is still a bit less than ideal, but for now the fix is: in
`cmp_and_choose` in the x64 backend (which compares values, then
picks one or the other with a cmove), explicitly put values in
registers.

Fixes #4067 (thanks @Mrmaxmeier for the report!).
2022-04-25 10:32:09 -07:00
Chris Fallin
e4b7c8a737 Cranelift: fix #3953: rework single/multiple-use logic in lowering. (#4061)
* Cranelift: fix #3953: rework single/multiple-use logic in lowering.

This PR addresses the longstanding issue with loads trying to merge
into compares on x86-64, and more generally, with the lowering
framework falsely recognizing "single uses" of one op by
another (which would normally allow merging of side-effecting ops like
loads) when there is *indirect* duplication.

To fix this, we replace the direct `value_uses` count with a
transitive notion of uniqueness (not unlike Rust's `&`/`&mut` and how
a `&mut` downgrades to `&` when accessed through another `&`!). A
value is used multiple times transitively if it has multiple direct
uses, or is used by another op that is used multiple times
transitively.

The canonical example of badness is:

```
    v1 := load
    v2 := ifcmp v1, ...
    v3 := selectif v2, ...
    v4 := selectif v2, ...
```

both `v3` and `v4` effectively merge the `ifcmp` (`v2`), so even
though the use of `v1` is "unique", it is codegenned twice. This is
why we ~~can't have nice things~~ can't merge loads into
compares (#3953).

There is quite a subtle and interesting design space around this
problem and how we might solve it. See the long doc-comment on
`ValueUseState` in this PR for more justification for the particular
design here. In particular, this design deliberately simplifies a bit
relative to an "optimal" solution: some uses can *become* unique
depending on merging, but we don't design our data structures for such
updates because that would require significant extra costly
tracking (some sort of transitive refcounting). For example, in the
above, if `selectif` somehow did not merge `ifcmp`, then we would only
codegen the `ifcmp` once into its result register (and use that
register twice); then the load *is* uniquely used, and could be
merged. But that requires transitioning from "multiple use" back to
"unique use" with careful tracking as we do pattern-matching, which
I've chosen to make out-of-scope here for now. In practice, I don't
think it will matter too much (and we can always improve later).

With this PR, we can now re-enable load-op merging for compares. A
subsequent commit does this.

* Update x64 backend to allow load-op merging for `cmp`.

* Update filetests.

* Add test for cmp-mem merging on x64.

* Comment fixes.

* Rework ValueUseState analysis for better performance.

* Update s390x filetest: iadd_ifcout cannot merge loads anymore because it has multiple outputs (ValueUseState limitation)

* Address review comments.
2022-04-22 18:00:48 -07:00
Sam Parker
e142f587a7 [AArch64] Refactor ALUOp3 (#3950)
As well as adding a generic pattern for msub, along with runtests
for madd and msub.

Copyright (c) 2022, Arm Limited.
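
A CLIF sketch of the multiply-subtract pattern the new rule targets (illustrative; on AArch64 this can lower to a single `msub`):

```
function %msub(i64, i64, i64) -> i64 {
block0(v0: i64, v1: i64, v2: i64):
    v3 = imul v0, v1
    v4 = isub v2, v3      ; c - a*b
    return v4
}
```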
2022-04-14 12:16:56 -07:00
Chris Fallin
a0318f36f0 Switch Cranelift over to regalloc2. (#3989)
This PR switches Cranelift over to the new register allocator, regalloc2.

See [this document](https://gist.github.com/cfallin/08553421a91f150254fe878f67301801)
for a summary of the design changes. This switchover has implications for
core VCode/MachInst types and the lowering pass.

Overall, this change brings improvements to both compile time and speed of
generated code (runtime), as reported in #3942:

```
Benchmark       Compilation (wallclock)     Execution (wallclock)
blake3-scalar   25% faster                  28% faster
blake3-simd     no diff                     no diff
meshoptimizer   19% faster                  17% faster
pulldown-cmark  17% faster                  no diff
bz2             15% faster                  no diff
SpiderMonkey,   21% faster                  2% faster
  fib(30)
clang.wasm      42% faster                  N/A
```
2022-04-14 10:28:21 -07:00
Andrew Brown
7a55779c6b x64: fix miscompilation of select.i128 (#4017)
Issue #3963 identified a miscompilation with select in which the second
in the pair of `CMOV`s (one pair per `i128` register) used the wrong
flag. This change fixes the error in the x64 ISLE helper function
emitting these `CMOV` instructions.
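
For reference, a minimal CLIF sketch of the construct affected by the fix; an `i128` `select` lowers to one `CMOV` pair per 64-bit half (illustrative only):

```
function %sel_i128(i8, i128, i128) -> i128 {
block0(v0: i8, v1: i128, v2: i128):
    v3 = select v0, v1, v2
    return v3
}
```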
2022-04-12 09:56:57 -07:00
Andrew Brown
bd6fe11ca9 cranelift: remove load_complex and store_complex (#3976)
This change removes all variants of `load*_complex` and `store*_complex`
from Cranelift; this is a breaking change to the instructions exposed by
CLIF. The complete list of instructions removed is: `load_complex`,
`store_complex`, `uload8_complex`, `sload8_complex`, `istore8_complex`,
`uload16_complex`, `sload16_complex`,
`istore16_complex`, `uload32_complex`, `sload32_complex`,
`istore32_complex`, `uload8x8_complex`, `sload8x8_complex`,
`sload16x4_complex`, `uload16x4_complex`, `uload32x2_complex`,
`sload32x2_complex`.

The rationale for this removal is that the Cranelift backend now has the
ability to pattern-match multiple upstream additions in order to
calculate the address to access. Previously, this was not possible so
the `*_complex` instructions were needed. Over time, these instructions
have fallen out of use in this repository, making the additional
overhead of maintaining them a chore.
2022-03-31 10:05:10 -07:00
Andrew Brown
e8dd13cf87 x64: port the remainder of select to ISLE (#3973)
Previous changes had ported the difficult "`select` based on an `fcmp`"
patterns to ISLE; this completes porting of `select` by moving over the
final two kinds of patterns:
 - `select` based on an `icmp`
 - `select` based on a value
2022-03-30 13:32:26 -07:00
Andrew Brown
5d8dd648d7 x64: port fcmp to ISLE (#3967)
* x64: port scalar `fcmp` to ISLE

Implement the CLIF lowering for the `fcmp` to ISLE. This adds a new
type-matcher, `ty_scalar_float`, for detecting uses of `F32` and `F64`.

* isle: rename `vec128` to `ty_vec128`

This refactoring changes the name of the `vec128` matcher function to
follow the `ty_*` convention of the other type matchers. It also makes
the helper an inline function call.

* x64: port vector `fcmp` to ISLE
2022-03-29 15:41:49 -07:00
Andrew Brown
4d5bd5f90e x64: fix register allocation panic due to load-coalesced value (#3954)
Fuzz testing identified a lowering case for CLIF's `icmp` in which the
double use of a loaded operand resulted in a register allocation error.
This change manually adds `put_in_xmm` to avoid load-coalescing these
values and includes a CLIF filetest to trigger this issue. Closes #3951.

I opened #3953 to discuss a way in which this kind of mistake (i.e.,
forgetting to add `put_in_*` in certain situations) could be avoided.
2022-03-21 18:46:27 -07:00
Andrew Brown
5fa104205d x64: improve generation of i128 icmp (#3946)
Previously, we used the flags of `AND` for `SETcc`. This change uses
`TEST` instead, which discards the AND result but sets the flags needed
for `SETcc`. This reduces register pressure slightly for this sequence.
2022-03-18 16:36:31 -07:00
Andrew Brown
e92cbfb283 x64: port icmp to ISLE (#3886)
* x64: port GPR-held `icmp` to ISLE
* x64: port equality `icmp` for i128 type
* x64: port `icmp` for vector types
* x64: rename from_intcc to intcc_to_cc
2022-03-18 11:22:09 -07:00
Chris Fallin
58062b5efe x64 backend: fix fpcmp to avoid load-op merging. (#3934)
The `fpcmp` helper in the x64 backend uses `put_in_xmm_mem` for one of
its operands, which allows the compiler to merge a load with the compare
instruction (`ucomiss` or `ucomisd`).

Unfortunately, as we saw in #2576 for the integer-compare case, this
does not work with our lowering algorithm because compares can be
lowered more than once (unlike all other instructions) to reproduce the
flags where needed. Merging a load into an op that executes more than
once is invalid in general (the two loads may observe different values,
which violates the original program semantics because there was only one
load originally).

This does not result in a miscompilation, but instead will cause a panic
at regalloc time because the register that should have been defined by
the separate load is never written (the load is never emitted
separately).

I think this (very subtle, easy to miss) condition was unfortunately not
ported over when we moved the logic in #3682.

The existing fcmp-of-load test in `cmp-mem-bug` (from #2576) does not
seem to trigger it, for a reason I haven't fully deduced. I just added
the verbatim function body (happens to come from `clang.wasm`) that
triggers the bug as a test.

Discovered while bringing up regalloc2 support. It's pretty unlikely to
hit by chance, which is why I think none of our fuzzing has hit it yet.
2022-03-16 09:48:20 -07:00
FreddieLiardet
13b9396931 Add vector compare to 0 optims (#3887)
Signed-off-by: Freddie Liardet <frederick.liardet@arm.com>
2022-03-09 16:20:06 -08:00
Chris Fallin
d9dfc44c32 ISLE: port more ops on x64 to lowering patterns. (#3855) 2022-02-28 13:28:42 -08:00
Chris Fallin
24f145cd1e Migrate clz, ctz, popcnt, bitrev, is_null, is_invalid on x64 to ISLE. (#3848) 2022-02-28 09:45:13 -08:00
Sam Parker
d307a4ab9a [AArch64] Improve AtomicRMWLoop (#3839)
Add more tests, use accurate disassembly, respect data sizes and
simplify the Xchg implementation.

Copyright (c) 2022, Arm Limited
2022-02-23 10:47:59 -08:00
Andrew Brown
f87c61176a x64: port select to ISLE (#3682)
* x64: port `select` using an FP comparison to ISLE

This change includes quite a few interlocking parts, required mainly by
the current x64 conventions in ISLE:
 - it adds a way to emit a `cmove` with multiple OR-ing conditions;
   because x64 ISLE cannot currently safely emit a comparison followed
   by several jumps, this adds `MachInst::CmoveOr` and
   `MachInst::XmmCmoveOr` macro instructions. Unfortunately, these macro
   instructions hide the multi-instruction sequence in `lower.isle`
 - to properly keep track of what instructions consume and produce
   flags, @cfallin added a way to pass around variants of
   `ConsumesFlags` and `ProducesFlags`--these changes affect all
   backends
 - then, to lower the `fcmp + select` CLIF, this change adds several
   `cmove*_from_values` helpers that perform all of the awkward
   conversions between `Value`, `ValueReg`, `Reg`, and `Gpr/Xmm`; one
   upside is that now these lowerings have much-improved documentation
explaining why the various `FloatCC` and `CC` choices are made the
way they are.

Co-authored-by: Chris Fallin <chris@cfallin.org>
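
A minimal CLIF sketch of the `fcmp + select` pattern being lowered here; a condition like `FloatCC::NotEqual` maps to more than one x64 flag condition, which is what the `CmoveOr` macro instructions handle (illustrative only):

```
function %select_by_fcmp(f32, f32, i32, i32) -> i32 {
block0(v0: f32, v1: f32, v2: i32, v3: i32):
    v4 = fcmp ne v0, v1
    v5 = select v4, v2, v3
    return v5
}
```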
2022-02-23 10:03:16 -08:00
Chris Fallin
1c014d129a Cranelift: ensure ISA level needed for SIMD is present when SIMD is enabled. (#3816)
Addresses #3809: when we are asked to create a Cranelift backend with
shared flags that indicate support for SIMD, we should check that the
ISA level needed for our SIMD lowerings is present.
2022-02-16 17:29:30 -08:00
Chris Fallin
ca0e8d0a1d Remove incomplete/unmaintained ARM32 backend (for now). (#3799)
In #3721, we have been discussing what to do about the ARM32 backend in
Cranelift. Currently, this backend supports only 32-bit types, which is
insufficient for full Wasm-MVP; it's missing other critical bits, like
floating-point support; and it has only ever been exercised, AFAIK, via
the filetests for the individual CLIF instructions that are implemented.

We were very very thankful for the original contribution of this
backend, even in its partial state, and we had hoped at the time that we
could eventually mature it in-tree until it supported e.g. Wasm and
other use-cases. But that hasn't yet happened -- to the blame of no-one,
to be clear, we just haven't had a contributor with sufficient time.

Unfortunately, the existence of the backend and lack of active
maintainer now potentially pose a bit of a burden as we hope to make
continuing changes to the backend framework. For example, the ISLE
migration, and the use of regalloc2 that it will allow, would need all
of the existing lowering patterns in the hand-written ARM32 backend to
be rewritten as ISLE rules.

Given that we don't currently have the resources to do this, we think
it's probably best if we, sadly, for now remove this partial backend.
This is not in any way a statement of what we might accept in the
future, though. If, in the future, an ARM32 backend updated to our
latest codebase with an active maintainer were to appear, we'd be happy
to merge it (and likewise for any other architecture!). But for now,
this is probably the best path. Thanks again to the original contributor
@jmkrauz and we hope that this work can eventually be brought back and
reused if someone has the time to do so!
2022-02-14 15:03:52 -08:00
Ulrich Weigand
9c5c872b3b s390x: Add support for all remaining atomic operations (#3746)
This adds support for all atomic operations that were unimplemented
so far in the s390x back end:
- atomic_rmw operations xchg, nand, smin, smax, umin, umax
- $I8 and $I16 versions of atomic_rmw and atomic_cas
- little endian versions of atomic_rmw and atomic_cas

All of these have to be implemented by a compare-and-swap loop;
and for the $I8 and $I16 versions the actual atomic instruction
needs to operate on the surrounding aligned 32-bit word.

Since we cannot emit new control flow during ISLE instruction
selection, these compare-and-swap loops are emitted as a single
meta-instruction to be expanded at emit time.

However, since there is a large number of different versions of
the loop required to implement all the above operations, I've
implemented a facility to allow specifying the loop bodies
from within ISLE after all, by creating a vector of MInst
structures that will be emitted as part of the meta-instruction.

There are still restrictions; in particular, instructions that
are part of the loop body may not modify any virtual register.
But even so, this approach looks preferable to doing everything
in emit.rs.

A few instructions needed in those compare-and-swap loop bodies
were added as well, in particular the RxSBG family of instructions
as well as the LOAD REVERSED in-register byte-swap instructions.

This patch also adds filetest runtests to verify the semantics
of all operations, in particular the subword and little-endian
variants (those are currently only executed on s390x).
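
A small CLIF sketch of one of the newly supported subword operations (illustrative; `v0` is assumed to point to a valid byte). On s390x this is implemented as a compare-and-swap loop over the surrounding aligned 32-bit word, as described above:

```
function %cas_i8(i64, i8, i8) -> i8 {
block0(v0: i64, v1: i8, v2: i8):
    ; v1 = expected value, v2 = replacement; returns the old value
    v3 = atomic_cas.i8 v0, v1, v2
    return v3
}
```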
2022-02-08 13:48:44 -08:00