wasmtime

Author	SHA1	Message	Date
Trevor Elliott	368004428a	Fix rule shadowing instances in x64 and aarch64 backends (#5334 ) Fix shadowing identified in #5322 for imul and swiden_high/swiden_low/uwiden_high/uwiden_low combinations in the x64 backend, and remove some redundant rules from the aarch64 dynamic neon ruleset. Additionally, add tests to the x64 backend showing that the imul specializations are firing.	2022-11-28 15:48:34 -08:00
Trevor Elliott	54cfa4df34	cranelift: Fix implicit pointer argument register use (#5301 ) * Fix arg handling to write to VRegs instead of physical regs * Make is_included_in_clobbers required, and handle Args on x64 and riscv64	2022-11-18 16:47:03 -08:00
Trevor Elliott	07bd8bf34a	Remove unnecessary moves in x64 gen_memcpy (#5277 ) Remove some unnecessary moves in the x64 gen_memcpy implementation -- the call instruction that's generated will already constrain the args to those registers.	2022-11-16 10:33:00 -08:00
Nick Fitzgerald	9967782726	Cranelift(Aarch64): Optimize lowering of `icmp`s with immediates (#5252 ) We can encode more constants into 12-bit immediates if we do the following rewrite for comparisons with odd constants: A >= B + 1 ==> A - 1 >= B ==> A > B	2022-11-15 09:18:55 -08:00
Nick Fitzgerald	c2a7ea7e24	Cranelift: de-duplicate bounds checks in legalizations (#5190 ) * Cranelift: Add the `DataFlowGraph::display_value_inst` convenience method * Cranelift: Add some `trace!` logs to some parts of legalization * Cranelift: de-duplicate bounds checks in legalizations When both (1) "dynamic" memories that need explicit bounds checks and (2) spectre mitigations that perform bounds checks are enabled, reuse the same bounds checks between the two legalizations. This reduces the overhead of explicit bounds checks and spectre mitigations over using virtual memory guard pages with spectre mitigations from ~1.9-2.1x overhead to ~1.6-1.8x overhead. That is about a 14-19% speed up for when dynamic memories and spectre mitigations are enabled. <details> ``` execution :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm Δ = 3422455129.47 ± 120159.49 (confidence = 99%) virtual-memory-guards.so is 2.09x to 2.09x faster than bounds-checks.so! [6563931659 6564063496.07 6564301535] bounds-checks.so [3141492675 3141608366.60 3141895249] virtual-memory-guards.so execution :: instructions-retired :: benchmarks/bz2/benchmark.wasm Δ = 338716136.87 ± 1.38 (confidence = 99%) virtual-memory-guards.so is 2.08x to 2.08x faster than bounds-checks.so! [651961494 651961495.47 651961497] bounds-checks.so [313245357 313245358.60 313245362] virtual-memory-guards.so execution :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm Δ = 22742944.07 ± 331.73 (confidence = 99%) virtual-memory-guards.so is 1.87x to 1.87x faster than bounds-checks.so! [48841295 48841567.33 48842139] bounds-checks.so [26098439 26098623.27 26099479] virtual-memory-guards.so ``` </details> <details> ``` execution :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm Δ = 2465900207.27 ± 146476.61 (confidence = 99%) virtual-memory-guards.so is 1.78x to 1.78x faster than de-duped-bounds-checks.so! [5607275431 5607442989.13 5607838342] de-duped-bounds-checks.so [3141445345 3141542781.87 3141711213] virtual-memory-guards.so execution :: instructions-retired :: benchmarks/bz2/benchmark.wasm Δ = 234253620.20 ± 2.33 (confidence = 99%) virtual-memory-guards.so is 1.75x to 1.75x faster than de-duped-bounds-checks.so! [547498977 547498980.93 547498985] de-duped-bounds-checks.so [313245357 313245360.73 313245363] virtual-memory-guards.so execution :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm Δ = 16605659.13 ± 315.78 (confidence = 99%) virtual-memory-guards.so is 1.64x to 1.64x faster than de-duped-bounds-checks.so! [42703971 42704284.40 42704787] de-duped-bounds-checks.so [26098432 26098625.27 26099234] virtual-memory-guards.so ``` </details> <details> ``` execution :: instructions-retired :: benchmarks/bz2/benchmark.wasm Δ = 104462517.13 ± 7.32 (confidence = 99%) de-duped-bounds-checks.so is 1.19x to 1.19x faster than bounds-checks.so! [651961493 651961500.80 651961532] bounds-checks.so [547498981 547498983.67 547498989] de-duped-bounds-checks.so execution :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm Δ = 956556982.80 ± 103034.59 (confidence = 99%) de-duped-bounds-checks.so is 1.17x to 1.17x faster than bounds-checks.so! [6563930590 6564019842.40 6564243651] bounds-checks.so [5607307146 5607462859.60 5607677763] de-duped-bounds-checks.so execution :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm Δ = 6137307.87 ± 247.75 (confidence = 99%) de-duped-bounds-checks.so is 1.14x to 1.14x faster than bounds-checks.so! [48841303 48841472.93 48842000] bounds-checks.so [42703965 42704165.07 42704718] de-duped-bounds-checks.so ``` </details> * Update test expectations * Add a test for deduplicating bounds checks between dynamic memories and spectre mitigations * Define a struct for the Spectre comparison instead of using a tuple * More trace logging for heap legalization	2022-11-15 08:47:22 -08:00
Trevor Elliott	dece901d16	Use regalloc constraints for sse blend operations (#5251 ) Instead of using xmm0 explicitly for the mask argument to instructions like blendvpd, use regalloc constraints to constrain it to xmm0 instead.	2022-11-14 16:44:34 -08:00
Trevor Elliott	0367fbc2d4	cranelift: Rework pinned register lowering (#5249 ) Rework pinned register lowering to avoid the use of pinned virtual registers, instead using the MovFromPReg and MovToPReg pseudo instructions.	2022-11-10 16:19:25 -08:00
Nick Fitzgerald	fc62d4ad65	Cranelift: Make `heap_addr` return calculated `base + index + offset` (#5231 ) * Cranelift: Make `heap_addr` return calculated `base + index + offset` Rather than return just the `base + index`. (Note: I've chosen to use the nomenclature "index" for the dynamic operand and "offset" for the static immediate.) This move the addition of the `offset` into `heap_addr`, instead of leaving it for the subsequent memory operation, so that we can Spectre-guard the full address, and not allow speculative execution to read the first 4GiB of memory. Before this commit, we were effectively doing load(spectre_guard(base + index) + offset) Now we are effectively doing load(spectre_guard(base + index + offset)) Finally, this also corrects `heap_addr`'s documented semantics to say that it returns an address that will trap on access if `index + offset + access_size` is out of bounds for the given heap, rather than saying that the `heap_addr` itself will trap. This matches the implemented behavior for static memories, and after https://github.com/bytecodealliance/wasmtime/pull/5190 lands (which is blocked on this commit) will also match the implemented behavior for dynamic memories. * Update heap_addr docs * Factor out `offset + size` to a helper	2022-11-09 19:53:51 +00:00
Trevor Elliott	b077854b57	Generate SSA code from returns (#5172 ) Modify return pseudo-instructions to have pairs of registers: virtual and real. This allows us to constrain the virtual registers to the real ones specified by the abi, instead of directly emitting moves to those real registers.	2022-11-08 16:00:49 -08:00
Ulrich Weigand	3e5938e65a	Support big- and little-endian lane order with bitcast (#5196 ) Add a MemFlags operand to the bitcast instruction, where only the `big` and `little` flags are accepted. These define the lane order to be used when casting between types of different lane counts. Update all users to pass an appropriate MemFlags argument. Implement lane swaps where necessary in the s390x back-end. This is the final part necessary to fix https://github.com/bytecodealliance/wasmtime/issues/4566.	2022-11-07 14:41:10 -08:00
Ulrich Weigand	137a8b710f	Move bitselect->vselect optimization to x64 back-end (#5191 ) The simplifier was performing an optimization to replace bitselect with vselect if the all bytes of the condition mask could be shown to be all ones or all zeros. This optimization only ever made any difference in codegen on the x64 target. Therefore, move this optimization to the x64 back-end and perform it in ISLE instead. Resulting codegen should be unchanged, with slightly improved compile time. This also eliminates a few endian-dependent bitcast operations.	2022-11-03 20:17:36 +00:00
Afonso Bordado	3ef30b5b67	cranelift: Rename `i{min,max}` to `s{min,max}` (#5187 ) This brings these instructions with our general naming convention of signed instructions being prefixed with `s`.	2022-11-03 18:20:33 +00:00
Trevor Elliott	aeceea28e2	Remove trapif and trapff (#5162 ) This branch removes the trapif and trapff instructions, in favor of using an explicit comparison and trapnz. This moves us closer to removing iflags and fflags, but introduces the need to implement instructions like iadd_cout in the x64 and aarch64 backends.	2022-11-03 09:25:11 -07:00
Ulrich Weigand	961107ec63	Merge raw_bitcast and bitcast (#5175 ) - Allow bitcast for vectors with differing lane widths - Remove raw_bitcast IR instruction - Change all users of raw_bitcast to bitcast - Implement support for no-op bitcast cases across backends This implements the second step of the plan outlined here: https://github.com/bytecodealliance/wasmtime/issues/4566#issuecomment-1234819394	2022-11-02 10:16:27 -07:00
Trevor Elliott	09d8df6fab	Switch to `x64_rbp` to avoid the use of a pinned register (#5168 ) Avoid a use of preg_rpb in the x64 backend, using x64_rbp instead.	2022-11-01 13:23:33 -07:00
11evan	4ca9e82bd1	cranelift: Add Bswap instruction (#1092 ) (#5147 ) Adds Bswap to the Cranelift IR. Implements the Bswap instruction in the x64 and aarch64 codegen backends. Cranelift users can now: ``` builder.ins().bswap(value) ``` to get a native byteswap instruction. * x64: implements the 32- and 64-bit bswap instruction, following the pattern set by similar unary instrutions (Neg and Not) - it only operates on a dst register, but is parameterized with both a src and dst which are expected to be the same register. As x64 bswap instruction is only for 32- or 64-bit registers, the 16-bit swap is implemented as a rotate left by 8. Updated x64 RexFlags type to support emitting for single-operand instructions like bswap * aarch64: Bswap gets emitted as aarch64 rev16, rev32, or rev64 instruction as appropriate. * s390x: Bswap was already supported in backend, just had to add a bit of plumbing * For completeness, added bswap to the interpreter as well. * added filetests and runtests for each ISA * added bswap to fuzzgen, thanks to afonso360 for the code there * 128-bit swaps are not yet implemented, that can be done later	2022-10-31 19:30:00 +00:00
Afonso Bordado	2fb76be2e4	x64: Add `bmask` implementation (#5148 )	2022-10-28 17:17:22 -07:00
Afonso Bordado	e8f3d03bbe	cranelift: Mask high bits on `bmask` for types smaller than a register (#5118 ) * aarch64: Fix incorrect masking for small types on bmask `bmask` was accidentally relying on the uppermost bits of the register for small types. This was found by fuzzgen, when it generated a shift left followed by a bmask, the shift left shifted the bits out of the range of the input type (i8), however these are not automatically cleared since they remained inside the 32 bits of the register. That caused issues when the bmask tried to compare the whole register instead of just the bottom bits. The solution here is to mask the upper bits for small types. * aarch64: Emit 32bit cmp on bmask This fixes an issue where bmask was accidentally comparing the upper bits of the register by always using a 64bit cmp. * riscv: Mask high bits in bmask * riscv: Add compile tests for br{z,nz} * riscv: Use shifts to mask 32bit values This produces less code than the AND since that version needs to load an immediate constant from memory. * cranelift: Update test input to hexadecimal values This makes it a bit more clear what is being tested. * riscv: Use addiw for masking 32 bit values Co-authored-by: Trevor Elliott <telliott@fastly.com> * aarch64: Update bmask rule priority Co-authored-by: Trevor Elliott <telliott@fastly.com>	2022-10-27 09:45:39 -07:00
Trevor Elliott	02620441c3	Add uadd_overflow_trap (#5123 ) Add a new instruction uadd_overflow_trap, which is a fused version of iadd_ifcout and trapif. Adding this instruction removes a dependency on the iflags type, and would allow us to move closer to removing it entirely. The instruction is defined for the i32 and i64 types only, and is currently only used in the legalization of heap_addr.	2022-10-27 09:43:15 -07:00
Ulrich Weigand	39b3b1d772	s390x: Fix handling of sret arguments (#5116 ) Skip synthetic StructReturn entries in the return value list. Fixes https://github.com/bytecodealliance/wasmtime/issues/5089	2022-10-25 10:40:10 -07:00
Trevor Elliott	ec12415b1f	cranelift: Remove redundant branch and select instructions (#5097 ) As discussed in the 2022/10/19 meeting, this PR removes many of the branch and select instructions that used iflags, in favor if using brz/brnz and select in their place. Additionally, it reworks selectif_spectre_guard to take an i8 input instead of an iflags input. For reference, the removed instructions are: br_icmp, brif, brff, trueif, trueff, and selectif.	2022-10-24 16:14:35 -07:00
Afonso Bordado	c8791073d6	cranelift: Remove `iconst.i128` (#5075 ) * cranelift: Remove iconst.i128 * bugpoint: Report Changed when only one instruction is mutated * cranelift: Fix egraph bxor rule * cranelift: Remove some simple_preopt opts for i128	2022-10-24 12:43:28 -07:00
Ulrich Weigand	bfcf6616fe	s390x: clean up remnants of non-SSA code generation (#5096 ) Eliminate a few remaining instances of non-SSA code. Remove infrastructure previously used for non-SSA code emission. Related cleanup around flags handling.	2022-10-24 12:40:50 -07:00
Ulrich Weigand	9dadba60a0	s390x: use constraints for call arguments and return values (#5092 ) Use the regalloc constraint-based CallArgList / CallRetList mechanism instead of directly using physregs in instructions.	2022-10-21 11:01:22 -07:00
Trevor Elliott	d9753fac2b	Remove uses of `reg_mod` from s390x (#5073 ) Remove uses of reg_mod from the s390x backend. This required moving away from using r0/r1 as the result registers from a few different pseudo instructions, standardizing instead on r2/r3. That change was necessary as regalloc2 will not correctly allocate registers that aren't listed in the allocatable set, which r0/r1 are not. Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com> Co-authored-by: Chris Fallin <chris@cfallin.org>	2022-10-21 09:22:16 -07:00
Trevor Elliott	32a7593c94	cranelift: Remove booleans (#5031 ) Remove the boolean types from cranelift, and the associated instructions breduce, bextend, bconst, and bint. Standardize on using 1/0 for the return value from instructions that produce scalar boolean results, and -1/0 for boolean vector elements. Fixes #3205 Co-authored-by: Afonso Bordado <afonso360@users.noreply.github.com> Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com> Co-authored-by: Chris Fallin <chris@cfallin.org>	2022-10-17 16:00:27 -07:00
Chris Fallin	e95ffe4413	Fix StructReturn handling: properly mark the clobber, and offset actual rets. (#5023 ) * Fix StructReturn handling: properly mark the clobber, and offset actual rets. The legalization of `StructReturn` was causing issues in the new call-handling code: the `StructReturn` ret was included in the `SigData` as if it were an actual CLIF-level return value, but it is not. Prior to using regalloc constraints for return values, we unconditionally included rax (or the architecture's usual return register) as a def, so it would be properly handled as "clobbered" by the regalloc. With the new scheme, we include defs on the call only for CLIF-level outputs. Callees with `StructReturn` args were thus not known to clobber the return-value register, and values might be corrupted. This PR updates the code to include a `StructReturn` ret as a clobber rather than a returned value in the relevant spots. I observed it causing saves/restores of rax in some CLIF that @bjorn3 provided me, but I was having difficulty minimizing this into a test-case that I would be comfortable including as a precise-output case (including the whole thing verbatim would lock down a bunch of other irrelevant details and cause test-update noise later). If we can find a more minimized example I'm happy to include it as a filetest. Fixes #5018.	2022-10-07 00:14:38 +00:00
yuyang	07584f6ac8	fix issue 4996. (#5003 )	2022-10-04 11:18:42 -07:00
Trevor Elliott	c1d6ca48a7	ISLE: Resolve overlap in the riscv64 backend (#4982 ) Resolve overlap in the RiscV64 backend by adding priorities to rules. Additionally, one test updated as a result of this work, as a peephole optimization for addition with immediates fires now.	2022-09-29 17:22:25 -07:00
yuyang-ok	cdecc858b4	add riscv64 backend for cranelift. (#4271 ) Add a RISC-V 64 (`riscv64`, RV64GC) backend. Co-authored-by: yuyang <756445638@qq.com> Co-authored-by: Chris Fallin <chris@cfallin.org> Co-authored-by: Afonso Bordado <afonsobordado@az8.co>	2022-09-27 17:30:31 -07:00
bjorn3	af226d37c2	[AArch64] Fix incorrect regalloc constraints for atomic_cas (#4959 ) * [AArch64] Fix incorrect regalloc constraints for atomic_cas * Update test for latest Cranelift changes	2022-09-26 16:05:57 +00:00
Damian Heaton	3a2b32bf4d	Port branches to ISLE (AArch64) (#4943 ) * Port branches to ISLE (AArch64) Ported the existing implementations of the following opcodes for AArch64 to ISLE: - `Brz` - `Brnz` - `Brif` - `Brff` - `BrIcmp` - `Jump` - `BrTable` Copyright (c) 2022 Arm Limited * Remove dead code Copyright (c) 2022 Arm Limited	2022-09-26 09:45:32 +01:00
Damian Heaton	3f8cccfb59	Port flag-based ops to ISLE (AArch64) (#4942 ) Ported the existing implementations of the following opcodes for AArch64 to ISLE: - `Trueif` - `Trueff` - `Trapif` - `Trapff` - `Select` - `Selectif` - `SelectifSpectreGuard` Copyright (c) 2022 Arm Limited	2022-09-22 15:44:32 -07:00
Damian Heaton	352c7595c6	Improve `fcvt_to_{u,s}int_sat` lowering (AArch64) (#4913 ) Improved the instruction lowering for the following opcodes on AArch64, and introduced support for converting to integers less than 32-bits wide as per the docs: - `FcvtToSintSat` - `FcvtToUintSat` Copyright (c) 2022 Arm Limited	2022-09-21 10:16:09 -07:00
Chris Fallin	05cbd667c7	Cranelift: use regalloc2 constraints on caller side of ABI code. (#4892 ) * Cranelift: use regalloc2 constraints on caller side of ABI code. This PR updates the shared ABI code and backends to use register-operand constraints rather than explicit pinned-vreg moves for register arguments and return values. The s390x backend was not updated, because it has its own implementation of ABI code. Ideally we could converge back to the code shared by x64 and aarch64 (which didn't exist when s390x ported calls to ISLE, so the current situation is underestandable, to be clear!). I'll leave this for future work. This PR exposed several places where regalloc2 needed to be a bit more flexible with constraints; it requires regalloc2#74 to be merged and pulled in. * Update to regalloc2 0.3.3. In addition to version bump, this required removing two asserts as `SpillSlot`s no longer carry their class (so we can't assert that they have the correct class). * Review comments. * Filetest updates. * Add cargo-vet audit for regalloc2 0.3.2 -> 0.3.3 upgrade. * Update to regalloc2 0.4.0.	2022-09-21 01:17:04 +00:00
Damian Heaton	e9b08b856d	Port `icmp` to ISLE (AArch64) (#4898 ) * Port `icmp` to ISLE (AArch64) Ported the existing implementation of `icmp` (and, by extension, the `lower_icmp` function) to ISLE for AArch64. Copyright (c) 2022 Arm Limited * Allow 'producer chains', eliminating `Nop0`s Copyright (c) 2022 Arm Limited	2022-09-13 08:56:50 -07:00
Trevor Elliott	ad09c273c6	Don't merge loads for xmm registers (#4891 ) Do not merge loads for xmm registers, as alignment requirements currently aren't satisfied with clif lowered from wasm. Fixes #4890	2022-09-12 10:14:35 -07:00
Chris Fallin	96bfd4e8c0	s390x: update some regalloc metadata to remove use of `reg_mod`. (#4856 ) * s390x: update some regalloc metadata to remove use of `reg_mod`. This is a step toward ultimately removing modify-operands, which along with removal of pinned vregs, lets us move to a completely constraint-based and fully-SSA regalloc input and get some nice advantages eventually. There are still a few uses of `mod` operands and pinned vregs remaining, especially around the "regpair" abstraction. Those proved to be a bit trickier to update though, so will have to be done separately. * Review feedback: restore two-arg pretty-print form. * Review feedback.	2022-09-09 18:43:36 -05:00
Chris Fallin	2986f6b0ff	ABI: implement register arguments with constraints. (#4858 ) * ABI: implement register arguments with constraints. Currently, Cranelift's ABI code emits a sequence of moves from physical registers into vregs at the top of the function body, one for every register-carried argument. For a number of reasons, we want to move to operand constraints instead, and remove the use of explicitly-named "pinned vregs"; this allows for better regalloc in theory, as it removes the need to "reverse-engineer" the sequence of moves. This PR alters the ABI code so that it generates a single "args" pseudo-instruction as the first instruction in the function body. This pseudo-inst defs all register arguments, and constrains them to the appropriate registers at the def-point. Subsequently the regalloc can move them wherever it needs to. Some care was taken not to have this pseudo-inst show up in post-regalloc disassemblies, but the change did cause a general regalloc "shift" in many tests, so the precise-output updates are a bit noisy. Sorry about that! A subsequent PR will handle the other half of the ABI code, namely, the callsite case, with a similar preg-to-constraint conversion. * Update based on review feedback. * Review feedback.	2022-09-08 18:03:14 -07:00
Anton Kirilov	d8b290898c	Initial forward-edge CFI implementation (#3693 ) * Initial forward-edge CFI implementation Give the user the option to start all basic blocks that are targets of indirect branches with the BTI instruction introduced by the Branch Target Identification extension to the Arm instruction set architecture. Copyright (c) 2022, Arm Limited. * Refactor `from_artifacts` to avoid second `make_executable` (#1) This involves "parsing" twice but this is parsing just the header of an ELF file so it's not a very intensive operation and should be ok to do twice. * Address the code review feedback Copyright (c) 2022, Arm Limited. Co-authored-by: Alex Crichton <alex@alexcrichton.com>	2022-09-08 09:35:58 -05:00
Andrew Brown	f063082474	x64: remove `Inst::XmmLoadConst` (#4876 ) This is a cherry-pick of a long-ago commit, 2d46637. The original message reads: > Now that `SyntheticAmode` can refer to constants, there is no longer a > need for a separate instruction format--standard load instructions will > work. Since then, the transition to ISLE and the use of `XmmLoadConst` in many more places makes this change a larger diff than the original. The basic idea is the same, though: the extra indirection of `Inst::XMmLoadConst` is removed and replaced by a direct use of `VCodeConstant` as a `SyntheticAmode`. This has no effect on codegen, but the CLIF output is now clearer in that the actual instruction is displayed (e.g., `movdqu`) instead of a made-up instruction (`load_const`).	2022-09-07 12:52:13 -07:00
Anton Kirilov	dd07e354b4	Cranelift AArch64: Fix the get_return_address lowering (#4851 ) The previous implementation assumed that nothing had clobbered the LR register since the current function had started executing, so it would be incorrect for a non-leaf function, for example, that contains the `get_return_address` operation right after a call. The operation is valid only if the `preserve_frame_pointers` flag is enabled, which implies that the presence of a frame record on the stack is guaranteed. Copyright (c) 2022, Arm Limited.	2022-09-07 11:09:22 -07:00
Jamey Sharp	3d6d49daba	cranelift: Remove of/nof overflow flags from icmp (#4879 ) * cranelift: Remove of/nof overflow flags from icmp Neither Wasmtime nor cg-clif use these flags under any circumstances. From discussion on #3060 I see it's long been unclear what purpose these flags served. Fixes #3060, fixes #4406, and fixes #4875... by deleting all the code that could have been buggy. This changes the cranelift-fuzzgen input format by removing some IntCC options, so I've gone ahead and enabled I128 icmp tests at the same time. Since only the of/nof cases were failing before, I expect these to work. * Restore trapif tests It's still useful to validate that iadd_ifcout's iflags result can be forwarded correctly to trapif, and for that purpose it doesn't really matter what condition code is checked.	2022-09-07 08:38:41 -07:00
Chris Fallin	385bd0cbf8	x64: fix CvtFloatToUintSeq: do not clobber src. (#4842 ) This slipped through the regalloc2 operand code update in #4811: the CvtFloatToUintSeq pseudo-instruction actually clobbers its source. It was marked as a "mod" operand in the original and I mistakenly converted it to a "use" as I had not seen the actual clobber. The instruction now takes an extra temp and makes a copy of `src` in the appropriate place. Fixes #4840.	2022-09-01 22:46:57 +00:00
Afonso Bordado	08e7a7f1a0	cranelift: Add inline stack probing for x64 (#4747 ) * cranelift: Add inline stack probe for x64 * cranelift: Cleanups comments Thanks @jameysharp!	2022-09-01 22:32:54 +00:00
Chris Fallin	ae5fe8a728	aarch64: fix up regalloc2 semantics. (#4830 ) This PR removes all uses of modify-operands in the aarch64 backend, replacing them with reused-input operands instead. This has the nice effect of removing a bunch of move instructions and more clearly representing inputs and outputs. This PR also removes the explicit use of pinned vregs in the aarch64 backend, instead using fixed-register constraints on the operands when insts or pseudo-inst sequences require certain registers. This is the second PR in the regalloc-semantics cleanup series; after the remaining backend (s390x) and the ABI code are cleaned up as well, we'll be able to simplify the regalloc2 frontend.	2022-09-01 21:25:20 +00:00
Andrew Brown	ac2d4c4818	x64: improve tests for `heap_addr` (#4841 ) * x64: improve tests for `heap_addr` This change adds Cranelift `compile` tests for the various cases for `heap_addr`. The idea behind this is to more clearly show what the penalties are for dynamically- vs statically-allocated memory as well as turning Spectre mitigations on and off. * Add test case: "right" size memory with Spectre enabled	2022-09-01 13:59:55 -07:00
Chris Fallin	186c7c3b89	x64: clean up regalloc-related semantics on several instructions. (#4811 ) * x64: clean up regalloc-related semantics on several instructions. This PR removes all uses of "modify" operands on instructions in the x64 backend, and also removes all uses of "pinned vregs", or vregs that are explicitly tied to particular physical registers. In place of both of these mechanisms, which are legacies of the old regalloc design and supported via compatibility code, the backend now uses operand constraints. This is more flexible as it allows the regalloc to see the liveranges and constraints without "reverse-engineering" move instructions. Eventually, after removing all such uses (including in other backends and by the ABI code), we can remove the compatibility code in regalloc2, significantly simplifying its liverange-construction frontend and thus allowing for higher confidence in correctness as well as possibly a bit more compilation speed. Curiously, there are a few extra move instructions now; they are likely poor splitting decisions and I can try to chase these down later. * Fix cranelift-codegen tests. * Review feedback.	2022-08-30 17:21:14 -07:00
Chris Fallin	1a59b3e6c6	AArch64: port `tls_value` to ISLE. (#4821 )	2022-08-30 16:51:15 +00:00
Damian Heaton	3d9d759380	Port `fcmp` to ISLE (AArch64) (#4819 ) Ported the existing implementation of `fcmp` for AArch64 to ISLE. This also ports the `lower_vector_comparison` method to ISLE. Copyright (c) 2022 Arm Limited	2022-08-30 09:06:15 -07:00

1 2 3 4 5 ...

520 Commits