This PR removes all uses of modify-operands in the aarch64 backend,
replacing them with reused-input operands. This has the nice effect of
removing a number of move instructions and representing inputs and
outputs more clearly.
This PR also removes the explicit use of pinned vregs in the aarch64
backend, instead using fixed-register constraints on the operands when
insts or pseudo-inst sequences require certain registers.
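To make the two changes concrete, here is a minimal sketch in terms of
regalloc2's `Operand` constructors; the instruction shape and the
register choice are illustrative, not the backend's actual lowering
code:

```rust
use regalloc2::{Operand, PReg, RegClass, VReg};

// A modify-operand forces the backend to emit a move to set up the
// destination. A reused-input def instead tells the allocator that the
// output must land in the same register as a given input, letting the
// allocator insert a copy only when one is actually needed.
fn add_operands(rd: VReg, rn: VReg, rm: VReg, ops: &mut Vec<Operand>) {
    ops.push(Operand::reg_use(rn));          // input 0
    ops.push(Operand::reg_use(rm));          // input 1
    ops.push(Operand::reg_reuse_def(rd, 0)); // output, reuses input 0's register
}

// Instead of pinning a vreg to a physical register, an instruction that
// requires a specific register expresses a fixed-register constraint on
// the operand:
fn fixed_reg_operand(src: VReg, ops: &mut Vec<Operand>) {
    let x0 = PReg::new(0, RegClass::Int);
    ops.push(Operand::reg_fixed_use(src, x0));
}
```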
This is the second PR in the regalloc-semantics cleanup series; after
the remaining backend (s390x) and the ABI code are cleaned up as well,
we'll be able to simplify the regalloc2 frontend.
* Convert `fma`, `valltrue` & `vanytrue` to ISLE (AArch64)
Ported the existing implementations of the following opcodes to ISLE on
AArch64:
- `fma`
- Introduced missing support for `fma` on vector values, as per the
docs.
- `valltrue`
- `vanytrue`
Also fixed `fcmp` on scalar values in the interpreter, and enabled
interpreter tests in `simd-fma.clif`.
This introduces the `FMLA` machine instruction.
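For reference, FMLA accumulates elementwise into its destination
(Vd[i] += Vn[i] * Vm[i]) with a single rounding per lane. A scalar
Rust sketch of the lane-wise semantics the vector `fma` lowering must
produce (a model of the behavior, not the lowering itself):

```rust
// Mirror FMLA's accumulate-into-destination shape: out starts as the
// addend, then each lane has x[i] * y[i] added with a single rounding.
fn fma_f32x4(x: [f32; 4], y: [f32; 4], z: [f32; 4]) -> [f32; 4] {
    let mut out = z;
    for i in 0..4 {
        // f32::mul_add rounds once, matching fused-multiply-add semantics.
        out[i] = x[i].mul_add(y[i], out[i]);
    }
    out
}
```

Because FMLA overwrites its accumulator, the lowering first moves the
addend into the destination register.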
Copyright (c) 2022 Arm Limited
* Add comments for `Fmla` and `Bsl`
Copyright (c) 2022 Arm Limited
Introduce a new concept in the IR that allows a producer to create
dynamic vector types. An IR function can now contain global values that
represent a dynamic scaling factor for a given fixed-width vector type.
A dynamic type is then created by 'multiplying' the corresponding
global value with a fixed-width type. These new types can be used just
like the existing types, and the type system has a set of hard-coded
dynamic types, such as I32X4XN, onto which the user-defined types map.
The dynamic types are also used explicitly to create dynamic stack
slots which, unlike their fixed counterparts, have no statically set
size. New IR instructions are added to access these new stack entities.
Currently, during codegen, the dynamic scaling factor has to be
lowered to a constant, so dynamic slots (and likewise spill slots) do
eventually have a compile-time-known size.
The current lowering for aarch64 just targets Neon, using a dynamic
scale of 1.
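As a conceptual model of the above (the types and names below are
hypothetical, not Cranelift's actual API), a dynamic type pairs a
fixed-width base vector type with a scale that is only resolved during
codegen:

```rust
#[derive(Clone, Copy)]
struct VectorType { lanes: u32, lane_bytes: u32 } // e.g. I32X4 = 4 x 4 bytes

struct GlobalValue(u32); // handle to the IR global value holding the scale

// A dynamic type "multiplies" a base vector type by a global value,
// e.g. I32X4 * gv0 maps onto the hard-coded I32X4XN.
struct DynamicType { base: VectorType, scale: GlobalValue }

// Once codegen lowers the scale to a constant, dynamic stack slots (and
// spill slots) get a compile-time-known size. With the current Neon
// lowering the scale is 1, so I32X4XN is exactly one 128-bit register.
fn slot_size_bytes(ty: &DynamicType, resolved_scale: u32) -> u32 {
    ty.base.lanes * ty.base.lane_bytes * resolved_scale
}
```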
Copyright (c) 2022, Arm Limited.
- Handle call instructions' clobbers with the clobbers API, using RA2's
clobbers bitmask (bytecodealliance/regalloc2#58) rather than a clobbers
list (see the sketch after this list);
- Pull in changes from bytecodealliance/regalloc2#59 for much saner
edge-case behavior w.r.t. live-range splitting.
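As referenced in the first item above, the bitmask API means a call
site hands the allocator a compact set of clobbered physical registers
(returned from regalloc2's `inst_clobbers` hook) instead of a list of
def operands. A minimal sketch, with illustrative register indices:

```rust
use regalloc2::{PReg, PRegSet, RegClass};

// Build the clobber set for a call site as a bitmask: mark the
// caller-saved integer registers x0-x17 as clobbered.
fn call_clobbers() -> PRegSet {
    let mut set = PRegSet::empty();
    for i in 0..18 {
        set.add(PReg::new(i, RegClass::Int));
    }
    set
}
```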
RA2 recently removed the need for a dedicated scratch register for
cyclic moves (bytecodealliance/regalloc2#51). This has a moderate
positive performance impact on function bodies that were
register-constrained, since one more register is now available. Among
current benchmarks, I measured a 5-8% improvement on `blake3-scalar` in
Sightglass.
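For intuition on why cyclic moves are special: a parallel move such as
{r0 <- r1, r1 <- r0} cannot be serialized as plain copies. One
scratch-free strategy, sketched below (not necessarily regalloc2's
exact mechanism), breaks a cycle of length n with n - 1 swaps:

```rust
// `regs` models the register file; `cycle` lists registers in cycle
// order, so regs[cycle[i]] must receive the value regs[cycle[i + 1]]
// held on entry (wrapping around at the end).
fn resolve_cycle(regs: &mut [u64], cycle: &[usize]) {
    for w in cycle.windows(2) {
        regs.swap(w[0], w[1]);
    }
}
```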
This PR switches Cranelift over to the new register allocator, regalloc2.
See [this document](https://gist.github.com/cfallin/08553421a91f150254fe878f67301801)
for a summary of the design changes. This switchover has implications for
core VCode/MachInst types and the lowering pass.
Overall, this change brings improvements to both compile time and speed of
generated code (runtime), as reported in #3942:
```
Benchmark              Compilation (wallclock)  Execution (wallclock)
blake3-scalar          25% faster               28% faster
blake3-simd            no diff                  no diff
meshoptimizer          19% faster               17% faster
pulldown-cmark         17% faster               no diff
bz2                    15% faster               no diff
SpiderMonkey, fib(30)  21% faster               2% faster
clang.wasm             42% faster               N/A
```
This PR updates the AArch64 ABI implementation so that it properly
respects that v8-v15 inclusive have callee-saved lower halves and
caller-saved upper halves, by conservatively approximating (to full
registers) in the appropriate directions both when generating prologue
clobber-saves and when informing the regalloc of clobbered regs across
callsites.
In order to prevent saving all of these vector registers in the prologue
of every non-leaf function due to the above approximation, this also
makes use of a new regalloc.rs feature to exclude call instructions'
writes from the clobber set returned by register allocation. This is
safe whenever the caller and callee have the same ABI (because anything
the callee could clobber, the caller is allowed to clobber as well
without saving it in the prologue).
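A sketch of the two-direction approximation (helper names are
hypothetical, and only v8-v15 are modeled):

```rust
// v8-v15 have callee-saved lower halves and caller-saved upper halves,
// but save/restore and clobber bookkeeping operate on whole registers,
// so we round conservatively in whichever direction is safe.
const PARTIALLY_CALLEE_SAVED: std::ops::Range<u8> = 8..16; // v8-v15

// Prologue: the lower half is callee-saved, so if the register is
// written at all we must save the full register ("must save" is
// over-approximated).
fn must_save_in_prologue(v_reg: u8, is_written: bool) -> bool {
    is_written && PARTIALLY_CALLEE_SAVED.contains(&v_reg)
}

// Callsites: the upper half is caller-saved, so the callee may destroy
// it; report the full register as clobbered ("clobbered" is
// over-approximated).
fn clobbered_across_call(v_reg: u8) -> bool {
    PARTIALLY_CALLEE_SAVED.contains(&v_reg)
}
```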
Fixes #2254.
The main issue with the InstSize enum was that it was used for both
GPR and SIMD & FP operands, even though machine instructions generally
do not mix them (a destination register, for example, is either a GPR
or a SIMD & FP register). As a result, it had methods such as sf_bit()
that made sense only for one kind of operand.
Another issue was that the enum's name did not reflect its purpose
accurately: it was meant to represent an instruction operand size, not
an instruction size, which is fixed in A64 (always 4 bytes).
Now the enum is split into one for GPR operands and another for scalar
SIMD & FP operands.
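The rough shape of the split (simplified; the in-tree enums carry more
variants and helpers): the A64 sf bit selects the 32- or 64-bit view of
a GPR, so its accessor now exists only on the GPR-side enum.

```rust
enum OperandSize { Size32, Size64 }                        // GPR operands
enum ScalarSize { Size8, Size16, Size32, Size64, Size128 } // SIMD & FP operands

impl OperandSize {
    // Value of the A64 `sf` encoding bit; meaningless for SIMD & FP
    // operands, which is why it no longer lives on a shared enum.
    fn sf_bit(&self) -> u32 {
        match self {
            OperandSize::Size32 => 0,
            OperandSize::Size64 => 1,
        }
    }
}
```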
Copyright (c) 2020, Arm Limited.
If the scratch register is caller-saved, then it might appear in fixed
ranges because of call clobbers. Instead, use a register that's not
caller-saved and has no predefined use in the ABI.
This PR changes the aarch64 ABI implementation to use positive offsets
from SP, rather than negative offsets from FP, to refer to spill slots
and stack-local storage. This allows for better addressing-mode options,
and hence slightly better code: e.g., the unsigned scaled 12-bit offset
mode can be used to reach anywhere in a 32KB frame without extra
address-construction instructions, whereas negative offsets are limited
to a signed 9-bit unscaled mode (-256 bytes).
To enable this, the PR introduces a notion of "nominal SP offsets" as a
virtual addressing mode, lowered during the emission pass. The offsets
are relative to "SP after adjusting downward to allocate stack/spill
slots", but before pushing clobbers. This allows the addressing-mode
expressions to be generated before register allocation (or during it,
for spill/reload sequences).
To convert these offsets into *true* offsets from SP, we need to track
how much further SP is moved downward and compensate for it. We do so
with "virtual SP offset adjustment" pseudo-instructions: these are seen
by the emission pass and result in no machine code (zero bytes of
output), but update state that is threaded through each instruction
emission in turn. In this way we can, e.g., push stack args for a call
and adjust the virtual SP offset, so that reloads from
nominal-SP-relative spillslots stay correct while the argument setup
uses real SP offsets at the same time.
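A sketch of how emission threads this state (names are hypothetical):

```rust
// A virtual-SP-offset-adjustment pseudo-inst emits zero bytes of machine
// code but updates per-emission state; nominal-SP addresses are then
// rewritten against that state to produce true SP-relative offsets.
struct EmitState {
    // How far SP currently sits below its nominal position, e.g. after
    // pushing stack args for a call.
    virtual_sp_offset: i64,
}

enum Addr {
    NominalSPOffset(i64), // virtual addressing mode, usable pre-regalloc
    SPOffset(i64),        // real offset from the current SP
}

fn lower_addr(addr: Addr, state: &EmitState) -> i64 {
    match addr {
        // Compensate for everything pushed below the nominal SP position.
        Addr::NominalSPOffset(off) => off + state.virtual_sp_offset,
        Addr::SPOffset(off) => off,
    }
}
```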
- Undo temporary changes to default features (`all-arch`) and a
signal-handler test.
- Remove `SIGTRAP` handler: no longer needed now that we've found an
"undefined opcode" option on ARM64.
- Rename pp.rs to pretty_print.rs in machinst/.
- Only use empty stack-probe on non-x86. As per a comment in
rust-lang/compiler-builtins [1], LLVM only supports stack probes on
x86 and x86-64. Thus, on any other CPU architecture, we cannot refer
to `__rust_probestack`, because it does not exist.
- Rename arm64 to aarch64.
- Use `target` directive in vcode filetests.
- Run the flags verifier, but without encinfo, when using new backends.
- Clean up warning overrides.
- Fix up use of casts: use u32::from(x) and siblings when possible,
u32::try_from(x).unwrap() when not, to avoid silent truncation (see the
sketch after this list).
- Take immutable `Function` borrows as input; we don't actually
mutate the input IR.
- Lots of other miscellaneous cleanups.
[1] cae3e6ea23/src/probestack.rs (L39)
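The cast discipline from the list above, as a small illustrative
example:

```rust
use std::convert::TryFrom;

// Widening with From is lossless and infallible; narrowing goes through
// try_from so an out-of-range value panics loudly instead of silently
// truncating, as a bare `as u32` cast would.
fn widen(x: u8) -> u32 {
    u32::from(x)
}

fn narrow(x: u64) -> u32 {
    u32::try_from(x).unwrap()
}
```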