wasmtime

Author	SHA1	Message	Date
Ulrich Weigand	df923f18ca	Remove MachInst::gen_constant (#5427) * aarch64: constant generation cleanup Add support for MOVZ and MOVN generation via ISLE. Handle f32const, f64const, and nop instructions via ISLE. No longer call Inst::gen_constant from lower.rs. * riscv64: constant generation cleanup Handle f32const, f64const, and nop instructions via ISLE. * s390x: constant generation cleanup Fix rule priorities for "imm" term. Only handle 32-bit stack offsets; no longer use load_constant64. * x64: constant generation cleanup No longer call Inst::gen_constant from lower.rs or abi.rs. * Refactor LowerBackend::lower to return InstOutput No longer write to the per-insn output registers; instead, return an InstOutput vector of temp registers holding the outputs. This will allow calling LowerBackend::lower multiple times for the same instruction, e.g. to rematerialize constants. When emitting the primary copy of the instruction during lowering, writing to the per-insn registers is now done in lower_clif_block. As a result, the ISLE lower_common routine is no longer needed. In addition, the InsnOutput type and all code related to it can be removed as well. * Refactor IsleContext to hold a LowerBackend reference Remove the "triple", "flags", and "isa_flags" fields that are copied from LowerBackend to each IsleContext, and instead just hold a reference to LowerBackend in IsleContext. This will allow calling LowerBackend::lower from within callbacks in src/machinst/isle.rs, e.g. to rematerialize constants. To avoid having to pass LowerBackend references through multiple functions, eliminate the lower_insn_to_regs subroutines in those targets that still have them, and just inline into the main lower routine. This also eliminates lower_inst.rs on aarch64 and riscv64. Replace all accesses to the removed IsleContext fields by going through the LowerBackend reference.
* Remove MachInst::gen_constant This addresses the problem described in issue https://github.com/bytecodealliance/wasmtime/issues/4426 that targets currently have to duplicate code to emit constants between the ISLE logic and the gen_constant callback. After the various cleanups in earlier patches in this series, the only remaining user of gen_constant is put_value_in_regs in Lower. This can now be removed, and instead constant rematerialization can be performed in the put_in_regs ISLE callback by simply directly calling LowerBackend::lower on the instruction defining the constant (using a different output register). Since the check for egraph mode is now no longer performed in put_value_in_regs, the Lower::flags member becomes obsolete. Care needs to be taken that other calls directly to the Lower::put_value_in_regs routine now handle the fact that no more rematerialization is performed. All such calls in target code already historically handle constants themselves. The remaining call site in the ISLE gen_call_common helper can be redirected to the ISLE put_in_regs callback. The existing target implementations of gen_constant are then unused and can be removed. (In some targets there may still be further opportunities to remove duplication between ISLE and some local Rust code - this can be left to future patches.)	2022-12-13 13:00:04 -08:00
Trevor Elliott	b077854b57	Generate SSA code from returns (#5172) Modify return pseudo-instructions to have pairs of registers: virtual and real. This allows us to constrain the virtual registers to the real ones specified by the abi, instead of directly emitting moves to those real registers.	2022-11-08 16:00:49 -08:00
Chris Fallin	ae5fe8a728	aarch64: fix up regalloc2 semantics. (#4830) This PR removes all uses of modify-operands in the aarch64 backend, replacing them with reused-input operands instead. This has the nice effect of removing a bunch of move instructions and more clearly representing inputs and outputs. This PR also removes the explicit use of pinned vregs in the aarch64 backend, instead using fixed-register constraints on the operands when insts or pseudo-inst sequences require certain registers. This is the second PR in the regalloc-semantics cleanup series; after the remaining backend (s390x) and the ABI code are cleaned up as well, we'll be able to simplify the regalloc2 frontend.	2022-09-01 21:25:20 +00:00
Damian Heaton	db7f9ccd2b	Convert `scalar_to_vector` to ISLE (AArch64) (#4401) * Convert `scalar_to_vector` to ISLE (AArch64) Converted the existing implementation of `scalar_to_vector` for AArch64 to ISLE. Copyright (c) 2022 Arm Limited * Add support for floats and fix FpuExtend - Added rules to cover `f32 -> f32x4` and `f64 -> f64x2` for `scalar_to_vector` - Added tests for `scalar_to_vector` on floats. - Corrected an invalid instruction emitted by `FpuExtend` on 64-bit values. Copyright (c) 2022 Arm Limited	2022-07-18 11:11:54 -07:00
Chris Fallin	a0318f36f0	Switch Cranelift over to regalloc2. (#3989) This PR switches Cranelift over to the new register allocator, regalloc2. See [this document](https://gist.github.com/cfallin/08553421a91f150254fe878f67301801) for a summary of the design changes. This switchover has implications for core VCode/MachInst types and the lowering pass. Overall, this change brings improvements to both compile time and speed of generated code (runtime), as reported in #3942:
```
Benchmark               Compilation (wallclock)   Execution (wallclock)
blake3-scalar           25% faster                28% faster
blake3-simd             no diff                   no diff
meshoptimizer           19% faster                17% faster
pulldown-cmark          17% faster                no diff
bz2                     15% faster                no diff
SpiderMonkey, fib(30)   21% faster                2% faster
clang.wasm              42% faster                N/A
```
	2022-04-14 10:28:21 -07:00
Alex Crichton	1ef0abb12c	Update lots of `isa/*/*.clif` tests to `precise-output` (#3677) * Update lots of `isa/*/*.clif` tests to `precise-output` This commit goes through the `aarch64` and `x64` subdirectories and subjectively changes tests from `test compile` to add `precise-output`. This then auto-updates all the test expectations so they can be automatically instead of manually updated in the future. Not all tests were migrated, largely subject to the whims of myself, mainly looking to see if the test was looking for specific instructions or just checking the whole assembly output. * Filter out `;;` comments from test expectations Looks like the cranelift parser picks up all comments, not just those trailing the function, so use a convention where `;;` is used for human-readable-comments in test cases and `;`-prefixed comments are the test expectation.	2022-01-10 13:38:23 -06:00
Anton Kirilov	7b98be1bee	Cranelift: Simplify leaf functions that do not use the stack (#2960) * Cranelift AArch64: Simplify leaf functions that do not use the stack Leaf functions that do not use the stack (e.g. do not clobber any callee-saved registers) do not need a frame record. Copyright (c) 2021, Arm Limited.	2021-08-27 12:12:37 +02:00
Chris Fallin	2d5db92a9e	Rework/simplify unwind infrastructure and implement Windows unwind. Our previous implementation of unwind infrastructure was somewhat complex and brittle: it parsed generated instructions in order to reverse-engineer unwind info from prologues. It also relied on some fragile linkage to communicate instruction-layout information that VCode was not designed to provide. A much simpler, more reliable, and easier-to-reason-about approach is to embed unwind directives as pseudo-instructions in the prologue as we generate it. That way, we can say what we mean and just emit it directly. The usual reasoning that leads to the reverse-engineering approach is that metadata is hard to keep in sync across optimization passes; but here, (i) prologues are generated at the very end of the pipeline, and (ii) if we ever do a post-prologue-gen optimization, we can treat unwind directives as black boxes with unknown side-effects, just as we do for some other pseudo-instructions today. It turns out that it was easier to just build this for both x64 and aarch64 (since they share a factored-out ABI implementation), and wire up the platform-specific unwind-info generation for Windows and SystemV. Now we have simpler unwind on all platforms and we can delete the old unwind infra as soon as we remove the old backend. There were a few consequences to supporting Fastcall unwind in particular that led to a refactor of the common ABI. Windows only supports naming clobbered-register save locations within 240 bytes of the frame-pointer register, whatever one chooses that to be (RSP or RBP). We had previously saved clobbers below the fixed frame (and below nominal-SP). The 240-byte range has to include the old RBP too, so we're forced to place clobbers at the top of the frame, just below saved RBP/RIP. This is fine; we always keep a frame pointer anyway because we use it to refer to stack args. 
It does mean that offsets of fixed-frame slots (spillslots, stackslots) from RBP are no longer known before we do regalloc, so if we ever want to index these off of RBP rather than nominal-SP because we add support for `alloca` (dynamic frame growth), then we'll need a "nominal-BP" mode that is resolved after regalloc and clobber-save code is generated. I added a comment to this effect in `abi_impl.rs`. The above refactor touched both x64 and aarch64 because of shared code. This had a further effect in that the old aarch64 prologue generation subtracted from `sp` once to allocate space, then used stores to `[sp, offset]` to save clobbers. Unfortunately the offset only has 7-bit range, so if there are enough clobbered registers (and there can be -- aarch64 has 384 bytes of registers; at least one unit test hits this) the stores/loads will be out-of-range. I really don't want to synthesize large-offset sequences here; better to go back to the simpler pre-index/post-index `stp r1, r2, [sp, #-16]` form that works just like a "push". It's likely not much worse microarchitecturally (dependence chain on SP, but oh well) and it actually saves an instruction if there's no other frame to allocate. As a further advantage, it's much simpler to understand; simpler is usually better. This PR adds the new backend on Windows to CI as well.	2021-03-11 20:03:52 -08:00
Julian Seward	dd9bfcefaa	CL/aarch64: implement the wasm SIMD `v128.load{32,64}_zero` instructions. This patch implements, for aarch64, the following wasm SIMD extensions. v128.load32_zero and v128.load64_zero instructions https://github.com/WebAssembly/simd/pull/237 The changes are straightforward: * no new CLIF instructions. They are translated into an existing CLIF scalar load followed by a CLIF `scalar_to_vector`. * the comment/specification for CLIF `scalar_to_vector` has been changed to match the actual intended semantics, per consultation with Andrew Brown. * translation from `scalar_to_vector` to aarch64 `fmov` instruction. This has been generalised slightly so as to allow both 32- and 64-bit transfers. * special-case zero in `lower_constant_f128` in order to avoid a potentially slow call to `Inst::load_fp_constant128`. * Once "Allow loads to merge into other operations during instruction selection in MachInst backends" (https://github.com/bytecodealliance/wasmtime/issues/2340) lands, we can use that functionality to pattern match the two-CLIF pair and emit a single AArch64 instruction. * A simple filetest has been added. There is no comprehensive testcase in this commit, because that is a separate repo. The implementation has been tested, nevertheless.	2020-11-04 20:00:04 +01:00

9 Commits