wasmtime

Author	SHA1	Message	Date
Chris Fallin	c84d6be6f4	Detailed debug-info (DWARF) support in new backends (initially x64). This PR propagates "value labels" all the way from CLIF to DWARF metadata on the emitted machine code. The key idea is as follows: - Translate value-label metadata on the input into "value_label" pseudo-instructions when lowering into VCode. These pseudo-instructions take a register as input, denote a value label, and semantically are like a "move into value label" -- i.e., they update the current value (as seen by debugging tools) of the given local. These pseudo-instructions emit no machine code. - Perform a dataflow analysis at the machine-code level, tracking value-labels that propagate into registers and into [SP+constant] stack storage. This is a forward dataflow fixpoint analysis where each storage location can contain a set of value labels, and each value label can reside in a set of storage locations. (Meet function is pairwise intersection by storage location.) This analysis traces value labels symbolically through loads and stores and reg-to-reg moves, so it will naturally handle spills and reloads without knowing anything special about them. - When this analysis converges, we have, at each machine-code offset, a mapping from value labels to some number of storage locations; for each offset for each label, we choose the best location (prefer registers). Note that we can choose any location, as the symbolic dataflow analysis is sound and guarantees that the value at the value_label instruction propagates to all of the named locations. - Then we can convert this mapping into a format that the DWARF generation code (wasmtime's debug crate) can use. This PR also adds the new-backend variant to the gdb tests on CI.	2021-01-21 15:59:49 -08:00
Anton Kirilov	043a8434d2	Cranelift AArch64: Improve the Popcnt implementation Now the backend uses the CNT instruction, which results into a major simplification. Copyright (c) 2021, Arm Limited.	2021-01-19 16:49:47 +00:00
Chris Fallin	456561f431	x64 and aarch64: allow StructArgument and StructReturn args. The StructReturn ABI is fairly simple at the codegen/isel level: we only need to take care to return the sret pointer as one of the return values if that wasn't specified in the initial function signature. Struct arguments are a little more complex. A struct argument is stored as a chunk of memory in the stack-args space. However, the CLIF semantics are slightly special: on the caller side, the parameter passed in is a pointer to an arbitrary memory block, and we must memcpy this data to the on-stack struct-argument; and on the callee side, we provide a pointer to the passed-in struct-argument as the CLIF block param value. This is necessary to support various ABIs other than Wasm, such as that of Rust (with the cg_clif codegen backend).	2021-01-17 23:11:45 -08:00
Chris Fallin	6eea015d6c	Multi-register value support: framework for Values wider than machine regs. This will allow for support for `I128` values everywhere, and `I64` values on 32-bit targets (e.g., ARM32 and x86-32). It does not alter the machine backends to build such support; it just adds the framework for the MachInst backends to reason about a `Value` residing in more than one register.	2021-01-05 17:45:02 -08:00
Chris Fallin	aac3751025	aarch64: fix reg/imm `sub` insts that read `SP`, not the zero register. On AArch64, the zero register (xzr) and the stack pointer (xsp) are alternately named by the same index `31` in machine code depending on context. In particular, in the reg-reg-immediate ALU instruction form, add/subtract will use the stack pointer, not the zero register, if index 31 is given for the first (register) source arg. In a few places, we were emitting subtract instructions with the zero register as an argument and a reg/immediate as the second argument. When an immediate could be incorporated directly (we have the `iconst` definition visible), this would result in incorrect code being generated. This issue was found in `ineg` and in the sequence for vector right-shifts. Reported by Ian Cullinan; thanks!	2021-01-05 15:48:07 -08:00
Chris Fallin	2cec20aa57	Merge pull request #2486 from cfallin/fix-probestack Two Lucet-related fixes to stack overflow handling.	2020-12-07 16:47:37 -08:00
Chris Fallin	3a01d14712	Two Lucet-related fixes to stack overflow handling. Lucet uses stack probes rather than explicit stack limit checks as Wasmtime does. In bytecodealliance/lucet#616, I have discovered that I previously was not running some Lucet runtime tests with the new backend, so was missing some test failures due to missing pieces in the new backend. This PR adds (i) calls to probestack, when enabled, in the prologue of every function with a stack frame larger than one page (configurable via flags); and (ii) trap metadata for every instruction on x86-64 that can access the stack, hence be the first point at which a stack overflow is detected when the stack pointer is decremented.	2020-12-07 16:08:53 -08:00
Julian Seward	8f34d2dc59	aarch64 isel: collect_address_addends: correctly handle `ExtendOp::UXTW(negative immediate)`. The current code doesn't correctly handle the case where `ExtendOp::UXTW` has as source, a constant-producing insn that produces a negative (32-bit) value. Then the value is incorrectly sign-extended to 64 bits (in fact, this has already been done by `ctx.get_constant(insn)`), whereas it needs to be zero extended. The obvious fix, done here, is just to force bits 63:32 of the extension to zero, hence zero-extending it.	2020-12-04 19:21:40 +01:00
Anton Kirilov	f59b274d22	Cranelift AArch64: Further vector constant improvements Introduce support for MOVI/MVNI with 16-, 32-, and 64-bit elements, and the vector variant of FMOV. Copyright (c) 2020, Arm Limited.	2020-12-03 15:30:24 +00:00
Chris Fallin	d413b907b4	Merge pull request #2414 from jgouly/extend-refactor arm64: Refactor Inst::Extend handling	2020-11-25 17:22:07 -08:00
Nick Fitzgerald	93c199363f	Merge pull request #2449 from bytecodealliance/cfallin/add-pre-host-hooks Add FuncEnvironment hooks to generate prologue and epilogue code.	2020-11-24 17:48:29 -08:00
Chris Fallin	4300c2c075	Add FuncEnvironment hooks to generate prologue and epilogue code. In some cases, it is useful to do some work at entry to or exit from a Cranelift function translated from WebAssembly. This PR adds two optional methods to the `FuncEnvironment` trait to do just this, analogous to the pre/post-hooks on operators that already exist. This PR also includes a drive-by compilation fix due to the latest nightly wherein `.is_empty()` on a `Range` ambiguously refers to either the `Range` impl or the `ExactSizeIterator` impl and can't resolve.	2020-11-24 16:36:15 -08:00
Alex Crichton	4d64c68b05	Run rustfmt 1.48 Run rustfmt over wasmtime with the new stable release which looks like it wants to reformat a few lines.	2020-11-19 11:12:30 -08:00
Chris Fallin	073c727a74	x64 and aarch64: carry MemFlags on loads/stores; don't emit trap info unless an op can trap. This end result was previously enacted by carrying a `SourceLoc` on every load/store, which was somewhat cumbersome, and only indirectly encoded metadata about a memory reference (can it trap) by its presence or absence. We have a type for this -- `MemFlags` -- that tells us everything we might want to know about a load or store, and we should plumb it through to code emission instead. This PR attaches a `MemFlags` to an `Amode` on x64, and puts it on load and store `Inst` variants on aarch64. These two choices seem to factor things out in the nicest way: there are relatively few load/store insts on aarch64 but many addressing modes, while the opposite is true on x64.	2020-11-17 11:43:06 -08:00
Chris Fallin	712ff22492	AArch64 SIMD: pattern-match load+splat into `LD1R` instruction.	2020-11-16 15:59:28 -08:00
Chris Fallin	39b5736727	Remove LoadSplat opcode, in preparation for pattern-matching Load+Splat. This was added as an incremental step to improve AArch64 code quality in PR #2278. At the time, we did not have a way to pattern-match the load + splat opcode sequence that the relevant Wasm opcodes lowered to. However, now with PR #2366, we can merge effectful instructions such as loads into other ops, and so we can do this pattern matching directly. The pattern-matching update will come in a subsequent commit.	2020-11-16 15:31:56 -08:00
Chris Fallin	3c8cb7b908	MachInst lowering logic: allow effectful instructions to merge. This PR updates the "coloring" scheme that accounts for side-effects in the MachInst lowering logic. As a result, the new backends will now be able to merge effectful operations (such as memory loads) into other operations; previously, only the other way (pure ops merged into effectful ops) was possible. This will allow, for example, a load+ALU-op combination, as is common on x86. It should even allow a load + ALU-op + store sequence to merge into one lowered instruction. The scheme arose from many fruitful discussions with @julian-seward1 (thanks!); significant credit is due to him for the insights here. The first insight is that given the right basic conditions, i.e. that the root instruction is the only use of an effectful instruction's result, all we need is that the "color" of the effectful instruction is one less than the color of the current instruction. It's easier to think about colors on the program points between instructions: if the color coming out of the first (effectful def) instruction and in to the second (effectful or effect-free use) instruction are the same, then they can merge. Basically the color denotes a version of global state; if the same, then no other effectful ops happened in the meantime. The second insight is that we can keep state as we scan, tracking the "current color", and update this when we sink (merge) an op. Hence when we sink a load into another op, we effectively re-color every instruction it moved over; this may allow further sinks. Consider the example (and assume that we consider loads effectful in order to conservatively ensure a strong memory model; otherwise, replace with other effectful value-producing insts): ``` v0 = load x v1 = load y v2 = add v0, 1 v3 = add v1, 1 ``` Scanning from bottom to top, we first see the add producing `v3` and we can sink the load producing `v1` into it, producing a load + ALU-op machine instruction. This is legal because `v1` moves over only `v2`, which is a pure instruction. Consider, though, `v2`: under a simple scheme that has no other context, `v0` could not sink to `v2` because it would move over `v1`, another load. But because we already sunk `v1` down to `v3`, we are free to sink `v0` to `v2`; the update of the "current color" during the scan allows this. This PR also cleans up the `LowerCtx` interface a bit at the same time: whereas previously it always gave some subset of (constant, mergeable inst, register) directly from `LowerCtx::get_input()`, it now returns zero or more of (constant, mergable inst) from `LowerCtx::maybe_get_input_as_source_or_const()`, and returns the register only from `LowerCtx::put_input_in_reg()`. This removes the need to explicitly denote uses of the register, so it's a little safer. Note that this PR does not actually make use of the new ability to merge loads into other ops; that will come in future PRs, especially to optimize the `x64` backend by using direct-memory operands.	2020-11-16 14:53:45 -08:00
Joey Gouly	70cbc4ca7c	arm64: Refactor Inst::Extend handling This refactors the handling of Inst::Extend and simplifies the lowering of Bextend and Bmask, which allows the use of SBFX instructions for extensions from 1-bit booleans. Other extensions use aliases of BFM, and the code was changed to reflect that, rather than hard coding bit patterns. Also ImmLogic is now implemented, so another hard coded instruction can be removed. As part of looking at boolean handling, `normalize_boolean_result` was changed to `materialize_boolean_result`, such that it can use either CSET or CSETM. Using CSETM saves an instruction (previously CSET + SUB) for booleans bigger than 1-bit. Copyright (c) 2020, Arm Limited.	2020-11-13 16:17:25 +00:00
Chris Fallin	113d061129	Merge pull request #2369 from akirilov-arm/move_fix Cranelift AArch64: Various small fixes	2020-11-12 14:59:10 -08:00
Chris Fallin	89dbc4590d	Merge pull request #2363 from cfallin/extend-only-if-abi Do value-extensions at ABI boundaries only when ABI requires it.	2020-11-12 12:26:20 -08:00
Chris Fallin	fd6433aaf5	Merge pull request #2395 from cfallin/lucet-x64-support Add support for brff/brif and icmp_sp to new x64 backend to support Lucet.	2020-11-12 12:10:52 -08:00
Julian Seward	cbce34af0a	aarch64/inst/unwind.rs: handle zero-length prologues correctly.	2020-11-12 17:41:21 +01:00
Anton Kirilov	edaada3f57	Cranelift AArch64: Various small fixes * Use FMOV to move 64-bit FP registers and SIMD vectors. * Add support for additional vector load types. * Fix the printing of Inst::LoadAddr. Copyright (c) 2020, Arm Limited.	2020-11-12 13:54:05 +00:00
Chris Fallin	5df8840483	Add support for brff/brif and icmp_sp to new x64 backend to support Lucet. `lucetc` currently almost, but not quite, works with the new x64 backend; the only missing piece is support for the particular instructions emitted as part of its prologue stack-check. We do not normally see `brff`, `brif`, or `ifcmp_sp` in CLIF generated by `cranelift-wasm` without the old-backend legalization rules, so these were not supported in the new x64 backend as they were not necessary for Wasm MVP support. Using them resulted in an `unimplemented!()` panic. This PR adds support for `brff` and `brif` analogously to how AArch64 implements them, by pattern-matching the `ifcmp` / `ffcmp` directly. Then `ifcmp_sp` is a straightforward variant of `ifcmp`. Along the way, this also removes the notion of "fallthrough block" from the branch-group lowering method; instead, `fallthrough` instructions are handled as normal branches to their explicitly-provided targets, which (in the original CLIF) match the fallthrough block. The reason for this is that the block reordering done as part of lowering can change the fallthrough block. We were not using `fallthrough` instructions in the output produced by `cranelift-wasm`, so this, too, was not previously caught. With these changes, the `lucetc` crate in Lucet passes all tests with the `x64` feature-flag added to its `cranelift-codegen` dependency.	2020-11-11 13:43:39 -08:00
Chris Fallin	997b654235	Merge pull request #2393 from jgouly/constant-addend arm64: Fold some constants into load instructions	2020-11-11 11:23:21 -08:00
Joey Gouly	a5011e8212	arm64: Fold some constants into load instructions This changes the following: mov x0, #4 ldr x0, [x1, #4] Into: ldr x0, [x1] I noticed this pattern (but with #0), in a benchmark. Copyright (c) 2020, Arm Limited.	2020-11-11 18:47:43 +00:00
Julian Seward	41e87a2f99	Support wasm `select` instruction with V128-typed operands on AArch64. * this requires upgrading to wasmparser 0.67.0. * There are no CLIF side changes because the CLIF `select` instruction is polymorphic enough. * on aarch64, there is unfortunately no conditional-move (csel) instruction on vectors. This patch adds a synthetic instruction `VecCSel` which does behave like that. At emit time, this is emitted as an if-then-else diamond (4 insns). * aarch64 implementation is otherwise straightforwards.	2020-11-11 18:45:24 +01:00
Chris Fallin	4dce51096d	MachInst backends: handle SourceLocs out-of-band, not in Insts. In existing MachInst backends, many instructions -- any that can trap or result in a relocation -- carry `SourceLoc` values in order to propagate the location-in-original-source to use to describe resulting traps or relocation errors. This is quite tedious, and also error-prone: it is likely that the necessary plumbing will be missed in some cases, and in any case, it's unnecessarily verbose. This PR factors out the `SourceLoc` handling so that it is tracked during emission as part of the `EmitState`, and plumbed through automatically by the machine-independent framework. Instruction emission code that directly emits trap or relocation records can query the current location as necessary. Then we only need to ensure that memory references and trap instructions, at their (one) emission point rather than their (many) lowering/generation points, are wired up correctly. This does have the side-effect that some loads and stores that do not correspond directly to user code's heap accesses will have unnecessary but harmless trap metadata. For example, the load that fetches a code offset from a jump table will have a 'heap out of bounds' trap record attached to it; but because it is bounds-checked, and will never actually trap if the lowering is correct, this should be harmless. The simplicity improvement here seemed more worthwhile to me than plumbing through a "corresponds to user-level load/store" bit, because the latter is a bit complex when we allow for op merging. Closes #2290: though it does not implement a full "metadata" scheme as described in that issue, this seems simpler overall.	2020-11-10 15:46:53 -08:00
Yury Delendik	b2b7bc10e2	machinst aarch64: New backend unwind (#2313 ) * Unwind information for aarch64 backend.	2020-11-06 08:02:45 -06:00
Yury Delendik	f60c0f3ec3	cranelift: refactor unwind logic to accommodate multiple backends (#2357 ) * Make cranelift_codegen::isa::unwind::input public * Move UnwindCode's common offset field out of the structure * Make MachCompileResult::unwind_info more generic * Record initial stack pointer offset	2020-11-05 16:57:40 -06:00
Chris Fallin	a2bbb198de	Do value-extensions at ABI boundaries only when ABI requires it. There has been some confusion over the meaning of the "sign-extend" (`sext`) and "zero-extend" (`uext`) attributes on parameters and return values in signatures. According to the three implemented backends, these attributes indicate that a value narrower than a full register should always be extended in the way specified. However, they are much more useful if they mean "extend in this way if the ABI requires extending": only the ABI backend knows whether or not a particular ABI (e.g., x64 SysV vs. x64 Baldrdash) requires extensions, while only the frontend (CLIF generator) knows whether or not a value is signed, so the two have to work in concert. This is the result of some very helpful discussion in #2354 (thanks to @uweigand for raising the issue and @bjorn3 for helping to reason about it). This change respects the extension attributes in the above way, rather than unconditionally extending, to avoid potential performance degradation as we introduce more extension attributes on signatures.	2020-11-05 11:54:35 -08:00
Julian Seward	dd9bfcefaa	CL/aarch64: implement the wasm SIMD `v128.load{32,64}_zero` instructions. This patch implements, for aarch64, the following wasm SIMD extensions. v128.load32_zero and v128.load64_zero instructions https://github.com/WebAssembly/simd/pull/237 The changes are straightforward: * no new CLIF instructions. They are translated into an existing CLIF scalar load followed by a CLIF `scalar_to_vector`. * the comment/specification for CLIF `scalar_to_vector` has been changed to match the actual intended semantics, per consulation with Andrew Brown. * translation from `scalar_to_vector` to aarch64 `fmov` instruction. This has been generalised slightly so as to allow both 32- and 64-bit transfers. * special-case zero in `lower_constant_f128` in order to avoid a potentially slow call to `Inst::load_fp_constant128`. * Once "Allow loads to merge into other operations during instruction selection in MachInst backends" (https://github.com/bytecodealliance/wasmtime/issues/2340) lands, we can use that functionality to pattern match the two-CLIF pair and emit a single AArch64 instruction. * A simple filetest has been added. There is no comprehensive testcase in this commit, because that is a separate repo. The implementation has been tested, nevertheless.	2020-11-04 20:00:04 +01:00
Chris Fallin	75e02276be	Merge pull request #2360 from jgouly/reg_map aarch64: Fix aarch64_map_regs for FpuRRI	2020-11-04 10:37:50 -08:00
Joey Gouly	0223cb2f8c	aarch64: Fix aarch64_map_regs for FpuRRI This was wrong since I added it in `02c3f238f`. Copyright (c) 2020, Arm Limited.	2020-11-04 16:53:25 +00:00
Ulrich Weigand	80c2d70d2d	machinst ABI: Support for accumulating outgoing args When performing a function call, the platform ABI may require space on the stack to hold outgoing arguments and/or return values. Currently, this is supported via decrementing the stack pointer before the call and incrementing it afterwards, using the emit_stack_pre_adjust and emit_stack_post_adjust methods of ABICaller. However, on some platforms it would be preferable to just allocate enough space for any call done in the function in the caller's prologue instead. This patch adds support to allow back-ends to choose that method. Instead of calling emit_stack_pre/post_adjust around a call, they simply call a new accumulate_outgoing_args_size method of ABICaller instead. This will pass on the required size to the ABICallee structure of the calling function, which will accumulate the maximum size required for all function calls. That accumulated size is then passed to the gen_clobber_save and gen_clobber_restore functions so they can include the size in the stack allocation / deallocation that already happens in the prologue / epilogue code.	2020-11-03 18:49:34 +01:00
Chris Fallin	5ab7b4aa7f	Merge pull request #2345 from uweigand/abi-stackalign machinst ABI: Allow back-end to define stack alignment	2020-11-03 09:02:41 -08:00
Chris Fallin	0c240991ae	Merge pull request #2346 from uweigand/abi-noframepointer machinst ABI: Pass fixed frame size to gen_clobber_restore	2020-11-03 09:00:59 -08:00
Julian Seward	5a5fb11979	CL/aarch64: implement the wasm SIMD `i32x4.dot_i16x8_s` instruction This patch implements, for aarch64, the following wasm SIMD extensions i32x4.dot_i16x8_s instruction https://github.com/WebAssembly/simd/pull/127 It also updates dependencies as follows, in order that the new instruction can be parsed, decoded, etc: wat to 1.0.27 wast to 26.0.1 wasmparser to 0.65.0 wasmprinter to 0.2.12 The changes are straightforward: * new CLIF instruction `widening_pairwise_dot_product_s` * translation from wasm into `widening_pairwise_dot_product_s` * new AArch64 instructions `smull`, `smull2` (part of the `VecRRR` group) * translation from `widening_pairwise_dot_product_s` to `smull ; smull2 ; addv` There is no testcase in this commit, because that is a separate repo. The implementation has been tested, nevertheless.	2020-11-03 14:25:04 +01:00
Ulrich Weigand	c9bc4edd08	machinst ABI: Pass fixed frame size to gen_clobber_restore The ABI common code currently passes the fixed frame size to the gen_clobber_save back-end routine, which is required to emit code to allocate the required stack space in the prologue. Similarly, the back-end needs to emit code to de-allocate the stack in the epilogue. However, at this point the back-end does not have access to that fixed frame size value any more. With targets that use a frame pointer, this does not matter, since de-allocation can be done simply by assigning the frame pointer back to the stack pointer. However, on targets that do not use a frame pointer, the frame size is required. To allow back-ends that option, this patch changes ABI common code to pass the fixed frame size to get_clobber_restore as well (the same value as is passed to get_clobber_save).	2020-11-03 11:15:03 +01:00
Ulrich Weigand	d02ae3940c	machinst ABI: Allow back-end to define stack alignment The common gen_prologue code currently assumes that the stack pointer has to be aligned to twice the word size. While this is true for many ABIs, it does not hold universally. This patch adds a new callback stack_align that back-ends can provide to define the specific stack alignment required by the ABI on that platform.	2020-11-03 09:43:55 +01:00
Anton Kirilov	207779fe1d	Cranelift AArch64: Improve code generation for vector constants In particular, introduce initial support for the MOVI and MVNI instructions, with 8-bit elements. Also, treat vector constants as 32- or 64-bit floating-point numbers, if their value allows it, by relying on the architectural zero extension. Finally, stop generating literal loads for 32-bit constants. Copyright (c) 2020, Arm Limited.	2020-10-30 13:16:12 +00:00
Chris Fallin	c35904a8bf	Merge pull request #2278 from akirilov-arm/load_splat Introduce the Cranelift IR instruction `LoadSplat`	2020-10-28 12:54:03 -07:00
Julian Seward	c15d9bd61b	CL/aarch64: implement the wasm SIMD pseudo-max/min and FP-rounding instructions This patch implements, for aarch64, the following wasm SIMD extensions Floating-point rounding instructions https://github.com/WebAssembly/simd/pull/232 Pseudo-Minimum and Pseudo-Maximum instructions https://github.com/WebAssembly/simd/pull/122 The changes are straightforward: * `build.rs`: the relevant tests have been enabled * `cranelift/codegen/meta/src/shared/instructions.rs`: new CLIF instructions `fmin_pseudo` and `fmax_pseudo`. The wasm rounding instructions do not need any new CLIF instructions. * `cranelift/wasm/src/code_translator.rs`: translation into CLIF; this is pretty much the same as any other unary or binary vector instruction (for the rounding and the pmin/max respectively) * `cranelift/codegen/src/isa/aarch64/lower_inst.rs`: - `fmin_pseudo` and `fmax_pseudo` are converted into a two instruction sequence, `fcmpgt` followed by `bsl` - the CLIF rounding instructions are converted to a suitable vector `frint{n,z,p,m}` instruction. * `cranelift/codegen/src/isa/aarch64/inst/mod.rs`: minor extension of `pub enum VecMisc2` to handle the rounding operations. And corresponding `emit` cases.	2020-10-26 10:37:07 +01:00
Yury Delendik	de4af90af6	machinst x64: New backend unwind (#2266 ) Addresses unwind for experimental x64 backend. The preliminary code enables backtrace on SystemV call convension.	2020-10-23 15:19:41 -05:00
Julian Seward	2702942050	CL/aarch64 back end: implement the wasm SIMD `bitmask` instructions The `bitmask.{8x16,16x8,32x4}` instructions do not map neatly to any single AArch64 SIMD instruction, and instead need a sequence of around ten instructions. Because of this, this patch is somewhat longer and more complex than it would be for (eg) x64. Main changes are: * the relevant testsuite test (`simd_boolean.wast`) has been enabled on aarch64. * at the CLIF level, add a new instruction `vhigh_bits`, into which these wasm instructions are to be translated. * in the wasm->CLIF translation (code_translator.rs), translate into `vhigh_bits`. This is straightforward. * in the CLIF->AArch64 translation (lower_inst.rs), translate `vhigh_bits` into equivalent sequences of AArch64 instructions. There is a different sequence for each of the `{8x16, 16x8, 32x4}` variants. All other changes are AArch64-specific, and add instruction definitions needed by the previous step: * Add two new families of AArch64 instructions: `VecShiftImm` (vector shift by immediate) and `VecExtract` (effectively a double-length vector shift) * To the existing AArch64 family `VecRRR`, add a `zip1` variant. To the `VecLanesOp` family add an `addv` variant. * Add supporting code for the above changes to AArch64 instructions: - getting the register uses (`aarch64_get_regs`) - mapping the registers (`aarch64_map_regs`) - printing instructions - emitting instructions (`impl MachInstEmit for Inst`). The handling of `VecShiftImm` is a bit complex. - emission tests for new instructions and variants.	2020-10-23 05:26:25 +02:00
Anton Kirilov	e0b911a4df	Introduce the Cranelift IR instruction `LoadSplat` It corresponds to WebAssembly's `load*_splat` operations, which were previously represented as a combination of `Load` and `Splat` instructions. However, there are architectures such as Armv8-A that have a single machine instruction equivalent to the Wasm operations. In order to generate it, it is necessary to merge the `Load` and the `Splat` in the backend, which is not possible because the load may have side effects. The new IR instruction works around this limitation. The AArch64 backend leverages the new instruction to improve code generation. Copyright (c) 2020, Arm Limited.	2020-10-14 13:07:13 +01:00
Benjamin Bouvier	c5bbc87498	machinst: allow passing constant information to the instruction emitter; A new associated type Info is added to MachInstEmit, which is the immutable counterpart to State. It can't easily be constructed from an ABICallee, since it would require adding an associated type to the latter, and making so leaks the associated type in a lot of places in the code base and makes the code harder to read. Instead, the EmitInfo state can simply be passed to the `Vcode::emit` function directly.	2020-10-08 09:21:51 +02:00
Andrew Brown	ce44719e1f	refactor: change LowerCtx::get_immediate to return a DataValue This change abstracts away (from the perspective of the new backend) how immediate values are stored in InstructionData. It gathers large immediates from necessary places (e.g. constant pool) and delegates to `InstructionData::imm_value` for the rest. This refactor only touches original users of `LowerCtx::get_immediate` but a future change could do the same for any place the new backend is accessing InstructionData directly to retrieve immediates.	2020-10-07 12:17:17 -07:00
Chris Fallin	71768bb6cf	Fix AArch64 ABI to respect half-caller-save, half-callee-save vec regs. This PR updates the AArch64 ABI implementation so that it (i) properly respects that v8-v15 inclusive have callee-save lower halves, and caller-save upper halves, by conservatively approximating (to full registers) in the appropriate directions when generating prologue caller-saves and when informing the regalloc of clobbered regs across callsites. In order to prevent saving all of these vector registers in the prologue of every non-leaf function due to the above approximation, this also makes use of a new regalloc.rs feature to exclude call instructions' writes from the clobber set returned by register allocation. This is safe whenever the caller and callee have the same ABI (because anything the callee could clobber, the caller is allowed to clobber as well without saving it in the prologue). Fixes #2254.	2020-10-06 14:44:02 -07:00
Joey Gouly	eec60c9b06	arm64: Use SignedOffset rather than PreIndexed addressing mode for callee-saved registers This also passes `fixed_frame_storage_size` (previously `total_sp_adjust`) into `gen_clobber_save` so that it can be combined with other stack adjustments. Copyright (c) 2020, Arm Limited.	2020-10-02 16:22:55 +01:00

1 2 3 4 5

212 Commits