wasmtime

Author	SHA1	Message	Date
Sam Parker	d307a4ab9a	[AArch64] Improve AtomicRMWLoop (#3839) Add more tests, use accurate disassembly, respect data sizes and simplify the Xchg implementation. Copyright (c) 2022, Arm Limited	2022-02-23 10:47:59 -08:00
Sam Parker	e572198f85	[AArch64] Merge 32- and 64-bit ALUOps (#3802) Combine the two opcodes into one and add an OperandSize field to these instructions, as well as an ISLE helper to perform the conversion from Type. This saves us from having to write ISLE helpers to select the correct opcode, based on type, and reduces the amount of code needed for emission. Copyright (c) 2022, Arm Limited.	2022-02-17 10:03:54 -08:00
Chris Fallin	1c014d129a	Cranelift: ensure ISA level needed for SIMD is present when SIMD is enabled. (#3816) Addresses #3809: when we are asked to create a Cranelift backend with shared flags that indicate support for SIMD, we should check that the ISA level needed for our SIMD lowerings is present.	2022-02-16 17:29:30 -08:00
Freddie Liardet	b5531580e7	Improve code generation for floating-point constants Copyright (c) 2022, Arm Limited.	2022-01-18 10:39:05 +00:00
bjorn3	88baac4ca6	Move the TestCodeSink functionality to MachBufferFinalized	2022-01-11 14:40:53 +01:00
bjorn3	d50f27e8f9	Remove reg_universe method from MachBackend and MachInst	2022-01-06 14:39:50 +01:00
bjorn3	17c3c1813f	Remove MachInstEmitInfo	2022-01-04 18:06:01 +01:00
Alex Crichton	e94ebc2263	aarch64: Translate rot{r,l} to ISLE (#3614) This commit translates the existing `rotl` and `rotr` lowerings to ISLE. The port was relatively straightforward, with the biggest change being the instructions generated around i128 rotl/rotr, primarily due to register changes.	2021-12-17 12:37:17 -06:00
Alex Crichton	7d0f6ab90f	aarch64: Migrate `iadd` and `isub` to ISLE This commit is the first "meaty" instruction added to ISLE for the AArch64 backend. I chose to pick the first two in the current lowering's `match` statement, `isub` and `iadd`. These two turned out to be particularly interesting for a few reasons: * Both had clearly migratable-to-ISLE behavior along the lines of special-casing per type. For example 128-bit and vector arithmetic were both easily translatable. * The `iadd` instruction has special cases for fusing with a multiplication to generate `madd` which is expressed pretty easily in ISLE. * Otherwise both instructions had a number of forms where they attempted to interpret the RHS as various forms of constants, extends, or shifts. There's a bit of a design space of how best to represent this in ISLE and what I settled on was to have a special case for each form of instruction, and the special cases are somewhat duplicated between `iadd` and `isub`. There are custom "extractors" for the special cases and instructions that support these special cases will have a `rule`-per-case. Overall I think the ISLE transitioned pretty well. I don't think that the aarch64 backend is going to follow the x64 backend super closely, though. For example the x64 backend is having a helper-per-instruction at the moment but with AArch64 it seems to make more sense to only have a helper-per-enum-variant-of-`MInst`. This is because the same instruction (e.g. `ALUOp::Sub32`) can be expressed with multiple different forms depending on the payload. It's worth noting that the ISLE looks like it's a good deal larger than the code actually being removed from lowering as part of this commit. I think this is deceptive though because a lot of the logic in `put_input_in_rse_imm12_maybe_negated` and `alu_inst_imm12` is being inlined into the ISLE definitions for each instruction instead of having it all packed into the helper functions. 
Some of the "boilerplate" here is the addition of various ISLE utilities as well.	2021-11-19 06:51:38 -08:00
Alex Crichton	1141169ff8	aarch64: Initial work to transition backend to ISLE (#3541 ) * aarch64: Initial work to transition backend to ISLE This commit is what is hoped to be the initial commit towards migrating the aarch64 backend to ISLE. There's seemingly a lot of changes here but it's intended to largely be code motion. The current thinking is to closely follow the x64 backend for how all this is handled and organized. Major changes in this PR are: * The `Inst` enum is now defined in ISLE. This avoids having to define it in two places (once in Rust and once in ISLE). I've preserved all the comments in the ISLE and otherwise this isn't actually a functional change from the Rust perspective, it's still the same enum according to Rust. * Lots of little enums and things were moved to ISLE as well. As with `Inst` their definitions didn't change, only where they're defined. This will give future ISLE PRs access to all these operations. * Initial code for lowering `iconst`, `null`, and `bconst` are implemented. Ironically none of this is actually used right now because constant lowering is handled in `put_input_in_regs` which specially handles constants. Nonetheless I wanted to get at least something simple working which shows off how to special case various things that are specific to AArch64. In a future PR I plan to hook up const-lowering in ISLE to this path so even though `iconst`-the-clif-instruction is never lowered this should use the const lowering defined in ISLE rather than elsewhere in the backend (eventually leading to the deletion of the non-ISLE lowering). * The `IsleContext` skeleton is created and set up for future additions. * Some code for ISLE that's shared across all backends now lives in `isle_prelude_methods!()` and is deduplicated between the AArch64 backend and the x64 backend. * Register mapping is tweaked to do the same thing for AArch64 that it does for x64. 
Namely mapping virtual registers is supported instead of just virtual to machine registers. My main goal with this PR was to get AArch64 into a place where new instructions can be added with relative ease. Additionally I'm hoping to figure out as part of this change how much to share for ISLE between AArch64 and x64 (and other backends). * Don't use priorities with rules * Update .gitattributes with concise syntax * Deduplicate some type definitions * Rebuild ISLE * Move isa::isle to machinst::isle	2021-11-18 10:38:16 -06:00
Anton Kirilov	0fb3acfb94	Cranelift AArch64: Fix the VanyTrue implementation for 64-bit elements Copyright (c) 2021, Arm Limited.	2021-09-23 20:39:46 +01:00
Chris Fallin	3474965ca6	Merge pull request #3322 from sparker-arm/aarch64-lse-ops AArch64 LSE atomic_rmw support	2021-09-22 09:21:28 -07:00
Anton Kirilov	a8aec2e0e6	Cranelift AArch64: Avoid invalid encodings for some vector instructions Copyright (c) 2021, Arm Limited.	2021-09-16 12:26:58 +01:00
Sam Parker	7da76f0601	cargo fmt	2021-09-15 16:01:51 +01:00
Sam Parker	80d596b055	AArch64 LSE atomic_rmw support Rename the existing AtomicRMW to AtomicRMWLoop and directly lower atomic_rmw operations, without a loop if LSE support is available. Copyright (c) 2021, Arm Limited	2021-09-15 16:01:51 +01:00
Anton Kirilov	8805e25042	Cranelift AArch64: Improve the type checks for IR operations There were cases where the AArch64 backend assumed that an IR operation would always operate on certain types (the most likely reason being that the corresponding WebAssembly instruction did not cover anything else), even though the definition of the IR operation imposed no constraints like that. Copyright (c) 2021, Arm Limited.	2021-09-13 14:46:45 +01:00
Benjamin Bouvier	85ec11acb9	Aarch64: always generate the CFA directive indicating no pointer signing	2021-09-02 09:16:34 +02:00
Alex Crichton	1532516a36	Use relative `call` instructions between wasm functions (#3275 ) * Use relative `call` instructions between wasm functions This commit is a relatively major change to the way that Wasmtime generates code for Wasm modules and how functions call each other. Prior to this commit all function calls between functions, even if they were defined in the same module, were done indirectly through a register. To implement this the backend would emit an absolute 8-byte relocation near all function calls, load that address into a register, and then call it. While this technique is simple to implement and easy to get right, it has two primary downsides associated with it: * Function calls are always indirect which means they are more difficult to predict, resulting in worse performance. * Generating a relocation-per-function call requires expensive relocation resolution at module-load time, which can be a large contributing factor to how long it takes to load a precompiled module. To fix these issues, while also somewhat compromising on the previously simple implementation technique, this commit switches wasm calls within a module to using the `colocated` flag enabled in Cranelift-speak, which basically means that a relative call instruction is used with a relocation that's resolved relative to the pc of the call instruction itself. When switching the `colocated` flag to `true` this commit is also then able to move much of the relocation resolution from `wasmtime_jit::link` into `wasmtime_cranelift::obj` during object-construction time. This frontloads all relocation work which means that there's actually no relocations related to function calls in the final image, solving both of our points above. The main gotcha in implementing this technique is that there are hardware limitations to relative function calls which mean we can't simply blindly use them. 
AArch64, for example, can only go +/- 128 MB from the `bl` instruction to the target, which means that if the function we're calling is a greater distance away then we would fail to resolve that relocation. On x86_64 the limits are +/- 2GB which are much larger, but theoretically still feasible to hit. Consequently the main increase in implementation complexity is fixing this issue. This issue is actually already present in Cranelift itself, and is internally one of the invariants handled by the `MachBuffer` type. When generating a function, relative jumps between basic blocks have similar restrictions. This commit adds new methods for the `MachBackend` trait and updates the implementation of `MachBuffer` to account for all these new branches. Specifically the changes to `MachBuffer` are: * For AArch64 the `LabelUse::Branch26` value now supports veneers, and AArch64 calls use this to resolve relocations. * The `emit_island` function has been rewritten internally to handle some cases which didn't come up before, such as: * When emitting an island the deadline is now recalculated, where previously it was always set to infinitely in the future. This was ok prior since only a `Branch19` supported veneers and once it was promoted no veneers were supported, so without multiple layers of promotion the lack of a new deadline was ok. * When emitting an island all pending fixups had veneers forced if their branch target wasn't known yet. This was generally ok for 19-bit fixups since the only kind getting a veneer was a 19-bit fixup, but with mixed kinds it's a bit odd to force veneers for a 26-bit fixup just because a nearby 19-bit fixup needed a veneer. Instead fixups are now re-enqueued unless they're known to be out-of-bounds. This may run the risk of generating more islands for 19-bit branches but it should also reduce the number of islands for between-function calls. 
* Otherwise the internal logic was tweaked to ideally be a bit more simple, but that's a pretty subjective criterion in compilers... I've added some simple testing of this for now. A synthetic compiler option was created to simply add padded 0s between functions and test cases implement various forms of calls that at least need veneers. A test is also included for x86_64, but it is unfortunately pretty slow because it requires generating 2GB of output. I'm hoping for now it's not too bad, but we can disable the test if it's prohibitive and otherwise just comment the necessary portions to be sure to run the ignored test if these parts of the code have changed. The final end-result of this commit is that for a large module I'm working with the number of relocations dropped to zero, meaning that nothing actually needs to be done to the text section when it's loaded into memory (yay!). I haven't run final benchmarks yet but this is the last remaining source of significant slowdown when loading modules, after I land a number of other PRs both active and ones that I only have locally for now. * Fix arm32 * Review comments	2021-09-01 13:27:38 -05:00
Anton Kirilov	7b98be1bee	Cranelift: Simplify leaf functions that do not use the stack (#2960) * Cranelift AArch64: Simplify leaf functions that do not use the stack Leaf functions that do not use the stack (e.g. do not clobber any callee-saved registers) do not need a frame record. Copyright (c) 2021, Arm Limited.	2021-08-27 12:12:37 +02:00
Sam Parker	cbb7229457	Re-implement atomic load and stores The AArch64 support was a bit broken and was using Armv7 style barriers, which aren't required with Armv8 acquire-release load/stores. The fallback CAS loops and RMW, for AArch64, have also been updated to use acquire-release, exclusive, instructions which, again, remove the need for barriers. The CAS loop has also been further optimised by using the extending form of the cmp instruction. Copyright (c) 2021, Arm Limited.	2021-08-05 09:08:08 +01:00
Sam Parker	3bc2f0c701	Enable simd_X_extadd_pairwise_X for AArch64 Lower to [u|s]addlp for AArch64. Copyright (c) 2021, Arm Limited.	2021-08-03 10:25:09 +01:00
Sam Parker	f2806a9192	rebase and ran cargo fmt Copyright (c) 2021, Arm Limited.	2021-07-28 13:14:20 +01:00
Sam Parker	541a4ee428	Enable simd_extmul_* for AArch64 Lower simd_extmul_[low/high][signed/unsigned] to [s|u]widen inputs to an imul node. Copyright (c) 2021, Arm Limited.	2021-07-28 13:14:20 +01:00
Nick Fitzgerald	4283d2116d	cranelift: Move most debug-level logs to the trace level Cranelift crates have historically been much more verbose with debug-level logging than most other crates in the Rust ecosystem. We log things like how many parameters a basic block has, the color of virtual registers during regalloc, etc. Even for Cranelift hackers, these things are largely only useful when hacking specifically on Cranelift and looking at a particular test case, not even when using some Cranelift embedding (such as Wasmtime). Most of the time, when people want logging for their Rust programs, they do something like: RUST_LOG=debug cargo run This means that they get all that mostly not useful debug logging out of Cranelift. So they might want to disable logging for Cranelift, or change it to a higher log level: RUST_LOG=debug,cranelift=info cargo run The problem is that this is already more annoying to type than `RUST_LOG=debug`, and that Cranelift isn't one single crate, so you actually have to play whack-a-mole with naming all the Cranelift crates off the top of your head, something more like this: RUST_LOG=debug,cranelift=info,cranelift_codegen=info,cranelift_wasm=info,... Therefore, we're changing most of the `debug!` logs into `trace!` logs: anything that is very Cranelift-internal, unlikely to be useful/meaningful to the "average" Cranelift embedder, or prints a message for each instruction visited during a pass. On the other hand, things that just report a one line statistic for a whole pass, for example, are left as `debug!`. The more verbose the log messages are, the higher the bar they must clear to be `debug!` rather than `trace!`.	2021-07-26 11:50:16 -07:00
Anton Kirilov	6c3d7092b9	Enable the simd_conversions test for AArch64 Copyright (c) 2021, Arm Limited.	2021-07-16 22:04:45 +01:00
Anton Kirilov	330f02aa09	Enable the simd_i32x4_trunc_sat_f64x2 test for AArch64 Also, reorganize the AArch64-specific VCode instructions for unary narrowing and widening vector operations, so that they are more straightforward to use. Copyright (c) 2021, Arm Limited.	2021-06-30 12:17:53 +01:00
Anton Kirilov	98f1ac789e	Enable the simd_i16x8_q15mulr_sat_s test on AArch64 Copyright (c) 2021, Arm Limited.	2021-06-28 12:24:31 +01:00
Afonso Bordado	e85eb77c45	aarch64: Implement missing atomic rmw ops	2021-06-25 07:51:46 +01:00
Afonso Bordado	b8ad99e435	aarch64: Implement TLS ELF GD Relocations Implement the `TlsValue` opcode in the aarch64 backend for ELF_GD. This is a little bit unusual, as the default TLS mechanism for aarch64 in other compilers is TLS Descriptors. However, currently we only recognize elf_gd, so let's start with that as a TLS implementation.	2021-06-24 12:21:44 +01:00
Chris Fallin	3d56728b86	Merge pull request #2975 from afonso360/aarch64-icmp aarch64: Implement lowering i128 icmp instructions	2021-06-09 15:38:41 -07:00
Afonso Bordado	4d085d8fbf	aarch64: Add sbcs instruction encodings	2021-06-09 22:56:39 +01:00
Afonso Bordado	61f07d79a7	aarch64: Add adcs instruction encodings	2021-06-09 22:56:39 +01:00
Afonso Bordado	2c4d1c0003	aarch64: Add ands instruction encoding	2021-06-09 22:38:01 +01:00
Afonso Bordado	a2e74b2c45	aarch64: Implement isub for i128 operands	2021-05-22 21:51:41 +01:00
Afonso Bordado	d3b525fa29	aarch64: Implement iadd for i128 operands	2021-05-22 21:21:44 +01:00
Anton Kirilov	480670e17f	Enable the simd_boolean test for AArch64 Also, enable the simd_i64x2_arith2 test because it doesn't need any code changes. Copyright (c) 2021, Arm Limited.	2021-04-27 20:19:51 +01:00
Chris Fallin	1f21b32e99	Merge pull request #2838 from uweigand/optionalfp Allow unwind support to work without a frame pointer	2021-04-14 10:58:51 -07:00
Ulrich Weigand	5904c09682	Allow unwind support to work without a frame pointer The patch extends the unwinder to support targets that do not need to use a dedicated frame pointer register. Specifically, the changes include: - Change the "fp" routine in the RegisterMapper to return an optional frame pointer register via Option<Register>. - On targets that choose to not define a FP register via the above routine, the UnwindInst::DefineNewFrame operation no longer switches the CFA to be defined in terms of the FP. (The operation still can be used to define the location of the clobber area.) - In addition, on targets that choose not to define a FP register, the UnwindInst::PushFrameRegs operation is not supported. - There is a new operation UnwindInst::StackAlloc that needs to be called on targets without FP whenever the stack pointer is updated. This causes the CFA offset to be adjusted accordingly. (On targets with FP this operation is a no-op.)	2021-04-14 15:32:31 +02:00
Anton Kirilov	7248abd591	Cranelift AArch64: Improve the handling of callee-saved registers SIMD & FP registers are now saved and restored in pairs, similarly to general-purpose registers. Also, only the bottom 64 bits of the registers are saved and restored (in case of non-Baldrdash ABIs), which is the requirement from the Procedure Call Standard for the Arm 64-bit Architecture. As for the callee-saved general-purpose registers, if a procedure needs to save and restore an odd number of them, it no longer uses store and load pair instructions for the last register. Copyright (c) 2021, Arm Limited.	2021-04-13 20:23:08 +01:00
Anton Kirilov	07c27039b1	Cranelift AArch64: Add initial support for the Armv8.1 atomics This commit enables Cranelift's AArch64 backend to generate code for instruction set extensions (previously only the base Armv8-A architecture was supported); also, it makes it possible to detect the extensions supported by the host when JIT compiling. The new functionality is applied to the IR instruction `AtomicCas`. Copyright (c) 2021, Arm Limited.	2021-03-13 02:31:51 +00:00
Chris Fallin	2d5db92a9e	Rework/simplify unwind infrastructure and implement Windows unwind. Our previous implementation of unwind infrastructure was somewhat complex and brittle: it parsed generated instructions in order to reverse-engineer unwind info from prologues. It also relied on some fragile linkage to communicate instruction-layout information that VCode was not designed to provide. A much simpler, more reliable, and easier-to-reason-about approach is to embed unwind directives as pseudo-instructions in the prologue as we generate it. That way, we can say what we mean and just emit it directly. The usual reasoning that leads to the reverse-engineering approach is that metadata is hard to keep in sync across optimization passes; but here, (i) prologues are generated at the very end of the pipeline, and (ii) if we ever do a post-prologue-gen optimization, we can treat unwind directives as black boxes with unknown side-effects, just as we do for some other pseudo-instructions today. It turns out that it was easier to just build this for both x64 and aarch64 (since they share a factored-out ABI implementation), and wire up the platform-specific unwind-info generation for Windows and SystemV. Now we have simpler unwind on all platforms and we can delete the old unwind infra as soon as we remove the old backend. There were a few consequences to supporting Fastcall unwind in particular that led to a refactor of the common ABI. Windows only supports naming clobbered-register save locations within 240 bytes of the frame-pointer register, whatever one chooses that to be (RSP or RBP). We had previously saved clobbers below the fixed frame (and below nominal-SP). The 240-byte range has to include the old RBP too, so we're forced to place clobbers at the top of the frame, just below saved RBP/RIP. This is fine; we always keep a frame pointer anyway because we use it to refer to stack args. 
It does mean that offsets of fixed-frame slots (spillslots, stackslots) from RBP are no longer known before we do regalloc, so if we ever want to index these off of RBP rather than nominal-SP because we add support for `alloca` (dynamic frame growth), then we'll need a "nominal-BP" mode that is resolved after regalloc and clobber-save code is generated. I added a comment to this effect in `abi_impl.rs`. The above refactor touched both x64 and aarch64 because of shared code. This had a further effect in that the old aarch64 prologue generation subtracted from `sp` once to allocate space, then used stores to `[sp, offset]` to save clobbers. Unfortunately the offset only has 7-bit range, so if there are enough clobbered registers (and there can be -- aarch64 has 384 bytes of registers; at least one unit test hits this) the stores/loads will be out-of-range. I really don't want to synthesize large-offset sequences here; better to go back to the simpler pre-index/post-index `stp r1, r2, [sp, #-16]` form that works just like a "push". It's likely not much worse microarchitecturally (dependence chain on SP, but oh well) and it actually saves an instruction if there's no other frame to allocate. As a further advantage, it's much simpler to understand; simpler is usually better. This PR adds the new backend on Windows to CI as well.	2021-03-11 20:03:52 -08:00
bjorn3	ff22842da5	More atomic ops	2021-02-18 14:16:15 +01:00
Kasey Carrothers	99be82c866	Replace MachInst::gen_zero_len_nop with gen_nop(0)	2021-01-29 01:15:08 -08:00
Chris Fallin	c84d6be6f4	Detailed debug-info (DWARF) support in new backends (initially x64). This PR propagates "value labels" all the way from CLIF to DWARF metadata on the emitted machine code. The key idea is as follows: - Translate value-label metadata on the input into "value_label" pseudo-instructions when lowering into VCode. These pseudo-instructions take a register as input, denote a value label, and semantically are like a "move into value label" -- i.e., they update the current value (as seen by debugging tools) of the given local. These pseudo-instructions emit no machine code. - Perform a dataflow analysis at the machine-code level, tracking value-labels that propagate into registers and into [SP+constant] stack storage. This is a forward dataflow fixpoint analysis where each storage location can contain a set of value labels, and each value label can reside in a set of storage locations. (Meet function is pairwise intersection by storage location.) This analysis traces value labels symbolically through loads and stores and reg-to-reg moves, so it will naturally handle spills and reloads without knowing anything special about them. - When this analysis converges, we have, at each machine-code offset, a mapping from value labels to some number of storage locations; for each offset for each label, we choose the best location (prefer registers). Note that we can choose any location, as the symbolic dataflow analysis is sound and guarantees that the value at the value_label instruction propagates to all of the named locations. - Then we can convert this mapping into a format that the DWARF generation code (wasmtime's debug crate) can use. This PR also adds the new-backend variant to the gdb tests on CI.	2021-01-21 15:59:49 -08:00
Anton Kirilov	043a8434d2	Cranelift AArch64: Improve the Popcnt implementation Now the backend uses the CNT instruction, which results in a major simplification. Copyright (c) 2021, Arm Limited.	2021-01-19 16:49:47 +00:00
Chris Fallin	6eea015d6c	Multi-register value support: framework for Values wider than machine regs. This will allow for support for `I128` values everywhere, and `I64` values on 32-bit targets (e.g., ARM32 and x86-32). It does not alter the machine backends to build such support; it just adds the framework for the MachInst backends to reason about a `Value` residing in more than one register.	2021-01-05 17:45:02 -08:00
Anton Kirilov	f59b274d22	Cranelift AArch64: Further vector constant improvements Introduce support for MOVI/MVNI with 16-, 32-, and 64-bit elements, and the vector variant of FMOV. Copyright (c) 2020, Arm Limited.	2020-12-03 15:30:24 +00:00
Chris Fallin	d413b907b4	Merge pull request #2414 from jgouly/extend-refactor arm64: Refactor Inst::Extend handling	2020-11-25 17:22:07 -08:00
Nick Fitzgerald	93c199363f	Merge pull request #2449 from bytecodealliance/cfallin/add-pre-host-hooks Add FuncEnvironment hooks to generate prologue and epilogue code.	2020-11-24 17:48:29 -08:00
Chris Fallin	4300c2c075	Add FuncEnvironment hooks to generate prologue and epilogue code. In some cases, it is useful to do some work at entry to or exit from a Cranelift function translated from WebAssembly. This PR adds two optional methods to the `FuncEnvironment` trait to do just this, analogous to the pre/post-hooks on operators that already exist. This PR also includes a drive-by compilation fix due to the latest nightly wherein `.is_empty()` on a `Range` ambiguously refers to either the `Range` impl or the `ExactSizeIterator` impl and can't resolve.	2020-11-24 16:36:15 -08:00
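
Commit c84d6be6f4 above describes a forward dataflow analysis in which each storage location holds a set of value labels, with a meet function that is "pairwise intersection by storage location". The following is a minimal sketch of that meet and of a copy-style transfer step, not the code from the commit: `Loc`, `Label`, `State`, `meet`, and `copy_loc` are hypothetical names chosen for illustration, and dropping empty sets from the map is an assumption made to keep states canonical.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical storage location: a register number or an [SP+constant] slot.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Loc {
    Reg(u8),
    SpOffset(i64),
}

/// Hypothetical value-label identifier.
type Label = u32;

/// Analysis state at one program point: the labels known to reside in each location.
type State = HashMap<Loc, HashSet<Label>>;

/// Meet of two predecessor states: a label survives in a location only if
/// both incoming states agree that the location holds it.
fn meet(a: &State, b: &State) -> State {
    let mut out = State::new();
    for (loc, labels_a) in a {
        if let Some(labels_b) = b.get(loc) {
            let common: HashSet<Label> =
                labels_a.intersection(labels_b).copied().collect();
            // Assumption: locations with no labels are simply omitted.
            if !common.is_empty() {
                out.insert(*loc, common);
            }
        }
    }
    out
}

/// Transfer step for a move, load, or store: `dst` now holds exactly the
/// labels that `src` held, and anything previously known about `dst` is lost.
fn copy_loc(state: &mut State, src: Loc, dst: Loc) {
    match state.get(&src).cloned() {
        Some(labels) => {
            state.insert(dst, labels);
        }
        None => {
            state.remove(&dst);
        }
    }
}
```

Because spills and reloads are just stores and loads between a register and an SP-relative slot, a transfer step like `copy_loc` would cover them without any special casing, which matches the commit's remark that the analysis handles them "without knowing anything special about them".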

161 Commits