wasmtime

Author	SHA1	Message	Date
Chris Fallin	5e96a447f0	Add back the `ifcmp_sp` CLIF opcode. This opcode was removed as part of the old-backend cleanup in #3446. While this opcode will definitely go away eventually, it is unfortunately still used today in Lucet (as we just discovered while working to upgrade Lucet's pinned Cranelift version). Lucet is deprecated and slated to eventually be completely sunset in favor of Wasmtime; but until that happens, we need to keep this opcode.	2021-11-01 13:34:31 -07:00
bjorn3	a05bf2bf42	Remove instructions necessary for the old regalloc	2021-10-12 14:37:36 +02:00
bjorn3	1fd491dadd	Remove fallthrough instruction	2021-10-12 14:22:07 +02:00
bjorn3	5b24e117ee	Remove instructions used by old br_table legalization	2021-10-12 14:18:52 +02:00
bjorn3	8a8797b911	Remove the sarg_t type and dummy_sarg_t instruction. They are no longer necessary with the new-style backends.	2021-10-10 14:38:35 +02:00
Benjamin Bouvier	43a86f14d5	Remove more old backend ISA concepts (#3402). This also paves the way for unifying TargetIsa and MachBackend, since they now map one to one. In theory the two traits could be merged, which would be nice to limit the total number of concepts. However, they have quite different responsibilities, so it might be fine to keep them separate. Interestingly, this PR started as removing RegInfo from the TargetIsa trait, since the adapter returned a dummy value there. From the fallout, I noticed that none of the Display implementations needed an ISA anymore (they were only used to render ISA-specific registers). Also, the whole family of RegInfo / ValueLoc / RegUnit was used exclusively by the old backend, so these could be removed. Notably, some IR instructions needed to be removed because they were using RegUnit too: these were the oddball regfill / regmove / regspill / copy_special instructions, which were inserted by the old regalloc. Fare thee well!	2021-10-04 10:36:12 +02:00
bjorn3	9e34df33b9	Remove the old x86 backend	2021-09-29 16:13:46 +02:00
Chris Fallin	344a219245	Merge pull request #3383 from akirilov-arm/vany_true Cranelift AArch64: Fix the VanyTrue implementation for 64-bit elements	2021-09-24 09:26:36 -07:00
Anton Kirilov	0fb3acfb94	Cranelift AArch64: Fix the VanyTrue implementation for 64-bit elements. Copyright (c) 2021, Arm Limited.	2021-09-23 20:39:46 +01:00
Anton Kirilov	930b1f17f0	Cranelift AArch64: Implement scalar FmaxPseudo and FminPseudo. Copyright (c) 2021, Arm Limited.	2021-09-23 15:11:01 +01:00
Chris Fallin	3474965ca6	Merge pull request #3322 from sparker-arm/aarch64-lse-ops AArch64 LSE atomic_rmw support	2021-09-22 09:21:28 -07:00
Anton Kirilov	a8aec2e0e6	Cranelift AArch64: Avoid invalid encodings for some vector instructions. Copyright (c) 2021, Arm Limited.	2021-09-16 12:26:58 +01:00
Sam Parker	7da76f0601	cargo fmt	2021-09-15 16:01:51 +01:00
Sam Parker	80d596b055	AArch64 LSE atomic_rmw support. Rename the existing AtomicRMW to AtomicRMWLoop and, if LSE support is available, lower atomic_rmw operations directly without a loop. Copyright (c) 2021, Arm Limited	2021-09-15 16:01:51 +01:00
Anton Kirilov	8805e25042	Cranelift AArch64: Improve the type checks for IR operations. There were cases where the AArch64 backend assumed that an IR operation would always operate on certain types (the most likely reason being that the corresponding WebAssembly instruction did not cover anything else), even though the definition of the IR operation imposed no such constraints. Copyright (c) 2021, Arm Limited.	2021-09-13 14:46:45 +01:00
Benjamin Bouvier	85ec11acb9	Aarch64: always generate the CFA directive indicating no pointer signing	2021-09-02 09:16:34 +02:00
Alex Crichton	1532516a36	Use relative `call` instructions between wasm functions (#3275) * Use relative `call` instructions between wasm functions. This commit is a relatively major change to the way that Wasmtime generates code for Wasm modules and how functions call each other. Prior to this commit, all function calls between functions, even if they were defined in the same module, were done indirectly through a register. To implement this, the backend would emit an absolute 8-byte relocation near all function calls, load that address into a register, and then call it. While this technique is simple to implement and easy to get right, it has two primary downsides: * Function calls are always indirect, which makes them more difficult to predict, resulting in worse performance. * Generating a relocation per function call requires expensive relocation resolution at module-load time, which can be a large contributing factor to how long it takes to load a precompiled module. To fix these issues, while also somewhat compromising on the previously simple implementation technique, this commit switches wasm calls within a module to using the `colocated` flag in Cranelift-speak, which basically means that a relative call instruction is used with a relocation that's resolved relative to the pc of the call instruction itself. When switching the `colocated` flag to `true`, this commit is also able to move much of the relocation resolution from `wasmtime_jit::link` into `wasmtime_cranelift::obj` at object-construction time. This frontloads all relocation work, which means that there are actually no relocations related to function calls in the final image, solving both of the points above. The main gotcha in implementing this technique is that there are hardware limitations to relative function calls, which mean we can't simply use them blindly.
AArch64, for example, can only reach +/- 128 MB from the `bl` instruction to the target, which means that if the function we're calling is farther away, we would fail to resolve that relocation. On x86_64 the limit is +/- 2 GB, which is much larger but still theoretically possible to hit. Consequently, the main increase in implementation complexity comes from fixing this issue. This issue is actually already present in Cranelift itself, and is internally one of the invariants handled by the `MachBuffer` type: when generating a function, relative jumps between basic blocks have similar restrictions. This commit adds new methods to the `MachBackend` trait and updates the implementation of `MachBuffer` to account for all these new branches. Specifically, the changes to `MachBuffer` are: * For AArch64, the `LabelUse::Branch26` value now supports veneers, and AArch64 calls use this to resolve relocations. * The `emit_island` function has been rewritten internally to handle some cases which didn't come up before, such as: * When emitting an island, the deadline is now recalculated, where previously it was always set to infinitely far in the future. This was previously OK since only a `Branch19` supported veneers, and once it was promoted no veneers were supported, so without multiple layers of promotion the lack of a new deadline was fine. * When emitting an island, all pending fixups had veneers forced if their branch target wasn't known yet. This was generally OK for 19-bit fixups, since the only kind getting a veneer was a 19-bit fixup, but with mixed kinds it's a bit odd to force veneers for a 26-bit fixup just because a nearby 19-bit fixup needed a veneer. Instead, fixups are now re-enqueued unless they're known to be out of bounds. This may run the risk of generating more islands for 19-bit branches, but it should also reduce the number of islands for between-function calls.
* Otherwise, the internal logic was tweaked to ideally be a bit simpler, but that's a pretty subjective criterion in compilers... I've added some simple testing of this for now. A synthetic compiler option was created to add zero padding between functions, and test cases implement various forms of calls that at least need veneers. A test is also included for x86_64, but it is unfortunately pretty slow because it requires generating 2 GB of output. I'm hoping for now it's not too bad, but we can disable the test if it's prohibitive and otherwise just comment the necessary portions to be sure to run the ignored test if these parts of the code have changed. The final end result of this commit is that, for a large module I'm working with, the number of relocations dropped to zero, meaning that nothing actually needs to be done to the text section when it's loaded into memory (yay!). I haven't run final benchmarks yet, but this is the last remaining source of significant slowdown when loading modules, after I land a number of other PRs, both active ones and ones that I only have locally for now. * Fix arm32 * Review comments	2021-09-01 13:27:38 -05:00
Anton Kirilov	7b98be1bee	Cranelift: Simplify leaf functions that do not use the stack (#2960) * Cranelift AArch64: Simplify leaf functions that do not use the stack. Leaf functions that do not use the stack (e.g. do not clobber any callee-saved registers) do not need a frame record. Copyright (c) 2021, Arm Limited.	2021-08-27 12:12:37 +02:00
Anton Kirilov	a1b39276e1	Enable more CLIF tests on AArch64. The tests for the SIMD floating-point maximum and minimum operations require particular care, because the handling of NaN values is non-deterministic and may vary between platforms. There is no way to match several NaN values in a test, so the solution is to extract the non-deterministic test cases into a separate file that is then replicated for every backend under test, with adjustments made to the expected results. Copyright (c) 2021, Arm Limited.	2021-08-17 13:27:58 +01:00
Sam Parker	b6f6ac116a	Revert IR changes, along with the x64 and s390x changes. Now pattern-matching uextend(atomic_load) in the aarch64 backend.	2021-08-05 09:35:32 +01:00
Sam Parker	cbb7229457	Re-implement atomic loads and stores. The AArch64 support was a bit broken and was using Armv7-style barriers, which aren't required with Armv8 acquire-release loads/stores. The fallback CAS loops and RMW sequences for AArch64 have also been updated to use acquire-release exclusive instructions, which again remove the need for barriers. The CAS loop has also been further optimised by using the extending form of the cmp instruction. Copyright (c) 2021, Arm Limited.	2021-08-05 09:08:08 +01:00
Sam Parker	3bc2f0c701	Enable simd_X_extadd_pairwise_X for AArch64. Lower to [u|s]addlp for AArch64. Copyright (c) 2021, Arm Limited.	2021-08-03 10:25:09 +01:00
Johnnie Birch	e519fca61c	Refactor and turn on lowering for extend-add-pairwise	2021-07-31 10:52:39 -07:00
Johnnie Birch	e373ddfe1b	Add extend-add-pairwise instructions x64	2021-07-30 15:06:58 -07:00
Sam Parker	5eb2dca9f1	Added doc comment, and removed an accidental code move. Copyright (c) 2021, Arm Limited.	2021-07-28 13:14:20 +01:00
Sam Parker	f2806a9192	Rebased and ran cargo fmt. Copyright (c) 2021, Arm Limited.	2021-07-28 13:14:20 +01:00
Sam Parker	541a4ee428	Enable simd_extmul_* for AArch64. Lower simd_extmul_[low/high][signed/unsigned] to [s|u]widen of the inputs feeding an imul node. Copyright (c) 2021, Arm Limited.	2021-07-28 13:14:20 +01:00
Nick Fitzgerald	4283d2116d	cranelift: Move most debug-level logs to the trace level. Cranelift crates have historically been much more verbose with debug-level logging than most other crates in the Rust ecosystem. We log things like how many parameters a basic block has, the color of virtual registers during regalloc, etc. Even for Cranelift hackers, these things are largely only useful when hacking specifically on Cranelift and looking at a particular test case, and are not useful even when using some Cranelift embedding (such as Wasmtime). Most of the time, when people want logging for their Rust programs, they do something like `RUST_LOG=debug cargo run`. This means that they get all of that mostly unhelpful debug logging from Cranelift. So they might want to disable logging for Cranelift, or set it to a higher log level: `RUST_LOG=debug,cranelift=info cargo run`. The problem is that this is already more annoying to type than `RUST_LOG=debug`, and that Cranelift isn't one single crate, so you actually have to play whack-a-mole naming all the Cranelift crates off the top of your head, something more like this: `RUST_LOG=debug,cranelift=info,cranelift_codegen=info,cranelift_wasm=info,...`. Therefore, we're changing most of the `debug!` logs into `trace!` logs: anything that is very Cranelift-internal, unlikely to be useful/meaningful to the "average" Cranelift embedder, or that prints a message for each instruction visited during a pass. On the other hand, things that just report a one-line statistic for a whole pass, for example, are left as `debug!`. The more verbose a log message is, the higher the bar it must clear to be `debug!` rather than `trace!`.	2021-07-26 11:50:16 -07:00
Afonso Bordado	3a38400447	aarch64: Refactor lower_icmp to use a single materialize_bool_result	2021-07-19 09:31:14 -07:00
Afonso Bordado	14d1c7ee9f	aarch64: Refactor lower_icmp to allow returning a different flag	2021-07-19 09:31:14 -07:00
Afonso Bordado	e628fb376f	aarch64: Fix incorrect code generation for overflow icmp in i16 values	2021-07-19 09:31:14 -07:00
Afonso Bordado	db5566dadb	aarch64: Fix lowering amounts for shifts. This commit addresses two issues: * A panic when shifting any non-i128 type by i128 amounts (#3064) * Wrong results when lowering shifts with small types (i8, i16). For these types, when shifting by amounts larger than the size of the type, we would not get the wrapping behaviour that we see on i32 and i64. This is because for the larger types the wrapping behaviour is implemented automatically by the appropriate instruction; however, we do not have i8- and i16-specific instructions, so we have to manually wrap the shift amount with an AND instruction. This issue is also present on x86_64 and s390x, and a separate issue will be filed for those. Closes #3064	2021-07-16 22:08:02 +01:00
Anton Kirilov	6c3d7092b9	Enable the simd_conversions test for AArch64. Copyright (c) 2021, Arm Limited.	2021-07-16 22:04:45 +01:00
Johnnie Birch	d8e813204e	Fold fcvt_low_from_uinit into previously existing clif instructions	2021-07-09 10:39:05 -07:00
Johnnie Birch	2d676d838f	Implements f64x2.convert_low_i32x4_u for x64	2021-07-09 10:39:05 -07:00
Afonso Bordado	eebae8d4c8	aarch64: Fix incorrect encoding of large const values in icmp. When encoding constants as immediates into an RSE Imm12 instruction, we need to take special care to check that the value we are trying to encode does not overflow its type when viewed as a signed value (e.g. iconst.i8 200). We cannot both use an immediate and sign-extend it, so we need to lower the value into a separate register and emit the sign extension into the instruction. For more details see the [cg_clif bug report](https://github.com/bjorn3/rustc_codegen_cranelift/issues/1184#issuecomment-873214796).	2021-07-03 22:42:15 +01:00
Anton Kirilov	330f02aa09	Enable the simd_i32x4_trunc_sat_f64x2 test for AArch64. Also, reorganize the AArch64-specific VCode instructions for unary narrowing and widening vector operations, so that they are more straightforward to use. Copyright (c) 2021, Arm Limited.	2021-06-30 12:17:53 +01:00
Anton Kirilov	98f1ac789e	Enable the simd_i16x8_q15mulr_sat_s test on AArch64. Copyright (c) 2021, Arm Limited.	2021-06-28 12:24:31 +01:00
Afonso Bordado	e85eb77c45	aarch64: Implement missing atomic rmw ops	2021-06-25 07:51:46 +01:00
Chris Fallin	652f21e3e0	Merge pull request #3026 from afonso360/aarch64-elf-tls aarch64: Implement TLS ELF GD Relocations	2021-06-24 11:54:34 -07:00
Afonso Bordado	7a5948f729	aarch64: Implement lowering i128 select	2021-06-24 16:19:25 +01:00
Afonso Bordado	b8ad99e435	aarch64: Implement TLS ELF GD Relocations. Implement the `TlsValue` opcode in the aarch64 backend for ELF_GD. This is a little unusual, as other compilers default to TLS Descriptors on aarch64. However, we currently only recognize elf_gd, so let's start with that as a TLS implementation.	2021-06-24 12:21:44 +01:00
Chris Fallin	fa1a04d002	Merge pull request #3005 from afonso360/aarch64-i128-extend aarch64: Implement uextend/sextend for i128 values	2021-06-22 10:24:30 -07:00
Afonso Bordado	f25f5b2732	aarch64: Implement lowering uextend/sextend for i128 values	2021-06-22 12:24:07 +01:00
Chris Fallin	18cd2f681c	Merge pull request #3002 from afonso360/aarch64-i128-br aarch64 implement brz,brnz,br_icmp for i128 values	2021-06-21 10:52:50 -07:00
Chris Fallin	444d9f9726	Merge pull request #3008 from afonso360/aarch64-i128-ireduce aarch64: Implement ireduce for i128 values	2021-06-21 09:54:43 -07:00
Chris Fallin	4246e69c1c	Merge pull request #3004 from afonso360/aarch64-i128-rotates aarch64: Implement lowering rotl/rotr for i128 values	2021-06-21 09:53:57 -07:00
Afonso Bordado	151ad2f338	aarch64: Implement ireduce for i128 values	2021-06-20 19:04:45 +01:00
Afonso Bordado	f7f52445c8	aarch64: Implement lowering rotl/rotr for i128 values	2021-06-20 15:53:56 +01:00
Afonso Bordado	45faace329	aarch64: Implement i128 br_icmp. The previous commit deduplicated the icmp implementation, so we reuse it, with modifications for the cases where we don't need to set the results.	2021-06-19 22:01:33 +01:00

310 Commits