wasmtime

Author	SHA1	Message	Date
Chris Fallin	6e05b646a3	Merge pull request #3282 from afonso360/x64-fix-brtables cranelift: Fix `br_table` for `i64` types in x64 backend.	2021-09-02 09:58:42 -07:00
Afonso Bordado	f9ada24bcf	cranelift: Fix br_table for i64 inputs We still only support a maximum of u32::MAX entries, however we no longer crash when compiling 64 bit indexes. Fixes #3100	2021-09-02 15:31:48 +01:00
Benjamin Bouvier	85ec11acb9	Aarch64: always generate the CFA directive indicating no pointer signing	2021-09-02 09:16:34 +02:00
Benjamin Bouvier	fb94b81538	Use 16K code pages on Mac M1 Fixes #3278.	2021-09-02 09:16:34 +02:00
Alex Crichton	1532516a36	Use relative `call` instructions between wasm functions (#3275) * Use relative `call` instructions between wasm functions This commit is a relatively major change to the way that Wasmtime generates code for Wasm modules and how functions call each other. Prior to this commit all function calls between functions, even if they were defined in the same module, were done indirectly through a register. To implement this the backend would emit an absolute 8-byte relocation near all function calls, load that address into a register, and then call it. While this technique is simple to implement and easy to get right, it has two primary downsides associated with it: * Function calls are always indirect which means they are more difficult to predict, resulting in worse performance. * Generating a relocation-per-function call requires expensive relocation resolution at module-load time, which can be a large contributing factor to how long it takes to load a precompiled module. To fix these issues, while also somewhat compromising on the previously simple implementation technique, this commit switches wasm calls within a module to using the `colocated` flag enabled in Cranelift-speak, which basically means that a relative call instruction is used with a relocation that's resolved relative to the pc of the call instruction itself. When switching the `colocated` flag to `true` this commit is also then able to move much of the relocation resolution from `wasmtime_jit::link` into `wasmtime_cranelift::obj` during object-construction time. This frontloads all relocation work which means that there's actually no relocations related to function calls in the final image, solving both of our points above. The main gotcha in implementing this technique is that there are hardware limitations to relative function calls which mean we can't simply blindly use them.
AArch64, for example, can only go +/- 64 MB from the `bl` instruction to the target, which means that if the function we're calling is a greater distance away then we would fail to resolve that relocation. On x86_64 the limits are +/- 2GB which are much larger, but theoretically still feasible to hit. Consequently the main increase in implementation complexity is fixing this issue. This issue is actually already present in Cranelift itself, and is internally one of the invariants handled by the `MachBuffer` type. When generating a function, relative jumps between basic blocks have similar restrictions. This commit adds new methods for the `MachBackend` trait and updates the implementation of `MachBuffer` to account for all these new branches. Specifically the changes to `MachBuffer` are: * For AArch64 the `LabelUse::Branch26` value now supports veneers, and AArch64 calls use this to resolve relocations. * The `emit_island` function has been rewritten internally to handle some cases which previously didn't come up before, such as: * When emitting an island the deadline is now recalculated, where previously it was always set to infinitely far in the future. This was ok prior since only a `Branch19` supported veneers and once it was promoted no veneers were supported, so without multiple layers of promotion the lack of a new deadline was ok. * When emitting an island all pending fixups had veneers forced if their branch target wasn't known yet. This was generally ok for 19-bit fixups since the only kind getting a veneer was a 19-bit fixup, but with mixed kinds it's a bit odd to force veneers for a 26-bit fixup just because a nearby 19-bit fixup needed a veneer. Instead fixups are now re-enqueued unless they're known to be out-of-bounds. This may run the risk of generating more islands for 19-bit branches but it should also reduce the number of islands for between-function calls.
* Otherwise the internal logic was tweaked to ideally be a bit more simple, but that's a pretty subjective criterion in compilers... I've added some simple testing of this for now. A synthetic compiler option was created to simply add padded 0s between functions and test cases implement various forms of calls that at least need veneers. A test is also included for x86_64, but it is unfortunately pretty slow because it requires generating 2GB of output. I'm hoping for now it's not too bad, but we can disable the test if it's prohibitive and otherwise just comment the necessary portions to be sure to run the ignored test if these parts of the code have changed. The final end-result of this commit is that for a large module I'm working with the number of relocations dropped to zero, meaning that nothing actually needs to be done to the text section when it's loaded into memory (yay!). I haven't run final benchmarks yet but this is the last remaining source of significant slowdown when loading modules, after I land a number of other PRs both active and ones that I only have locally for now. * Fix arm32 * Review comments	2021-09-01 13:27:38 -05:00
bjorn3	690ea640b3	Implement fmin_pseudo and fmax_pseudo for scalars	2021-08-27 16:59:47 +02:00
Anton Kirilov	7b98be1bee	Cranelift: Simplify leaf functions that do not use the stack (#2960) * Cranelift AArch64: Simplify leaf functions that do not use the stack Leaf functions that do not use the stack (e.g. do not clobber any callee-saved registers) do not need a frame record. Copyright (c) 2021, Arm Limited.	2021-08-27 12:12:37 +02:00
Johnnie Birch	e3aae9e498	Refactor to avoid too strict assertion. Fix for 3160 and 3161. Assertion was intended for SIMD lowering of F64x2ConvertLowI32x4U	2021-08-24 19:39:47 -07:00
Anton Kirilov	a1b39276e1	Enable more CLIF tests on AArch64 The tests for the SIMD floating-point maximum and minimum operations require particular care because the handling of the NaN values is non-deterministic and may vary between platforms. There is no way to match several NaN values in a test, so the solution is to extract the non-deterministic test cases into a separate file that is subsequently replicated for every backend under test, with adjustments made to the expected results. Copyright (c) 2021, Arm Limited.	2021-08-17 13:27:58 +01:00
Sam Parker	b6f6ac116a	Revert IR changes Along with the x64 and s390x changes. Now pattern matching the uextend(atomic_load) in the aarch64 backend.	2021-08-05 09:35:32 +01:00
Sam Parker	cbb7229457	Re-implement atomic load and stores The AArch64 support was a bit broken and was using Armv7 style barriers, which aren't required with Armv8 acquire-release load/stores. The fallback CAS loops and RMW, for AArch64, have also been updated to use acquire-release, exclusive, instructions which, again, remove the need for barriers. The CAS loop has also been further optimised by using the extending form of the cmp instruction. Copyright (c) 2021, Arm Limited.	2021-08-05 09:08:08 +01:00
Sam Parker	3bc2f0c701	Enable simd_X_extadd_pairwise_X for AArch64 Lower to [u\|s]addlp for AArch64. Copyright (c) 2021, Arm Limited.	2021-08-03 10:25:09 +01:00
Johnnie Birch	e519fca61c	Refactor and turn on lowering for extend-add-pairwise	2021-07-31 10:52:39 -07:00
Johnnie Birch	e373ddfe1b	Add extend-add-pairwise instructions x64	2021-07-30 15:06:58 -07:00
Johnnie Birch	4f601edc36	Add x64 support for remaining int-to-int extend simd instructions Adds remaining support for int to int extend simd instructions. Specifically adds support for remaining I32x4->I64x2 instructions	2021-07-28 23:33:42 -07:00
Sam Parker	5eb2dca9f1	Added doc comment And removed an accidental code move. Copyright (c) 2021, Arm Limited.	2021-07-28 13:14:20 +01:00
Sam Parker	f2806a9192	rebase and ran cargo fmt Copyright (c) 2021, Arm Limited.	2021-07-28 13:14:20 +01:00
Sam Parker	541a4ee428	Enable simd_extmul_* for AArch64 Lower simd_extmul_[low/high][signed/unsigned] to [s\|u]widen inputs to an imul node. Copyright (c) 2021, Arm Limited.	2021-07-28 13:14:20 +01:00
Johnnie Birch	500f530322	Add support for i32x4_trunc_sat_f64x2_s for x64	2021-07-26 22:24:30 -07:00
Johnnie Birch	23290f0450	Add support for i32x4_trunc_sat_f64x2_u for x64	2021-07-26 22:24:30 -07:00
Johnnie Birch	5deda27977	Add support for Saturating Rounding Q-format Multiplication for x64	2021-07-26 20:32:46 -07:00
Johnnie Birch	ffec1f9b41	Fix for 3089 X64 ext_mul_i8x16 has incorrect lowering Also factors out unnecessary temp register	2021-07-26 20:06:43 -07:00
Andrew Brown	6b86984c41	x64: avoid load-coalescing SIMD operations with non-aligned loads Fixes #2943, though not as optimally as may be desired. With x64 SIMD instructions, the memory operand must be aligned--this change adds that check. There are cases, however, where we can do better--see #3106.	2021-07-26 13:39:16 -07:00
Nick Fitzgerald	4283d2116d	cranelift: Move most debug-level logs to the trace level Cranelift crates have historically been much more verbose with debug-level logging than most other crates in the Rust ecosystem. We log things like how many parameters a basic block has, the color of virtual registers during regalloc, etc. Even for Cranelift hackers, these things are largely only useful when hacking specifically on Cranelift and looking at a particular test case, not even when using some Cranelift embedding (such as Wasmtime). Most of the time, when people want logging for their Rust programs, they do something like: RUST_LOG=debug cargo run This means that they get all that mostly not useful debug logging out of Cranelift. So they might want to disable logging for Cranelift, or change it to a higher log level: RUST_LOG=debug,cranelift=info cargo run The problem is that this is already more annoying to type than `RUST_LOG=debug`, and that Cranelift isn't one single crate, so you actually have to play whack-a-mole with naming all the Cranelift crates off the top of your head, something more like this: RUST_LOG=debug,cranelift=info,cranelift_codegen=info,cranelift_wasm=info,... Therefore, we're changing most of the `debug!` logs into `trace!` logs: anything that is very Cranelift-internal, unlikely to be useful/meaningful to the "average" Cranelift embedder, or prints a message for each instruction visited during a pass. On the other hand, things that just report a one line statistic for a whole pass, for example, are left as `debug!`. The more verbose the log messages are, the higher the bar they must clear to be `debug!` rather than `trace!`.	2021-07-26 11:50:16 -07:00
Afonso Bordado	3a38400447	aarch64: Refactor lower_icmp to use a single materialize_bool_result	2021-07-19 09:31:14 -07:00
Afonso Bordado	14d1c7ee9f	aarch64: Refactor lower_icmp to allow returning a different flag	2021-07-19 09:31:14 -07:00
Afonso Bordado	e628fb376f	aarch64: Fix incorrect code generation for overflow icmp in i16 values	2021-07-19 09:31:14 -07:00
Afonso Bordado	db5566dadb	aarch64: Fix lowering amounts for shifts This commit addresses two issues: * A panic when shifting any non i128 type by i128 amounts (#3064) * Wrong results when lowering shifts with small types (i8, i16) In these types when shifting for amounts larger than the size of the type, we would not get the wrapping behaviour that we see on i32 and i64. This is because in these larger types, the wrapping behaviour is automatically implemented by using the appropriate instruction, however we do not have i8 and i16 specific instructions, so we have to manually wrap the shift amount with an AND instruction. This issue is also found on x86_64 and s390x, and a separate issue will be filed for those. Closes #3064	2021-07-16 22:08:02 +01:00
Anton Kirilov	6c3d7092b9	Enable the simd_conversions test for AArch64 Copyright (c) 2021, Arm Limited.	2021-07-16 22:04:45 +01:00
Johnnie Birch	2452a4cd74	Refactor lowering structure for ext_mul on x64 and add comments	2021-07-15 01:07:52 -07:00
Johnnie Birch	e5b6bee968	Add emit tests to ext_mul_* instructions	2021-07-15 01:07:52 -07:00
Johnnie Birch	6fbe0b72bd	Add simd_extmul_* support for x64	2021-07-15 01:07:52 -07:00
Johnnie Birch	d8e813204e	Fold fcvt_low_from_uinit into previously existing clif instructions	2021-07-09 10:39:05 -07:00
Johnnie Birch	2d676d838f	Implements f64x2.convert_low_i32x4_u for x64	2021-07-09 10:39:05 -07:00
Afonso Bordado	eebae8d4c8	aarch64: Fix incorrect encoding of large const values in icmp. When encoding constants as immediates into an RSE Imm12 instruction we need to take special care to check if the value that we are trying to input does not overflow its type when viewed as a signed value. (e.g. iconst.i8 200) We cannot both put an immediate and sign extend it, so we need to lower it into a separate reg, and emit the sign extend into the instruction. For more details see the [cg_clif bug report](https://github.com/bjorn3/rustc_codegen_cranelift/issues/1184#issuecomment-873214796).	2021-07-03 22:42:15 +01:00
Anton Kirilov	330f02aa09	Enable the simd_i32x4_trunc_sat_f64x2 test for AArch64 Also, reorganize the AArch64-specific VCode instructions for unary narrowing and widening vector operations, so that they are more straightforward to use. Copyright (c) 2021, Arm Limited.	2021-06-30 12:17:53 +01:00
Anton Kirilov	98f1ac789e	Enable the simd_i16x8_q15mulr_sat_s test on AArch64 Copyright (c) 2021, Arm Limited.	2021-06-28 12:24:31 +01:00
Chris Fallin	d42c8692bc	x64 backend: be explicit about unimplemented opcodes. As discussed in #3035, most backends have explicit `unimplemented!(...)` match-arms for opcode lowering cases that are not yet implemented; this allows the backend maintainer to easily see what is not yet implemented, and avoiding a catch-all wildcard arm is less error-prone as opcodes are added in the future. However, the x64 backend was the exception: as @akirilov-arm pointed out, it had a wildcard match arm. This fixes the issue by explicitly listing all opcodes the x64 backend does not yet implement. As per our tests, these opcodes are not used or needed by Wasm lowering; but, it is good to know that they exist, so that we can eventually either support or remove them. This was a good exercise for me as I wasn't aware of a few of these in particular: e.g., aarch64 supports `bmask` while x64 does not, and there isn't a good reason why x64 shouldn't, especially if others hope to use Cranelift as a SIMD-capable general codegen in the future. The `unimplemented!()` cases are separate from `panic!()` ones: my convention here was to split out those that are logically just missing from those that should be impossible, mostly due to expected removal by legalization before we reach the lowering step.	2021-06-26 15:29:14 -07:00
Afonso Bordado	e85eb77c45	aarch64: Implement missing atomic rmw ops	2021-06-25 07:51:46 +01:00
Chris Fallin	652f21e3e0	Merge pull request #3026 from afonso360/aarch64-elf-tls aarch64: Implement TLS ELF GD Relocations	2021-06-24 11:54:34 -07:00
Chris Fallin	7d47ba12c5	Merge pull request #3028 from cfallin/x86-legacy cranelift-codegen: move old x86 and RISC-V backends to isa/legacy/.	2021-06-24 11:38:08 -07:00
Chris Fallin	4b2723abb0	cranelift-codegen: move old x86 and RISC-V backends to isa/legacy/. These backends will be removed in the future (see bytecodealliance/rfcs#12 and the pending #3009 in this repo). In the meantime, to more clearly communicate that they are using "legacy" APIs and will eventually be removed, this PR places them in an `isa/legacy/` subdirectory. No functional changes otherwise.	2021-06-24 11:03:47 -07:00
Afonso Bordado	7a5948f729	aarch64: Implement lowering i128 select	2021-06-24 16:19:25 +01:00
Afonso Bordado	b8ad99e435	aarch64: Implement TLS ELF GD Relocations Implement the `TlsValue` opcode in the aarch64 backend for ELF_GD. This is a little bit unusual as the default TLS mechanism for aarch64 is TLS Descriptors in other compilers. However currently we only recognize elf_gd so let's start with that as a TLS implementation.	2021-06-24 12:21:44 +01:00
Chris Fallin	b8c0ac72f1	Merge pull request #3012 from uweigand/s390x-addcarry s390x: Basic support for IaddIfcout	2021-06-22 12:19:24 -07:00
Ulrich Weigand	3c678a7900	s390x: Basic support for IaddIfcout This adds enough support for the IaddIfcout opcode to make the code emitted by dynamic_addr work on s390x. Note: On s390x, the condition code mask that has to be used to implement unsigned_add_overflow_condition does not match any of the masks for the "normal" condition codes, so this design is not really a good match for s390x ...	2021-06-22 20:36:06 +02:00
Chris Fallin	4a6594c514	Merge pull request #3011 from cfallin/bint-x64 Fix `bint` on x64, and make `bextend` consistent with bool representation.	2021-06-22 11:26:20 -07:00
Chris Fallin	efe3930215	Fix `bint` on x64, and make `bextend` consistent with bool representation. There has been occasional confusion with the representation that we use for bool-typed values in registers, at least when these are wider than one bit. Does a `b8` store `true` as 1, or as all-ones (`0xff`)? We've settled on the latter because of some use-cases where the wide bool becomes a mask -- see #2058 for more on this. This is fine, and transparent, to most operations within CLIF, because the bool-typed value still has only two semantically-visible states, namely `true` and `false`. However, we have to be careful with bool-to-int conversions. `bint` on aarch64 correctly masked the all-ones value down to 0 or 1, as required by the instruction specification, but on x64 it did not. This PR fixes that bug and makes x64 consistent with aarch64. While staring at this code I realized that `bextend` was also not consistent with the all-ones invariant: it should do a sign-extend, not a zero-extend as it previously did. This is also rectified and tested. (Aarch64 also already had this case implemented correctly.) Fixes #3003.	2021-06-22 10:56:56 -07:00
Chris Fallin	fa1a04d002	Merge pull request #3005 from afonso360/aarch64-i128-extend aarch64: Implement uextend/sextend for i128 values	2021-06-22 10:24:30 -07:00
Afonso Bordado	f25f5b2732	aarch64: Implement lowering uextend/sextend for i128 values	2021-06-22 12:24:07 +01:00
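
Commit db5566dadb above notes that i8 and i16 shifts have no dedicated instructions, so the shift amount has to be wrapped to the type width with an AND before a wider shift is used. A minimal Rust sketch of that idea follows. It models the wider machine shift with ordinary integer arithmetic; the function names are hypothetical and this is not the Cranelift lowering code itself.

```rust
/// Hypothetical model of an 8-bit left shift carried out by a 32-bit shift
/// instruction WITHOUT wrapping the amount to the 8-bit width. The 32-bit
/// shift only wraps the amount modulo 32, so amounts in 8..=31 push every
/// bit out of the low byte.
fn ishl_i8_unmasked(value: u8, amount: u32) -> u8 {
    ((value as u32) << (amount % 32)) as u8
}

/// The same shift with the amount first wrapped modulo 8 by an AND, which
/// is the behaviour the commit message describes for small types.
fn ishl_i8_masked(value: u8, amount: u32) -> u8 {
    let wrapped = amount & 7; // 8-bit type: keep only the low 3 bits
    ((value as u32) << wrapped) as u8
}
```

For amounts below 8 the two functions agree. For an amount such as 9 they differ: the masked version shifts by 1, while the unmasked version shifts the value out of the byte entirely.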


698 Commits
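
Commit efe3930215 above describes wide bools as all-ones masks (a `b8` `true` is `0xff`), with `bint` masking down to 0 or 1 and `bextend` sign-extending so the wider value stays all-ones. The sketch below illustrates those two conversions on plain integers. The constant and function names are hypothetical and only stand in for the CLIF operations; this is not backend code.

```rust
/// Assumed register representation of a `b8`, per the commit message:
/// `true` is all-ones, `false` is zero.
const B8_TRUE: u8 = 0xff;
const B8_FALSE: u8 = 0x00;

/// Sketch of `bint`: a bool-to-int conversion must yield exactly 0 or 1,
/// so the all-ones value is masked down to its low bit.
fn bint_b8(b: u8) -> u32 {
    u32::from(b & 1)
}

/// Sketch of `bextend` from b8 to b32: sign-extend so that `true` remains
/// all-ones at the wider width. A zero-extend would give 0x0000_00ff,
/// which breaks the all-ones invariant.
fn bextend_b8_to_b32(b: u8) -> u32 {
    (b as i8) as i32 as u32
}
```

With this representation, `bint_b8(B8_TRUE)` is 1 and `bextend_b8_to_b32(B8_TRUE)` is `0xffff_ffff`, while both return 0 for `B8_FALSE`.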