wasmtime

Author	SHA1	Message	Date
Sam Parker	9c43749dfe	[RFC] Dynamic Vector Support (#4200 ) Introduce a new concept in the IR that allows a producer to create dynamic vector types. An IR function can now contain global value(s) that represent a dynamic scaling factor, for a given fixed-width vector type. A dynamic type is then created by 'multiplying' the corresponding global value with a fixed-width type. These new types can be used just like the existing types and the type system has a set of hard-coded dynamic types, such as I32X4XN, which the user defined types map onto. The dynamic types are also used explicitly to create dynamic stack slots, which have no set size like their existing counterparts. New IR instructions are added to access these new stack entities. Currently, during codegen, the dynamic scaling factor has to be lowered to a constant so the dynamic slots do eventually have a compile-time known size, as do spill slots. The current lowering for aarch64 just targets Neon, using a dynamic scale of 1. Copyright (c) 2022, Arm Limited.	2022-07-07 12:54:39 -07:00
Afonso Bordado	e9727b9d4b	aarch64: Fix i128 `of`/`nof` implementations (#4403 ) @yuyang-ok reported via zulip that i128 overflow tests were: 1. different from the interpreter implementation 2. wrong on some of the test cases This fixes both the tests and the aarch64 implementation and adds the interpreter to the testsuite.	2022-07-07 11:00:58 -07:00
Sam Parker	d9e0e6a6a9	[AArch64] Port min/max to ISLE (#4374 ) Copyright (c) 2022, Arm Limited.	2022-07-05 09:16:45 -07:00
Afonso Bordado	38ecd3744f	aarch64: Implement `bmask`/`bextend` in ISLE (#4358 ) * aarch64: Implement `bmask`/`bextend` in ISLE * cranelift: Remove vector versions of `bextend` * aarch64: Cleanup `bmask`/`bextend` documentation	2022-07-01 09:37:18 -07:00
bjorn3	d1446f767d	Mark return value as define instead of clobber for TLS pseudoinstructions (#4357 )	2022-06-30 10:44:51 -07:00
Afonso Bordado	42d4f97b78	cranelift: Fix `cls` for small types on aarch64 (#4305 ) The previous `cls` code was producing wrong results when fed with a -1 i8. The fix here is to sign extend instead of zero extending since we want to keep the sign bit as one in order for it to be counted correctly in the cls instruction This also merges the interpreter only tests now that aarch64 correctly supports this instruction	2022-06-27 15:55:02 -07:00
Chris Fallin	0d829a57ee	Upgrade to regalloc2 v0.2.3 to get bugfix from bytecodealliance/regalloc2#60. (#4335 ) * Upgrade to regalloc2 v0.2.3 to get bugfix from bytecodealliance/regalloc2#60. * Update RELEASES.md. * Update two compile tests based on slightly shifting regalloc output.	2022-06-27 15:58:54 -05:00
Anton Kirilov	25a588c35f	Cranelift AArch64: Use an allocated encoding for Udf (#4281 ) Preserve the current behaviour when code is generated for SpiderMonkey. Copyright (c) 2022, Arm Limited.	2022-06-22 15:03:28 +01:00
Sam Parker	acfeda4d80	[AArch64] Port IaddPairwise to ISLE (#4201 ) Copyright (c) 2022, Arm Limited.	2022-06-06 15:37:13 +01:00
Chris Fallin	b830c3cf93	Pull in regalloc2 v0.2.0, with no more separate scratch registers. (#4182 ) RA2 recently removed the need for a dedicated scratch register for cyclic moves (bytecodealliance/regalloc2#51). This has moderate positive performance impact on function bodies that were register-constrained, as it means that one more register is available. In Sightglass, I measured +5-8% on `blake3-scalar`, at least among current benchmarks.	2022-05-23 12:51:04 -07:00
Chris Fallin	32622b3e6f	Cranelift: fix use of pinned reg with SysV calling convention. (#4176 ) Previously, the pinned register (enabled by the `enable_pinned_reg` Cranelift setting and used via the `get_pinned_reg` and `set_pinned_reg` CLIF ops) was only used when Cranelift was embedded in SpiderMonkey, in order to support a pinned heap register. SpiderMonkey has its own calling convention in Cranelift (named after the integration layer, "Baldrdash"). However, the feature is more general, and should be usable with the default system calling convention too, e.g. SysV or Windows Fastcall. This PR fixes the ABI code to properly treat the pinned register as a globally allocated register -- and hence an implicit input and output to every function, not saved/restored in the prologue/epilogue -- for SysV on x86-64 and aarch64, and Fastcall on x86-64. Fixes #4170.	2022-05-23 09:18:51 -07:00
Chris Fallin	02d5edc591	Upgrade to regalloc2 0.1.3. (#4157 ) * Upgrade to regalloc2 0.1.3. This pulls in bytecodealliance/regalloc2#49, which slightly improves codegen in some cases where a safepoint (for reference-typed values) occurs in the same liverange as a register-constrained use. For example, in bytecodealliance/wasmtime#3785, an extra move instruction appeared and a callee-save register was used (necessitating a more expensive prologue) because of suboptimal splitting heuristics, which this PR fixes. The updated RA2 heuristics appear to have no measured downsides in existing benchmarks and improve the manually-observed codegen issue. * Update filetests where regalloc2 improvement altered behavior with reftypes.	2022-05-18 11:48:40 -07:00
Anton Kirilov	edf07a8da6	Cranelift AArch64: Migrate Bitselect and Vselect to ISLE (#4139 ) Copyright (c) 2022, Arm Limited.	2022-05-16 09:39:28 -07:00
Sam Parker	12b4374cd5	[AArch64] Port atomic rmw to ISLE (#4021 ) Also fix and extend the current implementation: - AtomicRMWOp::Clr != AtomicRmwOp::And, as the input needs to be inverted first. - Inputs to the cmp for the RMWLoop case are sign-extended when needed. - Lower Xchg to Swp. - Lower Sub to Add with a negated input. - Added more runtests. Copyright (c) 2022, Arm Limited.	2022-04-27 13:13:59 -07:00
Sam Parker	e142f587a7	[AArch64] Refactor ALUOp3 (#3950 ) As well as adding generic pattern for msub along with runtests for madd and msub. Copyright (c) 2022, Arm Limited.	2022-04-14 12:16:56 -07:00
Chris Fallin	a0318f36f0	Switch Cranelift over to regalloc2. (#3989 ) This PR switches Cranelift over to the new register allocator, regalloc2. See [this document](https://gist.github.com/cfallin/08553421a91f150254fe878f67301801) for a summary of the design changes. This switchover has implications for core VCode/MachInst types and the lowering pass. Overall, this change brings improvements to both compile time and speed of generated code (runtime), as reported in #3942: ``` Benchmark Compilation (wallclock) Execution (wallclock) blake3-scalar 25% faster 28% faster blake3-simd no diff no diff meshoptimizer 19% faster 17% faster pulldown-cmark 17% faster no diff bz2 15% faster no diff SpiderMonkey, 21% faster 2% faster fib(30) clang.wasm 42% faster N/A ```	2022-04-14 10:28:21 -07:00
Andrew Brown	bd6fe11ca9	cranelift: remove `load_complex` and `store_complex` (#3976 ) This change removes all variants of `load_complex` and `store_complex` from Cranelift; this is a breaking change to the instructions exposed by CLIF. The complete list of instructions removed is: `load_complex`, `store_complex`, `uload8_complex`, `sload8_complex`, `istore8_complex`, `sload8_complex`, `uload16_complex`, `sload16_complex`, `istore16_complex`, `uload32_complex`, `sload32_complex`, `istore32_complex`, `uload8x8_complex`, `sload8x8_complex`, `sload16x4_complex`, `uload16x4_complex`, `uload32x2_complex`, `sload32x2_complex`. The rationale for this removal is that the Cranelift backend now has the ability to pattern-match multiple upstream additions in order to calculate the address to access. Previously, this was not possible so the `*_complex` instructions were needed. Over time, these instructions have fallen out of use in this repository, making the additional overhead of maintaining them a chore.	2022-03-31 10:05:10 -07:00
FreddieLiardet	13b9396931	Add vector compare to 0 optims (#3887 ) Signed-off-by: Freddie Liardet <frederick.liardet@arm.com>	2022-03-09 16:20:06 -08:00
Chris Fallin	24f145cd1e	Migrate clz, ctz, popcnt, bitrev, is_null, is_invalid on x64 to ISLE. (#3848 )	2022-02-28 09:45:13 -08:00
Sam Parker	d307a4ab9a	[AArch64] Improve AtomicRMWLoop (#3839 ) Add more tests, use accurate disassembly, respect data sizes and simplify the Xchg implementation. Copyright (c) 2022, Arm Limited	2022-02-23 10:47:59 -08:00
Freddie Liardet	b5531580e7	Improve code generation for floating-point constants Copyright (c) 2022, Arm Limited.	2022-01-18 10:39:05 +00:00
Alex Crichton	1ef0abb12c	Update lots of `isa//.clif` tests to `precise-output` (#3677 ) * Update lots of `isa//.clif` tests to `precise-output` This commit goes through the `aarch64` and `x64` subdirectories and subjectively changes tests from `test compile` to add `precise-output`. This then auto-updates all the test expectations so they can be automatically instead of manually updated in the future. Not all tests were migrated, largely subject to the whims of myself, mainly looking to see if the test was looking for specific instructions or just checking the whole assembly output. * Filter out `;;` comments from test expctations Looks like the cranelift parser picks up all comments, not just those trailing the function, so use a convention where `;;` is used for human-readable-comments in test cases and `;`-prefixed comments are the test expectation.	2022-01-10 13:38:23 -06:00
Alex Crichton	a8ea0ec097	cranelift: Add ability to auto-update test expectations (#3612 ) * cranelift: Add ability to auto-update test expectations One of the problems of the current `.clif` testing is that the files are difficult to update when widespread changes are made (such as removing modification of the frame pointer). Additionally when changing register allocation or similar it can cause a large number of changes in tests but the tests themselves didn't actually break. For this reason this commit adds the ability to automatically update test expectations. The idea behind this commit is that tests of the form `test compile` can also optionally be flagged with the `precise-output` flag: test compile precise-output and when doing so the compiled form of each function is asserted to 100% match the following comments and their test expectations. If a match is not found then a `BLESS=1` environment variable can be used to automatically rewrite the test file itself with the correct assertion. If the environment variable isn't present and the expectation doesn't match then the test fails. It's hoped that, if approved, a follow-up commit can add `precise-output` to all current `test compile` tests (or make it the default) and all tests can be mass-updated. When developing locally test expectations need not be written and instead tests can be run with `BLESS=1` and the output can be manually verified. The environment variable will not be present on CI which means that changes to the output which don't also change the test expectation will cause CI to fail. Furthermore this should still make updates to the test output easily readable in review on CI because the test expectations are intended to look the same as before. Closes #1539 Use raw vcode output in tests * Fix a merge conflict * Review comments	2022-01-10 11:59:45 -06:00
Alex Crichton	72e2b7fe80	aarch64: Migrate bitrev/clz/cls/ctz to ISLE (#3658 ) This commit migrates these existing instructions to ISLE from the manual lowerings implemented today. This was mostly straightforward but while I was at it I fixed what appeared to be broken translations for I{8,16} for `clz`, `cls`, and `ctz`. Previously the lowerings would produce results as-if the input was 32-bits, but now I believe they all correctly account for the bit-width.	2022-01-06 15:18:32 -06:00
Chris Fallin	833ebeed76	Fix spillslot size bug in SIMD by removing type-dependent spillslot allocation. This patch makes spillslot allocation, spilling and reloading all based on register class only. Hence when we have a 32- or 64-bit value in a 128-bit XMM register on x86-64 or vector register on aarch64, this results in larger spillslots and spills/restores. Why make this change, if it results in less efficient stack-frame usage? Simply put, it is safer: there is always a risk when allocating spillslots or spilling/reloading that we get the wrong type and make the spillslot or the store/load too small. This was one contributing factor to CVE-2021-32629, and is now the source of a fuzzbug in SIMD code that puns an arbitrary user-controlled vector constant over another stackslot. (If this were a pointer, that could result in RCE. SIMD is not yet on by default in a release, fortunately. In particular, we have not been particularly careful about using moves between values of different types, for example with `raw_bitcast` or with certain SIMD operations, and such moves indicate to regalloc.rs that vregs are in equivalence classes and some arbitrary vreg in the class is provided when allocating the spillslot or spilling/reloading. Since regalloc.rs does not track actual type, and since we haven't been careful about moves, we can't really trust this "arbitrary vreg in equivalence class" to provide accurate type information. In the fix to CVE-2021-32629 we fixed this for integer registers by always spilling/reloading 64 bits; this fix can be seen as the analogous change for FP/vector regs.	2022-01-04 13:24:40 -08:00
Alex Crichton	e94ebc2263	aarch64: Translate rot{r,l} to ISLE (#3614 ) This commit translates the `rotl` and `rotr` lowerings already existing to ISLE. The port was relatively straightforward with the biggest changing being the instructions generated around i128 rotl/rotr primarily due to register changes.	2021-12-17 12:37:17 -06:00
Alex Crichton	d8974ce6bc	aarch64: Migrate ishl/ushr/sshr to ISLE (#3608 ) * aarch64: Migrate ishl/ushr/sshr to ISLE This commit migrates the `ishl`, `ushr`, and `sshr` instructions to ISLE. These involve special cases for almost all types of integers (including vectors) and helper functions for the i128 lowerings since the i128 lowerings look to be used for other instructions as well. This doesn't delete the i128 lowerings in the Rust code just yet because they're still used by Rust lowerings, but they should be deletable in due time once those lowerings are translated to ISLE. * Use more descriptive names for i128 lowerings * Use a with_flags-lookalike for csel * Use existing `with_flags_` Coment backwards order * Update generated code	2021-12-16 17:37:53 -06:00
Alex Crichton	4236319a53	aarch64: Migrate some bit-ops to ISLE (#3602 ) * aarch64: Migrate some bit-ops to ISLE This commit migrates these instructions to ISLE: * `bnot` * `band` * `bor` * `bxor` * `band_not` * `bor_not` * `bxor_not` The translations were relatively straightforward but the interesting part here was trying to reduce the duplication between all these instructions. I opted for a route that's similar to what the lowering does today, having a `decl` which takes the `ALUOp` and then performs further pattern matching internally. This enabled each instruction's lowering to be pretty simple while we still get to handle all the fancy cases of shifts, constants, etc, for each instruction. * Actually delete previous lowerings * Remove dead code	2021-12-15 10:41:36 -06:00
Alex Crichton	d89410ec4e	aarch64: Migrate `uextend`/`sextend` to ISLE This commit migrates the sign/zero extension instructions from `lower_inst.rs` to ISLE. There's actually a fair amount going on in this migration since a few other pieces needed touching up along the way as well: * First is the actual migration of `uextend` and `sextend`. These instructions are relatively simple but end up having a number of special cases. I've attempted to replicate all the cases here but double-checks would be good. * This commit actually fixes a few issues where if the result of a vector extraction is sign/zero-extended into i128 that actually results in panics in the current backend. * This commit adds exhaustive testing for extension-of-a-vector-extraction is a noop wrt extraction. * A bugfix around ISLE glue was required to get this commit working, notably the case where the `RegMapper` implementation was trying to map an input to an output (meaning ISLE was passing through an input unmodified to the output) wasn't working. This requires a `mov` instruction to be generated and this commit updates the glue to do this. At the same time this commit updates the ISLE glue to share more infrastructure between x64 and aarch64 so both backends get this fix instead of just aarch64. Overall I think that the translation to ISLE was a net benefit for these instructions. It's relatively obvious what all the cases are now unlike before where it took a few reads of the code and some boolean switches to figure out which path was taken for each flavor of input. I think there's still possible improvements here where, for example, the `put_in_reg_{s,z}ext64` helper doesn't use this logic so technically those helpers could also pattern match the "well atomic loads and vector extractions automatically do this for us" but that's a possible future improvement for later (and shouldn't be too too hard with some ISLE refactoring).	2021-12-14 07:01:37 -08:00
Alex Crichton	20e090b114	aarch64: Migrate {s,u}{div,rem} to ISLE (#3572 ) * aarch64: Migrate {s,u}{div,rem} to ISLE This commit migrates four different instructions at once to ISLE: * `sdiv` * `udiv` * `srem` * `urem` These all share similar codegen and center around the `div` instruction to use internally. The main feature of these was to model the manual traps since the `div` instruction doesn't trap on overflow, instead requiring manual checks to adhere to the semantics of the instruction itself. While I was here I went ahead and implemented an optimization for these instructions when the right-hand-side is a constant with a known value. For `udiv`, `srem`, and `urem` if the right-hand-side is a nonzero constant then the checks for traps can be skipped entirely. For `sdiv` if the constant is not 0 and not -1 then additionally all checks can be elided. Finally if the right-hand-side of `sdiv` is -1 the zero-check is elided, but it still needs a check for `i64::MIN` on the left-hand-side and currently there's a TODO where `-1` is still checked too. * Rebasing and review conflicts	2021-12-13 17:27:11 -06:00
Alex Crichton	98ce029bbd	Add commutative addition cases	2021-11-19 07:00:04 -08:00
Alex Crichton	7d0f6ab90f	aarch64: Migrate `iadd` and `isub` to ISLE This commit is the first "meaty" instruction added to ISLE for the AArch64 backend. I chose to pick the first two in the current lowering's `match` statement, `isub` and `iadd`. These two turned out to be particularly interesting for a few reasons: * Both had clearly migratable-to-ISLE behavior along the lines of special-casing per type. For example 128-bit and vector arithmetic were both easily translateable. * The `iadd` instruction has special cases for fusing with a multiplication to generate `madd` which is expressed pretty easily in ISLE. * Otherwise both instructions had a number of forms where they attempted to interpret the RHS as various forms of constants, extends, or shifts. There's a bit of a design space of how best to represent this in ISLE and what I settled on was to have a special case for each form of instruction, and the special cases are somewhat duplicated between `iadd` and `isub`. There's custom "extractors" for the special cases and instructions that support these special cases will have an `rule`-per-case. Overall I think the ISLE transitioned pretty well. I don't think that the aarch64 backend is going to follow the x64 backend super closely, though. For example the x64 backend is having a helper-per-instruction at the moment but with AArch64 it seems to make more sense to only have a helper-per-enum-variant-of-`MInst`. This is because the same instruction (e.g. `ALUOp::Sub32`) can be expressed with multiple different forms depending on the payload. It's worth noting that the ISLE looks like it's a good deal larger than the code actually being removed from lowering as part of this commit. I think this is deceptive though because a lot of the logic in `put_input_in_rse_imm12_maybe_negated` and `alu_inst_imm12` is being inlined into the ISLE definitions for each instruction instead of having it all packed into the helper functions. Some of the "boilerplate" here is the addition of various ISLE utilities as well.	2021-11-19 06:51:38 -08:00
bjorn3	3f87b768d5	Update filetests	2021-10-11 17:44:21 +02:00
Sam Parker	80d596b055	AArch64 LSE atomic_rmw support Rename the existing AtomicRMW to AtomicRMWLoop and directly lower atomic_rmw operations, without a loop if LSE support is available. Copyright (c) 2021, Arm Limited	2021-09-15 16:01:51 +01:00
Anton Kirilov	7b98be1bee	Cranelift: Simplify leaf functions that do not use the stack (#2960 ) * Cranelift AArch64: Simplify leaf functions that do not use the stack Leaf functions that do not use the stack (e.g. do not clobber any callee-saved registers) do not need a frame record. Copyright (c) 2021, Arm Limited.	2021-08-27 12:12:37 +02:00
Sam Parker	b6f6ac116a	Revert IR changes Along with the x64 and s390x changes. Now pattern matching the uextend(atomic_load) in the aarch64 backend.	2021-08-05 09:35:32 +01:00
Sam Parker	cbb7229457	Re-implement atomic load and stores The AArch64 support was a bit broken and was using Armv7 style barriers, which aren't required with Armv8 acquire-release load/stores. The fallback CAS loops and RMW, for AArch64, have also been updated to use acquire-release, exclusive, instructions which, again, remove the need for barriers. The CAS loop has also been further optimised by using the extending form of the cmp instruction. Copyright (c) 2021, Arm Limited.	2021-08-05 09:08:08 +01:00
Sam Parker	3bc2f0c701	Enable simd_X_extadd_pairwise_X for AArch64 Lower to [u\|s]addlp for AArch64. Copyright (c) 2021, Arm Limited.	2021-08-03 10:25:09 +01:00
Alex Crichton	63a3bbbf5a	Change VMMemoryDefinition::current_length to `usize` (#3134 ) * Change VMMemoryDefinition::current_length to `usize` This commit changes the definition of `VMMemoryDefinition::current_length` to `usize` from its previous definition of `u32`. This is a pretty impactful change because it also changes the cranelift semantics of "dynamic" heaps where the bound global value specifier must now match the pointer type for the platform rather than the index type for the heap. The motivation for this change is that the `current_length` field (or bound for the heap) is intended to reflect the current size of the heap. This is bound by `usize` on the host platform rather than `u32` or` u64`. The previous choice of `u32` couldn't represent a 4GB memory because we couldn't put a number representing 4GB into the `current_length` field. By using `usize`, which reflects the host's memory allocation, this should better reflect the size of the heap and allows Wasmtime to support a full 4GB heap for a wasm program (instead of 4GB minus one page). This commit also updates the legalization of the `heap_addr` clif instruction to appropriately cast the address to the platform's pointer type, handling bounds checks along the way. The practical impact for today's targets is that a `uextend` is happening sooner than it happened before, but otherwise there is no intended impact of this change. In the future when 64-bit memories are supported there will likely need to be fancier logic which handles offsets a bit differently (especially in the case of a 64-bit memory on a 32-bit host). The clif `filetest` changes should show the differences in codegen, and the Wasmtime changes are largely removing casts here and there. Closes #3022 * Add tests for memory.size at maximum memory size * Add a dfg helper method	2021-08-02 13:09:40 -05:00
Sam Parker	541a4ee428	Enable simd_extmul_* for AArch64 Lower simd_extmul_[low/high][signed/unsigned] to [s\|u]widen inputs to an imul node. Copyright (c) 2021, Arm Limited.	2021-07-28 13:14:20 +01:00
Afonso Bordado	db5566dadb	aarch64: Fix lowering amounts for shifts This commit addresses two issues: * A panic when shifting any non i128 type by i128 amounts (#3064) * Wrong results when lowering shifts with small types (i8, i16) In these types when shifting for amounts larger than the size of the type, we would not get the wrapping behaviour that we see on i32 and i64. This is because in these larger types, the wrapping behaviour is automatically implemented by using the appropriate instruction, however we do not have i8 and i16 specific instructions, so we have to manually wrap the shift amount with an AND instruction. This issue is also found on x86_64 and s390x, and a separate issue will be filed for those. Closes #3064	2021-07-16 22:08:02 +01:00
Chris Fallin	652f21e3e0	Merge pull request #3026 from afonso360/aarch64-elf-tls aarch64: Implement TLS ELF GD Relocations	2021-06-24 11:54:34 -07:00
Afonso Bordado	7a5948f729	aarch64: Implement lowering i128 select	2021-06-24 16:19:25 +01:00
Afonso Bordado	b8ad99e435	aarch64: Implement TLS ELF GD Relocations Implement the `TlsValue` opcode in the aarch64 backend for ELF_GD. This is a little bit unusual as the default TLS mechanism for aarch64 is TLS Descriptors in other compilers. However currently we only recognize elf_gd so lets start with that as a TLS implementation.	2021-06-24 12:21:44 +01:00
Chris Fallin	fa1a04d002	Merge pull request #3005 from afonso360/aarch64-i128-extend aarch64: Implement uextend/sextend for i128 values	2021-06-22 10:24:30 -07:00
Afonso Bordado	f25f5b2732	aarch64: Implement lowering uextend/sextend for i128 values	2021-06-22 12:24:07 +01:00
Chris Fallin	18cd2f681c	Merge pull request #3002 from afonso360/aarch64-i128-br aarch64 implement brz,brnz,br_icmp for i128 values	2021-06-21 10:52:50 -07:00
Chris Fallin	444d9f9726	Merge pull request #3008 from afonso360/aarch64-i128-ireduce aarch64: Implement ireduce for i128 values	2021-06-21 09:54:43 -07:00
Afonso Bordado	151ad2f338	aarch64: Implement ireduce for i128 values	2021-06-20 19:04:45 +01:00
Afonso Bordado	f7f52445c8	aarch64: Implement lowering rotl/rotr for i128 values	2021-06-20 15:53:56 +01:00

1 2 3

141 Commits