wasmtime

Author	SHA1	Message	Date
Nick Fitzgerald	c77bec4dcb	Cranelift: don't `emit` inside lowering rules for aarch64 (#4572 ) * Cranelift: Don't `emit` inside lowering rules in aarch64 The lowering rules should be "pure" and side-effect free, using helpers defined in `inst.isle` to perform actual side effects like emitting instructions. * Cranelift: use 80 width for section separators in aarch64 lowering rules	2022-08-01 16:43:42 -07:00
Trevor Elliott	586ec95c11	ISLE: Allow shadowing in let expressions (#4562 ) * Support shadowing in isle * Re-run the isle build.rs if the examples change * Print error messages when isle tests fail * Move run tests * Refactor `let` uses that don't need to introduce unique names	2022-08-01 21:10:28 +00:00
Trevor Elliott	25782b527e	x64: Migrate trapif and trapff to ISLE (#4545 ) https://github.com/bytecodealliance/wasmtime/pull/4545	2022-08-01 11:24:11 -07:00
Anton Kirilov	a47a82d2e5	Cranelift AArch64: Harden the Spectre mitigations (#4555 ) Use the `CSDB` instruction following Arm's recommendation. Copyright (c) 2022, Arm Limited.	2022-08-01 10:20:48 -07:00
Benjamin Bouvier	8d0224341c	cranelift: Introduce a feature to enable `trace` logs (#4484 ) * Don't use `log::trace` directly but a feature-enabled `trace` macro * Don't emit disassembly based on the log level	2022-08-01 11:19:15 +02:00
Afonso Bordado	1f058a02c0	cranelift: Add MinGW `fma` regression tests (#4517 ) * cranelift: Add MinGW `fma` regression tests * cranelift: Fix FMA in interpreter * cranelift: Add separate `fma` test suite for the interpreter The interpreter can run `fma.clif` on most platforms, however on `x86_64-pc-windows-gnu` we use libm which has issues with some inputs. We should delete `fma-interpreter.clif` and enable the interpreter on the main `fma.clif` file once those are fixed.	2022-07-29 09:09:37 -05:00
Damian Heaton	5e3bb588a8	Port `Fence`, `IsNull`/`IsInvalid` & `Debugtrap` to ISLE (AArch64) (#4548 ) Ported the existing implementation of the following Opcodes for AArch64 to ISLE: - `Fence` - `IsNull` - `IsInvalid` - `Debugtrap` Copyright (c) 2022 Arm Limited	2022-07-28 15:36:13 -07:00
Trevor Elliott	29d4edc76b	x64: Migrate call and call_indirect to ISLE (#4542 ) https://github.com/bytecodealliance/wasmtime/pull/4542	2022-07-28 13:10:03 -07:00
Afonso Bordado	0508932174	cranelift: Align Scalar and SIMD shift semantics (#4520 ) * cranelift: Reorganize test suite Group some SIMD operations by instruction. * cranelift: Deduplicate some shift tests Also, new tests with the mod behaviour * aarch64: Lower shifts with mod behaviour * x64: Lower shifts with mod behaviour * wasmtime: Don't mask SIMD shifts	2022-07-27 17:54:00 +00:00
Trevor Elliott	7ac6134894	x64: Shrink Inst from 72 to 48 bytes (#4514 ) https://github.com/bytecodealliance/wasmtime/pull/4514	2022-07-27 10:39:22 -07:00
Jamey Sharp	799e8919fe	Don't allocate in DataFlowGraph::block_param_types (#4538 ) DHAT reports that when compiling the Spidermonkey Sightglass benchmark, there are over 100k of these Vec allocations, averaging less than 4 bytes, and with an average lifetime of only about 500 instructions. This function is only called from one place, which immediately converts it into an iterator. So this commit just returns the iterator that was previously being collected into a Vec. The iterator has to borrow from the DataFlowGraph, so this would change borrow-check results, but in the one caller that turns out to be okay. (That sole caller is in cranelift/codegen/src/machinst/lower.rs, in Lower::lower().) According to Sightglass, this is a compile-time improvement of between 2% and 12% on the Spidermonkey benchmark: instantiation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm Δ = 14628.76 ± 10318.59 (confidence = 99%) main-0e6ffd024.so is 0.87x to 0.98x faster than no-small-vecs.so! no-small-vecs.so is 1.02x to 1.14x faster than main-0e6ffd024.so! [142023 187464.24 301522] main-0e6ffd024.so [103742 172835.48 263917] no-small-vecs.so compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm Δ = 362392705.93 ± 267070467.06 (confidence = 99%) main-0e6ffd024.so is 0.89x to 0.98x faster than no-small-vecs.so! no-small-vecs.so is 1.02x to 1.12x faster than main-0e6ffd024.so! [3655734131 5522594697.83 6471126699] main-0e6ffd024.so [3278129811 5160201991.90 5810600015] no-small-vecs.so	2022-07-27 01:59:18 +00:00
Anton Kirilov	ead6edb0c5	Cranelift AArch64: Migrate Splat to ISLE (#4521 ) Copyright (c) 2022, Arm Limited.	2022-07-26 17:57:15 +00:00
Anton Kirilov	d041c4b376	Cranelift AArch64: Further integral constant fixes (#4530 ) Copyright (c) 2022, Arm Limited.	2022-07-26 09:35:06 -07:00
Afonso Bordado	02c3b47db2	x64: Implement SIMD `fma` (#4474 ) * x64: Add VEX Instruction Encoder This uses a similar builder pattern to the EVEX Encoder. Does not yet support memory accesses. * x64: Add FMA Flag * x64: Implement SIMD `fma` * x64: Use 4 register Vex Inst * x64: Reorder VEX pretty print args	2022-07-25 22:01:02 +00:00
Trevor Elliott	9e9e043174	x64: Migrate the return and fallthrough_return lowerings to ISLE (#4518 ) https://github.com/bytecodealliance/wasmtime/pull/4518	2022-07-25 21:28:52 +00:00
Damian Heaton	3ef89b7787	Allow 64-bit vectors and implement for interpreter (#4509 ) * Allow 64-bit vectors and implement for interpreter The AArch64 backend already supports 64-bit vectors; this simply allows instructions to make use of that. Implemented support for 64-bit vectors within the interpreter to allow interpret runtests to use them. Copyright (c) 2022 Arm Limited * Disable 64-bit SIMD `iaddpairwise` tests on s390x Copyright (c) 2022 Arm Limited	2022-07-25 13:00:43 -07:00
Sam Parker	c5ddb4b803	[AArch64] Port SIMD narrowing to ISLE (#4478 ) * [AArch64] Port SIMD narrowing to ISLE Fvdemote, snarrow, unarrow and uunarrow. Also refactor the aarch64 instruction descriptions to parameterize on ScalarSize instead of using different opcodes. The zero_value pure constructor has been introduced and used by the integer narrow operations and it replaces, and extends, the compare zero patterns. Copyright (c) 2022, Arm Limited. * use short 'if' patterns	2022-07-25 12:40:36 -07:00
Ulrich Weigand	dd40bf075a	s390x: Enable more runtests, and fix a few bugs (#4516 ) This enables more runtests to be executed on s390x. Doing so uncovered two back-end bugs, which are fixed as well: - The result of cls was always off by one. - The result of popcnt.i16 has uninitialized high bits. In addition, I found a bug in the load-op-store.clif test case: v3 = heap_addr.i64 heap0, v1, 4 v4 = iconst.i64 42 store.i32 v4, v3 This was clearly intended to perform a 32-bit store, but actually performs a 64-bit store (it seems the type annotation of the store opcode is ignored, and the type of the operand is used instead). That bug did not show any noticeable symptoms on little-endian architectures, but broke on big-endian.	2022-07-25 12:37:06 -07:00
Trevor Elliott	ee7e4f4c6b	x64: Port func_addr and symbol_value to ISLE (#4485 ) https://github.com/bytecodealliance/wasmtime/pull/4485	2022-07-25 11:11:16 -07:00
Afonso Bordado	af62037f62	cranelift: Restrict `br_table` to `i32` indices (#4510 ) * cranelift: Restrict `br_table` to `i32` indices In #4498 it was proposed that we should only accept `i32` indices to `br_table`. The rationale for this is that larger types lead the users to a false sense of flexibility (since we don't support jump tables larger than u32's), and narrower types are not well tested paths that would be safer if we removed them. * cranelift: Reduce directly from i128 to i32 in Switch	2022-07-22 23:32:40 +00:00
Damian Heaton	f1a0c40a53	Convert `sqrt`..`nearest` to ISLE (AArch64) (#4508 ) Converted the existing implementations for the following opcodes to ISLE on AArch64: - `sqrt` - `fneg` - `fabs` - `fpromote` - `fdemote` - `ceil` - `floor` - `trunc` - `nearest` Copyright (c) 2022 Arm Limited	2022-07-22 14:48:07 -07:00
Nick Fitzgerald	b24c561ceb	cranelift: Don't log CLIF and assembly at debug level (#4503 ) Too verbose. Only log them at trace level.	2022-07-21 15:31:05 -07:00
Ulrich Weigand	fd639dd044	s390x: Support preserve_frame_pointers flag (#4477 ) On s390x, we do not have a frame pointer that can be used to chain stack frames for easy unwinding. Instead, our ABI defines a stack "backchain" mechanism that can be used to the same effect. This PR uses that backchain mechanism to implement the new preserve_frame_pointers flags introduced here: https://github.com/bytecodealliance/wasmtime/pull/4469	2022-07-21 10:09:16 -07:00
Benjamin Bouvier	4ce329d1eb	Add a cranelift flag to enable/disable verbose logs for regalloc2 (#4481 )	2022-07-21 09:12:13 +00:00
Trevor Elliott	06407dd337	Add a test to prevent x64 Inst size slipping further (#4489 ) * Add a test to prevent x64 Inst size slipping further * Enable the test for 64-bit pointers only	2022-07-21 00:01:33 +00:00
Alex Crichton	2127c3a369	Fix CI for `main` (#4486 ) * Skip new `table_ops` test under emulation When emulating we already have to disable most pooling-allocator related tests so this commit carries over that logic to the new fuzz test which may run some configurations with the pooling allocator depending on the random input. * Fix panics in s390x codegen related to aliases This commit fixes an issue introduced as part of the fix for GHSA-5fhj-g3p3-pq9g. The `reftyped_vregs` list given to `regalloc2` is not allowed to have duplicates in it and while the list originally doesn't have duplicates once aliases are applied the list may have duplicates. The fix here is to perform another pass to remove duplicates after the aliases have been processed.	2022-07-20 21:39:59 +00:00
Anton Kirilov	2ba4bce5cc	Merge pull request from GHSA-7f6x-jwh5-m9r4 Copyright (c) 2022, Arm Limited.	2022-07-20 11:53:56 -05:00
Alex Crichton	2154c63de9	Merge pull request from GHSA-5fhj-g3p3-pq9g * Improve cranelift disassembly of stack maps Print out extra information about stack maps such as their contents and other related metadata available. Additionally also print out addresses in hex to line up with the disassembly otherwise printed as well. * Improve the `table_ops` fuzzer * Generate more instructions by default * Fix negative indices appearing in `table.{get,set}` * Assert that the traps generated are expected to prevent accidental other errors reporting a fuzzing success. * Fix `reftype_vregs` reported to `regalloc2` This fixes a mistake in the register allocation of Cranelift functions where functions using reference-typed arguments incorrectly report which virtual registers are reference-typed values if there are vreg aliases in play. The fix here is to apply the vreg aliases to the final list of reftyped regs which is eventually passed to `regalloc2`. The main consequence of this fix is that functions which previously accidentally didn't have correct stack maps should now have the missing stack maps. * Add a test that `table_ops` gc's eventually * Add a comment about new alias resolution * Update crates/fuzzing/src/oracles.rs Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com> * Add some comments Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>	2022-07-20 11:52:23 -05:00
Nick Fitzgerald	22d91a7c84	cranelift: Add a flag for preserving frame pointers (#4469 ) Preserving frame pointers -- even inside leaf functions -- makes it easy to capture the stack of a running program, without requiring any side tables or metadata (like `.eh_frame` sections). Many sampling profilers and similar tools walk frame pointers to capture stacks. Enabling this option will play nice with those tools.	2022-07-20 08:02:21 -07:00
Jeffrey Charles	d55eb64b9e	Enable generating debug symbols on AArch64 (#4468 )	2022-07-19 19:12:07 +00:00
Damian Heaton	00ac18c866	Convert `fadd`..`fmax_pseudo` to ISLE (AArch64) (#4452 ) Converted the existing implementations for the following Opcodes to ISLE on AArch64: - `fadd` - `fsub` - `fmul` - `fdiv` - `fmin` - `fmax` - `fmin_pseudo` - `fmax_pseudo` Copyright (c) 2022 Arm Limited	2022-07-19 12:03:05 -07:00
Ulrich Weigand	b18c9bee15	s390x: Small refactoring of ABI code (#4465 ) Remove a bit of duplicated code and eliminate some unnecessary allocations.	2022-07-19 11:59:30 -07:00
Trevor Elliott	b519c975cb	x64: Port fdemote and fvdemote to ISLE (#4449 ) https://github.com/bytecodealliance/wasmtime/pull/4449	2022-07-18 14:26:23 -07:00
Ulrich Weigand	638dc4e0b3	s390x: Implement full SIMD support (#4427 ) This adds full support for all Cranelift SIMD instructions to the s390x target. Everything is matched fully via ISLE. In addition to adding support for many new instructions, and the lower.isle code to match all SIMD IR patterns, this patch also adds ABI support for vector types. In particular, we now need to handle the fact that vector registers 8 .. 15 are partially callee-saved, i.e. the high parts of those registers (which correspond to the old floating-point registers) are callee-saved, but the low parts are not. This is the exact same situation that we already have on AArch64, and so this patch uses the same solution (the is_included_in_clobbers callback). The bulk of the changes are platform-specific, but there are a few exceptions: - Added ISLE extractors for the Immediate and Constant types, to enable matching the vconst and swizzle instructions. - Added a missing accessor for call_conv to ABISig. - Fixed endian conversion for vector types in data_value.rs to enable their use in runtests on the big-endian platforms. - Enabled (nearly) all SIMD runtests on s390x. [ Two test cases remain disabled due to vector shift count semantics, see below. ] - Enabled all Wasmtime SIMD tests on s390x. There are three minor issues, called out via FIXMEs below, which should be addressed in the future, but should not be blockers to getting this patch merged. I've opened the following issues to track them: - Vector shift count semantics https://github.com/bytecodealliance/wasmtime/issues/4424 - is_included_in_clobbers vs. link register https://github.com/bytecodealliance/wasmtime/issues/4425 - gen_constant callback https://github.com/bytecodealliance/wasmtime/issues/4426 All tests, including all newly enabled SIMD tests, pass on both z14 and z15 architectures.	2022-07-18 14:00:48 -07:00
Sam Parker	e5678e8f8d	[AArch64] Cleanup dynamic lowering (#4432 ) Copyright (c) 2022, Arm Limited.	2022-07-18 11:13:16 -07:00
Damian Heaton	d792646677	Implement `iabs` in ISLE (AArch64) (#4399 ) * Implement `iabs` in ISLE (AArch64) Converts the existing implementation of `iabs` for AArch64 into ISLE, and fixes support for `iabs` on scalar values. Copyright (c) 2022 Arm Limited. * Improve scalar `iabs` implementation. Also introduces `CSNeg` instruction. Copyright (c) 2022 Arm Limited	2022-07-18 11:12:34 -07:00
Damian Heaton	db7f9ccd2b	Convert `scalar_to_vector` to ISLE (AArch64) (#4401 ) * Convert `scalar_to_vector` to ISLE (AArch64) Converted the existing implementation of `scalar_to_vector` for AArch64 to ISLE. Copyright (c) 2022 Arm Limited * Add support for floats and fix FpuExtend - Added rules to cover `f32 -> f32x4` and `f64 -> f64x2` for `scalar_to_vector` - Added tests for `scalar_to_vector` on floats. - Corrected an invalid instruction emitted by `FpuExtend` on 64-bit values. Copyright (c) 2022 Arm Limited	2022-07-18 11:11:54 -07:00
Afonso Bordado	eca0a73453	cranelift: Use requested ISA Flags in run tests (#4450 )	2022-07-15 12:09:07 -07:00
Afonso Bordado	80976b6fc7	cranelift: Add `fadd`/`fsub`/`fmul`/`fdiv` to interpreter (#4446 ) Fuzzgen found these as soon as I added float support	2022-07-14 21:53:03 +00:00
Alex Crichton	18d9685eb3	Fix pretty print of `atomic_rmw` clif ops (#4441 ) A comma was accidentally placed after the operation itself when no comma is necessary (only after operands, not the operation itself).	2022-07-13 10:51:07 -05:00
Damian Heaton	6c70428735	Convert `isplit` / `iconcat` to ISLE (AArch64) (#4402 ) Converted the existing implementations for `isplit` and `iconcat` for AArch64 to ISLE. Copyright (c) 2022 Arm Limited	2022-07-08 17:12:42 -07:00
Afonso Bordado	16cb287c53	cranelift: Use `round_ties_even` for `nearest` in interpreter (#4413 ) As @MaxGraey pointed out (thanks!) in #4397, `round` has different behavior from `nearest`. And it looks like the native rust implementation is still pending stabilization. Right now we duplicate the wasmtime implementation, merged in #2171. However, we definitely should switch to the rust native version when it is available.	2022-07-07 16:36:43 -07:00
Sam Parker	9c43749dfe	[RFC] Dynamic Vector Support (#4200 ) Introduce a new concept in the IR that allows a producer to create dynamic vector types. An IR function can now contain global value(s) that represent a dynamic scaling factor, for a given fixed-width vector type. A dynamic type is then created by 'multiplying' the corresponding global value with a fixed-width type. These new types can be used just like the existing types and the type system has a set of hard-coded dynamic types, such as I32X4XN, which the user defined types map onto. The dynamic types are also used explicitly to create dynamic stack slots, which have no set size like their existing counterparts. New IR instructions are added to access these new stack entities. Currently, during codegen, the dynamic scaling factor has to be lowered to a constant so the dynamic slots do eventually have a compile-time known size, as do spill slots. The current lowering for aarch64 just targets Neon, using a dynamic scale of 1. Copyright (c) 2022, Arm Limited.	2022-07-07 12:54:39 -07:00
Afonso Bordado	e9727b9d4b	aarch64: Fix i128 `of`/`nof` implementations (#4403 ) @yuyang-ok reported via zulip that i128 overflow tests were: 1. different from the interpreter implementation 2. wrong on some of the test cases This fixes both the tests and the aarch64 implementation and adds the interpreter to the testsuite.	2022-07-07 11:00:58 -07:00
Damian Heaton	6a5fe20956	Convert `swizzle` to ISLE (AArch64) (#4400 ) Converted the implementation of `swizzle` for AArch64 to ISLE. Copyright (c) 2022 Arm Limited	2022-07-07 10:29:33 -07:00
Andrew Brown	8629cbc6a4	x64: port `atomic_rmw` to ISLE (#4389 ) * x64: port `atomic_rmw` to ISLE This change ports `atomic_rmw` to ISLE for the x64 backend. It does not change the lowering in any way, though it seems possible that the fixed regs need not be as fixed and that there are opportunities for single instruction lowerings. It does rename `inst_common::AtomicRmwOp` to `MachAtomicRmwOp` to disambiguate with the IR enum with the same name. * x64: remove remaining hardcoded register constraints for `atomic_rmw` * x64: use `SyntheticAmode` in `AtomicRmwSeq` * review: add missing reg collector for amode * review: collect memory registers in the 'late' phase	2022-07-06 23:58:59 +00:00
Afonso Bordado	f98076ae88	cranelift: Implement float rounding operations (#4397 ) Implements the following operations on the interpreter: * `ceil` * `floor` * `nearest` * `trunc`	2022-07-06 16:43:54 -07:00
Sam Parker	e86e628061	ValueDataPacked getter and setter for Type (#4390 ) Speeds compilation of the spidermonkey benchmark by ~0.5%. Copyright (c) 2022, Arm Limited.	2022-07-06 09:40:03 -07:00
Chris Fallin	00f357c028	Cranelift: support 14-bit Type index with some bitpacking. (#4269 ) * Cranelift: make `ir::Type` a `u16`. * Cranelift: pack ValueData back into 64 bits. After extending `Type` to a `u16`, `ValueData` became 12 bytes rather than 8. This packs it back down to 8 bytes (64 bits) by stealing two bits from the `Type` for the enum discriminant (leaving 14 bits for the type itself). Performance comparison (3-way between original (`ty-u8`), 16-bit `Type` (`ty-u16`), and this PR (`ty-packed`)): ``` ~/work/sightglass% target/release/sightglass-cli benchmark \ -e ~/ty-u8.so -e ~/ty-u16.so -e ~/ty-packed.so \ --iterations-per-process 10 --processes 2 \ benchmarks-next/spidermonkey/benchmark.wasm compilation benchmarks-next/spidermonkey/benchmark.wasm cycles [20654406874 21749213920.50 22958520306] /home/cfallin/ty-packed.so [22227738316 22584704883.90 22916433748] /home/cfallin/ty-u16.so [20659150490 21598675968.60 22588108428] /home/cfallin/ty-u8.so nanoseconds [5435333269 5723139427.25 6041072883] /home/cfallin/ty-packed.so [5848788229 5942729637.85 6030030341] /home/cfallin/ty-u16.so [5436002390 5683248226.10 5943626225] /home/cfallin/ty-u8.so ``` So, when compiling SpiderMonkey.wasm, making `Type` 16 bits regresses performance by 4.5% (5.683s -> 5.943s), while this PR gets 14 bits for a 1.0% cost (5.683s -> 5.723s). That's still not great, and we can likely do better, but it's a start. * Fix test failure: entities to/from u32 via `{from,to}_bits`, not `{from,to}_u32`.	2022-07-05 14:51:02 -07:00
Afonso Bordado	925891245d	cranelift: Fix `fmin`/`fmax` when dealing with zeroes (#4373 ) `fmin`/`fmax` are defined as returning -0.0 as smaller than 0.0. This is not how the IEEE754 views these values and the interpreter was returning the wrong value in these operations since it was just using the standard IEEE754 comparisons. This also tries to preserve NaN information by avoiding passing NaN's through any operation that could canonicalize it.	2022-07-05 12:59:23 -07:00


1825 Commits