wasmtime

Author	SHA1	Message	Date
Julian Seward	dd9bfcefaa	CL/aarch64: implement the wasm SIMD `v128.load{32,64}_zero` instructions. This patch implements, for aarch64, the following wasm SIMD extensions. v128.load32_zero and v128.load64_zero instructions https://github.com/WebAssembly/simd/pull/237 The changes are straightforward: * no new CLIF instructions. They are translated into an existing CLIF scalar load followed by a CLIF `scalar_to_vector`. * the comment/specification for CLIF `scalar_to_vector` has been changed to match the actual intended semantics, per consultation with Andrew Brown. * translation from `scalar_to_vector` to aarch64 `fmov` instruction. This has been generalised slightly so as to allow both 32- and 64-bit transfers. * special-case zero in `lower_constant_f128` in order to avoid a potentially slow call to `Inst::load_fp_constant128`. * Once "Allow loads to merge into other operations during instruction selection in MachInst backends" (https://github.com/bytecodealliance/wasmtime/issues/2340) lands, we can use that functionality to pattern match the two-CLIF pair and emit a single AArch64 instruction. * A simple filetest has been added. There is no comprehensive testcase in this commit, because that is a separate repo. The implementation has been tested, nevertheless.	2020-11-04 20:00:04 +01:00
Andrew Brown	6d50099816	Rewrite interpreter generically (#2323) * Rewrite interpreter generically This change re-implements the Cranelift interpreter to use generic values; this makes it possible to do abstract interpretation of Cranelift instructions. In doing so, the interpretation state is extracted from the `Interpreter` structure and is accessed via a `State` trait; this makes it possible to not only more clearly observe the interpreter's state but also to interpret using a dummy state (e.g. `ImmutableRegisterState`). This addition made it possible to implement more of the Cranelift instructions (~70%, ignoring the x86-specific instructions). * Replace macros with closures	2020-11-02 12:28:07 -08:00
Chris Fallin	d1be8dcfc0	Merge pull request #2310 from akirilov-arm/vector_constants Cranelift AArch64: Improve code generation for vector constants	2020-11-01 21:56:40 -08:00
bjorn3	23aafa1054	Fix icmp_imm.i128 The immediate splitting code contained a bug causing both low and high to be equal for i128. This is the root cause for bjorn3/rustc_codegen_cranelift#1097 and likely the only bug preventing cg_clif from bootstrapping rustc.	2020-10-31 21:11:50 +01:00
Anton Kirilov	207779fe1d	Cranelift AArch64: Improve code generation for vector constants In particular, introduce initial support for the MOVI and MVNI instructions, with 8-bit elements. Also, treat vector constants as 32- or 64-bit floating-point numbers, if their value allows it, by relying on the architectural zero extension. Finally, stop generating literal loads for 32-bit constants. Copyright (c) 2020, Arm Limited.	2020-10-30 13:16:12 +00:00
Johnnie Birch	fa66daea25	Add filetests for fcvt_from_sint.f32x4 Add portions of filetests simd-conversion-legalize.clif and simd-conversion-run.clif that test fcvt_from_sint.f32x4	2020-10-28 13:02:50 -07:00
Yury Delendik	de4af90af6	machinst x64: New backend unwind (#2266) Addresses unwind for experimental x64 backend. The preliminary code enables backtrace on SystemV call convention.	2020-10-23 15:19:41 -05:00
Yury Delendik	b10e027fef	Refactor UnwindInfo codes and frame_register (#2307) * Refactor UnwindInfo codes and frame_register * use isa word_size * fix filetests * Add comment about UnwindCode::PushRegister	2020-10-22 14:52:42 -05:00
Andrew Brown	0ba35171fb	[machinst x64]: port more CLIF filetests	2020-10-09 10:04:50 -07:00
Benjamin Bouvier	e8c2a1763a	machinst x64: avoid emitting movzx when the input is an ALU 32-bits operation;	2020-10-09 18:49:27 +02:00
Benjamin Bouvier	3980a43cda	machinst x64: use the (base,offset) addressing mode even in the presence of a uextend;	2020-10-09 18:49:27 +02:00
Andrew Brown	c8cce5d2d7	[machinst x64]: enable packed saturated arithmetic	2020-10-08 08:46:20 -07:00
Benjamin Bouvier	e32e6fb612	machinst x64: check SSE requirements for instructions against enabled features;	2020-10-08 09:21:51 +02:00
Andrew Brown	3778fa025c	Switch DataValue to use Ieee32/Ieee64 As discussed in #2251, in order to be very confident that NaN signaling bits are correctly handled by the compiler, this switches `DataValue` to use Cranelift's `Ieee32` and `Ieee64` structures. This makes it a bit more inconvenient to interpret Cranelift FP operations but this should change to something like `rustc_apfloat` in the future.	2020-10-07 12:17:17 -07:00
Andrew Brown	6f6f79ef2b	refactor: move DataValue from cranelift-reader to cranelift-codegen This is no change to functionality; the move is necessary in order to return InstructionData immediates in a structured way (see next commit).	2020-10-07 12:17:17 -07:00
Chris Fallin	71768bb6cf	Fix AArch64 ABI to respect half-caller-save, half-callee-save vec regs. This PR updates the AArch64 ABI implementation so that it (i) properly respects that v8-v15 inclusive have callee-save lower halves, and caller-save upper halves, by conservatively approximating (to full registers) in the appropriate directions when generating prologue caller-saves and when informing the regalloc of clobbered regs across callsites. In order to prevent saving all of these vector registers in the prologue of every non-leaf function due to the above approximation, this also makes use of a new regalloc.rs feature to exclude call instructions' writes from the clobber set returned by register allocation. This is safe whenever the caller and callee have the same ABI (because anything the callee could clobber, the caller is allowed to clobber as well without saving it in the prologue). Fixes #2254.	2020-10-06 14:44:02 -07:00
Johnnie Birch	5799fd3cc0	Add file test simd-arithmetic-run to x64 backend Copies over simd-arithmetic-run from the old backend, adding several run tests including for min/max. Tests not supported are commented out.	2020-10-02 16:20:10 -07:00
Chris Fallin	b2f52910fb	Merge pull request #2224 from jgouly/sp_adjust arm64: Use SignedOffset rather than PreIndexed addressing mode for ca…	2020-10-02 09:18:00 -07:00
Andrew Brown	16a2538ecd	[machinst x64]: rename Inst::XmmUninitializedValue and document This approach is not the best but avoids an extra instruction; perhaps at some point, as mentioned in https://github.com/bytecodealliance/wasmtime/pull/2248, we will add the extra instruction or refactor things in such a way that this `Inst` variant is unnecessary.	2020-10-02 08:29:31 -07:00
Andrew Brown	3d9f3bf728	[machinst x64]: port CLIF tests related to comparison and lane operations	2020-10-02 08:29:31 -07:00
Joey Gouly	eec60c9b06	arm64: Use SignedOffset rather than PreIndexed addressing mode for callee-saved registers This also passes `fixed_frame_storage_size` (previously `total_sp_adjust`) into `gen_clobber_save` so that it can be combined with other stack adjustments. Copyright (c) 2020, Arm Limited.	2020-10-02 16:22:55 +01:00
Anton Kirilov	d18de69e5a	AArch64: Add test cases for callee-saved SIMD & FP registers Copyright (c) 2020, Arm Limited.	2020-09-30 14:19:02 +01:00
Andrew Brown	b43f4a464a	refactor: move all 'filetests/vcode' tests to 'filetests/isa'	2020-09-29 09:27:39 -07:00
Andrew Brown	452d854855	[machinst x64]: demonstrate that packed register moves are elided	2020-09-29 08:48:37 -07:00
Andrew Brown	b7217d454f	[machinst x64]: add lane-related CLIF filetests	2020-09-29 08:45:12 -07:00
Pat Hickey	b10beeee01	dep gardening (#2233) * wasmtime-profiling: latest object dep is 0.21.1 * latest gimli is 0.22 * bump cargo.lock	2020-09-26 00:49:28 -05:00
Benjamin Bouvier	e2c286deeb	machinst x64: enable clif testing This adds a new feature experimental_x64 for CLIF tests. A test is run in the new x64 backend iff: - either the test doesn't have an x86_64 target requirement, signaling it must be target agnostic or not run on this target. - or the test does require the x86_64 target, and the test is marked with the `experimental_x64` feature. This required one workaround in the parser. The reason is that the parser will try to use information not provided by the TargetIsa adapter for the Mach backends, like register names. In particular, parsing a test may fail before the test runner realizes that the test must not be run. In this case, we early return an almost-empty TestFile from the parser, under the same conditions as above, so that the caller may filter out the test properly. This also copies two tests from the test suite using the new backend, for demonstration purposes.	2020-09-25 11:12:21 +02:00
Alex Crichton	5e08eb3b83	Bump wasmtime to 0.20.0 (#2222) At the same time bump cranelift crates to 0.67.0	2020-09-23 13:54:02 -05:00
Jakub Krauz	bab3c73100	Put arm32 backend behind experimental_arm32 flag	2020-09-22 12:53:14 +02:00
Jakub Krauz	f6a140a662	arm32 codegen This commit adds arm32 code generation for some IR insts. Floating-point instructions are not supported, because regalloc cannot represent overlapping register classes, which are needed by VFP/Neon. There is also no support for big-endianness, I64 and I128 types.	2020-09-22 12:49:42 +02:00
Nick Fitzgerald	010e5b9aa8	Make filetest errors report full context (#2207) * clif-util: do not convert `anyhow::Error`s into strings into `anyhow::Error`s * filetests: Use the debug formatting of `anyhow::Error`s This provides the full error context, not just the source error's message.	2020-09-18 13:33:38 -05:00
Joshua Nelson	d28abad441	Upgrade to target-lexicon 0.11 This allows downstream library users to use `CDataModel` without having to install two different versions of target-lexicon.	2020-09-15 11:40:09 -07:00
Nick Fitzgerald	31cbbd1d20	clif-util: Use `anyhow::Error` for errors instead of `String` Also does the same for `cranelift-filetests`.	2020-09-14 18:29:00 -07:00
Anton Kirilov	f612e8e7b2	AArch64: Add various missing SIMD bits In addition, improve the code for stack pointer manipulation. Copyright (c) 2020, Arm Limited.	2020-09-09 13:37:50 +01:00
Chris Fallin	e8f772c1ac	x64 new backend: port ABI implementation to shared infrastructure with AArch64. Previously, in #2128, we factored out a common "vanilla 64-bit ABI" implementation from the AArch64 ABI code, with the idea that this should be largely compatible with x64. This PR alters the new x64 backend to make use of the shared infrastructure, removing the duplication that existed previously. The generated code is nearly (not exactly) the same; the only difference relates to how the clobber-save region is padded in the prologue. This also changes some register allocations in the aarch64 code because call support in the shared ABI infra now passes a temp vreg in, rather than requiring use of a fixed, non-allocable temp; tests have been updated, and the runtime behavior is unchanged.	2020-09-08 17:59:01 -07:00
Joey Gouly	650d48cd84	arm64: Don't always materialise a 64-bit constant This improves the mov/movk/movn sequence when the high half of the 64-bit value is all zero. Copyright (c) 2020, Arm Limited.	2020-09-01 13:29:01 +01:00
Chris Fallin	051feaad75	Merge pull request #2148 from bjorn3/aarch64_fix_put_input_in_rsa Fix put_input_in_reg	2020-08-20 11:41:35 -07:00
Chris Fallin	775dfa9df2	Merge pull request #1520 from bjorn3/aarch64-lower-small-fcvt_from_int Lower fcvt_from_{u,s}int for 8 and 16 bit ints	2020-08-20 11:35:06 -07:00
bjorn3	957eb9eeba	Less unnecessary zero and sign extensions	2020-08-20 10:17:04 +02:00
bjorn3	ba48b9aef1	Fix put_input_in_reg	2020-08-19 19:38:47 +02:00
bjorn3	3a16416132	Add tests	2020-08-19 19:17:27 +02:00
Chris Fallin	5fa0be3515	AArch64 ABI: properly store full 64-bit width of extended args/retvals. When storing an argument to a stack location for consumption by a callee, or storing a return value to an on-stack return slot for consumption by the caller, the ABI implementation was properly extending the value but was then performing a store with only the original width. This fixes the issue by always performing a 64-bit store of the extended value. Issue reported by @uweigand (thanks!).	2020-08-17 15:00:04 -07:00
Nick Fitzgerald	05bf9ea3f3	Rename "Stackmap" to "StackMap" And "stackmap" to "stack_map". This commit is purely mechanical.	2020-08-07 10:08:44 -07:00
Anton Kirilov	1ec6930005	Enable the spec::simd::simd_lane test for AArch64 Copyright (c) 2020, Arm Limited.	2020-08-06 11:14:15 +01:00
Chris Fallin	1fbdf169b5	Aarch64: fix narrow integer-register extension with Baldrdash ABI. In the Baldrdash (SpiderMonkey) embedding, we must take care to zero-extend all function arguments to callees in integer registers when the types are narrower than 64 bits. This is because, unlike the native SysV ABI, the Baldrdash ABI expects high bits to be cleared. Not doing so leads to difficult-to-trace errors where high bits falsely tag an int32 as e.g. an object pointer, leading to potential security issues.	2020-07-31 10:19:13 -07:00
Chris Fallin	8fd92093a4	Merge pull request #2061 from cfallin/aarch64-amode Aarch64 codegen quality: support more general add+extend address computations.	2020-07-27 13:48:55 -07:00
Chris Fallin	f9b98f0ddc	Aarch64 codegen quality: support more general add+extend computations. Previously, our pattern-matching for generating load/store addresses was somewhat limited. For example, it could not use a register-extend address mode to handle the following CLIF: ``` v2760 = uextend.i64 v985 v2761 = load.i64 notrap aligned readonly v1 v1018 = iadd v2761, v2760 store v1017, v1018 ``` This PR adds more general support for address expressions made up of additions and extensions. In particular, it pattern-matches a tree of 64-bit `iadd`s, optionally with `uextend`/`sextend` from 32-bit values at the leaves, to collect the list of all addends that form the address. It also collects all offsets at leaves, combining them. It applies a series of heuristics to make the best use of the available addressing modes, filling the load/store itself with as many 64-bit registers, zero/sign-extended 32-bit registers, and/or an offset, then computing the rest with add instructions as necessary. It attempts to make use of immediate forms (add-immediate or subtract-immediate) whenever possible, and also uses the built-in extend operators on add instructions when possible. There are certainly cases where this is not optimal (i.e., does not generate the strictly shortest sequence of instructions), but it should be good enough for most code. 
Using `perf stat` to measure instruction count (runtime only, on wasmtime, after populating the cache to avoid measuring compilation), this impacts `bz2` as follows: ``` pre: 1006.410425 task-clock (msec) # 1.000 CPUs utilized 113 context-switches # 0.112 K/sec 1 cpu-migrations # 0.001 K/sec 5,036 page-faults # 0.005 M/sec 3,221,547,476 cycles # 3.201 GHz 4,000,670,104 instructions # 1.24 insn per cycle <not supported> branches 27,958,613 branch-misses 1.006071348 seconds time elapsed post: 963.499525 task-clock (msec) # 0.997 CPUs utilized 117 context-switches # 0.121 K/sec 0 cpu-migrations # 0.000 K/sec 5,081 page-faults # 0.005 M/sec 3,039,687,673 cycles # 3.155 GHz 3,837,761,690 instructions # 1.26 insn per cycle <not supported> branches 28,254,585 branch-misses 0.966072682 seconds time elapsed ``` In other words, this reduces instruction count by 4.1% on `bz2`.	2020-07-27 13:10:50 -07:00
Chris Fallin	1b80860f1f	Aarch64 codegen quality: handle add-negative-imm as subtract. We often see patterns like: ``` mov w2, #0xffff_ffff // uses ORR with logical immediate form add w0, w1, w2 ``` which is just `w0 := w1 - 1`. It would be much better to recognize when the inverse of an immediate will fit in a 12-bit immediate field if the immediate itself does not, and flip add to subtract (and vice versa), so we can instead generate: ``` sub w0, w1, #1 ``` We see this pattern in e.g. `bz2`, where this commit makes the following difference (counting instructions with `perf stat`, filling in the wasmtime cache first then running again to get just runtime): pre: ``` 992.762250 task-clock (msec) # 0.998 CPUs utilized 109 context-switches # 0.110 K/sec 0 cpu-migrations # 0.000 K/sec 5,035 page-faults # 0.005 M/sec 3,224,119,134 cycles # 3.248 GHz 4,000,521,171 instructions # 1.24 insn per cycle <not supported> branches 27,573,755 branch-misses 0.995072322 seconds time elapsed ``` post: ``` 993.853850 task-clock (msec) # 0.998 CPUs utilized 123 context-switches # 0.124 K/sec 1 cpu-migrations # 0.001 K/sec 5,072 page-faults # 0.005 M/sec 3,201,278,337 cycles # 3.221 GHz 3,917,061,340 instructions # 1.22 insn per cycle <not supported> branches 28,410,633 branch-misses 0.996008047 seconds time elapsed ``` In other words, a 2.1% reduction in instruction count on `bz2`.	2020-07-24 11:41:33 -07:00
Chris Fallin	1b3b2dbfd0	Merge pull request #2043 from cfallin/csel-opt Aarch64: handle csel with icmp/fcmp source without materializing the bool.	2020-07-18 19:33:47 -07:00
Chris Fallin	21dac670f0	Aarch64: handle csel with icmp/fcmp source without materializing the bool. Previously, we simply compared the input bool to 0, which forced the value into a register (usually via a cmp and cset), zero-extended it, etc. This patch performs the same pattern-matching that branches do to directly perform the cmp and use its flag results with the csel. On the `bz2` benchmark, the runtime is affected as follows (measuring with `perf stat`, using wasmtime with its cache enabled, and taking the second run after the first compiles and populates the cache): pre: 1117.232000 task-clock (msec) # 1.000 CPUs utilized 133 context-switches # 0.119 K/sec 1 cpu-migrations # 0.001 K/sec 5,041 page-faults # 0.005 M/sec 3,511,615,100 cycles # 3.143 GHz 4,272,427,772 instructions # 1.22 insn per cycle <not supported> branches 27,980,906 branch-misses 1.117299838 seconds time elapsed post: 1003.738075 task-clock (msec) # 1.000 CPUs utilized 121 context-switches # 0.121 K/sec 0 cpu-migrations # 0.000 K/sec 5,052 page-faults # 0.005 M/sec 3,224,875,393 cycles # 3.213 GHz 4,000,838,686 instructions # 1.24 insn per cycle <not supported> branches 27,928,232 branch-misses 1.003440004 seconds time elapsed In other words, with this change, on `bz2`, we see a 6.3% reduction in executed instructions.	2020-07-17 21:10:21 -07:00
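
The add-negative-imm change described in commit 1b80860f1f can be illustrated with a small, standalone Rust sketch. This is not Cranelift's actual lowering code: the `AluImm` enum and the `fits_imm12` and `select_add_imm` functions are hypothetical names introduced only for illustration. The sketch assumes the AArch64 add/subtract immediate form (a 12-bit value, optionally shifted left by 12 bits) and models a 32-bit add only.

```rust
/// Hypothetical outcome of choosing an encoding for `w0 = w1 + imm`.
/// These names are illustrative and do not come from Cranelift.
#[derive(Debug, PartialEq, Eq)]
enum AluImm {
    /// Emit `add w0, w1, #imm`.
    Add(u32),
    /// Emit `sub w0, w1, #imm` (the immediate has been negated).
    Sub(u32),
    /// Neither form fits; the constant must be placed in a register first.
    Materialize,
}

/// Returns true if `value` is encodable as an AArch64 add/sub immediate:
/// a 12-bit unsigned value, optionally shifted left by 12 bits.
fn fits_imm12(value: u32) -> bool {
    value < (1 << 12) || ((value & 0xfff) == 0 && value < (1 << 24))
}

/// Picks an immediate form for a 32-bit add. If the immediate itself does
/// not fit but its two's-complement negation does, flip add to subtract.
fn select_add_imm(imm: u32) -> AluImm {
    if fits_imm12(imm) {
        return AluImm::Add(imm);
    }
    let negated = imm.wrapping_neg();
    if fits_imm12(negated) {
        return AluImm::Sub(negated);
    }
    AluImm::Materialize
}
```

Under these assumptions, `select_add_imm(0xffff_ffff)` yields `AluImm::Sub(1)`, which corresponds to the `sub w0, w1, #1` form quoted in the commit message, while a small immediate such as `5` stays an add.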

986 Commits