Commit Graph

958 Commits

Author SHA1 Message Date
Jakub Krauz
bab3c73100 Put arm32 backend behind experimental_arm32 flag 2020-09-22 12:53:14 +02:00
Jakub Krauz
f6a140a662 arm32 codegen
This commit adds arm32 code generation for some IR insts.
Floating-point instructions are not supported, because regalloc
does not allow representing the overlapping register classes
needed by VFP/Neon.

There is also no support for big-endian mode or the I64 and I128 types.
2020-09-22 12:49:42 +02:00
Nick Fitzgerald
010e5b9aa8 Make filetest errors report full context (#2207)
* clif-util: do not convert `anyhow::Error`s into strings and back into `anyhow::Error`s

* filetests: Use the debug formatting of `anyhow::Error`s

This provides the full error context, not just the source error's message.
2020-09-18 13:33:38 -05:00
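A minimal sketch of the distinction, assuming the `anyhow` crate (the file name and messages are illustrative): `{}` prints only the outermost message, while `{:?}` prints the full context chain.

```
use anyhow::{Context, Result};

fn load_filetest() -> Result<String> {
    // Attach context so the error carries both a high-level message and its source.
    std::fs::read_to_string("no-such-filetest.clif").context("failed to load filetest")
}

fn main() {
    if let Err(e) = load_filetest() {
        // Display: only the outermost message, e.g. "failed to load filetest".
        println!("{}", e);
        // Debug: the full chain, e.g.
        //   failed to load filetest
        //   Caused by:
        //       No such file or directory (os error 2)
        println!("{:?}", e);
    }
}
```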
Joshua Nelson
d28abad441 Upgrade to target-lexicon 0.11
This allows downstream library users to use `CDataModel` without having
to install two different versions of target-lexicon.
2020-09-15 11:40:09 -07:00
Nick Fitzgerald
31cbbd1d20 clif-util: Use anyhow::Error for errors instead of String
Also does the same for `cranelift-filetests`.
2020-09-14 18:29:00 -07:00
Anton Kirilov
f612e8e7b2 AArch64: Add various missing SIMD bits
In addition, improve the code for stack pointer manipulation.

Copyright (c) 2020, Arm Limited.
2020-09-09 13:37:50 +01:00
Chris Fallin
e8f772c1ac x64 new backend: port ABI implementation to shared infrastructure with AArch64.
Previously, in #2128, we factored out a common "vanilla 64-bit ABI"
implementation from the AArch64 ABI code, with the idea that this should
be largely compatible with x64. This PR alters the new x64 backend to
make use of the shared infrastructure, removing the duplication that
existed previously. The generated code is nearly (not exactly) the same;
the only difference relates to how the clobber-save region is padded in
the prologue.

This also changes some register allocations in the aarch64 code because
call support in the shared ABI infra now passes a temp vreg in, rather
than requiring use of a fixed, non-allocable temp; tests have been
updated, and the runtime behavior is unchanged.
2020-09-08 17:59:01 -07:00
Joey Gouly
650d48cd84 arm64: Don't always materialise a 64-bit constant
This improves the mov/movk/movn sequence when the high half of the
64-bit value is all zero.

Copyright (c) 2020, Arm Limited.
2020-09-01 13:29:01 +01:00
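A rough sketch of the counting logic only (illustrative, not the actual lowering, which also considers MOVN for mostly-ones values): only non-zero 16-bit halfwords need a MOVK, so a constant whose upper half is zero takes at most two instructions.

```
/// Number of MOVZ/MOVK instructions needed if we emit a MOVK only for each
/// non-zero 16-bit halfword of the constant.
fn insts_for_constant(value: u64) -> u32 {
    let nonzero_halfwords = (0..4)
        .filter(|i| ((value >> (16 * i)) & 0xffff) != 0)
        .count() as u32;
    // Zero still needs one MOVZ to clear the register.
    nonzero_halfwords.max(1)
}

fn main() {
    assert_eq!(insts_for_constant(0), 1);                     // movz only
    assert_eq!(insts_for_constant(0x0000_0000_dead_beef), 2); // movz + movk
    assert_eq!(insts_for_constant(0x1234_5678_9abc_def0), 4); // movz + 3x movk
}
```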
Chris Fallin
051feaad75 Merge pull request #2148 from bjorn3/aarch64_fix_put_input_in_rsa
Fix put_input_in_reg
2020-08-20 11:41:35 -07:00
Chris Fallin
775dfa9df2 Merge pull request #1520 from bjorn3/aarch64-lower-small-fcvt_from_int
Lower fcvt_from_{u,s}int for 8 and 16 bit ints
2020-08-20 11:35:06 -07:00
bjorn3
957eb9eeba Less unnecessary zero and sign extensions 2020-08-20 10:17:04 +02:00
bjorn3
ba48b9aef1 Fix put_input_in_reg 2020-08-19 19:38:47 +02:00
bjorn3
3a16416132 Add tests 2020-08-19 19:17:27 +02:00
Chris Fallin
5fa0be3515 AArch64 ABI: properly store full 64-bit width of extended args/retvals.
When storing an argument to a stack location for consumption by a
callee, or storing a return value to an on-stack return slot for
consumption by the caller, the ABI implementation was properly extending
the value but was then performing a store with only the original width.
This fixes the issue by always performing a 64-bit store of the extended
value.

Issue reported by @uweigand (thanks!).
2020-08-17 15:00:04 -07:00
Nick Fitzgerald
05bf9ea3f3 Rename "Stackmap" to "StackMap"
And "stackmap" to "stack_map".

This commit is purely mechanical.
2020-08-07 10:08:44 -07:00
Anton Kirilov
1ec6930005 Enable the spec::simd::simd_lane test for AArch64
Copyright (c) 2020, Arm Limited.
2020-08-06 11:14:15 +01:00
Chris Fallin
1fbdf169b5 Aarch64: fix narrow integer-register extension with Baldrdash ABI.
In the Baldrdash (SpiderMonkey) embedding, we must take care to
zero-extend all function arguments to callees in integer registers when
the types are narrower than 64 bits. This is because, unlike the native
SysV ABI, the Baldrdash ABI expects high bits to be cleared. Not doing
so leads to difficult-to-trace errors where high bits falsely tag an
int32 as e.g. an object pointer, leading to potential security issues.
2020-07-31 10:19:13 -07:00
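A small illustration of the difference (plain Rust, not the ABI code): zero-extension clears the upper bits of the register, whereas sign-extension or leaving them untouched can leave bits set that the embedding interprets as tags.

```
fn main() {
    let narrow: i32 = -1;
    let zero_extended = narrow as u32 as u64; // upper 32 bits cleared
    let sign_extended = narrow as i64 as u64; // upper 32 bits all ones
    assert_eq!(zero_extended, 0x0000_0000_ffff_ffff);
    assert_eq!(sign_extended, 0xffff_ffff_ffff_ffff);
    // If the high bits are set, an embedding that tags values in the upper
    // bits of a register could misread the argument, e.g. as an object pointer.
}
```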
Chris Fallin
8fd92093a4 Merge pull request #2061 from cfallin/aarch64-amode
Aarch64 codegen quality: support more general add+extend address computations.
2020-07-27 13:48:55 -07:00
Chris Fallin
f9b98f0ddc Aarch64 codegen quality: support more general add+extend computations.
Previously, our pattern-matching for generating load/store addresses was
somewhat limited. For example, it could not use a register-extend
address mode to handle the following CLIF:

```
   v2760 = uextend.i64 v985
   v2761 = load.i64 notrap aligned readonly v1
   v1018 = iadd v2761, v2760
   store v1017, v1018
```

This PR adds more general support for address expressions made up of
additions and extensions. In particular, it pattern-matches a tree of
64-bit `iadd`s, optionally with `uextend`/`sextend` from 32-bit values
at the leaves, to collect the list of all addends that form the address.
It also collects all offsets at leaves, combining them.
It applies a series of heuristics to make the best use of the
available addressing modes, folding as many addends as possible into the
load/store itself (64-bit registers, zero/sign-extended 32-bit registers,
and/or an offset) and computing the rest with add instructions as
necessary. It attempts
to make use of immediate forms (add-immediate or subtract-immediate)
whenever possible, and also uses the built-in extend operators on add
instructions when possible. There are certainly cases where this is not
optimal (i.e., does not generate the strictly shortest sequence of
instructions), but it should be good enough for most code.

Using `perf stat` to measure instruction count (runtime only, on
wasmtime, after populating the cache to avoid measuring compilation),
this impacts `bz2` as follows:

```
pre:

       1006.410425      task-clock (msec)         #    1.000 CPUs utilized
               113      context-switches          #    0.112 K/sec
                 1      cpu-migrations            #    0.001 K/sec
             5,036      page-faults               #    0.005 M/sec
     3,221,547,476      cycles                    #    3.201 GHz
     4,000,670,104      instructions              #    1.24  insn per cycle
   <not supported>      branches
        27,958,613      branch-misses

       1.006071348 seconds time elapsed

post:

        963.499525      task-clock (msec)         #    0.997 CPUs utilized
               117      context-switches          #    0.121 K/sec
                 0      cpu-migrations            #    0.000 K/sec
             5,081      page-faults               #    0.005 M/sec
     3,039,687,673      cycles                    #    3.155 GHz
     3,837,761,690      instructions              #    1.26  insn per cycle
   <not supported>      branches
        28,254,585      branch-misses

       0.966072682 seconds time elapsed
```

In other words, this reduces instruction count by 4.1% on `bz2`.
2020-07-27 13:10:50 -07:00
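A hypothetical sketch of the final addressing-mode choice (the type and function names are illustrative, not Cranelift's actual API): up to two addends and an offset go into the load/store itself, and anything beyond that is folded into a base register with `add`s first.

```
/// Illustrative addressing modes; registers are plain numbers here.
enum AMode {
    /// [reg, reg]: two 64-bit addends.
    RegReg { base: u8, index: u8 },
    /// [reg, wreg, uxtw|sxtw]: a 64-bit addend plus an extended 32-bit addend.
    RegExtended { base: u8, index: u8, signed: bool },
    /// [reg, #imm]: a 64-bit addend plus a constant offset.
    RegOffset { base: u8, offset: i32 },
}

/// Pick a mode for at most two addends and an offset.
fn choose_amode(addends64: &[u8], addends32: &[(u8, bool)], offset: i32) -> AMode {
    match (addends64, addends32, offset) {
        ([base], [(index, signed)], 0) => AMode::RegExtended { base: *base, index: *index, signed: *signed },
        ([base, index], [], 0) => AMode::RegReg { base: *base, index: *index },
        ([base], [], off) => AMode::RegOffset { base: *base, offset: off },
        _ => unimplemented!("fold the extra addends into a base register with adds first"),
    }
}

fn main() {
    // The CLIF example above: a 64-bit base (v2761) plus a zero-extended 32-bit index (v985).
    let mode = choose_amode(&[1], &[(0, false)], 0);
    assert!(matches!(mode, AMode::RegExtended { signed: false, .. }));
}
```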
Chris Fallin
1b80860f1f Aarch64 codegen quality: handle add-negative-imm as subtract.
We often see patterns like:

```
    mov w2, #0xffff_ffff   // uses ORR with logical immediate form
    add w0, w1, w2
```

which is just `w0 := w1 - 1`. It would be much better to recognize when
the inverse of an immediate will fit in a 12-bit immediate field if the
immediate itself does not, and flip add to subtract (and vice versa), so
we can instead generate:

```
    sub w0, w1, #1
```

We see this pattern in e.g. `bz2`, where this commit makes the following
difference (counting instructions with `perf stat`, filling in the
wasmtime cache first then running again to get just runtime):

pre:

```
        992.762250      task-clock (msec)         #    0.998 CPUs utilized
               109      context-switches          #    0.110 K/sec
                 0      cpu-migrations            #    0.000 K/sec
             5,035      page-faults               #    0.005 M/sec
     3,224,119,134      cycles                    #    3.248 GHz
     4,000,521,171      instructions              #    1.24  insn per cycle
   <not supported>      branches
        27,573,755      branch-misses

       0.995072322 seconds time elapsed
```

post:

```
        993.853850      task-clock (msec)         #    0.998 CPUs utilized
               123      context-switches          #    0.124 K/sec
                 1      cpu-migrations            #    0.001 K/sec
             5,072      page-faults               #    0.005 M/sec
     3,201,278,337      cycles                    #    3.221 GHz
     3,917,061,340      instructions              #    1.22  insn per cycle
   <not supported>      branches
        28,410,633      branch-misses

       0.996008047 seconds time elapsed
```

In other words, a 2.1% reduction in instruction count on `bz2`.
2020-07-24 11:41:33 -07:00
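An illustrative sketch of the immediate-flipping check (not the actual lowering code, and ignoring the shifted-by-12 immediate form): if the constant does not fit the 12-bit add immediate but its negation does, emit a subtract instead.

```
/// Decide between add-immediate and subtract-immediate for a constant addend.
fn add_imm_or_sub_imm(imm: i64) -> Option<(&'static str, u64)> {
    let fits_u12 = |v: i64| v >= 0 && v <= 0xfff;
    if fits_u12(imm) {
        Some(("add", imm as u64))
    } else if fits_u12(-imm) {
        Some(("sub", (-imm) as u64))
    } else {
        None // neither fits: materialize the constant in a register instead
    }
}

fn main() {
    // `add w0, w1, #-1` is not encodable, but `sub w0, w1, #1` is.
    assert_eq!(add_imm_or_sub_imm(-1), Some(("sub", 1)));
    assert_eq!(add_imm_or_sub_imm(42), Some(("add", 42)));
    assert_eq!(add_imm_or_sub_imm(1 << 20), None);
}
```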
Chris Fallin
1b3b2dbfd0 Merge pull request #2043 from cfallin/csel-opt
Aarch64: handle csel with icmp/fcmp source without materializing the bool.
2020-07-18 19:33:47 -07:00
Chris Fallin
21dac670f0 Aarch64: handle csel with icmp/fcmp source without materializing the bool.
Previously, we simply compared the input bool to 0, which forced the
value into a register (usually via a cmp and cset), zero-extended it,
etc. This patch performs the same pattern-matching that branches do to
directly perform the cmp and use its flag results with the csel.

On the `bz2` benchmark, the runtime is affected as follows (measuring
with `perf stat`, using wasmtime with its cache enabled, and taking the
second run after the first compiles and populates the cache):

pre:

       1117.232000      task-clock (msec)         #    1.000 CPUs utilized
               133      context-switches          #    0.119 K/sec
                 1      cpu-migrations            #    0.001 K/sec
             5,041      page-faults               #    0.005 M/sec
     3,511,615,100      cycles                    #    3.143 GHz
     4,272,427,772      instructions              #    1.22  insn per cycle
   <not supported>      branches
        27,980,906      branch-misses

       1.117299838 seconds time elapsed

post:

       1003.738075      task-clock (msec)         #    1.000 CPUs utilized
               121      context-switches          #    0.121 K/sec
                 0      cpu-migrations            #    0.000 K/sec
             5,052      page-faults               #    0.005 M/sec
     3,224,875,393      cycles                    #    3.213 GHz
     4,000,838,686      instructions              #    1.24  insn per cycle
   <not supported>      branches
        27,928,232      branch-misses

       1.003440004 seconds time elapsed

In other words, with this change, on `bz2`, we see a 6.3% reduction in
executed instructions.
2020-07-17 21:10:21 -07:00
Chris Fallin
9bd9c628aa Aarch64: mask shift-amounts incorporated into reg-reg-shift ALU insts.
We had previously fixed a bug in which constant shift amounts were not
masked modulo the number of bits in the operand; however, we did not
fix the analogous case for shifts incorporated into the second register
argument of ALU instructions that support integrated shifts.  This
failure to mask resulted in illegal instructions being generated, e.g.
in https://bugzilla.mozilla.org/show_bug.cgi?id=1653502. This PR fixes
the issue by masking the amount, as the shift semantics require.
2020-07-17 14:55:23 -07:00
bjorn3
5c5a30f76c Fix review comments 2020-07-17 12:03:17 +02:00
bjorn3
7b7b1f4997 Rename sarg__ to sarg_t 2020-07-17 12:03:17 +02:00
bjorn3
4431ac1108 Implement SystemV struct argument passing 2020-07-17 12:03:17 +02:00
Chris Fallin
5e0268a542 Merge pull request #2034 from cfallin/update-regalloc
Update to regalloc.rs 0.0.28.
2020-07-16 11:36:11 -07:00
Alex Crichton
63d5b91930 Wasmtime 0.19.0 and Cranelift 0.66.0 (#2027)
This commit updates Wasmtime's version to 0.19.0, Cranelift's version to
0.66.0, and updates the release notes as well.
2020-07-16 12:46:21 -05:00
Chris Fallin
756e8b8ea2 Update to regalloc.rs 0.0.28.
This version of regalloc.rs includes several bugfixes for
reference-types support used by the new backend framework and the
aarch64 backend (bytecodealliance/regalloc.rs#85 and
bytecodealliance/regalloc.rs#86).
2020-07-16 09:42:09 -07:00
Andrew Brown
f0b083c6ad Legalize [u|s]widen_high for x86
Use `x86_palignr` and `[u|s]widen_low` for legalizing this instruction.
2020-07-15 11:32:08 -07:00
Andrew Brown
c8ddf8a34c Encode [u|s]widen_low for x86 2020-07-15 11:32:08 -07:00
Andrew Brown
fafef7db77 Add x86_palignr instructions
This instruction is necessary for implementing `[s|u]widen_high`.
2020-07-15 11:32:08 -07:00
Chris Fallin
26529006e0 Address review comments. 2020-07-14 10:17:29 -07:00
Chris Fallin
08353fcc14 Reftypes part two: add support for stackmaps.
This commit adds support for generating stackmaps at safepoints to the
new backend framework and to the AArch64 backend in particular. It has
been tested to work with SpiderMonkey.
2020-07-14 10:17:27 -07:00
Chris Fallin
b93e8c296d Initial reftype support in aarch64, modulo safepoints.
This commit adds the initial support to allow reftypes to flow through
the program when targeting aarch64. It also adds a fix to the
`ModuleTranslationState` needed to send R32/R64 types over from the
SpiderMonkey embedding.

This commit does not include any support for safepoints in aarch64
or the `MachInst` infrastructure; that is in the next commit.

This commit also makes a drive-by improvement to `Bint`, avoiding an
unneeded zero-extension op when the extended value comes directly from a
conditional-set (which produces a full-width 0 or 1).
2020-07-14 10:14:18 -07:00
Alex Crichton
85ffc8f595 Switch CI back to nightly channel (#2014)
* Switch CI back to nightly channel

I think all upstream issues are now fixed so we should be good to switch
back to nightly from our previously pinned version.

* Fix doc warnings
2020-07-13 18:40:47 -05:00
Andrew Brown
c5a69cee9f Add x86 legalization for fcvt_to_uint_sat.i32x4
This converts an `f32x4` into an `i32x4` (unsigned) with rounding by using a long sequence of SSE4.1-compatible instructions.
2020-07-08 10:20:01 -07:00
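A scalar sketch of the per-lane semantics, assuming the truncating (round-toward-zero), saturating behavior of `fcvt_to_uint_sat` (Rust's float-to-int `as` cast happens to be saturating, so it models them directly): NaN becomes 0, negative values clamp to 0, and values at or above 2^32 clamp to `u32::MAX`.

```
fn fcvt_to_uint_sat_lane(x: f32) -> u32 {
    x as u32 // saturating conversion: NaN -> 0, < 0 -> 0, >= 2^32 -> u32::MAX
}

fn main() {
    assert_eq!(fcvt_to_uint_sat_lane(f32::NAN), 0);
    assert_eq!(fcvt_to_uint_sat_lane(-3.5), 0);
    assert_eq!(fcvt_to_uint_sat_lane(4.0e9), 4_000_000_000);
    assert_eq!(fcvt_to_uint_sat_lane(1.0e20), u32::MAX);
}
```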
Peter Huene
d6ae72abe6 Merge pull request #1983 from peterhuene/fix-unwind-info
Remove 'set frame pointer' unwind code from Windows x64 unwind.
2020-07-06 22:26:41 -07:00
Peter Huene
3a33749404 Remove 'set frame pointer' unwind code from Windows x64 unwind.
This commit removes the "set frame pointer" unwind code and frame
pointer information from Windows x64 unwind information.

In Windows x64 unwind information, a "frame pointer" is actually the
*base address* of the static part of the local frame and would be at some
negative offset to RSP upon establishing the frame pointer.

Currently Cranelift uses a "traditional" notion of a frame pointer, one
that is the highest address in the local frame (i.e. pointing at the
previous frame pointer on the stack).

Windows x64 unwind doesn't describe such frame pointers and only needs
one described if the frame contains a dynamic stack allocation.

Fixes #1967.
2020-07-06 14:22:57 -07:00
Chris Fallin
b700646c93 Merge pull request #1962 from cfallin/aarch64-lowering-condbr
AArch64: avoid branches with explicit offsets at lowering stage.
2020-07-02 14:05:40 -07:00
Chris Fallin
b7ecad1d74 AArch64: avoid branches with explicit offsets at lowering stage.
In discussions with @bnjbvr, it came up that generating `OneWayCondBr`s
with explicit, hardcoded PC-offsets as part of lowered instruction
sequences is actually unsafe, because the register allocator *might*
insert a spill or reload into the middle of our sequence. We were
careful about this in some cases but somehow missed that it was a
general restriction. Conceptually, all inter-instruction references
should be via labels at the VCode level; explicit offsets are only ever
known at emission time, and resolved by the `MachBuffer`.

To allow for conditional trap checks without modifying the CFG (as seen
by regalloc) during lowering, this PR instead adds a `TrapIf`
pseudo-instruction that conditionally skips a single embedded trap
instruction. It lowers to the same `condbr label ; trap ; label: ...`
sequence, but without the hardcoded branch-target offset in the lowering
code.
2020-07-02 11:02:27 -07:00
Andrew Brown
057c93b64e Add unarrow instruction with x86 implementation
Adds a shared `unarrow` instruction in order to lower the Wasm SIMD specification's unsigned narrowing (see https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#integer-to-integer-narrowing). Additionally, this commit implements the instruction for x86 using PACKUSWB and PACKUSDW for the applicable encodings.
2020-07-02 09:35:45 -07:00
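A scalar sketch of the unsigned-narrowing semantics per lane (PACKUSWB/PACKUSDW apply this lane-wise in the x86 lowering): each signed input lane is clamped to the unsigned range of the narrower type.

```
/// Narrow one signed 16-bit lane to an unsigned 8-bit lane with saturation.
fn unarrow_i16_to_u8(lane: i16) -> u8 {
    lane.clamp(0, u8::MAX as i16) as u8
}

fn main() {
    assert_eq!(unarrow_i16_to_u8(-5), 0);    // negative lanes saturate to 0
    assert_eq!(unarrow_i16_to_u8(300), 255); // too-large lanes saturate to 255
    assert_eq!(unarrow_i16_to_u8(42), 42);
}
```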
Andrew Brown
65e6de2344 Replace x86_packss with snarrow
Since the Wasm specification contains narrowing instructions (see https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#integer-to-integer-narrowing) that lower to PACKSS*, the x86-specific instruction is not necessary in the CLIF IR.
2020-07-02 09:35:45 -07:00
Chris Fallin
0a59a321bd Merge pull request #1954 from cfallin/b1649432
AArch64: fix shift ops: mask shift amount.
2020-07-01 09:33:29 -07:00
Chris Fallin
a351fa52b5 Merge pull request #1930 from cfallin/spectre-heap
Spectre mitigation on heap access overflow checks.
2020-07-01 09:23:04 -07:00
Chris Fallin
533f1c8d8b Aarch64: fix shift ops: mask shift amount.
The failure to mask the amount triggered a panic due to a subtraction
overflow check; see
https://bugzilla.mozilla.org/show_bug.cgi?id=1649432. Attempting to
shift by an out-of-range amount should be defined to shift by an amount
mod the operand size (i.e., masked to 5 bits for 32-bit shifts, or 6
bits for 64-bit shifts).
2020-07-01 08:57:56 -07:00
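A small sketch of the masking rule in plain Rust: the effective shift amount is the requested amount masked to the operand width, i.e. 5 bits for 32-bit shifts and 6 bits for 64-bit shifts.

```
/// Mask a shift amount modulo the operand width (width must be a power of two).
fn mask_shift_amount(amount: u64, operand_bits: u32) -> u32 {
    (amount as u32) & (operand_bits - 1)
}

fn main() {
    assert_eq!(mask_shift_amount(33, 32), 1); // 33 % 32 == 1
    assert_eq!(mask_shift_amount(64, 64), 0); // shifting by the full width wraps to 0
    assert_eq!(mask_shift_amount(7, 32), 7);  // in-range amounts are unchanged
}
```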
Chris Fallin
e694fb1312 Spectre mitigation on heap access overflow checks.
This PR adds a conditional move following a heap bounds check through
which the address to be accessed flows. This conditional move ensures
that even if the branch is mispredicted (access is actually out of
bounds, but speculation goes down the in-bounds path), the actually
accessed address is zero (a NULL pointer) rather than the out-of-bounds address.

The mitigation is controlled by a flag that is off by default, but can
be set by the embedding. Note that in order to turn it on by default,
we would need to add conditional-move support to the current x86
backend; this does not appear to be present. Once the deprecated
backend is removed in favor of the new backend, IMHO we should turn
this flag on by default.

Note that the mitigation is unnecessary when we use the "huge heap"
technique on 64-bit systems, in which we allocate a range of virtual
address space such that no 32-bit offset can reach other data. Hence,
this only affects small-heap configurations.
2020-07-01 08:36:09 -07:00
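A conceptual sketch of the guarded address computation in plain Rust (not the generated code; the helper and its signature are illustrative): the address that reaches the memory access is selected against 0 rather than relying only on the bounds-check branch.

```
fn guarded_heap_addr(heap_base: u64, heap_size: u64, offset: u64, access_size: u64) -> u64 {
    let in_bounds = offset
        .checked_add(access_size)
        .map_or(false, |end| end <= heap_size);
    // In the generated code this select is the conditional move emitted after the
    // bounds-check branch, so a mispredicted branch still yields address 0.
    if in_bounds { heap_base.wrapping_add(offset) } else { 0 }
}

fn main() {
    assert_eq!(guarded_heap_addr(0x1000, 64, 8, 4), 0x1008);
    assert_eq!(guarded_heap_addr(0x1000, 64, 61, 4), 0);       // 61 + 4 > 64
    assert_eq!(guarded_heap_addr(0x1000, 64, u64::MAX, 4), 0); // offset overflow
}
```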
Andrew Brown
737cf1d605 Implement iabs for x86 SIMD
This only covers the types necessary for implementing the Wasm SIMD spec: `i8x16`, `i16x8`, and `i32x4`.
2020-06-30 14:00:17 -07:00
Alex Crichton
0acd2072c2 Fix doc warnings and link failures (#1948)
Also add configuration to CI to fail doc generation if any links are
broken. Unfortunately we can't blanket deny all warnings in rustdoc
since some are unconditional warnings, but for now this is hopefully
good enough.

Closes #1947
2020-06-30 13:01:49 -05:00
Andrew Brown
c9d573d841 Provide spec-compliant legalization for SIMD floating point min/max 2020-06-25 14:48:16 -07:00
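A scalar sketch of the spec-compliant `min` semantics the legalization must produce (unlike a bare x86 MINPS, NaN must propagate and -0.0 must order below +0.0):

```
fn wasm_fmin(a: f32, b: f32) -> f32 {
    if a.is_nan() || b.is_nan() {
        f32::NAN
    } else if a == 0.0 && b == 0.0 {
        // +0.0 and -0.0 compare equal; the spec requires -0.0 to win.
        if a.is_sign_negative() || b.is_sign_negative() { -0.0 } else { 0.0 }
    } else if a < b {
        a
    } else {
        b
    }
}

fn main() {
    assert!(wasm_fmin(1.0, f32::NAN).is_nan());
    assert!(wasm_fmin(0.0, -0.0).is_sign_negative());
    assert_eq!(wasm_fmin(1.0, 2.0), 1.0);
}
```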