wasmtime

Author	SHA1	Message	Date
Chris Fallin	b8f6d53a6b	Aarch64 codegen: represent bool `true` as -1, not 1. It seems that this is actually the correct behavior for bool types wider than `b1`; some of the vector instruction optimizations depend on bool lanes representing false and true as all-zeroes and all-ones respectively. For `b8`..`b64`, this results in an extra negation after a `cset` when a bool is produced by an `icmp`/`fcmp`, but the most common case (`b1`) is unaffected, because an all-ones one-bit value is just `1`. An example of this assumption can be seen here: `399ee0a54c/cranelift/codegen/src/simple_preopt.rs (L956)` Thanks to Joey Gouly of ARM for noting this issue while implementing SIMD support, and digging into the source (finding the above example) to determine the correct behavior.	2020-07-22 12:30:55 -07:00
Chris Fallin	96ef2f1a1b	Fix `u8::MAX` -> `std::u8::MAX`. (#2047) As per Carlo Kok on Zulip #cranelift, this breaks builds with stable Rust pre-1.43, as `core::u8::MAX` was only stabilized then. We'd like to support older versions if we can easily do so. This PR also adds `cranelift-tools` to the crates checked on CI with Rust 1.41.0, which pulls in all backends (including `aarch64`).	2020-07-20 14:59:15 -05:00
Chris Fallin	9bd9c628aa	Aarch64: mask shift-amounts incorporated into reg-reg-shift ALU insts. We had previously fixed a bug in which constant shift amounts should be masked to modulo the number of bits in the operand; however, we did not fix the analogous case for shifts incorporated into the second register argument of ALU instructions that support integrated shifts. This failure to mask resulted in illegal instructions being generated, e.g. in https://bugzilla.mozilla.org/show_bug.cgi?id=1653502. This PR fixes the issue by masking the amount, as the shift semantics require.	2020-07-17 14:55:23 -07:00
Benjamin Bouvier	ead8a835c4	machinst x64: add more FP support	2020-07-17 15:56:44 +02:00
Anton Kirilov	95b0b05af2	AArch64: Introduce an enum to specify vector instruction operand sizes Copyright (c) 2020, Arm Limited.	2020-07-14 21:37:44 +01:00
Chris Fallin	b93e8c296d	Initial reftype support in aarch64, modulo safepoints. This commit adds the initial support to allow reftypes to flow through the program when targeting aarch64. It also adds a fix to the `ModuleTranslationState` needed to send R32/R64 types over from the SpiderMonkey embedding. This commit does not include any support for safepoints in aarch64 or the `MachInst` infrastructure; that is in the next commit. This commit also makes a drive-by improvement to `Bint`, avoiding an unneeded zero-extension op when the extended value comes directly from a conditional-set (which produces a full-width 0 or 1).	2020-07-14 10:14:18 -07:00
Benjamin Bouvier	9d1bcfb2e8	machinst x64: implement cmov	2020-07-03 14:33:52 +02:00
Chris Fallin	533f1c8d8b	Aarch64: fix shift ops: mask shift amount. The failure to mask the amount triggered a panic due to a subtraction overflow check; see https://bugzilla.mozilla.org/show_bug.cgi?id=1649432. Attempting to shift by an out-of-range amount should be defined to shift by an amount mod the operand size (i.e., masked to 5 bits for 32-bit shifts, or 6 bits for 64-bit shifts).	2020-07-01 08:57:56 -07:00
Chris Fallin	492000e945	MachInst isel and aarch64 backend: docs / clarity improvements. From discussion with Julian and Ben, this PR makes a few documentation- and naming-level changes (no functionality change): - Document that the `LowerCtx`-provided output register can be used as a scratch register during the lowered instruction sequence before placing the final result in it. - Rename `input_to_` helpers in the AArch64 backend to `put_input_in_`, emphasizing that these are side-effecting helpers that potentially generate code (e.g., sign/zero-extensions) to ensure an input value is in a register.	2020-06-18 12:18:50 -07:00
Anton Kirilov	90bafae1dc	AArch64: Implement SIMD floating-point comparisons Copyright (c) 2020, Arm Limited.	2020-06-18 11:07:52 +01:00
Chris Fallin	6286ca7310	AArch64: make use of reg-reg-extend amode. When a load/store instruction needs an address of the form `v0 + uextend(v1)` or `v0 + sextend(v1)` (or the commuted forms thereof), we currently generate a separate zero/sign-extend operation and then use a plain `[rA, rB]` addressing mode. This patch extends `lower_address()` to look at both addends of an address if it has two addends and a zero offset, recognize extension operations, and incorporate them directly into a `[rA, rB, UXTW]` or `[rA, rB, SXTW]` form. This should improve our performance on WebAssembly workloads, at least, because we often see a 64-bit linear memory base indexed by a 32-bit (Wasm) pointer value.	2020-06-12 10:40:54 -07:00
Joey Gouly	df2b031b6a	arm64: Implement Icmp for I16X8 and I32X4 Copyright (c) 2020, Arm Limited.	2020-06-09 11:07:43 -07:00
Anton Kirilov	51a551fb39	Implement vector element extensions for AArch64 This commit also includes load and extend operations. Both are prerequisites for enabling further SIMD spec tests. Copyright (c) 2020, Arm Limited.	2020-06-09 12:28:49 +01:00
Chris Fallin	fc2a6f273b	Three fixes to various SpiderMonkey-related issues: - Properly mask constant values down to appropriate width when generating a constant value directly in aarch64 backend. This was a miscompilation introduced in the new-isel refactor. In combination with failure to respect NarrowValueMode, this resulted in a very subtle bug when an `i32` constant was used in bit-twiddling logic. - Add support for `iadd_ifcout` in aarch64 backend as used in explicit heap-check mode. With this change, we no longer fail heap-related tests with the huge-heap-region mode disabled. - Remove a panic that was occurring in some tests that are currently ignored on aarch64, by simply returning empty/default information in `value_label` functionality rather than touching unimplemented APIs. This is not a bugfix per se, but removes confusing panic messages from `cargo test` output that might otherwise mislead.	2020-06-08 13:02:00 -07:00
Chris Fallin	fe97659813	Address review comments.	2020-06-03 13:31:34 -07:00
Joey Gouly	90a421193f	arm64: add support for I8X16 ICmp Copyright (c) 2020, Arm Limited.	2020-06-02 16:58:09 +01:00
Anton Kirilov	8a928830ac	Enable the wast::Cranelift::spec::simd::simd_store test for AArch64 Copyright (c) 2020, Arm Limited.	2020-05-24 22:53:07 +01:00
Chris Fallin	bdd2873c8c	Address review comments.	2020-05-18 16:25:26 -07:00
Chris Fallin	72e6be9342	Rework of MachInst isel, branch fixups and lowering, and block ordering. This patch includes: - A complete rework of the way that CLIF blocks and edge blocks are lowered into VCode blocks. The new mechanism in `BlockLoweringOrder` computes RPO over the CFG, but with a twist: it merges edge blocks into heads or tails of original CLIF blocks wherever possible, and it does this without ever actually materializing the full nodes-plus-edges graph first. The backend driver lowers blocks in final order so there's no need to reshuffle later. - A new `MachBuffer` that replaces the `MachSection`. This is a special version of a code-sink that is far more than a humble `Vec<u8>`. In particular, it keeps a record of label definitions and label uses, with a machine-pluggable `LabelUse` trait that defines various types of fixups (basically internal relocations). Importantly, it implements some simple peephole-style branch rewrites inline in the emission pass, without any separate traversals over the code to use fallthroughs, swap taken/not-taken arms, etc. It tracks branches at the tail of the buffer and can (i) remove blocks that are just unconditional branches (by redirecting the label), (ii) understand a conditional/unconditional pair and swap the conditional polarity when it's helpful; and (iii) remove branches that branch to the fallthrough PC. The `MachBuffer` also implements branch-island support. On architectures like AArch64, this is needed to allow conditional branches within plausibly-attainable ranges (+/- 1MB on AArch64 specifically). It also does this inline while streaming through the emission, without any sort of fixpoint algorithm or later moving of code, by simply tracking outstanding references and "deadlines" and emitting an island just-in-time when we're in danger of going out of range. - A rework of the instruction selector driver. This is largely following the same algorithm as before, but is cleaned up significantly, in particular in the API: the machine backend can ask for an input arg and get any of three forms (constant, register, producing instruction), indicating it needs the register or can merge the constant or producing instruction as appropriate. This new driver takes special care to emit constants right at use-sites (and at phi inputs), minimizing their live-ranges, and also special-cases the "pinned register" to avoid superfluous moves. Overall, on `bz2.wasm`, the results are: wasmtime full run (compile + runtime) of bz2: baseline: 9774M insns, 9742M cycles, 3.918s w/ changes: 7012M insns, 6888M cycles, 2.958s (24.5% faster, 28.3% fewer insns) clif-util wasm compile bz2: baseline: 2633M insns, 3278M cycles, 1.034s w/ changes: 2366M insns, 2920M cycles, 0.923s (10.7% faster, 10.1% fewer insns) All numbers are averages of two runs on an Ampere eMAG.	2020-05-16 23:08:22 -07:00
Benjamin Bouvier	4c066b1c73	codegen: split lower.rs into multiple files; This splits off lower.rs into two files: lower.rs keeps all the utility functions, while lower_inst.rs contains the (gigantic!) function lowering a single Cranelift instruction into vcode. This is done to satisfy a check done on the maximal file's size when vendoring Rust source code into Mozilla central's repository.	2020-04-30 13:50:45 +02:00
Gabor Greif	d9d69299bb	A few typofixes (#1623) * a few typofixes * more tyops	2020-04-28 19:18:05 -05:00
Benjamin Bouvier	19b5b0cc7b	aarch64: pass a lowering context to gen_copy_reg_to_arg;	2020-04-24 17:41:14 +02:00
Benjamin Bouvier	0b13d8c848	aarch64: copy SP whenever it's involved in an address lowering with an explicit add;	2020-04-24 17:41:14 +02:00
Joey Gouly	f020f0812e	arm64: Implement checks in division / remainder This implements the divide by 0 and signed overflow checks that Wasm specifies. Copyright (c) 2020, Arm Limited.	2020-04-24 17:40:19 +02:00
Benjamin Bouvier	b6e6998713	aarch64: mask rotation counts and share code generation of left and right rotations; Given an integer size N, a left rotation of K places is the same as a right rotation of N - K places. This means we can use right rotations to implement left rotations too. Cranelift's rotation semantics are inherited from WebAssembly, which means the rotation count is truncated modulo the operand's bit size. Note the ROR aarch64 instruction has the same semantics, when both input operands are registers.	2020-04-24 12:36:59 +02:00
Benjamin Bouvier	de92b7e014	aarch64: implement correct float-to-int conversion semantics; These are inherited from wasm semantics.	2020-04-24 11:51:35 +02:00
Chris Fallin	8f462db645	Fix ImmLogic.invert(), and with it, `fcopysign` and `float_misc` test. Previously, `fcopysign` was mysteriously failing to pass the `float_misc` spec test. This was tracked down to bad logical-immediate masks used to separate the sign and not-sign bits. In particular, the masks for the and-not operations were wrong. The `invert()` function on an `ImmLogic` immediate, it turns out, assumed every immediate would be used by a 64-bit instruction; `ImmLogic` immediates are subtly different for 32-bit instructions. This change tracks the instruction size (32 or 64 bits) intended for use with each such immediate, and passes it back into `maybe_from_u64` when computing the inverted immediate. Addresses several of the failures (`float_misc`, `f32_bitwise`) for #1521 (test failures) and presumably helps #1519 (SpiderMonkey integration).	2020-04-23 13:06:13 -07:00
Andrew Brown	fb6e8f784d	Add x86 pack instructions	2020-04-23 10:55:54 -07:00
Andrew Brown	f5fc09f64a	Add x86 unpack instructions	2020-04-23 10:55:54 -07:00
Chris Fallin	d88098744b	Merge pull request #1527 from cfallin/aarch64-fp-vcode-test Add vcode test for floating-point, and fix two FP bugs.	2020-04-21 09:35:23 -07:00
Chris Fallin	297d64b2c0	Merge pull request #1530 from bnjbvr/bbouvier-arm64-fixes Pending arm64 fixes for Spidermonkey integration	2020-04-21 08:08:09 -07:00
Joey Gouly	ad9be0d445	arm64: Support bool constants Copyright (c) 2020, Arm Limited.	2020-04-21 12:24:57 +02:00
Benjamin Bouvier	5b8b75def0	Baldrdash: implement support for sign-extension in returns;	2020-04-21 12:12:56 +02:00
Benjamin Bouvier	241c164e25	Implement pinned register usage through set_pinned_reg/get_pinned_reg;	2020-04-21 12:12:56 +02:00
Chris Fallin	c67fdca60c	Change from review comments: fix comment. Co-Authored-By: bjorn3 <bjorn3@users.noreply.github.com>	2020-04-20 11:19:13 -07:00
bjorn3	1bee1af755	Implement stack_addr for AArch64	2020-04-18 13:24:06 +02:00
Chris Fallin	5e53482a13	arm64: Support less-than-64-bit integers in Bitrev, Clz, Cls, and Popcnt instructions. Includes a temporary bugfix for popcnt with 32-bit operand. The popcnt issue was initially identified by Benjamin Bouvier <public@benj.me>, and the root cause was debugged by Joey Gouly <joey.gouly@arm.com>. This patch is simply a quick fix that zero-extends the operand to 64 bits; Joey plans to contribute a more permanent fix shortly (tracked in #1537).	2020-04-17 16:42:46 -07:00
Andrew Brown	3159f0a76e	Add unimplemented! arm for ConstAddr in Aarch64 (#1548)	2020-04-17 15:03:28 -05:00
Chris Fallin	2b68abed6a	Add vcode test for floating-point, and fix two FP bugs. - Added a filetest for the vcode output of lowering every handled FP opcode. - Fixed two bugs that were discovered while going through the lowerings: - Saturating FP->int operators would return `u{32,64}::MIN` rather than `0` for a NaN input. - `fcopysign` did not mask off the sign bit of the value whose sign is overwritten. These probably would have been caught by Wasm conformance tests soon (and the validity of these lowerings will ultimately be tested this way) but let's get them right by inspection, too!	2020-04-16 13:43:52 -07:00
Chris Fallin	48cf2c2f50	Address review comments: - Undo temporary changes to default features (`all-arch`) and a signal-handler test. - Remove `SIGTRAP` handler: no longer needed now that we've found an "undefined opcode" option on ARM64. - Rename pp.rs to pretty_print.rs in machinst/. - Only use empty stack-probe on non-x86. As per a comment in rust-lang/compiler-builtins [1], LLVM only supports stack probes on x86 and x86-64. Thus, on any other CPU architecture, we cannot refer to `__rust_probestack`, because it does not exist. - Rename arm64 to aarch64. - Use `target` directive in vcode filetests. - Run the flags verifier, but without encinfo, when using new backends. - Clean up warning overrides. - Fix up use of casts: use u32::from(x) and siblings when possible, u32::try_from(x).unwrap() when not, to avoid silent truncation. - Take immutable `Function` borrows as input; we don't actually mutate the input IR. - Lots of other miscellaneous cleanups. [1] `cae3e6ea23/src/probestack.rs (L39)`	2020-04-15 17:21:28 -07:00
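
Commits 533f1c8d8b, 9bd9c628aa and b6e6998713 above all rest on the same rule: a shift or rotate amount is taken modulo the operand width, and a left rotation by K equals a right rotation by N - K. The following is a minimal Rust sketch of that arithmetic only. The helper names `mask_amount` and `rotl_via_rotr` are hypothetical and are not part of the Cranelift API; the actual lowering emits AArch64 instructions rather than calling functions like these.

```rust
/// Hypothetical helper: reduce a shift/rotate amount modulo the operand
/// width in bits. For power-of-two widths this is a simple mask
/// (5 bits for 32-bit operands, 6 bits for 64-bit operands).
fn mask_amount(amount: u64, bits: u32) -> u32 {
    debug_assert!(bits.is_power_of_two());
    (amount & u64::from(bits - 1)) as u32
}

/// Hypothetical helper: implement a 32-bit left rotation using only a
/// right rotation, relying on rotl(x, k) == rotr(x, (32 - k) mod 32).
fn rotl_via_rotr(x: u32, k: u64) -> u32 {
    let k = mask_amount(k, 32);
    // The extra `& 31` maps the k == 0 case (32 - 0 = 32) back to 0.
    x.rotate_right((32 - k) & 31)
}
```

For example, `mask_amount(33, 32)` yields 1, so a 32-bit shift by 33 behaves like a shift by 1, and `rotl_via_rotr(x, 36)` gives the same result as `x.rotate_left(4)`.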

40 Commits
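
Commit b8f6d53a6b describes representing a wide bool `true` as all-ones, produced by negating the 0-or-1 result of a `cset`. The Rust sketch below models that representation with plain integers, assuming a bool of `bits` width is held in the low bits of a `u64`. The functions `bool_to_mask` and `bitselect` are hypothetical illustrations, not Cranelift functions.

```rust
/// Hypothetical model of a `bits`-wide bool: 0 for false, all-ones for true.
/// Mirrors the "cset, then negate" sequence from the commit message.
fn bool_to_mask(value: bool, bits: u32) -> u64 {
    assert!(bits >= 1 && bits <= 64);
    // Mask covering the low `bits` bits (avoids an overflowing shift at 64).
    let width_mask = if bits == 64 { !0u64 } else { (1u64 << bits) - 1 };
    let cset = value as u64; // 0 or 1, like the result of a conditional set
    cset.wrapping_neg() & width_mask // 0 stays 0; 1 becomes all-ones
}

/// Bitwise select: takes bits from `if_true` where `mask` is 1 and from
/// `if_false` elsewhere. This only acts as a whole-value select when the
/// mask is all-zeroes or all-ones.
fn bitselect(mask: u64, if_true: u64, if_false: u64) -> u64 {
    (mask & if_true) | (!mask & if_false)
}
```

With this model, `bool_to_mask(true, 1)` is just 1, which matches the note that `b1` is unaffected, while `bool_to_mask(true, 8)` is `0xFF`. A mask of 1 for a wider bool would make `bitselect` pick only the lowest bit from `if_true`, which is the kind of breakage the commit avoids.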