wasmtime

Author	SHA1	Message	Date
Chris Fallin	6286ca7310	AArch64: make use of reg-reg-extend amode. When a load/store instruction needs an address of the form `v0 + uextend(v1)` or `v0 + sextend(v1)` (or the commuted forms thereof), we currently generate a separate zero/sign-extend operation and then use a plain `[rA, rB]` addressing mode. This patch extends `lower_address()` to look at both addends of an address if it has two addends and a zero offset, recognize extension operations, and incorporate them directly into a `[rA, rB, UXTW]` or `[rA, rB, SXTW]` form. This should improve our performance on WebAssembly workloads, at least, because we often see a 64-bit linear memory base indexed by a 32-bit (Wasm) pointer value.	2020-06-12 10:40:54 -07:00
Chris Fallin	b0cccf1d87	Merge pull request #1864 from jgouly/bitwise arm64: Implement SIMD bitwise operations	2020-06-11 11:38:39 -07:00
Chris Fallin	6ba165be01	Merge pull request #1858 from cfallin/fix-scale-b1 Bugfix: scaled addressing mode: round B1 up to one byte.	2020-06-11 11:16:07 -07:00
Joey Gouly	544c5dece5	arm64: Implement SIMD bitwise operations Copyright (c) 2020, Arm Limited.	2020-06-11 10:58:23 -07:00
Chris Fallin	ed7e410111	Bugfix: scaled addressing mode: round B1 up to one byte. Issue uncovered by Ben Bouvier during regalloc work.	2020-06-11 10:27:32 -07:00
Chris Fallin	a84c1931a0	Merge pull request #1854 from akirilov-arm/simd_load_splat Enable the wast::Cranelift::spec::simd::simd_load_splat test for AArch64	2020-06-10 12:11:29 -07:00
Johnnie Birch	48f0b10c7a	Add initial scalar FP operations (addss, subss, etc) to x64 backend. Adds support for addss and subss. This is the first lowering for SSE floating-point ALU and some move operations. The changes here do some renaming of data structures and add a couple of new ones to support SSE-specific operations. The work done here will likely evolve as needed to support an efficient, intuitive, and consistent framework.	2020-06-10 18:36:57 +02:00
Anton Kirilov	d941034c2e	Enable the wast::Cranelift::spec::simd::simd_load_splat test for AArch64 Copyright (c) 2020, Arm Limited.	2020-06-10 15:01:37 +01:00
Chris Fallin	ac87ed12bd	Merge pull request #1847 from akirilov-arm/simd_load_extend Enable the wast::Cranelift::spec::simd::simd_load_extend test for AArch64	2020-06-09 12:29:06 -07:00
Joey Gouly	df2b031b6a	arm64: Implement Icmp for I16X8 and I32X4 Copyright (c) 2020, Arm Limited.	2020-06-09 11:07:43 -07:00
Anton Kirilov	7ac19af498	Enable the wast::Cranelift::spec::simd::simd_load_extend test for AArch64 Copyright (c) 2020, Arm Limited.	2020-06-09 18:05:38 +01:00
Anton Kirilov	51a551fb39	Implement vector element extensions for AArch64 This commit also includes load and extend operations. Both are prerequisites for enabling further SIMD spec tests. Copyright (c) 2020, Arm Limited.	2020-06-09 12:28:49 +01:00
Chris Fallin	e3d89c8a92	Merge pull request #1825 from cfallin/spidermonkey-fixes Three fixes to various SpiderMonkey-related issues	2020-06-08 13:54:13 -07:00
Chris Fallin	fc2a6f273b	Three fixes to various SpiderMonkey-related issues: - Properly mask constant values down to appropriate width when generating a constant value directly in aarch64 backend. This was a miscompilation introduced in the new-isel refactor. In combination with failure to respect NarrowValueMode, this resulted in a very subtle bug when an `i32` constant was used in bit-twiddling logic. - Add support for `iadd_ifcout` in aarch64 backend as used in explicit heap-check mode. With this change, we no longer fail heap-related tests with the huge-heap-region mode disabled. - Remove a panic that was occurring in some tests that are currently ignored on aarch64, by simply returning empty/default information in `value_label` functionality rather than touching unimplemented APIs. This is not a bugfix per-se, but removes confusing panic messages from `cargo test` output that might otherwise mislead.	2020-06-08 13:02:00 -07:00
whitequark	bc555468a7	cranelift: add i64.{ishl,ushr,ashr} libcalls. These libcalls are useful for 32-bit platforms. On x86_32 in particular, commit `4ec16fa0` added support for legalizing 64-bit shifts through SIMD operations. However, that legalization requires SIMD to be enabled and SSE 4.1 to be supported, which is not acceptable as a hard requirement.	2020-06-05 12:13:49 -07:00
Yury Delendik	6f37204f82	Upgrade gimli to 0.21 (#1819) * Use gimli 0.21 * rm CFI w Expression * Don't write .debug_frame twice	2020-06-04 14:34:05 -05:00
Andrew Brown	1ea09088be	Add x86 legalization for imul.i64x2 for non-AVX CPUs The `convert_i64x2_imul` custom legalization checks the ISA flags for AVX512DQ or AVX512VL support and legalizes `imul.i64x2` to an `x86_pmullq` in this case; if not, it uses a lengthy SSE2-compatible instruction sequence.	2020-06-03 16:27:57 -07:00
Andrew Brown	df171f01b5	Add x86_pmuludq This instruction multiplies the lower 32 bits of two 64x2 unsigned integers into an i64x2; this is necessary for lowering Wasm's i64x2.mul.	2020-06-03 16:27:57 -07:00
Andrew Brown	40f31375a5	Add TargetIsa::as_any for downcasting to specific ISA implementations This is necessary when we would like to check specific ISA flags, e.g.	2020-06-03 16:27:57 -07:00
Andrew Brown	9ba9fd0f64	Add x86-specific instruction for i64x2 multiplication Without this special instruction, legalizing to the AVX512 instruction AND the SSE instruction sequence is impossible. This extra instruction would be rendered unnecessary by the x64 backend.	2020-06-03 16:27:57 -07:00
Chris Fallin	fe97659813	Address review comments.	2020-06-03 13:31:34 -07:00
Chris Fallin	615362068f	Multi-value return support.	2020-06-03 13:31:34 -07:00
Chris Fallin	9fec933056	Merge pull request #1801 from jgouly/cmp-rebase arm64: add support for I8X16 ICmp	2020-06-02 09:35:41 -07:00
Joey Gouly	90a421193f	arm64: add support for I8X16 ICmp Copyright (c) 2020, Arm Limited.	2020-06-02 16:58:09 +01:00
Benjamin Bouvier	67c7a3ed19	mach backend: reduce the size of the Inst enum down to 32 bytes;	2020-06-02 16:29:05 +02:00
Benjamin Bouvier	e227608510	mach backend: use vectors instead of sets to remember set of uses/defs for calls; This avoids the set uniqueness (hashing) test, reduces memory churn when re-mapping virtual register onto real registers, and is generally more memory-efficient.	2020-06-02 16:29:05 +02:00
Benjamin Bouvier	cfa0527794	mach backend: have mem_finalize return a SmallVec; This avoids a spurious reallocation of the SmallVec containing the load_constants result to a Vec, which appeared in dhat profiles.	2020-06-02 16:29:05 +02:00
Nick Fitzgerald	7c68a10ed6	Merge pull request #1670 from teapotd/win64-pass-by-ref Implement passing arguments by ref for win64 ABI	2020-06-01 11:13:30 -07:00
Andrew Brown	0dd77d36f8	Rename BinaryImm format to BinaryImm64	2020-05-29 19:56:27 -07:00
Andrew Brown	a27a079d65	Replace ExtractLane format with BinaryImm8 Like https://github.com/bytecodealliance/wasmtime/pull/1762, this changes the name of the `ExtractLane` format to the more-general `BinaryImm8` and renames its immediate argument from `lane` to `imm`.	2020-05-29 19:56:27 -07:00
Andrew Brown	7d6e94b952	Replace InsertLane format with TernaryImm8 The InsertLane format has an ordering (`value().imm().value()`) and immediate name (`"lane"`) that make it awkward to use for other instructions. This changes the ordering (`value().value().imm()`) and uses the default name (`"imm"`) throughout the codebase.	2020-05-29 19:56:27 -07:00
teapotd	e430984ac4	Improve bitselect codegen with knowledge of operand origin (#1783) * Encode vselect using BLEND instructions on x86 * Legalize vselect to bitselect * Optimize bitselect to vselect for some operands * Add run tests for bitselect-vselect optimization * Address review feedback	2020-05-29 19:53:11 -07:00
teapotd	759cc3e751	Implement passing arguments by ref for win64 ABI	2020-05-29 20:12:41 +02:00
Nick Fitzgerald	94380bf2b7	Merge pull request #1510 from teapotd/abi-i128-fix Always check if struct-return parameter is needed	2020-05-29 10:02:16 -07:00
teapotd	0f55bb4b8d	Always check if struct-return parameter is needed	2020-05-25 20:03:24 +02:00
Anton Kirilov	8a928830ac	Enable the wast::Cranelift::spec::simd::simd_store test for AArch64 Copyright (c) 2020, Arm Limited.	2020-05-24 22:53:07 +01:00
Chris Fallin	73537e72c0	Merge pull request #1732 from jgouly/copysign-fpu arm64: Use FPU instructions for Fcopysign	2020-05-22 17:25:33 -07:00
Peter Huene	ce5f3e153b	Only update XMM save unwind operation offsets when using a FP. This commit prevents updating the XMM save unwind operation offsets when a frame pointer is not used, even though currently Cranelift always uses a frame pointer. This will prevent incorrect unwind information in the future when we start omitting frame pointers.	2020-05-21 16:46:30 -07:00
Peter Huene	2cd5ed1880	Address code review feedback.	2020-05-21 15:57:11 -07:00
Joey Gouly	02c3f238f8	arm64: Use FPU instructions for Fcopysign Copyright (c) 2020, Arm Limited.	2020-05-21 18:14:12 +01:00
Peter Huene	78c3091e84	Fix FPR saving and shadow space allocation for Windows x64. This commit fixes both how FPR callee-saved registers are saved and how the shadow space allocation occurs when laying out the stack for Windows x64 calling convention. Importantly, this commit removes the compiler limitation of stack size for Windows x64 that was imposed because FPR saves previously couldn't always be represented in the unwind information. The FPR saves are now performed without using stack slots, much like how the callee-saved GPRs are saved. The total CSR space is given to `layout_stack` so that it is included in the frame size and to offset the layout of spills and explicit slots. The FPR saves are now done via an RSP offset (post adjustment) and they always follow the GPR saves on the stack. A simpler calculation can now be made to determine the proper offsets of the FPR saves for representing the unwind information. Additionally, the shadow space is no longer treated as an incoming argument, but an explicit stack slot that gets laid out at the lowest address possible in the local frame. This prevents `layout_stack` from putting a spill or explicit slot in this reserved space. In the future, `layout_stack` should take advantage of the caller-provided shadow space for spills, but this commit does not attempt to address that. The shadow space is now omitted from the local frame for leaf functions. Fixes #1728. Fixes #1587. Fixes #1475.	2020-05-20 15:37:30 -07:00
Chris Fallin	c9e3b71c39	Merge pull request #1729 from cfallin/machinst-branch-opt Fix MachBuffer branch optimization.	2020-05-20 14:43:57 -07:00
Benjamin Bouvier	1f620e1b46	cranelift: bump regalloc.rs to 0.0.24 and adapt to latest API changes;	2020-05-20 15:37:15 +02:00
Chris Fallin	e11094b28b	Fix MachBuffer branch optimization. This patch fixes a subtle bug that occurred in the MachBuffer branch optimization: in tracking labels at the current buffer tail using a sorted-by-offset array, the code did not update this array properly when redirecting labels. As a result, the dead-branch removal was unsafe, because not every label pointing to a branch is guaranteed to be redirected properly first. Discovered while doing performance testing: bz2 silently took a wrong branch and exited compression early. (Eek!) To address this problem, this patch adopts a slightly simpler data structure: we only track the labels at the current buffer tail, and at the start of each branch, and we're careful to update these appropriately to maintain the invariants. I'm pretty confident that this is correct now, but we should (still) fuzz it a bunch, because wrong control flow scares me a nonzero amount. I should probably also actually write out a formal proof that these data-structure updates are correct. The optimizations are important for performance (removing useless empty blocks, and taking advantage of any fallthrough opportunities at all), so I don't think we would want to drop them entirely.	2020-05-19 18:09:18 -07:00
Chris Fallin	bdd2873c8c	Address review comments.	2020-05-18 16:25:26 -07:00
Chris Fallin	687aca00fe	Update x64 backend to use new lowering APIs.	2020-05-18 16:25:15 -07:00
Chris Fallin	72e6be9342	Rework of MachInst isel, branch fixups and lowering, and block ordering. This patch includes: - A complete rework of the way that CLIF blocks and edge blocks are lowered into VCode blocks. The new mechanism in `BlockLoweringOrder` computes RPO over the CFG, but with a twist: it merges edge blocks into heads or tails of original CLIF blocks wherever possible, and it does this without ever actually materializing the full nodes-plus-edges graph first. The backend driver lowers blocks in final order so there's no need to reshuffle later. - A new `MachBuffer` that replaces the `MachSection`. This is a special version of a code-sink that is far more than a humble `Vec<u8>`. In particular, it keeps a record of label definitions and label uses, with a machine-pluggable `LabelUse` trait that defines various types of fixups (basically internal relocations). Importantly, it implements some simple peephole-style branch rewrites inline in the emission pass, without any separate traversals over the code to use fallthroughs, swap taken/not-taken arms, etc. It tracks branches at the tail of the buffer and can (i) remove blocks that are just unconditional branches (by redirecting the label), (ii) understand a conditional/unconditional pair and swap the conditional polarity when it's helpful, and (iii) remove branches that branch to the fallthrough PC. The `MachBuffer` also implements branch-island support. On architectures like AArch64, this is needed to allow conditional branches within plausibly-attainable ranges (+/- 1MB on AArch64 specifically). It also does this inline while streaming through the emission, without any sort of fixpoint algorithm or later moving of code, by simply tracking outstanding references and "deadlines" and emitting an island just-in-time when we're in danger of going out of range. - A rework of the instruction selector driver. This is largely following the same algorithm as before, but is cleaned up significantly, in particular in the API: the machine backend can ask for an input arg and get any of three forms (constant, register, producing instruction), indicating it needs the register or can merge the constant or producing instruction as appropriate. This new driver takes special care to emit constants right at use-sites (and at phi inputs), minimizing their live-ranges, and also special-cases the "pinned register" to avoid superfluous moves. Overall, on `bz2.wasm`, the results are: wasmtime full run (compile + runtime) of bz2: baseline: 9774M insns, 9742M cycles, 3.918s w/ changes: 7012M insns, 6888M cycles, 2.958s (24.5% faster, 28.3% fewer insns) clif-util wasm compile bz2: baseline: 2633M insns, 3278M cycles, 1.034s w/ changes: 2366M insns, 2920M cycles, 0.923s (10.7% faster, 10.1% fewer insns) All numbers are averages of two runs on an Ampere eMAG.	2020-05-16 23:08:22 -07:00
Joey Gouly	f418b7a700	Reduce arm64 Inst enum size This reduces the size of the Inst enum from 112 bytes to 48 bytes. Using DHAT on a regex-rs.wasm benchmark (`valgrind --tool=dhat clif-util compile --target aarch64`), the total number of allocated bytes drops by around 170 MB, and the figure at t-gmax drops by 3 MB. Using `perf stat clif-util compile --target aarch64`, the instruction count dropped by 0.6%. Cache misses dropped by 6%. Cycles dropped by 2.3%.	2020-05-14 15:45:55 +01:00
Benjamin Bouvier	07c55fa50f	aarch64: suggest a scratch register that's not caller-saved; If the scratch register is caller-saved, then it might appear in fixed ranges because of call clobbers. Instead, use a register that's not caller-saved and has no predefined use in the ABI.	2020-05-13 10:56:32 +02:00
Chris Fallin	ee2f861fdd	Merge pull request #1674 from cfallin/machinst-reg-universe-opt MachInst backend: don't reallocate RealRegUniverses for each function compilation.	2020-05-09 14:10:26 -07:00

698 Commits