wasmtime

Author	SHA1	Message	Date
Benjamin Bouvier	ead8a835c4	machinst x64: add more FP support	2020-07-17 15:56:44 +02:00
Chris Fallin	26529006e0	Address review comments.	2020-07-14 10:17:29 -07:00
Chris Fallin	08353fcc14	Reftypes part two: add support for stackmaps. This commit adds support for generating stackmaps at safepoints to the new backend framework and to the AArch64 backend in particular. It has been tested to work with SpiderMonkey.	2020-07-14 10:17:27 -07:00
Chris Fallin	492000e945	MachInst isel and aarch64 backend: docs / clarity improvements. From discussion with Julian and Ben, this PR makes a few documentation- and naming-level changes (no functionality change): - Document that the `LowerCtx`-provided output register can be used as a scratch register during the lowered instruction sequence before placing the final result in it. - Rename `input_to_` helpers in the AArch64 backend to `put_input_in_`, emphasizing that these are side-effecting helpers that potentially generate code (e.g., sign/zero-extensions) to ensure an input value is in a register.	2020-06-18 12:18:50 -07:00
Chris Fallin	02ae1b4464	Merge pull request #1846 from julian-seward1/better-phis Rewrite `lower_edge` to produce better phi-translations:	2020-06-09 09:56:52 -07:00
Julian Seward	6d25759c8e	Rewrite `lower_edge` to produce better phi-translations: * ensure that all const assignments are placed at the end of the sequence. This minimises live ranges. * for the non-const assignments, ignore self-assignments. This can dramatically reduce the total number of moves generated, because any self-assignments trigger the overlap-case handling, hence invoking the double-copy behaviour in cases where it's not necessary. It's worth pointing out that self-assignments are common, and are not due to deficiencies in CLIR optimisation. Rather, they occur whenever a loop back edge doesn't modify all loop-carried values. This can easily happen if the loop has multiple "early" back-edges -- "continues" in C parlance. Eg: loop_header(a, b, c, d, e, f): ... a_new = ... b_new = ... if (..) goto loop_header(a_new, b_new, c, d, e, f) ... c_new = ... d_new = ... if (..) goto loop_header(a_new, b_new, c_new, d_new, e, f) etc For functions with many live values, this can dramatically reduce the number of spill moves we throw into the register allocator. In terms of compilation costs, this ranges from neutral for functions which spill not at all, or minimally (joey_small, joey_med) to a 7.1% reduction in insn count. In terms of run costs, for one spill-heavy test (bz2 w/ custom timing harness), instruction counts are reduced by 4.3%, data reads by 12.3% and data writes by 18.5%. Note those last two figures include all reads and writes made by the generated code, not just spills/reloads, so the proportional reduction in spill/reload traffic must be greater.	2020-06-09 10:36:32 +02:00
Chris Fallin	fc2a6f273b	Three fixes to various SpiderMonkey-related issues: - Properly mask constant values down to appropriate width when generating a constant value directly in aarch64 backend. This was a miscompilation introduced in the new-isel refactor. In combination with failure to respect NarrowValueMode, this resulted in a very subtle bug when an `i32` constant was used in bit-twiddling logic. - Add support for `iadd_ifcout` in aarch64 backend as used in explicit heap-check mode. With this change, we no longer fail heap-related tests with the huge-heap-region mode disabled. - Remove a panic that was occurring in some tests that are currently ignored on aarch64, by simply returning empty/default information in `value_label` functionality rather than touching unimplemented APIs. This is not a bugfix per-se, but removes confusing panic messages from `cargo test` output that might otherwise mislead.	2020-06-08 13:02:00 -07:00
Chris Fallin	fe97659813	Address review comments.	2020-06-03 13:31:34 -07:00
Chris Fallin	615362068f	Multi-value return support.	2020-06-03 13:31:34 -07:00
Anton Kirilov	8a928830ac	Enable the wast::Cranelift::spec::simd::simd_store test for AArch64 Copyright (c) 2020, Arm Limited.	2020-05-24 22:53:07 +01:00
Chris Fallin	bdd2873c8c	Address review comments.	2020-05-18 16:25:26 -07:00
Chris Fallin	72e6be9342	Rework of MachInst isel, branch fixups and lowering, and block ordering. This patch includes: - A complete rework of the way that CLIF blocks and edge blocks are lowered into VCode blocks. The new mechanism in `BlockLoweringOrder` computes RPO over the CFG, but with a twist: it merges edge blocks into heads or tails of original CLIF blocks wherever possible, and it does this without ever actually materializing the full nodes-plus-edges graph first. The backend driver lowers blocks in final order so there's no need to reshuffle later. - A new `MachBuffer` that replaces the `MachSection`. This is a special version of a code-sink that is far more than a humble `Vec<u8>`. In particular, it keeps a record of label definitions and label uses, with a machine-pluggable `LabelUse` trait that defines various types of fixups (basically internal relocations). Importantly, it implements some simple peephole-style branch rewrites inline in the emission pass, without any separate traversals over the code to use fallthroughs, swap taken/not-taken arms, etc. It tracks branches at the tail of the buffer and can (i) remove blocks that are just unconditional branches (by redirecting the label), (ii) understand a conditional/unconditional pair and swap the conditional polarity when it's helpful; and (iii) remove branches that branch to the fallthrough PC. The `MachBuffer` also implements branch-island support. On architectures like AArch64, this is needed to allow conditional branches within plausibly-attainable ranges (+/- 1MB on AArch64 specifically). It also does this inline while streaming through the emission, without any sort of fixpoint algorithm or later moving of code, by simply tracking outstanding references and "deadlines" and emitting an island just-in-time when we're in danger of going out of range. - A rework of the instruction selector driver. This is largely following the same algorithm as before, but is cleaned up significantly, in particular in the API: the machine backend can ask for an input arg and get any of three forms (constant, register, producing instruction), indicating it needs the register or can merge the constant or producing instruction as appropriate. This new driver takes special care to emit constants right at use-sites (and at phi inputs), minimizing their live-ranges, and also special-cases the "pinned register" to avoid superfluous moves. Overall, on `bz2.wasm`, the results are: wasmtime full run (compile + runtime) of bz2: baseline: 9774M insns, 9742M cycles, 3.918s w/ changes: 7012M insns, 6888M cycles, 2.958s (24.5% faster, 28.3% fewer insns) clif-util wasm compile bz2: baseline: 2633M insns, 3278M cycles, 1.034s w/ changes: 2366M insns, 2920M cycles, 0.923s (10.7% faster, 10.1% fewer insns) All numbers are averages of two runs on an Ampere eMAG.	2020-05-16 23:08:22 -07:00
Benjamin Bouvier	19d8a7f1fb	machinst: Reuse memory across loop iterations in lowering;	2020-05-07 12:24:02 +02:00
Benjamin Bouvier	b24b711c16	machinst: Reduce the number of vec allocations for edge blocks;	2020-05-07 12:24:02 +02:00
Benjamin Bouvier	1d90751ba9	machinst: Avoid a full instructions traversal of all the blocks when computing the final block ordering;	2020-05-06 15:13:25 +02:00
Chris Fallin	e39b4aba1c	Fix long-range (non-colocated) aarch64 calls to not use Arm64Call reloc, and fix simplejit to use it. Previously, every call was lowered on AArch64 to a `call` instruction, which takes a signed 26-bit PC-relative offset. Including the 2-bit left shift, this gives a range of +/- 128 MB. Longer-distance offsets would cause an impossible relocation record to be emitted (or rather, a record that a more sophisticated linker would fix up by inserting a shim/veneer). This commit adds a notion of "relocation distance" in the MachInst backends, and provides this information for every call target and symbol reference. The intent is that backends on architectures like AArch64, where there are different offset sizes / addressing strategies to choose from, can either emit a regular call or a load-64-bit-constant / call-indirect sequence, as necessary. This avoids the need to implement complex linking behavior. The MachInst driver code provides this information based on the "colocated" bit in the CLIF symbol references, which appears to have been designed for this purpose, or at least a similar one. Combined with the `use_colocated_libcalls` setting, this allows client code to ensure that library calls can link to library code at any location in the address space. Separately, the `simplejit` example did not handle `Arm64Call`; rather than doing so, it appears all that is necessary to get its tests to pass is to set the `use_colocated_libcalls` flag to false, to make use of the above change. This fixes the `libcall_function` unit-test in this crate.	2020-05-05 09:55:12 -07:00
Benjamin Bouvier	698dc9c401	Fixes #1619 : Properly bubble up errors when seeing an unexpected type during lowering.	2020-04-29 10:26:22 +02:00
Chris Fallin	b691770faa	MachInst backend: pass through SourceLoc information. This change adds SourceLoc information per instruction in a `VCode<Inst>` container, and keeps this information up-to-date across register allocation and branch reordering. The information is initially collected during instruction lowering, eventually collected on the MachSection, and finally provided to the environment that wraps the codegen crate for wasmtime.	2020-04-24 13:18:01 -07:00
Benjamin Bouvier	5b8b75def0	Baldrdash: implement support for sign-extension in returns;	2020-04-21 12:12:56 +02:00
Benjamin Bouvier	d1b5df31fd	Baldrdash: use the right frame offset when loading arguments from the stack	2020-04-21 12:12:56 +02:00
bjorn3	1bee1af755	Implement stack_addr for AArch64	2020-04-18 13:24:06 +02:00
Chris Fallin	48cf2c2f50	Address review comments: - Undo temporary changes to default features (`all-arch`) and a signal-handler test. - Remove `SIGTRAP` handler: no longer needed now that we've found an "undefined opcode" option on ARM64. - Rename pp.rs to pretty_print.rs in machinst/. - Only use empty stack-probe on non-x86. As per a comment in rust-lang/compiler-builtins [1], LLVM only supports stack probes on x86 and x86-64. Thus, on any other CPU architecture, we cannot refer to `__rust_probestack`, because it does not exist. - Rename arm64 to aarch64. - Use `target` directive in vcode filetests. - Run the flags verifier, but without encinfo, when using new backends. - Clean up warning overrides. - Fix up use of casts: use u32::from(x) and siblings when possible, u32::try_from(x).unwrap() when not, to avoid silent truncation. - Take immutable `Function` borrows as input; we don't actually mutate the input IR. - Lots of other miscellaneous cleanups. [1] `cae3e6ea23/src/probestack.rs (L39)`	2020-04-15 17:21:28 -07:00
Chris Fallin	d83574261c	ARM64 backend, part 3 / 11: MachInst infrastructure. This patch adds the MachInst, or Machine Instruction, infrastructure. This is the machine-independent portion of the new backend design. It contains the implementation of the "vcode" (virtual-registerized code) container, the top-level lowering algorithm and compilation pipeline, and the trait definitions that the machine backends will fill in. This backend infrastructure is included in the compilation of the `codegen` crate, but it is not yet tied into the public APIs; that patch will come last, after all the other pieces are filled in. This patch contains code written by Julian Seward <jseward@acm.org> and Benjamin Bouvier <public@benj.me>, originally developed on a side-branch before rebasing and condensing into this patch series. See the `arm64` branch at `https://github.com/cfallin/wasmtime` for original development history. Co-authored-by: Julian Seward <jseward@acm.org> Co-authored-by: Benjamin Bouvier <public@benj.me>	2020-04-11 17:51:11 -07:00

23 Commits