wasmtime

Author	SHA1	Message	Date
Alex Crichton	0acd2072c2	Fix doc warnings and link failures (#1948) Also add configuration to CI to fail doc generation if any links are broken. Unfortunately we can't blanket deny all warnings in rustdoc since some are unconditional warnings, but for now this is hopefully good enough. Closes #1947	2020-06-30 13:01:49 -05:00
Benjamin Bouvier	c9a3f05afd	machinst x64: implement calls and int cmp/store/loads; This makes it possible to run a simple recursive fibonacci function in wasmtime.	2020-06-25 16:20:33 +02:00
Chris Fallin	13e12908a6	MachBuffer branch opts: comments approximating a semi-formal correctness proof.	2020-05-20 14:12:19 -07:00
Chris Fallin	80ab154d04	Update from review comments.	2020-05-20 12:35:36 -07:00
Chris Fallin	e11094b28b	Fix MachBuffer branch optimization. This patch fixes a subtle bug that occurred in the MachBuffer branch optimization: in tracking labels at the current buffer tail using a sorted-by-offset array, the code did not update this array properly when redirecting labels. As a result, the dead-branch removal was unsafe, because not every label pointing to a branch is guaranteed to be redirected properly first. Discovered while doing performance testing: bz2 silently took a wrong branch and exited compression early. (Eek!) To address this problem, this patch adopts a slightly simpler data structure: we only track the labels at the current buffer tail, and at the start of each branch, and we're careful to update these appropriately to maintain the invariants. I'm pretty confident that this is correct now, but we should (still) fuzz it a bunch, because wrong control flow scares me a nonzero amount. I should probably also actually write out a formal proof that these data-structure updates are correct. The optimizations are important for performance (removing useless empty blocks, and taking advantage of any fallthrough opportunities at all), so I don't think we would want to drop them entirely.	2020-05-19 18:09:18 -07:00
Chris Fallin	bdd2873c8c	Address review comments.	2020-05-18 16:25:26 -07:00
Chris Fallin	72e6be9342	Rework of MachInst isel, branch fixups and lowering, and block ordering. This patch includes: - A complete rework of the way that CLIF blocks and edge blocks are lowered into VCode blocks. The new mechanism in `BlockLoweringOrder` computes RPO over the CFG, but with a twist: it merges edge blocks into heads or tails of original CLIF blocks wherever possible, and it does this without ever actually materializing the full nodes-plus-edges graph first. The backend driver lowers blocks in final order so there's no need to reshuffle later. - A new `MachBuffer` that replaces the `MachSection`. This is a special version of a code-sink that is far more than a humble `Vec<u8>`. In particular, it keeps a record of label definitions and label uses, with a machine-pluggable `LabelUse` trait that defines various types of fixups (basically internal relocations). Importantly, it implements some simple peephole-style branch rewrites inline in the emission pass, without any separate traversals over the code to use fallthroughs, swap taken/not-taken arms, etc. It tracks branches at the tail of the buffer and can (i) remove blocks that are just unconditional branches (by redirecting the label), (ii) understand a conditional/unconditional pair and swap the conditional polarity when it's helpful; and (iii) remove branches that branch to the fallthrough PC. The `MachBuffer` also implements branch-island support. On architectures like AArch64, this is needed to allow conditional branches within plausibly-attainable ranges (+/- 1MB on AArch64 specifically). It also does this inline while streaming through the emission, without any sort of fixpoint algorithm or later moving of code, by simply tracking outstanding references and "deadlines" and emitting an island just-in-time when we're in danger of going out of range. - A rework of the instruction selector driver. This is largely following the same algorithm as before, but is cleaned up significantly, in particular in the API: the machine backend can ask for an input arg and get any of three forms (constant, register, producing instruction), indicating it needs the register or can merge the constant or producing instruction as appropriate. This new driver takes special care to emit constants right at use-sites (and at phi inputs), minimizing their live-ranges, and also special-cases the "pinned register" to avoid superfluous moves. Overall, on `bz2.wasm`, the results are: wasmtime full run (compile + runtime) of bz2: baseline: 9774M insns, 9742M cycles, 3.918s w/ changes: 7012M insns, 6888M cycles, 2.958s (24.5% faster, 28.3% fewer insns) clif-util wasm compile bz2: baseline: 2633M insns, 3278M cycles, 1.034s w/ changes: 2366M insns, 2920M cycles, 0.923s (10.7% faster, 10.1% fewer insns) All numbers are averages of two runs on an Ampere eMAG.	2020-05-16 23:08:22 -07:00
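
The `MachBuffer` commits above describe removing branches to the fallthrough PC while tracking the labels at the buffer tail and at the start of each branch. The following is a minimal toy sketch of that idea only, not the actual Cranelift code: `ToyBuffer`, `Inst`, `emit`, `bind_label`, and `example` are hypothetical names, offsets are instruction indices rather than byte offsets, and conditional branches and branch islands are not modeled.

```rust
/// Toy instruction set (hypothetical): just enough to show branch elision.
#[derive(Debug, Clone, PartialEq)]
enum Inst {
    Op(&'static str),
    /// Unconditional jump to a label id.
    Jump(usize),
}

/// Hypothetical stand-in for a code buffer; offsets are instruction indices.
struct ToyBuffer {
    insts: Vec<Inst>,
    /// For each emitted instruction, the labels bound at its start.
    labels_at_start: Vec<Vec<usize>>,
    /// Labels bound at the current tail (after the last instruction).
    labels_at_tail: Vec<usize>,
    /// Resolved offset of each label, if bound.
    label_offset: Vec<Option<usize>>,
}

impl ToyBuffer {
    fn new(num_labels: usize) -> Self {
        ToyBuffer {
            insts: Vec::new(),
            labels_at_start: Vec::new(),
            labels_at_tail: Vec::new(),
            label_offset: vec![None; num_labels],
        }
    }

    /// Append an instruction; labels that were at the tail now mark its start.
    fn emit(&mut self, inst: Inst) {
        self.insts.push(inst);
        let at_start = std::mem::take(&mut self.labels_at_tail);
        self.labels_at_start.push(at_start);
    }

    /// Bind `label` at the tail, first deleting any trailing jump to it,
    /// since such a jump would only branch to the fallthrough position.
    fn bind_label(&mut self, label: usize) {
        while self.insts.last() == Some(&Inst::Jump(label)) {
            self.insts.pop();
            // Labels at the removed jump's start, plus labels that were
            // already at the tail, all sit at the new (shorter) tail.
            let mut moved = self.labels_at_start.pop().unwrap_or_default();
            moved.append(&mut self.labels_at_tail);
            self.labels_at_tail = moved;
        }
        let tail = self.insts.len();
        self.labels_at_tail.push(label);
        // Re-point every tail label so that none is left stale.
        for &l in &self.labels_at_tail {
            self.label_offset[l] = Some(tail);
        }
    }
}

/// Small usage example: both jumps turn out to be fallthroughs.
fn example() -> ToyBuffer {
    let mut buf = ToyBuffer::new(3);
    buf.emit(Inst::Op("a"));
    buf.emit(Inst::Jump(1));
    buf.bind_label(0);
    buf.emit(Inst::Jump(2));
    buf.bind_label(2);
    buf.bind_label(1);
    buf
}
```

In `example`, both jumps are removed and all three labels end up at index 1. Dropping the label bookkeeping in `bind_label` would leave label 0 pointing past the end of the shortened buffer, which is the kind of stale-label hazard the fix in e11094b28b describes.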

7 Commits