wasmtime

Author	SHA1	Message	Date
Dan Gohman	8d5fecd324	Format with stable rustfmt-preview, then with rustfmt-0.9 again.	2018-03-30 13:17:15 -07:00
Dan Gohman	4af95e37a6	Convert regular comments to documentation comments.	2018-03-28 22:48:30 -07:00
Dan Gohman	57cd69d8b4	Say "IR" instead of "IL". While the specifics of these terms are debatable, "IR" generally isn't incorrect in this context, and is the more widely recognized term at this time. See also the discussion in #267. Fixes #267.	2018-03-28 22:07:26 -07:00
Dan Gohman	7b51edd285	Fix spelling in a comment to be consistent with the code.	2018-03-28 08:58:58 -07:00
Dan Gohman	14faef7374	Derive Debug for LiveValue.	2018-03-26 21:21:54 -07:00
Dan Gohman	af8ac8f8ca	Avoid calling `analyze_branch()` when the `BranchInfo` is not needed. It's faster to just call `opcode().is_branch()`.	2018-03-22 13:18:25 -07:00
Pat Hickey	03ee007624	Use clippy (#276) * cton-util: fix some clippy unnecessary pass-by-value warnings * clippy: ignore too many arguments / cyclomatic complexity in module since these functions are taking args coming from the command line, I don't think this is actually a valid lint, morally the arguments are all from one structure * cton-util: take care of remaining clippy warnings * cton-reader: fix all non-suspicious clippy warnings * cton-reader: disable clippy at site of suspicious lint * cton-frontend: disable clippy at the site of an invalid lint * cton-frontend: fix clippy warnings, or ignore benign ones * clippy: ignore the camelcase word WebAssembly in docs * cton-wasm: fix clippy complaints or ignore benign ones * cton-wasm tests: fix clippy complaints * cretonne: starting point turns off all clippy warnings * cretonne: clippy fixes, or lower allow() to source of problem * cretonne: more clippy fixes * cretonne: fix or disable needless_lifetimes lint this linter is buggy when the declared lifetime is used for another type constraint. * cretonne: fix clippy complaint about Pass::NoPass * rustfmt * fix prev minor api changes clippy suggested * add clippy to test-all * cton-filetests: clippy fixes * simplify clippy reporting in test-all * cretonne: document clippy allows better * cretonne: fix some more clippy lints * cretonne: fix clippy lints (mostly doc comments) * cretonne: allow all needless_lifetimes clippy warnings remove overrides at the false positives * rustfmt	2018-03-22 13:10:41 -07:00
Dan Gohman	2b3df1a506	Add `use` declarations for `std` features. Merge the `use` parts of the `no_std` branch. This reduces the diffs between master and the `no_std` branch, making it easier to maintain. Most of these changes are derived from patches by @lachlansneff in https://github.com/Cretonne/cretonne/tree/no_std.	2018-03-20 14:07:05 -07:00
Dan Gohman	f04e02c0a1	Clarify comments about Conventional SSA form. Capitalize "Conventional" so that it's clear that "Conventional SSA" is a specific concept being referenced.	2018-03-12 12:38:30 -07:00
Dan Gohman	30f8daa9d6	Replace `assert!` with `debug_assert!` in production code paths. This allows the assertions to be disabled in release builds, so that the code is faster and smaller, at the expense of not performing the checks. Assertions can be re-enabled in release builds with the debug-assertions flag in Cargo.toml, as the top-level Cargo.toml file does.	2018-03-12 12:38:30 -07:00
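A minimal sketch of the trade-off described in 30f8daa9d6. The function name `contains_sorted` is hypothetical and not taken from the Cretonne sources; it only illustrates an invariant check that is expensive relative to the work it guards.

```rust
/// Hypothetical helper (not from Cretonne): membership test on a sorted slice.
fn contains_sorted(values: &[u32], needle: u32) -> bool {
    // This O(n) invariant check is compiled out when debug assertions are
    // disabled, which is the default for release builds. With `assert!` it
    // would run in every build and dominate the O(log n) search below.
    debug_assert!(
        values.windows(2).all(|w| w[0] <= w[1]),
        "input must be sorted"
    );
    values.binary_search(&needle).is_ok()
}
```

To keep such checks in an optimized build, Cargo accepts `debug-assertions = true` under `[profile.release]` in Cargo.toml, which is the mechanism the commit message refers to.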
Dan Gohman	e81a27fb5d	Implement Debug for `ArgAction` and `Affinity`.	2018-03-12 12:38:30 -07:00
Dan Gohman	bf480c341b	Use https rather than http for several URLs.	2018-03-05 06:55:27 -08:00
Dan Gohman	505fe9277a	Tidy up calls to `analyze_branch`.	2018-03-04 21:34:49 -08:00
Bruce Mitchener	125270e2b0	Fix some typos.	2018-03-04 21:33:04 -08:00
Jakob Stoklund Olesen	d56ce9e8bf	Optimize DomForest::push_node(). The dominator tree pre-order is defined at the EBB granularity, but we are looking for dominating nodes at the instruction level. This means that we sometimes need to look higher up the DomForest stack for a dominating node, using DominatorTree::dominates() instead of DominatorTreePreorder::dominates(). Each dominance check involves the domtree.last_dominator() function scanning up the dominator tree, starting from the new node that was pushed. We can eliminate this duplicate work by exposing the last_dominator() function to push_node(). As we are searching through nodes on the stack, maintain a last_dom program point representing the previous return value from last_dominator(). This way, we're only scanning the dominator tree once.	2018-01-24 15:01:58 -08:00
Jakob Stoklund Olesen	416b21c18d	Pairwise virtual register coalescing. Use a better algorithm for resolving interferences in virtual registers. This improves code quality by generating far fewer copies on some complicated functions. After the initial union-find phase, the check_vreg() function uses a Budimlic forest to check for interference between the values in the virtual registers, as before. All the interference-free vregs are done. Others are passed to synthesize_vreg() which dissolves the vreg and then attempts to rebuild one or more vregs from the contained values. The pairwise interference checks use virtual copies to make sure that any future conflicts can be resolved by inserting a copy instruction. This technique was not present in the old coalescer which caused some correctness issues. This coalescing algorithm makes much better code, and it is generally a bit slower than before. Some of the slowdown is made up by the following passes being faster because they have to process less code. Example 1, the Python interpreter which contains a very large function with a lot of variables. Before: 15.664 0.011 Register allocation; 1.535 1.535 RA liveness analysis; 2.872 1.911 RA coalescing CSSA; 4.436 4.436 RA spilling; 2.610 2.598 RA reloading; 4.200 4.199 RA coloring. After: 9.795 0.013 Register allocation; 1.372 1.372 RA liveness analysis; 6.231 6.227 RA coalescing CSSA; 0.712 0.712 RA spilling; 0.598 0.598 RA reloading; 0.869 0.869 RA coloring. Coalescing is more than twice as slow, but because of the vastly better code quality, overall register allocation time is improved by 37%. Example 2, the clang compiler. Before: 57.148 0.035 Register allocation; 9.630 9.630 RA liveness analysis; 7.210 7.169 RA coalescing CSSA; 9.972 9.972 RA spilling; 11.602 11.572 RA reloading; 18.698 18.672 RA coloring. After: 64.792 0.042 Register allocation; 8.630 8.630 RA liveness analysis; 22.937 22.928 RA coalescing CSSA; 8.684 8.684 RA spilling; 9.559 9.551 RA reloading; 14.939 14.936 RA coloring. Here coalescing is 3x slower, but overall regalloc time only regresses by 13%. Most examples are less extreme than these two. They just get better code at about the same compile time.	2018-01-24 14:54:32 -08:00
Jakob Stoklund Olesen	d2e786a78a	Resurrect the VirtRegs::unify() function. The coalescer needs to be able to merge virtual registers with this function. Updated to recycle all virtual registers.	2018-01-23 14:21:36 -08:00
Jakob Stoklund Olesen	1bbc529ef9	Improve the variable ordering used by the coloring constraint solver. The fuzzer bugs #219 and #227 are both cases where the register allocator coloring pass "runs out of registers". What's really happening is that the constraint solver failed to find a solution, even when one existed. Suppose we have three solver variables: v0(GPR, out, global) v1(GPR, in) v2(GPR, in, out) And suppose registers %r0 and %r1 are available on both input and output sides of the instruction, but only %r1 is available for global outputs. A valid solution would be: v0 -> %r1 v1 -> %r1 v2 -> %r0 However, the solver would pick registers for the three values in numerical order because v1 and v2 have the same domain size (=2). This would assign v1 -> %r0 and then fail to find a free register for v2. Fix this by prioritizing in+out variables over single-sided variables even when their domains are equal. This means the v2 gets assigned a register before v1, and it gets a chance to pick a register that is still available on both in and out sides. Also try to avoid depending on value numbers in the solver. These bugs were hard to reproduce because a test case invariably would have different value numbers, causing the solver to order its variables differently and succeed. Throw in the previous solution and original register assignments as tie breakers which are stable and not dependent on value numbers. This is still not a substitute for a proper solver search algorithm that we will probably have to write eventually. Fixes #219 Fixes #227	2018-01-19 13:31:26 -08:00
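The ordering problem described in 1bbc529ef9 can be reproduced with a toy greedy allocator. This is only a sketch under stated assumptions: the `Var` struct, the bitmask encoding, and all function names are hypothetical and do not mirror Cretonne's actual solver. The restriction that only %r1 is available for global outputs is folded into v0's domain.

```rust
/// Hypothetical solver variable: bit i of `domain` means %r{i} is a candidate.
#[derive(Clone, Copy)]
struct Var {
    name: &'static str,
    domain: u8,
    is_input: bool,
    is_output: bool,
}

/// Greedy assignment in the given order: each variable takes the
/// lowest-numbered register still free on every side it occupies.
fn solve(vars: &[Var]) -> Option<Vec<(&'static str, u32)>> {
    let (mut in_used, mut out_used) = (0u8, 0u8);
    let mut result = Vec::new();
    for v in vars {
        let mut free = v.domain;
        if v.is_input {
            free &= !in_used;
        }
        if v.is_output {
            free &= !out_used;
        }
        if free == 0 {
            return None; // "ran out of registers"
        }
        let reg = free.trailing_zeros();
        if v.is_input {
            in_used |= 1u8 << reg;
        }
        if v.is_output {
            out_used |= 1u8 << reg;
        }
        result.push((v.name, reg));
    }
    Some(result)
}

/// The three variables from the commit message.
fn example_vars() -> Vec<Var> {
    vec![
        // v0(GPR, out, global): only %r1 is available for global outputs.
        Var { name: "v0", domain: 0b10, is_input: false, is_output: true },
        // v1(GPR, in)
        Var { name: "v1", domain: 0b11, is_input: true, is_output: false },
        // v2(GPR, in, out)
        Var { name: "v2", domain: 0b11, is_input: true, is_output: true },
    ]
}

/// Old behaviour: smallest domain first; the stable sort keeps numerical order on ties.
fn order_by_domain_only(vars: &mut [Var]) {
    vars.sort_by_key(|v| v.domain.count_ones());
}

/// Fixed behaviour: on equal domain sizes, in+out variables go first.
fn order_two_sided_first(vars: &mut [Var]) {
    vars.sort_by_key(|v| (v.domain.count_ones(), !(v.is_input && v.is_output)));
}
```

With the first ordering, v1 takes %r0 on the input side and v2 is left with no register free on both sides. With the second, v2 picks %r0 first and v1 falls back to %r1, matching the valid solution given in the message.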
Jakob Stoklund Olesen	dcad3fa339	Fix coloring bug with combined constraints and global values. The Intel instruction "v1 = ushr v2, v2" will implicitly fix the output register for v2 to %rcx because the output is tied to the first input operand and the second input operand is fixed to %rcx. Make sure we handle this transitive constraint when checking for interference with the globally live registers. Fixes #218	2018-01-17 15:51:08 -08:00
Jakob Stoklund Olesen	0a6500c99a	Avoid making solver variables for fixed input constraints. When the coloring pass sees an instruction with a fixed input register constraint that is already satisfied, make sure to tell the solver about it anyway. There are situations where the solver wants to convert a value to a solver variable, and we can't allow that if the same value is also used for a fixed register operand. Fixes #221.	2018-01-17 15:01:00 -08:00
Jakob Stoklund Olesen	13af22b46b	Track register pressure for dead EBB parameters. The spiller wasn't tracking register pressure correctly for dead EBB parameters in visit_ebb_header(). Make sure we free any dead EBB parameters. Fixes #223	2018-01-17 13:19:08 -08:00
Jakob Stoklund Olesen	5463fde203	Don't print unused pressure classes.	2018-01-17 13:19:08 -08:00
Jakob Stoklund Olesen	1e2b7de141	Remove dead code.	2018-01-16 12:34:32 -08:00
Jakob Stoklund Olesen	d1f236b00a	Reimplement coalescer following the Budimlic paper. The old coalescing algorithm had some algorithmic complexity issues when dealing with large virtual registers. Reimplement to use a proper union-find algorithm so we only need one pass through the dominator forests for virtual registers that are interference free. Virtual registers that do have interference are split and new registers built. This pass is about twice as fast as the old one when dealing with complex virtual registers.	2018-01-16 12:32:04 -08:00
Jakob Stoklund Olesen	16ac4f65b3	Add support for textbook union-find to VirtRegs. The initial phase of computing virtual registers can now be implemented with a textbook union-find algorithm using a disjoint set forest complete with rank and path compression optimizations. The disjoint set forest is converted to virtual register value lists in a single linear scan implemented in finish_union_find(). This union-find algorithm will soon be used by the coalescer.	2018-01-16 12:32:04 -08:00
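For reference, a generic textbook disjoint set forest with union by rank and path compression, plus a single linear scan that turns the forest into member lists. This is a sketch, not the VirtRegs code: the type `DisjointSets` and the method `into_sets` are hypothetical names, plain `usize` indices stand in for SSA values, and singleton sets are kept in the output.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for the union-find state; elements are 0..n.
struct DisjointSets {
    parent: Vec<usize>,
    rank: Vec<u8>,
}

impl DisjointSets {
    fn new(n: usize) -> Self {
        Self { parent: (0..n).collect(), rank: vec![0; n] }
    }

    /// Find the representative of `x`, compressing the path behind it.
    fn find(&mut self, x: usize) -> usize {
        let mut root = x;
        while self.parent[root] != root {
            root = self.parent[root];
        }
        // Path compression: point every node on the path directly at the root.
        let mut cur = x;
        while self.parent[cur] != root {
            let next = self.parent[cur];
            self.parent[cur] = root;
            cur = next;
        }
        root
    }

    /// Merge the sets containing `a` and `b`, attaching the shallower tree
    /// under the deeper one (union by rank).
    fn union(&mut self, a: usize, b: usize) {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra == rb {
            return;
        }
        match self.rank[ra].cmp(&self.rank[rb]) {
            Ordering::Less => self.parent[ra] = rb,
            Ordering::Greater => self.parent[rb] = ra,
            Ordering::Equal => {
                self.parent[rb] = ra;
                self.rank[ra] += 1;
            }
        }
    }

    /// One pass over all elements, grouping each under its representative.
    fn into_sets(mut self) -> Vec<Vec<usize>> {
        let n = self.parent.len();
        let mut index_of_root = vec![usize::MAX; n];
        let mut sets: Vec<Vec<usize>> = Vec::new();
        for x in 0..n {
            let root = self.find(x);
            if index_of_root[root] == usize::MAX {
                index_of_root[root] = sets.len();
                sets.push(Vec::new());
            }
            sets[index_of_root[root]].push(x);
        }
        sets
    }
}
```

The final grouping pass plays the role the message assigns to finish_union_find(); how the real function lays out its value lists is not shown here.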
Jakob Stoklund Olesen	cacba1a58f	Don't allow EBB parameters to be ghost values. Ghost instructions and values are supposed to be stored as metadata alongside the compiled program such that the ghost values can be computed from the real register/stack values when the program is stopped for debugging or de-optimization. If we allow an EBB parameter to be a ghost value, we have no way of computing its real value using ghost instructions. We would need to know a complete execution trace of the stopped program to figure out which values were passed to the ghost parameter. Instead we require EBB parameters to be real values materialized in registers or on the stack. We use the regclass_for_abi_type() TargetIsa callback to determine the initial register class for these parameters. They can then be spilled later if needed. Fixes #215.	2018-01-11 16:48:02 -08:00
Jakob Stoklund Olesen	fc857247e4	Fix overlaps_def for dead live ranges. A dead live range ends at the same point it is defined, but it is still considered to overlap a def at the same program point.	2017-12-14 17:16:19 -06:00
Jakob Stoklund Olesen	febe8e0e51	Allow spilling of EBB arguments. When the spiller needs to make a register available for a conditional branch instruction, it can be necessary to spill some of the EBB arguments on the branch instruction. This is ok because EBB argument values belong to the same virtual register as the corresponding EBB parameter and we spill the whole virtreg to the same slot. Also make sure free_regs() can handle values that are killed by the current instruction and spilled.	2017-12-14 13:57:13 -06:00
Jakob Stoklund Olesen	d617d5e0f3	Use a domtree pre-order instead of a CFG RPO for coalescing. The stack implementation of the Budimlic dominator forest doesn't work correctly with a CFG RPO. It needs the domtree pre-order. Also handle EBB pre-order vs inst-level preorder. Manage the stack according to EBB dominance. Look for a dominating value by searching the stack. This is different from the Budimlic algorithm because we're computing the dominator tree pre-order with EBB granularity only. Fixes #207.	2017-12-13 16:22:01 -06:00
Jakob Stoklund Olesen	2473661d49	Loosen the required order of values in a virtual register. Instead of requiring the values in a virtual register to be sorted according to the domtree.rpo_cmp() order, just require any topological ordering w.r.t. dominance. The coalescer will stop using the RPO shortly.	2017-12-13 15:25:21 -06:00
Jakob Stoklund Olesen	a825427786	Avoid reloading spilled EBB arguments. The coalescer makes sure that matching EBB arguments and parameters are always in the same virtual registers, and therefore also in the same stack slot if they are spilled. This means that the reload pass should never rewrite an EBB argument if the argument value is spilled. This comes up in cases where the branch instruction needs the same value in a register: brnz v9, ebb3(v9) If the virtual register containing v9 is spilled, the branch instruction must be reloaded like: v52 = fill v9 brnz v52, ebb3(v9) The branch register argument must be rewritten, and the EBB argument must be referring to the original stack value. Fixes #208.	2017-12-13 15:22:05 -06:00
Jakob Stoklund Olesen	60c456c1ec	Add a compilation pass timing facility. Individual compilation passes call the corresponding timing::*() function and hold on to their timing token while they run. This causes nested per-pass timing information to be recorded in thread-local storage. The --time-passes command line option prints a pass timing report to stdout.	2017-12-06 17:04:23 -08:00
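The token-based design in 60c456c1ec can be sketched with an RAII guard that records into thread-local storage when it is dropped. All names here (`PASS_TIMES`, `TimingToken`, `start_pass`, `recorded_passes`) are hypothetical; the real timing module's functions and report format are not reproduced.

```rust
use std::cell::RefCell;
use std::time::{Duration, Instant};

thread_local! {
    // Per-thread log of finished passes, in completion order.
    static PASS_TIMES: RefCell<Vec<(&'static str, Duration)>> = RefCell::new(Vec::new());
}

/// Held by a pass while it runs; records the elapsed time when dropped.
struct TimingToken {
    pass: &'static str,
    start: Instant,
}

/// Hypothetical counterpart of a `timing::*()` function.
fn start_pass(pass: &'static str) -> TimingToken {
    TimingToken { pass, start: Instant::now() }
}

impl Drop for TimingToken {
    fn drop(&mut self) {
        let elapsed = self.start.elapsed();
        PASS_TIMES.with(|times| times.borrow_mut().push((self.pass, elapsed)));
    }
}

/// Names of the passes recorded so far on this thread.
fn recorded_passes() -> Vec<&'static str> {
    PASS_TIMES.with(|times| times.borrow().iter().map(|&(name, _)| name).collect())
}
```

Because an inner token is dropped before the outer one, nested passes are logged innermost first. Separating a pass's own time from the time spent in its children would need extra bookkeeping that this sketch omits.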
Jakob Stoklund Olesen	feaea238bc	Use bforest::Map for representing live ranges. Get rid of the per-value Vec in the LiveRange data type and use a bforest::Map instead to represent the live-in intervals for non-local live ranges. This has some advantages: - The memory footprint of a local live range is reduced from 40 to 20 bytes, and - Clearing the Liveness data structure is now a constant time operation which doesn't call free(). - The potentially quadratic behavior when computing large live ranges is controlled by the logarithmic B-tree operations.	2017-12-06 14:14:21 -08:00
Jakob Stoklund Olesen	27d5543adc	Make LiveRange a type alias for GenLiveRange<Layout>. This makes the whole LiveRange generic over the program order instead of having a number of methods that are individually program order-generic. This makes it possible to have data members that depend on the program order, as we will shortly. This also gives us stronger type checking on the public LiveRange methods which now require a Layout argument, not just any program order.	2017-12-06 13:53:24 -08:00
Jakob Stoklund Olesen	92f378de76	Expose CFG predecessors only as an iterator. Define two public iterator types in the flowgraph module, PredIter and SuccIter, which are by-value iterators over an EBB's predecessors and successors respectively. Provide matching pred_iter() and succ_iter() methods for inspecting the CFG. Remove the get_predecessors() method which returned a slice. Update the uses of get_predecessors(), none of which depended on it being a slice. This abstraction makes it possible to change the internal representation of the CFG.	2017-11-22 09:13:04 -08:00
Jakob Stoklund Olesen	2e0b931590	Provide an fmt::Debug impl for entity references. Instead of deriving a Debug impl: Ebb(45), use the Display version for Debug too: ebb45. This is more readable, and no information is lost.	2017-11-21 10:01:55 -08:00
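A small sketch of the idea in 2e0b931590: forward `Debug` to the `Display` impl so that both print `ebb45`. The `Ebb` newtype below is a stand-in written for this example, not the actual entity reference definition.

```rust
use std::fmt;

/// Stand-in entity reference: an opaque index into a table of EBBs.
#[derive(Clone, Copy, PartialEq, Eq)]
struct Ebb(u32);

impl fmt::Display for Ebb {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "ebb{}", self.0)
    }
}

// Instead of `#[derive(Debug)]`, which would print `Ebb(45)`,
// reuse the Display output for Debug as well.
impl fmt::Debug for Ebb {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Display::fmt(self, f)
    }
}
```

The benefit shows up most in containers: formatting a `Vec<Ebb>` with `{:?}` prints the compact names rather than a list of tuple structs.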
Dan Gohman	bfa5a72b77	Use reserve_exact instead of reserve when the max size is known.	2017-11-16 15:16:23 -08:00
Dan Gohman	4d9aedbaca	Add a 'clear()' function to Context. This includes adding `clear()` functions to its (transitive) members.	2017-11-15 11:15:30 -08:00
Dan Gohman	889b06fd16	Replace `as` casts with type-conversion functions.	2017-11-08 10:48:44 -08:00
Dan Gohman	5d3ae0596c	Avoid matching with reference patterns.	2017-11-08 10:43:12 -08:00
Dan Gohman	3ab4349c1b	Use `Self` instead of repeating the type name.	2017-11-08 10:43:11 -08:00
Dan Gohman	8b053aa923	Remove a trivial cast.	2017-11-08 10:40:39 -08:00
Jakob Stoklund Olesen	d37126565e	Also consider fixed outputs for replace_global_defines. Fixes #178. When an instruction with a fixed output operand defines a globally live SSA value, we need to check if the fixed register is available in the `regs.global` set of registers that can be used across EBB boundaries. If the fixed output register is not available in regs.global, set the replace_global_defines flag so the output operands are rewritten as local values.	2017-10-25 14:28:30 -07:00
Jakob Stoklund Olesen	e8ecf1f809	Add a FixedTied constraint kind for operand constraints. Fixes #175. The Intel division instructions have fixed input operands that are clobbered by fixed output operands, so the value passed as an input will be clobbered just like a tied operand. The FixedTied operand constraint is used to indicate a fixed input operand that has a corresponding output operand with the same fixed register. Teach the spiller to treat a FixedTied operand the same as a Tied operand constraint and make sure that the input value is killed by the instruction.	2017-10-25 11:22:20 -07:00
Jakob Stoklund Olesen	b3fb41087e	Use the term "Function parameter" instead of "argument". Rename the ArgumentType type to AbiParam since it describes the ABI characteristics of a parameter or return value, not just the value type. In Signature, rename members argument_types and return_types to "params" and "returns". Again, they are not just types. Fix a couple lingering references to "EBB arguments".	2017-10-19 17:39:23 -07:00
Jakob Stoklund Olesen	921bcc6c25	Use the term "EBB parameter" everywhere. Add EBB parameter and EBB argument to the langref glossary to clarify the distinction between formal EBB parameter values and arguments passed to branches. - Replace "ebb_arg" with "ebb_param" in function names that deal with EBB parameters. - Rename the ValueDef variants to Result and Param. - A bunch of other small langref fixes. No functional changes intended.	2017-10-19 16:17:09 -07:00
Jakob Stoklund Olesen	1dbc55dadf	Add a pressure_tracking flag to register banks. This makes it possible to define register banks that opt out of register pressure tracking. This will be used to define banks for special-purpose registers like the CPU flags. The pressure tracker does not need to use resources for a top-level register class in a non-tracked bank. The constant MAX_TOPRCS is renamed to MAX_TRACKED_TOPRCS to indicate that there may be top-level register classes with higher numbers, but they won't require pressure tracking. We won't be tracking register pressure for CPU flags since only one value is allowed to be live at a time.	2017-10-13 13:46:16 -07:00
Jakob Stoklund Olesen	994af598f5	Avoid interference on CFG edges. Track allocatable registers both locally and globally: Add a second AllocatableSet which tracks registers allocated to global values without accounting for register diversions. Since diversions are only local to an EBB, global values must be assigned un-diverted locations that don't interfere. Handle the third "global" interference domain in the constraint solver in addition to the existing "input" and "output" domains. Extend the solver error code to indicate when a global define just can't be allocated because there are not enough available global registers. Resolve this problem by replacing the instruction's global defines with local defines that are copied into their global destinations afterwards.	2017-10-11 15:38:30 -07:00
Jakob Stoklund Olesen	893a6716c6	Enforce all instruction constraints in iterate_solution(). During iterate_solution(), live-through values may be converted to solver variables so they can be moved out of the way in order to satisfy all constraints. Make sure that the instruction's operand constraints are also considered for these new variables. Add a program_complete_input_constraints() which turns all the instruction's input operands into variables with the proper constraints. That makes it safe for try_add_var() to re-add these values as variables with looser generic constraints. The solver's add_var() function is split into three functions: add_var for use before inputs_done(), and add_killed_var/add_through_var for use after.	2017-10-09 14:08:37 -07:00
Jakob Stoklund Olesen	4a2bf6d9a6	Use a more compact display of AllocatableSet. Since only Intel uses named registers, we can use a one-char shorthand for the registers.	2017-10-09 14:08:37 -07:00

183 Commits