wasmtime

Author	SHA1	Message	Date
Pat Hickey	3f69581d03	cretonne::Context: add for_function constructor	2018-01-25 18:14:57 -08:00
Jakob Stoklund Olesen	ef2640d8a4	Add information about SpiderMonkey and rustc plans.	2018-01-25 15:48:28 -08:00
Jakob Stoklund Olesen	d56ce9e8bf	Optimize DomForest::push_node(). The dominator tree pre-order is defined at the EBB granularity, but we are looking for dominating nodes at the instruction level. This means that we sometimes need to look higher up the DomForest stack for a dominating node, using DominatorTree::dominates() instead of DominatorTreePreorder::dominates(). Each dominance check involves the domtree.last_dominator() function scanning up the dominator tree, starting from the new node that was pushed. We can eliminate this duplicate work by exposing the last_dominator() function to push_node(). As we are searching through nodes on the stack, maintain a last_dom program point representing the previous return value from last_dominator(). This way, we're only scanning the dominator tree once.	2018-01-24 15:01:58 -08:00
Jakob Stoklund Olesen	416b21c18d	Pairwise virtual register coalescing. Use a better algorithm for resolving interferences in virtual registers. This improves code quality by generating far fewer copies on some complicated functions. After the initial union-find phase, the check_vreg() function uses a Budimlic forest to check for interference between the values in the virtual registers, as before. All the interference-free vregs are done. Others are passed to synthesize_vreg(), which dissolves the vreg and then attempts to rebuild one or more vregs from the contained values. The pairwise interference checks use virtual copies to make sure that any future conflicts can be resolved by inserting a copy instruction. This technique was not present in the old coalescer, which caused some correctness issues. This coalescing algorithm produces much better code, but it is generally a bit slower than before. Some of the slowdown is made up by the following passes being faster because they have to process less code. Example 1, the Python interpreter, which contains a very large function with a lot of variables (pass timings, total / self): Before: Register allocation 15.664 / 0.011; RA liveness analysis 1.535 / 1.535; RA coalescing CSSA 2.872 / 1.911; RA spilling 4.436 / 4.436; RA reloading 2.610 / 2.598; RA coloring 4.200 / 4.199. After: Register allocation 9.795 / 0.013; RA liveness analysis 1.372 / 1.372; RA coalescing CSSA 6.231 / 6.227; RA spilling 0.712 / 0.712; RA reloading 0.598 / 0.598; RA coloring 0.869 / 0.869. Coalescing is more than twice as slow, but because of the vastly better code quality, overall register allocation time is improved by 37%. Example 2, the clang compiler: Before: Register allocation 57.148 / 0.035; RA liveness analysis 9.630 / 9.630; RA coalescing CSSA 7.210 / 7.169; RA spilling 9.972 / 9.972; RA reloading 11.602 / 11.572; RA coloring 18.698 / 18.672. After: Register allocation 64.792 / 0.042; RA liveness analysis 8.630 / 8.630; RA coalescing CSSA 22.937 / 22.928; RA spilling 8.684 / 8.684; RA reloading 9.559 / 9.551; RA coloring 14.939 / 14.936. Here coalescing is 3x slower, but overall regalloc time only regresses by 13%. Most examples are less extreme than these two. They just get better code at about the same compile time.	2018-01-24 14:54:32 -08:00
Jakob Stoklund Olesen	b124eaf77d	Add a preorder pre_cmp_def() function. This provides a total ordering of values according to when their definition appears in the dominator tree pre-order.	2018-01-23 14:55:21 -08:00
Jakob Stoklund Olesen	d2e786a78a	Resurrect the VirtRegs::unify() function. The coalescer needs to be able to merge virtual registers with this function. Updated to recycle all virtual registers.	2018-01-23 14:21:36 -08:00
Jakob Stoklund Olesen	085e228358	Remove assertions from the hot value_def() function. The errors caught by these assertions are also detected by the IL verifier. This speeds up compilation by 13%.	2018-01-22 11:31:21 -08:00
Jakob Stoklund Olesen	c6bb7afa07	Dampen quadratic behavior in check_cssa(). Use a dominator tree pre-order to speed up the dominance checks and only check for interference with the nearest dominating predecessor in a virtual register. This makes the CSSA verification about 2x as fast.	2018-01-22 10:48:14 -08:00
Jakob Stoklund Olesen	91343f208d	Fix quadratic behavior in sequence numbering. The ir::layout module is assigning sequence numbers to all EBBs and instructions so relative positions can be computed in constant time. This works a lot like BASIC line numbers where we initially use numbers 10, 20, 30, ... so we can insert new instructions in the middle of the sequence without renumbering everything. In some cases where the coalescer is misbehaving and inserting a lot of copy instructions, we end up having to renumber a larger and larger number of instructions to make space in the sequence. This causes the following reload pass to be very slow, spending most of its time renumbering instructions. Fix this by putting an upper limit on the number of instructions we're willing to renumber locally. When the limit is reached, switch to a full function renumbering with the major stride of 10. This gives us new elasticity in the sequence numbers. - Time to compile the Python interpreter in #229 drops from 4826 s to 15.8 s. - The godot benchmark in #226 drops from 1257 s to 75 s. - The AngryBots1 benchmark does not have the coalescer misbehavior. Its compilation time changes from 22.9 s to 23.1 s. It's worth noting that the sequence numbering is still technically quadratic with this fix. The system is not designed to handle a large number of instructions inserted in a single location. It expects a more even distribution of new instructions. We still need to fix the coalescer. It should not insert so many copies in degenerate cases.	2018-01-22 09:42:26 -08:00
Jakob Stoklund Olesen	10845995a8	Make room for 4-digit pass timings. (sigh)	2018-01-22 07:51:25 -08:00
Jakob Stoklund Olesen	1bbc529ef9	Improve the variable ordering used by the coloring constraint solver. The fuzzer bugs #219 and #227 are both cases where the register allocator coloring pass "runs out of registers". What's really happening is that the constraint solver failed to find a solution, even when one existed. Suppose we have three solver variables: v0(GPR, out, global) v1(GPR, in) v2(GPR, in, out) And suppose registers %r0 and %r1 are available on both input and output sides of the instruction, but only %r1 is available for global outputs. A valid solution would be: v0 -> %r1 v1 -> %r1 v2 -> %r0 However, the solver would pick registers for the three values in numerical order because v1 and v2 have the same domain size (=2). This would assign v1 -> %r0 and then fail to find a free register for v2. Fix this by prioritizing in+out variables over single-sided variables even when their domains are equal. This means v2 gets assigned a register before v1, and it gets a chance to pick a register that is still available on both in and out sides. Also try to avoid depending on value numbers in the solver. These bugs were hard to reproduce because a test case invariably would have different value numbers, causing the solver to order its variables differently and succeed. Throw in the previous solution and original register assignments as tie breakers which are stable and not dependent on value numbers. This is still not a substitute for a proper solver search algorithm that we will probably have to write eventually. Fixes #219 Fixes #227	2018-01-19 13:31:26 -08:00
Tyler McMullen	7826fce44f	On finding a static OOB heap addr, split the Ebb and recompute the CFG.	2018-01-18 15:49:10 -08:00
Tyler McMullen	14e39db428	Add filetest for statically out-of-bounds heap addresses.	2018-01-18 15:49:10 -08:00
Tyler McMullen	df210bfdea	Fix the Intel x64 PIC 'call' test, adding correct addend.	2018-01-18 14:23:00 -08:00
Tyler McMullen	850896f05e	The addend for a PLTRel4 reloc should be -4.	2018-01-18 14:23:00 -08:00
Jakob Stoklund Olesen	1e49431804	Add test case from #216. The error exposed by this test case no longer happens after the coalescer was rewritten to follow the Budimlic paper. It's still a good coalescer test. Fixes #216 by including the test case.	2018-01-17 16:19:51 -08:00
Jakob Stoklund Olesen	dcad3fa339	Fix coloring bug with combined constraints and global values. The Intel instruction "v1 = ushr v2, v2" will implicitly fix the output register for v2 to %rcx because the output is tied to the first input operand and the second input operand is fixed to %rcx. Make sure we handle this transitive constraint when checking for interference with the globally live registers. Fixes #218	2018-01-17 15:51:08 -08:00
Jakob Stoklund Olesen	0a6500c99a	Avoid making solver variables for fixed input constraints. When the coloring pass sees an instruction with a fixed input register constraint that is already satisfied, make sure to tell the solver about it anyway. There are situations where the solver wants to convert a value to a solver variable, and we can't allow that if the same value is also used for a fixed register operand. Fixes #221.	2018-01-17 15:01:00 -08:00
Jakob Stoklund Olesen	13af22b46b	Track register pressure for dead EBB parameters. The spiller wasn't tracking register pressure correctly for dead EBB parameters in visit_ebb_header(). Make sure we free any dead EBB parameters. Fixes #223	2018-01-17 13:19:08 -08:00
Jakob Stoklund Olesen	5463fde203	Don't print unused pressure classes.	2018-01-17 13:19:08 -08:00
Dan Gohman	457e161974	Add comments explaining why the rustfmt version is still at 0.9.0.	2018-01-17 11:24:45 -08:00
Tyler McMullen	eb85aa833c	Illegalize rbp/r13 for zero-offset loads on Intel x64 (#225) * Switch RegClass to a bitmap implementation. * Add special RegClass to remove r13 from 'ld' recipe. * Use MASK_LEN constant instead of magic number. * Enforce that RegClass slicing is only valid on contiguous classes. * Use Optional[int] for RegClass optional bitmask parameter. * Add comment explaining use of Intel ISA's GPR_NORIP register class.	2018-01-16 20:05:53 -08:00
Jakob Stoklund Olesen	1e2b7de141	Remove dead code.	2018-01-16 12:34:32 -08:00
Jakob Stoklund Olesen	d1f236b00a	Reimplement coalescer following the Budimlic paper. The old coalescing algorithm had some algorithmic complexity issues when dealing with large virtual registers. Reimplement to use a proper union-find algorithm so we only need one pass through the dominator forests for virtual registers that are interference free. Virtual registers that do have interference are split and new registers built. This pass is about twice as fast as the old one when dealing with complex virtual registers.	2018-01-16 12:32:04 -08:00
Jakob Stoklund Olesen	16ac4f65b3	Add support for textbook union-find to VirtRegs. The initial phase of computing virtual registers can now be implemented with a textbook union-find algorithm using a disjoint set forest complete with rank and path compression optimizations. The disjoint set forest is converted to virtual register value lists in a single linear scan implemented in finish_union_find(). This union-find algorithm will soon be used by the coalescer.	2018-01-16 12:32:04 -08:00
Jakob Stoklund Olesen	ce4cc8ce12	Fix the handling of special types in type variables. - Allow the syntax "specials=True" to indicate that a type variable can assume all special types. Use this for the unconstrained type variable created in ast.py. - Fix TypeSet.copy() to avoid deepcopy() which doesn't do the right thing for the self.specials set. - Fix TypeSet.typeset_key() to just use the name of special types instead of the full SpecialType objects.	2018-01-16 10:29:31 -08:00
Jakob Stoklund Olesen	85aab278dd	Add RISC-V encodings for b1 copy/spill/fill. We allow b1 values in general purpose registers, so we need to be able to move them around.	2018-01-16 09:19:22 -08:00
Yury Delendik	567e570c02	Allow printing the translated wasm file.	2018-01-12 13:12:50 -08:00
Jakob Stoklund Olesen	cacba1a58f	Don't allow EBB parameters to be ghost values. Ghost instructions and values are supposed to be stored as metadata alongside the compiled program such that the ghost values can be computed from the real register/stack values when the program is stopped for debugging or de-optimization. If we allow an EBB parameter to be a ghost value, we have no way of computing its real value using ghost instructions. We would need to know a complete execution trace of the stopped program to figure out which values were passed to the ghost parameter. Instead we require EBB parameters to be real values materialized in registers or on the stack. We use the regclass_for_abi_type() TargetIsa callback to determine the initial register class for these parameters. They can then be spilled later if needed. Fixes #215.	2018-01-11 16:48:02 -08:00
Jakob Stoklund Olesen	5e094034d4	Fix verifier bug in unreachable code. We want to disable dominance checks in unreachable code. The is_reachable() check for EBB parameter values was checking if the defining EBB was reachable, not the EBB using the value. This bug showed up in fuzzing and in #213.	2018-01-09 10:47:49 -08:00
Jakob Stoklund Olesen	af89006b09	Fix some markdown issues. Work around some cases where the old markdown parser differs from the new Pulldown parser for the documentation.	2018-01-08 16:19:16 -08:00
Jakob Stoklund Olesen	4afa19ddff	Fix some mypy errors. It looks like mypy 0.560 doesn't like when a local variable changes its type inside a function. Fixes introduce a new variable instead of reusing an existing one.	2018-01-03 12:13:13 -08:00
Dan Gohman	4f53cc1dad	Align IntelGOTPCRel4 with R_X86_64_GOTPCREL. Add an addend field to reloc_external, and use it to move the responsibility for accounting for the difference between the end of an instruction (where the PC is considered to be for PC-relative addressing on Intel) and the beginning of the immediate field into the encoding code. Specifically, this makes IntelGOTPCRel4 directly correspond to R_X86_64_GOTPCREL, instead of also carrying an implicit `- 4`.	2017-12-15 16:17:32 -06:00
Dan Gohman	76e31cc1ad	Rename GotPCRel4 to GOTPCRel4. This emphasizes that GOT is being used as an abbreviation rather than the word "got".	2017-12-15 16:17:32 -06:00
Jakob Stoklund Olesen	fc857247e4	Fix overlaps_def for dead live ranges. A dead live range ends at the same point it is defined, but it is still considered to overlap a def at the same program point.	2017-12-14 17:16:19 -06:00
Jakob Stoklund Olesen	66073eb26c	Better verifier error for coinciding defs. If a virtual register contains values that are defined at the same program point, say so. Don't cryptically claim that one dominates the other.	2017-12-14 17:04:16 -06:00
Jakob Stoklund Olesen	febe8e0e51	Allow spilling of EBB arguments. When the spiller needs to make a register available for a conditional branch instruction, it can be necessary to spill some of the EBB arguments on the branch instruction. This is ok because EBB argument values belong to the same virtual register as the corresponding EBB parameter and we spill the whole virtreg to the same slot. Also make sure free_regs() can handle values that are killed by the current instruction and spilled.	2017-12-14 13:57:13 -06:00
Jakob Stoklund Olesen	d617d5e0f3	Use a domtree pre-order instead of a CFG RPO for coalescing. The stack implementation of the Budimlic dominator forest doesn't work correctly with a CFG RPO. It needs the domtree pre-order. Also handle EBB pre-order vs inst-level preorder. Manage the stack according to EBB dominance. Look for a dominating value by searching the stack. This is different from the Budimlic algorithm because we're computing the dominator tree pre-order with EBB granularity only. Fixes #207.	2017-12-13 16:22:01 -06:00
Jakob Stoklund Olesen	2473661d49	Loosen the required order of values in a virtual register. Instead of requiring the values in a virtual register to be sorted according to the domtree.rpo_cmp() order, just require any topological ordering w.r.t. dominance. The coalescer will stop using the RPO shortly.	2017-12-13 15:25:21 -06:00
Jakob Stoklund Olesen	a825427786	Avoid reloading spilled EBB arguments. The coalescer makes sure that matching EBB arguments and parameters are always in the same virtual registers, and therefore also in the same stack slot if they are spilled. This means that the reload pass should never rewrite an EBB argument if the argument value is spilled. This comes up in cases where the branch instruction needs the same value in a register: brnz v9, ebb3(v9) If the virtual register containing v9 is spilled, the branch instruction must be reloaded like: v52 = fill v9 brnz v52, ebb3(v9) The branch register argument must be rewritten, and the EBB argument must be referring to the original stack value. Fixes #208.	2017-12-13 15:22:05 -06:00
Pat Hickey	d444044e9e	intel isa: comments to explain rip-relative addressing encoding	2017-12-12 19:29:52 -08:00
Pat Hickey	ed81bc21be	filetests: add filetests for intel PIC encodings	2017-12-12 19:29:52 -08:00
Pat Hickey	6d44debc18	intel: add PIC variants to recipes and encodings	2017-12-12 19:29:52 -08:00
Pat Hickey	5834520bfe	binemit: add PIC relocation types for Intel	2017-12-12 19:29:52 -08:00
Pat Hickey	90bc798e4f	settings: add "is_pic" boolean setting to base	2017-12-12 19:29:52 -08:00
Pat Hickey	88b30ff386	refactor Reloc to an enum of every architecture's reloc types https://github.com/stoklund/cretonne/pull/206#issuecomment-350905016	2017-12-12 13:57:10 -08:00
Jakob Stoklund Olesen	a888b2a6f1	Dominator tree pre-order. Add a DominatorTreePreorder data structure which can be initialized for a DominatorTree and used for queries involving a pre-order of the dominator tree. Print out the pre-order and send it through filecheck in "test domtree" file tests.	2017-12-08 17:43:15 -08:00
Jakob Stoklund Olesen	7d5f2f0404	Convert the CFG traversal tests to file tests. Add a "cfg_postorder:" printout to the "test domtree" file tests and use that to check the computed CFG post-order instead of doing it manually with Rust code.	2017-12-08 13:58:18 -08:00
Jakob Stoklund Olesen	a7eb13a151	Expand unknown instructions to runtime library calls.	2017-12-08 10:37:50 -08:00
Jakob Stoklund Olesen	f03729d742	Fix generated code for ISA predicates on encoding recipes. The generated code had syntax errors and inverted logic. Add an SSE 4.1 requirement to the floating point rounding instructions.	2017-12-08 10:37:50 -08:00
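
Several commits above (a888b2a6f1, b124eaf77d, c6bb7afa07) rely on a dominator tree pre-order to make dominance queries cheap. A minimal sketch of the usual technique follows. It assumes dense `usize` node ids and an `idom` table: number the tree in pre-order, record each subtree's size, and test dominance as interval containment. The `Preorder` type and its methods are hypothetical and are not Cretonne's `DominatorTreePreorder` API. Unlike the commits, which work at EBB granularity, this sketch has no notion of instructions inside a block.

```rust
use std::cmp::Ordering;

/// Pre-order numbering of a dominator tree (hypothetical type).
/// `idom[n]` is the immediate dominator of node `n`, or `None` for a root.
pub struct Preorder {
    pre: Vec<usize>,  // pre-order number of each node
    size: Vec<usize>, // number of nodes in each node's subtree, itself included
}

impl Preorder {
    pub fn new(idom: &[Option<usize>]) -> Self {
        let n = idom.len();
        let mut children = vec![Vec::new(); n];
        let mut roots = Vec::new();
        for (node, parent) in idom.iter().enumerate() {
            match parent {
                Some(p) => children[*p].push(node),
                None => roots.push(node),
            }
        }
        let mut pre = vec![0; n];
        let mut size = vec![1; n];
        let mut order = Vec::with_capacity(n); // nodes listed in pre-order
        // Iterative DFS; children are pushed reversed so they pop in index order.
        let mut stack: Vec<usize> = roots.into_iter().rev().collect();
        while let Some(node) = stack.pop() {
            pre[node] = order.len();
            order.push(node);
            stack.extend(children[node].iter().rev());
        }
        // Reverse pre-order sees every descendant before its ancestor,
        // so subtree sizes can be accumulated bottom-up in one pass.
        for &node in order.iter().rev() {
            if let Some(p) = idom[node] {
                size[p] += size[node];
            }
        }
        Preorder { pre, size }
    }

    /// `a` dominates `b` iff `b` falls inside `a`'s pre-order interval.
    pub fn dominates(&self, a: usize, b: usize) -> bool {
        self.pre[a] <= self.pre[b] && self.pre[b] < self.pre[a] + self.size[a]
    }

    /// Total order on nodes by pre-order position.
    pub fn pre_cmp(&self, a: usize, b: usize) -> Ordering {
        self.pre[a].cmp(&self.pre[b])
    }
}
```

After the linear-time setup, each dominance query is two integer comparisons instead of a walk up the tree. That is why a pre-order helps the dominator forest stack and the CSSA checks.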

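Commit 16ac4f65b3 describes a textbook union-find with rank and path compression, followed by a single linear scan that turns the forest into value lists. The standalone sketch below works over plain `usize` indices. The `DisjointSets` type and `into_groups()` are hypothetical stand-ins, not the `VirtRegs` or `finish_union_find()` code itself.

```rust
use std::cmp::Ordering;

/// Disjoint set forest over dense `usize` indices (hypothetical type).
pub struct DisjointSets {
    parent: Vec<usize>,
    rank: Vec<u8>,
}

impl DisjointSets {
    /// Create `n` singleton sets.
    pub fn new(n: usize) -> Self {
        DisjointSets { parent: (0..n).collect(), rank: vec![0; n] }
    }

    /// Return the set leader of `x`, compressing the path along the way.
    pub fn find(&mut self, x: usize) -> usize {
        let mut root = x;
        while self.parent[root] != root {
            root = self.parent[root];
        }
        // Point every node on the path directly at the root.
        let mut node = x;
        while self.parent[node] != root {
            let next = self.parent[node];
            self.parent[node] = root;
            node = next;
        }
        root
    }

    /// Merge the sets of `a` and `b`, hanging the lower-rank tree under the other.
    pub fn union(&mut self, a: usize, b: usize) {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra == rb {
            return;
        }
        match self.rank[ra].cmp(&self.rank[rb]) {
            Ordering::Less => self.parent[ra] = rb,
            Ordering::Greater => self.parent[rb] = ra,
            Ordering::Equal => {
                self.parent[rb] = ra;
                self.rank[ra] += 1;
            }
        }
    }

    /// One linear scan: bucket every element under its leader, drop empty buckets.
    pub fn into_groups(mut self) -> Vec<Vec<usize>> {
        let n = self.parent.len();
        let mut buckets = vec![Vec::new(); n];
        for x in 0..n {
            let leader = self.find(x);
            buckets[leader].push(x);
        }
        buckets.into_iter().filter(|g| !g.is_empty()).collect()
    }
}
```

Union by rank keeps the trees shallow, and path compression flattens them further. Together they make a sequence of operations run in nearly linear time.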

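Commit 91343f208d describes BASIC-style sequence numbering. Items start at multiples of 10, and a new item takes a midpoint when there is room. When there is not, nearby items are renumbered. Once local renumbering would touch too many items, a full renumbering with stride 10 takes over. The hedged sketch below applies that policy to a plain `Vec<u32>`. The `SeqList` type, the `MINOR_STRIDE` of 2, and the `LOCAL_LIMIT` of 100 are illustrative assumptions, not the values or data structures used in `ir::layout`.

```rust
/// Stride for a full renumbering, like BASIC line numbers 10, 20, 30, ...
const MAJOR_STRIDE: u32 = 10;
/// Stride used when bumping neighbours during local renumbering (assumed value).
const MINOR_STRIDE: u32 = 2;
/// Maximum number of items renumbered locally before giving up (assumed value).
const LOCAL_LIMIT: usize = 100;

/// Sequence numbers of items in layout order (hypothetical stand-in for ir::layout).
pub struct SeqList {
    pub seq: Vec<u32>,
}

impl SeqList {
    pub fn new() -> Self {
        SeqList { seq: Vec::new() }
    }

    /// Insert a new item at index `pos` (must be <= len) and return its number.
    pub fn insert(&mut self, pos: usize) -> u32 {
        let prev = if pos == 0 { 0 } else { self.seq[pos - 1] };
        match self.seq.get(pos).copied() {
            // Appending at the end: leave a full major stride of room.
            None => self.seq.push(prev + MAJOR_STRIDE),
            // There is a gap: take its midpoint, no renumbering needed.
            Some(next) if next - prev > 1 => self.seq.insert(pos, prev + (next - prev) / 2),
            // No gap: bump the following items, or renumber everything.
            Some(_) => {
                self.seq.insert(pos, prev + 1);
                if !self.renumber_local(pos) {
                    self.renumber_all();
                }
            }
        }
        self.seq[pos]
    }

    /// Bump items after `pos` until the numbers increase again.
    /// Returns false once more than LOCAL_LIMIT items would need touching.
    fn renumber_local(&mut self, pos: usize) -> bool {
        for (count, i) in (pos + 1..self.seq.len()).enumerate() {
            if self.seq[i] > self.seq[i - 1] {
                return true; // ordering restored
            }
            if count >= LOCAL_LIMIT {
                return false; // too much local work: caller renumbers everything
            }
            self.seq[i] = self.seq[i - 1] + MINOR_STRIDE;
        }
        true
    }

    /// Full renumbering with the major stride, restoring room everywhere.
    fn renumber_all(&mut self) {
        for (i, s) in self.seq.iter_mut().enumerate() {
            *s = (i as u32 + 1) * MAJOR_STRIDE;
        }
    }
}
```

Repeated insertions at one spot halve the gap each time until it is exhausted. This is the degenerate pattern the commit message warns about. The local limit bounds how much work any single insertion can trigger before the full renumbering restores room.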
1543 Commits
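
Commit 1bbc529ef9 walks through three solver variables for which ordering purely by domain size fails. The toy greedy solver below reproduces both the failure and the fix, which ranks in+out variables ahead of one-sided ones at equal domain size. It assumes registers are small integers (0 for %r0, 1 for %r1) and models the "global" restriction on v0 simply as a one-register domain. The `Var` type and `solve()` function are hypothetical. The real solver also uses the previous solution and the original assignments as tie breakers. This sketch omits those and relies on a stable sort instead.

```rust
/// A hypothetical solver variable: candidate registers plus which sides it occupies.
#[derive(Clone, Debug)]
pub struct Var {
    pub name: &'static str,
    pub domain: Vec<u8>,
    pub is_input: bool,
    pub is_output: bool,
}

/// Greedily assign registers. Variables are ordered by domain size; when
/// `prefer_in_out` is set, in+out variables win ties against one-sided ones.
/// The sort is stable, so remaining ties keep their original order.
pub fn solve(vars: &[Var], prefer_in_out: bool) -> Option<Vec<(&'static str, u8)>> {
    let mut order: Vec<&Var> = vars.iter().collect();
    order.sort_by_key(|v| {
        let one_sided = !(v.is_input && v.is_output);
        (v.domain.len(), prefer_in_out && one_sided)
    });
    let mut used_in: Vec<u8> = Vec::new();
    let mut used_out: Vec<u8> = Vec::new();
    let mut result = Vec::new();
    for v in order {
        // First register that is free on every side this variable occupies.
        let reg = v.domain.iter().copied().find(|r| {
            !(v.is_input && used_in.contains(r)) && !(v.is_output && used_out.contains(r))
        })?; // no free register: the greedy solver gives up
        if v.is_input {
            used_in.push(reg);
        }
        if v.is_output {
            used_out.push(reg);
        }
        result.push((v.name, reg));
    }
    Some(result)
}
```

With the variables from the commit message, `solve(&vars, false)` assigns v1 to register 0 first and then finds nothing for v2. `solve(&vars, true)` yields v0 -> 1, v2 -> 0, v1 -> 1, the same valid solution the message lists.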