wasmtime

Author	SHA1	Message	Date
Dan Gohman	11eddafef8	Avoid using floating-point values in expand_fcvt_to_sint. Compute the bound values for expand_fcvt_to_sint using bitwise integer arithmetic rather than floating-point arithmetic, to avoid relying on host floating point arithmetic.	2018-03-12 12:50:58 -07:00
Dan Gohman	30f8daa9d6	Replace `assert!` with `debug_assert!` in production code paths. This allows the assertions to be disabled in release builds, so that the code is faster and smaller, at the expense of not performing the checks. Assertions can be re-enabled in release builds with the debug-assertions flag in Cargo.toml, as the top-level Cargo.toml file does.	2018-03-12 12:38:30 -07:00
Dan Gohman	13b167770c	Include emergency stack slots when laying out the stack. Emergency stack slots are a new kind of stack slot added relatively recently. They need to be allocated a stack offset just like explicit and spill slots. Also, make StackSlotData's offset field an Option, to catch problems like this in the future. Previously the value 0 was used when offsets weren't assigned yet, however that made it non-obvious when the field meant "not assigned yet" and when it meant "assigned the value 0".	2018-03-04 21:34:49 -08:00
Dan Gohman	5dc449ec9e	Rename "local variables" to "explicit stack slots". The term "local variables" predated the SSA builder in the front-end crate, which also provides a way to implement source-language local variables. The name "explicit stack slot" makes it clear what this construct is.	2018-02-28 14:04:28 -08:00
Dan Gohman	a5b00b173e	Don't renumber entities in the parser. This makes it easier to debug testcases: - the entity numbers in a .cton file match the entity numbers used within Cretonne. - serializing and deserializing doesn't cause indices to change. One disadvantage is that if a .cton file uses sparse entity numbers, deserializing to the in-memory form doesn't compact it. However, the text format is not intended to be performance-critical, so this isn't expected to be a big burden.	2018-02-20 17:27:46 -08:00
Jakob Stoklund Olesen	b9b1d0fcd5	Add a trapff instruction. This is the floating point equivalent of trapif: Trap when a given condition is in the floating-point flags. Define Intel encodings comparable to the trapif encodings.	2018-02-20 14:35:41 -08:00
Dan Gohman	21215529fe	Simplify the code in make_inst_results_reusing; NFC.	2018-02-20 12:15:23 -08:00
Jakob Stoklund Olesen	11c721934c	Add a trapif instruction. This is a conditional trap controlled by integer CPU flags. Compare to brif.	2018-02-08 14:40:46 -08:00
Julian Seward	6f8a54b6a5	Adds support for legalizing CLZ, CTZ and POPCOUNT on baseline x86_64 targets. Changes: * Adds a new generic instruction, SELECTIF, that does value selection (a la conditional move) similarly to existing SELECT, except that it is controlled by condition code input and flags-register inputs. * Adds a new Intel x86_64 variant, 'baseline', that supports SSE2 and nothing else. * Adds new Intel x86_64 instructions BSR and BSF. * Implements generic CLZ, CTZ and POPCOUNT on x86_64 'baseline' targets using the new BSR, BSF and SELECTIF instructions. * Implements SELECTIF on x86_64 targets using conditional-moves. * new test filetests/isa/intel/baseline_clz_ctz_popcount.cton (for legalization) * new test filetests/isa/intel/baseline_clz_ctz_popcount_encoding.cton (for encoding) * Allow lib/cretonne/meta/gen_legalizer.py to generate non-snake-caseified Rust without rustc complaining. Fixes #238.	2018-02-06 09:43:00 -08:00
Jakob Stoklund Olesen	e3714ddd10	Add a func.inst_offsets() iterator. This Function method can be used after the final code layout has been computed. It returns all the instructions in an EBB along with their encoded size and offset from the beginning of the function. This is useful for extracting additional metadata about trapping instructions and other things that may be needed by a VM.	2018-02-01 17:19:05 -08:00
Jakob Stoklund Olesen	416b21c18d	Pairwise virtual register coalescing. Use a better algorithm for resolving interferences in virtual registers. This improves code quality by generating much fewer copies on some complicated functions. After the initial union-find phase, the check_vreg() function uses a Budimlic forest to check for interference between the values in the virtual registers, as before. All the interference-free vregs are done. Others are passed to synthesize_vreg() which dissolves the vreg and then attempts to rebuild one or more vregs from the contained values. The pairwise interference checks use virtual copies to make sure that any future conflicts can be resolved by inserting a copy instruction. This technique was not present in the old coalescer which caused some correctness issues. This coalescing algorithm makes much better code, and it is generally a bit slower than before. Some of the slowdown is made up by the following passes being faster because they have to process less code. Example 1, the Python interpreter which contains a very large function with a lot of variables. Before: 15.664 0.011 Register allocation 1.535 1.535 RA liveness analysis 2.872 1.911 RA coalescing CSSA 4.436 4.436 RA spilling 2.610 2.598 RA reloading 4.200 4.199 RA coloring After: 9.795 0.013 Register allocation 1.372 1.372 RA liveness analysis 6.231 6.227 RA coalescing CSSA 0.712 0.712 RA spilling 0.598 0.598 RA reloading 0.869 0.869 RA coloring Coalescing is more than twice as slow, but because of the vastly better code quality, overall register allocation time is improved by 37%. Example 2, the clang compiler. Before: 57.148 0.035 Register allocation 9.630 9.630 RA liveness analysis 7.210 7.169 RA coalescing CSSA 9.972 9.972 RA spilling 11.602 11.572 RA reloading 18.698 18.672 RA coloring After: 64.792 0.042 Register allocation 8.630 8.630 RA liveness analysis 22.937 22.928 RA coalescing CSSA 8.684 8.684 RA spilling 9.559 9.551 RA reloading 14.939 14.936 RA coloring Here coalescing is 3x slower, but overall regalloc time only regresses by 13%. Most examples are less extreme than these two. They just get better code at about the same compile time.	2018-01-24 14:54:32 -08:00
Jakob Stoklund Olesen	085e228358	Remove assertions from the hot value_def() function. The errors caught by these assertions are also detected by the IL verifier. Speed up compilation by 13%.	2018-01-22 11:31:21 -08:00
Jakob Stoklund Olesen	91343f208d	Fix quadratic behavior in sequence numbering. The ir::layout module is assigning sequence numbers to all EBBs and instructions so relative positions can be computed in constant time. This works a lot like BASIC line numbers where we initially use numbers 10, 20, 30, ... so we can insert new instructions in the middle of the sequence without renumbering everything. In some cases where the coalescer is misbehaving and inserting a lot of copy instructions, we end up having to renumber a larger and larger number of instructions to make space in the sequence. This causes the following reload pass to be very slow, spending most of its time renumbering instructions. Fix this by putting an upper limit on the number of instructions we're willing to renumber locally. When the limit is reached, switch to a full function renumbering with the major stride of 10. This gives us new elasticity in the sequence numbers. - Time to compile the Python interpreter in #229 drops from 4826 s -> 15.8 s. - The godot benchmark in #226 drops from 1257 s -> 75 s. - The AngryBots1 benchmark does not have the coalescer misbehavior. Its compilation time changes 22.9 s -> 23.1 s. It's worth noting that the sequence numbering is still technically quadratic with this fix. The system is not designed to handle a large number of instructions inserted in a single location. It expects a more even distribution of new instructions. We still need to fix the coalescer. It should not insert so many copies in degenerate cases.	2018-01-22 09:42:26 -08:00
Jakob Stoklund Olesen	d1f236b00a	Reimplement coalescer following the Budimlic paper. The old coalescing algorithm had some algorithmic complexity issues when dealing with large virtual registers. Reimplement to use a proper union-find algorithm so we only need one pass through the dominator forests for virtual registers that are interference free. Virtual registers that do have interference are split and new registers built. This pass is about twice as fast as the old one when dealing with complex virtual registers.	2018-01-16 12:32:04 -08:00
Jakob Stoklund Olesen	af89006b09	Fix some markdown issues. Work around some cases where the old markdown parser differs from the new Pulldown parser for the documentation.	2018-01-08 16:19:16 -08:00
Jakob Stoklund Olesen	362a4bdc4c	Add well-known names for runtime library functions. Add a LibCall type which represents runtime library functions that many be synthesized by Cretonne from pure instructions. Add a LibCall variant to ExternalName to represent one of these runtime functions.	2017-12-07 17:50:22 -08:00
Tyler McMullen	ced39f5186	Fix up adjust_sp_imm instruction. * Use imm64 rather than offset32 * Add predicate to enforce signed 32-bit limit to imm * Remove AdjustSpImm format * Add encoding tests for adjust_sp_imm * Adjust use of adjust_sp_imm in Intel prologue_epilogue to match	2017-12-05 11:49:12 -08:00
Tyler McMullen	ffab87318e	Add adjust_sp_imm instruction. Note: This enables using rsp and rbp as normal registers. Which is... wrong.	2017-12-05 11:49:12 -08:00
Tyler McMullen	cdf70ccb77	Add copy_special instruction.	2017-12-05 11:49:12 -08:00
Tyler McMullen	8ed37e352e	Add x86_push and x86_pop instructions.	2017-12-05 11:49:12 -08:00
Pat Hickey	9108725b74	extname: extend testcases to 16 chars this is sufficient for all filetests to pass without truncating	2017-11-23 14:08:47 -08:00
Pat Hickey	894268233e	ir::ExternalName representation and constructors rewritten	2017-11-23 14:08:47 -08:00
Jakob Stoklund Olesen	2e0b931590	Provide an fmt::Debug impl for entity references. Instead of deriving a Debug impl: Ebb(45), use the Display version for Debug too: ebb45. This is more readable, and no information is lost.	2017-11-21 10:01:55 -08:00
Dan Gohman	ced7a88ecc	Use consistent formatting for module-level comments.	2017-11-08 14:49:27 -08:00
Dan Gohman	e213c2654f	Fix branch_destination/analyze_branch for BranchInt/BranchFloat.	2017-11-08 10:58:03 -08:00
Dan Gohman	889b06fd16	Replace `as` casts with type-conversion functions.	2017-11-08 10:48:44 -08:00
Dan Gohman	3ab4349c1b	Use `Self` instead of repeating the type name.	2017-11-08 10:43:11 -08:00
Dan Gohman	d7796cbf25	Suppress an unused-import warning on AsciiExt. See https://users.rust-lang.org/t/psa-dealing-with-warning-unused-import-std-ascii-asciiext-in-today-s-nightly/13726 for details.	2017-11-08 10:40:29 -08:00
Dan Gohman	5dda19035d	Implement Hash for ExternalName. This allows users to have HashMaps etc. with ExternalNames as keys.	2017-11-06 11:34:09 -08:00
Dan Gohman	d9743290ea	Elide elidable lifetime parameters.	2017-11-06 11:09:56 -08:00
Dan Gohman	acc6d941a3	Combine redundant match arm bodies.	2017-11-03 16:40:51 -07:00
Dan Gohman	5d063eb8bc	Merge reloc_func and reloc_globalsym into reloc_external.	2017-10-31 12:26:33 -07:00
Dan Gohman	6fc45b070a	Add a new kind of GlobalVar for symbolic addresses. These addresses will allow referencing C/C++/Rust-style global variables by name directly.	2017-10-30 13:26:56 -07:00
Dan Gohman	c2665385b1	Rename FunctionName to ExternalName.	2017-10-30 13:26:56 -07:00
Dan Gohman	bbdce7e3e0	Update the comment for 'notrap()'.	2017-10-24 10:39:01 -07:00
Jakob Stoklund Olesen	b3fb41087e	Use the term "Function parameter" instead of "argument". Rename the ArgumentType type to AbiParam since it describes the ABI characteristics of a parameter or return value, not just the value type. In Signature, rename members argument_types and return_types to "params" and "returns". Again, they are not just types. Fix a couple lingering references to "EBB arguments".	2017-10-19 17:39:23 -07:00
Jakob Stoklund Olesen	921bcc6c25	Use the term "EBB parameter" everywhere. Add EBB parameter and EBB argument to the langref glossary to clarify the distinction between formal EBB parameter values and arguments passed to branches. - Replace "ebb_arg" with "ebb_param" in function names that deal with EBB parameters. - Rename the ValueDef variants to Result and Param. - A bunch of other small langref fixes. No functional changes intended.	2017-10-19 16:17:09 -07:00
Jakob Stoklund Olesen	57b81a179e	Move the CursorBase trait into the cursor module. Also move the CursorPosition type into the cursor module. Move layout::cursor into the tests module as LayoutCursor and remove its ability to insert instructions via the dfg.ins() method. This cursor type is only used in the layout unit tests now. The FuncCursor and EncCursor types are the commonly used cursors now.	2017-10-19 12:15:43 -07:00
Jakob Stoklund Olesen	b948de1693	Add a verifier pass for CPU flags. Only one CPU flags value can be live at a time, and some instructions clobber the flags.	2017-10-18 15:07:19 -07:00
Jakob Stoklund Olesen	1f98fc491c	Add instructions using CPU flags. Add integer and floating comparison instructions that return CPU flags: ifcmp, ifcmp_imm, and ffcmp. Add conditional branch instructions that check CPU flags: brif, brff Add instructions that check a condition in the CPU flags and return a b1: trueif, trueff.	2017-10-12 19:12:28 -07:00
Jakob Stoklund Olesen	15461c1e4b	Add two new value types: iflags and fflags. These two value types represent the state of CPU flags after an integer comparison and a floating point comparison respectively. Instructions using these types TBD.	2017-10-12 19:05:24 -07:00
Jakob Stoklund Olesen	dbaa919ca9	Make room for SpecialType in the value type numbering. The value types are now classified into three groups: 1. Lane types are scalar types that can also be used to form vectors. 2. Vector types 2-256 copies of a lane type. 3. Special types. This is where the CPU flag types will go. The special types can't be used to form vectors. Change the numbering scheme for value types to make room for the special types and add `is_lane()` and `is_special()` classification methods. The VOID type still has number 0, but it can no longer appear as a vector lane. It classifies as special now.	2017-10-12 12:48:55 -07:00
Jakob Stoklund Olesen	89a24b2f13	Rename ScalarType to LaneType. The word "scalar" is a bit vague and tends to mean "non-vector". Since we are about to add new CPU flag value types that can't appear as vector lanes, make the distinction clear: LaneType represents value types that can appear as a vector lane. Also replace the Type::is_scalar() method with an is_vector() method.	2017-10-12 10:39:12 -07:00
Jakob Stoklund Olesen	699cb9895e	Enforce a 4-byte minimum spill slot size. This is primarily for the benefit of 32-bit x86 code which can't spill 1-byte types from arbitrary registers. This makes it possible to use 32-bit writes to spill types like b1 and i8. These small types are expected to be very rare since WebAssembly doesn't have then, and we tend to push integer arithmetic to at least i32. The effect of frame sizes should be minimal.	2017-10-11 14:20:43 -07:00
Jakob Stoklund Olesen	90ed698e83	Add an unreachable code elimination pass. The register allocator doesn't even try to compile unreachable EBBs, so any values defined in such blocks won't be assigned registers. Since the dominator tree already has determined which EBBs are reachable, we should just eliminate any unreachable blocks instead o trying to do something with the dead code. Not that this is not a "dead code elimination" pass which would also remove individual instructions whose results are not used.	2017-10-09 15:26:27 -07:00
Dan Gohman	0c4500897f	Clarify FunctionName's role in its comment.	2017-10-09 13:50:03 -07:00
Jakob Stoklund Olesen	b3fa47cacc	Add support for emergency spill slots. - Create a new kind of stack slot: emergency_slot. - Add a get_emergency_slot() method which finds a suitable emergency slot given a list of slots already in use. - Use emergency spill slots when schedule_moves needs them.	2017-10-06 10:45:13 -07:00
Jakob Stoklund Olesen	b562fdcd5c	Remove the dfg::resolve_copies() method. This method was important back when result values couldn't be moved between instructions. Now that results can be moved, value aliases do everything we need. Copy instructions are still used to break interferences in the register allocator's coalescing phase, but there isn't really any reason to use a copy instruction over a value alias anywhere else. After and during register allocation, copy instructions are significant, so we never want to "see through" them like the resolve_copies() function did. This is related to #166, but probably doesn't fix the underlying problem.	2017-10-05 14:46:34 -07:00
Jakob Stoklund Olesen	dda3efcbdd	Add regspill and regfill instructions. These are parallels to the existing regmove instruction, but the divert the value to and from a stack slot. Like regmove diversions, this is a temporary diversion that must be local to the EBB.	2017-10-04 17:02:09 -07:00
Jakob Stoklund Olesen	2888ff5bf3	Fix a corner case in fcvt_to_sint.i32.f64 legalization. An f64 can represent multiple values in the range INT_MIN-1 < x <= INT_MIN which all truncate to INT_MIN, so comparing the input value against INT_MIN is not good enough. Instead, detect overflow on x <= INT_MIN-1 when INT_MIN-1 is an exact floating point value.	2017-09-28 14:24:39 -07:00

1 2 3 4 5 ...

253 Commits