Compute the bound values for expand_fcvt_to_sint using bitwise integer
arithmetic rather than floating-point arithmetic, to avoid relying on the
host's floating-point behavior.
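As an illustration of the idea, a power-of-two bound can be assembled
directly from its IEEE-754 bit pattern; this is a hedged sketch, not the
exact constants or code used by the legalizer:

    fn main() {
        // Build 2^31 as an f32 from its bit pattern: sign 0, biased exponent
        // 127 + 31, zero mantissa. No host floating-point arithmetic is used
        // to derive the value.
        let bits: u32 = (127 + 31) << 23;
        let bound = f32::from_bits(bits);
        assert_eq!(bound, 2147483648.0f32);
    }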
This allows the assertions to be disabled in release builds, making the
code faster and smaller at the expense of not performing the checks.
Assertions can be re-enabled in release builds by setting
debug-assertions = true in the [profile.release] section of Cargo.toml,
as the top-level Cargo.toml file already does.
When relaxing a branch, restrict the set of candidate encodings to those which
have the same input constraints as the original encoding choice. This prevents
situations where relaxation prefers a non-REX-prefixed encoding over a
REX-prefixed one because the end of the instruction can be one byte closer to
the destination, even though the encoding needs the REX prefix because of one
of the operand registers.
This also makes the Context type perform encoding verification after
relaxation, to catch similar problems in the future.
Fixes #256.
Emergency stack slots are a new kind of stack slot added relatively
recently. They need to be allocated a stack offset just like explicit
and spill slots.
Also, make StackSlotData's offset field an Option, to catch problems
like this in the future. Previously the value 0 was used when an offset
hadn't been assigned yet, which made it non-obvious whether the field
meant "not assigned yet" or "assigned the value 0".
Spiderwasm on 32-bit x86 always uses a 16-byte-aligned stack pointer.
Change the setting for the "native" convention as well, for
compatibility with Linux and Darwin ABIs, and so that if a platform
has different ABI rules, the problem will be detected in code emitted by
Cretonne, rather than somewhere else.
The term "local variables" predated the SSA builder in the front-end
crate, which also provides a way to implement source-language local
variables. The name "explicit stack slot" makes it clear what this
construct is.
Adds support for transforming integer division and remainder by constants
into sequences that do not involve division instructions.
* div/rem by constant powers of two are turned into right shifts, plus some
fixups for the signed cases.
* div/rem by constant non-powers of two are turned into double-length
multiplies by a magic constant, plus some fixups involving shifts,
addition and subtraction, that depend on the constant, the word size and
the signedness involved. (A sketch of both rewrites appears after this
list.)
* The following cases are transformed: div and rem, signed or unsigned, 32
or 64 bit. The only un-transformed cases are: unsigned div and rem by
zero, signed div and rem by zero or -1.
* This is all incorporated within a new transformation pass, "preopt", in
lib/cretonne/src/preopt.rs.
* In preopt.rs, fn do_preopt() is the main driver. It is designed to be
extensible to transformations of other kinds of instructions. Currently
it merely uses a helper to identify div/rem transformation candidates and
another helper to perform the transformation.
* In preopt.rs, fn get_div_info() pattern-matches to find candidates: both
cases where the second argument is an immediate, and cases where it is an
identifier bound to an immediate at its definition point.
* In preopt.rs, fn do_divrem_transformation() does the heavy lifting of the
transformation proper. It in turn uses magic{S,U}{32,64} to calculate the
magic numbers required for the transformations.
* There are many test cases for the transformation proper:
filetests/preopt/div_by_const_non_power_of_2.cton
filetests/preopt/div_by_const_power_of_2.cton
filetests/preopt/rem_by_const_non_power_of_2.cton
filetests/preopt/rem_by_const_power_of_2.cton
filetests/preopt/div_by_const_indirect.cton
preopt.rs also contains a set of tests for magic number generation.
* The main (non-power-of-2) transformation requires instructions that return
the high word of a double-length multiply. For this, instructions umulhi
and smulhi have been added to the core instruction set. These will map
directly to single instructions on most non-intel targets.
* Intel does not have an instruction exactly like that. For the intel
target, instructions x86_umulx and x86_smulx have been added. These map to
real instructions and return both result words. The intel legalizer will
rewrite {s,u}mulhi into x86_{s,u}mulx uses that throw away the lower half
of the result. Tests:
filetests/isa/intel/legalize-mulhi.cton (new file)
filetests/isa/intel/binary64.cton (added x86_{s,u}mulx encoding tests)
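Both rewrites can be illustrated in plain Rust. This is a hedged sketch of
the idea only; the helper names and the magic constant for the divide-by-3
case are illustrative and are not the code in preopt.rs:

    // The high word of a 32x32 -> 64 bit unsigned multiply, i.e. what the new
    // umulhi instruction computes.
    fn umulhi32(a: u32, b: u32) -> u32 {
        ((a as u64 * b as u64) >> 32) as u32
    }

    // Signed division by 2^k (1 <= k <= 31): an arithmetic shift plus a fixup
    // that biases negative dividends so the result rounds toward zero, as a
    // division instruction would.
    fn sdiv_by_pow2(x: i32, k: u32) -> i32 {
        let sign = x >> 31;                             // all ones if x < 0
        let bias = ((sign as u32) >> (32 - k)) as i32;  // 2^k - 1 if x < 0
        x.wrapping_add(bias) >> k
    }

    // Unsigned division by the non-power-of-two constant 3, using the magic
    // multiplier 0xAAAAAAAB (3 * 0xAAAAAAAB == 2^33 + 1): take the high word
    // of the product and shift. Other divisors need different constants and
    // extra add/sub fixups.
    fn udiv_by_3(x: u32) -> u32 {
        umulhi32(x, 0xAAAA_AAAB) >> 1
    }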
With the change to the parser to preserve indices, it now inserts
placeholders to pad out index spaces as needed. Placeholder functions
use reserved signature indices, so skip them when writing output, to
avoid emitting them as "sig4294967295".
Cretonne clients don't need to know how the register allocator works.
Export the RegDiversions type from the binemit module instead. It is
used by the "test binemit" driver.
StackSlotKind::OutgoingArg stack slots have an offset that is relative
to our own stack pointer, while all other stack slot kinds have offsets
that are relative to the caller's stack pointer.
Make sure we generate the right sp-relative offsets for outgoing
arguments too.
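A hedged sketch of the distinction, with assumed type and field names and a
simplified sign convention for a downward-growing stack:

    // Hypothetical, simplified stack slot kinds for illustration only.
    enum StackSlotKind {
        ExplicitSlot,
        SpillSlot,
        IncomingArg,
        OutgoingArg,
    }

    // Express a slot's recorded offset relative to our own stack pointer.
    // OutgoingArg offsets already use our sp as the base; the other kinds are
    // recorded relative to the caller's sp, i.e. the sp value before this
    // function allocated its frame of `frame_size` bytes.
    fn sp_relative_offset(kind: StackSlotKind, offset: i32, frame_size: i32) -> i32 {
        match kind {
            StackSlotKind::OutgoingArg => offset,
            _ => offset + frame_size,
        }
    }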
This makes it easier to debug testcases:
- the entity numbers in a .cton file match the entity numbers used
within Cretonne.
- serializing and deserializing doesn't cause indices to change.
One disadvantage is that if a .cton file uses sparse entity numbers,
deserializing to the in-memory form doesn't compact it. However, the
text format is not intended to be performance-critical, so this isn't
expected to be a big burden.
When the input is a NaN, we need to generate a different trap code, so
use the new trapff instruction to generate such a trap after the first
floating-point comparison.
trapff is the floating-point equivalent of trapif: trap when a given
condition holds in the floating-point flags.
Define Intel encodings comparable to the trapif encodings.
This enables code generation that never causes a SIGFPE signal to be
raised from a division instruction. Instead, division and remainder
calculations are protected by explicit traps.
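As a plain-Rust illustration of what the legalized code guards against (the
trap messages below are illustrative, not actual Cretonne trap codes):

    // Signed 32-bit division with every faulting case turned into an explicit,
    // checked trap, so the hardware divide itself can never raise SIGFPE.
    fn guarded_sdiv(x: i32, y: i32) -> i32 {
        if y == 0 {
            panic!("trap: integer division by zero");
        }
        if x == i32::MIN && y == -1 {
            panic!("trap: integer overflow");
        }
        x / y
    }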
* gen_settings: don't try to display a Preset descriptor in Flags
Trying to display a preset doesn't make sense, and before this commit it
did not display anything meaningful - the printout just said e.g.
"haswell =\n".
The offset byte of a preset descriptor isn't a valid offset into the
flag bytes; it is actually an offset into the PRESETS table. It will
cause a panic when the offset is out of bounds for the flag bytes,
which happens in the intel ISA as of this commit.
* intel settings: test that the Display impl doesn't panic
This instruction loads a stack limit from a global variable and compares
it to the stack pointer, trapping if the stack has grown beyond the
limit.
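In plain Rust, the check performed by the instruction amounts to the
following (assuming a downward-growing stack; the trap message is
illustrative):

    // Trap if the stack pointer has moved below the limit loaded from the
    // global variable, i.e. the stack has grown past its allowed size.
    fn stack_limit_check(sp: u64, stack_limit: u64) {
        if sp < stack_limit {
            panic!("trap: stack overflow");
        }
    }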
Also add an expand_flags transform group containing legalization patterns
for ISAs with CPU flags.
Fixes #234.
Changes:
* Adds a new generic instruction, SELECTIF, that does value selection (a la
conditional move) similarly to existing SELECT, except that it is
controlled by condition code input and flags-register inputs.
* Adds a new Intel x86_64 variant, 'baseline', that supports SSE2 and
nothing else.
* Adds new Intel x86_64 instructions BSR and BSF.
* Implements generic CLZ, CTZ and POPCOUNT on x86_64 'baseline' targets
using the new BSR, BSF and SELECTIF instructions (a sketch of the CLZ case
appears after this list).
* Implements SELECTIF on x86_64 targets using conditional-moves.
* new test filetests/isa/intel/baseline_clz_ctz_popcount.cton
(for legalization)
* new test filetests/isa/intel/baseline_clz_ctz_popcount_encoding.cton
(for encoding)
* Allow lib/cretonne/meta/gen_legalizer.py to generate non-snake-caseified
Rust without rustc complaining.
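A hedged plain-Rust sketch of the CLZ legalization idea; bsr32 here is a
stand-in for the x86 BSR instruction, and the zero check stands in for the
SELECTIF on the flags BSR produces for a zero input:

    // Index of the most-significant set bit, i.e. what x86 BSR computes.
    // BSR leaves its result undefined for a zero input, which is why the
    // lowered sequence needs a select on the zero flag.
    fn bsr32(x: u32) -> u32 {
        debug_assert!(x != 0);
        let mut i = 31;
        while x & (1 << i) == 0 {
            i -= 1;
        }
        i
    }

    // clz expressed in terms of BSR plus a select for the zero case.
    fn clz32(x: u32) -> u32 {
        if x == 0 { 32 } else { 31 - bsr32(x) }
    }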
Fixes #238.
This Function method can be used after the final code layout has been
computed. It returns all the instructions in an EBB along with their
encoded size and offset from the beginning of the function.
This is useful for extracting additional metadata about trapping
instructions and other things that may be needed by a VM.
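A hypothetical usage sketch; the method name inst_offsets, the shape of the
items it yields, and the surrounding types are assumptions based on the
description above, not a confirmed API:

    // Collect (instruction, offset from function start, encoded size) triples,
    // e.g. to build a VM's trap metadata tables after final code layout.
    // Function, EncInfo and Inst are the cretonne library types.
    fn collect_metadata(func: &Function, encinfo: &EncInfo) -> Vec<(Inst, u32, u8)> {
        let mut out = Vec::new();
        for ebb in func.layout.ebbs() {
            for (offset, inst, size) in func.inst_offsets(ebb, encinfo) {
                out.push((inst, offset, size));
            }
        }
        out
    }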
The dominator tree pre-order is defined at the EBB granularity, but we
are looking for dominating nodes at the instruction level. This means
that we sometimes need to look higher up the DomForest stack for a
dominating node, using DominatorTree::dominates() instead of
DominatorTreePreorder::dominates().
Each dominance check involves the domtree.last_dominator() function
scanning up the dominator tree, starting from the new node that was
pushed. We can eliminate this duplicate work by exposing the
last_dominator() function to push_node().
As we are searching through nodes on the stack, maintain a last_dom
program point representing the previous return value from
last_dominator(). This way, we're only scanning the dominator tree once.
Use a better algorithm for resolving interferences in virtual registers.
This improves code quality by generating far fewer copies on some
complicated functions.
After the initial union-find phase, the check_vreg() function uses a
Budimlic forest to check for interference between the values in the
virtual registers, as before. All the interference-free vregs are done.
Others are passed to synthesize_vreg() which dissolves the vreg and then
attempts to rebuild one or more vregs from the contained values.
The pairwise interference checks use *virtual copies* to make sure that
any future conflicts can be resolved by inserting a copy instruction.
This technique was not present in the old coalescer, which caused some
correctness issues.
This coalescing algorithm makes much better code, but it is generally a
bit slower than before. Some of the slowdown is made up for by the
following passes, which run faster because they have less code to process.
Example 1: the Python interpreter, which contains a very large function
with a lot of variables.
Before:
 15.664    0.011  Register allocation
  1.535    1.535  RA liveness analysis
  2.872    1.911  RA coalescing CSSA
  4.436    4.436  RA spilling
  2.610    2.598  RA reloading
  4.200    4.199  RA coloring
After:
  9.795    0.013  Register allocation
  1.372    1.372  RA liveness analysis
  6.231    6.227  RA coalescing CSSA
  0.712    0.712  RA spilling
  0.598    0.598  RA reloading
  0.869    0.869  RA coloring
Coalescing is more than twice as slow, but because of the vastly better
code quality, overall register allocation time is improved by 37%.
Example 2: the clang compiler.
Before:
 57.148    0.035  Register allocation
  9.630    9.630  RA liveness analysis
  7.210    7.169  RA coalescing CSSA
  9.972    9.972  RA spilling
 11.602   11.572  RA reloading
 18.698   18.672  RA coloring
After:
 64.792    0.042  Register allocation
  8.630    8.630  RA liveness analysis
 22.937   22.928  RA coalescing CSSA
  8.684    8.684  RA spilling
  9.559    9.551  RA reloading
 14.939   14.936  RA coloring
Here coalescing is 3x slower, but overall regalloc time only regresses
by 13%.
Most examples are less extreme than these two. They just get better code
at about the same compile time.