This adds a "colocated" flag to function and symbolic global variables which
indicates that they are defined along with the current function, so they can
use PC-relative addressing.
This also changes the function decl syntax; the name now always precedes the
signature, and the "function" keyword is no longer included.
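For illustration, a declaration in the new form might look like this (the names and signature are hypothetical; `colocated` is the new flag):

    fn0 = colocated %callee(i64) -> i64

Without the flag, the declaration is the same minus `colocated`, with the name still directly preceding the signature.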
The regmove and regfill instructions temporarily divert a value's
location, and these temporary diversions are not reflected in
`func.locations`. For now, make an extra scan through the instructions
of the function to find any regmove or regfill instructions in order to
find all used callee-saved registers.
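For example, a diversion appears in the printed IR as an annotation like this (value and register names are illustrative):

    regmove v1, %rcx -> %rsi

If %rsi is callee-saved, it must be included in the prologue's save set even though `func.locations` still records v1's permanent location as %rcx; the extra scan catches this.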
This fixes #296.
The main use for non-PIC code at present is JIT code, and JIT code can
live anywhere in memory and reference other symbols defined anywhere in
memory, so it needs to use the "large" code model.
func_addr and globalsym_addr instructions were already using `movabs`
to support arbitrary 64-bit addresses, so this change legalizes calls
to support arbitrary 64-bit addresses as well.
* Only save callee-saved registers that are actually being used.
* Rename AllocatableSet to RegisterSet.
* Style cleanup and small renames for readability.
* Adjust x86 prologue-epilogue test to account for callee-saved register optimization.
* Add more tests for prologue-epilogue optimizations.
To keep cross-compiling straightforward, Cretonne shouldn't have any
behavior that depends on the host. This renames the "Native" calling
convention to "SystemV", which has a defined meaning for each target,
so that it's clear that the calling convention doesn't change
depending on what host Cretonne is running on.
* Add a pre-opt optimization to change constants into immediates.
This converts 'iadd' + 'iconst' into 'iadd_imm', and so on.
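For example (value numbers illustrative):

    v1 = iconst.i32 16
    v2 = iadd v0, v1

becomes:

    v2 = iadd_imm v0, 16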
* Optimize away redundant `bint` instructions.
Cretonne has a concept of "Testable" values, which can be either boolean
or integer. When an instruction needing a "Testable" value receives
the result of a `bint` converting boolean to integer, eliminate the
`bint`, as it's redundant.
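A sketch of the pattern (value numbers illustrative):

    v1 = icmp eq v2, v3
    v4 = bint.i32 v1
    brnz v4, ebb1

Since `brnz` accepts a "Testable" value, the `bint` can be bypassed:

    brnz v1, ebb1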
* Postopt: Optimize using CPU flags.
This introduces a post-legalization optimization pass which converts
compare+branch sequences to use flags values on CPUs which support it.
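A sketch of the rewrite, with illustrative value numbers:

    v3 = icmp slt v1, v2
    brnz v3, ebb2

becomes a flags-producing compare followed by a flags-consuming branch:

    v4 = ifcmp v1, v2
    brif slt v4, ebb2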
* Define a form of x86's `urm` that doesn't clobber FLAGS.
movzbl/movsbl/etc. don't clobber FLAGS; define a form of the `urm`
recipe that represents this.
* Implement a DCE pass.
This pass deletes instructions that have no side effects and whose
results are unused.
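For example (illustrative): if v3 below has no uses, the `iadd` is deleted, while the `store` is kept because it has a side effect:

    v3 = iadd v1, v2
    store v1, v4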
* Clarify ambiguity about "32-bit" and "64-bit" in comments.
* Add x86 encodings for icmp_imm.
* Add a testcase for postopt CPU flags optimization.
This covers the basic functionality of transforming compare+branch
sequences to use CPU flags.
* Pattern-match irsub_imm in preopt.
* First draft of TrapSink implementation.
* Add trap sink calls to 'trapif' and 'trapff' recipes.
* Add SourceLoc to trap sink calls, and add trap sink calls to all loads and stores.
* Add IntegerDivisionByZero trap to div recipe.
* Only emit load/store traps if 'notrap' flag is not set on the instruction.
* Update filetest machinery to add new trap sink functionality.
* Update filetests to include traps in output.
* Add a few more trap outputs to filetests.
* Add trap output to CLI tool.
Value aliases aren't instructions, so they don't have a location in the
CFG, and it's therefore not meaningful to query whether a value alias is
defined within a loop.
While there may be CPUs that have a domain crossing penalty here,
this also helps the generated code look more like the code produced
by other compilers.
EFLAGS is a subregister of RFLAGS. For consistency with GPRs where we
use the 64-bit names to refer to the registers, use the 64-bit name for
RFLAGS as well.
Mark loads from globals generated by cton_wasm or by legalization as
`aligned` and `notrap`, since memory for these globals should be
allocated by the runtime environment for that purpose. This reduces
the number of potentially trapping instructions, which can reduce
the amount of metadata required by embedding environments.
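Such a load then carries both flags in the IR; a sketch, with illustrative values:

    v1 = load.i64 notrap aligned v0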
* Add x86 encodings for shift-immediate instructions.
Implements encodings for ishl_imm, sshr_imm, and ushr_imm, using 8-bit
immediates. Added tests for the encodings to intel/binary64.cton;
canonical versions come from llvm-mc. (See the example after this list.)
* Translate a test to use shift immediates.
* Shift-immediate encodings: use enc_i32_i64,
and note why the regular shift encodings can't use it above.
* Add additional encoding tests for shift immediates.
This covers 32-bit mode, and 64-bit operations in 64-bit mode.
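An illustrative use of the shift-immediate instructions (value numbers hypothetical):

    v1 = ishl_imm v0, 2
    v2 = ushr_imm v0, 3
    v3 = sshr_imm v0, 1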
The term "local variables" predated the SSA builder in the front-end
crate, which also provides a way to implement source-language local
variables. The name "explicit stack slot" makes it clear what this
construct is.
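A sketch of the construct under the new name (slot size and value numbers illustrative):

    ss0 = explicit_slot 8
    v1 = stack_load.i64 ss0
    stack_store v1, ss0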
Adds support for transforming integer division and remainder by constants
into sequences that do not involve division instructions.
* div/rem by constant powers of two are turned into right shifts, plus some
fixups for the signed cases.
* div/rem by constant non-powers of two are turned into double-length
multiplies by a magic constant, plus some fixups involving shifts,
addition, and subtraction, that depend on the constant, the word size,
and the signedness involved (see the sketch after this list).
* The following cases are transformed: div and rem, signed or unsigned, 32
or 64 bit. The only un-transformed cases are: unsigned div and rem by
zero, signed div and rem by zero or -1.
* This is all incorporated within a new transformation pass, "preopt", in
lib/cretonne/src/preopt.rs.
* In preopt.rs, fn do_preopt() is the main driver. It is designed to be
extensible to transformations of other kinds of instructions. Currently
it merely uses a helper to identify div/rem transformation candidates and
another helper to perform the transformation.
* In preopt.rs, fn get_div_info() pattern-matches to find candidates: both
cases where the second arg is an immediate, and cases where the second
arg is an identifier bound to an immediate at its definition point.
* In preopt.rs, fn do_divrem_transformation() does the heavy lifting of the
transformation proper. It in turn uses magic{S,U}{32,64} to calculate the
magic numbers required for the transformations.
* There are many test cases for the transformation proper:
filetests/preopt/div_by_const_non_power_of_2.cton
filetests/preopt/div_by_const_power_of_2.cton
filetests/preopt/rem_by_const_non_power_of_2.cton
filetests/preopt/rem_by_const_power_of_2.cton
filetests/preopt/div_by_const_indirect.cton
preopt.rs also contains a set of tests for magic number generation.
* The main (non-power-of-2) transformation requires instructions that return
the high word of a double-length multiply. For this, instructions umulhi
and smulhi have been added to the core instruction set. These will map
directly to single instructions on most non-Intel targets.
* Intel does not have an instruction exactly like that. For Intel,
instructions x86_umulx and x86_smulx have been added. These map to real
instructions and return both result words. The Intel legalizer will
rewrite {s,u}mulhi into x86_{s,u}mulx uses that throw away the lower half
word. Tests:
filetests/isa/intel/legalize-mulhi.cton (new file)
filetests/isa/intel/binary64.cton (added x86_{s,u}mulx encoding tests)
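A sketch of the two transformation shapes, with illustrative value numbers (MAGIC stands for the computed magic constant, which is elided here; the exact fixup sequence depends on the constant, the width, and the signedness):

    v1 = udiv_imm v0, 8          ; power of two: becomes
    v1 = ushr_imm v0, 3

    v1 = udiv_imm v0, 7          ; non-power of two: becomes, in outline,
    v2 = iconst.i32 MAGIC
    v3 = umulhi v0, v2
    ; ... shift/add/sub fixups ...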
With the change to the parser to preserve indices, it now inserts
placeholders to pad out index spaces as needed. Placeholder functions
use reserved signature indices, so skip them when writing output, to
avoid emitting them as "sig4294967295".
StackSlotKind::OutgoingArg stack slots have an offset that is relative
to our own stack pointer, while all other stack slot kinds have offsets
that are relative to the caller's stack pointer.
Make sure we generate the right sp-relative offsets for outgoing
arguments too.
This makes it easier to debug testcases:
- the entity numbers in a .cton file match the entity numbers used
within Cretonne.
- serializing and deserializing doesn't cause indices to change.
One disadvantage is that if a .cton file uses sparse entity numbers,
deserializing to the in-memory form doesn't compact it. However, the
text format is not intended to be performance-critical, so this isn't
expected to be a big burden.
This is the floating point equivalent of trapif: Trap when a given
condition is in the floating-point flags.
Define Intel encodings comparable to the trapif encodings.
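A sketch of the intended use, with illustrative values and trap code:

    v1 = ffcmp v2, v3
    trapff uno v1, user0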
This enables code generation that never causes a SIGFPE signal to be
raised from a division instruction. Instead, division and remainder
calculations are protected by explicit traps.
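For example, a division can be guarded by an explicit zero check (illustrative values):

    trapz v1, int_divz
    v2 = udiv v0, v1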
This instruction loads a stack limit from a global variable and compares
it to the stack pointer, trapping if the stack has grown beyond the
limit.
Also add an expand_flags transform group containing legalization patterns
for ISAs with CPU flags.
Fixes #234.
The instruction set has variants with 8-bit and 32-bit signed immediate
operands.
Add a TODO to use a TEST instruction for the special case ifcmp_imm x, 0.
Changes:
* Adds a new generic instruction, SELECTIF, that does value selection (a la
conditional move) similarly to the existing SELECT, except that it is
controlled by a condition-code input and a flags-register input (see the
sketch after this list).
* Adds a new Intel x86_64 variant, 'baseline', that supports SSE2 and
nothing else.
* Adds new Intel x86_64 instructions BSR and BSF.
* Implements generic CLZ, CTZ and POPCOUNT on x86_64 'baseline' targets
using the new BSR, BSF and SELECTIF instructions.
* Implements SELECTIF on x86_64 targets using conditional-moves.
* new test filetests/isa/intel/baseline_clz_ctz_popcount.cton
(for legalization)
* new test filetests/isa/intel/baseline_clz_ctz_popcount_encoding.cton
(for encoding)
* Allow lib/cretonne/meta/gen_legalizer.py to generate non-snake-caseified
Rust without rustc complaining.
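A sketch of SELECTIF, with illustrative value numbers: select v1 or v2 depending on a flags value computed by an integer compare:

    v4 = ifcmp v1, v2
    v5 = selectif.i32 eq v4, v1, v2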
Fixes #238.
The fuzzer bugs #219 and #227 are both cases where the register
allocator coloring pass "runs out of registers". What's really happening
is that the constraint solver failed to find a solution, even when one
existed.
Suppose we have three solver variables:
v0(GPR, out, global)
v1(GPR, in)
v2(GPR, in, out)
And suppose registers %r0 and %r1 are available on both input and output
sides of the instruction, but only %r1 is available for global outputs.
A valid solution would be:
v0 -> %r1
v1 -> %r1
v2 -> %r0
However, the solver would pick registers for the three values in
numerical order because v1 and v2 have the same domain size (=2). This
would assign v1 -> %r0 and then fail to find a free register for v2.
Fix this by prioritizing in+out variables over single-sided variables
even when their domains are equal. This means that v2 gets assigned a
register before v1, and it gets a chance to pick a register that is
still available on both the in and out sides.
Also try to avoid depending on value numbers in the solver. These bugs
were hard to reproduce because a test case invariably would have
different value numbers, causing the solver to order its variables
differently and succeed. Throw in the previous solution and the original
register assignments as tie-breakers, which are stable and not dependent
on value numbers.
This is still not a substitute for a proper solver search algorithm that
we will probably have to write eventually.
Fixes #219. Fixes #227.
The error exposed by this test case no longer happens after the
coalescer was rewritten to follow the Budimlic paper. It's still a
good coalescer test.
Fixes #216 by including the test case.
The Intel instruction "v1 = ushr v2, v2" will implicitly fix the output
register for v1 to %rcx, because the output is tied to the first input
operand and the second input operand is fixed to %rcx.
Make sure we handle this transitive constraint when checking for
interference with the globally live registers.
Fixes #218
When the coloring pass sees an instruction with a fixed input register
constraint that is already satisfied, make sure to tell the solver
about it anyway.
There are situations where the solver wants to convert a value to a
solver variable, and we can't allow that if the same value is also used
for a fixed register operand.
Fixes #221.
The spiller wasn't tracking register pressure correctly for dead EBB
parameters in visit_ebb_header(). Make sure we free any dead EBB
parameters.
Fixes #223
The old coalescing algorithm had some algorithmic complexity issues when
dealing with large virtual registers. Reimplement to use a proper
union-find algorithm so we only need one pass through the dominator
forests for virtual registers that are interference free.
Virtual registers that do have interference are split, and new virtual
registers are built.
This pass is about twice as fast as the old one when dealing with
complex virtual registers.
Ghost instructions and values are supposed to be stored as metadata
alongside the compiled program such that the ghost values can be
computed from the real register/stack values when the program is stopped
for debugging or de-optimization.
If we allow an EBB parameter to be a ghost value, we have no way of
computing its real value using ghost instructions. We would need to know
a complete execution trace of the stopped program to figure out which
values were passed to the ghost parameter.
Instead we require EBB parameters to be real values materialized in
registers or on the stack. We use the regclass_for_abi_type() TargetIsa
callback to determine the initial register class for these parameters.
They can then be spilled later if needed.
Fixes #215.
We want to disable dominance checks in unreachable code. The
is_reachable() check for EBB parameter values was checking if the
defining EBB was reachable, not the EBB using the value.
This bug showed up in fuzzing and in #213.
Add an addend field to reloc_external, and use it to move the
responsibility for accounting for the difference between the end of an
instruction (where the PC is considered to be, for PC-relative addressing
on Intel) and the beginning of the immediate field into the encoding code.
Specifically, this makes IntelGOTPCRel4 directly correspond to
R_X86_64_GOTPCREL, instead of also carrying an implicit `- 4`.
When the spiller needs to make a register available for a conditional
branch instruction, it can be necessary to spill some of the EBB
arguments on the branch instruction. This is ok because EBB argument
values belong to the same virtual register as the corresponding EBB
parameter and we spill the whole virtreg to the same slot.
Also make sure free_regs() can handle values that are killed by the
current instruction *and* spilled.
The stack implementation of the Budimlic dominator forest doesn't work
correctly with a CFG reverse post-order (RPO). It needs the domtree
pre-order.
Also handle EBB-level pre-order vs. instruction-level pre-order. Manage
the stack according to EBB dominance. Look for a dominating value by
searching the stack. This is different from the Budimlic algorithm
because we're computing the dominator tree pre-order with EBB granularity
only.
Fixes #207.
The coalescer makes sure that matching EBB arguments and parameters are
always in the same virtual registers, and therefore also in the same
stack slot if they are spilled.
This means that the reload pass should never rewrite an EBB argument if
the argument value is spilled. This comes up in cases where the branch
instruction needs the same value in a register:
    brnz v9, ebb3(v9)
If the virtual register containing v9 is spilled, the branch instruction
must be reloaded like:
    v52 = fill v9
    brnz v52, ebb3(v9)
The branch register argument must be rewritten, while the EBB argument
must still refer to the original stack value.
Fixes #208.