Commit Graph

986 Commits

Author SHA1 Message Date
bjorn3
cc35f1e9bb x64: Misc small integer fixes 2020-09-08 15:00:24 +02:00
bjorn3
ce033f2a0c x64: Fix udiv and sdiv for 8bit integers 2020-09-08 15:00:24 +02:00
bjorn3
74642b166f x64: Implement ineg and bnot 2020-09-08 15:00:24 +02:00
Anton Kirilov
e92f949663 AArch64: Introduce an enum for ternary integer operations
This commit performs a small cleanup in the AArch64 backend - after
the MAdd and MSub variants have been extracted, the ALUOp enum is
used purely for binary integer operations.

Also, Inst::Mov has been renamed to Inst::Mov64 for consistency.

Copyright (c) 2020, Arm Limited.
2020-09-08 13:22:22 +01:00
Johnnie Birch
a64af55cda Adds x64 packed negation for the new backend 2020-09-07 11:56:05 -07:00
Joey Gouly
650d48cd84 arm64: Don't always materialise a 64-bit constant
This improves the mov/movk/movn sequence when the high half of the
64-bit value is all zero.

Copyright (c) 2020, Arm Limited.
2020-09-01 13:29:01 +01:00
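
As an illustration of the idea in the commit above: a 64-bit constant is normally materialised one 16-bit halfword at a time (a movz followed by up to three movk instructions), so a value whose high half is all zero needs at most two instructions. A minimal sketch of that counting argument; the helper is hypothetical, not Cranelift's actual lowering code:

```
// Sketch only: count how many movz/movk instructions a 64-bit constant needs
// when it is built one 16-bit halfword at a time. A value whose high half is
// all zero needs at most two instructions instead of four.
fn halfwords_to_materialise(value: u64) -> usize {
    let nonzero = (0..4u32).filter(|&i| (value >> (i * 16)) & 0xffff != 0).count();
    nonzero.max(1) // the constant zero still needs a single movz
}

fn main() {
    assert_eq!(halfwords_to_materialise(0x0000_0000_dead_beef), 2);
    assert_eq!(halfwords_to_materialise(0x1234_5678_dead_beef), 4);
}
```
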
Benjamin Bouvier
a7f7c23bf9 machinst aarch64: in baldrdash, allow returning only one value across register classes;
Baldrdash's API requires that there is at most one result in a register,
across all the possible register classes: in particular, it's not
possible to return an i64 value in a register while returning a v128
value in another register.

This patch adds a notion of "remaining register values", so that this is
properly taken into account when choosing whether a return value may be
put into a register or not.
2020-08-31 12:36:26 +02:00
Julian Seward
8ac4bd1d0d CL/newBE/x64: Lowering of scalar shifts: fix shift-by-imm generation
The logic for generation of shifts-by-immediate was not quite right.  The result was that even
shifts by an amount known at compile time were being done by moving the shift immediate into %cl
and then doing a variable shift by %cl.  The effect is worse than it sounds, because all of
those shift constants are small and often used in multiple places, so they were GVN'd up and
often ended up at the entry block of the function.  Hence these were connected to the use points
by long live ranges which got spilled.  So all in all, most of the win here comes from avoiding
spilling.

The problem was caused by this line, in the `Opcode::Ishl | Opcode::Ushr ..` case:
```
   let (count, rhs) = if let Some(cst) = ctx.get_constant(inputs[1].insn) {
```
`inputs[]` appears to refer to this CLIF instruction's inputs, and, bizarrely, the `insn` fields of
`inputs[]` all refer to the instruction (the shift) itself.  Hence `ctx.get_constant(inputs[1].insn)` asks
"does this shift instruction produce a constant?", to which the answer is always "no", so the
shift-by-unknown-amount code is always generated.  The fix here is to change that expression to
```
   let (count, rhs) = if let Some(cst) = ctx.get_input(insn, 1).constant {
```
`get_input`'s result conveniently includes a `constant` field of type `Option<u64>`, so we just
use that instead.
2020-08-27 11:48:35 +02:00
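
To make the effect of the fix concrete, here is a self-contained sketch of the decision the lowering now makes: a shift amount known at compile time is encoded as an immediate, and only a run-time amount is routed through %cl. The enum and helper below are illustrative only, not the new backend's actual types:

```
// Sketch only: choose between an immediate shift and a shift by %cl.  x64
// masks the shift amount to the operand width, so a compile-time constant can
// be truncated to the low bits and emitted as, e.g., `shl r/m64, imm8`.
enum ShiftAmount {
    Imm(u8), // amount known at compile time
    InCl,    // amount only known at run time: move it into %cl first
}

fn classify_shift_amount(known_constant: Option<u64>, type_bits: u8) -> ShiftAmount {
    match known_constant {
        Some(c) => ShiftAmount::Imm((c as u8) & (type_bits - 1)),
        None => ShiftAmount::InCl,
    }
}

fn main() {
    // A shift of an i64 by a constant 70 becomes an immediate shift by 6.
    assert!(matches!(classify_shift_amount(Some(70), 64), ShiftAmount::Imm(6)));
}
```
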
Benjamin Bouvier
7c85654285 Address review comments. 2020-08-24 17:00:30 +02:00
Benjamin Bouvier
ee76e01efc machinst: fix the pinned reg hack;
The pinned register hack didn't work because the GetPinnedReg opcode is
marked as having side-effects (so that GVN won't try to common it out).

This commit tweaks the function used during lowering to vcode, so that
the GetPinnedReg opcode is specially handled. It's a bit lame, but it
makes the hack work again.

Also, use_input needs to be a no-op for real registers.
2020-08-24 17:00:30 +02:00
Benjamin Bouvier
efff43e769 machinst x64: fold address modes on loads/stores; 2020-08-24 17:00:30 +02:00
Benjamin Bouvier
b830ee79de machinst x64: commute operands of integer operations if one input is an immediate; 2020-08-24 17:00:30 +02:00
Benjamin Bouvier
cca10b87cb machinst x64: optimize select/brz/brnz when the input is a comparison; 2020-08-24 17:00:30 +02:00
Julian Seward
620e4b4e82 This patch fills in the missing pieces needed to support wasm atomics on newBE/x64.
It does this by providing an implementation of the CLIF instructions `AtomicRmw`, `AtomicCas`,
`AtomicLoad`, `AtomicStore` and `Fence`.

The translation is straightforward.  `AtomicCas` is translated into x64 `cmpxchg`, `AtomicLoad`
becomes a normal load because x64-TSO provides adequate sequencing, `AtomicStore` becomes a
normal store followed by `mfence`, and `Fence` becomes `mfence`.  `AtomicRmw` is the only
complex case: it becomes a normal load, followed by a loop which computes an updated value,
tries to `cmpxchg` it back to memory, and repeats if necessary.

This is a minimum-effort initial implementation.  `AtomicRmw` could be implemented more
efficiently using LOCK-prefixed integer read-modify-write instructions in the case where the old
value in memory is not required.  Subsequent work could add that, if required.

The x64 emitter has been updated to emit the new instructions, obviously.  The `LegacyPrefix`
mechanism has been revised to handle multiple prefix bytes, not just one, since it is now
sometimes necessary to emit both 0x66 (Operand Size Override) and 0xF0 (Lock).

In the aarch64 implementation of atomics, there has been some minor renaming for the sake of
clarity, and for consistency with this x64 implementation.
2020-08-24 11:50:06 +02:00
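
The AtomicRmw lowering described above has the same shape as a portable compare-exchange retry loop. A semantic sketch only, using Rust's standard atomics in place of the emitted x64 code, with addition as the example operation:

```
// Sketch only: the semantics of the AtomicRmw lowering -- load the old
// value, compute the update, then cmpxchg it back and retry if another
// thread changed memory in between. Returns the old value.
use std::sync::atomic::{AtomicU64, Ordering};

fn atomic_rmw_add(mem: &AtomicU64, operand: u64) -> u64 {
    loop {
        let old = mem.load(Ordering::SeqCst); // ordinary load
        let new = old.wrapping_add(operand);  // compute the updated value
        // cmpxchg: store `new` only if memory still holds `old`
        if mem
            .compare_exchange(old, new, Ordering::SeqCst, Ordering::SeqCst)
            .is_ok()
        {
            return old;
        }
        // raced with another thread: go round again
    }
}

fn main() {
    let x = AtomicU64::new(5);
    assert_eq!(atomic_rmw_add(&x, 3), 5);
    assert_eq!(x.load(Ordering::SeqCst), 8);
}
```
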
Anton Kirilov
b895ac0e40 AArch64: Implement SIMD conversions
Copyright (c) 2020, Arm Limited.
2020-08-21 18:03:50 +01:00
Andrew Brown
2767b2efc6 machinst x64: add Inst::[move|load|store] for choosing the correct x86 instruction
This change primarily adds the ability to lower packed `[move|load|store]` instructions (the vector types were previously unimplemented), but with the addition of the utility `Inst::[move|load|store]` functions it became possible to remove duplicated code (e.g. `stack_load` and `stack_store`) and use these utility functions elsewhere (though not exhaustively).
2020-08-20 12:37:22 -07:00
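
The kind of choice such a utility makes can be sketched as follows; the function and its parameters are hypothetical, and only the mnemonics correspond to real x86 instructions:

```
// Sketch only: the kind of decision such a helper makes; the names here are
// made up, but the mnemonics are the usual x86 choices for 128-bit moves.
fn packed_move_mnemonic(float_lanes: bool, lane_bits: u32) -> &'static str {
    match (float_lanes, lane_bits) {
        (true, 32) => "movaps", // packed single-precision floats, e.g. f32x4
        (true, 64) => "movapd", // packed double-precision floats, e.g. f64x2
        (false, _) => "movdqa", // packed integers of any lane width
        _ => unreachable!("unsupported packed type"),
    }
}

fn main() {
    assert_eq!(packed_move_mnemonic(true, 32), "movaps");
    assert_eq!(packed_move_mnemonic(false, 8), "movdqa"); // e.g. i8x16
}
```
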
Andrew Brown
cf598dc35b machinst x64: add packed moves for different vector types 2020-08-20 12:37:22 -07:00
Chris Fallin
debacec1c5 Merge pull request #2150 from jgouly/mul64s
arm64: Implement SIMD i64x2 multiply
2020-08-20 11:57:56 -07:00
Chris Fallin
051feaad75 Merge pull request #2148 from bjorn3/aarch64_fix_put_input_in_rsa
Fix put_input_in_reg
2020-08-20 11:41:35 -07:00
Chris Fallin
775dfa9df2 Merge pull request #1520 from bjorn3/aarch64-lower-small-fcvt_from_int
Lower fcvt_from_{u,s}int for 8 and 16 bit ints
2020-08-20 11:35:06 -07:00
Joey Gouly
a518c10141 arm64: Implement SIMD i64x2 multiply
Copyright (c) 2020, Arm Limited.
2020-08-20 13:26:03 +01:00
bjorn3
957eb9eeba Less unnecessary zero and sign extensions 2020-08-20 10:17:04 +02:00
bjorn3
ba48b9aef1 Fix put_input_in_reg 2020-08-19 19:38:47 +02:00
Johnnie Birch
a31336996c Add support for some packed multiplication for new x64 backend
Adds support for i32x4 and i16x8, and lowering for pmuludq in
preparation for i64x2.
2020-08-19 10:24:14 -07:00
bjorn3
4a84f3f073 Lower fcvt_from_{u,s}int for 8 and 16 bit ints 2020-08-19 18:07:12 +02:00
Chris Fallin
22181d0819 Use regalloc 0.0.30.
This upgrade pulls in one memory-allocation reduction improvement
(bytecodealliance/regalloc.rs#95). There should be no change in behavior
as a result of this.
2020-08-18 09:51:35 -07:00
Chris Fallin
5fa0be3515 AArch64 ABI: properly store full 64-bit width of extended args/retvals.
When storing an argument to a stack location for consumption by a
callee, or storing a return value to an on-stack return slot for
consumption by the caller, the ABI implementation was properly extending
the value but was then performing a store with only the original width.
This fixes the issue by always performing a 64-bit store of the extended
value.

Issue reported by @uweigand (thanks!).
2020-08-17 15:00:04 -07:00
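
A hedged illustration of the bug class fixed above (not the ABI code itself): if a narrow argument is extended to 64 bits but only its original width is written to the 8-byte slot, the slot's upper bytes stay stale for a reader of all 8 bytes. The fix corresponds to always writing the full extended value:

```
// Sketch only: store a sign-extended i32 argument into an 8-byte stack slot.
fn store_extended_arg(slot: &mut [u8; 8], value: i32) {
    let extended = value as i64; // properly extend to 64 bits...
    // ...and store the full 64-bit width, not just the low 4 bytes.
    slot.copy_from_slice(&extended.to_le_bytes());
}

fn main() {
    let mut slot = [0xaau8; 8]; // pretend the slot holds stale data
    store_extended_arg(&mut slot, -1);
    assert_eq!(slot, [0xff; 8]); // the reader now sees a correctly extended value
}
```
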
Chris Fallin
ac6539abd7 Merge pull request #2128 from cfallin/machinst-abi-refactor-2
Refactor AArch64 ABI support to extract common bits for shared impl with x64.
2020-08-14 17:08:07 -07:00
Chris Fallin
5cf3fba3da Refactor AArch64 ABI support to extract common bits for shared impl with x64.
We have observed that the ABI implementations for AArch64 and x64 are
very similar; in fact, x64's implementation started as a modified copy
of AArch64's implementation. This is an artifact of both a similar ABI
(both machines pass args and return values in registers first, then the
stack, and both machines give considerable freedom with stack-frame
layout) and a too-low-level ABI abstraction in the existing design. For
machines that fit the mainstream or most common ABI-design idioms, we
should be able to do much better.

This commit factors AArch64 into machine-specific and
machine-independent parts, but does not yet modify x64; that will come
next.

This should be completely neutral with respect to compile time and
generated code performance.
2020-08-14 16:27:39 -07:00
Chris Fallin
3b007dd6a2 Upgrade to regalloc 0.0.29.
This upgrade pulls in several recent changes in regalloc that should
improve compile-time performance in the AArch64 and new x64 backends.
2020-08-13 11:40:34 -07:00
Johnnie Birch
38ef98700f Adds packed integer subtraction 2020-08-12 09:41:20 -07:00
Nick Fitzgerald
5af47dc4cd cranelift: Only emit stack maps when a function actually uses reference types
This fix avoids a small slowdown in scenarios where reference types are enabled
but a given function doesn't actually use them.

Fixes #1883
2020-08-07 16:54:51 -07:00
Nick Fitzgerald
00de2a6ab6 Merge pull request #2110 from fitzgen/peepmatic-parse-nesting-depth
Peepmatic: Implement maximum nesting level in parser
2020-08-07 10:49:23 -07:00
Nick Fitzgerald
fdbc9e351f Merge pull request #2111 from fitzgen/rename-stackmap-to-stack-map
Rename "Stackmap" to "StackMap"
2020-08-07 10:46:38 -07:00
Nick Fitzgerald
174159a552 Bump wast to version 22.0.0 in peepmatic crates 2020-08-07 10:12:11 -07:00
Nick Fitzgerald
05bf9ea3f3 Rename "Stackmap" to "StackMap"
And "stackmap" to "stack_map".

This commit is purely mechanical.
2020-08-07 10:08:44 -07:00
Johnnie Birch
e60a6f2ad2 Fixup packed integer add lowering
Remove stray print statement
Fix bug in match statement causing unreachable code.
2020-08-06 22:25:18 -07:00
Johnnie Birch
f5909b37c3 Add emit tests for packed integer add instructions 2020-08-06 22:25:18 -07:00
Johnnie Birch
dd6ba5f9d7 Lower packed integer add instructions (v128)
Adds lowering support for packed integer add instructions and a helper
function for determining whether an instruction's type indicates it is
packed.
2020-08-06 22:25:18 -07:00
Johnnie Birch
2eadc6e2a8 Add packed integer add opcodes (v128) to instruction set enum 2020-08-06 22:25:18 -07:00
Anton Kirilov
1ec6930005 Enable the spec::simd::simd_lane test for AArch64
Copyright (c) 2020, Arm Limited.
2020-08-06 11:14:15 +01:00
Andrew Brown
4cb36afd7b machinst x64: refactor to use types::[type] everywhere
This change is a pure refactoring--no change to functionality. It removes `use crate::ir::types::*` imports and instead uses prefixed identifiers such as `types::I32` throughout the x64 code. Though it increases code verbosity, this change makes it clearer where the type identifiers come from (they are generated by `cranelift-codegen-meta`, so without a prefix it is difficult to find their origin), avoids IDE confusion (e.g. CLion flags the un-prefixed identifiers as errors), and avoids importing unwanted identifiers into the namespace.
2020-08-05 10:45:45 -07:00
Andrew Brown
8cfff26957 machinst x64: implement floating point comparisons
Note that this fixes an encoding issue in which the packed single and packed double prefixes were flipped.
2020-08-04 13:24:38 -07:00
Andrew Brown
c21fe0eb73 machinst x64: use assert_eq! when possible 2020-08-04 09:18:45 -07:00
Andrew Brown
999e04a2c4 machinst x64: refactor imports to use rustfmt convention
This change is a pure refactoring--no change to functionality. It removes newlines between the `use ...` statements in the x64 backend so that rustfmt can format them according to its convention. I noticed some files had followed a manual convention but subsequent additions did not seem to fit; this change fixes that and lightly coalesces some of the occurrences of `use a::b; use a::c;` into `use a::{b, c}`.
2020-08-04 09:17:54 -07:00
Julian Seward
25e31739a6 Implement Wasm Atomics for Cranelift/newBE/aarch64.
The implementation is pretty straightforward.  Wasm atomic instructions fall
into 5 groups

* atomic read-modify-write
* atomic compare-and-swap
* atomic loads
* atomic stores
* fences

and the implementation mirrors that structure, at both the CLIF and AArch64
levels.

At the CLIF level, there are five new instructions, one for each group.  Some
comments about these:

* for those that take addresses (all except fences), the address is contained
  entirely in a single `Value`; there is no offset field as there is with
  normal loads and stores.  Wasm atomics require alignment checks, and
  removing the offset makes implementation of those checks a bit simpler.

* atomic loads and stores get their own instructions, rather than reusing the
  existing load and store instructions, for two reasons:

  - per above comment, makes alignment checking simpler

  - reuse of existing loads and stores would require extension of `MemFlags`
    to indicate atomicity, which sounds semantically unclean.  For example,
    then *any* instruction carrying `MemFlags` could be marked as atomic, even
    in cases where it is meaningless or ambiguous.

* I tried to specify, in comments, the behaviour of these instructions as
  tightly as I could.  Unfortunately there is no way (per my limited CLIF
  knowledge) to enforce the constraint that they may only be used on I8, I16,
  I32 and I64 types, and in particular not on floating point or vector types.

The translation from Wasm to CLIF, in `code_translator.rs` is unremarkable.

At the AArch64 level, there are also five new instructions, one for each
group.  All of them except `::Fence` contain multiple real machine
instructions.  Atomic r-m-w and atomic c-a-s are emitted as the usual
load-linked store-conditional loops, guarded at both ends by memory fences.
Atomic loads and stores are emitted as a load preceded by a fence, and a store
followed by a fence, respectively.  The amount of fencing may be overkill, but
it reflects exactly what the SM Wasm baseline compiler for AArch64 does.

One reason to implement r-m-w and c-a-s as a single insn which is expanded
only at emission time is that we must be very careful what instructions we
allow in between the load-linked and store-conditional.  In particular, we
cannot allow *any* extra memory transactions in there, since -- particularly
on low-end hardware -- that might cause the transaction to fail, hence
deadlocking the generated code.  That implies that we can't present the LL/SC
loop to the register allocator as its constituent instructions, since it might
insert spills anywhere.  Hence we must present it as a single indivisible
unit, as we do here.  It also has the benefit of reducing the total amount of
work the RA has to do.

The only other notable feature of the r-m-w and c-a-s translations into
AArch64 code is that they both need a scratch register internally.  Rather
than faking one up by claiming, in `get_regs`, that it modifies an extra
scratch register, and having to have a dummy initialisation of it, these new
instructions (`::LLSC` and `::CAS`) simply use fixed registers in the range
x24-x28.  We rely on the RA's ability to coalesce V<-->R copies to make the
cost of the resulting extra copies zero or almost zero.  x24-x28 are chosen so
as to be call-clobbered, hence their use is less likely to interfere with long
live ranges that span calls.

One subtlety regarding the use of completely fixed input and output registers
is that we must be careful how the surrounding copy from/to of the arg/result
registers is done.  In particular, it is not safe to simply emit copies in
some arbitrary order if one of the arg registers is a real reg.  For that
reason, the arguments are first moved into virtual regs if they are not
already there, using a new method `<LowerCtx for Lower>::ensure_in_vreg`.
Again, we rely on coalescing to turn them into no-ops in the common case.

There is also a ridealong fix for the AArch64 lowering case for
`Opcode::Trapif | Opcode::Trapff`, which removes a bug in which two trap insns
in a row were generated.

In the patch as submitted there are 6 "FIXME JRS" comments, which mark things
which I believe to be correct, but for which I would appreciate a second
opinion.  Unless otherwise directed, I will remove them for the final commit
but leave the associated code/comments unchanged.
2020-08-04 09:35:50 +02:00
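
For the compare-and-swap group above, the intended semantics (independent of the LL/SC sequence actually emitted) can be sketched with Rust's portable atomics: the value found in memory is returned, and the store happens only if that value equals the expected one. A semantic sketch only, not the CLIF or AArch64 implementation:

```
// Sketch only: semantics of an atomic compare-and-swap that returns the
// value found in memory; the write happens iff that value equals `expected`.
use std::sync::atomic::{AtomicU32, Ordering};

fn atomic_cas(mem: &AtomicU32, expected: u32, replacement: u32) -> u32 {
    match mem.compare_exchange(expected, replacement, Ordering::SeqCst, Ordering::SeqCst) {
        Ok(found) | Err(found) => found,
    }
}

fn main() {
    let x = AtomicU32::new(1);
    assert_eq!(atomic_cas(&x, 1, 2), 1); // succeeded: memory is now 2
    assert_eq!(atomic_cas(&x, 1, 3), 2); // failed: memory still holds 2
    assert_eq!(x.load(Ordering::SeqCst), 2);
}
```
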
Alex Crichton
65eaca35dd Refactor where results of compilation are stored (#2086)
* Refactor where results of compilation are stored

This commit refactors the internals of compilation in Wasmtime to change
where results of individual function compilation are stored. Previously
compilation resulted in many maps being returned, and compilation
results generally held all these maps together. This commit instead
switches this to have all metadata stored in a `CompiledFunction`
instead of having a separate map for each item that can be stored.

The motivation for this is primarily to help out with future
module-linking-related PRs. What exactly "module level" is depends on
how we interpret modules and how many modules are in play, so it's a bit
easier for operations in wasmtime to work at the function level where
possible. This means that we don't have to pass around multiple
different maps and a function index, but instead just one map or just
one entry representing a compiled function.

Additionally this change updates where the parallelism of compilation
happens, pushing it into `wasmtime-jit` instead of `wasmtime-environ`.
This is another goal where `wasmtime-jit` will have more knowledge about
module-level pieces with module linking in play. User-facing-wise this
should be the same in terms of parallel compilation, though.

The ultimate goal of this refactoring is to make it easier for the
results of compilation to actually be a set of wasm modules. This means
we won't be able to have a map-per-metadata where the primary key is the
function index, because there will be many modules within one "object
file".

* Don't clear out fields, just don't store them

Persist a smaller set of fields in `CompilationArtifacts` instead of
trying to clear fields out and dynamically not accessing them.
2020-08-03 12:20:51 -05:00
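
Purely as an illustration of the data-layout change described above, the shift is from several parallel maps keyed by function index to a single map whose values bundle everything about one compiled function. The type and field names below are made up; the real types live in wasmtime-environ and wasmtime-jit and differ in detail:

```
// Sketch only: hypothetical before/after shapes for compilation results.
use std::collections::BTreeMap;

type FuncIndex = u32;

// Before: one map per kind of metadata, all keyed by function index.
struct CompilationBefore {
    bodies: BTreeMap<FuncIndex, Vec<u8>>,
    unwind_info: BTreeMap<FuncIndex, Vec<u8>>,
    traps: BTreeMap<FuncIndex, Vec<u8>>,
}

// After: one entry per function, carrying all of its metadata together.
struct CompiledFunction {
    body: Vec<u8>,
    unwind_info: Vec<u8>,
    traps: Vec<u8>,
}

struct CompilationAfter {
    functions: BTreeMap<FuncIndex, CompiledFunction>,
}

fn main() {
    let after = CompilationAfter { functions: BTreeMap::new() };
    assert!(after.functions.is_empty());
}
```
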
Benjamin Bouvier
e108f14620 machinst x64: use xor/xorpss/xorpd to generate zero constants; 2020-07-31 13:17:52 -07:00
Chris Fallin
9a9b5015d0 Merge pull request #2081 from cfallin/aarch64-baldrdash-fix
Aarch64: fix narrow integer-register extension with Baldrdash ABI.
2020-07-31 12:13:38 -07:00
Chris Fallin
dd09865611 Fix MachBuffer branch handling with redirect chains.
When one branch target label in a MachBuffer is redirected to another,
we eventually fix up branches targeting the first to refer to the
redirected target instead. Separately, we have a branch-folding
optimization that, when an unconditional branch occurs as the only
instruction in a block (right at a label) and the previous instruction
is also an unconditional branch (hence no fallthrough), we can elide
that block entirely and redirect the label. Finally, we prevented
infinite loops when resolving label aliases by chasing only one alias
deep.

Unfortunately, these three facts interacted poorly, and this is a result
of our correctness arguments assuming a fully-general "redirect" that
was not limited to one indirection level. In particular, we could have
some label A that redirected to B, then remove the block at B because it
is just a single branch to C, redirecting B to C. A would still redirect
to B, though, without chasing to C, and hence a branch to A would fall
through to the unrelated block that came after block B.

Thanks to @bnjbvr for finding this bug while debugging the x64 backend
and reducing a failure to the function in issue #2082. (This is a very
subtle bug and it seems to have been quite difficult to chase; my
apologies!)

The fix is to (i) chase redirects arbitrarily deep, but also (ii) ensure
that we do not form a cycle of redirects. The latter is done by very
carefully checking the existing fully-resolved target of the label we
are about to redirect *to*; if it resolves back to the branch that
is causing this redirect, then we avoid making the alias. The comments
in this patch make a slightly more detailed argument why this should be
correct.

Unfortunately we cannot directly test the CLIF that @bnjbvr reduced
because we don't have a way to assert anything about the machine-code
that comes after the branch folding and emission. However, the dedicated
unit tests in this patch replicate an equivalent folding case, and also
test that we handle branch cycles properly (as argued above).

Fixes #2082.
2020-07-31 19:52:30 +02:00
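
The fix described above amounts to two rules for label aliases: resolve a label by chasing its alias chain all the way down, and refuse to install a redirect that would close a cycle. A minimal stand-alone sketch of those two rules, with data structures much simpler than MachBuffer's real ones:

```
// Sketch only: `alias[l]` is the label that `l` redirects to (or `l` itself).
fn resolve(alias: &[usize], mut label: usize) -> usize {
    // Chase redirects arbitrarily deep; the no-cycle invariant guarantees
    // this terminates.
    while alias[label] != label {
        label = alias[label];
    }
    label
}

fn redirect(alias: &mut [usize], from: usize, to: usize) {
    // Only install the redirect if `to` does not already resolve back to
    // `from`; otherwise we would create a cycle of aliases.
    if resolve(alias, to) != from {
        alias[from] = to;
    }
}

fn main() {
    // Labels A=0, B=1, C=2, initially unaliased.
    let mut alias = vec![0, 1, 2];
    redirect(&mut alias, 0, 1); // A -> B
    redirect(&mut alias, 1, 2); // B -> C
    assert_eq!(resolve(&alias, 0), 2); // a branch to A now reaches C
}
```
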