wasmtime

Author	SHA1	Message	Date
Alex Crichton	63a3bbbf5a	Change VMMemoryDefinition::current_length to `usize` (#3134 ) * Change VMMemoryDefinition::current_length to `usize` This commit changes the definition of `VMMemoryDefinition::current_length` to `usize` from its previous definition of `u32`. This is a pretty impactful change because it also changes the cranelift semantics of "dynamic" heaps where the bound global value specifier must now match the pointer type for the platform rather than the index type for the heap. The motivation for this change is that the `current_length` field (or bound for the heap) is intended to reflect the current size of the heap. This is bound by `usize` on the host platform rather than `u32` or` u64`. The previous choice of `u32` couldn't represent a 4GB memory because we couldn't put a number representing 4GB into the `current_length` field. By using `usize`, which reflects the host's memory allocation, this should better reflect the size of the heap and allows Wasmtime to support a full 4GB heap for a wasm program (instead of 4GB minus one page). This commit also updates the legalization of the `heap_addr` clif instruction to appropriately cast the address to the platform's pointer type, handling bounds checks along the way. The practical impact for today's targets is that a `uextend` is happening sooner than it happened before, but otherwise there is no intended impact of this change. In the future when 64-bit memories are supported there will likely need to be fancier logic which handles offsets a bit differently (especially in the case of a 64-bit memory on a 32-bit host). The clif `filetest` changes should show the differences in codegen, and the Wasmtime changes are largely removing casts here and there. Closes #3022 * Add tests for memory.size at maximum memory size * Add a dfg helper method	2021-08-02 13:09:40 -05:00
Andrew Brown	766774e1f5	refactor: reorganize crate imports	2021-07-26 13:39:16 -07:00
Andrew Brown	6b86984c41	x64: avoid load-coalescing SIMD operations with non-aligned loads Fixes #2943, though not as optimally as may be desired. With x64 SIMD instructions, the memory operand must be aligned--this change adds that check. There are cases, however, where we can do better--see #3106.	2021-07-26 13:39:16 -07:00
Nick Fitzgerald	4283d2116d	cranelift: Move most debug-level logs to the trace level Cranelift crates have historically been much more verbose with debug-level logging than most other crates in the Rust ecosystem. We log things like how many parameters a basic block has, the color of virtual registers during regalloc, etc. Even for Cranelift hackers, these things are largely only useful when hacking specifically on Cranelift and looking at a particular test case, not even when using some Cranelift embedding (such as Wasmtime). Most of the time, when people want logging for their Rust programs, they do something like: RUST_LOG=debug cargo run This means that they get all that mostly not useful debug logging out of Cranelift. So they might want to disable logging for Cranelift, or change it to a higher log level: RUST_LOG=debug,cranelift=info cargo run The problem is that this is already more annoying to type that `RUST_LOG=debug`, and that Cranelift isn't one single crate, so you actually have to play whack-a-mole with naming all the Cranelift crates off the top of your head, something more like this: RUST_LOG=debug,cranelift=info,cranelift_codegen=info,cranelift_wasm=info,... Therefore, we're changing most of the `debug!` logs into `trace!` logs: anything that is very Cranelift-internal, unlikely to be useful/meaningful to the "average" Cranelift embedder, or prints a message for each instruction visited during a pass. On the other hand, things that just report a one line statistic for a whole pass, for example, are left as `debug!`. The more verbose the log messages are, the higher the bar they must clear to be `debug!` rather than `trace!`.	2021-07-26 11:50:16 -07:00
StackDoubleFlow	9637bc5a09	Fix cranelift `Module` and `ObjectModule` docs links (#2852 )	2021-04-21 06:29:02 -07:00
Chris Fallin	48d542d67c	Fix bad jumptable block ref when DCE removes a block. When a block is unreachable, the `unreachable_code` pass will remove it, which is perfectly sensible. Jump tables factor into unreachability in an expected way: even if a block is listed in a jump table, the block might be unreachable if the jump table itself is unused (or used in an unreachable block). Unfortunately, the verifier still expects all block refs in all jump tables to be valid, even after DCE, which will not always be the case. This makes a simple change to the pass: after removing blocks, it scans jump tables. Any jump table that refers to an unreachable block must itself be unused, and so we just clear its entries. We do not bother removing it (and renumbering all later jumptables), and we do not bother computing full unused-ness of all jumptables, as that would be more expensive; it's sufficient to clear out the ones that refer to unreachable blocks, which are a subset of all unused jumptables. Fixes #2670.	2021-02-23 15:01:01 -08:00
Chris Fallin	c07ec4c525	Merge pull request #2653 from bjorn3/more_atomic_ops More atomic ops	2021-02-18 08:34:58 -08:00
bjorn3	ff22842da5	More atomic ops	2021-02-18 14:16:15 +01:00
bjorn3	720da20588	Describe serialization format	2021-02-18 11:27:51 +01:00
bjorn3	a0c2276ee7	Add a version marker This prevents deserializing a function with a different Cranelift version	2021-02-18 11:27:51 +01:00
bjorn3	2fc964ea35	Add serde serialization support for the full clif ir	2021-02-18 11:27:02 +01:00
Chris Fallin	c84d6be6f4	Detailed debug-info (DWARF) support in new backends (initially x64). This PR propagates "value labels" all the way from CLIF to DWARF metadata on the emitted machine code. The key idea is as follows: - Translate value-label metadata on the input into "value_label" pseudo-instructions when lowering into VCode. These pseudo-instructions take a register as input, denote a value label, and semantically are like a "move into value label" -- i.e., they update the current value (as seen by debugging tools) of the given local. These pseudo-instructions emit no machine code. - Perform a dataflow analysis at the machine-code level, tracking value-labels that propagate into registers and into [SP+constant] stack storage. This is a forward dataflow fixpoint analysis where each storage location can contain a set of value labels, and each value label can reside in a set of storage locations. (Meet function is pairwise intersection by storage location.) This analysis traces value labels symbolically through loads and stores and reg-to-reg moves, so it will naturally handle spills and reloads without knowing anything special about them. - When this analysis converges, we have, at each machine-code offset, a mapping from value labels to some number of storage locations; for each offset for each label, we choose the best location (prefer registers). Note that we can choose any location, as the symbolic dataflow analysis is sound and guarantees that the value at the value_label instruction propagates to all of the named locations. - Then we can convert this mapping into a format that the DWARF generation code (wasmtime's debug crate) can use. This PR also adds the new-backend variant to the gdb tests on CI.	2021-01-21 15:59:49 -08:00
Chris Fallin	743529b4eb	Merge pull request #2492 from uweigand/endian-memory-v5 Support explicit endianness in Cranelift IR MemFlags	2020-12-14 13:59:08 -08:00
Ulrich Weigand	467a1af83a	Support explicit endianness in Cranelift IR MemFlags WebAssembly memory operations are by definition little-endian even on big-endian target platforms. However, other memory accesses will require native target endianness (e.g. to access parts of the VMContext that is also accessed by VM native code). This means on big-endian targets, the code generator will have to handle both little- and big-endian memory accesses. However, there is currently no way to encode that distinction into the Cranelift IR that describes memory accesses. This patch provides such a way by adding an (optional) explicit endianness marker to an instance of MemFlags. Since each Cranelift IR instruction that describes memory accesses already has an instance of MemFlags attached, this can now be used to provide endianness information. Note that by default, memory accesses will continue to use the native target ISA endianness. To override this to specify an explicit endianness, a MemFlags value that was built using the set_endianness routine must be used. This patch does so for accesses that implement WebAssembly memory operations. This patch addresses issue #2124.	2020-12-14 20:15:37 +01:00
Y-Nak	855a6374dd	Fix missing modification of jump table in licm	2020-12-09 11:13:33 +09:00
Pat Hickey	0f1dc9a735	Merge pull request #2403 from bjorn3/simplejit_hot_swapping SimpleJIT hot code swapping	2020-12-03 13:36:32 -08:00
bjorn3	eaa2c5b3c2	Implement GOT relocations in SimpleJIT	2020-11-12 15:06:52 +01:00
Chris Fallin	a2bbb198de	Do value-extensions at ABI boundaries only when ABI requires it. There has been some confusion over the meaning of the "sign-extend" (`sext`) and "zero-extend" (`uext`) attributes on parameters and return values in signatures. According to the three implemented backends, these attributes indicate that a value narrower than a full register should always be extended in the way specified. However, they are much more useful if they mean "extend in this way if the ABI requires extending": only the ABI backend knows whether or not a particular ABI (e.g., x64 SysV vs. x64 Baldrdash) requires extensions, while only the frontend (CLIF generator) knows whether or not a value is signed, so the two have to work in concert. This is the result of some very helpful discussion in #2354 (thanks to @uweigand for raising the issue and @bjorn3 for helping to reason about it). This change respects the extension attributes in the above way, rather than unconditionally extending, to avoid potential performance degradation as we introduce more extension attributes on signatures.	2020-11-05 11:54:35 -08:00
Andrew Brown	6d50099816	Rewrite interpreter generically (#2323 ) * Rewrite interpreter generically This change re-implements the Cranelift interpreter to use generic values; this makes it possible to do abstract interpretation of Cranelift instructions. In doing so, the interpretation state is extracted from the `Interpreter` structure and is accessed via a `State` trait; this makes it possible to not only more clearly observe the interpreter's state but also to interpret using a dummy state (e.g. `ImmutableRegisterState`). This addition made it possible to implement more of the Cranelift instructions (~70%, ignoring the x86-specific instructions). * Replace macros with closures	2020-11-02 12:28:07 -08:00
Leonardo Yvens	bde9555793	Add Trap::trap_code (#2309 ) * add Trap::trap_code * Add non-exhaustive wasmtime::TrapCode * wasmtime: Better document TrapCode * move and refactor test	2020-10-27 16:30:45 -05:00
Yury Delendik	3c68845813	Cranelift: refactoring of unwind info (#2289 ) * factor common code * move fde/unwind emit to more abstract level * code_len -> function_size * speedup block scanning * better function_size calciulation * Rename UnwindCode enums	2020-10-15 08:34:50 -05:00
Andrew Brown	3778fa025c	Switch DataValue to use Ieee32/Ieee64 As discussed in #2251, in order to be very confident that NaN signaling bits are correctly handled by the compiler, this switches `DataValue` to use Cranelift's `Ieee32` and `Ieee64` structures. This makes it a bit more inconvenient to interpreter Cranelift FP operations but this should change to something like `rustc_apfloat` in the future.	2020-10-07 12:17:17 -07:00
Andrew Brown	ce44719e1f	refactor: change LowerCtx::get_immediate to return a DataValue This change abstracts away (from the perspective of the new backend) how immediate values are stored in InstructionData. It gathers large immediates from necessary places (e.g. constant pool) and delegates to `InstructionData::imm_value` for the rest. This refactor only touches original users of `LowerCtx::get_immediate` but a future change could do the same for any place the new backend is accessing InstructionData directly to retrieve immediates.	2020-10-07 12:17:17 -07:00
Andrew Brown	3a2025fdc7	Add InstructionData::imm_value()	2020-10-07 12:17:17 -07:00
Chris Fallin	835db11bea	Support for SpiderMonkey's "Wasm ABI 2020". As part of a Wasm JIT update, SpiderMonkey is changing its internal WebAssembly function ABI. The new ABI's frame format includes "caller TLS" and "callee TLS" slots. The details of where these come from are not important; from Cranelift's point of view, the only relevant requirement is that we have two on-stack args that are always present (offsetting other on-stack args), and that we define special argument purposes so that we can supply values for these slots. Note that this adds a new ABI (a variant of the Baldrdash ABI) because we do not want to tightly couple the landing of this PR to the landing of the changes in SpiderMonkey; it's better if both the old and new behavior remain available in Cranelift, so SpiderMonkey can continue to vendor Cranelift even if it does not land (or backs out) the ABI change. Furthermore, note that this needs to be a Cranelift-level change (i.e. cannot be done purely from the translator environment implementation) because the special TLS arguments must always go on the stack, which would not otherwise happen with the usual argument-placement logic; and there is no primitive to push a value directly in CLIF code (the notion of a stack frame is a lower-level concept).	2020-09-30 14:55:56 -07:00
Benjamin Bouvier	7a833f442a	machinst: common up some instruction data helpers;	2020-09-09 18:03:59 +02:00
Julian Seward	25e31739a6	Implement Wasm Atomics for Cranelift/newBE/aarch64. The implementation is pretty straightforward. Wasm atomic instructions fall into 5 groups * atomic read-modify-write * atomic compare-and-swap * atomic loads * atomic stores * fences and the implementation mirrors that structure, at both the CLIF and AArch64 levels. At the CLIF level, there are five new instructions, one for each group. Some comments about these: * for those that take addresses (all except fences), the address is contained entirely in a single `Value`; there is no offset field as there is with normal loads and stores. Wasm atomics require alignment checks, and removing the offset makes implementation of those checks a bit simpler. * atomic loads and stores get their own instructions, rather than reusing the existing load and store instructions, for two reasons: - per above comment, makes alignment checking simpler - reuse of existing loads and stores would require extension of `MemFlags` to indicate atomicity, which sounds semantically unclean. For example, then any instruction carrying `MemFlags` could be marked as atomic, even in cases where it is meaningless or ambiguous. * I tried to specify, in comments, the behaviour of these instructions as tightly as I could. Unfortunately there is no way (per my limited CLIF knowledge) to enforce the constraint that they may only be used on I8, I16, I32 and I64 types, and in particular not on floating point or vector types. The translation from Wasm to CLIF, in `code_translator.rs` is unremarkable. At the AArch64 level, there are also five new instructions, one for each group. All of them except `::Fence` contain multiple real machine instructions. Atomic r-m-w and atomic c-a-s are emitted as the usual load-linked store-conditional loops, guarded at both ends by memory fences. Atomic loads and stores are emitted as a load preceded by a fence, and a store followed by a fence, respectively. The amount of fencing may be overkill, but it reflects exactly what the SM Wasm baseline compiler for AArch64 does. One reason to implement r-m-w and c-a-s as a single insn which is expanded only at emission time is that we must be very careful what instructions we allow in between the load-linked and store-conditional. In particular, we cannot allow any extra memory transactions in there, since -- particularly on low-end hardware -- that might cause the transaction to fail, hence deadlocking the generated code. That implies that we can't present the LL/SC loop to the register allocator as its constituent instructions, since it might insert spills anywhere. Hence we must present it as a single indivisible unit, as we do here. It also has the benefit of reducing the total amount of work the RA has to do. The only other notable feature of the r-m-w and c-a-s translations into AArch64 code, is that they both need a scratch register internally. Rather than faking one up by claiming, in `get_regs` that it modifies an extra scratch register, and having to have a dummy initialisation of it, these new instructions (`::LLSC` and `::CAS`) simply use fixed registers in the range x24-x28. We rely on the RA's ability to coalesce V<-->R copies to make the cost of the resulting extra copies zero or almost zero. x24-x28 are chosen so as to be call-clobbered, hence their use is less likely to interfere with long live ranges that span calls. One subtlety regarding the use of completely fixed input and output registers is that we must be careful how the surrounding copy from/to of the arg/result registers is done. In particular, it is not safe to simply emit copies in some arbitrary order if one of the arg registers is a real reg. For that reason, the arguments are first moved into virtual regs if they are not already there, using a new method `<LowerCtx for Lower>::ensure_in_vreg`. Again, we rely on coalescing to turn them into no-ops in the common case. There is also a ridealong fix for the AArch64 lowering case for `Opcode::Trapif \| Opcode::Trapff`, which removes a bug in which two trap insns in a row were generated. In the patch as submitted there are 6 "FIXME JRS" comments, which mark things which I believe to be correct, but for which I would appreciate a second opinion. Unless otherwise directed, I will remove them for the final commit but leave the associated code/comments unchanged.	2020-08-04 09:35:50 +02:00
Alex Crichton	65eaca35dd	Refactor where results of compilation are stored (#2086 ) * Refactor where results of compilation are stored This commit refactors the internals of compilation in Wasmtime to change where results of individual function compilation are stored. Previously compilation resulted in many maps being returned, and compilation results generally held all these maps together. This commit instead switches this to have all metadata stored in a `CompiledFunction` instead of having a separate map for each item that can be stored. The motivation for this is primarily to help out with future module-linking-related PRs. What exactly "module level" is depends on how we interpret modules and how many modules are in play, so it's a bit easier for operations in wasmtime to work at the function level where possible. This means that we don't have to pass around multiple different maps and a function index, but instead just one map or just one entry representing a compiled function. Additionally this change updates where the parallelism of compilation happens, pushing it into `wasmtime-jit` instead of `wasmtime-environ`. This is another goal where `wasmtime-jit` will have more knowledge about module-level pieces with module linking in play. User-facing-wise this should be the same in terms of parallel compilation, though. The ultimate goal of this refactoring is to make it easier for the results of compilation to actually be a set of wasm modules. This means we won't be able to have a map-per-metadata where the primary key is the function index, because there will be many modules within one "object file". * Don't clear out fields, just don't store them Persist a smaller set of fields in `CompilationArtifacts` instead of trying to clear fields out and dynamically not accessing them.	2020-08-03 12:20:51 -05:00
Nick Fitzgerald	ee5982fd16	peepmatic: Be generic over the operator type This lets us avoid the cost of `cranelift_codegen::ir::Opcode` to `peepmatic_runtime::Operator` conversion overhead, and paves the way for allowing Peepmatic to support non-clif optimizations (e.g. vcode optimizations). Rather than defining our own `peepmatic::Operator` type like we used to, now the whole `peepmatic` crate is effectively generic over a `TOperator` type parameter. For the Cranelift integration, we use `cranelift_codegen::ir::Opcode` as the concrete type for our `TOperator` type parameter. For testing, we also define a `TestOperator` type, so that we can test Peepmatic code without building all of Cranelift, and we can keep them somewhat isolated from each other. The methods that `peepmatic::Operator` had are now translated into trait bounds on the `TOperator` type. These traits need to be shared between all of `peepmatic`, `peepmatic-runtime`, and `cranelift-codegen`'s Peepmatic integration. Therefore, these new traits live in a new crate: `peepmatic-traits`. This crate acts as a header file of sorts for shared trait/type/macro definitions. Additionally, the `peepmatic-runtime` crate no longer depends on the `peepmatic-macro` procedural macro crate, which should lead to faster build times for Cranelift when it is using pre-built peephole optimizers.	2020-07-17 16:16:49 -07:00
bjorn3	7b7b1f4997	Rename sarg__ to sarg_t	2020-07-17 12:03:17 +02:00
bjorn3	4971d9ee80	Merge {make_incoming,get_outgoing}_{,struct_}arg	2020-07-17 12:03:17 +02:00
bjorn3	0d4fa6d32a	Fix review comments	2020-07-17 12:03:17 +02:00
bjorn3	4431ac1108	Implement SystemV struct argument passing	2020-07-17 12:03:17 +02:00
Andrew Brown	0e5e8a62c8	Add `DerivedFunction` for doubling lane widths and halving the number of lanes (i.e. merging) Certain operations (e.g. widening) will have operands with types like `NxM` but will return results with types like `(N*2)x(M/2)` (double the lane width, halve the number of lanes; maintain the same number of vector bits). This is equivalent to applying two `DerivedFunction`s to the type: `DerivedFunction::DoubleWidth` then `DerivedFunction::HalfVector`. Since there is no easy way to apply multiple `DerivedFunction`s (e.g. most of the logic is one-level deep, `1d5a678124/cranelift/codegen/meta/src/gen_inst.rs (L618-L621)`), I added `DerivedFunction::MergeLanes` to do the necessary type conversion.	2020-07-15 11:32:08 -07:00
Alex Crichton	85ffc8f595	Switch CI back to nightly channel (#2014 ) * Switch CI back to nightly channel I think all upstream issues are now fixed so we should be good to switch back to nightly from our previously pinned version. * Fix doc warnings	2020-07-13 18:40:47 -05:00
Yury Delendik	b2551bb4d0	Make wasmtime_environ::Module serializable (#2005 ) * Define WasmType/WasmFuncType in the Cranelift * Make `Module` serializable	2020-07-10 15:56:43 -05:00
Nick Fitzgerald	8c5f59c0cf	wasmtime: Implement `table.get` and `table.set` These instructions have fast, inline JIT paths for the common cases, and only call out to host VM functions for the slow paths. This required some changes to `cranelift-wasm`'s `FuncEnvironment`: instead of taking a `FuncCursor` to insert an instruction sequence within the current basic block, `FuncEnvironment::translate_table_{get,set}` now take a `&mut FunctionBuilder` so that they can create whole new basic blocks. This is necessary for implementing GC read/write barriers that involve branching (e.g. checking for null, or whether a store buffer is at capacity). Furthermore, it required that the `load`, `load_complex`, and `store` instructions handle loading and storing through an `r{32,64}` rather than just `i{32,64}` addresses. This involved making `r{32,64}` types acceptable instantiations of the `iAddr` type variable, plus a few new instruction encodings. Part of #929	2020-06-30 12:00:57 -07:00
Alex Crichton	0acd2072c2	Fix doc warnings and link failures (#1948 ) Also add configuration to CI to fail doc generation if any links are broken. Unfortunately we can't blanket deny all warnings in rustdoc since some are unconditional warnings, but for now this is hopefully good enough. Closes #1947	2020-06-30 13:01:49 -05:00
Benjamin Bouvier	238ae3bf21	cranelift: tweak condition in safepoint detection to check for resumable traps;	2020-06-15 12:04:28 +02:00
Chris Fallin	cdbe76a1d4	Remove uses of `matches!()` macro, incompatible with Firefox build. When we vendor Cranelift into Firefox, we need to be able to build with the Firefox CI setup (unless we carry patches on top of upstream). Unfortunately, the Firefox CI currently appears to build with a slightly older version of Rust: I can't work out which version exactly, but one without stable support for `matches!()`. A recent attempt to version-bump Cranelift failed with build errors at the two locations in this patch: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=305994046&repo=autoland&lineNumber=24829 I also see a bunch of uses of `matches!()` in Peepmatic, but those crates are not built by Firefox, so we can leave them be for now, I think.	2020-06-11 15:11:10 -07:00
whitequark	bc555468a7	cranelift: add i64.{ishl,ushr,ashr} libcalls. These libcalls are useful for 32-bit platforms. On x86_32 in particular, commit `4ec16fa0` added support for legalizing 64-bit shifts through SIMD operations. However, that legalization requires SIMD to be enabled and SSE 4.1 to be supported, which is not acceptable as a hard requirement.	2020-06-05 12:13:49 -07:00
Nick Fitzgerald	7c68a10ed6	Merge pull request #1670 from teapotd/win64-pass-by-ref Implement passing arguments by ref for win64 ABI	2020-06-01 11:13:30 -07:00
Andrew Brown	0dd77d36f8	Rename BinaryImm format to BinaryImm64	2020-05-29 19:56:27 -07:00
teapotd	759cc3e751	Implement passing arguments by ref for win64 ABI	2020-05-29 20:12:41 +02:00
whitequark	b2e8ed4dc9	cranelift: add i64.[us]{div,rem} libcalls. These libcalls are useful for 32-bit platforms.	2020-05-22 11:41:56 +00:00
Nick Fitzgerald	fb7a690efc	Merge pull request #1687 from fitzgen/sign-extend-immediates cranelift: Sign extend `Imm64` immediates	2020-05-14 10:09:53 -07:00
Nick Fitzgerald	090d1c2d32	cranelift: Port most of `simple_preopt.rs` over to the `peepmatic` DSL This ports all of the identity, no-op, simplification, and canonicalization related optimizations over from being hand-coded to the `peepmatic` DSL. This does not handle the branch-to-branch optimizations or most of the divide-by-constant optimizations.	2020-05-14 07:52:23 -07:00
Dan Gohman	fb0b9e3ae6	Change `proc_exit` to unwind the stack rather than exiting the host process. (#1646 ) * Remove Cranelift's OutOfBounds trap, which is no longer used. * Change proc_exit to unwind instead of exit the host process. This implements the semantics in https://github.com/WebAssembly/WASI/pull/235. Fixes #783. Fixes #993. * Fix exit-status tests on Windows. * Revert the wiggle changes and re-introduce the wasi-common implementations. * Move `wasi_proc_exit` into the wasmtime-wasi crate. * Revert the spec_testsuite change. * Remove the old proc_exit implementations. * Make `TrapReason` an implementation detail. * Allow exit status 2 on Windows too. * Fix a documentation link. * Really fix a documentation link.	2020-05-13 15:59:43 -07:00
Nick Fitzgerald	9b867b09c7	cranelift: Sign extend `Imm64` immediates When an instruction has an `Imm64` immediate, but operates on values of a narrower width, we need to sign extend the value. Fixes #1095	2020-05-12 15:44:48 -07:00
Chris Fallin	e39b4aba1c	Fix long-range (non-colocated) aarch64 calls to not use Arm64Call reloc, and fix simplejit to use it. Previously, every call was lowered on AArch64 to a `call` instruction, which takes a signed 26-bit PC-relative offset. Including the 2-bit left shift, this gives a range of +/- 128 MB. Longer-distance offsets would cause an impossible relocation record to be emitted (or rather, a record that a more sophisticated linker would fix up by inserting a shim/veneer). This commit adds a notion of "relocation distance" in the MachInst backends, and provides this information for every call target and symbol reference. The intent is that backends on architectures like AArch64, where there are different offset sizes / addressing strategies to choose from, can either emit a regular call or a load-64-bit-constant / call-indirect sequence, as necessary. This avoids the need to implement complex linking behavior. The MachInst driver code provides this information based on the "colocated" bit in the CLIF symbol references, which appears to have been designed for this purpose, or at least a similar one. Combined with the `use_colocated_libcalls` setting, this allows client code to ensure that library calls can link to library code at any location in the address space. Separately, the `simplejit` example did not handle `Arm64Call`; rather than doing so, it appears all that is necessary to get its tests to pass is to set the `use_colocated_libcalls` flag to false, to make use of the above change. This fixes the `libcall_function` unit-test in this crate.	2020-05-05 09:55:12 -07:00

1 2 3

137 Commits