* Revert "egraphs: disable GVN of effectful idempotent ops (temporarily). (#5808)"
This reverts commit c7e2571866.
* egraphs: fix handling of effectful-but-idempotent ops and GVN.
This PR addresses #5796: currently, ops that are effectful, i.e., remain
in the side-effecting skeleton (which we keep in the `Layout` while the
egraph exists), but are idempotent and thus mergeable by a GVN pass, are
not handled properly.
GVN is still possible on effectful but idempotent ops precisely because
our GVN does not create partial redundancies: it removes an instruction
only when it is dominated by an identical instruction. An instruction
will not be "hoisted" to a point where it could execute in the optimized
code but not in the original.
However, there are really two parts to the egraph implementation that
produce this effect: the deduplication on insertion into the egraph, and
the elaboration with a scoped hashmap. The deduplication lets us give a
single name (value ID) to all copies of an identical instruction, and
then elaboration will re-create duplicates if GVN should not hoist or
merge some of them.
Because deduplication need not worry about dominance or scopes, we use a
simple (non-scoped) hashmap to dedup/intern ops as "egraph nodes".
When we added support for GVN'ing effectful but idempotent ops (#5594),
we kept the use of this simple dedup'ing hashmap, but these ops do not
get elaborated; instead they stay in the side-effecting skeleton. Thus,
we inadvertently created potential for weird code-motion effects.
The proposal in #5796 would solve this in a clean way by treating these
ops as pure again, and keeping them out of the skeleton, instead putting
"force" pseudo-ops in the skeleton. However, this is a little more
complex than I would like, and I've realized that @jameysharp's earlier
suggestion is much simpler: we can keep an actual scoped hashmap
separately just for the effectful-but-idempotent ops, and use it to GVN
while we build the egraph. In effect, we're fusing a separate GVN pass
with the egraph pass (but letting it interact corecursively with
egraph rewrites). This is in principle similar to how we keep a separate
map for loads and fuse this pass with the egraph rewrite pass as well.
Note that we can use a `ScopedHashMap` here without the "context" (as
needed by `CtxHashMap`) because, as noted by @jameysharp, in practice
the ops we want to GVN have all their args inline. Equality on the
`InstructionData` itself is conservative: two insts whose struct
contents compare shallowly equal are definitely identical, but identical
insts in a deep-equality sense may not compare shallowly equal, due to
list indirection. This is fine for GVN, because it is still sound to
skip any given GVN opportunity (and keep the original instructions).
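As an illustration of the scoped-map idea, here is a minimal, self-contained Rust sketch; `InstKey` and `ScopedMap` are simplified stand-ins for `InstructionData` and the real `ScopedHashMap`, not the actual cranelift-codegen types. One scope is pushed per dominator-tree level, so any hit necessarily comes from a dominating block and merging is sound:
```rust
use std::collections::HashMap;

// Hypothetical shallow key standing in for `InstructionData`: an opcode
// plus inline args, compared by plain `Eq`/`Hash`.
#[derive(Clone, PartialEq, Eq, Hash)]
struct InstKey {
    opcode: &'static str,
    args: [u32; 2],
}

// A toy scoped map: one scope per dominator-tree level. Entries inserted
// in a scope disappear when that scope is popped, so a hit always comes
// from a dominating block -- exactly the property the plain (non-scoped)
// dedup map lacks for skeleton instructions.
struct ScopedMap<V> {
    scopes: Vec<HashMap<InstKey, V>>,
}

impl<V: Copy> ScopedMap<V> {
    fn new() -> Self {
        Self { scopes: vec![] }
    }
    fn push_scope(&mut self) {
        self.scopes.push(HashMap::new());
    }
    fn pop_scope(&mut self) {
        self.scopes.pop();
    }
    fn get(&self, k: &InstKey) -> Option<V> {
        self.scopes.iter().rev().find_map(|s| s.get(k).copied())
    }
    fn insert(&mut self, k: InstKey, v: V) {
        self.scopes.last_mut().unwrap().insert(k, v);
    }
}

fn main() {
    let mut gvn_map: ScopedMap<u32> = ScopedMap::new();
    gvn_map.push_scope(); // enter the dominating block
    let div = InstKey { opcode: "udiv", args: [1, 2] };
    gvn_map.insert(div.clone(), 10); // first `udiv v1, v2` defines value 10
    gvn_map.push_scope(); // enter a dominated block
    // An identical idempotent op here is merged with the dominating copy:
    assert_eq!(gvn_map.get(&div), Some(10));
    gvn_map.pop_scope(); // leave the dominated block...
    // ...and the dominating block's entry is still visible for its other
    // children, which it also dominates:
    assert_eq!(gvn_map.get(&div), Some(10));
}
```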
Fixes #5796.
* Add comments from review.
* egraph-based midend: draw the rest of the owl.
* Rename `egg` submodule of cranelift-codegen to `egraph`.
* Apply some feedback from @jsharp during code walkthrough.
* Remove recursion from find_best_node by doing a single pass.
Rather than recursively computing the lowest-cost node for a given
eclass and memoizing the answer at each eclass node, we can do a single
forward pass; because every eclass node refers only to earlier nodes,
this is sufficient. The behavior may slightly differ from the earlier
behavior because we cannot short-circuit costs to zero once a node is
elaborated; but in practice this should not matter.
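For illustration, a toy version of the single forward pass; the node and cost representation here is invented for the example and loosely mirrors the idea, not the real `find_best_node` data structures:
```rust
// Invented node/cost representation: every node refers only to earlier
// indices, so a single forward pass sees final `best` values for all of
// its arguments and no recursion or memoization is needed.
enum Node {
    Op { cost: u32, args: Vec<usize> },
    Union(usize, usize), // an eclass formed from two earlier nodes
}

fn best_costs(nodes: &[Node]) -> Vec<u32> {
    let mut best = Vec::with_capacity(nodes.len());
    for node in nodes {
        let cost = match node {
            Node::Op { cost, args } => {
                cost + args.iter().map(|&a| best[a]).sum::<u32>()
            }
            // For a union, pick whichever side is cheaper.
            Node::Union(a, b) => best[*a].min(best[*b]),
        };
        best.push(cost);
    }
    best
}

fn main() {
    // 0: cheap leaf; 1: expensive leaf; 2: union of the two;
    // 3: an op using node 0 and the union, so it sees the cheaper cost.
    let nodes = vec![
        Node::Op { cost: 1, args: vec![] },
        Node::Op { cost: 5, args: vec![] },
        Node::Union(0, 1),
        Node::Op { cost: 2, args: vec![0, 2] },
    ];
    assert_eq!(best_costs(&nodes), vec![1, 5, 1, 4]);
}
```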
* Make elaboration non-recursive.
Use an explicit stack instead (with `ElabStackEntry` entries,
alongside a result stack).
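A generic sketch of the pattern (the `Expr`/`Work` types are illustrative only, not the actual elaborator state):
```rust
// Toy expression tree standing in for the recursive structure that
// elaboration walks.
enum Expr {
    Leaf(i64),
    Sum(Vec<Expr>),
}

// Work items in the spirit of `ElabStackEntry`: either start a node
// (push its children) or combine child results already computed.
enum Work<'a> {
    Start(&'a Expr),
    Combine(usize),
}

// The recursion is replaced by an explicit work stack plus a result
// stack, so deeply nested inputs cannot overflow the call stack.
fn eval(root: &Expr) -> i64 {
    let mut work = vec![Work::Start(root)];
    let mut results: Vec<i64> = Vec::new();
    while let Some(item) = work.pop() {
        match item {
            Work::Start(Expr::Leaf(v)) => results.push(*v),
            Work::Start(Expr::Sum(children)) => {
                work.push(Work::Combine(children.len()));
                for child in children {
                    work.push(Work::Start(child));
                }
            }
            Work::Combine(n) => {
                let at = results.len() - n;
                let sum = results.drain(at..).sum();
                results.push(sum);
            }
        }
    }
    results.pop().unwrap()
}

fn main() {
    let e = Expr::Sum(vec![
        Expr::Leaf(2),
        Expr::Sum(vec![Expr::Leaf(3), Expr::Leaf(4)]),
    ]);
    assert_eq!(eval(&e), 9);
}
```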
* Make elaboration traversal of the domtree non-recursive/stack-safe.
* Work analysis logic in Cranelift-side egraph glue into a general analysis framework in cranelift-egraph.
* Apply static recursion limit to rule application.
* Fix aarch64 wrt dynamic-vector support -- broken rebase.
* Topo-sort cranelift-egraph before cranelift-codegen in publish script, like the comment instructs me to!
* Fix multi-result call testcase.
* Include `cranelift-egraph` in `PUBLISHED_CRATES`.
* Fix atomic_rmw: not really a load.
* Remove now-unnecessary PartialOrd/Ord derivations.
* Address some code-review comments.
* Review feedback.
* Review feedback.
* No overlap in mid-end rules, because we are defining a multi-constructor.
* rustfmt
* Review feedback.
* Review feedback.
* Review feedback.
* Review feedback.
* Remove redundant `mut`.
* Add comment noting what rules can do.
* Review feedback.
* Clarify comment wording.
* Update `has_memory_fence_semantics`.
* Apply @jameysharp's improved loop-level computation.
Co-authored-by: Jamey Sharp <jamey@minilop.net>
* Fix suggestion commit.
* Fix off-by-one in new loop-nest analysis.
* Review feedback.
* Review feedback.
* Review feedback.
* Use `Default`, not `std::default::Default`, as per @fitzgen
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
* Apply @fitzgen's comment elaboration to a doc-comment.
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
* Add stat for hitting the rewrite-depth limit.
* Some code motion in split prelude to make the diff a little clearer wrt `main`.
* Take @jameysharp's suggested `try_into()` usage for blockparam indices.
Co-authored-by: Jamey Sharp <jamey@minilop.net>
* Take @jameysharp's suggestion to avoid double-match on load op.
Co-authored-by: Jamey Sharp <jamey@minilop.net>
* Fix suggestion (add import).
* Review feedback.
* Fix stack_load handling.
* Remove redundant can_store case.
* Take @jameysharp's suggested improvement to FuncEGraph::build() logic
Co-authored-by: Jamey Sharp <jamey@minilop.net>
* Tweaks to FuncEGraph::build() on top of suggestion.
* Take @jameysharp's suggested clarified condition
Co-authored-by: Jamey Sharp <jamey@minilop.net>
* Clean up after suggestion (unused variable).
* Fix loop analysis.
* loop level asserts
* Revert constant-space loop analysis -- edge cases were incorrect, so let's go with the simple thing for now.
* Take @jameysharp's suggestion re: result_tys
Co-authored-by: Jamey Sharp <jamey@minilop.net>
* Fix up after suggestion
* Take @jameysharp's suggestion to use fold rather than reduce
Co-authored-by: Jamey Sharp <jamey@minilop.net>
* Fixup after suggestion
* Take @jameysharp's suggestion to remove elaborate_eclass_use's return value.
* Clarifying comment in terminator insts.
Co-authored-by: Jamey Sharp <jamey@minilop.net>
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
* Elide redundant sentinel values
The `undef_variables` lists were bindings from Variable to Value, but
the Values were always equal to a suffix of the block's parameters. So
instead of storing another copy, we can just get them back from the
block parameters.
According to DHAT, this decreases total memory allocated and number of
bytes written, and increases number of bytes read and instructions
retired, but all by small fractions of a percent. According to
hyperfine, main is "1.00 ± 0.01 times faster".
* Use entity_impl for cranelift_frontend::Variable
Instead of hand-coding essentially the same thing.
* Keep undefined variables in a ListPool
According to DHAT, this improves every measure of performance
(instructions retired, total memory allocated, max heap size, bytes
read, and bytes written), although by fractions of a percent. According
to hyperfine the difference is nearly zero, but on Spidermonkey this
branch is "1.01 ± 0.00 times faster" than main.
* Elide redundant block IDs
In a list of predecessors, we previously kept both the jump instruction
that points to the current block, and the block where that instruction
resides. But we can look up the block from the instruction as long as we
have access to the current Layout, which we do everywhere that it was
necessary. So don't store the block, just store the instruction.
* Keep predecessor definitions in a ListPool
* Make append_jump_argument independent of self
This makes it easier to reason about borrow-checking issues.
* Reuse `results` instead of re-doing variable lookup
This eliminates three array lookups per predecessor by hanging on to the
results of earlier steps a little longer. This only works now because I
previously removed the need to borrow all of `self`, which otherwise
prevented keeping a borrow of self.results alive.
I had experimented with using `Vec::split_off` to copy the relevant
chunk of results to a temporary heap allocation, but the extra
allocation and copy was measurably slower. So it's important that this
is just a borrow.
* Cache single-predecessor block ID when sealing
Of the code in cranelift_frontend, `use_var` is the second-hottest path,
sitting close behind the `build` function that's used when inserting
every new instruction. This makes sense given that the operands of a new
instruction usually need to be looked up immediately before building the
instruction.
So making the single-predecessor loops in `find_var` and `use_var_local`
do fewer memory accesses and execute fewer instructions turns out to
have a measurable effect. It's still only a small fraction of a percent
overall since cranelift-frontend is only a few percent of total runtime.
This patch keeps a block ID in the SSABlockData, which is None unless
both the block is sealed and it has exactly one predecessor. Doing so
avoids two array lookups on each iteration of the two loops.
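Roughly, the shape of the change (field names simplified; not the real cranelift-frontend definitions):
```rust
// Simplified stand-ins; the real `SSABlockData` is more involved.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Block(u32);

struct SSABlockData {
    sealed: bool,
    predecessors: Vec<Block>,
    // `Some(pred)` only when the block is sealed *and* has exactly one
    // predecessor; caching it lets the hot single-predecessor walks skip
    // two array lookups per iteration.
    single_predecessor: Option<Block>,
}

impl SSABlockData {
    fn seal(&mut self) {
        self.sealed = true;
        self.single_predecessor = match self.predecessors.as_slice() {
            [only] => Some(*only),
            _ => None,
        };
    }
}

fn main() {
    let mut data = SSABlockData {
        sealed: false,
        predecessors: vec![Block(3)],
        single_predecessor: None,
    };
    data.seal();
    assert!(data.sealed);
    assert_eq!(data.single_predecessor, Some(Block(3)));
}
```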
According to DHAT, compared with main, at this point this PR uses 0.3%
less memory at max heap, reads 0.6% fewer bytes, and writes 0.2% fewer
bytes.
According to Hyperfine, this PR is "1.01 ± 0.01 times faster" than main
when compiling Spidermonkey. On the other hand, Sightglass says main is
1.01x faster than this PR on the same benchmark by CPU cycles. In short,
actual effects are too small to measure reliably.
* Cleanups to cranelift-frontend SSA construction
* Encode sealed/undef_variables relationship in type
A block can't have any undef_variables if it is sealed. It's useful to
make that fact explicit in the types so that any time either value is
used, it's clear that we should think about the other one too.
In addition, encoding this fact in an enum type lets Rust apply an
optimization that reduces the size of SSABlockData by 8 bytes, making it
fit in a 64-byte cache line. I haven't taken the extra step of making
SSABlockData be 64-byte aligned because 1) it doesn't seem to have a
performance impact and 2) doing so makes other structures quite a bit
bigger.
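A sketch of the encoding, with simplified stand-in types rather than the real `SSABlockData`; in this toy, the `Vec`'s non-null pointer gives the compiler a niche for the discriminant, so the enum typically costs no extra space over the unsealed payload:
```rust
// Illustrative only; field names are simplified.
enum Sealed {
    // The undef-variable list only exists while the block is unsealed...
    No { undef_variables: Vec<u32> },
    // ...and is unrepresentable once it is sealed.
    Yes,
}

struct SSABlockData {
    sealed: Sealed,
    predecessors: Vec<u32>,
}

fn main() {
    println!(
        "size_of::<Sealed>() = {} (Vec<u32> alone is {})",
        std::mem::size_of::<Sealed>(),
        std::mem::size_of::<Vec<u32>>()
    );

    let blocks = [
        SSABlockData {
            sealed: Sealed::No { undef_variables: vec![7] },
            predecessors: vec![],
        },
        SSABlockData { sealed: Sealed::Yes, predecessors: vec![0] },
    ];
    for b in &blocks {
        // Any code that touches the undef list is forced to consider
        // sealedness at the same time:
        if let Sealed::No { undef_variables } = &b.sealed {
            println!(
                "unsealed block with {} preds, pending vars: {:?}",
                b.predecessors.len(),
                undef_variables
            );
        }
    }
}
```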
* Simplify finish_predecessors_lookup
Using Vec::drain is more concise than a combination of
iter().rev().take() followed by Vec::truncate. And in this case it
doesn't matter what order we examine the results in, because we just
want to know if they're all equal, so we might as well iterate forward
instead of in reverse.
There's no need for the ZeroOneOrMore enum. Instead, there are only two
cases: either we have a single value to use for the variable (possibly
synthesized as a constant zero), or we need to add a block parameter in
every predecessor.
Pre-filtering the results iterator to eliminate the sentinel makes it
easy to identify how many distinct definitions this variable has.
iter.next() indicates if there are any definitions at all, and then
iter.all() is a clear way to express that we want to know if the
remaining definitions are the same as the first one.
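A sketch of that decision logic, with made-up names and a simplified sentinel encoding rather than the real `finish_predecessors_lookup` internals:
```rust
enum Def {
    Sentinel,   // placeholder produced by the recursive lookup itself
    Value(u32), // a concrete definition found in a predecessor
}

enum Outcome {
    UseValue(u32),  // zero or one distinct definition: reuse (or zero-init)
    NeedBlockParam, // several distinct definitions: add a block parameter
}

fn decide(results: &[Def]) -> Outcome {
    // Pre-filter the sentinel; then `next()`/`all()` say whether all the
    // remaining definitions agree, in whatever order they appear.
    let mut defs = results.iter().filter_map(|d| match d {
        Def::Sentinel => None,
        Def::Value(v) => Some(*v),
    });
    match defs.next() {
        // No real definition at all: synthesize a constant zero.
        None => Outcome::UseValue(0),
        Some(first) => {
            if defs.all(|v| v == first) {
                Outcome::UseValue(first)
            } else {
                Outcome::NeedBlockParam
            }
        }
    }
}

fn main() {
    assert!(matches!(
        decide(&[Def::Sentinel, Def::Value(4), Def::Value(4)]),
        Outcome::UseValue(4)
    ));
    assert!(matches!(
        decide(&[Def::Value(4), Def::Value(5)]),
        Outcome::NeedBlockParam
    ));
}
```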
* Simplify append_jump_argument
* Avoid assigning default() into SecondaryMap
This eliminates some redundant reads and writes.
* cranelift-frontend: Construct with default()
This eliminates a bunch of boilerplate in favor of a built-in `derive`
macro.
Also I'm deleting an import that had the comment "FIXME: Remove in
edition2021", which we've been using everywhere since April.
* Fix tests
This is the implementation of https://github.com/bytecodealliance/wasmtime/issues/4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and a trait object to provide a backend for an all-included experience in Wasmtime.
After the suggestion of Chris, `Function` has been split into mostly two parts:
- on the one hand, `FunctionStencil` contains all the fields required during compilation, which act as a compilation cache key: if two function stencils are the same, then the result of their compilation (`CompiledCodeBase<Stencil>`) will be the same. This makes caching trivial, as the only thing to cache is the `FunctionStencil`.
- on the other hand, `FunctionParameters` contains the... function parameters that are required to finalize the result of compilation into a `CompiledCode` (aka `CompiledCodeBase<Final>`) with proper final relocations etc., by applying fixups and so on.
Most changes are here to accommodate those requirements, in particular that `FunctionStencil` should be `Hash`able to be used as a key in the cache:
- most source locations are now relative to a base source location in the function, and as such they're encoded as `RelSourceLoc` in the `FunctionStencil`. This required changes so that there's no need to explicitly mark a `SourceLoc` as the base source location; instead, it's automatically detected the first time a non-default `SourceLoc` is set.
- user-defined external names in the `FunctionStencil` (aka before this patch `ExternalName::User { namespace, index }`) are now references into an external table of `UserExternalNameRef -> UserExternalName`, present in the `FunctionParameters`, and must be explicitly declared using `Function::declare_imported_user_function`.
- some refactorings have been made for function names:
- `ExternalName` was used as the type for a `Function`'s name; while this allowed `ExternalName::Libcall` in that position, using it there would have been quite confusing. Instead, a new enum `UserFuncName` is introduced for this name, which is either a user-defined function name (the above `UserExternalName`) or a test case name.
- `ExternalName` is likely to eventually become a full reference into the `FunctionParameters`'s mapping, instead of being "either a handle for user-defined external names, or the thing itself for other variants". I'm running out of time to do this, and it is not trivial, as it implies touching ISLE, which I'm less familiar with.
The cache computes a sha256 hash of the `FunctionStencil`, and uses this as the cache key. No equality check (using `PartialEq`) is performed in addition to the hash being the same, as we hope that this is sufficient data to avoid collisions.
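A sketch of the caching scheme, assuming the `sha2` crate for hashing; the struct fields and helper names here are illustrative, not the real `FunctionStencil`/`FunctionParameters` definitions, and the real implementation hashes the stencil's structure rather than a couple of byte fields:
```rust
use sha2::{Digest, Sha256};
use std::collections::HashMap;

// Everything that determines the generated code lives in the stencil and
// therefore in the cache key (fields here are placeholders)...
#[derive(Hash, PartialEq, Eq)]
struct FunctionStencil {
    signature: String,
    body: Vec<u8>, // e.g. IR serialized with relative source locations
}

// ...while names, the base source location, etc. stay outside the key and
// are only applied when finalizing the compiled code.
struct FunctionParameters {
    user_named_funcs: Vec<String>,
    base_srcloc: u32,
}

struct CompiledCode(Vec<u8>);

fn cache_key(stencil: &FunctionStencil) -> [u8; 32] {
    let mut hasher = Sha256::new();
    hasher.update(stencil.signature.as_bytes());
    hasher.update(&stencil.body);
    hasher.finalize().into()
}

fn compile_cached(
    cache: &mut HashMap<[u8; 32], CompiledCode>,
    stencil: &FunctionStencil,
    compile: impl Fn(&FunctionStencil) -> CompiledCode,
) -> [u8; 32] {
    let key = cache_key(stencil);
    // The hash alone is trusted as the key; no extra `PartialEq` check is
    // performed on a hit, as described above.
    cache.entry(key).or_insert_with(|| compile(stencil));
    key
}

fn main() {
    let mut cache = HashMap::new();
    let stencil = FunctionStencil {
        signature: "(i64) -> i64".into(),
        body: vec![1, 2, 3],
    };
    let _params = FunctionParameters {
        user_named_funcs: vec!["f".into()],
        base_srcloc: 0,
    };
    let k1 = compile_cached(&mut cache, &stencil, |_| CompiledCode(vec![0x90]));
    let k2 = compile_cached(&mut cache, &stencil, |_| unreachable!("expected a cache hit"));
    assert_eq!(k1, k2);
    assert_eq!(cache[&k1].0, vec![0x90]);
}
```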
A basic fuzz target has been introduced that tries to do the bare minimum:
- check that a function successfully compiled and cached will be also successfully reloaded from the cache, and returns the exact same function.
- check that a trivial modification in the external mapping of `UserExternalNameRef -> UserExternalName` hits the cache, and that other modifications don't hit the cache.
- This last check is less efficient and less likely to happen, so probably should be rethought a bit.
Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip.
Some numbers show that for a large wasm module we're using internally, this is a 20% compile-time speedup, because so many `FunctionStencil`s are the same, even within a single module. For a group of modules that have a lot of code in common, we get hit rates up to 70% when they're used together. When a single function changes in a wasm module, every other function is reloaded; that's still slower than I expect (between 10% and 50% of the overall compile time), so there's likely room for improvement.
Fixes #4155.
* Add initial support for fused adapter trampolines
This commit lands a significant new piece of functionality to Wasmtime's
implementation of the component model in the form of the implementation
of fused adapter trampolines. Internally within a component, core wasm
modules can communicate with each other by having their exports
`canon lift`'d to get `canon lower`'d into a different component. This
signifies that two components are communicating through a statically
known interface via the canonical ABI at this time. Previously Wasmtime
was able to identify that this communication was happening but it simply
panicked with `unimplemented!` upon seeing it. This commit is the
beginning of filling out this panic location with an actual
implementation.
The implementation route chosen here for fused adapters is to use a
WebAssembly module itself for the implementation. This means that, at
compile time of a component, Wasmtime is generating core WebAssembly
modules which then get recursively compiled within Wasmtime as well. The
choice to use WebAssembly itself as the implementation of fused adapters
stems from a few motivations:
* This does not represent a significant increase in the "trusted
compiler base" of Wasmtime. Getting the Wasm -> CLIF translation
correct once is hard enough, much less for an entirely different IR to
CLIF. By generating WebAssembly no new interactions with Cranelift are
added which drastically reduces the possibilities for mistakes.
* Using WebAssembly means that component adapters are insulated from
miscompilations and mistakes. If something goes wrong it's defined
well within the WebAssembly specification how it goes wrong and what
happens as a result. This means that the "blast zone" for a wrong
adapter is the component instance but not the entire host itself.
Accesses to linear memory are guaranteed to be in-bounds and otherwise
handled via well-defined traps.
* A fully-finished fused adapter compiler is expected to be a
significant and quite complex component of Wasmtime. Functionality
along these lines is expected to be needed for Web-based polyfills of
the component model and by using core WebAssembly it provides the
opportunity to share code between Wasmtime and these polyfills for the
component model.
* Finally the runtime implementation of managing WebAssembly modules is
already implemented and quite easy to integrate with, so representing
fused adapters with WebAssembly results in very little extra support
necessary for the runtime implementation of instantiating and managing
a component.
The compiler added in this commit is dubbed Wasmtime's Fused Adapter
Compiler of Trampolines (FACT) because who doesn't like deriving a name
from an acronym. Currently the trampoline compiler is limited in its
support for interface types and only supports a few primitives. I plan
on filing future PRs to flesh out the support here for all the variants
of `InterfaceType`. For now this PR is primarily focused on all of the
other infrastructure for the addition of a trampoline compiler.
With the choice to use core WebAssembly to implement fused adapters it
means that adapters need to be inserted into a module. Unfortunately
adapters cannot all go into a single WebAssembly module because adapters
themselves have dependencies which may be provided transitively through
instances that were instantiated with other adapters. This means that a
significant chunk of this PR (`adapt.rs`) is dedicated to determining
precisely which adapters go into precisely which adapter modules. This
partitioning process attempts to make large modules wherever it can to
cut down on core wasm instantiations but is likely not optimal as
it's just a simple heuristic today.
With all of this added together it's now possible to start writing
`*.wast` tests that internally have adapted modules communicating with
one another. A `fused.wast` test suite was added as part of this PR
which is the beginning of tests for the support of the fused adapter
compiler added in this PR. Currently this is primarily testing some
various topologies of adapters along with direct/indirect modes. This
will grow many more tests over time as more types are supported.
Overall I'm not 100% satisfied with the testing story of this PR. When a
test fails it's very difficult to debug since everything is written in
the text format of WebAssembly, meaning there are no "conveniences" to
print out the state of the world when things go wrong and easily debug.
I think this will become even more apparent as more tests are written
for more types in subsequent PRs. At this time though I know of no
better alternative other than leaning pretty heavily on fuzz-testing to
ensure this is all exercised.
* Fix an unused field warning
* Fix tests in `wasmtime-runtime`
* Add some more tests for compiled trampolines
* Remap exports when injecting adapters
The exports of a component were accidentally left unmapped which meant
that they used instance indexes from before adapter-module insertion.
* Fix typo
* Rebase conflicts
* Cranelift: make `ir::Type` a `u16`.
* Cranelift: pack ValueData back into 64 bits.
After extending `Type` to a `u16`, `ValueData` became 12 bytes rather
than 8. This packs it back down to 8 bytes (64 bits) by stealing two
bits from the `Type` for the enum discriminant (leaving 14 bits for the
type itself).
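A hedged sketch of the kind of packing involved (the field widths and layout here are illustrative, not the real `ValueData` encoding): a 2-bit discriminant, a 14-bit `Type`, a 16-bit result number, and a 32-bit entity index sharing one `u64`:
```rust
// A toy packing, not the real `ValueData` layout.
#[derive(Clone, Copy)]
struct PackedValueData(u64);

const TAG_INST: u64 = 0b00; // other variants would use 0b01, 0b10, ...

impl PackedValueData {
    fn inst(ty: u16, num: u16, inst: u32) -> Self {
        debug_assert!(ty < (1 << 14), "a `Type` must fit in 14 bits");
        Self(TAG_INST | ((ty as u64) << 2) | ((num as u64) << 16) | ((inst as u64) << 32))
    }
    fn tag(self) -> u64 {
        self.0 & 0b11
    }
    fn ty(self) -> u16 {
        ((self.0 >> 2) & 0x3fff) as u16
    }
    fn num(self) -> u16 {
        (self.0 >> 16) as u16
    }
    fn index(self) -> u32 {
        (self.0 >> 32) as u32
    }
}

fn main() {
    assert_eq!(std::mem::size_of::<PackedValueData>(), 8);
    let v = PackedValueData::inst(0x123, 1, 42);
    assert_eq!((v.tag(), v.ty(), v.num(), v.index()), (TAG_INST, 0x123, 1, 42));
}
```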
Performance comparison (3-way between original (`ty-u8`), 16-bit `Type`
(`ty-u16`), and this PR (`ty-packed`)):
```
~/work/sightglass% target/release/sightglass-cli benchmark \
-e ~/ty-u8.so -e ~/ty-u16.so -e ~/ty-packed.so \
--iterations-per-process 10 --processes 2 \
benchmarks-next/spidermonkey/benchmark.wasm
compilation
benchmarks-next/spidermonkey/benchmark.wasm
cycles
[20654406874 21749213920.50 22958520306] /home/cfallin/ty-packed.so
[22227738316 22584704883.90 22916433748] /home/cfallin/ty-u16.so
[20659150490 21598675968.60 22588108428] /home/cfallin/ty-u8.so
nanoseconds
[5435333269 5723139427.25 6041072883] /home/cfallin/ty-packed.so
[5848788229 5942729637.85 6030030341] /home/cfallin/ty-u16.so
[5436002390 5683248226.10 5943626225] /home/cfallin/ty-u8.so
```
So, when compiling SpiderMonkey.wasm, making `Type` 16 bits regresses
performance by 4.5% (5.683s -> 5.943s), while this PR gets 14 bits for a 1.0%
cost (5.683s -> 5.723s). That's still not great, and we can likely do better,
but it's a start.
* Fix test failure: entities to/from u32 via `{from,to}_bits`, not `{from,to}_u32`.
On the build side, this commit introduces two things:
1. The automatic generation of various ISLE definitions for working with
CLIF. Specifically, it generates extern type definitions for clif opcodes and
the clif instruction data `enum`, as well as extractors for matching each clif
instruction. This happens inside the `cranelift-codegen-meta` crate.
2. The compilation of ISLE DSL sources to Rust code that can be included in the
main `cranelift-codegen` compilation.
Next, this commit introduces the integration glue code required to get
ISLE-generated Rust code hooked up in clif-to-x64 lowering. When lowering a clif
instruction, we first try to use the ISLE code path. If it succeeds, then we are
done lowering this instruction. If it fails, then we proceed along the existing
hand-written code path for lowering.
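Schematically, the integration looks like the following; the function names are placeholders, not the real lowering entry points:
```rust
// The ISLE-generated constructor returns `Option`, and `None` routes the
// instruction to the legacy hand-written lowering.
struct LowerCtx;
struct ClifInst;
struct LoweredInsts;

fn isle_generated_lower(_ctx: &mut LowerCtx, _inst: &ClifInst) -> Option<LoweredInsts> {
    None // pretend no ISLE rule matched this instruction
}

fn legacy_handwritten_lower(_ctx: &mut LowerCtx, _inst: &ClifInst) -> LoweredInsts {
    LoweredInsts
}

fn lower(ctx: &mut LowerCtx, inst: &ClifInst) -> LoweredInsts {
    // Prefer the ISLE-generated path; fall back only if no rule fires.
    if let Some(lowered) = isle_generated_lower(ctx, inst) {
        return lowered;
    }
    legacy_handwritten_lower(ctx, inst)
}

fn main() {
    let _ = lower(&mut LowerCtx, &ClifInst);
}
```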
Finally, this commit ports many lowering rules over from hand-written,
open-coded Rust to ISLE.
In the process of supporting ISLE, this commit also makes the x64 `Inst` capable
of expressing SSA by supporting 3-operand forms for all of the existing
instructions that only have a 2-operand form encoding:
dst = src1 op src2
Rather than only the typical x86-64 2-operand form:
dst = dst op src
This allows `MachInst` to be in SSA form, since `dst` and `src1` are
disentangled.
("3-operand" and "2-operand" are a little bit of a misnomer since not all
operations are binary operations, but we do the same thing for, e.g., unary
operations by disentangling the sole operand from the result.)
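A small sketch of the difference, using invented types rather than the real x64 `Inst` definitions:
```rust
// The virtual instruction carries three distinct operands, so `dst` and
// `src1` stay disentangled and the IR remains in SSA form.
#[derive(Clone, Copy, Debug)]
struct VReg(u32);

#[derive(Debug)]
enum MachInst {
    // dst = src1 op src2 (three-operand, SSA-friendly form)
    AluRRR {
        op: &'static str,
        dst: VReg,
        src1: VReg,
        src2: VReg,
    },
}

fn main() {
    // `v2 = add v0, v1`: no operand is both read and written.
    let add = MachInst::AluRRR {
        op: "add",
        dst: VReg(2),
        src1: VReg(0),
        src2: VReg(1),
    };
    println!("{:?}", add);
}
```
The register allocator can later constrain `dst` and `src1` to the same physical register (inserting a move when needed) to satisfy the two-operand x86-64 encoding `dst = dst op src`.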
There are two motivations for this change:
1. To allow ISLE lowering code to have value-equivalence semantics. We want ISLE
lowering to translate a CLIF expression that evaluates to some value into a
`MachInst` expression that evaluates to the same value. We want both the
lowering itself and the resulting `MachInst` to be pure and referentially
transparent. This is both a nice paradigm for compiler writers that are
authoring and maintaining lowering rules and is a prerequisite to any sort of
formal verification of our lowering rules in the future.
2. Better align `MachInst` with `regalloc2`'s API, which requires that the input
be in SSA form.
N.B. There is likely still some light refactoring to do, so that we are
not duplicating so much code. We should also introduce some test coverage.
Moves the slow path which resizes the vector out-of-line. The actual
indexing is also done in the out-of-line path which avoids the need for
a second bounds check in the fast path after a potential resize.
Looking at some profiles, these or their related functions were all
showing up, so this commit adds `#[inline]` to allow cross-crate
inlining by default.
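A sketch of the pattern (simplified; not the actual container in question):
```rust
// The hot path stays small and inlinable, and both the resize and the
// indexing happen in a cold, never-inlined slow path.
#[derive(Default)]
struct GrowableVec {
    elems: Vec<u64>,
}

impl GrowableVec {
    #[inline]
    fn get_mut(&mut self, idx: usize) -> &mut u64 {
        if idx < self.elems.len() {
            // Fast path: the length check above typically lets the
            // compiler elide the indexing bounds check.
            &mut self.elems[idx]
        } else {
            self.resize_and_get(idx)
        }
    }

    #[cold]
    #[inline(never)]
    fn resize_and_get(&mut self, idx: usize) -> &mut u64 {
        self.elems.resize(idx + 1, 0);
        // Index here, out of line, so the fast path does not need a
        // second bounds check after a potential resize.
        &mut self.elems[idx]
    }
}

fn main() {
    let mut v = GrowableVec::default();
    *v.get_mut(10) = 7;
    assert_eq!(*v.get_mut(10), 7);
}
```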
* Remove `once-cell` dependency.
* Remove function address `BTreeMap` from `CompiledModule` in favor of binary
searching finished functions directly.
* Use `with_capacity` when populating `CompiledModule` finished functions and
trampolines.
* entity: Fix typo in generated documentation
The same function documentation was used for `from_u32()` and `as_u32()`
while their behaviour is different.
* Refactor where results of compilation are stored
This commit refactors the internals of compilation in Wasmtime to change
where results of individual function compilation are stored. Previously
compilation resulted in many maps being returned, and compilation
results generally held all these maps together. This commit instead
switches this to have all metadata stored in a `CompiledFunction`
instead of having a separate map for each item that can be stored.
The motivation for this is primarily to help out with future
module-linking-related PRs. What exactly "module level" is depends on
how we interpret modules and how many modules are in play, so it's a bit
easier for operations in wasmtime to work at the function level where
possible. This means that we don't have to pass around multiple
different maps and a function index, but instead just one map or just
one entry representing a compiled function.
Additionally this change updates where the parallelism of compilation
happens, pushing it into `wasmtime-jit` instead of `wasmtime-environ`.
This is another goal where `wasmtime-jit` will have more knowledge about
module-level pieces with module linking in play. User-facing-wise this
should be the same in terms of parallel compilation, though.
The ultimate goal of this refactoring is to make it easier for the
results of compilation to actually be a set of wasm modules. This means
we won't be able to have a map-per-metadata where the primary key is the
function index, because there will be many modules within one "object
file".
* Don't clear out fields, just don't store them
Persist a smaller set of fields in `CompilationArtifacts` instead of
trying to clear fields out and dynamically not accessing them.
A full Eq implementation is not needed for ReservedValue, as we only need
to check whether a value is the reserved one. For entities (defined with
`entity_impl!`) this doesn't make much difference, but for more
complicated types this avoids generating redundant `Eq`s.
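For illustration, a sketch of the shape of such a trait; the method names here are assumptions for the example rather than a statement of the exact cranelift-entity API:
```rust
// Packed containers only ever need "is this the reserved sentinel?", not
// full equality over the type.
trait ReservedValue {
    fn reserved_value() -> Self;
    fn is_reserved_value(&self) -> bool;
}

struct Block(u32);

impl ReservedValue for Block {
    fn reserved_value() -> Self {
        Block(u32::MAX)
    }
    fn is_reserved_value(&self) -> bool {
        self.0 == u32::MAX
    }
}

fn main() {
    assert!(!Block(3).is_reserved_value());
    assert!(Block::reserved_value().is_reserved_value());
}
```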
* Manually rename BasicBlock to BlockPredecessor
BasicBlock is a pair of (Ebb, Inst) that is used to represent the
basic block subcomponent of an Ebb that is a predecessor to an Ebb.
Eventually we will be able to remove this struct, but for now it
makes sense to give it a non-conflicting name so that we can start
to transition Ebb to represent a basic block.
I have not updated any comments that refer to BasicBlock, as
eventually we will remove BlockPredecessor and replace with Block,
which is a basic block, so the comments will become correct.
* Manually rename SSABuilder block types to avoid conflict
SSABuilder has its own Block and BlockData types. These along with
associated identifier will cause conflicts in a later commit, so
they are renamed to be more verbose here.
* Automatically rename 'Ebb' to 'Block' in *.rs
* Automatically rename 'EBB' to 'block' in *.rs
* Automatically rename 'ebb' to 'block' in *.rs
* Automatically rename 'extended basic block' to 'basic block' in *.rs
* Automatically rename 'an basic block' to 'a basic block' in *.rs
* Manually update comment for `Block`
`Block`'s wikipedia article required an update.
* Automatically rename 'an `Block`' to 'a `Block`' in *.rs
* Automatically rename 'extended_basic_block' to 'basic_block' in *.rs
* Automatically rename 'ebb' to 'block' in *.clif
* Manually rename clif constant that contains 'ebb' as substring to avoid conflict
* Automatically rename filecheck uses of 'EBB' to 'BB'
'regex: EBB' -> 'regex: BB'
'$EBB' -> '$BB'
* Automatically rename 'EBB' 'Ebb' to 'block' in *.clif
* Automatically rename 'an block' to 'a block' in *.clif
* Fix broken testcase when function name length increases
Test function names are limited to 16 characters. This causes
the new longer name to be truncated and fail a filecheck test. An
outdated comment was also fixed.
The `SecondaryMap` abstraction -- basically, resize-on-demand arrays with a
default value -- is very hot in Cranelift. This small patch is the result of
many profiling runs. It makes two changes:
* `fn index_mut` is changed to be `#[inline(always)]`, based on profile data.
* `fn index` and `fn index_mut` call `self.elems.resize()` directly, rather
than via `self.resize()`. The point of this is not to improve performance.
Rather, it ensures that the public functions for `SecondaryMap` do not call
each other. When public interface functions call each other, it becomes
difficult to interpret profiling results, because it's harder to see what
fraction of costs for `SecondaryMap` as a whole come from outside the
module, and what fraction is the result of "internal" calls to the external
interface.
The overall result, for wasm_lua_binarytrees, is a 1.4% reduction in
instruction count for the compiler, and a 2.2% reduction in loads/stores.
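A condensed sketch of the two changes, not the full cranelift-entity implementation:
```rust
use std::ops::{Index, IndexMut};

// A resize-on-demand map with a default value, reduced to its essentials.
pub struct SecondaryMap<V: Clone> {
    elems: Vec<V>,
    default: V,
}

impl<V: Clone> SecondaryMap<V> {
    pub fn new(default: V) -> Self {
        Self { elems: Vec::new(), default }
    }
}

impl<V: Clone> Index<usize> for SecondaryMap<V> {
    type Output = V;
    #[inline(always)]
    fn index(&self, k: usize) -> &V {
        self.elems.get(k).unwrap_or(&self.default)
    }
}

impl<V: Clone> IndexMut<usize> for SecondaryMap<V> {
    #[inline(always)] // hot: profile-driven
    fn index_mut(&mut self, k: usize) -> &mut V {
        if k >= self.elems.len() {
            // Grow `elems` directly instead of going through a public
            // `resize` method, so profiles attribute the cost to callers
            // rather than to internal calls between public functions.
            self.elems.resize(k + 1, self.default.clone());
        }
        &mut self.elems[k]
    }
}

fn main() {
    let mut map: SecondaryMap<u32> = SecondaryMap::new(0);
    map[5] = 9;
    assert_eq!(map[5], 9);
    assert_eq!(map[100], 0); // default for never-written keys
}
```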
* the target-lexicon crate no longer has or needs the std feature
in cargo, so we can delete all default-features=false, any mentions
of its std feature, and the nostd configs in many lib.rs files
* the representation of arm architectures has changed, so some case
statements needed refactoring