wasmtime

Author	SHA1	Message	Date
Nick Fitzgerald	0b1f51f804	Remove unnecessary parens around expression (#4647 ) Fixes a compiler warning.	2022-08-08 15:48:03 -07:00
Nick Fitzgerald	e81ad3c7eb	cli-flags: Don't ignore the first flag in `CommonOptions::parse_from_str` (#4642 )	2022-08-08 15:25:15 -07:00
Alex Crichton	c816a52746	Reuse locals in adapter trampolines (#4646 ) This commit implements a scheme I've been meaning to work on in the adapter compiler where instead of always generating a fresh local for all operations locals may now be reused. Locals generated are explicitly free'd when their lexical scope has ended, allowing reuse in translation of later types in the adapter. This also implements a new scheme for initializing locals where previously a local could simply be generated, but now the local must be fused with its initializer where a `local.{tee,set}` instruction is always generated. This should help prevent a bug I ran into with strings where one usage of a local was forgotten to be initialized which meant that when it was used during a loop it may have had a stale value from before. Modeling this in Rust isn't possible at compile time unfortunately so I opted for the next best thing, runtime panics. If a local is accidentally not released back to the pool of free locals then it will panic. The fuzzer for simply generating and validating adapter modules should be good at exercising this and it weeded out a few forgotten free's and should be good now.	2022-08-08 21:18:04 +00:00
Chris Fallin	863659e04f	VCode emission: account for RA spill/reload/moves in worst-case block size. (#4644 ) To determine whether we need to insert a "veneer island" of branch-range extension veneers, we need to know ahead of emitting a basic block the worst-case size of that block. This is because veneers only go between blocks (we could plop one in the middle of a block but that would require another jump around it and would probably pessimize some code significantly), and we can't back up once we emit a block. To compute this worst-case size, we take the number of instructions and multiply by the largest possible size of one pseudoinst (e.g., on aarch64, this is 44 bytes; it explicitly excludes the `EmitIsland` pseudo-op which is used before large jumptable inline offset tables are emitted). This is conservative, but it always works, and veneers are somewhat rare in practice (function body >1MiB on aarch64 for example). Unfortunately this logic didn't account for the spill/reload/move instructions inserted by the register allocator, and in one example in issue #4629, a block had only one instruction but 482 edge-moves (!). This came at just the wrong time as we were approaching the 1MiB limit on aarch64. This PR fixes that issue, and fixes the logic to actually look at the correct next block (next in `final_order` rather than numerically next), as a bonus correctness fix. Fixes #4629.	2022-08-08 13:57:18 -07:00
Nick Fitzgerald	ec47335b9c	wasmtime: Add a `Config::native_unwind_info` method (#4643 ) This method configures whether native unwind information (e.g. `.eh_frame` on Linux) is generated or not. This helps integrate with third-party stack capturing tools, such as the system unwinder or the `backtrace` crate. It does not affect whether Wasmtime can capture stack traces in Wasm code that it is running or not. Unwind info is always enabled on Windows, since the Windows ABI requires it. This configuration option defaults to true. Additionally, we deprecate `Config::wasm_backtrace` since we can always cheaply capture stack traces ever since https://github.com/bytecodealliance/wasmtime/pull/4431. Fixes https://github.com/bytecodealliance/wasmtime/issues/4554	2022-08-08 13:54:51 -07:00
Damian Heaton	e463890f26	Port `AvgRound` & `SqmulRoundSat` to ISLE (AArch64) (#4639 ) Ported the existing implementations of the following opcodes on AArch64 to ISLE: - `AvgRound` - Also introduced support for `i64x2` vectors, as per the docs. - `SqmulRoundSat` Copyright (c) 2022 Arm Limited	2022-08-08 11:35:43 -07:00
Damian Heaton	47a67d752b	Split `Fmla` and `Bsl` out into new `VecRRRMod` op (#4638 ) Separates the following opcodes for AArch64 into a separate `VecALUModOp` enum, which is emitted via the `VecRRRMod` instruction. This separates vector ALU instructions which modify a register from instructions which write to a new register: - `Bsl` - `Fmla` Addresses [a discussion](https://github.com/bytecodealliance/wasmtime/pull/4608#discussion_r937975581) in #4608. Copyright (c) 2022 Arm Limited	2022-08-08 11:33:13 -07:00
Alex Crichton	866ec46613	Implement roundtrip fuzzing of component adapters (#4640 ) * Improve the `component_api` fuzzer on a few dimensions * Update the generated component to use an adapter module. This involves two core wasm instances communicating with each other to test that data flows through everything correctly. The intention here is to fuzz the fused adapter compiler. String encoding options have been plumbed here to exercise differences in string encodings. * Use `Cow<'static, ...>` and `static` declarations for each static test case to try to cut down on rustc codegen time. * Add `Copy` to derivation of fuzzed enums to make `derive(Clone)` smaller. * Use `Store<Box<dyn Any>>` to try to cut down on codegen by monomorphizing fewer `Store<T>` implementation. * Add debug logging to print out what's flowing in and what's flowing out for debugging failures. * Improve `Debug` representation of dynamic value types to more closely match their Rust counterparts. * Fix a variant issue with adapter trampolines Previously the offset of the payload was calculated as the discriminant aligned up to the alignment of a singular case, but instead this needs to be aligned up to the alignment of all cases to ensure all cases start at the same location. * Fix a copy/paste error when copying masked integers A 32-bit load was actually doing a 16-bit load by accident since it was copied from the 16-bit load-and-mask case. * Fix f32/i64 conversions in adapter modules The adapter previously erroneously converted the f32 to f64 and then to i64, where instead it should go from f32 to i32 to i64. * Fix zero-sized flags in adapter modules This commit corrects the size calculation for zero-sized flags in adapter modules. cc #4592 * Fix a variant size calculation bug in adapters This fixes the same issue found with variants during normal host-side fuzzing earlier where the size of a variant needs to align up the summation of the discriminant and the maximum case size. * Implement memory growth in libc bump realloc Some fuzz-generated test cases are copying lists large enough to exceed one page of memory so bake in a `memory.grow` to the bump allocator as well. * Avoid adapters of exponential size This commit is an attempt to avoid adapters being exponentially sized with respect to the type hierarchy of the input. Previously all adaptation was done inline within each adapter which meant that if something was structured as `tuple<T, T, T, T, ...>` the translation of `T` would be inlined N times. For very deeply nested types this can quickly create an exponentially sized adapter with types of the form: (type $t0 (list u8)) (type $t1 (tuple $t0 $t0)) (type $t2 (tuple $t1 $t1)) (type $t3 (tuple $t2 $t2)) ;; ... where the translation of `t4` has 8 different copies of translating `t0`. This commit changes the translation of types through memory to almost always go through a helper function. The hope here is that it doesn't lose too much performance because types already reside in memory. This can still lead to exponentially sized adapter modules to a lesser degree where if the translation all happens on the "stack", e.g. via `variant`s and their flat representation then many copies of one translation could still be made. For now this commit at least gets the problem under control for fuzzing where fuzzing doesn't trivially find type hierarchies that take over a minute to codegen the adapter module. One of the main tricky parts of this implementation is that when a function is generated the index that it will be placed at in the final module is not known at that time. To solve this the encoded form of the `Call` instruction is saved in a relocation-style format where the `Call` isn't encoded but instead saved into a different area for encoding later. When the entire adapter module is encoded to wasm these pseudo-`Call` instructions are encoded as real instructions at that time. * Fix some memory64 issues with string encodings Introduced just before #4623 I had a few mistakes related to 64-bit memories and mixing 32/64-bit memories. * Actually insert into the `translate_mem_funcs` map This... was the whole point of having the map! * Assert memory growth succeeds in bump allocator	2022-08-08 18:01:45 +00:00
Alex Crichton	650979ae40	Implement strings in adapter modules (#4623 ) * Implement strings in adapter modules This commit is a hefty addition to Wasmtime's support for the component model. This implements the final remaining type (in the current type hierarchy) unimplemented in adapter module trampolines: strings. Strings are the most complicated type to implement in adapter trampolines because they are highly structured chunks of data in memory (according to specific encodings). Additionally each lift/lower operation can choose its own encoding for strings meaning that Wasmtime, the host, may have to convert between any pairwise ordering of string encodings. The `CanonicalABI.md` in the component-model repo in general specifies all the fiddly bits of string encoding so there's not a ton of wiggle room for Wasmtime to get creative. This PR largely "just" implements that. The high-level architecture of this implementation is: * Fused adapters are first identified to determine src/dst string encodings. This statically fixes what transcoding operation is being performed. * The generated adapter will be responsible for managing calls to `realloc` and performing bounds checks. The adapter itself does not perform memory copies or validation of string contents, however. Instead each transcoding operation is modeled as an imported function into the adapter module. This means that the adapter module dynamically, during compile time, determines what string transcoders are needed. Note that an imported transcoder is not only parameterized over the transcoding operation but additionally which memory is the source and which is the destination. * The imported core wasm functions are modeled as a new `CoreDef::Transcoder` structure. These transcoders end up being small Cranelift-compiled trampolines. The Cranelift-compiled trampoline will load the actual base pointer of memory and add it to the relative pointers passed as function arguments. This trampoline then calls a transcoder "libcall" which enters Rust-defined functions for actual transcoding operations. * Each possible transcoding operation is implemented in Rust with a unique name and a unique signature depending on the needs of the transcoder. I've tried to document inline what each transcoder does. This means that the `Module::translate_string` in adapter modules is by far the largest translation method. The main reason for this is due to the management around calling the imported transcoder functions in the face of validating string pointer/lengths and performing the dance of `realloc`-vs-transcode at the right time. I've tried to ensure that each individual case in transcoding is documented well enough to understand what's going on as well. Additionally in this PR is a full implementation in the host for the `latin1+utf16` encoding which means that both lifting and lowering host strings now works with this encoding. Currently the implementation of each transcoder function is likely far from optimal. Where possible I've leaned on the standard library itself and for latin1-related things I'm leaning on the `encoding_rs` crate. I initially tried to implement everything with `encoding_rs` but was unable to uniformly do so easily. For now I settled on trying to get a known-correct (even in the face of endianness) implementation for all of these transcoders. If an when performance becomes an issue it should be possible to implement more optimized versions of each of these transcoding operations. Testing this commit has been somewhat difficult and my general plan, like with the `(list T)` type, is to rely heavily on fuzzing to cover the various cases here. In this PR though I've added a simple test that pushes some statically known strings through all the pairs of encodings between source and destination. I've attempted to pick "interesting" strings that one way or another stress the various paths in each transcoding operation to ideally get full branch coverage there. Additionally a suite of "negative" tests have also been added to ensure that validity of encoding is actually checked. * Fix a temporarily commented out case * Fix wasmtime-runtime tests * Update deny.toml configuration * Add `BSD-3-Clause` for the `encoding_rs` crate * Remove some unused licenses * Add an exemption for `encoding_rs` for now * Split up the `translate_string` method Move out all the closures and package up captured state into smaller lists of arguments. * Test out-of-bounds for zero-length strings	2022-08-08 16:01:57 +00:00
jiang1997	e6d339b6ac	supplement necessary instructions to run example-wasi (#4633 )	2022-08-08 10:37:16 -05:00
Chris Fallin	c5e3c0cafb	AArch64: don't assert inst within worst-case size when island emitted. (#4627 ) We assert after emitting each instruction that its size was less than the "worst-case size", which is used to determine when we need to proactively emit an island so pending branch fixups don't go out of bounds. However, the `EmitIsland` pseudo-inst itself can cause an arbitrarily large island to be emitted; this should not have to fit within the worst-case size (because island size is explicitly accounted for by the threshold computation). This PR fixes the assert accordingly. Fixes #4626.	2022-08-05 17:27:56 -07:00
Nick Fitzgerald	95e72db458	Some little Cranelift logging things (#4624 ) * Cranelift: Don't print "skipped TEST can't run aarch64" on x64, etc It's way too noisy. Move it to the logs. * Cranelift: Enable Cranelift trace logs in `clif-util` by default * cranelift-filetest: use `log::warn!` for warnings Instead of `println!` * rustfmt	2022-08-05 13:25:24 -07:00
Damian Heaton	eb332b8369	Convert `fma`, `valltrue` & `vanytrue` to ISLE (AArch64) (#4608 ) * Convert `fma`, `valltrue` & `vanytrue` to ISLE (AArch64) Ported the existing implementations of the following opcodes to ISLE on AArch64: - `fma` - Introduced missing support for `fma` on vector values, as per the docs. - `valltrue` - `vanytrue` Also fixed `fcmp` on scalar values in the interpreter, and enabled interpreter tests in `simd-fma.clif`. This introduces the `FMLA` machine instruction. Copyright (c) 2022 Arm Limited * Add comments for `Fmla` and `Bsl` Copyright (c) 2022 Arm Limited	2022-08-05 09:47:56 -07:00
Nick Fitzgerald	1ed7b43e62	Cranelift: Remove unused `ABICaller::signature` method (#4621 ) And the `ABICallerImpl::ir_sig` field that was used to implement that method. This removes 56 bytes from the size of `ABICallerImpl` and gives us speed ups to compilation of about 7% on all benchmarks. ``` compilation :: nanoseconds :: benchmarks/pulldown-cmark/benchmark.wasm Δ = 8205119.48 ± 4069474.25 (confidence = 99%) main.so is 0.91x to 0.97x faster than feature.so! feature.so is 1.03x to 1.10x faster than main.so! [117729152 132258110.36 167484097] main.so [107486500 124052990.88 138008797] feature.so compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm Δ = 4645258.32 ± 1981104.59 (confidence = 99%) main.so is 0.92x to 0.97x faster than feature.so! feature.so is 1.03x to 1.08x faster than main.so! [76562171 85504479.28 93116863] main.so [75180650 80859220.96 90591978] feature.so compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm Δ = 150575617.54 ± 65021102.57 (confidence = 99%) main.so is 0.92x to 0.97x faster than feature.so! feature.so is 1.03x to 1.08x faster than main.so! [2573089039 2843117485.10 3175982602] main.so [2559784932 2692541867.56 3143529008] feature.so ```	2022-08-05 09:46:46 -07:00
Trevor Elliott	0c2a48f682	x64: Migrate selectif and selectif_spectre_guard to ISLE (#4619 ) https://github.com/bytecodealliance/wasmtime/pull/4619	2022-08-05 09:36:11 -07:00
Alex Crichton	1ce9e8aa5f	Fix an issue in adapter module partitioning (#4622 ) When an adapter module depends on a particular core wasm instance this means that it actually depends on not only that instance but all prior core wasm instances as well. This is because core wasm instances must be instantiated in the specified order within a component and that cannot change depending on the dataflow between adapters. This commit fixes a possible panic from linearizing the component dfg where an adapter module tried to depend on an instance that hadn't been instantiated yet because the ordering dependency between core wasm instances hadn't been modeled.	2022-08-05 01:32:39 +00:00
wasmtime-publish	412fa04911	Bump Wasmtime to 0.41.0 (#4620 ) Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>	2022-08-04 20:02:19 -05:00
Ulrich Weigand	f552a53654	s390x: Implement bitrev (#4617 ) Since we do not have an instruction for this, this is a simple open-coded implementation. Needed by the cg_clif frontend.	2022-08-04 16:24:55 -07:00
Peter Huene	42233e8eda	components: ignore export aliases to types in translation. (#4604 ) * components: ignore export aliases to types in translation. Currently, translation is ignoring type exports from components during translation by skipping over them before adding them to the exports map. If a component instantiates an inner component and aliases a type export of that instance, it will cause wasmtime to panic with a failure to find the export in the exports map. The fix is to add a representation for exported types to the map that is simply ignored when encountered. This also makes it easier to track places where we would have to support type exports in translation in the future. * Keep type information for type exports. This commit keeps the type information for type exports so that types can be properly aliased from an instance export and thereby adjusting the type index space accordingly. * Add a simple test case for type exports for the component model.	2022-08-04 22:45:11 +00:00
Trevor Elliott	cd847d071d	x64: Migrate br_table to ISLE (#4615 ) https://github.com/bytecodealliance/wasmtime/pull/4615	2022-08-04 22:12:37 +00:00
Alex Crichton	b4d7ab36f9	Add a dataflow-based representation of components (#4597 ) * Add a dataflow-based representation of components This commit updates the inlining phase of compiling a component to creating a dataflow-based representation of a component instead of creating a final `Component` with a linear list of initializers. This dataflow graph is then linearized in a final step to create the actual final `Component`. The motivation for this commit stems primarily from my work implementing strings in fused adapters. In doing this my plan is to defer most low-level transcoding to the host itself rather than implementing that in the core wasm adapter modules. This means that small cranelift-generated trampolines will be used for adapter modules to call which then call "transcoding libcalls". The cranelift-generated trampolines will get raw pointers into linear memory and pass those to the libcall which core wasm doesn't have access to when passing arguments to an import. Implementing this with the previous representation of a `Component` was becoming too tricky to bear. The initialization of a transcoder needed to happen at just the right time: before the adapter module which needed it was instantiated but after the linear memories referenced had been extracted into the `VMComponentContext`. The difficulty here is further compounded by the current adapter module injection pass already being quite complicated. Adapter modules are already renumbering the index space of runtime instances and shuffling items around in the `GlobalInitializer` list. Perhaps the worst part of this was that memories could already be referenced by host function imports or exports to the host, and if adapters referenced the same memory it shouldn't be referenced twice in the component. This meant that `ExtractMemory` initializers ideally needed to be shuffled around in the initializer list to happen as early as possible instead of wherever they happened to show up during translation. Overall I did my best to implement the transcoders but everything always came up short. I have decided to throw my hands up in the air and try a completely different approach to this, namely the dataflow-based representation in this commit. This makes it much easier to edit the component after initial translation for injection of adapters, injection of transcoders, adding dependencies on possibly-already-existing items, etc. The adapter module partitioning pass in this commit was greatly simplified to something which I believe is functionally equivalent but is probably an order of magnitude easier to understand. The biggest downside of this representation I believe is having a duplicate representation of a component. The `component::info` was largely duplicated into the `component::dfg` module in this commit. Personally though I think this is a more appropriate tradeoff than before because it's very easy to reason about "convert representation A to B" code whereas it was very difficult to reason about shuffling around `GlobalInitializer` items in optimal fashions. This may also have a cost at compile-time in terms of shuffling data around, but my hope is that we have lots of other low-hanging fruit to optimize if it ever comes to that which allows keeping this easier-to-understand representation. Finally, to reiterate, the final representation of components is not changed by this PR. To the runtime internals everything is still the same. * Fix compile of factc	2022-08-04 15:42:06 -05:00
Ulrich Weigand	b17b1eb25d	[s390x, abi_impl] Add i128 support (#4598 ) This adds full i128 support to the s390x target, including new filetests and enabling the existing i128 runtest on s390x. The ABI requires that i128 is passed and returned via implicit pointer, but the front end still generates direct i128 types in call. This means we have to implement ABI support to implicitly convert i128 types to pointers when passing arguments. To do so, we add a new variant ABIArg::ImplicitArg. This acts like StructArg, except that the value type is the actual target type, not a pointer type. The required conversions have to be inserted in the prologue and at function call sites. Note that when dereferencing the implicit pointer in the prologue, we may require a temp register: the pointer may be passed on the stack so it needs to be loaded first, but the value register may be in the wrong class for pointer values. In this case, we use the "stack limit" register, which should be available at this point in the prologue. For return values, we use a mechanism similar to the one used for supporting multiple return values in the Wasmtime ABI. The only difference is that the hidden pointer to the return buffer must be the first, not last, argument in this case. (This implements the second half of issue #4565.)	2022-08-04 20:41:26 +00:00
Trevor Elliott	dc8362ceec	x64: Finish migrating brz and brnz to ISLE (#4614 ) https://github.com/bytecodealliance/wasmtime/pull/4614	2022-08-04 12:58:43 -07:00
Joel Dice	ed8908efcf	implement fuzzing for component types (#4537 ) This addresses #4307. For the static API we generate 100 arbitrary test cases at build time, each of which includes 0-5 parameter types, a result type, and a WAT fragment containing an imported function and an exported function. The exported function calls the imported function, which is implemented by the host. At runtime, the fuzz test selects a test case at random and feeds it zero or more sets of arbitrary parameters and results, checking that values which flow host-to-guest and guest-to-host make the transition unchanged. The fuzz test for the dynamic API follows a similar pattern, the only difference being that test cases are generated at runtime. Signed-off-by: Joel Dice <joel.dice@fermyon.com>	2022-08-04 12:02:55 -05:00
Teymour Aldridge	ad223c5234	Add `try_use_var` method to `cranelift-frontend`. (#4588 ) * Add `try_use_var` method to `cranelift-frontend`. - Unlike `use_var`, this method does not panic if the variable has not been defined before use * Add `try_declare_var` and `try_def_var`. - Also implement Error for error enums. * Use `write!` macro. * Add `write!` use I missed.	2022-08-04 16:19:15 +00:00
Trevor Elliott	1fc11bbe51	x64: Migrate brff and I128 branching instructions to ISLE (#4599 ) https://github.com/bytecodealliance/wasmtime/pull/4599	2022-08-04 08:58:50 -07:00
Damian Heaton	12a9705fbc	Port `Shuffle` to ISLE (AArch64) (#4596 ) * Port `Shuffle` to ISLE (AArch64) Ported the existing implementation of `Shuffle` for AArch64 to ISLE. Copyright (c) 2022 Arm Limited * Cleanup by shadowing `rn`, `rn2`, and `_` Copyright (c) 2022 Arm Limited	2022-08-04 08:43:23 -07:00
Nick Fitzgerald	70ce288dc7	Save exit Wasm FP and PC in component-to-host trampolines (#4601 ) * Wasmtime: Add a pointer to `VMRuntimeLimits` in component contexts * Save exit Wasm FP and PC in component-to-host trampolines Fixes #4535 * Add comment about why we deref the trampoline's FP * Update some tests to use new `vmruntime_limits_*` methods	2022-08-04 10:27:30 -05:00
Jamey Sharp	f69acd6187	Upgrade regalloc2 -> 0.3.2 (#4603 ) Includes a modest improvement in memory usage and performance by removing analysis that was only used during fuzzing.	2022-08-04 00:06:13 +00:00
Trevor Elliott	301be7438e	x64: Begin migrating branch instructions to ISLE (#4587 ) https://github.com/bytecodealliance/wasmtime/pull/4587	2022-08-03 20:28:52 +00:00
Ulrich Weigand	b9dd48e34b	[s390x, abi_impl] Support struct args using explicit pointers (#4585 ) This adds support for StructArgument on s390x. The ABI for this platform requires that the address of the buffer holding the copy of the struct argument is passed from caller to callee as hidden pointer, using a register or overflow stack slot. To implement this, I've added an optional "pointer" filed to ABIArg::StructArg, and code to handle the pointer both in common abi_impl code and the s390x back-end. One notable change necessary to make this work involved the "copy_to_arg_order" mechanism. Currently, for struct args we only need to copy the data (and that need to happen before setting up any other args), while for non-struct args we only need to set up the appropriate registers or stack slots. This order is ensured by sorting the arguments appropriately into a "copy_to_arg_order" list. However, for struct args with explicit pointers we need to both copy the data (again, before everything else), and set up a register or stack slot. Since we now need to touch the argument twice, we cannot solve the ordering problem by a simple sort. Instead, the abi_impl common code now provided two callbacks, emit_copy_regs_to_buffer and emit_copy_regs_to_arg, and expects the back end to first call copy..to_buffer for all args, and then call copy.._to_arg for all args. This required updates to all back ends. In the s390x back end, in addition to the new ABI code, I'm now adding code to actually copy the struct data, using the MVC instruction (for small buffers) or a memcpy libcall (for larger buffers). This also requires a bit of new infrastructure: - MVC is the first memory-to-memory instruction we use, which needed a bit of memory argument tweaking - We also need to set up the infrastructure to emit libcalls. (This implements the first half of issue #4565.)	2022-08-03 19:00:07 +00:00
Anton Kirilov	a897742593	Initial back-edge CFI implementation (#3606 ) Give the user the option to sign and to authenticate function return addresses with the operations introduced by the Pointer Authentication extension to the Arm instruction set architecture. Copyright (c) 2021, Arm Limited.	2022-08-03 11:08:29 -07:00
Afonso Bordado	709716bb8e	cranelift: Implement scalar FMA on x86 (#4460 ) x86 does not have dedicated instructions for scalar FMA, lower to a libcall which seems to be what llvm does.	2022-08-03 10:29:10 -07:00
Alex Crichton	ff6082c0af	Improve readability of memory64 compat in `fact` (#4581 ) This commit aims to improve the readability of supporting the memory64 proposal in the `fact` adapter trampoline compiler. Previously there were a few sprinkled blocks that used `if` to generate different instructions inline, but as I've worked on support for strings this has become pretty unwieldy as strings do far more memory manipulation than other type conversions. A pattern that's easier to read is to have small instruction helpers that take the pointer width as an argument and internally dispatch to the correct instruction. This keeps the main translation code branch-free and a bit easier to follow. Additionally for more complicated branching logic it allows for deduplicating the main translation path by having lots of little branches instead of one large branch with everything duplicated on both halves.	2022-08-03 17:12:00 +00:00
Alex Crichton	9f82644cc3	Some minor cleanups/refactorings in components (#4582 ) This is a collection of some minor renamings, refactorings, sharing of code, etc. This was all discovered during my addition of string support to adapter functions and I figured it'd be best to frontload this and land it ahead of the full patch since it's getting complex.	2022-08-03 11:21:55 -05:00
Alex Crichton	0a6baeddf4	Improve some support in `factc`: (#4580 ) * Support CLI parameters for string encoding * Fix `--skip-validate` * Fix printing binary to stdout	2022-08-03 11:21:30 -05:00
Alex Crichton	f587b10eb9	Reduce wasm invocations in the `stacks` fuzzer (#4595 ) On oss-fuzz a test case has been found that executes 30k iterations of a wasm trap which with a 60s timeout leaves 2ms for each invocation which under fuzzing instrumentation is a bit of a stretch with a ~20x slowdown. This commit places a limit on the number of inputs to the fuzzer at 200 to keep it reasonably sized.	2022-08-03 16:08:36 +00:00
Nick Fitzgerald	55215bbd1e	Use a `SmallVec` for `ABIArgSlot`s (#4586 ) These are always length 1 for Wasm benchmarks. <h3>Sightglass Benchmark Results</h3> ``` compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm Δ = 328624015.86 ± 40274677.93 (confidence = 99%) main.so is 0.88x to 0.91x faster than slots-smallvec.so! slots-smallvec.so is 1.10x to 1.13x faster than main.so! [3070752447 3203778792.55 3446269274] main.so [2503544039 2875154776.69 3197966713] slots-smallvec.so compilation :: nanoseconds :: benchmarks/pulldown-cmark/benchmark.wasm Δ = 9685705.06 ± 3221286.87 (confidence = 99%) main.so is 0.91x to 0.96x faster than slots-smallvec.so! slots-smallvec.so is 1.05x to 1.09x faster than main.so! [129356493 145594942.79 165038803] main.so [118555011 135909237.73 188780619] slots-smallvec.so compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm No difference in performance. [79080493 86757564.46 112649639] main.so [78083384 85934125.69 94992743] slots-smallvec.so ```	2022-08-02 17:40:36 -07:00
Nick Fitzgerald	ab1cf3df2d	Use a `SmallVec` for `ABIArg`s (#4584 ) Instead of a regular `Vec`. These vectors are usually very small, for example here is the histogram of sizes when running Sightglass's `pulldown-cmark` benchmark: ``` ;; Number of samples = 10332 ;; Min = 0 ;; Max = 11 ;; ;; Mean = 2.496128532713901 ;; Standard deviation = 2.2859559855427243 ;; Variance = 5.225594767838607 ;; ;; Each ∎ is a count of 62 ;; 0 .. 1 [ 3134 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 1 .. 2 [ 2032 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 2 .. 3 [ 159 ]: ∎∎ 3 .. 4 [ 838 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎ 4 .. 5 [ 970 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 5 .. 6 [ 2566 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 6 .. 7 [ 303 ]: ∎∎∎∎ 7 .. 8 [ 272 ]: ∎∎∎∎ 8 .. 9 [ 40 ]: 9 .. 10 [ 18 ]: ``` By using a `SmallVec` with capacity of 6 we avoid the vast majority of heap allocations and get some nice benchmark wins of up to ~1.11x faster compilation. <h3>Sightglass Benchmark Results</h3> ``` compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm Δ = 340361395.90 ± 63384608.15 (confidence = 99%) main.so is 0.88x to 0.92x faster than smallvec.so! smallvec.so is 1.09x to 1.13x faster than main.so! [3101467423 3425524333.41 4060621653] main.so [2820915877 3085162937.51 3375167352] smallvec.so compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm Δ = 988446098.59 ± 184075718.89 (confidence = 99%) main.so is 0.88x to 0.92x faster than smallvec.so! smallvec.so is 1.09x to 1.13x faster than main.so! [9006994951 9948091070.66 11792481990] main.so [8192243090 8959644972.07 9801848982] smallvec.so compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm Δ = 7854567.87 ± 2215491.16 (confidence = 99%) main.so is 0.89x to 0.94x faster than smallvec.so! smallvec.so is 1.07x to 1.12x faster than main.so! [80354527 93864666.76 119789198] main.so [77554917 86010098.89 94726994] smallvec.so compilation :: cycles :: benchmarks/bz2/benchmark.wasm Δ = 22810509.85 ± 6434024.63 (confidence = 99%) main.so is 0.89x to 0.94x faster than smallvec.so! smallvec.so is 1.07x to 1.12x faster than main.so! [233358190 272593088.57 347880715] main.so [225227821 249782578.72 275097380] smallvec.so compilation :: nanoseconds :: benchmarks/pulldown-cmark/benchmark.wasm Δ = 10849521.41 ± 4324757.85 (confidence = 99%) main.so is 0.90x to 0.96x faster than smallvec.so! smallvec.so is 1.04x to 1.10x faster than main.so! [133875427 156859544.47 222455440] main.so [126073854 146010023.06 181611647] smallvec.so compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm Δ = 31508176.97 ± 12559561.91 (confidence = 99%) main.so is 0.90x to 0.96x faster than smallvec.so! smallvec.so is 1.04x to 1.10x faster than main.so! [388788638 455536988.31 646034523] main.so [366132033 424028811.34 527419755] smallvec.so ```	2022-08-02 15:53:44 -07:00
Nick Fitzgerald	edf7f9f2bb	wasmtime: Add lots of logging for `externref`s and `table_ops` fuzz target (#4583 ) I essentially add these same logs back in every time I'm debugging something related to this fuzz target or `externref`s in general. Probably like 5 times I've added roughly these logs. We should just make them available whenever we need them via `RUST_LOG=wasmtime_runtime=trace`. This also changes a couple `if let`s to `unwrap`s that are now infallible after	2022-08-02 15:06:44 -07:00
Nick Fitzgerald	42bba452a6	Cranelift: Add instructions for getting the current stack/frame/return pointers (#4573 ) * Cranelift: Add instructions for getting the current stack/frame pointers and return address This is the initial part of https://github.com/bytecodealliance/wasmtime/issues/4535 * x64: Remove `Amode::RbpOffset` and use `Amode::ImmReg` instead We just special case getting operands from `Amode`s now. * Fix s390x `get_return_address`; require `preserve_frame_pointers=true` * Assert that `Amode::ImmRegRegShift` doesn't use rbp/rsp * Handle non-allocatable registers in Amode::with_allocs * Use "stack" instead of "r15" on s390x * r14 is an allocatable register on s390x, so it shouldn't be used with `MovPReg`	2022-08-02 14:37:17 -07:00
Ulrich Weigand	6b4e6523f7	[abi_impl] Respect extension for incoming stack arguments (#4576 ) The gen_copy_arg_to_regs routine currently ignores argument extension flags when loading incoming arguments. This causes a problem with stack arguments on big-endian systems, since the argument address points to the word on the stack as extended by the caller, but the generated code only loads the inner type from the address, causing it to receive an incorrect value. (This happens to work on little- endian systems.) Fixed by loading extended arguments as full words.	2022-08-02 13:54:13 -07:00
Alex Crichton	ee5b192d35	Re-enable component model `.wast` tests (#4577 ) Re-enable component model `.wast` tests These accidentally stopped running as part of #4556 on CI since I forgot one more location to touch a feature gate. Enable logging in component tests This is a small convenience to get log messages during testing for components by default.	2022-08-02 15:43:33 -05:00
Peter Huene	43125aa994	components: fix trampoline compilation for lists. (#4579 ) This commit fixes trampoline compilation for lists where the loop condition would only branch if the amount remaining was 0 instead of not 0. It resulted in only the first element of the list being copied.	2022-08-02 20:28:43 +00:00
Chris Fallin	8dddd6f1f7	Cranelift: Remove `ifcmp_sp` opcode. (#4578 ) This was temporarily added back in #3502 due to a need from Lucet; now that Lucet is EOL, the opcode is no longer needed and we can remove it.	2022-08-02 13:15:39 -07:00
Chris Fallin	43f1765272	Cranellift: remove Baldrdash support and related features. (#4571 ) * Cranellift: remove Baldrdash support and related features. As noted in Mozilla's bugzilla bug 1781425 [1], the SpiderMonkey team has recently determined that their current form of integration with Cranelift is too hard to maintain, and they have chosen to remove it from their codebase. If and when they decide to build updated support for Cranelift, they will adopt different approaches to several details of the integration. In the meantime, after discussion with the SpiderMonkey folks, they agree that it makes sense to remove the bits of Cranelift that exist to support the integration ("Baldrdash"), as they will not need them. Many of these bits are difficult-to-maintain special cases that are not actually tested in Cranelift proper: for example, the Baldrdash integration required Cranelift to emit function bodies without prologues/epilogues, and instead communicate very precise information about the expected frame size and layout, then stitched together something post-facto. This was brittle and caused a lot of incidental complexity ("fallthrough returns", the resulting special logic in block-ordering); this is just one example. As another example, one particular Baldrdash ABI variant processed stack args in reverse order, so our ABI code had to support both traversal orders. We had a number of other Baldrdash-specific settings as well that did various special things. This PR removes Baldrdash ABI support, the `fallthrough_return` instruction, and pulls some threads to remove now-unused bits as a result of those two, with the understanding that the SpiderMonkey folks will build new functionality as needed in the future and we can perhaps find cleaner abstractions to make it all work. [1] https://bugzilla.mozilla.org/show_bug.cgi?id=1781425 * Review feedback. * Fix (?) DWARF debug tests: add `--disable-cache` to wasmtime invocations. The debugger tests invoke `wasmtime` from within each test case under the control of a debugger (gdb or lldb). Some of these tests started to inexplicably fail in CI with unrelated changes, and the failures were only inconsistently reproducible locally. It seems to be cache related: if we disable cached compilation on the nested `wasmtime` invocations, the tests consistently pass. * Review feedback.	2022-08-02 19:37:56 +00:00
Benjamin Bouvier	ff37c9d8a4	[cranelift] Rejigger the `compile` API (#4540 ) * Move `emit_to_memory` to `MachCompileResult` This small refactoring makes it clearer to me that emitting to memory doesn't require anything else from the compilation `Context`. While it's a trivial change, it's a small public API change that shouldn't cause too much trouble, and doesn't seem RFC-worthy. Happy to hear different opinions about this, though! * hide the MachCompileResult behind a method * Add a `CompileError` wrapper type that references a `Function` * Rename MachCompileResult to CompiledCode * Additionally remove the last unsafe API in cranelift-codegen	2022-08-02 12:05:40 -07:00
Sam Parker	37cd96beff	[AArch64] i64x2 support for min/max (#4575 ) Also added interpreter support for vector min/max. Copyright (c) 2022, Arm Limited.	2022-08-02 11:42:05 -07:00
Nick Fitzgerald	c77bec4dcb	Cranelift: don't `emit` inside lowering rules for aarch64 (#4572 ) * Cranelift: Don't `emit` inside lowering rules in aarch64 The lowering rules should be "pure" and side-effect free, using helpers defined in `inst.isle` to perform actual side effects like emitting instructions. * Cranelift: use 80 width for section separators in aarch64 lowering rules	2022-08-01 16:43:42 -07:00
Alex Crichton	fb59de15af	Implement fused adapters for `(list T)` types (#4558 ) * Implement fused adapters for `(list T)` types This commit implements one of the two remaining types for adapter fusion, lists. This implementation is particularly tricky for a number of reasons: * Lists have a number of validity checks which need to be carefully implemented. For example the byte length of the list passed to allocation in the destination module could overflow the 32-bit index space. Additionally lists in 32-bit memories need a check that their final address is in-bounds in the address space. * In the effort to go ahead and support memory64 at the lowest layers this is where much of the magic happens. Lists are naturally always stored in memory and shifting between 64/32-bit address spaces is done here. This notably required plumbing an `Options` around during flattening/size/alignment calculations due to the size/types of lists changing depending on the memory configuration. I've also added a small `factc` program in this commit which should hopefully assist in exploring and debugging adapter modules. This takes as input a component (text or binary format) and then generates an adapter module for all component function signatures found internally. This commit notably does not include tests for lists. I tried to figure out a good way to add these but I felt like there were too many cases to test and the tests would otherwise be extremely verbose. Instead I think the best testing strategy for this commit will be through #4537 which should be relatively extensible to testing adapters between modules in addition to host-based lifting/lowering. * Improve handling of lists of 0-size types * Skip overflow checks on byte sizes for 0-size types * Skip the copy loop entirely when src/dst are both 0 * Skip the increments of src/dst pointers if either is 0-size * Update semantics for zero-sized lists/strings When a list/string has a 0-byte-size the base pointer is no longer verified to be in-bounds to match the supposedly desired adapter semantics where no trap happens because no turn of the loop happens.	2022-08-01 17:02:08 -05:00

1 2 3 4 5 ...

10173 Commits