The release process failed last night due to me filling out the dates in
the release notes early (rather than leaving "Unreleased") which mean
there were no changes for each commit. Switch to passing `--allow-empty`
when making a commit to prevent this.
* Memoize `can_optimize_var_lookup`
`can_optimize_var_lookup` can have quadratic behavior if there is a chain
of blocks each containing a `local.get` instruction because each run can
walk up the entire chain. This change memoizes the results of
`can_optimize_var_lookup` so that we can stop following the chain of
predecessors when we hit a block that has previously been handled
(making the operation linear again).
This commit limits the maximum number of linear memories when the
pooling allocator is used to ensure that the virtual memory mapping for
the pooling allocator itself can succeed. Currently there are a number
of crashes in the differential fuzzer where the pooling allocator can't
allocate its mapping because the maximum specified number of linear
memories times the number of instances exceeds the address space
presumably.
* Optimize the WASI `random_get` implementation.
Use `StdRng` instead of the `OsRng` in the default implementation of
`random_get`. This uses a userspace CSPRNG, making `random_get` 3x faster
in simple benchmarks.
* Update cargo-vet audits for cap-std 0.25.3.
* Update all cap-std packages to 0.25.3.
* Adds a github action to support x64 performance testing using a sightglass
This github action allows performance testing using sightglass. The
action is triggered either via a workflow dispatch or with the comment
'/bench_x64', in a pull request. Once triggered the action will send
a request to a private repository that supports using a self-hosted runner
to do comparisons of "refs/feature/commit" vs "refs/heads/main" for
wasmtime. If the action is triggered via a comment in a pull request
(with '/bench_x64') then the commit referenced by the pull request is used
for the comparison against refs/head/main. If triggered via a workflow
dispatch the interface will request the commit to compare against
refs/head/main. The results of the performance tests, run via sightglass,
will be a table showing a percentage change in clock ticks in various stages
requried for executing the benchmark, namely instantiate, compiliation,
and execution. This patch is intended to be just a starting patch with much
to tweak and improve. One of the TODOs will be adding support for aarch64
.. currently this patch supports only x64. Note also that the logic for
actually doing the comparison and parsing the results occurs with the action
associated with the private repo and so this patch itself (though the trigger)
is fairly straight forward.
* Refactor patch to consolidate all steps to here.
* Remove unused code
* Remvoes unused pull_request_review_comment trigger
* Match trigger word when contained anywhere in the pull request review message
* Remove redundant repo and ref variables for wasmtime_commit
* Minor comment update
* Remove command to install jq
* Remove printing of git config variables being used
* Fix token for posting results
* Update message explaining pct_change for benchmark results
* Revert TOKEN for publsh change
* Update message explaining results
* cranelift: Test Forward branching
* fuzzgen: Separate terminators
* fuzzgen: Avoid generating jumptables if we have no valid targets
* fuzzgen: Forward Jump Tables
* fuzzgen: Cleanup some feedback
Thanks @jameysharp!
* fuzzgen: Cleanup block generation
Thanks @jameysharp!
* fuzzgen: Style Cleanups
These were accidentally reverted in a rebase
* fuzzgen: Prevent block0 from being targeted for branches
* fuzzgen: Add jump tables sorting TODO
* fuzzgen: Disable verifier after NaN Canonicalization
We are currently running the verifier twice, once after the nan canonicalization pass, and again when JIT compiling the code.
The verifier first runs in the NaN Canonicalization pass. If it fails it prevents us from getting a nice `cargo fuzz fmt` test case.
So disable the verifier there, but ensure its enabled when JIT compiling.
* fuzzgen: Force enable verifier in cranelift-icache
This is already the default, but since we no longer run the verifier in `fuzzgen` its important to ensure that it runs in the fuzz targets.
* Port `icmp` to ISLE (AArch64)
Ported the existing implementation of `icmp` (and, by extension, the
`lower_icmp` function) to ISLE for AArch64.
Copyright (c) 2022 Arm Limited
* Allow 'producer chains', eliminating `Nop0`s
Copyright (c) 2022 Arm Limited
* fuzz: improve the API of the `wasm-spec-interpreter` crate
This change addresses key parts of #4852 by improving the bindings to
the OCaml spec interpreter. The new API allows users to `instantiate` a
module, `interpret` named functions on that instance, and `export`
globals and memories from that instance. This currently leaves the
existing implementation ("instantiate and interpret the first function in
a module") present under a new name: `interpret_legacy`.
* fuzz: adapt the differential spec engine to the new API
This removes the legacy uses in the differential spec engine, replacing
them with the new `instantiate`-`interpret`-`export` API from the
`wasm-spec-interpreter` crate.
* fix: make instance access thread-safe
This changes the OCaml-side definition of the instance so that each
instance carries round a reference to a "global store" that's specific
to that instantiation. Because everything is updated by reference there
should be no visible behavioural change on the Rust side, apart from
everything suddenly being thread-safe (modulo the fact that access to
the OCaml runtime still needs to be locked). This fix will need to be
generalised slightly in future if we want to allow multiple modules to
be instantiated in the same store.
Co-authored-by: conrad-watt <cnrdwtt@gmail.com>
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
Removes the function_alignment field from ObjectBuilder and ObjectModule. Alignment information is now provided either by the Module trait for minimum function alignment requirements, or on FunctionInfo for fucntion specific alignment requirements.
* s390x: update some regalloc metadata to remove use of `reg_mod`.
This is a step toward ultimately removing modify-operands, which along
with removal of pinned vregs, lets us move to a completely
constraint-based and fully-SSA regalloc input and get some nice
advantages eventually.
There are still a few uses of `mod` operands and pinned vregs remaining,
especially around the "regpair" abstraction. Those proved to be a bit
trickier to update though, so will have to be done separately.
* Review feedback: restore two-arg pretty-print form.
* Review feedback.
* ABI: implement register arguments with constraints.
Currently, Cranelift's ABI code emits a sequence of moves from physical
registers into vregs at the top of the function body, one for every
register-carried argument.
For a number of reasons, we want to move to operand constraints instead,
and remove the use of explicitly-named "pinned vregs"; this allows for
better regalloc in theory, as it removes the need to "reverse-engineer"
the sequence of moves.
This PR alters the ABI code so that it generates a single "args"
pseudo-instruction as the first instruction in the function body. This
pseudo-inst defs all register arguments, and constrains them to the
appropriate registers at the def-point. Subsequently the regalloc can
move them wherever it needs to.
Some care was taken not to have this pseudo-inst show up in
post-regalloc disassemblies, but the change did cause a general regalloc
"shift" in many tests, so the precise-output updates are a bit noisy.
Sorry about that!
A subsequent PR will handle the other half of the ABI code, namely, the
callsite case, with a similar preg-to-constraint conversion.
* Update based on review feedback.
* Review feedback.
Previously, Cranelift panicked (via a a panic in regalloc2) when the
virtual-register limit of 2M (2^21) was reached. This resulted in a
perplexing and unhelpful failure when the user provided a too-large
input (such as the Wasm module in #4865).
This PR adds an explicit check when allocating vregs that fails with a
"code too large" error when the limit is hit, producing output such as
(on the minimized testcase from #4865):
```
Error: failed to compile wasm function 3785 at offset 0xa3f3
Caused by:
Compilation error: Code for function is too large
```
Fixes#4865.
* Initial forward-edge CFI implementation
Give the user the option to start all basic blocks that are targets
of indirect branches with the BTI instruction introduced by the
Branch Target Identification extension to the Arm instruction set
architecture.
Copyright (c) 2022, Arm Limited.
* Refactor `from_artifacts` to avoid second `make_executable` (#1)
This involves "parsing" twice but this is parsing just the header of an
ELF file so it's not a very intensive operation and should be ok to do
twice.
* Address the code review feedback
Copyright (c) 2022, Arm Limited.
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
Using fallible extractors that produce no values for flag checks means
that it's not possible to pattern match cases where those flags are
false. This change reworks the existing flag-checking extractors to be
infallible, returning the flag's boolean value from the context instead.
This is a cherry-pick of a long-ago commit, 2d46637. The original
message reads:
> Now that `SyntheticAmode` can refer to constants, there is no longer a
> need for a separate instruction format--standard load instructions will
> work.
Since then, the transition to ISLE and the use of `XmmLoadConst` in many
more places makes this change a larger diff than the original. The basic
idea is the same, though: the extra indirection of `Inst::XMmLoadConst`
is removed and replaced by a direct use of `VCodeConstant` as a
`SyntheticAmode`. This has no effect on codegen, but the CLIF output is
now clearer in that the actual instruction is displayed (e.g., `movdqu`)
instead of a made-up instruction (`load_const`).
* Improve panic message if typevar_operand is None
* cranelift-fuzzgen: Don't allocate for each choice
I don't think the performance of test-case generation is at all
important here. I'm actually doing this in preparation for a bigger
refactor where I want to be able to borrow the list of valid choices for
a given opcode without worrying about lifetimes.
* cranelift-fuzzgen: Remove next_func_index
It's only used locally within `generate_funcrefs`, so it doesn't need to
be in the FunctionBuilder struct.
Also there's already a local counter that I think is good enough for
this. As far as I know, the function indexes only need to be distinct,
not contiguous.
* cranelift-fuzzgen: Separate resources from config
The function-global variables, blocks, etc that are generated before
generating instructions are all owned collections without any lifetime
parameters. By contrast, the Unstructured and Config are both borrowed.
Separating them will make it easier to borrow from the owned resources.
The previous implementation assumed that nothing had clobbered the
LR register since the current function had started executing, so
it would be incorrect for a non-leaf function, for example, that
contains the `get_return_address` operation right after a call.
The operation is valid only if the `preserve_frame_pointers` flag
is enabled, which implies that the presence of a frame record on
the stack is guaranteed.
Copyright (c) 2022, Arm Limited.
* cranelift: Remove of/nof overflow flags from icmp
Neither Wasmtime nor cg-clif use these flags under any circumstances.
From discussion on #3060 I see it's long been unclear what purpose these
flags served.
Fixes#3060, fixes#4406, and fixes #4875... by deleting all the code
that could have been buggy.
This changes the cranelift-fuzzgen input format by removing some IntCC
options, so I've gone ahead and enabled I128 icmp tests at the same
time. Since only the of/nof cases were failing before, I expect these to
work.
* Restore trapif tests
It's still useful to validate that iadd_ifcout's iflags result can be
forwarded correctly to trapif, and for that purpose it doesn't really
matter what condition code is checked.
* [fuzz] Add SIMD to single-instruction generator
This change extends the single-instruction generator with most of the
SIMD instructions. Examples of instructions that were excluded are: all
memory-related instructions, any instruction with an immediate.
* [fuzz] Generate V128s with known values from each type
To better cover the fuzzing search space, `DiffValue` will generate
better known values for the `V128` type. First, it uses arbitrary data
to select a sub-type (e.g., `I8x16`, `F32x4`, etc.) and then it fills in
the bytes by generating biased values for each of the lanes.
* [fuzz] Canonicalize NaN values in SIMD lanes
This change ports the NaN canonicalization logic from `wasm-smith`
([here]) to the single-instruction generator.
[here]: https://github.com/bytecodealliance/wasm-tools/blob/6c127a6/crates/wasm-smith/src/core/code_builder.rs#L927
This commit replaces #4869 and represents the actual version bump that
should have happened had I remembered to bump the in-tree version of
Wasmtime to 1.0.0 prior to the branch-cut date. Alas!
This commit hard-codes the pooling allocator's limit of linear memories
to 1 when used with fuzzing the spec tests themselves. This prevents the
number from being set too high and hitting a virtual-memory-based OOM
due to the virtual memory reservation of the pooling allocator being too
large.
* Throw out fewer fuzz inputs with differential fuzzer
Prior to this commit the differential fuzzer would generate a module and
then select an engine to execute the module against Wasmtime. This
meant, however, that the candidate list of engines were filtered against
the configuration used to generate the module to ensure that the
selected engine could run the generated module.
This commit inverts this logic and instead selects an engine first,
allowing the engine to then tweak the module configuration to ensure
that the generated module is compatible with the engine selected. This
means that fewer fuzz inputs are discarded because every fuzz input will
result in an engine being executed.
Internally the engine constructors have all been updated to update the
configuration to work instead of filtering the configuration. Some other
fixes were applied for the spec interpreter as well to work around #4852
* Fix tests
* cranelift-codegen: Remove all uses of DataValue
This type is only used by the interpreter, cranelift-fuzzgen, and
filetests. I haven't found another convenient crate for those to all
depend on where this type can live instead, but this small refactor at
least makes it obvious that code generation does not in any way depend
on the implementation of this type.
* Make DataValue, not Ieee32/64, respect IEEE754
This fixes#4857 by partially reverting #4849.
It turns out that Ieee32 and Ieee64 need bitwise equality semantics so
they can be used as hash-table keys.
Moving the IEEE754 semantics up a layer to DataValue makes sense in
conjunction with #4855, where we introduced a DataValue::bitwise_eq
alternative implementation of equality for those cases where users of
DataValue still want the bitwise equality semantics.
* cranelift-interpreter: Use eq/ord from DataValue
This fixes#4828, again, now that the comparison operators on DataValue
have the right IEEE754 semantics.
* Add regression test from issue #4857
* Improve wasmi differential fuzzer
* Support modules with a `start` function
* Implement trap-matching to ensure that wasmi and Wasmtime both report
the same flavor of trap.
* Support differential fuzzing where no engines match
Locally I was attempting to run against just one wasm engine with
`ALLOWED_ENGINES=wasmi` but the fuzzer quickly panicked because the
generated test case didn't match wasmi's configuration. This commit
updates engine-selection in the differential fuzzer to return `None` if
no engine is applicable, throwing out the test case. This won't be hit
at all with oss-fuzz-based runs but for local runs it'll be useful to
have.
* Improve proposal support in differential fuzzer
* De-prioritize unstable wasm proposals such as multi-memory and
memory64 by making them more unlikely with `Unstructured::ratio`.
* Allow fuzzing multi-table (reference types) and multi-memory by
avoiding setting their maximums to 1 in `set_differential_config`.
* Update selection of the pooling strategy to unconditionally support
the selected module config rather than the other way around.
* Improve handling of traps in differential fuzzing
This commit fixes an issue found via local fuzzing where engines were
reporting different results but the underlying reason for this was that
one engine was hitting stack overflow before the other. To fix the
underlying issue I updated the execution to check for stack overflow
and, if hit, it discards the entire fuzz test case from then on.
The rationale behind this is that each engine can have unique limits for
stack overflow. One test case I was looking at for example would stack
overflow at less than 1000 frames with epoch interruption enabled but
would stack overflow at more than 1000 frames with it disabled. This
means that the state after the trap started to diverge and it looked
like the engines produced different results.
While I was at it I also improved the "function call returned a trap"
case to compare traps to make sure the same trap reason popped out.
* Fix fuzzer tests
I noticed an oss-fuzz-based timeout that was reported for the
`component_api` fuzzer where the adapter module generated takes 1.5
seconds to compile the singular function in release mode (no fuzzing
enabled). The test case in question was a deeply recursive
list-of-list-of-etc and only one function was generated instead of
multiple. I updated the cost of strings/lists to cost more in the
approximate cost calculation which now forces the one giant function to
get split up and the large function is now split up into multiple
smaller function that take milliseconds to compile.
* cranelift: Add `fcmp` tests
Some of these are disabled on aarch64 due to not being implemented yet.
* cranelift: Implement float PartialEq for Ieee{32,64} (fixes#4828)
Previously `PartialEq` was auto derived. This means that it was implemented in terms of PartialEq in a u32.
This is not correct for floats because `NaN != NaN`.
PartialOrd was manually implemented in 6d50099816, but it seems like it was an oversight to leave PartialEq out until now.
The test suite depends on the previous behaviour so we adjust it to keep comparing bits instead of floats.
* cranelift: Disable `fcmp ord` tests on aarch64
* cranelift: Disable `fcmp ueq` tests on aarch64
Previously the implementations of the various atomic memory IR operations
ignored the memory operation flags that were passed.
Copyright (c) 2022, Arm Limited.
Co-authored-by: Chris Fallin <chris@cfallin.org>
This slipped through the regalloc2 operand code update in #4811: the
CvtFloatToUintSeq pseudo-instruction actually clobbers its source. It
was marked as a "mod" operand in the original and I mistakenly
converted it to a "use" as I had not seen the actual clobber. The
instruction now takes an extra temp and makes a copy of `src` in the
appropriate place.
Fixes#4840.
This PR removes all uses of modify-operands in the aarch64 backend,
replacing them with reused-input operands instead. This has the nice
effect of removing a bunch of move instructions and more clearly
representing inputs and outputs.
This PR also removes the explicit use of pinned vregs in the aarch64
backend, instead using fixed-register constraints on the operands when
insts or pseudo-inst sequences require certain registers.
This is the second PR in the regalloc-semantics cleanup series; after
the remaining backend (s390x) and the ABI code are cleaned up as well,
we'll be able to simplify the regalloc2 frontend.
* x64: improve tests for `heap_addr`
This change adds Cranelift `compile` tests for the various cases for
`heap_addr`. The idea behind this is to more clearly show what the
penalties are for dynamically- vs statically-allocated memory as well as
turning Spectre mitigations on and off.
* Add test case: "right" size memory with Spectre enabled