10593 Commits

Author SHA1 Message Date
Chris Fallin
1aaea279e5 egraph opts: fix uextend-of-i32. (#5061)
This is a simple error in the const-prop rules: uextend was not
masking iconst's u64 immediate when extending from i32 to
i64. Arguably an iconst.i32 should not have nonzero bits in the upper
32 of its immediate, but that's a separate design question. For now,
if our invariant is that the upper bits are ignored, then it is
required to mask the bits when const-evaling a `uextend`.
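
A minimal sketch of the masking described above, assuming the immediate is held as a raw `u64`. The helper name `const_eval_uextend` and its signature are hypothetical and are not the actual egraph rule:

```rust
/// Hypothetical helper (not Cranelift's real rule): constant-evaluate a
/// `uextend` of an `iconst` whose source type is `from_bits` wide.
fn const_eval_uextend(imm: u64, from_bits: u32) -> u64 {
    if from_bits >= 64 {
        // Nothing to mask: the source already fills the whole immediate.
        imm
    } else {
        // Drop any stale upper bits before treating the value as wider.
        imm & ((1u64 << from_bits) - 1)
    }
}
```

With this shape, an `iconst.i32` immediate that carries garbage in its upper 32 bits still zero-extends to the value of its low 32 bits.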

Fixes #5047.
2022-10-17 12:45:49 -07:00
Alex Crichton
ff0c45b4a0 Minor changes for components related to wit-bindgen support (#5053)
* Plumb type exports in components around more

This commit adds some more plumbing for type exports to ensure that they
show up in the final compiled representation of a component. For now
they continue to be ignored for all purposes in the embedding API
itself but I found this useful to explore in `wit-bindgen` based tooling
which is leveraging the component parsing in Wasmtime.

* Add a field to `ModuleTranslation` to store the original wasm

This commit adds a field to be able to refer back to the original wasm
binary for a `ModuleTranslation`. This field is used in the upcoming
support for host generation in `wit-component` to "decompile" a
component into core wasm modules to get instantiated. This is used to
extract a core wasm module from the original component.

* Fix a build warning
2022-10-13 12:11:34 -05:00
Nick Fitzgerald
a2f846f124 Don't re-capture backtraces when propagating traps through host frames (#5049)
* Add a benchmark for traps with many Wasm<-->host calls on the stack

* Add a test for expected Wasm stack traces with Wasm<-->host calls on the stack when we trap

* Don't re-capture backtraces when propagating traps through host frames

This fixes some accidentally quadratic code where we would re-capture a Wasm
stack trace (takes `O(n)` time) every time we propagated a trap through a host
frame back to Wasm (can happen `O(n)` times). And `O(n) * O(n) = O(n^2)`, of
course. Whoops. After this commit, trapping with a call stack that is `n`
frames deep of Wasm-to-host-to-Wasm calls just captures a single backtrace and
is therefore just a proper `O(n)` time operation, as it is intended to be.

Now we explicitly track whether we need to capture a Wasm backtrace or not when
raising a trap. This unfortunately isn't as straightforward as one might hope,
however, because of the split between `wasmtime::Trap` and
`wasmtime_runtime::Trap`. We need to decide whether or not to capture a Wasm
backtrace inside `wasmtime_runtime` but in order to determine whether to do that
or not we need to reflect on the `anyhow::Error` and see if it is a
`wasmtime::Trap` that already has a backtrace or not. This can't be done the
straightforward way because it would introduce a cyclic dependency between the
`wasmtime` and `wasmtime-runtime` crates. We can't merge those two `Trap`
types-- at least not without effectively merging the whole `wasmtime` and
`wasmtime-runtime` crates together, which would be a good idea in a perfect
world but would be a *ton* of ocean boiling from where we currently are --
because `wasmtime::Trap` does symbolication of stack traces which relies on
module registration information data that resides inside the `wasmtime` crate
and therefore can't be moved into `wasmtime-runtime`. We resolve this problem by
adding a boolean to `wasmtime_runtime::raise_user_trap` that controls whether we
should capture a Wasm backtrace or not, and then determine whether we need a
backtrace or not at each of that function's call sites, which are in `wasmtime`
and therefore can do the reflection to determine whether the user trap already
has a backtrace or not. Phew!
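
A rough sketch of the idea, using stand-in types rather than the real `wasmtime::Trap` and `wasmtime_runtime` APIs. `Trap`, `Runtime`, and `propagate_through_host_frames` below are all hypothetical; only the "caller decides `needs_backtrace`" shape is taken from the description above:

```rust
// Hypothetical stand-ins; these are not the real wasmtime types.
#[derive(Debug, Default)]
struct Trap {
    backtrace: Option<Vec<&'static str>>,
}

struct Runtime {
    // Counts how many (expensive, O(n)) backtrace captures happened.
    captures: usize,
}

impl Runtime {
    // Mirrors the boolean added to `raise_user_trap`: the caller decides
    // whether a fresh Wasm backtrace is needed.
    fn raise_user_trap(&mut self, mut trap: Trap, needs_backtrace: bool) -> Trap {
        if needs_backtrace {
            self.captures += 1;
            trap.backtrace = Some(vec!["<wasm frame>"]);
        }
        trap
    }
}

// Propagate one trap through `depth` host frames. The call site reflects
// on the trap and only asks for a capture when it has no backtrace yet.
fn propagate_through_host_frames(rt: &mut Runtime, depth: usize) -> Trap {
    let mut trap = Trap::default();
    for _ in 0..depth {
        let needs_backtrace = trap.backtrace.is_none();
        trap = rt.raise_user_trap(trap, needs_backtrace);
    }
    trap
}
```

In this model the capture count stays at one no matter how many host frames the trap crosses, which is the linear behavior the commit is after.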

Fixes #5037

* debug assert that we don't record unnecessary backtraces for traps

* Add assertions around `needs_backtrace`

Unfortunately we can't do

    debug_assert_eq!(needs_backtrace, trap.inner.backtrace.get().is_some());

because `needs_backtrace` doesn't consider whether Wasm backtraces have been
disabled via config.

* Consolidate `needs_backtrace` calculation followed by calling `raise_user_trap` into one place
2022-10-13 07:22:46 -07:00
Pat Hickey
f96491f333 Ignore when components export type definitions (#5051)
* allow a ComponentTypeRef::Type to point to a component TypeDef

* component matching: don't assert exported Interface type definitions are "defined"

types may be exported by their name for consumption by some component
runtimes, but in wasmtime this doesn't matter (we lift and lower to
types, not define them) so we should ignore these.

* component-model instance tests: show that an import can export a type definition

this is meaningless, but it should be accepted. (previously rejected)
2022-10-12 22:45:03 +00:00
Afonso Bordado
4639e85c4e Flush Icache on AArch64 Windows (#4997)
* cranelift: Add FlushInstructionCache for AArch64 on Windows

This was previously done in #3426 for Linux.

* wasmtime: Add FlushInstructionCache for AArch64 on Windows

This was previously done in #3426 for Linux.

* cranelift: Add MemoryUse flag to JIT Memory Manager

This allows us to keep the icache flushing code self-contained and not leak implementation details.

This also changes the windows icache flushing code to only flush pages that were previously unflushed.

* Add jit-icache-coherence crate

* cranelift: Use `jit-icache-coherence`

* wasmtime: Use `jit-icache-coherence`

* jit-icache-coherence: Make rustix feature additive

Mutually exclusive features cause issues.

* wasmtime: Remove rustix from wasmtime-jit

We now use it via jit-icache-coherence

* Rename wasmtime-jit-icache-coherency crate

* Use cfg-if in wasmtime-jit-icache-coherency crate

* Use inline instead of inline(always)

* Add unsafe marker to clear_cache

* Conditionally compile all rustix operations

membarrier does not exist on macOS

* Publish `wasmtime-jit-icache-coherence`

* Remove explicit windows check

This is implied by the target_os = "windows" above

* cranelift: Remove len != 0 check

This is redundant as it is done in non_protected_allocations_iter

* Comment cleanups

Thanks @akirilov-arm!

* Make clear_cache safe

* Rename pipeline_flush to pipeline_flush_mt

* Revert "Make clear_cache safe"

This reverts commit 21165d81c9030ed9b291a1021a367214d2942c90.

* More docs!

* Fix pipeline_flush reference on clear_cache

* Update more docs!

* Move pipeline flush after `mprotect` calls

Technically the `clear_cache` operation is a lie in AArch64, so move the pipeline flush after the `mprotect` calls so that it benefits from the implicit cache cleaning done by it.

* wasmtime: Remove rustix backend from icache crate

* wasmtime: Use libc for macos

* wasmtime: Flush icache on all arch's for windows

* wasmtime: Add flags to membarrier call
2022-10-12 11:15:38 -07:00
Alex Crichton
75cd888e23 Add release notes for 2.0.0 (#5048)
* Add release notes for 2.0.0

* Review comments
2022-10-12 15:39:39 +00:00
Nick Fitzgerald
03d77d4d6b Cranelift: Derive Copy for InstructionData (#5043)
* Cranelift: Derive `Copy` for `InstructionData`

And update `clone` calls to be copies.

* Add a test for `InstructionData`'s size
2022-10-12 07:58:27 -07:00
Afonso Bordado
1d8f982fe5 fuzzgen: Add bitops (#5040)
* cranelift: Implement some bitops for i128 values

* fuzzgen: Add bitops
2022-10-12 05:52:48 -07:00
Andrew Brown
bad71cde4a Revert "Disable wasi-nn CI tests due to breakage (404'ing package repository). (#5028)" (#5045)
This reverts commit 4f8b94163c. The APT
repository for OpenVINO should be restored to its original state.
2022-10-11 19:27:30 -07:00
Chris Fallin
2be12a5167 egraph-based midend: draw the rest of the owl (productionized). (#4953)
* egraph-based midend: draw the rest of the owl.

* Rename `egg` submodule of cranelift-codegen to `egraph`.

* Apply some feedback from @jsharp during code walkthrough.

* Remove recursion from find_best_node by doing a single pass.

Rather than recursively computing the lowest-cost node for a given
eclass and memoizing the answer at each eclass node, we can do a single
forward pass; because every eclass node refers only to earlier nodes,
this is sufficient. The behavior may slightly differ from the earlier
behavior because we cannot short-circuit costs to zero once a node is
elaborated; but in practice this should not matter.
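
To illustrate why one forward pass is enough, here is a small sketch over a toy node list. The `Node` enum and `best_costs` function are hypothetical and much simpler than the real egraph; the only property borrowed from the text is that every node refers only to earlier nodes:

```rust
// Toy model: indices in `args` and `Union` always point at earlier nodes.
enum Node {
    Leaf { cost: u32 },
    Op { cost: u32, args: Vec<usize> },
    // An eclass union: either side may be used to compute the value.
    Union(usize, usize),
}

// Compute the lowest cost for every node in one forward pass, with no
// recursion and no memoization table beyond the output vector itself.
fn best_costs(nodes: &[Node]) -> Vec<u32> {
    let mut best: Vec<u32> = Vec::with_capacity(nodes.len());
    for node in nodes {
        let cost = match node {
            Node::Leaf { cost } => *cost,
            Node::Op { cost, args } => args
                .iter()
                .fold(*cost, |acc, &arg| acc.saturating_add(best[arg])),
            Node::Union(a, b) => best[*a].min(best[*b]),
        };
        best.push(cost);
    }
    best
}
```

Because an argument's cost is already final by the time a later node reads it, the loop never needs to revisit a node.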

* Make elaboration non-recursive.

Use an explicit stack instead (with `ElabStackEntry` entries,
alongside a result stack).

* Make elaboration traversal of the domtree non-recursive/stack-safe.

* Work analysis logic in Cranelift-side egraph glue into a general analysis framework in cranelift-egraph.

* Apply static recursion limit to rule application.

* Fix aarch64 wrt dynamic-vector support -- broken rebase.

* Topo-sort cranelift-egraph before cranelift-codegen in publish script, like the comment instructs me to!

* Fix multi-result call testcase.

* Include `cranelift-egraph` in `PUBLISHED_CRATES`.

* Fix atomic_rmw: not really a load.

* Remove now-unnecessary PartialOrd/Ord derivations.

* Address some code-review comments.

* Review feedback.

* Review feedback.

* No overlap in mid-end rules, because we are defining a multi-constructor.

* rustfmt

* Review feedback.

* Review feedback.

* Review feedback.

* Review feedback.

* Remove redundant `mut`.

* Add comment noting what rules can do.

* Review feedback.

* Clarify comment wording.

* Update `has_memory_fence_semantics`.

* Apply @jameysharp's improved loop-level computation.

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fix suggestion commit.

* Fix off-by-one in new loop-nest analysis.

* Review feedback.

* Review feedback.

* Review feedback.

* Use `Default`, not `std::default::Default`, as per @fitzgen

Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>

* Apply @fitzgen's comment elaboration to a doc-comment.

Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>

* Add stat for hitting the rewrite-depth limit.

* Some code motion in split prelude to make the diff a little clearer wrt `main`.

* Take @jameysharp's suggested `try_into()` usage for blockparam indices.

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Take @jameysharp's suggestion to avoid double-match on load op.

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fix suggestion (add import).

* Review feedback.

* Fix stack_load handling.

* Remove redundant can_store case.

* Take @jameysharp's suggested improvement to FuncEGraph::build() logic

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Tweaks to FuncEGraph::build() on top of suggestion.

* Take @jameysharp's suggested clarified condition

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Clean up after suggestion (unused variable).

* Fix loop analysis.

* loop level asserts

* Revert constant-space loop analysis -- edge cases were incorrect, so let's go with the simple thing for now.

* Take @jameysharp's suggestion re: result_tys

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fix up after suggestion

* Take @jameysharp's suggestion to use fold rather than reduce

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fixup after suggestion

* Take @jameysharp's suggestion to remove elaborate_eclass_use's return value.

* Clarifying comment in terminator insts.

Co-authored-by: Jamey Sharp <jamey@minilop.net>
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
2022-10-11 18:15:53 -07:00
Nick Fitzgerald
e2f1ced0b6 Cranelift: Make Opcode represented as a u8 instead of u16 and remove vestigial conversion impls (#5042)
* Cranelift: Make `Opcode` represented as a `u8` instead of `u16`

* Cranelift: Remove unused conversion impls for `Opcode`

These are vestigial, left over from Peepmatic.
2022-10-11 12:57:12 -07:00
Afonso Bordado
86331b9b37 cranelift: Native feature detection for RISC-V (#5044)
* cranelift: Native feature detection for RISC-V

* cranelift: Typo fix

Thanks @cfallin
2022-10-11 19:29:03 +00:00
Benjamin Bouvier
d68ca3711b Upgrade sha2 to 0.10.2 in wasmtime (#4749) 2022-10-10 09:40:40 +00:00
Pat Hickey
42d460f3a6 wasmtime-component-macro: struct and consts created for flags! must be pub (#5030)
* wasmtime-component-macro: struct and consts created for flags! must be pub

* add empty and all constructors to flags
2022-10-07 16:36:27 -07:00
Adam C. Foltzer
e45577e097 feat(wasi) add push_file and push_dir methods to WasiCtx (#5027)
These are useful when we don't want to collide with an existing file descriptor.
2022-10-06 20:20:17 -07:00
Chris Fallin
4f8b94163c Disable wasi-nn CI tests due to breakage (404'ing package repository). (#5028)
In #5023 we are seeing a failing CI job (see [1]); after four attempted
restarts, it 404's each time when trying to download OpenVino from the
Intel apt mirrors.

This PR temporarily removes the wasi-nn CI job from our CI configuration
so that we have green CI and can merge other work.

[1] https://github.com/bytecodealliance/wasmtime/actions/runs/3200861896/jobs/5228903240
2022-10-06 20:19:57 -07:00
Jun Ryung Ju
39fbff92c3 cranelift: Added fp and, or, xor, not ops to interpreter. (#4999)
* cranelift: Added fp and, or, xor, not ops to interpreter.

* Formatting.

* Removed architecture dependent test on float-bitops.
2022-10-06 18:24:45 -07:00
Chris Fallin
e95ffe4413 Fix StructReturn handling: properly mark the clobber, and offset actual rets. (#5023)
* Fix StructReturn handling: properly mark the clobber, and offset actual rets.

The legalization of `StructReturn` was causing issues in the new
call-handling code: the `StructReturn` ret was included in the `SigData` as
if it were an actual CLIF-level return value, but it is not.

Prior to using regalloc constraints for return values, we
unconditionally included rax (or the architecture's usual return
register) as a def, so it would be properly handled as "clobbered" by
the regalloc. With the new scheme, we include defs on the call only for
CLIF-level outputs. Callees with `StructReturn` args were thus not known
to clobber the return-value register, and values might be corrupted.

This PR updates the code to include a `StructReturn` ret as a clobber
rather than a returned value in the relevant spots. I observed it
causing saves/restores of rax in some CLIF that @bjorn3 provided me, but
I was having difficulty minimizing this into a test-case that I would be
comfortable including as a precise-output case (including the whole
thing verbatim would lock down a bunch of other irrelevant details and
cause test-update noise later). If we can find a more minimized example
I'm happy to include it as a filetest.

Fixes #5018.
2022-10-07 00:14:38 +00:00
Jimmy Bourassa
b454110ac7 Fix broken WASI ABI link (#5024)
The file has been moved.
2022-10-05 23:01:05 +00:00
Jamey Sharp
04b30acad9 Misc cleanups (#5014)
* Replace resize+copy_from_slice with extend_from_slice

Vec::resize initializes the new space, which is wasted effort if we're
just going to call `copy_from_slice` on it immediately afterward. Using
`extend_from_slice` is simpler, and very slightly faster.

If the new size were bigger than the buffer we're copying from, then it
would make sense to initialize the excess. But it isn't: it's always
exactly the same size.
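
For reference, the two shapes look roughly like this. The function names are made up for illustration; only the `Vec` methods are real standard-library calls:

```rust
// Before: grow the vector (zero-initializing the new tail), then
// immediately overwrite that tail.
fn append_with_resize(buf: &mut Vec<u8>, src: &[u8]) {
    let start = buf.len();
    buf.resize(start + src.len(), 0);
    buf[start..].copy_from_slice(src);
}

// After: append directly, with no intermediate initialization.
fn append_with_extend(buf: &mut Vec<u8>, src: &[u8]) {
    buf.extend_from_slice(src);
}
```

Both leave the buffer with the same contents; the second simply skips the redundant zero fill.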

* Move helpers from Context to CompiledCode

These methods only use information from Context::compiled_code, so they
should live on CompiledCode instead.

* Remove an unnecessary #[cfg_attr]

There are other uses of `#[allow(clippy::too_many_arguments)]` in this
file, so apparently it doesn't need to be guarded by the "cargo-clippy"
feature.

* Fix a few comments

Two of these were wrong/misleading:

- `FunctionBuilder::new` does not clear the provided func_ctx. It does
  debug-assert that the context is already clear, but I don't think
  that's worth a comment.

- `switch_to_block` does not "create values for the arguments." That's
  done by the combination of `append_block_params_for_function_params`
  and `declare_wasm_parameters`.

* wasmtime-cranelift: Misc cleanups

The main change is to use the `CompiledCode` reference we already had
instead of getting it out of `Context` repeatedly. This removes a bunch
of `unwrap()` calls.

* wasmtime-cranelift: Factor out uncached compile
2022-10-05 10:35:59 -07:00
Nathaniel McCallum
bbdafaf5ce feat: provide default methods for WasiDir (#5019)
When implementing custom WasiDir instances, there is a lot of
boilerplate.  These default methods should reduce code for implementors
who want to provide only a subset of functionality.
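
A sketch of the pattern, using a made-up `Dir` trait and `DirError` type rather than the real `WasiDir` signatures (which are not reproduced here):

```rust
// Hypothetical error type and trait, for illustration only.
#[derive(Debug, PartialEq, Eq)]
enum DirError {
    NotSupported,
}

trait Dir {
    // Default bodies report "not supported", so implementors only
    // override what they actually provide.
    fn create_dir(&self, _path: &str) -> Result<(), DirError> {
        Err(DirError::NotSupported)
    }

    fn read_link(&self, _path: &str) -> Result<String, DirError> {
        Err(DirError::NotSupported)
    }
}

// An implementor that supports a single operation.
struct LinkOnlyDir;

impl Dir for LinkOnlyDir {
    fn read_link(&self, path: &str) -> Result<String, DirError> {
        Ok(format!("/target/of/{path}"))
    }
}
```

Here `LinkOnlyDir` gets a working `create_dir` for free, and it fails cleanly instead of forcing boilerplate on the implementor.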

Signed-off-by: Nathaniel McCallum <nathaniel@profian.com>
2022-10-05 08:58:24 -07:00
wasmtime-publish
a9be4a9b56 Bump Wasmtime to 3.0.0 (#5016)
Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>
2022-10-05 09:30:55 -05:00
Dan Gohman
24da5f7787 Tidy up the WASI ErrorKind enum. (#5015)
* Tidy up the WASI `ErrorKind` enum.

`ErrorKind` is an internal enum used in wasi-libc to represent WASI
errors that aren't precisely represented by `std::io::ErrorKind` errors.
Add a descriptive comment, and remove some codes that are no longer
needed:

 - Remove `NotCapable`, which is no longer used.
 - Remove `WouldBlk`, `Exist`, `Noent`, and `Inval`, which have
   one-to-one correspondences with codes in `std::io::ErrorKind`.

This will simplify the error handling in #4947 and #4967, as it means
the code will no longer have to check for two different forms of these
errors.

* Map `std::io::ErrorKind::InvalidInput` to `Ok(types::Errno::Inval)`.
2022-10-05 09:29:49 -05:00
Rainy Sinclair
6d1bce9c64 Adjust fuel consumption to be empty when fuel is 0 (#5013)
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
2022-10-05 09:25:31 -05:00
Trevor Elliott
e63771f2d9 More consistent use of add_inst (#5012)
Use the InstId returned by add_inst rather than creating it eagerly, when possible.
2022-10-04 15:59:30 -07:00
Trevor Elliott
a209cb63f5 ISLE: Enable the overlap checker (#5011)
This PR turns the overlap checker on by default, requiring the use of priorities to resolve overlap between rules.
2022-10-04 21:56:49 +00:00
Alex Crichton
2607590d8c Update the wasm-tools family of crates (#5010)
* Update the wasm-tools family of crates

Only minor updates here, mostly internal changes and no binary-related
changes today.

* Fix test expectation
2022-10-04 16:26:22 -05:00
Nathaniel McCallum
d986b3cbc2 feat: improve wasi_common::ErrorKind derives (#5006)
Besides the standard traits (Copy, Clone, PartialEq and Eq), we also mark
the enum as non-exhaustive so that we can add errors in the future
without breaking API.
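
Roughly, the combination looks like the sketch below. The variant names and the `describe` function are illustrative and do not claim to be the actual `wasi_common::ErrorKind` definition:

```rust
// Illustrative enum; variants are placeholders.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
#[non_exhaustive]
pub enum ErrorKind {
    TooBig,
    Badf,
}

// Downstream crates must include a wildcard arm when matching on a
// `#[non_exhaustive]` enum, which is what lets new variants be added
// later without a breaking change. Inside the defining crate the arm is
// redundant, hence the `allow`.
#[allow(unreachable_patterns)]
pub fn describe(kind: ErrorKind) -> &'static str {
    match kind {
        ErrorKind::TooBig => "argument list too long",
        ErrorKind::Badf => "bad file descriptor",
        _ => "other error",
    }
}
```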

Signed-off-by: Nathaniel McCallum <nathaniel@profian.com>
2022-10-04 14:00:42 -07:00
Alex Crichton
b07e619055 Drop a few crates from our dependency graph (#5009)
A minor update of a few other crates drops `semver` and `rustc_version`
from `Cargo.lock`. I've audited the deltas in versions for the other
crates here as well and they all look good.
2022-10-04 20:31:56 +00:00
yuyang
07584f6ac8 fix issue 4996. (#5003) 2022-10-04 11:18:42 -07:00
Johnnie Birch
f2105f95e9 Update perf action to work for wasmtime forks (#5007) 2022-10-04 12:18:24 -05:00
Trevor Elliott
c9ff14e00b Resolve overlap in the s390x backend (#5002)
Resolve overlap in the s390x backend by adding rule priorities to disambiguate rule order.
2022-10-03 17:06:10 -07:00
Jamey Sharp
d35c508436 cranelift-frontend: Replace Vecs with ListPools (#5001)
* Elide redundant sentinel values

The `undef_variables` lists were a binding from Variable to Value, but
the Values were always equal to a suffix of the block's parameters. So
instead of storing another copy, we can just get them back from the
block parameters.

According to DHAT, this decreases total memory allocated and number of
bytes written, and increases number of bytes read and instructions
retired, but all by small fractions of a percent. According to
hyperfine, main is "1.00 ± 0.01 times faster".

* Use entity_impl for cranelift_frontend::Variable

Instead of hand-coding essentially the same thing.

* Keep undefined variables in a ListPool

According to DHAT, this improves every measure of performance
(instructions retired, total memory allocated, max heap size, bytes
read, and bytes written), although by fractions of a percent. According
to hyperfine the difference is nearly zero, but on Spidermonkey this
branch is "1.01 ± 0.00 times faster" than main.

* Elide redundant block IDs

In a list of predecessors, we previously kept both the jump instruction
that points to the current block, and the block where that instruction
resides. But we can look up the block from the instruction as long as we
have access to the current Layout, which we do everywhere that it was
necessary. So don't store the block, just store the instruction.

* Keep predecessor definitions in a ListPool

* Make append_jump_argument independent of self

This makes it easier to reason about borrow-checking issues.

* Reuse `results` instead of re-doing variable lookup

This eliminates three array lookups per predecessor by hanging on to the
results of earlier steps a little longer. This only works now because I
previously removed the need to borrow all of `self`, which otherwise
prevented keeping a borrow of self.results alive.

I had experimented with using `Vec::split_off` to copy the relevant
chunk of results to a temporary heap allocation, but the extra
allocation and copy was measurably slower. So it's important that this
is just a borrow.

* Cache single-predecessor block ID when sealing

Of the code in cranelift_frontend, `use_var` is the second-hottest path,
sitting close behind the `build` function that's used when inserting
every new instruction. This makes sense given that the operands of a new
instruction usually need to be looked up immediately before building the
instruction.

So making the single-predecessor loops in `find_var` and `use_var_local`
do fewer memory accesses and execute fewer instructions turns out to
have a measurable effect. It's still only a small fraction of a percent
overall since cranelift-frontend is only a few percent of total runtime.

This patch keeps a block ID in the SSABlockData, which is None unless
both the block is sealed and it has exactly one predecessor. Doing so
avoids two array lookups on each iteration of the two loops.

According to DHAT, compared with main, at this point this PR uses 0.3%
less memory at max heap, reads 0.6% fewer bytes, and writes 0.2% fewer
bytes.

According to Hyperfine, this PR is "1.01 ± 0.01 times faster" than main
when compiling Spidermonkey. On the other hand, Sightglass says main is
1.01x faster than this PR on the same benchmark by CPU cycles. In short,
actual effects are too small to measure reliably.
2022-10-03 14:29:12 -07:00
bjorn3
f1fce6c60d Support writing riscv64 object files (#4995) 2022-10-03 10:27:07 -07:00
Alex Crichton
7bab5c1b28 Consolidate module definition in wasmtime-jit (#5000)
Minor thing I noticed from #4990 but I stylistically prefer to keep the
`mod foo;` definitions canonicalized to one location to emphasize how
multiple targets can use the same definition.
2022-10-03 11:04:07 -05:00
Yuyi Wang
6bcc430855 Initial work to build for Windows ARM64 (#4990)
* Make wasmtime build for windows-aarch64

* Add check for win arm64 build.

* Fix checks for winarm64 key in workflows.

* Add target in windows arm64 build.

* Add tracking issue for Windows ARM64 trap handling
2022-10-02 19:45:42 -07:00
Jamey Sharp
3fa545bd89 Cleanup cranelift-frontend (#4989)
* cranelift-wasm: Assume block is reachable

In handling the WebAssembly "end" operator, cranelift-wasm had logic to
skip generating a jump instruction if the block was both unreachable and
"pristine", meaning no instructions had been added.

However, `translate_operator` checks first that `state.reachable` is
true, so this logic only runs when cranelift-wasm believes that the
current block _is_ reachable. Therefore the condition should always be
true, whether the block is pristine or not.

I've left a debug_assert in case `state.reachable` ever doesn't agree
with `builder.is_unreachable()`, but the assert doesn't fail in any of
the tests. We'll see if fuzzing finds something.

Anyway, outside of cranelift-frontend, this eliminates the only use of
`is_pristine()`, and there were no uses of `is_filled()`. So I've made
both of those private. They're now only used in a nearby debug assert.

* cranelift-frontend: Clarify pristine/filled states

There was a comment here saying "A filled block cannot be pristine."
Given that the intent was for those two states to be mutually exclusive,
I've replaced the two booleans with a three-state enum.

I also replaced all reads of these two flags with method calls. In all
but one case these are only checked in debug assertions, so I don't even
care whether they get inlined. They're easier to read, and this will
make it easier to replace their implementations, which I hope to do
soon.

Finally, I replaced all assignments to either flag with an appropriate
assignment of the corresponding enum state. Keep in mind this
correspondence between the new enum and the old flags:

- Empty: pristine true, filled false
- Partial: pristine false, filled false
- Filled: pristine false, filled true
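
The correspondence above can be sketched as a small enum with accessor methods. This is a simplified illustration; the `advance_to` helper is hypothetical and only exists to show that states move forward:

```rust
// Declaration order doubles as the "later state" ordering.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum BlockStatus {
    Empty,
    Partial,
    Filled,
}

impl BlockStatus {
    // Old `pristine` flag: true only before any instruction is added.
    fn is_pristine(self) -> bool {
        self == BlockStatus::Empty
    }

    // Old `filled` flag: true only once the block is terminated.
    fn is_filled(self) -> bool {
        self == BlockStatus::Filled
    }

    // Hypothetical helper: states may only move to a later state.
    fn advance_to(&mut self, next: BlockStatus) {
        debug_assert!(next >= *self, "block status cannot move backwards");
        *self = next;
    }
}
```

With a single enum there is no way to express "pristine and filled at once", which the two booleans allowed in principle.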

Every existing update to these flags could only move to a later state.
(For example, Partial couldn't go back to Empty.) In the old flags that
meant that pristine could only go from true to false, and filled could
only go from false to true.

`fill_current_block` was a weird case because at first glance it looks
like it could allow both pristine and filled to be true at the same
time. However, it's only called from `FuncInstBuilder::build`, which
calls `ensure_inserted_block` before doing anything else, and _that_
cleared the pristine flag.

Similarly, `handle_ssa_side_effects` looks like it could allow both
pristine and filled to be true for anything in `split_blocks_created`.
However, those blocks are created by SSABuilder, so their BlockData is
not initialized by `create_block`, and instead uses BlockData::default.
The `Default` implementation here previously set both flags false, while
`create_block` would instead set pristine to true. So these split blocks
were correctly set to the Filled state, and after this patch they are
still set correctly.

* cranelift-frontend: Separate SSA and user block params

Previously there was a `user_param_count` field in BlockData, used
purely to debug-assert that no user parameters are added to a block
after `use_var` adds SSA parameters.

Instead, this patch enforces a strict phase separation between the
period after a block is created when user parameters can be added to it,
and the period when `use_var` may be called and instructions may be
added.

I'm assuming that calls to `use_var` are _always_ followed by inserting
one or more instructions into the block. (If you don't want to insert an
instruction, why do you need to know where instructions in this block
would get variable definitions from?) This patch has no visible effect
for callers which follow that rule.

However, it was previously legal to call `use_var`, then append a block
parameter before adding instructions, so long as `use_var` didn't
actually need to add a block parameter. That could only happen if the
current block is sealed and has exactly one predecessor. So anyone who
was counting on this behavior was playing a dangerous game anyway.

* cranelift-frontend: Defer initializing block data

Every reference to the func_ctx.status SecondaryMap will automatically
create the appropriate entries on-demand, with the sole exception of
`finalize`. In that function, debug assertions use SecondaryMap::keys to
find out which blocks need to be checked.

However, those assertions always succeed for blocks which never had any
instructions added. So it's okay to skip them for blocks which aren't
touched after `create_block`.
2022-09-30 14:11:19 -07:00
Trevor Elliott
ab4be2bdd1 ISLE: Resolve overlaps in the aarch64 backend (#4988) 2022-09-30 12:57:50 -07:00
Trevor Elliott
c1d6ca48a7 ISLE: Resolve overlap in the riscv64 backend (#4982)
Resolve overlap in the RiscV64 backend by adding priorities to rules. Additionally, one test was updated as a result of this work, as a peephole optimization for addition with immediates fires now.
2022-09-29 17:22:25 -07:00
Jamey Sharp
77ab99d3b0 cranelift-frontend: SSA-building cleanup (#4984)
* Cleanups to cranelift-frontend SSA construction

* Encode sealed/undef_variables relationship in type

A block can't have any undef_variables if it is sealed. It's useful to
make that fact explicit in the types so that any time either value is
used, it's clear that we should think about the other one too.

In addition, encoding this fact in an enum type lets Rust apply an
optimization that reduces the size of SSABlockData by 8 bytes, making it
fit in a 64-byte cache line. I haven't taken the extra step of making
SSABlockData be 64-byte aligned because 1) it doesn't seem to have a
performance impact and 2) doing so makes other structures quite a bit
bigger.
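
One way to picture the encoding, with simplified stand-in types (`Variable` as a plain `u32`, and a `BlockData` that holds nothing else). The real `SSABlockData` layout and the size savings mentioned above are not modeled here:

```rust
// Simplified stand-in for the real variable type.
type Variable = u32;

// A block is either unsealed (and may have pending undefined variables)
// or sealed (and by construction has none).
enum Sealed {
    No { undef_variables: Vec<Variable> },
    Yes,
}

struct BlockData {
    sealed: Sealed,
}

impl BlockData {
    // Record an undefined variable; returns false if the block is already
    // sealed, in which case there is no list to add to.
    fn note_undef(&mut self, var: Variable) -> bool {
        match &mut self.sealed {
            Sealed::No { undef_variables } => {
                undef_variables.push(var);
                true
            }
            Sealed::Yes => false,
        }
    }

    // Seal the block, handing back whatever was still undefined.
    fn seal(&mut self) -> Vec<Variable> {
        match std::mem::replace(&mut self.sealed, Sealed::Yes) {
            Sealed::No { undef_variables } => undef_variables,
            Sealed::Yes => Vec::new(),
        }
    }
}
```

Any code that touches the undefined-variable list now has to go through a `match` on the sealed state, so the two facts cannot drift apart.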

* Simplify finish_predecessors_lookup

Using Vec::drain is more concise than a combination of
iter().rev().take() followed by Vec::truncate. And in this case it
doesn't matter what order we examine the results in, because we just
want to know if they're all equal, so we might as well iterate forward
instead of in reverse.

There's no need for the ZeroOneOrMore enum. Instead, there are only two
cases: either we have a single value to use for the variable (possibly
synthesized as a constant zero), or we need to add a block parameter in
every predecessor.

Pre-filtering the results iterator to eliminate the sentinel makes it
easy to identify how many distinct definitions this variable has.
iter.next() indicates if there are any definitions at all, and then
iter.all() is a clear way to express that we want to know if the
remaining definitions are the same as the first one.
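
The `next()` followed by `all()` idiom can be sketched as below, using plain `u32` values and a caller-supplied sentinel. The `Lookup` enum and `classify` function are hypothetical; in the description above, the "no definitions" case is folded into the single-value case via a synthesized constant zero:

```rust
#[derive(Debug, PartialEq)]
enum Lookup {
    // No predecessor supplied a real definition.
    Undefined,
    // Every real definition is the same value.
    Single(u32),
    // Definitions disagree, so a block parameter is needed.
    NeedsBlockParam,
}

fn classify(results: &[u32], sentinel: u32) -> Lookup {
    // Pre-filter the sentinel so only real definitions remain.
    let mut iter = results.iter().copied().filter(|&v| v != sentinel);
    match iter.next() {
        None => Lookup::Undefined,
        Some(first) => {
            // `all` consumes the rest and stops at the first mismatch.
            if iter.all(|v| v == first) {
                Lookup::Single(first)
            } else {
                Lookup::NeedsBlockParam
            }
        }
    }
}
```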

* Simplify append_jump_argument

* Avoid assigning default() into SecondaryMap

This eliminates some redundant reads and writes.

* cranelift-frontend: Construct with default()

This eliminates a bunch of boilerplate in favor of a built in `derive`
macro.

Also I'm deleting an import that had the comment "FIXME: Remove in
edition2021", which we've been using everywhere since April.

* Fix tests
2022-09-29 16:59:47 -07:00
Trevor Elliott
46e42601eb ISLE: Always default the priority to 0 (#4983)
* Always default priorities to 0 in the ISLE IR
* Fix a hidden overlap in the x64 backend
2022-09-29 15:32:29 -07:00
Trevor Elliott
7d5b2b5bb2 ISLE: Add the overlap_errors pragma (#4981)
* Add the overlap_errors pragma to ISLE
* Enable overlap errors in the x64 backend
2022-09-29 12:40:39 -07:00
Trevor Elliott
db06e4e622 ISLE: Resolve remaining x64 overlap errors (#4977)
Resolve overlap errors with the x64 backend.
2022-09-29 10:09:37 -07:00
Trevor Elliott
bf2fa40e4e Stabilize the error output order (#4976) 2022-09-28 17:23:54 -07:00
Jamey Sharp
6c8620b688 Bypass state machine for single-predecessor chains (#4955)
In the common case where there is a chain of sealed blocks that each
have exactly one predecessor, we can keep track of any sub-sequence of
those blocks in O(1) space. So there's no need to use the state machine
stack to propagate variable definitions back along the chain.

Instead, we can do one loop to find which block to stop at, then either
get the variable definition from that block or introduce a block
parameter there, and finally do one more loop to update variable
definitions in all the intervening blocks.
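
A toy version of the two loops, assuming an acyclic chain and a single variable per block. The `Block` struct and `use_var_in_chain` function are hypothetical; the cycle detection mentioned in this commit is deliberately left out:

```rust
// Toy block: at most one predecessor, and one optional definition of the
// single variable we are tracking.
struct Block {
    single_pred: Option<usize>,
    def: Option<u32>,
}

// Look up the variable starting at `start`. `fresh` stands in for a newly
// introduced block parameter. Assumes the predecessor chain has no cycle.
fn use_var_in_chain(blocks: &mut [Block], start: usize, fresh: u32) -> u32 {
    // Loop 1: walk back until a block defines the variable or the chain ends.
    let mut stop = start;
    while blocks[stop].def.is_none() {
        match blocks[stop].single_pred {
            Some(pred) => stop = pred,
            None => break,
        }
    }

    // Take the existing definition, or introduce one at the stopping block.
    let value = *blocks[stop].def.get_or_insert(fresh);

    // Loop 2: record the definition in every intervening block.
    let mut cur = start;
    while cur != stop {
        blocks[cur].def = Some(value);
        cur = blocks[cur]
            .single_pred
            .expect("every block before `stop` was reached via a predecessor");
    }
    value
}
```

Only two indices (`start` and `stop`) are needed to describe the sub-sequence of the chain, which is the constant-space property the description relies on.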

The existing implementation already had to do a graph traversal to
propagate variable definitions correctly, so this doesn't visit any more
blocks than before. However, this change also makes it possible to
integrate cycle detection with the graph traversal. That eliminates the
need for the in_predecessor_cycle flags, and any possibility of spiky
performance profiles in maintaining those flags.

As far as performance goes, this is all pretty much a wash: Changes to
CPU time and CPU cycles are within noise, according to hyperfine and
Sightglass/perf. But it's a substantially simpler implementation, with
fewer invisible interactions between functions.
2022-09-28 17:05:08 -07:00
Trevor Elliott
2e954668c7 Sort overlap errors by position (#4975) 2022-09-28 13:23:31 -07:00
Trevor Elliott
faf31f6216 ISLE: Resolve overlap in prelude.isle and x64/inst.isle (#4941)
Resolve overlap in the ISLE prelude and the x64 inst module by introducing new types that allow better sharing of extractor results, or falling back on priorities.
2022-09-28 10:54:39 -07:00
Alex Crichton
2ba604e406 Update spec test repo (#4974)
* Update spec test repo

Our submodule was accidentally reverted to an older commit as part
of #4271, and while it could be restored to where it was before, I went
ahead and updated it to `main`.

* Update ignore directives and test multi-memory

* Update riscv ignores
2022-09-28 17:04:17 +00:00
Ulrich Weigand
83bb6fd4c8 s390x: Fix regalloc checker error (#4973)
For ShiftRR and VecShiftRR, if shift_reg is zero_reg(), the
instruction does not actually use any register value.

Fixes #4969
2022-09-28 08:25:23 -07:00
yuyang-ok
cdecc858b4 add riscv64 backend for cranelift. (#4271)
Add a RISC-V 64 (`riscv64`, RV64GC) backend.

Co-authored-by: yuyang <756445638@qq.com>
Co-authored-by: Chris Fallin <chris@cfallin.org>
Co-authored-by: Afonso Bordado <afonsobordado@az8.co>
2022-09-27 17:30:31 -07:00