Commit Graph

85 Commits

Author SHA1 Message Date
Jamey Sharp
3629bbbd55 Print constants in a comment in CLIF output (#4725)
When trying to read generated CLIF, it's nice to be able to see at a
glance that some of the operands are defined by `iconst` and similar
instructions, without having to go find each operand's definition
manually.
2022-08-17 09:00:20 -07:00
Ulrich Weigand
a916788ab4 Fix mis-aligned access issues with s390x (#4702)
This fixes two problems: minimum symbol alignment for the LARL
instruction, and alignment requirements for LRL/LGRL etc.

The first problem is that the LARL instruction used to load a
symbol address (PC relative) requires that the target symbol
is at least 2-byte aligned.  This is always guaranteed for code
symbols (all instructions must be 2-aligned anyway), but not
necessarily for data symbols.

Other s390x compilers fix this problem by ensuring that all
global symbols are always emitted with a minimum 2-byte
alignment.  This patch introduces an equivalent mechanism
for cranelift:
- Add a symbol_alignment routine to TargetIsa, similar to the
  existing code_section_alignment routine.
- Respect symbol_alignment as minimum alignment for all symbols
  emitted in the object backend (code and data).

The second problem is that PC-relative instructions that
directly *access* data (like LRL/LGRL, STRL/STGRL etc.)
not only have the 2-byte requirement like LARL, but actually
require that their memory operand is *naturally* aligned
(i.e. alignment is at least the size of the access).

This property (natural alignment for memory accesses) is
supposed to be provided by the "aligned" flag in MemFlags;
however, this is not implemented correctly at the moment.

To fix this, this patch:
- Only emits PC-relative memory access instructions if the
  "aligned" flag is set in the associated MemFlags.
- Fixes a bug in emit_small_memory_copy and emit_small_memset
  which currently set the aligned flag unconditionally, ignoring
  the actual alignment info passed by their caller.

Tested with wasmtime and cg_clif.
2022-08-16 12:39:42 -07:00
Benjamin Bouvier
8a9b1a9025 Implement an incremental compilation cache for Cranelift (#4551)
This is the implementation of https://github.com/bytecodealliance/wasmtime/issues/4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and trait object to provide a backend for an all-included experience in Wasmtime. 

After the suggestion of Chris, `Function` has been split into mostly two parts:

- on the one hand, `FunctionStencil` contains all the fields required during compilation, and that act as a compilation cache key: if two function stencils are the same, then the result of their compilation (`CompiledCodeBase<Stencil>`) will be the same. This makes caching trivial, as the only thing to cache is the `FunctionStencil`.
- on the other hand, `FunctionParameters` contain the... function parameters that are required to finalize the result of compilation into a `CompiledCode` (aka `CompiledCodeBase<Final>`) with proper final relocations etc., by applying fixups and so on.

Most changes are here to accomodate those requirements, in particular that `FunctionStencil` should be `Hash`able to be used as a key in the cache:

- most source locations are now relative to a base source location in the function, and as such they're encoded as `RelSourceLoc` in the `FunctionStencil`. This required changes so that there's no need to explicitly mark a `SourceLoc` as the base source location, it's automatically detected instead the first time a non-default `SourceLoc` is set.
- user-defined external names in the `FunctionStencil` (aka before this patch `ExternalName::User { namespace, index }`) are now references into an external table of `UserExternalNameRef -> UserExternalName`, present in the `FunctionParameters`, and must be explicitly declared using `Function::declare_imported_user_function`.
- some refactorings have been made for function names:
  - `ExternalName` was used as the type for a `Function`'s name; while it thus allowed `ExternalName::Libcall` in this place, this would have been quite confusing to use it there. Instead, a new enum `UserFuncName` is introduced for this name, that's either a user-defined function name (the above `UserExternalName`) or a test case name.
  - The future of `ExternalName` is likely to become a full reference into the `FunctionParameters`'s mapping, instead of being "either a handle for user-defined external names, or the thing itself for other variants". I'm running out of time to do this, and this is not trivial as it implies touching ISLE which I'm less familiar with.

The cache computes a sha256 hash of the `FunctionStencil`, and uses this as the cache key. No equality check (using `PartialEq`) is performed in addition to the hash being the same, as we hope that this is sufficient data to avoid collisions.

A basic fuzz target has been introduced that tries to do the bare minimum:

- check that a function successfully compiled and cached will be also successfully reloaded from the cache, and returns the exact same function.
- check that a trivial modification in the external mapping of `UserExternalNameRef -> UserExternalName` hits the cache, and that other modifications don't hit the cache.
  - This last check is less efficient and less likely to happen, so probably should be rethought a bit.

Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip.

Some numbers show that for a large wasm module we're using internally, this is a 20% compile-time speedup, because so many `FunctionStencil`s are the same, even within a single module. For a group of modules that have a lot of code in common, we get hit rates up to 70% when they're used together. When a single function changes in a wasm module, every other function is reloaded; that's still slower than I expect (between 10% and 50% of the overall compile time), so there's likely room for improvement. 

Fixes #4155.
2022-08-12 16:47:43 +00:00
Afonso Bordado
a36a52a017 cranelift: Print error message when basic blocks are invalid (#4591) 2022-08-09 09:28:41 -07:00
Teymour Aldridge
ad223c5234 Add try_use_var method to cranelift-frontend. (#4588)
* Add `try_use_var` method to `cranelift-frontend`.
- Unlike `use_var`, this method does not panic if the variable has not been defined
before use

* Add `try_declare_var` and `try_def_var`.
- Also implement Error for error enums.

* Use `write!` macro.

* Add `write!` use I missed.
2022-08-04 16:19:15 +00:00
Nick Fitzgerald
50b9195882 cranelift-frontend: Reuse visited block sets in SSABuilder::can_optimize_var_lookup (#4536)
First, we switch from a `BTreeSet` to a `HashSet` because clearing a `BTreeSet`
will deallocate the btree's nodes but clearing a `HashSet` will not deallocate
the backing hash table, saving the space to reuse for future insertions.

Then, we reuse the same set (and therefore the same allocation) across every
call to `can_optimize_var_lookup`.

This results in a 1.22x to 1.32x speed up on various Sightglass benchmarks:

```
compilation :: nanoseconds :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 39478181.76 ± 3441880.32 (confidence = 99%)

  main.so is 0.75x to 0.79x faster than reuse-set.so!
  reuse-set.so is 1.27x to 1.32x faster than main.so!

  [160128343 172174751.09 213325968] main.so
  [115055695 132696569.33 200782128] reuse-set.so

compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm

  Δ = 22576954.88 ± 1830771.68 (confidence = 99%)

  main.so is 0.77x to 0.81x faster than reuse-set.so!
  reuse-set.so is 1.25x to 1.29x faster than main.so!

  [100449245 106820149.65 118628066] main.so
  [77039172 84243194.77 128168647] reuse-set.so

compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 664533554.97 ± 22109170.05 (confidence = 99%)

  main.so is 0.81x to 0.82x faster than reuse-set.so!
  reuse-set.so is 1.22x to 1.23x faster than main.so!

  [3549762523 3640587103.35 3798662501] main.so
  [2793335181 2976053548.38 3192950484] reuse-set.so
```
2022-07-26 18:38:24 -07:00
Afonso Bordado
af62037f62 cranelift: Restrict br_table to i32 indices (#4510)
* cranelift: Restrict `br_table` to `i32` indices

In #4498 it was proposed that we should only accept `i32` indices
to `br_table`. The rationale for this is that larger types lead the
users to a false sense of flexibility (since we don't support jump
tables larger than u32's), and narrower types are not well tested
paths that would be safer if we removed them.

* cranelift: Reduce directly from i128 to i32 in Switch
2022-07-22 23:32:40 +00:00
Afonso Bordado
4720d09651 cranelift: Fix Switch bug when emitting indexes larger than val type (#4507)
In #4502 we discovered a bug in the switch api where it would emit
`icmp_imm`'s with types that were not able to fully represent the
destination index.

We now reject these inputs. The index val must always have a
type that is capable of addressing the entire range of inputs.
2022-07-22 10:55:36 -07:00
Jamey Sharp
f242975c49 cranelift-frontend: Allow jump table reuse (#4429)
* Allow using jump-tables multiple times (fixes #3347)

If there are multiple `br_table` instructions using the same jump table,
then `append_jump_argument` must not modify the jump table in-place.

When this function is called, we don't know if more `br_table`
instructions might be added later. So this patch conservatively assumes
that all jump tables might be reused. If Cranelift needs to add a block
argument to a block that's the target of some jump table, then the jump
table will be unconditionally cloned.

I'm not sure if having duplicated and unused jump tables will turn out
to be a compile-time performance issue. If it is, there's discussion in
issue #3347 about ways to determine that there can't be any more uses of
a jump table, so that it's safe to modify in-place.

* Re-enable cranelift-fuzzgen fuzz target

I've been running this fuzz target for an hour without finding new bugs.
Let's see if oss-fuzz finds anything now.
2022-07-11 15:09:51 -05:00
Sam Parker
9c43749dfe [RFC] Dynamic Vector Support (#4200)
Introduce a new concept in the IR that allows a producer to create
dynamic vector types. An IR function can now contain global value(s)
that represent a dynamic scaling factor, for a given fixed-width
vector type. A dynamic type is then created by 'multiplying' the
corresponding global value with a fixed-width type. These new types
can be used just like the existing types and the type system has a
set of hard-coded dynamic types, such as I32X4XN, which the user
defined types map onto. The dynamic types are also used explicitly
to create dynamic stack slots, which have no set size like their
existing counterparts. New IR instructions are added to access these
new stack entities.

Currently, during codegen, the dynamic scaling factor has to be
lowered to a constant so the dynamic slots do eventually have a
compile-time known size, as do spill slots.

The current lowering for aarch64 just targets Neon, using a dynamic
scale of 1.

Copyright (c) 2022, Arm Limited.
2022-07-07 12:54:39 -07:00
Chris Fallin
1c014d129a Cranelift: ensure ISA level needed for SIMD is present when SIMD is enabled. (#3816)
Addresses #3809: when we are asked to create a Cranelift backend with
shared flags that indicate support for SIMD, we should check that the
ISA level needed for our SIMD lowerings is present.
2022-02-16 17:29:30 -08:00
wackbyte
05ace6c0e2 Fix a typo in cranelift-frontend's docs (#3796)
Specifically that of `Variable`.
2022-02-13 11:08:19 -08:00
Chris Fallin
f489b83835 Cranelift: add support for cold blocks.
This PR adds a flag to each block that can be set via the frontend/builder
interface that indicates that the block will not be frequently
executed. As such, the compiler backend should place the block "out of
line" in the final machine code, so that the ordinary, more frequent
execution path that excludes the block does not have to jump around it.

This is useful for adding handlers for exceptional conditions
(slow-paths, guard violations) in a way that minimizes performance cost.

Fixes #2747.
2022-01-19 12:17:41 -08:00
Scott McMurray
ca7c54b5f8 Add Type::int_with_byte_size constructor 2021-11-29 16:53:54 -08:00
Scott McMurray
d55a19365e Add FunctionBuilder::emit_small_memory_compare
This takes an IntCC for the comparison to do, though panics for Signed*
since memcmp is an unsigned comparison.  Currently it's most useful for
(Not)Equal, but once big-endian loads are implemented it'll be able to
support the other Unsigned* comparisons nicely on more than just bytes.
2021-11-29 02:08:04 -08:00
Scott McMurray
c266f7f4c3 Cranelift: Add LibCall::Memcmp
The comment says the enum is "likely to grow" and the function's been in libc since C89, so hopefully this is ok.

I'd like to use it for emitting things like array equality.
2021-11-29 01:42:59 -08:00
Benjamin Bouvier
43a86f14d5 Remove more old backend ISA concepts (#3402)
This also paves the way for unifying TargetIsa and MachBackend, since now they map one to one. In theory the two traits could be merged, which would be nice to limit the number of total concepts. Also they have quite different responsibilities, so it might be fine to keep them separate.

Interestingly, this PR started as removing RegInfo from the TargetIsa trait since the adapter returned a dummy value there. From the fallout, noticed that all Display implementations didn't needed an ISA anymore (since these were only used to render ISA specific registers). Also the whole family of RegInfo / ValueLoc / RegUnit was exclusively used for the old backend, and these could be removed. Notably, some IR instructions needed to be removed, because they were using RegUnit too: this was the oddball of regfill / regmove / regspill / copy_special, which were IR instructions inserted by the old regalloc. Fare thee well!
2021-10-04 10:36:12 +02:00
Afonso Bordado
20913a7473 cranelift: Enable compiling br_tables for types larger than i32 2021-09-02 16:26:23 +01:00
Afonso Bordado
6a9378e244 cranelift: Prevent infinite loops in ssa frontend with unreachable code.
Perform a search over block predecessors trying to find loops of
unreachable predecessors. We do this by iterating on predecessors and
marking them as visited, stopping if we find a previously visited block
or if we find a block with multiple predecessors.

This issue was found by the CLIF fuzzer in #3094.
2021-09-01 11:19:23 +01:00
Nick Fitzgerald
4283d2116d cranelift: Move most debug-level logs to the trace level
Cranelift crates have historically been much more verbose with debug-level
logging than most other crates in the Rust ecosystem. We log things like how
many parameters a basic block has, the color of virtual registers during
regalloc, etc. Even for Cranelift hackers, these things are largely only useful
when hacking specifically on Cranelift and looking at a particular test case,
not even when using some Cranelift embedding (such as Wasmtime).

Most of the time, when people want logging for their Rust programs, they do
something like:

    RUST_LOG=debug cargo run

This means that they get all that mostly not useful debug logging out of
Cranelift. So they might want to disable logging for Cranelift, or change it to
a higher log level:

    RUST_LOG=debug,cranelift=info cargo run

The problem is that this is already more annoying to type that `RUST_LOG=debug`,
and that Cranelift isn't one single crate, so you actually have to play
whack-a-mole with naming all the Cranelift crates off the top of your head,
something more like this:

    RUST_LOG=debug,cranelift=info,cranelift_codegen=info,cranelift_wasm=info,...

Therefore, we're changing most of the `debug!` logs into `trace!` logs: anything
that is very Cranelift-internal, unlikely to be useful/meaningful to the
"average" Cranelift embedder, or prints a message for each instruction visited
during a pass. On the other hand, things that just report a one line statistic
for a whole pass, for example, are left as `debug!`. The more verbose the log
messages are, the higher the bar they must clear to be `debug!` rather than
`trace!`.
2021-07-26 11:50:16 -07:00
Andrew Brown
e25bf362ab Switch to using TargetFrontendConfig in the frontend memory tests
Commit 7d36fd9a1e avoided these
x86-specific tests altogether. This change avoids any dependency on x86
entirely by specifying a frontend configuration (SystemV + U64); this is
enough information for the `FunctionBuilder` to correctly generate the
syscalls.
2021-06-03 11:01:13 -07:00
Olivier Lemasle
7d36fd9a1e Restrict running tests dependent of x86_64
These 5 tests fail with the error "This test requires x86_64 support." when
executed on another architecture.
2021-06-02 10:35:35 -07:00
MaxGraey
38140900f1 properly splatting bytes in emit_small_memset 2021-05-13 22:05:30 +03:00
Chris Fallin
ff2529c339 Merge pull request #2746 from bjorn3/emit_small_mem_custom_memflags
Allow passing arbitrary MemFlags to emit_small_mem{cpy,move}
2021-04-28 11:26:48 -07:00
Will Robson
da33496063 Seal the block that should have been sealed 2021-04-02 15:35:12 +01:00
Will Robson
f5f3c2fb25 Add test cases to show the unsealed block in switch generation 2021-04-02 15:34:49 +01:00
Alex Crichton
30d9164b6e Fix a number of warnings cropping up on nightly Rust (#2767)
Various small issues here and there, nothing major
2021-03-25 13:19:37 -05:00
MaxGraey
1425c1e7bf fix emit_small_memset
use shifts


format


more


fix


revert to multiply
2021-03-21 17:03:25 +02:00
bjorn3
1639f2c844 Allow passing arbitrary MemFlags to emit_small_mem{cpy,move} 2021-03-20 19:22:55 +01:00
Chris Fallin
8cd64e3ec6 Fix warnings (causing CI failures) with new Rust beta.
- Panic messages must now be string literals (we used `format!()` in
  many places; `panic!()` can take format strings directly).
- Some dead enum options with EVEX encoding stuff in old x86 backend.
  This will go away soon and/or be moved to the new backend anyway, so
  let's silence the warning for now.
- A few other misc warnings.
2021-02-16 14:10:05 -08:00
bjorn3
81b4e48f9f Remove some uses of riscv in tests (#2600)
* Remove some uses of riscv in tests

* Fix typo

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Benjamin Bouvier <public@benj.me>
2021-01-30 23:54:48 +01:00
Matt
27f3307f24 Replace if + panic! with assert!
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
2020-11-21 00:03:41 -05:00
Matthieu Felix
35da24adfd Specify unsealed / unfilled blocks 2020-11-11 23:35:48 -05:00
Andrew Brown
c9e8889d47 Update clippy annotation to use latest version (#2375) 2020-11-09 09:24:59 -06:00
Jakub Krauz
bab3c73100 Put arm32 backend behind experimental_arm32 flag 2020-09-22 12:53:14 +02:00
bjorn3
45ccc6940e Fix Switch for 128bit integers 2020-09-21 14:50:59 +02:00
bjorn3
f1fdd5764a Limit jump tables to 32bit in Switch
This is the maximum size supported by the x64 backend
2020-09-08 15:00:24 +02:00
Chris Fallin
1d09c8e46b Merge pull request #1939 from bjorn3/128bit_switch
Fix Switch for 128bit integers
2020-07-14 11:16:33 -07:00
bjorn3
88a84e90ef Fix Switch for 128bit integers 2020-07-14 18:07:52 +02:00
Nick Fitzgerald
a2f4202800 cranelift-frontend: Add the FunctionBuilder::insert_block_after method 2020-06-30 12:00:57 -07:00
Nick Fitzgerald
8c5f59c0cf wasmtime: Implement table.get and table.set
These instructions have fast, inline JIT paths for the common cases, and only
call out to host VM functions for the slow paths. This required some changes to
`cranelift-wasm`'s `FuncEnvironment`: instead of taking a `FuncCursor` to insert
an instruction sequence within the current basic block,
`FuncEnvironment::translate_table_{get,set}` now take a `&mut FunctionBuilder`
so that they can create whole new basic blocks. This is necessary for
implementing GC read/write barriers that involve branching (e.g. checking for
null, or whether a store buffer is at capacity).

Furthermore, it required that the `load`, `load_complex`, and `store`
instructions handle loading and storing through an `r{32,64}` rather than just
`i{32,64}` addresses. This involved making `r{32,64}` types acceptable
instantiations of the `iAddr` type variable, plus a few new instruction
encodings.

Part of #929
2020-06-30 12:00:57 -07:00
bjorn3
eeb1e141ba Add some assertions to cranelift_frontend 2020-05-26 10:17:08 +02:00
teapotd
8ee8c322ae Seal blocks created by cranelift_frontend::Switch 2020-04-29 11:42:41 -07:00
teapotd
11497a5207 Ignore already sealed blocks in seal_all_blocks 2020-04-29 11:42:41 -07:00
Andrew Brown
0672d1dc0f Declare constants in the function preamble
This allows us to give names to constants in the constant pool and then use these names in the function body. The original behavior, specifiying the constant value as an instruction immediate, is still supported as a shortcut but some filetests had to change since the canonical way of printing the CLIF constants is now in the preamble.
2020-04-17 11:59:47 -07:00
bjorn3
bc462404b3 [cranelift-frontend] Fix emit_zero for vectors (#1428)
* Fix emit_zero for vectors
* Add undef_vector_vars frontend test
* Remove unused import
2020-03-30 13:01:14 -07:00
Y-Nak
bcddce5fe0 Simplify ssa builder (#1340)
* Simplify SSABuilder with basic block

* Simplify FunctionBuilder with basic block

* Update SSABuilder test

* Update SSABuilder doc
2020-03-20 11:54:44 -07:00
Ryan Hunt
07f335dca6 Rename 'an block' to 'a block'
Missed this in the automatic rename of 'Ebb' to 'Block'.
2020-03-03 13:21:13 -06:00
bjorn3
45cc95e60e Merge emit_small_memcpy and emit_small_memmove (#1301)
* Merge emit_small_memcpy and emit_small_memmove

* Fix typo
2020-02-14 16:44:40 -06:00
Ryan Hunt
832666c45e Mass rename Ebb and relatives to Block (#1365)
* Manually rename BasicBlock to BlockPredecessor

BasicBlock is a pair of (Ebb, Inst) that is used to represent the
basic block subcomponent of an Ebb that is a predecessor to an Ebb.

Eventually we will be able to remove this struct, but for now it
makes sense to give it a non-conflicting name so that we can start
to transition Ebb to represent a basic block.

I have not updated any comments that refer to BasicBlock, as
eventually we will remove BlockPredecessor and replace with Block,
which is a basic block, so the comments will become correct.

* Manually rename SSABuilder block types to avoid conflict

SSABuilder has its own Block and BlockData types. These along with
associated identifier will cause conflicts in a later commit, so
they are renamed to be more verbose here.

* Automatically rename 'Ebb' to 'Block' in *.rs

* Automatically rename 'EBB' to 'block' in *.rs

* Automatically rename 'ebb' to 'block' in *.rs

* Automatically rename 'extended basic block' to 'basic block' in *.rs

* Automatically rename 'an basic block' to 'a basic block' in *.rs

* Manually update comment for `Block`

`Block`'s wikipedia article required an update.

* Automatically rename 'an `Block`' to 'a `Block`' in *.rs

* Automatically rename 'extended_basic_block' to 'basic_block' in *.rs

* Automatically rename 'ebb' to 'block' in *.clif

* Manually rename clif constant that contains 'ebb' as substring to avoid conflict

* Automatically rename filecheck uses of 'EBB' to 'BB'

'regex: EBB' -> 'regex: BB'
'$EBB' -> '$BB'

* Automatically rename 'EBB' 'Ebb' to 'block' in *.clif

* Automatically rename 'an block' to 'a block' in *.clif

* Fix broken testcase when function name length increases

Test function names are limited to 16 characters. This causes
the new longer name to be truncated and fail a filecheck test. An
outdated comment was also fixed.
2020-02-07 10:46:47 -06:00