Commit Graph

1502 Commits

Author SHA1 Message Date
Chris Fallin
96bfd4e8c0 s390x: update some regalloc metadata to remove use of reg_mod. (#4856)
* s390x: update some regalloc metadata to remove use of `reg_mod`.

This is a step toward ultimately removing modify-operands, which along
with removal of pinned vregs, lets us move to a completely
constraint-based and fully-SSA regalloc input and get some nice
advantages eventually.

There are still a few uses of `mod` operands and pinned vregs remaining,
especially around the "regpair" abstraction. Those proved to be a bit
trickier to update though, so will have to be done separately.

* Review feedback: restore two-arg pretty-print form.

* Review feedback.
2022-09-09 18:43:36 -05:00
Chris Fallin
2986f6b0ff ABI: implement register arguments with constraints. (#4858)
* ABI: implement register arguments with constraints.

Currently, Cranelift's ABI code emits a sequence of moves from physical
registers into vregs at the top of the function body, one for every
register-carried argument.

For a number of reasons, we want to move to operand constraints instead,
and remove the use of explicitly-named "pinned vregs"; this allows for
better regalloc in theory, as it removes the need to "reverse-engineer"
the sequence of moves.

This PR alters the ABI code so that it generates a single "args"
pseudo-instruction as the first instruction in the function body. This
pseudo-inst defs all register arguments, and constrains them to the
appropriate registers at the def-point. Subsequently the regalloc can
move them wherever it needs to.

Some care was taken not to have this pseudo-inst show up in
post-regalloc disassemblies, but the change did cause a general regalloc
"shift" in many tests, so the precise-output updates are a bit noisy.
Sorry about that!

A subsequent PR will handle the other half of the ABI code, namely, the
callsite case, with a similar preg-to-constraint conversion.

* Update based on review feedback.

* Review feedback.
2022-09-08 18:03:14 -07:00
Anton Kirilov
d8b290898c Initial forward-edge CFI implementation (#3693)
* Initial forward-edge CFI implementation

Give the user the option to start all basic blocks that are targets
of indirect branches with the BTI instruction introduced by the
Branch Target Identification extension to the Arm instruction set
architecture.

Copyright (c) 2022, Arm Limited.

* Refactor `from_artifacts` to avoid second `make_executable` (#1)

This involves "parsing" twice but this is parsing just the header of an
ELF file so it's not a very intensive operation and should be ok to do
twice.

* Address the code review feedback

Copyright (c) 2022, Arm Limited.

Co-authored-by: Alex Crichton <alex@alexcrichton.com>
2022-09-08 09:35:58 -05:00
Andrew Brown
f063082474 x64: remove Inst::XmmLoadConst (#4876)
This is a cherry-pick of a long-ago commit, 2d46637. The original
message reads:

> Now that `SyntheticAmode` can refer to constants, there is no longer a
> need for a separate instruction format--standard load instructions will
> work.

Since then, the transition to ISLE and the use of `XmmLoadConst` in many
more places makes this change a larger diff than the original. The basic
idea is the same, though: the extra indirection of `Inst::XMmLoadConst`
is removed and replaced by a direct use of `VCodeConstant` as a
`SyntheticAmode`. This has no effect on codegen, but the CLIF output is
now clearer in that the actual instruction is displayed (e.g., `movdqu`)
instead of a made-up instruction (`load_const`).
2022-09-07 12:52:13 -07:00
Afonso Bordado
f57b4412ec cranelift: Implement missing i128 rotates on AArch64 (#4866) 2022-09-07 11:11:47 -07:00
Anton Kirilov
dd07e354b4 Cranelift AArch64: Fix the get_return_address lowering (#4851)
The previous implementation assumed that nothing had clobbered the
LR register since the current function had started executing, so
it would be incorrect for a non-leaf function, for example, that
contains the `get_return_address` operation right after a call.
The operation is valid only if the `preserve_frame_pointers` flag
is enabled, which implies that the presence of a frame record on
the stack is guaranteed.

Copyright (c) 2022, Arm Limited.
2022-09-07 11:09:22 -07:00
Jamey Sharp
3d6d49daba cranelift: Remove of/nof overflow flags from icmp (#4879)
* cranelift: Remove of/nof overflow flags from icmp

Neither Wasmtime nor cg-clif use these flags under any circumstances.
From discussion on #3060 I see it's long been unclear what purpose these
flags served.

Fixes #3060, fixes #4406, and fixes #4875... by deleting all the code
that could have been buggy.

This changes the cranelift-fuzzgen input format by removing some IntCC
options, so I've gone ahead and enabled I128 icmp tests at the same
time. Since only the of/nof cases were failing before, I expect these to
work.

* Restore trapif tests

It's still useful to validate that iadd_ifcout's iflags result can be
forwarded correctly to trapif, and for that purpose it doesn't really
matter what condition code is checked.
2022-09-07 08:38:41 -07:00
Alex Crichton
65930640f8 Bump Wasmtime to 2.0.0 (#4874)
This commit replaces #4869 and represents the actual version bump that
should have happened had I remembered to bump the in-tree version of
Wasmtime to 1.0.0 prior to the branch-cut date. Alas!
2022-09-06 13:49:56 -05:00
Afonso Bordado
f30a7eb0c9 cranelift: Implement PartialEq on Ieee{32,64} (#4849)
* cranelift: Add `fcmp` tests

Some of these are disabled on aarch64 due to not being implemented yet.

* cranelift: Implement float PartialEq for Ieee{32,64} (fixes #4828)

Previously `PartialEq` was auto derived. This means that it was implemented in terms of PartialEq in a u32.

This is not correct for floats because `NaN != NaN`.

PartialOrd was manually implemented in 6d50099816, but it seems like it was an oversight to leave PartialEq out until now.

The test suite depends on the previous behaviour so we adjust it to keep comparing bits instead of floats.

* cranelift: Disable `fcmp ord` tests on aarch64

* cranelift: Disable `fcmp ueq` tests on aarch64
2022-09-02 10:42:42 -07:00
Chris Fallin
385bd0cbf8 x64: fix CvtFloatToUintSeq: do not clobber src. (#4842)
This slipped through the regalloc2 operand code update in #4811: the
CvtFloatToUintSeq pseudo-instruction actually clobbers its source. It
was marked as a "mod" operand in the original and I mistakenly
converted it to a "use" as I had not seen the actual clobber. The
instruction now takes an extra temp and makes a copy of `src` in the
appropriate place.

Fixes #4840.
2022-09-01 22:46:57 +00:00
Afonso Bordado
08e7a7f1a0 cranelift: Add inline stack probing for x64 (#4747)
* cranelift: Add inline stack probe for x64

* cranelift: Cleanups comments

Thanks @jameysharp!
2022-09-01 22:32:54 +00:00
Chris Fallin
ae5fe8a728 aarch64: fix up regalloc2 semantics. (#4830)
This PR removes all uses of modify-operands in the aarch64 backend,
replacing them with reused-input operands instead. This has the nice
effect of removing a bunch of move instructions and more clearly
representing inputs and outputs.

This PR also removes the explicit use of pinned vregs in the aarch64
backend, instead using fixed-register constraints on the operands when
insts or pseudo-inst sequences require certain registers.

This is the second PR in the regalloc-semantics cleanup series; after
the remaining backend (s390x) and the ABI code are cleaned up as well,
we'll be able to simplify the regalloc2 frontend.
2022-09-01 21:25:20 +00:00
Andrew Brown
ac2d4c4818 x64: improve tests for heap_addr (#4841)
* x64: improve tests for `heap_addr`

This change adds Cranelift `compile` tests for the various cases for
`heap_addr`. The idea behind this is to more clearly show what the
penalties are for dynamically- vs statically-allocated memory as well as
turning Spectre mitigations on and off.

* Add test case: "right" size memory with Spectre enabled
2022-09-01 13:59:55 -07:00
Afonso Bordado
2beaf7352f cranelift: Test calling across different calling conventions (#4801)
* cranelift: Test calling across different calling conventions

* cranelift: Use `wasmtime_system_v` calling convention for cross cc tests
2022-08-31 15:16:41 -07:00
Afonso Bordado
cf7cb10036 cranelift: Add some filetests documentation (#4833) 2022-08-31 08:15:10 -07:00
Chris Fallin
186c7c3b89 x64: clean up regalloc-related semantics on several instructions. (#4811)
* x64: clean up regalloc-related semantics on several instructions.

This PR removes all uses of "modify" operands on instructions in the x64
backend, and also removes all uses of "pinned vregs", or vregs that are
explicitly tied to particular physical registers. In place of both of
these mechanisms, which are legacies of the old regalloc design and
supported via compatibility code, the backend now uses operand
constraints. This is more flexible as it allows the regalloc to see the
liveranges and constraints without "reverse-engineering" move instructions.

Eventually, after removing all such uses (including in other backends
and by the ABI code), we can remove the compatibility code in regalloc2,
significantly simplifying its liverange-construction frontend and
thus allowing for higher confidence in correctness as well as possibly a
bit more compilation speed.

Curiously, there are a few extra move instructions now; they are likely
poor splitting decisions and I can try to chase these down later.

* Fix cranelift-codegen tests.

* Review feedback.
2022-08-30 17:21:14 -07:00
Afonso Bordado
3ce3eeb668 cranelift: Register all functions in test file for interpreter (#4817)
* cranelift: Implement `bnot` in interpreter

* cranelift: Register all functions in test file for interpreter

* cranelift: Relax signature checking for bools and vectors
2022-08-30 15:45:21 -07:00
Chris Fallin
1a59b3e6c6 AArch64: port tls_value to ISLE. (#4821) 2022-08-30 16:51:15 +00:00
Damian Heaton
3d9d759380 Port fcmp to ISLE (AArch64) (#4819)
Ported the existing implementation of `fcmp` for AArch64 to ISLE.

This also ports the `lower_vector_comparison` method to ISLE.

Copyright (c) 2022 Arm Limited
2022-08-30 09:06:15 -07:00
Chris Fallin
2b4b257834 Revert "cranelift: Register all functions in test file for interpreter (#4800)" (#4810)
This reverts commit 500a9f17be.
2022-08-30 01:15:11 +00:00
Chris Fallin
955d4e4ba1 AArch64: port load and store operations to ISLE. (#4785)
This retains `lower_amode` in the handwritten code (@akirilov-arm
reports that there is an upcoming patch to port this), but tweaks it
slightly to take a `Value` rather than an `Inst`.
2022-08-29 17:45:55 -07:00
Afonso Bordado
500a9f17be cranelift: Register all functions in test file for interpreter (#4800)
* cranelift: Implement `bnot` in interpreter

* cranelift: Register all functions in test file for interpreter
2022-08-29 23:39:50 +00:00
Chris Fallin
a6eb24bd4f AArch64: port misc ops to ISLE. (#4796)
* Add some precise-output compile tests for aarch64.

* AArch64: port misc ops to ISLE.

- get_pinned_reg / set_pinned_reg
- bitcast
- stack_addr
- extractlane
- insertlane
- vhigh_bits
- iadd_ifcout
- fcvt_low_from_sint
2022-08-29 12:56:39 -07:00
Trevor Elliott
25d960f9c4 x64: Lower tlsvalue, sqmul_round_sat, and uunarrow in ISLE (#4793)
Lower tlsvalue, sqmul_round_sat, and uunarrow in ISLE.
2022-08-26 16:33:48 -07:00
Chris Fallin
8e8dfdf5f9 AArch64: Migrate calls and returns to ISLE. (#4788) 2022-08-26 16:26:39 -07:00
Trevor Elliott
ca6d648e37 x64: Ensure that constants are always 16 bytes for XmmMem (#4790)
Ensure that constants generated for the memory case of XmmMem values are always 16 bytes, ensuring that we don't accidantally perform an unaligned load.

Fixes #4761
2022-08-26 20:04:38 +00:00
Afonso Bordado
7a9078d9cc cranelift: Allow call and call_indirect in runtests (#4667)
* cranelift: Change test runner order

Changes the ordering of runtests to run per target and then per function.

This change doesn't do a lot by itself, but helps future refactorings of runtests.

* cranelift: Rename SingleFunctionCompiler to TestCaseCompiler

* cranelift: Skip runtests per target instead of per run

* cranelift: Deduplicate test names

With the upcoming changes to the runtest infrastructure we require unique ExtNames for all tests.

Note that for test names we have a 16 character limit on test names, and must be unique within those 16 characters.

* cranelift: Add TestFileCompiler to runtests

TestFileCompiler allows us to compile the entire file once, and then call the trampolines for each test.

The previous code was compiling the function for each invocation of a test.

* cranelift: Deduplicate ExtName for avg_round tests

* cranelift: Rename functions as they are defined.

The JIT internally only deals with User functions, and cannot link test name funcs.

This also caches trampolines by signature.

* cranelift: Preserve original name when reporting errors.

* cranelift: Rename aarch64 test functions

* cranelift: Add `call` and `call_indirect` tests!

* cranelift: Add pauth runtests for aarch64

* cranelift: Rename duplicate s390x tests

* cranelift: Delete `i128_bricmp_of` function from i128-bricmp

It looks like we forgot to delete it when it was moved to
`i128-bricmp-overflow`, and since it didn't have a run invocation
it was never compiled.

However, s390x does not support this, and panics when lowering.

* cranelift: Add `colocated` call tests

* cranelift: Rename *more* `s390x` tests

* cranelift: Add pauth + sign_return_address call tests

* cranelift: Undeduplicate test names

With the latest main changes we now support *unlimited* length test names.

This commit reverts:
52274676ff631c630f9879dd32e756566d3e700f
7989edc172493547cdf63e180bb58365e8a43a42
25c8a8395527d98976be6a34baa3b0b214776739
792e8cfa8f748077f9d80fe7ee5e958b7124e83b

* cranelift: Add LibCall tests

* cranelift: Revert more test names

These weren't auto reverted by the previous revert.

* cranelift: Disable libcall tests for aarch64

* cranelift: Runtest fibonacci tests

* cranelift: Misc cleanup
2022-08-26 12:42:16 -07:00
Trevor Elliott
c1f9736938 x64: Lower vany_true, vall_true, vhigh_bits, iconcat, and isplit in ISLE (#4787)
Lower vany_true, vall_true, vhigh_bits, iconcat, and isplit in ISLE.
2022-08-26 09:07:22 -07:00
Trevor Elliott
9386409607 x64: Lower extractlane, scalar_to_vector, and splat in ISLE (#4780)
Lower extractlane, scalar_to_vector and splat in ISLE.

This PR also makes some changes to the SinkableLoad api
* change the return type of sink_load to RegMem as there are more functions available for dealing with RegMem
* add reg_mem_to_reg_mem_imm and register it as an automatic conversion
2022-08-25 09:38:03 -07:00
Trevor Elliott
b8b6f2781e x64: Lower shuffle and swizzle in ISLE (#4772)
Lower `shuffle` and `swizzle` in ISLE.

This PR surfaced a bug with the lowering of `shuffle` when avx512vl and avx512vbmi are enabled: we use `vpermi2b` as the implementation, but panic if the immediate shuffle mask contains any out-of-bounds values. The behavior when the avx512 extensions are not present is that out-of-bounds values are turned into `0` in the result.

I've resolved this by detecting when the shuffle immediate has out-of-bounds indices in the avx512-enabled lowering, and generating an additional mask to zero out the lanes where those indices occur. This brings the avx512 case into line with the semantics of the `shuffle` op: 94bcbe8446/cranelift/codegen/meta/src/shared/instructions.rs (L1495-L1498)
2022-08-24 21:49:51 +00:00
Damian Heaton
94bcbe8446 Port Fcopysign..FcvtToSintSat to ISLE (AArch64) (#4753)
* Port `Fcopysign`..``FcvtToSintSat` to ISLE (AArch64)

Ported the existing implementations of the following opcodes to ISLE on
AArch64:
- `Fcopysign`
  - Also introduced missing support for `fcopysign` on vector values, as
    per the docs.
  - This introduces the vector encoding for the `SLI` machine
    instruction.
- `FcvtToUint`
- `FcvtToSint`
- `FcvtFromUint`
- `FcvtFromSint`
- `FcvtToUintSat`
- `FcvtToSintSat`

Copyright (c) 2022 Arm Limited

* Document helpers and abstract conversion checks
2022-08-24 10:37:14 -07:00
Afonso Bordado
7e3c481f4e cranelift: Avoid lowering VEX instructions with memory encodings (#4768) 2022-08-24 10:35:06 -07:00
Afonso Bordado
d394edcefe x64: Mask shift amounts for small types (#4752)
* x64: Mask shift amounts for small types

* cranelift: Disable i128 shifts in fuzzer again

They are fixed. But we had a bunch of fuzzgen issues come in, and we don't want to accidentaly mark them as fixed

* cranelift: Avoid masking shifts for 32 and 64 bit cases

* cranelift: Add const shift tests and fix them

* cranelift: Remove const `rotl` cases

Now that `put_masked_in_imm8_gpr` works properly we can simplify rotl/rotr
2022-08-24 10:31:38 -07:00
Jamey Sharp
9cb987c678 Don't limit ExternalName::TestName length (#4764)
In order to keep the `ExternalName` enum small, the `TestcaseName`
struct was limited to 17 bytes: a 1 byte length and a 16 byte buffer.
Due to alignment, that made `ExternalName` 20 bytes.

That fixed-size buffer means that the names of functions in Cranelift
filetests are truncated to fit, which limits our ability to give tests
meaningful names. And I think meaningful names are important in tests.

This patch replaces the inline `TestcaseName` buffer with a
heap-allocated slice. We don't care about performance for test names, so
an indirection out to the heap is fine in that case. But we do care
somewhat about the size of `ExternalName` when it's used during
compiles.

On 64-bit systems, `Box<[u8]>` is 16 bytes, so `TestcaseName` gets one
byte smaller. Unfortunately, its alignment is 8 bytes, so `ExternalName`
grows from 20 to 24 bytes.

According to `valgrind --tool=dhat`, this change has very little effect
on compiler performance. Building wasmtime with `--no-default-features
--release`, and compiling the pulldown-cmark benchmark from Sightglass,
I measured these differences between `main` and this patch:

- total number of allocations didn't change (`ExternalName::TestCase` is
  not used in normal compiles)
- 592 more bytes allocated over the process lifetime, out of 171.5MiB
- 320 more bytes allocated at peak heap size, out of 12MiB
- 0.24% more instructions executed
- 16,987 more bytes written
- 12,120 _fewer_ bytes read
2022-08-23 21:17:30 -07:00
Trevor Elliott
4bdfa76370 x64: Migrate get_pinned_reg, set_pinned_reg, vconst, and raw_bitcast to ISLE (#4763)
https://github.com/bytecodealliance/wasmtime/pull/4763
2022-08-23 16:32:00 -07:00
Trevor Elliott
b5f1ab7780 x64: Lower stack_addr, udiv, sdiv, urem, srem, umulhi, smulhi in ISLE (#4741)
Lower stack_addr, udiv, sdiv, urem, srem, umulhi, and smulhi in ISLE.

For udiv, sdiv, urem, and srem I opted to move the original lowering into an extern constructor, as the interactions with rax and rdx for the div instruction didn't seem meaningful to implement in ISLE. However, I'm happy to revisit this choice and move more of the embedding into ISLE.
2022-08-23 11:22:49 -07:00
Damian Heaton
3b68d76905 Port widening ops to ISLE (AArch64) (#4751)
Ported the existing implementations of the following opcodes for AArch64
to ISLE, and implemented support for 64-bit vectors (per the docs):
- `SwidenLow`
- `SwidenHigh`
- `UwidenLow`
- `UwidenHigh`

Also ported `WideningPairwiseDotProductS` as-is.

Copyright (c) 2022 Arm Limited
2022-08-23 09:42:11 -07:00
Damian Heaton
da1fb305a3 Port vconst to ISLE (AArch64) (#4750)
* Port `vconst` to ISLE (AArch64)

Ported the existing implementation of `vconst` to ISLE for AArch64, and
added support for 64-bit vector constants.

Also introduced 64-bit `vconst` support to the interpreter.

Copyright (c) 2022 Arm Limited

* Replace if-chains with match statements

Copyright (c) 2022 Arm Limited
2022-08-23 09:40:11 -07:00
Trevor Elliott
cee4b209f3 x64: Lower fcopysign, ceil, floor, nearest, and trunc in ISLE (#4730)
https://github.com/bytecodealliance/wasmtime/pull/4730
2022-08-22 13:57:36 -07:00
Trevor Elliott
754cf7156a x64: Fix load sinking bugs in new lowerings (#4740)
Fixes #4736

Fix lowerings that were using values as both a Reg and a RegMem, making it look like a load could be sunk while its value in a register was still being used. Also add an assert that checks that loads that are sunk are never used.
2022-08-19 14:21:06 -07:00
Trevor Elliott
80c77da334 x64: Lower bitcast, fabs, and fneg in ISLE (#4729)
* Add tests for bitcast

* Migrate bitcast to ISLE

* Add tests for fabs

* Lower fabs in ISLE

* Add tests for fneg

* Lower fneg in ISLE
2022-08-18 17:59:23 -07:00
Trevor Elliott
8b6019909b x64: Lower widening and narrowing operations in ISLE (#4722)
Lower uwiden_high, uwiden_low, swiden_high, swiden_low, snarrow, and unarrow in ISLE.
2022-08-18 11:53:24 -07:00
Jamey Sharp
3629bbbd55 Print constants in a comment in CLIF output (#4725)
When trying to read generated CLIF, it's nice to be able to see at a
glance that some of the operands are defined by `iconst` and similar
instructions, without having to go find each operand's definition
manually.
2022-08-17 09:00:20 -07:00
Ulrich Weigand
a916788ab4 Fix mis-aligned access issues with s390x (#4702)
This fixes two problems: minimum symbol alignment for the LARL
instruction, and alignment requirements for LRL/LGRL etc.

The first problem is that the LARL instruction used to load a
symbol address (PC relative) requires that the target symbol
is at least 2-byte aligned.  This is always guaranteed for code
symbols (all instructions must be 2-aligned anyway), but not
necessarily for data symbols.

Other s390x compilers fix this problem by ensuring that all
global symbols are always emitted with a minimum 2-byte
alignment.  This patch introduces an equivalent mechanism
for cranelift:
- Add a symbol_alignment routine to TargetIsa, similar to the
  existing code_section_alignment routine.
- Respect symbol_alignment as minimum alignment for all symbols
  emitted in the object backend (code and data).

The second problem is that PC-relative instructions that
directly *access* data (like LRL/LGRL, STRL/STGRL etc.)
not only have the 2-byte requirement like LARL, but actually
require that their memory operand is *naturally* aligned
(i.e. alignment is at least the size of the access).

This property (natural alignment for memory accesses) is
supposed to be provided by the "aligned" flag in MemFlags;
however, this is not implemented correctly at the moment.

To fix this, this patch:
- Only emits PC-relative memory access instructions if the
  "aligned" flag is set in the associated MemFlags.
- Fixes a bug in emit_small_memory_copy and emit_small_memset
  which currently set the aligned flag unconditionally, ignoring
  the actual alignment info passed by their caller.

Tested with wasmtime and cg_clif.
2022-08-16 12:39:42 -07:00
Trevor Elliott
fbfceaec98 x64: Migrate iadd_pairwise to ISLE (#4718)
* Add a test for iadd_pairwise with swiden input

* Implement iadd_pairwise for swiden_{low,high} input

* Add a test case for iadd_pairwise with uwiden input

* Implement iadd_pairwise with uwiden
2022-08-16 12:21:06 -07:00
Trevor Elliott
3c1490dd59 x64: Lower fcvt_to_{u,s}int{,_sat} in ISLE (#4704)
https://github.com/bytecodealliance/wasmtime/pull/4704
2022-08-16 09:03:50 -07:00
Afonso Bordado
863cbc345c cranelift: Fix icmp.i128 eq for aarch64 (#4706)
* cranelift: Fix `icmp.i128 eq` for aarch64

* cranelift: Use ccmp in `icmp.i128 eq` for aarch64
2022-08-15 11:11:22 -07:00
Afonso Bordado
e577a76c0d cranelift: Sign extend immediates in instructions that embed them. (#4602)
* cranelift: Sign extend immediates in instructions that embed them.

* cranelift: Clarify imm instruction behaviour

* cranelift: Deduplicate imm_const

* cranelift: zero extend logical imm ops
2022-08-15 18:08:20 +00:00
Benjamin Bouvier
8a9b1a9025 Implement an incremental compilation cache for Cranelift (#4551)
This is the implementation of https://github.com/bytecodealliance/wasmtime/issues/4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and trait object to provide a backend for an all-included experience in Wasmtime. 

After the suggestion of Chris, `Function` has been split into mostly two parts:

- on the one hand, `FunctionStencil` contains all the fields required during compilation, and that act as a compilation cache key: if two function stencils are the same, then the result of their compilation (`CompiledCodeBase<Stencil>`) will be the same. This makes caching trivial, as the only thing to cache is the `FunctionStencil`.
- on the other hand, `FunctionParameters` contain the... function parameters that are required to finalize the result of compilation into a `CompiledCode` (aka `CompiledCodeBase<Final>`) with proper final relocations etc., by applying fixups and so on.

Most changes are here to accomodate those requirements, in particular that `FunctionStencil` should be `Hash`able to be used as a key in the cache:

- most source locations are now relative to a base source location in the function, and as such they're encoded as `RelSourceLoc` in the `FunctionStencil`. This required changes so that there's no need to explicitly mark a `SourceLoc` as the base source location, it's automatically detected instead the first time a non-default `SourceLoc` is set.
- user-defined external names in the `FunctionStencil` (aka before this patch `ExternalName::User { namespace, index }`) are now references into an external table of `UserExternalNameRef -> UserExternalName`, present in the `FunctionParameters`, and must be explicitly declared using `Function::declare_imported_user_function`.
- some refactorings have been made for function names:
  - `ExternalName` was used as the type for a `Function`'s name; while it thus allowed `ExternalName::Libcall` in this place, this would have been quite confusing to use it there. Instead, a new enum `UserFuncName` is introduced for this name, that's either a user-defined function name (the above `UserExternalName`) or a test case name.
  - The future of `ExternalName` is likely to become a full reference into the `FunctionParameters`'s mapping, instead of being "either a handle for user-defined external names, or the thing itself for other variants". I'm running out of time to do this, and this is not trivial as it implies touching ISLE which I'm less familiar with.

The cache computes a sha256 hash of the `FunctionStencil`, and uses this as the cache key. No equality check (using `PartialEq`) is performed in addition to the hash being the same, as we hope that this is sufficient data to avoid collisions.

A basic fuzz target has been introduced that tries to do the bare minimum:

- check that a function successfully compiled and cached will be also successfully reloaded from the cache, and returns the exact same function.
- check that a trivial modification in the external mapping of `UserExternalNameRef -> UserExternalName` hits the cache, and that other modifications don't hit the cache.
  - This last check is less efficient and less likely to happen, so probably should be rethought a bit.

Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip.

Some numbers show that for a large wasm module we're using internally, this is a 20% compile-time speedup, because so many `FunctionStencil`s are the same, even within a single module. For a group of modules that have a lot of code in common, we get hit rates up to 70% when they're used together. When a single function changes in a wasm module, every other function is reloaded; that's still slower than I expect (between 10% and 50% of the overall compile time), so there's likely room for improvement. 

Fixes #4155.
2022-08-12 16:47:43 +00:00
Afonso Bordado
ac9725840d cranelift: Add shifts and extends to fuzzer (#4700)
* cranelift: Remove shifts-small-types runtests

These were moved to the main shifts file in #4519 but this file was accidentaly left in tree.

It also fixes the missing sshr_i8_i8 testcase

* cranelift: Add shifts to fuzzer

* cranelift: Add extends to fuzzer
2022-08-11 17:57:00 -07:00