PR #773 detects, at reload time, `copy` instructions that copy a value from a
stack slot back to the same stack slot. It replaces them with `copy_nop`
instructions that have a null encoding (hence producing no code).
For x86_64, `copy_nop` encodings are provided for the types I64, I32, F64 and
F32. Unfortunately, the code that detects the redundant copy doesn't check the
type of the copied value, so it risks creating a `copy_nop` instruction that
cannot be encoded (which is different from saying it has a null encoding).
This patch:
* Expands the x86_64 set of `copy_nop` encodings to I64, I32, I16, I8, F64 and F32.
* Adds the same encodings for x86_32, rv64 and rv32.
* In `visit_inst()` in `reload.rs`, checks that the copied value has one of these types before creating a `copy_nop`.
* Adds comments explaining the above.
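A minimal sketch of the kind of guard described above. `Type`, `ValueLoc`, `has_copy_nop_encoding` and `should_become_copy_nop` are hypothetical, simplified stand-ins, not Cranelift's real types or the actual code in `reload.rs`; the type list simply mirrors the set named in the bullets.

```rust
// Hypothetical, simplified stand-ins; not Cranelift's real `Type`/`ValueLoc`.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Type { B1, I8, I16, I32, I64, I128, F32, F64 }

#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ValueLoc { Reg(u16), Stack(u32) }

/// Types assumed (in this sketch) to have a null `copy_nop` encoding on every target.
fn has_copy_nop_encoding(ty: Type) -> bool {
    matches!(ty, Type::I8 | Type::I16 | Type::I32 | Type::I64 | Type::F32 | Type::F64)
}

/// A `copy` may become a `copy_nop` only if it copies a stack slot to the same
/// stack slot *and* its type can actually be encoded as `copy_nop`.
fn should_become_copy_nop(src: ValueLoc, dst: ValueLoc, ty: Type) -> bool {
    match (src, dst) {
        (ValueLoc::Stack(a), ValueLoc::Stack(b)) => a == b && has_copy_nop_encoding(ty),
        _ => false,
    }
}

fn main() {
    let ss0 = ValueLoc::Stack(0);
    // Self-copy of an encodable type: safe to replace.
    assert!(should_become_copy_nop(ss0, ss0, Type::I32));
    // Self-copy of a type with no `copy_nop` encoding: must stay a `copy`.
    assert!(!should_become_copy_nop(ss0, ss0, Type::B1));
    // Different slots, or a register operand: not a self-copy at all.
    assert!(!should_become_copy_nop(ss0, ValueLoc::Stack(1), Type::I64));
    assert!(!should_become_copy_nop(ValueLoc::Reg(3), ss0, Type::F64));
    println!("ok");
}
```

Checking the type in the same predicate as the slot comparison keeps the "is it redundant?" and "can it be encoded?" questions from drifting apart.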
The result of the emitter is a vector of bytes holding machine code,
jump tables, and (in the future) other read-only data. Some clients,
notably Firefox's Wasm compiler, need to separate the machine code
from the data in order to insert more code directly after the code
generated by Cranelift.
To make such separation possible, we record more information about the
emitted bytes: the sizes of each of the sections of code, jump tables,
and read-only data, as well as the locations within the code that
reference (PC-relatively) the jump tables and read-only data.
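A hedged sketch of how a client might use this information, assuming the buffer is laid out as code, then jump tables, then read-only data. `SectionSizes`, `DataRef`, `split_code_and_data` and `patched_distance` are hypothetical names for illustration, not Cranelift's API. It shows why the reference locations matter: if a client inserts `gap` bytes of its own code between the two parts, every PC-relative distance from code to data grows by exactly `gap`.

```rust
/// Hypothetical: section sizes of a buffer laid out as
/// [ code | jump tables | read-only data ].
#[derive(Clone, Copy, Debug)]
struct SectionSizes {
    code: usize,
    jump_tables: usize,
    rodata: usize,
}

/// Hypothetical: a PC-relative reference from the code to a jump table or
/// read-only data, recorded as two offsets into the original buffer.
#[derive(Clone, Copy, Debug)]
struct DataRef {
    code_offset: usize,
    target_offset: usize,
}

/// Split the emitted bytes into (machine code, jump tables + read-only data).
fn split_code_and_data(buf: &[u8], sizes: SectionSizes) -> (&[u8], &[u8]) {
    assert_eq!(buf.len(), sizes.code + sizes.jump_tables + sizes.rodata);
    buf.split_at(sizes.code)
}

/// Distance from a reference site to its target after the client inserts
/// `gap` extra bytes between the end of the code and the start of the data.
fn patched_distance(r: DataRef, sizes: SectionSizes, gap: usize) -> i64 {
    assert!(r.code_offset < sizes.code && r.target_offset >= sizes.code);
    (r.target_offset + gap) as i64 - r.code_offset as i64
}

fn main() {
    let sizes = SectionSizes { code: 6, jump_tables: 2, rodata: 2 };
    let buf = [0u8; 10];
    let (code, data) = split_code_and_data(&buf, sizes);
    assert_eq!((code.len(), data.len()), (6, 4));

    // A reference at offset 2 to a jump-table entry at offset 7.
    let r = DataRef { code_offset: 2, target_offset: 7 };
    assert_eq!(patched_distance(r, sizes, 0), 5);
    assert_eq!(patched_distance(r, sizes, 16), 21);
}
```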
This follows the convention used by the rest of the code base's data
structures: a mutable builder constructs the data structure, and once
constructed, the data structure is immutable.
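A minimal sketch of the builder-then-immutable pattern, applied to an instruction definition. `InstructionBuilder`, `Instruction` and their methods are hypothetical illustrations, not the actual definitions in Cranelift's meta crate; `udiv` and its operand names are used only as sample data.

```rust
/// Immutable definition: once built, it exposes only read accessors.
#[derive(Debug)]
struct Instruction {
    name: String,
    operands_in: Vec<String>,
    operands_out: Vec<String>,
    can_trap: bool,
}

impl Instruction {
    fn name(&self) -> &str { &self.name }
    fn operands_in(&self) -> &[String] { &self.operands_in }
    fn operands_out(&self) -> &[String] { &self.operands_out }
    fn can_trap(&self) -> bool { self.can_trap }
}

/// Mutable builder; `build()` consumes it and yields an `Instruction`.
struct InstructionBuilder {
    name: String,
    operands_in: Vec<String>,
    operands_out: Vec<String>,
    can_trap: bool,
}

impl InstructionBuilder {
    fn new(name: &str) -> Self {
        InstructionBuilder {
            name: name.to_string(),
            operands_in: Vec::new(),
            operands_out: Vec::new(),
            can_trap: false,
        }
    }
    fn operands_in(mut self, ops: &[&str]) -> Self {
        self.operands_in = ops.iter().map(|s| s.to_string()).collect();
        self
    }
    fn operands_out(mut self, ops: &[&str]) -> Self {
        self.operands_out = ops.iter().map(|s| s.to_string()).collect();
        self
    }
    fn can_trap(mut self, yes: bool) -> Self {
        self.can_trap = yes;
        self
    }
    fn build(self) -> Instruction {
        Instruction {
            name: self.name,
            operands_in: self.operands_in,
            operands_out: self.operands_out,
            can_trap: self.can_trap,
        }
    }
}

fn main() {
    let udiv = InstructionBuilder::new("udiv")
        .operands_in(&["x", "y"])
        .operands_out(&["a"])
        .can_trap(true)
        .build();
    assert_eq!(udiv.name(), "udiv");
    assert_eq!(udiv.operands_in().len(), 2);
    assert_eq!(udiv.operands_out()[0], "a");
    assert!(udiv.can_trap());
}
```

Optional properties get defaults in `new()`, so each definition only spells out what differs, which is one way such a builder can make definitions easier to write.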
This also makes the definition of instructions easier, and it paves the
way for defining immediate variants.
This is also https://bugzilla.mozilla.org/show_bug.cgi?id=1552737.
Cranelift currently has a tendency to create redundant copies (self-copies) of
values from a stack slot back to the same stack slot. This generates a
pointless load and store and an unnecessary register use. The copies are
created by `visit_inst` in regalloc/reload.rs. They appear to occur mostly,
but not exclusively, at loop heads. It's unclear why this happens.
This patch adds a special case to `visit_inst` to find such copies. They are
converted into a new instruction, `copy_nop`, which takes and produces the
same SSA names, so as not to break any of the SSA invariants, but which has a
zero-length encoding, hence removing the copy at emission time.
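A hedged sketch of the rewrite described above, over a toy instruction list. `Loc`, `Inst`, `remove_self_copies` and `encoded_len` are hypothetical and far simpler than Cranelift's data structures, and the 8-byte size for a real copy is invented for illustration. The point is that the `CopyNop` keeps the same SSA names (`dst`, `src`) as the `Copy` it replaces, so nothing else in the function changes, while its encoding is zero bytes long.

```rust
/// Hypothetical location of an SSA value after register allocation.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Loc { Reg(u8), Stack(u32) }

/// Hypothetical, heavily simplified instruction: `dst = copy src`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Inst {
    Copy { dst: u32, src: u32, dst_loc: Loc, src_loc: Loc },
    /// Same SSA names as the `Copy` it replaces, but emits no bytes.
    CopyNop { dst: u32, src: u32, slot: u32 },
}

/// Rewrite every stack-slot-to-same-stack-slot `Copy` into a `CopyNop`.
/// Returns how many instructions were rewritten.
fn remove_self_copies(insts: &mut [Inst]) -> usize {
    let mut rewritten = 0;
    for inst in insts.iter_mut() {
        if let Inst::Copy { dst, src, dst_loc: Loc::Stack(a), src_loc: Loc::Stack(b) } = *inst {
            if a == b {
                *inst = Inst::CopyNop { dst, src, slot: a };
                rewritten += 1;
            }
        }
    }
    rewritten
}

/// Illustrative sizes: a real stack-to-stack copy needs a load and a store
/// (the byte count is made up); `CopyNop` is zero-length.
fn encoded_len(inst: &Inst) -> usize {
    match inst {
        Inst::Copy { .. } => 8,
        Inst::CopyNop { .. } => 0,
    }
}

fn main() {
    let mut insts = [
        // v2 = copy v1, both in ss0: a redundant self-copy.
        Inst::Copy { dst: 2, src: 1, dst_loc: Loc::Stack(0), src_loc: Loc::Stack(0) },
        // v4 = copy v3, ss1 -> ss2: a genuine copy, left alone.
        Inst::Copy { dst: 4, src: 3, dst_loc: Loc::Stack(2), src_loc: Loc::Stack(1) },
    ];
    assert_eq!(remove_self_copies(&mut insts), 1);
    assert_eq!(insts[0], Inst::CopyNop { dst: 2, src: 1, slot: 0 });
    let total: usize = insts.iter().map(encoded_len).sum();
    assert_eq!(total, 8);
}
```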
`copy_nop`'s source and destination operands must be stack slots and, of
course, the *same* stack slot. The verifier has been enhanced to check this,
since misuse of `copy_nop` will likely lead to hard-to-find incorrect-code bugs.
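The verifier rule can be sketched as a single check on the two operand locations. `Loc` and `verify_copy_nop` are hypothetical names and the error strings are invented; Cranelift's real verifier and its messages will differ.

```rust
/// Hypothetical location of an SSA value after register allocation.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Loc { Reg(u8), Stack(u32) }

/// Hypothetical verifier check for `copy_nop`: both operands must live in
/// stack slots, and it must be the *same* stack slot.
fn verify_copy_nop(src_loc: Loc, dst_loc: Loc) -> Result<(), String> {
    match (src_loc, dst_loc) {
        (Loc::Stack(a), Loc::Stack(b)) if a == b => Ok(()),
        (Loc::Stack(a), Loc::Stack(b)) => Err(format!(
            "copy_nop: source ss{} and destination ss{} differ", a, b)),
        _ => Err(format!(
            "copy_nop: operands must be stack slots, got {:?} -> {:?}", src_loc, dst_loc)),
    }
}

fn main() {
    assert!(verify_copy_nop(Loc::Stack(3), Loc::Stack(3)).is_ok());
    assert!(verify_copy_nop(Loc::Stack(3), Loc::Stack(4)).is_err());
    assert!(verify_copy_nop(Loc::Reg(0), Loc::Stack(3)).is_err());
    if let Err(msg) = verify_copy_nop(Loc::Stack(1), Loc::Stack(2)) {
        println!("{}", msg);
    }
}
```

Rejecting the instruction in the verifier turns a silent miscompile (a dropped copy between two different slots) into an immediate, attributable error.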
Attempts were made to write a standalone .clif test case, but these failed
because the .clif parser appears to accept, but ignore, location hints that
are stack slots. So it's impossible to write, in clif, the exact form of
`copy` instruction that triggers the transformation.
This adds test cases to ensure, to a reasonably high degree of certainty, that
the magic-number generators `magic_u32`, `magic_s32`, `magic_u64` and
`magic_s64` work correctly. This is done by iterating through a large number
of `(n, d)` pairs, generating the magic numbers for `d`, interpreting the
magic numbers so as to perform the division, and comparing against the result
produced directly by the hardware. The distribution of numbers is arranged so
that particular emphasis is given to corner cases (the range ends and
midpoints), but also so that there is at least some coverage for values away
from those areas. In total, 50,148,000 tests are performed.
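A scaled-down sketch of this testing strategy. Cranelift's `magic_u32` (which produces a multiplier, a shift amount and an add flag) is not reproduced here. As a hypothetical stand-in, `magic_u32_roundup` uses a different, simpler technique: the constant c = ceil(2^64 / d), for which (c * n) >> 64 equals n / d for every 32-bit n and every non-zero 32-bit d. The generator mirrors the description: values clustered at the range ends and the midpoint, plus pseudo-random values from a fixed-seed xorshift. The pair count is far smaller than 50,148,000.

```rust
/// Hypothetical stand-in magic-number generator (not Cranelift's `magic_u32`):
/// c = ceil(2^64 / d), held in a u128 because c can be as large as 2^64.
fn magic_u32_roundup(d: u32) -> u128 {
    assert!(d != 0, "division by zero");
    let d = d as u128;
    ((1u128 << 64) + d - 1) / d
}

/// Interpret the magic number to perform the division: n / d == (c * n) >> 64.
fn div_by_magic(n: u32, c: u128) -> u32 {
    ((c * n as u128) >> 64) as u32
}

/// Values emphasising the range ends and the midpoint, plus some
/// pseudo-random values elsewhere (xorshift32 with a fixed seed).
fn interesting_u32s() -> Vec<u32> {
    let mut v = Vec::new();
    let mid = u32::MAX / 2;
    for k in 0..64u32 {
        v.push(k);                // low end
        v.push(u32::MAX - k);     // high end
        v.push(mid - 32 + k);     // around the midpoint
    }
    let mut x: u32 = 0x1234_5678;
    for _ in 0..200 {
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        v.push(x);
    }
    v
}

fn main() {
    let values = interesting_u32s();
    let mut checked = 0u64;
    for &d in values.iter().filter(|&&d| d != 0) {
        let c = magic_u32_roundup(d);
        for &n in &values {
            // Compare the magic-number division against the hardware divide.
            assert_eq!(div_by_magic(n, c), n / d, "n = {}, d = {}", n, d);
            checked += 1;
        }
    }
    println!("{} (n, d) pairs checked", checked);
}
```

Pairing every interesting divisor with every interesting dividend is what gives the corner cases their extra weight: each range end meets each midpoint value, in both roles.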