PR #773 detects, at reload time, `copy` instructions that copy a value from a
stack slot back to the same stack slot. It replaces them with `copy_nop`
instructions that have a null encoding (hence producing no code).
For x86_64, `copy_nop` encodings for the types I64, I32, F64 and F32 are
provided. Unfortunately, the code that detects the redundant copy doesn't
check the type of the copied value, so it risks creating a `copy_nop`
instruction that cannot be encoded (which is different from saying it has a
null encoding).
This patch:
* Expands the x86_64 set of `copy_nop` encodings to: I64, I32, I16, I8, F64 and F32.
* Adds the same encodings for x86_32, rv64 and rv32.
* In `visit_inst()` in `reload.rs`, checks the type of the copied value accordingly.
* Adds comments explaining the above.
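
The check described above can be sketched as follows. This is a minimal illustration, not the code in `reload.rs`: the `Type` and `Loc` enums and both function names are hypothetical stand-ins, and the list of encodable types is taken from the bullet list above.

```rust
/// Hypothetical stand-in for the value types involved (not Cranelift's real `Type`).
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Type { I8, I16, I32, I64, F32, F64, B1, I32X4 }

/// Hypothetical stand-in for where a value lives.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Loc { Reg(u16), Stack(u32) }

/// True for the types that this patch gives a `copy_nop` encoding.
fn has_copy_nop_encoding(ty: Type) -> bool {
    matches!(
        ty,
        Type::I64 | Type::I32 | Type::I16 | Type::I8 | Type::F64 | Type::F32
    )
}

/// A copy may become `copy_nop` only if it is a same-slot stack copy
/// AND the copied value's type has a `copy_nop` encoding.
fn can_use_copy_nop(ty: Type, src: Loc, dst: Loc) -> bool {
    match (src, dst) {
        (Loc::Stack(a), Loc::Stack(b)) => a == b && has_copy_nop_encoding(ty),
        _ => false,
    }
}
```

Under these assumptions, a same-slot copy of a `B1` value is left as an ordinary `copy`, because no `copy_nop` encoding exists for that type.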
The result of the emitter is a vector of bytes holding machine code,
jump tables, and (in the future) other read-only data. Some clients,
notably Firefox's Wasm compiler, need to separate the machine code
from the data in order to insert more code directly after the code
generated by Cranelift.
To make such separation possible, we record more information about the
emitted bytes: the sizes of each of the sections of code, jump tables,
and read-only data, as well as the locations within the code that
reference (PC-relatively) the jump tables and read-only data.
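
To illustrate how a client might use this information, here is a rough sketch. The `SectionSizes` struct and both functions are hypothetical names, not Cranelift's API, and the patching step assumes each recorded reference is a 32-bit little-endian PC-relative displacement, which the text above does not specify.

```rust
/// Hypothetical record of the section sizes, in emission order.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct SectionSizes {
    code: usize,
    jump_tables: usize,
    rodata: usize,
}

impl SectionSizes {
    fn total(&self) -> usize {
        self.code + self.jump_tables + self.rodata
    }
}

/// Split the emitted bytes into (code, jump tables, read-only data).
/// Returns `None` if the buffer length does not match the recorded sizes.
fn split_sections<'a>(
    bytes: &'a [u8],
    sizes: &SectionSizes,
) -> Option<(&'a [u8], &'a [u8], &'a [u8])> {
    if bytes.len() != sizes.total() {
        return None;
    }
    let (code, rest) = bytes.split_at(sizes.code);
    let (jump_tables, rodata) = rest.split_at(sizes.jump_tables);
    Some((code, jump_tables, rodata))
}

/// If the data is moved `delta` bytes further from the code (for example,
/// because extra code was inserted in between), every PC-relative reference
/// must grow by `delta`.
/// ASSUMPTION: references are 32-bit little-endian displacements.
fn rebase_pc_relative(code: &mut [u8], ref_offsets: &[usize], delta: i32) {
    for &off in ref_offsets {
        let mut buf = [0u8; 4];
        buf.copy_from_slice(&code[off..off + 4]);
        let patched = i32::from_le_bytes(buf).wrapping_add(delta);
        code[off..off + 4].copy_from_slice(&patched.to_le_bytes());
    }
}
```

The recorded reference locations are what make the second step possible: without them, a client could not tell which bytes in the code section need adjusting once the data is relocated.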
Also:
- make sure to apply diversions when determining offsets for code
relaxation.
- select the smallest encoding when selecting a relaxed branch
instruction.
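
Selecting the smallest encoding for a relaxed branch can be sketched as below. The `BranchEncoding` struct and function names are hypothetical, and the model (a byte size plus a signed displacement width) is a simplification of what a real encoding table would carry.

```rust
/// Hypothetical description of one candidate branch encoding.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct BranchEncoding {
    name: &'static str,
    /// Size of the encoded instruction in bytes.
    size: u8,
    /// Width of the signed displacement field (assumed to be 1..=63).
    disp_bits: u8,
}

/// Does `disp` fit in a signed field of `bits` bits?
fn in_range(disp: i64, bits: u8) -> bool {
    let min = -(1i64 << (bits - 1));
    let max = (1i64 << (bits - 1)) - 1;
    disp >= min && disp <= max
}

/// Among the encodings that can reach `disp`, pick the one with the fewest bytes.
fn smallest_encoding(candidates: &[BranchEncoding], disp: i64) -> Option<BranchEncoding> {
    candidates
        .iter()
        .copied()
        .filter(|e| in_range(disp, e.disp_bits))
        .min_by_key(|e| e.size)
}
```

With the x86 pair `jmp rel8` (2 bytes) and `jmp rel32` (5 bytes) as candidates, a displacement of 100 selects the short form regardless of the order in which the candidates are listed.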
* Use single index for param register allocation for windows callconv (#691)
The registers used depend entirely on the parameter index (1st, 2nd, 3rd, 4th, ... param),
and we cannot shift unused registers to other indexes if they are not designated for
use at that parameter index.
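
As an illustration of the single-index rule, here is a simplified sketch. The types and the function name are hypothetical and this is not Cranelift's ABI code; the register tables follow the Windows x64 convention (RCX, RDX, R8, R9 for integers and XMM0 to XMM3 for floats), and shadow space and stack offsets are ignored.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ArgClass { Int, Float }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ArgLoc {
    Reg(&'static str),
    /// Index of the stack argument slot (simplified; no byte offsets).
    Stack(usize),
}

const INT_REGS: [&str; 4] = ["rcx", "rdx", "r8", "r9"];
const FLOAT_REGS: [&str; 4] = ["xmm0", "xmm1", "xmm2", "xmm3"];

/// Assign locations using ONE index shared by integer and float parameters:
/// the i-th parameter can only use the i-th register of its class.
fn assign_windows_fastcall(args: &[ArgClass]) -> Vec<ArgLoc> {
    args.iter()
        .enumerate()
        .map(|(i, class)| {
            if i < 4 {
                ArgLoc::Reg(match class {
                    ArgClass::Int => INT_REGS[i],
                    ArgClass::Float => FLOAT_REGS[i],
                })
            } else {
                ArgLoc::Stack(i - 4)
            }
        })
        .collect()
}
```

For the signature `(int, float, int)`, this yields `rcx`, `xmm1`, `r8`: the float takes `xmm1` rather than `xmm0`, and the second integer takes `r8` rather than `rdx`, because each slot is tied to its position.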
The code previously used the following condition to decide that a block
hadn't been visited yet: either dest_offset is non-0 or the block isn't
the entry block. Unfortunately, this didn't work when the first block
was non-empty but generated no code at all.
Since the original code would do at least one pass over the entire code,
the first pass that determines initial EBB offsets is done separately,
without considering branch relaxation. This ensures that all EBBs have
been visited and have correct initial offsets, and doesn't require a
special check to know whether an EBB has been visited or not.
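
The separate first pass can be sketched like this. It is a simplified model, not the relaxation code itself: the function name is hypothetical and each EBB is reduced to the number of bytes it emits before any branch is relaxed.

```rust
/// First pass: give every EBB its initial offset by accumulating sizes,
/// without considering branch relaxation.
/// `block_sizes[i]` is the number of code bytes EBB `i` emits (hypothetical input).
fn initial_offsets(block_sizes: &[u32]) -> Vec<u32> {
    let mut offsets = Vec::with_capacity(block_sizes.len());
    let mut offset = 0u32;
    for &size in block_sizes {
        offsets.push(offset);
        offset += size;
    }
    offsets
}
```

If the first EBB emits no code, as in sizes `[0, 5, 3]`, the second EBB also starts at offset 0. An offset of 0 therefore cannot serve as a "not yet visited" marker, which is why computing all offsets up front removes the need for such a check.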