This is also https://bugzilla.mozilla.org/show_bug.cgi?id=1552737.
Cranelift currently has a tendency to create redundant copies (self-copies) of
values from a stack slot back to the same stack slot. This generates a
pointless load and store and an unnecessary register use. The copies are
created by `visit_inst` in regalloc/reload.rs. They appear to occur mostly,
but not exclusively, at loop heads. It's unclear why this happens.
This patch adds a special case to `visit_inst` to find such copies. They are
converted into a new instruction, `copy_nop`, which takes and produces the
same SSA names, so as not to break any of the SSA invariants, but which has a
zero-length encoding, hence removing the copy at emission time.
`copy_nop`s source and destination operands must be stack slots and of course
the *same* stack slot. The verifier has been enhanced to check this, since
misuse of `copy_nop` will likely lead to hard-to-find incorrect-code bugs.
Attempts were made to write a standalone .clif test case. But these failed
because it appears the .clif parser accepts but ignores location hints that
are stack slots. So it's impossible to write, in clif, the exact form of
`copy` instruction that triggers the transformation.
This adds test cases to ensure, to a reasonably high degree of certainty, that
the magic-number generators `magic_u32`, `magic_s32`, `magic_u64` and
`magic_s64` work correctly. This is done by iterating through a large number
of `(n, d)` pairs, generating the magic numbers for `d`, interpreting the
magic numbers so as to perform the division, and comparing against the result
produced directly by the hardware. The distribution of numbers is arranged so
that particular emphasis is given to corner cases -- the range ends and
midpoints -- but also so that there is at least some cover for values away
from those areas. In total 50,148,000 tests are performed.
Also:
- make sure to apply diversions when determining offsets for code
relaxation.
- select the smallest encoding when selecting a relaxed branch
instruction.
- Both the `wasm` and `compile` commands get this new subcommand, and it defaults to false. This means that test runs with `wasm` can request disassembly (the main reason I am doing this) while test runs with `compile` now must request it, this changes current behavior.
- Switch to using context.compile_and_emit directly, and make the reloc and trap printers just accumulate output, not print it. This allows us to factor the printing code into the disasm module.
* Use single index for param register allocation for windows callconv (#691)
The used registers depend entirely on the parameter index (1st, 2nd, 3rd, 4th, ... param)
and we cannot shift unused registers to other indexes, if they are not designated for
the use for that parameter index.
This was previously using the following condition to decide that a block
hadn't been visited yet: either dest_offset is non-0 or the block isn't
the entry block. Unfortunately, this didn't work when the first block
would be non-empty but wouldn't generate code at all.
Since the original code would do at least one pass over the entire code,
the first pass that determines initial EBB offsets is done separately,
without considering branch relaxation. This ensures that all EBBs have
been visited and have correct initial offsets, and doesn't require a
special check to know whether an EBB has been visited or not.