A situation could arise in which can_add_var would return true
even though a variable had already been added for the given register.
For instance, when we have a reassignment (because of a fixed register
input requirement) and a fixed input conflict on the same fixed
register, this register will not be available in the regs_in set after
inputs_done (because of the fixed input conflict diversion) but will
have its own variable.
Pushing on the `val_stack` vector is CL's biggest source of calls to
malloc/realloc/free, by some margin. It accounts for about 27.7% of all heap
blocks allocated when compiling wasm_lua_binarytrees. This change removes
pretty much all dynamic allocation by changing to a SmallVec<[Value; 8]>
instead. A fixed size of 4 gets all the gains to be had, in testing, so 8
gives some safety margin and is harmless from a stack-use perspective: 8
Values will occupy 32 bytes.
As a bonus, this change also reduces the compiler's dynamic instruction count
by about 0.5%.
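
The size arithmetic above can be checked with a small sketch. It assumes a `Value` that is a plain 32-bit index; the `Value` newtype and the `inline_bytes` helper below are hypothetical stand-ins for illustration, not the compiler's own definitions, and the sketch measures only the element storage, not any bookkeeping a small-vector type adds on top.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for the compiler's `Value`: an opaque 32-bit index.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Value(u32);

/// Bytes occupied by the inline element storage of an `N`-element buffer.
/// (Illustrative helper; excludes any length/discriminant overhead.)
pub const fn inline_bytes<const N: usize>() -> usize {
    size_of::<[Value; N]>()
}
```

Under that assumption, `inline_bytes::<8>()` is 32 and `inline_bytes::<4>()` is 16, so doubling the inline capacity from 4 to 8 costs 16 extra bytes of stack per vector.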
It was implemented this way before to avoid borrow-checking issues,
where self would be both mutably borrowed (because of the solver) and
immutably borrowed (because of the ABI parameters list). This is worked
around by adding a local AbiParams struct which contains a summary of
the information that's needed by program_input_abi, making it possible to
retrieve the ABI params within the method's body itself.
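
A minimal sketch of this pattern follows. Only the names `AbiParams` and `program_input_abi` come from the description above; the `Solver`, `AbiParam`, and `Context` types, their fields, and `add_constraint` are hypothetical and far simpler than the real ones. The point is that the summary is an owned copy, so the immutable borrow of `self` ends before the solver is mutated.

```rust
/// Hypothetical solver; it needs exclusive access while constraints are added.
#[derive(Default)]
pub struct Solver {
    /// (argument index, register number) pairs recorded so far.
    pub fixed_inputs: Vec<(usize, u8)>,
}

/// Hypothetical ABI parameter: passed in a register or on the stack.
#[derive(Clone, Copy)]
pub enum AbiParam {
    Reg(u8),
    Stack(i32),
}

pub struct Context {
    pub solver: Solver,
    pub abi_params: Vec<AbiParam>,
}

/// Local summary holding only what the method needs, copied out of `Context`.
struct AbiParams {
    regs: Vec<Option<u8>>,
}

impl Context {
    /// Takes `&mut self`, so a caller could not also pass `&self.abi_params`.
    pub fn program_input_abi(&mut self) {
        // The immutable borrow of `self.abi_params` ends once `collect` returns.
        let summary = AbiParams {
            regs: self
                .abi_params
                .iter()
                .map(|p| match p {
                    AbiParam::Reg(r) => Some(*r),
                    AbiParam::Stack(_) => None,
                })
                .collect(),
        };
        // Iterating the owned summary leaves `self` free to be borrowed mutably.
        for (arg, reg) in summary.regs.iter().enumerate() {
            if let Some(reg) = reg {
                self.add_constraint(arg, *reg);
            }
        }
    }

    fn add_constraint(&mut self, arg: usize, reg: u8) {
        self.solver.fixed_inputs.push((arg, reg));
    }
}
```

Iterating `self.abi_params` directly in the loop would not compile, because `add_constraint` needs `&mut self` while the iterator still holds `&self.abi_params`.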
When using basic blocks, instructions cannot be added in between the jump instructions that end a basic block. These changes create extra basic blocks so that space is available for spills and register moves where they are expected.
- Add resumable_trap, safepoint, isnull, and null instructions
- Add Stackmap struct and StackmapSink trait
Co-authored-by: Mir Ahmed <mirahmed753@gmail.com>
Co-authored-by: Dan Gohman <sunfish@mozilla.com>
PR #773 detects, at reload time, `copy` instructions that copy a value from a
stack slot back to the same stack slot. It replaces them with `copy_nop` instructions
that have a null encoding (hence producing no code).
For x86_64, `copy_nop` encodings for the types I64, I32, F64 and F32 are
provided. Unfortunately the code that detects the redundant copy doesn't
check the type of the copied value, hence leaving itself open to the danger of
creating a `copy_nop` instruction that cannot be encoded (which is different from
saying it has a null encoding).
This patch:
* Expands the x86_64 set of `copy_nop` encodings to: I64, I32, I16, I8, F64 and F32.
* Adds the same encodings for x86_32, rv64 and rv32.
* In `visit_inst()` in `reload.rs`, checks the type of the copied value accordingly.
* Adds comments explaining the above.
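
The type-aware check described in the list above might look roughly like the sketch below. All names here (`Type`, `ValueLoc`, `has_copy_nop_encoding`, `can_become_copy_nop`) are hypothetical and simplified, not the actual code in `reload.rs`. `B1` is included only as an example of a type outside the supported set.

```rust
/// Simplified, hypothetical value types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Type { I8, I16, I32, I64, F32, F64, B1 }

/// Simplified, hypothetical value locations.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ValueLoc { Reg(u16), Stack(u32) }

/// True for the types that this patch gives a `copy_nop` encoding.
pub fn has_copy_nop_encoding(ty: Type) -> bool {
    matches!(
        ty,
        Type::I64 | Type::I32 | Type::I16 | Type::I8 | Type::F64 | Type::F32
    )
}

/// A `copy` may be rewritten to `copy_nop` only if it is a self-copy of a
/// stack slot AND the copied value's type has an encoding.
pub fn can_become_copy_nop(src: ValueLoc, dst: ValueLoc, ty: Type) -> bool {
    match (src, dst) {
        (ValueLoc::Stack(a), ValueLoc::Stack(b)) => a == b && has_copy_nop_encoding(ty),
        _ => false,
    }
}
```

With the type test in place, a self-copy of an unsupported type is simply left as an ordinary `copy` rather than being turned into an instruction that cannot be encoded.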
This is also https://bugzilla.mozilla.org/show_bug.cgi?id=1552737.
Cranelift currently has a tendency to create redundant copies (self-copies) of
values from a stack slot back to the same stack slot. This generates a
pointless load and store and an unnecessary register use. The copies are
created by `visit_inst` in regalloc/reload.rs. They appear to occur mostly,
but not exclusively, at loop heads. It's unclear why this happens.
This patch adds a special case to `visit_inst` to find such copies. They are
converted into a new instruction, `copy_nop`, which takes and produces the
same SSA names, so as not to break any of the SSA invariants, but which has a
zero-length encoding, hence removing the copy at emission time.
`copy_nop`'s source and destination operands must be stack slots, and of course
the *same* stack slot. The verifier has been enhanced to check this, since
misuse of `copy_nop` will likely lead to hard-to-find incorrect-code bugs.
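
As an illustration of the rule the verifier enforces, here is a sketch with hypothetical names (`ValueLoc`, `verify_copy_nop`) and invented error messages; the real verifier's types and diagnostics differ.

```rust
/// Simplified, hypothetical value locations.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ValueLoc { Unassigned, Reg(u16), Stack(u32) }

/// Accepts a `copy_nop` only when source and destination are the same stack slot.
pub fn verify_copy_nop(src: ValueLoc, dst: ValueLoc) -> Result<(), String> {
    match (src, dst) {
        (ValueLoc::Stack(s), ValueLoc::Stack(d)) if s == d => Ok(()),
        (ValueLoc::Stack(s), ValueLoc::Stack(d)) => Err(format!(
            "copy_nop must use the same stack slot, got ss{} and ss{}",
            s, d
        )),
        _ => Err("copy_nop source and destination must both be stack slots".to_string()),
    }
}
```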
Attempts were made to write a standalone .clif test case. But these failed
because it appears the .clif parser accepts but ignores location hints that
are stack slots. So it's impossible to write, in clif, the exact form of
`copy` instruction that triggers the transformation.