The control flow graph does not guarantee any particular ordering for
its successor lists, and the post-order we are computing for building
the dominator tree needs to be "split-invariant".
See #146 for details.
- Discover EBB successors directly from the EBB instruction sequence to
guarantee that the post-order we compute is canonical/split-invariant.
- Use an alternative graph DFS algorithm which doesn't require indexing
into a slice of successors.
This changes cfg_postorder in some cases because the edge pruning when
converting the (DAG) CFG to a tree for the DFT is different.
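
A DFS that avoids indexing into a successor slice can be sketched as below.
This is a minimal illustration, not the code from this change: `Block`, `Step`
and the `succs` callback are hypothetical names, and `succs` is assumed to
yield successors in a canonical order (for example, instruction order).

```rust
use std::collections::HashSet;

/// Hypothetical block identifier standing in for an EBB reference.
type Block = usize;

/// Work items for the explicit DFS stack.
enum Step {
    /// First encounter: expand the block's successors.
    Visit(Block),
    /// All successors pushed above this entry are done: emit the block.
    Finish(Block),
}

/// Compute a post-order of the blocks reachable from `entry`.
///
/// Successors are consumed through an iterator, so no per-block index into
/// a successor list has to be stored on the stack.
fn postorder<F, I>(entry: Block, mut succs: F) -> Vec<Block>
where
    F: FnMut(Block) -> I,
    I: IntoIterator<Item = Block>,
{
    let mut seen = HashSet::new();
    let mut order = Vec::new();
    let mut stack = vec![Step::Visit(entry)];

    while let Some(step) = stack.pop() {
        match step {
            Step::Visit(b) => {
                // A block may be pushed more than once; expand it only once.
                if seen.insert(b) {
                    stack.push(Step::Finish(b));
                    for s in succs(b) {
                        if !seen.contains(&s) {
                            stack.push(Step::Visit(s));
                        }
                    }
                }
            }
            Step::Finish(b) => order.push(b),
        }
    }
    order
}
```

Because the stack is LIFO, the last successor is explored first. Which edges
get pruned when a block is reachable along several paths depends on this
exploration order, which is why the resulting post-order can differ between
DFS formulations.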
These tests were only using "test compile" because it doesn't require
any filecheck directives to be present, so just stop requiring filecheck
directives for "test regalloc" and other filecheck-based test drivers.
Fixes #178.
When an instruction with a fixed output operand defines a globally live
SSA value, we need to check if the fixed register is available in the
`regs.global` set of registers that can be used across EBB boundaries.
If the fixed output register is not available in regs.global, set the
replace_global_defines flag so the output operands are rewritten as
local values.
Fixes #175.
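
The availability check described above can be sketched as follows. The
`RegSet` bit mask, the `AvailableRegs` struct and the function name are
hypothetical simplifications; only the `regs.global` set and the
replace_global_defines flag come from the message itself.

```rust
/// Hypothetical register set: bit `n` is set when register `n` is available.
/// Assumes fewer than 32 registers.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct RegSet(u32);

impl RegSet {
    fn is_avail(self, reg: u8) -> bool {
        self.0 & (1u32 << reg) != 0
    }
}

/// Hypothetical pair of availability sets tracked by the allocator.
struct AvailableRegs {
    /// Registers free for values local to the current EBB.
    input: RegSet,
    /// Registers free for values that are live across EBB boundaries.
    global: RegSet,
}

/// Decide whether the global defines must be rewritten as local values.
///
/// Returns true when a fixed output register defines a globally live value
/// but that register is not available in the global set.
fn must_replace_global_defines(regs: &AvailableRegs, fixed_reg: u8, is_global: bool) -> bool {
    is_global && !regs.global.is_avail(fixed_reg)
}
```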
The Intel division instructions have fixed input operands that are
clobbered by fixed output operands, so the value passed as an input will
be clobbered just like a tied operand.
The FixedTied operand constraint is used to indicate a fixed input
operand that has a corresponding output operand with the same fixed
register.
Teach the spiller to treat a FixedTied operand the same as a Tied
operand constraint and make sure that the input value is killed by the
instruction.
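
As an illustration of the rule, the sketch below models operand constraints
as an enum and groups the two tied kinds together. The variant payloads and
the `kills_input` helper are assumptions made for this example, not the
actual definitions.

```rust
/// Hypothetical model of operand constraint kinds.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ConstraintKind {
    /// Any register in the operand's register class.
    Reg,
    /// A specific register, given here as a plain register number.
    FixedReg(u8),
    /// Input tied to the output operand with the given index.
    Tied(u8),
    /// Fixed input register that is also written by a fixed output.
    FixedTied(u8),
    /// The operand lives in a stack slot.
    Stack,
}

/// True when the instruction overwrites the register holding this input,
/// so the input value must be dead after the instruction.
fn kills_input(kind: ConstraintKind) -> bool {
    matches!(kind, ConstraintKind::Tied(_) | ConstraintKind::FixedTied(_))
}
```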
`stack_addr` and unrestricted loads and stores *can* be used with entirely
defined behavior. The sense in which they're not "safe" is only that it's
possible to misuse them. This subtlety wasn't captured in the definition
of "safe" here, so for now, just remove the definition so that it doesn't
cause confusion.
* Clarify undefined behavior and notrap.
Remove the "No undefined behavior" paragraph from the README. The other
paragraphs, specifically "Portable semantics" and
"Fast sandbox verification", describe Cretonne's goals in this area.
Define *addressable* and *accessible* memory so that trapping remains a
fully defined part of the semantics and there is a clear boundary around
undefined behavior. Use these terms to describe related constructs.
Add EBB parameter and EBB argument to the langref glossary to clarify
the distinction between formal EBB parameter values and arguments passed
to branches.
- Replace "ebb_arg" with "ebb_param" in function names that deal with
EBB parameters.
- Rename the ValueDef variants to Result and Param.
- A bunch of other small langref fixes.
No functional changes intended.
This allows GVN to avoid hoisting them. These will be too coarse for
things that want more precise dependence information, but we can
work that out when we build such things.
Add integer and floating comparison instructions that return CPU flags:
ifcmp, ifcmp_imm, and ffcmp.
Add conditional branch instructions that check CPU flags: brif, brff.
Add instructions that check a condition in the CPU flags and return a
b1: trueif, trueff.
These two value types represent the state of CPU flags after an integer
comparison and a floating point comparison respectively.
Instructions using these types TBD.
The value types are now classified into three groups:
1. Lane types are scalar types that can also be used to form vectors.
2. Vector types are 2-256 copies of a lane type.
3. Special types. This is where the CPU flag types will go.
The special types can't be used to form vectors.
Change the numbering scheme for value types to make room for the special
types and add `is_lane()` and `is_special()` classification methods.
The VOID type still has number 0, but it can no longer appear as a
vector lane. It classifies as special now.
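
One way such a numbering could work is sketched below. The concrete ranges,
the `IFLAGS` and `I32` constants, and the `by` and `lane_count` helpers are
assumptions chosen for illustration; only VOID being number 0 and the
`is_lane()`, `is_special()` and `is_vector()` method names come from the
messages here.

```rust
/// Hypothetical numbering:
///   0x00..=0x0f  special types (VOID is 0, CPU flag types follow)
///   0x10..=0x1f  lane types
///   0x20..=0x9f  vector types: lane number + (log2(lane count) << 4)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Type(u8);

const LANE_BASE: u8 = 0x10;
const VECTOR_BASE: u8 = 0x20;

impl Type {
    pub const VOID: Type = Type(0);
    /// Hypothetical name for a CPU flags type.
    pub const IFLAGS: Type = Type(1);
    pub const I32: Type = Type(LANE_BASE + 2);

    /// Special types cannot be used to form vectors.
    pub fn is_special(self) -> bool {
        self.0 < LANE_BASE
    }

    /// Lane types are scalar types that may appear as a vector lane.
    pub fn is_lane(self) -> bool {
        self.0 >= LANE_BASE && self.0 < VECTOR_BASE
    }

    pub fn is_vector(self) -> bool {
        self.0 >= VECTOR_BASE
    }

    /// Form a vector of `2^log2_lanes` copies of a lane type (2 to 256 lanes).
    /// Returns `None` for special types, vector types, or bad lane counts.
    pub fn by(self, log2_lanes: u8) -> Option<Type> {
        if self.is_lane() && (1..=8).contains(&log2_lanes) {
            Some(Type(self.0 + (log2_lanes << 4)))
        } else {
            None
        }
    }

    /// Number of lanes: 1 for non-vector types.
    pub fn lane_count(self) -> u16 {
        if self.is_vector() {
            1 << ((self.0 - LANE_BASE) >> 4)
        } else {
            1
        }
    }
}
```

With this layout the three classes are separated by simple range checks, and
VOID falls into the special range so it can never be used as a lane.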
The word "scalar" is a bit vague and tends to mean "non-vector". Since
we are about to add new CPU flag value types that can't appear as vector
lanes, make the distinction clear: LaneType represents value types that
can appear as a vector lane.
Also replace the Type::is_scalar() method with an is_vector() method.
Track allocatable registers both locally and globally: Add a second
AllocatableSet which tracks registers allocated to global values without
accounting for register diversions. Since diversions are only local to
an EBB, global values must be assigned un-diverted locations that don't
interfere.
Handle the third "global" interference domain in the constraint solver in
addition to the existing "input" and "output" domains.
Extend the solver error code to indicate when a global define just can't
be allocated because there are not enough available global registers.
Resolve this problem by replacing the instruction's global defines with
local defines that are copied into their global destinations
afterwards.
The register allocator can't handle branches with constrained register
operands, and the brz.b1/brnz.b1 instructions only have the t8jccd_abcd
recipe in 32-bit mode where no REX prefixes are possible.
This adds a worst case encoding for those cases where a b1 value lives
in a non-ABCD register.
This renames WasmRuntime to ModuleEnvironment, and makes several changes
to allow for more flexible compilation.
ModuleEnvironment no longer derives from FuncEnvironment, and no longer
has the `begin_translation` and `next_translation` functions, so that
independent `FuncEnvironment` instances can operate within the same
module.
Also, this obviates the rest of TranslationResult, as it moves processing
of function bodies into the environment. The DummyEnvironment implementation
gives an example of decoding the function bodies as they are parsed, but
other implementation strategies are now possible.
Also, redo how functions are named in the DummyRuntime. Use the FunctionName
field to just encode the wasm function index rather than trying to shoehorn
a printable name into it. And to make up for that, teach the wasm printer
to print export names as comments next to the function definitions.
This also makes the fields of DummyRuntime public, in preparation for
the DummyRuntime to have a more general-purpose debugging role, as well
as possibly to allow it to serve as a base for other implementations.
The register allocator doesn't even try to compile unreachable EBBs, so
any values defined in such blocks won't be assigned registers.
Since the dominator tree already has determined which EBBs are
reachable, we should just eliminate any unreachable blocks instead of
trying to do something with the dead code.
Note that this is not a "dead code elimination" pass which would also
remove individual instructions whose results are not used.
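
The idea can be sketched as a filter over the block layout. The `Block` type,
the `Vec` layout and the `is_reachable` callback are hypothetical stand-ins;
in the real pass the reachability answer would come from the dominator tree.

```rust
/// Hypothetical block identifier standing in for an EBB reference.
type Block = usize;

/// Drop every block for which `is_reachable` returns false.
///
/// The relative order of the remaining blocks is preserved. The removed
/// blocks are returned so the caller can clean up their instructions.
fn eliminate_unreachable(
    layout: &mut Vec<Block>,
    is_reachable: impl Fn(Block) -> bool,
) -> Vec<Block> {
    let mut removed = Vec::new();
    layout.retain(|&block| {
        let keep = is_reachable(block);
        if !keep {
            removed.push(block);
        }
        keep
    });
    removed
}
```

Whole blocks are removed here; instructions inside reachable blocks are left
alone even when their results are unused.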
- Create a new kind of stack slot: emergency_slot.
- Add a get_emergency_slot() method which finds a suitable emergency
slot given a list of slots already in use.
- Use emergency spill slots when schedule_moves needs them.
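
A simplified sketch of the lookup follows. The `StackSlots` container, the
size-based matching and the index-based signature are assumptions made for
this example and may not match the real `get_emergency_slot()`.

```rust
/// Hypothetical slot kinds; only the emergency kind matters here.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SlotKind {
    Spill,
    Emergency,
}

#[derive(Clone, Copy, Debug)]
struct Slot {
    kind: SlotKind,
    /// Size in bytes.
    size: u32,
}

#[derive(Default)]
struct StackSlots {
    slots: Vec<Slot>,
}

impl StackSlots {
    /// Append a slot and return its index.
    fn push(&mut self, kind: SlotKind, size: u32) -> usize {
        self.slots.push(Slot { kind, size });
        self.slots.len() - 1
    }

    /// Find an emergency slot of at least `size` bytes that is not listed
    /// in `in_use`, creating a new one when none is suitable.
    fn get_emergency_slot(&mut self, size: u32, in_use: &[usize]) -> usize {
        let found = self
            .slots
            .iter()
            .enumerate()
            .find(|(i, s)| {
                s.kind == SlotKind::Emergency && s.size >= size && !in_use.contains(i)
            })
            .map(|(i, _)| i);
        match found {
            Some(i) => i,
            None => self.push(SlotKind::Emergency, size),
        }
    }
}
```

Reusing a free emergency slot before creating a new one keeps the frame from
growing each time the move scheduler needs a temporary location.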