use a copysign-based approach for handling negative zero
format
refactor for better branch prediction
move copysign back to internal branch
format
fix
use abs instead of branches
better comments
reorder match arms for better branch prediction
Baldrdash's API requires that there is at most one result in a register,
across all the possible register classes: in particular, it's not
possible to return an i64 value in a register while returning a v128
value in another register.
This patch adds a notion of "remaining register values", so this constraint is
properly taken into account when choosing whether a return value may be
put into a register or not.
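A minimal sketch of the accounting (types and names here are hypothetical,
not the actual ABI code):
```
#[derive(Clone, Copy, Debug)]
enum Ty { I64, V128 }

#[derive(Debug)]
enum Loc { Reg(Ty), Stack(Ty) }

// Track how many register return slots remain (1 for Baldrdash) and
// fall back to the stack once they are exhausted.
fn assign_returns(rets: &[Ty], mut remaining_reg_vals: usize) -> Vec<Loc> {
    rets.iter()
        .map(|&ty| {
            if remaining_reg_vals > 0 {
                remaining_reg_vals -= 1;
                Loc::Reg(ty)
            } else {
                Loc::Stack(ty)
            }
        })
        .collect()
}

fn main() {
    // (i64, v128): only the first result may live in a register.
    println!("{:?}", assign_returns(&[Ty::I64, Ty::V128], 1));
}
```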
* entity: Fix typo in generated documentation
The same function documentation was used for `from_u32()` and `as_u32()`,
even though their behaviour differs.
Instead of always being relative to CARGO_MANIFEST_DIR, each use site is
now responsible for putting either that variable or another one (set by a
build.rs) at the start of witx paths.
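For example, a build.rs could export such a variable like this (the
variable name WITX_ROOT is purely illustrative):
```
// build.rs sketch: export a path that use sites can put at the start of
// their witx paths, instead of implicitly resolving everything relative
// to CARGO_MANIFEST_DIR.
fn main() {
    let root = std::env::var("CARGO_MANIFEST_DIR").unwrap();
    println!("cargo:rustc-env=WITX_ROOT={}/spec", root);
}
```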
The logic for generation of shifts-by-immediate was not quite right. The result was that even
shifts by an amount known at compile time were being done by moving the shift immediate into %cl
and then doing a variable shift by %cl. The effect is worse than it sounds, because all of
those shift constants are small and often used in multiple places, so they were GVN'd up and
often ended up at the entry block of the function. Hence these were connected to the use points
by long live ranges which got spilled. So all in all, most of the win here comes from avoiding
spilling.
The problem was caused by this line, in the `Opcode::Ishl | Opcode::Ushr ..` case:
```
let (count, rhs) = if let Some(cst) = ctx.get_constant(inputs[1].insn) {
```
`inputs[]` appears to refer to this CLIF instruction's inputs, and, bizarrely, the
`inputs[i].insn` fields all refer to the instruction (the shift) itself. Hence
`ctx.get_constant(inputs[1].insn)` asks "does this shift instruction produce a constant?",
to which the answer is always "no", so the shift-by-unknown-amount code was always
generated. The fix here is to change that expression to
```
let (count, rhs) = if let Some(cst) = ctx.get_input(insn, 1).constant {
```
`get_input`'s result conveniently includes a `constant` field of type `Option<u64>`, so we just
use that instead.
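A self-contained model of the corrected decision (the real lowering code
differs in detail):
```
// Mimics the shape described above: the shift's second input carries a
// `constant` field when its value is known at compile time.
struct Input { constant: Option<u64> }

enum ShiftKind { ByImmediate(u8), ByCl }

fn select_shift(rhs: Input, ty_bits: u64) -> ShiftKind {
    match rhs.constant {
        // Known amount: mask to the operand width (as hardware shifts
        // do) and emit a shift-by-immediate, avoiding %cl entirely.
        Some(c) => ShiftKind::ByImmediate((c & (ty_bits - 1)) as u8),
        // Unknown amount: fall back to the variable shift via %cl.
        None => ShiftKind::ByCl,
    }
}
```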
The fundamental problem is that the target distance of jump-like operations may change during the DWARF expression translation process: an intervening DW_OP_deref, for example, expands to about 10 bytes.
So the jumps must be relocated. We approach this task by inserting artificial LandingPad markers (new CompiledExpressionParts constructors) into the parsed vector at actual Jump targets.
LandingPads are identified by JumpTargetMarker tokens which are generated on the fly.
Additionally we now parse the Jump instructions. These also get their corresponding JumpTargetMarker token.
We bail out in two situations:
* frame_base is too complicated (i.e. it itself contains a Jump);
* some jump distance in the original expression is fishy.
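For illustration, a simplified model of the two-pass relocation (not
wasmtime's actual types; real expressions use both DW_OP_skip and
DW_OP_bra, and a LandingPad emits no bytes):
```
use std::collections::HashMap;

enum Part {
    Bytes(Vec<u8>),        // translated opcode bytes, passed through
    Jump { marker: u32 },  // branch to the landing pad with this marker
    LandingPad(u32),       // zero-width marker at a jump target
}

fn relocate(parts: &[Part]) -> Vec<u8> {
    // First pass: compute the final byte offset of every landing pad.
    let mut offsets = HashMap::new();
    let mut pos = 0usize;
    for part in parts {
        match part {
            Part::Bytes(b) => pos += b.len(),
            Part::Jump { .. } => pos += 3, // opcode + 2-byte operand
            Part::LandingPad(m) => { offsets.insert(*m, pos); }
        }
    }
    // Second pass: emit bytes, patching each jump's relative distance.
    let mut out = Vec::new();
    for part in parts {
        match part {
            Part::Bytes(b) => out.extend_from_slice(b),
            Part::Jump { marker } => {
                // DWARF measures the skip from the end of the operand.
                let from = out.len() + 3;
                let delta = offsets[marker] as isize - from as isize;
                out.push(0x2f); // DW_OP_skip
                out.extend_from_slice(&(delta as i16).to_le_bytes());
            }
            Part::LandingPad(_) => {} // emits nothing
        }
    }
    out
}
```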
Clif files are not meant to be written by end-users anyway. The main
effects are that non-ASCII identifiers now fail at the lexing stage
instead of the parsing stage, and whitespace must now be in the ASCII
range. Comments still have full Unicode support.
This also inlines all char::is_* methods to avoid nested matches.
Overall this results in a slight reduction of instruction count.
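A hypothetical sketch of the byte-level checks this implies (the real
lexer's identifier and whitespace sets may differ):
```
// Matching directly on bytes avoids the nested matches that the
// char::is_* methods expand to for the full Unicode tables.
#[inline]
fn is_ident_byte(b: u8) -> bool {
    matches!(b, b'a'..=b'z' | b'A'..=b'Z' | b'0'..=b'9' | b'_')
}

#[inline]
fn is_space_byte(b: u8) -> bool {
    matches!(b, b' ' | b'\t' | b'\r' | b'\n')
}
```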
The pinned-register hack didn't work because GetPinnedReg is marked as
having side effects (so that GVN won't try to common it out).
This commit tweaks the function used during lowering to vcode, so that
the GetPinnedReg opcode is specially handled. It's a bit lame, but it
makes the hack work again.
Also, use_input needs to be a no-op for real registers.
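A toy model of the special handling (all names illustrative, not the
real lowering code):
```
#[derive(Clone, Copy, PartialEq)]
enum Opcode { GetPinnedReg, Other }

#[derive(Clone, Copy, Debug, PartialEq)]
struct Reg(u8);

// e.g. x21 on aarch64; the register number here is illustrative.
const PINNED_REG: Reg = Reg(21);

// When lowering asks for an input's register, GetPinnedReg bypasses the
// normal (side-effecting) path and yields the fixed register directly.
fn input_reg(op: Opcode, lowered: Reg) -> Reg {
    match op {
        Opcode::GetPinnedReg => PINNED_REG,
        Opcode::Other => lowered,
    }
}
```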
It does this by providing an implementation of the CLIF instructions `AtomicRmw`, `AtomicCas`,
`AtomicLoad`, `AtomicStore` and `Fence`.
The translation is straightforward. `AtomicCas` is translated into x64 `cmpxchg`, `AtomicLoad`
becomes a normal load because x64-TSO provides adequate sequencing, `AtomicStore` becomes a
normal store followed by `mfence`, and `Fence` becomes `mfence`. `AtomicRmw` is the only
complex case: it becomes a normal load, followed by a loop which computes an updated value,
tries to `cmpxchg` it back to memory, and repeats if necessary.
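For intuition, that `AtomicRmw` sequence corresponds to the familiar
compare-and-swap retry loop, sketched here in plain Rust for an add:
```
use std::sync::atomic::{AtomicU64, Ordering};

// Plain load, compute the updated value, try to cmpxchg it back, and
// retry if another thread won the race in between.
fn atomic_rmw_add(loc: &AtomicU64, amount: u64) -> u64 {
    let mut old = loc.load(Ordering::SeqCst);
    loop {
        let new = old.wrapping_add(amount);
        match loc.compare_exchange(old, new, Ordering::SeqCst, Ordering::SeqCst) {
            Ok(prev) => return prev, // success: AtomicRmw yields the old value
            Err(cur) => old = cur,   // lost the race: retry with the fresh value
        }
    }
}
```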
This is a minimum-effort initial implementation. `AtomicRmw` could be implemented more
efficiently using LOCK-prefixed integer read-modify-write instructions in the case where the old
value in memory is not required. Subsequent work could add that, if required.
The x64 emitter has been updated to emit the new instructions, obviously. The `LegacyPrefix`
mechanism has been revised to handle multiple prefix bytes, not just one, since it is now
sometimes necessary to emit both 0x66 (Operand Size Override) and 0xF0 (Lock).
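An illustrative sketch of the revised idea (the real encoder's types
differ):
```
// A prefix group that can emit zero, one, or two legacy prefix bytes.
enum Prefixes {
    None,
    OpSize,     // 0x66, operand-size override
    Lock,       // 0xF0
    LockOpSize, // both, e.g. for a locked 16-bit cmpxchg
}

fn emit_prefixes(p: &Prefixes, sink: &mut Vec<u8>) {
    match p {
        Prefixes::None => {}
        Prefixes::OpSize => sink.push(0x66),
        Prefixes::Lock => sink.push(0xF0),
        Prefixes::LockOpSize => sink.extend_from_slice(&[0xF0, 0x66]),
    }
}
```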
In the aarch64 implementation of atomics, there has been some minor renaming for the sake of
clarity, and for consistency with this x64 implementation.
No point in resorting to LEB128 encoding for such constants; using the
native `u32` is faster and more compact.
Adds a `write_u32` method to facilitate this.
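A sketch of the method (the writer type here is hypothetical):
```
struct Writer { buf: Vec<u8> }

impl Writer {
    // Fixed-width little-endian u32: always four bytes, versus up to
    // five for a LEB128-encoded u32.
    fn write_u32(&mut self, v: u32) {
        self.buf.extend_from_slice(&v.to_le_bytes());
    }
}
```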
This change primarily adds the ability to lower the packed `[move|load|store]` instructions (the vector types were previously unimplemented). With the addition of the utility `Inst::[move|load|store]` functions, it also became possible to remove duplicated code (e.g. `stack_load` and `stack_store`) and to use these utility functions elsewhere (though not exhaustively).
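As a rough illustration of the type-driven selection such helpers
centralize (signatures hypothetical; the real helpers take Cranelift
types and operands):
```
// One helper picks the right load form for the type, so call sites like
// stack_load no longer duplicate the instruction-selection logic.
fn load_mnemonic(ty_bits: u32, is_float_or_vector: bool) -> &'static str {
    match (ty_bits, is_float_or_vector) {
        (128, _) => "movups", // packed 128-bit load
        (64, true) => "movsd",
        (32, true) => "movss",
        _ => "mov",           // integer loads
    }
}
```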