In particular:
- try to optimize the integer emission into a 32-bit emission, when the
high bits are all zero, and stop relying on the caller of `imm_r` to
ensure this.
- rename `Inst::imm_r`/`Inst::Imm_R` to `Inst::imm`/`Inst::Imm`.
- generate a sign-extending mov 32-bit immediate to 64-bits, whenever
possible.
- fix a few places where the previous commit did introduce the
generation of zero-constants with xor, when calling `put_input_to_reg`,
thus clobbering the flags before they were read.
It does this by providing an implementation of the CLIF instructions `AtomicRmw`, `AtomicCas`,
`AtomicLoad`, `AtomicStore` and `Fence`.
The translation is straightforward. `AtomicCas` is translated into x64 `cmpxchg`, `AtomicLoad`
becomes a normal load because x64-TSO provides adequate sequencing, `AtomicStore` becomes a
normal store followed by `mfence`, and `Fence` becomes `mfence`. `AtomicRmw` is the only
complex case: it becomes a normal load, followed by a loop which computes an updated value,
tries to `cmpxchg` it back to memory, and repeats if necessary.
This is a minimum-effort initial implementation. `AtomicRmw` could be implemented more
efficiently using LOCK-prefixed integer read-modify-write instructions in the case where the old
value in memory is not required. Subsequent work could add that, if required.
The x64 emitter has been updated to emit the new instructions, obviously. The `LegacyPrefix`
mechanism has been revised to handle multiple prefix bytes, not just one, since it is now
sometimes necessary to emit both 0x66 (Operand Size Override) and F0 (Lock).
In the aarch64 implementation of atomics, there has been some minor renaming for the sake of
clarity, and for consistency with this x64 implementation.
This change primarily adds the ability to lower packed `[move|load|store]` instructions (the vector types were previously unimplemented), but with the addition of the utility `Inst::[move|load|store]` functions it became possible to remove duplicated code (e.g. `stack_load` and `stack_store`) and use these utility functions elsewhere (though not exhaustively).
This change is a pure refactoring--no change to functionality. It removes newlines between the `use ...` statements in the x64 backend so that rustfmt can format them according to its convention. I noticed some files had followed a manual convention but subsequent additions did not seem to fit; this change fixes that and lightly coalesces some of the occurrences of `use a::b; use a::c;` into `use::{b, c}`.
- put the division in the synthetic instruction as well,
- put the branch table check in the inst's emission code,
- replace OneWayCondJmp by TrapIf vcode instruction,
- add comments describing code generated by the synthetic instructions
Removes unneeded data structure that was holding instructions for
xmm based move instructions. These instructions can should be categorized
as rm not just r. This change is intended to simplify organization and
cases when lowering.
This patch implements the required but not already available
x64 instructions for copysign as well as the actual lowering sequence
and tests for the newly implemented x64 instructions.
Those instructions include:
andps,
andnps,
movaps,
movd,
orps,
The lowering sequence is based on the lowering for f32.copysign
in the current cranelift backend. movd does not have a test yet
due to some logic needed express a 32-bit register as a source
for xmm_rm_r instructions. This code also begins some
rethinking/refactoring of how the sse move instuctions
are written and so also includes new emit cases that will replace
current ones that match a different enum used to describe sse moves.
Adds support for lowering clif instructions Fdiv and Fmul
for new vcode backend. Misc adds lowering and test for
sqrtss and removes a redundant to_string() func for the
SseOpcode struct.
Adds support for addss and subss. This is the first lowering for
sse floating point alu and some move operations. The changes here do
some renaming of data structures and adds a couple of new ones
to support sse specific operations. The work done here will likely
evolve as needed to support an efficient, inituative, and consistent
framework.