This also paves the way for unifying TargetIsa and MachBackend, since now they map one to one. In theory the two traits could be merged, which would be nice to limit the number of total concepts. Also they have quite different responsibilities, so it might be fine to keep them separate.
Interestingly, this PR started as removing RegInfo from the TargetIsa trait since the adapter returned a dummy value there. From the fallout, noticed that all Display implementations didn't needed an ISA anymore (since these were only used to render ISA specific registers). Also the whole family of RegInfo / ValueLoc / RegUnit was exclusively used for the old backend, and these could be removed. Notably, some IR instructions needed to be removed, because they were using RegUnit too: this was the oddball of regfill / regmove / regspill / copy_special, which were IR instructions inserted by the old regalloc. Fare thee well!
Rename the existing AtomicRMW to AtomicRMWLoop and directly lower
atomic_rmw operations, without a loop if LSE support is available.
Copyright (c) 2021, Arm Limited
There were cases where the AArch64 backend assumed that an IR
operation would always operate on certain types (the most likely
reason being that the corresponding WebAssembly instruction did
not cover anything else), even though the definition of the IR
operation imposed no constraints like that.
Copyright (c) 2021, Arm Limited.
The tests for the SIMD floating-point maximum and minimum operations
require particular care because the handling of the NaN values is
non-deterministic and may vary between platforms. There is no way to
match several NaN values in a test, so the solution is to extract the
non-deterministic test cases into a separate file that is subsequently
replicated for every backend under test, with adjustments made to the
expected results.
Copyright (c) 2021, Arm Limited.
The AArch64 support was a bit broken and was using Armv7 style
barriers, which aren't required with Armv8 acquire-release
load/stores.
The fallback CAS loops and RMW, for AArch64, have also been updated
to use acquire-release, exclusive, instructions which, again, remove
the need for barriers. The CAS loop has also been further optimised
by using the extending form of the cmp instruction.
Copyright (c) 2021, Arm Limited.
This commit addresses two issues:
* A panic when shifting any non i128 type by i128 amounts (#3064)
* Wrong results when lowering shifts with small types (i8, i16)
In these types when shifting for amounts larger than the size of the
type, we would not get the wrapping behaviour that we see on i32 and i64.
This is because in these larger types, the wrapping behaviour is automatically
implemented by using the appropriate instruction, however we do not
have i8 and i16 specific instructions, so we have to manually wrap
the shift amount with an AND instruction.
This issue is also found on x86_64 and s390x, and a separate issue will
be filed for those.
Closes#3064
Also, reorganize the AArch64-specific VCode instructions for unary
narrowing and widening vector operations, so that they are more
straightforward to use.
Copyright (c) 2021, Arm Limited.
Implement the `TlsValue` opcode in the aarch64 backend for ELF_GD.
This is a little bit unusual as the default TLS mechanism for aarch64 is TLS Descriptors in other compilers.
However currently we only recognize elf_gd so lets start with that as a TLS implementation.
Lowering icmp was duplicated across callers that only cared about
flags, and callers that only cared about the bool result.
Merge both callers into `lower_icmp` which does the correct thing
depending on a new IcmpOutput parameter.
We have 3 different aproaches depending on the type of comparision requested:
* For eq/ne we compare the high bits and low bits and check
if they are equal
* For overflow checks, we perform a i128 add and check the
resulting overflow flag
* For the remaining comparisions (gt/lt/sgt/etc...)
We compare both the low bits and high bits, and if the high bits are
equal we return the result of the unsigned comparision on the low bits
As with other i128 ops, we are still missing immlogic support.
Currently we just basically use a two instruction version of the same i64 ops.
IMMLogic doesn't really support multiple register inputs, so its left as a TODO for future optimizations.