I copied the `bmask` precise-output tests from x64 and used
CRANELIFT_TEST_BLESS=1 to generate this test.
I don't know aarch64 well enough to decide if this output is correct.
However, for I128 it is identical to the previous I128-only
precise-output tests, and the existing runtests for bmask pass on
aarch64, so I think it's likely correct.