As found by @julian-seward1, MOVSS and MOVSD are not listed in the zero-latency move instructions section of the Intel optimization manual. Use MOVAPS instead for those register-to-register moves.
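For illustration, here is a minimal sketch of what a register-to-register MOVAPS encoding looks like. The helper name `encode_movaps_rr` is hypothetical and is not part of Cranelift's API. It assumes both operands are XMM0 through XMM7, so no REX prefix is needed. The opcode bytes (`0F 28 /r`) follow the Intel instruction set reference.

```rust
/// Hypothetical helper, not Cranelift's actual encoder: encode
/// `movaps dst, src` for two XMM registers in the range XMM0..=XMM7.
///
/// MOVAPS copies the full 128-bit register, whereas a register-to-register
/// MOVSS/MOVSD only writes the low lanes and merges with the old destination.
fn encode_movaps_rr(dst: u8, src: u8) -> Option<[u8; 3]> {
    if dst > 7 || src > 7 {
        // XMM8 and above need a REX prefix, which this sketch leaves out.
        return None;
    }
    // ModRM byte: mod = 0b11 (register direct), reg = dst, rm = src.
    let modrm = 0xC0 | (dst << 3) | src;
    // Opcode 0F 28 /r is the load form: MOVAPS xmm1, xmm2/m128.
    Some([0x0F, 0x28, modrm])
}
```

Under these assumptions, `encode_movaps_rr(1, 2)` produces the bytes for `movaps xmm1, xmm2`. The equivalent MOVSS and MOVSD forms would use `F3 0F 10 /r` and `F2 0F 10 /r` respectively.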
This crate contains the core Cranelift code generator. It translates code from an intermediate representation into executable machine code.