This approach relies on some large mask tables, which may increase code size but reduce the number of instructions emitted. See https://github.com/WebAssembly/simd/issues/117 for a more in-depth discussion.
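The code this comment documents is not shown here, so the following is only an illustrative sketch of the table-versus-instructions tradeoff. It assumes the operation being emulated is Wasm's `i8x16.shl`, which x86 cannot express directly because SSE/AVX have no 8-bit-lane shift. The sketch simulates a 16-bit-lane shift (as `psllw` would do), then ANDs the result with a precomputed mask chosen by the shift amount (as `pand` would do) to clear bits that leaked across byte boundaries. The names `SHL_MASKS`, `build_shl_masks`, and `i8x16_shl` are hypothetical, and the vector is modeled as a plain `[u8; 16]`.

```rust
/// Hypothetical mask table: entry `s` holds 16 copies of `0xFF << s`.
/// After shifting 16-bit lanes left by `s`, the low `s` bits of each high
/// byte contain bits that leaked in from the neighboring low byte; this
/// mask clears them. Stored once (8 * 16 = 128 bytes) to save instructions.
const fn build_shl_masks() -> [[u8; 16]; 8] {
    let mut table = [[0u8; 16]; 8];
    let mut s = 0;
    while s < 8 {
        table[s] = [0xFFu8 << s; 16];
        s += 1;
    }
    table
}

const SHL_MASKS: [[u8; 16]; 8] = build_shl_masks();

/// Scalar model of lowering `i8x16.shl` as "16-bit shift, then mask".
fn i8x16_shl(v: [u8; 16], count: u32) -> [u8; 16] {
    // Wasm SIMD shifts take the count modulo the lane width (8 bits here).
    let s = (count & 7) as usize;
    let mut out = [0u8; 16];

    // Step 1: shift each little-endian 16-bit lane (models `psllw`).
    for i in 0..8 {
        let lane = u16::from_le_bytes([v[2 * i], v[2 * i + 1]]);
        let [lo, hi] = (lane << s).to_le_bytes();
        out[2 * i] = lo;
        out[2 * i + 1] = hi;
    }

    // Step 2: AND with the table entry for this shift amount (models `pand`).
    for (byte, mask) in out.iter_mut().zip(SHL_MASKS[s].iter()) {
        *byte &= *mask;
    }
    out
}
```

For example, `i8x16_shl([0x81; 16], 1)` yields sixteen `0x02` bytes: the carried-out top bit of each byte is discarded instead of spilling into its neighbor. Computing the mask at runtime instead would shrink the table but add instructions per shift, which is the tradeoff the comment describes.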
This crate contains the metaprogram used by cranelift-codegen. It's not useful on its own.