* Use `__builtin_setjmp` instead of `sigsetjmp`.
Use [`__builtin_setjmp`] instead of `sigsetjmp`, as it is implemented in
the compiler, performed inline, and saves much less state. This speeds up
calls into wasm by about 8% on my machine.
[`__builtin_setjmp`]: https://gcc.gnu.org/onlinedocs/gcc/Nonlocal-Gotos.html
* Add a comment confirming that 5 words really is the documented size of the `__builtin_setjmp` buffer.
* Add a comment about callee-saved state and `__builtin_setjmp`.
* On clang on aarch64, use `sigsetjmp`.
* Fix a stray `#endif`.
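
For reviewers unfamiliar with the builtin, here is a minimal sketch of the pattern these commits describe. It is not the code from this change: `trap_buf`, `trap_setjmp`, `trap_longjmp`, `call_guarded`, and `trap_demo` are hypothetical names, and the preprocessor condition is only an assumption about how the clang/aarch64 fallback might be selected. The five-pointer buffer follows the GCC documentation linked above.

```c
#include <assert.h>
#include <setjmp.h>

/* Assumed selection logic: use sigsetjmp on clang/aarch64, the builtin elsewhere. */
#if defined(__clang__) && defined(__aarch64__)
typedef sigjmp_buf trap_buf;
#define trap_setjmp(buf) sigsetjmp(buf, 0) /* 0: do not save the signal mask */
#define trap_longjmp(buf) siglongjmp(buf, 1)
#else
/* GCC documents the __builtin_setjmp buffer as five words. */
typedef void *trap_buf[5];
#define trap_setjmp(buf) __builtin_setjmp(buf)
/* The second argument of __builtin_longjmp must be the constant 1. */
#define trap_longjmp(buf) __builtin_longjmp(buf, 1)
#endif

static trap_buf g_buf; /* hypothetical single global jump buffer */

/* __builtin_longjmp must not run in the same function that called
 * __builtin_setjmp on this buffer, so the jump lives in its own
 * non-inlined function. */
__attribute__((noinline)) static void raise_trap(void) {
    trap_longjmp(g_buf);
}

static void no_trap(void) {
    /* Returns normally. */
}

/* Runs fn; returns 1 if it unwound through the jump buffer, 0 otherwise.
 * No locals are modified between the setjmp and the call, so nothing
 * depends on callee-saved registers surviving the jump. */
static int call_guarded(void (*fn)(void)) {
    if (trap_setjmp(g_buf) != 0) {
        return 1;
    }
    fn();
    return 0;
}

/* Usage: exercises both the normal and the trapping path. */
int trap_demo(void) {
    assert(call_guarded(no_trap) == 0);
    assert(call_guarded(raise_trap) == 1);
    return 0;
}
```

In this sketch the builtin path saves only what is needed to resume at the `setjmp` site, while `sigsetjmp` is a library call that saves a full register context; that difference is what the first commit relies on.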