symm/gcm-*.S: GCM acceleration using hardware polynomial multiplication.

Add assembler implementations of the low-level GCM arithmetic which make
use of polynomial multiplication instructions on x86 (the delightfully
named `pclmul{l,h}q{l,h}dq' instructions) and ARM processors (the ARM32
`vmull.p64' and ARM64 `pmull{,2}' instructions). Of course, this
involves adding the necessary CPU feature detection.
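As a rough illustration of what these instructions compute, here is a minimal Python model. It assumes the usual definitions. `pclmulqdq' multiplies one 64-bit half of each 128-bit operand as polynomials over GF(2). Bit 0 of the immediate chooses the half of the first source and bit 4 chooses the half of the second; the `pclmul{l,h}q{l,h}dq' spellings are assembler aliases that fix the immediate. ARM64 `pmull' and `pmull2' do the same job with the low and high halves respectively. The helper names `clmul', `pclmulqdq' and `clmul128' are hypothetical, and this is not the commit's Python implementation.

```python
MASK64 = (1 << 64) - 1

def clmul(a: int, b: int) -> int:
    """Carry-less product: multiply a and b as polynomials over GF(2)."""
    r = 0
    while b:
        if b & 1:
            r ^= a      # add (XOR) a shifted copy; no carries propagate
        a <<= 1
        b >>= 1
    return r

def pclmulqdq(src1: int, src2: int, imm8: int) -> int:
    """Hypothetical model of x86 PCLMULQDQ on 128-bit operands held as ints."""
    a = (src1 >> 64) if imm8 & 0x01 else src1   # imm8 bit 0: half of src1
    b = (src2 >> 64) if imm8 & 0x10 else src2   # imm8 bit 4: half of src2
    return clmul(a & MASK64, b & MASK64)        # product has at most 127 bits

def clmul128(x: int, y: int) -> int:
    """256-bit carry-less product built from four 64x64-bit multiplies."""
    lo = pclmulqdq(x, y, 0x00)                           # x_lo * y_lo
    hi = pclmulqdq(x, y, 0x11)                           # x_hi * y_hi
    mid = pclmulqdq(x, y, 0x01) ^ pclmulqdq(x, y, 0x10)  # cross terms
    return (hi << 128) ^ (mid << 64) ^ lo
```

For instance, `clmul(3, 3)' is 5, because (x + 1)^2 = x^2 + 1 when coefficients are added mod 2. The four partial products combine exactly as in schoolbook multiplication, except that addition is XOR.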

GCM's bit and byte order is remarkably confusing. I've tried quite hard
to write the code so as to help the reader keep track of which bits are
where, but it's very difficult.
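To make the bit order concrete, here is a minimal Python sketch of GF(2^128) multiplication in GCM's convention. The first function follows the bit-at-a-time algorithm in NIST SP 800-38D. The second computes the same product by reflecting the bits, multiplying as ordinary polynomials and reducing modulo x^128 + x^7 + x^2 + x + 1. Blocks are read as big-endian integers, so the leftmost bit of byte 0 is the coefficient of x^0 and the integer 1 is x^127. The function names are hypothetical; this is an illustration, not the code in this commit.

```python
R = 0xE1 << 120                # reduction constant, GCM (reflected) bit order
POLY = (1 << 128) | 0x87       # x^128 + x^7 + x^2 + x + 1, ordinary bit order

def gf128_mul_spec(x: int, y: int) -> int:
    """GCM multiply, bit at a time, in GCM's reflected bit order."""
    z, v = 0, y
    for i in range(128):
        if (x >> (127 - i)) & 1:   # bit i from the left is the coeff. of x^i
            z ^= v
        # Multiply v by x: a right shift here; reduce if the x^127 bit fell off.
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z

def reflect128(v: int) -> int:
    """Reverse the order of the 128 bits of v."""
    return int(format(v, "0128b")[::-1], 2)

def clmul(a: int, b: int) -> int:
    """Carry-less product of a and b as polynomials over GF(2)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def gf128_mul_poly(x: int, y: int) -> int:
    """Same product: reflect, multiply carry-lessly, reduce, reflect back."""
    p = clmul(reflect128(x), reflect128(y))   # degree at most 254
    for i in range(254, 127, -1):             # clear bits 254 down to 128
        if (p >> i) & 1:
            p ^= POLY << (i - 128)
    return reflect128(p)
```

For example, multiplying x^127 (the integer 1) by x (the integer 1 << 126) gives x^128, which reduces to 1 + x + x^2 + x^7: the block e1 00 ... 00, or the integer 0xE1 << 120. The multiplicative identity is the block 80 00 ... 00, not the integer 1.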

There's also a Python implementation which has proven invaluable while
debugging these things.