symm/salsa20-*.S: Optimize the output permutations.
author Mark Wooding <mdw@distorted.org.uk>
Tue, 1 Nov 2016 22:38:41 +0000 (22:38 +0000)
committer Mark Wooding <mdw@distorted.org.uk>
Tue, 1 Nov 2016 22:38:41 +0000 (22:38 +0000)
commit 3cb47d2759de69a9e4fdf0030f518ca513b59c7b
tree b38846799206ed9f2008054a968719b5fc0cd5aa
parent d6b9dc043945e8b65ebcd84bfa2668e93041f598

A little analysis, and a lot of trial and error, reveals that the
state permutation can be decomposed into some rotations of the rows, a
matrix transpose, and another rotation of the rows.  These steps can be
done moderately efficiently using the Intel and ARM SIMD instructions.
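
As an illustration only, here is a small Python model of one such
decomposition.  It is a sketch under assumptions, not a transcription of
the assembler: it assumes the state is held in the `diagonal' layout often
used by SIMD Salsa20 code (rows [0 5 10 15], [4 9 14 3], [8 13 2 7],
[12 1 6 11]), and the rotation amounts and the order in which rows are fed
to the transpose were worked out for this sketch rather than taken from the
commit.  The names `rotate_right' and `unpermute' are hypothetical.

```python
def rotate_right(row, n):
    """Rotate a list right by n places (n may be zero)."""
    n %= len(row)
    return row[-n:] + row[:-n] if n else list(row)


def unpermute(rows):
    """Map the assumed diagonal layout back to textbook row-major order."""
    # Step 1: rotate row i right by i places.
    a = [rotate_right(r, i) for i, r in enumerate(rows)]
    # Step 2: transpose, feeding the rows in the order 0, 3, 2, 1.
    # In registers this reordering costs nothing: it only changes which
    # registers are paired up by the interleaving instructions.
    b = [list(col) for col in zip(a[0], a[3], a[2], a[1])]
    # Step 3: rotate row i right by i places again.
    return [rotate_right(r, i) for i, r in enumerate(b)]


# Assumed input: entry values are the textbook word indices 0..15.
diagonal = [[0, 5, 10, 15],
            [4, 9, 14, 3],
            [8, 13, 2, 7],
            [12, 1, 6, 11]]

textbook = unpermute(diagonal)
for row in textbook:
    print(row)
```

In this model each row rotation stands in for a single lane shuffle, and
the transpose for a short sequence of interleaving operations; which
instructions the assembler files actually use is not stated here.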
symm/salsa20-arm-neon.S
symm/salsa20-x86ish-sse2.S