The original idea was this: since one can change one's view of how the
bits in an XMM register are divided into lanes on a per-instruction
basis, it would make more sense if I took a single consistent view of
how the bits are arranged, with the least significant on the right and
the most significant on the left. Therefore, I listed the shuffle
indices from left to right, counting from right to left.

This, I now realise, was a mistake.  What finally made this clear to me
was that it makes the order of indices in the `SHUF' macro inconsistent
with the order of bytes in a table for the SSSE3 `pshufb' instruction,
and I can't do anything about that.

So: change the order of the arguments, and track down all uses of this
macro to fix them. Sorry about that.

To verify that I got them all:

        for i in $(git grep -l SHUF); do
          git blame -- "$i" | grep SHUF
        done | less