// First job is to slurp the matrix into XMM registers. The words
// have already been permuted conveniently to make them line up
// better for SIMD processing.