Spotted by Clang's assembler. GAS is obviously too lenient.
// the byte substitution.
dup v0.4s, w14
aese v0.16b, v1.16b // effectively, just SubBytes
b 2f
// First word of the cycle. Byte substitution, rotation, and round
1: ldrb w13, [x5], #1 // next round constant
dup v0.4s, w14
aese v0.16b, v1.16b // effectively, just SubBytes
eor w14, w13, w14, ror #8
// Common ending: mix in the word from the previous cycle and store.
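Why `dup` followed by `aese` is "effectively, just SubBytes": AESE computes SubBytes(ShiftRows(Vd ^ Vn)), so with a zero key in v1 (presumably what the code arranges) only ShiftRows could interfere. ShiftRows only moves bytes between columns, and `dup v0.4s, w14` makes all four columns identical, so the permutation changes nothing. The C sketch below checks that claim. It assumes the FIPS-197 column-major state layout (byte index = row + 4*column), and `shift_rows`, `dup_word` and the two check functions are hypothetical helpers, not code from this tree.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: AES ShiftRows on a column-major 16-byte state
   (byte index = row + 4*column): row r is rotated left by r columns. */
static void shift_rows(uint8_t out[16], const uint8_t in[16])
{
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            out[r + 4 * c] = in[r + 4 * ((c + r) % 4)];
}

/* Hypothetical helper mimicking `dup v0.4s, w14`: every column receives
   the same little-endian 32-bit word. */
static void dup_word(uint8_t state[16], uint32_t w)
{
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            state[r + 4 * c] = (uint8_t)(w >> (8 * r));
}

/* Returns 1 if ShiftRows leaves a dup'ed state unchanged. */
int shift_rows_is_noop_on_dup(uint32_t w)
{
    uint8_t in[16], out[16];
    dup_word(in, w);
    shift_rows(out, in);
    return memcmp(in, out, sizeof in) == 0;
}

/* Control case: a state with 16 distinct bytes is really permuted. */
int shift_rows_moves_distinct_bytes(void)
{
    uint8_t in[16], out[16];
    for (int i = 0; i < 16; i++)
        in[i] = (uint8_t)i;
    shift_rows(out, in);
    return memcmp(in, out, sizeof in) != 0;
}
```

The same argument is why the trick only works on one word at a time: once the columns differ, ShiftRows mixes them.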
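The single `eor w14, w13, w14, ror #8` does RotWord and the round-constant XOR together. On a little-endian word, FIPS-197's byte rotation [a0,a1,a2,a3] -> [a1,a2,a3,a0] is a rotate right by 8, and Rcon only touches byte 0. SubBytes works on each byte separately, so doing it before the rotation gives the same result. A sketch comparing a byte-wise reference with the word form follows. All helper names are hypothetical, and the test vector is the last word of the FIPS-197 AES-128 example key.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by n bits (0 < n < 32), like the A64 `ror` shift. */
static uint32_t ror32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32 - n));
}

/* Load four bytes as a little-endian word, as AArch64 does. */
uint32_t load_le32(const uint8_t b[4])
{
    return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* Byte-wise reference: RotWord, then XOR Rcon = [rc, 0, 0, 0]. */
void rot_rcon_bytes(uint8_t out[4], const uint8_t in[4], uint8_t rc)
{
    out[0] = in[1] ^ rc;
    out[1] = in[2];
    out[2] = in[3];
    out[3] = in[0];
}

/* Word form, mirroring `eor w14, w13, w14, ror #8`. */
uint32_t rot_rcon_word(uint32_t w, uint32_t rc)
{
    return rc ^ ror32(w, 8);
}
```

For example, the bytes 09 cf 4f 3c with rc = 0x01 become ce 4f 3c 09, which is 0x093c4fce as a little-endian word.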