Mark Wooding [Wed, 10 May 2017 20:52:30 +0000 (21:52 +0100)]
progs/rspit.c: Make the `salsae' tab be `const'.
Silly oversight.
Mark Wooding [Wed, 10 May 2017 20:51:45 +0000 (21:51 +0100)]
symm/hmac-def.h: Set HMAC keys up in a more principled manner.
No longer does it reach into the hash context and run `HASH_compress' by
hand.
This means that nothing assumes that `HASH_compress' exists any more.
Mark Wooding [Wed, 10 May 2017 20:50:04 +0000 (21:50 +0100)]
symm/hmac-def.h: Report key sizes as 16-bit quantities.
Hash states can be huge. It was an obvious mistake defining the
recommended key size in terms of the state size, but I can't change it
now.
Mark Wooding [Wed, 10 May 2017 20:46:39 +0000 (21:46 +0100)]
base/keysz.[ch]: Add a flag to say that arguments are 16 bits wide.
This breaks programs which think they can parse arbitrary key-size
descriptors. The obvious such thing is the Python interface, so note
that we need a later version.
Mark Wooding [Mon, 1 May 2017 00:38:30 +0000 (01:38 +0100)]
symm/keccak1600.[ch]: Add the Keccak-p[1600, n] permutation.
Currently just a special snowflake. Fancier things forthcoming.
Mark Wooding [Wed, 10 May 2017 19:58:34 +0000 (20:58 +0100)]
symm/sha512.[ch], etc.: Support SHA512/224 and SHA512/256.
These are more truncated versions of SHA512 with different initial
values. The point of the exercise is performance: SHA512 runs faster
than SHA256 on 64-bit processors (it munches twice as much data per run
through the compression function, but has only 25% more rounds). Add
test vectors for the hash function from NIST and Wikipedia, and HMAC
tests I found under a rock.
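The arithmetic behind the performance claim can be sketched as follows. This is an illustration only, not code from the commit: the block sizes and round counts are the ones given in FIPS 180-4, the helper `rounds_for' is a hypothetical name, and padding and per-round cost differences are ignored.

```c
/* Block sizes (in bytes) and round counts from FIPS 180-4. */
enum {
  SHA256_BLKSZ = 64,  SHA256_ROUNDS = 64,
  SHA512_BLKSZ = 128, SHA512_ROUNDS = 80
};

/* Compression-function rounds needed to process `nbytes' of whole
 * blocks (hypothetical helper; ignores padding).
 */
unsigned long rounds_for(unsigned long nbytes,
                         unsigned blksz, unsigned rounds)
{
  return (nbytes / blksz) * rounds;
}
```

For 1024 bytes this gives 1024 rounds for SHA256 but only 640 for SHA512, so if a 64-bit round costs about the same as a 32-bit one, the wider hash comes out ahead.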
Mark Wooding [Wed, 10 May 2017 19:53:27 +0000 (20:53 +0100)]
symm/t/sha...: Add official NIST HMAC test vectors.
I found some at last, annoyingly provided as PDF documents.
Unsurprisingly, the code passed first time.
Strange: the tests include two tests for the message `Sample message for
keylen=blocklen', exactly one of which has the key length equal to the
block length. Whatevs.
Mark Wooding [Wed, 10 May 2017 18:48:20 +0000 (19:48 +0100)]
symm/: Eliminate the remaining checked-in stubby source files.
Now that $(STUBS_SRC) actually works, use it to eliminate `safersk.c',
`sha224.c', `sha384.c', and `whirlpool256.c'. Move test vectors to
their new homes, and modify the base files to actually run them.
Alas, the build machinery wants to ship `t/safersk' even though it's
empty, so leave it as a stub. (Maybe...) And the HMAC mode machinery
wants to put its test in the mode test-vector file, which is a bit
annoying. Still, the cruft is reduced.
Mark Wooding [Wed, 10 May 2017 18:36:44 +0000 (19:36 +0100)]
symm/stub.c.in: Add a trivial test rig which says to look over there.
As hinted.
Mark Wooding [Wed, 10 May 2017 18:35:59 +0000 (19:35 +0100)]
symm/Makefile.am: Add a `base' column to the $(STUBS_SRC) list.
The list is currently empty, so this is just a matter of fiddling with
the bits of Makefile which process it. But it means that we can add
things to `stub.c.in' which refer to the base C file, for example to
tell a reader where the real thing is.
Mark Wooding [Wed, 10 May 2017 19:26:41 +0000 (20:26 +0100)]
symm/blkc.h, symm/hash.h: Factor out pieces of the test machinery.
This will allow a source file to include tests for a hash function or
block cipher /and/ other kinds of tests. Possibly even for another hash
function or block cipher.
This was mostly done already for block ciphers: the remaining piece
involved making a macro to populate the test table. But hash functions
haven't been as fortunate.
Fix the new definitions to allow non-identifier names for hashes and
block ciphers, to match the mode definitions.
Mark Wooding [Wed, 10 May 2017 18:29:41 +0000 (19:29 +0100)]
symm/: Allow block cipher and hash functions with strange names.
This is quite a performance, actually.
* The `multigen' tool now has a modifier `:f' which makes a filename-
safe version of a value.
* The `multigen' input files and `Makefile.am' have been changed to
use `:f' appropriately.
* All of the `MUMBLE-def.h' header files have been changed to
introduce a new macro `MUMBLE_DEFX' with two extra arguments: the
thing's presentable name (for use in class structures), and a
filename-safe version of it. The old `MUMBLE_DEF' macro still
exists for compatibility (has anyone else written a mode?).
* Similar changes have been made to the testing machinery in `blkc.h'
and `hash.h', but this still needs cleaning up somewhat.
Mark Wooding [Wed, 10 May 2017 21:24:53 +0000 (22:24 +0100)]
math/{genlimits.c,mpdump.c}: Delete long-defunct source files.
These programs' jobs have been taken over by `mpgen', which is much
better at it.
Mark Wooding [Thu, 11 May 2017 09:42:15 +0000 (10:42 +0100)]
progs/: Generate XDH and EdDSA operations using macros.
There are already two very similar XDH implementations, and EdDSA is
likely to have more. Let's not write more code than we need to.
Mark Wooding [Wed, 10 May 2017 20:03:51 +0000 (21:03 +0100)]
pub/ed25519.c: Use the correct type for the field-element constants.
This fixes a bug: `bz_pieces' had the wrong type, but likely worked
anyway by luck -- especially on little-endian machines.
Mark Wooding [Wed, 10 May 2017 20:01:03 +0000 (21:01 +0100)]
math/f{25519,goldi}.[ch]: Export the piece type.
Mark Wooding [Wed, 10 May 2017 20:19:54 +0000 (21:19 +0100)]
math/scaf.c: Add some debugging utilities I found handy.
Mark Wooding [Wed, 10 May 2017 20:19:32 +0000 (21:19 +0100)]
math/scaf.c: Fix conditional subtractions in `scaf_reduce'.
So that they actually subtract the right thing. Obvious blunder. The
big surprise is that none of the literally thousands of Ed25519 tests
which have hammered on that code caught it. (Found during development
of Ed448, coming later.)
Mark Wooding [Thu, 11 May 2017 09:42:15 +0000 (10:42 +0100)]
pub/rsa-pub.c: Implement the optimal addition chains for e = 3, e = 65537.
Also add tests for e = 3 (previously missing) and e = 17 (to exercise
the general modexp path).
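As a rough sketch of what such addition chains look like (not the code in `rsa-pub.c', which works on multiprecision integers): e = 3 needs one squaring and one multiplication, and e = 65537 = 2^16 + 1 needs sixteen squarings and one multiplication. The function names here are hypothetical, and the modulus is assumed to fit in 32 bits so that products fit in 64.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply modulo n.  Operands must be below n <= 2^32 - 1, so the
 * product fits in 64 bits.
 */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t n)
{
  assert(a < n && b < n && n <= UINT32_MAX);
  return (a * b) % n;
}

/* x^3 mod n via the chain 1, 2, 3: one squaring, one multiplication. */
uint64_t pow_e3(uint64_t x, uint64_t n)
{
  uint64_t x2 = mulmod(x, x, n);
  return mulmod(x2, x, n);
}

/* x^65537 mod n via the chain 1, 2, 4, ..., 2^16, 2^16 + 1: sixteen
 * squarings, then one multiplication by x.
 */
uint64_t pow_e65537(uint64_t x, uint64_t n)
{
  uint64_t y = x;
  int i;

  for (i = 0; i < 16; i++) y = mulmod(y, y, n);
  return mulmod(y, x, n);
}
```

For example, `pow_e3(65, 3233)' is 3053, and `pow_e65537(x, 65537)' returns x for any x below 65537, since 65537 is prime.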
Mark Wooding [Thu, 11 May 2017 09:42:15 +0000 (10:42 +0100)]
progs/perftest.c: Allow setting the public exponent in RSA tests.
Mark Wooding [Thu, 11 May 2017 09:42:15 +0000 (10:42 +0100)]
pub/rsa-gen.c, progs/key.c: Overhaul RSA key generation.
Rewrite the key-generation code from scratch. The new version seems
simpler to me, and allows the caller to choose the public exponent. It
also retries repeatedly until it finds acceptable values unless told to
stop within a finite number of steps.
Add an option to `key' to allow the user to select a different
exponent. Recommend e = 3 in the manpage.
Mark Wooding [Thu, 11 May 2017 09:42:15 +0000 (10:42 +0100)]
math/strongprime.c: Improve the commentary.
Mark Wooding [Thu, 11 May 2017 09:42:15 +0000 (10:42 +0100)]
math/strongprime.c: Replace inexplicable exponentiation with extended-gcd.
For some reason, I calculated s^-1 as s^{r-2} (mod r). This code isn't
even slightly constant-time, and gcd is faster than modexp. Also, this
bit isn't time-critical anyway, and the code is way simpler like this.
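To make the comparison concrete, here is a minimal sketch of both ways of inverting s modulo a prime r, on machine integers rather than the multiprecision values the real code uses. The names `inv_xgcd' and `inv_fermat' are hypothetical, and operands are assumed to be below 2^31 so that products fit in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Inverse of s modulo r by the extended Euclidean algorithm.
 * Requires 0 < s < r and gcd(s, r) = 1.
 */
int64_t inv_xgcd(int64_t s, int64_t r)
{
  int64_t a = r, b = s;         /* running remainders */
  int64_t u = 0, v = 1;         /* u s == a and v s == b (mod r) */

  while (b) {
    int64_t q = a/b, t;
    t = a - q*b; a = b; b = t;
    t = u - q*v; u = v; v = t;
  }
  assert(a == 1);               /* otherwise s has no inverse */
  return u < 0 ? u + r : u;
}

/* Inverse of s modulo r as s^(r - 2) mod r, by square-and-multiply.
 * Only valid when r is prime (Fermat's little theorem).
 */
int64_t inv_fermat(int64_t s, int64_t r)
{
  int64_t y = 1, x = s%r, e = r - 2;

  while (e) {
    if (e & 1) y = (y*x)%r;
    x = (x*x)%r;
    e >>= 1;
  }
  return y;
}
```

Both give 29 for s = 7, r = 101; the gcd version needs a handful of divisions where the exponentiation needs a modular multiplication or two per bit of r.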
Mark Wooding [Sun, 14 May 2017 03:11:09 +0000 (04:11 +0100)]
Merge branch '2.3.x'
* 2.3.x:
Release 2.3.1.
pub/bbs-gen.c, pub/rsa-gen.c: Remove the lower-bounding on q.
math/strongprime.c: Clamp the starting point.
math/strongprime.c: Reduce failures by adding some more slop bits.
progs/catcrypt.c, progs/cc-sig.c: Compare MAC tags in constant time.
progs/cc-sig.c: Initialize hash context properly for RSA-PSS.
progs/cc-sig.c: Don't destroy an RSA context just after building it.
math/g-bin.c, math/g-prime.c: Fix type incompatibility.
math/g-*.c: Group implementations include `group.h' via `group-guts.h'.
key/key-io.c: Produce valid key lines for empty keys.
key/key-io.c: Fix segfault opening `KOPEN_READ | KOPEN_NOFILE' key files.
Conflicts:
math/group-guts.h (trivial)
progs/catcrypt.c (already picked up)
Mark Wooding [Sat, 13 May 2017 14:21:43 +0000 (15:21 +0100)]
Release 2.3.1.
Mark Wooding [Thu, 11 May 2017 09:42:15 +0000 (10:42 +0100)]
pub/bbs-gen.c, pub/rsa-gen.c: Remove the lower-bounding on q.
It's unnecessary. It was a bad idea because it biases q quite heavily,
but now `strongprime' generates primes in the right interval so that
getting the right bit length isn't a problem.
Mark Wooding [Thu, 11 May 2017 09:42:15 +0000 (10:42 +0100)]
math/strongprime.c: Clamp the starting point.
Now the result will be in the upper quarter of the `obvious' range, and
the product of two such values is guaranteed to have the desired number
of bits. This saves callers from doing stupid things like trying to
clamp one of the factors by hand, which ends up significantly biasing
the second factor. (This isn't very bad, because there's a /lot/ of
randomness in the chosen congruence class, but it's good to fix this
sort of thing.)
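The bit-length guarantee can be checked with a small sketch. It assumes only that each n-bit factor has its top two bits set, i.e., is at least 3 * 2^(n - 2); a tighter clamp only helps. Then the product is at least 9 * 2^(2n - 4), which exceeds 2^(2n - 1), so it has exactly 2n bits. The helper names are hypothetical, and n is kept to 32 or less so the product fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Number of significant bits in x. */
unsigned bitlen(uint64_t x)
{
  unsigned n = 0;

  while (x) { n++; x >>= 1; }
  return n;
}

/* Smallest n-bit value with its top two bits set: 3 * 2^(n - 2). */
uint64_t clamp_floor(unsigned n)
{
  assert(n >= 2 && n <= 32);
  return (uint64_t)3 << (n - 2);
}

/* Bit length of the smallest product of two clamped n-bit values. */
unsigned min_product_bits(unsigned n)
{
  uint64_t lo = clamp_floor(n);

  return bitlen(lo*lo);
}
```

By contrast, two unclamped 16-bit values can be as small as 2^15 each, and their product 2^30 has only 31 bits.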
Mark Wooding [Thu, 11 May 2017 09:42:15 +0000 (10:42 +0100)]
math/strongprime.c: Reduce failures by adding some more slop bits.
In my experiments, failures were happening about 2--3% of the time,
which is way more than one is really willing to tolerate.
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
progs/catcrypt.c, progs/cc-sig.c: Compare MAC tags in constant time.
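For illustration, a constant-time comparison usually looks something like the sketch below: it accumulates the differences across every byte instead of returning at the first mismatch, so the running time doesn't reveal how long the matching prefix was. The function name `ct_equal' is hypothetical; this is not the routine the commit actually calls.

```c
#include <stddef.h>

/* Return 1 if the n-byte buffers at a and b are equal, else 0.  Every
 * byte is examined regardless of where the first difference lies.
 */
int ct_equal(const void *a, const void *b, size_t n)
{
  const unsigned char *p = a, *q = b;
  unsigned diff = 0;
  size_t i;

  for (i = 0; i < n; i++) diff |= p[i] ^ q[i];

  /* diff is in [0, 255].  If it's zero, diff - 1 wraps round to
   * UINT_MAX and bit 8 is set; otherwise bit 8 of diff - 1 is clear.
   */
  return (int)(((diff - 1) >> 8) & 1);
}
```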
Mark Wooding [Mon, 17 Apr 2017 23:03:01 +0000 (00:03 +0100)]
progs/cc-sig.c: Initialize hash context properly for RSA-PSS.
Somehow this seemed to work anyway on my machine; but valgrind agrees
that it was wrong.
Mark Wooding [Mon, 17 Apr 2017 22:31:11 +0000 (23:31 +0100)]
progs/cc-sig.c: Don't destroy an RSA context just after building it.
It causes an assertion failure later. Really embarrassing.
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
math/g-bin.c, math/g-prime.c: Fix type incompatibility.
Callers of the abstract group API expect to pass in a pointer-to-
structure. The binary and prime group implementations expected a
pointer-to-pointer, which looks different. Change the way these work,
so that the group element is a structure holding a pointer, rather than
just a bare pointer. This doesn't make any difference on targets with
sane ABIs, but it fixes a potentially nasty problem on weirder
platforms.
Add a macro explaining this change so that users of this unstable
interface can cope with both versions.
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
math/g-*.c: Group implementations include `group.h' via `group-guts.h'.
And not directly.
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
key/key-io.c: Produce valid key lines for empty keys.
If a key contains only an empty tree of structures, then `key_write'
returns an empty string, which breaks the whitespace-separated field
structure of the output key line. Notice this and insert an empty
structure by hand as an unpleasant bodge.
The resulting key is still highly anomalous. In particular, it doesn't
match any filter, because structure nodes don't have flags. I don't
know what to do about this.
Mark Wooding [Sat, 13 May 2017 11:27:31 +0000 (12:27 +0100)]
key/key-io.c: Fix segfault opening `KOPEN_READ | KOPEN_NOFILE' key files.
They're useless, but they shouldn't cause a crash.
Mark Wooding [Sun, 30 Apr 2017 17:43:46 +0000 (18:43 +0100)]
Merge branches 'mdw/latin-ietf' and 'mdw/curve25519'
* mdw/latin-ietf:
symm/{chacha,salsa20}.[ch]: Support RFC7539-style 96-bit nonces.
symm/{chacha,salsa20}.c: Change how the test code sets up the cipher.
symm/{chacha,salsa20}.c: Abstract out cipher and rand initialization.
symm/{chacha,salsa20}.[ch]: Compress systematic naming better in comments.
symm/stub.h.in: Fix bogus characters in the include guard macro name.
symm/stub.h.in: Add include guard around header.
symm/t/chacha: Fix typo in comment.
* mdw/curve25519:
pub/, progs/: Add support for X448 key exchange, defined in RFC7748.
math/fgoldi.c: Add support for Hamburg's `Goldilocks' field.
pub/, progs/: Implement Bernstein's Ed25519 signature scheme.
math/f25519.[ch]: More field operations.
pub/, progs/: Implement Bernstein's X25519 key-exchange algorithm.
math/f25519.c: Implementation for arithmetic in GF(2^255 - 19).
.gitignore, utils/.gitignore: Change Sage ignore rules.
Mark Wooding [Wed, 26 Apr 2017 10:55:08 +0000 (11:55 +0100)]
pub/, progs/: Add support for X448 key exchange, defined in RFC7748.
Mark Wooding [Wed, 26 Apr 2017 10:54:29 +0000 (11:54 +0100)]
math/fgoldi.c: Add support for Hamburg's `Goldilocks' field.
GF(2^448 - 2^224 - 1).
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
pub/, progs/: Implement Bernstein's Ed25519 signature scheme.
Mark Wooding [Wed, 26 Apr 2017 10:53:05 +0000 (11:53 +0100)]
math/f25519.[ch]: More field operations.
Most are fairly simple utilities, except for `f25519_quosqrt' which does
a combined division and square root.
Mark Wooding [Mon, 17 Apr 2017 23:39:24 +0000 (00:39 +0100)]
pub/, progs/: Implement Bernstein's X25519 key-exchange algorithm.
Mark Wooding [Mon, 17 Apr 2017 23:39:24 +0000 (00:39 +0100)]
math/f25519.c: Implementation for arithmetic in GF(2^255 - 19).
There's both a fast implementation for platforms with 64-bit arithmetic,
and a slow baseline for minimal C89 platforms. The code works better on
two's complement systems with arithmetic right shifts, but it works
portably.
* Arithmetic shifts are implemented with hairy masking and exact
division, but GCC notices and optimizes accordingly.
* Two's complement is used in the conditional-swap machinery, but
there's a fallback using multiplication if the `configure' script
can't detect it.
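The two tricks in the list above might be sketched roughly as follows. This is a guess at the general technique, not the macros in `f25519.c': all three function names are hypothetical, and extreme edge cases (values within a whisker of LONG_MIN, differences which overflow) are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Portable arithmetic right shift: floor(x/2^n).  Converting to
 * unsigned is well defined for negative x, so the mask picks out the
 * nonnegative residue of x mod 2^n; subtracting it leaves an exact
 * multiple of 2^n, and the division then rounds nothing.
 */
long asr(long x, unsigned n)
{
  unsigned long mask = (1ul << n) - 1;
  long low = (long)((unsigned long)x & mask);

  assert(n < 31);
  return (x - low)/(long)(1ul << n);
}

/* Swap *a and *b if bit is 1; leave them alone if bit is 0.  The mask
 * is all-bits-one or all-bits-zero, so there's no branch.
 */
void cswap_mask(uint32_t *a, uint32_t *b, uint32_t bit)
{
  uint32_t mask = (uint32_t)0 - bit;
  uint32_t t = (*a ^ *b) & mask;

  *a ^= t; *b ^= t;
}

/* The same effect on signed values using multiplication instead of a
 * mask.  Assumes *b - *a doesn't overflow.
 */
void cswap_mul(long *a, long *b, long bit)
{
  long t = (*b - *a)*bit;

  *a += t; *b -= t;
}
```

For example, `asr(-5, 1)' is -3, where plain division would give -2.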
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
symm/{chacha,salsa20}.[ch]: Support RFC7539-style 96-bit nonces.
I think these are a bad idea, but they'll be popular (and are etched
into the AEAD proposal).
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
symm/{chacha,salsa20}.c: Change how the test code sets up the cipher.
Introduce a macro which does the key, nonce and position setup in one
go.
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
symm/{chacha,salsa20}.c: Abstract out cipher and rand initialization.
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
symm/{chacha,salsa20}.[ch]: Compress systematic naming better in comments.
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
symm/stub.h.in: Fix bogus characters in the include guard macro name.
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
symm/stub.h.in: Add include guard around header.
Most Catacomb public headers do this, so the stubs ought to too.
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
symm/t/chacha: Fix typo in comment.
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
.gitignore, utils/.gitignore: Change Sage ignore rules.
It seems Sage now makes `.sage.py' files instead of plain `.py'. This
is a much better idea, and it means that we can have a single rule to
ignore all of them.
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
progs/cc-kem.c: Add `naclbox' crypto transform.
This uses Salsa20/r (or ChaChar) and Poly1305 in the same way as NaCl
`secretbox'. Difference: NaCl uses XSalsa20 for the extended nonce
size, but we have no need of that here.
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
progs/catcrypt.c, progs/cc-kem.c: Refactor bulk encryption.
The bulk crypto transform is now owned by the KEM machinery, and
provided to callers as one object rather than a bunch of little
components. There are some conceptual changes in the UI, but in fact
everything still works the way it did before.
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
progs/key.c: Support applying parameters in all key-generation algorithms.
If the algorithm itself can't make use of parameters, at least it can
copy the key attributes.
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
progs/key.c: Let `copyparam' worry about the parameter key's type.
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
progs/key.c: Report full parameter-key name in errors about it.
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
progs/catcrypt.c, progs/cc-sig.c: Compare MAC tags in constant time.
Mark Wooding [Mon, 17 Apr 2017 23:03:01 +0000 (00:03 +0100)]
progs/cc-sig.c: Initialize hash context properly for RSA-PSS.
Somehow this seemed to work anyway on my machine; but valgrind agrees
that it was wrong.
Mark Wooding [Mon, 17 Apr 2017 22:31:11 +0000 (23:31 +0100)]
progs/cc-sig.c: Don't destroy an RSA context just after building it.
It causes an assertion failure later. Really embarrassing.
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
key/key-io.c: Produce valid key lines for empty keys.
If a key contains only an empty tree of structures, then `key_write'
returns an empty string, which breaks the whitespace-separated field
structure of the output key line. Notice this and insert an empty
structure by hand as an unpleasant bodge.
The resulting key is still highly anomalous. In particular, it doesn't
match any filter, because structure nodes don't have flags. I don't
know what to do about this.
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
math/g-bin.c, math/g-prime.c: Fix type incompatibility.
Callers of the abstract group API expect to pass in a pointer-to-
structure. The binary and prime group implementations expected a
pointer-to-pointer, which looks different. Change the way these work,
so that the group element is a structure holding a pointer, rather than
just a bare pointer. This doesn't make any difference on targets with
sane ABIs, but it fixes a potentially nasty problem on weirder
platforms.
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
math/g-*.c: Group implementations include `group.h' via `group-guts.h'.
And not directly.
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
math/...: Make a number of functions be const-correct.
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
symm/poly1305.c: Implement `flushzero' to zero-pad to a block boundary.
I prefer plain `flush', but not all implementations expose it. The
`flushzero' operation is the one wanted by RFC7539 AEAD.
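As a small illustration of what padding to a block boundary involves: Poly1305 works in 16-byte blocks, so the number of zero bytes needed after `len' bytes of input is as below. The name `zero_pad_len' is hypothetical and isn't part of the Poly1305 interface.

```c
#include <stddef.h>

#define BLKSZ 16                /* Poly1305 block size in bytes */

/* Number of zero bytes needed to bring len up to the next multiple of
 * BLKSZ; zero if it's already aligned.
 */
size_t zero_pad_len(size_t len)
{
  return (BLKSZ - len%BLKSZ)%BLKSZ;
}
```

So 17 bytes of input need 15 bytes of padding, while 16 or 32 bytes need none.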
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
symm/poly1305.c: Implement Bernstein's Monte-Carlo test.
I did run the full test once, but it took almost an hour.
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
symm/t/poly1305: Add the tests from Bernstein's original paper.
They were tucked away in an appendix and I missed them. Also, I
implemented from the NaCl paper, which is a better fit for modern usage.
Mark Wooding [Fri, 14 Apr 2017 22:27:50 +0000 (23:27 +0100)]
Merge branch '2.3.x'
* 2.3.x:
symm/salsa20.[ch]: Add missing LGPL notices.
math/mpx-mul4-test.c: Set `dstr' length correctly in conversion function.
symm/chacha.c: Fix `tell' response.
symm/chacha.[ch]: Fix comment headers.
symm/{chacha.c,salsa20.c}: Fix random generator allocation sizes.
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
symm/poly1305.c: Fix 16/32-bit `carry_reduce'.
I managed to botch the bounds last time.
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
symm/salsa20.[ch]: Add missing LGPL notices.
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
math/mpx-mul4-test.c: Set `dstr' length correctly in conversion function.
(cherry picked from commit b00264d9e2ac2f2be2808e7ad663c35115519504)
Mark Wooding [Thu, 13 Apr 2017 13:47:28 +0000 (14:47 +0100)]
symm/chacha.c: Fix `tell' response.
Mark Wooding [Thu, 13 Apr 2017 14:50:46 +0000 (15:50 +0100)]
symm/chacha.[ch]: Fix comment headers.
Mark Wooding [Thu, 13 Apr 2017 13:47:11 +0000 (14:47 +0100)]
symm/{chacha.c,salsa20.c}: Fix random generator allocation sizes.
This makes a real mess.
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
math/mpx-mul4-test.c: Set `dstr' length correctly in conversion function.
Mark Wooding [Sat, 8 Apr 2017 10:05:49 +0000 (11:05 +0100)]
symm/poly1305.c: Change reading of 26-bit pieces.
This way, the masks fit together visually.
Mark Wooding [Sat, 8 Apr 2017 08:52:56 +0000 (09:52 +0100)]
symm/poly1305.c: Fix visual code misalignment.
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
progs/perftest.c: Add performance test for Poly1305.
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
progs/perftest.c: Split out magic table includes into their own stanza.
Mark Wooding [Fri, 7 Apr 2017 09:15:03 +0000 (10:15 +0100)]
symm/poly1305.h: Add missing `POLY1305_TAGSZ' definition.
Mark Wooding [Thu, 6 Apr 2017 16:31:30 +0000 (17:31 +0100)]
symm/poly1305.c: Fix 64-bit shift error.
Thank you, GCC, for warning about that.
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
symm/: Implement Daniel Bernstein's `Poly1305' message authentication code.
Mark Wooding [Wed, 5 Apr 2017 08:01:13 +0000 (09:01 +0100)]
Release 2.3.0.1.
Mark Wooding [Wed, 5 Apr 2017 07:59:33 +0000 (08:59 +0100)]
base/asm-common.h: Fix the sense of the `WANT_EXECUTABLE_STACK' check.
Brown paper bag time.
Mark Wooding [Wed, 5 Apr 2017 08:05:59 +0000 (09:05 +0100)]
math/: Distribute the `mpx-mul4' test vectors, with the correct name.
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
math/: Add low-level testing for accelerated `mpx-mul4' multiplier.
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
Makefile.am: Some reformatting.
Mark Wooding [Thu, 26 May 2016 08:26:09 +0000 (09:26 +0100)]
vars.am: Some reformatting.
Mark Wooding [Mon, 3 Apr 2017 09:25:30 +0000 (10:25 +0100)]
Release 2.3.0.
Mark Wooding [Wed, 4 Jan 2017 01:42:16 +0000 (01:42 +0000)]
math/mpx-mul4-amd64-sse2.S: SSE2 multipliers for AMD64.
Plus the various hangers on.
Mark Wooding [Wed, 4 Jan 2017 01:41:22 +0000 (01:41 +0000)]
math/mpx-mul4-x86-sse2.S: Maintain a local copy of the counter.
I've no idea whether one's allowed to mutate a parameter passed on the
stack. Play it safe.
This means that (a) the counter is now in a fixed place in the frame so
that `testtail' doesn't need to be told where it is, and (b)
`testprologue' needs to initialize it from the caller's parameter, so it
needs to grow a macro argument.
Mark Wooding [Wed, 4 Jan 2017 01:35:50 +0000 (01:35 +0000)]
math/mpx-mul4-x86-sse2.S: Make stack alignment more standard.
This actually slightly reduces the amount of stack needed, but I don't
quite understand why. There's a knock-on rearrangement of the stack
frame in the test wrappers and C-interface subroutines.
There's also a slightly sneaky introduction of space for a later change.
But there shouldn't be any externally observable difference.
Mark Wooding [Wed, 4 Jan 2017 01:36:56 +0000 (01:36 +0000)]
math/mpx-mul4-x86-sse2.S: Slightly reorder to reduce dependence.
Doesn't help much.
Mark Wooding [Wed, 4 Jan 2017 01:36:13 +0000 (01:36 +0000)]
math/mpx-mul4-x86-sse2.S: Fix comment formatting.
Mark Wooding [Thu, 29 Dec 2016 15:24:56 +0000 (15:24 +0000)]
math/mpx-mul4-x86-sse2.S: Additional piece of commentary.
Mark Wooding [Thu, 29 Dec 2016 15:24:26 +0000 (15:24 +0000)]
math/mpx-mul4-x86-sse2.S: Use default arguments for macros.
I'd muddled up my macro languages and misremembered that GNU as handles
omitted macro arguments sensibly. So use default argument values
throughout. Some of the macro arguments have been reordered to make
defaulting work better. No functional change.
Mark Wooding [Thu, 29 Dec 2016 14:36:12 +0000 (14:36 +0000)]
math/mpx-mul4-x86-sse2.S: Use the correct vector-multiply instruction.
Not sure why GNU as let me get away with that.
Mark Wooding [Sat, 5 Nov 2016 21:28:22 +0000 (21:28 +0000)]
math/mpx-mul4-x86-sse2.S: Give `squash' an explicit destination argument.
Also, rearrange the arguments so the destination(s) are at the start.
Mark Wooding [Sat, 5 Nov 2016 21:28:22 +0000 (21:28 +0000)]
math/mpx-mul4-x86-sse2.S: Optimize `squash'.
We can use `punpckldq' to assemble the 32-bit pieces, rather than a lot
of shifting to clear bits and then `por'.
Mark Wooding [Sat, 5 Nov 2016 21:28:22 +0000 (21:28 +0000)]
math/mpx-mul4-x86-sse2.S: Use `movdqa' to move between XMM registers.
Not `movdqu'. I don't think there's a performance difference (any
more), but it's better style.
Mark Wooding [Sat, 5 Nov 2016 21:28:22 +0000 (21:28 +0000)]
math/mpx-mul4-x86-sse2.S: Add an extra blank line to improve layout.
Mark Wooding [Sat, 5 Nov 2016 21:28:22 +0000 (21:28 +0000)]
math/mpx-mul4-x86-sse2.S: Fix operand name in commentary.