catacomb
5 weeks agoprogs/perftest.c: Use syscall numbers from Glibc. master
Mark Wooding [Sun, 17 Mar 2024 12:34:32 +0000 (12:34 +0000)]
progs/perftest.c: Use syscall numbers from Glibc.

6 weeks agosymm/gcm-arm64-pmull.S (mul192): Fix commentary.
Mark Wooding [Sun, 10 Mar 2024 18:05:23 +0000 (18:05 +0000)]
symm/gcm-arm64-pmull.S (mul192): Fix commentary.

6 weeks agoconfigure.ac: Lightly modernize.
Mark Wooding [Sun, 10 Mar 2024 16:46:38 +0000 (16:46 +0000)]
configure.ac: Lightly modernize.

  * Use `@%:@' for hashes.
  * Delete unnecessary toothpicks in `AC_DEFINE'.
  * Replace `AC_TRY_...' with `AC_..._IFELSE'.
  * Use `LT_INIT' instead of `AM_PROG_LIBTOOL'.

6 weeks agobase/dispatch-x86ish.S: Lift register allocation definitions.
Mark Wooding [Sun, 10 Mar 2024 16:08:11 +0000 (16:08 +0000)]
base/dispatch-x86ish.S: Lift register allocation definitions.

... and use the logical names during setup.  This seems to be the
convention I've followed elsewhere and it makes some sense as
(a) establishing a target for the following setup code to aim for,
and (b) giving a visual indication of how well we're getting there.

6 weeks agobase/dispatch-x86ish.S (dispatch_x86ish_cpuid): Add missing `#undef'.
Mark Wooding [Sun, 10 Mar 2024 16:07:19 +0000 (16:07 +0000)]
base/dispatch-x86ish.S (dispatch_x86ish_cpuid): Add missing `#undef'.

Leaked from the local register allocation switch.

6 weeks agobase/dispatch-x86ish.S (dispatch_x86ish_cpuid): Skip `EFLAGS_ID' dance on AMD64.
Mark Wooding [Sun, 10 Mar 2024 15:30:24 +0000 (15:30 +0000)]
base/dispatch-x86ish.S (dispatch_x86ish_cpuid): Skip `EFLAGS_ID' dance on AMD64.

The 64-bit instruction set postdates the `cpuid' instruction.

6 weeks agobase/dispatch-x86ish.S (dispatch_x86ish_xgetbv): Preserve `edi' on i386.
Mark Wooding [Sun, 10 Mar 2024 14:58:19 +0000 (14:58 +0000)]
base/dispatch-x86ish.S (dispatch_x86ish_xgetbv): Preserve `edi' on i386.

Oh, dear!  This broke the world on 32-bit x86 and I didn't notice.
Definite brown-paper-bag time.

6 weeks agosymm/{chacha,salsa20}-{arm64,arm-neon}.S: Improve rotation code.
Mark Wooding [Fri, 8 Mar 2024 03:17:45 +0000 (03:17 +0000)]
symm/{chacha,salsa20}-{arm64,arm-neon}.S: Improve rotation code.

Apparently I was asleep when I read the architecture reference because I
missed the `sri' instruction and how it can be used to synthesize
rotations with only two instructions rather than three.

Also replace rotation by 16 with the obvious `rev32'.
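
The two-instruction rotation can be modelled in plain C.  This is only a
sketch of the idea, not the assembler from the commit: `sri32' is a
hypothetical one-lane model of the `sri' (shift right and insert)
instruction, and the other function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model of one 32-bit lane of `sri d, s, #n': shift s right by n bits
 * and insert the result into d, preserving only the top n bits of d.
 */
static uint32_t sri32(uint32_t d, uint32_t s, unsigned n)
{
  uint32_t keep;

  assert(1 <= n && n <= 31);
  keep = ~(UINT32_MAX >> n);		/* mask for the top n bits */
  return ((d & keep) | (s >> n));
}

/* Rotate left by r (1 <= r <= 31) in two steps: `shl', then `sri'. */
static uint32_t rotl_shl_sri(uint32_t x, unsigned r)
{
  uint32_t t = x << r;			/* shl t, x, #r */
  return (sri32(t, x, 32 - r));		/* sri t, x, #(32 - r) */
}

/* Rotation by 16 just swaps the two 16-bit halves of each word, which
 * is the effect of `rev32' on 16-bit elements.
 */
static uint32_t rot16_rev32(uint32_t x)
  { return ((x << 16) | (x >> 16)); }
```

The three-instruction version would shift left, shift right, and then
combine the two; here the `sri' step does the second shift and the
combination at once.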

6 weeks agosymm/salsa20-arm-neon.S: Indent some reordered instructions.
Mark Wooding [Fri, 8 Mar 2024 03:16:13 +0000 (03:16 +0000)]
symm/salsa20-arm-neon.S: Indent some reordered instructions.

To emphasize that they're not part of the nearby logic but moved to
improve pipelining.

6 weeks agosymm/salsa20-*.S: Fix vector diagrams to be little-endian.
Mark Wooding [Fri, 8 Mar 2024 03:13:02 +0000 (03:13 +0000)]
symm/salsa20-*.S: Fix vector diagrams to be little-endian.

Missed these in the previous pass.

7 weeks agobase/regdump.h, base/regdump-*.S: Commentary and formatting improvements.
Mark Wooding [Sat, 2 Mar 2024 12:02:39 +0000 (12:02 +0000)]
base/regdump.h, base/regdump-*.S: Commentary and formatting improvements.

No code change at all.

7 weeks agobase/dispatch.c, base/dispatch.h: Add proper detection for AVX2.
Mark Wooding [Sat, 2 Mar 2024 12:01:58 +0000 (12:01 +0000)]
base/dispatch.c, base/dispatch.h: Add proper detection for AVX2.

No plans to use this for anything yet.

2 months agobase/asm-common.h, *.S: Use consistent little-endian notation for SIMD regs.
Mark Wooding [Sat, 3 Feb 2024 23:02:22 +0000 (23:02 +0000)]
base/asm-common.h, *.S: Use consistent little-endian notation for SIMD regs.

This makes operations which involve changing one's perspective about the
SIMD processing elements make significantly more sense.  In particular,
I hope that this removes a layer of brain-twisting from the GCM code.

  * Adjust all of the register-contents diagrams so that less
    significant elements are on the right, rather than on the left.

  * Change the x86 `SHUF' macro so that the desired pieces are listed in
    decreasing significance order, so `SHUF(3, 2, 1, 0)' would be a
    no-op.

I would, of course, continue to use big-endian notation on a target
which actually used a big-endian ordering natively, but we don't
currently support any of them.
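
The `SHUF' convention described above can be modelled in plain C.  This
is a sketch under assumptions: the macro body is a guess at a definition
consistent with the description (the real one in base/asm-common.h may
differ), and `pshufd_model' is a hypothetical stand-in for the x86
`pshufd' instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definition: operands listed in decreasing significance order,
 * packed two bits each into a `pshufd'-style immediate.
 */
#define SHUF(d, c, b, a) (64*(d) + 16*(c) + 4*(b) + (a))

/* Model of `pshufd': element i of dst is element ((imm >> 2*i) & 3) of
 * src, where element 0 is the least significant.
 */
static void pshufd_model(uint32_t dst[4], const uint32_t src[4],
			 unsigned imm)
{
  unsigned i;

  assert(imm < 256);
  for (i = 0; i < 4; i++) dst[i] = src[(imm >> (2*i))&3];
}
```

With this definition, `SHUF(3, 2, 1, 0)' is the immediate 0xe4, which
leaves every element where it was, and `SHUF(0, 1, 2, 3)' reverses the
four elements.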

2 months agoutils/permute.lisp (demonstrate-permutation-network): Optionally show offsets.
Mark Wooding [Sat, 3 Feb 2024 23:34:13 +0000 (23:34 +0000)]
utils/permute.lisp (demonstrate-permutation-network): Optionally show offsets.

Add an option to print a table showing how far each bit has still to
move.  This can be helpful for understanding some permutations.  Others,
not so much.

2 months agoutils/permute.lisp (demonstrate-permutation-network): Print nice diagrams.
Mark Wooding [Sat, 3 Feb 2024 23:33:29 +0000 (23:33 +0000)]
utils/permute.lisp (demonstrate-permutation-network): Print nice diagrams.

Delete the ones in the commentary again.

2 months agoutils/permute.lisp (demonstrate-permutation-network): Use keyword args.
Mark Wooding [Sat, 3 Feb 2024 23:31:09 +0000 (23:31 +0000)]
utils/permute.lisp (demonstrate-permutation-network): Use keyword args.

2 months agoutils/permute.lisp: Add supporting diagrams for IP permutations.
Mark Wooding [Fri, 2 Feb 2024 11:25:31 +0000 (11:25 +0000)]
utils/permute.lisp: Add supporting diagrams for IP permutations.

I decided I didn't want to lose these, and putting them in the main code
seemed too cluttery.

2 months agosymm/des.c: Replace PC1 and PC2 permutation tables with Beneš networks.
Mark Wooding [Thu, 1 Feb 2024 14:37:27 +0000 (14:37 +0000)]
symm/des.c: Replace PC1 and PC2 permutation tables with Beneš networks.

2 months agomath/gfx-sqr.c: Use bithacking rather than a table for squaring.
Mark Wooding [Thu, 1 Feb 2024 14:32:42 +0000 (14:32 +0000)]
math/gfx-sqr.c: Use bithacking rather than a table for squaring.

This gives a modest performance improvement to binary-curve scalar
multiplication.
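
For illustration: squaring a polynomial over GF(2) just spreads its
coefficients apart, interleaving them with zeros, so a table lookup can
be replaced by a few shifts and masks.  The following is a generic
sketch of that bit trick for a 16-bit input; it is not necessarily the
sequence used in math/gfx-sqr.c, and `gf2_sqr16' is a made-up name.

```c
#include <stdint.h>

/* Square a 16-bit polynomial over GF(2): bit i of the input moves to
 * bit 2*i of the output, and the odd-numbered output bits are zero.
 */
static uint32_t gf2_sqr16(uint16_t a)
{
  uint32_t x = a;

  x = (x | (x << 8)) & 0x00ff00ffu;	/* spread bytes apart */
  x = (x | (x << 4)) & 0x0f0f0f0fu;	/* spread nibbles apart */
  x = (x | (x << 2)) & 0x33333333u;	/* spread bit pairs apart */
  x = (x | (x << 1)) & 0x55555555u;	/* spread single bits apart */
  return (x);
}
```

For example, (t + 1)^2 = t^2 + 1, so the input 0x0003 gives 0x00000005.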

2 months agosymm/des-base.h: Improve the IP permutation network.
Mark Wooding [Thu, 1 Feb 2024 14:29:06 +0000 (14:29 +0000)]
symm/des-base.h: Improve the IP permutation network.

The new network is the same number of steps as the old one, but by
exchanging bits between the two halves, we reduce the number of CPU
operations needed to perform the permutation.

This is the same network used by PuTTY, but I derived it independently.
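
As a sketch of what exchanging bits between the two halves looks like,
here is the usual primitive for swapping masked bits between two words.
It shows a single step only, not the whole IP network, and the function
name is hypothetical rather than anything from symm/des-base.h.

```c
#include <assert.h>
#include <stdint.h>

/* Exchange the bits of *y selected by m with the bits of *x selected by
 * m << n.  This costs two shifts, three XORs and one AND, and handles
 * both words at once.
 */
static void exchange_bits(uint32_t *x, uint32_t *y, unsigned n, uint32_t m)
{
  uint32_t t;

  assert(n < 32);
  t = ((*x >> n) ^ *y) & m;	/* where the two sets of bits differ */
  *y ^= t; *x ^= t << n;	/* flip exactly those bits in both words */
}
```

Applying the same step twice restores the original values, which makes
it easy to run a network backwards.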

2 months agobase/permute.h, utils/permute.lisp, symm/...: Formalize bit permutations.
Mark Wooding [Thu, 1 Feb 2024 12:33:33 +0000 (12:33 +0000)]
base/permute.h, utils/permute.lisp, symm/...: Formalize bit permutations.

Add a bunch of Lisp code for messing with permutations.  Add a C header
file with macros for implementing interesting primitive permutations.
Apply these to DES and Keccak-1600 as a demonstration.

3 months agosymm/gcm.c, symm/gcm-*.S, utils/gcm-ref: Replace one-bit shift with algebra.
Mark Wooding [Tue, 16 Jan 2024 14:03:11 +0000 (14:03 +0000)]
symm/gcm.c, symm/gcm-*.S, utils/gcm-ref: Replace one-bit shift with algebra.

The ARM32 and x86 instruction sets lack an instruction to reverse the
order of bits in each byte of a vector register.  Therefore, they
resolve the GCM bit-ordering nightmare by reordering the bytes and
working with reversed polynomials.  But when you multiply reversed
polynomials, the result ends up being shifted by one bit relative to the
answer you actually wanted -- and SIMD instruction sets are bad at
multiprecision bit shifts, so this involves a lot of work.

Instead, use algebra.  If the result is shifted by one bit position,
then it's been multiplied by the generator t.  We can therefore fix this
by dividing through by t.  Of course, this might not even be possible
in general when working in the ring of polynomials over GF(2), but we're
actually working in the GCM quotient field, and t definitely has an
inverse there.  Also, dividing by t will take time and effort -- but in
fact one of the operands is something we know ahead of time and get to
prepare in whatever way we like.  So, in particular, we could divide it
by t before we even start.

The result is that we get to delete a bunch of rather fiddly assembler
code, in favour of some fairly simple C setup (and extra compensation
work in `recover_k').

I stole this trick from PuTTY.
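
The one-bit shift is easy to see at toy scale.  The sketch below uses
8-bit polynomials over GF(2) instead of GCM's 128-bit ones and performs
no modular reduction; all of the names are hypothetical.  It checks
that the product of two bit-reversed operands is the bit-reversed
double-width product, displaced by one bit position.

```c
#include <assert.h>
#include <stdint.h>

/* Carry-less product of two 8-bit polynomials over GF(2). */
static uint16_t clmul8(uint8_t a, uint8_t b)
{
  uint16_t r = 0;
  unsigned i;

  for (i = 0; i < 8; i++)
    if ((b >> i)&1) r ^= (uint16_t)((unsigned)a << i);
  return (r);
}

/* Reverse the low nbits bits of x. */
static uint16_t rev(uint16_t x, unsigned nbits)
{
  uint16_t r = 0;
  unsigned i;

  assert(nbits <= 16);
  for (i = 0; i < nbits; i++)
    if ((x >> i)&1) r |= (uint16_t)(1u << (nbits - 1 - i));
  return (r);
}

/* Return nonzero if, for every pair of 8-bit operands, multiplying the
 * reversed operands gives the reversed 16-bit product shifted right by
 * one bit.
 */
static int reversed_product_off_by_one_p(void)
{
  unsigned a, b;
  uint16_t want, got;

  for (a = 0; a < 256; a++)
    for (b = 0; b < 256; b++) {
      want = rev(clmul8((uint8_t)a, (uint8_t)b), 16);
      got = clmul8((uint8_t)rev((uint16_t)a, 8),
		   (uint8_t)rev((uint16_t)b, 8));
      if ((uint16_t)(got << 1) != want) return (0);
    }
  return (1);
}
```

A product of two 8-bit polynomials has only 15 significant bits, so it
cannot fill a 16-bit register symmetrically: that is where the stray
bit comes from.  In the reversed representation this displacement is a
factor of t, which is what the commit cancels by dividing the
precomputed operand by t ahead of time.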

3 months agosymm/gcm-*.S (mul256): Label the partial-product terms correctly.
Mark Wooding [Tue, 16 Jan 2024 14:01:01 +0000 (14:01 +0000)]
symm/gcm-*.S (mul256): Label the partial-product terms correctly.

Two problems in three letters:

  * the middle term was written as `d', rather than `b' as introduced in
    the previous paragraph; and

  * the three are listed in the wrong order.

3 months agosymm/gcm-arm-crypto.S (mul96): Fill in the clobbered-registers list.
Mark Wooding [Tue, 16 Jan 2024 13:58:08 +0000 (13:58 +0000)]
symm/gcm-arm-crypto.S (mul96): Fill in the clobbered-registers list.

3 months agoutils/gcm-ref: Pull `poly64_mul' and `poly64_redc' out of `poly64_common'.
Mark Wooding [Tue, 16 Jan 2024 13:54:50 +0000 (13:54 +0000)]
utils/gcm-ref: Pull `poly64_mul' and `poly64_redc' out of `poly64_common'.

Basically a refactoring, but there's some foreshadowing too -- most
notably the UWHAT and VWHAT arguments.

3 months agoutils/gcm-ref (present_gf_vmullp64): Add `v' prefix to match front end.
Mark Wooding [Tue, 16 Jan 2024 13:52:55 +0000 (13:52 +0000)]
utils/gcm-ref (present_gf_vmullp64): Add `v' prefix to match front end.

The inconsistency annoyed me.

3 months agoutils/gcm-ref (poly64_mul_simple): Strip padding off the product.
Mark Wooding [Tue, 16 Jan 2024 13:46:58 +0000 (13:46 +0000)]
utils/gcm-ref (poly64_mul_simple): Strip padding off the product.

Rather than leaving this job to the caller.  I'm going to decree that
it's the presentation-function's job to show padding in the right place,
rather than the multiplier's job to retain it.  This means that we need
to keep track of the padding properly, but it's pretty easy.

The most important effect is that there's no longer a rather strange
bodge in `poly64_common' to strip the padding in one particular case
because `poly64_mul_simple' has done it properly in every case.

3 months agoutils/gcm-ref: Fix embarrassing mistakes in comments.
Mark Wooding [Tue, 16 Jan 2024 13:44:58 +0000 (13:44 +0000)]
utils/gcm-ref: Fix embarrassing mistakes in comments.

Imagine my head hanging in shame.

3 months agoutils/gcm-ref (present_gf_pmull): Round width up to a multiple of 64 bits.
Mark Wooding [Tue, 16 Jan 2024 13:38:52 +0000 (13:38 +0000)]
utils/gcm-ref (present_gf_pmull): Round width up to a multiple of 64 bits.

Otherwise the later loop, which pulls off 64-bit chunks, gets badly
confused.

Now `gcm-ref' can actually calculate all of the things properly.

for p in pclmul vmullp64 pmull; do
  while read u v; do utils/gcm-ref $p $u $v || break 2; done <<EOF
cde4bef260d7bcda 163547d348b75511
cde4bef260d7bcda163547d3 48b7551195e77022907dd1df
cde4bef260d7bcda163547d348b75511 95e77022907dd1dff7dac5c9941d26d0
cde4bef260d7bcda163547d348b7551195e77022907dd1df f7dac5c9941d26d0c6eb14ad568f86edd1dc9268eeee5332
cde4bef260d7bcda163547d348b7551195e77022907dd1dff7dac5c9941d26d0 c6eb14ad568f86edd1dc9268eeee533285a6ed810c9b689daaa9060d2d4b6003
EOF
done

I wonder what this means about the changes coming up...

3 months agoutils/gcm-ref (poly64_mul_simple): Pad v based on the length of v.
Mark Wooding [Tue, 16 Jan 2024 13:36:33 +0000 (13:36 +0000)]
utils/gcm-ref (poly64_mul_simple): Pad v based on the length of v.

Not the already-padded length of u, which doesn't make any sense.  Now
we actually calculate 96-bit products correctly using the `poly64'
multiplication machinery.

3 months agoutils/gcm-ref (demo_table_l): Use an end-swap function that exists.
Mark Wooding [Tue, 16 Jan 2024 13:43:09 +0000 (13:43 +0000)]
utils/gcm-ref (demo_table_l): Use an end-swap function that exists.

Now `table-l' actually runs at all.

3 months agosymm/gcm-*.S: Fix the `ṽ' encodings.
Mark Wooding [Tue, 16 Jan 2024 12:33:22 +0000 (12:33 +0000)]
symm/gcm-*.S: Fix the `ṽ' encodings.

For some reason, Emacs got confused about how it wanted to display
that.  I think I did something crazy involving entering combining
characters by hand; this time, I just typed the thing on the keyboard
using the compose key and it worked.

4 months agorand/rand.c: Fix multiplication and division formatting.
Mark Wooding [Sat, 23 Dec 2023 14:18:57 +0000 (14:18 +0000)]
rand/rand.c: Fix multiplication and division formatting.

4 months agorand/rand.c: Rearrange some comparisons to avoid arithmetic overflow.
Mark Wooding [Sat, 23 Dec 2023 14:18:18 +0000 (14:18 +0000)]
rand/rand.c: Rearrange some comparisons to avoid arithmetic overflow.
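
The commit doesn't say which comparisons were involved, so the following
is only a generic sketch of the technique, with hypothetical names: move
the arithmetic to the side of the comparison where it can't wrap around.

```c
#include <assert.h>
#include <stddef.h>

/* Does a request for `want' more bytes fit in a buffer of capacity
 * `cap' which already holds `used' bytes?
 */
static int fits_p(size_t used, size_t want, size_t cap)
{
  assert(used <= cap);
  /* Not `used + want <= cap': that sum can wrap around to a small
   * value if `want' is huge.  The subtraction can't go negative
   * because used <= cap.
   */
  return (want <= cap - used);
}
```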

4 months agomath/f25519.h: Fix argument name in commentary.
Mark Wooding [Sat, 23 Dec 2023 14:09:25 +0000 (14:09 +0000)]
math/f25519.h: Fix argument name in commentary.

4 months agoMerge branch '2.5.x' into HEAD
Mark Wooding [Sat, 23 Dec 2023 14:33:00 +0000 (14:33 +0000)]
Merge branch '2.5.x' into HEAD

* 2.5.x:
  rand/rand.c (rand_gate): Evolve r->ibits in a more sensible manner.
  rand/rand.c (rand_getgood): Stretch the output buffer if necessary.

4 months agoMerge branch '2.4.x' into 2.5.x 2.5.x
Mark Wooding [Sat, 23 Dec 2023 14:31:14 +0000 (14:31 +0000)]
Merge branch '2.4.x' into 2.5.x

* 2.4.x:
  rand/rand.c (rand_gate): Evolve r->ibits in a more sensible manner.
  rand/rand.c (rand_getgood): Stretch the output buffer if necessary.

4 months agorand/rand.c (rand_gate): Evolve r->ibits in a more sensible manner. 2.4.x
Mark Wooding [Fri, 28 Aug 2020 23:25:56 +0000 (00:25 +0100)]
rand/rand.c (rand_gate): Evolve r->ibits in a more sensible manner.

It's not really clear what this code was trying to do.  Write i and o
for the initial values of r->ibits and r->obits, respectively, i' and o'
for their respective final values, and O for RAND_OBITS.  In the case
that i + o <= O, we update i' = 0 and o' = i + o, maintaining the
invariant that i' + o' = i + o.  But if i + o > O, then we set o' = O and
i' = (i + o) - i = o, which seems nonsensical.  In particular, in the
case that i = 1 and o = O, it apparently magics O - 1 bits of entropy
from nowhere.

Modify the code so that it at least maintains the sum of the entropy
counters in either branch.  I'm not sure this is actually correct, but
it seems like a defensible position.
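
The arithmetic above can be checked with a small model.  This is a
sketch, not the code from rand/rand.c: the structure and function names
are hypothetical, the value of `RAND_OBITS' is made up, and the
sum-preserving overflow branch is one reading of what maintaining the
sum means.

```c
/* Hypothetical value, for illustration only. */
#define RAND_OBITS 160u

struct counters { unsigned ibits, obits; };

/* The old behaviour, as described in the message above. */
static void gate_counters_old(struct counters *r)
{
  unsigned i = r->ibits, o = r->obits;

  if (i + o <= RAND_OBITS) { r->obits = i + o; r->ibits = 0; }
  else { r->obits = RAND_OBITS; r->ibits = (i + o) - i; }
}

/* A version which keeps ibits + obits constant in both branches. */
static void gate_counters_sum_preserving(struct counters *r)
{
  unsigned i = r->ibits, o = r->obits;

  if (i + o <= RAND_OBITS) { r->obits = i + o; r->ibits = 0; }
  else { r->obits = RAND_OBITS; r->ibits = (i + o) - RAND_OBITS; }
}
```

Starting from i = 1 and o = RAND_OBITS, the old version ends with both
counters equal to RAND_OBITS, a gain of RAND_OBITS - 1 bits; the
sum-preserving version leaves both counters unchanged.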

4 months agorand/rand.c (rand_getgood): Stretch the output buffer if necessary.
Mark Wooding [Sat, 23 Dec 2023 14:13:34 +0000 (14:13 +0000)]
rand/rand.c (rand_getgood): Stretch the output buffer if necessary.

It's possible to have `r->o == RAND_BUFSZ' in the main loop, while
`r->obits' is larger than the requested size.  The following program
contrives this situation, though it can (and does) happen organically.

#include <stdio.h>
#include <stdlib.h>

#include "noise.h"
#include "rand.h"

int main(void)
{
  rand_pool pool;
  unsigned char buf[64];
  size_t n;

  rand_init(&pool);
  rand_noisesrc(&pool, &noise_source);
  rand_seed(&pool, 64);

  while (pool.obits < RAND_OBITS) rand_seed(&pool, RAND_IBITS);
  while (pool.o < RAND_BUFSZ) {
    n = RAND_BUFSZ - pool.o; if (n > sizeof(buf)) n = sizeof(buf);
    rand_getgood(&pool, buf, n);
  }
  rand_getgood(&pool, buf, 4);
  return (0);
}

When this happens, `rand_getgood' gets stuck in an infinite loop,
trimming the chunk size to zero because the output buffer is exhausted,
but not refilling it because there's still notional entropy remaining.
Detect this situation and stretch the output buffer when there's nothing
left, as in `rand_get'.

11 months agobase/dispatch.c: Check atomic copy of the probes flags, not the original.
Mark Wooding [Sat, 6 May 2023 23:58:39 +0000 (00:58 +0100)]
base/dispatch.c: Check atomic copy of the probes flags, not the original.

Rather defeats the point otherwise.

11 months agobase/asm-common.h: Hoist executable-stack stuff into the ELF block.
Mark Wooding [Fri, 28 Apr 2023 13:06:11 +0000 (14:06 +0100)]
base/asm-common.h: Hoist executable-stack stuff into the ELF block.

11 months agobase/asm-common.h: Move ELF section-type business into the main ELF block.
Mark Wooding [Fri, 28 Apr 2023 13:05:06 +0000 (14:05 +0100)]
base/asm-common.h: Move ELF section-type business into the main ELF block.

11 months agobase/asm-common.h: Sink the `PIC' stuff a little.
Mark Wooding [Fri, 28 Apr 2023 13:04:44 +0000 (14:04 +0100)]
base/asm-common.h: Sink the `PIC' stuff a little.

11 months agobase/asm-common.h: Farm out the section selection macros.
Mark Wooding [Fri, 28 Apr 2023 13:00:49 +0000 (14:00 +0100)]
base/asm-common.h: Farm out the section selection macros.

Leave them to the end because nobody's relying on them earlier.

11 months agobase/asm-common.h: Add a macro for setting the types of data symbols.
Mark Wooding [Fri, 28 Apr 2023 12:59:28 +0000 (13:59 +0100)]
base/asm-common.h: Add a macro for setting the types of data symbols.

11 months agobase/asm-common.c: Add paragraph comments and squash some blank lines.
Mark Wooding [Fri, 28 Apr 2023 12:58:46 +0000 (13:58 +0100)]
base/asm-common.c: Add paragraph comments and squash some blank lines.

11 months agobase/asm-common.h (FORCE_EXECUTABLE_STACK): Rename from `WANT...'.
Mark Wooding [Fri, 28 Apr 2023 12:51:50 +0000 (13:51 +0100)]
base/asm-common.h (FORCE_EXECUTABLE_STACK): Rename from `WANT...'.

Makes it sound less like a sensible thing to want.

12 months agopub/x25519.h, pub/x448.h: Add descriptions of the curves.
Mark Wooding [Tue, 25 Apr 2023 00:44:47 +0000 (01:44 +0100)]
pub/x25519.h, pub/x448.h: Add descriptions of the curves.

12 months agomath/f25519.c (f25519_load): Improve the tabular layout.
Mark Wooding [Tue, 25 Apr 2023 00:44:20 +0000 (01:44 +0100)]
math/f25519.c (f25519_load): Improve the tabular layout.

12 months agomath/f25519.c: Add missing space in comment.
Mark Wooding [Tue, 25 Apr 2023 00:44:07 +0000 (01:44 +0100)]
math/f25519.c: Add missing space in comment.

12 months agosymm/poly1305.c: Remove spaces around `&'.
Mark Wooding [Tue, 25 Apr 2023 00:43:54 +0000 (01:43 +0100)]
symm/poly1305.c: Remove spaces around `&'.

21 months agosymm/chacha-x86ish-sse2.S: Fix mathematical errors in commentary.
Mark Wooding [Tue, 26 Jul 2022 10:21:12 +0000 (11:21 +0100)]
symm/chacha-x86ish-sse2.S: Fix mathematical errors in commentary.

The commentary proving the correctness for the general solution of
`gfreduce_quadsolve' was wrong in a number of details that somehow
managed to cancel out.  The claim is correct, and now the proof has been
fixed, with somewhat expanded calculations making the tricks easier to
notice.

21 months agomath/mpx-mul4-arm64-simd.S: Fix case of argument `I' in commentary.
Mark Wooding [Tue, 26 Jul 2022 10:17:49 +0000 (11:17 +0100)]
math/mpx-mul4-arm64-simd.S: Fix case of argument `I' in commentary.

21 months agomath/mpx-mul4-arm64-simd.S: Fix unfortunate line-breaks in commentary.
Mark Wooding [Tue, 26 Jul 2022 10:16:58 +0000 (11:16 +0100)]
math/mpx-mul4-arm64-simd.S: Fix unfortunate line-breaks in commentary.

2 years agomath/group-dstr.c: Delete some spurious blank lines.
Mark Wooding [Thu, 12 Aug 2021 10:01:30 +0000 (11:01 +0100)]
math/group-dstr.c: Delete some spurious blank lines.

2 years agosymm/* (aead): Implement the `szok' methods.
Mark Wooding [Thu, 12 Aug 2021 09:34:01 +0000 (10:34 +0100)]
symm/* (aead): Implement the `szok' methods.

I think this was intended to be part of the initial AEAD work, but got
forgotten.  Oh, well.

2 years agosymm/ccm.c: Fix the title of the comment for `ccm_check'.
Mark Wooding [Tue, 10 Aug 2021 16:14:12 +0000 (17:14 +0100)]
symm/ccm.c: Fix the title of the comment for `ccm_check'.

2 years agosymm/*.S: Delete stray dots in some banner comments.
Mark Wooding [Thu, 1 Apr 2021 21:28:21 +0000 (22:28 +0100)]
symm/*.S: Delete stray dots in some banner comments.

3 years agosymm/*-def.h: Fix repeated garbled commentary by adding the missing word.
Mark Wooding [Fri, 12 Mar 2021 22:26:07 +0000 (22:26 +0000)]
symm/*-def.h: Fix repeated garbled commentary by adding the missing word.

This is a bit embarrassing.  I should have read this text more carefully
before copying it everywhere.

3 years agosymm/eax-def.h: Fix bungled `\' alignment.
Mark Wooding [Fri, 12 Mar 2021 22:25:41 +0000 (22:25 +0000)]
symm/eax-def.h: Fix bungled `\' alignment.

3 years agomath/mp-nthrt.c: Add commentary for `mp_perfect_power_p'.
Mark Wooding [Wed, 14 Oct 2020 02:03:41 +0000 (03:03 +0100)]
math/mp-nthrt.c: Add commentary for `mp_perfect_power_p'.

This is quite simple, really, but it doesn't hurt to explain what's
going on.

3 years agomath/mp-nthrt.c: Fix garbled commentary.
Mark Wooding [Wed, 14 Oct 2020 02:03:20 +0000 (03:03 +0100)]
math/mp-nthrt.c: Fix garbled commentary.

3 years agomath/mp-nthrt.c: Delete redundant check for termination.
Mark Wooding [Wed, 14 Oct 2020 02:02:21 +0000 (03:02 +0100)]
math/mp-nthrt.c: Delete redundant check for termination.

This case is already handled above.

3 years agobase/asm-common.h, ...: Add missing `cmov' instruction (and `.CC' variants).
Mark Wooding [Tue, 13 Oct 2020 23:15:17 +0000 (00:15 +0100)]
base/asm-common.h, ...: Add missing `cmov' instruction (and `.CC' variants).

This instruction conditionally moves a value from one register to
another, otherwise leaving the destination unchanged:

cmov RD, RN, CC == csel RD, RN, RD, CC

Also define `cmov.CC' for all condition codes CC.

Use this to slightly improve `rijndael_setup_arm64_crypto'.
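
In C terms, the equivalence amounts to the following.  This is a model
of the semantics only, with the condition code reduced to a plain
truth value; the actual definitions are assembler macros.

```c
#include <stdint.h>

/* csel RD, RN, RM, CC: RD = RN if CC holds, otherwise RM. */
static uint64_t csel(uint64_t rn, uint64_t rm, int cc)
  { return (cc ? rn : rm); }

/* cmov RD, RN, CC == csel RD, RN, RD, CC: RD = RN if CC holds,
 * otherwise RD keeps its old value.
 */
static uint64_t cmov(uint64_t rd, uint64_t rn, int cc)
  { return (csel(rn, rd, cc)); }
```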

3 years agobase/asm-common.h: Improve conditional instruction notation.
Mark Wooding [Tue, 13 Oct 2020 23:14:49 +0000 (00:14 +0100)]
base/asm-common.h: Improve conditional instruction notation.

ARM64 conditional instructions -- `ccmp', `csel', etc. -- are
inexplicably notated differently from conditional branches.  The latter
are rather pleasantly written as `b.CC TARGET', while the former are,
disappointingly, `csel RD, RN, RM, CC' and similar, with the condition
tacked on the end.

Fix this by introducing aliases `csel.CC' and suchlike for all of the
conditional instructions.

3 years agobase/regdump.c: Print matching condition codes along with CPU flags.
Mark Wooding [Sun, 11 Oct 2020 23:20:30 +0000 (00:20 +0100)]
base/regdump.c: Print matching condition codes along with CPU flags.

3 years agobase/regdump-arm64.S, base/regdump.h: Save NZCV and x8--x15 early.
Mark Wooding [Sun, 11 Oct 2020 23:18:15 +0000 (00:18 +0100)]
base/regdump-arm64.S, base/regdump.h: Save NZCV and x8--x15 early.

Alas, the processor flags /and/ at least x14 and x15 are clobbered by
the PLT on-demand linkage machinery, so we must save them in the macro
before calling out to the library.  To be safe, preserve all of the
non-argument call-clobbered registers.

3 years agobase/regdump-arm.S, base/regdump.h: Save CPSR before `regdump_gpsave'.
Mark Wooding [Sun, 11 Oct 2020 23:03:58 +0000 (00:03 +0100)]
base/regdump-arm.S, base/regdump.h: Save CPSR before `regdump_gpsave'.

Alas, the processor flags are clobbered by the PLT on-demand linkage
machinery, so we must save them in the macro before calling out to the
library.

3 years agom4/mdw-uint-bits.m4: Delete stray `dnl' from the comment header.
Mark Wooding [Sun, 9 Aug 2020 04:36:34 +0000 (05:36 +0100)]
m4/mdw-uint-bits.m4: Delete stray `dnl' from the comment header.

3 years agodebian/catacomb2: Add missing symbol version entries.
Mark Wooding [Fri, 28 Aug 2020 23:45:16 +0000 (00:45 +0100)]
debian/catacomb2: Add missing symbol version entries.

3 years agodebian/changelog: Prepare for next minor version.
Mark Wooding [Fri, 28 Aug 2020 23:44:01 +0000 (00:44 +0100)]
debian/changelog: Prepare for next minor version.

3 years agomath/mp-nthrt.c: Implement nth-root, and perfect-power detection.
Mark Wooding [Wed, 22 Jul 2020 22:44:38 +0000 (23:44 +0100)]
math/mp-nthrt.c: Implement nth-root, and perfect-power detection.

3 years agosymm/square-mktab.c, etc.: Provide enough round constants for short keys.
Mark Wooding [Sun, 19 Jul 2020 22:10:05 +0000 (23:10 +0100)]
symm/square-mktab.c, etc.: Provide enough round constants for short keys.

It turns out that one needs 35 round constants to correctly schedule a
32-bit key, not just 32.  It further turns out that Clang orders the
various constant tables differently from GCC, which leads to the two
implementations producing different, but both incorrect, answers.

This is all very embarrassing.  Fortunately, nobody will use a 32-bit
key and expect anything useful to come of it, and no larger key size is
affected.  I think the main effect is that a bunch of the mode test
vectors needed changing.
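
The figure of 35 is what one would expect under two assumptions which
the message doesn't state: that the schedule must fill nine round keys
of four 32-bit words each (36 words), and that it works Rijndael-style,
with each step producing as many new words as the key has and consuming
one round constant.  The names below are hypothetical.

```c
#include <assert.h>

/* Assumed: 9 round keys of 4 words each. */
#define SCHEDULE_WORDS 36u

/* Number of round constants needed to expand a key of nk words, given
 * that the key itself supplies the first nk words and each later step
 * supplies nk more: ceil((36 - nk)/nk).
 */
static unsigned round_constants_needed(unsigned nk)
{
  assert(0 < nk && nk <= SCHEDULE_WORDS);
  return ((SCHEDULE_WORDS + nk - 1)/nk - 1);
}
```

Under these assumptions a one-word key needs 35 constants, while a
two-word key needs only 17, which would explain why no larger key size
was affected.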

3 years agomath/mpx-mul4-*.S: Use more portable type syntax for ambiguous instructions.
Mark Wooding [Sun, 19 Jul 2020 19:13:57 +0000 (20:13 +0100)]
math/mpx-mul4-*.S: Use more portable type syntax for ambiguous instructions.

Specifically, replace `adcd MEM, 0' by `adc dword ptr MEM, 0'.  This
removes one reason why Clang's defective assembler won't work, but there
are others.

This is not part of a concerted effort to improve Clang support.
Honestly, as far as I'm concerned, `CCASFLAGS=-fno-integrated-as
-Wno-unicode' is sufficient support for building Catacomb using
Clang.  (That said, I don't actively object to supporting Clang: it's
just not something I want to put much effort into.  I'm happy to accept
tasteful patches which improve Clang support.)  But in retrospect, using
`adcd' here was kind of bletcherous in its own right, and it should be
fixed.

3 years agoRelease 2.6.2. 2.6.2
Mark Wooding [Sat, 13 Jun 2020 17:12:06 +0000 (18:12 +0100)]
Release 2.6.2.

3 years agobase/dispatch.c: Fix feature probe for AESNI.
Mark Wooding [Sat, 13 Jun 2020 16:57:48 +0000 (17:57 +0100)]
base/dispatch.c: Fix feature probe for AESNI.

Oh, this is embarrassing.  2.6.0 and 2.6.1 are broken on pre-AESNI
hardware.

3 years agoprogs/mkphrase.c: Fix trailing spaces in usage message.
Mark Wooding [Tue, 26 May 2020 21:12:12 +0000 (22:12 +0100)]
progs/mkphrase.c: Fix trailing spaces in usage message.

3 years agoRelease 2.6.1. 2.6.1
Mark Wooding [Mon, 25 May 2020 16:45:24 +0000 (17:45 +0100)]
Release 2.6.1.

3 years agorand/rand-x86ish.S: Establish GOT pointer before making an i386 PLT call.
Mark Wooding [Mon, 25 May 2020 16:36:13 +0000 (17:36 +0100)]
rand/rand-x86ish.S: Establish GOT pointer before making an i386 PLT call.

Otherwise you just get a segfault.

3 years agoRelease 2.6.0. 2.6.0
Mark Wooding [Sat, 9 May 2020 16:39:28 +0000 (17:39 +0100)]
Release 2.6.0.

3 years agobase/dispatch.c, rand/rand.c, and asm: Support `rdseed' for quick noise.
Mark Wooding [Mon, 6 Apr 2020 00:07:41 +0000 (00:07 +0000)]
base/dispatch.c, rand/rand.c, and asm: Support `rdseed' for quick noise.

Prefer the `rdseed' instruction over `rdrand' for quick randomness, if
it's available.

3 years agorand/rand-x86ish.S: Hoist argument register allocation outside.
Mark Wooding [Mon, 6 Apr 2020 00:06:27 +0000 (00:06 +0000)]
rand/rand-x86ish.S: Hoist argument register allocation outside.

This will soon be shared with another entry point for `rdseed'.

3 years agorand/rand-x86ish.S: Add missing `undef' of the `COUNT' register.
Mark Wooding [Mon, 6 Apr 2020 00:04:57 +0000 (00:04 +0000)]
rand/rand-x86ish.S: Add missing `undef' of the `COUNT' register.

3 years agobase/dispatch.c, base/dispatch-x86ish.S: Add opcode to `rdrand_works_p'.
Mark Wooding [Mon, 6 Apr 2020 00:02:41 +0000 (00:02 +0000)]
base/dispatch.c, base/dispatch-x86ish.S: Add opcode to `rdrand_works_p'.

I want to add support for the `rdseed' instruction, but this might be
broken on AMD64 like `rdrand'.  Rather than duplicate this logic, add an
opcode argument to the checking functions.

3 years agobase/dispatch.c: Make `cpuid_feature_p' more easily extensible.
Mark Wooding [Sun, 5 Apr 2020 23:52:56 +0000 (23:52 +0000)]
base/dispatch.c: Make `cpuid_feature_p' more easily extensible.

It turns out that Intel scatter feature flags throughout the various
CPUID leaves.  Change the interface for checking these flags so that we
can cover more ground without too much extra work.

  * Firstly, rename the function to `cpuid_feature_p' because it's only
    really useful for checking one feature at a time.

  * Secondly, make the first argument be a code indicating which
    particular `cpuid' output we're interested in; the second is still a
    mask used to check for the bit we're interested in.

Obviously this involves changing all of the callers too.
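
A portable model of this kind of interface might look like the
following.  Everything here is an assumption made for illustration: the
selector names, the structure, and the explicit table of outputs, which
stands in for actually executing `cpuid' so that the sketch builds on
any machine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical selector codes: which `cpuid' output word to examine. */
enum {
  CPUID_1_D,			/* leaf 1, edx */
  CPUID_1_C,			/* leaf 1, ecx */
  CPUID_7_0_B,			/* leaf 7, subleaf 0, ebx */
  CPUID_NSEL
};

/* Stand-in for the processor: one word per selector code. */
struct cpuid_outputs { uint32_t word[CPUID_NSEL]; };

/* Check one feature: is the bit selected by `mask' set in the output
 * word selected by `sel'?
 */
static int feature_p_model(const struct cpuid_outputs *out,
			   unsigned sel, uint32_t mask)
{
  assert(sel < CPUID_NSEL);
  return ((out->word[sel] & mask) != 0);
}
```

Covering a flag in another leaf then means adding one selector code,
rather than writing another special-purpose function.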

3 years agorand/dsarand.c: Return the old number of passes from `DSARAND_PASSES'.
Mark Wooding [Sat, 16 Nov 2019 17:12:16 +0000 (17:12 +0000)]
rand/dsarand.c: Return the old number of passes from `DSARAND_PASSES'.

Also, don't update if the input operand is zero.

3 years ago*.c: Check for ARM64 SIMD before using the accelerated code.
Mark Wooding [Fri, 15 Nov 2019 17:09:01 +0000 (17:09 +0000)]
*.c: Check for ARM64 SIMD before using the accelerated code.

I don't expect ARM64 processors to omit the SIMD instructions, but it's
convenient to have a way to inhibit the accelerated code (e.g., for
performance measurement).

3 years agobase/dispatch.c: Reformat an ugly line-break.
Mark Wooding [Fri, 15 Nov 2019 17:08:30 +0000 (17:08 +0000)]
base/dispatch.c: Reformat an ugly line-break.

3 years agomath/mpx-mul4-{arm-neon,arm64-simd}.S, etc.: Add ARM versions of `mul4'.
Mark Wooding [Mon, 4 Nov 2019 12:22:00 +0000 (12:22 +0000)]
math/mpx-mul4-{arm-neon,arm64-simd}.S, etc.: Add ARM versions of `mul4'.

With this, I think we (finally) have parity across the various premier
target platforms.

3 years agobase/regdump.[ch]: Add a feature for printing plain messages.
Mark Wooding [Thu, 7 Nov 2019 01:34:06 +0000 (01:34 +0000)]
base/regdump.[ch]: Add a feature for printing plain messages.

Introduce a `REGSRC_NONE' which just prints the message, and add a `msg'
macro which invokes this.

3 years agomath/mpmont.c: Fix comment title for `mulcore'.
Mark Wooding [Thu, 7 Nov 2019 01:41:55 +0000 (01:41 +0000)]
math/mpmont.c: Fix comment title for `mulcore'.

3 years agomath/mpx-mul4-*.S: Output expanded Montgomery factor in a sensible order.
Mark Wooding [Tue, 5 Nov 2019 11:13:03 +0000 (11:13 +0000)]
math/mpx-mul4-*.S: Output expanded Montgomery factor in a sensible order.

The current order is (y'_0, y'_1; y''_0, y''_1), (y'_2, y'_3; y''_2,
y''_3), but while this makes sense in the context of SSE2, it's not
really very satisfactory as a common currency.  (In particular, if we
want to resolve the expanded factor into a value then we'll have to do
it by steam because the limb placements are irregular.)

Instead, fix the ordering in the test stubs so that the pieces come out
as (y'_0, y''_0; y'_1, y''_1), (y'_2, y''_2; y'_3, y''_3), which is
generally much better to work with outside of SSE2.

Of course, this only affects testing, not the actual code, so
performance is unchanged.
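
The change of ordering amounts to swapping the middle two elements of
each group of four.  The sketch below shows that on a flat array; the
function name and the element type are assumptions, and the real change
was made in the assembler test stubs.

```c
#include <stdint.h>

/* Convert from the SSE2-flavoured order
 *	(y'_0, y'_1; y''_0, y''_1), (y'_2, y'_3; y''_2, y''_3)
 * to the interleaved order
 *	(y'_0, y''_0; y'_1, y''_1), (y'_2, y''_2; y'_3, y''_3).
 */
static void reorder_expanded(uint32_t out[8], const uint32_t in[8])
{
  unsigned i;

  for (i = 0; i < 8; i += 4) {
    out[i + 0] = in[i + 0]; out[i + 1] = in[i + 2];
    out[i + 2] = in[i + 1]; out[i + 3] = in[i + 3];
  }
}
```

In the new order, the two pieces belonging to each limb sit next to each
other, so the placement is regular.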

3 years agomath/mpx-mul4-amd64-sse2.S: Improve the end-of-loop condition testing.
Mark Wooding [Thu, 7 Nov 2019 01:54:57 +0000 (01:54 +0000)]
math/mpx-mul4-amd64-sse2.S: Improve the end-of-loop condition testing.

Previously, I waited until `rdi' was set up for the next iteration
before comparing it against the limit.  But in fact, `DV' already has
the right value, so we can compare earlier.

3 years agomath/mpx-mul4-amd64-sse2.S: Save a spill by better register allocation.
Mark Wooding [Thu, 7 Nov 2019 01:51:37 +0000 (01:51 +0000)]
math/mpx-mul4-amd64-sse2.S: Save a spill by better register allocation.

The Windows code doesn't need to spill r12, because we don't need the
`mi' register after we've loaded and expanded the Montgomery factor.
This doesn't save any stack space because we need 16-byte alignment, but
it does avoid saving and restoring the register.

3 years agomath/mpx-mul4-*-sse2.S (mpxmont_redc4): Fix end-of-outer-loop commentary.
Mark Wooding [Thu, 7 Nov 2019 01:46:50 +0000 (01:46 +0000)]
math/mpx-mul4-*-sse2.S (mpxmont_redc4): Fix end-of-outer-loop commentary.

  * The carry loop is wrong if the destination is an exact multiple of
    four limbs.  Fortunately, it isn't.

  * The initial pass feeds into the main loop unconditionally, unlike
    `mpxmont_mul4_...' (from which I think the commentary was
    uncritically copied), so being at the end of it doesn't tell you
    anything about whether to start another.  And, indeed, we do check
    the loop-end condition.

3 years agomath/mpx-mul4-*-sse2.S: Remove an unhelpful comment.
Mark Wooding [Thu, 7 Nov 2019 01:43:46 +0000 (01:43 +0000)]
math/mpx-mul4-*-sse2.S: Remove an unhelpful comment.

It's not actually wrong, but it's misleading because we don't actually
care that the flags are preserved at this point, because the next
instruction clobbers them anyway.  I think this was cut-and-paste
lossage from the earlier code which relies on `mov' preserving the carry
flag.

3 years agomath/mpx-mul4-*.S: Fix up some of the commentary.
Mark Wooding [Mon, 4 Nov 2019 12:01:42 +0000 (12:01 +0000)]
math/mpx-mul4-*.S: Fix up some of the commentary.

  * Fix bogus formatting.

  * Fill in the `...' in the AMD64 version.

  * Explain the common notation and register allocation conventions.

3 years agobase/asm-common.h: Decorate pseudoregister `nil' as `nil'.
Mark Wooding [Mon, 4 Nov 2019 12:20:16 +0000 (12:20 +0000)]
base/asm-common.h: Decorate pseudoregister `nil' as `nil'.

This allows `nil' to be passed through macros which want to apply
decoration transforms to their register arguments through to other
macros which treat `nil' as a special marker that a register is absent
or otherwise not to be used.

3 years agomath/t/mpx-mul4: Fix comment markers.
Mark Wooding [Mon, 4 Nov 2019 12:19:33 +0000 (12:19 +0000)]
math/t/mpx-mul4: Fix comment markers.

3 years agomath/: Delete some unnecessary blank lines.
Mark Wooding [Thu, 7 Nov 2019 01:41:26 +0000 (01:41 +0000)]
math/: Delete some unnecessary blank lines.