chiark - git - ianmdlvl - secnet.git/log

serpent: Provide little-endian version too, but ours is big

Apparently, almost everyone else is using little-endian Serpent.

Since we are going to introduce a new transform, we can make our new
transform have conventional little-endian Serpent.  But we need to
retain the big-endian one for compatibility.

In this patch:
* Make serpent.h compile-time bytesexual by providing a new definition
   of GETPUT_CP.  We default to (conventional) little-endian.
* The big-endian and little-endian versions have different names:
   decorate the function names.  serpent.h declares both sets.
* Provide serpentbe.c which compiles to a .o implementing the big-endian
   version under the new names.  Build only that one.
* Change all the call sites to use the big-endian Serpent.
* Mention this issue in the documentation.

Ultimately, no functional change in this patch.

Regarding the endianness of Serpent, Mark Wooding writes:

  I've understood how my Serpent implementation differs from Secnet's, and
  have reproduced [Ian's EAX] test vectors.

  NIST have managed to completely screw up their archive of the AES
  contest pages, but the WayBack Machine works fine -- even on the various
  PDF documents.

  The sorry tale begins with the initial Request for Candidate Algorithm
  Nominations, wherein the specification[1] for the test files unhelpfully
  muddles things by implying a big-endian representation in the test
  vector files, e.g., in 3.1,

  : Each of the possible key basis vectors is tested in this manner, by
  : shifting the "1" a single position at a time, starting at the most
  : significant (left-most) bit position of the key.

  It doesn't help that the original Serpent C implementations in the
  submission package[2] failed to implement the defined API correctly, and
  implicitly coerced BYTE pointer (from the defined entry point
  `blockEncrypt' to an `unsigned long' pointer as an argument to the
  internal `serpent_encrypt' function.  The Java version carefully picks
  words out of its input byte array in a little-endian order.

  Secnet's implementation is derived from this original reference
  implementation.  Originally, (v0.03) it had the same bug (only with
  `uint8_t' and `uint32_t'), but was patched (v0.1.16) to correct the
  obvious dependency on host endianness.  Unfortunately, this patch is
  wrong: it creates a perfectly portable completely-byte-reversed Serpent
  compatible with nothing else.  I've not found many other
  implementations, but where I have, they agree with me and the Java
  reference, and not with Secnet.

  [1] http://web.archive.org/web/20000420040415/http://csrc.nist.gov/encryption/aes/katmct/katmct.htm
  [2] http://www.cl.cam.ac.uk/~rja14/Papers/serpent.tar.gz

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

serpent, transform: rework GET_32BIT_MSB_FIRST, PUT_...

The macros GET_32BIT_MSB_FIRST and PUT_32BIT_MSB_FIRST duplicate
functionality available in unaligned.h, namely get_uint32 and
put_uint32.  Replace all uses outside serpent.c with calls to
get_uint32 and put_uint32.

We are going to make our serpent implementation compile-time
bytesexual, so we do retain a separate implementation of this
functionality in a form which will make that easier.  In particular,
they take arguments for the base and length of the array in
which 32-bit we are making the 32-bit access.

Also, this disentangles serpent.c from secnet.h.  To make serpent.c
easier to reuse, remove the #include of secnet.h (and replace it with
stdint.h, which we do need and were getting via secnet.h).

No functional change.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

serpent: const-correct

Decorate serpent_encrypt, serpent_decrypt and serpent_makekey with
appropriate consts in their arguments.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

transform: Do not look at any bytes of PKCS#5 padding other than the last

This might avoid some timing-related information leaks. In principle
this is a protocol change: we now no longer use actual PKCS#5 padding;
instead, we use a padding scheme where all but the last byte of the
padding may be sent as anything and are ignored by the receiver.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

memcmp: Introduce and use consttime_memeq

We need to use a constant-time memcmp in MAC checking, to avoid
leaking (to an adversary) how much of the MAC is right.  (This would
be especially dangerous if our MAC was outside the encryption, which
thankfully it isn't.)

The use of "volatile" on the accumulator prevents the compiler from
optimising away any of the updates to the accumulator, each of which
depends on all the bits of the two bytes being compared.  So that
stops the compiler shortcutting the computation.  This also prevents
the compiler spotting our boolean canonicalisation, and forces it to
do it the way we say.

This is true according to the spec by virtue of C99 6.7.3(6).

In an attempt to get the compiler to eliminate the pointless repeated
loading and storing of the single-byte accumulator value, I have
specified it as "register volatile".  There is no rule against
"register volatile", but my compiler ignores the "register".

To double check that all is well, here is an annotated disassembly,
from this command line:

  gcc -save-temps -DHAVE_CONFIG_H -I. -I. -Wall -Wwrite-strings -g -O2 \
      -Werror -W -Wno-unused -Wno-pointer-sign -Wstrict-prototypes \
      -Wmissing-prototypes -Wmissing-declarations -Wnested-externs \
      -Wredundant-decls -Wpointer-arith -Wformat=2 -Winit-self \
      -Wswitch-enum -Wunused-variable -Wbad-function-cast \
      -Wno-strict-aliasing -fno-strict-aliasing -c util.c -o util.o

This is the relevant part of util.s, as generated by gcc 4.4.5-8
i486-linux-gnu, with my annotations:

  .globl consttime_memeq
  .type   consttime_memeq, @function
  consttime_memeq:
  .LFB80:
  .loc 1 367 0
  .cfi_startproc
  .LVL8:
  pushl   %ebp
  .LCFI8:
  .cfi_def_cfa_offset 8
  movl    %esp, %ebp
  .cfi_offset 5, -8
  .LCFI9:
  .cfi_def_cfa_register 5
  pushl   %edi
  pushl   %esi
  pushl   %ebx
  subl    $16, %esp
  .loc 1 367 0
  movl    16(%ebp), %ebx ebx : n
  .cfi_offset 3, -20
  .cfi_offset 6, -16
  .cfi_offset 7, -12
  movl    8(%ebp), %esi esi : s1in
  movl    12(%ebp), %edi edi : s2in
  .loc 1 369 0
  movb    $0, -13(%ebp) -13(ebp) : accumulator
  .LVL9:
  .loc 1 371 0
  testl   %ebx, %ebx if (!n)
  je      .L15     goto no_bytes;
      i.e. if (n) { ...loop... }
  .LVL10:

The compiler has chosen to invent an offset variable for controlling
the loop, rather than pointer arithmetic.  We'll call that variable
i.  It ranges from 0..n-1 as expected.  The compiler doesn't
explictly compute the per-iteration pointers s1 and s2, instead using
an addressing mode.

  xorl    %edx, %edx edx : i
  .p2align 4,,7
  .p2align 3
  .L16: more_bytes: /* loop */
  .loc 1 372 0
  movzbl  (%edi,%edx), %eax eax = *s2;
  .LVL11:
  movzbl  -13(%ebp), %ecx ecx = accumulator;
  xorb    (%esi,%edx), %al al = *s1 ^ *s2;
  addl    $1, %edx      i++;
  orl     %ecx, %eax eax = accumulator | (*s1^*s2)
  .LVL12:
  .loc 1 371 0
  cmpl    %edx, %ebx [ if (i==n) ... ]
  .loc 1 372 0
  movb    %al, -13(%ebp) accumulator = eax;
           i.e., overall,
  accumulator |= *s1 ^ *s2
  .loc 1 371 0
  jne     .L16 ... [if (i!=n)]
          goto more_bytes;
  .LVL13:
  .L15: no_bytes:
  .loc 1 374 0 /* end of loop and if */

              /* now doing the shift-by-4: */
  movzbl  -13(%ebp), %eax eax = accumulator;
  movzbl  -13(%ebp), %edx edx = accumulator;
  shrb    $4, %al     al >>= 4;
  .LVL14:
  orl     %edx, %eax eax |= edx;
  .LVL15:
  movb    %al, -13(%ebp) accumulator = eax;
           i.e., overall,
  accumulator |= accumulator >> 4;
  .loc 1 375 0
  movzbl  -13(%ebp), %eax /* same again but for shift-by-2 */
  movzbl  -13(%ebp), %edx
  shrb    $2, %al
  orl     %edx, %eax
  .LVL16:
  movb    %al, -13(%ebp)
  .loc 1 376 0
  movzbl  -13(%ebp), %eax /* same again but for shift-by-1 */
  movzbl  -13(%ebp), %edx
  shrb    %al
  orl     %edx, %eax
  .LVL17:
  movb    %al, -13(%ebp) /* now computed final accumulator */
  .loc 1 377 0
  movzbl  -13(%ebp), %eax eax = accumulator;
  andl    $1, %eax    eax &= 1;
  .LVL18:
  movb    %al, -13(%ebp) accumulator = eax;
  .loc 1 378 0
  movzbl  -13(%ebp), %eax eax = accumulator;
  xorl    $1, %eax    eax ^= 1;
  .LVL19:
  movb    %al, -13(%ebp) accumulator = eax;
  .loc 1 379 0
  movzbl  -13(%ebp), %eax eax = accumulator;
  .LVL20:
  .loc 1 380 0
  addl    $16, %esp function epilogue
  popl    %ebx     ...
  .LVL21:
  popl    %esi     ...
  .LVL22:
  popl    %edi     ...
  .LVL23:
  .loc 1 379 0
  movzbl  %al, %eax sign extend?
  .loc 1 380 0
  popl    %ebp     ...
  ret    return eax;
   i.e. return accumulator;
  .cfi_endproc
  .LFE80:
  .size   consttime_memeq, .-consttime_memeq

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

util, buffers: Preparatory improvements

We invent a new kind of buffer_if which is a readonly view of another
block of memory; such a view can be initialised with
buffer_view_readonly or buffer_view_clone. This makes a convenient
function to allow reparsing a packet, or using the buffer machinery to
parse a particular existing block of memory.

Also, make buffer_assert_free and buffer_assert_used actually call
assert (as well as logging the buffer ownership). So when these fail
(a) we don't attempt the clean teardown, and (b) we get a core dump if
those are enabled.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

unaligned.h: rationalise; provide buf_append_uint8 et al

Replace the formulaic macros
  buf_{,un}{append,prepend}_{uint32,uint16}
with some macro-generated inline functions.  These have better
typechecking, and are also better because it's not possible to
accidentally mess up one of the definitions by failing to permute it
in the right way.

Add the functions for uint8 too.  This involves providing put_uint8
and get_uint8 macros, which we do along the lines of the existing put_
and get_ macros.

Use the new uint8 function in the one place where it's currently useful.
More call sites will appear shortly.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

rsa.c: Check public key length.

The private key is checked quite carefully -- even to a fault -- for
being sensibly sized, but the corresponding function for public keys
appears to have no checking at all. This is a shame since message-
representative construction assumes that the message representative will
fit in a fixed-size buffer.

Fix this situation by checking public key sizes in `rsapub_apply'.

Signed-off-by: Mark Wooding <mdw@distorted.org.uk>
Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

rsa.c: Replace the magic length 1024 with a (larger) constant.

While 15360-bit RSA keys are rather large, they're not completely beyond
the realms of possibility and it seems unreasonable to forbid
them. (Specifically, 15360 is the length recommended by NIST for
256-bit security levels.)

Signed-off-by: Mark Wooding <mdw@distorted.org.uk>

rsa.c: Factor out constructing the EMSA-PKCS1 message representative.

This was done in two different places for no reason I could understand.
Replace them both with a single implementation.

Signed-off-by: Mark Wooding <mdw@distorted.org.uk>

rsa.c: Fix incorrect commentary.

The Euler function phi(n) is defined to be

phi(n) = #{ 1 < i < n | gcd(i, n) = 1 }

the number of natural numbers less than n and prime to it; equivalently,
it's the size of the multiplicative group (Z/nZ)^*.

If n = p q is the product of two primes then phi(n) = (p - 1)(q - 1).
But phi(n) is not (if n is composite) the exponent of (Z/nZ)^*. It's
certainly true that

a^{phi(n)} = 1

for all a in (Z/nZ)^*; but the exponent of a group G is the /smallest/
positive integer e such that

a^e == 1

for all a in G. This quantity is denoted lambda(n); in our simple case
where n = p q is the product of two primes it's true that

lambda(n) = lcm(p - 1, q - 1)

Since p and q are large primes, both p - 1 and q - 1 are even, so
lambda(n) is at least a factor of 2 smaller than phi(n).

In fact, lambda(2) = 1, lambda(2^f) = 2^{f-2} for f >= 1, and
lambda(p^f) = p^{f-1} (p - 1) for prime p > 2; and, in general, if n =
p_1^{f_1} ... p_m^{f_m} is the prime factorization of n then

lambda(n) = lcm(lambda(p_1^{f_1}), ... lambda(p_m^{f_m}))

Signed-off-by: Mark Wooding <mdw@distorted.org.uk>

slip: tolerate SLIP decoding errors.

slip_unstuff previously responded to bad SLIP encoding or an overlong
SLIP packet by terminating secnet completely with a fatal error or
assertion failure. It turns out that SLIP encoding corruption can
occur through external action (for example, if you attach gdb to
secnet then things get a bit confused when gdb suspends it) and so
this should be a handled error condition rather than a terminal panic.

Therefore, introduce a system whereby SLIP decoding errors result in a
warning log message followed by ignoring everything in the SLIP stream
up to the next (unescaped) END byte, whereupon we resynchronise and
start decoding again.

Signed-off-by: Simon Tatham <anakin@pobox.com>

tun: add hard routes even if they are currently down.

If a netlink is configured with OPT_SOFTROUTE, we expect to be adding
and removing the kernel routing table entry as the remote site appears
and disappears, so tun_set_route should bring the route up or down
depending on route->up. However, if it isn't, then we won't be doing
that during PHASE_RUN, and therefore we should bring the route up once
and for all at startup.

Previously the state was as follows:

secnet's 'tun' netlink will add and remove kernel routing table
entries during PHASE_RUN if the OPT_SOFTROUTE option is set.
However, at startup during PHASE_GETRESOURCES it will set up routes
for a site only if that site lists an address. So if you have a site
in your sites file with no address but also haven't enabled
OPT_SOFTROUTE (e.g. because you run secnet so that it drops privs),
then no route for that site will _ever_ be set up.

These two policies don't match. We should bring a site's route(s) up
at startup in any situation where we will not be prepared to do so
dynamically during run time. In other words, routes should be added
at startup time not only if they have a fixed address parameter, but
also if they do not have OPT_SOFTROUTE set.

(See also 04f92904ea6c41517ff7154910c16ef4c3bc646b, which seems to be
fixing the analogous bug for userv-ipif. [
bdd4351ff2fc6dc8b1dad689f751ac46347636cf which seems to be the
relevant later commit is in fact empty -iwj. ]

This commit implements this fix, and causes my home secnet
implementation to be able to route to my laptop successfully
(sgo/resolution/resolution). In order to do this I've had to move the
definition of OPT_SOFTROUTE out of netlink.c into netlink.h so that
tun_set_route can see it, which suggests a possible layering
violation, but on the other hand since the netlink_client structure is
visible outside netlink.c it seems only reasonable that the bit flags
used in its 'options' field should be visible too.

Signed-off-by: Simon Tatham <anakin@pobox.com>

build system: debian packaging improvements

changelog: describe version 0.3.0

site: When if our MSG5s (or peer's MSG6s) get lost, preserve the key

When we time out in state SENTMSG5, keep the key we negotiated.
SENTMSG5 gives the peer permission to start sending packets with it so
we need to be able to decrypt them. If we see such packets, we switch
to using the new key at that point and throw the old key away.

This is the final fix to the "connectivity loss during final key
setup can cause locked-up session" bug.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

site: Keep old keys, and allow them to be used by peer

After we have switched to a new key, keep the old key around until we
see packets using the new key.

This is part of the fix to the "connectivity loss during final key
setup" bug. Fixing this requires that both ends be willing to keep
both old and new data keys available until the peer has sent data with
the new key (which might never happen).

If there were also a key setup, the site would then need three keys:
the old data key, the current data key which the peer hasn't started
using yet, and the fresh key it is trying to negotiate. So we have to
a third key, with its own lifetime expiry.

In the code we call the old key the "auxiliary" key because we're
going to use it for an additional fixup in the next patch.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

site: Generalise deletion and timeout of keys

Introduce delete_one_key, and rename delete_key to delete_keys. We
distinguish, now, between deleting a single key, and deleting all the
keys for this site.

The expiry check calls delete_one_key rather than delete_keys, and is
likewise done with a helper function.

No functional change other than to the key expiry log message.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

site: Move current_transform, _key_timeout and remote_session_id into a struct

We are going to introduce another set of these for the previous key,
and putting them together in this struct means we will be able to talk
about themm easily.

This patch is very mechanical: we move the three variables into the
new struct, and simply search-and-replace the references.

No functional change.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

site: No longer track key validity ourselves

We used to have a variable st->current_valid, which we would always
set to True when assigning a key to current_transform, and to False
when calling delkey (or on initialisation).

Now that we have transform->valid(), this is no longer needed.

We do introduce a trivial helper function current_valid(st) to help us
call it, so the places where st->current_valid was checked don't get
too ugly.

There is no intentional change to the externally-visible behaviour.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

transform: add ->valid() function

This allows outside callers to find out whether the transform has been
keyed, and can therefore relieve them of the need to track this
separately.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

site: Deal with losing our MSG6 - retransmit MSG6 when we see MSG5 in RUN

Fix the lost MSG6 bug in the case where we are the responder:

If we are in RUN and we see a SENTMSG5 encrypted with what we now
regard as the data transfer key, conclude that the peer must have lost
the MSG6.  Regenerate and transmit a new MSG6.

This requires a generalised version of generate_msg6 which allows the
new call site to specify which transform and session id to use, and
which doesn't set st->retries.  So break the meat of generate_msg6 out
into a new function which I have chosen to call create_msg6.  (The
fact that functions called `generate_msg*' set st->retries is slightly
unfortunate.)

We also have to add a transform argument to process_msg5 so that we
can try decrypting MSG5s with the data transfer key.

It is still possible, even after this patch, to get the two ends out
of sync: if, just after the responder has switched to the new key and
entered RUN, the network fails entirely for the key setup timeout, the
initiator will abandon the key setup and switch to WAIT, and continue
to use the new key.  However, the responder has already discarded the
old key.

To fix this is not trivial: it will require us to introduce a third
key, with its own lifetime expiry.  This will be in a later patch.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

site: Deal with losing peer's MSG6 - go to RUN on MSG0 with new key

The original protocol, as implemented, has the following bug:

The responder sends MSG6 on its transition from SENTMSG4 to RUN, on
receipt of the initiator's MSG5.  Unlike all the other key protocol
messages, MSG6 is not retransmitted (for there is no retransmission in
RUN).  If the responder's MSG6 is lost, the initiator has no way of
knowing and won't switch to the new key.  The initator will keep
retransmitting MSG5, which the responder (in old versions of secnet)
ignores.  The responder will be sending data with the new key, which
the initiator (in old versions of secnet) ignores.  This broken
situation can persist until the initiator sends data, which it will do
with the old key, possibly causing a new key exchange attempt to be
initiated by the original responder (in old versions of secnet only),
which may or may not experience the same bug.

This bug needs to be fixed separately at both ends, ideally, so that
running a fixed version at either end avoids the bug.

Here we fix for the case where we are the initiator, as follows:

If are in SENTMSG5 and we see data encrypted with the new key,
conclude that we must have lost the MSG6 and simply switch to the new
key, and state RUN, right away.

Because the transform operates destructively (and changing that fact
about the transform API would be too intrusive), this means we need to
keep a copy of the original ciphertext in process_msg0.

For this purpose we introduce a new buffer `scratch' in the site state
structure, and a new buffer_copy utility function for making a copy of
a buffer (reallocating the destination buffer if necessary).

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

site, transform: Do not initiate rekey when packets too much out of order

If packets arrive sufficiently out of order, we may unnecessarily
initiate a rekey simply because of the skew.  Solve this as follows:
* transform->reverse gives a distinct error code for `too much skew'.
* site does not initiate a rekey when it sees that error code.

The rekey-on-decrypt-failure feature is in principle unnecessary.
However, due to another bug it is possible for a key setup to be
regarded as successful at only one of the two ends, and the ability to
immediately do another key setup to try to fix things is useful.

In principle this new behaviour might prevent a necessary rekey: if
the sender has sent around 2^31 packets none of which were received,
the receiver would see the new packets as too old.  But: firstly, this
is very unlikely (it would have involved transmitting several Gbytes
into the void, in a period less than the maximum key lifetime).
Secondly, we ought to expire keys before then anyway so the sender
should know that the key had `worn out' (although currently this is
not enforced).

The benefit of this change is that the old behaviour would be likely
to initiate unnecessary rekeys when on unreliable links (eg, mobile
internet).  Unnecessary rekeys are a bad idea in these circumstances
not just because they clog up the link but also because they make the
association more fragile.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

site: Remove pointless check from decrypt_msg0

It is not necessary to check whether we have a current key before
attempting to unpick and decrypt a MSG0. If we don't,
transform->reverse will simply fail due to lacking a key.

This check needs to be removed, because in forthcoming patches we will
want to be able to try to decrypt the packet with other keys, at which
point the lack of a currently valid data transfer key will not be
relevant and this check will be harmful.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

site: Break out separate function for decrypting msg0

The control flow here is going to become more complicated, and this
change will make the next patches, and the resulting code, clearer.

Note that process_msg0's return value is never used; it is only
defined to return bool_t so that it can use the CHECK_AVAIL macro.
Knowing this will make it easier to see that the patch is correct.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

netlink: abolish check_config and output_config

Apparently, a long time ago, MSG5 and MSG6 used to contain some
netlink configuration data, which the receiver of the MSG5 or MSG6
would check.

However, for a long time now the output_config function has been a
no-op and the check function has unconditionally eaten and discarded
anything extra in the message.

Furthermore, because the MSG6 is not retransmitted, this mechanism
couldn't be reliable without a protocol change. So the existing
interface is defective.

So, abolish it the interface, the dummy implementation, and all the
call sites. The check_config call sites in site.c now instead
directly discard any unexpected data at the end of MSG5 and MSG6.

This patch should cause no behavioural change in actual operation.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

netlink: report why a packet is bad

Replace
<netlink>: bad IP packet from <source>
with
<netlink>: bad IP packet from <source>: reason

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

site: transport peers: fix incorrect stride when debug output enabled

When there are multiple peer addresses, attempts to copy them from one
table of peers addresses to another with transport_peers_copy will go
wrong because the stride argument to transport_peers_debug is wrong -
we take sizeof() the pointer rather than of the array element.

This will typically cause a segfault if it happens, but the bug can
only be triggered if LOG_PEER_ADDRS debugging is enabled.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

messages: add some missing newlines

Message and cfgfatal must be called with a message containing a
newline (or, a newline sent in a later call). Fix a few call sites.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

log: Print truncated messages

If a message doesn't survive the vsnprintf in vMessage untruncated,
the result may be that its newline is lost.  After that, buff will
never manage to gain a newline and all further messages will be
discarded.

Fix this by always putting "\n\0" at the end of buff.  That way a
truncated message is printed and the buffer will be cleared out for
the next call.

This does mean that sometimes a message which is just slightly shorter
than the buffer will produce a spurious empty message sent to the
syslog.  However, the whole logging approach here will inevitably
occasionally insert spurious newlines, if only to split up long
messages; long messages may also be truncated.  The bad effect of the
change in this patch is simply to make the length at which
infelicities appear very slightly shorter.  The good effect is that it
turns a very nasty failure mode into a benign one.

So I think it is reasonable to err on the side of code which clearly
cannot go wrong in a bad way.  Other approaches which make the
infelicity threshold a couple of characters longer, or which slightly
reduce the different kinds of infelicity, are much more complicated
and therefore more likely to have bugs.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

log: Eliminate potential out-of-control recursion

vMessage looks for the system log instance.  If there is one, it uses
it; otherwise it just uses stderr or stdout.

The system log instance consists, eventually, of syslog_vlog or
logfile_vlog.  Both of these functions have a fallback: if they are
properly set up they do their thing; otherwise they call vMessage.
This is wrong because in this situation they have probably been called
by vMessage so it doesn't help.

At first sight this ought to produce unbounded recursion but for
complicated reasons another bug prevents this.  Instead, messages can
just vanish.

Break out the fallback mode of [v]Message into [v]MessageFallback so
that syslog_vlog and logfile_vlog can use it directly.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

Makefile: honour EXTRA_CFLAGS, etc.

This makes it easier to add command-line options at build-time.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

SECURITY: actually reject messages with improper lengths

transform_reverse checks to see if the incoming message is a multiple
of the block cipher block size.  If the message fails this check, it
sets *errmsg but it fails to return.  Instead, it proceeds to attempt
to decrypt it.  This will involve a buffer read/write overrun.

It transform_reverse fails to check to see if the incoming message is
long enough to contain the IV, MAC and padding.  This can easily be
exploited to cause secnet to segfault.

buffer_unprepend should check that the buffer has enough data in it to
unprepend.  With the other patches, this should be irrelevant, but it
turns various other potential read overrun bugs into crashes.

site_incoming should check that the incoming message is long enough.
Again, without this, this is an buffer read overrun or possibly a
segfault.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

userv-ipif: Always request routes from userv, regardless of link quality

Previously the userv-ipif netlink would not request (from userv) a
route for a site for which the link quality was DOWN. The link
quality is a dynamic quantity but userv-ipif lacks any machinery for
dynamically adding routes, so this is wrong.

Instead, in userv-ipif, unconditionally add routes for all sites,
regardless of link up status.

In practice this code is run during startup and the only reason a link
might be down at that point, ie LINK_QUALITY_DOWN, is that it does not
have an address configured. Mobile sites are often in this situation.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

build system: Update build-dependency information

Update build-dependency documentation and add missing references to
flex and bison to debian/control.

Signed-off-by: Matthew Vernon <matthewv@chiark.greenend.org.uk>
Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

make-secnet-sites: Do not permit "include" in simple sites files

Restrict the "include" directive to the "header" of -u (groupfile
update) mode. Callers who are simply using make-secnet-sites to
transform a (possibly untrusted) sites file into a (to be trusted)
sites.conf file should not have to worry about includes.

"include" directives are already forbidden in group files.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

make-secnet-sites: In -u mode, output file "dereferences" include directives

Whenm make-secnet-sites is writing the "output" sites file in -u
(groupfile update) mode, it includes the effective contents of files
referenced in "include" directives, rather than the "include"
directive itself.

So the "output" sites file does not any longer depend on any files
included by the header.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

make-secnet-sites: Do newline-trimming in pline()

This will make life a little easier in a future patch. While we're at
it, use rstrip rather than open-coding it.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

make-secnet-sites: Actually include addresses in sites.conf

This was broken in

   commit c9d84e31f48a04eb0af525324707f22c738cc8b7
   Author: Ian Jackson <ijackson@chiark.greenend.org.uk>
   Date:   Sat Dec 17 21:21:47 2011 +0000

       make-secnet-sites: Allow sites with no address

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

make-secnet-sites: If definition found in wrong place, bomb out

If it doesn't exit here, make-secnet-sites might subsequently crash
with Python stack backtrace.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

make-secnet-sites: New -P <prefix> option

make-secnet-sites generates a config file which defines the keys
"vpn-data", "vpn", and "all-sites".

To make it possible to usefully include the output of more than one
different piece of output from make-secnet-sites, support the option
-P <prefix>, which generates a config file which defines
"<prefix>vpn-data", "<prefix>vpn", and "<prefix>all-sites".

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

make-secnet-sites: Allow sites with no address

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

make-secnet-sites: Fix userv invocation after pfilepath

The commit 9b8369e07aeba5ed2c69fb4a7f74d07c8cebe015
make-secnet-sites: refactor to break out new function "pfilepath"
broke the userv service invocation, because it turned out that later
code depended on the "headerinput" variable whose assignment had been
removed and replaced by a call to pfilepath.

Make pfilepath return the lines read from the file and assign the
result to headerinput.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

netlink: Fix up link down behaviour

Partially reverts 04f92904ea6c41517ff7154910c16ef4c3bc646b
"userv-ipif: Always request routes from userv, regardless of link quality"

It turns out that this check is necessary to avoid bringing up a route
for a "netlink" stanza in the configuration file which is never used.

In particular, this avoids bringing up a netlink for (a) sites which
are not mentioned in the config file (b) the site on which secnet is
running.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

userv-ipif: Always request routes from userv, regardless of link quality

Previously the userv-ipif netlink would not request (from userv) a
route for a site for which the link quality was DOWN. The link
quality is a dynamic quantity but userv-ipif lacks any machinery for
dynamically adding routes, so this is wrong.

Instead, in userv-ipif, unconditionally add routes for all sites,
regardless of link up status.

In practice this code is run during startup and the only reason a link
might be down at that point, ie LINK_QUALITY_DOWN, is that it does not
have an address configured. Mobile sites are often in this situation.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

make-secnet-sites: new "include" keyword

Allow "headers" files and "sites" files to contain "include"
directives.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

make-secnet-sites: refactor to break out new function "pfilepath"

No intentional functional change.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

build system: More release checklist updates

Need to "make dist" again.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

Update VERSION

build system: Fix Ian's email address in debian/changelog

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

authbind: get endianness right (again)

It appears that:

* authbind's documentation authbind-helper(8) describes the
   endianness convention of authbind's helper program incorrectly.
   See Debian #651694.

* The version of secnet 0.1.16 tagged as such in revision control
   contains a "fix" which was based on the authbind documentation but
   not apparently tested against authbind.  Ie, this part from NEWS:
    4) Change the endianess of the arguments to authbind-helper.
       sprintf("%04X") already translates from machine repesentation to most
       significant octet first so htons reversed it again.

* The version of secnet 0.1.16 actually in service on chiark had an
   out-of-version-control change to udp.c to make it work with
   chiark's authbind 1.2.0.  The actual code found has been recorded
   on the dead branch "chiark-0.1.16" in the master git repo, but the
   version of udp.c is exactly that from 0.1.15 so it looks like we
   just reverted to the previous udp.c during deployment of 0.1.16.

* We (re)discovered all this after the release of secnet 0.2.0
   because my attempt to deploy 0.2.0 on chiark was not actually
   effective.

Therefore, undo the authbind endianness change introduced in secnet
0.1.16.  This is most easily achieved by constructing the arguments to
the helper from the sockaddr rather than the contents of "st".

Thanks are due to Simon Tatham for the bug report.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

Update VERSION

build system: Further improvements to release checklist

We have to:
* Update debian/changelog
* Build a new .deb
* Copy the files to chiark:~secnet/ in case anyone looks there

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

New --managed option for use when running under a daemon supervisor.

The effect is secnet is told that it is running as a daemon right from
the start, so it knows to follow the logging rules for daemons but not
to fork.

The conflation of daemonization with dropping privilege is also
unpicked by this patch. Most importantly this ensures that errors
from PHASE_GETRESOURCES operations such as 'route' commands is sent to
the logfile (or syslog).

Signed-off-by: Richard Kettlewell <rjk@terraraq.org.uk>

Make the list of phases be an enumeration.

Signed-off-by: Richard Kettlewell <rjk@terraraq.org.uk>

Set group ID and group list.

More sensible username lookup.

Signed-off-by: Richard Kettlewell <rjk@terraraq.org.uk>

build system: Include signing of tarballs in release checklist

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

build system: New "make dist" uses git; add release checklist

We no longer try to separately track which files should be shipped in
the tarball. Instead, the tarball contains exactly the same files as
the repo.

Add a release checklist at the bottom of Makefile.in, based on the one
from the stable branch (in 813ff677f7020b12fb1356e977a73dc910e64ce5).

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

md5: correct size arg to memset().

On realistic targets the effect of the error was that the MD5 context
wasn't fully cleared.

Signed-off-by: Richard Kettlewell <rjk@terraraq.org.uk>

cleanup: build on Ubuntu Lucid

The discard() idiom is chosen because Clang tolerates it.

Signed-off-by: Richard Kettlewell <rjk@terraraq.org.uk>

Add a man page.

As it stands it is purely reference material - no conceptual or
tutorial stuff whatsoever.

Signed-off-by: Richard Kettlewell <rjk@terraraq.org.uk>

Remove snprintf reimplementation.

We don't want to think about the "Frontier Artistic License" and the
requirements it may or may not impose on distribution of secnet.

Signed-off-by: Richard Kettlewell <rjk@terraraq.org.uk>

Config file fixes.

* Reject integers in excess of 2^32-1 (rather than reducing them mod
  2^32).
* ptree_dump():
  - Remove a magic number.
  - More realistic recursion limit.
* Various bits of type hygeine.

Signed-off-by: Richard Kettlewell <rjk@terraraq.org.uk>

Detect Fink automatically.

The logic is that if Fink is in the path then we should use it. So
(unlike certain build systems I could mention) if you want to use a
local install of GMP, it should be sufficient to run ./configure
without /sw/bin in the path.

Signed-off-by: Richard Kettlewell <rjk@terraraq.org.uk>

OSX: fix home directory in setup.mac.

Signed-off-by: Richard Kettlewell <rjk@terraraq.org.uk>

Mobile sites: Update test example

Make the "inside" site in test-example be mobile.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

Multiple udp ports: Add to test

For testing, add an extra port to one of the sites in test-example.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

Multiple udp ports for the same site (multiple "comm"s)

Now you can meaningfully specify more than one comm closure (ie, more
than one udp port) for a site. secnet will respond to incoming key
exchange and data packets on any of the ports.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

Mobile sites: Support in make-secnet-sites

This allows a sites file to specify that a site is mobile; the mobile
flag gets propagated into the generated sites.conf.

Includes new support for boolean values.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

Mobile sites: Use different default tuning parameters

Links involving mobile peers are best served by somewhat different
tuning parameters. So make the defaults vary accordingly.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

Mobile sites: Require specification of whether we think we are mobile

We introduce a new config option "local-mobile" which must match our
peers' idea of whether we are mobile. This is cross-checked with the
"site" entry for our own site, if there is one, to detect mistakes.

We do not transfer this data in the protocol because we don't want to
break compatibility with older secnets which do not understand mobile
peers at all.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

Mobile sites: Maintain multiple addresses for some peers (new feature)

Provides the following new config option, and some satellite options:

  mobile (bool): if True then peer is "mobile" ie we assume it may change
    its apparent IP address and port number without either it or us
    being aware of the change; so, we remember the last several port/addr
    pairs we've seen and send packets to all of them (subject to a timeout).
    We maintain one set of addresses for key setup exchanges, and another for
    data traffic. [false]

In the code this involves replacing the comm_addrs for setup_peer and
peer with a new struct transport_peers which contains zero or more
comm_addrs.  This subsumes peer_valid, too.

Additionally, we are slightly cleaner about the use of setup_peer: we
ensure that we clean it out appropriately when we go into states where
it won't (shouldn't) be used.

The transport_peers structure is opaque to most of site.c and is
manipulated by a new set of functions which implement the detailed
semantics described in site.c.

The main code in site.c is no longer supposed to call dump_packet and
st->comm->sendmsg directly; it should use transport_xmit (which will
do dump_packet as well as appropriate sendmsgs).

No intentional functional change if mobile=false (the default) and the
new debug log feature ("peer-addrs") is not enabled.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

Protocol change: Initiate key setup on incoming packets, not outgoing ones

If data is exchanged after the "renegotiation time", we need to
refresh the data transfer key - ie, to initiate a key setup.
Previously this was done by the peer which wanted to transmit data
using an existing but in-need-of-refreshing key.

However, mobile peers may often be disconnected.  There is no point
trying to do a key renegotiation while they are disconnected.  Thus it
is better to have the renegotiation initiated by a peer which receives
a data packet.  That means that if there is a network outage,
renegotiation will be deferred until the network is restored.

In particular, it means that a mobile node which has no underlying
network but which has applications trying to send data will not waste
effort attempting key renegotiation until it once more has
connectivity.

This minor functional change should be harmless or even beneficial for
non-mobile sites too.  It simply means that the other peer will play
the role of initiator during renegotiation, but since which peer
played this role is arbitrary for non-mobile sites this should make no
difference.

Compatibility: In the case of an old version of secnet talking to a
new version, only data packets in one direction will cause
renegotiation.  This should not be a problem since all real-world
IP protocols involve data in both directions.

So we make the new behaviour universal rather than making it depend on
the forthcoming "mobile-peer" site config option.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

comm, udp: Provide an addr_to_string method

This method writes (into a single static buffer) a string describing a
comm_addr. It describes both the comm instance and the peer address.

No callers yet, but one is about to be introduced.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

Site tuning defaults: Improve documentation; internal improvements

Two documentation fixes:

* Quote the correct default value for setup-timeout
  (setup_retry_interval) in the README.
* Improve the documentation of the default value for renegotiate-time.

Three simple internal improvements:

* Change definitions of DEFAULT_* to forms which make the semantics
  of the values clearer, thus obviating the need for the comments
  giving human-readable values.
* Add comments to DEFAULT_* time values giving the units (currently,
  they are all in millisecons).
* Rename (in the code) SETUP_TIMEOUT and setup_timeout to
  SETUP_RETRY_INTERVAL, which is more accurate.  We leave the config
  dictionary entry with the misleading name (particularly, since we
  have no facility for spotting obsolete or misspelled config keys).

Also two changes which are preparatory to the mobile peers support:

* New macro DEFAULT(D) which currently simply expands to DEFAULT_*.
* New convenience macro CFG_NUMBER for a common pattern of use for
  dict_read_number, which uses DEFAULT().

No intentional functional changes.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

comm, site: pass a new "struct comm_addr" rather than sockaddr_in

This abstracts away the fact that the peer address is a sockaddr_in;
instead, most of the code in site.c now only handles the peer address
as an opaque structure.

We put the struct comm_if* in the address structure too, as this will
be useful later. Amongst other things, doing this arranges that the
comm client knows which comm is notifying about an incoming packet.
Previously the client was expected to "just know" because the only
actual client in secnet is site which currently only deals with one
comm.

Also make the relevant arguments const-correct.

Contains consequential changes the signatures of many functions but no
intentional functional change.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

site: When shutting down, if debug enabled, do dump the MSG7

Insert a call to dump_packet in send_msg7. The packet is mostly
ciphertext and may not make a great deal of sense but turning on
debugging should not show all management packets; only data packets
should not be shown.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

site: Remove some spurious "break"s

Remove two occurrences of "break" which immediately follow
unconditional "return". No functional change.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

Keepalives: Document that they're unimplemented; remove vestigial code

Keepalives are not actually implemented; the keepalive option does
nothing. Mention this in the README. Remove the option's parsing
and recording from site.c.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

cleanup: Style improvement to python literals in make-secnet-sites

Python literals can have a trailing comma after the last element.
Doing this is a good idea as it means that adding new elements at the
end doesn't necessitate modifying the previous line.

No intentional functional change.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

cleanup: provide helpful FILLZERO macro (for certain memset calls)

This macro replaces these idioms:
memset(&foo,0,sizeof(foo)); => FILLZERO(foo);
memset(foo,0,sizeof(*foo)); => FILLZERO(*foo);

This makes it impossible to accidentally get the wrong size.

Use this macro in all such patterns in secnet, apart from two in
site.c which are going to be removed soon anyway.

No intentional functional change.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

secnet.h: provide helpful STRING macro (for preprocessor stringification)

Uses preprocessor stringification; can be helpful for rendering
compile-time constants (eg compile time limit macros) for messages,
etc.

Not used yet; will be used later in this series.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

Test example: improve logging, choice of ports

* Turn up the logging in test-example/common.conf, to "all".

* Change the udp port numbers we use to 16900 and 16910, which
conveniently have a "0" for "outside" and a "1" for "inside".

* Swap the order of the inside and outside sites in the sites file, so
that the consistentliy-lower-numbered "outside" comes first.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

.gitignore build-stamp.

Signed-off-by: Richard Kettlewell <rjk@terraraq.org.uk>

Introduce an installdirs target.

Signed-off-by: Richard Kettlewell <rjk@terraraq.org.uk>

Correct a couple of errors in the README.

Signed-off-by: Richard Kettlewell <rjk@terraraq.org.uk>

Prohibit lists where single values expected.

Signed-off-by: Richard Kettlewell <rjk@terraraq.org.uk>

Document XCode 3.2 import.

Signed-off-by: Richard Kettlewell <rjk@terraraq.org.uk>

Fix version number in a couple of places.

Signed-off-by: Richard Kettlewell <rjk@terraraq.org.uk>

Security: Reduce impact of bogus key setup packet DoS

If a MSG1 (key setup initiation packet) is received containing
expected local and remote site names, the receiving secnet will start
a key setup attempt with details from that packet.

MSG1 packets are (almost necessarily) unauthenticated, so anyone on
the Internet can cause this to happen.  secnet is only willing to have
one key exchange attempt ongoing at once, and will ignore subsequent
incoming MSG1s until it has dealt with the first key exchange attempt.

So this means that an attacker who can send packets to any secnet
instance can DoS secnet at session setup (or key renewal) time.  All
the attacker needs to know is the secnet site names, and the IP
address and port number of one of the secnets.  The attacker does not
need to spoof their IP address or know any secret keys.

If the attacker sends a contant stream of bogus packets they can
probably prevent the link coming up at all.

This is difficult to fix without changing the protocol.

However, there is worse: when the key setup with the bogus peer
eventually fails, as it must, secnet invalidates the current session
key and its note of where to send actual data packets.  It will then
refuse to attempt a new key exchange for a timeout period.  During
this period, data packets will not flow.

This means that sending one fairly easy to construct udp packet can
cause a 20s outage.  Worse, after this one packet has had its effect,
the attacker can prevent the connection being reestablished, as
described above.

In this patch we fix the latter problem.  It is simply a bug that the
session key and data transport peer address (resulting from a previous
successful key exchange) are discarded when a key setup fails.

We also provide a test program "test-example/bogus-setutp-request.c"
which can be used to reproduce the problem.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

Test example: instructions for running under valgrind memcheck.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

Test example: Files for a simple testing configuration now in test-example/

Including a set of dummy keys, and dummy IP addresses in 172.18.232.0/28.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

build system: add *.pyc to .gitignore

Python tends to leave this dropping.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

resolver: support IPv4 address literals

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>
IP ADDR LIT size_t strlen

Build system: make several patterns in .gitignore absolute paths

Specifically, version.c, and the files which are generated by
autoconf/automake but whose names do not unambiguously say so.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

Log write failures on tun device.

This is useful when debugging. Short writes count as failures The
error reporting is rate limited to ensure that the logs are not
flooded.

Signed-off-by: Richard Kettlewell <rjk@terraraq.org.uk>

sys_cmd error handling improved.

(1) If the subprocess exits nonzero then the exit status
    is unpicked and logged.
(2) If the exec in the child fails, the command and
    errno string are written to stderr (which should
    end up in secnet's usual log output).
(3) _exit() is used instead of exit(), to avoid
    any possibility of craziness with stdio/atexit/etc.

Signed-off-by: Richard Kettlewell <rjk@terraraq.org.uk>

.gitignore XCode-specific files.

Checking those files in doesn't make much sense as they change at the
drop of a hat.

Signed-off-by: Richard Kettlewell <rjk@terraraq.org.uk>

Update Mac-specific notes.

- how (and why) to enable IP forwarding
- correct 'launchctl unload' invocation

Signed-off-by: Richard Kettlewell <rjk@terraraq.org.uk>