secnet-0.6.0 connection stalling
Ian Jackson
ijackson at chiark.greenend.org.uk
Thu Feb 27 16:11:37 GMT 2020
Stephen Early writes ("secnet-0.6.0 connection stalling"):
> Within 24 hours I started to receive reports that the central site was
> unable to communicate with the other sites. Logging in to both ends, I
> discovered that packets sent from the central site were not being
> received at the remote site; however, sending a packet from the remote
> site to the central site restored connectivity in both directions. This
> problem occurred with all the remote sites, although did not affect all
> of them at the same time.
Thanks for the report. I don't have an immediately helpful theory
about what this might be. The symptoms are consistent with a problem
with peer public address selection.
> There was nothing in the secnet logs at either end. No keys had timed
> out. Nothing was logged when connectivity was restored.
Is NAT involved at all? I see some of the public addresses in the
sites file are dyndns names; are they all up to date? I see the
sites file doesn't declare any of these sites mobile; do any of them
think, locally, that they are mobile?
> I downgraded secnet at the central site and one of the remote sites to
> the previously working version (0.4.5). After a further 24 hours, there
> were still reports of connectivity problems between the central site and
> the sites still running version 0.6.0. I've now downgraded all the
> sites to 0.4.5.
>
> Would it be helpful for me to try version 0.5.1 at some or all of the
> sites? Is there anything else that it would be helpful for me to do?
If you are willing and able to reproduce the problem with debug
logging enabled, that would definitely help. For example:
    log logfile {
        filename "/var/log/secnet";
        class "info","notice","warning","error","security","fatal","debug";
    };
Best would be if you were able to bisect the problem. I can advise
how to do that. But it sounds like the problem is intermittent and
takes a long time to manifest, which may make that awkward?
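For illustration only, the bisection idea can be sketched as a search over an ordered list of candidate versions. This is a minimal sketch, not the actual procedure: the function name `first_bad` and the `is_bad` callback are hypothetical, and in practice `is_bad` would mean "run this build at a site for long enough to be confident the stall does or does not occur", which for an intermittent problem may be a day or more per step.

```python
def first_bad(candidates, is_bad):
    """Return the first candidate for which is_bad() is true.

    Assumptions (hypothetical helper, not part of secnet):
    - candidates are ordered oldest to newest;
    - the first candidate is known good and the last is known bad;
    - once the problem is introduced, every later candidate has it.
    """
    lo, hi = 0, len(candidates) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid  # problem already present at mid
        else:
            lo = mid  # problem not yet present at mid
    return candidates[hi]


# The three release names come from the thread; the verdict function
# below is a stand-in, not an observed result.
releases = ["0.4.5", "0.5.1", "0.6.0"]
culprit = first_bad(releases, lambda version: version == "0.6.0")
```

With an intermittent fault, a "good" verdict is only as reliable as the observation window, so a too-short test can send the search in the wrong direction.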
As I say I suspect peer public address selection is going wrong. I
have reviewed the changelogs between 0.4.5 and 0.6.0 and none of those
changes ought to impinge on that in non-mobile configurations. But
there has been a lot of rearrangement and maybe one of those
rearrangements made an unintentional change.
Other than bisection, another possibility would be to reproduce the
problem and then see where the central site is sending its packets,
and whether the leaf site is seeing them, using strace and/or tcpdump.
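As a rough aid for comparing captures from the two ends, the following sketch tallies UDP flows from saved tcpdump output. It assumes the usual one-line IPv4 format printed by `tcpdump -n` (for example `... IP 192.0.2.10.410 > 198.51.100.20.410: UDP, length 100`); the function name `count_flows`, the addresses, and the port number are illustrative assumptions, not taken from the sites file.

```python
import re
from collections import Counter

# Matches the "IP src.port > dst.port: UDP" part of a `tcpdump -n` line.
# IPv4 only; other line shapes are silently skipped.
LINE_RE = re.compile(
    r"IP (?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): UDP"
)


def count_flows(lines):
    """Count packets per (source, destination) address:port pair."""
    flows = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            src = f"{m['src']}:{m['sport']}"
            dst = f"{m['dst']}:{m['dport']}"
            flows[(src, dst)] += 1
    return flows


# Hypothetical capture lines using documentation addresses.
sample = [
    "16:11:37.000001 IP 192.0.2.10.410 > 198.51.100.20.410: UDP, length 100",
    "16:11:38.000001 IP 192.0.2.10.410 > 198.51.100.20.410: UDP, length 100",
    "16:11:39.000001 IP 192.0.2.10.410 > 203.0.113.5.410: UDP, length 100",
]
flows = count_flows(sample)
```

Running this over a capture taken at the central site and another taken at the leaf site would show whether the central site is sending to the address the leaf actually receives on, or to some other address.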
I have one other rather crazy theory: maybe something in 0.4.5 is
resulting in the sending of within-VPN packets from the leaf site to
the central site. Log messages, maybe. There have been changes to the
way secnet does its logging, so perhaps some messages that used to go
to syslog now go to files, and the packets that were keeping things
working have gone away.
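One way to probe this theory would be to generate such leaf-to-central traffic deliberately on a leaf site and see whether the stalls stop. The following is a minimal sketch under stated assumptions: `send_probes` is a hypothetical helper, and the destination would need to be some address at the central site that is reachable only through the tunnel, on a port where stray datagrams are harmless.

```python
import socket
import time


def send_probes(dest, count, interval=60.0, payload=b"probe"):
    """Send `count` small UDP datagrams to `dest`, `interval` seconds apart.

    `dest` is an (address, port) tuple. Returns the number of datagrams
    handed to the kernel; it does not confirm delivery.
    """
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for i in range(count):
            sock.sendto(payload, dest)
            sent += 1
            if i + 1 < count:
                time.sleep(interval)  # no sleep after the final probe
    return sent
```

If connectivity from the central site stays up while the probes run and stalls again once they stop, that would point towards leaf-originated traffic being what keeps the path working.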
Ian.
--
Ian Jackson <ijackson at chiark.greenend.org.uk> These opinions are my own.
If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.