PGP source code
Brian Gladman
"Brian Gladman" <brg at gladman.plus.com>
Mon, 3 Sep 2001 14:52:51 +0100
From: "Charles Lindsey" <chl@clw.cs.man.ac.uk>
To: <ukcrypto@chiark.greenend.org.uk>
Sent: Monday, September 03, 2001 10:09 AM
Subject: Re: PGP source code
> On Sun, 02 Sep 2001 17:38:32 +0100
> Ben Laurie <ben@algroup.co.uk> said...
>
> >
> > Charles Lindsey wrote:
> > >
> > > Actually, the main reason I would want to compile it would be to verify
> > > that the result was the same binary as I had already obtained (legally)
> > > from NAI. (How easy is it to do that check in practice?)
> >
> > In my experience, impossible - the problem being that parts of the
> > binary (padding, typically) tend to be from uninitialised data.
>
> I don't think that makes it impossible - just difficult. What you
> need to show is that everything present in the source code is present
> (and correct) in the binary. Where the source code says "here be an
> uninitialized array", you don't care what is in the binary.
I think there is a big difference between an 'in principle' and an 'in
practice' answer here for any code base that is as large as PGP. I have
attempted this form of PGP verification on earlier versions and it was
impossible in practice to achieve a sensible answer.
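As an illustration of the 'in principle' check described in the quoted text, here is a minimal sketch in Python; it is not a tool that existed for PGP. It compares two binaries byte for byte and reports the ranges where they differ, skipping regions declared as "don't care" (for example padding that comes from uninitialised data). The name differing_ranges and the ignore parameter are hypothetical. The sketch assumes both binaries have the same layout, which is exactly what fails in practice once compiler versions and options differ.

```python
def differing_ranges(a: bytes, b: bytes, ignore=()):
    """Return half-open (start, end) ranges where a and b differ.

    Offsets inside any (start, end) range listed in `ignore` are treated
    as "don't care" and never reported.
    """
    ignored = set()
    for start, end in ignore:
        ignored.update(range(start, end))

    ranges = []
    run_start = None
    length = max(len(a), len(b))
    for i in range(length):
        # A byte missing from the shorter input counts as a difference.
        byte_a = a[i] if i < len(a) else None
        byte_b = b[i] if i < len(b) else None
        differs = byte_a != byte_b and i not in ignored
        if differs and run_start is None:
            run_start = i
        elif not differs and run_start is not None:
            ranges.append((run_start, i))
            run_start = None
    if run_start is not None:
        ranges.append((run_start, length))
    return ranges
```

An empty result means the two inputs agree everywhere outside the ignored regions. Any non-empty result still has to be explained by hand, and that is where the effort goes for a code base of this size.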
There were many reasons for the failure: PGP optimisation options, library
compilation options, compiler options and so on. And on top of these
conscious options, each of the components existed in a number of versions
that did not give the same results (for example MS VC++ version 6 is now up
to service pack 5 and each different compiler binary produces different
results). And in the absence of a very well specified and documented
environment for the original source to binary transformation it is simply
not economically practical to attempt to recreate it (for example by
decompilation).
In total I found a very large number of variables, each of which changed the
final result of compilation but which were not specified or controlled as a
part of the NAI source code release. In consequence any attempt to
demonstrate binary code equivalence was simply not practical. In order to
be practical this has to be planned for from the beginning and this has not
been done with the PGP source code (or the tools used to compile and link
it).
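To make concrete what "specified or controlled" could mean, here is a minimal sketch in Python of a build manifest that records the variables mentioned above, so that two builds can be compared before anyone compares binaries. Every field name and value in the usage example is hypothetical; nothing like this was part of the NAI source release.

```python
import hashlib
import json


def manifest_fingerprint(manifest: dict) -> str:
    """Hash a build manifest in a canonical form (sorted keys, fixed
    separators) so that equal settings always give the same digest."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def differing_settings(ours: dict, theirs: dict) -> list:
    """Return the sorted names of settings that differ between two
    manifests, including settings present in only one of them."""
    keys = sorted(set(ours) | set(theirs))
    return [key for key in keys if ours.get(key) != theirs.get(key)]
```

For example, with purely illustrative values:

```python
release = {"compiler": "MSVC 6.0", "service_pack": 3, "optimisation": "/O2"}
rebuild = {"compiler": "MSVC 6.0", "service_pack": 5, "optimisation": "/O2"}
print(differing_settings(release, rebuild))
```

If the fingerprints of the two manifests differ, there is no reason to expect identical binaries, and the list of differing settings shows where to start.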
> Other things that can be different arise from 'date of compilation'
> and suchlike embedded in the source code; different order of assembly
> of segments by the loader, thus jumps will apparently go to different
> places; different optimizations performed by the two compilers (which
> may be different versions of the same compiler); different register
> allocations by the two compilations; and so on.
>
> These problems are all in principle solvable. My question was whether
> this problem had been looked at at all, and whether there were tools
> around to do it.
>
> For example, if you had binary1, and used my compiler to produce
> binary2, and then used a decompiler on binary1 and binary2 to produce
> pseudo-source1 and pseudo-source2, and then tried to compare the two
> pseudo-sources, would that be an easier task?
Not in my view for PGP. Unless this form of verification is planned for from
the outset it is not practical to undertake it for a large code base such as
PGP.
In my view the important thing here is to (a) have the source code and (b)
to be able to compile one's own binaries. For earlier versions of PGP I did
exactly this in order to substitute my own binary for one legally obtained
from NAI. This does not eliminate the compiler risks but it does remove
some other forms of risk.
But it now seems that this is no longer permitted. Once there is a good
Windows front end for GPG I suspect I will be moving to use it in place of
PGP!
Brian