What is Communications Data?

Richard Clayton richard at highwayman.com
Wed, 13 Nov 2002 00:25:20 +0000


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In article <PXJKW2AoWW09EwML@perry.co.uk>, Roland Perry
<roland@linx.net> writes

>In message <B9F6FCF5.25FED%zenadsl6186@zen.co.uk>, Peter Fairbrother
><zenadsl6186@zen.co.uk> writes

>>Simon's list includes the size of the communication in bytes -  in most
>>cases that would be enough to identify the page accessed.
>
>Therein lies one of the devils in the detail. Each thing that looks like
>a page to you is actually anything up to a hundred separate items.

Each of which may well have a characteristic size, and the combination
of those sizes may be extremely rare -- so this will improve
identification rather than hinder it.

>Logging these plays havoc with the size of web cache files (which is why
>they don't last very many days) and what you'll get is a pile of jigsaw
>pieces. Even assuming you know where to look [the only clue's the
>machine with the website] reverse engineering of the kind you allude to
>is non-trivial.

As it happens, it's quite easy :( -- even when the data stream is
obscured by SSL (so the page lengths are rounded off).
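
To make the point concrete, here is a minimal sketch of matching on
sizes alone. It is not George's method, just an illustration: the
catalogue of pages, the item sizes and the 8-byte rounding are all
made-up assumptions, and the names round_up, fingerprint and candidates
are hypothetical helpers.

```python
from collections import Counter

BLOCK = 8  # assumed rounding granularity in bytes; illustrative only


def round_up(size, block=BLOCK):
    """Round a length up to the next multiple of the block size."""
    return -(-size // block) * block


def fingerprint(sizes, block=BLOCK):
    """Multiset of rounded item sizes for one page; order is ignored."""
    return Counter(round_up(s, block) for s in sizes)


def candidates(observed, catalogue, block=BLOCK):
    """Names of catalogue pages whose fingerprint equals the observed sizes.

    `observed` is the list of (already rounded) transfer sizes seen on
    the wire for one page load.
    """
    seen = Counter(observed)
    return sorted(
        name
        for name, sizes in catalogue.items()
        if fingerprint(sizes, block) == seen
    )


# Hypothetical catalogue: page name -> byte sizes of the items it pulls in.
CATALOGUE = {
    "/index.html": [5120, 843, 843, 2210],
    "/news.html": [5120, 843, 1977],
    "/contact.html": [1304, 843],
}

# Rounded sizes an observer might log for a single page load.
print(candidates([848, 5120, 1984], CATALOGUE))
```

The 843-byte item (say, a shared logo) appears on every page and tells
you nothing by itself; it is the combination of sizes that picks out
one page, even after rounding.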

My colleague George Danezis has been looking at this problem. Here's a
gentle introduction to the ideas...

        http://www.cl.cam.ac.uk/~gd216/TrafficAnalysisPoster.pdf

... and if you start messing with Markov models (using statistical
models of the way in which you might move through a site) you can get
even more exact identifications.

            George will doubtless correct any misapprehensions
            I may have about his work :)
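
The Markov model idea can be sketched along these lines. Again this is
only an illustration under stated assumptions, not a description of
George's actual models: the pages, the start and transition
probabilities, and the function most_likely_path are all hypothetical,
and the floor used for unseen transitions is an arbitrary choice. Each
observation is assumed to have been narrowed down to a non-empty set of
pages with matching sizes.

```python
import math

TINY = 1e-9  # arbitrary floor for transitions the model has never seen


def most_likely_path(candidate_sets, start, trans):
    """Pick the most probable page sequence (Viterbi-style).

    candidate_sets: list of lists; pages consistent with each observation.
    start: dict page -> probability that a visit starts on that page.
    trans: dict (previous_page, next_page) -> transition probability.
    """
    # Best log-score and path ending at each candidate for the first step.
    best = {
        p: (math.log(start.get(p, TINY)), [p]) for p in candidate_sets[0]
    }
    for cands in candidate_sets[1:]:
        nxt = {}
        for p in cands:
            # Choose the predecessor that maximises the path score.
            score, path = max(
                (s + math.log(trans.get((q, p), TINY)), pth)
                for q, (s, pth) in best.items()
            )
            nxt[p] = (score, path + [p])
        best = nxt
    return max(best.values())[1]


# Hypothetical model of how visitors move through a site.
START = {"/index.html": 0.8, "/news.html": 0.1, "/archive.html": 0.1}
TRANS = {
    ("/index.html", "/news.html"): 0.6,
    ("/index.html", "/archive.html"): 0.05,
    ("/news.html", "/story1.html"): 0.5,
}

# The second observation is ambiguous on sizes alone.
OBSERVED = [
    ["/index.html"],
    ["/archive.html", "/news.html"],
    ["/story1.html"],
]
print(most_likely_path(OBSERVED, START, TRANS))
```

Where sizes alone leave two pages in the frame, the likely routes
through the site break the tie.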

Bottom line -- when looking at web sites, the number of bytes
transferred is a very precise indicator of which content was accessed,
even when the connection is via an intermediary so the site being
visited is not immediately apparent!

- -- 
richard                                              Richard Clayton

They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety.         Benjamin Franklin

-----BEGIN PGP SIGNATURE-----
Version: PGPsdk version 1.7.1

iQA/AwUBPdGb8BfnRQV/feRLEQI0+QCfbr58+w97Wt31Y+5UqJD77F8ablsAoJGH
+T4VkAevA5+EOKDC3sX5QITx
=G3kI
-----END PGP SIGNATURE-----