What is Communications Data?
Richard Clayton
richard at highwayman.com
Wed, 13 Nov 2002 00:25:20 +0000
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
In article <PXJKW2AoWW09EwML@perry.co.uk>, Roland Perry
<roland@linx.net> writes
>In message <B9F6FCF5.25FED%zenadsl6186@zen.co.uk>, Peter Fairbrother
><zenadsl6186@zen.co.uk> writes
>>Simon's list includes the size of the communication in bytes - in most
>>cases that would be enough to identify the page accessed.
>
>Therein lies one of the devils in the detail. Each thing that looks like
>a page to you is actually anything up to a hundred separate items.
Each of which may well have a characteristic size, and even where the
individual sizes are common the combination may be extremely rare -- so
this will improve identification.
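
To make that concrete, here is a minimal sketch in Python of matching an
observed set of item sizes against per-page fingerprints. The page names,
the sizes and the helper names (FINGERPRINTS, round_up, match_pages) are
all invented for illustration, and treating SSL as rounding each size up
to a fixed block is a simplifying assumption; this is not George's code.

```python
from collections import Counter

# Hypothetical fingerprints: page -> sizes in bytes of the items it pulls in.
# Every name and number here is invented for illustration.
FINGERPRINTS = {
    "/index.html": [5120, 1832, 940, 940, 17311],
    "/news.html": [5120, 1832, 2210, 48002],
    "/contact.html": [5120, 611],
}

def round_up(size, block):
    """Round a size up to a multiple of block (block=1 leaves it unchanged)."""
    return -(-size // block) * block

def match_pages(observed, fingerprints, block=1):
    """Return the pages whose multiset of (rounded) item sizes equals the observed one."""
    seen = Counter(round_up(s, block) for s in observed)
    return [
        page
        for page, sizes in fingerprints.items()
        if Counter(round_up(s, block) for s in sizes) == seen
    ]

# Exact sizes, seen in any order:
print(match_pages([940, 5120, 17311, 1832, 940], FINGERPRINTS))
# Sizes rounded up to 8-byte blocks, as a stand-in for SSL padding:
print(match_pages([5120, 616], FINGERPRINTS, block=8))
```

Note that the 5120-byte item appears on every page in this toy data, so
on its own it identifies nothing; it is the combination that picks out
a single page.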
>Logging these plays havoc with the size of web cache files (which is why
>they don't last very many days) and what you'll get is a pile of jigsaw
>pieces. Even assuming you know where to look [the only clue's the
>machine with the website] reverse engineering of the kind you allude to
>is non-trivial.
As it happens, it's quite easy :( -- even when the data stream is
obscured by SSL (so the page lengths are rounded off).
My colleague George Danezis has been looking at this problem. Here's a
gentle introduction to the ideas...
http://www.cl.cam.ac.uk/~gd216/TrafficAnalysisPoster.pdf
... and if you start messing with Markov models (using statistical
models of the way in which you might move through a site) you can get
even more exact identifications.
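
As a rough illustration of the Markov idea (my own toy sketch, not
George's method): suppose several pages have the same total size, so
sizes alone are ambiguous, but you have a model of how likely a visitor
is to move from one page to another. A Viterbi-style search then picks
the most probable sequence of pages consistent with the sizes seen. The
page names, sizes and probabilities below are all made up.

```python
# Hypothetical site model: every page, size and probability is invented.
PAGE_SIZE = {
    "home": 30000,
    "news": 42000, "sport": 42000,    # indistinguishable by size alone
    "story": 18000, "scores": 18000,  # likewise
}
START = {"home": 1.0}
TRANS = {
    "home": {"news": 0.6, "sport": 0.4},
    "news": {"story": 0.9, "scores": 0.1},
    "sport": {"story": 0.2, "scores": 0.8},
}

def most_likely_path(observed_sizes):
    """Return (probability, path) for the best page sequence matching the sizes, or None."""
    if not observed_sizes:
        return None
    # best[page] = (probability, path) of the best path so far ending at page
    best = {
        page: (prob, [page])
        for page, prob in START.items()
        if PAGE_SIZE[page] == observed_sizes[0]
    }
    for size in observed_sizes[1:]:
        nxt = {}
        for page, (prob, path) in best.items():
            for target, step in TRANS.get(page, {}).items():
                if PAGE_SIZE[target] != size:
                    continue  # this move is inconsistent with what was observed
                cand = prob * step
                if target not in nxt or cand > nxt[target][0]:
                    nxt[target] = (cand, path + [target])
        best = nxt
    if not best:
        return None
    return max(best.values(), key=lambda entry: entry[0])

result = most_likely_path([30000, 42000, 18000])
print(result[1])  # ['home', 'news', 'story']
```

Here the second and third observations each fit two pages, yet the
transition probabilities make home -> news -> story the clear favourite.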
George will doubtless correct any misapprehensions
I may have about his work :)
Bottom line -- when looking at web sites, the number of bytes transferred
is a very precise indicator of which content was accessed, even when the
connection is via an intermediary so the site being visited is not
immediately apparent!
- --
richard Richard Clayton
They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety. Benjamin Franklin
-----BEGIN PGP SIGNATURE-----
Version: PGPsdk version 1.7.1
iQA/AwUBPdGb8BfnRQV/feRLEQI0+QCfbr58+w97Wt31Y+5UqJD77F8ablsAoJGH
+T4VkAevA5+EOKDC3sX5QITx
=G3kI
-----END PGP SIGNATURE-----