Mastering the Internet

ken ukcrypto at chiark.greenend.org.uk
Tue, 05 May 2009 13:14:59 +0100


Peter Fairbrother wrote:

 > What if the black boxes look for keywords,
 > encrypted material, etc in  content, and just
 > send that type of content back to Cheltenham?
 > That solves the bandwidth problem.

Only if you either store the content somewhere so you can look 
at it later if it turns out to have been interesting, or else 
then use that keyword information to set up a more detailed 
monitoring of some stream of communications in future.

You have to actually READ the stuff. If you don't, keywords 
don't win you anything that traffic analysis hasn't already.

There is (I imagine but don't know for sure) too much encrypted 
or compressed data flying around already for the mere presence 
of encryption in otherwise untargeted traffic to be sufficient 
to attract the attention of spooks.

 > Or maybe GCHQ has a magical compression algorithm?
 > We have very little
 > idea of the actual entropy of most communications.

Yes we do. Really. People who do boring things like designing 
routers and network cards have a very good idea of what is sent 
in real life. And the maths is the same whatever magic they have 
in Cheltenham. And is exactly the same maths as we use to do 
bioinformatics and stuff - for example trying to tell genes from 
"junk" DNA, or find where genes start and end on a genome. 
Linguists do it too. Telling signal from noise can be hard, but 
we know about compression and entropy. Really.
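
As a rough illustration of "we know about compression and entropy", 
here is a minimal sketch of two standard estimates: the empirical 
Shannon entropy of the byte distribution, and how far a 
general-purpose compressor can shrink the data. The sample text, the 
function names, and the use of os.urandom as a stand-in for 
ciphertext are all assumptions made for the example, not anything 
from real traffic.

```python
import math
import os
import zlib
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Empirical Shannon entropy of the byte histogram, in bits per byte."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (near 1.0 = incompressible)."""
    return len(zlib.compress(data)) / len(data)


# Illustrative inputs only: repetitive English-like text versus random
# bytes, the latter standing in for well-encrypted data.
text = b"the quick brown fox jumps over the lazy dog " * 500
noise = os.urandom(len(text))

for name, blob in [("text", text), ("random", noise)]:
    print(name, round(byte_entropy(blob), 2), round(compression_ratio(blob), 3))
```

Plain text should come out well under 8 bits per byte and compress 
heavily; the random bytes should sit close to 8 bits per byte and 
not compress at all. A byte histogram ignores ordering, so it is 
only an upper bound on the real entropy, which is why the 
compression ratio is shown alongside it.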

For what it's worth I have very little doubt that GCHQ and the 
other three- and four-letter-acronym agencies in this country and 
abroad now and again read people's mail, whatever the law says. 
I have no idea at all whether they do that a few dozen times a 
year or a few hundred times a second. My guess is the latter is 
more likely.

Which is (pretty obviously) the reasoning behind the UK government's 
reluctance to use intercept data in criminal prosecutions. 
Sometimes the ordinary police get tipped off from illegal 
intercepts.

But it's just very UNlikely that they can intelligently scan ALL 
electronic communications, even all unencrypted electronic 
communication, and extract sense from it. As Roland said, they 
would need a computer & comms infrastructure of the same order 
of magnitude as the whole rest of the world (& the NSA & the CIA 
& the FBI are big but they are not THAT big - although they are 
conveniently located right next to the US end of loads of 
transatlantic cables). And to read the stuff they would either 
have to employ ten percent of the population to snoop on the 
other ninety percent (like they used to in Romania and East 
Germany) or else have some sort of AI that we are pretty sure 
they don't have (because if they did it would be getting used 
for all sorts of other things by now).

Keyword scans don't cut it. They can only alert you to some 
possible information of interest but you still need to read it 
to be able to take action on it. That goes even for really clever 
stuff like what Google does. (Google is relevant. I suspect that 
the spooks are the followers not the leaders on the software side 
of this technology, as they have been on the hardware side for 
decades.)
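
To make the keyword point concrete, here is a toy sketch of a 
keyword scan. The watchlist and the messages are invented for 
illustration and the function name is hypothetical. All the scan 
can do is hand back candidates; somebody still has to read them.

```python
import re

# Hypothetical watchlist, for illustration only.
WATCHLIST = {"bomb", "attack"}


def flag(messages, watchlist=WATCHLIST):
    """Return the indices of messages that contain any watchlist word."""
    hits = []
    for i, msg in enumerate(messages):
        words = set(re.findall(r"[a-z]+", msg.lower()))
        if words & watchlist:
            hits.append(i)
    return hits


messages = [
    "That gig last night was the bomb",
    "Heart attack risk falls with exercise",
    "Meet at the usual place at nine",
]
print(flag(messages))  # prints [0, 1]
```

Both flagged messages are harmless, and the one message that 
avoids the obvious words is never flagged at all. The scan narrows 
down what to read but cannot do the reading.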