Question for Duncan Campbell re: Word-Spotting Capabilities

Duncan Campbell duncan at gn.apc.org
Tue, 03 Aug 1999 00:09:41 +0100


2 August 1999

The first item in this strand asked about the basis for my conclusion in 
the recent European Parliament report 
(http://www.gn.apc.org/duncan/stoa_cover.htm) that "word spotting" methods 
did not exist in any useful deployable form for sigint analysis and message 
selection when dealing with high-capacity voice communications interception, 
i.e., typically, the form of analysis required to select messages of 
intelligence interest from a digital or analogue multiplex carrying 
thousands of simultaneous telephone calls.

The writer appears not to have read the technical annexe to the IC2000 
report (http://www.gn.apc.org/duncan/ic2kreport.htm#Annexe), but asserts as 
fact what some third party thought it said: "I was recently in touch with 
Guy Polis, who tells me that ...".  The annexe in fact explains why, 
although highly trained speech recognisers can be installed on modern 
desktops and function with a reasonably low error rate, such systems do 
not translate into broadband surveillance systems, and provide no basis 
for supposing that such a capability can exist.

Many reports claiming such a capability have been based on Nicky Hager's 
book Secret Power (1996).  In fact, Nicky's account of the ECHELON system 
in New Zealand identifies ECHELON and its critical component, the 
DICTIONARY computers, as functioning only against machine-readable signals, 
that is to say data, e-mail, (OCR'd) faxes, telex and the like.  Indeed, 
he points out that New Zealand does not have the Sigint personnel to listen 
to phone calls.

In the passage quoted, Nicky referred to a different book, Spy World (1994), 
written by Mike Frost and Mike Grattan.  This does refer to an NSA-designed 
suitcase called ORATORY, which was used for Sigint collection in hostile 
city environments.  If it were true that a 1990-era suitcase could contain 
not just a microwave downconverter and full associated demux equipment, but 
also multichannel speech recognition equipment with word spotting built in, 
and all the associated recorders and control computers, that would indeed be 
a remarkable black box.

Mike Frost was a former employee of the Canadian sigint agency CSE.  Mike 
Grattan was a journalist.   Mike Grattan wrote the book.  My understanding 
is that Mike Frost has subsequently made it clear that ORATORY's capacities 
were mis-stated by Mike Grattan in the book, and that ORATORY functions 
only to recognise keywords in intercepted telex-type traffic, i.e. 
machine-readable signals.

None of the major original investigations of Sigint and Echelon in the 
last ten years has suggested that word spotting in voice channels is part 
of Echelon.  The opposite is reported.  These investigations include my own 
reports, the recent Australian Channel 9 documentary, a British TV report 
which uncovered a DICTIONARY computer in London (targeted only on telex) 
and the landmark Baltimore Sun series in 1995 by Scott Shane and Tom Bowman 
(which quite specifically reports that NSA had not achieved this task).  
However, secondary reporting by others has commonly added such a claim.

The conclusion that a capability to word-spot does not exist is based on 
(a) detailed study of the literature, including the work done at NSA's 
behest in annual DARPA-sponsored workshops; (b) sources with direct inside 
knowledge; and (c) reliable published journalistic sources.

The results in each arm are the same.  No speaker-independent word 
recognition system can produce anything resembling an acceptable error 
rate (false positives and false negatives) for Sigint use as a message 
selection technique.  Inside sources say quite specifically that, much as 
they would like to have deployed such a system, it has proved 
unachievable.  Setting aside ORATORY, which has been misdescribed, no 
reliable or first-hand journalistic source says that the technique has 
been achieved.  Several, including Shane and Bowman, indicate the opposite.
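
The weight of the error-rate point comes from scale.  As a back-of-envelope 
illustration, the sketch below multiplies a per-call false-alarm rate 
across a day's traffic.  The function name selection_yield and every 
figure in it (call volume, the fraction of calls genuinely of interest, 
the spotter's hit and false-alarm rates) are hypothetical assumptions 
chosen for illustration, not measurements of any real system or of the 
DARPA evaluations.

```python
"""Back-of-envelope base-rate arithmetic for word-spotting call selection.

All numbers used here are HYPOTHETICAL illustrations, not measured figures.
"""


def selection_yield(calls, prevalence, hit_rate, false_alarm_rate):
    """Return (true_hits, false_alarms, precision) for a batch of calls.

    calls            -- number of calls screened
    prevalence       -- fraction of calls genuinely of intelligence interest
    hit_rate         -- chance a wanted call is flagged (1 - miss rate)
    false_alarm_rate -- chance an unwanted call is flagged anyway
    """
    wanted = calls * prevalence
    unwanted = calls - wanted
    true_hits = wanted * hit_rate
    false_alarms = unwanted * false_alarm_rate
    flagged = true_hits + false_alarms
    # Precision: what share of flagged calls an analyst would actually want.
    precision = true_hits / flagged if flagged else 0.0
    return true_hits, false_alarms, precision


if __name__ == "__main__":
    # HYPOTHETICAL: 1,000,000 calls a day, 1 in 100,000 of real interest,
    # a spotter that catches 80% of those but also fires on 1% of the rest.
    hits, noise, prec = selection_yield(1_000_000, 1e-5, 0.8, 0.01)
    print(f"true hits: {hits:.0f}, false alarms: {noise:.0f}, precision: {prec:.2%}")
```

With these made-up figures, about 8 genuine hits would be buried among 
roughly 10,000 false alarms (precision below 0.1%), and 2 of the 10 wanted 
calls would still be missed.  Cutting the false-alarm rate tenfold, to 
0.1%, would still leave about 1,000 false alarms for every 8 genuine hits.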

I cannot, of course, identify confidential sources, but there are two I can 
quote.  One is Rear Admiral Bobby Inman, former NSA director, who told me 
in a 1993 interview that "I have wasted more US taxpayers' dollars trying to 
do that (word spotting in speech) than anything else in my intelligence career."

The second is Professor Steve Young, a UK director of the cutting-edge 
speech recognition firm Entropic (mentioned by John Young), who said last 
month: "It is true that word spotting is not effective -- I don't know 
anybody these days still trying to do it."

Entropic Inc are among the world leaders in speech recognition using 
Hidden Markov Models (HMMs).  There is nothing surprising about the volume 
of literature on HMMs; all of today's speech recognition packages use this 
approach.  The only other speech recognition game in town, using neural 
networks, does not produce better results.

It is true that all the early work in speech recognition was inspired by 
the Sigint agencies.  Thirty years later, like the Internet itself, the 
civilian applications have proved the more usable.  Steve Young also 
comments: "The better approach, as you suggest, is to do a full 
transcription and then use the text for topic spotting and/or information 
retrieval."
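
Steve Young's suggested approach splits the problem in two: continuous 
speech recognition produces a transcript, and topic spotting then becomes 
ordinary text processing.  The sketch below covers only that second, 
text-side stage and assumes a transcript already exists.  The topic word 
lists, the names topic_scores and best_topic, and the simple word-share 
score with its threshold are all hypothetical illustrations, not Entropic's 
method or any agency's; a practical system would more likely use 
statistically weighted terms and would have to tolerate recognition errors 
in the transcript.

```python
import re
from collections import Counter

# HYPOTHETICAL topic profiles: each topic is a small set of indicative words.
TOPICS = {
    "shipping": {"cargo", "vessel", "port", "container", "manifest"},
    "finance": {"transfer", "account", "bank", "payment", "invoice"},
}


def tokenize(text):
    """Lower-case the transcript and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def topic_scores(transcript, topics=TOPICS):
    """Score each topic by the share of transcript tokens that belong to it."""
    counts = Counter(tokenize(transcript))
    total = sum(counts.values()) or 1  # avoid dividing by zero on empty text
    return {
        name: sum(counts[w] for w in words) / total
        for name, words in topics.items()
    }


def best_topic(transcript, threshold=0.05, topics=TOPICS):
    """Return the highest-scoring topic, or None if none clears the threshold."""
    scores = topic_scores(transcript, topics)
    name, score = max(scores.items(), key=lambda kv: kv[1])
    return name if score >= threshold else None
```

For example, best_topic("the vessel left port with the cargo") returns 
"shipping" (3 of its 7 words are in the shipping list), while 
best_topic("see you on tuesday") returns None.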

In summary: voice message (i.e., phone call) selection can be done on the 
called or calling telecommunications address (including so-called "wild 
card" criteria, such as selecting all messages from a particular city 
suburb and/or at a particular time), or by individual speaker recognition.  
Word spotting is not available, but as computational power increases, 
topic spotting by running continuous speech recognition engines on a 
per-channel basis will become affordable, at first for high-value targets.
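
Address-based selection works on call metadata rather than on the speech 
itself.  A minimal sketch follows, assuming a hypothetical record holding 
calling number, called number and start time.  The CallRecord, Selector 
and select names, the number format and the pattern syntax (shell-style 
wildcards via Python's fnmatchcase) are illustrative assumptions, not a 
description of any real system; a number prefix stands in for "a 
particular city suburb".

```python
from dataclasses import dataclass
from datetime import datetime, time
from fnmatch import fnmatchcase


@dataclass
class CallRecord:
    """HYPOTHETICAL per-call metadata; no real interception format is implied."""
    calling: str     # calling number, digits only
    called: str      # called number, digits only
    start: datetime  # call start time


@dataclass
class Selector:
    """A 'wild card' criterion: number pattern plus optional time-of-day window."""
    pattern: str                  # e.g. "44207946*" matches a whole number block
    window_start: time = time.min
    window_end: time = time.max

    def matches(self, call: CallRecord) -> bool:
        # Simple start <= t <= end check; windows crossing midnight not handled.
        in_window = self.window_start <= call.start.time() <= self.window_end
        addr_hit = (fnmatchcase(call.calling, self.pattern)
                    or fnmatchcase(call.called, self.pattern))
        return in_window and addr_hit


def select(calls, selectors):
    """Keep calls matched by at least one selector (metadata only, no audio)."""
    return [c for c in calls if any(s.matches(c) for s in selectors)]
```

In this sketch a time window that crosses midnight would need two 
selectors, since the check is a plain start <= t <= end comparison.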

Duncan Campbell