Question for Duncan Campbell re: Word-Spotting Capabilities
Duncan Campbell
duncan at gn.apc.org
Tue, 03 Aug 1999 00:09:41 +0100
2 August 1999
The first item in this strand asked about the basis for my conclusion in
the recent European Parliament report
(http://www.gn.apc.org/duncan/stoa_cover.htm) that "word spotting" methods
did not exist in any useful deployable form for sigint analysis and message
selection when dealing with high capacity voice communications interception
- i.e., typically, the form of analysis required to select messages of
intelligence interest from a digital or analogue multiplex carrying
thousands of simultaneous telephone calls.
The writer appears not to have read the technical annexe to the IC2000
report (http://www.gn.apc.org/duncan/ic2kreport.htm#Annexe), but asserts as
fact what some third party thought it said: "I was recently in touch with
Guy Polis, who tells me that ..." The annexe in fact explains why,
although highly trained speech recognisers can be installed on modern
desktops and function with a reasonably low error rate, such systems
cannot be translated into broadband surveillance systems, and provide no
basis for supposing that such a capability exists.
There have been many reports of such a capability based on Nicky Hager's
book Secret Power (1996). In fact, Nicky's account of the ECHELON system
in New Zealand identifies ECHELON and its critical component, the
DICTIONARY computers, as functioning only against machine readable signals
- that is to say data, e-mail, (OCR'd) faxes, telex and the like. Indeed,
he points out that New Zealand does not have the Sigint personnel to listen
to phone calls.
In the passage quoted, Nicky referred to a different book, Spy World (1994)
written by Mike Frost and Mike Grattan. This does refer to an NSA-designed
suitcase called ORATORY, which was used for Sigint collection in hostile
city environments. If it were true that a 1990-era suitcase could contain
not just a microwave downconverter and full associated demux equipment,
but also multichannel speech recognition equipment with word spotting
built in, and all the associated recorders and control computers, that
would indeed be a remarkable black box.
Mike Frost was a former employee of the Canadian sigint agency CSE. Mike
Grattan was a journalist. Mike Grattan wrote the book. My understanding
is that Mike Frost has subsequently made it clear that ORATORY's capacities
were misstated by Mike Grattan in the book, and that ORATORY functions
only to recognise keywords in intercepted telex-type traffic, i.e.,
machine-readable signals.
None of the major original investigations of Sigint and Echelon in the
last ten years has suggested that word spotting in voice channels is part
of Echelon. The opposite is reported. These investigations include my own
reports, the recent Australian Channel 9 documentary, a British TV report
that uncovered a DICTIONARY computer in London (targeted only on telex),
and the landmark Baltimore Sun series in 1995 by Scott Shane and Tom
Bowman (which quite specifically reports that NSA had not achieved this
task). However, secondary reporting by others has commonly added such a
claim.
The conclusion that a capability to word-spot does not exist is based on
(a) detailed study of the literature, including the work done at NSA's
behest in annual DARPA-sponsored workshops; (b) sources with direct inside
knowledge; and (c) reliable published journalistic sources.
The results in each arm are the same. No speaker-independent word
recognition system can produce anything resembling an acceptable error
rate (false positives and false negatives) for Sigint use as a message
selection technique. Inside sources say quite specifically that, much as
they would like to have deployed such a system, it has been
unachievable. Setting aside the misdescribed ORATORY, no reliable or
first-hand journalistic source says that the technique has been
achieved. Several, including Shane and Bowman, indicate the opposite.
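Why the false positive rate matters so much when selecting from thousands
of channels can be shown with simple base-rate arithmetic. The sketch
below is illustrative only: the call volume, the fraction of calls of
genuine interest, and the miss and false-alarm rates are hypothetical
numbers chosen for the example, not measured figures for any real system,
and the false-alarm rate is assumed to be expressed per call.

```python
def expected_selections(calls: int, target_fraction: float,
                        miss_rate: float, false_alarm_rate: float) -> tuple[float, float]:
    """Return (true_hits, false_alarms) expected from a hypothetical word spotter.

    All inputs are illustrative assumptions; false_alarm_rate is per call.
    """
    targets = calls * target_fraction          # calls genuinely of interest
    non_targets = calls - targets              # everything else on the multiplex
    true_hits = targets * (1 - miss_rate)      # targets the spotter does flag
    false_alarms = non_targets * false_alarm_rate  # innocent calls wrongly flagged
    return true_hits, false_alarms


def precision(true_hits: float, false_alarms: float) -> float:
    """Fraction of selected calls that are actually of interest."""
    total = true_hits + false_alarms
    return true_hits / total if total else 0.0


# Hypothetical: 100,000 calls, 1 in 10,000 of interest, 20% missed, 1% false alarms.
hits, false_alarms = expected_selections(100_000, 0.0001, 0.2, 0.01)
print(f"true hits: {hits:.1f}, false alarms: {false_alarms:.1f}, "
      f"precision: {precision(hits, false_alarms):.4f}")
```

Under these assumed figures, roughly 8 genuine calls would be buried among
about 1,000 false alarms, so fewer than 1 in 100 selected calls would be of
interest, even though a 1% false-alarm rate sounds small.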
I cannot of course identify confidential sources, but there are two I can
quote. One is Rear Admiral Bobby Inman, former NSA director, who told me
in a 1993 interview that "I have wasted more US taxpayers' dollars trying to
do that (word spotting in speech) than anything else in my intelligence career."
The second is Professor Steve Young, a UK director of the cutting-edge
speech recognition firm Entropic (mentioned by John Young), who said last
month "It is true that word spotting is not effective -- I don't know
anybody these days still trying to do it."
Entropic Inc are among the world leaders in speech recognition using
Hidden Markov Models (HMMs). There is nothing surprising about the volume
of literature on HMMs; all of today's speech recognition packages use
them. The only other speech recognition game in town, using
neural networks, does not produce better results.
It is true that all the early work in speech recognition was inspired by
the Sigint agencies. Thirty years later, as with the Internet itself, the
civilian applications have proved the more usable. Steve Young also
comments: "The better approach, as you suggest, is to do a full
transcription and then use the text for topic spotting and/or information
retrieval."
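The "full transcription, then topic spotting" approach can be sketched as
scoring a whole transcript against a weighted list of topic terms. This is
a minimal illustration under stated assumptions: the transcript is taken
to be the text output of some continuous speech recognition engine, and
the term list, weights, and normalisation by transcript length are
hypothetical choices, not Entropic's or anyone else's actual method.

```python
import re
from collections import Counter


def topic_score(transcript: str, topic_terms: dict[str, float]) -> float:
    """Score a (hypothetical) recogniser transcript against weighted topic terms.

    Returns the weighted count of topic-term occurrences divided by the
    number of words, so long and short calls are roughly comparable.
    """
    tokens = re.findall(r"[a-z']+", transcript.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    weighted_hits = sum(weight * counts[term] for term, weight in topic_terms.items())
    return weighted_hits / len(tokens)


# Hypothetical topic profile and transcript.
profile = {"shipment": 2.0, "port": 1.0}
print(topic_score("The shipment leaves the port on Tuesday", profile))
```

Because evidence is pooled across the whole transcript, a single
misrecognised word changes the score only partially rather than deciding
selection on its own, unlike spotting one keyword in the audio.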
In summary: voice message (i.e., phone call) selection can be done on the
called or calling telecommunications address (including so-called "wild
card" criteria, selecting all messages from a particular city suburb
and/or at a particular time), or by individual speaker recognition. Word
spotting is not available, but as computational power increases, topic
spotting by running continuous speech recognition engines on a per-channel
basis will become affordable, at first for high-value targets.
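Selection on called or calling address with "wild card" criteria can be
sketched as pattern matching over call records. In this illustration the
CallRecord type, its field names, the international number format, and
the use of a number prefix to stand for a city suburb are all
hypothetical assumptions; the wildcard matching uses Python's standard
fnmatch module.

```python
from dataclasses import dataclass
from datetime import datetime, time
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class CallRecord:
    # Hypothetical record layout for one intercepted call.
    calling: str      # calling number, e.g. "+441715550100"
    called: str       # called number
    start: datetime   # call start time


def selected(call: CallRecord, patterns: list[str],
             window: Optional[tuple[time, time]] = None) -> bool:
    """Select a call if either address matches a wildcard pattern.

    If a time window is given, the call must also start inside it.
    The window is assumed not to wrap past midnight.
    """
    if not any(fnmatchcase(call.calling, p) or fnmatchcase(call.called, p)
               for p in patterns):
        return False
    if window is not None:
        earliest, latest = window
        if not earliest <= call.start.time() <= latest:
            return False
    return True


# Hypothetical: everything to or from one number block, late evening only.
call = CallRecord("+441815550199", "+33155550111", datetime(1999, 8, 2, 23, 15))
print(selected(call, ["+44181*"], (time(22, 0), time(23, 59))))
```

No speech processing is involved here: selection relies only on signalling
information that is already machine readable.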
Duncan Campbell