This article is about how Professor Hawking's voice was replaced with a software emulation. Lots of details are missed out for the sake of brevity, but I hope it gives some idea about what was involved in the project.
I have an interest in computers which was started with the Sinclair ZX81. Nearly everything I've done since then has had a strong computing element. Currently I run the computer systems for one division of the Department of Engineering at the University of Cambridge.
A key person in Professor Hawking's support team is the Graduate Assistant. They look after all technical aspects of Professor Hawking's wheelchair, including the speech system. They have the important job of making sure that it all works whenever Professor Hawking needs it. If it doesn't work, they drop whatever they're doing to fix whatever the issue is.
I worked with the last two Graduate Assistants, Sam Blackburn and later Jonathan Wood, to update the part of Professor Hawking's communication system that converts text to speech. There were several objectives, including removing the dependence on ageing hardware, now about 30 years old, and reducing the complexity of the installation.
Articles published about this work would have you believe that it was a single attempt to do this that just needed enough effort to make it work. That would have been very high risk, and the reality is that the project I worked on was one among several being pursued simultaneously. Whichever project produced an acceptable result first would likely be the one adopted.
An approach that had already been tried by Phonetic Arts and others, and which was largely successful, was to train a modern voice system using a script designed to be phoneme complete. This is particularly easy to do with the voice synthesizer board because the computer isn't going to get bored reading training data. I haven't heard the result myself, but understand that it was pretty convincing, at least to other people. However it wasn't close enough to be accepted by Professor Hawking.
The graduate assistants had also previously looked, with Intel and Ed Bruckert (inventor of DECTalk), at modifying the DECTalk voice, which had a common ancestry with MITalk, Klattalk, and SpeechPlus. But again the resulting voice was not accepted by Professor Hawking.
Another approach was to find more SpeechPlus CallText boards. When I started on the project, there were two working CallText 5010 boards in Cambridge: the one Professor Hawking was actually using on his chair, and another board on his backup chair. Intel had another which they used for developing the ACAT text entry system. By the time the project finished, Professor Hawking had five boards in total – three 5010 models, and two earlier generation 5000 models that were updated to make them sound the same. Three of these were not working at various stages and were repaired by Paweł Woźniak, an electronics engineering student in Huddersfield. Some of the repairs were possible thanks to Eric Dorsey, who was on the team that created the voice system in the first place and who was in touch with Hari Vyas, the original electronics designer of the board; Hari found some old schematics for the 5000 board in his garage!
There were other plans too. Eric, Jonathan and Patti Price, a speech synthesis expert who was a postdoc at MIT under Dennis Klatt in the early 80s, tried to find the original sources for the program. Eric tracked down other original members of the 1980s SpeechPlus team, but none had any source code. Instead they then tracked the company buyouts. In 1990, SpeechPlus was acquired by Centigram Communications, which was in turn bought out by Lernout & Hauspie (1997), then ScanSoft (2001), and then Nuance Communications (2005). Amazingly, via his contacts at Nuance, Eric managed to locate the SpeechPlus source code for a later version on backup tapes at Nuance, from which we might have been able to reverse out the changes to get the 1986 version. It was for the lawyers to sort out an NDA that would protect everyone's interests. That ultimately came to nothing because we had another approach that worked first.
I was fortunate to work on the approach that was successful, accepted by Professor Hawking and installed on his chair. This was an emulator of the hardware board, which could then run the same program and therefore produce the same voice. This work was done without the benefit of source code. We did get schematics, but that was very late on in the project and we were well past the stage where they would have been useful.
The project started with a chance meeting in 2010 between myself and Sam Blackburn, Professor Hawking's graduate assistant at the time. When we started we knew nothing about the boards, or who wrote the original software, and we had no useful contacts. We just had the two boards in Cambridge, and neither of them was in any sense spare – both boards need to be available for use at any moment.
In this position, there is no course of action that is completely without risk. If we do nothing, the boards might fail anyway. If we do something, we might accidentally damage a board, leaving Professor Hawking completely dependent on a single, old hardware board. I'm glad to say that Professor Hawking let us investigate the backup board, and that we didn't destroy it in the process!
The first step was to identify the major components. The board used mostly small and medium scale integration devices, such as the 74-series integrated circuits, with some programmable logic arrays to reduce the chip count. We could also identify a USART, DTMF encoder and decoder, RAM, EPROMs, DAC and various analogue components.
Sam took some high quality photographs of the board, and by careful study, we were able to construct the beginning of a functional block diagram of the board.
The next step was to take off the heatsink, which was probably the most nerve-wracking step in the whole project – we knew nothing about the packaging of the device underneath, and if we broke it, we probably wouldn't be able to fix it. We got to the point where it simply wasn't practical to defer this any longer.
It turned out to be an Intel 80188 in an ordinary IC package, so we hadn't needed to be so nervous. The CPU is similar to the Intel CPUs used in contemporaneous computers, and it's easy to find disassemblers.
Following the logic of the code, I was able to find initialisation of the PCB (peripheral control block), which sets up basic CPU parameters such as where memory and other hardware devices are mapped into memory address space.
After a bit of analysis, I was able to prove that one of the EPROMs was not required at all – there was no code path that could ever result in any program or data being used from that device. It turned out that the board that I hadn't seen didn't have that EPROM, and we could confirm, by removing it from the backup board, that it was not in fact required.
I started work on understanding some of the code.
Even in unlabelled assembler instructions, it is very easy to spot memory testing. I already knew where the memory was mapped, but to report the success or failure of memory testing, the program has to do something visible. There isn't a computer screen attached so it has to communicate by setting LEDs. Identifying that code tells you where the data latch that drives the LEDs is mapped into memory address space.
After memory testing, it does other hardware initialisation, and by looking at how each hardware device is initialised, you can figure out which one is which, and from that the address mapping for all the devices.
The USART initialisation is quite complicated. The device has several modes of operation, and setting which is active produces distinctive code. Finding the code that sets the mode tells you the mapping of the USART into address space.
I then found where the program branches into either reading from the USART or using some text inside the EPROM. That corresponds to a debug switch, which is part of a bank of DIP switches, and that told me the address of those switches.
During this process, I was feeding information to Sam about deductions I had made about the hardware based on what I was looking at in software. He was feeding information to me about his discoveries based on tracing PCB tracks very carefully. This allowed corrections and refinements to be made in the functional block diagram.
We understood enough at this point to believe that emulation of the hardware was a practicable approach.
We quickly rejected existing x86 PC emulators. The hardware on the board has nothing in common with a PC, and it was easier to write a CPU emulator from scratch rather than pick apart the processor implementation in someone else's emulator.
The CPU emulator doesn't need to be complete; for example, I ignored the unused BCD instructions. The program didn't adjust the memory map after initialisation, so instead of implementing the memory mapping control registers in the PCB, I hard-coded the memory map used on the hardware board.
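A hard-coded memory map of this kind can be sketched in a few lines. This is only an illustration of the idea, not the emulator's actual code: the region names, base addresses and sizes below are invented for the example, since the real board's layout isn't given here.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    base: int
    size: int
    writable: bool
    data: bytearray


class FixedMemoryMap:
    """Memory bus whose layout is fixed at construction (hypothetical addresses)."""

    def __init__(self):
        # Assumed layout for illustration only; not the real board's map.
        self.regions = [
            Region("ram", 0x00000, 0x4000, True, bytearray(0x4000)),
            Region("eprom", 0xF0000, 0x10000, False, bytearray(0x10000)),
        ]

    def _find(self, addr):
        addr &= 0xFFFFF  # the 80188 has a 20-bit physical address space
        for region in self.regions:
            if region.base <= addr < region.base + region.size:
                return region, addr - region.base
        return None, 0

    def read8(self, addr):
        region, offset = self._find(addr)
        if region is None:
            return 0xFF  # unmapped reads return 0xFF in this sketch
        return region.data[offset]

    def write8(self, addr, value):
        region, offset = self._find(addr)
        if region is not None and region.writable:
            region.data[offset] = value & 0xFF
        # Writes to ROM or unmapped space are silently ignored.
```

Because the table never changes after construction, there is no need to model the mapping registers themselves; the trade-off is that a program which did remap memory at run time would not work.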
The more interesting parts of the emulator come when you start looking at I/O. Rather than fully emulating a USART, it is only necessary to provide the bare essentials, namely Rx/Tx registers, a few status flags, and interrupt signals for RxFull and TxEmpty.
To complete the USART emulation, you need to run interrupt routines when input data arrives, which turns out to be very easy. With the emulated DIP switches set to enable hardware flow control, the program enables and disables the receiver according to whether there is any space to store another character. The interrupt routine can be triggered by as simple a test as checking that both of 'end of text input not reached' and 'receiver enabled' are true. Initially it'll be triggered much faster than in the hardware board – as fast as the interrupt routine will allow until the input buffer is full. Thereafter the interrupt rate will be regulated by how quickly the program can convert text to speech; the emulator won't spend all of its time servicing interrupts.
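As a rough sketch of how little is needed, the following models a data register, two status flags, and the receive-interrupt test described above. The class name, method names and status bit positions are assumptions made for the example; they are not taken from the real device or from the emulator.

```python
class MinimalUsart:
    """Bare-essentials USART model: data register, status flags, Rx interrupt test."""

    RX_FULL = 0x01   # hypothetical bit positions
    TX_EMPTY = 0x02

    def __init__(self, input_bytes):
        self.pending = bytes(input_bytes)  # text still to be delivered
        self.pos = 0
        self.rx_data = 0
        self.rx_full = False
        self.receiver_enabled = False      # toggled by the emulated program
        self.sent = bytearray()

    def rx_interrupt_due(self):
        # The simple test from the text: input remains and the receiver is enabled.
        return self.pos < len(self.pending) and self.receiver_enabled

    def deliver_next(self):
        """Load the next input byte into the Rx register before running the ISR."""
        self.rx_data = self.pending[self.pos]
        self.pos += 1
        self.rx_full = True

    def read_data(self):
        self.rx_full = False
        return self.rx_data

    def write_data(self, value):
        self.sent.append(value & 0xFF)

    def read_status(self):
        status = self.TX_EMPTY  # the transmitter is always ready in this sketch
        if self.rx_full:
            status |= self.RX_FULL
        return status
```

The emulator's main loop would call `rx_interrupt_due()` between instructions and, when it returns true, call `deliver_next()` and then enter the program's own receive interrupt routine.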
Early in initialisation the program writes to and reads from the state of a hardware device, and I didn't recognise the pattern at all. Looking at the code, it was clear that there was only one code path that would complete, and that forcing certain values for the reads from the device would let the program make progress. Other values would just cause a loop until the correct value was received. By forcing a series of values that depend on the current value of the instruction pointer, the program would get past all the hardware initialisation and start the main program.
I could see that there were several stages of the transformation of the input text, going from ordinary English text to a more phonetic form, with each stage storing its data in a linked list. The data appears to go one way only, so you can think of the stages of processing like a pipeline in a Unix shell. With nothing to read from the output of the pipeline, the input blocks pretty quickly, as soon as the buffers in the pipeline fill.
After more analysis, I found the routine that reads data out, and not surprisingly it's another interrupt routine. Before actually trying to output data, the routine checks if there is data eligible to be output. In the absence of any better information for how the output interrupt is triggered, I used that same condition to trigger the interrupt handler.
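The trick of forcing read values according to the instruction pointer might look something like the sketch below. The class, its methods, and the addresses and values in the usage example are all hypothetical; the real ones came from reading the disassembly.

```python
class ScriptedDevice:
    """Stand-in for an unidentified device: reads return canned values
    chosen by the address of the instruction doing the read."""

    def __init__(self, responses, default=0x00):
        self.responses = dict(responses)  # instruction pointer -> value to return
        self.default = default
        self.writes = []                  # log of (ip, value) for later analysis

    def read(self, ip):
        return self.responses.get(ip, self.default)

    def write(self, ip, value):
        self.writes.append((ip, value))


# Hypothetical addresses and values, for illustration only.
device = ScriptedDevice({0x1234: 0x80, 0x1250: 0x01})
```

Logging the writes costs nothing and keeps a record of what the program sent, which is useful once the device is eventually identified.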
In the interrupt routine, I could see data being sent to the same unknown device that had got stuck in initialisation. Again I could see that there was only one code path, so I could emulate the hardware responses without knowing their meaning.
Now I could enter text data, and see the corresponding output data. The problem was that it was implausibly short – it couldn't be the waveforms that I was expecting to see.
I also didn't know where the output was going, but there was only one IC that we hadn't accounted for, which looked superficially like yet another EPROM.
At about this time, Jon Peatfield, a Computer Officer in DAMTP, came on board and was interested in identifying what this chip really was. He identified it as the programmable version of the 7720 DSP chip. This chip has 512 words of 23-bit instruction ROM, 510 words of 13-bit data ROM, and 128 words of data RAM, arranged in a Harvard architecture. It's not at all like a modern CPU. Its main claim to fame is that it does multiplications very quickly, much faster than the x86 can.
So now we know that the x86 program sends voice data, in short packets, to the DSP chip, and that the DSP chip generates the audio output. We don't know the structure of the data, and hopefully we don't need to know.
I reached out to various groups that might have a programmer/reader for the 7720 DSP, and got precisely no response, so Sam built a reader to get the ROM images out of the device. Initially he tried programming and reading from some blank DSP chips sourced from eBay. Once he was confident that it worked correctly, he read out the instruction and data ROM images from the real DSP.
Jon got to work on understanding the DSP code, and then on emulating the DSP. He took an existing emulator, but got stuck when he found an illegal instruction in the main loop of the DSP code. An illegal instruction might well do something useful on real hardware, but it won't be in the data sheet. It would take considerable investigation to figure it out.
Sam moved on to another job, which made access to the hardware much harder, but work continued.
Then disaster – Jon died. Overnight, everyone's priorities changed to be about the continuation of his main job, which was running the department's computer systems.
Some time later Jonathan, the new Graduate Assistant, managed to track down the original creators of the hardware board, but the project wasn't restarted at this point.
Later still, in 2017, Paweł was working on a related project with Jonathan and by chance stumbled upon the previous work by Jon and myself. He wondered what was the obstacle that couldn't be overcome and got in contact with me. After discovering that it wasn't a technical problem that stopped the project, he requested to work on the voice system for his undergraduate project.
This was the first I knew that Jon's work hadn't been lost, and I had wanted to resume the project anyway, so Jonathan got us together to make a plan.
Jonathan had got back in contact with Eric Dorsey and Patti Price from the team that made the hardware board. It was useful to have their recollections of the design. These matched what the emulator was doing, which gave confidence that the work done so far was correct.
Paweł borrowed a board so he could add probes to monitor the data between the CPU and the DSP. He was then able to compare my emulator against the hardware. For most text there was an exact match. This was the first time the emulator had been properly validated, so it was very pleasing to get such a good result.
For some text, there were differences, but we decided to ignore that for now. We now knew enough that for at least some of the time we could consider the data as flowing in one direction only from the CPU to the DSP. That meant we could focus effort on the DSP only, rather than the CPU and DSP together as a coupled system.
In the time since Jon's death, the state of DSP emulators had improved considerably. In particular the Higan emulator contains a good implementation of a later version of the DSP chip. The reason to start with an existing DSP emulator is that the DSP datasheet is incomplete. It doesn't give anywhere near enough information to be able to write reliable programs, let alone an emulator. Byuu had already done all the work required to fill in those gaps. The processor emulation implementation in Higan has clean interfaces that allow it to be easily separated from the rest of the Higan code and integrated with our emulator instead.
Paweł tracked down the final details of how the DSP is physically connected to the rest of the hardware. The DSP has two status flags that are externally visible, and one of these connects to an interrupt line of the x86. Another channel, less obvious, is that the x86 can read the top of the status register, which gives it access to two more flags in the status register.
The DSP is starved of registers, so DR, the external data register, gets overloaded as an ordinary processor register. A naive approach of feeding data in whenever the DR register is read won't work because of this overloading – you have to set DR only when the program is expecting it.
The protocol between the CPU and DSP involves the DSP triggering an x86 interrupt to send a frame, and using another status flag to signal the x86 to send the next word within the frame. By using this protocol, DR is only set externally when the program is expecting to receive a frame.
Checking the detail of the DSP program shows that, without exception, every conditional jump based on the RQM status flag is followed by exactly one read of the DR data register, which allows for a simpler protocol – update DR whenever RQM is tested.
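The simplified rule can be sketched as a small port object that the DSP core calls into whenever it executes a jump conditional on RQM. Everything here is an assumption made for illustration: the class and method names, the 16-bit mask on DR, and the choice to report "no data" so that the caller can pause the DSP.

```python
from collections import deque


class DataRegisterPort:
    """Models DR being refreshed from the host only when the DSP tests RQM."""

    def __init__(self):
        self.host_words = deque()  # words queued by the x86 side
        self.dr = 0                # the DSP program also uses DR as scratch

    def host_send(self, word):
        self.host_words.append(word & 0xFFFF)  # DR assumed 16 bits wide

    def on_rqm_test(self):
        """Hook for a jump conditional on RQM (hypothetical name).

        Returns True if DR was loaded with a fresh host word, or False if
        the caller should pause the DSP until more data arrives.
        """
        if not self.host_words:
            return False
        self.dr = self.host_words.popleft()
        return True
```

Between RQM tests the DSP program is free to overwrite `dr` for its own purposes, and host data never lands in the register at a moment when the program isn't about to read it.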
I found a bug in the output routine of the DSP – a race between the main DSP program and its interrupt routine. The interrupt routine outputs a new sound value to the DAC whenever it is triggered by one of the x86 timers. In the emulator, the audio clock is provided by the sound driver. I don't want two systems clocking data, especially if they aren't synchronised, so I fixed the race by removing the interrupt routine. The DSP emulator just outputs sound values as fast as it can, and only the sound driver regulates the data rate.
Connecting up the emulators produced plausible audio values from the DSP's serial output, and putting that into a sound device produced fragments of a recognisably human voice.
After some adjustments, I was able to get the voice to be Professor Hawking's voice – most of the time. On Christmas Day, 2017, Paweł and I sent Jonathan the first recording of the new voice. It said, "Hello, welcome to the emulator. Merry Christmas from Paweł and Peter."
Following on from this, Paweł recorded some samples of speech from the hardware and from the software emulator. Patti analysed these to give us an objective measure of how close we were to the original sound.
There were still glitches, including a sound that was completely wrong, as well as there being odd pauses in longer sentences. There was also the issue we knew about from earlier, of the data packets between the x86 and the DSP not always matching, particularly in sentences with commas.
Paweł did some more experiments that revealed that the hardware board he had also got one of the sounds wrong. One might suppose that the emulator was correct after all, but the sound wasn't familiar to Jonathan. Paweł did some more digging and found differences between the hardware boards. He re-read all the EPROMs and found small differences there too. I compared the two versions of the x86 program and found that one of them had a bug that could not be explained by a programming error in the source. For about 20 years the boards had no covers over the EPROM windows, which is not ideal.
Fixing the incorrect EPROM image fixed the wrong sound, but didn't fix the odd pauses. Jonathan had established that the pause only happened for long sentences. I took the long sentence that demonstrated the problem, and cut bits off the start until it worked. Then I compared what was going through the different processing stages until there was a difference. I homed in on the condition that inserted the pause or not, and found that the pause was inserted when a linked list of data from the previous stage was empty. This showed that the pause was intentionally added, and I just needed to figure out why.
After thinking a bit harder, I realised that its purpose was to predict when the input data isn't arriving fast enough to keep the next processing stage busy. The buffers in the data pipeline aren't large enough for an entire sentence, so for long sentences it starts speaking before it knows for sure that the end of the sentence will arrive in time. It is therefore possible to run out of input data at an inopportune moment; by deliberately inserting pauses at convenient points, it can slow down the output to match the input rate.
I realised that I had made a fundamental error. Just because the data words flow in one direction from the CPU to the DSP, doesn't mean that information flows in one direction. That CPU/DSP communication is subject to flow control, which is information in the opposite direction, and that information is being used for decision making in the x86 program; the CPU and DSP programs are more closely coupled than I had realised.
At this point I could have set up a full emulation that got all the timing consistent with the original board, since we know that works. The DSP timing is easy – its instructions always take two cycles. The x86 is harder, but with enough effort the timing can be reproduced exactly. With a bit more effort we can emulate the timing of other hardware devices.
I wasn't keen on this approach because what we had worked nearly perfectly without being careful with timing at all. If we did a full emulation, we'd lose a nice property of the emulated version, namely that when there is no work to do, the emulators of the x86 and the DSP completely stop rather than busy-waiting. If you do that, the x86 emulator spends most of the time stopped, which nearly halves the CPU requirement. The ultimate aim was to run alongside ACAT and other software on Professor Hawking's laptop, so it's polite not to hog all the CPU.
An approach I considered was to just delay extracting data from the program. By triggering the output interrupt as soon as data is ready, I was taking data out too fast, which was effectively making the x86 program run slow compared to the DSP program. That causes the buffers in the data pipeline to run nearly empty most of the time, which ran the risk of them becoming completely empty and triggering the pause to be inserted. By deferring running the output routine I can counter that effect, but I don't like that approach – I don't have a good way to choose how long to defer and it feels fragile – it could go wrong later with different text input.
The approach I took was to extract output data as late as possible rather than as soon as possible, so that the pipeline buffers would run full for as long as possible. If the buffers never empty, no pauses get inserted.
At the heart of the program there is a loop that contains a set of routines which do work if there is space to store their output. If none of the main loop's subroutines are called, and the input interrupt routine is not called, no progress is being made so that's when the output routine should be called.
It didn't work properly. It sometimes did, but sometimes it got stuck in a loop. The criteria for running one of the main loop's subroutines were being met, but in reality it made no progress. That's because there was supposed to be an interaction between that subroutine and the output interrupt routine, but under these circumstances the output interrupt routine was never called.
Rather than looking to see which subroutines are called, I looked at the effect on the program's data structures. All the subroutines read from one buffer and output to another buffer, so I can check for progress by monitoring the size of all buffers and the status of all condition variables. If none of them change during the main loop, and the input interrupt routine isn't run, no progress is being made so the output routine should be called.
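The progress test can be illustrated with a toy version. This is a sketch under simplifying assumptions: the function names are made up, the buffers are plain Python lists rather than the program's linked lists, and the flags are a dictionary standing in for its condition variables.

```python
def snapshot(buffers, flags):
    """Capture the observable state: the size of every buffer and every flag value."""
    return (tuple(len(b) for b in buffers), tuple(flags.values()))


def step(main_loop_pass, input_isr_ran, run_output, buffers, flags):
    """Run one pass of the main loop; call the output routine only if nothing changed."""
    before = snapshot(buffers, flags)
    main_loop_pass()
    made_progress = snapshot(buffers, flags) != before or input_isr_ran()
    if not made_progress:
        run_output()
    return made_progress


# Toy two-stage pipeline: stage_b holds at most two items.
stage_a, stage_b, spoken = [1, 2, 3], [], []

def main_loop_pass():
    if stage_a and len(stage_b) < 2:
        stage_b.append(stage_a.pop(0))

def run_output():
    if stage_b:
        spoken.append(stage_b.pop(0))

for _ in range(3):
    step(main_loop_pass, lambda: False, run_output, [stage_a, stage_b], {})
```

In the toy run, the first two passes fill `stage_b`; only on the third pass, when nothing can move, is an item taken out. Output is therefore extracted as late as possible and the buffers stay as full as they can.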
That worked perfectly.
I spent some time looking at the DSP code that has the illegal instruction to see if I could understand the intent of the code. Having done that, I wrote a corrected version, but it was slightly longer so I used some unused instruction words near the end of the ROM. Out of curiosity, I put a trap in the emulator to see if I could detect when the code runs – it isn't ever run. Upon looking further, it can only run when the x86 doesn't produce data fast enough to keep the DSP busy. This never happens in the emulator – if there is no data I just stop the program until data arrives. With ample CPU available, there is no need to guard against the x86 not keeping up with the DSP, so a gap in the data only ever corresponds to intentional silence.
Jonathan, Paweł and I demoed it to Professor Hawking in January and got the okay. Phew!
One annoying feature, barely audible on my laptop but annoyingly loud on the chair's sound system, is a click just before the voice starts and just after it stops. This is because the DSP outputs sound values with a small DC offset. On the hardware board this doesn't matter – it outputs continuously and doesn't produce an audible effect. The emulator starts and stops the DSP depending on whether the voice is speaking or not, and the sudden transition when the sound driver runs out of input causes a click. It's easy to fix though – I sacrificed some gain, to give space to apply the opposite bias to the sound values.
The next step was to install onto Professor Hawking's wheelchair. Out of curiosity, I decided to try it out on a Raspberry Pi 2 computer. It worked well, but the output had lots of unwanted noise. Jonathan tried it with an external USB audio device and the resulting audio was very good. On 26th January 2018, Jonathan took it over to the Professor's house and installed it onto his wheelchair. Professor Hawking wrote, "I love it."
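The arithmetic behind that fix is simple enough to sketch. The function below is an illustration rather than the emulator's code: it assumes signed 16-bit samples, and the bias value in the usage example is invented. The gain is chosen so that even the most extreme input still fits in range after the bias is removed.

```python
def remove_dc_offset(samples, bias, full_scale=32767):
    """Cancel a constant bias in signed samples, trading a little gain for headroom.

    The largest possible magnitude of (sample - bias) is full_scale + 1 + |bias|,
    so scaling by full_scale over that value guarantees the result cannot clip.
    """
    gain = full_scale / (full_scale + 1 + abs(bias))
    return [round((s - bias) * gain) for s in samples]


# With a hypothetical bias of 100, silence at the DSP output maps to true zero.
quiet = remove_dc_offset([100, 100, 100], bias=100)
```

Once silence is at zero, starting or stopping the stream no longer produces a step in the output level, which is what was being heard as a click.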
Later we upgraded the USB DAC in order to provide greater volume, and upgraded the Raspberry Pi 2 to a Pi 3. The audio volume was now similar to that from the hardware voice synthesizer, but with a much cleaner output, with no discernible noise. This reduction in noise made the voice clearer, while still being very much his voice.
On 14 March, 2018, on Pi Day, and on the anniversary of Einstein's birth, Professor Hawking died. It was this voice emulator that he used to communicate his final spoken words to his close friends and family. Although deeply saddened by his sudden loss, it is a small comfort to know that he enjoyed using it.
The reason to put so much effort into creating the emulator was because the voice is part of Professor Hawking's identity. For the same reason, I don't want the voice to be used again for someone else despite having devoted so much time to the project. I have said that I would be happy for the Professor's speeches to be read out by the emulator. There are a couple of issues to consider. Firstly, and most importantly, the speeches are Professor Hawking's words, not someone else's. Secondly, the recordings of the Professor's speeches already exist. In this context, the emulator provides another way to reproduce the recording. It's hard to see any other use case that would be a reasonable way to use the emulator.
A number of people wondered if the emulator is available for download. Even if it was a good idea to make it available, we don't have rights to distribute the program that runs inside the emulator, and the emulator is useless without that program.
Peter Benie <firstname.lastname@example.org>