Recovery of "Deleted" Email

Mon Aug 15 17:38:50 BST 2011

On Mon, Aug 15, 2011 at 09:57:59AM +0100, Ian Batten wrote:
> I was asked this question yesterday:

> If two people, communicating via ordinary commercial webmail
> services, exchange unencrypted email, and they both then delete the
> messages using the normal deletion facilities the providers' usual
> interfaces offer, how recoverable are the messages by a discovery
> motion?
> 
> His contention was that, for practical purposes, to a sufficiently
> resourced adversary all email is discoverable indefinitely, or,
> alternatively, you cannot know that it is not discoverable at any
> specific point in time.
[...]

When I wrote/helped to admin one of the lesser known UK webmail
services, it worked as I describe below.  Whether or not this is how
all webmail services work, I have no idea.  There's quite a lot of
information available about how Gmail works: look for descriptions of
"Google File System" as a start.

The email system I worked on:

There were three implementations of the email service.  The first was
based on Lotus Notes, and I won't go into that horror.

The second implementation stored the emails themselves as BLOBs in a
regular SQL database.  We also extracted metadata from the emails
(IIRC it was: From, To, Subject, CC, Date), and stored those
separately in another SQL table.
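
A minimal sketch of that kind of layout, assuming Python's sqlite3 and
email modules as stand-ins for the real database and parser.  The
table and column names (message_body, message_meta, and so on) are
hypothetical and are not the service's actual schema.

```python
import sqlite3
from email import message_from_bytes

# Hypothetical schema: the raw message as a BLOB in one table,
# the extracted header metadata in a second table.
SCHEMA = """
CREATE TABLE message_body (
    id  INTEGER PRIMARY KEY,
    raw BLOB NOT NULL
);
CREATE TABLE message_meta (
    id        INTEGER PRIMARY KEY REFERENCES message_body(id),
    sender    TEXT,
    recipient TEXT,
    cc        TEXT,
    subject   TEXT,
    date      TEXT
);
"""

def store_message(db, raw):
    """Store the raw bytes plus the extracted headers; return the row id."""
    msg = message_from_bytes(raw)
    cur = db.execute("INSERT INTO message_body (raw) VALUES (?)", (raw,))
    msg_id = cur.lastrowid
    # A missing header (e.g. no Cc) comes back as None and is stored as NULL.
    db.execute(
        "INSERT INTO message_meta VALUES (?, ?, ?, ?, ?, ?)",
        (msg_id, msg["From"], msg["To"], msg["Cc"], msg["Subject"], msg["Date"]),
    )
    db.commit()
    return msg_id
```

With this split, mailbox listings can be served from message_meta
alone, and the BLOB is only read when a message is opened.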

The database was stored on identical mirrored database servers, each
server having mirrored disks (RAID1 or similar).  Thus in theory there
were 4 copies of each piece of data.  In addition, the whole database
was backed up regularly using a tape library, on ordinary DDS-4 tapes.
I don't recall how long the full rotation of the backups was, but it
would have been on the order of months, and under a year.

When an email was deleted in the UI, it was deleted from the SQL
database virtually instantly (before the user would have seen the next
web page load).  The SQL database replication would have happened
within a few seconds, so the copy on the mirror would also have been
deleted pretty much instantaneously.  The data itself probably still
existed on disk or in SQL logs, but was pretty much unrecoverable from
there without forensic tools.

It turned out (rather obviously in hindsight) that storing emails as
BLOBs is both hugely expensive and very slow.  In the third
implementation of the webmail service we migrated all of the email off
the database into a regular ext2 (or ext3??) filesystem.  The file
servers used SCSI disks arranged in RAID 5 using Linux softraid, with
a number of hot and cold spares.

The email was stored in qmail Maildir format directories, one per user
per mailbox.  The SQL database still contained metadata fields (To,
From, Subject etc).

In the third impl, deleting a message in the UI would delete both the
disk file and the SQL metadata.  The disk file would probably have
been more easily recoverable using a forensic tool, but also the rate
of writes to these disks was very high and I doubt that deleted emails
would have been recoverable for very long.
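
A minimal sketch of that delete path, assuming Python's standard
mailbox.Maildir and sqlite3 modules as stand-ins for the real storage
and database.  The meta table and the deliver/delete helper names are
hypothetical, chosen only for illustration.

```python
import mailbox
import sqlite3

# Hypothetical metadata table, keyed by the Maildir message key.
META_SCHEMA = """
CREATE TABLE meta (
    key       TEXT PRIMARY KEY,
    sender    TEXT,
    recipient TEXT,
    subject   TEXT
)
"""

def deliver(maildir, db, raw):
    """Write the message file into the Maildir and index its headers in SQL."""
    msg = mailbox.MaildirMessage(raw)
    key = maildir.add(msg)  # one file per message, under new/
    db.execute(
        "INSERT INTO meta (key, sender, recipient, subject) VALUES (?, ?, ?, ?)",
        (key, msg["From"], msg["To"], msg["Subject"]),
    )
    db.commit()
    return key

def delete(maildir, db, key):
    """Remove the message file and drop its metadata row."""
    # discard() unlinks the file; it does not overwrite the disk blocks,
    # so the contents can linger until the filesystem reuses that space.
    maildir.discard(key)
    db.execute("DELETE FROM meta WHERE key = ?", (key,))
    db.commit()
```

Nothing in this path scrubs the old data, which matches the point
above: recoverability depends on how soon the freed blocks are reused.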

The tape backup still existed, and in theory email could have been
recovered from it.  *However* I don't think we ever went to the backup
for any law enforcement requests.  I don't think we would have done so
unless it was an extraordinarily serious case, because the only way we
could have done it (for the second impl) would have been to sever the
SQL mirrors and restore the backup onto one of those mirrors (because
we didn't have enough storage to restore it anywhere else).

For the third impl I'm less certain because I wasn't directly involved
in those backups, but they might have been able to restore individual
files.

I don't think the question of deleted emails ever came up in any law
enforcement request that I can remember.

Rich.