Targeted junkmail "from" your GP?
Nigel Heffernan
ukcrypto at chiark.greenend.org.uk
Wed, 2 Jul 2008 03:00:46 -0700 (PDT)
On Tuesday 01 July 2008, Peter Fairbrother wrote:
>
> If I am correct about that then there is very little
> difference between distributed and centralised databases
> from a search efficiency viewpoint, and a distributed
> database is about half the cost, far more robust and
> secure, and almost infinitely more privacy-preserving
> than a centralised database.
>
> -- Peter Fairbrother
>
Search efficiency for a distributed database is very sensitive to
database design, particularly with respect to the choice of which
indexes are held on a central server (or on a distributed cluster that
is, in effect, a central server) and to query design.
In simple terms, querying the entire national database for a regional
breakdown of cancer rates would run very quickly: the data is segmented
geographically but this is no obstacle to a process which is,
essentially, running the same query repeatedly on each local and
geographically-defined volume and then aggregating the results. Running
a breakdown of cancer rates by age would be much harder, as you might
need to start by aggregating the entire data set - or rather, construct
and mount an index of it - on the machine that runs your query.
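To make the difference concrete, here is a minimal Python sketch. The volume names, record layout and function names are all hypothetical, invented purely for illustration; nothing here reflects any real NHS schema. The regional breakdown runs the same query on each volume and simply collects one number per region, whereas the by-age breakdown has to merge results from every volume on the querying machine.

```python
from collections import Counter

# Hypothetical regional volumes: each holds only its own patients' records.
VOLUMES = {
    "north": [
        {"age": 34, "cancer": True},
        {"age": 71, "cancer": False},
        {"age": 68, "cancer": True},
    ],
    "south": [
        {"age": 70, "cancer": True},
        {"age": 29, "cancer": False},
    ],
}


def local_cancer_count(records):
    """The same query, run independently on one local volume."""
    return sum(1 for r in records if r["cancer"])


def regional_breakdown(volumes):
    """Run the local query on each volume; one result per region."""
    return {region: local_cancer_count(recs) for region, recs in volumes.items()}


def age_band(age):
    """Map an age to a ten-year band label such as '60-69'."""
    decade = age // 10 * 10
    return f"{decade}-{decade + 9}"


def age_breakdown(volumes):
    """Age cuts across every volume, so results from all volumes
    have to be merged on the machine that runs the query."""
    merged = Counter()
    for recs in volumes.values():
        merged.update(age_band(r["age"]) for r in recs if r["cancer"])
    return dict(merged)
```

In this toy case the by-age merge only moves small per-volume counts; a query that cannot be decomposed that way is the one that would need the centrally mounted index described above.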
This has security and privacy implications, but I have to admit that I'm
not sure how they would play out in the real world. The theft of
anonymised data from (say) a research project at an STD clinic wouldn't
be too difficult to tie back to identified names, if you had a stolen
password to the relevant local volume, because the clinic would draw
patients from at most one or two health trusts - the query wouldn't cut
across geographically-separated data volumes. Large-scale data losses
from a national project, however, would be almost unusable.
Meanwhile, you are both right and wrong about distributed databases
being more secure. Yes, data losses arising from a local password theft,
subverted staff, or a stolen backup tape are limited to a given data
volume: but this presupposes that there is a rational structure of data
access and security management. It is entirely possible that a central
authority would have unlimited access and, once their security measures
are subverted or bypassed, the entire database is available to all.
There are ways in which good data design can help. For a start,
'regional' queries should only be permitted to retrieve anonymised data:
the lookup table tying back to patients' names should never leave the
local data server. That is to say, the usual hierarchy of user and
superusers can be deliberately broken in a distributed database, with no
central 'sysadmin' having all the access privileges of lesser mortals -
local managers or even individual GPs would be the only people who would
have full access to their local lookup list of identifiable patient
names. It would be a trivial matter to ensure that queries from (say) an
out-of-region GP or hospital go through a scalar function with a
half-second delay or a counter rather than the 'vector' or
table-returning function that mounts an index; thus, if you're outside
the area, you can only get records one at a time rather than scanning
the entire table.
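A rough Python sketch of that access rule follows. The class, method and parameter names are hypothetical, and it combines the delay and the counter where the paragraph above offers them as alternatives. Local callers may use the table-returning function; anyone outside the region gets only the throttled one-record-at-a-time lookup.

```python
import time


class LocalVolume:
    """Hypothetical local data server holding the name lookup table."""

    def __init__(self, region, names_by_id, delay_seconds=0.5,
                 max_remote_lookups=100):
        self.region = region
        self._names_by_id = dict(names_by_id)
        self._delay = delay_seconds
        self._max_remote = max_remote_lookups
        self._remote_counts = {}  # lookups made so far, per remote caller

    def scan_names(self, caller_region):
        """'Vector' access: the whole lookup table, local callers only."""
        if caller_region != self.region:
            raise PermissionError("table scans are restricted to local users")
        return dict(self._names_by_id)

    def lookup_name(self, caller_id, caller_region, patient_id):
        """'Scalar' access: one record per call. Out-of-region callers
        are counted against a quota and slowed by a fixed delay."""
        if caller_region != self.region:
            used = self._remote_counts.get(caller_id, 0)
            if used >= self._max_remote:
                raise PermissionError("remote lookup quota exhausted")
            self._remote_counts[caller_id] = used + 1
            time.sleep(self._delay)
        return self._names_by_id[patient_id]
```

With the default half-second delay, an out-of-region caller pulling a million names one at a time would need about 500,000 seconds, or nearly six days, of continuous queries, and the counter would cut them off long before that.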
In theory. I have no reason to believe that even the most elementary
design consideration has been given to security-by-design rather than
security-as-a-bolt-on in the NHS patient database and, in the unlikely
event that a sound security schema was ever implemented, I am certain
that it would be deliberately bypassed by a central authority.
Nigel Heffernan