Backup volume layout

* Volume group structure

Each backup volume group has a /tag/ to distinguish it from others, and is named =bkp-TAG=.  It has two logical volumes named =meta= and =crypt=.  The =meta= volume contains a small unencrypted ext2 filesystem; the =crypt= volume contains a much larger LUKS-encrypted ext2 filesystem.

* The metadata volume

The =meta= volume contains metadata about the encrypted volume.  The root directory of the volume should have a directory =cur= containing the following files.

  + =blob= :: A *seccure*-encrypted copy of the LUKS `passphrase' for the encrypted volume.  The decryption key is =priv/backup-disk=.  The `passphrase' is raw binary data; currently this is 512 bytes, though this isn't part of the specification.
  + =keys.tgz= :: A partial archive of the =/etc/keys/= directory; specifically, this contains the =pub= and =recov= directories, and the =README= file.
  + =passwd= and =group= :: Fragments of the server's *passwd* and *group* files, which can be used to decode the uid and gid numbers in the volumes.
  + =hashes= :: A *sha256sum*-format list of the hashes of the other files.
  + =hashes.sig= :: A *seccure* signature on the =hashes= file, which can be verified using the key =pub/backup-auth.pub=.

Files in =cur= other than =hashes= and =hashes.sig= which are not listed in the =hashes= file are spurious and should not be trusted.

There may be a =new= directory in the volume root; this contains a partially written replacement for =cur=.  This replacement is performed as follows.

  + Delete =new= if it already exists.
  + Create =new= and populate it with the appropriate files.
  + Rename =cur= to =old=.
  + Rename =new= to =cur=.
  + Delete =old=.

There is a point in this process when the =cur= directory does not exist: there are then =old= and =new= directories, and /both/ of them contain valid files.
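
The replacement steps for =cur= can be sketched in Python as follows.  This is only an illustration of the five steps as written, not the code used by the provided tools: the names =replace_cur=, =is_interrupted= and the =populate= callback are hypothetical, and the sketch assumes that =cur= already exists and that no stale =old= directory is lying around.

```python
import os
import shutil


def replace_cur(root, populate):
    """Replace ROOT/cur following the five steps described in the text.

    POPULATE is a hypothetical callback which writes the new files into
    the directory it is given.
    """
    cur = os.path.join(root, "cur")
    old = os.path.join(root, "old")
    new = os.path.join(root, "new")

    if os.path.isdir(new):
        shutil.rmtree(new)      # delete `new' if it already exists
    os.mkdir(new)
    populate(new)               # create and populate `new'
    os.rename(cur, old)         # from here until the next rename, `cur' is absent
    os.rename(new, cur)
    shutil.rmtree(old)          # delete `old'


def is_interrupted(root):
    """Return true if ROOT has `old' and `new' but no `cur'."""
    names = set(os.listdir(root))
    return "cur" not in names and {"old", "new"} <= names
```

The =is_interrupted= helper only detects the window between the two renames, in which =cur= is absent and both =old= and =new= are present.
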
The tools provided do not handle this situation (=cur= missing, with both =old= and =new= present); it must be fixed manually: one of the directories must be renamed to =cur= and the other deleted.  It is possible to specify a properly atomic update protocol, but this doesn't seem worth the additional complexity of fiddling with symbolic links and the more awkward recovery procedure.

* The encrypted volume

The =crypt= volume contains archived assets arranged in a hierarchy.  (An `asset' is a thing that needs backing up.  It's a bit more general than just a filesystem, since I also want to back up things like databases which are rather weird.)

The topmost level splits the archive by hostname; the second level splits a host's assets by asset name.  The third level splits out the dumps of an asset by date: each directory is named =YYYY-MM-DD#N.L=, indicating the date on which the dump was taken, and the dump level.  The number =N= is a counter to distinguish multiple dumps taken on the same day.  The number =L= (`level') is an integer which explains how to combine the dump with earlier dumps to perform a complete restore of the asset: a level-zero dump is complete; a level-$n$ dump contains everything changed since the previous dump of level $n$ or lower.

The algorithm to restore up to a level-$n$ dump taken at a time $t_1$ is therefore as follows.

  1. Identify the most recent level-0 dump prior to $t_1$, and restore it.  Let $t$ be the time of that level-0 dump.
  2. Identify the lowest numbered dump level occurring after $t$ and before or at $t_1$; let $m$ be this level.  Restore all of these level-$m$ dumps, in order.
  3. If $m = n$ then the restore is complete.  Otherwise update $t$ to be the time of the most recent level-$m$ dump prior to $t_1$ and go back to step 2.

The third-level directory contains these files:

  + =hashes= :: A *sha256sum*-format list of the hashes of the dump files.
  + =hashes.sig= :: A *seccure* signature on the =hashes= file, which can be verified using the key =pub/backup-auth.pub=.
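
Returning to the restore algorithm given earlier in this section, the following Python sketch computes which dump directories to restore, and in what order.  It is an illustration rather than part of the tools: the names =parse_dump= and =restore_plan= are hypothetical, and it assumes that the counter =N= increases with time within a day, so that dumps can be ordered by date and then by =N=.  It also allows the level-0 dump in step 1 to be the target itself, so that restoring a level-0 dump yields just that dump.

```python
import re
from typing import NamedTuple

# Directory names of the form YYYY-MM-DD#N.L
DUMP_NAME = re.compile(r"^(\d{4}-\d{2}-\d{2})#(\d+)\.(\d+)$")


class Dump(NamedTuple):
    date: str    # ISO date; sorts correctly as a string
    seq: int     # same-day counter N
    level: int   # dump level L
    name: str    # original directory name


def parse_dump(name):
    """Parse a dump directory name; return None for anything else."""
    m = DUMP_NAME.match(name)
    if m is None:
        return None             # e.g. `prepare' or `failed'
    return Dump(m.group(1), int(m.group(2)), int(m.group(3)), name)


def when(dump):
    """Ordering key: assumes N increases with time within a day."""
    return (dump.date, dump.seq)


def restore_plan(names, target):
    """Return the dump names to restore, in order, to reach TARGET."""
    dumps = sorted(d for d in map(parse_dump, names) if d is not None)
    goal = parse_dump(target)
    if goal is None or goal not in dumps:
        raise ValueError("unknown target dump: %r" % (target,))
    t1 = when(goal)

    # Step 1: the most recent level-0 dump no later than the target.
    base = [d for d in dumps if d.level == 0 and when(d) <= t1]
    if not base:
        raise ValueError("no level-0 dump at or before the target")
    plan = [base[-1]]
    t = when(base[-1])

    while True:
        # Step 2: dumps after t and no later than the target.
        window = [d for d in dumps if t < when(d) <= t1]
        if not window:
            break               # the target was the level-0 dump itself
        m = min(d.level for d in window)
        chosen = [d for d in window if d.level == m]
        plan.extend(chosen)
        # Step 3: finished once the target's own level has been restored.
        if m == goal.level:
            break
        t = when(chosen[-1])

    return [d.name for d in plan]
```

For example, given dumps =2024-01-01#0.0=, =2024-01-02#0.2=, =2024-01-03#0.1=, =2024-01-04#0.2= and =2024-01-05#0.2=, a restore up to the last of these skips =2024-01-02#0.2=, because the later level-1 dump supersedes it.
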
Each dump directory also contains other files which are specific to the kind of asset being stored.  All of these files should be listed in the =hashes= file; there should be no other files present.

In addition to the date/level directories, the third level may also have a directory =prepare=, which contains a partial dump in progress and various bits of metadata about it.  The contents of this directory are not specified, and should not be trusted.  Finally, there may be a directory =failed= which contains archive directories as above, but these directories are incomplete, and retained for diagnostic purposes.

* Users and groups

Each host is assigned a user and a group, both named =bkp-HOST=; each of the users is also a member of the group =backup=.

All of the permanent files and directories in the encrypted volume are owned by =root=.  All of the permanent directories within a host's tree are owned by =root= and group-owned by the host's group, and have mode 2755; the files within a dump are group-owned by the relevant host's group, with mode 640.  Any =failed= directories are owned and group-owned by =root= and have mode 2755; the partial archives within are owned and group-owned by =root= and have mode 640.  Any =prepare= directories have the usual permissions, but files and directories within them may have other permissions, and may be under hostile control.

This structure is designed to protect existing archives from hosts which are later compromised.  No special precautions against attackers having open files are taken while fixing up the permissions on a completed dump, since the relevant attackers could just as easily have corrupted the dump earlier.

* COMMENT Emacs cruft

# Local variables:
# mode: org
# End: