Commit | Line | Data |
---|---|---|
99248ed2 MW |
1 | Backup volume layout |
2 | ||
3 | * Volume group structure | |
4 | ||
5 | Each backup volume group is named with a /tag/ to distinguish it from | |
6 | others. A backup volume group is named =bkp-TAG=. It has two logical | |
7 | volumes named =meta= and =crypt=. The =meta= volume contains a small | |
8 | unencrypted ext2 filesystem; the =crypt= volume contains a much larger | |
9 | LUKS-encrypted ext2 filesystem. | |
10 | ||
11 | ||
12 | * The metadata volume | |
13 | ||
14 | The =meta= volume contains metadata about the encrypted volume. The | |
15 | root directory of the volume should have a directory =cur= containing | |
16 | the following files. | |
17 | ||
18 | + =blob= :: A *seccure*-encrypted copy of the LUKS `passphrase' for | |
19 | the encrypted volume. The decryption key is =priv/backup-disk=. | |
20 | The `passphrase' is raw binary data; currently this is 512 bytes, | |
21 | though this isn't part of the specification. | |
22 | ||
23 | + =keys.tgz= :: A partial archive of the =/etc/keys/= directory; | |
24 | specifically, this contains the =pub= and =recov= directories, and | |
25 | the =README= file. | |
26 | ||
27 | + =passwd= and =group= :: Fragments of the server's *passwd* and | |
28 | *group* files, which can be used to decode the uid and gid numbers | |
29 | in the volumes. | |
30 | ||
31 | + =hashes= :: A *sha256sum*-format list of the hashes of the other | |
32 | files. | |
33 | ||
34 | + =hashes.sig= :: A *seccure* signature on the =hashes= file, which | |
35 | can be verified using the key =pub/backup-auth.pub=. | |
36 | ||
37 | Files in =cur= other than =hashes= and =hashes.sig= which are not listed | |
38 | in the =hashes= file are spurious and should not be trusted. | |
39 | ||
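The trust rule above can be sketched in Python. This is a minimal illustration and not part of the provided tools: the names =parse_hashes= and =classify_files= are hypothetical, the sketch assumes file names without escaped characters, and it assumes that =hashes.sig= has already been verified against =hashes= (the *seccure* step is not shown).

```python
import hashlib
import os


def parse_hashes(text):
    """Parse sha256sum-format text into a {file name: hex digest} table."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line is 64 hex digits, a space, a mode character
        # (space for text, '*' for binary), and then the file name.
        digest, name = line[:64], line[66:]
        table[name] = digest.lower()
    return table


def classify_files(directory):
    """Return (trusted, spurious) lists of names found in directory.

    Assumption: the signature on `hashes` was checked beforehand.
    """
    with open(os.path.join(directory, "hashes"), encoding="utf-8") as f:
        listed = parse_hashes(f.read())
    trusted, spurious = [], []
    for name in sorted(os.listdir(directory)):
        if name in ("hashes", "hashes.sig"):
            continue
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            # Only plain files can be listed in `hashes`.
            spurious.append(name)
            continue
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        if listed.get(name) == digest:
            trusted.append(name)
        else:
            spurious.append(name)
    return trusted, spurious
```

A file counts as trusted here only if it is listed and its digest matches; anything unlisted or modified ends up in the second list.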
40 | There may be a =new= directory in the volume root; this contains a | |
41 | partially written replacement for =cur=. This replacement is performed | |
42 | as follows. | |
43 | ||
44 | + Delete =new= if it already exists. | |
45 | + Create =new= and populate it with the appropriate files. | |
46 | + Rename =cur= to =old=. | |
47 | + Rename =new= to =cur=. | |
48 | + Delete =old=. | |
49 | ||
50 | There is a point in this process when the =cur= directory does not | |
51 | exist: there are then =old= and =new= directories, and /both/ of them | |
52 | contain valid files. The tools provided do not handle this situation; | |
53 | it must be fixed manually: one of the directories must be renamed to | |
54 | =cur= and the other deleted. | |
55 | ||
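The replacement steps and the interrupted state described above might look roughly like this in Python. It is a sketch under stated assumptions, not the actual tooling: =replace_cur=, =volume_state= and the =populate= callback are hypothetical names, and =cur= is assumed to exist before the update starts.

```python
import os
import shutil


def replace_cur(root, populate):
    """Replace root/cur following the five steps listed above.

    `populate` is a hypothetical callback that writes the new files
    into the directory it is given.
    """
    cur = os.path.join(root, "cur")
    new = os.path.join(root, "new")
    old = os.path.join(root, "old")
    if os.path.exists(new):
        shutil.rmtree(new)      # delete a stale partial replacement
    os.mkdir(new)
    populate(new)               # fill in the replacement files
    os.rename(cur, old)         # from here until the next rename, no cur exists
    os.rename(new, cur)
    shutil.rmtree(old)


def volume_state(root):
    """Classify the volume root as 'ok', 'needs-manual-fix' or 'unknown'."""
    present = {name for name in ("cur", "old", "new")
               if os.path.isdir(os.path.join(root, name))}
    if "cur" in present:
        return "ok"
    if present == {"old", "new"}:
        # Interrupted between the two renames: both directories are
        # valid, and one must be renamed to cur by hand.
        return "needs-manual-fix"
    return "unknown"
```

The window between the two =os.rename= calls is the point at which an interruption leaves only =old= and =new= behind.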
56 | It is possible to specify a properly atomic update protocol, but this | |
57 | doesn't seem worth the additional complexity of fiddling with symbolic | |
58 | links and the more awkward recovery procedure. | |
59 | ||
60 | ||
61 | * The encrypted volume | |
62 | ||
63 | The =crypt= volume contains archived assets arranged in a hierarchy. | |
64 | (An `asset' is a thing that needs backing up. It's a bit more general | |
65 | than just a filesystem, since I also want to back up things like | |
66 | databases which are rather weird.) | |
67 | ||
68 | The topmost level splits the archive by hostname; the second level | |
69 | splits a host's assets by asset name. | |
70 | ||
71 | The third level splits out the dumps of an asset by date: each directory | |
72 | is named =YYYY-MM-DD#N.L=, indicating the date on which the dump was | |
73 | taken, and the dump level. The number =N= is a counter to distinguish | |
74 | multiple dumps taken on the same day. The number =L= (`level') is an | |
75 | integer which explains how to combine the dump with earlier dumps to | |
76 | perform a complete restore of the asset: a level-zero dump is complete; | |
77 | a level-$n$ dump contains everything changed since the previous level-$n$ or | |
78 | lower dump. The algorithm to restore up to a level-$n$ dump taken at a | |
79 | time $t_1$ is therefore as follows. | |
80 | ||
81 | 1. Identify the most recent level-0 dump prior to $t_1$, and restore | |
82 | it. Let $t$ be the time of that level-0 dump. | |
83 | ||
84 | 2. Identify the lowest numbered dump level occurring after $t$ and | |
85 | before or at $t_1$; let $m$ be this level. Restore all of these | |
86 | level-$m$ dumps, in order. | |
87 | ||
88 | 3. If $m = n$ then the restore is complete. Otherwise update $t$ to | |
89 | be the time of the most recent level-$m$ dump prior to $t_1$ and go | |
90 | back to step 2. | |
91 | ||
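The restore algorithm above can be sketched as a function that selects which dump directories to restore, and in what order. This is an illustrative sketch only: =Dump=, =parse_dump_name= and =restore_chain= are hypothetical names, dumps are assumed to be ordered by date and then by the counter =N=, and a level-0 target is treated as its own starting point.

```python
import re
from typing import NamedTuple


class Dump(NamedTuple):
    date: str    # YYYY-MM-DD; ISO dates sort correctly as strings
    seq: int     # the counter N
    level: int   # the dump level L


_NAME = re.compile(r"^(\d{4}-\d{2}-\d{2})#(\d+)\.(\d+)$")


def parse_dump_name(name):
    """Split a directory name of the form YYYY-MM-DD#N.L into its parts."""
    m = _NAME.match(name)
    if m is None:
        raise ValueError("not a dump directory name: %r" % name)
    return Dump(m.group(1), int(m.group(2)), int(m.group(3)))


def restore_chain(names, target):
    """Return the dump names to restore, in order, to reach `target`."""
    by_dump = {parse_dump_name(n): n for n in names}
    dumps = sorted(by_dump)
    goal = parse_dump_name(target)
    if goal not in by_dump:
        raise ValueError("unknown target dump: %r" % target)
    t1 = goal[:2]
    # Step 1: the most recent level-0 dump at or before the target.
    bases = [d for d in dumps if d.level == 0 and d[:2] <= t1]
    if not bases:
        raise ValueError("no level-0 dump precedes the target")
    chain = [bases[-1]]
    t = bases[-1][:2]
    while True:
        # Step 2: the lowest level occurring after t and up to t1.
        window = [d for d in dumps if t < d[:2] <= t1]
        if not window:
            break               # the target was the level-0 dump itself
        m = min(d.level for d in window)
        picked = [d for d in window if d.level == m]
        chain.extend(picked)
        # Step 3: stop at the target's level, else advance t and repeat.
        if m == goal.level:
            break
        t = picked[-1][:2]
    return [by_dump[d] for d in chain]
```

For example, given dumps at levels 0, 2, 1, 2 on consecutive days, restoring the final level-2 dump selects the level-0 dump, the level-1 dump, and the final level-2 dump; the earlier level-2 dump is skipped because the level-1 dump already covers it.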
92 | The third-level directory contains these files: | |
93 | ||
94 | + =hashes= :: A *sha256sum*-format list of the hashes of the dump | |
95 | files. | |
96 | ||
97 | + =hashes.sig= :: A *seccure* signature on the =hashes= file, which | |
98 | can be verified using the key =pub/backup-auth.pub=. | |
99 | ||
100 | It also contains other files which are specific to the kind of asset | |
101 | being stored. All of these files should be listed in the =hashes= file; | |
102 | there should be no other files present. | |
103 | ||
104 | In addition to the date/level directories, the third level may also have | |
105 | a directory =prepare=, which contains a partial dump in progress and | |
106 | various bits of metadata about it. The contents of this directory are | |
107 | not specified, and should not be trusted. Finally, there may be a | |
108 | directory =failed= which contains archive directories as above, but | |
109 | these directories are incomplete, and retained for diagnostic purposes. | |
110 | ||
111 | ||
112 | * Users and groups | |
113 | ||
114 | Each host is assigned a user and a group, both named =bkp-HOST=; each of | |
115 | the users is also a member of the group =backup=. All of the permanent | |
116 | files and directories in the encrypted volume are owned by =root=. All | |
117 | of the permanent directories within a host's tree are owned by =root= | |
118 | and group-owned by the host's group, and have mode 2755; the files | |
119 | within a dump are group-owned by the relevant host's group, with | |
120 | mode 640. Any =failed= directories are owned and group-owned by =root= | |
121 | and have mode 2755; the partial archives within are owned and | |
122 | group-owned by =root= and have mode 640. Any =prepare= directories | |
123 | have the usual permissions, but files and directories within them may have | |
124 | other permissions, and may be under hostile control. | |
125 | ||
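As a rough illustration of the modes described above, the following Python sketch reports entries in a completed dump directory whose permission bits differ from 2755 (directory) or 640 (files). The function name =mode_problems= is hypothetical, and ownership and group checks are deliberately left out to keep the sketch short.

```python
import os
import stat

DIR_MODE = 0o2755    # setgid, rwxr-xr-x
FILE_MODE = 0o640    # rw-r-----


def mode_problems(dump_dir):
    """Return (name, actual mode) pairs that deviate from the expected modes.

    The dump directory itself is reported under the name ".".
    """
    problems = []
    st = os.lstat(dump_dir)
    if not stat.S_ISDIR(st.st_mode) or stat.S_IMODE(st.st_mode) != DIR_MODE:
        problems.append((".", stat.S_IMODE(st.st_mode)))
    for name in sorted(os.listdir(dump_dir)):
        # lstat, so that a symbolic link is never followed.
        st = os.lstat(os.path.join(dump_dir, name))
        mode = stat.S_IMODE(st.st_mode)
        if not stat.S_ISREG(st.st_mode) or mode != FILE_MODE:
            problems.append((name, mode))
    return problems
```

An empty result means every entry has the expected mode; it says nothing about who owns the files.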
126 | This structure is designed to protect existing archives from hosts which | |
127 | are later compromised. No special precautions against attackers having | |
128 | open files are taken while fixing up the permissions on a completed | |
129 | dump, since the relevant attackers could just as easily have corrupted | |
130 | the dump earlier. | |
131 | ||
132 | ||
133 | * COMMENT Emacs cruft | |
134 | ||
135 | # Local variables: | |
136 | # mode: org | |
137 | # End: |