Commit | Line | Data |
---|---|---|
db2c5b8b MW |
1 | Archvsync |
2 | ========= | |
3 | ||
4 | This is the central repository for the Debian mirror scripts. The scripts | |
5 | in this repository are written for the purposes of maintaining a Debian | |
6 | archive mirror (and shortly, a Debian bug mirror), but they should be | |
7 | easily generalizable. | |
8 | ||
9 | ||
10 | Currently the following scripts are available: | |
11 | ||
12 | * ftpsync - Used to sync an archive using rsync | |
13 | * runmirrors - Used to notify leaf nodes of available updates | |
14 | * dircombine - Internal script to manage the mirror user's $HOME | |
15 | on debian.org machines | |
16 | * typicalsync - Generates a typical Debian mirror | |
17 | * udh - We are lazy, just a shorthand to avoid typing the | |
18 | commands, ignore... :) | |
19 | ||
20 | Usage | |
21 | ===== | |
22 | For the impatient, short usage instructions: | |
23 | ||
24 | - Create a dedicated user for the whole mirror. | |
25 | - Create a separate directory for the mirror, writable by the new user. | |
26 | - Place the ftpsync script in the mirror user's $HOME/bin (or just $HOME) | |
27 | - Place the ftpsync.conf.sample into $HOME/etc as ftpsync.conf and edit | |
28 | it to suit your system. You should at the very least change the TO= | |
29 | and RSYNC_HOST lines. | |
30 | - Create $HOME/log (or wherever you point $LOGDIR to) | |
31 | - Set up .ssh/authorized_keys for the mirror user and place the public key of | |
32 | your upstream mirror into it. Preface it with | |
33 | no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty,command="~/bin/ftpsync",from="IPADDRESS" | |
34 | and replace IPADDRESS with the address of your upstream mirror. | |
35 | - You are finished | |
36 | ||
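As an illustration of the authorized_keys step, the sketch below assembles
the restricted key entry. The function name authorized_keys_entry, the
address 192.0.2.10 and the key string are placeholders chosen for this
example; they are not part of the scripts.

```shell
#!/bin/sh
# Hypothetical helper: print one authorized_keys entry that restricts the
# upstream's key to running ~/bin/ftpsync from a single address.
#   $1 = IP address of the upstream mirror
#   $2 = the upstream's public key line
authorized_keys_entry() {
    printf 'no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty,command="~/bin/ftpsync",from="%s" %s\n' "$1" "$2"
}

# Example call with placeholder values (documentation address, dummy key).
authorized_keys_entry 192.0.2.10 "ssh-ed25519 AAAAC3placeholder upstream@example.org"
```

The printed line would be appended to the mirror user's
.ssh/authorized_keys file.
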
37 | In order to receive different pushes or syncs from different archives, | |
38 | name the config file ftpsync-$ARCHIVE.conf and call the ftpsync script | |
39 | with the commandline "sync:archive:$ARCHIVE". Replace $ARCHIVE with a | |
40 | sensible value. If your upstream mirror pushes you using runmirrors | |
41 | bundled together with this sync script, you do not need to add the | |
42 | "sync:archive" parameter to the commandline; the scripts deal with it | |
43 | automatically. | |
44 | ||
45 | ||
46 | ||
47 | Debian mirror script minimum requirements | |
48 | ========================================= | |
49 | As always, you may use whatever scripts you want for your Debian mirror, | |
50 | but we *STRONGLY* recommend that you do not invent your own. However, if | |
51 | you want to be listed as a mirror, your script *MUST* support the following minimal | |
52 | functionality: | |
53 | ||
54 | - Must perform a 2-stage sync | |
55 | The archive mirroring must be done in 2 stages. The first rsync run | |
56 | must ignore the index files. The correct exclude options for the | |
57 | first rsync run are: | |
58 | --exclude Packages* --exclude Sources* --exclude Release* --exclude ls-lR* | |
59 | The first stage must not delete any files. | |
60 | ||
61 | The second stage should then transfer the above excluded files and | |
62 | delete files that no longer belong on the mirror. | |
63 | ||
64 | Rationale: If archive mirroring is done in a single stage, there will be | |
65 | periods of time during which the index files will reference files not | |
66 | yet mirrored. | |
67 | ||
68 | - Must not ignore pushes whil(e|st) running. | |
69 | If a push is received during a run of the mirror sync, it MUST NOT | |
70 | be ignored. The whole synchronization process must be rerun. | |
71 | ||
72 | Rationale: Most implementations of Debian mirror scripts will leave the | |
73 | mirror in an inconsistent state in the event of a second push being | |
74 | received while the first sync is still running. It is likely that in | |
75 | the near future, the frequency of pushes will increase. | |
76 | ||
77 | - Should understand multi-stage pushes. | |
78 | The script should parse the arguments it gets via ssh, and if they | |
79 | contain a hint to only sync stage1 or stage2, then ONLY those steps | |
80 | SHOULD be performed. | |
81 | ||
82 | Rationale: This enables us to coordinate the timing of the first | |
83 | and second stage pushes and minimize the time during which the | |
84 | archive is desynchronized. This is especially important for mirrors | |
85 | that are involved in a round robin or GeoDNS setup. | |
86 | ||
87 | The minimum arguments the script has to understand are: | |
88 | sync:stage1 Only sync stage1 | |
89 | sync:stage2 Only sync stage2 | |
90 | sync:all Do everything. Default if none of stage1/2 are | |
91 | present. | |
92 | There are more possible arguments; for a complete list, see the | |
93 | ftpsync script in our git repository. | |
94 | ||
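The 2-stage requirement and the three minimum arguments can be sketched as
follows. This is an illustration under assumptions, not the code ftpsync
uses: SRC, DEST and the function names are made up for the example, and
RSYNC defaults to "echo rsync" so that the commands are printed rather than
executed.

```shell
#!/bin/sh
# Sketch only. SRC and DEST are placeholder values, not ftpsync defaults.
SRC="rsync://upstream.example.org/debian/"
DEST="/srv/mirror/debian/"
RSYNC=${RSYNC:-echo rsync}   # dry run by default: print instead of execute

stage1() {
    # Hold back the index files and delete nothing in this stage.
    # The patterns are quoted so the local shell does not expand them.
    $RSYNC -a --exclude 'Packages*' --exclude 'Sources*' \
        --exclude 'Release*' --exclude 'ls-lR*' "$SRC" "$DEST"
}

stage2() {
    # Transfer the previously excluded files and remove what no longer
    # belongs on the mirror.
    $RSYNC -a --delete --delete-after "$SRC" "$DEST"
}

sync_mirror() {
    case "${1:-sync:all}" in
        sync:stage1) stage1 ;;
        sync:stage2) stage2 ;;
        sync:all)    stage1 && stage2 ;;
        *) echo "unknown argument: $1" >&2; return 1 ;;
    esac
}

sync_mirror sync:all
```

Setting RSYNC=rsync would run the transfers for real; the dry-run default
only shows which options belong to which stage.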
95 | ||
96 | ||
97 | ftpsync | |
98 | ======= | |
99 | ||
100 | This script is based on the old anonftpsync script. It has been rewritten | |
101 | to add flexibility and fix a number of outstanding issues. | |
102 | ||
103 | Some of the advantages of the new version are: | |
104 | - Nearly every aspect is configurable | |
105 | - Correct support for multiple pushes | |
106 | - Support for multi-stage archive synchronisations | |
107 | - Support for hook scripts at various points | |
108 | - Support for multiple archives, even if they are pushed using one ssh key | |
109 | - Support for multi-hop, multi-stage archive synchronisations | |
110 | ||
111 | Correct support for multiple pushes | |
112 | ----------------------------------- | |
113 | When the script receives a second push while it is running and syncing | |
114 | the archive it won't ignore it. Instead it will rerun the | |
115 | synchronisation step to ensure the archive is correctly synchronised. | |
116 | ||
117 | Scripts that fail to do that risk ending up with an inconsistent archive. | |
118 | ||
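One way to get this behaviour is a lock plus a marker that records pushes
arriving during a run. The sketch below is an assumption about how such a
loop could look, not a copy of ftpsync: STATE_DIR, LOCK, PENDING and the
function names are invented for the example.

```shell
#!/bin/sh
# Sketch of "do not ignore a push during a run". All names are illustrative.
STATE_DIR=${STATE_DIR:-/tmp/mirror-sketch}
LOCK="$STATE_DIR/sync.lock"            # a directory; mkdir is atomic
PENDING="$STATE_DIR/update-required"   # marker written by every push

do_sync() { echo "sync pass"; }        # stand-in for the real sync

handle_push() {
    mkdir -p "$STATE_DIR"
    : > "$PENDING"                     # record the push before anything else
    while :; do
        # If another run holds the lock, it will notice the marker and rerun.
        mkdir "$LOCK" 2>/dev/null || return 0
        while [ -e "$PENDING" ]; do
            rm -f "$PENDING"
            do_sync
        done
        rmdir "$LOCK"
        # A push may have arrived between the last check and the unlock.
        [ -e "$PENDING" ] || break
    done
}

handle_push
```

The marker is written before the lock is tried, so a push that loses the
race for the lock is still picked up by the run that holds it.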
119 | ||
120 | Can do multi-stage archive synchronisations | |
121 | ------------------------------------------- | |
122 | The script can be told to only perform the first or second stage of the | |
123 | archive synchronisation. | |
124 | ||
125 | This enables us to send all the binary packages and sources to a | |
126 | number of mirrors, and then tell all of them to sync the | |
127 | Packages/Release files at once. This will keep the timeframe in which | |
128 | the mirrors are out of sync very small and will greatly help things like | |
129 | DNS RR entries or even the planned GeoDNS setup. | |
130 | ||
131 | ||
132 | Multi-hop, multi-stage archive synchronisations | |
133 | ----------------------------------------------- | |
134 | The script can be told to perform a multi-hop multi-stage archive | |
135 | synchronisation. | |
136 | ||
137 | This is basically the same as the multi-stage synchronisation | |
138 | explained above, but enables the downstream mirror to push its own | |
139 | staged/multi-hop downstreams before returning. This has the same | |
140 | advantage as the multi-stage synchronisation but allows us to do | |
141 | this over multiple levels of mirrors. (Imagine one push going from | |
142 | Europe to Australia, where 3 other mirrors are then updated locally | |
143 | before stage2 is sent out. The data crosses from Europe to Australia | |
144 | once instead of 4 times, and all 4 mirrors still update near instantly.) | |
145 | ||
146 | ||
147 | Can run hook scripts | |
148 | -------------------- | |
149 | ftpsync currently allows 5 hook scripts to run at various points of the | |
150 | mirror sync run. | |
151 | ||
152 | Hook1: After lock is acquired, before first rsync | |
153 | Hook2: After first rsync, if successful | |
154 | Hook3: After second rsync, if successful | |
155 | Hook4: Right before leaf mirror triggering | |
156 | Hook5: After leaf mirror trigger (only if we have slave mirrors; HUB=true) | |
157 | ||
158 | Note that Hook3 and Hook4 are likely to be called directly after each other. | |
159 | The difference is that Hook3 is called *every* time the second rsync | |
160 | succeeds even if the mirroring needs to re-run due to a second push. | |
161 | Hook4 is only executed if mirroring is completed. | |
162 | ||
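The hook order, including the difference between Hook3 and Hook4, can be
sketched with stand-in functions. Everything here is hypothetical (the
function names, the PUSHES counter that simulates pushes arriving mid-run,
and the assumption that Hook1 fires only once per run); only the ordering
follows the list above.

```shell
#!/bin/sh
# Sketch of where the five hooks fire; every function is a stand-in.
run_hook()       { echo "hook$1"; }
rsync_stage1()   { :; }    # pretend both rsync runs succeed
rsync_stage2()   { :; }
trigger_leaves() { :; }
# Succeeds while simulated pushes remain, consuming one per call.
push_pending()   { [ "$PUSHES" -gt 0 ] && PUSHES=$((PUSHES - 1)); }

PUSHES=${PUSHES:-1}        # extra pushes arriving during the run (simulated)
HUB=${HUB:-true}

mirror_run() {
    run_hook 1                       # lock is held from here on
    while :; do
        rsync_stage1 && run_hook 2
        rsync_stage2 && run_hook 3   # fires on every successful pass
        push_pending || break        # another push came in: go around again
    done
    run_hook 4                       # only once mirroring is complete
    if [ "$HUB" = true ]; then
        trigger_leaves
        run_hook 5
    fi
}

mirror_run
```

With one simulated extra push, hook2 and hook3 are printed twice while
hook4 and hook5 are printed once.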
163 | ||
164 | Support for multiple archives, even if they are pushed using one ssh key | |
165 | ------------------------------------------------------------------------ | |
166 | If you get multiple archives from your upstream mirror (say Debian, | |
167 | Debian-Backports and Volatile), previously you had to use 3 different ssh | |
168 | keys to be able to automagically synchronize them. This script can do it | |
169 | all with just one key, if your upstream mirror tells you which archive. | |
170 | See "Commandline/SSH options" below for further details. | |
171 | ||
172 | ||
173 | For details of all available options, please see the extensive documentation | |
174 | in the sample configuration file. | |
175 | ||
176 | ||
177 | Commandline/SSH options | |
178 | ======================= | |
179 | Script options may be set either on the local command line, or passed by | |
180 | specifying an ssh "command". Local commandline options always have | |
181 | precedence over the SSH_ORIGINAL_COMMAND ones. | |
182 | ||
183 | Currently this script understands the options listed below. To make them | |
184 | take effect they MUST be prefixed with "sync:". | |
185 | ||
186 | Option Behaviour | |
187 | stage1 Only do stage1 sync | |
188 | stage2 Only do stage2 sync | |
189 | all Do a complete sync (default) | |
190 | mhop Do a multi-hop sync | |
191 | archive:foo Sync archive foo (if the file $HOME/etc/ftpsync-foo.conf | |
192 | exists and is configured) | |
193 | callback Call back when done (needs proper ssh setup for this to | |
194 | work). It will always use the "command" callback:$HOSTNAME | |
195 | where $HOSTNAME is the one defined in config and | |
196 | will happen before slave mirrors are triggered. | |
197 | ||
198 | So, to get the script to sync all of the archive behind bpo and call back when | |
199 | it is complete, use an upstream trigger of | |
200 | ssh $USER@$HOST sync:all sync:archive:bpo sync:callback | |
201 | ||
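A minimal sketch of this option handling follows. It is not the parser
ftpsync uses: the variable names STAGE, MHOP, CALLBACK, ARCHIVE and CONF and
both function names are assumptions made for the example. Options from
SSH_ORIGINAL_COMMAND are applied first and the local command line second,
so local options take precedence.

```shell
#!/bin/sh
# Sketch only; variable and function names are illustrative.
parse_sync_options() {
    for opt in "$@"; do
        case "$opt" in
            sync:stage1|sync:stage2|sync:all) STAGE=${opt#sync:} ;;
            sync:mhop)      MHOP=true ;;
            sync:callback)  CALLBACK=true ;;
            sync:archive:*) ARCHIVE=${opt#sync:archive:} ;;
            *) ;;   # anything without a known "sync:" prefix is ignored
        esac
    done
}

resolve_options() {
    STAGE=all MHOP=false CALLBACK=false ARCHIVE=""
    set -f                       # no globbing while splitting the ssh command
    parse_sync_options ${SSH_ORIGINAL_COMMAND:-}
    set +f
    parse_sync_options "$@"      # local options are applied last and win
    if [ -n "$ARCHIVE" ]; then
        CONF="$HOME/etc/ftpsync-$ARCHIVE.conf"
    else
        CONF="$HOME/etc/ftpsync.conf"
    fi
    echo "stage=$STAGE mhop=$MHOP callback=$CALLBACK conf=$CONF"
}

# The upstream trigger from the text, overridden locally with sync:stage1.
SSH_ORIGINAL_COMMAND="sync:all sync:archive:bpo sync:callback"
resolve_options sync:stage1
```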
202 | ||
203 | Mirror trace files | |
204 | ================== | |
205 | Every mirror needs to have a 'trace' file under project/trace. | |
206 | The file format is as follows: | |
207 | ||
208 | The filename has to be the full hostname (e.g. hostname -f), or in the | |
209 | case of a mirror participating in RR DNS (where users will never use | |
210 | the hostname) the name of the DNS RR entry (e.g. security.debian.org | |
211 | for the security rotation). | |
212 | ||
213 | The content has (no leading spaces): | |
214 | Sat Nov 8 13:20:22 UTC 2008 | |
215 | Used ftpsync version: 42 | |
216 | Running on host: steffani.debian.org | |
217 | ||
218 | First line: Output of date -u | |
219 | Second line: Freeform text containing the program name and version | |
220 | Third line: Text "Running on host: " followed by hostname -f | |
221 | ||
222 | The third line MUST NOT be the DNS RR name, even if the mirror is part | |
223 | of it. It MUST BE the host's own name. This is in contrast to the filename, | |
224 | which SHOULD be the DNS RR name. | |
225 | ||
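Writing a trace file in this format could look like the sketch below. The
function name write_trace and its arguments are assumptions for the
example, and the fallback to plain "hostname" exists only so the sketch
still runs on hosts where "hostname -f" fails; a real mirror should record
its full host name.

```shell
#!/bin/sh
# Sketch: write a trace file in the three-line format described above.
#   $1 = root of the mirror
#   $2 = trace file name (full host name, or the DNS RR name)
#   $3 = free-form second line with program name and version
write_trace() {
    tracedir="$1/project/trace"
    mkdir -p "$tracedir"
    {
        LC_ALL=C date -u
        echo "$3"
        echo "Running on host: $(hostname -f 2>/dev/null || hostname)"
    } > "$tracedir/$2.new"
    # Rename into place so readers never see a half-written file.
    mv "$tracedir/$2.new" "$tracedir/$2"
}

# Example call (paths and names are placeholders):
# write_trace /srv/mirror/debian security.debian.org "Used ftpsync version: 42"
```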
226 | ||
227 | runmirrors | |
228 | ========== | |
229 | This script is used to tell leaf mirrors that it is time to synchronize | |
230 | their copy of the archive. This is done by parsing a mirror list and | |
231 | using ssh to "push" the leaf nodes. You can read much more about the | |
232 | principle behind the push at [1]; essentially, it tells the receiving | |
233 | end to run a pre-defined script. As the whole setup is extremely limited | |
234 | and the ssh key is not usable for anything other than the pre-defined | |
235 | script, this is the most secure method for such an action. | |
236 | ||
237 | This script supports two types of pushes: The normal single stage push, | |
238 | as well as the newer multi-stage push. | |
239 | ||
240 | The normal push, as described above, will simply push the leaf node and | |
241 | then go on with the other nodes. | |
242 | ||
243 | The multi-staged push first pushes a mirror and tells it to only do a | |
244 | stage1 sync run. Then it waits for the mirror (and all others being pushed | |
245 | in the same run) to finish that run, before it tells all of the staged | |
246 | mirrors to do the stage2 sync. | |
247 | ||
248 | This way you can do a nearly-simultaneous update of multiple hosts. | |
249 | This is useful in situations where periods of desynchronization should | |
250 | be kept as small as possible. Examples of scenarios where this might be | |
251 | useful include multiple hosts in a DNS Round Robin entry. | |
252 | ||
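The multi-stage push can be sketched as two rounds of ssh triggers with a
barrier in between. This is a simplification and not how runmirrors is
written: it assumes each ssh call returns only when the remote stage has
finished, and MIRRORS, the "mirror" user name and the function name are
invented for the example. SSH defaults to "echo ssh" so that the triggers
are printed instead of sent.

```shell
#!/bin/sh
# Sketch of a staged push; host names and the remote user are placeholders.
MIRRORS=${MIRRORS:-"mirror1.example.org mirror2.example.org"}
SSH=${SSH:-echo ssh}   # dry run by default: print the trigger commands

staged_push() {
    # Round 1: every mirror fetches the packages and sources in parallel.
    for m in $MIRRORS; do
        $SSH "mirror@$m" sync:stage1 &
    done
    wait   # barrier: nobody starts stage2 until all stage1 runs are done

    # Round 2: all mirrors switch to the new index files at about the same time.
    for m in $MIRRORS; do
        $SSH "mirror@$m" sync:stage2 &
    done
    wait
}

staged_push
```
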
253 | For details on the mirror list please see the documented | |
254 | runmirrors.mirror.sample file. | |
255 | ||
256 | ||
257 | [1] http://blog.ganneff.de/blog/2007/12/29/ssh-triggers.html |