1 This is version 1.0 of the INN feeder program `innfeed.'
2 It is written in ANSI C and tries to be POSIX.1 compliant. This software
3 was originally written and maintained by James Brister <brister@vix.com>
4 and is now part of INN. The features of it are:
6 1. Handles the STREAM extension to NNTP.
7 2. Will open multiple connections to the remote host to parallel
9 3. Will handle multiple remote hosts.
10 4. Will tear down idle connections.
11 5. Runs as a channel/funnel feed from INN, or by reading a funnel
13 6. Will stop issuing CHECK commands and go straight to TAKETHIS if
14 the remote responds affirmatively to enough CHECKs.
15 7. It will go back to issuing CHECKs if enough TAKETHIS commands fail.
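Features 6 and 7 together describe a feedback loop between the CHECK and TAKETHIS commands. The Python sketch below is one way to picture that loop. It is not taken from innfeed's C source: the class name, the window size and both thresholds are assumptions made up for illustration.

```python
from collections import deque


class StreamingPolicy:
    """Toy model of the CHECK / TAKETHIS switch (hypothetical names and thresholds)."""

    def __init__(self, window=10, on_ratio=0.9, off_ratio=0.5):
        self.window = window        # number of recent responses considered (assumed)
        self.on_ratio = on_ratio    # share of positive CHECK replies that stops CHECKs (assumed)
        self.off_ratio = off_ratio  # share of accepted TAKETHIS needed to keep skipping (assumed)
        self.skip_check = False     # start by issuing CHECK for every article
        self.results = deque(maxlen=window)

    def record(self, ok):
        """Record one reply: a positive CHECK response or an accepted TAKETHIS."""
        self.results.append(bool(ok))
        if len(self.results) < self.window:
            return                  # not enough evidence yet
        ratio = sum(self.results) / len(self.results)
        if not self.skip_check and ratio >= self.on_ratio:
            self.skip_check = True  # remote wants nearly everything: go straight to TAKETHIS
            self.results.clear()
        elif self.skip_check and ratio < self.off_ratio:
            self.skip_check = False  # too many TAKETHIS failures: go back to CHECK
            self.results.clear()

    def next_command(self):
        """Name of the command to use for the next article."""
        return "TAKETHIS" if self.skip_check else "CHECK"
```

Clearing the window after each switch means the decision to switch back is based only on replies seen in the new mode.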
20 1. Config file inclusion now works via the syntax:
24 Config files can be included up to a nesting depth of 10. Line
25 numbers and file names are not properly reported yet on errors
26 when includes are used, though.
28 2. Signal handling is tidied up a bit. If your compiler doesn't
29 support the ``volatile'' keyword, then see the comment in
32 3. If you have a stdio library that has a limit on open files lower
33 than the process limit for open plain files (all flavours of
34 SunOS), then a new config file variable ``stdio-fdmax'' can be
35 used to give that upper bound. When set, all new network
36 connections will be limited to file descriptors over this value,
37 leaving the lower file descriptors free for stdio. See
38 innfeed.conf(5) for more details. Remember that the config file
39 value overrides any compiled in value.
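The idea behind ``stdio-fdmax'' (keeping network connections on high-numbered descriptors so the low ones stay free for stdio) can be sketched with the standard F_DUPFD fcntl. This is an illustration in Python, not innfeed's code. The function names are hypothetical, and treating the bound as inclusive is an assumption.

```python
import fcntl
import os
import socket


def move_above(fd, floor):
    """Return a descriptor numbered >= floor that refers to the same open file."""
    if fd >= floor:
        return fd
    new_fd = fcntl.fcntl(fd, fcntl.F_DUPFD, floor)  # lowest free descriptor >= floor
    os.close(fd)                                    # release the low-numbered slot
    return new_fd


def open_network_fd(floor):
    """Create a TCP socket whose descriptor stays out of the stdio range."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    return move_above(sock.detach(), floor)  # detach() hands over the raw descriptor
```

A caller would pass the configured bound as `floor` and must close the returned descriptor itself.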
43 1. A major change has been made to the config file. The new format
44 is quite extensible and will let new data items be added in the
45 future without changing the basic format. There's a new option
46 ``-C'' (for ``check'') that will make innfeed read the config
47 file and report on any errors and then exit. This will let you
48 verify things before locking them into a newsfeeds file entry. A
49 program, ``innfeed-convcfg'', has been added that will read your
50 old config off the command line (or stdin), and will write a new
53 The new config file structure permits non-peer-specific items to
54 be declared (like the location of the status file, or whether to
55 wrap the generated status file in HTML). This is part of the
59 news-spool: /var/news/spool/articles
60 backlog-directory: /var/news/spool/innfeed
62 status-file: innfeed.status
66 connection-stats: false
67 max-reconnect-time: 3600
69 so the only option you'll probably need now is the ``-c'' option to
70 locate the config file, and as this is also compiled in, you may
73 See the innfeed.conf(5) man page for more details on config file
76 2. The backlog file handling is changed slightly:
78 - The .output file is always kept open (until rotation time).
79 - The .output file is allowed to grow for at least 30
80 seconds (or the value defined by the key
81 backlog-rotate-period in the config file). This prevents
82 thrashing of backlog files.
83 - The hand-prepared file is checked for only every 600
84 seconds maximum (or the value defined by the key
85 backlog-new), not every time the files are rotated.
86 - The stating of the three backlog files is reduced
89 3. Signal handling is changed so that it is more synchronous
90 with other activity. This should stop the frequent core dumps
91 that occurred when running in funnel file mode and sending
94 4. A bug related to zero-length articles was fixed. They will now be
97 5. More information is in the innfeed.status file, including the
98 reasons returned by the remote when it is throttled.
100 6. SIGEMT is now a trigger for closing and reopening all the
101 backlog files. If you have scripts that need to fool with the
102 backlogs, then have the scripts move the backlogs out of the way
103 and then send the SIGEMT.
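The timing rules in item 2 above (a minimum growth period for the .output file, and a separate, longer interval for looking for the hand-prepared file) can be pictured as two independent timers. The sketch below is a Python illustration under stated assumptions. The class and method names are hypothetical, the defaults mirror the 30 and 600 second values quoted above, and it assumes neither event fires before its first full period has passed.

```python
class BacklogTimers:
    """Two independent timers for backlog handling (hypothetical names)."""

    def __init__(self, start, rotate_period=30, new_file_period=600):
        self.rotate_period = rotate_period      # cf. backlog-rotate-period
        self.new_file_period = new_file_period  # cf. backlog-new
        self.last_rotation = start
        self.last_new_check = start

    def may_rotate(self, now):
        """True once the .output file has been allowed to grow for a full period."""
        if now - self.last_rotation < self.rotate_period:
            return False
        self.last_rotation = now
        return True

    def should_check_hand_prepared(self, now):
        """True when it is time to look for a hand-prepared file again."""
        if now - self.last_new_check < self.new_file_period:
            return False
        self.last_new_check = now
        return True
```

Keeping the timers separate is what lets the hand-prepared file be checked less often than the files are rotated.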
107 1. If your system supports mmap() *and* if you have your articles
108 stored on disk in NNTP-ready format (rare), then you can have
109 innfeed mmap article data to save on memory (thanks to Dave
110 Lawrence). There is an important issue with this:
112 if you try to have innfeed handle too many articles (by
113 running many connections and/or high max-check values in
114 innfeed.conf) at once, then your system (not your process)
115 may run out of free vnodes (global file descriptors), as a
116 vnode is used as long as the file is mmaped. So be careful.
118 If your articles are not in NNTP format then this will be
119 noticed and the article will be pulled into memory for fixing up
120 (and then immediately munmap'd). You can disable use of MMAP if
121 you've built it in by using the '-M' flag. I tried mixing
122 mmap'ing and articles not in NNTP format and it was a real
123 performance loss. I'll be trying it differently later.
125 2. If innfeed is asked to send an article to a host it knows
126 nothing about, or which it cannot acquire the required lock for
127 (which causes the "ME locked cannot setup peer ..." and "ME
128 unconfigured peer" syslog messages), then innfeed will deposit
129 the article information into a file matching the pattern
130 innfeed-dropped.* in the backlog directory (TAPE_DIRECTORY in
131 config.h). This file will not be processed in any manner -- it's
132 up to you to decide what to do with it (wait for innfeed to exit
133 before doing anything with it, or send innfeed a SIGHUP to get
134 it to reread its config file, which will roll this file).
136 4. The output backlog files will now be kept below a certain byte
137 limit. This happens via the ``-e'' option. If, after writing to
138 an output file, the new length is bigger than the given limit
139 (multiplied by a fudge factor defined in config.h -- default of
140 1.10) then the file will be shrunk down to this size (or slightly
141 smaller to find the end of line boundary). The front of the file
142 will be removed to do this. This means lost articles for the
145 3. A SIGHUP will cause the config to be reloaded.
147 4. The .checkpoint files have been dropped in favour of scribbling
148 the offset into the input file itself.
150 5. When the process exits normally a final syslog entry covering
151 all of the peers over the life of the process is written. It
154 Jan 12 15:51:53 data innfeed.tester[24189]: ME global
155 seconds 2472 offered 43820 accepted 10506
156 refused 31168 rejected 1773 missing 39
158 6. SIGALRM now rolls the input file, rather than the log
159 file. This is useful in funnel file mode when you move the input
160 file and tell innd to flush it, then send innfeed the signal.
162 7. The location of the pid file, config file and status file, can
163 now be relative, in which case they're relative to the backlog
166 8. stdin, stdout and stderr are initialized properly when innfeed is
167 started by a process that has closed them.
169 9. Various values in config.h have changed (paths to look more like
170 values used in inn 1.5 and others to support point #7 above
173 10. procbatch.pl can now 'require' innshellvars.pl that comes with
174 1.5. The default is not to. You need to do a one line tweak if
175 you want it to. The defaults in procbatch.pl match the new
178 11. Core files that get generated on purpose will be done so in
179 CORE_DIRECTORY (as defined in config.h), if that is defined to a
180 pathname. If CORE_DIRECTORY is defined to be NULL (the default
181 now), then the core will be generated in the backlog directory (as
182 possibly modified by the '-b' option).
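The size limit on output backlog files (the ``-e'' option above) amounts to dropping the front of the file and restarting at a line boundary. The Python sketch below works on an in-memory byte string to show the arithmetic only. The function name is hypothetical, and it assumes that the file is shrunk to the limit itself once it exceeds the limit times the fudge factor. The real code works on files on disk.

```python
def shrink_backlog(data, limit, fudge=1.10):
    """Drop the oldest lines so that at most `limit` bytes of whole lines remain."""
    if len(data) <= limit * fudge:
        return data                 # still within the tolerated size
    cut = len(data) - limit         # first byte we would like to keep
    if data[cut - 1:cut] == b"\n":
        return data[cut:]           # the cut already falls on a line boundary
    tail = data[cut:]
    newline = tail.find(b"\n")
    if newline == -1:
        return b""                  # no complete line fits within the limit
    return tail[newline + 1:]       # slightly smaller: start after the next newline
```

Because the front of the data is discarded, the oldest entries are the ones that are lost.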
187 1. Now includes David Lawrence's patches to handle funnel files.
189 2. EAGAIN errors on read and writes are caught and dealt with (of
190 interest to Solaris `victims').
192 3. It is now much faster at servicing the file descriptor attached
193 to innd. This means it is faster at recognising it has been
194 flushed and at dropping connections. This means fewer
195 conflicts with new innfeeds starting before the old one has
196 finished up. It is still a good net-citizen and it finishes the
197 commands already started, so the fast response is only as fast
198 as your slowest peer, but it no longer tries to send
199 everything it had queued internally, and locks get released much
202 4. Includes Michael Hucka's patch to make the innfeed.status output
205 5. Includes Andy Vasilyev's HTML-in-innfeed.status patch (but you
206 have to enable it in config.h).
208 6. Added a '-a' (top of news spool) and a '-p' (pid file path)
213 1. Format of innfeed.conf file changed slightly (for per-peer
215 2. Including Greg Patten's innlog.pl (taken from
216 ftp://loose.apana.org.au/pub/local/innlog/innlog.pl)
217 3. Added Christophe Wolfhugel's patch to permit a per-peer
218 restriction on using streaming.
219 4. More robust handling of peers that return bad responses (no long
224 1. Massive syslog messages cleanup courtesy of kre.
225 2. The innlog.awk-patch has been dropped from the distribution
226 until the new syslog messages are dealt with.
227 3. State machine more robust in the face of unexpected responses
228 from remote. Connection gets torn down and bad response's
230 4. The fixed timers (article reception timeout, read timeout,
231 and flush timeout) are all adjusted by up to +/-10% so that
232 things aren't quite so synchronised.
233 5. The innfeed.status file has been expanded and reformatted to
234 include more information.
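The timer adjustment in item 4 above is a form of random jitter. Here is a minimal Python sketch of the idea. The function name and the uniform distribution are assumptions, since the text only gives the +/-10% range.

```python
import random


def jittered(timeout, spread=0.10, rng=random):
    """Scale timeout by a random factor between 1 - spread and 1 + spread."""
    return timeout * rng.uniform(1.0 - spread, 1.0 + spread)
```

With a nominal timeout of 300 seconds, each call returns a value between roughly 270 and 330 seconds, so timers that were started together drift apart.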
238 1. A change in the handing off of articles to connections in order to
239 encourage connections that were opened due to activity spikes,
240 to close down sooner.
241 2. The backlog files are no longer concatenated together at process
242 startup, but the .input is simply used if it exists, and if not
243 then the hand-dropped file is used first and the .output file
245 3. The innfeed.status is no longer updated by an innfeed that is in
247 4. Specifically catch the 480 response code from NNRPD when we try
249 5. The connection reestablishment time gets properly increased when
250 the connection fails to go through (up to and including the
251 reading of the banner message).
252 6. Bug fix that occasionally had articles sit in a connection and
254 7. Bug fix in the counter of number of sleeping connections.
255 8. Bug fix in config file parsing.
256 9. Procbatch.pl included.
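Item 5 above says that the reconnection delay grows when a connection attempt fails, but does not say how. The Python sketch below shows one common scheme, doubling with an upper bound. The doubling rule and the function name are assumptions, not innfeed's documented behaviour. The defaults reuse the 30 second retry interval and the max-reconnect-time value of 3600 that appear elsewhere in this file.

```python
def next_reconnect_delay(current, initial=30, maximum=3600, connected=False):
    """Return the delay to wait before the next connection attempt."""
    if connected:
        return initial                              # success: start over from the base delay
    return min(max(current, initial) * 2, maximum)  # failure: double, up to the cap
```

Starting from 30 seconds, repeated failures give 60, 120, 240 and so on until the delay reaches the 3600 second cap.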
258 Changes for version 0.8.1
260 1. various bug fixes.
261 2. core files generated by ASSERT are (possibly) put in a separate
262 directory to ease debugging.
264 Changes for version 0.8
266 1. The implicit state machine in the Connection objects has been
268 2. Various bug fixes.
270 Changes for version 0.7.1
272 1. Pulled the source to inet_addr.c from the bind distribution.
273 (Solaris and some others don't have it).
275 Changes for version 0.7
277 1. The backlog file mechanism has been completely reworked. There are
278 now only two backlog files: one for output and one for input. The
279 output file becomes the input file when the input file is
281 2. Much less strenuous use of writev. Solaris and other SVR4
282 machines have an amazingly low value for the maximum number of
283 iovecs that can be passed into writev.
284 3. Proper connection cleanup (QUIT issued) at shutdown.
285 4. A lock is taken out in the backlog directory for each peer. To feed
286 the same peer from two different instances of innfeed (with a
287 batch file for example), you must use another directory.
288 5. If you create a file in the backlog directory with the same name as the
289 peer, then that file will be used the next time backlog files are
290 processed. Its format must be:
294 where pathname is absolute, or relative to the top of the news
296 6. More command line options.
297 7. Dynamic peer creation. If the proper command line option is
298 used (-y) and innfeed is told to feed a peer that it doesn't
299 have in its config file, then it will create a new binding to
300 the new peer. The ip name must be the same as the peername,
301 i.e. if innd tells innfeed about a peer fooBarBat, then
302 gethostbyname("fooBarBat") better work.
303 8. Connections will be periodically torn down (1 hour is the
304 default), even if they're active, so that non-innd peers don't
305 have problems with their history files being kept open for too
307 9. The input backlog files are checkpointed every 30 seconds
308 so that a crash while processing a large backlog doesn't require
309 starting over from the beginning.
311 Changes for version 0.6
313 1. Logging of spooling of backlog only happens once per
314 stats-logging period.
316 Bugs/Problems/Notes etc:
318 1. There is no graceful handling of file descriptor exhaustion.
320 2. If the following situation occurs:
322 - articles on disk are NOT in NNTP-ready format.
323 - innfeed was built with HAVE_MMAP defined.
324 - memory usage is higher than expected
326 try running innfeed with the '-M' flag (or recompiling with
327 HAVE_MMAP undefined). Solaris, and possibly other SVR4 machines,
328 waste a lot of swap space.
330 3. On the stats logging the 'offered' may not equal the sum of the
331 other fields. This is because the stats at that moment were
332 generated while waiting for a response to a command to come
333 back. Innfeed considers an article ``offered'' when it sends the
334 command, not when it gets a response back. Perhaps this should
337 4. If all the Connections for a peer are idle and a new backlog file
338 is dropped in by hand, then it will not be picked up until the
339 next time it gets an article from innd for that peer. This will
340 be fixed in a later version, but for now, if the peer is likely
341 to be idle for a long time, then flush the process.
343 5. Adding a backlog file by hand does not cause extra Connections to
344 be automatically created, only the existing Connections will use
345 the file. If the extra load requires new Connections to be built
346 when innd delivers new articles for transmission, then they too
347 will use the file, but this is a side effect and not a direct
348 consequence. This means if you want to run in '-x' mode, then
349 make sure your config file entry states the correct number of
350 initial connections, as they're all the Connections that will be
353 6. If '-x' is used and the config file has an entry for a peer that
354 has no batch file to process, then innfeed will not exit after
355 all batch files have been finished--it will just sit there idle.
357 7. If the remote is running inn and only has you in the nnrp.access
358 file, then innfeed will end up talking to nnrpd. Innfeed will
359 try every 30 seconds to reconnect to a server that will accept
360 IHAVE commands, i.e. there is no exponential backoff of retry
361 attempts. This is because the connection is considered good once
362 the MODE STREAM command has been accepted or rejected (and nnrpd
367 1. Innfeed will eventually take exploder commands.
369 2. The config file will be revamped to allow for more global
370 options etc and run-time configuration. Too much is compile-time
371 dependent at the moment.
373 3. The connection retry time will get more sophisticated to catch
374 problems like the nnrpd issue mentioned above.
376 4. Include the number of takethis/check/ihave commands issued in
379 5. Heaps more stuff requested that's buried in my mail folders.
382 Any compliments, complaints, requests, porting issues etc. should go to
385 Many thanks to the following people for extra help (above and beyond the
386 call of duty) with patches, beta testing and/or suggestions:
388 Christophe Wolfhugel <wolf@pasteur.fr>
389 Robert Elz <kre@munnari.oz.au>
390 Russell Vincent <vincent@ucthpx.uct.ac.za>
391 Paul Vixie <paul@vix.com>
392 Stephen Stuart <stuart@pa.dec.com>
393 John T. Stapleton <stapes@mro.dec.com>
394 Alan Barrett <apb@iafrica.com>
395 Lee McLoughlin <lmjm@doc.ic.ac.uk>
396 Dan Ellis <ellis@mail.microserve.net>
397 Katsuhiro Kondou <kondou@uxd.fc.nec.co.jp>
398 Marc G. Fournier <scrappy@ki.net>
399 Steven Bauer <sbauer@msmailgw.sdsmt.edu>
400 Richard Perini <rpp@ci.com.au>
401 Per Hedeland <per@erix.ericsson.se>
402 Clayton O'Neill <coneill@premier.net>
403 Dave Pascoe <dave@mathworks.com>
404 Michael Handler <handler@netaxs.com>
405 Petr Lampa <lampa@fee.vutbr.cz>
406 David Lawrence <tale@uu.net>
407 Don Lewis <Don.Lewis@tsc.tdk.com>
408 Landon Curt Noll <noll@sgi.com>
410 If I've forgotten anybody, please let me know.
412 Thanks also to the ISC for sponsoring this work.