The timecaf storage manager is like the timehash storage manager, except that it stores multiple articles in one file. The file format is called CAF (for "crunched article file", putting multiple articles together into one big file), and uses a library 'caf.c' dating back to the pre-storage manager days when I made a locally-hacked version of INN1.5 that used this code in order to boost performance on my system. Originally I had planned to do one big file per newsgroup, but it turns out that a time-based file layout rather than a newsgroup-name-based one is (a) more efficient and (b) much easier to fit into the current storage manager interface paradigm. Anyway, the pathnames for the files are of the form

    /timecaf-nn/bb/aacc.CF

where 'nn' is the numeric storage class (same as in 'timehash') and the file contains all articles written during the interval from (time_t) 0xaabbcc00 to 0xaabbccFF.

The way expiration works on the 'timecaf' storage manager is a bit complicated. When articles are expired or cancelled (via SMcancel()) they are at first just marked as expired in the .CF file -- no actual disk space is freed at first. But if fastrm/SMcancel() notices that a certain amount of space has been marked as free, then it will do a sort of garbage collection pass on the file, writing out a new file containing only the articles from the old file that have not yet expired and removing the old file. If fastrm notices that *all* the articles in a file have been expired, it just deletes the file and doesn't create a new one.

This means that if your setup has newsgroups with differing expiration lengths put in the same timecaf storage class, everything will work ok but your expire runs will spend some extra time copying files about. In my experience this hasn't been too much of a problem.
If you find that it is a problem, you may wish to consider dividing up your spool layout so each storage class gets newsgroups that expire at more-or-less the same time, or putting *.binaries in their own storage class.

Some advantages and disadvantages compared to the 'timehash' and 'CNFS' storage methods:

timecaf is somewhat faster than timehash in writing articles (the tests I did on the old news.ecn.uoknor.edu showed a roughly 4x improvement in artwrite speed). This is presumably due to improved locality of reference and not having to open/close article files all the time but only every 4 minutes or so. Artcancel speed, on the other hand, is not much different, because cancel requests have terrible locality of reference. Expire times seem to be generally somewhat faster than timehash as well, even given the extra copying overhead mentioned above.

Timecaf is probably slower than CNFS, but I haven't had a chance to do any comparison tests. Timecaf does share the feature with timehash that you can get much more fine-tuned control of your expire times (on a group-by-group basis, if needed) than you can with CNFS.

Down below is an old README telling more about the implementation details of the CAF file format. Most people won't care about this, but if you're curious, read on; it also tells some of the historical developments that went on in this code. I've been running some version of this code off and on for the past two years, and have been running it as a storage manager module for the past few months, so I'm pretty sure of its stability.

Richard Todd
(rmtodd@mailhost.ecn.ou.edu/rmtodd@servalan.servalan.com)

Implementation details (format of a CAF file) and some design rationale:

Look at include/caf.h for the details, but basically, the layout is something like this. Each CAF file has a blocksize associated with it (usually 512 bytes, but it can vary). The layout of a CAF file is as follows:

  1. Header (~52 bytes) containing information like low and high article numbers, amount of free space, blocksize.

  2. Free space bitmap (size given by the FreeZoneTabSize field of the header).

  3. CAFTOCENTs (CAF Table of Contents Entries), one per article storable in the file. Each CAFTOCENT gives the article's size, creation time, and offset in the CAF file. Usually space is allotted in the CAF file for 64K CAFTOCENTs, even if the number of articles in the CAF file is nowhere near that amount. The unused CAFTOCENTs are all zeros, and this means CAF files are almost always sparse.

  4. Articles, always stored starting at blocksize boundaries.

When fastrm is told to remove an article, the article is not actually removed as such; it is merely marked as non-existent (the CAFTOCENT is zeroed), and the blocks taken up by the article are marked as 'free' in the free space bitmap. When innd writes an article to a CAF file, it first looks to see if the CAF file has any free blocks in a contiguous chunk large enough to handle the article, and if so writes the article into those blocks and marks those blocks as being in use. If there is no suitable free space chunk in the CAF file, then innd merely appends the article to the end of the CAF file and records the article's position in the TOC. [Given the way the CAF code is currently used by the timecaf storage manager, it's almost always the case that we're appending to the end of the file.]

A note on the free bitmap portion of the CAF file: it's not just a simple bitmap (in which each bit tells whether a data block is in use or free). First there is an 'index' bitmap which tells which blocks of the 'main' bitmap have free blocks listed in them, and then a 'main' bitmap which tells whether the data blocks are in use or free. This setup means that we can have bitmaps for CAF files as large as 8GB, while still being able to find free space by only reading the 'index' bitmap and one block of the 'main' bitmap.
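
To make the 'index'/'main' split concrete, here is a small Python sketch of a two-level free-space map. It is only an illustration of the idea, not the layout in caf.c: the class name, method names, and sizes are all made up for this example, the bitmaps are held as in-memory lists rather than on-disk blocks, and the search is simplified to free runs that fit within a single block of the 'main' bitmap.

```python
class TwoLevelBitmap:
    """Toy two-level free-space map; names and sizes are illustrative only."""

    def __init__(self, bits_per_block, num_bitmap_blocks):
        self.bits_per_block = bits_per_block
        # 'main' bitmap: one list per bitmap block; True means the data
        # block is free. Everything starts out marked as in use.
        self.main = [[False] * bits_per_block for _ in range(num_bitmap_blocks)]
        # 'index' bitmap: True means that block of the 'main' bitmap
        # lists at least one free data block.
        self.index = [False] * num_bitmap_blocks

    def mark_free(self, data_block):
        blk, bit = divmod(data_block, self.bits_per_block)
        self.main[blk][bit] = True
        self.index[blk] = True

    def mark_used(self, data_block):
        blk, bit = divmod(data_block, self.bits_per_block)
        self.main[blk][bit] = False
        self.index[blk] = any(self.main[blk])

    def find_free_run(self, nblocks):
        """Return the first data block of a free run of nblocks, or None.

        Simplification: only runs inside one 'main' bitmap block are found.
        """
        for blk, has_free in enumerate(self.index):
            if not has_free:
                continue  # skipped without looking at the 'main' bitmap
            run = 0
            for bit, free in enumerate(self.main[blk]):
                run = run + 1 if free else 0
                if run == nblocks:
                    return blk * self.bits_per_block + bit - nblocks + 1
        return None  # no suitable chunk: the caller would append instead
```

With `bm = TwoLevelBitmap(8, 4)`, calling `bm.mark_free(10)` and `bm.mark_free(11)` sets only the second 'index' entry, and `bm.find_free_run(2)` then finds the run starting at data block 10 while skipping the first block of the 'main' bitmap entirely.
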
(Previous versions of the CAF code had just a 'main' bitmap and scaled the blocksize up when CAF files got large; this became rather, um, non-optimal when control.cancel started to hit hundreds of thousands of articles and 8K blocksizes.) In practice, CAF files over 2GB or 4GB may be a problem because of unsigned/signed long problems, and ones over 4GB are probably impossible on anything besides an Alpha unless you track down all the places in innd where they assume off_t is a long and fix them to work with long longs.

At some point I'd also like to try some other, more efficient directory layout for the CAF files, as opposed to the old /var/spool/news/newsgroup/name/component/ scheme. At the time I started implementing this, it seemed like it'd be too much of a hassle to change this in INN as it stands. I'm hoping that changing this assumption (that newsgroup a.b.c is always in directory a/b/c) will be easier once INN acquires a nice interface for specifying alternate storage managers.

[It is and it isn't; as I said, we've currently abandoned all relationship between newsgroup names and CAF file names, which provided a sizable jump in performance. Before that, I had changed the code so that the CAF file for, e.g., alt.tv.babylon-5 would be /var/spool/news/alt/tv/babylon-5.CF -- note the final . instead of a /. This pretty much bypasses the need for the 'terminal' layer of directories to be read, and means that these directory blocks will not be fighting with other blocks for the limited space available in the buffer cache. This provides more of an improvement than you might think; throughput on news.ecn.uoknor.edu went from 160,000 articles/day to >200,000 articles/day with this patch, and this is on an aging 32M 486/66.]
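
For comparison, the two naming schemes mentioned in this file can be sketched in a few lines of Python. This is a rough illustration, not code from INN: the function names are made up, the path is shown exactly as written above (with nothing in front of /timecaf-nn), and rendering each field as two lowercase hex digits is an assumption on my part.

```python
def timecaf_path(arrival, storage_class):
    """Build the /timecaf-nn/bb/aacc.CF name for a 32-bit arrival time.

    Assumption: each field is written as two lowercase hex digits.
    """
    aa = (arrival >> 24) & 0xFF
    bb = (arrival >> 16) & 0xFF
    cc = (arrival >> 8) & 0xFF  # low byte dropped: 256 seconds per file
    return "/timecaf-%02x/%02x/%02x%02x.CF" % (storage_class, bb, aa, cc)


def old_caf_path(newsgroup, spool="/var/spool/news"):
    """Build the earlier name-based path, e.g. .../alt/tv/babylon-5.CF."""
    return "%s/%s.CF" % (spool, newsgroup.replace(".", "/"))
```

Under these assumptions every arrival time from 0x12345600 to 0x123456FF in storage class 1 maps to the same file, /timecaf-01/34/1256.CF, which is where the "every 4 minutes or so" figure for opening a new file comes from (256 seconds).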