+++ /dev/null
-The timecaf storage manager is like the timehash storage manager, except that
-it stores multiple articles in one file. The file format is called CAF
-(for "crunched article file", putting multiple articles together into one big
-file), and uses a library 'caf.c' dating back from the pre-storage manager
-days when I made a locally-hacked version of INN1.5 that used this
-code in order to boost performance on my system. Originally I had planned to
-do one big file per newsgroup, but it turns out that a time-based file layout
-rather than newsgroup-name-based is a. more efficient and b. much easier to
-fit into the current storage manager interface paradigm. Anyway, the
-pathnames for the files are of the form
- <patharticles>/timecaf-nn/bb/aacc.CF
-where 'nn' is the numeric storage class (same as in 'timehash') and the
-file contains all articles written during the interval from
-(time_t) 0xaabbcc00 to 0xaabbccFF.
-
- The way expiration works on the 'timecaf' storage manager is a bit
-complicated. When articles are expired or cancelled (via SMcancel())
-they are at first just marked as expired in the .CF file -- no actual
-disk space is freed at first. But if fastrm/SMcancel() notices that a
-certain amount of space has been marked as free, then it will do a
-sort of garbage collection pass on the file, writing out a new file
-containing only the articles from the old file that have not yet
-expired and removing the old file. If fastrm notices that *all* the
-articles in a file have been expired, it just deletes the file and
-doesn't create a new one. This means that if your setup has
-newsgroups with differing expiration lengths put in the same timecaf
-storage class, everything will work ok but your expire runs will spend
-some extra time copying files about. In my experience this hasn't been too
-much of a problem. If you find that it is a problem, you may wish to
-consider dividing up your spool layout so each storage class gets newsgroups
-that expire at more-or-less the same time, or putting *.binaries in their own
-storage class.
-
-Some advantages and disadvantages compared to the 'timehash' and
-'CNFS' storage methods:
-
- timecaf is somewhat faster than timehash in writing articles (the tests
-I did on the old news.ecn.uoknor.edu showed a roughly 4x improvement in
-artwrite speed). This is presumably due to improved locality of reference and
-not having to open/close article files all the time but only every 4 minutes or
-so. Artcancel speed, on the other hand, is not much different, because
-cancel requests have terrible locality of reference. Expire times seem
-to be generally somewhat faster than timehash as well, even given the
-extra copying overhead mentioned above.
-
- Timecaf is probably slower than CNFS, but I haven't had a chance
-to do any comparison tests. Timecaf does share the feature with timehash
-that you can get much more fine-tuned control of your expire times (on a
-group-by-group basis, if needed) than you can with CNFS.
-
-Down below is an old README telling more about the implementation details
-of the CAF file format. Most people won't care about this, but if you're
-curious, read on; it also tells some of the historical developments that
-went on in this code. I've been running some version of this code off and
-on for the past two years, and have been running it as a storage manager
-module for the past few months, so I'm pretty sure of it's stability.
-
- Richard Todd
- (rmtodd@mailhost.ecn.ou.edu/rmtodd@servalan.servalan.com)
-
-\f
-Implementation details (format of a CAF file) and some design rationale:
-
- Look at include/caf.h for the details, but basically, the layout is
-something like this. Each CAF file has a blocksize associated with it
-(usually 512 bytes, but it can vary). The layout of a CAF file is as
-follows:
- 1. Header (~52 bytes) containing information like low and high
-article numbers, amount of free space, blocksize.
- 2. Free space bitmap (size given by the FreeZoneTabSize field of the
-header).
- 3. CAFTOCENTs (CAF Table of Contents Entries), 1/article storable
-in the file. Each CAFTOCENT gives the article's size, creation time,
-and offset in the CAF file. Usually space is alloted in the CAF file
-for 64K CAFTOCENTs, even if the # of articles in the CAF file is
-nowhere near that amount. The unused CAFTOCENTs are all zeros, and
-this means CAF files are almost always sparse.
- 4. Articles, always stored starting at blocksize boundaries.
-
-When fastrm is told to remove an article, the article is not actually
-removed as such, it is merely marked as non-existent (the CAFTOCENT is
-zeroed), and the blocks taken up by the article are marked as 'free'
-in the free space bitmap. When innd writes an article to a CAF file,
-it first looks to see if the CAF file has any free blocks in a
-contiguous chunk large enough to handle the article, and if so writes
-the article into those blocks and marks those blocks as being in use.
-If there is no suitable free space chunk in the CAF file, then innd
-merely appends the article to the end of the CAF file and records the
-article's position in the TOC. [Given the way the CAF code is currently
-used by the timecaf storage manager, it's almost always the case that we're
-appending to the end of the file.]
-
- A note on the free bitmap portion of the CAF file: it's not just a simple
-bitmap (each bit of the bitmap tells whether a data block is in use or free.)
-First there is an 'index' bitmap which tells which blocks of the 'main' bitmap
-have free blocks listed in them, and then a 'main' bitmap which tells whether
-the data blocks are in use or free. This setup means that we can have
-bitmaps for CAF files as large as 8GB, while still being able to find free
-space by only reading the 'index' bitmap and one block of the 'main' bitmap.
-(Previous versions of the CAF code had just a 'main' bitmap and scaled the
-blocksize up when CAF files got large; this became rather, um, non-optimal
-when control.cancel started to hit hundreds of thousands of articles and 8K
-blocksizes.) In practice, CAF files over 2GB or 4GB may be a problem because
-of unsigned/signed long problems, and ones over 4G are probably impossible
-on anything besides an Alpha unless you track down all the places in innd
-where they assume off_t is a long and fix it to work with long longs.
-
- At some point I'd also like to try some other, more efficient
-directory layout for the CAF files, as opposed to the old
-/var/spool/news/newsgroup/name/component/ scheme. At the time I
-started implementing this, it seemed like it'd be too much of a hassle
-to change this in INN as it stands. I'm hoping that changing this
-assumption (that newsgroup a.b.c is always in directory a/b/c) will be
-easier once INN acquires a nice interface for specifying alternate
-storage managers. [It is and it isn't; as I said, we've currently abandoned
-all relationship between newsgroup names and CAF file names, which
-provided a sizable jump in performance. Before that, I had changed the code
-so that the CAF file for, e.g.,
-alt.tv.babylon-5 will now be /var/spool/news/alt/tv/babylon-5.CF -- note the
-final . instead of a /. This pretty much bypasses the need for the 'terminal'
-layer of directories to be read, and means that these directory blocks will not
-be fighting with other blocks for the limited space available in the buffer
-cache. This provides more of an improvement than you might think; thruput on
-news.ecn.uoknor.edu went from 160,000 articles/day to >200,000 articles/day
-with this patch, and this is on an aging 32M 486/66.]