storage/timecaf/README.CAF

The timecaf storage manager is like the timehash storage manager, except that
it stores multiple articles in one file.  The file format is called CAF
(for "crunched article file", since it packs multiple articles together into
one big file) and uses a library, 'caf.c', dating back to the
pre-storage-manager days when I ran a locally hacked version of INN 1.5 that
used this code to boost performance on my system.  Originally I had planned to
use one big file per newsgroup, but it turns out that a time-based file layout,
rather than a newsgroup-name-based one, is (a) more efficient and (b) much
easier to fit into the current storage manager interface.  The pathnames for
the files are of the form
        <patharticles>/timecaf-nn/bb/aacc.CF
where 'nn' is the numeric storage class (same as in 'timehash') and the
file contains all articles written during the interval from
(time_t) 0xaabbcc00 to 0xaabbccFF.
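A minimal C sketch of how an arrival time maps to a pathname under this scheme. Assumptions: `timecaf_path()` and `caf_interval_start()` are hypothetical helpers, not INN functions, and 'nn' and the 'aa'/'bb'/'cc' fields are rendered as two lowercase hex digits each. The idea it shows: the directory comes from the second-highest byte of the 32-bit time, the file name from the highest and third-highest bytes, and the low byte is dropped, so each file covers 256 consecutive seconds.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: build <patharticles>/timecaf-nn/bb/aacc.CF for an
 * article written at `now` in storage class `class_num`.  aa, bb and cc are
 * the three high-order bytes of the 32-bit time value.  Assumption: every
 * field is printed as two lowercase hex digits.
 */
static int timecaf_path(char *buf, size_t len, const char *patharticles,
                        int class_num, time_t now)
{
    unsigned long t = (unsigned long) now & 0xffffffffUL;
    unsigned int aa = (t >> 24) & 0xff;
    unsigned int bb = (t >> 16) & 0xff;
    unsigned int cc = (t >> 8) & 0xff;

    /* directory is 'bb', file name is 'aa' followed by 'cc' */
    return snprintf(buf, len, "%s/timecaf-%02x/%02x/%02x%02x.CF",
                    patharticles, (unsigned int) class_num, bb, aa, cc);
}

/* First second of the 256-second window covered by the file for `now`. */
static unsigned long caf_interval_start(time_t now)
{
    return ((unsigned long) now & 0xffffffffUL) & ~0xffUL;
}
```

For example, an article stored in class 5 at time 0x12345678 would land in `<patharticles>/timecaf-05/34/1256.CF`, alongside everything else written between 0x12345600 and 0x123456FF.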

  The way expiration works on the 'timecaf' storage manager is a bit
complicated.  When articles are expired or cancelled (via SMcancel()),
they are initially just marked as expired in the .CF file; no disk space
is actually freed yet.  But if fastrm/SMcancel() notices that a certain
amount of space has been marked as free, it does a sort of garbage
collection pass on the file, writing out a new file containing only the
articles from the old file that have not yet expired, and then removing
the old file.  If fastrm notices that *all* the articles in a file have
been expired, it just deletes the file and doesn't create a new one.
This means that if your setup puts newsgroups with differing expiration
lengths in the same timecaf storage class, everything will work OK, but
your expire runs will spend some extra time copying files around.  In my
experience this hasn't been much of a problem.  If you find that it is,
you may wish to divide up your spool layout so that each storage class
gets newsgroups that expire at more or less the same time, or put
*.binaries in their own storage class.
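The expire behaviour above can be sketched in C as a decision function plus a compaction pass. This is a toy model, not the caf.c code: `struct toy_article`, `caf_expire_decide()` and `caf_compact()` are hypothetical, and `rewrite_threshold` stands in for the "certain amount of space" mentioned in the text, whose real value is not given here.

```c
#include <stddef.h>

/* What to do with a CAF file after some of its articles were expired. */
enum caf_expire_action {
    CAF_KEEP,      /* only mark TOC entries and blocks free; reclaim nothing */
    CAF_REWRITE,   /* copy surviving articles to a new file, remove the old */
    CAF_DELETE     /* every article expired: just delete the file */
};

/* Hypothetical policy; rewrite_threshold is an assumed tuning parameter. */
static enum caf_expire_action
caf_expire_decide(size_t live_articles, size_t free_bytes,
                  size_t rewrite_threshold)
{
    if (live_articles == 0)
        return CAF_DELETE;
    if (free_bytes >= rewrite_threshold)
        return CAF_REWRITE;
    return CAF_KEEP;
}

/* Toy article record: size, position, and whether fastrm marked it. */
struct toy_article {
    unsigned long size;      /* article length in bytes */
    unsigned long offset;    /* byte offset within the file */
    int expired;             /* nonzero once marked expired/cancelled */
};

/*
 * Garbage collection pass: copy only live articles into `out`, packing each
 * one at the next block boundary from data_start.  Returns how many were kept.
 */
static size_t caf_compact(const struct toy_article *in, size_t n,
                          struct toy_article *out,
                          unsigned long data_start, unsigned long blocksize)
{
    size_t kept = 0;
    unsigned long pos = data_start;

    for (size_t i = 0; i < n; i++) {
        if (in[i].expired)
            continue;
        out[kept] = in[i];
        out[kept].offset = pos;
        /* advance by the article size rounded up to whole blocks */
        pos += (in[i].size + blocksize - 1) / blocksize * blocksize;
        kept++;
    }
    return kept;
}
```

A real rewrite would also have to keep each article in its original TOC slot (its article number), which this toy omits; it only shows why mixing long- and short-lived groups in one file leads to extra copying.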

Some advantages and disadvantages compared to the 'timehash' and
'CNFS' storage methods:

  timecaf is somewhat faster than timehash at writing articles (tests
I ran on the old news.ecn.uoknor.edu showed a roughly 4x improvement in
artwrite speed).  This is presumably due to improved locality of reference
and to opening/closing article files only every 4 minutes or so (each CAF
file covers a 256-second interval) rather than once per article.  Artcancel
speed, on the other hand, is not much different, because cancel requests
have terrible locality of reference.  Expire times also seem to be
generally somewhat faster than with timehash, even given the extra copying
overhead mentioned above.

  Timecaf is probably slower than CNFS, but I haven't had a chance
to do any comparison tests.  Like timehash, timecaf gives you much
finer-grained control over expire times (on a group-by-group basis,
if needed) than CNFS does.

Below is an old README with more about the implementation details
of the CAF file format.  Most people won't care about this, but if you're
curious, read on; it also describes some of the history of this code.
I've been running some version of this code off and on for the past two
years, and as a storage manager module for the past few months, so I'm
pretty sure of its stability.

                        Richard Todd
        (rmtodd@mailhost.ecn.ou.edu/rmtodd@servalan.servalan.com)

Implementation details (format of a CAF file) and some design rationale:

 See include/caf.h for the details; in outline, each CAF file has a
blocksize associated with it (usually 512 bytes, but it can vary), and is
laid out as follows:
  1.    Header (~52 bytes) containing information like low and high
article numbers, amount of free space, and blocksize.
  2.    Free space bitmap (size given by the FreeZoneTabSize field of the
header).
  3.    CAFTOCENTs (CAF Table of Contents Entries), one for each article
the file can hold.  Each CAFTOCENT gives the article's size, creation time,
and offset in the CAF file.  Space is usually allotted in the CAF file
for 64K CAFTOCENTs, even if the number of articles in the CAF file is
nowhere near that.  The unused CAFTOCENTs are all zeros, which is why
CAF files are almost always sparse.
  4.    Articles, always stored starting at blocksize boundaries.
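To make the layout concrete, here is a C sketch that computes where each region starts. The ~52-byte header and 64K TOC entries come from the text; the 24-byte CAFTOCENT size used in the example, the helper names, and the assumption that the first article begins at the first block boundary after the TOC are illustrative only, so check include/caf.h for the real structures.

```c
#include <stddef.h>

/* Round x up to the next multiple of blocksize (blocksize must be > 0). */
static unsigned long round_up(unsigned long x, unsigned long blocksize)
{
    return (x + blocksize - 1) / blocksize * blocksize;
}

/* Hypothetical summary of where each region of a CAF file begins. */
struct caf_layout {
    unsigned long bitmap_start;  /* free space bitmap, right after header */
    unsigned long toc_start;     /* array of CAFTOCENTs */
    unsigned long data_start;    /* first article, block aligned */
};

/*
 * Region sizes are passed in rather than taken from caf.h; this is a sketch
 * of the ordering header -> bitmap -> TOC -> articles, not the real code.
 */
static struct caf_layout
caf_compute_layout(unsigned long header_size, unsigned long freezone_tab_size,
                   unsigned long toc_entries, unsigned long toc_entry_size,
                   unsigned long blocksize)
{
    struct caf_layout l;

    l.bitmap_start = header_size;
    l.toc_start = header_size + freezone_tab_size;
    l.data_start = round_up(l.toc_start + toc_entries * toc_entry_size,
                            blocksize);
    return l;
}
```

With a 52-byte header, a 512-byte bitmap, 65536 entries of 24 bytes and 512-byte blocks, the TOC alone spans about 1.5MB and articles start at byte 1573888; because unused entries are zero, most of that range can stay a hole in a sparse file.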

When fastrm is told to remove an article, the article is not actually
removed as such; it is merely marked as non-existent (its CAFTOCENT is
zeroed), and the blocks taken up by the article are marked as 'free'
in the free space bitmap.  When innd writes an article to a CAF file,
it first checks whether the CAF file has a contiguous chunk of free
blocks large enough to hold the article; if so, it writes the article
into those blocks and marks them as in use.  If there is no suitable
free chunk in the CAF file, innd simply appends the article to the end
of the CAF file and records the article's position in the TOC.  [Given
the way the timecaf storage manager currently uses the CAF code, we are
almost always appending to the end of the file.]
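The allocation policy described here (reuse a large-enough run of free blocks, otherwise append) can be sketched as a first-fit search. This is a simplified, hypothetical illustration: `caf_alloc_blocks()` is not a caf.c function, it assumes a set bit means "free", and it scans a single flat bitmap without the index bitmap the real format uses.

```c
#include <limits.h>
#include <stddef.h>

/* Nonzero if data block i is free (assumption: set bit == free). */
static int block_is_free(const unsigned char *bitmap, size_t i)
{
    return (bitmap[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1;
}

/* Clear block i's bit, i.e. mark it as in use. */
static void mark_used(unsigned char *bitmap, size_t i)
{
    bitmap[i / CHAR_BIT] &= (unsigned char) ~(1u << (i % CHAR_BIT));
}

/*
 * First fit: find `need` (> 0) contiguous free blocks among `nblocks`,
 * mark them in use and return the first block number.  Returns (size_t) -1
 * if no run is long enough; the caller then appends at end of file instead.
 */
static size_t caf_alloc_blocks(unsigned char *bitmap, size_t nblocks,
                               size_t need)
{
    size_t run = 0;

    for (size_t i = 0; i < nblocks; i++) {
        run = block_is_free(bitmap, i) ? run + 1 : 0;
        if (need > 0 && run == need) {
            size_t start = i + 1 - need;
            for (size_t j = start; j <= i; j++)
                mark_used(bitmap, j);
            return start;
        }
    }
    return (size_t) -1;
}
```

For instance, with blocks 3-4 and 9-11 free, a 3-block article goes into blocks 9-11, a 2-block article then reuses 3-4, and anything after that falls through to the append path.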

   A note on the free bitmap portion of the CAF file: it's not just a simple
bitmap (one where each bit tells whether a data block is in use or free).
First there is an 'index' bitmap, which tells which blocks of the 'main'
bitmap have free blocks listed in them, and then a 'main' bitmap, which
tells whether the data blocks are in use or free.  This setup means that we
can have bitmaps for CAF files as large as 8GB while still being able to
find free space by reading only the 'index' bitmap and one block of the
'main' bitmap.  (Previous versions of the CAF code had just a 'main' bitmap
and scaled the blocksize up when CAF files got large; this became rather,
um, non-optimal when control.cancel started to hit hundreds of thousands of
articles and 8K blocksizes.)  In practice, CAF files over 2GB or 4GB may be
a problem because of signed/unsigned long issues, and ones over 4GB are
probably impossible on anything besides an Alpha unless you track down all
the places in innd that assume off_t is a long and fix them to work with
long longs.
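The two-level lookup can be sketched in C as follows. Assumptions: a set bit means "free" in both bitmaps, the main bitmap is stored as consecutive blocksize-byte blocks, and the helper names are hypothetical. `caf_max_bytes()` shows one way the 8GB figure can arise: if the index bitmap is a single 512-byte block, it covers 4096 main-bitmap blocks, each covering 4096 data blocks of 512 bytes, and 4096 * 4096 * 512 bytes is 8GB. Whether caf.c sizes its index exactly this way is an assumption.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Largest file (in bytes) describable when each index bit covers one
 * main-bitmap block of blocksize bytes and each main bit covers one
 * data block of blocksize bytes.
 */
static unsigned long long caf_max_bytes(unsigned long long index_bytes,
                                        unsigned long long blocksize)
{
    unsigned long long main_blocks = index_bytes * CHAR_BIT;
    unsigned long long data_blocks_per_main = blocksize * CHAR_BIT;
    return main_blocks * data_blocks_per_main * blocksize;
}

static int test_bit(const unsigned char *map, size_t i)
{
    return (map[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1;
}

/*
 * Find one free data block: use the index bitmap to pick a main-bitmap
 * block that has free space, then read only that block of the main bitmap.
 * Returns the data block number, or (size_t) -1 if nothing is free.
 */
static size_t caf_find_free(const unsigned char *index_map, size_t n_main,
                            const unsigned char *main_map, size_t blocksize)
{
    size_t bits_per_main = blocksize * CHAR_BIT;

    for (size_t k = 0; k < n_main; k++) {
        if (!test_bit(index_map, k))
            continue;                   /* no free blocks listed here */
        const unsigned char *blk = main_map + k * blocksize;
        for (size_t j = 0; j < bits_per_main; j++)
            if (test_bit(blk, j))
                return k * bits_per_main + j;
    }
    return (size_t) -1;
}
```

The payoff is that the scan touches at most the index plus one main-bitmap block, instead of the whole main bitmap, regardless of how large the file has grown.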

  At some point I'd also like to try some other, more efficient
directory layout for the CAF files, as opposed to the old
/var/spool/news/newsgroup/name/component/ scheme.  At the time I
started implementing this, it seemed like it'd be too much of a hassle
to change this in INN as it stood.  I'm hoping that changing this
assumption (that newsgroup a.b.c is always in directory a/b/c) will be
easier once INN acquires a nice interface for specifying alternate
storage managers.  [It is and it isn't; as I said, we've currently abandoned
any relationship between newsgroup names and CAF file names, which
provided a sizable jump in performance.  Before that, I had changed the code
so that the CAF file for, e.g., alt.tv.babylon-5 was
/var/spool/news/alt/tv/babylon-5.CF; note the final . instead of a /.
This pretty much bypasses the need to read the 'terminal' layer of
directories, and means that those directory blocks do not compete with
other blocks for the limited space in the buffer cache.  This provides
more of an improvement than you might think; throughput on
news.ecn.uoknor.edu went from 160,000 articles/day to >200,000 articles/day
with this patch, and this is on an aging 32MB 486/66.]
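For reference, the older newsgroup-based naming scheme described in the bracketed note (since abandoned in favor of the time-based layout) can be sketched in C. `old_caf_path()` is a hypothetical helper, not INN code: it turns every '.' in the group name into '/', except that the last component becomes a file with a .CF suffix rather than a directory.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch of the superseded scheme: group a.b.c maps to
 * <spool>/a/b/c.CF.  Returns 0 on success, -1 if buf is too small.
 */
static int old_caf_path(char *buf, size_t len, const char *spool,
                        const char *group)
{
    int n = snprintf(buf, len, "%s/%s.CF", spool, group);
    if (n < 0 || (size_t) n >= len)
        return -1;

    /* Rewrite dots only inside the group-name part, not in ".CF". */
    char *p = buf + strlen(spool) + 1;
    char *suffix = buf + n - 3;
    for (; p < suffix; p++)
        if (*p == '.')
            *p = '/';
    return 0;
}
```

Making the final component a file means the kernel never has to read a separate leaf directory for each group, which is the saving in directory-block reads described above.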