doc/pod/ovdb.pod

=head1 NAME

ovdb - Overview storage method for INN

=head1 DESCRIPTION

Ovdb is a storage method that uses the BerkeleyDB library to store
overview data.  It requires version 2.6.x or later of the BerkeleyDB
library, but has mostly been tested with versions 3 and 4.

Ovdb makes use of the full transaction/logging/locking functionality of
the BerkeleyDB environment.  BerkeleyDB may be downloaded from
L<http://www.sleepycat.com> and is needed to build the ovdb backend.

=head1 UPGRADING

This is version 2 of ovdb.  If you have a database created with a previous
version of ovdb (such as the one shipped with INN 2.3.0), your database
will need to be upgraded using ovdb_init(8).  See the ovdb_init(8) man
page for upgrade instructions.

=head1 INSTALLATION

To build ovdb support into INN, specify the option C<--with-berkeleydb>
when running the configure script.  By default, configure will search for
a BerkeleyDB tree in several likely locations and choose the highest
version (based on the name of the directory, e.g., F<BerkeleyDB.3.0>) that
it finds.  A message in the configure output indicates the chosen
pathname.

You can override this pathname by adding a path to the option, e.g.,
C<--with-berkeleydb=/usr/BerkeleyDB.3.1>.  This directory is expected to
have subdirectories F<include> and F<lib>, containing F<db.h> and the
library itself, respectively.

The ovdb database will take up more disk space for a given spool than the
other overview methods.  Plan on needing at least 1.1 KB for every article
in your spool (not counting crossposts).  So, if you have 5 million
articles, you'll need at least 5.5 GB of disk space for ovdb.  With
BerkeleyDB 2.x, the db files are 'grow only': the library will not shrink
them, even if data is removed, so reserving extra space above the
estimate is a good idea.  You will also need additional space for
transaction logs: at least 100 MB.  By default the transaction logs go in
the same directory as the database.  To improve performance, they can be
placed on a different disk; see L</DB_CONFIG> below.

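The sizing rule of thumb above is plain arithmetic; the Python sketch below encodes it so it can be rerun for your own article count.  The 1.1 KB per article and 100 MB of logs come from this section; the decimal units (1 GB = 1,000,000 KB = 1000 MB) and the optional C<headroom> fraction are assumptions added for illustration, and the function name is hypothetical.

```python
# Rules of thumb from this section: at least 1.1 KB per article (not counting
# crossposts) plus at least 100 MB of transaction logs.  Decimal units are an
# assumption here (1 GB = 1,000,000 KB = 1000 MB), which is what turns
# 5 million articles into 5.5 GB.
KB_PER_ARTICLE = 1.1
MIN_LOG_MB = 100


def ovdb_space_estimate_gb(articles: int, headroom: float = 0.0) -> float:
    """Estimate the minimum disk space, in GB, for ovdb data plus logs.

    headroom is a hypothetical extra fraction (e.g. 0.25 for 25%) reserved
    for the data files, since the library may not shrink them.
    """
    if articles < 0 or headroom < 0:
        raise ValueError("articles and headroom must be non-negative")
    data_gb = articles * KB_PER_ARTICLE / 1_000_000  # KB -> GB
    log_gb = MIN_LOG_MB / 1000                       # MB -> GB
    return data_gb * (1.0 + headroom) + log_gb
```

For 5 million articles this gives about 5.6 GB: the 5.5 GB data estimate plus 0.1 GB of logs.  Treat the result as a floor, not a measurement.
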
=head1 CONFIGURATION

To enable ovdb, set the I<ovmethod> parameter in F<inn.conf> to C<ovdb>.
The ovdb database is stored in the directory specified by the
I<pathoverview> parameter in F<inn.conf>.  This is the "DB_HOME" directory.
To start out, this directory should be empty (other than an optional
F<DB_CONFIG> file; see L</DB_CONFIG> for details), and B<innd> (or
B<makehistory>) will create the files as necessary in that directory.
Make sure the directory is owned by the news user.

Other parameters for configuring ovdb are in the ovdb.conf(5)
configuration file.  See also the sample F<ovdb.conf>.

=over 4

=item cachesize

Size of the memory pool cache, in kilobytes.  The cache will have a
backing store file in the DB directory which will be at least as big.  In
general, the bigger the cache, the better.  Use C<ovdb_stat -m> to see
cache hit percentages.  To make a change of this parameter take effect,
shut down and restart INN (be sure to kill all of the nnrpds when shutting
down).  Default is 8000, which is adequate for small to medium-sized
servers.  Large servers will probably need at least 20000.

=item numdbfiles

Overview data is split between this many files.  Currently, B<innd> will
keep all of the files open, so don't set this too high or B<innd> may run
out of file descriptors.  B<nnrpd> only opens one at a time, regardless.
It may be set to one, or just a few, but only do that if your OS supports
large (>2 GB) files.  Changing this parameter has no effect on an
already-established database.  Default is 32.

=item txn_nosync

If txn_nosync is set to false, BerkeleyDB flushes the log after every
transaction.  This minimizes the number of transactions that may be lost
in the event of a crash, but results in significantly degraded
performance.  Default is true.

=item useshm

If useshm is set to true, BerkeleyDB will use shared memory instead of
mmap for its environment regions (cache, lock, etc.).  On some platforms,
this may improve performance.  Default is false.  This parameter is
ignored if you have BerkeleyDB 2.x.

=item shmkey

Sets the shared memory key used by BerkeleyDB when I<useshm> is true.
BerkeleyDB will create several (usually 5) shared memory segments, using
sequentially numbered keys starting with I<shmkey>.  Choose a key that does
not conflict with any existing shared memory segments on your system.
Default is 6400.  This parameter is only used with BerkeleyDB 3.1 or
newer.

=item pagesize

Sets the page size for the DB files (in bytes).  Must be a power of 2.
Best choices are 4096 or 8192.  The default is 8192.  Changing this
parameter has no effect on an already-established database.

=item minkey

Sets the minimum number of keys per page.  See the BerkeleyDB
documentation for more info.  Default is based on page size:

    default_minkey = MAX(2, pagesize / 2048 - 1)

With the default page size of 8192, this works out to 3; with 4096, it is
2.  The lowest allowed minkey is 2.  Setting minkey higher than the default
is not recommended, as it will cause the databases to have a lot of
overflow pages.  Changing this parameter has no effect on an
already-established database.

=item maxlocks

Sets the BerkeleyDB "lk_max" parameter, which is the maximum number of
locks that can exist in the database at the same time.  Default is 4000.

=item nocompact

The nocompact parameter affects expireover's behavior.  The expireover
function in ovdb can do its job in one of two ways: by simply deleting
expired records from the database, or by rewriting the overview records
into a different location, leaving out the expired records.  The first
method is faster, but it leaves 'holes' that result in space that cannot
immediately be reused.  The second method 'compacts' the records by
rewriting them.

If this parameter is set to 0, expireover will compact all newsgroups; if
set to 1, expireover will not compact any newsgroups; and if set to a
value greater than one, expireover will only compact groups that have
fewer than that number of articles.

Experience has shown that compacting has minimal effect (other than
making expireover take longer), so the default is now 1.  This parameter
will probably be removed in the future.

=item readserver

Normally, each nnrpd process directly accesses the BerkeleyDB environment.
The process of attaching to the database (and detaching when finished) is
fairly expensive, and can result in high loads in situations where there
are lots of reader connections of relatively short duration.

When the readserver parameter is "true", the nnrpds will access overview
via a helper server (B<ovdb_server>, which is started by B<ovdb_init>).
This can also result in cleaner shutdowns for the database, improving
stability and avoiding deadlocks and corrupted databases.  If you are
experiencing any instability in ovdb, try setting this parameter to true.
Default is false.

=item numrsprocs

This parameter is only used when I<readserver> is true.  It sets the
number of ovdb_server processes.  As each ovdb_server can process only one
transaction at a time, running more servers can improve reader response
times.  Default is 5.

=item maxrsconn

This parameter is only used when I<readserver> is true.  It sets a maximum
number of readers that a given ovdb_server process will serve at one time.
This means the maximum number of readers for all of the ovdb_server
processes is (numrsprocs * maxrsconn).  Default is 0, which means an
unlimited number of connections is allowed.

=back

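The numeric rules in the parameter list above (the minkey default, the nocompact thresholds, and the overall reader limit when I<readserver> is true) can be restated as small Python functions for checking a configuration by hand.  This is a hedged sketch, not code taken from ovdb; the function names are hypothetical.

```python
def default_minkey(pagesize: int) -> int:
    """Default minkey: MAX(2, pagesize / 2048 - 1), using integer division."""
    # pagesize must be a power of 2 (see the pagesize parameter).
    if pagesize <= 0 or pagesize & (pagesize - 1):
        raise ValueError("pagesize must be a power of 2")
    return max(2, pagesize // 2048 - 1)


def should_compact(nocompact: int, articles_in_group: int) -> bool:
    """Whether expireover compacts a group, per the nocompact rules."""
    if nocompact == 0:
        return True   # compact every newsgroup
    if nocompact == 1:
        return False  # never compact (the default)
    # Values greater than one: compact only groups with fewer articles.
    return articles_in_group < nocompact


def max_readers(numrsprocs: int, maxrsconn: int):
    """Total reader limit across ovdb_server processes; None means unlimited."""
    if maxrsconn == 0:
        return None
    return numrsprocs * maxrsconn
```

For instance, C<default_minkey(8192)> returns 3, and C<max_readers(5, 0)> returns C<None>, matching the unlimited default for maxrsconn.
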
=head1 DB_CONFIG

A file called F<DB_CONFIG> may be placed in the database directory to
customize where the various database files and transaction logs are
written.  By default, all of the files are written in the "DB_HOME"
directory.  One way to improve performance is to put the transaction logs
on a different disk.  To do this, put:

    DB_LOG_DIR /path/to/logs

in the F<DB_CONFIG> file.  If the pathname you give starts with a /, it is
treated as an absolute path; otherwise, it is relative to the "DB_HOME"
directory.  Make sure that any directories you specify exist and have
proper ownership/mode before starting INN, because they won't be created
automatically.  Also, don't change the F<DB_CONFIG> file while anything
that uses ovdb is running.

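As a sketch of the path rule above, the following Python resolves a directory named in F<DB_CONFIG> against DB_HOME and reports whether it exists with the expected owner.  The function names are hypothetical, and checking only the owner UID and the owner-write bit is a simplification of "proper ownership/mode".

```python
import os
import stat


def resolve_db_config_path(db_home: str, path: str) -> str:
    """Absolute if the path starts with '/', otherwise relative to DB_HOME."""
    if path.startswith("/"):
        return path
    return os.path.join(db_home, path)


def check_db_config_dir(db_home: str, path: str, owner_uid: int) -> list:
    """Return a list of problems with a DB_CONFIG directory (empty if none)."""
    full = resolve_db_config_path(db_home, path)
    if not os.path.isdir(full):
        # Such directories are not created automatically.
        return [f"{full}: missing or not a directory"]
    problems = []
    st = os.stat(full)
    if st.st_uid != owner_uid:
        problems.append(f"{full}: owned by uid {st.st_uid}, expected {owner_uid}")
    if not st.st_mode & stat.S_IWUSR:
        problems.append(f"{full}: not writable by its owner")
    return problems
```

A pre-flight check might call C<check_db_config_dir("/mnt/overview", "/path/to/logs", pwd.getpwnam("news").pw_uid)> before starting INN, with all ovdb users shut down.
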
Another thing that you can do with this file is to split the overview
database across multiple disks.  In the F<DB_CONFIG> file, you can list
directories that BerkeleyDB will search when it goes to open a database.

For example, let's say that you have I<pathoverview> set to
F</mnt/overview> and you have four additional file systems created on
F</mnt/ov?>.  You would create a file F</mnt/overview/DB_CONFIG>
containing the following lines:

    set_data_dir /mnt/overview
    set_data_dir /mnt/ov1
    set_data_dir /mnt/ov2
    set_data_dir /mnt/ov3
    set_data_dir /mnt/ov4

(For BerkeleyDB 2.x, replace C<set_data_dir> with C<DB_DATA_DIR>.)

Distribute your ovNNNNN files into the four file systems (say, 8 each).
When called upon to open a database file, the db library will look for it
in each of the specified directories (in order).  If the file is not
found, one will be created in the first of those directories.

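The lookup order described above can be modelled in a few lines of Python.  This is a simplified illustration with hypothetical names, not BerkeleyDB's implementation: it only understands C<set_data_dir> and C<DB_DATA_DIR> lines carrying a single path, and it treats the listed directories as absolute.

```python
import os


def data_dirs_from_db_config(text: str) -> list:
    """Collect set_data_dir (or BerkeleyDB 2.x DB_DATA_DIR) entries, in order."""
    dirs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] in ("set_data_dir", "DB_DATA_DIR"):
            dirs.append(fields[1])
    return dirs


def locate_db_file(data_dirs: list, filename: str) -> str:
    """Return where a database file would be opened (or created).

    Search each directory in order and use the first existing copy;
    otherwise the file is created in the first directory.
    """
    if not data_dirs:
        raise ValueError("at least one data directory is required")
    for d in data_dirs:
        candidate = os.path.join(d, filename)
        if os.path.exists(candidate):
            return candidate
    return os.path.join(data_dirs[0], filename)
```

One consequence of this rule: a file that was accidentally left out of all the extra directories is silently recreated, empty, in the first one, which is another reason to shut everything down before moving files.
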
Whenever you change F<DB_CONFIG> or move database files around, make sure
all news processes that use the database are shut down first (including
nnrpds).

The DB_CONFIG functionality is part of BerkeleyDB itself, rather than
something provided by ovdb.  See the BerkeleyDB documentation for complete
details for the version of BerkeleyDB that you're running.

=head1 RUNNING

When starting the news system, B<rc.news> will invoke B<ovdb_init>.
B<ovdb_init> must be run before using the database.  It performs the
following tasks:

=over 4

=item *

Creates the database environment, if necessary.

=item *

Performs a normal recovery if the database is idle.  The recovery will
remove stale locks, recreate the memory pool cache, and repair any damage
caused by a system crash or improper shutdown.

=item *

Starts the DB housekeeping processes (B<ovdb_monitor>) if they're not
already running.

=back

When stopping INN, B<rc.news> kills the ovdb_monitor processes after
the other INN processes have been shut down.

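The file F<I<pathrun>/ovdb.sem> (see L</FILES>) is what lets B<ovdb_init> tell an idle database from an active one.  The Python sketch below shows one way such a check can work, under an assumption this document does not state: that each process using the database holds a POSIX C<fcntl> lock on that file.  If so, a non-blocking exclusive lock succeeds only when no other process holds one.  The function name is hypothetical.

```python
import errno
import fcntl
import os


def ovdb_looks_idle(sem_path: str) -> bool:
    """Return True if no other process holds a lock on sem_path.

    ASSUMPTION: processes using the database keep a POSIX (fcntl) lock on
    ovdb.sem, so an exclusive lock can only be taken while nobody else
    is using the database.
    """
    fd = os.open(sem_path, os.O_RDWR)  # write access is needed for LOCK_EX
    try:
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno in (errno.EACCES, errno.EAGAIN):
                return False  # another process holds a conflicting lock
            raise
        fcntl.lockf(fd, fcntl.LOCK_UN)  # only testing, so release at once
        return True
    finally:
        os.close(fd)
```

POSIX record locks held by the calling process itself never conflict with each other, so a check like this is only meaningful when run from a separate process.
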
=head1 DIAGNOSTICS

Problems relating to ovdb are logged to F<news.err> with "OVDB" in the
error message.

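Since ovdb tags its messages with "OVDB", filtering F<news.err> for that string is a quick first diagnostic.  A minimal Python filter, with a hypothetical function name, might look like this; where F<news.err> lives depends on your installation.

```python
from typing import Iterable, Iterator


def ovdb_errors(lines: Iterable[str]) -> Iterator[str]:
    """Yield the log lines that mention OVDB, without trailing newlines."""
    for line in lines:
        if "OVDB" in line:
            yield line.rstrip("\n")
```

Because it accepts any iterable of lines, it works directly on an open file object as well as on a list of strings.
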
INN programs that use overview will fail to start up if the ovdb_monitor
processes aren't running.  Be sure to run B<ovdb_init> before running
anything that accesses overview.

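To confirm that B<ovdb_monitor> is up before starting something that needs overview, one option is to read F<I<pathrun>/ovdb_monitor.pid> (listed under L</FILES>) and probe the PID with signal 0, which tests for existence without delivering a signal.  This Python sketch uses hypothetical names; note that a stale PID file whose number has been reused by an unrelated process would still report as running.

```python
import os


def read_pid(pid_path: str) -> int:
    """Read the first whitespace-separated token of a PID file as an int."""
    with open(pid_path) as f:
        return int(f.read().split()[0])


def process_exists(pid: int) -> bool:
    """Probe a PID with signal 0 (existence check only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def ovdb_monitor_running(pid_path: str) -> bool:
    """True if the PID file exists, parses, and names a live process."""
    try:
        pid = read_pid(pid_path)
    except (OSError, ValueError, IndexError):
        return False  # missing, unreadable, empty, or garbled PID file
    return process_exists(pid)
```
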
Also, INN programs that use overview will fail to start up if the user
running them is not the "news" user.

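A wrapper script could check the account up front rather than letting an INN program fail.  The sketch below compares the effective UID with that of the C<news> account; whether INN itself checks the real or the effective UID is not stated here, so using the effective UID is an assumption, and the function name is hypothetical.

```python
import os
import pwd


def running_as(user: str = "news") -> bool:
    """True if the effective UID matches the UID of the given account."""
    try:
        expected_uid = pwd.getpwnam(user).pw_uid
    except KeyError:
        return False  # no such account on this system
    return os.geteuid() == expected_uid
```
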
If a program accessing the database crashes, or otherwise exits uncleanly,
it might leave a stale lock in the database.  This lock could cause other
processes to deadlock on that stale lock.  To fix this, shut down all news
processes (using C<kill -9> if necessary) and then restart.  B<ovdb_init>
should then perform a recovery operation which will remove the locks and
repair damage caused by killing the deadlocked processes.

=head1 FILES

=over 4

=item inn.conf

The I<ovmethod> and I<pathoverview> parameters are relevant to ovdb.

=item ovdb.conf

Optional configuration file for tuning.  See L</CONFIGURATION> above.

=item I<pathoverview>

Directory where the database goes.  BerkeleyDB calls it the "DB_HOME"
directory.

=item I<pathoverview>/DB_CONFIG

Optional file to configure the layout of the database files.

=item I<pathrun>/ovdb.sem

A file that gets locked by every process that is accessing the database.
This is used by B<ovdb_init> to determine whether the database is active
or quiescent.

=item I<pathrun>/ovdb_monitor.pid

Contains the process ID of B<ovdb_monitor>.

=back

=head1 TO DO

Implement a way to limit how many databases can be open at once (to reduce
file descriptor usage), maybe using something similar to the cache code in
F<ov3.c>.

=head1 HISTORY

Written by Heath Kehoe <hakehoe@avalon.net> for InterNetNews.

=head1 SEE ALSO

inn.conf(5), innd(8), nnrpd(8), ovdb_init(8), ovdb_monitor(8),
ovdb_stat(8)

BerkeleyDB documentation: in the F<docs> directory of the BerkeleyDB
source distribution, or on the Sleepycat web page:
L<http://www.sleepycat.com/>.

=cut