doc/man/dbz.3

.TH DBZ 3 "6 Sep 1997"
.BY "INN"
.SH NAME
dbzinit, dbzfresh, dbzagain, dbzclose, dbzexists, dbzfetch, dbzstore, dbzsync, dbzsize, dbzgetoptions, dbzsetoptions, dbzdebug \- database routines
.SH SYNOPSIS
.nf
.B #include <dbz.h>
.PP
.B "bool dbzinit(const char *base)"
.PP
.B "bool dbzclose(void)"
.PP
.B "bool dbzfresh(const char *base, long size)"
.PP
.B "bool dbzagain(const char *base, const char *oldbase)"
.PP
.B "bool dbzexists(const HASH key)"
.PP
.B "off_t dbzfetch(const HASH key)"
.B "bool dbzfetch(const HASH key, void *ivalue)"
.PP
.B "bool dbzstore(const HASH key, off_t offset)"
.B "bool dbzstore(const HASH key, void *ivalue)"
.PP
.B "bool dbzsync(void)"
.PP
.B "long dbzsize(long nentries)"
.PP
.B "void dbzgetoptions(dbzoptions *opt)"
.PP
.B "void dbzsetoptions(const dbzoptions opt)"
.fi
.SH DESCRIPTION
These functions provide an indexing system for rapid random access to a
text file (the
.I base
.IR file ).
.PP
.I Dbz
stores offsets into the base text file for rapid retrieval.  All retrievals
are keyed on a hash value that is generated by the
.I HashMessageID()
function.
.PP
.I Dbzinit
opens a database,
an index into the base file
.IR base ,
consisting of files
.IB base .dir ,
.IB base .index ,
and
.IB base .hash ,
which must already exist.
(If the database is new, they should be zero-length files.)
Subsequent accesses go to that database until
.I dbzclose
is called to close the database.
.PP
.I Dbzfetch
searches the database for the specified
.IR key .
If
.I \-\-enable\-tagged\-hash
was specified at configure time, it returns the corresponding
.IR value ,
if any.
Otherwise it returns true and sets the content of
.IR ivalue .
.I Dbzstore
stores the
.I key-value
pair in the database if
.I \-\-enable\-tagged\-hash
was specified at configure time;
otherwise it stores the content of
.IR ivalue .
.I Dbzstore
will fail unless the database files are writable.
.I Dbzexists
reports whether the given hash exists in the database.
.I Dbz
is optimized for this operation, and it may be significantly faster than
.IR dbzfetch .
.PP
.I Dbzfresh
is a variant of
.I dbzinit
for creating a new database with more control over details.
.PP
.IR Dbzfresh 's
.I size
parameter specifies the size of the first hash table within the database,
in key-value pairs.
Performance will be best if the number of key-value pairs stored in the
database does not exceed about 2/3 of
.IR size .
(The
.I dbzsize
function, given the expected number of key-value pairs,
will suggest a database size that meets these criteria.)
Assuming that an
.I fseek
offset is 4 bytes,
the
.B .index
file will be
.I 4 * size
bytes.  The
.B .hash
file will be
.I DBZ_INTERNAL_HASH_SIZE * size
bytes
(the
.B .dir
file is tiny and roughly constant in size)
until
the number of key-value pairs exceeds about 80% of
.IR size .
(Nothing awful will happen if the database grows beyond 100% of
.IR size ,
but accesses will slow down quite a bit and the
.B .index
and
.B .hash
files will grow somewhat.)
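.PP
As a rough illustration of the sizing arithmetic above, the following
minimal sketch picks the smallest table size that keeps the expected
number of entries at or below 2/3 of it, then estimates the resulting
file sizes.
The function names and the value 6 used for the hash size are assumptions
made for this example only; the real constant comes from
.BR dbz.h ,
and
.I dbzsize
itself may round differently.
.PP
.nf
```c
#include <assert.h>

/* Assumed value for illustration; the real constant comes from dbz.h. */
#define SKETCH_HASH_SIZE 6L
/* Assumed size of one stored offset, per the 4-byte figure above. */
#define SKETCH_OFFSET_SIZE 4L

/* Smallest table size that keeps nentries at or below 2/3 of it.
 * This is NOT dbzsize(); the library may choose a different value. */
long sketch_table_size(long nentries)
{
    assert(nentries >= 0);
    return (nentries * 3 + 1) / 2;   /* ceil(nentries * 3 / 2) */
}

/* Estimated size in bytes of the index file for a given table size. */
long sketch_index_bytes(long size)
{
    return SKETCH_OFFSET_SIZE * size;
}

/* Estimated size in bytes of the hash file for a given table size. */
long sketch_hash_bytes(long size)
{
    return SKETCH_HASH_SIZE * size;
}
```
.fi
.PP
Under these assumptions, 1,000,000 expected entries give a table size of
1,500,000, an index file of 6,000,000 bytes, and a hash file of
9,000,000 bytes.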
.PP
.I Dbz
stores up to
.SM DBZ_INTERNAL_HASH_SIZE
bytes of the message-id's hash in the
.B .hash
file to confirm a hit.  This eliminates the need to read the base file to
handle collisions.  This replaces the tagmask feature in previous dbz
releases.
.PP
A
.I size
of ``0''
given to
.I dbzfresh
is synonymous with the local default;
the normal default is suitable for tables of 5,000,000
key-value pairs.
Calling
.I dbzinit(name)
with the empty name is equivalent to calling
.IR dbzfresh(name,\ 0) .
.PP
When databases are regenerated periodically, as in news,
it is simplest to pick the parameters for a new database based on the old one.
This also permits some memory of past sizes of the old database, so that
a new database size can be chosen to cover expected fluctuations.
.I Dbzagain
is a variant of
.I dbzinit
for creating a new database as a new generation of an old database.
The database files for
.I oldbase
must exist.
.I Dbzagain
is equivalent to calling
.I dbzfresh
with a
.I size
equal to the result of applying
.I dbzsize
to the largest number of entries in the
.I oldbase
database and its previous 10 generations.
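.PP
The selection rule described above can be sketched as follows: take the
largest entry count seen across the recorded generations, then size the
new table from that peak.
The array of counts and both function names are hypothetical, and the 2/3
rule stands in for
.IR dbzsize ;
.I dbz
keeps its own record of past sizes, whose format is not shown here.
.PP
.nf
```c
#include <assert.h>
#include <stddef.h>

/* Largest entry count among the recorded generations.
 * The counts array is a hypothetical input for this sketch. */
long sketch_max_entries(const long *counts, size_t n)
{
    long max = 0;
    size_t i;

    assert(counts != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (counts[i] > max)
            max = counts[i];
    return max;
}

/* Table size for the next generation: the peak count scaled so that
 * it fills about 2/3 of the table (a stand-in for dbzsize()). */
long sketch_next_size(const long *counts, size_t n)
{
    long peak = sketch_max_entries(counts, n);

    return (peak * 3 + 1) / 2;
}
```
.fi
.PP
Sizing from the peak instead of the most recent count is what lets the
new database absorb the expected fluctuations without outgrowing its
first hash table.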
.PP
When many accesses are being done by the same program,
.I dbz
is massively faster if its first hash table is in memory.
If the
.B pag_incore
flag is set to ``INCORE_MEM'',
an attempt is made to read the table in when
the database is opened, and
.I dbzclose
writes it out to disk again (if it was read successfully and
has been modified).
.I Dbzsetoptions
can be used to set the
.B pag_incore
and
.B exists_incore
flags to a new value, which should be ``INCORE_NO'', ``INCORE_MEM'', or
\&``INCORE_MMAP'', for the
.B .hash
and
.B .index
files separately; this does not affect the status of a database that has
already been opened.  The default is ``INCORE_NO'' for the
.B .index
file and ``INCORE_MMAP'' for the
.B .hash
file.  The attempt to read the table in may fail due to memory shortage;
in this case
.I dbz
fails with an error.
.IR Store s
to an in-memory database are not (in general) written out to the file
until
.I dbzclose
or
.IR dbzsync ,
so if robustness in the presence of crashes
or concurrent accesses is crucial, in-memory databases
should probably be avoided or the
.B writethrough
option should be set to ``true''.
.PP
If the
.B nonblock
option is ``true'', then writes to the
.B .hash
and
.B .index
files will be done using non-blocking I/O.  This can be significantly faster if
your platform supports non-blocking I/O with files.
.PP
.I Dbzsync
causes all buffers etc. to be flushed out to the files.
It is typically used as a precaution against crashes or concurrent accesses
when a
.IR dbz -using
process will be running for a long time.
It is a somewhat expensive operation,
especially
for an in-memory database.
.PP
Concurrent reading of databases is fairly safe,
but there is no (inter)locking,
so concurrent updating is not.
.PP
An open database occupies three
.I stdio
streams and two file descriptors.
Apart from
.I stdio
buffers, memory consumption is negligible except for in-memory databases.
.SH SEE ALSO
dbm(3), history(5), libinn(3)
.SH DIAGNOSTICS
Functions returning
.I bool
values return ``true'' for success, ``false'' for failure.
Functions returning
.I off_t
values return
.I \-1
for failure.
.I Dbzinit
attempts to have
.I errno
set plausibly on return, but otherwise this is not guaranteed.
An
.I errno
of
.B EDOM
from
.I dbzinit
indicates that the database did not appear to be in
.I dbz
format.
.PP
If
.SM DBZTEST
is defined at compile-time then a
.I main()
function will be included.  This will run performance and integrity tests.
.SH HISTORY
The original
.I dbz
was written by
Jon Zeeff (zeeff@b-tech.ann-arbor.mi.us).
Later contributions by David Butler and Mark Moraes.
Extensive reworking,
including this documentation,
by Henry Spencer (henry@zoo.toronto.edu) as
part of the C News project.
MD5 code borrowed from RSA.  Extensive reworking to remove backwards
compatibility and to add hashes into dbz files by Clayton O'Neill (coneill@oneill.net).
.SH BUGS
Unlike
.IR dbm ,
.IR dbz 's
.I dbzstore
will refuse to store a key that is already in the database.
The user is responsible for avoiding this.
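.PP
One way a caller might avoid storing duplicate keys is to check for the
key before storing it.
The sketch below shows that pattern against a tiny in-memory stand-in
table; every name in it (the mock_ functions, store_if_new, and the
16-byte key width) is hypothetical and is not part of the
.I dbz
interface.
In a real program the two mock calls would presumably be replaced by
.I dbzexists
and
.IR dbzstore .
.PP
.nf
```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Stand-ins for illustration only: a tiny in-memory table with
 * fixed-width keys, NOT the dbz API or its HASH type. */
#define MOCK_KEYLEN 16
#define MOCK_MAX 8

static unsigned char mock_keys[MOCK_MAX][MOCK_KEYLEN];
static long mock_vals[MOCK_MAX];
static int mock_count = 0;

/* True if the key is already present in the stand-in table. */
static bool mock_exists(const unsigned char *key)
{
    int i;

    for (i = 0; i < mock_count; i++)
        if (memcmp(mock_keys[i], key, MOCK_KEYLEN) == 0)
            return true;
    return false;
}

/* Refuses a key that is already present, or a full table. */
static bool mock_store(const unsigned char *key, long offset)
{
    if (mock_exists(key) || mock_count == MOCK_MAX)
        return false;
    memcpy(mock_keys[mock_count], key, MOCK_KEYLEN);
    mock_vals[mock_count++] = offset;
    return true;
}

/* Caller-side pattern: check first, store only new keys.
 * Returns 1 if stored, 0 if already present, -1 if the store failed. */
int store_if_new(const unsigned char *key, long offset)
{
    if (mock_exists(key))
        return 0;
    return mock_store(key, offset) ? 1 : -1;
}
```
.fi
.PP
Because there is no locking, this check-then-store sequence is only
reliable when a single process updates the database.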
.PP
The RFC822 case mapper implements only a first approximation to the
hideously-complex RFC822 case rules.
.PP
.I Dbz
no longer tries to be call-compatible with
.I dbm
in any way.