README.bible Bible Retrieval System Chip Chapin, Hewlett Packard Company Initial release, September 5, 1989 Last Updated: April 26, 1993 The Bible Retrieval System (BRS) consists of a textual database of the Authorized ("King James") Version of the Old and New Testaments, a set of libraries for finding and retrieving text, and a program ("bible") which uses the libraries to retrieve Bible passages given references on the command line or from standard input. A built-in Concordance (word search facility) is also supported. A man page is provided. While the raw Bible text consumes over 4.4 megabytes, the BRS stores it in a special compressed form, requiring less than 1.8 megabytes. Despite the compression, retrieval is very fast, and a buffer caching scheme makes second and following references to a particular region of the text almost instantaneous. The concordance facility requires an additional datafile of less than 0.9 megabytes, which provides a pre-computed list of each verse in which a word appears. The current implementation provides a very simple but effective way to logically combine the results of word searches to narrow a selection. INSTALLATION (building from source) You probably have two files: bible.data.tar Tar file containing the two data files. bible.tar.Z BRS program source distribution, including READMEs and man page. Create a directory to work in, then "cd" to it and extract files from the distribution archives as follows: $ tar xvf bible.data.tar $ zcat bible.tar.Z | tar xvf - Now execute "make" and wait for a while. $ make When make has completed, you should be able to start up the bible program for interactive use: $ bible or $ ./bible Type "?" for a summary of commands. "Q" quits the program. Review the man page: $ nroff bible.1 | more If you wish to install the program, data, and man files into system-wide locations ("/usr/local/..."), and you have the proper permissions, type: $ make install If you wish to install them somewhere else, either edit the Makefile and change the DEST variable, or just install the files by hand. If you install the data files anywhere besides /usr/local/lib, you may want to edit "bible.c" to assign an initial value to "dfpath", otherwise use the program's "-p" option (see man page). THE LIBRARIES The Bible Retrieval System is intended to be more than just the "bible" retrieval program. Two libraries of routines are provided in the BRS that may be used to construct other applications. The "Text Storage Library" (TSL) routines could be used for *any* textual data file; they are entirely independent of the structure of the Bible. They support the use of the windowed compression scheme on any text, with fast retrieval of any particular line of the text. The concordance facility is also completely generic and should work with any text. For this release, no separate documentation is provided for the TSL, but comments in the files tsl.c, tsl.h, makeindex.c, and makeconcfile.c are fairly extensive. The "Bible Retrieval Library" (BRL) includes routines that are specifically oriented to the Book-Chapter-Verse structure of the Bible text, however they are independent of the storage structure of the textual data, leaving that to the TSL. The BRL routines make retrieval programs such as "bible" extremely simple. For this release no separate documentation is provided for the BRL -- see brl.c, brl.h, brl-index.c and bible.c. Actually, there's also a third library of sorts. "Compresslib" contains a routine which may be called to uncompress a buffer of LZW-compressed data. THE COMPRESSION SCHEME The text is compressed using a modified version of the Lempel-Ziv-Welch "compress" program. The modification is very simple, and consists merely of forcing compress to emit checkpoints after a fixed number of input bytes which I call a "window". One can thus easily determine which compressed "window" contains a particular byte of the original text. By keeping track of the locations of the checkpoints in the compressed data, it is then possible to uncompress only the windows that are needed. By the way, the uncompression is done by a subroutine within the library -- no exec's or temporary files are used. Compression windows can be any size -- the size is stored in the data file and the retrieval routines treat the file accordingly. In the default configuration, the windows are 64Kbytes, which was shown by experiment to offer a reasonable compromise between efficient compression and efficient buffer management. If you want to experiment, you can change the window size by editing the argument to "squish" in the Makefile. Some Personal Notes... In 1979, as the owner of "Chapin Associates" in San Diego, I started a project to create an affordable computer-based retrieval system for Bible text. Working in UCSD Pascal on a PDP-11/03 with 60Kbytes of memory and two 500Kbyte RX02 floppy drives, with my associates Neil Fraser and Jan Denser, we succeeded in prototyping a system that used word-level Huffman-coding for the text of the New Testament. Unfortunately, pressed between economics and the limitations of the available hardware, I wound up abandoning the effort in 1980. In early 1989 I gained access to one of the available freeware Bible retrieval programs for the PC. I immediately decided that the time had come to "close the loop" on this particular personal dream, with Unix as the target environment. There really aren't any serious technical challenges any more to producing an acceptable Bible retrieval implementation for Unix systems. So I snatched the Bible text, spent a few weekends and evenings at my workstation, and here it is. The LZW compression scheme is much simpler than the word-level Huffman coding, though the compression is not as good. And it's great being able to count memory and disk storage in MBytes instead of KBytes. Even so, I've really tried to keep the system's use of resources to a minimum. But I don't think 1.7+ megabytes of data is too high a price to pay nowadays in most Unix environments. So... I hope others find these tools useful. Chip Chapin Hewlett Packard Company September 5, 1989 -------------------------------------------------------------------- Chip Chapin, Hewlett-Packard Company, California Language Lab (HP/CSO/STG/STD/CLO/CLL) Internet: chip@cup.hp.com HPDesk: Chip Chapin/hp4700/um uucp: ... {allegra,decvax,ihnp4,ucbvax} !hplabs!hpclbis!chip or ... uunet!hp-sde!hpclbis!chip USMail: MS42U5; 11000 Wolfe Road; Cupertino, CA 95014-9804; USA Phone: 408/447-5735 Fax: 408/447-4924 HPTelnet: 1-447-5735 --------------------------------------------------------------------