[ Top | Up | Prev | Next | Map | Index ]

Readme for analog2.90beta1

Cache files

Analog has the ability to archive some of the data in your logfile into a cache file so that the logfile can be thrown away without losing the most important data.

For most people, the cache file will not be needed: compressing the logfile using a standard compression utility such as gzip will be sufficient. Compressing a logfile is very efficient owing to the large number of repeated strings: I find about 12 times compression in practice. That in itself may solve your filespace problems, without needing to throw away any information.

If you are going to use the cache file feature, it is very important that you understand what is and what is not recorded. It is not possible to reconstruct everything of interest in the logfile from the cache file. The cache file does contain information about the total number of requests for each host and each file, but not about, for example, which files were read by which hosts. (To do so would take up as much disk space as the compressed logfile.) So you cannot later look at only one file and see which hosts read that file. Similarly, you cannot later restrict the files or hosts by date, using FROM and TO commands.

In summary, you should do all the inclusions and exclusions you want when you create the logfile. If you want different sets of inclusions and exclusions, you should create several cache files from the same logfile. You cannot later apply extra inclusions and exclusions accurately.

One other minor point: the pattern of failed requests and redirected requests over time is not recorded in the cache file. So although the total number will still be correct, the number in the last 7 days can be under-reported subsequently.

You can create a cache file by setting the CACHEOUTFILE to be the file you want the cache to live in. Set
to turn it off again. You will still get the regular output as well as the cache output, unless you request OUTPUT NONE.

You can read in a previously-made cache file with the CACHEFILE command, or with the +U command line option. As with the LOGFILE command, you can use commas and wild cards to read in several cache files, and read compressed cache files using the UNCOMPRESS mechanism. Note that if you don't want to read a logfile as well as the cache file, you will have to explicitly set the LOGFILE to none.

When analog reads in a cache file, it will respect inclusions and exclusions as far as it can, but it does not apply any more aliases to the items. (This is to avoid double-aliasing.) So you must do any aliases you want at the time you create the cache file. Similarly, it does not obey the LOGOFFSET variable, to avoid double-offsetting, so any offset you want must be applied at cache-creation time too.

Sometimes you don't want to record all the types of item in the cache file. You might want to forget about which hosts had accessed your web site, for example, and only remember how many times each file was requested. Or you might even just want to remember the pattern of requests over time, not which files were requested. You can choose not to include one type of item in the cache file by setting its LOWMEM to 3; for example, specify

to exclude hosts from the cache file. Because this is a serious step, analog will produce a warning if you do this.

It is legal to have the CACHEOUTFILE the same as the CACHEFILE to overwrite the old cache file with an updated one, but it is not recommended. It is best to make a separate cache file for each logfile. Failing that, it is better to write the new cache to a different file, and only delete the old cache when you have verified that the new cache was created correctly.

To use this feature and avoid losing entries or double counting them, I suggest you follow the following procedure.
  1. Archive the old logfile, and restart the server with a fresh logfile. (See your server documentation for how to do this.)
  2. Make both a cache file and an ordinary report from the old logfile.
  3. Make a report from the cache file and compare it against the report from the logfile to check it works.
Now you can throw away the old logfile, if you've really understood what data you're losing by doing so.

I prefer to make a separate cache file from each logfile, in case something goes wrong with one of them, rather than a single cache file combining several logfiles, or a single cache file combining an old cache file and a logfile.

Finally, please remember that this is beta-test software. I can take no responsibility if something goes wrong. (See the licence.)

Stephen Turner
E-mail: sret1@cam.ac.uk

[ Top | Up | Prev | Next | Map | Index ]