Readme for analog 3.0

Introduction

Analog is a program which analyses logfiles from WWW servers. It works on almost any operating system. It is designed to be fast and to produce attractive statistics. It's free software.

This Readme describes analog 3.0. For the latest version of analog, see the analog home page. For examples of the output see

Analog is freeware, but its use is covered by a licence. You must agree to the terms of the licence before using the program.

This is the one-page version of the Readme. If you're reading it online, you might prefer the version on several smaller pages. There is an index at the end of this document.



Starting to use analog

To run analog, all you need is to be able to read the logfiles which are produced by your web server. If you don't know what these logfiles are and where to find them, contact your internet service provider (ISP) or system administrator. Analog doesn't write the logfiles: it only reads them.

If you log in to your ISP's machine from your home machine, you have two options. If you have the right permissions, you can run analog on your ISP's machine. Otherwise, you can download the logfiles from their machine to yours (e.g., by ftp), and then run analog on your machine.

Once you've downloaded the right version of analog for your computer from the analog home page (or a mirror site), you need to know how to set it up and run it. This is very easy, but the instructions are slightly different depending on which platform you're using.


Starting to use analog on a Mac

When you download the Mac version of analog, it should unpack itself. (If it doesn't, you might have to run StuffIt Expander on it). You should then find in the analog directory a configuration file called analog.cfg and the analog application itself, as well as the Readme, the Licence (which you must read and agree to before using analog) and a couple of other files. When you double-click on the analog icon, it will run in its own window, and produce an output file called Report.html.
You can configure analog by putting commands in the configuration file, analog.cfg. One command you will need straight away is
LOGFILE logfilename    # to set where your logfile lives
The logfile must live on your local disk -- analog doesn't fetch it from across the network. There's a sample logfile supplied with the program.

There's a list of basic commands later in the Readme. Also there are a few to get you started in the configuration file already, but there are lots of others available. You can read about all the commands in the section on customising analog.


Another way to start analog is to drag a logfile onto the analog icon, in which case analog will try to analyse it, or drag a configuration file onto the icon, in which case analog will use the commands in that configuration file. (Analog detects whether it's a configuration file or a logfile by whether it starts with a # or not.) This enables you to create different reports without having two copies of the application.

One note: on other platforms, there is another way to give options, via command line arguments. You'll see these mentioned in this Readme from time to time, but the Mac doesn't have a command line, so ignore these.

If you want to compile your own version of analog (it's written in C), or just to read the source code, it's available from the analog home page. (It's the same source code for all versions).


Starting to use analog under Windows 95 & NT

When you've downloaded analog, and either you or your browser has unzipped it, you will find in the analog directory a configuration file called analog.cfg and the analog executable itself, as well as the Readme, the Licence (which you must read and agree to before using analog) and a couple of other files.

There are two ways of running analog. You can either run it from Windows by double-clicking on its icon, or you can run it from the DOS command prompt (under Start-Programs). If you run it from Windows, it will create a DOS window to run in. When it's finished, it will produce an output file called Report.html.


You can configure analog by putting commands in the configuration file, analog.cfg. One command you will need straight away is
LOGFILE logfilename    # to set where your logfile lives
The logfile must live on your local disk -- analog doesn't fetch it from across the network. There's a sample logfile supplied with the program.

There's a list of basic commands later in the Readme. Also there are a few to get you started in the configuration file already, but there are lots of others available. You can read about all the commands in the section on customising analog.

In some ways, it's easier to run analog from the DOS command prompt, because you get to see any error or warning messages more easily. Also, if you run analog from the command prompt, there is another way to give options, via command line arguments, given on the command line after the program name. These are just shortcuts for configuration file commands.

If you want to compile your own version of analog (it's written in C), or just to read the source code, it's available from the analog home page. (It's the same source code for all versions).


Starting to use analog under OS/2

When you've downloaded analog, and either you or your browser has unzipped it, you will find in the analog directory a configuration file called analog.cfg and the analog executable itself, as well as the Readme, the Licence (which you must read and agree to before using analog) and a couple of other files. You can run analog by just typing analog. It should produce an output file called Report.html.
You can configure analog by putting commands in the configuration file, analog.cfg. One command you will need straight away is
LOGFILE logfilename    # to set where your logfile lives
You need to use \ not / as the directory separator in the logfile name. The logfile must live on your local disk -- analog doesn't fetch it from across the network. There's a sample logfile supplied with the program.

There's a list of basic commands later in the Readme. Also there are a few to get you started in the configuration file already, but there are lots of others available. You can read about all the commands in the section on customising analog.

There is one other way to give options to analog, via command line arguments, given on the command line after the program name. These are just shortcuts for configuration file commands.

If you want to compile your own version of analog (it's written in C), or just to read the source code, it's available from the analog home page. (It's the same source code for all versions). There are instructions about compiling on another page.


Starting to use analog on other platforms

If you're not using a Mac or a PC, you'll have to compile your own version of analog from the source. But don't worry -- it's written in standard C throughout, so it will compile out of the box on most platforms. (The source code is the same for all platforms.)

First, you will want to look at the file analhead.h. It contains the user-settable options, most of which you can override later. You will probably want to check the first few options in the file, but you can even leave most of them until later.

When you have done that, you need to compile the program. How to do that depends on which system you're using.


Compiling under Unix. Just type
make
to compile the program. On most systems, that will be sufficient. If it fails to compile, have a look in the Makefile to see if there's anything that you need to change to suit your configuration, and try again. It says in that file what to do. In particular, Solaris 2 (SunOS 5) users need to change the LIBS= line (and may need to change the DEFS= line -- see below).

If you haven't got gcc, you will need to change the compiler - try acc or cc instead. If it still doesn't compile, try DEFS=-DNODNS to ignore the DNS lookup code.

There is a known problem with HP-UX 10 and some versions of gcc. If it complains about an error in the <sys/stat.h> header, you need to upgrade to gcc version 2.7.2.3 or later, or use HP's cc compiler. HP's compiler is not an ANSI C compiler by default, so you need to specify -Ae in the CFLAGS to tell the compiler to use ANSI C.

SunOS 4's cc doesn't seem to have the necessary header files for ANSI C. Often gcc doesn't work either -- you will probably need to use acc.

SunOS 5 sometimes seems to have a broken strcmp() function. If you get an "illegal instruction" error when running analog, compile it with -DNOSTRCMP in the DEFS= line.

Compiling under VMS. Type

MMS
to compile analog. Under VMS 7.0 & 7.1, there is a VMS bug that stops analog compiling. The fix is to add "/define=(_VMS_V6_SOURCE)" to the cflags definitions at the top of the file descrip.mms.

Compiling under Acorn RiscOS. The Makefile is called Make.Risc, and you will have to rename it to Makefile before running make. Also you have to make directories called C, H and O, and move the source files into the appropriate directories: e.g., alias.c must be renamed C.alias. And you will find that there are some filenames in the header file analhead.h that you want to change to fit into the RiscOS directory structure.

Compiling under OS/2. Although there is a precompiled version of analog for OS/2, if you want to compile your own you will need the EMX package. You should edit the Makefile to have OS=OS2. Then after running Make, you need to run the command

EMXBIND -b ANALOG
to generate the analog.exe executable.

After you've made the program, just type
analog
to run the program. (Or ./analog if for some reason . isn't in your $PATH.)

You can configure analog by putting commands in the configuration file, which is called analog.cfg by default. Two commands you will need straight away are

LOGFILE logfilename      # to set where your logfile lives
OUTFILE outputfile.html  # to send the output to a file instead of the screen
The logfile must live on your local disk -- analog doesn't fetch it from across the network. There's a sample logfile supplied with the program.

There's a list of basic commands later in the Readme. Also there are a few to get you started in the configuration file already, but there are lots of others available. You can read about all the commands in the section on customising analog.

There is one other way to give options to analog, via command line arguments, given on the command line after the program name. These are just shortcuts for configuration file commands.


Customising analog

This section is the bulk of the Readme. It tells you all the commands you can give to analog, and what they all do. First there's a list of basic commands, which is as much as beginners need to read, until they want to do something which isn't listed there, or are curious to find out what they could do.

The following section is a technical (i.e., dull but important) section on the syntax of configuration commands.

Then there's documentation on all the configuration commands, grouped into categories. Analog has over 200 configuration commands, as well as several command line options, so sometimes these sections turn into lists of commands. But here's where you find out everything you can do with analog. There's also an index of all the commands and topics on a separate page.

Basic commands

Here is a list of basic configuration commands to get you started with analog. These commands should be added to your configuration file, analog.cfg, as explained in the section Starting to use analog. We'll see all the possible configuration commands in later sections.

Analog reads logfiles produced by your web server, and produces an output file based on the data in them. So you need to know how to specify which logfile to read, and which file to send the output to. The relevant commands look like

LOGFILE my_logfile
OUTFILE output.html
where, of course, you should substitute the names of the files you want to use. The logfile must be on your local disk -- analog doesn't fetch it from across the network, so if it's not on your local disk, you will have to fetch it yourself first. You can read several logfiles by giving several logfile commands, or by giving a comma-separated list, or by using wild cards in the logfile name. So, for example, if you use the commands
LOGFILE new1.log,old*.log
LOGFILE new2.log
analog will analyse the logfiles new1.log, new2.log, and all the old logfiles. Analog will recognise logfiles in several different formats. You can read more about this in the section Choosing a logfile.
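
To make the selection rule concrete, here is a minimal Python sketch of how a set of LOGFILE arguments could be expanded into filenames. It is only an illustration of the rule described above, not analog's own code (analog is written in C); the function name expand_logfiles and the list files_on_disk are made up for the example.

```python
from fnmatch import fnmatchcase

def expand_logfiles(arguments, available):
    """Return the names in `available` selected by a list of LOGFILE arguments."""
    selected = []
    for argument in arguments:                 # one entry per LOGFILE command
        for pattern in argument.split(","):    # comma-separated list, no spaces
            for name in available:
                if fnmatchcase(name, pattern):  # * matches any run of characters
                    selected.append(name)
    return selected

# Hypothetical directory contents, for illustration only.
files_on_disk = ["new1.log", "new2.log", "old1.log", "old2.log", "notes.txt"]
print(expand_logfiles(["new1.log,old*.log", "new2.log"], files_on_disk))
```

Here the two arguments stand for the two LOGFILE commands in the example above, and notes.txt is left out because no pattern matches it.
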
There are a couple of other commands you need to know right at the beginning, not because they're particularly important in themselves, but because the output will look silly if you don't know them. First, you need to know how to put your own organisation's name and URL at the top of the report. For this, you need two commands such as
HOSTNAME "Spam Widgets Inc."
HOSTURL www.spam-widgets.com

If you have broken images in the output instead of graphs, you need to say in which directory on your server the images are stored. You do this by a command like

IMAGEDIR /analog/images/
(The images are distributed with the program - you will have to move them to whichever directory you choose.)
Next you will want to know how to turn individual reports on and off. Analog can produce 27 different reports, but here are the most important. Try them and see what happens. You can turn each report on with an ON command, or off with an OFF command. You can also use the commands ALL ON and ALL OFF to turn all reports on or off.
MONTHLY ON    # one line for each month
WEEKLY ON     # one line for each week
FULLDAILY ON  # one line for each day
DAILY ON      # one line for each day of the week
HOURLY ON     # one line for each hour of the day
GENERAL ON    # the General Summary at the top
REQUEST ON    # which files were requested
FAILURE ON    # which files were not found
DIRECTORY ON  # directory report
HOST ON       # which computers requested files
DOMAIN ON     # which countries they were in
REFERRER ON   # where people followed links from
FAILREF ON    # where people followed broken links from
BROWSER ON    # which browsers people were using
FILETYPE ON   # types of file requested
SIZE ON       # sizes of files requested
The referrer and browser reports will only appear if your server records the necessary information. You can configure lots of other things about each report, such as how many rows are listed, which columns are included, and how the reports are sorted. For example, the command
REQINCLUDE pages
tells analog only to list pages, rather than all files, in the request report. You can read about all the options in the sections on Time reports, Other reports and Hierarchical reports.
You can have the output in several different languages, by using a LANGUAGE command. For example, the command
LANGUAGE FRENCH
will give you the output in French. The possible languages at the moment are ENGLISH, US-ENGLISH, FRENCH, GERMAN, ITALIAN, PORTUGUESE, BR-PORTUGUESE, DANISH, SWEDISH, CZECH, SLOVAK, HUNGARIAN, ROMANIAN and SLOVENE, and I hope to have other languages available soon. See the section on Configuring the output for how to download, or even translate, new languages.
Two other common things you might want to do are to alias files or hosts (for example, to tell analog that two different filenames are really the same file), or to include or exclude certain files, hosts or dates (to ignore accesses from your site, for example, or to do an analysis only of a certain subdirectory or a certain time period). For these, see the later sections on Aliases and Inclusions and exclusions.

As I said, these are only a few of the commands available. To find out about all the commands, you'll have to read the remaining sections of the Readme, starting with a short section on the syntax of configuration commands.


Syntax of configuration commands

When analog starts up, it first reads options from configuration files and the command line (assuming that you are running analog from an operating system with a command line). Defaults for many of these options will have already been set in the file analhead.h at the time the program was compiled. So if you compile your own version of analog, rather than downloading a pre-compiled executable, you can also set some options in that file before compiling. Those options are all documented there.
The first file which analog reads is the default configuration file, normally called analog.cfg. You can stop this file being read by specifying the option -G on the command line. Then the command line arguments are read, in the order in which they appear. Finally, the mandatory configuration file is read, if you specified one when you compiled the program. This is a configuration file which cannot be overridden by the user: if it is not found, analog exits immediately. This allows a system administrator to prevent users analysing certain files or producing certain reports, for example. However, note that the only certain way to prevent users analysing things is to deny them access to the logfile. Otherwise there is nothing to stop them analysing the logfile using another copy of analog or another program.
You can include another configuration file from the command line by using a command like +gother.cfg. (Note that there is no space between +g and the filename; this is true of all command line arguments.) You can also include another configuration file from within a configuration file by a command like
CONFIGFILE other.cfg
The commands in the other configuration file are read immediately, in order. The program then continues reading the command line or calling configuration file where it left off.

In the Mac version, you can start up a program with a particular configuration file by dragging it onto the analog icon. The configuration file must start with a #. The default configuration file is still read first.

You can also specify any configuration command on the command line even if it doesn't have a command line abbreviation, by use of the +C command. For example, +C"UNCOMPRESS *.gz" will include that command.


Here are the syntax rules for configuration commands. A configuration file contains several commands on separate lines; any text after a hash (#) on a line is ignored as a comment. Each command consists of the command name followed by one or two arguments. An argument to a command may optionally be placed in single or double quotes or parentheses, and it must be if the argument contains a hash or a space. So, for example, here are some valid configuration commands:
DAILY      OFF   # We don't want a daily summary
FULLDAILY  "ON"  # We want a full daily report instead 
HOSTNAME (Spam Widgets Inc.)  # Spaces, so quotes or brackets needed
Generally later commands override earlier ones if there is a conflict (e.g., for the OUTFILE, because you can have only one), or supplement them if there is no conflict (e.g., for the LOGFILE, because you can read several logfiles).
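
If it helps to see the syntax rules spelled out, the following Python sketch splits one configuration line into the command name and its arguments. It is a rough illustration under simplifying assumptions, not analog's actual parser: the function name parse_command is invented here, and a quoted or parenthesised argument is assumed to end at the first matching closing character.

```python
def parse_command(line):
    """Split one configuration line into [name, arg1, ...], dropping comments."""
    closers = {'"': '"', "'": "'", "(": ")"}
    tokens = []
    i, n = 0, len(line)
    while i < n:
        ch = line[i]
        if ch.isspace():
            i += 1                            # skip white space between tokens
        elif ch == "#":
            break                             # rest of the line is a comment
        elif ch in closers:
            end = line.find(closers[ch], i + 1)
            if end == -1:
                raise ValueError("unterminated quote or parenthesis")
            tokens.append(line[i + 1:end])    # keep the text inside the quotes
            i = end + 1
        else:
            j = i
            while j < n and not line[j].isspace() and line[j] != "#":
                j += 1
            tokens.append(line[i:j])          # unquoted word
            i = j
    return tokens

print(parse_command("DAILY      OFF   # We don't want a daily summary"))
print(parse_command("HOSTNAME (Spam Widgets Inc.)  # Spaces, so quotes or brackets needed"))
```

A hash inside quotes is kept as part of the argument, which is why an argument containing a hash or a space has to be quoted.
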
If all the options seem a bit confusing, just run
analog -settings [other options]
or include PRINTVARS ON in the configuration commands. That will tell you what the values of all the variables will be, based on the defaults in analhead.h, the configuration commands, and the command line options.

Choosing a logfile

This is a rather long page, so here is a quick summary of the most important points:
The basic command for selecting a logfile is
LOGFILE logfilename
or you can just put the logfile name on the command line without any arguments, e.g., analog logfilename. A - sign or the word stdin is interpreted as standard input: this is useful on Unix systems for constructing pipes. The word none means that the list of logfiles specified so far is erased. All logfiles must be on your local disk -- analog doesn't fetch them from across the network. In the Mac version, you can also analyse a particular single logfile by dragging it onto the analog icon.

You can have several LOGFILE commands. You can include wildcards in the logfile name (but not necessarily in the directory name: this is system-dependent), and you can use a list of logfiles separated by commas (without spaces). So the following commands would tell analog to read logfile1, c:\logs\logfile2, and all files ending in .log:

LOGFILE logfile1,*.log
LOGFILE c:\logs\logfile2
The LOGFILE commands are cumulative, except that any logfiles on the command line or in user-specified configuration files override any in the default configuration file, and are themselves overridden by any in the mandatory configuration file.
Analog knows about several different types of logfile. By default it will attempt to see if your logfile is of one of the types it knows about, based on the first line. (Note: if the first line of your logfile is corrupt, or if your logfile has lines in different formats, you'll have to tell analog the logfile type yourself). The types it can diagnose are the common log format, the NCSA combined format, referrer log and browser log, the W3 extended log format, the Microsoft IIS format (sometimes), the Netscape format, the WebSTAR format, and the Netpresenz format (sometimes). Examples of all these formats are given at the end of this page. If you have debugging on, analog will report what type of logfile it thinks yours is.

The reason for the "sometimes" in the previous paragraph is as follows. The Microsoft and Netpresenz formats are extremely badly designed in that the date can occur in either of the forms date/month/year or month/date/year, and they don't say which they're using. Analog will detect them automatically if it can tell which date format is being used (e.g., 13/2/98 or 2/13/98), but if it can't, it will tell you to use one of the LOGFORMAT strings below. Also, the NCSA browser log can only be detected if it includes the date.
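
The ambiguity can be illustrated with a short Python sketch. This is only a guess at the kind of test involved, not analog's real detection code; the function name guess_date_order is invented, and the labels "NA" and "INT" simply mirror the North American and international variants of the format words.

```python
def guess_date_order(first, second):
    """Classify a date written first/second/year.

    Returns "INT" for date/month/year, "NA" for month/date/year,
    or None when the two numbers don't settle the question.
    """
    if first > 12 and second <= 12:
        return "INT"    # e.g. 13/2/98: 13 cannot be a month
    if second > 12 and first <= 12:
        return "NA"     # e.g. 2/13/98: 13 cannot be a month
    return None         # e.g. 2/3/98 could be either

print(guess_date_order(13, 2))
print(guess_date_order(2, 13))
print(guess_date_order(2, 3))
```

When the result is None, the only safe course is the one described above: say explicitly which date format the logfile uses.
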


You can also specify a different type of logfile, using the LOGFORMAT or DEFAULTLOGFORMAT command. If all your logfiles are of formats that analog can diagnose, you need never use these commands.

When you start up analog, all logfiles have the default logfile format. This is normally automatic detection, as explained above, but you can change it if your logfiles are always in a format which analog doesn't know about. You do this by means of the command

DEFAULTLOGFORMAT format
-- we'll discuss what the formats can be in a minute.

Sometimes you might want to analyse several logfiles with different formats. For this you need the LOGFORMAT command. This command only applies to future logfiles in the same configuration file. So if you change the format with a command like

LOGFORMAT format
then any logfiles you select with a LOGFILE command later in the same configuration file will get the new format.

The possible formats for use with the DEFAULTLOGFORMAT and LOGFORMAT commands are of two types. First there are some symbolic words, and then there are log format strings. We'll look at the words first.

There are format words for all the built-in formats analog knows about. For example, COMMON will select common format; you can also have COMBINED, REFERRER, BROWSER, EXTENDED, MICROSOFT-NA (North American date format), MICROSOFT-INT (international date format), NETSCAPE, WEBSTAR, NETPRESENZ-NA (North American) or NETPRESENZ-INT (international). There are also the words AUTO for automatic detection and DEFAULT for whatever the default log format is.

If your logfile is not in one of the recognised formats, you can tell analog about your format using a log format string. You only ever need this if your logfile has lines which are not in one of the standard formats. The format string consists of a template for the logfile line, with the various fields and special characters replaced by codes as follows.

%S    host (computer making the request)
%r    file requested
%R    Mac-style filename, with colons instead of slashes
%B    browser
%f    referrer (URL referring to the file)
%u    user (tip: a cookie can usefully be defined as %u too)
%v    virtual host
%d    day of the month
%m    month in digits
%M    month, three letter abbreviation
%y    year, last two digits
%Y    year, four digits
%h    hour of the day
%n    minute of the hour
%a    a for am or p for pm (if %h is 12-hour clock)
%b    number of bytes transferred
%c    HTTP status code
%C    Special code, specific to particular servers
%q    query string (part of filename after ?, if recorded in a separate field)
%j    junk: ignore this field
%w    white space: spaces or tabs
%W    optional white space
%%    % sign
\n    new line
\t    tab stop
\\    single backslash
(I shall refer to the first seven things above as items.) So for example, the common log format, which looks like
jay.bird.com - fred [14/Mar/1996:17:45:35 +0000] "GET /~sret1/ HTTP/1.0" 200 1243
can be represented by the LOGFORMAT command
LOGFORMAT (%S - %u [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b)
including three items: host, user and file. (The parentheses are needed because the argument contains spaces.)

Logfiles often contain lines in several different formats, so you can specify several log formats one after the other and they will accumulate. For example, the definition of common format should also include the line

LOGFORMAT (%S - %u [%d/%M/%Y:%h:%n:%j] "%j %r" %c %b)
to handle lines where the HTTP/1.0 part of the request is absent. Or you might use
LOGFORMAT COMMON
LOGFORMAT COMBINED
to represent a logfile which had lines in both those formats. Analog tries to match the line to the first format first, then if that fails the next, and so on, so the order of the formats is important. Usually you want to specify the most common one first, to minimise the time spent trying to match lines to inappropriate formats.

The log formats which analog can handle are those which are known as instantaneously decipherable: this means that the character which terminates a string can never occur in the string. In the above example, if the hostname ever contained a space, the line would be marked as corrupt, because analog terminates the host at the first space, not at the first occurrence of space-dash-space, and then the rest of the line wouldn't match. Of course, hostnames should never contain spaces, so this shouldn't be a problem. There are a couple of other restrictions: if there is any date or time information, then the year, month, day, hour and minute must all be present; and the same information may not occur twice in the format (so you can't have both %m and %M, for example).
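
For readers who like to see the matching rule in action, here is a Python sketch that turns a format string into a regular expression in which every field stops at the first occurrence of the character that follows it in the template. It is an illustration only and makes several assumptions: analog itself is written in C and doesn't work this way internally, only a subset of the codes is handled, every code is assumed to be followed by a literal character (or the end of the line), and the names logformat_to_regex, parse_line and FIELD_NAMES are invented.

```python
import re

# Subset of the codes in the table above; the group names are our own labels.
FIELD_NAMES = {"S": "host", "r": "file", "u": "user", "d": "day", "M": "month",
               "Y": "year", "h": "hour", "n": "minute", "c": "status", "b": "bytes"}

def logformat_to_regex(fmt):
    """Translate a LOGFORMAT template into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch != "%":
            parts.append(re.escape(ch))       # literal character in the template
            i += 1
            continue
        code = fmt[i + 1]
        i += 2
        # A field ends at the first occurrence of the character following it.
        if i < len(fmt):
            body = "[^" + re.escape(fmt[i]) + "]*"
        else:
            body = ".*"
        if code == "j":
            parts.append("(?:" + body + ")")  # junk: matched but not kept
        elif code in FIELD_NAMES:
            parts.append("(?P<" + FIELD_NAMES[code] + ">" + body + ")")
        else:
            raise ValueError("code %" + code + " not handled in this sketch")
    return re.compile("".join(parts) + r"\Z")

def parse_line(line, formats):
    """Try each format in order; None means the line would count as corrupt."""
    for regex in formats:
        match = regex.match(line)
        if match:
            return match.groupdict()
    return None

FORMATS = [logformat_to_regex('%S - %u [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b'),
           logformat_to_regex('%S - %u [%d/%M/%Y:%h:%n:%j] "%j %r" %c %b')]

line = 'jay.bird.com - fred [14/Mar/1996:17:45:35 +0000] "GET /~sret1/ HTTP/1.0" 200 1243'
print(parse_line(line, FORMATS))
print(parse_line(line.replace("jay.bird.com", "jay bird.com"), FORMATS))  # None
```

In this sketch the second template on its own would also accept the example line, reading "/~sret1/ HTTP/1.0" as the file, which is one way to see why the order of the formats matters.
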

Sometimes you need to read one of the fields in a logfile, but not analyse it. For example, if you have a separate common log and referrer log, the referrer log might look like

[14/Mar/1996:17:48:10] http://guide-p.infoseek.com/Titles -> /~sret1/analog/
But the requests for /~sret1/analog/ would already have been counted when reading the main logfile, so you don't want to count them again now. You get round this by specifying a * in that item in the format string, like this:
LOGFORMAT ([%d/%M/%Y:%h:%n:%j] %f -> %*r)
Any of the seven items can be treated in this way.

Here are the exact rules about which logfile gets which log formats. The default logfile format starts off at AUTO. You can change it with a DEFAULTLOGFORMAT command, and then the default format accumulates unless you specify DEFAULTLOGFORMAT AUTO to return to automatic detection.

The current logfile format starts off at DEFAULT. You can change it with a LOGFORMAT command, and then the current format accumulates until a LOGFILE command intervenes; then it restarts at the next LOGFORMAT command. It also restarts if you specify LOGFORMAT AUTO or LOGFORMAT DEFAULT; or when the current format is reset to DEFAULT automatically, which happens at the end of the command line, and of every configuration file, and whenever a LOGFILE none command is encountered.

The default logfile selected at compilation time always gets the default format (although exactly what the default format is can still be changed with a DEFAULTLOGFORMAT command). Any logfile declared later, in a configuration file for example, gets the current log format at the time it is selected. If you specify several logfiles, they will all use the same format, unless there's a LOGFORMAT command or an implicit return to DEFAULT format between them.


There's also a second argument to the LOGFILE command, which specifies a prefix to add to all the filenames in that logfile. This is useful if you've got several different servers or virtual hosts, when the same filename may occur on each of the servers. The argument can contain a %v, and the name of the virtual host will then be inserted at that point. For example,
LOGFILE log1,log2 http://www.%v.mydomain.com
would translate a filename /file.html with virtual host spam in log1 or log2 to http://www.spam.mydomain.com/file.html. If you are using the second argument to the LOGFILE command, you will probably want to use the SUBDIR command as well.

If %v is included in the argument and the line doesn't have a virtual host, that line will be marked as corrupt. If VHOSTLOWMEM 3 is specified, the %v's will not be translated and will just appear as %v in the output.
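
The prefix rule can be pictured with a few lines of Python. This is a sketch of the behaviour described above rather than analog's own code; the function name apply_prefix is invented, and the VHOSTLOWMEM case is deliberately left out.

```python
def apply_prefix(prefix, filename, vhost=None):
    """Add the LOGFILE prefix to a filename, filling in %v if present.

    Returns None when %v is needed but the line has no virtual host,
    which corresponds to the line being marked as corrupt.
    """
    if "%v" in prefix:
        if not vhost:
            return None
        prefix = prefix.replace("%v", vhost)
    return prefix + filename

print(apply_prefix("http://www.%v.mydomain.com", "/file.html", "spam"))
print(apply_prefix("http://www.%v.mydomain.com", "/file.html"))
```
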


There is one other command which applies to individual logfiles, in a similar way to the LOGFORMAT. Sometimes your server is not (or believes it is not) in the same timezone as you. So that you can give your statistics in your local time, there is a command LOGTIMEOFFSET to change the time by a certain number of minutes. You have to be careful using this. Because of daylight savings time in operation in different parts of the world at different times, analog cannot attempt to convert between different timezones. So it's your responsibility to set the right offset for different times of year. For example, if you were in Chicago, but your server was recording time in GMT, you would need to specify two different time offsets, one of minus five hours for summer and one of minus six hours for winter. You would need to split your logfiles in the right places and then run commands like
LOGTIMEOFFSET -300
LOGFILE summer*.log
LOGTIMEOFFSET -360
LOGFILE winter*.log
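
To check the arithmetic behind those offsets, here is a small Python sketch that shifts a logged time by a number of minutes. It only illustrates the sums for the Chicago example; the function name apply_offset and the sample timestamps are made up, and analog's own handling may differ in detail.

```python
from datetime import datetime, timedelta

def apply_offset(logged, offset_minutes):
    """Shift a logged time by the given number of minutes."""
    return logged + timedelta(minutes=offset_minutes)

summer = datetime(1996, 7, 14, 17, 45)   # recorded by the server in GMT
winter = datetime(1996, 1, 14, 17, 45)

print(apply_offset(summer, -300))   # five hours earlier
print(apply_offset(winter, -360))   # six hours earlier
```
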

While we're on the subject of time offsets, there is one other similar command, which is not directly to do with logfiles. You can specify a TIMEOFFSET command to say how much analog should offset the time of the computer on which it is running, to get your local time.


It is often convenient to store logfiles compressed to save disk space. Analog on the Mac can read logfiles compressed using gzip. And analog on Unix, Win32, and VMS 7.0 and above can read compressed logfiles provided that you use an UNCOMPRESS command to say how to uncompress them. You need to supply the types of file that you want to uncompress in a comma-separated list, together with the name of a command that will uncompress the files to standard output (rather than to a file). For example, on Unix you might use
UNCOMPRESS *.gz,*.Z  /usr/bin/gzcat
whereas on Windows NT, you might use
UNCOMPRESS *.gz "c:\Program Files\gzip\gzip -cd"
and on VMS, it could be
UNCOMPRESS *.LOG-GZ;*  "gunzip -c"
This would be a suitable command to include in the default configuration file.

If analog determines when it starts to uncompress a logfile that that file isn't wanted for the analysis, two undesirable things can happen. Either the program might pause until the logfile is fully uncompressed, or there might be a "broken pipe" error reported. This is system dependent, and out of analog's control.


Appendix: logfile formats

Here is a summary of the various logfile formats which analog knows about.

The common logfile format is written by most servers. Its lines look like

jay.bird.com - fred [14/Mar/1996:17:45:35 +0000] "GET /~sret1/ HTTP/1.0" 200 1243
Specifying LOGFORMAT COMMON is the same as specifying the three commands
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b)
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j %r" %c %b)
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j" %c %b)
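
If you want to check by hand which fields the first of these templates picks out, the following Python sketch may help. The regular expression is only my rough approximation of that template, and the group names are invented for illustration; it says nothing about how analog itself parses the line.

```python
import re

# Rough regular-expression counterpart of the first COMMON template.
# Group names are illustrative only.
COMMON = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) '
    r'\[(?P<day>\d{1,2})/(?P<month>[A-Za-z]{3})/(?P<year>\d{4}):'
    r'(?P<hour>\d{2}):(?P<minute>\d{2}):\d{2}[^\]]*\] '
    r'"(?P<method>\S+) (?P<request>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)$'
)

line = ('jay.bird.com - fred [14/Mar/1996:17:45:35 +0000] '
        '"GET /~sret1/ HTTP/1.0" 200 1243')
match = COMMON.match(line)
fields = match.groupdict() if match else None
```

For the example line, fields holds the host jay.bird.com, the user fred, the request /~sret1/, the status code 200 and the byte count 1243. The seconds and the timezone are skipped, just as the %j in the template skips them.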

The NCSA referrer log looks like
[14/Mar/1996:17:48:10] http://guide-p.infoseek.com/Titles -> /~sret1/analog/
and the browser (or agent) log looks like
[14/Mar/1996:17:45:08] Mozilla/2.0 (X11; I; HP-UX A.09.05 9000/735)
The respective LOGFORMAT commands are
LOGFORMAT ([%d/%M/%Y:%h:%n:%j] %f -> %*r)
LOGFORMAT ([%d/%M/%Y:%h:%n:%j] %B)
In both of these logfiles the date can be omitted. However, if the date is omitted in the browser log, analog will not be able to detect the log format automatically. (The line doesn't contain enough clues, so there is too much danger of confusing other log formats with it; just use "LOGFORMAT %B".)
The NCSA combined log is the same as the common log, except that it has the referrer and browser on the end in quotes, like this:
jay.bird.com - fred [14/Mar/1996:17:45:35 +0000] "GET /~sret1/ HTTP/1.0" 200 1243
"http://www.statslab.cam.ac.uk/" "Mozilla/2.0 (X11; I; HP-UX A.09.05 9000/735)"
except all on one line. If you are using the Apache server, you can generate this with the mod_log_config module, using the command
LogFormat "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-Agent}i\""
The corresponding LOGFORMAT commands are
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b "%f" "%B")
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j %r" %c %b "%f" "%B")
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j" %c %b "%f" "%B")
It is usually better to use the combined log than separate logs, because it stores more information in less space.
The W3 extended log, the Netscape log, and the WebSTAR log can be recognised because they must include at or near the top a line telling analog what to expect on subsequent lines. Analog constructs a LOGFORMAT template based on this header line. (They may also contain later lines changing the format).

The extended log is described at http://www.w3.org/TR/WD-logfile.html. Its header line looks like

#Fields: date time cs-uri
In the rest of the logfile, the fields can be separated by spaces or tabs. The WebSTAR file has a header line like
!!LOG_FORMAT DATE TIME RESULT URL BYTES_SENT HOSTNAME
In the rest of the logfile, the fields are separated by tabs. Some other Mac servers also use the WebSTAR format, or something looking like it. Analog will understand these too. Finally, the Netscape header line looks like
format=%Ses->client.ip% [%SYSDATE%] "%Req->reqpb.clf-request%"
%Req->srvhdrs.clf-status% %Req->srvhdrs.content-length%

Sometimes these three logfile formats can contain header lines which refer to the same item in two different ways. Analog doesn't know which one you want to count, so such header lines will generate a "corrupt format line" warning. You can then use a LOGFORMAT command to specify the format more precisely.


The Microsoft IIS logfile looks like
192.64.25.41, -, 21/02/97, 00:03:46, W3SVC, SPIDER, 192.16.225.10,
30, 303, 1455, 200, 0, GET, /siege.htm, -,
(except all on one line), which is equivalent to
LOGFORMAT (%S, %u, %d/%m/%y, %h:%n:%j, W3SVC, %j, %v, %j, %j, %b, %c, %j, %j, %r, %j,)
However, the format is extremely badly designed, in that the date follows local conventions: in other words, in North America the above example would have the date 02/21/97 instead. Analog will diagnose which form the logfile is in if possible: but if both the day and the month are at most 12, there is no way to tell which format it is. In this case, you need to use the LOGFORMAT command MICROSOFT-NA for North American date format, or MICROSOFT-INT for international date format. It may even be that the date is in neither of these formats, in which case you need to use a LOGFORMAT command of your own.
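
The ambiguity is easy to see in a small Python sketch. This is not analog's own diagnosis code, just an illustration of the idea under the assumption that a field greater than 12 cannot be a month; the function name guess_date_order is made up.

```python
def guess_date_order(dates):
    """Return 'INT' (dd/mm/yy), 'NA' (mm/dd/yy) or None if undecidable."""
    for text in dates:
        first, second, _year = (int(part) for part in text.split("/"))
        if first > 12:
            return "INT"   # the first field can't be a month
        if second > 12:
            return "NA"    # the second field can't be a month
    return None            # every date could be read either way
```

A logfile containing 21/02/97 is clearly international and one containing 02/21/97 is clearly North American, but a logfile whose only dates look like 03/04/97 gives no clue, and that is when you need MICROSOFT-NA or MICROSOFT-INT.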

There are also various third-party extensions to the Microsoft format to include, for example, the browser and referrer. Analog can't automatically diagnose these: you need to write a LOGFORMAT string for them.


The Netpresenz logfile is unusual in that each entry can spread over several lines. It looks like
5:54 pm  14/11/96  134.87.19.110  HTTP    get file  Research.html
Web:Research:Research.html
Referer: http://guide-p.infoseek.com/Titles
The fields are separated by tabs. It is equivalent to four LOGFORMAT commands:
LOGFORMAT (%h:%n %aM\t%m/%d/%y\t%S\tHTTP\t\t%C\t%j\t\n%R\nReferer: %f)
LOGFORMAT (%h:%n %aM\t%m/%d/%y\t%S\tHTTP\t\t%C\t%j\t\n%R)
LOGFORMAT (%h:%n %aM\t%m/%d/%y\t%S\tHTTP\t\t%C\t%R)
LOGFORMAT (%j)
Again, the Netpresenz format uses local conventions for the date and time. Analog will diagnose it where it can: otherwise, you will have to use
LOGFORMAT NETPRESENZ-NA    # dates like 9:14 AM  3/23/98 (upper case AM)
or
LOGFORMAT NETPRESENZ-INT   # dates like 9:14 am  23/3/98 (lower case am)
Again, it can be that the date and time are in neither of these forms, in which case you will have to enter your own LOGFORMAT string.

Aliases

After analog has read each logfile entry, it then applies aliases to each of the items. First, if you have a case insensitive filesystem, analog converts the filename to lower case. Usually analog assumes that Unix filesystems are case sensitive and other systems are case insensitive. You might want to override its choice, if, for example, you have transferred files from one machine to another, so as to use the convention on the original machine. You can do this by the commands
CASE INSENSITIVE
CASE SENSITIVE

Next it applies built-in aliases to each item. For example, it knows that %7E in a filename or referrer is equivalent to ~ and translates it accordingly. It also strips off the directory suffix from any filenames which have it. This suffix is normally index.html, but you can specify another one instead with a command such as

DIRSUFFIX default.htm
(You can only have one DIRSUFFIX.) There are other built-in aliases for other items: for example, hostnames are converted to lower case at this point.
After this, it applies user-specified aliases to each item. These aliases are useful if, for example, you know that two filenames correspond to the same file, or if you want to translate local hostnames to their internet equivalents. You specify aliases by commands like
FILEALIAS /football.html /soccer.html
HOSTALIAS lion lion.statslab.cam.ac.uk
There is also the special command FILEALIAS none, which cancels any other file aliases which might have been specified.

The alias commands for the other items are called BROWALIAS, REFALIAS, USERALIAS and VHOSTALIAS. Only one alias is ever applied to any item. So after

FILEALIAS /football.html /soccer.html
FILEALIAS /soccer.html /brazil.html
the file /soccer.html would get translated to /brazil.html, but /football.html would only get translated to /soccer.html and would not see the second alias.
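
The "only one alias" rule can be sketched in Python as follows. This is an illustration rather than analog's code: the names are made up, and I am assuming that when several aliases could match, the first one listed is the one applied.

```python
ALIASES = [
    ("/football.html", "/soccer.html"),
    ("/soccer.html", "/brazil.html"),
]

def apply_alias(name, aliases=ALIASES):
    """Apply at most one alias: the first whose left-hand side equals name."""
    for old, new in aliases:
        if name == old:
            return new      # stop here; the result is not aliased again
    return name             # no alias matched
```

Here apply_alias("/football.html") gives /soccer.html and stops, so the second alias is never consulted for that file.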

You can also use wildcards (? and *) in alias commands. The left hand side can contain at most one *. If the right hand side contains a * too, then the part of the name represented by the * on the left hand side will be substituted at the position of the * on the right hand side. So, for example,

FILEALIAS /football/* /soccer/
would translate /football/rules.html to /soccer/, but
FILEALIAS /football/* /soccer/*
would translate /football/rules.html to /soccer/rules.html.
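
The substitution of the * can be sketched in Python like this. It is a simplified illustration only: the function name is invented, it handles a single * on each side, and it ignores the ? wildcard altogether.

```python
def wildcard_alias(name, left, right):
    """Alias name using patterns with at most one * on each side."""
    if "*" not in left:
        return right if name == left else name
    prefix, suffix = left.split("*", 1)
    if (len(name) < len(prefix) + len(suffix)
            or not name.startswith(prefix) or not name.endswith(suffix)):
        return name                      # no match: leave unchanged
    # The part of the name which the * on the left hand side stood for
    middle = name[len(prefix):len(name) - len(suffix)]
    return right.replace("*", middle, 1)
```

So wildcard_alias("/football/rules.html", "/football/*", "/soccer/*") carries rules.html across to the right hand side, whereas with "/soccer/" on the right the matched part is simply dropped.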
There is another set of alias commands, called output aliases. There is one of these for each of the reports, except the time reports. Instead of acting on items when the logfile is being read, they act on individual lines in the output. So for example, the command
TYPEOUTPUTALIAS .txt ".txt (Plain text files)"
would provide an explanation of that line in the file type report.

There can be some confusion between some ALIAS and OUTPUTALIAS commands. For example, what is the difference between HOSTALIAS and HOSTOUTPUTALIAS? In fact, there are several differences, resulting from the different times at which the aliases are processed. The HOSTALIAS applies to the host items, but the HOSTOUTPUTALIAS only applies to the lines in the host report. This means that the HOSTALIAS also affects the other reports which use the hosts, such as the domain report, whereas the HOSTOUTPUTALIAS only affects the host report. Also the HOSTOUTPUTALIAS applies separately to each line of the host report. This means that if two separate hosts translate to the same thing in a HOSTALIAS command, they will become one host ever after. But if one were to use the same HOSTOUTPUTALIAS commands, there would be two hosts, which would just happen to have the same name in one report.

In summary, HOSTALIAS would normally be used if a single host had two different names, so might otherwise appear to be two hosts, whereas HOSTOUTPUTALIAS would normally be used to annotate or clarify the host report.

The full list of output aliases is REQOUTPUTALIAS, REDIROUTPUTALIAS, FAILOUTPUTALIAS, TYPEOUTPUTALIAS, DIROUTPUTALIAS, HOSTOUTPUTALIAS, DOMOUTPUTALIAS, REFOUTPUTALIAS, REFSITEOUTPUTALIAS, REDIRREFOUTPUTALIAS, FAILREFOUTPUTALIAS, BROWOUTPUTALIAS, FULLBROWOUTPUTALIAS, VHOSTOUTPUTALIAS, USEROUTPUTALIAS and FAILUSEROUTPUTALIAS.

There is one known bug with OUTPUTALIAS. The report is sorted before the OUTPUTALIAS is applied. This means that if the SORTBY for the report is set to ALPHABETICAL, then the report will not be sorted correctly.


Inclusions and exclusions

After aliasing each item, analog decides whether that item is wanted or not. The whole line is only counted if all the items are wanted. (If one type of item doesn't occur on a line, that item's counted as wanted on that line.) Whether an item is wanted or not is determined by INCLUDE and EXCLUDE commands specified by the user. These commands can be used to exclude requests from your local users, for example, or to analyse only files in a subdirectory.

The rule for determining whether an item is included or excluded is as follows. All the INCLUDE and EXCLUDE commands for that item are considered one by one in order, and the item is included or excluded according to the last command it matched. Items which don't match any of the INCLUDE or EXCLUDE commands are included if the first command was an exclusion, and excluded if the first command was an inclusion. For example, the configuration

FILEINCLUDE /~sret1/*
FILEEXCLUDE /~sret1/backgammon/*,/~sret1/analog/*
FILEINCLUDE /~sret1/backgammon/*.gif
would instruct the program to examine only my files, excluding my backgammon and analog files, but including gifs in my backgammon directory. On the other hand,
FILEEXCLUDE /~sret1/*/img/*
would analyse all files, except for images in my various directories. Note that inclusions and exclusions can contain any number of wildcards.
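
The rule is perhaps easiest to follow as a few lines of Python. This is a sketch of the rule as described above, not analog's implementation: the function and variable names are made up, Python's fnmatchcase stands in for analog's wildcard matching (it also treats square brackets specially, which analog may not), and I assume that an item is wanted when no commands are given at all.

```python
from fnmatch import fnmatchcase

def is_wanted(item, rules):
    """rules is a list of (kind, patterns), kind being 'include' or 'exclude'."""
    if not rules:
        return True
    # Default for items matching nothing: the opposite of the first command
    wanted = rules[0][0] == "exclude"
    for kind, patterns in rules:
        if any(fnmatchcase(item, p) for p in patterns.split(",")):
            wanted = kind == "include"   # the last matching command wins
    return wanted

RULES = [
    ("include", "/~sret1/*"),
    ("exclude", "/~sret1/backgammon/*,/~sret1/analog/*"),
    ("include", "/~sret1/backgammon/*.gif"),
]
```

With these rules, /~sret1/backgammon/board.gif matches all three commands and is included because the last one is an inclusion, while a file outside /~sret1/ matches none of them and is excluded because the first command was an inclusion.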

The relevant commands for the other types of item are HOSTINCLUDE and HOSTEXCLUDE; BROWINCLUDE and BROWEXCLUDE; REFINCLUDE and REFEXCLUDE; USERINCLUDE and USEREXCLUDE; and VHOSTINCLUDE and VHOSTEXCLUDE. If you get confused with all the inclusions and exclusions, remember that you can always run analog -settings to see what the options you have specified represent.


There is also one other pair of commands which belongs in this category, namely the FROM and TO commands. These specify a time period to restrict the analysis to. The simplest usage of these commands is FROM yyMMdd or FROM yyMMdd:hhmm, where yy represents the last two digits of the year (analog assumes that the year is between 1970 and 2069), MM represents the month, dd is the date, hh the hour, and mm the minute. So, for example, to analyse only requests from July 1999 to June 2000 I would use the configuration
FROM 990701
TO   000630
Alternatively, each of the components can be preceded by + or - to represent time relative to the time at which the program was invoked. In this case, the date can have more than 2 digits. This allows constructions like
FROM -01-00+01   # from tomorrow last year
TO -00-0131  # to the end of last month (OK even if last month
             # didn't have 31 days)
FROM -00-00-112
TO   -00-00-01  # statistics for the last 16 weeks
FROM -00-00-00:-06+01  # statistics for the last 6 hours
There are command line abbreviations +F and +T for the FROM and TO commands; for example, +T-00-00-01:1800 looks at statistics until 6pm yesterday. -F and -T turn off the from and to, as do FROM OFF and TO OFF.
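
As an illustration of the simplest (absolute) form of FROM and TO, here is a Python sketch which turns yyMMdd or yyMMdd:hhmm into a date, using the 1970 to 2069 window for two-digit years. The function name is made up, and the relative forms with + and - are not handled.

```python
from datetime import datetime

def parse_absolute(spec):
    """Parse 'yyMMdd' or 'yyMMdd:hhmm' with the 1970-2069 year window."""
    date_part, _, time_part = spec.partition(":")
    yy, month, day = int(date_part[0:2]), int(date_part[2:4]), int(date_part[4:6])
    year = 1900 + yy if yy >= 70 else 2000 + yy   # 70-99 -> 19xx, 00-69 -> 20xx
    if time_part:
        hour, minute = int(time_part[0:2]), int(time_part[2:4])
    else:
        hour, minute = 0, 0
    return datetime(year, month, day, hour, minute)
```

So 990701 is read as 1 July 1999 and 000630 as 30 June 2000. Note that Python's own %y conversion uses a slightly different window, which is why the year is worked out by hand here.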
There are also INCLUDE and EXCLUDE commands for most of the reports. These exclude individual lines from particular reports. So, for example, the command
REFREPEXCLUDE http://www.yahoo.com/*
would exclude Yahoo! referrers from the referrer report. However, it would not exclude them from the failed referrer report, the referring site report, etc. (you need to use FAILREFEXCLUDE, REFSITEEXCLUDE etc. for that); nor would it prevent other analysis of logfile lines with those referrers, as REFEXCLUDE would. Also REFREPEXCLUDE would include the referrers in the "not listed" line at the bottom of the report.

The full list of these commands is REQINCLUDE and REQEXCLUDE; REDIRINCLUDE and REDIREXCLUDE; FAILINCLUDE and FAILEXCLUDE; TYPEINCLUDE and TYPEEXCLUDE; DIRINCLUDE and DIREXCLUDE; HOSTREPINCLUDE and HOSTREPEXCLUDE; DOMINCLUDE and DOMEXCLUDE; REFREPINCLUDE and REFREPEXCLUDE; REFSITEINCLUDE and REFSITEEXCLUDE; REDIRREFINCLUDE and REDIRREFEXCLUDE; FAILREFINCLUDE and FAILREFEXCLUDE; BROWSUMINCLUDE and BROWSUMEXCLUDE; FULLBROWINCLUDE and FULLBROWEXCLUDE; VHOSTREPINCLUDE and VHOSTREPEXCLUDE; USERREPINCLUDE and USERREPEXCLUDE; and FAILUSERINCLUDE and FAILUSEREXCLUDE. The inclusion or exclusion applies to the unaliased name, if you are doing any output aliases.


You can also use the symbolic word pages in suitable INCLUDE and EXCLUDE commands; one very common command is
REQINCLUDE pages
to include only pages in the request report.

Analog determines which files should count as pages (and thus which requests count as page requests) using another INCLUDE/EXCLUDE pair, called PAGEINCLUDE and PAGEEXCLUDE. By default, *.html, *.htm and directories (*/) count as pages. But you can change the list by commands like

PAGEINCLUDE *.ps,*.ps.gz
PAGEEXCLUDE sret1.html
(I.e., Postscript and gzipped Postscript are pages, but sret1.html isn't).
There are a couple more INCLUDE and EXCLUDE commands which I'll mention now while we're on the subject. In the Request Report and the three referrer reports (Referrer Report, Redirected Referrer Report and Failed Referrer Report), analog can link to the files which it's listing. There are commands LINKINCLUDE and LINKEXCLUDE for the Request Report, and REFLINKINCLUDE and REFLINKEXCLUDE for the referrer reports, to specify exactly which files are linked to. (By default you get all pages). So, for example, LINKINCLUDE *.txt would link to *.txt files as well as pages in the Request Report, and REFLINKEXCLUDE * would tell analog to make no links in the three referrer reports.

Finally, there are commands called ARGSINCLUDE and ARGSEXCLUDE, and REFARGSINCLUDE and REFARGSEXCLUDE. Sometimes a URL contains arguments after a question mark. For example, the URL

/cgi-bin/script.pl?x=1&y=2
runs the /cgi-bin/script.pl program with arguments x=1 and y=2. (Sometimes the server records the arguments in a separate field in the logfile, but if so you can use the %q field in the LOGFORMAT command, and analog will translate the filename to the above format).

Analog can either read or ignore the arguments. If the command ARGSEXCLUDE /cgi-bin/script.pl were given, analog would ignore the arguments to that file, and so treat the above URL as being the same as /cgi-bin/script.pl. On the other hand, if ARGSINCLUDE /cgi-bin/script.pl were specified, analog would read the arguments, and treat the above URL as a different file from /cgi-bin/script.pl (or from /cgi-bin/script.pl?y=2&x=1), although a grand total for /cgi-bin/script.pl would still be listed in the Request Report.
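
In other words, the choice decides under which name a request is counted. A tiny Python sketch of that idea follows; request_key is a made-up name, and the grand total for the script which analog also lists is not modelled.

```python
def request_key(url, include_args):
    """Return the name under which a request is counted."""
    filename, _, args = url.partition("?")
    # With the arguments included, the whole URL is the key; otherwise
    # everything from the question mark onwards is thrown away.
    return url if include_args and args else filename
```

With the arguments excluded, /cgi-bin/script.pl?x=1&y=2 and /cgi-bin/script.pl?y=2&x=1 both count as /cgi-bin/script.pl; with them included, they are two different files.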

REFARGSINCLUDE and REFARGSEXCLUDE are the same for referrers. By default, all arguments are included. The check for whether the arguments should be included happens before the filename is aliased: this means that you can't use pages in this command, because we don't know whether a file is a page until after it's been aliased.


Configuring the output

So far we have mainly discussed commands which control how analog reads the logfiles. We now get on to commands for configuring the output.

There are 27 different reports which analog can produce, if your logfiles contain the necessary information. Each one has a short name, and a code letter or number, as follows:

x  GENERAL      General Summary
m  MONTHLY      Monthly Report
W  WEEKLY       Weekly Report
D  FULLDAILY    Daily Report
d  DAILY        Daily Summary
H  FULLHOURLY   Hourly Report
h  HOURLY       Hourly Summary
4  QUARTER      Quarter-Hour Report
5  FIVE         Five-Minute Report
S  HOST         Host Report
o  DOMAIN       Domain Report
r  REQUEST      Request Report
i  DIRECTORY    Directory Report
t  FILETYPE     File Type Report
z  SIZE         File Size Report
E  REDIR        Redirection Report
I  FAILURE      Failure Report
f  REFERRER     Referrer Report
s  REFSITE      Referring Site Report
k  REDIRREF     Redirected Referrer Report
K  FAILREF      Failed Referrer Report
B  FULLBROWSER  Browser Report
b  BROWSER      Browser Summary
v  VHOST        Virtual Host Report
u  USER         User Report
J  FAILUSER     Failed User Report
c  STATUS       Status Code Report
For details on what the various reports mean, see the section on What the results mean. But in brief, the General Summary gives summary statistics, such as the total number of requests of each type. The next eight reports are known as time reports; they show the pattern of requests over time. The Host Report and the Domain Report show where people visited from. The Request Report, Directory Report, File Type Report and Size Report show what files people got from your server. The Redirection Report shows files which were redirected to some other file, including "click-thru's." The Failure Report shows files which your server couldn't send out for some reason. The various Referrer Reports show where people followed links from to reach your files. (The Failed Referrer Report is good for spotting broken links.) The Browser Report and Browser Summary show which browsers people were using. If you are using virtual hosts, the Virtual Host Report shows how many requests there were to each virtual host. Similarly if you are using user authentication, the User Report and Failed User Report list the activity for each user. Finally, the Status Code Report shows how many requests returned each HTTP status code.
You can turn each report on or off with configuration commands like
FIVE OFF
REFSITE ON
or by using command line arguments like -5 and +s. You can also turn all reports except the General Summary on or off with the commands ALL ON and ALL OFF, or with the command line arguments +A and -A.

You can turn the "Go To" lines in the report off with the command

GOTOS OFF
or with the -X command line argument; again, GOTOS ON and +X turn them on again.

The figures in parentheses in the General Summary are for the last seven days: either the seven days before the TO time, or if no TO time is given, the seven days before the time of the program start. The figures for the last seven days are normally included if some, but not all, of the requests fall in those seven days; but you can turn them off by means of the command

LASTSEVEN OFF
Of course LASTSEVEN ON turns them on again.

You can change the order of the reports by means of the REPORTORDER command. You should list the code letters for all the reports in the order you want them, like this:

REPORTORDER xcmdDhH45WriSoEItzsfKkuJvbB

You can change which file the output goes to with a command like

OUTFILE stats.htm
or with a command line argument like +Ostats.htm. If you use the filename - or stdout, the output will go to standard output, which is normally the screen, but Unix users might like to redirect it to another file or even into a pipe. You can also use an absolute path name, like
OUTFILE /usr/bin/httpd/htdocs/stats.html  # Unix
OUTFILE "Hard Disk:Server Apps:WebSTAR:Analog:Report.html" # Mac

Now we come to some very important commands. The first is the OUTPUT command, which changes the style of the output. There are three possible output styles, HTML, ASCII and COMPUTER. The first produces Web pages, the second plain text files (which you could mail to people, for example) and the third produces output suitable for reading by a computer (useful for reading into a spreadsheet, or post-processing with a graphics package, for example). There is a separate section about the Computer readable output later. As well as a command like
OUTPUT ASCII
you can also select ASCII style with the command line argument +a, and HTML with the command line argument -a. You can also specify OUTPUT NONE for no output, if you are producing a cache file.

Next, you can change the language of the output. There are two ways to do this. The usual way is to use the LANGUAGE command. For example, the command

LANGUAGE FRENCH
will give you the output in French. The possible languages at the moment are ENGLISH, US-ENGLISH, FRENCH, GERMAN, ITALIAN, PORTUGUESE, BR-PORTUGUESE, DANISH, SWEDISH, CZECH, SLOVAK, HUNGARIAN, ROMANIAN and SLOVENE.

The other way is to use the LANGFILE command. This is useful if you want to download a new language from the analog home page, or if you want to translate one yourself, or even if you want to change some words or phrases or the way the dates and times are formatted in the output. The LANGFILE command tells analog in which file to find the various words and phrases for a new language. For example, the command

LANGFILE lang/guarani.lng
would read from that file. (Note that you have to include the directory name if the file isn't in the directory or folder which you're running analog from. In particular, it's not assumed to be in the same directory as the other language files.)

Some languages also have domains files available. You can tell analog to use a different domains file instead of the English one using the DOMAINSFILE command.

If you want to translate another language, I would be delighted! You'd be wise to contact me first to make sure that no-one else is already translating the same language. The English language file contains some brief instructions for translating new languages.


There are a few more minor, although cosmetically important, commands affecting the output. First there's a command IMAGEDIR which tells analog where the various images used to make the report live. It could be a relative or an absolute URL: for example
IMAGEDIR img/   # within the same directory as the output
IMAGEDIR /img/  # off the root directory of your server

There are three commands which affect the top line of the output. First, the LOGO command allows you to replace the analog logo with another image (for example, your organisation's logo). You can say

LOGO picture.gif  # for this file
LOGO /images/picture2.gif  # a different file
LOGO none         # for no logo
The logo is assumed to be inside the IMAGEDIR unless it starts with a slash, or contains ://

Then there are commands HOSTNAME and HOSTURL which affect the name and link at the end of the title line. For example, I might specify

HOSTNAME "Stephen Turner"
HOSTURL  http://www.statslab.cam.ac.uk/~sret1/
to generate the title "Web Server Statistics for Stephen Turner". Again, you can use none as the HOSTURL to specify no link. Analog will normally translate characters in the hostname to HTML if necessary. So to include literal HTML, such as accented characters, in the output you need to precede them by a backslash, like this:
HOSTNAME "M\&uuml;ller & S\&ouml;hne"

There are commands called HEADERFILE and FOOTERFILE. These let you specify files to be inserted near the top and bottom of your output. You can specify

HEADERFILE none
to cancel a previously-specified header file.

There are three related commands called SEPCHAR, REPSEPCHAR and DECPOINT. These specify single characters to be used as the thousands separator in numbers, the thousands separator within the columns in the reports, and the decimal point. For example, a French user might choose

SEPCHAR " "
REPSEPCHAR none
DECPOINT ,
to make "three thousand and a quarter" look like "3 000,25" in text and "3000,25" in the reports.
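
The effect of the separator and decimal point characters can be imitated in Python as below. This is only a sketch of the formatting idea for non-negative numbers: format_number is a made-up name, and an empty string plays the part of none.

```python
def format_number(value, sepchar, decpoint, places=2):
    """Format a non-negative value with a thousands separator and decimal point."""
    whole, frac = f"{value:.{places}f}".split(".")
    groups = []
    while len(whole) > 3:
        groups.insert(0, whole[-3:])   # peel off three digits at a time
        whole = whole[:-3]
    groups.insert(0, whole)
    return sepchar.join(groups) + decpoint + frac
```

Here format_number(3000.25, " ", ",") produces the text style of the French example and format_number(3000.25, "", ",") the style used within the report columns.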

There is a command called RAWBYTES. Specify RAWBYTES ON if you want the exact number of bytes to be listed in reports, or RAWBYTES OFF if you want the number of kilobytes or Megabytes as appropriate to be listed instead.

Finally there is a command called PAGEWIDTH which specifies the width of the page. The output is not guaranteed to fit in this width, but analog will take notice of it when choosing the width of the time graphs, and when sorting the host report alphabetically; and if the output format is ASCII, when drawing horizontal rules and printing some bits of text. I recommend about PAGEWIDTH 65 for HTML output, and PAGEWIDTH 75 for ASCII output.


There are now some sections about configuring the output of particular reports, under the following headings: Time reports, Other reports and Hierarchical reports.

Time reports

This section is about commands which control the appearance of the time reports. There are eight such reports, which show the pattern of usage over time. Six of them show the usage at specific times, whilst the Hourly Summary and the Daily Summary show the total (not average) activity at particular times of day and week over the whole time period of the report.

Each time report can contain columns listing the requests, requests for pages, and bytes transferred at that time, using the following code letters.

R  Number of requests
r  Percentage of the requests
P  Number of page requests
p  Percentage of the page requests
B  Number of bytes transferred
b  Percentage of the bytes
Which columns appear in which reports is controlled by various COLS commands. For example, the command
HOURCOLS Pb
tells analog to include the number of page requests and percentage of the bytes, in that order, as the columns for the Hourly Summary. The other COLS commands are MONTHCOLS, WEEKCOLS, DAYCOLS (Daily Summary), FULLDAYCOLS (Daily Report), FULLHOURCOLS (Hourly Report), QUARTERCOLS and FIVECOLS. There is also a TIMECOLS command, which specifies that all the time reports are to have the specified columns.
Similarly, analog can plot the bar charts in the time reports according to the number of requests, number of page requests, or number of bytes. This is controlled by the GRAPH family of commands. So, for example,
FULLDAYGRAPH P
tells analog to plot the bar charts in the Daily Report by the number of page requests. This also controls how analog decides which is the busiest time period in the bottom line of the report. Using a lower case letter tells analog to plot the bar charts with ASCII characters instead of the normal red bars. (This produces shorter output, and it is how they appear anyway in ASCII output style, or when viewed with a non-graphical browser.) So, for example,
FULLDAYGRAPH b
would plot the Daily Report by bytes, without using the graphics. The other GRAPH commands are MONTHGRAPH, WEEKGRAPH, DAYGRAPH, HOURGRAPH, FULLHOURGRAPH, QUARTERGRAPH and FIVEGRAPH. There's also an ALLGRAPH command to set all of them simultaneously.
You can plot the graphs either forwards in time (starting from the earliest date) or backwards (starting from the latest date). Use commands like
MONTHBACK ON  # Monthly Report backwards
WEEKBACK OFF  # Weekly Report forwards
The other BACK commands are FULLDAYBACK, FULLHOURBACK, QUARTERBACK and FIVEBACK. It tends to be confusing to mix directions (and analog will warn you if you attempt it) so usually you want to use the ALLBACK command which will set all of them at once.
For the more detailed time reports, you usually only want to list the last few time periods. (Every five minutes for the last three years?? I think not.) So analog provides some ROWS commands to let you specify how many rows you want in the time reports. For example
QUARTERROWS 96  # only the last day's worth
MONTHROWS 0 # 0 means no restriction: show all time
The other ROWS commands are WEEKROWS, FULLDAYROWS, FULLHOURROWS and FIVEROWS. Even if a ROWS command is given, the line at the bottom of the report will still show the busiest time period ever, not just the busiest one in that many rows.
The character which is used for plotting the graphs in ASCII style or on a non-graphical browser is specified by means of the MARKCHAR command. For example,
MARKCHAR =
tells analog to use the equals sign.

There is a parameter called MINGRAPHWIDTH which sets the minimum nominal size of the graphs. For example, if you set

MINGRAPHWIDTH 10
then the graph will be allowed to be up to 10 characters wide, even if that would exceed the PAGEWIDTH.

There is one more command which affects the time reports. You can specify which day should be counted as the first day of the week. This affects the layout of the Daily Report, Daily Summary and Weekly Report. For example, our local student newspaper publishes a new edition on the web every Friday, so they like to specify WEEKBEGINSON FRIDAY for their reports.

In the next section, we'll look at commands relating to the non-time reports.


Other reports

This section deals with the non-time reports. There are quite a lot of commands which control these reports, although we've seen some of them already.

First, these reports have COLS commands, just like the time reports. (See the section on Time reports for how to use these commands.) In the non-time reports, one additional column is possible, namely D for date of last access. So, for example,

REQCOLS RD
lists the number of requests for each file in the Request Report, and the time when that file was last requested. The full list of COLS commands for non-time reports is HOSTCOLS, DOMCOLS, REQCOLS, DIRCOLS, TYPECOLS, SIZECOLS, REDIRCOLS, FAILCOLS, REFCOLS, REFSITECOLS, REDIRREFCOLS, FAILREFCOLS, FULLBROWCOLS (Browser Report), BROWCOLS (Browser Summary), VHOSTCOLS, USERCOLS, FAILUSERCOLS and STATUSCOLS. Not every column is allowed in every report, but if you specify an illegal one, analog will warn you about it.
Next you need to know how to use a SORTBY command to specify how the reports should be sorted. There are six possible ways of sorting reports: REQUESTS, PAGES (i.e., page requests), BYTES, DATE, ALPHABETICAL and RANDOM (no sorting, sometimes useful for speed in very long reports). For example, the command
HOSTSORTBY ALPHABETICAL
will sort the Host Report alphabetically. The other SORTBY commands are DOMSORTBY, REQSORTBY, DIRSORTBY, TYPESORTBY, REDIRSORTBY, FAILSORTBY, REFSORTBY, REFSITESORTBY, REDIRREFSORTBY, FAILREFSORTBY, FULLBROWSORTBY, BROWSORTBY, VHOSTSORTBY, USERSORTBY, FAILUSERSORTBY and STATUSSORTBY. Again, not every sort method is possible in every report, but you'll be warned if you choose an illegal one.

There is one known bug concerned with SORTBY ALPHABETICAL. The report is sorted before any OUTPUTALIAS is applied. This means that if an OUTPUTALIAS has been specified for the report, then the report will not be sorted correctly.


You can also specify a FLOOR for most reports, saying how much activity an item needs before it is listed on the report. There are lots of possible ways of specifying floors, which I'll list here, using the DOMFLOOR (Domain Report FLOOR) command as an example. Essentially each one consists of a number indicating the level of the floor, followed by a letter indicating the floor criterion.
DOMFLOOR 1000r       # all domains with at least 1000 requests
DOMFLOOR 1000p       # at least 1000 requests for pages
DOMFLOOR 1000000b    # at least 1,000,000 bytes transferred
DOMFLOOR 1Mb         # at least 1 megabyte
DOMFLOOR 0.5%r       # 0.5% of the requests (ditto %p and %b)
DOMFLOOR 0.5:r       # 0.5% of the maximum number of requests
                     # for any domain (ditto :p and :b)
DOMFLOOR 970701d     # last access since 1st July 1997
DOMFLOOR -00-01-00d  # last access in last month (see
                     # documentation on FROM and TO commands)
DOMFLOOR -100r       # domains with top 100 number of requests
                     # (ditto -100p, -100b, -100d)
The other FLOOR commands are HOSTFLOOR, REQFLOOR, DIRFLOOR, TYPEFLOOR, REDIRFLOOR, FAILFLOOR, REFFLOOR, REFSITEFLOOR, REDIRREFFLOOR, FAILREFFLOOR, FULLBROWFLOOR, BROWFLOOR, VHOSTFLOOR, USERFLOOR, FAILUSERFLOOR, STATUSFLOOR. Once again, not every floor method is legal for every report, but you'll be warned if you try and choose an illegal one.
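
If you want to check your reading of the floor formats, the following Python sketch turns some of them into an English description. It is not analog's parser: the function name is made up, and it deliberately handles only the request, page and byte floors shown above, leaving out the date-based ones.

```python
import re

# Optional minus sign, a number, an optional M, an optional % or :,
# and finally the criterion letter (r, p or b).
_FLOOR = re.compile(r"^(-?)(\d+(?:\.\d+)?)(M?)([%:]?)([rpb])$")
_CRITERIA = {"r": "requests", "p": "page requests", "b": "bytes"}


def describe_floor(spec):
    """Describe a floor such as '0.5%r' in words (hypothetical helper)."""
    match = _FLOOR.match(spec)
    if match is None:
        raise ValueError("floor not handled by this sketch: " + spec)
    minus, number, mega, marker, letter = match.groups()
    # Combinations that do not appear in the list of formats are rejected.
    if (mega and marker) or (minus and (mega or marker or "." in number)):
        raise ValueError("floor not handled by this sketch: " + spec)
    what = _CRITERIA[letter]
    if minus:
        return "top " + number + " items by " + what
    if marker == "%":
        return "at least " + number + "% of all " + what
    if marker == ":":
        return "at least " + number + "% of the maximum " + what + " for any item"
    return "at least " + number + mega + " " + what


for example in ["1000r", "1Mb", "0.5%r", "0.5:r", "-100r"]:
    print(example, "=", describe_floor(example))
```
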
I've already told you about how to turn each report on and off from the command line using its code letter. In fact, you can specify the SORTBY and the FLOOR in the same command. Take the example of the Referrer Report. If you follow the +f (to turn the report on) with a letter, it represents the sort method according to the following code:
r   REQUESTS
p   PAGES
b   BYTES
d   DATE
a   ALPHABETICAL
x   RANDOM
You can then, or alternatively, use one of the above FLOOR formats to specify the floor. If you specify a SORTBY, you can also leave off the last letter of the floor, and analog will guess it according to the sort method: the floor will be by pages or bytes if that is the sort method, and otherwise by requests. Here are four examples:
+fp
means turn the referrer report on and sort it by page requests, but says nothing about the floor;
+f100r
means list all referrers with at least 100 requests, but says nothing about the sort method;
+fb10000
means list all referrers with at least 10,000 bytes, sorted by bytes;
+fa-000101d
means list all referrers with accesses this year, sorted alphabetically.
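
The rule for guessing the floor letter can be sketched in Python as follows. This is an illustration of the rule as described above, not the code analog uses, and the function name is made up.

```python
SORT_CODES = {
    "r": "REQUESTS", "p": "PAGES", "b": "BYTES",
    "d": "DATE", "a": "ALPHABETICAL", "x": "RANDOM",
}


def decode_report_option(argument):
    """Split e.g. '+fb10000' into (report letter, sort method, floor)."""
    if len(argument) < 2 or argument[0] != "+":
        raise ValueError("expected something like +fp: " + argument)
    report, rest = argument[1], argument[2:]
    sortby = None
    # A letter straight after the report letter is the sort method.
    if rest and rest[0] in SORT_CODES:
        sortby, rest = SORT_CODES[rest[0]], rest[1:]
    floor = rest or None
    # With a sort method given, a missing floor letter is guessed:
    # pages or bytes if that is the sort method, otherwise requests.
    if floor and sortby and floor[-1] not in "rpbd":
        floor += {"PAGES": "p", "BYTES": "b"}.get(sortby, "r")
    return report, sortby, floor


for example in ["+fp", "+f100r", "+fb10000", "+fa-000101d"]:
    print(example, decode_report_option(example))
```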

We've already seen some other commands affecting what was listed in the non-time reports. The output INCLUDE and EXCLUDE commands specified lines to omit from each report, and the OUTPUTALIAS commands specified some aliasing to do on the names before they were listed. There were also LINKINCLUDE and LINKEXCLUDE, and REFLINKINCLUDE and REFLINKEXCLUDE commands to control what was linked to in the Request Report and the three referrer reports. You might want to have another look at these paragraphs.

There's one other command which affects the links in the Request Report. The command BASEURL prepends an additional string to the URLs in the target of the link. For example, after the command

BASEURL http://www.statslab.cam.ac.uk
/~sret1/ will be linked to http://www.statslab.cam.ac.uk/~sret1/, not just to /~sret1/. This is very useful if you want to display the statistics on a different server from the server they refer to.

In the next section, we'll look at commands for generating hierarchical reports, which are closely related to the commands in this section.


Hierarchical reports

Some of the non-time reports have a hierarchical (or tree) structure: so, for example, each domain in the domain report can have subdomains listed under it, which in turn can have sub-subdomains, and so on. This section describes commands for managing hierarchical reports.

First, you need to be able to control what gets listed in the reports. For this you need to use the SUB family of commands. So, for example, the command SUBDIR /~sret1/* would ensure that the Directory Report would not only contain an entry for the sum of my files, but also one for each of my subdirectories, something like this:

29,111: /~sret1/
10,234:   /~sret1/analog/
 5,179:   /~sret1/backgammon/
11,908: /~steve/

If you specify a SUB command, all the intermediate levels are included automatically. So, for example, after

SUBDOMAIN statslab.cam.ac.uk
cam.ac.uk and ac.uk will be included in the Domain Report too, and after *.*.ac.uk, *.ac.uk will be included.
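
As a rough Python illustration of which intermediate levels are meant in the first example (the function name is made up, and the sketch only covers ordinary named domains, not wildcards or numerical subdomains):

```python
def intermediate_domains(subdomain):
    """List the levels between a subdomain and its top-level domain."""
    labels = subdomain.split(".")
    # Drop one leading label at a time, stopping before the bare
    # top-level domain, which has its own line in the report anyway.
    return [".".join(labels[i:]) for i in range(1, len(labels) - 1)]


print(intermediate_domains("statslab.cam.ac.uk"))
```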

Here are examples of the other three SUB commands:

SUBTYPE *.gz      # in the File Type Report
SUBBROW Mozilla/*  # in the Browser Summary
REFDIR http://search.yahoo.com/*   # Referring Site Report

The SUBDOMAIN command (but none of the others) can include a second argument describing the subdomain. For example

SUBDOMAIN cam.ac.uk 'University of Cambridge'
Then that subdomain will be listed with its translation in the Domain Report. You can also have numerical subdomains: e.g.,
SUBDOMAIN 131.111 'University of Cambridge'
If you sort the subdomains alphabetically, the numerical ones will also be sorted alphabetically, not numerically. I don't think this will cause any problems.

One other use for the SUBDIR command is if you have used the second argument to the LOGFILE command. Suppose you have translated files like /index.html into http://www.mycompany.com/index.html. Then the command

SUBDIR http://www.mycompany.com/*
would be appropriate to make the directory report look right.
The lower levels of each report have FLOOR and SORTBY commands which work exactly the same as those we have already seen for the top level. These commands are SUBDIRFLOOR, SUBDOMFLOOR, SUBTYPEFLOOR, SUBBROWFLOOR and REFDIRFLOOR; and SUBDIRSORTBY, SUBDOMSORTBY, SUBTYPESORTBY, SUBBROWSORTBY and REFDIRSORTBY.

A sub-item is listed in a hierarchical report only if it is above the sub-FLOOR, and it is included with a SUB command, and its immediate parent is listed. For example, specifying

SUBDIR /*/*/
SUBDIRFLOOR -3r
SUBDIRSORTBY REQUESTS
would list the three subdirectories with most requests under each directory. SUBDIRFLOOR 1:r would have listed any subdirectory with at least 1% of the maximum number of requests of any top level directory.

The report INCLUDE and EXCLUDE commands for a hierarchical report only apply to the top level of the report: you can use the SUB commands for the lower levels.

The three file reports (Request Report, Redirection Report and Failure Report) and the three referrer reports (Referrer Report, Redirected Referrer Report and Failed Referrer Report) are not fully hierarchical, but they do list search arguments together under the file to which they refer (provided that the arguments have been read in: see the ARGSINCLUDE command). So they have similar sub-FLOOR and sub-SORTBY commands, namely REQARGSFLOOR, REDIRARGSFLOOR, FAILARGSFLOOR, REFARGSFLOOR, REDIRREFARGSFLOOR and FAILREFARGSFLOOR; and REQARGSSORTBY, REDIRARGSSORTBY, FAILARGSSORTBY, REFARGSSORTBY, REDIRREFARGSSORTBY and FAILREFARGSSORTBY.

That concludes the description of all the output configuration commands. Now we move on to some other individual topics, starting with the domains file.


The domains file

The domains file tells analog which country is represented by each domain. You can tell analog where to find your domains file with a command like
DOMAINSFILE domains.tab
This is useful if you want to use a domains file in a different language, for example. If you haven't got a domains file, you can download one from http://www.statslab.cam.ac.uk/~sret1/analog/domains.tab. It should contain each domain code followed by its location on a new line, thus:
ad   Andorra
ae   United Arab Emirates
[...]
It does not need to be in alphabetical order, though humans may prefer it that way.
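
If you want to check a domains file yourself, a few lines of Python will read it. This sketch assumes only what is stated above, namely a domain code followed by white space and then the location on each line; the function name is made up, and it is not how analog itself reads the file.

```python
def parse_domains_file(text):
    """Map each domain code to its location."""
    domains = {}
    for line in text.splitlines():
        # Split off the code; the rest of the line is the location.
        fields = line.split(None, 1)
        if len(fields) == 2:
            domains[fields[0]] = fields[1].strip()
    return domains


sample = "ad   Andorra\nae   United Arab Emirates\n"
print(parse_domains_file(sample))
```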

Only domains which occur in the domains file will get their own line in the Domain Report: the rest are probably spurious, and will be accumulated together as "unknown domains". If you have debugging turned on, you can see which domains were unknown.


Computer-readable output style

This section describes the computer-readable output style. You can select this style by means of the command
OUTPUT COMPUTER
This style is designed to be easy to read into spreadsheets, or post-process with graphics creation tools, for example.

Each line in the output is separated into fields by means of a special string. You can specify this string by means of the COMPSEP command; for example

COMPSEP :::
if for some reason you wanted three colons between each column. Make sure not to use anything that might occur in the output: for example, a single or double space would not be suitable.

Each line in the preformatted output begins with a letter indicating which report the line is part of. (The code letters for the reports are listed in the section on Configuring the Output.) After that, there follows a field indicating the remaining columns in the report (using the letters RrPpBbD as usual). Then there are the numerical data and then the name of the item. Times actually take up several fields: year, month, date, hour & minute, or as many of those as are necessary to identify the time.

The general summary is a bit different. After an initial x, there is a two-character code saying what the line contains. The possible codes are

HN   HOSTNAME
HU   HOSTURL
PS   Program start time
FR   Time of first request
LR   Time of last request
L7   Time last 7 days starts after
SR   Total successful requests
S7   Total successful requests in last 7 days
PR   Total successful requests for pages
P7   Total successful requests for pages in last 7 days
FL   Total failed requests
F7   Total failed requests in last 7 days
RR   Total redirected requests
R7   Total redirected requests in last 7 days
NC   Logfile lines without status code
C7   Lines without status code in last 7 days
NF   Number of distinct files requested
N7   Number of distinct files requested in last 7 days
NH   Number of distinct hosts served
H7   Number of distinct hosts served in last 7 days
CL   Number of corrupt lines in the logfile
UL   Number of unwanted lines in the logfile
BT   Total number of bytes transferred
B7   Total number of bytes transferred in last 7 days
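
As a starting point for post-processing, here is a minimal Python sketch which picks the general summary lines out of the output. It assumes that each such line consists of x, the two-character code and then the data, all separated by the COMPSEP string; check this against your own output before relying on it. The sample lines and the function name are made up.

```python
def parse_general_summary(lines, sep):
    """Collect the general summary lines (those starting with x) by code."""
    summary = {}
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        # Assumed layout: x, then the two-character code, then the data.
        if len(fields) >= 3 and fields[0] == "x":
            # Keep a list, because times take up several fields.
            summary[fields[1]] = fields[2:]
    return summary


# Made-up sample output, using ::: as the separator.
sample = [
    "x:::HN:::My Company",
    "x:::SR:::1200",
    "x:::FL:::35",
]
summary = parse_general_summary(sample, ":::")
print(summary["HN"][0], int(summary["SR"][0]) + int(summary["FL"][0]))
```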

If you do anything interesting with this output style, I should be delighted to hear about it. Anyone want to write a program to turn it into those pretty charts that executives seem to love?


Cache files

Analog has the ability to archive some of the data in your logfile into a cache file so that the logfile can be thrown away without losing the most important data.

For most people, the cache file will not be needed: compressing the logfile using a standard compression utility such as gzip will be sufficient. Compressing a logfile is very efficient owing to the large number of repeated strings: I find about 12 times compression in practice. That in itself may solve your filespace problems, without needing to throw away any information.

If you are going to use the cache file feature, it is very important that you understand what is and what is not recorded. It is not possible to reconstruct everything of interest in the logfile from the cache file. The cache file does contain information about the total number of requests for each host and each file, but not about, for example, which files were read by which hosts. (To do so would take up as much disk space as the compressed logfile.) So you cannot later look at only one file and see which hosts read that file. Similarly, you cannot later restrict the files or hosts by date, using FROM and TO commands.

In summary, you should do all the inclusions and exclusions you want when you create the cache file. If you want different sets of inclusions and exclusions, you should create several cache files from the same logfile. You cannot later apply extra inclusions and exclusions accurately.

One other minor point: the pattern of failed requests and redirected requests over time is not recorded in the cache file. So although the total number will still be correct, the number in the last 7 days can be under-reported subsequently.


You can create a cache file by setting the CACHEOUTFILE to be the file you want the cache to live in. Set
CACHEOUTFILE none
to turn it off again. You will still get the regular output as well as the cache output, unless you request OUTPUT NONE.

You can read in a previously-made cache file with the CACHEFILE command, or with the +U command line option. As with the LOGFILE command, you can use commas and wild cards to read in several cache files, and read compressed cache files using the UNCOMPRESS mechanism. Note that if you don't want to read a logfile as well as the cache file, you will have to explicitly set the LOGFILE to none.

When analog reads in a cache file, it will respect inclusions and exclusions as far as it can, but it does not apply any more aliases to the items. (This is to avoid double-aliasing.) So you must do any aliases you want at the time you create the cache file. Similarly, it does not obey the LOGOFFSET variable, to avoid double-offsetting, so any offset you want must be applied at cache-creation time too.

Sometimes you don't want to record all the types of item in the cache file. You might want to forget about which hosts had accessed your web site, for example, and only remember how many times each file was requested. You can choose not to include one type of item in the cache file by setting its LOWMEM to 3; for example, specify

HOSTLOWMEM 3
to exclude hosts from the cache file. Because this is a serious step, analog will produce a warning if you do this. You can even set all six LOWMEMs to 3 if you just want to remember the pattern of requests over time, not even which files were requested.

It is legal to have the CACHEOUTFILE the same as the CACHEFILE to overwrite the old cache file with an updated one, but it is not recommended. It is best to make a separate cache file for each logfile. Failing that, it is better to write the new cache to a different file, and only delete the old cache when you have verified that the new cache was created correctly.


To use this feature without losing entries or double counting them, I suggest the following procedure.
  1. Archive the old logfile, and restart the server with a fresh logfile. (See your server documentation for how to do this.)
  2. Make both a cache file and an ordinary report from the old logfile.
  3. Make a report from the cache file and compare it against the report from the logfile to check it works.
Now you can throw away the old logfile, if you've really understood what data you're losing by doing so. (But please remember that I can take no responsibility if something goes wrong: see the licence.)

I prefer to make a separate cache file from each logfile, in case something goes wrong with one of them, rather than a single cache file combining several logfiles, or a single cache file combining an old cache file and a logfile.


DNS lookups

Sometimes a logfile contains numerical IP addresses - like 131.111.20.59 - for the computers that have visited you, instead of names like lion.statslab.cam.ac.uk. This section describes how you can get analog to do so-called DNS lookups to translate these numbers into names. This relies on you having a suitably configured system: DNS lookups are not possible on some systems.

Unfortunately DNS lookups are typically very slow, because your computer has to ask across the network to find out the names of the hosts. For this reason, analog saves the addresses it has looked up in a file, so that you don't have to look them up again next time. (Even so, you may find the DNS lookups too slow to be usable.) The file is specified by a command like

DNSFILE dnsfile.txt
(The first time you use this command, you will get a missing-file warning, but it will exist the next time.)

There are four possible levels of DNS activity. If you specify DNS NONE, no numerical addresses will be resolved. If you specify DNS READ, then analog will read the DNS file for old lookups, but no new lookups will take place. This mode is suitable if you are running analog while not connected to the internet. The third level is DNS WRITE. This reads the old file, looks up new addresses, and adds them to the file. The fourth level is DNS LOOKUP. This reads the old file and looks up new addresses, but doesn't add the new addresses to the file, so that they will not be remembered for next time. The reason for this is that if two copies of analog were running at once, both with DNS WRITE, then it is possible that the DNS file could become corrupted (although the chance is quite small).
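
The four levels can be summarised as a small table. The Python below is just that table, written out from the description above, with two made-up helper functions which pick out the levels suitable for running offline and the levels which never write to the DNS file.

```python
# For each level: (reads the DNS file, does new lookups, writes new results)
DNS_LEVELS = {
    "NONE":   (False, False, False),
    "READ":   (True,  False, False),
    "WRITE":  (True,  True,  True),
    "LOOKUP": (True,  True,  False),
}


def levels_without_new_lookups():
    """Levels which never ask across the network."""
    return sorted(name for name, (_, looks_up, _) in DNS_LEVELS.items()
                  if not looks_up)


def levels_which_never_write():
    """Levels which cannot corrupt the DNS file when two copies run at once."""
    return sorted(name for name, (_, _, writes) in DNS_LEVELS.items()
                  if not writes)


print(levels_without_new_lookups())
print(levels_which_never_write())
```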

Jason Linhart has written an application for the Mac called DNSTran, which creates DNS files for analog to read. Because it uses Mac-specific code, it's faster than getting analog to create the file, and I recommend it.

Analog never deletes anything from the DNS file: this means that the DNS file will grow, and can become quite large. You should delete the top of it every so often.

There are two parameters which say how long to trust old lookups for. If you set

DNSGOODHOURS 672
for example, then successful lookups will be checked again after 672 hours (4 weeks). You can also set the DNSBADHOURS similarly, to check failed lookups again after a certain time.

Finally, there is a debugging command, DEBUG +D to show all the DNS lookups that analog is making.


Coping with low memory

This section describes how to run analog with lower amounts of memory. For a normal logfile this will make analog run a bit slower. But if your computer is running out of memory when running analog, it will go very slowly indeed: so for large logfiles, this can make analog run much faster, or even make an analysis possible that wouldn't otherwise be possible.

Recall what happens to an item when it has been read in. First it is aliased. Secondly, it is checked to see whether it is included or excluded. Then finally, if all the items are wanted, one request is added to its score.

Normally the name of the item is saved before the aliasing takes place. This avoids analog having to do the aliasing again next time the same item is encountered. But this can take up more memory than necessary. So there is a family of LOWMEM commands provided, which tell analog to record the name at a later stage, or even not at all. If you use these commands, analog will have to do a bit more work than normal, but it will use less memory. On most sites, the hosts take up most of the memory, so I'll use the HOSTLOWMEM command as an example.

The command

HOSTLOWMEM 0
represents the normal case, when the hostname is recorded before being aliased. If you specify
HOSTLOWMEM 1
instead, then the hostname is not recorded until after the aliasing. If you specify
HOSTLOWMEM 2
then the name is not recorded until after the inclusion and exclusion lookup has been done as well. And finally, if you give the command
HOSTLOWMEM 3
then the hostname is not saved at all, and the Host Report will not be constructed, even if you've asked for it. (The Domain Report can still be constructed though.) The analogous commands for the other items are FILELOWMEM, BROWLOWMEM, REFLOWMEM, USERLOWMEM and VHOSTLOWMEM.
So what should you do if analog runs out of memory? First, look in your logfile to see which items are taking up all the memory. If you have lots of different filenames, ones you generate on the fly for example, you would want to use the FILELOWMEM commands. Maybe you could combine all the similar filenames into one with a FILEALIAS command, and use FILELOWMEM 1. (If you have lots of different filenames caused by different search arguments, then using ARGSEXCLUDE might solve your problem without any need to use LOWMEM at all). But for most users, it is the hostnames which cause the problem. If you only want to analyse requests from certain hosts, then you could use HOSTLOWMEM 2 to exclude the others before recording those that are left. If you don't want to exclude any hosts, and you haven't got enough memory to record all the different hostnames, then HOSTLOWMEM 3 would be appropriate.
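
To make the four levels concrete, here is a toy model in Python. It is not how analog is implemented: it only counts how many names would be kept in memory if names were saved at the stage each LOWMEM level describes. The hostnames, the alias (lower-casing) and the inclusion rule (keep only .com hosts) are made up for the example.

```python
def run(hosts, alias, is_wanted, lowmem):
    """Return (requests counted, number of names kept in memory)."""
    stored = set()
    counted = 0
    for raw in hosts:
        if lowmem == 0:
            stored.add(raw)      # name saved before aliasing
        name = alias(raw)
        if lowmem == 1:
            stored.add(name)     # name saved after aliasing
        if not is_wanted(name):
            continue
        if lowmem == 2:
            stored.add(name)     # name saved only if it is wanted
        counted += 1             # at level 3 no name is saved at all
    return counted, len(stored)


hosts = ["WWW.Example.com", "www.example.com",
         "lion.example.org", "www.example.com"]
for level in range(4):
    print(level, run(hosts, str.lower, lambda h: h.endswith(".com"), level))
```

In this model the number of requests counted is the same at every level; only the number of names remembered goes down.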

Debugging

This section lists commands to help you debug analog, if you think it's going wrong. There's another section later which lists all the errors and warnings which analog can generate, and what they all mean, and another section which tells you how to report bugs.

First, remember the option we mentioned before, to list the current settings of all of analog's variables. To get this, just put -settings on the command line, or PRINTVARS ON in one of your configuration files, along with your other commands. Then analog will produce the list of settings instead of running in the normal way.


There are commands which control how much debugging information and warning information analog gives out while it is running. By default you get all the warnings and no debugging, but you can change this by means of the commands DEBUG and WARNINGS. If you say
DEBUG ON
you get all the debugging. (And DEBUG OFF turns it off again.) You can also get just certain categories of debugging. The categories are
C   list all corrupt logfile lines
D   information about DNS lookups
F   information about file opening and closing
S   summary information about each logfile when it's closed
U   list unknown domains
So, for example, the command
DEBUG FS
would give you information about file opening and closing, and what was in each logfile, but none of the other sorts of debugging. Each line of debugging information is prepended with its code letter. You can also specify
DEBUG +CD
to add C- and D-category debugging, and
DEBUG -CD
to remove them.
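
The way successive DEBUG commands combine can be sketched in Python like this. It is an illustration of the behaviour described above, with a made-up function name, not analog's own code.

```python
ALL_CATEGORIES = set("CDFSU")


def apply_debug(current, argument):
    """Return the debugging categories in force after one DEBUG command."""
    if argument == "ON":
        return set(ALL_CATEGORIES)
    if argument == "OFF":
        return set()
    if argument.startswith("+") or argument.startswith("-"):
        letters = set(argument[1:])
    else:
        letters = set(argument)
    unknown = letters - ALL_CATEGORIES
    if unknown:
        raise ValueError("unknown debugging categories: " + "".join(sorted(unknown)))
    if argument.startswith("+"):
        return current | letters   # add to the existing categories
    if argument.startswith("-"):
        return current - letters   # remove from the existing categories
    return letters                 # replace the existing categories


state = set()
for command in ["FS", "+CD", "-CD"]:
    state = apply_debug(state, command)
    print("DEBUG", command, "->", "".join(sorted(state)))
```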

The WARNINGS command acts similarly. As well as WARNINGS ON and WARNINGS OFF, there are warnings in the following categories.

C   invalid configuration specified
D   dubious configuration specified
F   files missing
L   apparent problems in logfiles
M   possible problems in logfiles
R   turning off empty reports
See the section on errors and warnings for more details about these various categories. Again, warnings are printed with their code letters.

You can also use command line abbreviations for these commands. The DEBUG command is represented by +V (for ON), -V (for OFF), +VFS (to select options FS), +V+FS (to add those options), and +V-FS (to remove them). Similarly the WARNINGS command can be given by +q, -q, +q<options>, +q+<options> or +q-<options>.


There is one more command which is useful when trying to debug analog. If you give the command
PROGRESSFREQ 20000  # say
then analog will produce a little message after every 20,000 lines it reads from the logfile. This is useful to determine whether the program has really stopped or (as is more likely) is just being slow for some reason (such as using DNS lookups).

There is just one more section about analog's configuration commands and command line arguments, but it's a rather long one, on the form interface. (This is a way of running analog by selecting options from a web page.) You might prefer to go straight onto the section on What the results mean.


Form interface

The form interface provides an HTML front end to analog. That means that users can select options from a web page, instead of having to create a configuration file.

The form interface is suitable for ordinary users to use, but it needs to be set up by a system administrator or other expert. In order to set it up, you need to know what CGI programs are, where they live on your server, and how to set up their permissions properly. It would also be helpful if you can write HTML forms. I shall assume this level of background knowledge for the rest of this section.

Warning: CGI programs can contain security loopholes which allow an unscrupulous user to harm your system. (If you don't know about this, you shouldn't be running CGI programs at all.) I have tried to make this form interface safe, but I cannot guarantee it, and take no responsibility if anything goes wrong. You use it at your own risk. (See the licence.)

The form interface consists of two parts: a form to choose the options, and a cgi program to interpret them and pass them to the analog program. You don't in fact need the form at all: if you want to create a link to the cgi program, with the arguments passed in the URL in the usual way, then that's fine.

To compile the cgi program, you first need to edit the top of anlgform.c to indicate where analog lives on your system. Then type make form, which should compile this source into a program called anlgform.cgi. (On Windows 95 & NT, the cgi program is compiled already: it assumes that analog is at \Program Files\analog\analog.exe, so you must move analog there if necessary.) Then put the cgi program wherever your server can find it. Make sure that analog is executable by the server, and that the logfile and domains file are readable. You will probably need to use the full path name for these files.

The form anlgform.html which is distributed with the program should only be regarded as an example form. Almost every configuration command has a counterpart in the CGI program, and so you can add to the form options to do almost anything you want. (The main exceptions are aliases, which are too complicated, and HEADERFILE and FOOTERFILE, which would allow people to view any file on your system.) I shall give the full list in a minute.

Before you use the form, you must edit the action at the top to indicate where anlgform.cgi lives on your server. I have also included two other important options at the top, commented out. First, it is often useful to set the logfile to be analysed (or allow the user to choose it), with a field with name="lo". Secondly, some servers need a timezone to be set in a field with name="TZ", or all the times will be wrong. If you are on Unix, you can put any of the standard timezones in this field: the correct one may well be in your own TZ environment variable.

You can specify other configuration files to be included. When analog is called by the CGI program, it first processes the default configuration file as usual. Then it processes any configuration file specified by an argument with name cg. Then it processes all the other arguments which the CGI program specifies. After that, it processes any configuration file specified by an argument with name cm. Finally, it processes the mandatory configuration file as usual.

If the option qv=1 is sent to the CGI program, then analog is not run, but a list of the configuration commands which would have been sent to analog is printed instead. This is useful for checking that the CGI program is working properly. It can also be used for using the form to produce a configuration file.
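
If you are linking to the CGI program directly rather than through the form, the arguments go in the URL in the usual way. The Python sketch below builds such a URL for checking purposes. The server name and path are hypothetical and you would substitute your own; xq is the code for the GENERAL command from the list of options later in this section.

```python
from urllib.parse import urlencode

# Hypothetical location of the CGI program: substitute your own.
CGI_URL = "http://www.example.com/cgi-bin/anlgform.cgi"


def form_url(options):
    """Build a link to the CGI program from (name, value) pairs."""
    return CGI_URL + "?" + urlencode(options)


# Ask for the general summary, but with qv=1 so that the configuration
# commands are printed instead of analog being run.
url = form_url([("xq", "1"), ("qv", "1")])
print(url)
```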

Troubleshooting

There are lots of reasons why the form interface may not work, and I can't diagnose them very easily. If it doesn't work, first check the following points:
  1. Look in the server's error log for clues.
  2. Are all the file permissions set correctly? Do other CGI programs work on your server?
  3. Include qv=1 in the arguments as explained above. If this works, then at least the CGI program is working.
  4. If you get a long wait, then no data returned, the server is probably timing out the request before analog has finished. The remedy is to increase the timeout interval.
  5. Try running the cgi program from the shell. Set the environment variable QUERY_STRING to equal "xq=1", or "xq=1&qv=1", and run anlgform.cgi directly.
  6. If everything works but the images don't appear in the output, be careful about the IMAGEDIR. It probably shouldn't be inside your /cgi-bin/ directory, or your server will try and execute the images, not send them out.

Here is the complete list of options which can be added to the form and will be interpreted by the CGI program. Each has a two letter name. Values are the same as for the corresponding configuration command except where stated.

Time reports

The first letter is the standard code letter for the report, except that the Quarter-Hour Report is q and the Five-Minute Report is p. The second letter is as follows. If the first letter is lower case, read the column marked lc; if it is upper case, read uc. So, for example, FIVECOLS is pc, but WEEKCOLS is Wd.
        lc   uc    value
ON/OFF   q    p    1 for on, 0 for off
GRAPH    g    h
ROWS     r    s
COLS     c    d

Other reports

Again, the first letter is the code letter for the report. The second letter is
                lc   uc    value
ON/OFF           q    p    1 for on, 0 for off
FLOOR            f    g    Excluding floor method
Floor method     h    i    r, p, b or d
SORTBY           s    t    0 for requests, 1 for pages, 2 for bytes,
                           3 for date, 4 for alphabetical, 5 for random
SUB              j         (Where applicable)
SUBFLOOR         w    x    As above
Subfloor method  y    z    As above
SUBSORTBY        u    v    As above
COLS             c    d
Report INCLUDE   l    m
Report EXCLUDE   n    o
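
The rule for forming the two-letter names from the two tables above can be written out in Python as a check. The dictionaries are copied from the tables; the function name is made up, and SUB is left out because the table gives it only one letter.

```python
# Option -> (second letter for a lower case report letter, for upper case)
TIME_OPTIONS = {
    "ON/OFF": ("q", "p"), "GRAPH": ("g", "h"),
    "ROWS": ("r", "s"), "COLS": ("c", "d"),
}
OTHER_OPTIONS = {
    "ON/OFF": ("q", "p"), "FLOOR": ("f", "g"), "Floor method": ("h", "i"),
    "SORTBY": ("s", "t"), "SUBFLOOR": ("w", "x"),
    "Subfloor method": ("y", "z"), "SUBSORTBY": ("u", "v"),
    "COLS": ("c", "d"), "Report INCLUDE": ("l", "m"),
    "Report EXCLUDE": ("n", "o"),
}


def form_name(report_letter, option, table):
    """Return the two-letter form name for one option of one report."""
    lower_case, upper_case = table[option]
    second = lower_case if report_letter.islower() else upper_case
    return report_letter + second


print(form_name("p", "COLS", TIME_OPTIONS))   # FIVECOLS
print(form_name("W", "COLS", TIME_OPTIONS))   # WEEKCOLS
```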

Items

First letter as follows:
Browser       b
Referrer      f
File          r
Host          s
User          u
Virtual host  v
Second letter:
LOWMEM    k
INCLUDE   x
EXCLUDE   z

Miscellaneous

Command        Code    Value/Notes

ALLBACK        ab      1 for on, 0 for off
BASEURL        ba
CASE           ca      1 for sensitive, 0 for insensitive
CONFIGFILE     cg/cm   See above
COMPSEP        cp
DNSGOODHOURS   da
DNSBADHOURS    db
DECPOINT       de
DOMAINSFILE    df
DIRSUFFIX      di
DNSFILE        dn      Also sets DNS READ; o/wise DNS is NONE
FROM           fr
MINGRAPHWIDTH  gw
HOSTNAME       hn
HOSTURL        hu
IMAGEDIR       ie
LANGUAGE       la      Name of language: LANGFILE overrides
CACHEFILE      lc
LANGFILE       lf      Overrides LANGUAGE
LOGO           lg
LOGFORMAT      lm      Format for all logfiles
LOGFILE        lo
LASTSEVEN      ls
LOGTIMEOFFSET  lt      For all logfiles
REFLINKINCLUDE lw
LINKINCLUDE    lx
REFLINKEXCLUDE ly
LINKEXCLUDE    lz
MARKCHAR       ma
OUTPUT         ot      0 for HTML, 1 for ASCII, 2 for COMPUTER
PAGEWIDTH      pw
PAGEINCLUDE    px
PAGEEXCLUDE    pz
RAWBYTES       rb
REPORTORDER    re
SEPCHAR        sa
REPSEPCHAR     sb
TIMEOFFSET     tm
TO             to
WARNINGS       wa
WEEKBEGINSON   wb      0 for Sunday, 1 for Monday, ..., 6 for Saturday
GOTOS          xp
GENERAL        xq
REFARGSINCLUDE yw
ARGSINCLUDE    yx
REFARGSEXCLUDE yy
ARGSEXCLUDE    yz

What the results mean

This section of the Readme is about understanding the results analog produces. It's divided into two subsections.

How the web works

This section is about what happens when somebody connects to your web site, and what statistics you can and can't calculate. There is a lot of confusion about this. It's not helped by statistics programs which claim to calculate things which cannot really be calculated, only estimated, with varying degrees of accuracy. The simple fact is that certain data which we are used to knowing for traditional print and even broadcast media are simply not available on the web.

I should say that these ideas are not new to me. In particular, I can recommend four excellent articles about this subject: Interpreting WWW Statistics by Doug Linder; Making Sense of Web Usage Statistics by Dana Noonan; Getting Real about Usage Statistics by Tim Stehle; and, the most negative of all, Why Web Usage Statistics are (Worse Than) Meaningless by Jeff Goldberg.


1. The basic model. Let's suppose I visit your web site. I follow a link from somewhere else to your front page, read some pages, and then follow one of your links out of your site.

So, what do you know about it? First, I make one request for your front page. You know the date and time of the request and which page I asked for (of course), and the internet address of my computer (my host). I also usually tell you which page referred me to your site, and the make and model of my browser. I do not tell you my user name or my e-mail address.

Next, I look at the page (or rather my browser does) to see if it's got any graphics on it. If so, and if I've got image loading turned on in my browser, I make a separate connection to retrieve each of these graphics. I never log into your site: I just make a sequence of requests, one for each new file I want to download. The referring page for each of these graphics is your front page. Maybe there are 10 graphics on your front page. Then so far I've made 11 requests to your server.

After that, I go and visit some of your other pages, making a new request for each page and graphic that I want. Finally, I follow a link out of your site. You never know about that at all. I just connect to the next site without telling you.


2. Caches. It's not always quite as simple as that. One major problem is cacheing. There are two major types of cacheing. First, my browser automatically caches files when I download them. This means that if I visit them again, the next day say, I don't need to download the whole page again. Depending on the settings on my browser, I might check with you that the page hasn't changed: in that case, you do know about it, and analog will count it as a new request for the page. But I might set my browser not to check with you: then I will read the page again without you ever knowing about it.

The other sort of cache is on a larger scale. I'm in the UK. Because the link across the Atlantic is sometimes very congested, we've set up a national cache. (Many individual ISPs also do the same thing.) I can set my browser to get your pages from the national cache instead of directly from you. If anyone else in the country has used the cache to look at your pages recently, the cache will have saved them, and will give them out to me without ever telling you about it. So hundreds of people could read your pages, even though you'd only sent them out once. Also, if the page I wanted wasn't already stored in the cache, the cache would ask for it from you on my behalf. This would mean that the request appeared to come from the cache, rather than from me. If several people did this, you would think that only one host was accessing your pages, rather than lots of different ones.


3. What you can know. The only things you can know for certain are the number of requests made to your server, when they were made, which files were asked for, and which host asked you for them.

You can also know what people told you their browsers were, and what the referring pages were. You should be aware, though, that many browsers lie deliberately about what sort of browser they are, or even let users configure the browser name. Also, some browsers send incorrect referrers, telling you the last page that the user was on even if they weren't referred by that page.


4. What you can't know.
  1. You can't tell the identity of your readers. Unless you explicitly require users to provide a password, you don't know who's connected or what their e-mail addresses are.
  2. You can't tell how many visitors you've had. You can guess by looking at the number of distinct hosts that have requested things from you. But this is not always a good estimate for three reasons. First, if users get your pages from a local cache server, you will never know about it. Secondly, sometimes many users connect from the same host: either users from the same company or ISP, or users using the same cache server. Finally, sometimes one user connects from many different hosts. In most countries, 'phone calls are not free. So users sometimes download one page, disconnect from their ISP, and then reconnect to follow a link: but when they reconnect, they will often be allocated a different hostname by their ISP. The same can happen if users access the web from their company through a firewall.
  3. You can't tell how many visits you've had. Many programs, under pressure from advertisers' organisations, define a "visit" (or "session") as a sequence of requests from the same host until there is a half-hour gap. This is an unsound method for several reasons. First, it assumes that each host corresponds to a separate person and vice versa. This is simply not true in the real world, as discussed in the last paragraph. Secondly, it assumes that there is never a half-hour gap in a genuine visit. This is also untrue. I quite often follow a link out of a site, then step back in my browser and continue with the first site from where I left off. Should it really matter whether I do this 29 or 31 minutes later? Finally, to make the computation tractable, such programs also need to assume that your logfile is in chronological order: it isn't always, and analog will produce the same results however you jumble the lines up.
  4. You can't follow a person's path through your site. Even if you assume that each person corresponds one-to-one to a host, you don't know their path through your site. It's very common for people to go back to pages they've downloaded before. You never know about these subsequent visits to that page, because their browser has cached them. So you can't track their path through your site accurately.
  5. You often can't tell where they entered your site, or where they found out about you from. If they are using a cache server, they will often be able to retrieve your home page from their cache, but not all of the subsequent pages they want to read. Then the first page you know about them requesting will be one in the middle of their true visit.
  6. You can't tell how they left your site, or where they went next. They never tell you about their connection to another site, so there's no way for you to know about it.
  7. You can't tell how long people spent reading each page. The same comments apply as in the previous paragraph. You can't tell which pages they are reading between successive requests for pages. They might be reading some pages they downloaded earlier. They might have followed a link out of your site, and they might or might not return later. They might have interrupted their reading for a quick game of Minesweeper. You just don't know.

The bottom line is that HTTP is a stateless protocol. People don't log in and retrieve several documents: they make a separate connection for each file they want. And a lot of the time they don't even behave as if they were logged into one site. Hence analog's emphasis on requests, rather than visits.

I've presented a somewhat negative view on this page, emphasising what you can't find out. Web statistics are still informative: it's just important not to slip from "this page has received 30,000 requests" to "30,000 people have read this page." In the next section, I'll tell you exactly how analog defines its terms, and what counts in each category.
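The half-hour rule for "visits" described in point 3 above can be sketched in a few lines of Python, which shows how arbitrary the cut-off is. This is not something analog does: the function count_visits and the host name host1 are hypothetical, and the rule is implemented only as it is described above.

```python
from datetime import datetime, timedelta

def count_visits(requests, gap=timedelta(minutes=30)):
    """Count "visits" the way many programs do: a new visit starts whenever
    a host has been silent for longer than `gap`.

    `requests` is an iterable of (host, datetime) pairs. They have to be
    sorted first, because the method depends on chronological order.
    """
    last_seen = {}
    visits = 0
    for host, when in sorted(requests, key=lambda r: r[1]):
        previous = last_seen.get(host)
        if previous is None or when - previous > gap:
            visits += 1
        last_seen[host] = when
    return visits

start = datetime(1998, 6, 15, 10, 0)
# The same reader stepping back into the site 29 or 31 minutes later.
back_after_29 = [("host1", start), ("host1", start + timedelta(minutes=29))]
back_after_31 = [("host1", start), ("host1", start + timedelta(minutes=31))]

print(count_visits(back_after_29))  # 1
print(count_visits(back_after_31))  # 2
```

The same person doing the same thing is counted as one visit or two depending on a two-minute difference. Two different people behind one cache would be counted as one.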

Analog's Definitions

This page describes how analog defines its terms, and exactly what is counted in each category. We start with some basic definitions.

The host is the computer which has asked you for a file. The file might be a page (i.e., an HTML document) or it might be something else, such as an image. The total requests counts all the files which have been requested, including pages, graphics, etc. (Some people call this the number of hits, but that word is used in different ways by different people, so I avoid it). The requests for pages obviously only counts pages. The referrer for a request is the place that the user (or his computer) heard about your file from. If he followed a link to reach a page, it will be the previous page. In the case of a graphic on a page, the referrer will be the page containing the graphic.


Analog recognises four categories of request, based on the HTTP status code of the request. You can see the total number of requests for each status code, and what the codes mean, in the Status Code Report. (Or see the HTTP spec for a detailed description.)

First, successful requests are those with HTTP status codes in the 200's (where the document was returned) or with code 304 (where the document was requested but was not needed because it had not been recently modified and the user could use a cached copy). Sometimes the logfile line doesn't contain a status code. These lines are also assumed by analog to be successes.

Redirected requests are those with other codes in the 300's, indicating that the user was directed to a different file instead. The most common cause of these requests is that the user has incorrectly requested a directory name without the trailing slash. The server replies with a redirection ("you probably mean the following") and the user then makes a second connection to get the correct document (although usually the browser does it automatically without the user's intervention or knowledge). The other common cause of redirected requests is their use in "click-thru" advertising banners.

Failed requests are those with codes in the 400's (error in request) or 500's (server error). They come about for a variety of reasons, but the most common are when the requested file is not found or is read-protected.

Finally, requests returning informational status codes are those with status codes in the 100's. These are very rare at the moment.
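The four categories can be summarised as a small Python function. This is a sketch of the rules as stated above, not analog's own code; the function name request_category and the category strings are made up for the example.

```python
def request_category(status):
    """Classify a request by its HTTP status code, following the four
    categories described above. `status` is an int, or None when the
    logfile line carries no status code."""
    if status is None:
        return "successful"          # no status code: assumed a success
    if 200 <= status <= 299 or status == 304:
        return "successful"          # document returned, or cached copy used
    if 300 <= status <= 399:
        return "redirected"          # the other codes in the 300's
    if 400 <= status <= 599:
        return "failed"              # error in request, or server error
    if 100 <= status <= 199:
        return "informational"
    raise ValueError("not an HTTP status code: %r" % status)
```

Note that 304 has to be tested before the general 300's rule, since it counts as a success rather than a redirection.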

There are a few other types of logfile lines listed in the General Summary. Lines without status code refers to those logfile lines without a status code, and the successful requests in the General Summary only counts the ones with a status code: except if the line contains the name of the file requested, and the filename is being counted (not starred in the LOGFORMAT), then it's listed in the successes. Corrupt logfile lines are those which analog didn't manage to parse. And unwanted logfile entries are ones which we have specifically excluded. Successful requests for pages refers to those lines on which the file requested was given and was defined as a page by the PAGEINCLUDE command.


Most reports only include successful requests in calculating the number of requests, requests for pages, bytes, and last date: unless, of course, the report is a redirection or failure report. There is a further restriction on the time reports, the status code report and the file size report: the logfile line must also contain the name of the file requested, and the filename must be being counted. This is necessary to stop double counting if the server uses separate logs.

The "not listed" line at the bottom of each of the non-time reports includes both those items which were explicitly excluded at the output stage with an OUTPUTEXCLUDE command, and those which were not listed because they were below the floor for the report.

The figures in parentheses in the General Summary are for the last seven days: either the seven days before the TO time, or if no TO time is given, the seven days before the time of the program start. (It would be nicer to use the seven days before the last time in the logfile, but we don't know when this is until we've read the whole logfile, and by then it's too late.) The figures for the last seven days are not included if all, or none, of the requests fall in the last seven days.
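The rule for the last seven days can be sketched as follows in Python. The function name last_seven_days_count is hypothetical, and whether the two ends of the seven-day window are inclusive is an assumption of this sketch, not something stated above.

```python
from datetime import datetime, timedelta

def last_seven_days_count(request_times, program_start, to_time=None):
    """Return the number of requests in the last seven days, or None if
    the figure would not be shown (all or none of the requests fall in
    the window)."""
    # The window ends at the TO time if one was given, else at program start.
    end = to_time if to_time is not None else program_start
    begin = end - timedelta(days=7)
    # Assumption: both ends of the window are treated as inclusive.
    recent = sum(1 for when in request_times if begin <= when <= end)
    if recent == 0 or recent == len(request_times):
        return None
    return recent
```

For example, with requests on 1, 10 and 14 June 1998 and the program started on 15 June, the window begins on 8 June and the figure in parentheses would be 2.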

In the Domain Report, "domain not given" means that the hostname did not contain a dot. "Unknown domain" means that it did contain a dot, but that the domain name was not in the domains file.
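As a rough Python sketch of that distinction: the function domain_category is hypothetical, known_domains stands in for the contents of the domains file, and taking the domain to be the part after the last dot is a simplifying assumption. Numerical addresses, which are reported separately, are not handled here.

```python
def domain_category(hostname, known_domains):
    """Say how a hostname would be listed, given a set of known domains
    (a stand-in for the domains file)."""
    if "." not in hostname:
        return "domain not given"
    # Assumption: the domain is whatever follows the last dot.
    domain = hostname.rsplit(".", 1)[1].lower()
    if domain not in known_domains:
        return "unknown domain"
    return domain
```

So a local name such as "localhost" is "domain not given", while a name ending in a suffix that the domains file doesn't list is an "unknown domain".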


There are probably some other things which I could include on this page. If you have any suggestions, then feel free to contact me. Next I shall give an explanation of all the errors and warnings which analog can generate.

Errors and warnings

This page lists all the errors and warnings which analog can produce, together with a short explanation.

First, you should understand the difference between a crash, an error, a warning, and a debugging message. A crash is when analog exits prematurely, without producing the whole output file. The system might give a message, but analog will not give one of its own messages. Analog should never crash. If it does crash, please tell me about it.

An error is something which stops analog finishing its job. Whenever an error is detected, analog gives a message starting something like analog: Fatal error: and will then tell you what type of thing went wrong before quitting.

A warning is a problem which is not fatal to analog: it will keep on with its processing. These vary from the possibly serious, such as files which could not be found, to purely informational. They produce a message starting analog: Warning. You can turn warnings off using the WARNINGS command.

Finally, a debugging message gives information on the state of the program. They just begin with a single code letter followed by a colon. You don't get any debugging messages unless you've asked for them.

Now I shall describe all the possible errors and warnings in detail.


Errors

Ran out of memory: cannot continue
Analog ran out of memory. Try increasing the memory available to the process, if your operating system will allow it, or using the LOWMEM commands.
Cannot ignore mandatory configuration file
See the section in the Readme on the mandatory configuration file.
Can't find language file
Language file too short
Language file contains excessively long lines
Analog can't run without a well-formed language file. See the documentation on language files.
Attempted to read more than 50 configuration files
The most likely explanation for this is that you have accidentally created a loop using the CONFIGFILE command.
Incorrect default given in analhead.h
Default given in analhead.h too short
If you've compiled your own version, and you've specified an incorrect configuration in the file analhead.h, analog gives up to allow you to fix it.
Failed to open output file for writing
Analog couldn't create, or couldn't write to, the output file you specified.
OUTPUT NONE and CACHEOUTFILE none selected
You requested no output.

Warnings

Remember that warnings are not fatal, and that you can turn them off using the WARNINGS command. The possible warnings come in several different categories, shown by a letter in the warning message. The categories are as follows.

Category C

This category indicates an incorrect configuration. Analog will either ignore what you said, or try to do the best it can with it. There are too many warnings in this category to list completely. You will have to consult the documentation for the particular configuration command that gave the warning. If you get a warning for a command which used to work in a previous version of analog, have a look in the section Updating from older versions.

Category D

This is for configurations which might be intended, but which look suspicious.
Offset not a multiple of 30
Offset more than 25 hours
The time offsets are meant for correcting for differences between time zones. These differences are usually multiples of 30 minutes between -25 and +25 hours. Maybe you specified the offset in hours instead of minutes by mistake, or something like that.
SORTBY doesn't match FLOOR
SORTBY (or FLOOR) isn't included in COLS
Within one report, it's helpful to your readers to have the sort method and the floor compatible, and both included in the COLS. (See the section on Non-time reports).
Time reports have not all got same value of BACK
It's usually helpful to have all the time reports running in the same direction.
Report contains no COLS
You've got an empty COLS list for one report, so you'll just get a list of names, not any information about them.
LOWMEM 3 prevents that item being cached
You're making a cache file, but one item is not being recorded because of a LOWMEM command, and will therefore not be saved in the cache file.
OUTFILE and CACHEOUTFILE are the same
The regular output will overwrite, or possibly be appended to, the cache file.
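The two offset warnings at the top of this category amount to a simple sanity check, sketched here in Python. The function offset_warnings is hypothetical and is reconstructed only from the warning messages above; analog's own test may differ in detail.

```python
def offset_warnings(offset_minutes):
    """Return the Category D warnings that a time offset (given in
    minutes) would be expected to trigger."""
    warnings = []
    if offset_minutes % 30 != 0:
        warnings.append("Offset not a multiple of 30")
    if abs(offset_minutes) > 25 * 60:
        warnings.append("Offset more than 25 hours")
    return warnings
```

An offset of 5, typed by someone who meant five hours, is flagged by the first check. The intended value of 300 minutes passes both.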

Category F

This category is for diagnosing files which couldn't be opened or read successfully. These can be serious, but the messages should be self-explanatory.

Category L

When analog finishes reading a logfile, it checks whether there might have been something wrong with it.
Large number of corrupt lines
This could indicate a problem with the logfile, or with the LOGFORMAT specification. If you have a WebSTAR, Netscape or extended logfile, it might be missing the mandatory header line.
Logfiles overlap: possible double counting
Two logfiles which were counting the same thing overlapped in time. Maybe you read two copies of the same logfile. Or maybe the LOGFORMAT specification should have told analog to ignore some of the items.

Category M

This category is for warnings about logfile formats which might make analog produce unexpected results.
Logfile contains lines with no [whatevers], which are being filtered
This is usually harmless. It is perhaps best explained by example. Suppose you are excluding certain files from the analysis, but that you are also analysing a browser log which just contains information about the browsers used, not which files they read. Then we can't exclude the browsers which read the excluded files, because we don't know which they were, so all browsers will be included.
Logfile contains lines with no file names (or bytes): page (or byte) counts may be low
If a logfile line doesn't contain a file name, analog will assume that the request wasn't for a page. Similarly, if it doesn't give the number of bytes transferred, analog will make the bytes zero. So the number of page requests or bytes credited to the other items on that line will then be too low.

Category R

This is used when analog turns off an empty report. This could be because none of the relevant items were included in any of the logfiles, or perhaps because a LOWMEM command stopped them being recorded.

Broken Pipe

This is not an analog-generated warning, but it can result from analog closing a logfile it's uncompressing without reading the whole of it, when it determines that it will not need it.

Frequently asked questions

  1. When I try and compile analog, it gives me an error.
  2. Analog just runs for a moment and then quits.
  3. Analog didn't write the logfile when I ran it.
    See the section entitled Starting to use analog.
  4. Is analog Year 2000 compatible?
    Yes (and so are all previous versions). It works properly for dates between 1970 & 2069 inclusive.
  5. My stats have stopped updating.
  6. My stats have reset to zero.
    If your ISP runs analog for you, you'll have to ask them.
  7. How do I find out the number of hits from your data?
    I don't use the word hits, because people use it in different ways, so it's misleading. I use requests for the number of transfers of any type of file (text, graphics, ...), and page requests for the number of transfers of HTML pages. See the section on Analog's definitions for more information.
  8. Why doesn't the Daily Report only show the last six weeks?
    This is controlled by the FULLDAYROWS command.
  9. How do I get information on just my pages, not everybody's?
  10. How do I ignore accesses from my site?
    Use the FILEINCLUDE command or HOSTEXCLUDE command respectively.
  11. I want to make several different statistics pages. Do I have to install several copies of analog?
    No. Just install it once, and run it with different configuration files.
  12. Can I get data on individual visitors, or visits, to my site?
    No, it's not technically possible, and don't believe any program which tells you it is. See the section on How the web works for details.
  13. I want to see the total number of hits from my organisation in the Host Report.
    You can see this in the Domain Report if you use the SUBDOMAIN command.
  14. Do I have to save all my old logfiles?
    This is answered in the section about Cache files.
  15. What does this error (or warning) mean?
    See the section on Errors and warnings.
  16. Why doesn't analog agree with the counter on my page?
    There are lots of possible reasons. Do they both start from the same date? Are you just looking at requests for that one page with analog, not for all your other pages and graphics? Also, analog will record all requests to that page; if your counter is a graphic, it will only measure requests from people on graphical browsers who reached that place on the page.
  17. How can I do such-and-such with a commandline option?
    Use the +C option to put any configuration command on the commandline.
  18. Why does the form interface give "Document Returned no Data"?
    If the message only appears after a while, then probably the server is giving up before the analog process has finished running. Increase the timeout interval on the server.
  19. The images don't appear when running analog from the form interface.
    You probably need to change the IMAGEDIR. If the images are in your /cgi-bin/ directory, the server will try to execute them instead of just sending them out.
  20. Can I find out the number of hosts that have accessed each file?
  21. Can I find out the number of hosts visiting on each day?
    No; it would require storing too much data (all host/file pairs, or all host/day pairs). If there's a particular file you're interested in, use FILEINCLUDE to restrict the analysis to only that file. If there's a particular day you're interested in, use FROM and TO to restrict the analysis to only that day.
  22. How can I run analog every day?
    This depends on your particular machine. On Unix, you need to run analog as a cron job (see "man cron"). This is my cron command:
    20 1 * * * $HOME/misc/analog
    On Windows NT you can do the same with the at command, but only an administrator can run at. On Windows 95 it's not possible.
    On Mac, there are programs called Cron or CronoTask to do this.
  23. How can I specify different logfiles from the form interface?
    Just add a new field to the form with name=lo
  24. Why are directories listed in the request report?
    They are not directories, they are pages with the same name as the directory. For example, I have a directory called /analog/ and a page called /analog/ (which is the same as /analog/index.html).
  25. Why don't you just use one image, and scale it with the WIDTH and HEIGHT attributes?
    The WIDTH and HEIGHT only tell the browser what size the image will be. They cannot be used for scaling the image, whatever some browsers do.
  26. There is a CTRL-Z character in my logfile, and analog stops reading there.
    Analog is behaving correctly. Under Windows, CTRL-Z (ASCII 26) signifies end-of-file in a text file.
  27. Why do I only get "unresolved numerical addresses" in the domain report?
    Your server only records the numerical IP address of the hosts that contact you, not their names. Read the section about DNS lookups.
  28. Couldn't you do the DNS lookups faster with threads?
    The problem is, the standard commands for DNS lookups are not thread-safe on most Unices.
  29. How about an operating system report?
    Unfortunately, this is not possible. Many browsers record their operating system in your browser log, but not all do, and those that do don't always record it in a consistent format.
  30. How do I make a link on my page that runs analog?
    Link to the analform program, with the desired options. But be careful about the load on your server.
  31. My server lists local names in the logfile. Can you put a common suffix on them automatically?
    This wouldn't be a good idea, because things like "unknown" would get the suffix. You can always add them using HOSTALIAS.
  32. Why don't you make proper graphs or tables?
    Because lots of people then couldn't read them. Analog produces HTML 2.0 output so that people with any browser can read it. Also, I don't want to assume that people have any particular graphics creation tools.
  33. Can I change the background colour of my output?
    Sadly not. Colours only exist in HTML 3.2, not HTML 2.0. Unfortunately, there doesn't seem to be a way to produce the bar charts in my time reports in HTML 3.2.
  34. Can you extrapolate from the current month's partial data to produce a prediction for the whole month, based on the rate so far?
    No. There are too many problems in trying to produce anything sensible, especially near the beginning of the month. Different days of the week and different times of day cause lots of problems. I would prefer to produce raw accurate data than suspect derived data.
  35. Can you extend the Domain Report to say which US states people visited from?
    No. Some programs pretend to do this, but you can only tell which state the computer they're using is in, which, for ISPs or other large organisations, may be quite different from where the user is.
  36. Can I make multiple reports with one pass through the logfile?
    Not at the moment. I want to do this in a future version, but it will require some considerable work.
  37. I ran out of memory when trying to run analog. What can I do?
    See the section on Coping with low memory.
  38. You're processing 10,000,000 requests in under 10 minutes. Why is mine much slower?
  39. Analog appears to stall.
    If you have DNS lookups on, they are very slow. Otherwise, it probably depends on the speed of your computer and disks, and what other programs are running at the same time. You can use the PROGRESSFREQ command to see if it's really stalled or whether it's just being slow.
  40. I host lots of virtual domains. How should I set up analog?
    In my opinion, the best thing is to log each virtual server to a different logfile and analyse them independently. If you log them all to the same logfile, then make sure to log the virtual host name on each line, and use analog's VHOSTINCLUDE command to pick out the lines you want.
  41. Why don't you sell analog?
    I didn't write analog for the money, and I'm happy just to see people use it. I haven't got time to support it commercially, and I can't use my academic account for commercial purposes. Also, because it is freeware, lots of people send me ideas and code to include in future versions. (Of course, if you want to send me money, or gifts in kind, or even just postcards...).

If your question is not answered here or in the rest of the Readme, and you think it should be, see the next section for how to contact me.

How to report bugs

I welcome mail about analog, both praise and bug reports! I am also usually happy to help people who have trouble with analog: it helps me to find bugs, and know where the documentation is unclear.

I get a lot of e-mail about analog, so I would appreciate it if you would do the following simple things before mailing me.

  1. Read the FAQ. Maybe I've answered your question already. If I have, I'll just direct you to the FAQ, not answer it again.
  2. Read the list of known bugs at my site, to see if your bug is already known about.
  3. Read the other relevant pages of the Readme, particularly the section on Starting to use analog, and the section on Errors and warnings if your question is about one of those. If your question is already answered on one of those pages, I'll just direct you to it, not answer the question again.
  4. If your question is "How do I do ... with analog" then don't ask it until you've read the whole of the section on Customising analog yourself, and still don't know how to do it. I don't appreciate people who are too lazy to read the documentation. (If the documentation is unclear, or the relevant paragraph is too well hidden, then that's a different matter. Of course I want to know about that.)
  5. If analog isn't doing what you thought you asked it to, then run it with the PRINTVARS ON configuration command, and see what options it thinks it's meant to be using.
  6. Describe exactly what you did, what you expected, and what the computer did. Include the exact text of any error messages, not a précis.
  7. Do not send long files or attachments unless I ask you to. I do not want to see your configuration file, your header file, your output file, or any logfile over 20 lines long. They are almost always useless to me.
  8. Include the word "analog" in the subject of your e-mail. That way it will end up in the right mailbox.

I'm sorry to be so fussy, but a lot of the mail I get really needn't have been sent at all. As I say, I really do welcome genuine mail. After all that, you can send your mail to sret1@cam.ac.uk.

There is also a mailing list for receiving news of updates to analog. To join that list, see the next section.


The analog mailing list

There is a mailing list for announcements about analog, such as news of new versions. It usually only gets a message every few months or so. To join the mailing list, just send mail to sret1@cam.ac.uk with "subscribe analog" in the subject line. Don't expect a confirmation: I'll just add you before I send out the next announcement. You can put a message in the body of the e-mail if you want: I still read them.

There is no mailing list for discussing how to use analog, although I'm thinking of setting one up. For the moment, just send mail to me.


Acknowledgements

Many people have helped me with analog, and I can't thank them all specifically. But I do appreciate everyone who's given me feedback or sent me bug reports.

Thanks are due to the author of getstats, Kevin Hughes. In the days before analog there were only three serious logfile analysis programs, and only one of them, getstats, had attractive output. I wrote analog when getstats stopped being able to cope with the size of our logfile, but my output still looks similar to his.

Thanks are also due to all those who helped in the early stages of writing this program, and gave me the encouragement to continue with analog and to release it publicly. Those who made helpful suggestions during beta testing are numerous, but I must mention particularly Dan Anderson, Martyn Johnson, Joe Ramey, Chris Ritson, Quentin Stafford-Fraser and Dave Stanworth. Above all Gareth McCaughan gave, and continues to give, me lots of programming advice. The program would have run much more slowly without him.

Many people have provided mirror sites for analog, starting with Dave Stanworth (again!). The full list of mirror sites is given elsewhere; thanks to all of them.

Mark Roedel first suggested porting analog to different platforms, and made the original DOS port. Shortly afterwards, Jason Linhart made the Mac port, and has continued to contribute lots of extra code for that platform and for the program in general. The Mac version also includes code contributed by Stephan Somogyi and Nigel Perry, and uses the ZLib library by Jean-loup Gailly & Mark Adler. Later ports were made by Dave Jones (VMS), Magnus Hagander (Win32), Nick Smith (Acorn RiscOS), Scott Tadman (BeOS), and Martin Kraemer & Holger Schranz (BS2000/OSD). Ivan Martinez compiles the OS/2 version. The BS2000/OSD port includes code developed by the Apache Group for use in the Apache HTTP server project. Thanks to all the other people who have contributed bits of code too.

For the translations into other languages, many thanks are due to the following: Patrice Lafont, Lucien Vieira, Jean-Marc Coursimault & Lionel Delaude (French), Mario Ellebrecht, Martin Kraemer, Holger Schranz & Thomas Jacob (German), Furio Ercolessi (Italian), Adrian Price (Danish), Björn Malmberg (Swedish), Ivan Martinez (Brazilian Portuguese), Jaime Carvalho e Silva (European Portuguese), Jan Simek (Czech), Stefan Billik (Slovak), Laszlo Nemeth (Hungarian), Andrej Zizmond (Slovene), and Alex Mihaila (Romanian).

Finally, thanks to you for using the program!


What's new in this version?

This page lists the new features in this version of analog. There's also another page about how to upgrade from older versions of analog, listing which commands have changed or been abolished.
3.0 (15-Jun-98)
Fix for broken strcmp() function on SunOS 5.
Portuguese, Brazilian Portuguese, Danish and Hungarian language files included.
Precompiled executable for OS/2 available.
2.91beta1 (04-Jun-98)
Uses less memory when compiling reports.
New operating system, BS2000/OSD, and code for EBCDIC character set.
New command DEFAULTLOGFORMAT.
LASTSEVEN and BASEURL reinstated.
More information added to PRINTVARS output.
AppleScript support for Unix-style command lines added to Mac version.
Now works on SunOS 4, and other small bug fixes.
French, German, Swedish, Czech, Slovak, Slovene and Romanian language files included.
One page version of the Readme included in the documentation.
2.90beta4 (09-Apr-98)
Mended DNS cache file reading, which I broke in yesterday's release.
2.90beta3 (08-Apr-98)
Fixed bug that caused a crash while giving warning messages on SunOS; bug that caused configuration files that called other configuration files not to be completed; and other smaller bugs.
Italian language files included.
2.90beta2 (03-Apr-98)
Separate LOGFORMATs for North American and international date formats, when using Microsoft or Netpresenz logs.
Understands the AppleShare IP server's attempt at the WebSTAR format.
Directory report now works properly even if you use the second argument to the LOGFILE command.
Wild cards in filenames work properly on the Mac.
Other small bug fixes.
One speed improvement (I gain about 3%).
Several corrections and clarifications to the documentation.
2.90beta1 (27-Mar-98)
This version is a complete rewrite: every single line of code is new. The whole code is shorter despite considerable improvements in functionality, and several people have reported that it is significantly faster.
The most important new features are:
The following features have been abolished.
The following features are not yet present, but will be added by version 3.

Upgrading from earlier versions

This page lists those commands which existed in older versions of analog, but which have been changed or abolished in this version. The new features in this version are listed on the page "What's new in this version?"

Upgrading from 2.90beta1

Upgrading from 2.11 and earlier

Upgrading from 2.0, Win32 users

Upgrading from 1.92 and earlier, Mac users

Upgrading from 1.9beta's

Upgrading from 1.2's and earlier


What was new in version 2?

This page lists the new features which were in version 2 of analog.
2.11 (14-Mar-97)
Minor bug fixes to yesterday's release.
2.1 (13-Mar-97)
Language support rewritten, causing reduction in code size of 2200 lines.
New configuration command LANGFILE.
New Acorn RiscOS version.
Page requests per day reported.
Bug fix: CASE INSENSITIVE could cause %7E-type conversions not to take place.
2.0.2 (04-Mar-97)
DNS lookups and wildcards should now work in the Win32 version.
New configuration command PRINTVARS.
Fix for zero length hostnames after DNS lookups.
Minor corrections in French and Spanish translations.
2.0 (10-Feb-97)
New native Win32 version.
Wildcards allowed in filenames on Mac.
Ignores browser "-".
1.93beta (18-Jan-97)
New commands BROWALIAS, CONFIGFILE and PROGRESSFREQ.
Form program can now call configuration files.
Form program now uses the default choices if none specified.
Domain report prints correctly in preformatted output.
Specifying +1 and +V2 doesn't crash the program.
-v reports dates correctly.
Trailing dots on hostnames removed.
Second argument to LOGFILE command can't be obliterated by /../
1.92beta (08-Oct-96)
DNS lookups added on Mac.
Netpresenz format understood on Mac.
New languages: Spanish, Italian and Danish.
Extra information when debugging turned on.
*.htm are now pages on all machines.
A few small bugs fixed.
1.91beta4 (13-Jul-96)
Cache file now includes page request information.
DNS bug fixed.
New command DNSHASHSIZE.
Bug in browser reports fixed.
1.91beta3 (09-Jul-96)
BSD/OS compilation bug believed fixed.
Fixed HOSTALIAS which I broke yesterday.
DNS bug (causing too many lookups) identified, although not yet fixed.
1.91beta2 (08-Jul-96)
Some bug fixes (including: HOSTEXCLUDE and CASE INSENSITIVE didn't work properly; selecting "no links" failed on the form; less fussy about what can appear on the form).
Mac version no longer includes source code, so is much shorter.
1.91beta1 (05-Jul-96)
Now DNS code doesn't look up a name twice, even if one is a failed request.
1.91beta (05-Jul-96)
Will now output in any of several languages.
Preformatted output introduced.
New File Type Report.
Can limit the number of rows in the time reports.
Number of requests for pages (as opposed to raw requests) now calculated throughout.
DNS lookup returns, with caching across runs.
Logfiles can include wildcards.
Wildcards can include multiple *'s.
Can process case insensitive logfiles.
OUTPUTALIAS commands introduced.
New commands to specify exactly what is included, and what linked, in the request report and referrer report.
FILEALIAS a a and FILEALIAS a b; FILEALIAS b c now work.
New ALLOW options to cancel INCLUDES.
REPSEPCHAR and DECPOINT introduced.
DIRSUFFIX introduced.
Debugging reports number of corrupt lines in other logs.
Hash sizes can now be allocated at run time.
stdin can now be used for any input file, but not for two.
Macintosh version now quits automatically if no warnings have been issued.
Form interface made more secure.
"Mozilla (compatible)" separated out in Browser Summary.
Major internal changes should improve speed.
Code for non-Unix platforms integrated into main code.
"Referrer" spelled correctly.
Licence introduced.
Update file introduced.
Readme updated to include non-Unix instructions.
(19-Apr-96)
First Mac version.
1.9beta6
Two bug fixes (number of bytes was incorrectly reported in some cases, and -v would overwrite the OUTFILE).
Documentation improved.
1.9beta5
More bug fixes...
1.9beta4
One important bug fix (I broke GRAPHICAL OFF in 1.9beta3).
New form cgi options: ch, gr and ou=3.
Code shortened.
(05-Mar-96)
First DOS version.
1.9beta3
Mainly bug fixes and improved documentation.
Browser and referer reports now include failed requests.
The WARNINGS option can now be specified on the form.
1.9beta2
Small bug fixes.
1.9beta (06-Feb-96)
Lots of changes. The most important new features are

What was new in version 1?

This page lists the new features which were in version 1 of analog.
1.2.6
Minor bug fix; will only affect those with corrupt logfiles.
1.2.5
Minor bug fix for weekly report.
1.2.4
Patch for Spyglass server logfile format.
1.2.3
A couple of bug fixes (wild subdomains sometimes caused crashes).
-v option now gives the version number.
1.2.2
Patch for proxy servers: http:// not translated to http:/
1.2 (11-Nov-95)
Can configure columns in reports to give percentage requests and number of bytes.
Wild subdomains (e.g., *.com).
Nameless subdomains.
Subdomains now listed in alphabetical order.
Proper support for numerical hostnames in HOSTIGNORE, HOSTONLY, SUBDOMAIN and alphabetical sorting.
New BASEURL command allowing statistics to be displayed on other servers.
Output always says how things are sorted.
"Last 7 days" now behaves sensibly with TO.
Filenames containing /../, /./ and // translated.
Header and footer options removed from form (for security reasons).
1.1 (02-Oct-95)
Form interface introduced.
ASCII output now possible as well as HTML.
Output file can now be specified in the configuration file.
FROM and TO commands more powerful.
DEBUG and BACKGROUND introduced.
One bug fix: alphabetical sorting doesn't now swap some hostnames.
List of primes included in distribution.
1.0 (12-Sep-95)
Only minor changes since 0.94beta.
0.94beta (30-Aug-95)
New configuration variables SEPCHAR and REPORTORDER.
New configuration commands WITHARGS and WITHOUTARGS.
New commandline options +-A and +-x. (Config.: ALL and GENERAL).
Logfile entries with - as the return code are now regarded as successes, not corrupt entries.
Fixed bugs in host report when aliases or numerical hosts are present.
Documentation rewritten.
0.93beta (27-Jul-95)
Approximate hostname counting now possible in fixed memory.
New configuration commands ISPAGE and ISNOTPAGE.
New commandline option -v.
New configuration command WEEKBEGINSON.
Proper error message when memory exceeded.
Program split into several files.
0.92beta (11-Jul-95)
New reports introduced: hostname, full daily, and weekly.
FROM and TO commands introduced.
Header and footer files introduced.
More helpful warning messages.
Ability to read configuration instructions from stdin.
Subdomain commands moved from domains file to configuration file.
Makefile provided.
0.91beta (04-Jul-95)
Configuration file introduced, enabling many new options.
Some bug fixes and speed improvements.
Ability to print "top n" reports (rather than "everything higher than n").
Request report can print only pages.
Ability to try and resolve numerical addresses.
Now less fussy about the format of the domains file.
Logo added.
Readme converted to HTML.
0.9beta
More speed improvements, and some bug fixes.
Introduced -u option.
Introduced subdomain analysis.
Included "not modified" replies as successes, not redirects.
First public release at 0.9beta3. (29-Jun-95)
0.89beta (21-Jun-95)
Commandline arguments.
Efficiency improvements.
Host count and "last 7 day" statistics.
0.8beta (14-Jun-95)
Initial program, just default options.

Index

[ A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z ]

This is the index for this Readme. Follow the numbers after each name to find references to that command or concept. Note that families of commands are indexed under the second part of the name: for example, HOSTEXCLUDE is under *EXCLUDE, not under HOST. This index includes all of analog's configuration commands: if a command you used in previous versions is not here, see the page on Upgrading from earlier versions.

Acknowledgements [1]
Addresses, numerical [1]
*ALIAS [1]
Aliases [1]
ALLBACK [1]
ALLGRAPH [1]
analhead.h [1]
analog.cfg [1][2][3][4]
Announcements [1]
ARGSEXCLUDE [1]
*ARGSFLOOR [1]
ARGSINCLUDE [1]
*ARGSSORTBY [1]
*BACK [1]
Bar charts [1]
BASEURL [1]
Basic commands [1]
Broken pipe [1][2]
BROW* commands - see under second part of name
BROWSER [1]
Browser Report [1][2][3]
Browser Summary [1][2]
BROWSUM* commands - see under second part of name
Bugs, reporting [1]
Bytes, how displayed [1]
Cache files [1]
CACHEOUTFILE [1]
CACHEFILE [1]
CASE [1]
CGI program [1]
Colours [1]
*COLS [1][2]
Command line arguments [1][2][3][4]
Compilation problems [1]
Compiling [1]
Compressed logfiles [1]
COMPSEP [1]
Computer-readable output style [1]
CONFIGFILE [1]
Configuration files [1][2][3][4][5]
Configuration file, default [1]
Configuration file, mandatory [1]
Contents [1]
Contributors [1]
Corrupt logfile lines, definition [1]
Countries [1]
Crashes [1]
Current logfile format [1]
Customising analog [1]
DAILY [1]
Daily Report [1][2]
Daily Summary [1][2]
Date reports [1]
Dates, restricting [1]
DAY* commands - see under second part of name
Debugging [1]
DECPOINT [1]
Default configuration file [1]
Default logfile format [1]
DEFAULTLOGFORMAT [1]
Definitions [1]
DIR* commands - see under second part of name
DIRECTORY [1]
Directory Report [1][2][3]
DIRSUFFIX [1]
DNS [1]
DNS lookups [1]
DNSBADHOURS [1]
DNSFILE [1]
DNSGOODHOURS [1]
DNSTran [1]
DOM* commands - see under second part of name
DOMAIN [1]
Domain Report [1][2][3][4]
Domains file [1]
DOMAINSFILE [1]
Error Report [1]
Errors [1]
Examples [1]
*EXCLUDE [1]
Exclusions [1]
FAIL* commands - see under second part of name
Failed Referrer Report [1][2][3]
Failed requests, definition [1]
Failed User Report [1][2]
FAILREF [1]
FAILREF* commands - see under second part of name
FAILURE [1]
Failure Report [1][2][3]
FAILUSER [1]
FAILUSER* commands - see under second part of name
FAQ [1]
Fatal errors [1]
FILE* commands - see under second part of name
File, definition [1]
File Size Report [1][2]
File Type Report [1][2][3]
FILETYPE [1]
Filters [1]
First day of week [1]
FIVE [1]
FIVE* commands - see under second part of name
Five-Minute Report [1][2]
*FLOOR [1][2]
FOOTERFILE [1]
Form interface [1]
Frequently Asked Questions [1]
FROM [1]
FULLBROW* commands - see under second part of name
FULLBROWSER [1]
FULLDAILY [1]
FULLDAY* commands - see under second part of name
FULLHOUR* commands - see under second part of name
FULLHOURLY [1]
GENERAL [1]
General Summary [1]
GOTOS [1]
*GRAPH [1]
Graphs [1]
HEADERFILE [1]
Hierarchical reports [1]
Hits [1]
Home page [1]
HOST [1]
HOST* commands - see under second part of name
Host, definition [1]
Host Report [1][2]
HOSTNAME [1]
Hostnames, numerical [1]
HOSTREP* commands - see under second part of name
HOSTURL [1]
HOUR* commands - see under second part of name
HOURLY [1]
Hourly Report [1][2]
Hourly Summary [1][2]
IMAGEDIR [1]
*INCLUDE [1]
Inclusions and exclusions [1]
Introduction [1]
IP addresses [1]
LANGFILE [1]
LANGUAGE [1]
Languages [1][2]
LASTSEVEN [1]
Licence [1][2]
LINKEXCLUDE [1]
LINKINCLUDE [1]
LOGFILE [1]
Logfile formats [1]
Logfile prefix [1]
Logfiles [1]
Logfiles, choosing [1]
Logfiles, compressed [1]
Logfiles, finding [1]
LOGFORMAT [1]
LOGO [1]
LOGTIMEOFFSET [1]
Low memory [1]
*LOWMEM [1][2]
Mailing list [1]
Makefile [1]
Mandatory configuration file [1]
Map [1]
MARKCHAR [1]
Memory, using less [1]
MINGRAPHWIDTH [1]
MONTH* commands - see under second part of name
MONTHLY [1]
Monthly Report [1][2]
Non-time reports [1]
Numerical addresses [1]
Numerical hostnames [1]
Operating System Report [1]
OUTFILE [1]
OUTPUT [1]
Output aliases [1]
OUTPUT COMPUTER [1][2]
Output, configuring [1]
Output style, computer readable [1]
Output styles [1]
*OUTPUTALIAS [1]
Page, definition [1]
PAGEEXCLUDE [1]
PAGEINCLUDE [1]
Pages, defining [1]
PAGEWIDTH [1]
Path through site [1]
PRINTVARS [1][2]
PROGRESSFREQ [1]
QUARTER [1]
QUARTER* commands - see under second part of name
Quarter-Hour Report [1][2]
RAWBYTES [1]
REDIR [1]
REDIR* commands - see under second part of name
Redirected Referrer Report [1][2][3]
Redirected requests, definition [1]
Redirection Report [1][2][3]
REDIRREF [1]
REDIRREF* commands - see under second part of name
REF* commands - see under second part of name
REFARGSEXCLUDE [1]
REFARGSINCLUDE [1]
REFDIR [1]
REFERRER [1]
Referrer, definition [1]
Referrer Report [1][2][3]
Referring Site Report [1][2][3]
REFLINKEXCLUDE [1]
REFLINKINCLUDE [1]
REFREP* commands - see under second part of name
REFSITE [1]
REFSITE* commands - see under second part of name
Report.html [1][2][3]
Reporting bugs [1]
REPORTORDER [1]
Reports, list of [1]
REPSEPCHAR [1]
REQ* commands - see under second part of name
REQUEST [1]
Request Report [1][2][3]
Requests, definition [1]
Requests for pages, defining [1]
Requests for pages, definition [1]
Requests, types of [1]
*ROWS [1]
Samples [1]
SEPCHAR [1]
SIZE [1]
SIZE* commands - see under second part of name
*SORTBY [1][2]
Source code [1]
Starting to use analog [1]
Starting to use analog on a Mac [1]
Starting to use analog on OS/2 [1]
Starting to use analog on Windows 95 & NT [1]
Starting to use analog on other platforms [1]
STATUS [1]
Status Code Report [1][2]
STATUS* commands - see under second part of name
SUBBROW [1]
SUBDIR [1]
Subdirectories [1]
SUBDOMAIN [1]
Subdomains [1]
SUB*FLOOR [1]
SUB*SORTBY [1]
SUBTYPE [1]
Successful requests, definition [1]
Syntax [1]
Time reports [1]
TIMECOLS [1]
TIMEOFFSET [1]
Times, restricting [1]
Title line [1]
TO [1]
Total requests, definition [1]
Translators [1]
Tree reports [1]
TYPE* commands - see under second part of name
UNCOMPRESS [1]
Unresolved numerical addresses [1]
Unwanted logfile entries, definition [1]
Upgrading from earlier versions [1]
USER [1]
USER* commands - see under second part of name
User Report [1][2]
USERREP* commands - see under second part of name
VHOST [1]
VHOST* commands - see under second part of name
VHOSTREP* commands - see under second part of name
Virtual Host Report [1][2]
Visitors [1]
Visits [1]
WARNINGS [1]
Warnings [1][2]
WEEK* commands - see under second part of name
WEEKBEGINSON [1]
WEEKLY [1]
Weekly Report [1][2]
What was new? [1][2]
What's new? [1][2]
Year 2000 compatibility [1]
