Analog is a program which analyses logfiles from WWW servers. It works on
almost any operating system. It is designed to be fast and to produce
attractive statistics. It's free software.
Beginners should read the licence followed by the
section on Starting to use analog.
This Readme describes analog 4.90beta4. This is a beta test for version
5. For the latest version of analog, see the
analog home page.
For examples of the output see
Analog is free software, but its usage, distribution and modification are
covered by a licence. You must agree to the terms of
the licence before using the program. In particular, it comes with no
warranty. As a beta test, this version is expected to contain bugs.
This is a version of the Readme in one page. If you're reading it online,
you might prefer the version on several smaller
pages. There is an index at the end of this
document.
To run analog, all you need is to be able to read the logfiles
which are produced by your web server. If you don't know what these logfiles
are and where to find them, contact your internet service provider (ISP) or
system administrator. Analog doesn't write the logfiles: it only reads them.
If you log in to your ISP's machine from your home machine, you have two
options. If you have the right permissions, you can run analog on your ISP's
machine. Otherwise, you can download (e.g., by FTP) the logfiles from their
machine to yours, and then run analog on your machine.
Once you've downloaded the right version of analog for your computer from the
analog home page
(or a mirror site), you need to know how to set it up and run it. This is
very easy, but the instructions are slightly different depending on which
platform you're using.
If you can't manage to set up analog after reading the instructions, send
a message to the analog-help mailing list.
Here is the really short summary:
- Edit analog.cfg
- Run analog
- Read Report.html
When you download the Mac version of analog, it should unpack itself. (If it
doesn't, you might have to run StuffIt Expander on it.) You should then find
in the analog directory a configuration file called analog.cfg
and the analog application itself, as well as the Readme, the
Licence (which you must read and agree to before
using analog) and a couple of other files. When you double-click on the analog
icon, it will run in its own window, and produce an output file called
Report.html. (For help in interpreting the output, see
What the results mean.)
The window will then close if there weren't any warning messages, or stay open
for you to read them if there were.
You can configure analog by putting commands in the configuration file,
analog.cfg. One command you will need straight away is
LOGFILE logfilename # to set where your logfile lives
The logfile must be stored locally -- analog won't use FTP or HTTP to fetch
it from the internet. There's a sample logfile supplied with the program.
There's a list of basic commands later in the
Readme. Also there are a few to get you started in the configuration file
already, but there are lots of others available. You can read about all the
commands in the section on customising analog.
Another way to start analog is to drag a logfile onto the analog icon, in which
case analog will try to analyse it, or drag a configuration file onto the
icon, in which case analog will use the commands in that configuration file.
(Analog detects whether it's a configuration file or a logfile by whether
it starts with a # or not.) This enables you to create different
reports without having two copies of the application.
One note: on other platforms, there is another way to give options, via
command line arguments. You'll see these mentioned in this Readme from time
to time, but the Mac doesn't have a command line, so ignore these.
If you want to compile your own version of analog (it's written in C), or
just to read the source code, it's available from the
analog home page.
(It's the same source code for all versions.)
This describes how to set up analog under Windows 95/NT or later.
Windows 3.1 users will have to read the section on
other platforms instead.
Here is the really short summary:
- Edit analog.cfg
- Run analog
- Read Report.html
When you've downloaded analog, and either you or your browser has unzipped
it, you will find in the analog folder a configuration file called
analog.cfg and the analog executable itself, as well as the Readme,
the Licence (which you must read and agree to before
using analog) and a couple of other files.
There is no setup.exe: analog is already ready to run without one.
(Some unzip programs are broken, and do not create folders when they should. If
you don't have a folder called lang inside the analog folder,
create one and put all the files called *.lng and *.tab
into it.)
There are two ways of running analog. You can either run it from Windows
(by single-clicking or double-clicking on its icon, depending on your setup),
or you can run it from the DOS command prompt (under Start-Programs). If you
run it from Windows, it will create a DOS window to run in. When it's
finished, it will produce an output file called Report.html. The
first time you run it, this may all happen almost instantly. For help in
interpreting the output, see What the results
mean.
You can configure analog by putting commands in the configuration file,
analog.cfg. You can edit this file using any plain text editor, for
example Notepad. One command you will need straight away is
LOGFILE logfilename # to set where your logfile lives
The logfile must be stored locally -- analog won't use FTP or HTTP to fetch
it from the internet. There's a sample logfile supplied with the program.
There's a list of basic commands later in the
Readme. Also there are a few to get you started in the configuration file
already, but there are lots of others available. You can read about all the
commands in the section on customising analog.
In some ways, it's easier to run analog from the DOS command prompt, because
you get to see any error or warning messages more easily. Also, if you run
analog from the command prompt, there is another way to give options, via
command line arguments, given on the command line after the program name.
These are just shortcuts for configuration file commands. You can use the
command line arguments if you run analog from a batch file too.
If you want to compile your own version of analog (it's written in C), or
just to read the source code, it's available from the
analog home page.
(It's the same source code for all versions.)
Here is the really short summary:
- Edit anlghead.h and compile, if necessary
- Edit analog.cfg
- Run analog
Many platforms have a precompiled version of analog available. Before
compiling analog, have a look at the
analog home page to see if yours does.
If you're not using one of the platforms for which a precompiled version
is available, you'll have to compile your own
version from the source. But don't worry -- it's written in
standard C throughout, so it will compile out of the box on most platforms.
(The source code is the same for all platforms.)
First, change to the src/ directory.
Then look at the file anlghead.h, and see if there's anything you
want to edit.
When you have done that, you need to compile the program. How to do that
depends on which operating system you're using.
Compiling under Unix. First edit anlghead.h as
described above. Then just type
make
within the src/ directory
to compile the program. On most systems, that will be sufficient, and the
compiled program should appear in the parent directory. If it
fails to compile, have a look in the Makefile to see if there's anything that
you need to change to suit your configuration, and try again. It says in that
file what to do. In particular, Solaris 2 (SunOS 5) users need to
change the LIBS= line.
(Experts can pass some arguments in on the make command line
instead of by editing anlghead.h: e.g.
make DEFS='-DLANGDIR=\"/usr/etc/apache/analog/lang/\"'
This is useful if you have a script to compile analog.)
If you haven't got gcc, you will need to change the compiler - try acc or cc
instead. If it still doesn't compile, try DEFS=-DNODNS to ignore the
DNS lookup code.
There is a known problem with HP-UX 10 and some versions of gcc. If it
complains about an error in the <sys/stat.h> library, you
need to upgrade to gcc version 2.7.2.3 or later, or use HP's cc compiler.
HP's compiler is not an ANSI C compiler by default, so you need to specify
-Ae in the CFLAGS to tell the compiler to use ANSI C.
SunOS 4's cc and gcc don't have the necessary header files for ANSI
C. If you have the ANSI C compiler acc, use that. Otherwise use the
DEFS given in the Makefile.
SunOS 5 users need to change the LIBS= line in the
Makefile. Also, this OS sometimes seems to have a broken strcmp()
function. If you get an "illegal instruction" error when running analog,
compile it with the -DNEED_STRCMP in the DEFS= line.
Compiling under OpenVMS. First edit anlghead.h
as described above. Then type
MMS
within the src/ directory to compile analog.
Compiling under Acorn RiscOS. The Makefile
is called
Make.Risc, and you will have to rename it to Makefile
before running make. Also you have to make directories called C,
H and O, and move the source files into the appropriate
directories: e.g., alias.c must be renamed C.alias. And
you will find that there are some filenames in the header file
anlghead.h that you want to change to fit into the RiscOS directory
structure.
Compiling under OS/2.
To compile analog for OS/2, you will need the
EMX package. You should
edit the Makefile to have OS=OS2 and LIBS=-lsocket.
Then after editing anlghead.h and running Make, you need to run the command
EMXBIND -b ANALOG
to generate the analog.exe executable.
After you've compiled the program, leave the src/ directory and
then just type
analog
to run the program. (Or ./analog if for some reason .
isn't in your $PATH.)
You can configure analog by putting commands in the configuration file,
which is called analog.cfg by default. Two commands you will need
straight away are
LOGFILE logfilename # to set where your logfile lives
OUTFILE outputfile.html # to send the output to a file instead of the screen
The logfile must be stored locally -- analog won't use FTP or HTTP to fetch
it from the internet. There's a sample logfile supplied with the program.
There's a list of basic commands later in the
Readme. Also there are a few to get you started in the configuration file
already, but there are lots of others available. You can read about all the
commands in the section on customising analog.
For help in interpreting the output, see What the
results mean.
There is one other way to give options to analog, via command line arguments,
given on the command line after the program name. These are just shortcuts for
configuration file commands.
This is the bulk of the Readme. It tells you all the commands
you can give to analog, and what they all do. First there's a list of
basic commands,
which is as much as beginners need to read, until they want to do something
which isn't listed there, or are curious to find out what they could do.
The following section is a technical (i.e., dull but important) one on the
syntax of configuration commands.
Then there's documentation on all the configuration commands in the following
categories. Analog has over 200 configuration commands and over 40 command
line options, so sometimes these sections turn into lists of commands.
But here's where you find out everything you can do with analog.
Later there's an index of all the commands and topics,
and also a quick reference containing the syntax
of all the commands and examples.
Here is a list of basic configuration commands to get you started with
analog. These commands should be added to your configuration file,
analog.cfg, as explained in the section on
Starting to use analog.
We'll see all the possible configuration commands in later sections.
Or you can read a summary of the commands which control each report in the
section on Analog's reports.
Analog reads logfiles produced by your web server, and produces an output file
based on the data in them. So you need to know how to specify which logfile to
read, and which file to send the output to. The relevant commands look like
LOGFILE my_logfile
OUTFILE output.html
where, of course, you should substitute the names of the files you want to use.
The logfile must be stored locally -- analog won't use FTP or HTTP to fetch
it from the internet, so you may have to fetch it yourself first.
You can read several logfiles by giving several logfile commands, or by giving
a comma-separated list, or by using wild cards in the logfile name. So, for
example, if you use the commands
LOGFILE new1.log,old*.log
LOGFILE new2.log
analog will analyse the logfiles new1.log, new2.log,
and all the old logfiles. Analog will recognise logfiles in several different
formats. You can read more about this in the section on
Choosing a logfile.
There are a couple of other commands you need to know right at the beginning,
not because they're particularly important in themselves, but because the
output will look silly if you don't know them. First, you need to know how to
put your own organisation's name and URL at the top of the report. For this,
you need two commands such as
HOSTNAME "Spam Widgets Inc."
HOSTURL http://www.spam-widgets.com/
If you have broken images in the output instead of graphs, you need to say in
which directory on your server the images are stored. You do this by a
command like
IMAGEDIR /analog/images/
(This is just put in the <img> tags in the output page, so it's the URL
of a directory, not the name of the directory on your disk. The images are
distributed with the program - you will have to move them to whichever
directory you choose.)
Next you will want to know how to turn individual reports on and off. Analog
can produce up to 44 different reports if your web server has been configured
to record the necessary data in your logfiles,
but here are the most important ones. Try them and
see what happens. You can turn each report on with an ON command,
or off with an OFF command. You can also use the commands ALL
ON and ALL OFF to turn all reports on or off.
MONTHLY ON # one line for each month
WEEKLY ON # one line for each week
DAILYREP ON # one line for each day
DAILYSUM ON # one line for each day of the week
HOURLYREP ON # one line for each hour of the day
GENERAL ON # the General Summary at the top
REQUEST ON # which files were requested
FAILURE ON # which files were not found
DIRECTORY ON # Directory Report
HOST ON # which computers requested files
ORGANISATION ON # which organisations they were from
DOMAIN ON # which countries they were in
REFERRER ON # where people followed links from
FAILREF ON # where people followed broken links from
SEARCHQUERY ON # the phrases and words they used...
SEARCHWORD ON # ...to find you from search engines
BROWSERSUM ON # which browser types people were using
OSREP ON # and which operating systems
FILETYPE ON # types of file requested
SIZE ON # sizes of files requested
STATUS ON # number of each type of success and failure
The full list of reports is in the section on
Configuring the output.
Some reports, for example the Referrer, Browser and Operating System reports,
will only appear if your web server has been configured to record the
necessary data in its logfiles.
You can configure lots of other things about each
report, such as how many rows are listed, which columns are included, and
how the reports are sorted. For example, the command
REQINCLUDE pages
tells analog only to list pages, rather than all files, in the Request Report,
and
REQFLOOR 10r
tells analog to include in the Request Report all files with at least 10
requests.
You can read a summary of all the reports and the commands which control them
in the section on Analog's reports.
You can have the output in several different languages, by using a
LANGUAGE command. For example, the command
LANGUAGE FRENCH
will give you the output in French. The available languages at the moment are
ARMENIAN, BOSNIAN, BULGARIAN,
CATALAN, SIMP-CHINESE (GB2312 encoding),
TRAD-CHINESE (Big5 encoding), CROATIAN,
CZECH, DANISH, DUTCH,
ENGLISH, US-ENGLISH, FINNISH,
FRENCH, GERMAN, GREEK, HUNGARIAN,
ICELANDIC, ITALIAN,
JAPANESE, KOREAN,
LATVIAN, LITHUANIAN,
NORWEGIAN (Bokmål),
NYNORSK, POLISH, PORTUGUESE,
BR-PORTUGUESE, ROMANIAN,
RUSSIAN, SERBIAN, SLOVAK, SLOVENE,
SPANISH, SWEDISH, TURKISH and
UKRAINIAN.
All these languages were available in previous versions of analog,
but most have not yet been translated for version 5, so only a few are
available at the moment (see the list in the
What's new? section).
As new languages are translated, they will be added to the
analog home page.
See the section on Configuring the
output for how to download, or even translate, new languages.
Two other common things you might want to do are to alias files or
hosts (for example, to tell analog that two different filenames are really
the same file), or to include or exclude certain files, hosts
or dates (to ignore accesses from your site, for example, or to do an analysis
only of a certain subdirectory or a certain time period). For these, see the
later sections on Aliases and
Inclusions and exclusions.
As I said, these are only a few of the commands available. To find out about
all the commands, you'll have to read the remaining sections of the Readme,
starting with a short section on the syntax of
configuration commands.
This section describes how analog finds configuration commands, and what the
syntax of a configuration file should be. The syntax of individual commands is
given in the Quick reference section
later.
When analog starts up, it first reads options from configuration files
and the command line (assuming that you are running analog from an operating
system with a command line). Defaults for many of these options will have
already been set in the files anlghead.h and anlghea2.h
at the time the program
was compiled. So if you compile your own version of analog, rather than
downloading a pre-compiled executable, you can also set some options in
those files before compiling. Those options are all documented there.
The first file which analog reads is the default
configuration file,
normally called analog.cfg. You can stop this file being read by
specifying the option -G on the command line. Then the command line
arguments are read, in the order in which they appear. Finally, the
mandatory configuration file is read, if you specified one when you
compiled the program. This is a configuration file which cannot be overridden
by the user: if it is not found, analog exits immediately. This allows a
system administrator to prevent users analysing certain files or producing
certain reports, for example. However, note that the
only certain way to prevent users analysing things is to deny them access to
the logfile. Otherwise there is nothing to stop them analysing the logfile
using another copy of analog or another program.
You can include another configuration file by a
command like
CONFIGFILE other.cfg
The commands in the other configuration file are read immediately, in order.
The program then continues reading the first configuration file where it left
off. Note that reading in several configuration files does not produce
several reports, but a single report based on all the options.
You can also include another configuration file from the command line by
using a command like +gother.cfg. (Note that there is no space
between +g and the filename; this is true of all command line
arguments.) But note that reading an alternative configuration file
does not stop the default configuration file (usually
analog.cfg) being read as well. To do that you have to specify
-G as well as the +g command. This is because if you want
several different configurations, it's most convenient to put all the common
options in analog.cfg, and options specific to each configuration
in a separate file. Then the +g command line option will read both
those files.
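To make the reading order concrete, here is a small Python sketch of it.
It is only an illustration of the rules just described, not analog's own
code (analog is written in C). The function name reading_order and its
parameters are invented for this example, and the mandatory configuration
file, which is really fixed at compile time, is passed in as an ordinary
argument.

```python
def reading_order(args, default="analog.cfg", mandatory=None):
    """Return the order in which configuration sources would be read.

    args      -- command line arguments, e.g. ["-G", "+gother.cfg"]
    default   -- name of the default configuration file
    mandatory -- name of the mandatory configuration file, or None
    """
    order = []
    # The default configuration file comes first, unless -G was given.
    if "-G" not in args:
        order.append(("file", default))
    # Then the command line arguments, in the order they appear.
    for arg in args:
        if arg == "-G":
            continue
        if arg.startswith("+g"):
            # +g is followed directly by the filename, with no space.
            order.append(("file", arg[2:]))
        else:
            order.append(("arg", arg))
    # The mandatory configuration file, if there is one, comes last.
    if mandatory is not None:
        order.append(("file", mandatory))
    return order
```

With this sketch, reading_order(["+gother.cfg"]) lists analog.cfg and
then other.cfg, whereas reading_order(["-G", "+gother.cfg"]) lists
other.cfg alone.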
If the name of a configuration file doesn't include a directory, it will be
looked for wherever analog expects to find its configuration files. (This
location is a compile-time option.) For example, in the Windows version it
would be in the same folder as the analog executable. To include a
configuration file from the current working directory, you therefore need
(on Unix) +g./other.cfg instead of just +gother.cfg.
This applies to the default and mandatory configuration files as well.
In the Mac version, you can start up a program with a particular configuration
file instead of the default one by dragging the configuration file onto the
analog icon. The file must start with a #.
You can also specify any configuration command on the
command line even if it
doesn't have a command line abbreviation, by use of the +C command.
(NB The C must be upper case.)
For example, +C"UNCOMPRESS *.gz gzcat" will include that command.
Here are the syntax rules for configuration
commands. A configuration file
contains several commands on separate lines; any text after a hash
(#) on a line is ignored as a comment. Each command consists of
the command name followed by one or two arguments. An argument to a command may
optionally be placed in single or double quotes or parentheses, and it must be
if the argument contains a hash or a space. Configuration commands can be
continued across new lines by using a backslash as the last character on the
line (but can't then have comments until the end of all the lines; also the
total length can't be more than 254 characters). So, for
example, here are some valid configuration commands.
DAILYSUM OFF # We don't want a Daily Summary
DAILYREP "ON" # We want a full Daily Report instead
HOSTNAME (Spam Widgets Inc.) # Spaces, so quotes or brackets needed
LOGFILE logfile1.log,\
logfile2.log # This line and the previous one are one command
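As a rough illustration of these syntax rules, here is a minimal Python
sketch of a reader for such lines. It is not analog's own parser, and the
names join_continuations and split_command are made up for this example.
It assumes that a continued line is joined to the next one with nothing
inserted between them, and it does not enforce the 254-character limit.

```python
# Opening delimiter -> closing delimiter for quoted arguments.
CLOSERS = {'"': '"', "'": "'", "(": ")"}


def join_continuations(lines):
    """Merge each line ending in a backslash with the line after it."""
    merged, pending = [], ""
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith("\\"):
            pending += line[:-1]      # drop the backslash, keep going
        else:
            merged.append(pending + line)
            pending = ""
    if pending:                       # file ended on a continued line
        merged.append(pending)
    return merged


def split_command(line):
    """Split one logical line into words, ignoring any # comment."""
    tokens, i, n = [], 0, len(line)
    while i < n:
        ch = line[i]
        if ch.isspace():
            i += 1
        elif ch == "#":
            break                     # the rest of the line is a comment
        elif ch in CLOSERS:
            end = line.find(CLOSERS[ch], i + 1)
            if end == -1:
                raise ValueError("unterminated quoted argument")
            tokens.append(line[i + 1:end])
            i = end + 1
        else:
            j = i
            while j < n and not line[j].isspace() and line[j] != "#":
                j += 1
            tokens.append(line[i:j])
            i = j
    return tokens
```

For example, split_command('HOSTNAME (Spam Widgets Inc.) # a comment')
gives the two words HOSTNAME and Spam Widgets Inc., because the
parentheses protect the spaces.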
Generally later commands override earlier ones if you can have only one of
that thing (e.g., for the OUTFILE), or supplement them
if you can have several (e.g., for the LOGFILE, because you can
read several logfiles). Apart from that, the order of commands doesn't matter,
except that LOGFORMAT
and LOGTIMEOFFSET
commands must come earlier in the same configuration file than the
LOGFILE to which they refer.
If all the options seem a bit confusing, just run
analog -settings [other options]
from the command line,
or include SETTINGS ON in the configuration commands.
Then instead of running normally, analog will just tell you what the values of
all the variables will be, based on
the defaults in anlghead.h and anlghea2.h, the
configuration commands, and the
command line options. If you're on Unix or Windows, remember that you can send
the output to a file with
analog -settings > file
Also, analog -version will just give the version number.
The basic command for selecting a logfile is
LOGFILE logfilename
or you can just put the logfile name on the command line without any arguments,
e.g., analog logfilename. A - sign or the word
stdin is interpreted as standard input: this is useful on Unix
systems for constructing pipes. All logfiles must be within your computer's
file system (on disk, or at least mounted under Unix, or on a mapped drive
under NT) -- analog won't use FTP or HTTP to fetch them from the internet.
In the Mac version, you can also analyse a particular single logfile by
dragging it onto the analog icon.
You can have several LOGFILE commands. You can include wildcards in
the logfile name (but not necessarily in the directory name: this is
system-dependent), and you can use a list of logfiles separated by commas
(without spaces). So the following commands would tell analog to read
logfile1, c:\logs\logfile2, and all files ending in
.log:
LOGFILE logfile1,*.log
LOGFILE c:\logs\logfile2
Or if you were on a Mac, you might use something like
LOGFILE "Hard Drive:Internet Applications:Analog:Logs:*"
You can also use the special command
LOGFILE none
to erase the list of logfiles specified so far.
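If it helps to see the comma-separated list and the wildcards spelled
out, here is a short Python sketch of how one LOGFILE argument could be
turned into a list of files. It uses Python's glob module rather than
whatever analog does internally, so the exact wildcard rules may differ.
The function name expand_logfile_argument is invented for this example,
and keeping a name that matches nothing (instead of dropping it) is a
choice made here, not something taken from analog.

```python
import glob
import os


def expand_logfile_argument(argument, directory="."):
    """Expand a LOGFILE argument such as "logfile1,*.log" into filenames.

    argument  -- comma-separated names, each of which may contain wildcards
    directory -- where names without a directory are looked for
    """
    names = []
    for pattern in argument.split(","):
        full = os.path.join(directory, pattern)
        matches = sorted(glob.glob(full))
        # Keep the literal name if nothing matched, so that a missing
        # logfile can still be reported by whatever opens it later.
        names.extend(matches if matches else [full])
    return names
```

Several LOGFILE commands would then just mean calling this once per
command and joining the resulting lists together.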
If the name of a logfile doesn't include a directory, it will be
looked for wherever analog expects to find logfiles. (This
location is built in when the program is compiled.) For example, on
Windows it would be in the same folder as the analog executable. So to
include a logfile from the current working directory, for example on Unix, you
need to use ./logfile.log instead of just logfile.log.
The LOGFILE commands are cumulative, except that any logfiles
on the command line or in configuration files specified on the command line
override any in the default configuration
file or configuration files loaded from there, and are themselves
overridden by any in the mandatory
configuration file or configuration files loaded from there.
Usually you don't need to worry about this, and it will do what you expect!
(Actually I should have said "logfiles or cache files" -- but we'll
get on to that later).
Analog knows about several different types of logfile. By default it will
attempt to see if your logfile is of one of the types it knows about, based
on the first line. The types it can usually diagnose are the common log
format, the NCSA combined format, referrer log and browser log, the W3
extended log format, the Microsoft IIS format, the Netscape format, the
WebSTAR format and the WebSite format. Examples of all
these formats are given at the end of this section. If you have
debugging on, analog will report what type of
logfile it thinks yours is.
If your logfile is not in one of the standard formats, you will probably still
be OK, because it is possible to tell analog about other formats using a
LOGFORMAT command. This is explained in the
next section. But most users don't
ever need to know about this because they have logfiles in a standard
format. So the best thing to do is just to try analysing your logfile and see
if analog will understand it. If it does, you don't need to worry about
LOGFORMATs.
If analog can't understand your logfile, it will
warn you that it can't detect
the format, or possibly that it found a lot of corrupt lines. There are
basically four reasons why this might happen:
- Some log formats are not very well designed and analog can't analyse
them reliably. In this case it will give up, usually with a helpful
message, rather than risk doing a bad job. For example, you might get
"Logfile with ambiguous dates" or "Time
without date." In this case you should read the
notes on all the built-in formats below where
some common problems with those formats are described.
- Since analog tries to deduce the format based on the first line of the
logfile, it could just be that the first line is corrupt. In this case,
you could tell analog the format, or you could just fix the first line.
- For the same reason, if the format changes midway through the log,
analog will count the remaining lines as corrupt. In this case, you will
find that your report contains a partial analysis but with a large
number of corrupt lines too. You will need to give analog two
LOGFORMAT commands to tell it about the two different formats.
- Finally, some logfiles really aren't in one of the standard formats.
In this case you will need to read the next
section and learn how to tell analog about your format.
If you can't see what's wrong with your logfile, you can specify
DEBUG ON, and analog will report
where each line was corrupt.
There's also a second argument to the
LOGFILE command, which specifies a
prefix to add to all the filenames in that logfile. This is useful if
you've got several different servers or virtual hosts, when the same
filename may occur on each of the servers. The argument can contain a
%v, and the name of the virtual host from each line of the logfile
will then be inserted at that point. For example,
LOGFILE log1,log2 http://www.%v.mydomain.com
would translate a filename /file.html with virtual host
host1 in log1 or log2 to
http://www.host1.mydomain.com/file.html. If you are using the second
argument to the LOGFILE command, you will probably want to use
the SUBDIR command as well.
If %v is included in the argument and the logfile line
doesn't have a virtual host, that line will be marked as corrupt. If
VHOSTLOWMEM 3 is specified, the
%v's will not be translated and will just appear as %v
in the output.
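The effect of the second argument can be sketched in a few lines of
Python. This is only an illustration of the translation described above:
the function name apply_prefix is made up, None stands in for "mark the
line as corrupt", and the VHOSTLOWMEM 3 case is not modelled.

```python
def apply_prefix(prefix, filename, vhost=None):
    """Add the LOGFILE prefix to a filename, filling in %v if present.

    Returns None when the prefix needs a virtual host but the logfile
    line did not supply one (such a line would be counted as corrupt).
    """
    if "%v" in prefix:
        if not vhost:
            return None
        prefix = prefix.replace("%v", vhost)
    return prefix + filename
```

So apply_prefix("http://www.%v.mydomain.com", "/file.html", "host1")
returns http://www.host1.mydomain.com/file.html, as in the example above.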
It is often convenient to store logfiles compressed
to save disk space.
Analog on the Mac can read logfiles compressed using gzip. And
analog on Unix and Win32 can read compressed logfiles
provided that you use an UNCOMPRESS command to say how to
uncompress them. You need to supply the types of file that you want to
uncompress in a comma-separated list, together with the name of a command that
will uncompress the files to standard output (rather than to a file). For
example, on Unix you might use
UNCOMPRESS *.gz,*.Z /usr/bin/gzcat
whereas on Windows NT, you might use
UNCOMPRESS *.gz ("c:\Program Files\gzip\gzip" -cd)
This would be a suitable command to include in the
default configuration file.
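To show how the comma-separated list of file types is meant to be read,
here is a small Python sketch which picks the uncompress command for a
given logfile name. It only does the matching; actually running the
command and reading its standard output is left out. The function name
find_uncompress_command is invented, and matching with case-sensitive
shell-style wildcards is an assumption of this example, not a statement
about analog's behaviour.

```python
from fnmatch import fnmatchcase


def find_uncompress_command(filename, rules):
    """Return the command for the first rule whose pattern matches.

    rules -- list of (patterns, command) pairs, one per UNCOMPRESS
             command, where patterns is a comma-separated string
             such as "*.gz,*.Z"
    """
    for patterns, command in rules:
        for pattern in patterns.split(","):
            if fnmatchcase(filename, pattern):
                return command
    return None   # not compressed, as far as the rules can tell
```

With the Unix example above, the rules would be
[("*.gz,*.Z", "/usr/bin/gzcat")], and a logfile called access.log.gz
would be passed through /usr/bin/gzcat.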
If analog determines when it starts to uncompress a logfile that that file
isn't wanted for the analysis, two undesirable things can happen. Either the
program might pause until the logfile is fully uncompressed, or there might be
a "broken pipe" error reported. This is system dependent, and out
of analog's control.
Here is a summary of the various logfile formats which analog knows about.
To illustrate them, I have used the same (fictional) request as it might be
recorded in the different formats.
The common logfile format is written by most
servers. Its lines look like
jay.bird.com - fred [25/Dec/1998:17:45:35 +0000]
"GET /~sret1/ HTTP/1.0" 200 1243
(except all on one line).
Some versions of Microsoft software have a buggy version of this with an extra
quote mark before the HTTP like this:
jay.bird.com - fred [25/Dec/1998:17:45:35 +0000]
"GET /~sret1/ "HTTP/1.0" 200 1243
Analog will understand these, but (as with any two formats) it will reject
lines if the format changes half way through.
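For readers who want to see the common log format taken apart, here is
a minimal Python sketch using a regular expression. It is not how analog
reads logfiles; it is just one way to split the example line into its
parts. The field names (host, ident, user and so on) are labels chosen
for this example, and the optional quote mark before the protocol is
there to accept the buggy Microsoft variant shown above.

```python
import re

# One common log format line. The "? before the protocol tolerates
# the stray quote mark written by some Microsoft software.
COMMON_LOG = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) "?(?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)$'
)


def parse_common_log_line(line):
    """Return a dict of fields, or None if the line doesn't match."""
    match = COMMON_LOG.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # Some servers write "-" instead of a byte count; treat that as 0.
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    return fields
```

Both the normal example line and the one with the extra quote mark come
out with the path /~sret1/, status 200 and 1243 bytes.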
The NCSA referrer log looks like
[25/Dec/1998:17:45:35] http://www.site.com/ -> /~sret1/
and the browser (or agent) log looks like
[25/Dec/1998:17:45:35] Mozilla/2.0 (X11; I; HP-UX A.09.05)
In the referrer log, the date can be omitted.
The NCSA combined log is the same as the common log,
except that
it has the referrer and browser on the end in quotes, like this:
jay.bird.com - fred [25/Dec/1998:17:45:35 +0000] "GET /~sret1/ HTTP/1.0"
200 1243 "http://www.site.com/" "Mozilla/2.0 (X11; I; HP-UX A.09.05)"
(except all on one line). If you are using the Apache server, you can generate
this with the mod_log_config module, using the Apache command
LogFormat "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-Agent}i\""
It is usually better to use the combined log than separate logs, because it
stores more information in less space.
The Microsoft IIS logfile looks like
192.64.25.41, -, 25/12/98, 17:45:35, W3SVC1, HOST1, 192.16.225.10,
2178, 303, 1243, 200, 0, GET, /~sret1/, -,
(except all on one line).
However, the format is extremely badly designed, in that the date follows local
conventions: in other words, in North America the above example would have the
date 12/25/98 instead. Analog will diagnose which form the logfile
is in if possible: but if both the day and the month are at most 12, there
is no way to tell which format it is. In this case, it will advise you to use
the command
LOGFORMAT MICROSOFT-NA for North American date
format, or LOGFORMAT MICROSOFT-INT for international date format.
In some countries, the date will not be in either of these formats, in which
case you need to write your own LOGFORMAT command.
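The ambiguity is easy to see in a few lines of Python. This sketch
guesses the format from date fields alone and gives up when it can't
tell. It is an illustration of the problem, not analog's actual
detection code, and the function names are invented for the example.

```python
def guess_microsoft_format(date_field):
    """Guess the date convention from one date such as "25/12/98".

    Returns "MICROSOFT-INT" (day first), "MICROSOFT-NA" (month first),
    or None when the date could be read either way.
    """
    first, second, _year = (int(part) for part in date_field.split("/"))
    if first > 12 and second <= 12:
        return "MICROSOFT-INT"    # first number can only be a day
    if second > 12 and first <= 12:
        return "MICROSOFT-NA"     # second number can only be a day
    return None                   # e.g. 05/06/98: no way to tell


def guess_from_dates(date_fields):
    """Return the first definite guess from a sequence of dates."""
    for date_field in date_fields:
        guess = guess_microsoft_format(date_field)
        if guess is not None:
            return guess
    return None
```

A logfile covering only the first twelve days of a month never produces
a definite answer, which is exactly the case where you have to give the
LOGFORMAT command yourself.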
There are also various third-party extensions to the Microsoft format to
include, for example, the browser and referrer. But they all do it in
different ways, so analog can't automatically diagnose them, and again, you
need to write a LOGFORMAT command for them.
The WebSite format looks like
12/25/98 17:45:35 jay.bird.com host1 Server fred GET /~sret1/
http://www.site.com/ Mozilla/2.0 (X11; I; HP-UX A.09.05) 200 1243 2178
(except all on one line, and with the fields separated by tabs). It suffers
from the same problem with ambiguous dates as the IIS logfile (above), so
again you might have to use LOGFORMAT WEBSITE-NA or LOGFORMAT
WEBSITE-INT, or even have to write your own LOGFORMAT command.
The W3 extended log, the Netscape log, and the WebSTAR
log can be recognised
because they must include at or near the top a line telling analog
what format to expect on subsequent lines. (They may also contain later lines
changing the format). If the header line is missing, analog won't be able to
interpret the subsequent lines and so won't be able to analyse the logfile. In
this case, you will have to either replace the missing header or use a
LOGFORMAT command to tell analog your format.
If analog finds that the header line is corrupt, it
will usually tell you what was wrong with it. The most common problem is a
header which includes the time without the date, or vice versa, which is not
allowed. In particular, having the date just at the top of the logfile is not
sufficient; you must have it on each line. By default, Microsoft servers
produce extended
logs with the date only at the top. But if the date changes during the
logfile, the server doesn't then write a new date line. For this reason analog
can't analyse such logfiles safely. There are some programs on the
helper applications page to put the date on each
line. If you already have such a logfile you might want to use one of these
programs, but they have to assume that the date doesn't change during the
logfile, so it would be safer to tell your server to log the date on every
line in future.
The extended log is described at
http://www.w3.org/TR/WD-logfile.html.
Its header line looks like
#Fields: date time cs-uri
In the rest of the logfile, the fields can be separated by spaces or tabs.
Remember the logfile must contain the date as well as the time on every
line -- see above.
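To make the role of the header line concrete, here is a minimal Python sketch which uses the most recent #Fields: line to label the entries that follow it. It is an illustration under simplifying assumptions (fields are split on white space, so quoted fields containing spaces are not handled), and the function name is invented for the example.

```python
def parse_extended(lines):
    """Yield one dict per entry, using the most recent #Fields: header."""
    fields = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            # A later header may change the format part way through.
            fields = line[len("#Fields:"):].split()
        elif line.startswith("#") or not line.strip():
            continue                      # other directives and blank lines
        elif fields is None:
            raise ValueError("entry found before any #Fields: header")
        else:
            yield dict(zip(fields, line.split()))

log = [
    "#Version: 1.0",
    "#Fields: date time cs-uri",
    "1998-12-25 17:45:35 /~sret1/",
]
entries = list(parse_extended(log))
print(entries)
```

Without the #Fields: line there is nothing to say what each column means, which is why a missing header makes the rest of the file uninterpretable.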
There is also Microsoft's attempt at the extended format -- unfortunately they
didn't read the spec., so they didn't enclose the browser and referrer in
quotes, they replaced spaces in the browser name with +'s, and they
put the time taken to serve the request in milliseconds instead of seconds.
And there is WebSTAR's attempt which is very nearly right except that they
erroneously used the CS-HOST field as the client hostname instead
of the server hostname. Analog will understand all of these versions.
Extended logs always record the time in GMT, so you will probably need to use a
LOGTIMEOFFSET command to
convert to your local timezone.
The WebSTAR format is described at http://www.starnine.com/webstar/docs/ws4manual.3f.html.
It has a header line like
!!LOG_FORMAT DATE TIME RESULT URL BYTES_SENT HOSTNAME
In the rest of the logfile, the fields are separated by tabs. The WebSTAR
server also records the time in GMT, so again you will probably need to use a
LOGTIMEOFFSET command to
convert to your local timezone. Some other Mac servers also use the WebSTAR
format, or something looking like it. Analog will understand these too.
Finally, the Netscape header line looks like
format=%Ses->client.ip% [%SYSDATE%] "%Req->reqpb.clf-request%"
%Req->srvhdrs.clf-status% %Req->srvhdrs.content-length%
This section is about how to tell analog the format of your logfile. Most
people don't need to do this because analog can detect the format automatically
-- try it first and see! But if it can't, and you need to specify the log
format explicitly, here is how to do it.
The basic command to specify a log format looks like
LOGFORMAT format
-- we'll discuss what the formats can be in a minute. Or if you are using the
Apache server, you will probably find it more convenient to use
APACHELOGFORMAT format
instead.
The LOGFORMAT and APACHELOGFORMAT
commands only apply to logfiles specified with a LOGFILE command
later in the same configuration file. So you must put the
LOGFORMAT above the LOGFILE to which it refers. This
way, different logfiles can have different formats, like this:
LOGFILE log0
LOGFORMAT format1
LOGFILE log1
LOGFORMAT format2
LOGFILE log2
LOGFILE log3
In this example, log1 is in format1, log2 and
log3 are in format2, and log0 isn't in either
format -- analog will try and detect which format it's in.
The APACHELOGFORMAT command is followed by the
LogFormat from your Apache httpd.conf file. For example,
if your httpd.conf contained the following lines:
LogFormat "%h %l %u %t %v \"%r\" %>s %b" myformat
CustomLog /var/log/apache/access.log myformat
then your analog.cfg should contain
APACHELOGFORMAT (%h %l %u %t %v \"%r\" %>s %b)
LOGFILE /var/log/apache/access.log
(Use parentheses instead of quotes round the argument if the argument already
contains quotes.) Analog
understands all Apache log formats, with the exception that it won't parse
Apache's "%...{format}t" construction for customised
times: if you have this construction, you will have to use ordinary
LOGFORMAT instead.
The possible formats for use with the
LOGFORMAT command are of two
types. First there are some symbolic words, and then there are log format
strings. We'll look at the words first.
There are format words for all the built-in formats
analog knows about.
You might need one of these words if your logfile is in a standard format, but
analog can't detect which format it's in for some reason; for example, maybe
the first line is corrupt; or maybe analog can't tell whether you're using
North American or international dates. So for example
LOGFORMAT COMMON
will select common format; you can also have COMBINED,
REFERRER, BROWSER, EXTENDED,
MICROSOFT-NA (North American date format),
MICROSOFT-INT (international date format),
WEBSITE-NA, WEBSITE-INT,
MS-EXTENDED (Microsoft's attempt at extended format),
WEBSTAR-EXTENDED (WebSTAR's version of extended format),
MS-COMMON (a buggy version of common format in some versions
of Microsoft software), NETSCAPE or WEBSTAR. All these
formats were defined at the end of the previous
section. You can also use the special word AUTO to return to
automatic detection.
If your logfile is not in one of the recognised
formats, you can tell analog
about your format using a log format string. You only ever need this if your
logfile has lines which are not in one of the standard formats. (And even if it
isn't in a standard format, if you're using the Apache web server, you will
find APACHELOGFORMAT easier.)
The format string consists of a template for the logfile line, with the
various fields and special characters replaced by codes as follows. Please
note that these codes are case sensitive -- for example, %b is
completely different from %B!
- %S: host (the client hostname, or address of the computer making the
  request)
- %s: numerical IP address of client (if recorded in a separate field; used
  when %S is empty)
- %r: file requested
- %q: query string (part of filename after ?, if recorded in a separate field)
- %B: browser
- %A: browser with +'s instead of spaces
- %f: referrer
- %u: user (tip: a cookie can usefully be defined as %u too)
- %v: virtual host (the server hostname, also called the virtual domain)
- %d: day of the month
- %m: month in digits
- %M: month, three letter English abbreviation
- %y: year, last two digits
- %Y: year, four digits
- %h: hour of the day
- %n: minute of the hour
- %a: a or A for am, or p or P for pm, if %h is in the 12-hour clock. (So to
  match "am" you need %am and to match "AM" you need %aM)
- %U: "Unix time" (seconds since beginning of 1970, GMT). If it includes
  decimals, use %U.%j
- %b: number of bytes transferred
- %t: processing time in seconds
- %T: processing time in milliseconds
- %c: HTTP status code
- %j: junk (ignore this field; the field can be empty too)
- %w: white space (spaces or tabs)
- %W: optional white space
- %%: % sign
- \n: new line
- \t: tab stop
- \\: single backslash
So for example, the common log format, which looks like
jay.bird.com - fred [25/Dec/1998:17:45:35 +0000]
"GET /~sret1/ HTTP/1.0" 200 1243
(except all on one line)
could be represented by the LOGFORMAT command
LOGFORMAT (%S - %u [%d/%M/%Y:%h:%n:%j %j] "%j %r %j" %c %b)
In other words, it's just the sample line but with the hostname replaced by
%S, the username by %u etc.
(The parentheses are needed because the argument contains spaces.)
Or take another example: if you had lines which looked like
Fri 25/12/98 5:45pm, /~sret1/, jay.bird.com, 200, 1243,
http://www.site.com, Mozilla/2.0 (X11; I; HP-UX A.09.05)
(all on one line again), you could use the format
LOGFORMAT (%j %d/%m/%y %h:%n%am, %r, %S, %c, %b, %f, %B)
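If it helps to see the idea mechanically, the following Python sketch turns a LOGFORMAT string into a regular expression and applies it to the sample line. It handles only a subset of the codes, the field names are this sketch's own, and it assumes that each text field runs up to the next literal character in the format. It is meant as a way of thinking about format strings, not as analog's actual matching code.

```python
import re

# Field names for the subset of codes handled here (names are illustrative).
NAMES = {"S": "host", "u": "user", "r": "file", "f": "referrer",
         "B": "browser", "d": "day", "m": "month", "M": "monthname",
         "y": "year2", "Y": "year", "h": "hour", "n": "minute",
         "a": "ampm", "c": "status", "b": "bytes"}
# Codes whose shape is fixed; every other code is treated as free text.
FIXED = {"d": r"\d+", "m": r"\d+", "y": r"\d\d", "Y": r"\d{4}",
         "h": r"\d+", "n": r"\d+", "c": r"\d+", "b": r"\d+|-",
         "M": r"[A-Za-z]{3}", "a": r"[aApP]"}

def compile_logformat(fmt):
    """Translate a LOGFORMAT string (subset of codes) into a compiled regex."""
    parts, i = [], 0
    while i < len(fmt):
        if fmt[i] != "%":
            parts.append(re.escape(fmt[i]))      # literal character
            i += 1
            continue
        code = fmt[i + 1]
        i += 2
        if code == "%":
            parts.append("%")
        elif code in FIXED:
            parts.append("(?P<%s>%s)" % (NAMES[code], FIXED[code]))
        else:
            # Free text (or %j junk) runs up to the next literal character.
            stop = fmt[i] if i < len(fmt) and fmt[i] != "%" else None
            body = "[^%s]*" % re.escape(stop) if stop else ".*"
            parts.append("(?:%s)" % body if code == "j"
                         else "(?P<%s>%s)" % (NAMES[code], body))
    return re.compile("".join(parts))

custom = compile_logformat("%j %d/%m/%y %h:%n%am, %r, %S, %c, %b, %f, %B")
entry = custom.fullmatch(
    "Fri 25/12/98 5:45pm, /~sret1/, jay.bird.com, 200, 1243, "
    "http://www.site.com, Mozilla/2.0 (X11; I; HP-UX A.09.05)")
print(entry.group("host"), entry.group("file"), entry.group("status"))

common = compile_logformat('%S - %u [%d/%M/%Y:%h:%n:%j %j] "%j %r %j" %c %b')
bad = common.fullmatch('jay bird.com - fred [25/Dec/1998:17:45:35 +0000] '
                       '"GET /~sret1/ HTTP/1.0" 200 1243')
print(bad)   # a space inside the hostname stops the line from matching
```

The second example shows why the terminating character of a field must never occur inside the field: the host is cut off at the first space, and the rest of the line then fails to match.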
Remember: if you have trouble writing a LOGFORMAT string, you can
turn debugging on, and analog will
report where each line was corrupt. If you still have trouble, you can write
to the analog-help mailing list.
A logfile can sometimes have lines in several different formats. So you can
specify several LOGFORMAT commands in a row, and they will all
apply to the next logfile. This is also useful if the format of your logfile
changes half way through. So in this example:
LOGFORMAT COMMON
LOGFORMAT COMBINED
LOGFILE log1
LOGFORMAT (%j %d/%m/%y %h:%n%am, %r, %S, %c, %b, %f, %B)
LOGFILE log2
LOGFILE log3
log1 has lines in both common and combined format, whereas
log2 and log3 have lines just in the format in the
previous example.
If you specify several formats, analog tries to match each line to the first
format first, then if that fails the next, and so on, so the order of the
formats is important. Usually you want to specify the most common one
first, to minimise the time spent trying to match lines to inappropriate
formats.
I suggested above that any logfile which
doesn't have a LOGFORMAT
command earlier in the same configuration file is auto-detected. But this
isn't quite true. Actually such logfiles get a special format called the
default log format. The default format starts off as auto-detection,
but you can change it if you want with the DEFAULTLOGFORMAT
command. This command works exactly the same as the LOGFORMAT
command -- it understands the same formats, and if you have several
DEFAULTLOGFORMAT commands, they accumulate in the same way. The
difference is that they don't need to be put in any particular place. (There
is also APACHEDEFAULTLOGFORMAT, which has the same effect but uses
the Apache LogFormat strings.)
So let's go back to the first example:
LOGFILE log0
LOGFORMAT format1
LOGFILE log1
LOGFORMAT format2
LOGFILE log2
LOGFILE log3
Here log0 actually gets the default log format. If there are no
DEFAULTLOGFORMAT commands, the default will be auto-detection. But
if there are DEFAULTLOGFORMAT commands, even in another
configuration file, that will be the format of log0.
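The scoping rules can be modelled in a few lines of Python. This sketch is my reading of the rules described in this section, with the assumption that consecutive LOGFORMAT commands accumulate and that a LOGFORMAT coming after a LOGFILE starts a fresh list; the function and the tuple representation of commands are invented for the example.

```python
def resolve_formats(commands):
    """Map each logfile to the list of formats that applies to it.

    commands is a list of (command, argument) pairs in configuration order.
    """
    # DEFAULTLOGFORMAT commands may appear anywhere, so collect them first.
    defaults = [arg for name, arg in commands if name == "DEFAULTLOGFORMAT"]
    if not defaults:
        defaults = ["AUTO"]               # the default starts as auto-detection
    result = {}
    current = None                        # no LOGFORMAT seen yet
    last_was_format = False
    for name, arg in commands:
        if name == "LOGFORMAT":
            if last_was_format:
                current.append(arg)       # several in a row accumulate
            else:
                current = [arg]           # a new group replaces the old one
            last_was_format = True
        elif name == "LOGFILE":
            result[arg] = list(current) if current is not None else list(defaults)
            last_was_format = False
    return result

config = [("LOGFILE", "log0"), ("LOGFORMAT", "format1"), ("LOGFILE", "log1"),
          ("LOGFORMAT", "format2"), ("LOGFILE", "log2"), ("LOGFILE", "log3")]
print(resolve_formats(config))
print(resolve_formats(config + [("DEFAULTLOGFORMAT", "COMBINED")]))
```

In the second call the DEFAULTLOGFORMAT comes last in the list, yet it still decides the format of log0, because the default does not depend on position.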
You need DEFAULTLOGFORMAT instead of LOGFORMAT when you want to set the
format of logfiles which aren't given in a LOGFILE command -- for example, ones
specified on the command line, or dragged onto the program icon on a Mac, or
compiled in. DEFAULTLOGFORMAT is also useful if your logfiles are always in the
same format, so that you don't have to worry about putting enough LOGFORMATs
in the right places.
Here are a couple more technical details and tips about
LOGFORMAT commands.
The "Unix time", %U, is always recorded in GMT. So you
will probably need to use a
LOGTIMEOFFSET
command to convert to your local timezone. Also, it's just the integer part of
the time, so if you have decimals you will have to use %U.%j .
The log formats which analog can handle are those which are known as
instantaneously decipherable: in practice, this means that the character
which terminates a string can never occur in the string. So for example, in
common format, which looks like
LOGFORMAT (%S - %u [%d/%M/%Y:%h:%n:%j %j] "%j %r %j" %c %b)
if the hostname ever contained a space, the line would be marked as corrupt,
because analog terminates the host at the first space, not at the
first occurrence of space-dash-space, and then the rest of the line wouldn't
match. Of course, hostnames should never contain spaces, so this shouldn't be a
problem. There are a couple of other restrictions: if there is any date or
time information, then the year, month, date, hour and minute must all be
present: and the same information may not occur twice in the format (so you
can't have both %m and %M, for example, because these
both represent the month; make one of them a %j to have it ignored).
Sometimes you need to read one of the fields in a
logfile, but not analyse it.
For example, if you have a separate common log and referrer log, the referrer
log might look like
http://guide-p.infoseek.com/Titles -> /~sret1/analog/
But the requests for /~sret1/analog/ would already have been
counted when reading the main logfile, so you don't want to count them again
now. You get round this by specifying a * in that item in the
format string, like this:
LOGFORMAT (%f -> %*r)
A tip: sometimes it is more efficient to specify two or more adjacent fields
to ignore with a single %j, as long as the whole group ends with a
recognisable character. So common format is more efficiently specified as
LOGFORMAT (%S - %u [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b)
-- in the date and time [25/Dec/1998:17:45:35 +0000], the seconds
and the timezone can be ignored with a single %j, extending until
the close-bracket.
Another tip: %j can also be used to ignore whole lines, rather than
just fields analog doesn't use. For example, the extended log format ignores
lines beginning with # by using
LOGFORMAT #%j
and the Microsoft format ignores lines corresponding to FTP requests with
LOGFORMAT (%*S, %*u, %m/%d/%y, %h:%n:%j, %j)
If those formats had not been used, the lines would have been incorrectly
marked as corrupt.
Finally, both for reference and as examples, here is
a list of all the fixed formats that analog understands, together with the
example lines from the previous section
and their built-in definitions (split over two lines where necessary).
- Common format, LOGFORMAT COMMON
jay.bird.com - fred [25/Dec/1998:17:45:35 +0000]
"GET /~sret1/ HTTP/1.0" 200 1243
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j%w%r%wHTTP%j" %c %b)
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j%w%r" %c %b)
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%r" %c %b)
- Microsoft common format,
LOGFORMAT MS-COMMON
jay.bird.com - fred [25/Dec/1998:17:45:35 +0000]
"GET /~sret1/ "HTTP/1.0" 200 1243
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j%w%r%w"HTTP%j" %c %b)
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j%w%r" %c %b)
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%r" %c %b)
- Combined log, LOGFORMAT COMBINED
jay.bird.com - fred [25/Dec/1998:17:45:35 +0000] "GET /~sret1/ HTTP/1.0" 200
1243 "http://www.site.com/" "Mozilla/2.0 (X11; I; HP-UX A.09.05)"
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j%w%r%wHTTP%j" %c %b "%f" "%B")
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j%w%r" %c %b "%f" "%B")
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%r" %c %b "%f" "%B")
- Referrer log, LOGFORMAT REFERRER
[25/Dec/1998:17:45:35] http://www.site.com/ -> /~sret1/
or http://www.site.com/ -> /~sret1/
LOGFORMAT ([%d/%M/%Y:%h:%n:%j] %f -> %*r)
LOGFORMAT (%f -> %*r)
- Browser log, LOGFORMAT BROWSER
[25/Dec/1998:17:45:35] Mozilla/2.0 (X11; I; HP-UX A.09.05)
LOGFORMAT ([%d/%M/%Y:%h:%n:%j] %B)
- Microsoft log, North American dates,
LOGFORMAT MICROSOFT-NA
192.64.25.41, -, 12/25/98, 17:45:35, W3SVC1, HOST1, 192.16.225.10,
2178, 303, 1243, 200, 0, GET, /~sret1/, -,
LOGFORMAT (%S, %u, %m/%d/%y, %h:%n:%j, W3SVC%j, %j, %v,
%T, %j, %b, %c, %j, %j, %r, %q,)
LOGFORMAT (%*S, %*u, %m/%d/%y, %h:%n:%j, %j)
- Microsoft log, international dates,
LOGFORMAT MICROSOFT-INT
192.64.25.41, -, 25/12/98, 17:45:35, W3SVC1, HOST1, 192.16.225.10,
2178, 303, 1243, 200, 0, GET, /~sret1/, -,
LOGFORMAT (%S, %u, %d/%m/%y, %h:%n:%j, W3SVC%j, %j, %v,
%T, %j, %b, %c, %j, %j, %r, %q,)
LOGFORMAT (%*S, %*u, %d/%m/%y, %h:%n:%j, %j)
- WebSite log, North American dates,
LOGFORMAT WEBSITE-NA
12/25/98 17:45:35 jay.bird.com host1 Server fred GET /~sret1/
http://www.site.com/ Mozilla/2.0 (X11; I; HP-UX A.09.05) 200 1243 2178
LOGFORMAT (%m/%d/%y %h:%n:%j\t%S\t%v\t%j\t%u\t%j\t%r\t%f\t%j\t%B\t%c\t%b\t%T)
- WebSite log, international dates,
LOGFORMAT WEBSITE-INT
25/12/98 17:45:35 jay.bird.com host1 Server fred GET /~sret1/
http://www.site.com/ Mozilla/2.0 (X11; I; HP-UX A.09.05) 200 1243 2178
LOGFORMAT (%d/%m/%y %h:%n:%j\t%S\t%v\t%j\t%u\t%j\t%r\t%f\t%j\t%B\t%c\t%b\t%T)
The extended log, Netscape log and WebSTAR log don't have any built-in
formats: analog constructs their formats from their header lines.
After analog has read each logfile entry, it then applies
aliases to each
of the items. First, if you have a case insensitive filesystem, analog
converts the filename to lower case. Usually analog assumes that Unix and BeOS
filesystems are case sensitive and other systems are case insensitive. You
might want to override its choice, if, for example, you have transferred files
from one machine to another, so as to use the convention on the original
machine. You can do this by the commands
CASE INSENSITIVE
CASE SENSITIVE
There are similar commands for usernames, if your logfile records these. By
default, usernames are always case insensitive, but you can specify
USERCASE SENSITIVE
to override this.
Next it applies built-in aliases to each item. For
example, it knows that
%7E in a filename or referrer is equivalent to ~ and
translates it accordingly. It also strips off the directory suffix from any
filenames which have it. This suffix is normally index.html, but
you can specify another one instead with a command such as
DIRSUFFIX default.htm
(You can only have one DIRSUFFIX.) There are other built-in
aliases for other items: for example, hostnames are converted to lower case
at this point.
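The two built-in filename aliases mentioned above could be sketched in Python like this. Only the %7E translation and the directory suffix are modelled, and the assumption that the suffix is stripped only when it follows a slash is mine; analog has other built-in aliases which are not shown.

```python
def builtin_file_alias(filename, dirsuffix="index.html"):
    """Apply two of the built-in aliases to a filename (illustrative only)."""
    # %7E is the URL-encoded form of the tilde.
    filename = filename.replace("%7E", "~").replace("%7e", "~")
    # Strip the directory suffix, leaving the bare directory name.
    if filename.endswith("/" + dirsuffix):
        filename = filename[:-len(dirsuffix)]
    return filename

print(builtin_file_alias("/%7Esret1/index.html"))
print(builtin_file_alias("/docs/default.htm", dirsuffix="default.htm"))
```

The second call corresponds to a configuration containing DIRSUFFIX default.htm.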
After this, it applies user-specified aliases to
each item. These aliases are
useful if, for example, you know that two filenames correspond to the same
file, or if you want to translate local hostnames to their internet
equivalents. You specify aliases by commands like
FILEALIAS /football.html /soccer.html
HOSTALIAS lion lion.statslab.cam.ac.uk
There is also the special command FILEALIAS none, which cancels
any other file aliases which might have been specified.
The alias commands for the other items are called BROWALIAS,
REFALIAS, USERALIAS and VHOSTALIAS.
Only one alias is ever applied to any item. So after
FILEALIAS /football.html /soccer.html
FILEALIAS /soccer.html /brazil.html
the file /soccer.html would get translated to
/brazil.html, but /football.html would only get
translated to /soccer.html and would not see the second alias.
You can also use wildcards in ALIAS commands: ? matches
any one character and * matches any number of characters (including
none).
And on the right-hand side, you can use $1, $2 etc. to
represent the parts of the original name matched by the *'s. (You
can use $$ to get an actual $ on the right-hand side.)
As a special abbreviation, if there is exactly one * on the
left-hand side, then a * on the right-hand side can be used to
represent $1. So, for example,
FILEALIAS /*/football/* /soccer/
would translate /sport/football/rules.html to just
/soccer/, but either of
FILEALIAS /*/football/* /$1/soccer/$2 # or
FILEALIAS /sport/football/* /sport/soccer/*
would translate /sport/football/rules.html to
/sport/soccer/rules.html.
Analog's *'s are un-greedy: if there are two possible ways of
matching, the part of the expression on the left matches as little as
possible. This is more often what you want. But it contrasts with Perl's
regular expressions, for example. (Oh, two consecutive *'s are
completely useless, but if you try it they are collapsed into one before
counting the $1, $2, etc.)
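Here is a Python sketch of the wildcard rules just described: un-greedy *'s, $1 and $2 on the right-hand side, the single-* abbreviation, and the rule that only one alias is applied. It assumes that the first matching alias is the one used, and the function names are invented; treat it as a model of the documented behaviour rather than analog's code.

```python
import re

def compile_alias(pattern, replacement):
    """Turn a wildcard alias into a (regex, template) pair."""
    pattern = re.sub(r"\*+", "*", pattern)    # consecutive *'s collapse into one
    pieces = []
    for ch in pattern:
        if ch == "*":
            pieces.append("(.*?)")            # un-greedy: as little as possible
        elif ch == "?":
            pieces.append(".")                # exactly one character
        else:
            pieces.append(re.escape(ch))
    if pattern.count("*") == 1:
        replacement = replacement.replace("*", "$1")   # * abbreviates $1
    return re.compile("".join(pieces), re.DOTALL), replacement

def expand(template, groups):
    """Replace $1, $2, ... by the matched parts, and $$ by a literal $."""
    def substitute(match):
        token = match.group(1)
        return "$" if token == "$" else groups[int(token) - 1]
    return re.sub(r"\$(\$|\d+)", substitute, template)

def apply_aliases(name, aliases):
    """Apply at most one alias: the first whose left-hand side matches."""
    for pattern, replacement in aliases:
        regex, template = compile_alias(pattern, replacement)
        match = regex.fullmatch(name)
        if match:
            return expand(template, match.groups())
    return name

chain = [("/football.html", "/soccer.html"), ("/soccer.html", "/brazil.html")]
print(apply_aliases("/football.html", chain))   # the second alias is not applied
print(apply_aliases("/sport/football/rules.html",
                    [("/*/football/*", "/$1/soccer/$2")]))
print(apply_aliases("/a/b/c", [("/*/*", "$1|$2")]))   # left * takes only "a"
```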
The behaviour of FILEALIAS and REFALIAS can be
slightly unintuitive if the file has search
arguments.
A warning to Unix users: if you put an ALIAS command on the
command line with +C, the shell
may try and expand $1 etc., which is not what you want. To stop
the shell doing this, put the command in single quotes instead of double
quotes.
There is another set of alias commands, called
output aliases. There
is one of these for each of the reports, except the time reports. Instead of
acting on items when the logfile is being read, they act on individual lines
in the output. So for example, the command
TYPEALIAS .txt ".txt (Plain text files)"
would provide an explanation of that line in the file type report.
There can be some confusion between some normal alias and
output alias commands. For example, what is the difference between
HOSTALIAS and HOSTREPALIAS? In fact, there are
several differences, resulting from the different times at which the aliases
are processed. The HOSTALIAS applies to the host items, but
the HOSTREPALIAS only applies to the lines in the Host
Report. This means that the HOSTALIAS also affects the other
reports which use the hosts, such as the Domain Report, whereas the
HOSTREPALIAS only affects the Host Report. (Similarly,
DOMALIAS only applies to the Domain Report; it doesn't change the
domain for other reports.)
Another difference is that the
HOSTREPALIAS applies separately to each line of the Host
Report. This means that if two separate hosts translate to the same thing in a
HOSTALIAS command, they will become one host for all the reports.
But if one were to use the same HOSTREPALIAS commands, they would
still be two hosts, and would still be listed separately in the Host Report,
but would just happen to have the same name in that report.
So in summary, when should you use each command? HOSTALIAS would
normally be used if a single actual host had two different names, whereas
HOSTREPALIAS would normally be used to annotate or clarify the
Host Report.
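The difference is perhaps easiest to see with numbers. In this Python sketch (the function and its arguments are invented for the illustration), the same mapping is used first as a HOSTALIAS and then as a HOSTREPALIAS.

```python
from collections import Counter

def host_report(hosts, hostalias=None, hostrepalias=None):
    """Count requests per host, then label each report line."""
    hostalias = hostalias or {}
    hostrepalias = hostrepalias or {}
    # Normal aliases act while the logfile is read, so counts are merged.
    counts = Counter(hostalias.get(host, host) for host in hosts)
    # Output aliases act on each finished report line separately.
    return [(hostrepalias.get(host, host), n) for host, n in sorted(counts.items())]

hosts = ["lion", "lion.statslab.cam.ac.uk", "lion"]
mapping = {"lion": "lion.statslab.cam.ac.uk"}
print(host_report(hosts, hostalias=mapping))      # one line with 3 requests
print(host_report(hosts, hostrepalias=mapping))   # two lines with the same name
```

With the output alias the two hosts stay separate and merely share a label, and the lines keep the order they had before relabelling.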
The full list of output aliases is REQALIAS, REDIRALIAS,
FAILALIAS, TYPEALIAS, DIRALIAS,
HOSTREPALIAS, REDIRHOSTALIAS, FAILHOSTALIAS,
DOMALIAS, ORGALIAS, REFREPALIAS,
REFSITEALIAS, REDIRREFALIAS, FAILREFALIAS,
BROWREPALIAS, BROWSUMALIAS, OSALIAS,
VHOSTREPALIAS, REDIRVHOSTREPALIAS,
FAILVHOSTREPALIAS, USERREPALIAS,
REDIRUSERALIAS and FAILUSERALIAS.
There is one known bug with the output aliases. The report is sorted
before the alias is applied. This means that if the
SORTBY for the report is set to
ALPHABETICAL, then the report will not be sorted correctly.
You can also use regular expressions in the
ALIAS commands.
Sorry, I'm not going to teach you how to use regular expressions
here if you don't already know: if you're on Unix try typing man
perlre or man regex or man grep. There are lots of
implementations of regular expressions. The ones which analog uses are
Perl-syntax regular expressions. In general, these are a superset of the
extended regular expressions used by Unix egrep or GNU
grep -E.
You include regular expressions in an ALIAS command by prefixing
the left-hand side of the alias with "REGEXP:". Or you can
specify a case-insensitive match, like Unix egrep -i, by using
"REGEXPI:". (It's automatically case-insensitive for many
items, such as hostnames, or filenames if you have specified
CASE INSENSITIVE.)
On the
right-hand side of the alias you can use $1, $2 etc. to
represent the first, second etc. bracketed expression on the left-hand side,
counting in order of the left brackets. (Again, you can't put $1,
$2 etc. on the command line unless you put them in single quotes.)
Regular expressions match if they match just part of the string. If you want
them to have to match the whole of the string, you have to anchor them to the
ends of the string with ^ and $.
For example,
REQALIAS REGEXP:^(/~(.+?)/.*) "[$2] $1"
would translate /~sret1/backgammon/rules.html
to
[sret1] /~sret1/backgammon/rules.html
in the Request
Report. Or
HOSTALIAS REGEXP:^([^.]*)$ $1.mycompany.com
would add .mycompany.com to all hostnames not containing a dot.
(See the FAQ for a discussion about whether
this is a good idea.)
Regular expressions are greedy: if there are two possible ways of matching,
the part of the expression on the left matches as much as possible.
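The two examples above can be tried out in Python, whose re module is close to (though not identical with) Perl syntax. The sketch assumes that when the expression matches, the whole name is replaced by the expanded right-hand side; the function name is invented for the example.

```python
import re

def regexp_alias(name, pattern, replacement, ignore_case=False):
    """Model a REGEXP: (or REGEXPI:) alias on a single name."""
    flags = re.IGNORECASE if ignore_case else 0
    match = re.search(pattern, name, flags)   # a partial match is enough
    if match is None:
        return name
    # $1, $2, ... stand for the bracketed expressions, counted by left bracket.
    return re.sub(r"\$(\d)",
                  lambda m: match.group(int(m.group(1))) or "",
                  replacement)

print(regexp_alias("/~sret1/backgammon/rules.html", r"^(/~(.+?)/.*)", "[$2] $1"))
print(regexp_alias("lion", r"^([^.]*)$", "$1.mycompany.com"))
print(regexp_alias("/a/b/c", r"^/(.*)/(.*)$", "$1|$2"))   # greedy: left part takes "a/b"
```

The last line shows the greedy behaviour; compare it with the un-greedy wildcard * in ordinary ALIAS commands.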
After aliasing each item, analog decides whether that item is wanted or not.
The whole line is only counted if all the items are wanted.
Whether an item is wanted or not is determined by INCLUDE and
EXCLUDE commands specified by the user. These commands can be used
to exclude requests from your local users, for example, or to analyse only
files in a subdirectory. For example
HOSTEXCLUDE mycomputer.myisp.com
would exclude all requests by that computer from the statistics.
The rule for determining whether an item is included or excluded is as
follows. All the INCLUDE and EXCLUDE commands for that
item are considered one by one in order, and the item is included or excluded
according to the last command it matched. Items which don't match any of
the INCLUDE or EXCLUDE commands are included if the first
command was an exclusion, and excluded if the first command was an inclusion.
For example, the configuration
FILEINCLUDE /~sret1/*
FILEEXCLUDE /~sret1/backgammon/*,/~sret1/analog/*
FILEINCLUDE /~sret1/backgammon/*.gif
would instruct the program to examine only my files, excluding my
backgammon and analog files, but including gifs in my backgammon directory.
On the other hand,
FILEEXCLUDE /~sret1/*/img/*
would analyse all files, except for images in my various directories.
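The rule can be written down as a short Python function. This is a sketch of the rule as stated above, using Python's fnmatch for the wildcards (which also treats square brackets specially, unlike the description here); the names are invented for the example.

```python
from fnmatch import fnmatchcase

def is_wanted(item, commands):
    """commands is a list of ("INCLUDE" or "EXCLUDE", "pat1,pat2,...") pairs."""
    if not commands:
        return True
    # Items matching nothing are included if the first command is an
    # exclusion, and excluded if the first command is an inclusion.
    wanted = commands[0][0] == "EXCLUDE"
    for action, patterns in commands:
        if any(fnmatchcase(item, pattern) for pattern in patterns.split(",")):
            wanted = action == "INCLUDE"      # the last matching command wins
    return wanted

rules = [("INCLUDE", "/~sret1/*"),
         ("EXCLUDE", "/~sret1/backgammon/*,/~sret1/analog/*"),
         ("INCLUDE", "/~sret1/backgammon/*.gif")]
for name in ["/~sret1/cv.html", "/~sret1/backgammon/rules.html",
             "/~sret1/backgammon/board.gif", "/index.html"]:
    print(name, is_wanted(name, rules))
```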
(If you get confused with all the inclusions and
exclusions, remember that you can always use
SETTINGS ON
to see what the options you have specified represent.)
Note that inclusions and exclusions can contain any number of wildcards, and
can be lists separated by commas (but no spaces).
The full list of these commands is HOSTINCLUDE and
HOSTEXCLUDE; FILEINCLUDE and FILEEXCLUDE;
BROWINCLUDE and BROWEXCLUDE; REFINCLUDE and
REFEXCLUDE; USERINCLUDE and USEREXCLUDE;
VHOSTINCLUDE and VHOSTEXCLUDE; and
STATUSINCLUDE and STATUSEXCLUDE.
Some notes on these commands.
Because the inclusions and exclusions take place after the aliasing,
the name you must use is the aliased name. (In the absence of
output alias commands, this is
the name of the item in the output.)
Sometimes a line doesn't contain a particular sort of
item, either because there is no field reserved for it on the line, or because
the browser didn't send it for that request, or because it was present but
corrupt. You can include or exclude these lines by making a
special blank entry in the INCLUDE or EXCLUDE
command. For example,
USERINCLUDE jim
USERINCLUDE ""
would include lines from user jim and lines without any user
specified.
The behaviour of FILEINCLUDE and REFINCLUDE can be
slightly unintuitive if the file has search
arguments.
You can also use regular expressions for the
inclusions and exclusions by prefixing the expression with
"REGEXP:" or "REGEXPI:". I've
already described this at length in the context of aliases, so you can
look there for all the details.
A regular expression must be on a line on its own, not within a
comma-separated list.
The STATUSINCLUDE and
STATUSEXCLUDE commands are slightly different from the rest.
They work on HTTP status codes. (These codes are defined in the
HTTP spec, and
viewable in the Status Code Report. But if you don't already know about them,
you really don't want to use these commands anyway!) The arguments to the
commands are a comma-separated list of ranges. One end of the range can be
blank, meaning from the first, or to the last, status code. For example
STATUSINCLUDE 200-206,304,500-
would mean only look at lines with status codes 200-206, 304 or 500-599.
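For what it's worth, the range syntax can be modelled in Python as follows. The upper bound of 599 is taken from the example above; the lower bound of 100 is my assumption, and the function names are invented.

```python
def parse_status_ranges(spec, lowest=100, highest=599):
    """Parse a list such as "200-206,304,500-" into (low, high) pairs."""
    ranges = []
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-")
            # A blank end means "from the first" or "to the last" code.
            ranges.append((int(low) if low else lowest,
                           int(high) if high else highest))
        else:
            ranges.append((int(part), int(part)))
    return ranges

def status_matches(code, ranges):
    """True if the status code falls inside any of the ranges."""
    return any(low <= code <= high for low, high in ranges)

wanted = parse_status_ranges("200-206,304,500-")
print(wanted)
print(status_matches(404, wanted), status_matches(503, wanted))
```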
Some people want to exclude status code 304 (Not Modified)
to stop those requests appearing in the Request Report. But there is a better
solution. By default, analog counts code 304 as a successful request, because
it assumes that the cached version of the document is then presented to the
user. But you can count it as a redirected request with the command
304ISSUCCESS OFF
Again, if you don't understand this, stick with the default.
There is also one other pair of commands which belongs in
this category,
namely the FROM and TO commands. These specify a time
period to restrict the analysis to. The simplest usage of these commands is
FROM yyMMdd or FROM yyMMdd:hhmm, where yy
represents the last two digits of the year (analog assumes that the year is
between 1970 and 2069), MM represents the month,
dd is the date, hh the hour, and mm the
minute. So, for example, to analyse only requests from
1st July 1999 to 1pm on 15th June 2000 I would use the configuration
FROM 990701
TO 000615:1300
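The absolute form, including the two-digit year rule, can be sketched in Python like this. Only yyMMdd and yyMMdd:hhmm are handled; a date without a time is returned as midnight at the start of that day, which is a simplification of mine and says nothing about how analog treats a TO without a time. The function name is invented.

```python
from datetime import datetime

def parse_absolute(spec):
    """Parse yyMMdd or yyMMdd:hhmm, with years taken as 1970 to 2069."""
    date_part, _, time_part = spec.partition(":")
    if len(date_part) != 6 or len(time_part) not in (0, 4):
        raise ValueError("expected yyMMdd or yyMMdd:hhmm, got " + spec)
    yy = int(date_part[:2])
    year = 1900 + yy if yy >= 70 else 2000 + yy
    hour, minute = (int(time_part[:2]), int(time_part[2:])) if time_part else (0, 0)
    return datetime(year, int(date_part[2:4]), int(date_part[4:6]), hour, minute)

print(parse_absolute("990701"))        # 1st July 1999
print(parse_absolute("000615:1300"))   # 1pm on 15th June 2000
```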
Alternatively, each of the components can be preceded by + or
- to represent time relative to the time at which the program was
invoked. In this case, the date can have more than 2 digits. This allows
constructions like
FROM -01-00+01 # from tomorrow last year
TO -00-0131 # to the end of last month (OK even if last month
# didn't have 31 days)
FROM -00-00-112
TO -00-00-01 # statistics for the last 16 weeks
FROM -00-00-00:-06+01 # statistics for the last 6 hours
There are command line abbreviations +F and +T
for the FROM and TO commands; for example,
+T-00-00-01:1800 looks at statistics until 6pm yesterday.
-F and -T turn off the from and to, as do FROM
OFF and TO OFF.
There are also INCLUDE and
EXCLUDE commands for most of
the reports. These exclude individual lines from particular reports. So, for
example, the command
REFREPEXCLUDE http://your.site.com/*
would exclude your internal referrers from the Referrer Report. However, it
would not exclude them from the Failed Referrer Report, the Referring Site
Report, etc. (you need to use FAILREFEXCLUDE,
REFSITEEXCLUDE etc. for that); nor would it prevent other analysis
of logfile lines with those referrers, as REFEXCLUDE would. Also
REFREPEXCLUDE would include the referrers in the "not
listed" line at the bottom of the report.
The full list of these commands is
REQINCLUDE and REQEXCLUDE;
REDIRINCLUDE and REDIREXCLUDE;
FAILINCLUDE and FAILEXCLUDE;
TYPEINCLUDE and TYPEEXCLUDE;
DIRINCLUDE and DIREXCLUDE;
HOSTREPINCLUDE and HOSTREPEXCLUDE;
REDIRHOSTINCLUDE and REDIRHOSTEXCLUDE;
FAILHOSTINCLUDE and FAILHOSTEXCLUDE;
DOMINCLUDE and DOMEXCLUDE;
ORGINCLUDE and ORGEXCLUDE;
REFREPINCLUDE and REFREPEXCLUDE;
REFSITEINCLUDE and REFSITEEXCLUDE;
SEARCHQUERYINCLUDE and SEARCHQUERYEXCLUDE;
SEARCHWORDINCLUDE and SEARCHWORDEXCLUDE;
INTSEARCHQUERYINCLUDE and INTSEARCHQUERYEXCLUDE;
INTSEARCHWORDINCLUDE and INTSEARCHWORDEXCLUDE;
REDIRREFINCLUDE and REDIRREFEXCLUDE;
FAILREFINCLUDE and FAILREFEXCLUDE;
BROWSUMINCLUDE and BROWSUMEXCLUDE;
BROWREPINCLUDE and BROWREPEXCLUDE;
OSINCLUDE and OSEXCLUDE;
VHOSTREPINCLUDE and VHOSTREPEXCLUDE;
REDIRVHOSTREPINCLUDE and REDIRVHOSTREPEXCLUDE;
FAILVHOSTREPINCLUDE and FAILVHOSTREPEXCLUDE;
USERREPINCLUDE and USERREPEXCLUDE;
REDIRUSERREPINCLUDE and REDIRUSERREPEXCLUDE;
and FAILUSERINCLUDE and FAILUSEREXCLUDE.
The inclusion or exclusion applies to the
unaliased name, if you are doing any output
aliases.
You can also use the symbolic word pages in suitable
INCLUDE and EXCLUDE commands; one very common command is
REQINCLUDE pages
to include only pages in the request report.
There are some miscellaneous INCLUDE and
EXCLUDE commands which I'll describe now. First, analog determines
which files should count as pages (and thus which requests
count as page requests) using an INCLUDE/EXCLUDE
pair called PAGEINCLUDE and PAGEEXCLUDE.
By default, (case insensitive) *.html and *.htm,
and directories (*/) count as pages. But you
can change the list with commands like
PAGEINCLUDE *.asp
PAGEEXCLUDE /sret1.html
I.e., *.asp are pages, but /sret1.html
isn't. (If the file has search arguments, the
PAGEINCLUDE and PAGEEXCLUDE are reckoned just on the
part of the filename before the question mark.)
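As a sketch of the page test in Python, with the default list above plus any extra PAGEINCLUDE and PAGEEXCLUDE commands: the function name and the representation of the commands are invented, and I am assuming that later commands override earlier ones in the same way as for the other INCLUDE and EXCLUDE commands.

```python
from fnmatch import fnmatchcase

# The defaults described above: *.html, *.htm and directories.
DEFAULT_PAGES = [("INCLUDE", "*.html"), ("INCLUDE", "*.htm"), ("INCLUDE", "*/")]

def is_page(filename, extra=()):
    """Decide whether a requested file counts as a page."""
    # Only the part before the question mark is considered, case-insensitively.
    name = filename.partition("?")[0].lower()
    page = False
    for action, pattern in list(DEFAULT_PAGES) + list(extra):
        if fnmatchcase(name, pattern.lower()):
            page = action == "INCLUDE"
    return page

extra = [("INCLUDE", "*.asp"), ("EXCLUDE", "/sret1.html")]
print(is_page("/x.asp?id=3", extra), is_page("/sret1.html", extra))
```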
In some of the reports, analog can link to the files
which it's listing. You can specify exactly which files are linked to with the
LINKINCLUDE family of commands. For example,
REQLINKINCLUDE pages,*.pdf
would link to pages and PDF files in the Request Report. The full set of these
commands is REQLINKINCLUDE and REQLINKEXCLUDE
(Request Report), REDIRLINKINCLUDE and REDIRLINKEXCLUDE
(Redirection Report), FAILLINKINCLUDE and FAILLINKEXCLUDE
(Failure Report), REFLINKINCLUDE and REFLINKEXCLUDE
(Referrer Report), REDIRREFLINKINCLUDE and
REDIRREFLINKEXCLUDE (Redirected Referrer Report), and
FAILREFLINKINCLUDE and FAILREFLINKEXCLUDE
(Failed Referrer Report).
Note that the target of the links is also affected by the
BASEURL command.
Finally, there is a pair of commands called
ROBOTINCLUDE and ROBOTEXCLUDE, which determine which
browsers count as "robots" in the Operating System Report. For
example,
ROBOTINCLUDE Googlebot/*
There is one final set of INCLUDE and EXCLUDE commands
to include or exclude the search arguments at the end of URLs. But there are
some slightly complicated issues surrounding those, so they deserve a
new section.
Sometimes a URL contains arguments after a question mark. For example, the URL
/cgi-bin/script.pl?x=1&y=2
runs the /cgi-bin/script.pl program with arguments x=1
and y=2. (Sometimes the server records these arguments in a separate
field in the logfile, but if so you can use the %q field in the
LOGFORMAT command, and analog
will translate the filename to the above format).
You can tell analog either to read or to ignore the arguments using the
commands ARGSINCLUDE and ARGSEXCLUDE which we'll discuss
in a minute. But by default, all arguments are
read, and as this is usually what you want, you don't usually need those
commands.
You don't always see the arguments in the reports, even if they're being
read, because analog doesn't show them if there aren't enough of them. In
order to see them, you have to set the corresponding
ARGSFLOOR parameter low
enough.
Also note that within a report, the search arguments are listed immediately
under the file to which they refer. This temporarily interrupts the normal
order of the files. It may be clearer if you turn the
N column on.
Assuming that the arguments are being read, analog treats the file
/cgi-bin/script.pl?x=1&y=2 as a different file from
/cgi-bin/script.pl (or from
/cgi-bin/script.pl?y=2&x=1 for that matter). It doesn't look
like that in the Request Report because you see a grand total for
/cgi-bin/script.pl with all its different arguments. But it matters
if you want to do inclusions and exclusions or
aliases on the file.
The reason is that, for example, the command
FILEINCLUDE /cgi-bin/script.pl
doesn't match the file /cgi-bin/script.pl?x=1&y=2. To
match that, you would have to use something like
FILEINCLUDE /cgi-bin/script.pl*
instead. Similarly
FILEALIAS /cgi-bin/script.pl /script.pl
will change /cgi-bin/script.pl itself, but not
/cgi-bin/script.pl?x=1&y=2. You might want to use something
like
FILEALIAS /cgi-bin/script.pl?* /script.pl?$1
as well. (However, PAGEINCLUDE and PAGEEXCLUDE always
refer to the part of the filename before the question mark.)
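The following Python sketch shows why the patterns above behave as they do. It is not analog's implementation: wildcard_regex, matches and apply_alias are hypothetical helpers, and the sketch assumes that * is the only wildcard and that $1 stands for whatever the first * matched.

```python
import re

def wildcard_regex(pattern):
    """Turn a pattern into a regex; each * becomes a capture group."""
    return re.compile("(.*)".join(re.escape(p) for p in pattern.split("*")), re.DOTALL)

def matches(pattern, name):
    """True if the whole name matches the pattern."""
    return wildcard_regex(pattern).fullmatch(name) is not None

def apply_alias(pattern, replacement, name):
    """Return the aliased name, or name unchanged if the pattern does not match."""
    m = wildcard_regex(pattern).fullmatch(name)
    if m is None:
        return name
    # $1, $2, ... refer to the text matched by the first, second, ... asterisk.
    return re.sub(r"\$(\d)", lambda d: m.group(int(d.group(1))), replacement)
```

For example, apply_alias("/cgi-bin/script.pl", "/script.pl", ...) leaves /cgi-bin/script.pl?x=1&y=2 alone, because the pattern has to match the whole name including the arguments.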
Conversely, because in the Request Report files with arguments are only
included if their parent file is included, you can't just
REQINCLUDE /cgi-bin/script.pl?*x=1*
or you will end up with nothing listed. You have to
REQINCLUDE /cgi-bin/script.pl
as well.
The alternative is to tell analog not to read the
search arguments. There are commands called ARGSINCLUDE and
ARGSEXCLUDE, and REFARGSINCLUDE and
REFARGSEXCLUDE, to do this. They work the same as the
other INCLUDE and EXCLUDE
commands which we discussed in the previous section. So, for example, if the
command
ARGSEXCLUDE /cgi-bin/script.pl
were given, analog would ignore the arguments to that file, and so read
/cgi-bin/script.pl?x=1&y=2 as just
/cgi-bin/script.pl. On the other hand, if
ARGSINCLUDE /cgi-bin/script.pl
were specified, analog would read the arguments, and so treat
/cgi-bin/script.pl?x=1&y=2 as a different file from
/cgi-bin/script.pl.
REFARGSINCLUDE and REFARGSEXCLUDE are the same
for referrers.
Technical note: the check for whether the arguments should be included happens
before the filename has been subject to either built-in or user-specified
aliases. So you have to use the unaliased name,
exactly as it occurs in the logfile. For example,
ARGSINCLUDE /~sret1/script.pl won't match
/%7Esret1/script.pl even though they are really the same
file. It also means that you can't use "pages" in the
ARGSINCLUDE or ARGSEXCLUDE command, because we don't
know whether a file is a page until after it's been aliased.
There are related commands called
SEARCHENGINE and INTSEARCHENGINE. If you have referrers
with search arguments, usually
from search engines, you can tell analog which field corresponds to the search
term. It uses this information to compile the Search Query Report and the
Search Word Report. For example, consider the referrer
http://www.altavista.com/cgi-bin/query?pg=q&kl=XX&q=carrot+cake
The search term is in the field q= so the appropriate
SEARCHENGINE command is
SEARCHENGINE http://www.altavista.com/cgi-bin/query q
or even better
SEARCHENGINE http://*altavista.*/* q
to allow for all their mirror sites in different countries.
The command INTSEARCHENGINE is the same for search engines within
your site. For example, you might have requests for files like
/cgi-bin/search?trm=chocolate+cake
in which case you would specify
INTSEARCHENGINE /cgi-bin/search trm
Sometimes a search engine has two or more possible fields for the search
term. In that case you can list all of them separated by commas, like this:
SEARCHENGINE http://*webcrawler.*/* search,searchText
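If it helps to see what "the field corresponding to the search term" means, here is a rough Python sketch using the standard urllib.parse module. The function search_term is invented for this illustration and is not part of analog; it also leaves out the step of matching the referrer against the SEARCHENGINE pattern.

```python
from urllib.parse import urlsplit, parse_qsl

def search_term(referrer, fields):
    """Return the decoded search term from the first listed field that is present.

    fields is the comma-separated list from the command, already split,
    e.g. ["search", "searchText"]. Returns None if no field is found.
    """
    query = urlsplit(referrer).query
    params = dict(parse_qsl(query, keep_blank_values=True))
    for field in fields:
        if field in params:
            return params[field]
    return None
```

Splitting the returned term on spaces gives the individual words, which is roughly the difference between the Search Query Report and the Search Word Report.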
The rest of this section is a bit technical, and you usually don't need to
worry about it. On a first reading, you probably want to
skip it.
I said previously that
%7E in a URL is automatically converted to ~, etc. In
fact this is only done to the ASCII-printable characters %20-%7E,
because these are the only characters that are the same in every character
set. (In fact, even that isn't true. Experts might want to know that
?, & and = aren't converted either, to
distinguish them from query-string delimiters: an encoded ?,
& or = is one that is not intended to be a
delimiter. Also % isn't converted, to avoid confusing
%25nm with %nm.)
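The conversion rule just described can be sketched in a few lines of Python. This is a model of the rule as stated here, not analog's actual code, and the name convert_escapes is my own.

```python
import re

KEPT_ENCODED = set("?&=%")  # never converted, for the reasons given above

def convert_escapes(url):
    """Convert %nm codes for printable ASCII, leaving everything else alone."""
    def replace(match):
        code = int(match.group(1), 16)
        char = chr(code)
        if 0x20 <= code <= 0x7E and char not in KEPT_ENCODED:
            return char
        return match.group(0)  # leave the %nm code untouched
    return re.sub(r"%([0-9A-Fa-f]{2})", replace, url)
```

So /%7Esret1/ becomes /~sret1/, but %3F stays as it is, and %2520 is not turned into %20.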
But in the Search Query Report and Search Word Report it is useful to be able
to convert non-ASCII characters too, so that you can see the actual words
people typed, rather than get the %nm codes in place of all
accented letters. So in these reports analog also converts characters
%A0-%FF (if you are using an ISO-8859-* character set) or
%80-%FF (for other character sets, apart from ASCII).
However, there are reasons why you might not want this feature, and you can
turn it off with the command
SEARCHCHARCONVERT OFF
These reasons include:
- The character set in which the query was submitted to the search engine
may not be the same as that in which the page reached was written, or
that in which the analog output page is being written. So converting to
the character set of the analog output page may give garbage anyway.
This is particularly a problem with languages, such as Russian or
Chinese, which have two or more character sets in common use. It is
also a problem for sites which host resources in many languages.
- Not all of the character positions correspond to printable characters in
every character set. Analog knows that %80-%9F are
non-printable in the ISO-8859-* character sets, but apart from that it
converts everything in %80-%FF. So you may end up with
non-printable characters in your output.
- I have no idea how well, if at all, this feature will work with multibyte
character sets (such as most East Asian languages). You will probably
find you want to turn it off in this case.
So far we have mainly discussed commands which control how analog reads
the logfiles. We now get on to commands for configuring the output.
First, you can change the style of the output using the
OUTPUT command. There
are five possible output styles, called HTML, PLAIN,
ASCII, LATEX and COMPUTER. HTML
produces web pages,
PLAIN produces plain text files, and ASCII is the same
as PLAIN except that it uses all ASCII characters (no accents etc.)
if possible. (This is because some applications don't understand accented
characters - for example, they're not always reliable over email).
LATEX produces LaTeX code which can be turned into beautiful
PostScript if you have LaTeX and dvips installed. (Yes, I know it gives
overfull hboxes sometimes).
COMPUTER is a special format suitable for reading by a
computer (useful for reading into a spreadsheet, or post-processing with a
graphics package, for example).
There is a separate section about this
format later.
As well as a command like
OUTPUT PLAIN
you can also select PLAIN style with the command line argument
+a, and HTML with the command line argument
-a. You can also specify OUTPUT NONE
for no output, if you are producing a cache file.
Next, you can change the language of the output. There
are two ways to do
this. The usual way is to use the LANGUAGE command. For example,
the command
LANGUAGE FRENCH
will give you the output in French. The available languages at the moment are
ARMENIAN, BOSNIAN, BULGARIAN,
CATALAN, SIMP-CHINESE (GB2312 encoding),
TRAD-CHINESE (Big5 encoding), CROATIAN,
CZECH, DANISH, DUTCH,
ENGLISH, US-ENGLISH, FINNISH,
FRENCH, GERMAN, GREEK, HUNGARIAN,
ICELANDIC, ITALIAN,
JAPANESE, KOREAN,
LATVIAN, LITHUANIAN,
NORWEGIAN (Bokmål),
NYNORSK, POLISH, PORTUGUESE,
BR-PORTUGUESE, ROMANIAN,
RUSSIAN, SERBIAN, SLOVAK, SLOVENE,
SPANISH, SWEDISH, TURKISH and
UKRAINIAN.
All these languages were available in previous versions of analog,
but most have not yet been translated for version 5, so only a few are
available at the moment (see the list in the
What's new? section).
As new languages are translated, they will be added to the
analog home page.
The other way is to use the LANGFILE command. This is useful if you
want to download a new language from the
analog home page, or
if you want to translate one yourself, or even if you want to change some
words or phrases or the way the dates and times are formatted in the output.
The LANGFILE command tells analog in which file to find the various
words and phrases for a new language. For example, the command
LANGFILE guarani.lng # or
LANGFILE /usr/etc/httpd/analog/lang/guarani.lng
would read from that file.
If the name of the file doesn't include a directory, it will be
looked for wherever analog normally expects to find its language files.
Some languages also have domains files or
report descriptions files available. These are normally selected automatically
by the LANGUAGE command. But you can tell analog to use different
ones with the DOMAINSFILE and
DESCFILE commands. Also, some languages
have translations of the form interface or
configuration file.
If you want to translate another language, I would be delighted! Do
contact me first to make sure that no-one else is already translating the
same language. The file README.txt in the language directory, and
the English language file, contain some brief instructions for translating new
languages.
You can change which file the output goes to with
a command like
OUTFILE stats.htm
or with a command line argument like +Ostats.htm. If you use the
filename - or stdout, the output will go to standard
output, which is normally the screen, but Unix users might like to redirect it
to another file or even into a pipe. You can also use an absolute path name,
like
OUTFILE /usr/bin/httpd/htdocs/stats.html # Unix
OUTFILE "Hard Disk:Server Apps:WebSTAR:Analog:Report.html" # Mac
If the name of the OUTFILE doesn't include a directory, it will be
put wherever analog expects to put its output files. (This location is built
in when the program is compiled.) For example, on Windows it would be in the
same folder as the analog executable.
Sometimes it's convenient to include the date in the name of the
OUTFILE. You can do this by including the following codes in the
filename.
%D date of month
%m month name
%M month number
%y two-digit year
%Y four-digit year
%H hour
%n minute
%w day of week
So for example,
OUTFILE stats%y%M.html
will produce filenames like stats9905.html. The date used is the
TO date if one was specified, and
otherwise the time of the start of the program. It's always in English.
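As an illustration of how these codes expand, here is a Python sketch. The function expand_outfile is hypothetical. The two-digit month follows from the stats9905.html example, but the zero-padding of the other numbers and the three-letter English month and day names are assumptions of mine, so check the real output before relying on them.

```python
from datetime import datetime

# ASSUMPTION: abbreviated English names; the exact form analog uses may differ.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # weekday() 0 is Monday

def expand_outfile(template, when):
    """Replace the date codes in an OUTFILE name with values taken from when."""
    codes = {
        "D": f"{when.day:02d}",
        "m": MONTHS[when.month - 1],
        "M": f"{when.month:02d}",
        "y": f"{when.year % 100:02d}",
        "Y": f"{when.year:04d}",
        "H": f"{when.hour:02d}",
        "n": f"{when.minute:02d}",
        "w": DAYS[when.weekday()],
    }
    out = []
    i = 0
    while i < len(template):
        # A % followed by a known code letter is replaced; anything else is copied.
        if template[i] == "%" and i + 1 < len(template) and template[i + 1] in codes:
            out.append(codes[template[i + 1]])
            i += 2
        else:
            out.append(template[i])
            i += 1
    return "".join(out)
```

Here when stands for the TO date if one was given, and otherwise for the time the program started.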
Next, you need to know how to turn the different reports on
and off. There are 44 different reports which analog can produce,
if your web server has been configured to record the necessary data in the
logfiles. Each one has a short name, and a code letter or number, as
follows. (Note that the code letters are case sensitive:
Z is quite different from z, for example).
x GENERAL General Summary
1 YEARLY Yearly Report
Q QUARTERLY Quarterly Report
m MONTHLY Monthly Report
W WEEKLY Weekly Report
D DAILYREP Daily Report
d DAILYSUM Daily Summary
H HOURLYREP Hourly Report
h HOURLYSUM Hourly Summary
w WEEKHOUR Hour of the Week Summary
4 QUARTERREP Quarter-Hour Report
6 QUARTERSUM Quarter-Hour Summary
5 FIVEREP Five-Minute Report
7 FIVESUM Five-Minute Summary
S HOST Host Report
l REDIRHOST Host Redirection Report
L FAILHOST Host Failure Report
Z ORGANISATION Organisation Report
o DOMAIN Domain Report
r REQUEST Request Report
i DIRECTORY Directory Report
t FILETYPE File Type Report
z SIZE File Size Report
P PROCTIME Processing Time Report
E REDIR Redirection Report
I FAILURE Failure Report
f REFERRER Referrer Report
s REFSITE Referring Site Report
N SEARCHQUERY Search Query Report
n SEARCHWORD Search Word Report
Y INTSEARCHQUERY Internal Search Query Report
y INTSEARCHWORD Internal Search Word Report
k REDIRREF Redirected Referrer Report
K FAILREF Failed Referrer Report
B BROWSERREP Browser Report
b BROWSERSUM Browser Summary
p OSREP Operating System Report
v VHOST Virtual Host Report
R REDIRVHOST Virtual Host Redirection Report
M FAILVHOST Virtual Host Failure Report
u USER User Report
j REDIRUSER User Redirection Report
J FAILUSER User Failure Report
c STATUS Status Code Report
For details on what the various reports mean, and a summary of the commands
which control them, see the section on
Analog's reports.
You can turn each report on or off with configuration
commands like
FIVEREP OFF
REFSITE ON
or by using command line arguments like -5 and +s.
You can also turn all reports except the General Summary on or off with the
commands ALL ON and ALL OFF, or with the command line
arguments +A and -A.
You can turn the descriptions of each report off
with the command
DESCRIPTIONS OFF
Even if DESCRIPTIONS is ON, the descriptions will only
appear if analog can find a report descriptions file in your language, or if
you specify one using the DESCFILE command: for example,
DESCFILE descriptions.txt
If the name of the descriptions file doesn't include a directory, it will be
looked for wherever analog normally expects to find its language files.
You can turn the "Go To" lines in the report off
with the command
GOTOS OFF
GOTOS ON turns them on again, and GOTOS FEW puts the
"Go To" lines just at the top and bottom. GOTOS OFF can
be abbreviated with the -X command line argument, and
GOTOS ON with +X.
You can turn off the "Program started at" line
at the top of the report, and the "Running Time" line at the
bottom, with the command
RUNTIME OFF
and turn them on again with RUNTIME ON.
The figures in parentheses in the General Summary are
for the last seven days:
either the seven days before the TO time, or if no TO
time is given, the seven days before the time of the program start. The
figures for the last seven days are normally included if some, but not all,
of the requests fall in those seven days; but you can turn them off by means
of the command
LASTSEVEN OFF
Of course LASTSEVEN ON turns them on again.
You can change the order of the reports by means of
the REPORTORDER command. You should list the
code letters for all possible reports in the order
you want them. Non-alphanumeric characters are ignored and so can be used as
separators. For example,
REPORTORDER x-1QmdDhHw4567W-cPz-ritEIYy-SlLZo-sNnfKk-ujJ-vMR-bBp
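Since it's easy to leave a letter out of such a long string, you might like to check it before use. Here is a small Python sketch that does so; check_report_order is just a helper written for this example, and ALL_CODES is copied from the table of code letters earlier in this section.

```python
# The 44 code letters and digits, in the order of the table of reports.
ALL_CODES = "x1QmWDdHhw4657SlLZoritzPEIfsNnYykKBbpvRMujJc"

def check_report_order(order):
    """Return (missing, repeated) codes for a REPORTORDER string."""
    # Non-alphanumeric characters are only separators, so drop them.
    listed = [c for c in order if c.isalnum()]
    missing = sorted(set(ALL_CODES) - set(listed))
    repeated = sorted({c for c in listed if listed.count(c) > 1})
    return missing, repeated
```

Both lists come back empty for the example above, which means it names every report exactly once.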
You can turn the lines in the General Summary on and off
individually using the GENSUMLINES command. The default is
GENSUMLINES ALL
meaning all available lines. You can turn lines off using a command like
GENSUMLINES -KL
(to turn off lines K & L) and turn them on again
with a command like
GENSUMLINES +K
You can specify the exact set of lines to include with a command like
GENSUMLINES CDFGHM
You now just need to know which lines have which code letters, which is given
in the following table.
(none)  Successful requests (always listed)
B       Average successful requests per day
C       Logfile lines without status code
D       Successful requests for pages
E       Average successful requests for pages per day
F       Failed requests
G       Redirected requests
H       Requests with informational status code
I       Distinct files requested
J       Distinct hosts served
K       Corrupt logfile lines
L       Unwanted logfile entries
M       Data transferred
N       Average data transferred per day
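The three forms of the GENSUMLINES argument (a leading minus, a leading plus, or a plain list) can be modelled as operations on a set of code letters. The Python below is only a sketch of that reading; apply_gensumlines is a made-up name, and it assumes the + and - forms act on whatever set of lines is currently selected.

```python
ALL_LINES = set("BCDEFGHIJKLMN")  # the code letters from the table above

def apply_gensumlines(current, argument):
    """Return the new set of General Summary lines after one GENSUMLINES command."""
    if argument == "ALL":
        return set(ALL_LINES)
    if argument.startswith("-"):
        return current - set(argument[1:])   # turn the listed lines off
    if argument.startswith("+"):
        return current | set(argument[1:])   # turn the listed lines on
    return set(argument)                     # exact set of lines to include
```

Starting from ALL, the commands GENSUMLINES -KL and then GENSUMLINES +K leave everything except line L.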
There is a command called IMAGEDIR
which tells analog where the various images used to make the report should
live. It should be a URL, not the actual location on your disk, and it should
include the final slash. For example, you could have
IMAGEDIR img/ # relative URL: within the same directory as the output
IMAGEDIR /img/ # off the root directory of your server
IMAGEDIR http://www.myother.server.com/img/ # on another server
Some people are confused about the IMAGEDIR. It's just put in the
<img> tags in the output. You can see its effect if you look at the HTML
source of the output page.
There are three commands which affect the top line of the
output. First,
the LOGO command allows you to replace the analog logo with
another image (for example, your organisation's logo). You can say
LOGO picture.gif # for this file
LOGO /images/picture2.gif # a different file
LOGO none # for no logo
The logo is assumed to be inside the IMAGEDIR unless it starts
with a slash, or contains ://
There are commands HOSTNAME and
HOSTURL which
affect the name and link at the end of the title line. For example, I might
specify
HOSTNAME "Stephen Turner"
HOSTURL http://www.statslab.cam.ac.uk/~sret1/
to generate the title "Web Server Statistics for
Stephen Turner".
Again, you can use none as the HOSTURL to specify no
link. Analog will normally translate characters in the hostname to HTML if
necessary. So to include literal HTML, such as accented characters, in the
output you need to precede them by a backslash, like this:
HOSTNAME "M\&uuml;ller & S\&ouml;hne"
There are commands called HEADERFILE and
FOOTERFILE.
These let you specify files to be inserted near the top and bottom of your
output. You can also specify
HEADERFILE none
to cancel a previously-specified header file.
Again, if the name of the HEADERFILE or FOOTERFILE
doesn't include a directory, it will be looked for in a canonical location, specified
when the program was compiled.
There is a command called STYLESHEET to
specify a style sheet for the output. This allows you to specify colours etc.
(See http://www.w3.org/Style/css/
for how to write a style sheet.) For example,
STYLESHEET /housestyle.css
STYLESHEET none # to cancel it
Hint: a common mistake in writing style sheets is to declare a font-family
for the body, but then not put <pre> sections back into a monospaced
font. This stops the columns lining up properly. Your style sheet should
contain a line like the following:
PRE, TT, CODE, KBD, SAMP { font-family: monospace }
There are three related commands called
SEPCHAR,
REPSEPCHAR and DECPOINT. These specify single characters
to be used as the thousands separator in numbers, the thousands separator
within the columns in the reports, and the decimal point. Normally, these will
be set automatically for the language you choose, but
you can change them if you want. For example, a French user might choose
SEPCHAR " "
REPSEPCHAR none
DECPOINT ,
to make "three thousand and a quarter" look like
"3 000,25" in text and "3000,25" in the reports.
There is a command called RAWBYTES. Specify
RAWBYTES ON
if you want the exact number of bytes to be listed in reports, or
RAWBYTES OFF if you want the number of kilobytes or Megabytes
as appropriate to be listed instead.
There are commands called
HTMLPAGEWIDTH, PLAINPAGEWIDTH and
LATEXPLAINWIDTH which specify the
width of the page. Which one is used depends on whether the output style is
HTML, PLAIN (including ASCII), or
LATEX. The output is not guaranteed to fit in this width, but
analog will take notice of it when choosing the width of the time graphs,
when sorting the host report alphabetically, when drawing horizontal rules,
and when writing some bits of text.
There is a command called NOROBOTS which
stops robots which obey the
robots META tag
from indexing your output page or following its links. Normally this is set to
ON but you can specify NOROBOTS OFF if you don't mind
robots finding your other pages this way. Note that you will stop far more
robots if you also put your stats page in your
robots.txt
file; on the other hand, this file has to be kept up to date by the server
administrator.
Sometimes your server is not in the same timezone as
you, or at least records the times in its logfiles in a different timezone
(for example GMT). So that you can get your
statistics in your local time, there is a command called
LOGTIMEOFFSET to change the time by a certain number of minutes. As
with the LOGFORMAT command, this only
affects logfiles which come later in the same configuration
file.
You have to be careful using this command. Because
daylight saving time is in operation in different parts of the world at
different times, analog cannot attempt to convert between different
timezones. So it's your responsibility to set the right offset for different
times of year. For example, if you were in Chicago, but your server was
recording time in GMT, you would need to specify two different time offsets,
one of minus five hours for summer and one of minus six hours for winter. You
would need to split your logfiles in the right places and then run commands
like
LOGTIMEOFFSET -300
LOGFILE summer*.log
LOGTIMEOFFSET -360
LOGFILE winter*.log
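The arithmetic behind the offset is nothing more than adding a (possibly negative) number of minutes to each logged time. A Python sketch, with apply_offset as an invented name and the dates chosen only for illustration:

```python
from datetime import datetime, timedelta

def apply_offset(logged, offset_minutes):
    """Shift a time read from the logfile by the LOGTIMEOFFSET value in minutes."""
    return logged + timedelta(minutes=offset_minutes)

# Noon GMT in summer is 7am in Chicago (offset -300).
summer = apply_offset(datetime(1999, 7, 1, 12, 0), -300)

# 3am GMT on New Year's Day is 9pm the previous evening in Chicago (offset -360).
winter = apply_offset(datetime(1999, 1, 1, 3, 0), -360)
```

Note that the offset can move a request into a different day, month or even year, which is why it affects the time reports.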
There is also a related command called TIMEOFFSET. This tells
analog how much to offset the time of the computer on which it is running
(rather than the computer running the server), to get your local time.
In the following sections we shall look at some commands for configuring the
output of particular
reports, under the following headings: Time
reports, Other reports
and Hierarchical reports.
This section is about commands which control the appearance of the time
reports. There are thirteen such reports, which show the pattern of usage
over time. Eight of them (the ones with "Report" in their name) show
the usage at specific times, whilst the other five (the "Summaries")
show the total (not average) activity at particular times of day and week over
the whole time period of the report.
By the way, in the following lists, don't get confused between the commands
for the Quarterly Report (which begin with QUARTERLY) and those for
the Quarter-Hour Report and Quarter-Hour Summary (which begin with
QUARTERREP and QUARTERSUM respectively).
Each time report can contain columns listing the
requests, requests for
pages, and bytes transferred at that time, using the following code letters.
R  Number of requests
r  Percentage of the requests
P  Number of page requests
p  Percentage of the page requests
B  Number of bytes transferred
b  Percentage of the bytes
Which columns appear in which reports is controlled by various COLS
commands. For example, the command
HOURSUMCOLS Pb
tells analog to include the number of page requests and percentage of the
bytes, in that order, as the columns for the Hourly Summary. The full list of
these COLS commands is YEARCOLS,
QUARTERLYCOLS, MONTHCOLS, WEEKCOLS,
DAYREPCOLS, DAYSUMCOLS, HOURREPCOLS,
HOURSUMCOLS, WEEKHOURCOLS, QUARTERREPCOLS,
QUARTERSUMCOLS, FIVEREPCOLS and FIVESUMCOLS.
There is also a TIMECOLS command, which
specifies that all the time reports are to have the specified columns.
Similarly, analog can plot the bar charts in the time
reports according to
the number of requests, number of page requests, or number of bytes. This
is controlled by the GRAPH family of commands. So, for example,
DAYREPGRAPH P
tells analog to plot the bar charts in the Daily Report by the number of page
requests. This also controls how analog decides which is the busiest time
period in the bottom line of the report.
Using a lower case letter tells analog to plot the bar charts with
ASCII characters instead of the normal red bars. (This produces shorter
output, and it is how they appear anyway in PLAIN and
ASCII output styles,
or when viewed with a non-graphical browser.) So, for example,
DAYREPGRAPH b
would plot the Daily Report by bytes, without using the graphics. The full
list of GRAPH commands is YEARGRAPH,
QUARTERLYGRAPH, MONTHGRAPH, WEEKGRAPH,
DAYREPGRAPH, DAYSUMGRAPH, HOURREPGRAPH,
HOURSUMGRAPH, WEEKHOURGRAPH, QUARTERREPGRAPH,
QUARTERSUMGRAPH, FIVEREPGRAPH and
FIVESUMGRAPH.
There's also an
ALLGRAPH command to set all of them simultaneously.
There are various possible graphics available for the
graphs, controlled by
the BARSTYLE command, as follows. (They will all look the same if
you have a non-graphical browser.)
BARSTYLE a
BARSTYLE b
BARSTYLE c
BARSTYLE d
BARSTYLE e
BARSTYLE f
BARSTYLE g
BARSTYLE h
The default style is b.
You can plot the graphs either forwards in time (starting
from the earliest
date) or backwards (starting from the latest date). Use commands
like
MONTHBACK ON # Monthly Report backwards
WEEKBACK OFF # Weekly Report forwards
The full list of BACK commands is YEARBACK,
QUARTERLYBACK, MONTHBACK, WEEKBACK,
DAYREPBACK, HOURREPBACK, QUARTERREPBACK and
FIVEREPBACK.
It tends to be confusing to mix directions (and analog will warn you if you
attempt it) so usually you want to use the ALLBACK command which
will set all of them at once.
For the more detailed time reports, you usually only want
to list the last
few time periods. (Every five minutes for the last three years?? I think not.)
So analog provides some ROWS commands to let you specify how many
rows you want in the time reports. For example
QUARTERREPROWS 96 # only the last day's worth
MONTHROWS 0 # 0 means no restriction: show all time
The full list of ROWS commands is YEARROWS,
QUARTERLYROWS, MONTHROWS, WEEKROWS,
DAYREPROWS, HOURREPROWS, QUARTERREPROWS and
FIVEREPROWS.
Even if a ROWS command is given, the line at the bottom of the
report will still show the busiest time period ever, not just the busiest
one in that many rows.
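In other words, the ROWS limit only affects which rows are listed, not which rows are examined. A minimal Python sketch of that behaviour, assuming the rows are held as (period, requests) pairs in time order; the function time_report is hypothetical:

```python
def time_report(rows, max_rows):
    """Return (rows to list, busiest row) for a time report.

    rows is a non-empty list of (period, requests) pairs in time order.
    max_rows is the ROWS value; 0 means no restriction.
    """
    shown = rows if max_rows == 0 else rows[-max_rows:]
    # The busiest period is found over all the rows, not just those shown.
    busiest = max(rows, key=lambda row: row[1])
    return shown, busiest
```

So with three months of data and a limit of 2, the busiest month can be one that isn't listed in the body of the report.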
The character which is used for plotting the graphs in
PLAIN and ASCII styles or on a
non-graphical browser is specified by means of the MARKCHAR
command. For example,
MARKCHAR =
tells analog to use the equals sign.
There is a parameter called
MINGRAPHWIDTH which sets the minimum
nominal size of the graphs. For example, if you set
MINGRAPHWIDTH 10
then the graph will be allowed to be up to 10 characters wide, even if
that would exceed the PAGEWIDTH.
There is one more command which affects the time
reports. You can specify
which day should be counted as the first day of the week. This affects the
layout of the Daily Report, Daily Summary and Weekly Report. For example,
our local student newspaper publishes a new edition on the web every Friday,
so they like to specify
WEEKBEGINSON FRIDAY
for their reports.
In the next section, we'll look at commands relating to the
non-time reports.
This section deals with the non-time reports. There are quite a lot of
commands which control these reports, although we've seen some of them
already.
First, these reports have COLS commands, just like the time
reports. (See the section on Time
reports for how to use these commands.) But for these reports,
several additional columns are available. Here is the full list of columns for
the non-time reports
R  Number of requests
r  Percentage of the requests
S  Number of requests in the last 7 days
s  Percentage of the requests in the last 7 days
P  Number of page requests
p  Percentage of the page requests
Q  Number of page requests in the last 7 days
q  Percentage of the page requests in the last 7 days
B  Number of bytes transferred
b  Percentage of the bytes
C  Number of bytes transferred in the last 7 days
c  Percentage of the bytes in the last 7 days
d  Date of last access
D  Date and time of last access
e  Date of first access
E  Date and time of first access
N  The number of the item in the list
So, for example,
REQCOLS NRSD
counts the files in the Request Report, listing the number of requests for
each, the number of requests for each in the last 7 days, and the time when
each was last requested. The full list of
COLS
commands for non-time reports is HOSTCOLS, REDIRHOSTCOLS,
FAILHOSTCOLS, ORGCOLS, DOMCOLS,
REQCOLS, DIRCOLS, TYPECOLS,
SIZECOLS, PROCTIMECOLS, REDIRCOLS,
FAILCOLS,
REFCOLS, REFSITECOLS, SEARCHQUERYCOLS,
SEARCHWORDCOLS, INTSEARCHQUERYCOLS,
INTSEARCHWORDCOLS, REDIRREFCOLS,
FAILREFCOLS, BROWREPCOLS, BROWSUMCOLS,
OSCOLS, VHOSTCOLS, REDIRVHOSTCOLS,
FAILVHOSTCOLS, USERCOLS, REDIRUSERCOLS,
FAILUSERCOLS and STATUSCOLS. Not
every column is allowed in every report, but if you specify an illegal one,
analog will warn you about it.
Next you need to know how to use a SORTBY command
to specify
how the reports should be sorted. There are ten possible ways of sorting
reports:
REQUESTS      total number of requests
REQUESTS7     requests within the last 7 days
PAGES         total requests for pages
PAGES7        requests for pages within the last 7 days
BYTES         total bytes transferred
BYTES7        bytes transferred within the last 7 days
FIRSTDATE     time of first request
DATE          time of most recent request
ALPHABETICAL  alphabetically
RANDOM        unsorted, sometimes useful for speed in very long reports
For example, the command
HOSTSORTBY ALPHABETICAL
will sort the Host Report alphabetically. The full list of SORTBY
commands is HOSTSORTBY, REDIRHOSTSORTBY,
FAILHOSTSORTBY, ORGSORTBY, DOMSORTBY,
REQSORTBY, DIRSORTBY, TYPESORTBY,
REDIRSORTBY, FAILSORTBY, REFSORTBY,
REFSITESORTBY, SEARCHQUERYSORTBY,
SEARCHWORDSORTBY, INTSEARCHQUERYSORTBY,
INTSEARCHWORDSORTBY, REDIRREFSORTBY,
FAILREFSORTBY, BROWREPSORTBY, BROWSUMSORTBY,
OSSORTBY, VHOSTSORTBY, REDIRVHOSTSORTBY,
FAILVHOSTSORTBY, USERSORTBY, REDIRUSERSORTBY,
FAILUSERSORTBY and STATUSSORTBY.
Again, not every sort method is possible in every
report, but you'll be warned if you choose an illegal one.
There is one known bug concerned with SORTBY ALPHABETICAL. The
report is sorted before any
output alias is
applied. This means that if an output alias has been specified for
the report, then the report may appear not to be sorted correctly.
You can also specify a FLOOR for most reports,
saying how much
activity an item needs before it is listed on the report. There are lots
of possible ways of specifying floors, which I'll list here, using the
DOMFLOOR (Domain Report FLOOR) command as an example.
Essentially each one consists of a number indicating the level of the floor,
followed by a letter indicating the floor criterion.
DOMFLOOR 1000r # all domains with at least 1000 requests
DOMFLOOR 100s # at least 100 requests within the last 7 days
DOMFLOOR 1000p # at least 1000 requests for pages
DOMFLOOR 100q # at least 100 requests for pages within the last 7 days
DOMFLOOR 1000000b # at least 1,000,000 bytes transferred
DOMFLOOR 1Mb # at least 1 megabyte
DOMFLOOR 10.5kc # at least 10.5kb within the last 7 days
DOMFLOOR 0.5%r # 0.5% of the requests (ditto %s, %p etc.)
DOMFLOOR 0.5:r # 0.5% of the maximum number of requests
# for any domain (ditto :s, :p etc.)
DOMFLOOR 970701d # last access since 1st July 1997
DOMFLOOR 970701e # first access since 1st July 1997
DOMFLOOR -00-01-00d # last access in last month (see
# documentation on FROM and TO commands)
DOMFLOOR -100r # domains with top 100 number of requests
# (ditto -100s, p, q, b, c, d, or e)
The full list of FLOOR commands is HOSTFLOOR,
REDIRHOSTFLOOR, FAILHOSTFLOOR, DOMFLOOR,
ORGFLOOR, REQFLOOR, DIRFLOOR,
TYPEFLOOR, REDIRFLOOR, FAILFLOOR,
REFFLOOR, REFSITEFLOOR, SEARCHQUERYFLOOR,
SEARCHWORDFLOOR, INTSEARCHQUERYFLOOR,
INTSEARCHWORDFLOOR, REDIRREFFLOOR,
FAILREFFLOOR, BROWREPFLOOR, BROWSUMFLOOR,
OSFLOOR, VHOSTFLOOR, REDIRVHOSTFLOOR,
FAILVHOSTFLOOR, USERFLOOR, REDIRUSERFLOOR,
FAILUSERFLOOR and STATUSFLOOR.
Once again, not every floor method is legal for
every report, but you'll be warned if you try and choose an illegal one.
I've already told you about how to turn each report on
and off from the
command line using its code letter. In fact,
you can specify the SORTBY and the FLOOR in the same
command. Take the example of the Referrer Report. If you follow the
+f (to turn the report on) with a letter, it represents the
sort method according to the following code:
- r
- REQUESTS
- s
- REQUESTS7
- p
- PAGES
- q
- PAGES7
- b
- BYTES
- c
- BYTES7
- d
- DATE
- e
- FIRSTDATE
- a
- ALPHABETICAL
- x
- RANDOM
You can then, or alternatively, use one of the above FLOOR formats
to specify the floor. If you specify a SORTBY, you can also leave
off the last letter of the floor, and analog will guess it according to the
sort method: the floor will be the same as the sort method, or by requests if
the sort method is ALPHABETICAL or RANDOM. Here are four
examples:
- +fp
- means turn the referrer report on and sort it by page
requests, but says nothing about the floor;
- +f100s
- means list all referrers with at least 100 requests
in the last 7 days, but says nothing about the sort method;
- +fb10000
- means list all referrers with at least 10,000 bytes,
sorted by bytes;
- +fa-000101d
- means list all referrers with accesses this year,
sorted alphabetically.
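The rule for guessing a missing floor letter can be sketched in a few lines of Python. This is only my reading of the rules above, not the code analog actually uses, and the function name decode_report_option is invented for the illustration. It assumes the option has the shape shown in the four examples: a plus sign, the report's code letter, an optional sort letter, and an optional floor.

```python
SORT_LETTERS = "rspqbcdeax"   # the sort codes listed above
FLOOR_LETTERS = "rspqbcde"    # criteria which can end a floor


def decode_report_option(option):
    """Decode e.g. '+fb10000' into (report letter, sort letter, floor)."""
    report, rest = option[1], option[2:]
    sort = None
    if rest and rest[0] in SORT_LETTERS:
        # Floors start with a digit or a minus sign, so a leading
        # letter can only be the sort method.
        sort, rest = rest[0], rest[1:]
    floor = rest or None
    if floor and sort and floor[-1] not in FLOOR_LETTERS:
        # Floor given without its criterion: take it from the sort method,
        # falling back to requests for ALPHABETICAL and RANDOM.
        floor += sort if sort in FLOOR_LETTERS else "r"
    return report, sort, floor
```

With this sketch, decode_report_option("+fb10000") gives ("f", "b", "10000b"), while decode_report_option("+f100s") gives ("f", None, "100s").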
Each of these reports can have a pie chart drawn at
the top of it. The charts can be turned on and off, or plotted by a different
criterion, using the CHART commands. For example,
REQCHART OFF
will stop you getting a pie chart on the Request Report, whereas
REQCHART ON
will turn it back on, and plot it by the
REQSORTBY variable (or by
REQUESTS, if the REQSORTBY is FIRSTDATE,
DATE, ALPHABETICAL or RANDOM). You can also
use the following arguments to plot the chart by a different variable.
- REQUESTS
- total number of requests
- REQUESTS7
- requests within the last 7 days
- PAGES
- total requests for pages
- PAGES7
- requests for pages within the last 7 days
- BYTES
- total bytes transferred
- BYTES7
- bytes transferred within the last 7 days
Usually, though, you will want the chart to be plotted by the SORTBY where
possible. So if the SORTBY is one of these options and you choose a
different one for the chart, analog will warn you about it.
The full list of CHART commands is
HOSTCHART, REDIRHOSTCHART,
FAILHOSTCHART, ORGCHART, DOMCHART,
REQCHART, DIRCHART, TYPECHART,
REDIRCHART, FAILCHART, REFCHART,
REFSITECHART, SEARCHQUERYCHART,
SEARCHWORDCHART, INTSEARCHQUERYCHART,
INTSEARCHWORDCHART, REDIRREFCHART,
FAILREFCHART, BROWREPCHART, BROWSUMCHART,
OSCHART, VHOSTCHART, REDIRVHOSTCHART,
FAILVHOSTCHART, USERCHART, REDIRUSERCHART,
FAILUSERCHART, STATUSCHART, SIZECHART and
PROCTIMECHART. Again, not every chart method is available for every
report. You can also use
ALLCHART ON
ALLCHART OFF
to turn them all on or off simultaneously.
The pie charts are normally written to the same directory as the
OUTFILE. But you can specify a
different location using the CHARTDIR and LOCALCHARTDIR
commands. If the OUTFILE is
standard output, or you are using the form interface,
you must use these commands, or you will not get any pie charts. Also, if you
are writing two output files to the same directory, you must use these
commands, or one set of images will overwrite the other.
You have to use both of the commands before they have any effect.
The CHARTDIR is the location of the pie chart directory on your
server, similar to the IMAGEDIR;
it's used for putting in the <img> tag to include the image. The
LOCALCHARTDIR is the location on your local disk; it's where the
image is written to. For example, you might have
CHARTDIR /images/
LOCALCHARTDIR /usr/local/apache/htdocs/images/
to put the pie charts in locations like
/usr/local/apache/htdocs/images/dom.png and link to them like
<img src="/images/dom.png">.
Actually, the CHARTDIR and LOCALCHARTDIR are just
prefixes to the filename, so you can specify something like
CHARTDIR rep1
LOCALCHARTDIR /usr/local/htdocs/stats/rep1
to put the pie charts in locations like
/usr/local/htdocs/stats/rep1dom.png and link to them like
<img src="rep1dom.png">. Also the names can contain
date codes the same as in the
OUTFILE.
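Because both settings are plain prefixes, the resulting names are formed by simple string concatenation. The following Python sketch shows the idea; it is an illustration only, the function name chart_locations is made up, and the chart file name (dom.png here) is passed in rather than worked out the way analog would.

```python
def chart_locations(chartdir, localchartdir, filename):
    """Return (path written on disk, <img> tag) for one pie chart."""
    # No separator is inserted: each value is prepended exactly as given.
    local_path = localchartdir + filename
    img_tag = '<img src="' + chartdir + filename + '">'
    return local_path, img_tag


on_disk, tag = chart_locations(
    "/images/", "/usr/local/apache/htdocs/images/", "dom.png")
print(on_disk)
print(tag)
```

This is why a prefix which doesn't end in a slash, such as rep1, becomes part of the file name rather than naming a directory.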
Here are the rules for which wedges are plotted in the pie chart. Up to ten
wedges, plus "Other", are drawn,
but wedges are only drawn if they are large enough. Also, wedges are only
drawn if the item is listed in the main table for the report. And the whole
chart will not be plotted if it would contain only one wedge.
You can list the time period covered by each report.
This is off by default because it uses a lot of memory to calculate it, but
if different reports cover different time periods (which can happen if your
log format has changed at some point), it's useful to turn it on with the
command
REPORTSPAN ON
There is also a command called REPORTSPANTHRESHOLD (which can be
abbreviated RSTHRESH). This says that each report span should only
be listed if it differs from the overall span of the whole report -- listed at
the top of the page -- by at least this many minutes at one end. For example,
REPORTSPANTHRESHOLD 60
will only list a report span if that report starts at least an hour after the
start of the logfile, or ends at least an hour before the end of the logfile.
You can set
REPORTSPANTHRESHOLD 0
to make sure that the report span is listed on all the reports.
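The threshold test can be written out as a short Python sketch. It is an illustration of the rule as described above, under the assumption that a report's span always lies within the overall span; the function name span_is_listed is invented here and is not part of analog.

```python
from datetime import datetime, timedelta


def span_is_listed(report_start, report_end,
                   overall_start, overall_end, threshold_minutes):
    """True if the report's span differs enough from the overall span."""
    threshold = timedelta(minutes=threshold_minutes)
    starts_late = report_start - overall_start >= threshold
    ends_early = overall_end - report_end >= threshold
    # Differing by the threshold at either end is enough.
    return starts_late or ends_early
```

With a threshold of 0 both differences are at least zero, so the span is always listed, which matches the behaviour of REPORTSPANTHRESHOLD 0.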
We've already seen some other commands affecting what was listed in the
non-time reports. The output
INCLUDE and EXCLUDE commands specified lines to omit
from each report, and the
output alias commands specified
some aliasing to do on the names before they were listed. There were also
LINKINCLUDE and
LINKEXCLUDE commands to control what was linked to in the
reports. You might want to have another look at these paragraphs.
There's one other command which affects the links in the
Request Report, Redirection Report and Failure Report.
The command BASEURL prepends an additional string to the URLs
in the target of the link. For example, after the command
BASEURL http://www.statslab.cam.ac.uk
/~sret1/ will be linked to
http://www.statslab.cam.ac.uk/~sret1/, not just to
/~sret1/. This is very useful if you want to display the
statistics on a different server from the server they refer to. If you want the
file to be listed as http://www.statslab.cam.ac.uk/~sret1/, rather
than just to be linked to that address, you need to use the second argument to
the LOGFILE command instead.
In the next section, we'll look at commands for generating
hierarchical reports, which are closely related
to the commands in this section.
Some of the non-time reports have a hierarchical (or tree) structure:
so, for example, each domain in the domain report can have subdomains
listed under it, which in turn can have sub-subdomains, and so on. This
section describes commands for managing hierarchical reports.
First, you need to be able to control what gets listed in the reports.
For this you need to use the SUB family of commands. So, for
example, the command
SUBDIR /~sret1/*
would ensure that the Directory Report would not only contain an entry for
the sum of my files, but also one for each of my subdirectories, something
like this:
29,111: /~sret1/
10,234: /~sret1/analog/
5,179: /~sret1/backgammon/
11,908: /~steve/
You can have more than one * in the command. For example
SUBDOMAIN *.*
would list the whole Domain Report two levels deep.
If you specify a SUB command, all the intermediate levels are
included automatically. So, for example, after
SUBDOMAIN statslab.cam.ac.uk
cam.ac.uk and ac.uk will be included in the Domain
Report too, and after *.*.ac.uk, *.ac.uk will be
included.
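As a rough Python illustration of which intermediate levels come along with a SUBDOMAIN argument (the function name implied_levels is made up, and this is a sketch of the rule rather than analog's code):

```python
def implied_levels(subdomain):
    """List the intermediate levels implied by a SUBDOMAIN argument."""
    parts = subdomain.split(".")
    # Drop one leading component at a time, stopping before the
    # top-level domain, which is in the Domain Report anyway.
    return [".".join(parts[i:]) for i in range(1, len(parts) - 1)]
```

So implied_levels("statslab.cam.ac.uk") gives ["cam.ac.uk", "ac.uk"], and implied_levels("*.*.ac.uk") gives ["*.ac.uk", "ac.uk"].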
Here are examples of the other four SUB commands:
SUBTYPE *.gz # in the File Type Report
SUBBROW */* # e.g. Mozilla/4 in the Browser Summary
SUBBROW Mozilla/*.* # add minor version numbers for Mozilla
REFDIR http://search.yahoo.com/* # Referring Site Report
SUBORG *.aol.com # Organisation Report
SUBORG *.*.com # Break down all .com's
The SUBDOMAIN command (but none of the others) can include a second
argument describing the subdomain. For example
SUBDOMAIN cam.ac.uk 'University of Cambridge'
Then that subdomain will be listed with its translation in the Domain Report.
You can also have numerical subdomains: e.g.,
SUBDOMAIN 131.111 'University of Cambridge'
If you sort the subdomains alphabetically, the numerical ones will also be
sorted alphabetically, not numerically. I don't think this will cause any
problems.
One other use for the SUBDIR command is if you have used the
second argument to the LOGFILE
command. Suppose you have translated files like /index.html into
http://www.mycompany.com/index.html. Then the command
SUBDIR http://*/*
would be appropriate to make the directory report look right.
The lower levels of each report
have FLOOR and SORTBY
commands which work exactly the same as those we have
already seen for the
top level. These commands are SUBDIRFLOOR, SUBDOMFLOOR,
SUBORGFLOOR,
SUBTYPEFLOOR, SUBBROWFLOOR and REFDIRFLOOR;
and SUBDIRSORTBY, SUBDOMSORTBY, SUBORGSORTBY,
SUBTYPESORTBY, SUBBROWSORTBY and REFDIRSORTBY.
A sub-item is listed in a hierarchical report only if it is above the
sub-FLOOR, and it is included with a SUB command,
and it is not excluded because of an
INCLUDE or
EXCLUDE command, and its immediate parent is listed. For
example, specifying
SUBDIR /*/*/
SUBDIRFLOOR -3r
SUBDIRSORTBY REQUESTS
would list the three subdirectories with most requests under each directory.
SUBDIRFLOOR 1:r would have listed any subdirectory with at least
1% of the maximum number of requests of any top level directory.
The three file reports
(Request Report, Redirection Report and Failure
Report) and the three referrer reports (Referrer Report, Redirected Referrer
Report and Failed Referrer Report) are not fully hierarchical, but they do
list search arguments together under the file to which
they refer (provided that the arguments have been read in: see the
ARGSINCLUDE command).
So they have
similar sub-FLOOR and sub-SORTBY commands, namely
REQARGSFLOOR, REDIRARGSFLOOR, FAILARGSFLOOR,
REFARGSFLOOR, REDIRREFARGSFLOOR and
FAILREFARGSFLOOR; and REQARGSSORTBY,
REDIRARGSSORTBY, FAILARGSSORTBY,
REFARGSSORTBY, REDIRREFARGSSORTBY and
FAILREFARGSSORTBY. The same
applies to the Operating System Report with its subdivisions of operating
systems: it has SUBOSFLOOR and SUBOSSORTBY.
The lower levels of a hierarchical report temporarily interrupt the top
level, and even though they are indented, this can sometimes make it look as
if the report is out of order. If you have a lot of sub-items, for example in
the Referrer Report if there are a lot of search arguments, then including the
N column can help to make it
clearer again.
That concludes the description of all the output configuration commands.
Now we move on to some other individual topics, starting with the
domains file.
The domains file tells analog which country is represented by each domain.
You can tell analog where to find your domains file with a command like
DOMAINSFILE mydomains.tab
Normally you don't need this command, because if there is a domains file in
your language, it should be selected automatically. But the
DOMAINSFILE command can be useful if you want to use a domains file
in a new language, for example.
If the name of the file doesn't include a directory, it will be
looked for wherever analog normally expects to find its language files.
You should have got a domains file with the program, but if you've lost it,
you can download one from
http://www.analog.cx/ukdom.tab.
It should contain on each line a domain code, followed by a number, followed
by its location, like this:
ad 2 Andorra
ae 3 United Arab Emirates
[...]
It does not need to be in alphabetical order, though humans may prefer it that
way. Subdomains do not go in the domains file: you can list them in the Domain
Report using the SUBDOMAIN command.
The number beside each domain represents how many
levels deep an "organisation" is
considered to be, for the purposes of the Organisation Report. For example,
consider the hostname www.sta.ad. The organisation is
sta.ad, at the second level, so Andorra has a 2 in the above
list. But in the UAE, a host looks like www.economy.gov.ae.
There is an extra level in the hierarchy, so the UAE has its organisations at
level 3.
There are some problems with this. A few countries have organisations at both
levels 2 and 3 (for example asaspace.at and
univie.ac.at). In those cases I've favoured false negatives over
false positives by using the bigger number. (Also there is a correction which
will make most of them right again: the first component is always removed from
a hostname of three or more components.) For other
countries, I don't have enough information to tell what the level should
be. I've just given those a 1. Do let me know if
you have any more information, or corrections, for the numbers.
Only domains which occur in the domains file will get
their own line in the
Domain Report: the rest are probably spurious, and will be accumulated
together as "unknown domains". If analog couldn't find the domains file, then
all the domains will be unknown. If you have
debugging
turned on, you can see which domains were unknown.
Lines starting with a hash (#) in the domains file are considered
to be comments.
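Here is a minimal Python sketch of how a domains file in this format could be read, and how the level number picks out the organisation from a hostname. It is an illustration under stated assumptions, not analog's implementation: the names parse_domains and organisation are invented, fields are assumed to be separated by whitespace, and the correction for countries with organisations at two levels is deliberately left out.

```python
def parse_domains(text):
    """Map each domain code to (organisation level, location)."""
    levels = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no data
        code, level, location = line.split(None, 2)
        levels[code] = (int(level), location)
    return levels


def organisation(hostname, levels):
    """Return the organisation part of a hostname, or None if unknown."""
    parts = hostname.lower().split(".")
    entry = levels.get(parts[-1])
    if entry is None:
        return None  # domain not in the domains file
    depth = entry[0]
    return ".".join(parts[-depth:])


sample = "# sample\nad 2 Andorra\nae 3 United Arab Emirates\n"
levels = parse_domains(sample)
print(organisation("www.sta.ad", levels))
print(organisation("www.economy.gov.ae", levels))
```

The two calls print sta.ad and economy.gov.ae, matching the Andorra and UAE examples above.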
This section describes the computer-readable output style. You can select
this style by the command
OUTPUT COMPUTER
This style is designed to be easy to read into spreadsheets, or post-process
with graphics creation tools, for example. You can find some programs which
use this style on the helper applications page.
However, the computer-readable output style is not suitable for reading back
into analog to create later reports: for that job, use the
cache files described in the next section.
Each line in the output is separated into fields by means of a special string.
You can specify this string by means of the COMPSEP command; for
example
COMPSEP ,
for CSV (comma separated value) format. Make sure not to use anything that
might occur in the output: for example, a single or double space would not be
suitable.
Each line in the computer-readable output begins with a letter indicating which
report the line is part of. (The code letters for the reports are listed in
the section on Configuring the
Output.)
After that, there follows a field indicating the remaining columns in the
report (using the letters RrSsPpQqBbCcDdEeN as usual). In
hierarchical reports (including the reports which
can show search arguments) there is an additional column l at the
beginning, indicating the level in the hierarchy.
Finally there are the numerical data for each column and then the name of the
item. Times actually take up several fields: year, month, date, hour and
minute, or as many of those as are necessary to identify the time (year and
quarter in the case of the Quarterly Report).
So here is an example line from the Domain Report, showing the third-level
domain cam.ac.uk with 43 requests and 3.516% of the bytes.
o lRb 3 43 3.516 cam.ac.uk
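If you want to post-process such lines yourself, something like the following Python sketch may be a starting point. It is a hedged illustration, not one of the helper applications: the function name parse_line is invented, the separator must be passed in to match your COMPSEP, and it only handles ordinary lines whose columns each occupy one field (so not lines with date columns, which take several fields, nor the special lines beginning with *).

```python
def parse_line(line, sep):
    """Split one computer-readable line into (report, columns, name)."""
    fields = line.rstrip("\n").split(sep)
    report, column_letters = fields[0], fields[1]
    # One value follows for each column letter, in the same order.
    values = fields[2:2 + len(column_letters)]
    # Whatever remains is the item name (rejoined in case it was split).
    name = sep.join(fields[2 + len(column_letters):])
    return report, dict(zip(column_letters, values)), name
```

Assuming COMPSEP , had been used, the example line would read o,lRb,3,43,3.516,cam.ac.uk, and parse_line would return ("o", {"l": "3", "R": "43", "b": "3.516"}, "cam.ac.uk").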
The last line of most time reports indicates the busiest time period. After the
report letter comes *BT, followed by the letter R,
P or B for the
GRAPH method, followed by the
number of requests, pages or bytes respectively for the busiest time period,
followed by the time period itself.
The first lines of non-time reports can also contain overall information about
the report. First, if the
REPORTSPAN for the report is
wanted, it will be listed in lines with *FR and *LR
instead of the normal column letters. Then there is a line listing the floor
and sortby for the report. It has *f instead of the normal
column letters, followed by the floor in the form it would be written
for a FLOOR command, followed by
the SORTBY using the code letters
- r
- REQUESTS
- s
- REQUESTS7
- p
- PAGES
- q
- PAGES7
- b
- BYTES
- c
- BYTES7
- d
- DATE
- e
- FIRSTDATE
- a
- ALPHABETICAL
- x
- RANDOM
The general summary is a bit different. After an initial x, there is
a two-character code saying what the line contains. The possible codes are
- VE
- Version of analog
- HN
- HOSTNAME
- HU
- HOSTURL
- PS
- Program start time
- FR
- Time of first request
- LR
- Time of last request
- E7
- Time last 7 days ends
- SR
- Total successful requests
- S7
- Total successful requests in last 7 days
- PR
- Total successful requests for pages
- P7
- Total successful requests for pages in last 7 days
- FL
- Total failed requests
- F7
- Total failed requests in last 7 days
- RR
- Total redirected requests
- R7
- Total redirected requests in last 7 days
- NC
- Logfile lines without status code
- C7
- Lines without status code in last 7 days
- NF
- Number of distinct files requested
- N7
- Number of distinct files requested in last 7 days
- NH
- Number of distinct hosts served
- H7
- Number of distinct hosts served in last 7 days
- CL
- Number of corrupt lines in the logfile
- UL
- Number of unwanted lines in the logfile
- BT
- Total number of bytes transferred
- B7
- Total number of bytes transferred in last 7 days
Which lines are listed is still controlled by the
GENSUMLINES command. This
implies that if you turn a line off, you turn off its "last 7 days"
version too.
Analog has the ability to archive some of the data in your
logfile into a cache file so that the logfile can be thrown away
without losing the most important data. (This is sometimes known as
incremental processing.)
For most people, the cache file will not be needed: compressing
the logfile using a standard compression utility such as gzip will be
sufficient. Compressing a logfile is very efficient owing to the large number
of repeated strings: I find about 12 times compression in practice. That in
itself may solve your filespace problems, without needing to throw away any
information.
The cache file is also not the best format for post-processing the data or
feeding it into a spreadsheet. For that you should use the
computer-readable output style.
Many people have trouble using the cache file, and end up accidentally
recording corrupt data. You do need to understand what you're doing before you
throw away your logfiles. See the discussion on
Procedures below.
If you are going to use the cache file feature, it is also very important that
you understand what is and what is not recorded.
The summary is that all INCLUDE and EXCLUDE commands,
including FROM and TO, and any ALIASes and
LOGTIMEOFFSETs, must be applied when you
create the cache file, not when you read it later. If you want
different sets of options, you must create several cache files from the same
logfile.
The reason for this is that it is not
possible to reconstruct everything of interest in the logfile from the cache
file. The cache file does contain information about the total number of
requests for each host and each file, but not about, for example, which files
were read by which hosts. (To do so would take up as much disk space as the
compressed logfile.) So you cannot later look at only one file and see which
hosts read that file.
Another way to look at this: if you do, for example, a HOSTEXCLUDE
when reading the cache file, you are not doing a genuine
HOSTEXCLUDE because files that that host read will still be
included. You are only excluding those hosts from the Host Report,
Organisation Report and Domain Report. This is why you must do all the
inclusions and exclusions you want when you create the cache file.
When analog reads in a cache file, it does not apply any more aliases to the
items. This is to avoid double-aliasing.
So you must do any aliases you want at the time
you create the cache file. Similarly, it does not obey the
LOGTIMEOFFSET variable, to
avoid
double-offsetting, so any offset you want must be applied at cache-creation
time too.
Also, the cache file does not contain data about the number of requests for
each item in the last seven days: it can't, because the figures will be
different at the time the report is created.
Finally, times are only recorded to five-minute resolution.
You can create a cache file by setting the CACHEOUTFILE to be
the file you want the cache to live in. Set
CACHEOUTFILE none
to turn it off again. You will still get the regular output as well as the
cache output, unless you request OUTPUT
NONE. To avoid overwriting, you cannot set the
CACHEOUTFILE to be a file which already exists. (Disclaimer: on
some systems, race conditions may very occasionally thwart this check. Also
on a few systems, making the file writeable but not readable will allow it to
be overwritten). You can include the date in the name of the
CACHEOUTFILE in the same way as described earlier for the
OUTFILE.
You can read in a previously-made cache file with the CACHEFILE
command, or with the +U command line option. This works exactly the
same as the
LOGFILE command, so you can use commas
and wild cards to read in several cache files, and read compressed cache
files using the UNCOMPRESS mechanism.
If the name of the CACHEFILE or the CACHEOUTFILE doesn't
include a directory, it will be looked for, or written to, wherever analog
expects to find its cache files. (This location is built in when the program
is compiled.) For example, on Windows it would be in the same folder as the
analog executable.
It is possible (and useful) to make a report from some CACHEFILEs
and some LOGFILEs. LOGFILE and CACHEFILE
commands are basically cumulative, with one exception which depends on where
they are specified. Any logfiles and cache files in the
mandatory configuration file (or configuration files loaded from there)
override any on the command line or in configuration files specified on the
command line. Those in turn override any in the default configuration
file (or configuration files loaded from there), which in turn override
compile-time options.
Usually you don't need to worry about this, and it will do what you expect.
Sometimes you don't want to record all the types of item in the cache file.
You might want to forget about which hosts had accessed your web site, for
example, and only remember how many times each file was requested. You can
choose not to include one type of item in the cache file by setting its
LOWMEM to 3; for example, specify
HOSTLOWMEM 3
to exclude hosts from the cache file. Because this is a serious
step, analog will produce a warning if you do this. You can even set all six
LOWMEMs to 3 if you just want to remember the pattern of requests
over time, not even which files were requested.
Many people have trouble when they try and use cache files, and end up
omitting data or double-counting. You have to be careful to make sure that
each piece of data is recorded in exactly one cache file. One very common
mistake is to use all the old cache files when making each new cache file.
Because each piece of data is then in all of the cache files, when you make a
new cache file, it will record each piece of data several times over. If analog
gives you a "double-counting" warning when you create a cache file,
you have probably done something of this sort wrong.
Here is one way to use the cache files correctly. It's not the only correct
way, but I think it's conceptually the simplest. The idea is that whenever you
start a new logfile, you make a cache file out of the old logfile. So each
cache file contains all the data from one, and only one, logfile. You never
use old cache files to make new ones: so you never have a CACHEFILE
and a CACHEOUTFILE command in the same configuration file.
Here is the procedure.
- Rotate your logs: that means, archive the old logfile, and restart the
server with a fresh logfile. (There are several standard tools to do
this; otherwise, see your server documentation.)
- Make both a cache file and an ordinary report from the old logfile. You
can do this simultaneously by using one LOGFILE command, one
OUTFILE command, and one CACHEOUTFILE command.
- Make a test report from the cache file (using CACHEFILE
and OUTFILE but no LOGFILE) and compare
it against the report from the logfile to check it works. (This step
really is worth doing!)
- Now you can throw away the old logfile, if you've really understood what
data you're losing by doing so. (But please remember that I can take no
responsibility if something goes wrong: see the
licence.)
- When you want to make the main report, you can now use all your cache
files and the current (not-yet-cached) logfile.
As explained above, all INCLUDE and EXCLUDE commands,
including FROM and TO, and any ALIASes and
LOGTIMEOFFSETs, must be applied when you create the cache file, not
when you read it later. So you may want to create several cache files from each
logfile with different sets of options. Of course, in this case, you mustn't
later mix cache files made with different options.
Sometimes a logfile contains numerical IP addresses - like 131.111.20.59 -
for the computers that have visited you, instead of names like
lion.statslab.cam.ac.uk. This section describes how you can get analog
to do so-called DNS lookups to translate these numbers into names.
This relies on you having a suitably configured system: DNS lookups are
not possible on some systems.
DNS lookups are typically rather slow, because your computer
has to ask across the network to find out the names of the hosts. For this
reason, analog saves the addresses it has looked up in a file, so that you
don't have to look them up again next time. The file is specified by a command
like
DNSFILE dnscache
You will still need to use one of the commands in the next paragraph
in order to actually use the file.
If the name of the DNSFILE doesn't include a directory, it will be
looked for wherever analog expects to find its DNS files. (This
location is built in when the program is compiled.) For example, on
Windows it would be in the same folder as the analog executable.
There are four possible levels of DNS activity. If you specify
DNS NONE, no numerical addresses will be resolved. If you specify
DNS READ, then analog will read the DNS file for old lookups, but
no new lookups will take place. This mode is suitable if you are running
analog while not connected to the internet. The third level is
DNS WRITE. This reads the old file, looks up new addresses,
and adds them to the file. (The first time you use DNS WRITE, you
will get a missing-file warning as it tries to read the old file, but it will
exist the next time.) The final level is DNS LOOKUP. This
reads the old file and looks up new addresses, but doesn't add the new
addresses to the file, so that they will not be remembered for next time.
This is not normally a level that the user wants to specify, but analog will
switch to this behaviour if DNS WRITE fails for some reason.
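The four levels can be summarised as a small table. The Python sketch below records, for each level, whether the DNS file is read, whether new lookups are made, and whether new lookups are saved; it also shows the fallback from WRITE to LOOKUP. The names DNS_LEVELS and effective_level are invented for this illustration and do not appear in analog.

```python
DNS_LEVELS = {
    #          (read file, new lookups, save new lookups)
    "NONE":   (False, False, False),
    "READ":   (True,  False, False),
    "WRITE":  (True,  True,  True),
    "LOOKUP": (True,  True,  False),
}


def effective_level(requested, can_write):
    """Return the level actually used, given whether writing is possible."""
    if requested == "WRITE" and not can_write:
        # New addresses are still looked up, but not remembered.
        return "LOOKUP"
    return requested
```

For instance, effective_level("WRITE", False) gives "LOOKUP", while every other combination returns the requested level unchanged.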
If you are using a HOSTEXCLUDE command,
you need to exclude the numerical IP address if it can't be resolved, or the
name if it can. In other words, exclude whatever the host is known as in the
report.
If two copies of analog were allowed to write to the
DNS file at the
same time, the file could become corrupted. So when analog is running in
DNS WRITE mode, it creates a lock file which tells other
copies of analog to back off to DNS LOOKUP. You can change the
location of that file with the command
DNSLOCKFILE filename
Of course you should make sure that all copies of analog use the same lock
file, at least if they have the same DNS file!
Again, if the name of the DNSLOCKFILE doesn't include a directory,
it will be put in a canonical location, specified when the program was compiled.
If analog crashes, it may not
clear up the lock file, so in that case you may have to delete it yourself.
(Disclaimer: on some systems, race conditions may occasionally thwart this
mechanism, but this is very unlikely.)
Analog never deletes anything from the DNS file: this means that the DNS
file will grow, and can become quite large. You should delete the top of
it every so often. There is a program on the helper
applications page to help you do this more systematically.
There are two parameters which say how long to trust
old lookups for. If you set
DNSGOODHOURS 672
for example, then successful lookups will be checked again after 672 hours
(4 weeks). You can also set the DNSBADHOURS similarly, to check
failed lookups again after a certain time. By default the
DNSBADHOURS is 336 (2 weeks) and the DNSGOODHOURS is a
very large number (so that successful lookups are never rechecked, as long as
they remain in the DNS file).
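In other words, the two parameters set separate expiry times for successful and failed lookups. A minimal Python sketch of that decision follows; the function name needs_recheck is made up, the DNSGOODHOURS value must be supplied because its default is only described as very large, and whether analog rechecks at exactly the limit or only after it is an assumption here.

```python
def needs_recheck(age_hours, resolved, good_hours, bad_hours=336):
    """Decide whether an old lookup should be tried again.

    age_hours  -- hours since the lookup was recorded
    resolved   -- True if the lookup found a name, False if it failed
    good_hours -- the DNSGOODHOURS setting
    bad_hours  -- the DNSBADHOURS setting (336 by default)
    """
    limit = good_hours if resolved else bad_hours
    return age_hours > limit
```

With DNSGOODHOURS 672, a successful lookup made 700 hours ago would be checked again, and so would a failed lookup made 400 hours ago under the default DNSBADHOURS.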
On some platforms (maybe only Unix) you can set a
parameter called DNSTIMEOUT. If the DNS server still hasn't
returned a reply within this many seconds, then the lookup will be
aborted. Setting
DNSTIMEOUT 0
removes the timeout.
Finally, there is a debugging command,
DEBUG +D to show all the DNS lookups that analog is making.
Because analog's DNS lookups use only standard, platform-independent code,
you may find that they are not the best solution for you. This is especially
true on platforms without the DNSTIMEOUT command.
There are lots of other programs to perform the DNS lookups on the
helper applications page. Because these tend to be
optimised for particular platforms, you may find them faster.
Normally you need never write a DNS file: you should rely on analog to do it
for you. But in case you need to know, the format of the file is
timestamp IP_address name
where the timestamp is the number of minutes since the beginning of 1970, GMT
(i.e., "Unix time" divided by 60), and the name is just *
if the address couldn't be resolved.
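Should you ever need to inspect a DNS file, a line in this format can be decoded with a few lines of Python. This is a sketch under the assumption that the three fields are separated by whitespace; the function name parse_dns_line is invented for the illustration.

```python
from datetime import datetime, timezone


def parse_dns_line(line):
    """Decode one DNS file line into (time of lookup, address, name)."""
    stamp, address, name = line.split()
    # The timestamp is in minutes since 1970, so multiply by 60
    # to get ordinary Unix time in seconds.
    looked_up = datetime.fromtimestamp(int(stamp) * 60, tz=timezone.utc)
    # A name of * marks an address which could not be resolved.
    return looked_up, address, (None if name == "*" else name)
```

For example, a timestamp of 16305120 corresponds to midnight GMT on 1st January 2001, since 16305120 times 60 is 978307200 seconds.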
This section describes how to run analog with lower amounts of memory. For
a normal logfile this will make analog run a bit slower. But if your computer
is running out of memory when running analog, it will go very slowly indeed:
so for large logfiles, this can make analog run much faster, or even make an
analysis possible that wouldn't otherwise be possible.
Recall what happens to an item when it has been read in. First it is
aliased. Secondly, it is checked to see whether
it is included or excluded. Then finally, if all
the items are wanted, one request is added to its score.
Normally the name of the item is saved before the aliasing takes place. This
avoids analog having to do the aliasing again next time the same item is
encountered. But this can take up more memory than necessary. So there is a
family of LOWMEM commands provided, which tell analog to record the
name at a later stage, or even not at all. If you use these commands, analog
will have to do a bit more work than normal, but it will use less memory.
On most sites, the hosts take up most of the memory, so I'll use the
HOSTLOWMEM command as an example.
The command
HOSTLOWMEM 0
represents the normal case, when the hostname is recorded before being aliased.
If you specify
HOSTLOWMEM 1
instead, then the hostname is not recorded until after the aliasing. If you
specify
HOSTLOWMEM 2
then the name is not recorded until after the inclusion and exclusion lookup
has been done as well. And finally, if you give the command
HOSTLOWMEM 3
then the hostname is not saved at all, and the Host Report will not be
constructed, even if you've asked for it. (The Domain Report can still be
constructed though.) The analogous commands for the other items are
FILELOWMEM, BROWLOWMEM, REFLOWMEM,
USERLOWMEM and VHOSTLOWMEM.
So what should you do if analog runs out of memory? First, look in your
logfile to see which items are taking up all the memory. If you have lots
of different filenames, ones you generate on the fly for example, you would
want to use the FILELOWMEM command. Maybe you could combine all
the similar filenames into one with a FILEALIAS command, and use
FILELOWMEM 1. (If you have lots of different filenames caused by
different search arguments, then using ARGSEXCLUDE might solve your
problem without any need to use LOWMEM at all.) But for most users, it
is the hostnames which cause the problem. If you only want to analyse requests
from certain hosts, then you could use HOSTLOWMEM 2 to exclude
the others before recording those that are left. If you don't want to exclude
any hosts, and you haven't got enough memory to record all the different
hostnames, then HOSTLOWMEM 3 would be appropriate.
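The trade-off between the four levels can be illustrated with a toy model. This is only a sketch in Python of the behaviour described above, not analog's actual implementation: the function simulate, the lower-casing alias rule, the ".com" inclusion rule and the hostnames are all invented for illustration.

```python
def simulate(requests, alias, wanted, lowmem):
    """Toy model of one item type (say hosts) at a LOWMEM-style level 0-3.

    alias  -- hypothetical function turning a raw name into its aliased name
    wanted -- hypothetical function saying whether an aliased name is included
    Returns how many names were remembered, how often the alias rule ran,
    and how many requests were counted.
    """
    remembered = set()   # names kept in memory
    alias_cache = {}     # level 0 only: raw name -> aliased name
    alias_calls = 0
    counted = 0
    for raw in requests:
        if lowmem == 0:
            # Raw name is recorded first, so aliasing runs once per distinct name.
            if raw not in alias_cache:
                alias_cache[raw] = alias(raw)
                alias_calls += 1
            name = alias_cache[raw]
            remembered.add(raw)
        else:
            # Nothing was recorded before aliasing, so alias every time.
            name = alias(raw)
            alias_calls += 1
            if lowmem == 1:
                remembered.add(name)   # recorded after aliasing
        if not wanted(name):
            continue
        if lowmem == 2:
            remembered.add(name)       # recorded only once it is known to be wanted
        counted += 1                   # level 3 records no name at all
    return {"remembered": len(remembered),
            "alias_calls": alias_calls,
            "requests": counted}


hosts = ["a.example.com", "A.EXAMPLE.COM", "b.example.org", "a.example.com"]
for level in range(4):
    print(level, simulate(hosts, str.lower, lambda h: h.endswith(".com"), level))
```

In this model the number of requests counted is the same at every level. Only the number of names held in memory goes down, and the number of times the alias rule has to be applied goes up.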
This section lists commands to help you debug analog, if you think it's
going wrong. There's another section later which lists all the
errors and warnings which analog can generate,
and what they all mean, and another section which tells you
how to report bugs.
First, remember the option we mentioned before, to list the current settings
of all of analog's variables. To get this, just put -settings on
the command line, or SETTINGS ON in one of your configuration
files, along with your other commands. Then analog will produce the list of
settings instead of running in the normal way.
There are commands which control how much debugging information and warning
information analog gives out while it is running. By default you get all the
warnings and no debugging, but you can change this by means of the commands
DEBUG and WARNINGS. If you say
DEBUG ON
you get all the debugging. (And DEBUG OFF turns it all off.)
You can also get just certain categories of debugging. The categories are
- C: list all corrupt logfile lines
- D: information about DNS lookups
- F: information about file opening and closing
- S: summary information about each logfile when it's closed
- U: list unknown domains
- V: list hosts without a domain (i.e., without a dot)
So, for example, the command
DEBUG FS
would give you information about file opening and closing, and what was in
each logfile, but none of the other sorts of debugging. Each line of debugging
information is prepended with its code letter. You can also specify
DEBUG +CD
to add C and D category debugging to whatever you've
already got, and
DEBUG -CD
to remove those two categories.
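To make the four forms of the argument concrete, here is a small Python sketch of how they change the set of active categories. It only models the syntax described above; the function name apply_debug is made up, and rejecting unknown letters with an exception is my assumption, not a description of what analog itself does.

```python
ALL_DEBUG = set("CDFSUV")  # the six categories listed above


def apply_debug(current, arg):
    """Return the set of debug categories after a command `DEBUG arg`."""
    if arg == "ON":
        return set(ALL_DEBUG)
    if arg == "OFF":
        return set()
    sign, letters = (arg[0], arg[1:]) if arg[0] in "+-" else ("", arg)
    unknown = set(letters) - ALL_DEBUG
    if unknown:
        # Assumption: this sketch simply refuses letters it doesn't know.
        raise ValueError(f"unknown debug categories: {sorted(unknown)}")
    if sign == "+":
        return current | set(letters)   # add to whatever is already on
    if sign == "-":
        return current - set(letters)   # remove just these categories
    return set(letters)                 # select exactly these categories


state = set()                     # the default: no debugging
for arg in ["FS", "+CD", "-CD"]:
    state = apply_debug(state, arg)
    print(arg, sorted(state))
```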
There is also a command line abbreviation for this command. Use
+V (for ON), -V (for OFF),
+VFS (to select exactly options FS), +V+FS
(to add those options), and +V-FS (to remove them).
The C messages actually come on two lines. The first line gives the
logfile line which was corrupt. The second line indicates where analog first
noticed a problem. (This is usually, but not always, close to where the
problem actually was!) In fact, each "line" of the message may spread over
more than one line on your screen, and you have to be careful to take that into
account when trying to find out where the logfile line was corrupt.
The WARNINGS command acts similarly to the
DEBUG command (see the syntax above). By
default all warnings are on. As well as WARNINGS ON and
WARNINGS OFF, you can turn warnings on and off in the following
categories.
- C: invalid configuration specified
- D: dubious configuration specified
- E: ERRFILE command used (see below)
- F: files missing or corrupt
- L: apparent problems in logfiles
- M: possible problems in logfiles
- R: turning off empty reports
See the section on Errors and
warnings for more details about the various categories. Again,
warnings are printed with their code letters.
Warnings range from the probably harmless to the usually serious.
So it is generally not a good idea to turn all warnings off, or you might miss
some important information. If you want to ignore a particular warning, turn
just its category off.
There is also a command line version of the WARNINGS command,
looking like +q, -q, +q<options>,
+q+<options> or +q-<options>. (The syntax is
the same as the +V command above.)
There is one more command which is useful when
trying to debug analog. If you give the command
PROGRESSFREQ 20000 # say
then analog will produce a little message after every 20,000 lines it reads
from the logfile. This is useful to determine whether the program has really
stopped or (as is more likely) is just being slow for some reason (such as
using DNS lookups).
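The idea behind this command is just a line counter. Here is a minimal Python sketch of the same technique, in case you want to add it to a script of your own; the function name read_with_progress and the wording of the message are invented, and analog's own message may well look different.

```python
import io
import sys


def read_with_progress(lines, progress_freq=20000, out=None):
    """Yield each line, reporting after every `progress_freq` lines read."""
    out = sys.stderr if out is None else out
    for count, line in enumerate(lines, start=1):
        if progress_freq > 0 and count % progress_freq == 0:
            print(f"Read {count} lines", file=out)  # message wording is made up
        yield line


# Pretend logfile of 50,000 lines; collect the messages instead of printing them.
messages = io.StringIO()
total = sum(1 for _ in read_with_progress(range(50000), out=messages))
print(total, messages.getvalue().splitlines())
```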
To start with, all these messages go to standard error, which is normally
just the screen. But you can change that by means of a command like
ERRFILE newfile
If you do this, analog will warn you that it's redirecting the messages, just
so that you don't miss any. To change back to standard error, use
ERRFILE stderr
The ERRFILE command will also erase any previous contents of that
file. (So don't use the same ERRFILE command twice, or you may lose
messages!)
If the name of the ERRFILE doesn't include a directory, it will be
put in whichever directory was specified for that purpose when the program was
compiled. For example, on Windows it would be in the same folder as the analog
executable.
There is a command called ERRLINELENGTH
to tell analog the width of screen you want these messages to fit in. As a
special case,
ERRLINELENGTH 0
specifies an unlimited screen width.
There is just one more section about analog's configuration commands and
command line arguments, but it's a rather long one, on the
form interface. (This is a way of running analog by
selecting options from a web page.) You might prefer to go straight onto the
section on What the results mean.
The form interface provides an HTML front end to analog, on Unix or Windows
platforms (and maybe others). That means that users can select options from a
web page, instead of having to create a configuration file.
Important: For security
reasons, you must not attempt to run analog itself as a CGI
program, or even leave it in the directory or folder with your web files or
CGI programs. When the form interface runs analog for you, it checks that
analog isn't given any dangerous options. Without this check, your system
could be vulnerable to attack.
Please don't try and set up the form until analog has been set up and is
running properly on its own. It just adds another level of complexity to
troubleshoot. And unlike analog itself, the form interface will not
run "out of the box". You have to read this section to find out how
to set it up.
The form interface is suitable for ordinary users to use, but it needs to be
set up by a system administrator or other expert. In order to set it up,
you have to be running a web server. You need to know what CGI programs
are, where they live on your server, and how to set up their permissions
properly. You also need to know how to write HTML forms. I shall assume this
level of background knowledge for the rest of this section. And you have to be
running Perl 5.001 or later: see Technical
details below for other system requirements. (Actually, if you're
on Windows and don't have Perl, you can download an executable version of the
form interface from the helper applications page.)
Warning: CGI programs can contain security loopholes which
allow an unscrupulous user to harm your system. (If you don't know about this,
you shouldn't be running CGI programs at all. Read and understand the
World Wide Web Security FAQ and the CGI Security FAQ first.) I have tried to
make this form interface safe, but I
cannot guarantee it. Even the most carefully-designed CGI programs can
accidentally have serious security bugs. And I take no responsibility if
anything goes wrong: you use it at your own risk. (See the
licence.) Furthermore, you should be aware that
unless you take special measures like password protection or limiting
anlgform.pl to specific hostnames, setting up the form interface
implies making analog executable, and your logfiles analysable, by anyone on
the internet. There are more notes on security design
in this program towards the end of this section.
The form interface consists of two parts: a form (called
anlgform.html) to choose the options, and a CGI program (called
anlgform.pl) to pass them to the analog
program. Both anlgform.html and anlgform.pl must
be configured to your system before they will work at all. There are
instructions at the top of both files explaining how to do this.
The form which is distributed with the program should only be regarded as an
example form. You can find forms in languages other than English in the
lang directory. Or you can write your own if you prefer. In fact
you don't actually need the form at all: if you want just to create a link
to the CGI program, with the arguments passed after a question mark in the URL
in the usual way, then that's fine.
Almost every analog configuration command can be specified on the form, just
by including a form element with that name on the form. So, for example, if
you wanted to add a field for users to choose a logfile, you could write
Logfile name: <input type=text name="LOGFILE">
or maybe something like
<select name=LOGFILE size=1>
<option value="/var/log/apache/fred"> Fred's logfile
<option value="/var/log/apache/jane"> Jane's logfile
</select>
There are a few commands which you can't specify on the form for security or
performance reasons. The full list is *LOGFORMAT,
LANGFILE, HEADERFILE, FOOTERFILE,
UNCOMPRESS, OUTFILE, CACHEOUTFILE,
ERRFILE, DNS and SETTINGS;
and the person setting up the form can add more. There are also certain
arguments you can't give to commands: the most important is that you can't
include the wildcard * in the LOGFILE. (Of course you
can still specify several logfiles with a comma-separated list, or with a
<select multiple>). See the
security notes below for the reasons for these
exclusions, and for some more commands you might want to add to the forbidden
list.
Some commands are most conveniently specified in two halves. First, there are
commands which take two arguments (for example
ALIASes). You can cope with these by
sending two commands from the form, called COMMAND1 and
COMMAND2. For example,
Alias this file: <input type=text name="FILEALIAS1">
To this one: <input type=text name="FILEALIAS2">
You can only specify one such pair this way; so there's no way to specify
several of the same ALIAS, for example.
Then there are FLOOR commands. To
avoid users of the form having to know the syntax of these commands, you can
if you want specify them in two halves, FLOORA and
FLOORB, and they will be stuck together. For example, the form
distributed with the program specifies
<br>Include all domains with at least
<input type=TEXT name="DOMFLOORA" maxlength=6 size=6>
<select name="DOMFLOORB">
<option value=r>requests
<option value=p>requests for pages
<option value=b selected>bytes
</select>
If DOMFLOORA contains 5% and DOMFLOORB
contains r, then DOMFLOOR 5%r will be sent to the
program. (Or DOMFLOORA=5 and DOMFLOORB=%r would work
too, if you chose to present the form that way.)
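The two kinds of "halves" can be sketched as follows. This is a Python illustration of the rule described above, not the code in anlgform.pl (which is written in Perl); the function name combine_form_fields is invented, and the sketch ignores questions such as quoting arguments which contain spaces.

```python
def combine_form_fields(fields):
    """Turn form fields (name -> value) into configuration command lines.

    NAME1/NAME2 pairs become `NAME value1 value2`, and
    xFLOORA/xFLOORB pairs become `xFLOOR valueAvalueB`.
    """
    commands = []
    for name, value in fields.items():
        base = name[:-1]
        if name.endswith("FLOORA"):
            # Stick the two halves together with nothing in between.
            commands.append(f"{base} {value}{fields.get(base + 'B', '')}")
        elif name.endswith("FLOORB") and base + "A" in fields:
            continue  # already used with its A half
        elif name.endswith("1") and base + "2" in fields:
            commands.append(f"{base} {value} {fields[base + '2']}")
        elif name.endswith("2") and base + "1" in fields:
            continue  # already used with its first half
        else:
            commands.append(f"{name} {value}")
    return commands


form = {
    "FILEALIAS1": "/index.html",
    "FILEALIAS2": "/",
    "DOMFLOORA": "5%",
    "DOMFLOORB": "r",
    "LOGFILE": "/var/log/apache/fred",
}
for line in combine_form_fields(form):
    print(line)
```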
There are a couple of extra non-analog commands which can
be sent from the form. First, if the option qv=1 is set, then
analog is not run, but a list of the configuration commands which would have
been sent to analog is printed instead. This is useful for checking that the
CGI program is working properly. It can also allow users to produce a
configuration file from form settings.
Secondly, you can specify other configuration files to be included at specific
times. When analog is called by the CGI program, it first processes the
default configuration file as usual.
Then it processes any configuration file specified by an option with name
cg. Then it processes all the other commands which the CGI program
specifies. After that, it processes any configuration file specified by an
option with name cm. Finally, it processes the
mandatory configuration file as usual.
(You may therefore want two copies of analog, one for form use and one for
non-form use, with different configuration files compiled in.) Note that
the commands in the default and mandatory configuration files will contribute
to the configuration: some of them may even override options specified on the
form. For example, if the default configuration file contains an
INCLUDE command, this may cause
INCLUDE and EXCLUDE commands specified on the form to
behave unexpectedly.
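The order of processing matters because later commands can override earlier ones. As a reminder of the sequence described above, here it is as a short Python sketch; the function name processing_order and the example file names are hypothetical.

```python
def processing_order(form_commands, cg=None, cm=None):
    """List, in order, what analog processes when run from the form."""
    steps = ["default configuration file"]
    if cg:
        steps.append(f"configuration file {cg} (from option cg)")
    steps.extend(f"form command: {c}" for c in form_commands)
    if cm:
        steps.append(f"configuration file {cm} (from option cm)")
    steps.append("mandatory configuration file")
    return steps


for step in processing_order(["LOGFILE /var/log/apache/fred"],
                             cg="early.cfg", cm="late.cfg"):
    print(step)
```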
anlgform.pl usually sends the commands to analog in the order in which
it received them, which should be the same as the order they occurred in the
form. But there are some exceptions. First, all commands of the same name are
grouped together. So an interleaved sequence of INCLUDEs and
EXCLUDEs won't work, for example. Secondly, even though the names
of commands are case-insensitive, commands of the same name but in different
cases may come in the wrong order. Keep them in the same case! Thirdly,
WARNINGS and LOGTIMEOFFSET are sent first (and thus the
LOGTIMEOFFSET applies to any logfiles specified on the form).
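To see why interleaved commands don't survive, here is a Python sketch of the reordering just described. It is a model of the behaviour, not the Perl code in anlgform.pl, and the function name order_commands is made up. Names are compared exactly, so the same command written in different cases would form two separate groups.

```python
def order_commands(pairs):
    """Reorder (name, argument) pairs the way the text describes.

    Commands with the same name are grouped together, in order of first
    appearance, and WARNINGS and LOGTIMEOFFSET are moved to the front.
    """
    groups = {}
    for name, arg in pairs:
        groups.setdefault(name, []).append(arg)
    first = [n for n in ("WARNINGS", "LOGTIMEOFFSET") if n in groups]
    rest = [n for n in groups if n not in first]
    return [(n, a) for n in first + rest for a in groups[n]]


submitted = [
    ("FILEINCLUDE", "/a/*"),
    ("FILEEXCLUDE", "/a/b/*"),
    ("FILEINCLUDE", "/a/b/c"),
    ("LOGTIMEOFFSET", "60"),
]
for name, arg in order_commands(submitted):
    print(name, arg)
```

In the output the two FILEINCLUDE commands end up next to each other, so the FILEEXCLUDE no longer sits between them as it did on the form.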
There are a couple of commands which the form always
sets. These may override what you have set elsewhere. First, it sets either
DNS READ (if a DNSFILE is set on the form) or DNS
NONE (otherwise). You can override this behaviour in the mandatory
configuration file, but you are likely to run into timeout problems if you
do. Secondly, it always sets WARNINGS FL, so that the less
important warnings don't fill up your server's error log. You can override
this by sending an explicit WARNINGS command from the form.
There is one small point about compressed
logfiles. For security reasons, when using the form interface you need to
specify the full pathname to the uncompression command in the
UNCOMPRESS command in your
configuration file.
Here is what to do if you are having problems setting up the form interface.
First, does analog run properly on its own without anlgform?
Next, you can run anlgform.pl from the (DOS or Unix) command
line. This is good enough to debug most problems. You can specify
options in pairs like this:
anlgform.pl qv=1 LOGFILE=/some/log REQINCLUDE=pages
If you include qv=1 in the argument list as above, you will see
what anlgform.pl is trying to send to analog. If you don't include
qv=1, anlgform.pl will try and run analog.
If it still doesn't work, check the following points:
- Have you edited anlgform.pl and anlgform.html as
instructed at the top of those files?
- Do other CGI programs work on your server? Is anlgform.pl in
the right place to be recognised as a CGI program by the server?
- Look in the server's error log for clues. You might want to set
WARNINGS ON before you do this, because by default only
warnings in categories F and L are reported.
- Sometimes it's helpful to set the ERRFILE in your analog
configuration file (it won't work from the form) to catch any errors and
warnings which may be getting lost. This is especially true on IIS which
incorrectly sends errors to the browser instead of to an error log. If
you are using Internet Explorer you will probably also need to disable
the "friendly" error messages so that you can see the actual
error message.
- Are all relevant files (analog itself, logfiles, configuration files,
auxiliary files such as domain files...) executable/readable by your web
server?
- If some form options don't seem to take effect, then check whether they
are being overridden by a command in a configuration file.
- If you get a long wait and then no data is returned, the server is probably
timing out the request before analog has finished. The remedy is to
increase the timeout interval.
- As explained above, the form always sets
DNS READ or DNS NONE, and WARNINGS FL,
overriding your default configuration file.
- Again as explained above, uncompressing of
compressed logfiles doesn't work unless you use the full pathname in the
UNCOMPRESS command.
As I said above, CGI programs can often contain security loopholes. (See the
World Wide Web Security FAQ and the CGI Security FAQ for more on this.)
Although I don't guarantee that the form interface is safe, I
have done my best to make it so. Here I shall explain my design decisions.
Comments on them are of course welcome: if they need to remain confidential,
you can email me privately at analog-author@lists.isite.net.
First, you should think about who can run the form interface. Unless you take
special measures like password protection or limiting anlgform.pl
to specific hostnames, adding the form interface to your site implies making
analog executable, and your logfiles analysable, by anyone on the internet.
There are obvious concerns both about privacy and about the load on your
system.
Certain commands are ignored by anlgform.pl and not passed to
analog. The list of them can be found at the top of anlgform.pl.
Here are the reasons for them. HEADERFILE and FOOTERFILE
would place any file on your system within the output. The
*LOGFORMAT commands would also allow any file to be read, because
someone could designate each line to be a single filename and then just list
the filenames. OUTFILE, CACHEOUTFILE and
ERRFILE would allow people to write to your filespace;
ERRFILE would also divert errors away from your error log.
UNCOMPRESS would allow a user to execute any command. DNS
is forbidden because setting it higher than READ would normally
cause the process to time out.
None of the above should be removed from the forbidden list (unless you are really, really sure that
it's completely impossible for anyone other than yourself to run
anlgform.pl). There are two other commands which are forbidden by
default but which you could consider removing from the forbidden list.
SETTINGS is included because it will give away the locations of
some files on your system. But it is useful for diagnostic purposes, and you
could consider removing it temporarily if you have trouble setting up the
form. The other command which is included is LANGFILE, although I
consider it to be a lower risk. It is included because it is theoretically
possible that another file could be exactly the right number of lines long to
be accepted as a language file, and then parts of it would get into the
output. But it would have to be exactly the right length first. If that's a
risk you're prepared to take, you can remove LANGFILE from the list.
There are other commands which you might consider adding to the list. For
example, it is theoretically possible (though rather unlikely), that another
file on your system could conform sufficiently closely to one of the
predefined log formats that analog could be persuaded to analyse it and so
reveal some of its contents. If you're worried about this, or even if you want
to force only one particular logfile to be analysed from the form, you can add
the LOGFILE command to the list of forbidden commands. And you could
add DOMAINSFILE for similar reasons.
You can of course add any command you like to the list. For example,
a user can use any configuration file on your system unless you add all of
CONFIGFILE, CM and CG.
Or if you wanted to stop a user having control of which warnings were written
to the error log, you could add WARNINGS. If you add a command, you
must also add any aliases for it. Have a look in the source file
globals.c for the same command under different names -- some
commands have legacy names which I don't admit to in the documentation.
For those who know about CGI security issues, here are some more technical
comments on my design. anlgform.pl sets the $PATH
environment variable to be empty. It opens analog as a pipe in
order to pass arguments into analog's standard input. User-specified data is
not used for the open() function, only passed down the pipe.
anlgform.pl is run with the -T flag on Unix. (Does
anyone know how to get this working under Windows?)
The arguments to LOGFILE and CACHEFILE commands are
checked for containing only certain allowed characters (specifically, letters,
digits, /\.:_ space, and - between two {letter, digit,
underscore}'s). This is because they could match an UNCOMPRESS
command and thus be passed to the shell when the uncompress command is
popen()'ed.
Apart from that, command names are checked for containing only letters and the
digits 1 and 2; and the arguments to commands are checked for not containing
control characters (actually characters 0-32 and 127-159; in particular
newline characters are prohibited). The length of the commands isn't checked
by anlgform.pl, but buffer overflow shouldn't be an issue as
configuration commands are checked for length by analog.
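Two of these checks are easy to express as regular expressions. The following Python sketch shows one way of writing them; it is my reading of the rules stated above, not a copy of the Perl code in anlgform.pl, and the function names are invented. I have assumed that "letters" means the ASCII letters.

```python
import re

# Command names: only letters and the digits 1 and 2.
COMMAND_NAME = re.compile(r"[A-Za-z12]+")

# LOGFILE/CACHEFILE arguments: letters, digits, / \ . : _ and space,
# plus a hyphen only when it sits between two of {letter, digit, underscore}.
FILE_ARG = re.compile(
    r"(?:[A-Za-z0-9/\\.:_ ]|(?<=[A-Za-z0-9_])-(?=[A-Za-z0-9_]))*"
)


def valid_command_name(name):
    return COMMAND_NAME.fullmatch(name) is not None


def valid_file_arg(arg):
    return FILE_ARG.fullmatch(arg) is not None


for arg in ["/var/log/apache/access-log", "/var/log/*", "log; rm -rf /"]:
    print(repr(arg), valid_file_arg(arg))
```

The hyphen rule is what stops an argument from looking like a command line option if it ever reaches the shell.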
By the way, the reason that I advise that analog itself
shouldn't be used as a CGI program is that some servers, notably Microsoft
IIS, allow users to pass command line arguments into a CGI program. And even
if the program doesn't return the proper CGI headers, the output can be sent
back to the user. This means that all the above checking of arguments is then
thwarted. Of course, on servers on which you can't pass command line arguments
to a CGI program, there are not the same security concerns, but then analog
isn't very useful as a CGI program because if you can't pass any arguments,
you can only get the default output.
You need to be running Perl 5.001 or later (unless you're on Windows and
download the executable version of the form interface from the
helper applications page). You can get the latest
version of Perl free from www.perl.org (or
from http://www.activestate.com/Products/ActivePerl/
if you're on Windows). You also need the module
CGI.pm,
but this should have come with Perl anyway.
On Windows, you have to associate the .pl extension with the Perl
executable so that Perl scripts are executed by Perl.
anlgform.pl will understand the GET or POST
methods of form submission. The HTML spec says that GET should be used when,
as in this case, running the program has no side effects. However, section
15.1.3 of the HTTP spec says that
POST should be used if some of the options being passed might be
confidential. Also, very long URLs, formed by specifying lots of options, can
cause trouble to some older servers. So anlgform.html uses the
POST method by default. However, the GET method will
also work. For example, you could make a normal link to anlgform.pl
with options specified after a question mark in the usual GET way.
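If you build such links from a script, remember that the option values have to be URL-encoded. Here is a minimal Python sketch using the standard library; the server name www.example.com and the /cgi-bin/ location are placeholders, and form_link is just a name I've made up.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def form_link(script_url, options):
    """Build a GET link to the CGI program with URL-encoded options."""
    return script_url + "?" + urlencode(options)


link = form_link(
    "http://www.example.com/cgi-bin/anlgform.pl",  # placeholder location
    [("LOGFILE", "/var/log/apache/fred"), ("REQINCLUDE", "pages"), ("qv", "1")],
)
print(link)

# Decoding the query string gives back the original options.
print(parse_qsl(urlsplit(link).query))
```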
This section of the Readme is about understanding the results analog
produces. It's divided into three subsections.
- How the web works. This section
discusses what happens when somebody connects to your web site, and what
you can and can't find out about them. If you think that you can get
statistics on how many people have visited your web site (or want to
know why you can't), then this section is for you.
- Analog's reports. This section
gives a summary of analog's reports, what they contain, and which
commands influence each one.
- Analog's definitions. This section
gives precise details on all of analog's terminology, exactly what is
counted in each report, and so on.
This section is about what happens when somebody connects to your web site, and
what statistics you can and can't calculate. There is a lot of confusion
about this. It's not helped by statistics programs which claim to calculate
things which cannot really be calculated, only estimated. The simple fact is
that certain data which we would like to know and which we expect to know are
simply not available. And the estimates used by other programs are not just a
bit off, but can be very, very wrong. For example (you'll see why below),
if your home page has 10 graphics on it, and an AOL user visits it, most
programs will count that as 11 different visitors!
This section is fairly long, but it's worth reading carefully. If you
understand the basics of how the web works, you will understand what your web
statistics are really telling you.
1. The basic model. Let's suppose I visit your web site. I follow a
link from somewhere else to your front page, read some pages, and then follow
one of your links out of your site.
So, what do you know about it? First, I make one request for your front
page. You know the date and time of the request and which page I asked for
(of course), and the internet address of my computer (my host). I also
usually tell you which page referred me to your site, and the make and model
of my browser. I do not tell you my username or my email address.
Next, I look at the page (or rather my browser does) to see if it's got any
graphics on it. If so, and if I've got image loading turned on in my browser,
I make a separate connection to retrieve each of these graphics. I never log
into your site: I just make a sequence of requests, one for each new file I
want to download. The referring page for each of these graphics is your front
page. Maybe there are 10 graphics on your front page. Then so far I've made 11
requests to your server.
After that, I go and visit some of your other pages, making a new request for
each page and graphic that I want. Finally, I follow a link out of your site.
You never know about that at all. I just connect to the next site without
telling you.
2. Caches. It's not always quite as simple as that. One major problem
is caching. There are two major types of caching. First, my browser
automatically caches files when I download them. This means that if I visit
them again, the next day say, I don't need to download the whole page
again. Depending on the settings on my browser, I might check with you that
the page hasn't changed: in that case, you do know about it, and analog will
count it as a new request for the page. But I might set my browser not to
check with you: then I will read the page again without you ever knowing about
it.
The other sort of cache is on a larger scale. Almost all ISPs now have their
own cache. This means that if I try to look at one of your pages and
anyone else from the same ISP has looked at that page recently, the
cache will have saved it, and will give it out to me without ever telling
you about it. (This applies whatever my browser settings.) So hundreds of
people could read one of your pages, even though you'd only sent it out once.
3. What you can know. The only things you can know for certain are the
number of requests made to your server, when they were made, which files were
asked for, and which host asked you for them.
You can also know what people told you their browsers were, and what the
referring pages were. You should be aware, though, that many browsers lie
deliberately about what sort of browser they are, or even let users configure
the browser name. Also, a few browsers send incorrect referrers, telling you
the last page that the user was on even if they weren't referred by that page.
And some people use "anonymizers" which deliberately send false
browsers and referrers.
4. What you can't know.
- You can't tell the identity of your readers.
Unless you explicitly require users to provide a password, you don't
know who connected or what their email addresses are.
- You can't tell how many visitors you've had.
You can guess by looking at the number of distinct hosts that have
requested things from you. Indeed this is what many programs mean when
they report "visitors". But this is not always a good estimate for
three reasons. First, if users get your pages from a local cache server,
you will never know about it. Secondly, sometimes many users appear to
connect from the same host: either users from the same company or ISP,
or users using the same cache server. Finally, sometimes one user
appears to connect from many different hosts. AOL now allocates users a
different hostname for every request. So if your home page has
10 graphics on it, and an AOL user visits it, most programs will count that
as 11 different visitors!
- You can't tell how many visits you've had.
Many programs, under pressure from advertisers' organisations, define a
"visit" (or "session") as a sequence of requests
from the same host until there is a half-hour gap. This is an unsound
method for several reasons. First, it assumes that each host corresponds
to a separate person and vice versa. This is simply not true in the real
world, as discussed in the last paragraph. Secondly, it assumes that
there is never a half-hour gap in a genuine visit. This is also untrue.
I quite often follow a link out of a site, then step back in my browser
and continue with the first site from where I left off. Should it really
matter whether I do this 29 or 31 minutes later? Finally, to make the
computation tractable, such programs also need to assume that your
logfile is in chronological order: it isn't always, and analog will
produce the same results however you jumble the lines up.
- Cookies don't solve these problems.
Some sites try to count their visitors by using cookies. But this can
only work if you refuse to let people read your pages who can't or won't
take a cookie. And you still have to assume that your visitors will use
the same cookie for their next request.
- You can't follow a person's path through your site.
Even if you assume that each person corresponds one-to-one to a host,
you don't know their path through your site. It's very common for people
to go back to pages they've downloaded before. You never know about
these subsequent visits to that page, because their browser has cached
them. So you can't track their path through your site accurately.
- You often can't tell where they entered your site, or where they
found out about you from.
If they are using a cache server, they will often be able to retrieve
your home page from their cache, but not all of the subsequent pages
they want to read. Then the first page you know about them requesting
will be one in the middle of their true visit.
- You can't tell how they left your site, or where they went next.
They never tell you about their connection to another site, so there's no
way for you to know about it.
- You can't tell how long people spent reading each page.
Once again, you can't tell which pages they are reading between successive
requests for pages. They might be reading some pages they downloaded earlier.
They might have followed a link out of your site, and then come back later.
They might have interrupted their reading for a quick game of
Minesweeper. You just don't know.
- You can't tell how long people spent on your site.
Apart from the problems in the previous point, there is one other
complete show-stopper. Programs which report the time on the site count
the time between the first and the last request. But they don't count
the time spent on the final page, and this is often the majority of the
whole visit.
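To see how fragile the "visitor" and "visit" estimates in the list above are, here is a Python sketch of the usual methods: distinct hosts for visitors, and a half-hour gap for visits. It models what other programs do, not anything analog does; the function names, the hostnames and the use of minutes as the time unit are all made up for the example.

```python
def count_visitors(requests):
    """Estimate visitors as the number of distinct hosts."""
    return len({host for host, _ in requests})


def count_visits(requests, timeout_minutes=30):
    """Estimate visits: a new visit starts after a half-hour gap per host.

    requests is a list of (host, time in minutes) pairs.
    """
    last_seen = {}
    visits = 0
    for host, when in sorted(requests, key=lambda r: r[1]):
        if host not in last_seen or when - last_seen[host] >= timeout_minutes:
            visits += 1
        last_seen[host] = when
    return visits


# One reader who steps out of the site and comes back.
print(count_visits([("host1.example.com", 0), ("host1.example.com", 29)]))
print(count_visits([("host1.example.com", 0), ("host1.example.com", 31)]))

# One user whose 11 requests (a page and 10 graphics) each come from a
# different hostname.
one_user = [(f"proxy{i}.example.net", 0) for i in range(11)]
print(count_visitors(one_user), count_visits(one_user))
```

The first two calls describe the same behaviour by the same person, yet they give 1 visit and 2 visits. The last line reports 11 visitors and 11 visits for a single person loading a single page.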
5. Real data.
Of course, the important question is how much difference these theoretical
difficulties make. In a recent paper (World Wide Web, 2, 29-45 (1999)),
Peter Pirolli and James Pitkow of Xerox Palo Alto Research Center examined
this question using a ten-day logfile from the xerox.com web
site. One of their most striking conclusions is that different commonly-used
methods can give very different results. For example, when trying to measure
the median length of a visit, they got results from 137 seconds to 629
seconds, depending on exactly what is counted as a new visitor or a new visit. As
they were looking at a fixed logfile, they didn't consider the effect of
server configuration changes such as refusing caching, which would change the
results still more.
6. Conclusion.
The bottom line is that HTTP is a stateless protocol. That means that people
don't log in and retrieve several documents: they make a separate connection
for each file they want. And a lot of the time they don't even behave as
if they were logged into one site. The world is a lot messier than this
naïve view implies. That's why analog reports requests, i.e. what is
going on at your server, which you know, rather than guessing what the users
are doing.
Defenders of counting visits etc. claim that these are just small
approximations. I disagree. For example, almost everyone is now accessing the
web through a cache. If the proportion of requests retrieved from the cache is
50% (a not unrealistic figure) then half of the users' requests aren't being
seen by the servers.
Other defenders of these methods claim that they're still useful because they
measure something which you can use to compare sites. But this
assumes that the approximations involved are comparable for different sites,
and there's no reason to suppose that this is true. Pirolli & Pitkow's
results show that the figures you get depend very much on how you count them,
as well as on your server configuration. And even once you've agreed on
methodology, different users on different sites have different patterns of
behaviour, which affect the approximations in different ways: for example,
Pirolli & Pitkow found different characteristics of weekday and weekend
users at their site.
I've presented a somewhat negative view here, emphasising what you
can't find out. Web statistics are still informative: it's just important not
to slip from "this page has received 30,000 requests" to
"30,000 people have read this page."
In some sense these problems are not really new to the web -- they are present
just as much in print media too. For example, you only know how many magazines
you've sold, not how many people have read them. In print media we have learnt
to live with these issues, using the data which are available, and it would
be better if we did on the web too, rather than making up spurious numbers.
7. Acknowledgements and further reading.
Many other people have made these points too. While originally writing
this section, I benefited from three earlier expositions:
Interpreting WWW
Statistics by Doug Linder;
Getting
Real about Usage Statistics by Tim Stehle; and
Making Sense of Web Usage Statistics by Dana Noonan
(which doesn't seem to be available on the web any more).
Another, extremely well-written document on these ideas is Measuring Web
Site Usage: Log File Analysis by Susan Haigh and Janette Megarity.
Being on a Canadian government site, it's available in both
English and
French.
Or for an even more negative point of view, you could read
Why Web Usage Statistics are
(Worse Than) Meaningless by Jeff Goldberg.
This section summarises all of analog's reports, and the main commands which
control them. For details on these commands, see the sections on
Time reports,
Other reports and
Hierarchical reports.
For exact details on what is counted in each report, see the section on
Analog's definitions.
You can get descriptions of each report within the output by using the
DESCRIPTIONS and
DESCFILE commands.
Program started at Thu-24-Sep-1998 13:48.
Analysed requests from Wed-16-Sep-1998 09:52 to Mon-21-Sep-1998
02:04 (4.7 days).
The top two lines of the report tell you when the program was run, and
which dates it includes data from.
(Figures in parentheses refer to the 7 days to 24-Sep-1998 13:48).
Successful requests: 79,646 (48,947)
Average successful requests per day: 17,036 (6,992)
Successful requests for pages: 31,138 (18,689)
Average successful requests for pages per day: 6,660 (2,669)
Failed requests: 9,008 (6,378)
Redirected requests: 344 (235)
Distinct files requested: 8,180 (2,884)
Distinct hosts served: 6,640 (4,991)
Corrupt logfile lines: 2
Data transferred: 976.92 Mbytes (627.06 Mbytes)
Average data transferred per day: 208.96 Mbytes (89.58 Mbytes)
The General Summary contains some overall statistics about the data being
analysed: the most important being the number of requests (the total
number of files downloaded, including graphics); the number of requests for
pages (just counting the various pages on your site); the number of
distinct hosts (the number of different computers requests have come
from); and the amount of data transferred in bytes. For exactly what
the various lines mean, see the section on Analog's
definitions.
The figures in parentheses represent the seven days given at the top of this
report: it's the seven days before the TO time if there was a
TO command, or if not the seven days before the report was run.
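The per-day averages in the sample above can be checked by hand. The following Python sketch reproduces them from the dates and totals shown. Truncating the averages rather than rounding them is an assumption inferred from the sample figures, not something taken from analog's source.

```python
from datetime import datetime

first = datetime(1998, 9, 16, 9, 52)   # first request analysed
last = datetime(1998, 9, 21, 2, 4)     # last request analysed
days = (last - first).total_seconds() / 86400

successful_requests = 79646
last_seven_days_requests = 48947

# Truncated (assumption: this matches the sample output above).
per_day = int(successful_requests / days)
per_day_last_seven = last_seven_days_requests // 7

print(f"{days:.1f} days", per_day, per_day_last_seven)
```

Note that the overall average is taken over the 4.7 days actually covered by the logfile, while the figure in parentheses is always divided by seven days.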
You can't find out the number of visitors or visits you've had, and don't
believe any program which tells you that you can. See the section on
How the web works for a discussion of
this.
You can turn this report on or off with the
GENERAL command.
You can control which lines are included using the
GENSUMLINES command.
You can include
or exclude the figures for the last seven days with the
LASTSEVEN command.
You may get slightly different lines to those above, depending on exactly
what's in your logfile.
Each unit (bar image) represents 800 requests for pages, or part thereof.
week beg.: #reqs: pages:
---------: -----: -----:
13/Sep/98: 69614: 25277:
20/Sep/98: 10032: 5861:
Busiest week: week beginning 13/Sep/98 (26,654 requests for pages).
These reports tell you how many requests there were in each time
period. They also tell you which was the busiest time period.
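The phrase "or part thereof" in the sample means that the bar length is rounded up. A minimal Python sketch of that arithmetic follows; the function name `bar_units` and the `+` character used to draw the bar are arbitrary choices for this example.

```python
import math

def bar_units(count, per_unit):
    """Number of bar-chart units: one per `per_unit` requests, or part thereof."""
    return math.ceil(count / per_unit)

# Page counts from the sample weekly report above, at 800 pages per unit.
weeks = {"13/Sep/98": 25277, "20/Sep/98": 5861}
for week, pages in weeks.items():
    print(week, "+" * bar_units(pages, 800))
```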
You can control whether each report is included or not with the appropriate
ON or OFF command.
You can control which columns are listed by the
COLS commands. You can control
which measurement to use for the bar charts and the "busiest" line
by the GRAPH commands. You can
determine how many rows are displayed with the
ROWS commands. You can display the
lines backwards or forwards in time by the
BACK commands. You can change the
graphic used for the bar charts with the
BARSTYLE command.
Each unit (bar image) represents 150 requests for pages, or part thereof.
day: #reqs: pages:
---: -----: -----:
Sun: 2031: 1193:
Mon: 8001: 4668:
Tue: 0: 0:
Wed: 13934: 5915:
[etc.]
These reports tell you the total number of requests in each day or hour of the
week, or in each period of the day, summed over all the weeks or days in the
report. (It's not the average, nor is it the figures for just the last week or
last day).
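As an illustration of "summed over all the weeks", here is a short Python sketch with three invented request times. It only shows the idea of the total; it is not how analog is implemented.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Hypothetical request timestamps spanning two weeks.
request_times = [
    datetime(1998, 9, 14, 10, 0),  # a Monday
    datetime(1998, 9, 16, 11, 0),  # a Wednesday
    datetime(1998, 9, 21, 1, 30),  # the following Monday
]

# Both Mondays go into the same row: a total, not an average.
totals = Counter(DAY_NAMES[t.weekday()] for t in request_times)
print(totals["Mon"], totals["Tue"], totals["Wed"])
```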
You can control whether each report is included or not with the appropriate
ON or OFF command.
You can control which columns are listed by the
COLS commands. You can control
which measurement to use for the bar charts by the
GRAPH commands. You can change the
graphic used for the bar charts with the
BARSTYLE command.
Listing the first 5 files by the number of requests, sorted by
the number of requests.
#reqs: %bytes: last date: file
-----: ------: ---------------: ----
4123: 2.29%: 21/Sep/98 01:57: /~sret1/analog/
3064: 0.15%: 21/Sep/98 01:54: /~sret1/analog/analogo.gif
1737: 0.01%: 21/Sep/98 01:53: /~sret1/images/bar1.gif
1692: 0.01%: 21/Sep/98 01:53: /~sret1/images/bar16.gif
1685: 0.01%: 21/Sep/98 01:53: /~sret1/images/bar8.gif
67345: 97.54%: 21/Sep/98 02:04: [not listed: 8,175 files]
The rest of the reports are all quite similar. Here is a list of them. If
you're unfamiliar with some of the terms, see the section on
Analog's definitions.
- The Host Report lists all computers which downloaded files from you.
- The Domain Report lists which countries those computers came
from. (If you only get "unresolved numerical addresses", see the
FAQ.)
- The Organisation Report attempts to
list the organisations (companies, institutions, ISPs etc.) which
the computer was registered under.
- The Host Redirection Report and Host Failure Report list all computers
which encountered redirections or errors.
- The Request Report (the example above) lists which files were
downloaded.
- The Directory Report lists which directories those files came from.
- The File Type Report lists the file types (actually, extensions) of
those files.
- The File Size Report breaks them down by size.
- The Processing Time Report shows the time taken to serve each file.
- The Redirection Report lists the filenames which resulted in redirections:
mainly directories without the final slash, and
"click-thru"'s.
- The Failure Report lists the filenames which caused errors.
- The Referrer Report lists which pages linked to your files (and
also pages which included your images).
- The Referring Site Report lists the servers those referrers were on.
- The Search Query Report and the Search Word Report list which search
terms people used to find your site (provided you've used the
appropriate SEARCHENGINE
commands).
- The Internal Search Query Report and Internal Search Word Report list the
search terms people used on scripts within your site (provided
you've used the appropriate
INTSEARCHENGINE commands).
- The Redirected Referrer Report lists the referrers which led to
redirections.
- The Failed Referrer Report is essentially a broken link report.
- The Browser Report lists the detailed versions of browsers used,
and the Browser Summary collects them by vendor.
- The Operating System Report lists the operating systems of the
visitors whose browser types you know (as far as possible: it's not always
possible to distinguish between Windows NT & Windows 2000, for
example). Which browsers count as robots is controlled by the
ROBOTINCLUDE and
ROBOTEXCLUDE commands.
- The Virtual Host Report lists the activity of your various virtual
domains.
- The Virtual Host Redirection Report and Virtual Host Failure Report give
the number of redirections and errors on each of those domains.
- The User Report lists your visitors if your server requires
authentication, or perhaps the visitors' cookies.
- The User Redirection Report and User Failure Report list the users who
encountered redirections or errors.
- The Status Code Report lists the number of each
HTTP status code that you had.
Usually you can only get some of these reports, depending on what information
is recorded in your logfile.
As usual, you can control whether each report is included or not with the
appropriate ON or OFF
command.
You can control which columns are listed with the
COLS commands.
You can change how the reports are sorted with the
SORTBY commands.
You can control how many items are listed with the
FLOOR commands.
You can control whether and how the pie charts are plotted with the
CHART commands.
You can list the time period covered by each report with the
REPORTSPAN command.
You can include or
exclude individual items with the output
INCLUDE and EXCLUDE commands.
You can change the names of items in the reports with the
output alias commands.
The "not listed" line at the bottom counts those items which
didn't get enough traffic to get above the FLOOR for the report,
and those which were explicitly EXCLUDEd.
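The "not listed" line can be thought of as follows. This Python sketch assumes a simple floor of a minimum number of requests, and the filenames and counts are invented; analog's FLOOR commands offer more than this, as their own section describes.

```python
def apply_floor(counts, floor, excluded=()):
    """Split items into those listed and a combined 'not listed' total."""
    listed = {name: n for name, n in counts.items()
              if n >= floor and name not in excluded}
    hidden = [n for name, n in counts.items() if name not in listed]
    return listed, len(hidden), sum(hidden)

# Hypothetical request counts.
counts = {"/index.html": 120, "/logo.gif": 95, "/old.html": 3, "/private/": 40}
listed, hidden_items, hidden_requests = apply_floor(
    counts, floor=20, excluded={"/private/"})
print(listed, f"[not listed: {hidden_items} files]", hidden_requests)
```

Here /old.html falls below the floor and /private/ is explicitly excluded, so both end up in the "not listed" line.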
Most of these reports have a hierarchical structure, like this example for
the Domain Report:
Listing the first 5 domains by the number of requests, sorted by
the number of requests.
no.: #reqs: %bytes: domain
---: -----: ------: ------
1: 13243: 16.23%: .com (Commercial)
: 1262: 1.26%: aol.com
2: 11783: 25.64%: .jp (Japan)
: 9592: 22.19%: ad.jp
: 1043: 1.97%: co.jp
3: 10073: 11.62%: .net (Network)
: 1926: 1.71%: uu.net
4: 9657: 13.31%: [unresolved numerical addresses]
5: 7388: 8.04%: .uk (United Kingdom)
: 5792: 5.74%: ac.uk
: 1510: 1.99%: co.uk
: 18502: 25.16%: [not listed: 82 domains]
You can control which items are listed on the lower levels by the
SUB family of commands.
There are also separate
sub-SORTBY and
sub-FLOOR commands for the
lower levels (called
ARGSSORTBY and
ARGSFLOOR for some reports,
such as the Request Report).
Notice that the lower levels are always listed with their parents, so they
break up the sort order. Also, they don't count towards the total number of
items listed, so there are only 5 domains listed in the example above, as you
can see in the first column. (The N
column is particularly useful in hierarchical reports for this reason.)
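The numbering behaviour described above can be sketched in a few lines of Python. The figures are taken from the sample Domain Report; the data layout and output format are simplifications made up for this example.

```python
# (requests, name, sub-items) from the sample Domain Report above.
domains = [
    (13243, ".com", [(1262, "aol.com")]),
    (11783, ".jp", [(9592, "ad.jp"), (1043, "co.jp")]),
    (10073, ".net", [(1926, "uu.net")]),
]

lines = []
ranked = sorted(domains, key=lambda d: d[0], reverse=True)
for number, (reqs, name, subs) in enumerate(ranked, start=1):
    lines.append(f"{number:>3}: {reqs:>6}: {name}")
    for sub_reqs, sub_name in sorted(subs, reverse=True):
        # Sub-items stay with their parent and get no number of their own.
        lines.append(f"   : {sub_reqs:>6}:   {sub_name}")

print("\n".join(lines))
```

Seven lines are printed but only three are numbered, and aol.com (1262 requests) appears before .jp (11783 requests) because it stays with its parent.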
Which files are linked to in the reports is controlled by the
LINKINCLUDE and
LINKEXCLUDE commands. The links are also affected by the
BASEURL command.
This analysis was produced by analog 4.90beta4.
Running time: 8 seconds.
At the end of the report you can see which version of analog produced the
report, and how long the report took to run.
This section describes how analog defines its terms, and exactly what is
counted in each category. It gets a bit technical at times -- if you're just
trying to understand the reports, I recommend you read the section on
Analog's reports first.
We start with some basic definitions.
The host is the computer which has asked you for a file (often called
the "client"). The file
might be a page (i.e., an HTML document) or it might be something
else, such as an image. By default filenames ending in (case insensitive)
.html, .htm, or / count as
pages, but you can tell analog to count any file as a page with the
PAGEINCLUDE command.
The total requests counts all the files which
have been requested, including pages, graphics, etc. (Some people call this
the number of hits, but that word is also used in other ways by other
people, so I avoid it). The requests for pages obviously only counts
pages. The referrer for a request
is the place that the user (or his computer) heard about your file from. If
he followed a link to reach a page, it will be the previous page. In the
case of a graphic on a page, the referrer will be the page containing the
graphic.
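The default page test described above amounts to a case-insensitive suffix check. A minimal Python sketch, in which the `extra_suffixes` parameter is only a loose, hypothetical stand-in for what PAGEINCLUDE lets you do:

```python
def is_page(filename, extra_suffixes=()):
    """Default page test: ends in .html, .htm or a slash (case insensitive)."""
    suffixes = (".html", ".htm", "/") + tuple(extra_suffixes)
    return filename.lower().endswith(suffixes)

print(is_page("/~sret1/analog/"))           # a directory, so a page
print(is_page("/~sret1/images/bar1.gif"))   # a graphic, so not a page
```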
Analog recognises four categories of request, based on the HTTP status code of
the request. You can see the total number of requests for each status code,
and what the codes mean, in the Status Code Report. (Or see the
HTTP spec for a
detailed description.)
First, successful requests are those with HTTP status codes in the 200's
(where the document was returned) or with code 304 (where the document was
requested but was not needed because it had not been recently modified and the
user could use a cached copy). (Actually, you can configure code 304 to be a
redirected request instead of a successful request with the
304ISSUCCESS command.)
Successful requests for pages refers to
those lines on which the file requested was named and was a page.
Redirected requests are those with other codes in the 300's, indicating
that the user was directed to a different file instead. The most common cause
of these requests is that the user has incorrectly requested a directory name
without the trailing slash. The server replies with a redirection ("you
probably mean the following") and the user then makes a second connection
to get the correct document (although usually the browser does it automatically
without the user's intervention or knowledge). The other common cause of
redirected requests is their use as "click-thru" advertising
banners.
Failed requests are those with codes in the 400's (error in request) or
500's (server error). They come about for a variety of reasons, but the most
common are when the requested file is not found or is read-protected.
Finally, requests returning informational status code are those with
status codes in the 100's. These are very rare at the moment.
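The four categories can be summarised as a small Python function. This is a sketch of the rules as stated in this section, not analog's own code; the parameter `code_304_is_success` mirrors the effect of the 304ISSUCCESS command, and the "unclassified" label for codes outside 100-599 is my own addition.

```python
def classify(status, code_304_is_success=True):
    """Map an HTTP status code to one of the categories described above."""
    if 100 <= status <= 199:
        return "informational"
    if 200 <= status <= 299:
        return "successful"
    if status == 304:
        return "successful" if code_304_is_success else "redirected"
    if 300 <= status <= 399:
        return "redirected"
    if 400 <= status <= 599:
        return "failed"
    return "unclassified"  # not covered by the text; label invented here

for code in (200, 301, 304, 404, 500):
    print(code, classify(code))
```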
There are a few other types of logfile lines listed in the General Summary.
Lines without status code are those logfile lines which lack a status
code. The successful requests in the General Summary normally count only
lines which have a status code. The exception is a line which contains the
name of the file requested, where the filename is being counted (not starred
in the LOGFORMAT): such a line is
counted as a success.
Unwanted logfile entries are ones which you have explicitly
excluded. Finally,
corrupt logfile lines are those which analog didn't manage to parse.
(The number given is the number of unparseable lines in the whole logfile, even
if the rest of the report is restricted to a small part of the logfile,
because analog doesn't know whether a line would have been wanted if it
couldn't parse it! You can list all the corrupt lines by turning
debugging on.)
Most reports only include successful requests in calculating the number of
requests, requests for pages, bytes, and last date: unless, of course, the
report is a redirection or failure report. There is a further restriction on
the time reports, the Status Code Report, the Processing Time Report, the
File Size Report, and the bytes lines in the General Summary: the logfile
line must also contain the name of the file requested, and the filename must
be being counted. This is necessary to stop double counting if the server uses
separate logs.
The "not listed" line at the bottom of each of the
non-time reports includes both those items which
were explicitly excluded at the output stage with an
OUTPUTEXCLUDE command,
and those which were not listed because they were below the floor for the
report.
The figures in parentheses in the General Summary are for the last seven days:
either the seven days before the TO time, or if no TO
time is given, the seven days before the time of the program start. (It would
be nicer to use the seven days before the last time in the logfile, but we
don't know when this is until we've read the whole logfile, and by then it's
too late.) The figures for the last seven days are not included if all, or
none, of the requests fall in the last seven days.
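The seven-day rule above can be sketched in Python as follows. Whether the window boundaries are inclusive is an assumption made here, and the request times are invented; only the program start time is taken from the sample report.

```python
from datetime import datetime, timedelta

def last_seven_days_count(request_times, end_time):
    """Requests in the 7 days before end_time, or None if not worth showing."""
    window_start = end_time - timedelta(days=7)
    recent = sum(1 for t in request_times if window_start <= t <= end_time)
    # Omitted when all, or none, of the requests fall in the window.
    if recent == 0 or recent == len(request_times):
        return None
    return recent

program_start = datetime(1998, 9, 24, 13, 48)  # no TO time given
requests = [
    datetime(1998, 9, 16, 9, 52),   # before the window opens on 17-Sep 13:48
    datetime(1998, 9, 18, 12, 0),
    datetime(1998, 9, 21, 2, 4),
]
print(last_seven_days_count(requests, program_start))
```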
In the Domain Report, "domain not given" means that the hostname did
not contain a dot. "Unknown domain" means that it did contain a dot,
but that the domain name was not in the domains
file (or that the domains file could not be read). The hosts and domains
concerned can be listed by turning
debugging on.
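The two Domain Report labels above follow from a simple test on the hostname. In this Python sketch the set `known_domains` stands in for the contents of the domains file, and the handling of numerical addresses is an assumption based on the "[unresolved numerical addresses]" line in the sample report, not on this section.

```python
def domain_report_label(hostname, known_domains):
    """Pick the Domain Report line that a hostname would be counted under."""
    labels = hostname.split(".")
    if len(labels) == 4 and all(label.isdigit() for label in labels):
        return "unresolved numerical addresses"  # assumption, see lead-in
    if len(labels) == 1:
        return "domain not given"       # no dot in the hostname
    top_level = labels[-1].lower()
    if top_level not in known_domains:
        return "unknown domain"         # has a dot, but not in the domains file
    return "." + top_level

known_domains = {"com", "net", "jp", "uk"}  # hypothetical domains file contents
for host in ("localhost", "www.example.zz", "www.cam.ac.uk", "131.111.20.1"):
    print(host, "->", domain_report_label(host, known_domains))
```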
In the Operating System Report, which browsers count as robots is controlled
by the ROBOTINCLUDE and
ROBOTEXCLUDE commands.
This section lists all the errors and warnings which analog can produce,
together with a short explanation.
First, you should understand the difference between a crash, an error, a
warning, and a debugging message. A crash is when analog exits
prematurely, without producing the whole output file. The system might give a
message, but analog will not give one of its own messages. Analog should never
crash. If it does crash, please tell me about it.
An error is something which stops analog finishing its job. Whenever
an error is detected, analog gives a message starting something like
analog: Fatal error: and will then tell you what type of thing went
wrong before quitting.
A warning is a problem which is not fatal to analog: it will keep on
with its processing. These vary from the possibly serious, such as files which
could not be found, to purely informational. They produce a message starting
analog: Warning. You can turn warnings off using the
WARNINGS command.
Finally, a debugging message gives information on the state of the
program. They just begin with a single code letter followed by a colon. You
don't get any debugging messages unless you've
asked for them.
If you want to send these messages to a file instead of to the screen, you can
use the ERRFILE command.
To tell analog the width of your screen for these messages, you can use the
ERRLINELENGTH command.
Now I shall describe all the possible errors and
warnings in detail.
- Ran out of memory: cannot continue
- Analog ran out of memory. Try increasing the memory available to the
process, if your operating system will allow it, or using the
LOWMEM commands.
- Cannot ignore mandatory configuration file
- See the section in the Readme on the
mandatory configuration file.
- Can't find language file
Language file too short
Language file too long
Language file contains excessively long lines
- Analog can't run without a well-formed language file. See the
documentation on language files.
- Attempted to read more than 50 configuration files
- The most likely explanation for this is that you have accidentally
created a loop using the
CONFIGFILE command, for
example if a configuration file includes itself.
- Incorrect default given in anlghead.h
Default given in anlghead.h too short
- If you've compiled your own version, and you've specified an incorrect
configuration in the file anlghead.h, analog gives up to
allow you to fix it.
- Failed to open output file for writing
- Analog couldn't create, or couldn't write to, the output file you
specified.
- Cache output file already exists: won't overwrite
- Analog won't overwrite an old cache file. You must move or delete it
yourself first.
- OUTFILE and CACHEOUTFILE are the same
- OUTFILE and CACHEOUTFILE both set to stdout
- This can't be what you wanted, because one would overwrite the other.
- OUTPUT NONE and CACHEOUTFILE none selected
- You requested no output.
- OUTPUT LATEX only available with US-ASCII
and ISO-8859-1 character sets
- The LaTeX output style only works with Western European languages
because the standard LaTeX distribution doesn't contain the characters
for other languages.
Remember that warnings are not fatal: in fact some are rarely even serious.
You can turn them off using the
WARNINGS command. The possible
warnings come in several different categories, shown by a letter in the warning
message. The categories are as follows.
- C
- invalid configuration specified
- D
- dubious configuration specified
- E
- ERRFILE command used
- F
- files missing or corrupt
- L
- apparent problems in logfiles
- M
possible problems in logfiles
- R
- turning off empty reports
This category indicates an incorrect configuration. Analog will either ignore
what you said, or try and do the best it can with it. There are too many
warnings in this category to list completely. You will have to consult the
documentation for the particular configuration
command that gave the warning. If you get a warning for a command which used
to work in a previous version of analog, have a look in the section
Updating from older versions.
This is for configurations which might be intended, but which look suspicious.
Analog will not override what you've specified in this case.
- LOGFORMAT with no subsequent logfile
- You have specified a LOGFORMAT command, but no
subsequent logfile to which it could be applied. Most likely
you put the LOGFORMAT after the LOGFILE command.
You must put the LOGFORMAT before the LOGFILE
command or use DEFAULTLOGFORMAT instead. See the section on
Specifying a log format for
more details.
- Offset not a multiple of 30
Offset more than 25 hours
- The time offsets are meant to be for
correcting between differences in time zones. These differences are
usually multiples of 30 minutes between -25 and +25 hours. Maybe you
specified the offset in hours instead of minutes by mistake, or
something like that.
- FROM time is later than the present
- Usually this will mean that no entries are counted. Analog doesn't
try and correct it in case the clock on your computer or your server is
wrong -- but you would be better using
TIMEOFFSET or
LOGTIMEOFFSET to correct
those clocks.
- SORTBY doesn't match FLOOR
SORTBY doesn't match SUBSORTBY
(or FLOOR/SUBFLOOR)
SORTBY (or FLOOR or GRAPH) isn't
included in COLS
- Within one report, it's helpful to your readers to have the sort methods
and the floors compatible, and all included in the COLS.
(See the section on Non-time
reports).
- Column N with SORTBY ALPHABETICAL/RANDOM
- Numbering off the items when they're not in order of busyness is
probably a mistake.
- Time reports have not all got same value of BACK
- It's usually helpful to have all the time
reports running in the same direction.
- Report contains no COLS
- You've got an empty COLS list for one report, so you'll just
get a list of names, not any information about them.
- LOWMEM 3 prevents that item being cached
- You're making a cache file, but one item is
not being recorded because of a
LOWMEM command, and will therefore
not be saved in the cache file.
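The two time-offset warnings in the list above can be pictured as the following check on an offset given in minutes. This is a sketch based only on the wording of the warnings; the exact tests which analog applies are an assumption.

```python
def offset_warnings(offset_minutes):
    """Return the warnings a suspicious time offset (in minutes) would trigger."""
    warnings = []
    if offset_minutes % 30 != 0:
        warnings.append("Offset not a multiple of 30")
    if abs(offset_minutes) > 25 * 60:
        warnings.append("Offset more than 25 hours")
    return warnings

print(offset_warnings(-330))  # five and a half hours behind: nothing suspicious
print(offset_warnings(5))     # probably hours entered where minutes were expected
```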
There is only one warning in this category.
- Redirecting future diagnostic messages
- You've used an ERRFILE command to change the destination of
errors, warnings, debugging and PROGRESSFREQ
diagnostics. This is just warning you so that you don't miss any
messages.
This category is for diagnosing files which couldn't be opened or read
successfully. These can be serious, but most of the messages should be
self-explanatory. There are a few worth mentioning specifically.
- Can't auto-detect format of logfile
- The LOGFORMAT is set to automatic
detection, but the first line of the logfile is not in any of the
standard formats. This warning is often generated when you try and
specify your own LOGFORMAT but put it after the
LOGFILE command so that it is not in effect for that logfile.
- Logfile with ambiguous dates
- Some servers, notably IIS and WebSite, record dates in their logfiles
according to local conventions. Then if analog encounters 2/1/99, for
example, it doesn't know whether it's the 2nd January or 1st February.
This problem, and what to do about it, is described in more detail in the
section on
Choosing a logfile.
- Failed to open domains file
- In this case, all domains will be counted as "unknown domains".
- Failed to open DNS input file
- The first time you use DNS lookups, you don't have a DNS cache file, so
you get this warning. Assuming you are using DNS WRITE, the
message will go away next time you run analog.
- DNS lock file already exists
- To stop two copies of analog trying to write the DNS file at the same
time, an empty "lock file" is created, which tells the second
copy of analog to use DNS LOOKUP instead of DNS
WRITE. If analog crashes, it may not delete its lock file. So if you
get the "already exists" message even though no other copy of
analog is running, you may need to delete the lock file yourself.
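The ambiguous-dates warning in the list above is easy to reproduce. This Python sketch tries both readings of a date string; the function name `possible_dates` is invented, and the two format strings are simply the day-first and month-first conventions, not a description of how analog parses dates.

```python
from datetime import datetime

def possible_dates(raw):
    """All distinct dates a d/m/y-or-m/d/y string could mean."""
    found = set()
    for fmt in ("%d/%m/%y", "%m/%d/%y"):
        try:
            found.add(datetime.strptime(raw, fmt).date())
        except ValueError:
            pass  # this reading is impossible, e.g. month 25
    return found

print(sorted(possible_dates("2/1/99")))   # two readings: ambiguous
print(sorted(possible_dates("25/1/99")))  # only one reading makes sense
```

A logfile whose dates all have a day number of 12 or less gives no way to tell the two conventions apart.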
When analog finishes reading a logfile, it checks whether there might have
been something wrong with it.
- Large number of corrupt lines
- This could indicate a problem with the logfile, or with the
LOGFORMAT specification.
The possible causes are described in the section about
Choosing a logfile.
- Logfiles overlap: possible double counting
- This means that two logfiles which were counting the same type of item
overlapped in time. Because it's only based on the time period of the
logfiles, not the actual entries, this may or may not indicate a genuine
problem. It is a problem if you read the same logfile twice. Or maybe
you used the cache file feature
incorrectly.
Or maybe your web server produces several logfiles, and your
LOGFORMAT specification
should have told analog to ignore some of the items in some of the
logfiles. It is not a problem if the logfiles are in fact completely
disjoint; for example if you analyse logfiles from two different virtual
hosts. In this case, the statistics produced will still be correct.
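Because the overlap warning above looks only at the time periods of the logfiles, it behaves like a plain interval test. A minimal Python sketch, with invented dates; the check that the two logfiles count "the same type of item" is not modelled here.

```python
from datetime import datetime

def periods_overlap(first, second):
    """True if two (start, end) periods share any moment in time."""
    (start_a, end_a), (start_b, end_b) = first, second
    return start_a <= end_b and start_b <= end_a

# Hypothetical logfile periods.
september = (datetime(1998, 9, 1), datetime(1998, 9, 30, 23, 59))
late_september = (datetime(1998, 9, 20), datetime(1998, 10, 5))
october = (datetime(1998, 10, 6), datetime(1998, 10, 31))

print(periods_overlap(september, late_september))  # True: warning expected
print(periods_overlap(september, october))         # False
```

Two logfiles from different virtual hosts covering the same month would also overlap by this test, which is why the warning does not always indicate double counting.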
This category is for warnings about logfile formats which might make analog
produce unexpected results.
- Logfile contains lines with no [whatevers], which are being
filtered
- This is usually harmless. It is perhaps best explained by
example. Suppose you are excluding certain
files from the analysis, but that you are also analysing a browser log
which just contains information about the browsers used, not which files
they read. Then we can't exclude the browsers which read the excluded
files, because we don't know which they were, so all browsers will be
included.
- Logfile contains lines with no file names (or bytes): page (or byte)
counts may be low
- If a logfile line doesn't contain a file name, analog will assume that
the request wasn't for a page. Similarly, if it doesn't give the number
of bytes transferred, analog will make the bytes zero. So the number of
page requests or bytes credited to the other items on that line will
then be too low.
- Old-style cache file doesn't contain data on first-request times of
items; so these may be overestimated
- Cache files now contain the first-request time of each item. But if you
read a cache file from an older version of analog, this data will not
have been recorded, and so the last-request time will be used instead.
- Cache file doesn't contain last-seven-day statistics
- It is impossible for cache files to record the number of requests in the
last 7 days, because the data would be wrong at the time the cache file
was read.
This is used when analog turns off an empty report. This could be because none
of the relevant items were included in any of the logfiles, or perhaps
because a LOWMEM command stopped them
being recorded. It is also used when analog turns off a pie chart which would
have contained only one wedge.
This is not an analog-generated warning, but it can result from analog closing
a logfile it's uncompressing without reading the whole of it, when it
determines that it will not need it.
This list is divided into six sections:
- Getting Started
- Basic Configuration
- Understanding the Output
- Advanced Usage
- Form Interface
- Design Decisions
- Getting Started
See also Starting to use
analog.
- Analog doesn't have a setup.exe.
- Analog just flashes up a DOS window and then
quits.
- When I try and edit analog.cfg,
Windows asks me which program I want to use to open that file.
- When I try and compile analog, it gives me an
error (e.g. on SunOS 5).
- Analog didn't write the logfile when I ran
it.
- Analog won't read extended logfiles generated by
IIS.
- What does "Logfile with ambiguous
dates" mean?
- What does this error message mean?
- I tried to run analog from my browser, but it
didn't work.
- Basic Configuration
- I want to make several different statistics
pages. Do I have to install several copies of analog?
- My analog.cfg included lots of
CONFIGFILE commands, but only one report was
produced.
- Why doesn't the Daily Report only show the last
six weeks?
- Why do the time reports all list 0 requests?
- How do I get the Request Report to list files
with fewer than 20 requests?
- How do I ignore accesses from my site?
- How do I ignore internal referrers in the
Referrer Report?
- How do I get information on just my pages, not
everybody's?
- How do I list subdirectories not just top-level
directories in the Directory Report?
- How do I list minor browser versions in the
Browser Summary?
- I used the command "DIREXCLUDE
/mydir/", but files in that directory were still
listed.
- I used the command
"FILEEXCLUDE /cgi-bin/script.pl", but that
file was still listed in the Request Report.
- I used the command "IMAGEDIR
C:\analog\images\", but I only got broken images.
- I want a configuration file with all of the
possible configuration commands in it.
- I want to see your configuration file.
- Does the order of the commands matter in the
configuration file?
- Why are my browser and referrer reports
empty?
- Why isn't the Referrer Report sorted
properly?
- I want to list (or not to list) referrers
with their search arguments in the Referrer Report.
- Why are my click-thru's (or CGI scripts)
not listed in the Request Report?
- I can't find /script.pl?q=1 in the
Request Report.
- Why can't I have P in the
REQCOLS, REQSORTBY or
REQFLOOR?
- Can I find out which files each referrer pointed
to?
or Can I find out which files each
host has read?
or Can I find out which hosts have
read each file?
or Can I find out the number of hosts
visiting on each day?
or lots of similar questions.
- Can I use %d, %m etc. in
the LOGFILE, like I can in the
OUTFILE?
- Can SETTINGS ON produce a
configuration file instead of an English list of settings?
- I get the message "logfiles overlap"
even though the two logfiles contain completely separate
requests.
- Can I get data on individual visitors, or
visits, to my site?
- Can I change the way dates are formatted in the
output?
or Can I change some of the phrases
in the output?
- Can I change the background colour of my
output?
- How can I make the output prettier?
- Understanding the Output
See also What the results
mean.
- How do I find out the number of hits from your
data?
- Why are there so many referrers from my own
site?
- The report covers exactly a week, but the
figures for the last seven days don't agree with the totals.
- I only have 240 requests in total. Why does
analog think there are 840 requests per week?
- Why doesn't analog agree with the counter on my
page?
- Why doesn't analog agree with grepping the
logfile?
- Why do I only get "unresolved numerical
addresses" in the Domain Report and Organisation Report?
- Why are directories listed in the Request
Report?
- When someone reads one of my PDF files, it
scores dozens of hits.
- The Organisation Report doesn't identify
organisations correctly.
- "Organization" isn't spelled
correctly.
- Advanced Usage
- How can I do such-and-such with a command line
option?
- I want a list of all command line arguments.
- How do I list all numerical subdomains to depth
2 in the Domain Report?
- I want to do a HOSTEXCLUDE on some IP
addresses. Can I use a range like 131.111.20.1-127?
- I want to be able to count requests with status
code 301 and 302 as successes, so that they appear in the Request
Report.
- I want to report on a field analog doesn't know
about.
- Can analog analyse FTP logfiles?
- Can analog analyse other logfiles, such as mail
logs, or the syslog?
- How can I run analog automatically every day?
- I'm setting up IIS. Which logfile format should
I use?
- I host lots of virtual domains. How should I set
up analog?
- Can I make several reports with just one run of
analog?
- I ran out of memory when trying to run
analog. What can I do?
- You're processing 20,000,000 requests in under
10 minutes. Why is mine much slower?
or Analog appears to stall.
- How do I make a link on my page that runs
analog?
- Do I have to save all my old logfiles?
or Can analog make statistics from an
old report instead of reading the whole logfile again?
- Can analog write to a database or
spreadsheet?
- Form Interface
See also Form
troubleshooting.
- I couldn't make the form run.
- How can I specify different logfiles from the
form interface?
- I specified
LOGFILE=C:\inetpub\wwwroot\w3svc1\*.log from the form
but it said "Unsafe characters in
LOGFILE".
- My browser showed me anlgform.pl, rather than
running it.
- Why does the form interface give "Document
Returned no Data"?
- The images don't appear when running analog from
the form interface.
- Why do I get some reports that weren't requested
on the form?
- How do I make a link to anlgform.pl
without using anlgform.html?
- Is there a form interface not using Perl
(e.g. ASP or .exe)?
- Design Decisions
- Why doesn't the HEADERFILE replace
the whole <head> of the output file?
- Why not use HTML tables?
- Why are you still using HTML 2.0?
- It would be better if you used png's instead of
gif's.
- Why not just do DNS resolution of the hosts that
actually make it into the Host Report?
- Couldn't you do the DNS lookups faster with
threads?
- Why doesn't analog analyse the error_log?
- My server lists local names in the logfile. Can
you put a common suffix on them automatically?
- Can you extrapolate from the current month's
partial data to produce a prediction for the whole month, based on
the rate so far?
- Can you extend the Domain Report to say which US
states people visited from?
- Why not use language codes instead of country
codes for the names of the language files?
- Why doesn't analog produce statistics on
"visits"?
- Why don't you sell analog?
Most questions in this category are answered in the section entitled
Starting to use analog. If you can't get
analog running you should look there.
- Analog doesn't have a setup.exe.
No, and it doesn't need one. It's already ready to run! See
Starting to use analog under
Windows.
- Analog just flashes up a DOS window and then
quits.
This is the correct behaviour. It should have created a report
called Report.html. See Starting
to use analog under Windows.
- When I try and edit analog.cfg,
Windows asks me which program I want to use to open that file.
Use Notepad, or any other plain text editor.
- When I try and compile analog, it gives me an
error (e.g. on SunOS 5).
Maybe you need to edit the Makefile. There are some
platform-specific notes in the section
Starting to use analog on other
platforms, and in the Makefile itself.
- Analog didn't write the logfile when I ran
it.
Analog doesn't write the logfiles. Your web server writes the
logfiles, and analog just reads them. See
Starting to use analog.
- Analog won't read extended logfiles generated by
IIS.
By default, this server writes the date only at the top of the
logfile, not on
every line. But it doesn't write a new date if the date changes during
the logfile, so analog can't tell which date later entries in the log
occurred on. More details, and what to do about it, are in the section
on Choosing a logfile.
- What does "Logfile with ambiguous
dates" mean?
See the section on Errors and
warnings.
- What does this error message mean?
Again, see the section on Errors and
warnings.
- I tried to run analog from my browser, but it
didn't work.
Analog should not be run as a CGI program, or even put in the folder
with your CGI programs, for security reasons. You should use the special
CGI program instead.
Analog has lots of configuration commands, all of which are in the section on
Customising analog. Here are some of
the most common questions. If your question isn't answered here, you could
also try looking in the index.
- I want to make several different statistics
pages. Do I have to install several copies of analog?
No. Just install it once, and run it with different
configuration files. (You do have
to run it once per output page though.)
- My analog.cfg included lots of
CONFIGFILE commands, but only one report was produced.
Analog can only produce one report per run. To produce several
reports, you have to run it several times.
- Why does the Daily Report only show the last
six weeks?
This is controlled by the
FULLDAYROWS command.
- Why do the time reports all list 0 requests?
They probably only list 0 requests for pages. Maybe you need to use
PAGEINCLUDE to count
more files as pages.
- How do I get the Request Report to list files
with fewer than 20 requests?
Use the REQFLOOR
command, e.g., REQFLOOR 10r to list down to 10
requests. Also, if you want to list all the files, not just pages, you
may need to use the command REQINCLUDE *.
- How do I ignore accesses from my site?
Use the HOSTEXCLUDE command.
- How do I ignore internal referrers in the
Referrer Report?
Use the REFREPEXCLUDE command.
- How do I get information on just my pages, not
everybody's?
Use the FILEINCLUDE command.
- How do I list subdirectories not just top-level
directories in the Directory Report?
Use SUBDIR */*
- How do I list minor browser versions in the
Browser Summary?
Use SUBBROW */*.*
- I used the command "DIREXCLUDE
/mydir/", but files in that directory were still listed.
DIREXCLUDE only affects the Directory Report, not the
other reports. You want "FILEEXCLUDE /mydir/*"
instead.
- I used the command
"FILEEXCLUDE /cgi-bin/script.pl", but that
file was still listed in the Request Report.
If the file has search arguments, you have to be a bit careful with
FILEEXCLUDE. This is described in the section about
search arguments.
- I used the command "IMAGEDIR
C:\analog\images\", but I only got broken images.
The argument to IMAGEDIR has to be a URL, not a directory on
your disk. (It's just inserted into the <img> tags in
the HTML output: have a look at the output and you'll see.) Also this
means that the images have to be put in the part of your filespace that
has your web files.
- I want a configuration file with all of the
possible configuration commands in it.
One is already distributed with the program, in the
examples folder.
- I want to see your configuration file.
This is also included in the distribution in the examples
folder.
- Does the order of the commands matter in the
configuration file?
Only occasionally. If you have two of one command, the later one will
generally override the earlier one. Apart from that, commands can come
in any order, except that LOGFORMAT
and LOGTIMEOFFSET
commands must come before the LOGFILE to which they refer.
- Why are my browser and referrer reports
empty?
Maybe your logfile doesn't contain any browser and referrer
information?
- Why isn't the Referrer Report sorted
properly?
It is sorted properly. But search arguments
are also listed under the file they belong to, and this interrupts the
ordering. If you set the
REFARGSFLOOR high
enough you won't see the search arguments. Or you can include the
N column to make the
ordering more obvious.
- I want to list (or not to list) referrers
with their search arguments in the Referrer Report.
To see the search arguments you may need to set the
REFARGSFLOOR lower. To
avoid seeing them, you could set the REFARGSFLOOR higher, or
alternatively use the
REFARGSEXCLUDE command
to ignore them either for all files or just for particular files.
- Why are my click-thru's (or CGI scripts)
not listed in the Request Report?
If they cause a redirection to another page, they will be listed in
the Redirection Report, rather than the Request Report.
- I can't find /script.pl?q=1 in the
Request Report.
If it causes a redirection, it will be in the Redirection Report not
the Request Report. But also, you may need to set the
REQARGSFLOOR or
REDIRARGSFLOOR lower to
actually see it.
- Why can't I have P in the
REQCOLS, REQSORTBY or REQFLOOR?
The number of page requests doesn't make sense in the Request Report
because it's either the same as the number of requests (if the file is a
page) or zero (if it isn't). If you want to list only pages in this
report, use REQINCLUDE pages instead.
- Can I find out which files each referrer pointed
to?
or Can I find out which files each host has read?
or Can I find out which hosts have read each file?
or Can I find out the number of hosts visiting on each
day?
or lots of similar questions.
There are lots of questions like this. They all want analog to
cross-reference two sorts of item (e.g. files and referrers in the first
example above, or hosts and dates in the last). Granted, these would be
useful. But it is fundamental to analog's speed and minimal memory
requirement that it only records statistics for each type of item
individually, and doesn't record enough information to cross-reference
them afterwards.
What you can do is to restrict the analysis to just requests from
certain referrers (for example) with the
REFINCLUDE command, or to a
particular time period with
FROM and TO.
This is usually good enough.
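The memory argument above can be made concrete with a small sketch. This is not analog's code: the sample requests, the function name and the figures are invented for illustration. It compares the number of counters needed when each type of item is counted separately with the number that full cross-referencing could need, and shows the restricted analysis that REFINCLUDE makes possible.

```python
from collections import Counter

def counters_needed(n_files, n_referrers):
    """Return (separate, cross_referenced) worst-case counter totals."""
    separate = n_files + n_referrers          # one counter per item
    cross_referenced = n_files * n_referrers  # one counter per possible pair
    return separate, cross_referenced

# Hypothetical sample data: one (file, referrer) pair per request.
requests = [
    ("/index.html", "http://a.example/"),
    ("/index.html", "http://b.example/"),
    ("/about.html", "http://a.example/"),
    ("/index.html", "http://a.example/"),
]

# Restricting the analysis to one referrer first (the idea behind
# REFINCLUDE) needs only per-file counters, with no pair table at all.
files_from_a = Counter(f for f, r in requests if r == "http://a.example/")
```

With 10,000 files and 50,000 referrers, counters_needed gives 60,000 separate counters against a possible 500,000,000 pairs, which is why the restricted run is the practical route.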
- Can I use %d, %m etc. in
the LOGFILE, like I can in the
OUTFILE?
No. This would rarely be useful, because you could only get
at today's date that way. If you're on Unix, you can embed the date in
the logfile name using the date command: for example,
analog access.`date +%Y%m%d`.log
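If the logfile you want is yesterday's rather than today's, the same idea can be written as a small wrapper script. The following is a minimal sketch, assuming the same access.YYYYMMDD.log naming as in the command above; the function name is hypothetical, and the script only builds the command rather than running it.

```python
from datetime import date, timedelta

def logfile_name(day, pattern="access.%Y%m%d.log"):
    """Expand the date codes in pattern for the given day."""
    return day.strftime(pattern)

# Yesterday's logfile, which is usually the most recent complete one.
yesterday = date.today() - timedelta(days=1)

# Argument list equivalent to the shell command above, but for yesterday.
# It could be handed to subprocess.run(command) once analog is on the PATH.
command = ["analog", logfile_name(yesterday)]
```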
- Can SETTINGS ON produce a
configuration file instead of an English list of settings?
No. But it does tell you which configuration files it read, so you
can just get the commands out of them. Or if you want a list of all
configuration commands, there is one in the examples
directory.
- I get the message "logfiles overlap"
even though the two logfiles contain completely separate requests.
This message is based only on the dates of the files, not the
contents. If you're sure there is no problem, you can turn it off with
the command WARNINGS -L.
- Can I get data on individual visitors, or
visits, to my site?
No, it's not technically possible, and don't believe any program
which tells you it is. See the section on
How the web works for details.
- Can I change the way dates are formatted in the
output?
or Can I change some of the phrases in the output?
Yes, by editing the language
file.
- Can I change the background colour of my
output?
Yes. The correct way to do this is to write a style sheet, and then
use the STYLESHEET
command.
- How can I make the output prettier?
There are some programs on the helper
applications page to do this.
Most of the questions in this category are answered in the section on
What the results mean, which I really
recommend you read if you want to understand what analog is telling you.
- How do I find out the number of hits from your
data?
I don't use the word hits, because people use it in
different ways, so it's misleading. I use requests for the
number of transfers of any type of file (text, graphics, ...), and
page requests for the number of transfers of HTML pages. See the
section on Analog's definitions
for more information.
- Why are there so many referrers from my own
site?
These come from all the internal links on your site, and all the
graphics on your pages. See the section on
How the web works for more
information. If you don't want to see them, you can use
REFREPEXCLUDE to
exclude them.
- The report covers exactly a week, but the
figures for the last seven days don't agree with the totals.
The figures in parentheses are for the seven days before the time
the program was run, unless there is a TO command. They
are never for the seven days before the end of the logfile.
(Although if you know that the logfile only contains entries up to a
certain time, you may want to include a TO command for that
time to get the last seven days' data right.)
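The rule described above can be sketched as follows. This is an illustration of the rule as stated, not analog's code; the function names are invented, and treating the window as exactly 7 x 24 hours with an inclusive end is an assumption.

```python
from datetime import datetime, timedelta

def last_seven_days(run_time, to_time=None):
    """Return the (start, end) window: it ends at TO if given, else at run time."""
    end = to_time if to_time is not None else run_time
    return end - timedelta(days=7), end

def count_in_window(request_times, window):
    """Count requests falling inside the window (start exclusive, end inclusive)."""
    start, end = window
    return sum(1 for t in request_times if start < t <= end)

# Hypothetical logfile: one request at noon on each of 1-7 January 2000.
request_times = [datetime(2000, 1, d, 12, 0) for d in range(1, 8)]

# Analog is run on the morning of 10 January, three days after the log ends.
run_time = datetime(2000, 1, 10, 9, 0)
without_to = count_in_window(request_times, last_seven_days(run_time))
with_to = count_in_window(
    request_times, last_seven_days(run_time, to_time=datetime(2000, 1, 8)))
```

Here without_to is 5 and with_to is 7: only the second figure agrees with the total for the week the logfile covers.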
- I only have 240 requests in total. Why does
analog think there are 840 requests per week?
If you have 240 requests in two days, that's a rate of 840 requests
per week. Just like if you drove 28 miles in 20 minutes, you'd have
driven at 84 miles per hour.
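The arithmetic is simple scaling, as this short sketch shows. The function names are made up for the example.

```python
def requests_per_week(requests, days):
    """Scale a request count observed over some days to a weekly rate."""
    return requests * 7 / days

def miles_per_hour(miles, minutes):
    """Scale a distance covered in some minutes to an hourly speed."""
    return miles * 60 / minutes

weekly = requests_per_week(240, 2)  # 240 requests in two days
speed = miles_per_hour(28, 20)      # 28 miles in 20 minutes
```

Both are rates, not totals: weekly comes to 840 and speed to 84, even though only 240 requests were made and only 28 miles were driven.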
- Why doesn't analog agree with the counter on my
page?
There are lots of possible reasons. Do they both start from
the same date? Are you just looking at requests for that one page with
analog, not for all your other pages and graphics? Also, analog will
record all requests to that page; if your counter is a graphic, it will
only measure requests from people on graphical browsers who reached
that place on the page.
- Why doesn't analog agree with grepping the
logfile?
Have you understood what analog includes in
its counts? In particular, most reports only list "successful"
requests (HTTP status codes 200-299 & 304). A naïve grep would
count failures too.
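To compare like with like, a hand count has to apply the same filter. The following is a minimal sketch, assuming logfile lines in the common log format and taking "successful" to mean a status code in the 200s or 304; the sample lines and function names are invented, and this is not how analog itself parses logfiles.

```python
import re

# Common log format: host ident user [date] "request" status bytes
LINE_RE = re.compile(r'^\S+ \S+ \S+ \[[^\]]*\] "[^"]*" (\d{3}) \S+')

def is_success(status):
    """Assumed definition of a successful request: 2xx or 304."""
    return 200 <= status <= 299 or status == 304

def count_requests(lines):
    """Return (all parsable lines, successful requests only)."""
    total = successes = 0
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        total += 1
        if is_success(int(match.group(1))):
            successes += 1
    return total, successes

sample = [
    '192.0.2.1 - - [01/Jan/2000:00:00:01 +0000] "GET /index.html HTTP/1.0" 200 1043',
    '192.0.2.2 - - [01/Jan/2000:00:00:02 +0000] "GET /index.html HTTP/1.0" 304 -',
    '192.0.2.3 - - [01/Jan/2000:00:00:03 +0000] "GET /missing.html HTTP/1.0" 404 210',
    '192.0.2.4 - - [01/Jan/2000:00:00:04 +0000] "GET /dir HTTP/1.0" 301 -',
]
total, successes = count_requests(sample)
```

For the four sample lines, a plain line count gives 4 but only 2 requests are successful, which is the kind of gap the question describes.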
- Why do I only get "unresolved numerical
addresses" in the Domain Report and Organisation Report?
Your server only records the numerical IP address of the hosts that
contact you, not their names. Read the section about
DNS lookups, or turn DNS resolution
on in your server.
- Why are directories listed in the Request
Report?
They are not directories, they are pages with the same name as
the directory. For example, I have both a directory called
/analog/ and a page called /analog/ (which happens
to be the same as /analog/index.html).
- When someone reads one of my PDF files, it
scores dozens of hits.
PDF files are often downloaded and read one page at a time, and each
page will then count as a separate request. Although this is not ideal,
it's not clear what to do about it. Analog has no way of knowing
how many pages constituted a single download in the reader's mind. As
usual, we can only reliably report how many requests there were at the
server, not guess what users did with the file later.
- The Organisation Report doesn't identify
organisations correctly.
The rules I use are described in the section on
The domains file.
I admit they aren't perfect, but this is because in domains in which
organisations aren't all at the same level in the domain hierarchy,
there is no way to identify them perfectly without long lists.
- "Organization" isn't spelled
correctly.
Yes it is. If you want American spellings, you have to specify
LANGUAGE US-ENGLISH
in your configuration file.
- How can I do such-and-such with a command line
option?
Use the +C option to put
any configuration command on the command line.
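If you run analog from a script, the +C options can be built as an argument list. This is only a sketch: the helper name is hypothetical, and attaching each command directly after +C as a single argument is an assumption that you should check against the list of command line arguments. The two commands used are the ones from the REQFLOOR answer earlier in this FAQ.

```python
import shlex

def analog_argv(commands, program="analog"):
    """Build an argument list with one +C option per configuration command."""
    return [program] + ["+C" + command for command in commands]

argv = analog_argv(["REQFLOOR 10r", "REQINCLUDE *"])

# argv can be passed to subprocess.run(argv) without going through a shell,
# so the spaces and the * need no extra quoting.
print(shlex.join(argv))  # a shell-quoted form, for display only
```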
- I want a list of all command line arguments.
There is a list in the index.
- How do I list all numerical subdomains to depth
2 in the Domain Report?
SUBDOMAIN *.* deliberately only lists the top-level
numerical subdomains to avoid cluttering the output.
SUBDOMAIN *.*.* will work but will list everything else
to depth 3. So the best solution is
SUBDOMAIN 1*.*,2*.*,3*.*,...
- I want to do a HOSTEXCLUDE on some IP
addresses. Can I use a range like 131.111.20.1-127?
No, but you can use wildcards or regular expressions, which allow
you to specify most ranges very quickly.
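As an illustration, here is one regular expression for the range in the question, checked with Python's re module. The pattern uses only basic constructs (character classes, alternation and anchors), but how a regular expression is passed to HOSTEXCLUDE is not shown here, so treat this as a sketch for testing a pattern, not as analog syntax.

```python
import re

# Last octet from 1 to 127: 1-9, 10-99, 100-119, 120-127.
LAST_OCTET_1_TO_127 = r"(?:[1-9]|[1-9][0-9]|1[01][0-9]|12[0-7])"
HOST_RANGE = re.compile(r"^131\.111\.20\." + LAST_OCTET_1_TO_127 + r"$")

def in_range(host):
    """Return True if host is one of 131.111.20.1 to 131.111.20.127."""
    return HOST_RANGE.match(host) is not None

# Check the pattern against every possible last octet.
matched = [n for n in range(256) if in_range("131.111.20.%d" % n)]
```

Testing against all 256 possible values like this is a cheap way to catch an off-by-one mistake before the pattern goes into a configuration file.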
- I want to be able to count requests with status
code 301 and 302 as successes, so that they appear in the Request
Report.
No, you really don't, because that would lead to double counting
when a request for /dir (code 301) is redirected to
/dir/ (code 200). For CGI scripts etc. look in the
Redirection Report instead of the Request Report.
- I want to report on a field analog doesn't know
about.
Use the following kludge. Write a
LOGFORMAT to declare the field to
be a virtual host or a user (whichever you aren't already using). Then
edit your language file so that the right text is output.
- Can analog analyse FTP logfiles?
Yes. If you are using the xferlog format, then there is a
configuration file to help you in the examples
directory. Otherwise you will have to write your own
LOGFORMAT. (You probably won't be
able to read anything other than the lines corresponding to file
transfers.)
- Can analog analyse other logfiles, such as mail
logs, or the syslog?
Yes and no. For mail logs, there is a program on the
helper applications page to help you. For
other logs, you can get some results out by writing your own
LOGFORMAT. But analog does make
some assumptions about the sort of information it expects on a logfile
line, and the further these assumptions are from being met, the harder
it will be!
- How can I run analog automatically every day?
This depends on your particular machine. On Unix, you need to run
analog as a cron job (see "man cron"). This is my cron command
to run it at 1:50am every day:
50 1 * * * $HOME/bin/analog
On Windows NT you can do the same with the at command. (It's
probably easiest to put it in a batch job; also only an
administrator can run at.) On Windows 98, it should be possible with the
Task Scheduler, although I haven't tried it. On Windows 95 it's not
possible as far as I know.
On Mac, there are programs called
Cron or
CronoTask
to do this.
- I'm setting up IIS. Which logfile format should
I use?
The W3C format is probably best. You can turn fields on and off in
this format. And it contains all the possible fields which can be
logged, which the other formats do not. However, it is important to turn
the date field on (it's off by default), not just to log the date once
at the top: see the section on problems
with logfile formats for why.
- I host lots of virtual domains. How should I set
up analog?
There's a file in the examples directory which discusses
this issue.
- Can I make several reports with just one run of
analog?
Not at the moment. I want to do this in a future version, but it will
require some considerable work. However, depending on which options
you want to vary, you may be able to avoid having to read the logfile
several times by using cache files. (This is
likely to be faster, but more complicated.)
- I ran out of memory when trying to run
analog. What can I do?
See the section on Coping with low memory.
- You're processing 20,000,000 requests in under
10 minutes. Why is mine much slower?
or Analog appears to stall.
If you have DNS lookups on, they are very
slow. Otherwise, it probably depends on the speed of your computer and
disks, and what other programs are running at the same time. You can
use the PROGRESSFREQ
command to see
if it's really stalled or whether it's just being slow. If you are
running out of memory, you might find analog's
LOWMEM commands helpful.
- How do I make a link on my page that runs
analog?
Link to the anlgform program, with the
desired options. But be careful about the load on your server.
- Do I have to save all my old logfiles?
or Can analog make statistics from an old report instead
of reading the whole logfile again?
These questions are answered in the section about
Cache files.
- Can analog write to a database or
spreadsheet?
Use the computer-readable output style,
which can export to CSV. Or if what you really want to do is to run
analog again without re-reading the logfiles, read the section about
Cache files.
There is also a section on troubleshooting in
the documentation about the form interface.
- I couldn't make the form run.
Have you made analog work without the form? Have you run
anlgform.pl from the command line as explained in the section
on troubleshooting?
- How can I specify different logfiles from the
form interface?
Just add a new field to the form with name=LOGFILE.
- I specified
LOGFILE=C:\inetpub\wwwroot\w3svc1\*.log from the form but it
said "Unsafe characters in LOGFILE".
On the form, you can't use wildcards in the LOGFILE
name for security reasons.
- My browser showed me anlgform.pl, rather than
running it.
You have to tell the server to execute the CGI program, not just
send it out like it would for a normal file. Often this is done by
putting it in a special /cgi-bin/ directory.
- Why does the form interface give "Document
Returned no Data"?
If the message only appears after a delay, then probably the server is giving
up before the analog process has finished running. Increase the timeout
interval on the server.
- The images don't appear when running analog from
the form interface.
You probably need to set the
IMAGEDIR. If the images
are in your /cgi-bin/ directory, the server will normally try
to execute them instead of just sending them out.
- Why do I get some reports that weren't requested
on the form?
If a report is neither included nor excluded on the form, the
system default will be used. This will depend on your configuration files
and on compile-time settings.
- How do I make a link to anlgform.pl
without using anlgform.html?
anlgform.pl accepts the GET or POST
methods of form submission. So you can make a link with the arguments
passed after a question mark in the usual GET way.
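Such a link can be built with any URL-encoding routine. In this sketch the script location /cgi-bin/anlgform.pl and the logfile path are assumptions, and LOGFILE is used only because it is the field name mentioned earlier in this section; any other field names must match those in your anlgform.html.

```python
from urllib.parse import urlencode

def form_link(script_url, fields):
    """Return script_url followed by the fields encoded as a GET query string."""
    return script_url + "?" + urlencode(fields)

# Hypothetical script location and logfile path.
link = form_link("/cgi-bin/anlgform.pl", {"LOGFILE": "/var/log/access.log"})
```

The encoding step matters because characters such as / and spaces in a value are not safe to paste into a query string unescaped.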
- Is there a form interface not using Perl
(e.g. ASP or .exe)?
There is a Windows executable version of the Perl script on the
analog helpers page. At the time of writing,
I don't know of any ASP version of the anlgform program, but if someone
writes one, I'll put it on the analog helpers
page too.
Warning: Potential authors must understand CGI
security issues in general, and the extra
issues about what the analog form interface must disallow, or they
will open security holes on their system.
or "Why didn't you do it this way?"
- Why doesn't the HEADERFILE replace
the whole <head> of the output file?
Because you almost never get valid HTML that way. Use a
style sheet instead.
- Why not use HTML tables?
Most non-graphical browsers don't do a good job with tables. Also
tables aren't available in HTML 2.0, which is the sort of HTML
analog writes.
- Why are you still using HTML 2.0?
It seems to be impossible to make my bar charts in HTML 4.0.
- It would be better if you used png's instead of
gif's.
I'm aware of the issues. But png support isn't good enough even in
new browsers; and I have always made a point of designing analog to work
even on old browsers.
- Why not just do DNS resolution of the hosts that
actually make it into the Host Report?
There is one theoretical and one practical problem. Theoretically,
the problem is that the set of hosts which make it into the Host Report can
change once the DNS lookups have been done. And practically, this
wouldn't help identify the busiest countries or organisations, which is
usually what you really want to know. However, there is a Perl script on
the helper applications page to do this.
- Couldn't you do the DNS lookups faster with
threads?
The problem is, the standard commands for DNS lookups are not
thread-safe on many platforms, so it would involve a lot of
platform-specific code. Again, there are programs for specific platforms
on the helper applications page.
- Why doesn't analog analyse the error_log?
The error log is intended for humans rather than computers to read.
So there is no consistent format: even different versions of the same
server have different formats. And there is not much need to analyse it
because analog's various failure reports are good enough for almost all
purposes.
- My server lists local names in the logfile. Can
you put a common suffix on them automatically?
This wouldn't be a good idea by default, because things like
"unknown" would get the suffix. You can always add them using
HOSTALIAS. (There is
an example to accomplish this using regular expressions in the
section about aliases.)
- Can you extrapolate from the current month's
partial data to produce a prediction for the whole month, based on the
rate so far?
No. There are too many problems in trying to produce anything
sensible, especially near the beginning of the month. Different days of
the week and different times of day cause lots of problems. I would
prefer to produce accurate raw data than suspect derived data.
- Can you extend the Domain Report to say which US
states people visited from?
No. Some programs pretend to do this, but you can actually only tell
which state the computer the person was using is in, which may be quite
different from where the user was, for ISPs or other large organisations.
- Why not use language codes instead of country
codes for the names of the language files?
People are more familiar with the country codes, and not all of my
languages have language codes anyway. In any case, the filenames are normally
invisible to the user.
- Why doesn't analog produce statistics on
"visits"?
See How the Web Works.
- Why don't you sell analog?
I didn't write analog for the money, and I'm happy just to see
people use it. Also, because it is open source, lots of people send me
ideas and code to include in future versions. How do you think I got all
those languages? (Of course, if you want to send me money, or gifts in
kind, or even just postcards...).
I welcome mail about analog, both praise and bug reports! I and others are
also usually happy to help people who have trouble with analog: it helps me
to find bugs, and know where the documentation is unclear.
There are three mailing lists for analog.
- analog-announce
- Announcements about analog. I post to this when there are new versions,
for example. Usually only gets a few messages a year.
- analog-help
- Getting help with analog from experienced users. This is the place to go
if you have trouble setting up or configuring the program. Usually you will
get a swift reply. You have to subscribe to the list before you can send a
message. There is also a
searchable
web archive of the list.
- analog-author
- This just goes to me. Use for private comments, or other things that would
not be suitable for the analog-help list: keep support questions to
analog-help. You may or may not get a swift reply, depending how
busy I am with other things.
There is also an independent
Japanese analog
mailing list. And
there are also companies offering support for analog on a commercial basis:
you can find a list of them on the analog home
page.
To receive announcements about analog, send a message to
analog-announce-request@lists.isite.net
with the word subscribe in the main body of the message. Note that
the word has to be in the body of the message, not the subject. Also please
note the -request part of the address.
If you want to get help with analog, please check the following
simple things first.
- Read the FAQ. Maybe I've answered your question
already. If I have, I'll just direct you to the FAQ, not answer it
again.
- If your question is "will this command have that effect," why
not try it and see!
- If you think you've found a bug, read the
list of
known bugs at my site, to see if your bug is already known about.
- Read the other relevant pages of the Readme, particularly the sections
on Starting to use analog and
Customising analog. You may also
find the index useful. I don't appreciate people
who are too lazy to read the documentation. (If the documentation is
unclear, or the relevant paragraph is too well hidden, then that's a
different matter. Of course I want to know about that.)
- Have a look in the
web
archive of the mailing list to see if your question has already been
answered there.
- If analog isn't doing what you thought you asked it to, then run it with
the SETTINGS ON
configuration
command, and see what options it thinks it's meant to be using.
I'm sorry to be so fussy, but a lot of the mail on the list really needn't
have been sent at all, and just wastes the time of everybody on the list. As
I say, I really do welcome genuine mail.
If you still need help, write to the analog-help mailing list. First you have
to subscribe (you can't send mail without subscribing) by sending a message to
analog-help-request@lists.isite.net
with the word subscribe in the main body of the message. (Note that
the word has to be in the body of the message, not the subject.) After you've
received an acknowledgement that you have subscribed, you can send mail to
analog-help@lists.isite.net. (You can still only send mail from the subscribed
address, of course.) Don't try and use the analog-help@ address for
subscribing. It won't work!
Please do the following when you send mail to the list.
- Describe exactly what you did, what you expected, and what the computer
did. Include the exact text of any error messages, not a
précis.
- Mention which version of analog you are using, on which operating
system.
- Give your mail a subject line which indicates immediately what aspect of
analog it is about. (This is useful for the archive.)
- Do not send long files or attachments unless you're
asked to. We do not want to see your configuration file, your header
file, your output file, or any logfile over 10 lines long. They are
almost always useless to us. And anyway, excessively long messages will
be rejected by the mailing list server.
If you want to send a private message to me, you can send it to
analog-author@lists.isite.net.
Please don't use this address for user support questions: I'll just redirect
you to the analog-help list.
Many thanks to ISite for providing these
mailing lists for me, and to The Mail
Archive for archiving the analog-help list.
Some people have written helper applications for analog. These are independent
programs which work together with analog to make certain tasks easier. There
are graphical configuration tools, for example, or tools which post-process
analog's output to produce graphs. There are tools to do the DNS lookups more
quickly, configuration files for certain jobs, and lots of other things.
These helper applications are all listed at the analog site. The list is
constantly changing, so I'm not distributing it with the program. But I
strongly recommend you go to the
analog home page
(or even better, to your local mirror site) and check it out.
There are also some example configuration files in the examples
directory or folder distributed with the program.
Many people have helped me with analog, and I can't thank them all
specifically. But I do appreciate everyone who's given me feedback or sent me
bug reports.
Thanks are due to the author of
getstats,
Kevin Hughes. In the days before analog there were only three serious logfile
analysis programs, and only one of them, getstats, had attractive output. I
wrote analog when getstats stopped being able to cope with the size of our
logfile, but my output still looks somewhat similar to his.
Thanks are also due to all those who helped in the early stages of writing
this program, and gave me the encouragement to continue with analog and to
release it publicly. Those who made helpful suggestions during the first few
weeks of the program are
numerous, but I must mention particularly Dan Anderson, Martyn Johnson,
Joe Ramey, Chris Ritson, Quentin Stafford-Fraser and Dave Stanworth. Above
all Gareth McCaughan gave me lots of programming advice. The program would
have run much more slowly without him.
My employer, the University of
Cambridge Statistical Laboratory, kindly lets me distribute analog from
their web server, and niccx.com
provides the address analog.cx.
Many other people have provided mirror sites for analog, starting with
Dave Stanworth (again!). The full list of mirror sites is given elsewhere;
thanks to all of them. Many thanks also to
ISite for providing the mailing lists, and
to The Mail Archive for archiving
the analog-help list.
Mark Roedel first suggested porting analog to different platforms, and made
the original DOS port. Shortly afterwards, Jason Linhart made the Mac port,
and has continued to contribute lots of extra code for that platform and for
the program in general. The Mac version also includes code contributed by
Stephan Somogyi and Nigel Perry. Later ports were made by Dave Jones, Martin
Zinser & Rick Dyson (OpenVMS), Magnus Hagander (Win32),
Ivan Martinez (OS/2), Nick Smith (Acorn RiscOS), Scott Tadman (BeOS),
Thomas Engel (NeXTSTEP), Martin Kraemer & Holger Schranz (BS2000/OSD,
including EBCDIC support), and Hideyuki Yahagi (AS/400).
Thank you also to the people who make precompiled versions available for
various platforms.
The regular expression parsing is taken from Philip Hazel's excellent
PCRE library. The graphics use
Thomas Boutell's gd library,
the libpng library,
and the zlib library by
Jean-loup Gailly & Mark Adler. The zlib library is also used for logfile
decompression on the Mac. Each of these libraries is subject to its
own copyright and licensing conditions: PCRE
licence, gd licence, libpng
licence, zlib licence.
The BS2000/OSD port includes code developed by the Apache Group for use in
the Apache HTTP server project.
If NEED_MEMMOVE is defined at compile time, then this product
includes software developed by the University of California, Berkeley and its
contributors.
The form interface is based on an idea by James Dean Palmer. The code to
expand wildcards in directory names under Unix is by Owen Cliffe. Thanks to
all the other people who have contributed bits of code too: I apologise for
not having room to name all of them.
Thanks also to those who have written
helper applications, for making analog more usable.
For the translations into other languages, many thanks are due to the
following:
Tigran Nazarian (Armenian), Emir Alikadic (Bosnian),
Luchezar Georgiev (Bulgarian), Francesc Rocher,
M. Mercè Llauge, Francesc Burrull i Mestres & Jordi Vidal (Catalan),
Yang Meng (Simplified Chinese), Andrew Choi & Tzu-hsien Yu (Traditional
Chinese), Tomo Sombolac (Croatian),
Jan Simek & Karel Fajkus (Czech), Adrian Price (Danish),
Ferry van het Groenewoud, Joost Baaij & Dimitry Smagghe (Dutch),
Henrik Huhtinen, Steve Kelly, Andrew Staples & Mikko Silvonen (Finnish),
Patrice Lafont, Lucien Vieira, Jean-Marc Coursimault, Lionel Delaude
& Gordon Macpherson (French), Mario Ellebrecht, Martin Kraemer,
Holger Schranz, Thomas Jacob, Thomas Frings, Georg Schwarz &
Ralf Döring (German),
Dimitris Xenakis (Greek), Laszlo Nemeth & Andras Kemeny (Hungarian),
Gustaf Gustafsson & Valberg Larusson (Icelandic),
Furio Ercolessi, Luca Andreucci, Alessio Bragadini & Marco Bernardini
(Italian), Takayuki Matsuki, Stephen Obenski, Motonobu Takahashi,
Kaori Chikenji & Kazuto Ishigaki (Japanese), Byungkwan Kim &
InChang Oh (Korean), Jurijs Turjanskis & Anda Bimbere (Latvian),
Ingrid (Lithuanian), Jan-Aage Bruvoll, Espen Bjarnø & Pål
Løberg (Norwegian Bokmål), Magni Onsøien (Norwegian
Nynorsk), Wlodek Lapot, Tomek Wozniak & Marcin Sochacki (Polish), Ivan
Martinez & Paulino Michelazzo (Brazilian Portuguese),
Jaime Carvalho e Silva (European Portuguese),
Alex Mihaila (Romanian), San Sanych Timofeev, Boris Litvinenko &
Vyacheslav Nikitich (Russian), Mile Peric (Serbian), Stefan Billik (Slovak),
Andrej Zizmond & Dalibor Cvijetinovič (Slovene),
Javier Solis, Alexander Velasquez, Alfredo Sola, Martin Perez, Nelson Tactuk
& Javier Kohan (Spanish), Björn Malmberg, Frank Osterberg,
Wesley Schaal & Christian Rose (Swedish), Nezih Erkman (Turkish),
and Yaroslav Boychuk (Ukrainian).
Finally, thanks to all of you for using the program!
This section lists the major new features in each version of analog. There's
also a section about how to upgrade from older
versions of analog, listing which commands have changed or been abolished, and
how the output of this version differs from that of previous versions.
- 4.90beta4 (26-Mar-01)
- Wildcards in directory names in LOGFILE commands now work
under Unix. (Thanks to Owen Cliffe for this code.)
- The CHARTDIR and
LOCALCHARTDIR can now contain date codes, in the same way
as the OUTFILE.
- My own configuration file included in the examples directory.
- This version is available in Armenian, Bulgarian, Catalan, English,
US English, French, German, Italian, Japanese, Korean, Latvian, Polish,
Portuguese, Slovene, Swedish and Ukrainian.
- 4.90beta3 (13-Feb-01)
- Security fix for buffer overflow bug.
- 4.90beta2 (05-Feb-01)
- New commands CHARTDIR and
LOCALCHARTDIR.
- The pie charts now work properly in non-European character sets.
- *.asp is no longer included in the
default definition of "pages".
(See how to upgrade).
- The computer-readable output style now gives
the REPORTSPANs.
- On Unix, now follows symlinks when finding the name of the analog binary
to construct other filenames.
- Various other minor bug fixes from 4.90beta1.
- This version is available in Armenian, Bulgarian, Catalan, English,
US English, French, Portuguese, Swedish and Ukrainian.
- 4.90beta1 (22-Jan-01)
- This is the first beta test for version 5.
- Twelve new reports: Yearly Report, Quarterly Report, Five-Minute
Summary, Quarter-Hour Summary, Hour of the Week Summary, Host Failure
Report, Host Redirection Report, Virtual Host Failure Report, Virtual
Host Redirection Report, User Redirection Report, Internal Search Word
Report and Internal Search Query Report.
- Pie charts are now included at the top of most reports, with new
*CHART commands to control
them.
- New command
GENSUMLINES to control
which lines are listed in the General Summary.
- The "Distinct hosts" line now appears in the General
Summary even if you aren't making a Host, Organisation or Domain
Report.
- New commands
DESCRIPTIONS and
DESCFILE to provide descriptions of each report.
- New commands
REPORTSPAN and
REPORTSPANTHRESHOLD to list the time period covered by
each report.
- New commands
ROBOTINCLUDE and
ROBOTEXCLUDE for listing robots in the Operating System
Report.
- New command
INTSEARCHENGINE to
allow the internal search reports.
- New command DNSTIMEOUT
(on some platforms) to reduce the time taken for failed DNS lookups.
- New LINKINCLUDE and
LINKEXCLUDE commands. (See
how to upgrade).
- New columns E and
e for time of first request; and S,
s, Q, q, C and c
for data in last 7 days; also corresponding
SORTBY and
FLOOR commands.
- Substantial internal changes to allow these new columns without
using extra memory if they are not wanted, and to substantially reduce
code size (from what it would have been otherwise!).
- LaTeX output style.
- All files are now looked for in the "right" directories.
This should improve usability substantially, especially from the form
interface. (See how to upgrade).
- If you specify a CACHEFILE command but no
LOGFILE command, analog won't read the default logfile. (See
how to upgrade).
- US English now uses the 12-hour clock by default. (See
how to upgrade).
- Computer-readable output now reports the
busiest time period for the time reports.
- Several commands have changed to better names, but the old names
should still work.
- Version 4.14 was also released on the same date, and this
version includes all its changes.
- This version is only available in English and US English.
- What was new in version 4?
- What was new in version 3?
- What was new in version 2?
- What was new in version 1?
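The 4.90beta4 entry above mentions wildcards in directory names in LOGFILE commands. As a rough illustration of what that kind of expansion means, here is a minimal Python sketch. It is not analog's code (analog is written in C), and the directory layout and the helper name expand_logfile_pattern are hypothetical.

```python
import glob
import os
import tempfile


def expand_logfile_pattern(pattern):
    """Return the sorted list of files matching a pattern whose
    directory part may itself contain wildcards."""
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    # Hypothetical layout: one access.log per virtual host directory.
    with tempfile.TemporaryDirectory() as root:
        for site in ("site-a", "site-b"):
            os.makedirs(os.path.join(root, site))
            with open(os.path.join(root, site, "access.log"), "w") as f:
                f.write("")
        # The wildcard stands in for the directory name, not the file name.
        pattern = os.path.join(root, "*", "access.log")
        for path in expand_logfile_pattern(pattern):
            print(path)
```

A single pattern then picks up one logfile from each matching directory, which is the effect the changelog entry describes.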
This section lists those commands which existed in older versions of analog,
but which have been changed or abolished in this version. It also lists reasons
why the same input might now produce different output. The new features in this
version are listed in the section What's new in
this version?.
- The Operating System Report, Browser Report & Browser Summary are now
sorted by page requests instead of raw requests by default.
- There are some new files (with names beginning with sq) in
the images/ directory. These will need to be copied into your
IMAGEDIR.
- *.asp is no longer included in the
default definition of "pages" (which
it was from 4.11 to 4.90beta1) because there are too many other equally
well qualified candidates. But you can easily re-enable it in
analog.cfg.
- The format of the "busiest time" and floor/sortby lines has
changed in the computer-readable output.
- The default DNSGOODHOURS and
DNSBADHOURS have been increased. In particular, the
default DNSGOODHOURS is now as near infinity as makes no
difference!
- Only some languages have been translated for version 5 so far:
see the list in the What's new?
section.
- All files are now looked for in sensible directories, specified at
compile time, if no other directory is specified. On platforms where
these directories are not known at compile time, analog formerly looked
in the current working directory, but now looks in (its guess at) the
directory of the analog binary.
This change improves usability substantially, especially from the
form interface, by not constantly requiring the user to specify the full
pathname to the analog directory. But it does mean that if you want a
file from the current directory, you now have to specify, for example,
CONFIGFILE ./analog.cfg instead of just
CONFIGFILE analog.cfg.
- The cache file now includes data on the first-request time of each
item. But if you read a cache file from an older version of analog, this
data will not have been recorded, and so the last-request time will be
used instead. Analog will warn you about this.
- If you specify a CACHEFILE command but no LOGFILE
command, analog won't read the default (compile-time) logfile. This is
much more intuitive behaviour, but some users may have been relying on
the old behaviour. The actual rule is given in the documentation on
Cache files.
- Some browsers will be rediagnosed as robots in the Operating System
Report. This will mainly reduce the "OS unknown" total, but
may also reduce other categories.
- US English now uses the 12-hour clock by default. If you
want to continue to use the 24-hour clock, use the language file
us24.lng instead. (Either use a
LANGFILE command, or
rename us24.lng to us.lng). Conversely, you can
get British English with the 12-hour clock by using uk12.lng.
- Because of the twelve new reports, if you use a
REPORTORDER command,
you should include the corresponding new
letters: 1Q76wLlMRjyY.
- Computer-readable output now has an extra
line reporting the busiest time period for the time reports.
- The date codes in the
OUTFILE and CACHEOUTFILE commands now always
produce dates in English.
- The REFLINKINCLUDE command now only controls links in the
Referrer Report. Use REDIRREFLINKINCLUDE and
FAILREFLINKINCLUDE for the Redirected and Failed Referrer
Reports.
- When doing a negative floor, items are
no longer included if they have 0 of the criterion in question, even if
there aren't enough items otherwise. For example FLOOR -25p
will list fewer than 25 items if there aren't 25 items with requests for
pages, even if there are other items with 0 requests for pages.
- When a logfile line contains bytes but no filename, analog previously
ignored the bytes. It now counts them for other items on the line, but
doesn't put them in the File Size Report or General Summary (to avoid
double-counting).
- There are no longer any HTML language files containing HTML entities. So,
for example, the HTML output will always contain a literal é
instead of the code &eacute;. This should make no
difference to the reader, but please do tell me about any problems.
- The source files have moved to the src/ directory. And there
are new source files in subdirectories of the src/ directory.
Also the header files have different variables in them. This means that
automatic build scripts will have to be rewritten.
- Regular expressions in an INCLUDE or EXCLUDE
command must now occur on a line on their own, not within a
comma-separated list.
- The search terms reported in the Search Word and Search Query Reports
are no longer converted to lower case if you are using a multibyte
character set.
- Unprintable characters in the output are now replaced by '?', except for
multibyte character sets.
- *INCLUDE "" and *EXCLUDE "" (see
documentation) now apply to items
which were present but corrupt. This may have the effect of including or
excluding some new lines.
- There has been a tiny change in computer-readable
output style. Previously if a time was blank, it took up only one
column. Now it takes up as many columns as if it had been present.
- There is better parsing of extended format and WebSTAR format logs,
which may cause differences in some cases.
- All referrers now count as "pages" irrespective of
any PAGEINCLUDE and PAGEEXCLUDE commands.
Consequently, for example,
"REFLINKINCLUDE pages" is now the
same as "REFLINKINCLUDE *". You can recreate
the previous behaviour with
"REFLINKINCLUDE *.html,*.htm,*/".
- The default REPORTORDER has changed.
- New anchor names are used within the report.
- There may be slight differences in the results in this version owing to
stripping anchors off filenames.
- Some of the default paths have changed in anlghead.h.
- It is now recommended that you don't run analog as a CGI program, or put
it in the directory with your CGI programs, for
security reasons.
- Each browser in the Browser Summary is now sorted by major version
number then minor version number. So SUBBROW */* will now
only show the major versions. To get all the minor versions, you need
SUBBROW */*.*
- PAGEWIDTH has been replaced by
HTMLPAGEWIDTH and
ASCIIPAGEWIDTH.
- PRINTVARS has been renamed
SETTINGS.
- The form interface has been completely rewritten, and old versions of
anlgform.html will not work with this version.
- The Browser Summary now diagnoses MSIE, Opera and WebTV browsers
better. This will cause differences in output from previous versions.
- With RAWBYTES OFF, bytes are now listed as, for example,
47.68 Mbytes instead of 48,832 kbytes. This should be less confusing.
- The DNS file has a new time encoding. It's only a
few hours different, so I haven't made any special provision for it. The
effect is that the DNSGOODHOURS and
DNSBADHOURS may be a few hours out
for existing entries (but not for new ones).
- Most languages don't work in this beta version, but should be added
again by version 4. (The language files are included in the
distribution, but contain lots of English strings).
- There is a new set of graphics in the images directory, which
you will have to move to your web directory.
- In the Mac version, if a configuration file is dragged onto the analog
icon, it is used instead of, not as well as, the default configuration
file.
- In the computer-readable output style, the
line L7, the time the last seven days begins after, has been
replaced by E7, the time the last seven days ends. This is
for consistency with the other output styles.
- Also in the computer-readable output, there is a new line reporting the
floor and the SORTBY for the report. In 3.11 and earlier,
this didn't exist, and in 3.2 it only reported the floor, not the
SORTBY.
- %R (Mac-style filename) has been abolished in the
LOGFORMAT. Just use plain %r instead.
- It is no longer allowed to set the CACHEOUTFILE to be the
same as a previous cache file.
- The definition of the common log format and related formats changed
between 3.11 & 3.2, and again between 3.2 & 3.3. This could
cause differences in output, but they are likely to be only very minor.
- Lines without a particular item now work properly with
INCLUDE and EXCLUDE commands. For example, if
you do an INCLUDE to look at only certain lines, then lines
without that type of item at all will not now be included, whereas
previously they would have been. This can make the results lower than in
these earlier versions.
- I have reluctantly removed support for NetPresenz logs. This hasn't
worked for some time, and I have already been advising NetPresenz users
not to use newer versions of analog because they could get wrong
results. Unfortunately, fixing it would require a complete rewrite of
the entire parsing code, which isn't going to happen any time soon. So
my advice remains the same: continue to use version 2.11 or (even
better) pre-process your logfiles into a form which analog can handle
safely.
- The English domains file has changed name from domains.tab
to ukdom.tab.
- If using the form interface on Windows, it is now necessary to put the
analog executable at \analog\analog.exe instead of
\Program Files\analog\analog.exe
- LOGFORMAT MICROSOFT has been replaced by
LOGFORMAT MICROSOFT-NA and LOGFORMAT MICROSOFT-INT;
and similarly for LOGFORMAT NETPRESENZ.
- It is possible that there may be small discrepancies between the results
from previous versions and the results from this version because the
parsing code has changed, but any such differences should be minor.
However...
- If you used to use REFEXCLUDE or BROWEXCLUDE, you
most likely now want
REFREPEXCLUDE or
BROWREPEXCLUDE
instead, or you will exclude lots of lines that were previously included.
- It is possible that this version may not automatically parse a logfile
that previous versions could parse, because it checks more carefully
that the logfile is in the format claimed. If so, you will have to use a
LOGFORMAT command.
- Approximate host counting has been abolished. It may be reinstated if
there's a significant demand for it.
- Count of number of new hosts in last seven days abolished. It was too
confusing because it depended on which old logfiles you analysed.
- The Error Report has been abolished (together with the configuration
commands ERROR, ERRLOG and ERRMINOCCS).
See the What's new? page.
- The BROWLOG and REFLOG commands have also been
abolished: just use LOGFILE
instead.
- The HASHSIZE commands have been abolished: analog now chooses
the size of the hash tables itself.
- The MINREQS and similar options have been replaced by the
FLOOR commands.
- Only one * is now allowed on the left-hand side of aliases,
to avoid ambiguities.
- Automatic detection of log type is now on a per-file rather than a
per-line basis.
- ISPAGE is now called
PAGEINCLUDE.
- WITHARGS and REFWITHARGS are now called
ARGSINCLUDE and
REFARGSINCLUDE.
- MONTHLYBACK is now called MONTHBACK.
- FULLHOSTS is now just called HOST.
- LOGOURL is now called LOGO.
- The UNIT commands have been abolished. They weren't very
useful, and they didn't make much sense with the different ways of
displaying the time report bar charts. The unit is now always chosen
automatically.
- DIRLEVEL has been abolished, because the
SUBDIR command is more general.
Use SUBDIR */* or whatever instead.
- Comments aren't allowed in the domains file.
I don't think this should cause a problem.
- GRAPHICAL is abolished. Instead, use lower case letters with
the GRAPH commands.
- NUMLOOKUP has been replaced by
DNS, and DNSFRESHHOURS
by the commands DNSGOODHOURS and
DNSBADHOURS.
- DNS cache files from previous versions are not compatible with this
version.
- You can't use PAGES in the columns or SORTBY or
FLOOR for the Request Report. Use REQINCLUDE pages
instead.
- A lone - as a synonym for none has been abolished in some
places, e.g., HOSTURL.
- The following command line arguments have been abolished from earlier
versions, many of the letters getting new meanings: 7,
l, n, p, s, u,
v, w. (-v has moved to
-settings.) Others have been changed since version 1.2 as
well.
- Filenames for logfiles etc. should now be given DOS-style, with
backslashes, rather than Unix-style with forward slashes.
- There is no longer an automatic progress report. Use the
PROGRESSFREQ command
instead.
- Use *INCLUDE and
*EXCLUDE instead of *ONLY and
*IGNORE.
- The syntax of the *FLOOR commands has changed.
- Use *SORTBY REQUESTS or BYTES instead of
*SORTBY BYREQUESTS or BYBYTES.
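The negative floor rule in the notes above (FLOOR -25p may now list fewer than 25 items) can be sketched in Python. This is only an illustration of the rule as described, not analog's implementation. The function name, the sample data and the alphabetical tie-break are assumptions made for the example.

```python
def negative_floor(counts, n):
    """Pick at most n items with the largest counts, skipping items
    whose count is 0 (they are never used to pad the list)."""
    nonzero = [(name, c) for name, c in counts.items() if c > 0]
    # Largest count first; ties broken by name (an assumption of this sketch).
    nonzero.sort(key=lambda item: (-item[1], item[0]))
    return nonzero[:n]


if __name__ == "__main__":
    # Hypothetical page-request counts per item.
    page_requests = {
        "/index.html": 40,
        "/about.html": 7,
        "/logo.gif": 0,
        "/style.css": 0,
    }
    # Asking for the top 3 yields only 2 items, because the remaining
    # items have 0 requests for pages.
    print(negative_floor(page_requests, 3))
```

Under the old behaviour described above, the items with 0 page requests would have been used to make up the numbers.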
This section lists the new features which were in version 4 of analog.
- What's new in version 5?
- 4.16 (13-Feb-01)
- Security fix for buffer overflow bug.
Small correction to Brazilian Portuguese language file.
- 4.15 (01-Feb-01)
- Bug fixes for accented letters in dates and for EBCDIC character set.
- 4.14 (22-Jan-01)
- Commas are allowed in regular expressions.
Can do a FLOOR beyond Terabytes.
OS X & MPE/iX ports.
Bug fixes, especially for multibyte character sets.
Unprintable characters in the report are now replaced by '?'.
Traditional Chinese, Portuguese, Brazilian Portuguese, US English
and corrected French domains files.
Rewrote the documentation on Cache
files. Added some new data to
How the Web Works.
- 4.13 (11-Oct-00)
- Corrected infelicity in compilation procedure.
- 4.12 (05-Oct-00)
- Recognises Windows Me for Operating System Report.
Can count beyond Terabytes.
PCRE code upgraded to version 3.4.
AS/400 port. Patches to compile cleanly on Cygwin and 64-bit Solaris.
Bulgarian and Croatian language files. Catalan and Finnish domains
files.
Various bug fixes.
New Licence (mostly less restrictive than
the previous one).
- 4.11 (31-May-00)
- The default definition of "pages" is
now case insensitive, and also includes
*.asp.
Reads the extended logs from IIS 5 correctly.
Version number displayed before any warning or debugging messages.
The "number of days" at the top of the report now obeys
DECPOINT (and is also now to 2 decimal places).
Improved OpenVMS build procedure.
Hungarian and Romanian language files, and corrected Spanish
language files and English domains files.
Italian and Spanish form interfaces.
The FAQ now has a list of
contents.
- 4.1 (30-Mar-00)
- Regular expressions in ALIASes and INCLUDEs are
now available on all platforms.
Regular expressions are now Perl-syntax regular expressions.
(Thanks to Philip Hazel's PCRE
library.)
"Repeated fields" in logfile header lines are now allowed.
New commands
STATUSINCLUDE and
STATUSEXCLUDE, and
304ISSUCCESS.
New output style PLAIN
(like ASCII but with accents). New language files for this.
In the computer-readable output,
hierarchical reports now have an extra column, indicating the depth of
the item in the hierarchy.
All referrers now count as "pages." (See
upgrade notes.)
Configuration commands can be continued across lines with a
backslash.
New token %s in
LOGFORMAT, allowing per-line selection of client-name and
client-IP fields.
New log format WEBSTAR-EXTENDED to allow for a small bug
in WebSTAR's implementation of the extended log format.
Korean language files. Also alternative Swedish translation.
- 4.04 (21-Mar-00)
- The analog home page has moved to
www.analog.cx.
New column d in
non-time reports.
The RUNTIME command now turns off the "Program
started at" line as well as the "Running Time" line.
Non-alphanumeric characters are now allowed in the
REPORTORDER as separators.
Correctly parses more APACHELOGFORMATs.
Better detection of Windows 2000 in Operating System Report.
Better warning messages when the erroneous command contains a space.
Code for NeXTSTEP operating system.
Better treatment of multibyte character sets.
Icelandic language files. Corrections to Bosnian, French, Italian,
Japanese & Swedish.
- 4.03 (21-Feb-00)
- Fixed several small bugs.
New command RUNTIME.
Brazilian Portuguese language files and Swedish domains files.
Corrections to Dutch.
- 4.02 (31-Jan-00)
- New command SEARCHCHARCONVERT.
Support for Apache's new %q code in
APACHELOGFORMAT.
Fix for search reports causing crashes on Windows.
New language: Czech. Corrections for Serbian, Slovene and Ukrainian.
- 4.01 (17-Dec-99)
- New command USERCASE.
Some of the default paths have changed in anlghead.h.
Improvements to OpenVMS port.
Language files included for Armenian, Bosnian, Catalan,
traditional Chinese, Dutch, Finnish, German, Italian, Slovak, Slovene,
Spanish, Swedish & Ukrainian; corrections to Russian & Turkish.
- 4.0 (16-Nov-99)
- Simplified Chinese, Danish, Japanese, Portuguese & Serbian language
files included.
Otherwise only small changes since 3.90beta2.
- 3.90beta2 (02-Nov-99)
- It is now recommended that you don't run analog as a CGI program for
security reasons. (The CGI
command is still present, but it is now not documented.)
The Organisation Report is now
hierarchical.
The Browser Summary is now arranged by major version number. (See
notes on upgrading.)
Non-exact bytes are now given to 3 decimal places.
GOTOS FEW puts the "Go
To" lines just at the top and bottom of the output.
PRINTVARS has been renamed
SETTINGS.
-settings output improved,
especially with OUTPUT NONE.
Split PAGEWIDTH into
HTMLPAGEWIDTH and
ASCIIPAGEWIDTH.
Includes language files for French, Greek, Norwegian (Bokmål
& Nynorsk), Polish, Russian and Turkish.
New configuration file examples/big.cfg containing most
commands.
- 3.90beta1 (07-Oct-99)
- First beta test for version 4. The most important new features are:
- Five new reports: Organisation Report, Operating System Report,
Search Word Report, Search Query Report, Processing Time Report.
- Browser Summary improved (will change
results).
- Form interface completely rewritten, and
considerably simplified.
- Multiple *'s now allowed on left-hand side of
ALIASes.
- Regular expressions allowed in
INCLUDEs &
EXCLUDEs, and
ALIASes.
- The output
INCLUDEs and EXCLUDEs now apply to the
lower levels of a hierarchical report
as well as the top level.
- New commands: CGI,
STYLESHEET and
ERRLINELENGTH.
- New column N in most
reports.
- DEBUG C now reports
which part of a corrupt logfile line is corrupt.
- Non-exact bytes are now displayed as, e.g., 47.68 Mbytes instead of
48,832 kbytes. This should be less confusing.
- Timestamps added to
PROGRESSFREQ
reports.
- The DNS file has a new time encoding.
- Header files split up to make anlghead.h simpler.
- Form interfaces in German and U.S. English included.
- New documentation about search arguments.
- New examples directory.
- New licence. (Nearly the same, just
clarified, and slightly loosened).
- What was new in version 3?
- What was new in version 2?
- What was new in version 1?
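The 3.90beta1 entry above gives 47.68 Mbytes as the new form of 48,832 kbytes. The arithmetic behind that example can be checked with a short Python sketch. The factor of 1024 kbytes per Mbyte is an assumption, chosen because it is consistent with the figures quoted (a factor of 1000 would give 48.832). The function name is hypothetical, and how analog rounds the displayed figure is not stated here.

```python
KBYTES_PER_MBYTE = 1024  # assumption, consistent with the 48,832 -> 47.68 example


def kbytes_to_mbytes(kbytes):
    """Convert a kbyte count to Mbytes using binary units."""
    return kbytes / KBYTES_PER_MBYTE


if __name__ == "__main__":
    # Unrounded value behind the "47.68 Mbytes" example.
    print(kbytes_to_mbytes(48832))  # 47.6875
```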
This section lists the new features which were in version 3 of analog.
- What's new in version 5?
- What was new in version 4?
- 3.32 (02-Sep-99)
- Bug fixes, including:
- Drag-and-drop on Mac now works.
- Unsafe characters in hyperlinks now escaped.
- One bug that caused crashes when printing deep Directory Reports
fixed.
New VMS build scripts. Let me know of any compilation problems.
Computer-readable output now reports version of analog used.
Improved some diagnostic messages.
New language Serbo-Croatian; new domains files for Italian and
Russian; corrected Polish language files.
New documentation on
Analog's reports and
Quick reference.
Now uses named anchors throughout the documentation, so that
cross-references link to the right part of a page.
- 3.31 (19-Jun-99)
- New command BARSTYLE;
you will need to use new images.
Russian language file corrected.
Some bug fixes, including one important one correcting cache file
output.
- 3.3 (19-May-99)
- New commands ERRFILE,
DNSLOCKFILE,
APACHELOGFORMAT and
APACHEDEFAULTLOGFORMAT.
Can include the date in the name of the
OUTFILE and the
CACHEOUTFILE.
Support for WebSite logfiles.
New token %U in log
formats for "Unix time" (seconds since 1970).
Won't overwrite old cache files.
Now works properly on SunOS 4.
Fix for occasional crashes on Windows.
Checks language files are not too long.
"Last seven days" data now calculated more accurately and
displayed more clearly.
Computer-readable output now reports SORTBY's as well as
floors.
Revised Makefile will work with older make's.
Corrected Catalan language files.
Includes form interfaces in French and Japanese.
LOGFORMAT documentation now includes the
LOGFORMAT commands for
all built-in log formats.
- 3.2 (04-May-99)
- Bug fixes: in particular REFLINKINCLUDE pages now works;
and cache files now include all items even if they're not wanted for the
main report.
Lines without a particular item now work properly with
INCLUDE and EXCLUDE commands. This can cause
differences in results from previous
versions.
New version of form interface to work round bug in Microsoft
Internet Information Server.
New command NOROBOTS.
Backslashes are now coerced to forward slashes in filenames and
usernames. While not always correct technically, it usually is in
practice, and it makes them behave correctly in other parts of the
program.
Usernames are now treated as case insensitive. Let me know if this
causes a problem on any system.
Computer-readable output style now reports floors.
Rewritten Unix Makefile, and VMS build script. Let me know of any
compilation problems.
New languages: Catalan, Icelandic, Japanese, Korean, Latvian,
Lithuanian. Corrected Spanish language files and French domains file.
LANGUAGE now selects local domains file automatically,
where available.
Removed support for NetPresenz logs. The reasons are in the section
on how to upgrade.
Form interface documentation rewritten;
FAQ broken into sections; sections on
logfiles and log
formats separated and rewritten; new section on
helper applications; and dozens of other
improvements to the documentation.
- 3.11 (26-Nov-98)
- Bug fix version.
Microsoft's attempt at W3 extended format is now understood even if
there is a second #Fields: line in the logfile.
There is also a fix for a new Microsoft bug which results in a
non-standard common format.
Intermittent crashes under Windows fixed.
Mailing lists announced.
- 3.1 (17-Oct-98)
- Understands Microsoft's attempt at W3 extended format.
Several bugs fixed, including one that caused occasional
crashes and one that caused the output to grow and grow.
Form interface works on Windows.
Allows aliases with two or more *'s on left hand side, if
right hand side contains no *'s.
Aliases work properly with CASE INSENSITIVE.
Numerical SUBDOMAINs fixed.
Understands more WebSTAR and Netscape tokens.
Accents in domains file work.
LOGFORMAT removed from form interface as security
risk.
Several warning messages improved.
Report aliases and in/exclusions shown in settings
output.
Character set declared at top of output.
Spanish, Dutch, Norwegian (Bokmål and Nynorsk), Finnish,
Turkish, Greek, Polish, Russian & Chinese language files included.
- 3.0 (15-Jun-98)
- Corrected W3 extended format.
Fix for broken strcmp() function on SunOS 5.
Portuguese, Brazilian Portuguese, Danish and Hungarian language
files included.
Precompiled executable for OS/2 available.
- 2.91beta1 (04-Jun-98)
- Form interface included.
Uses less memory when compiling reports.
New operating system, BS2000/OSD, and code for EBCDIC character set.
New command
DEFAULTLOGFORMAT.
LASTSEVEN and
BASEURL reinstated.
More information added to PRINTVARS output.
AppleScript support for Unix-style command lines added to Mac
version.
Now works on SunOS 4, and other small bug fixes.
French, German, Swedish, Czech, Slovak, Slovene and Romanian
language files included.
One page version of the Readme included in the documentation.
- 2.90beta4 (09-Apr-98)
- Mended DNS cache file reading, which I broke in yesterday's release.
- 2.90beta3 (08-Apr-98)
- Fixed bug that caused a crash while giving warning messages on SunOS;
bug that caused configuration files that called other configuration
files not to be completed; and other smaller bugs.
Italian language files included.
- 2.90beta2 (03-Apr-98)
- Separate LOGFORMATs for North American and international
date formats, when using Microsoft or NetPresenz logs.
Understands the AppleShare IP server's attempt at the WebSTAR format.
Directory report now works properly even if you use the second
argument to the LOGFILE
command.
Wild cards in filenames work properly on the Mac.
Other small bug fixes.
One speed improvement (I gain about 3%).
Several corrections and clarifications to the documentation.
- 2.90beta1 (27-Mar-98)
- This version is a complete rewrite. Every single line of
code is new. The whole code is shorter despite considerable improvements
in functionality. Several people have reported that it is significantly
faster. The most important new features are:
- Eleven new reports (Quarter-Hour, Five-Minute, Redirection,
Failure, File Size, Referring Site, Redirected Referrer, Failed
Referrer, Virtual Host, User, User Failure).
- Reads logfiles in user-customisable format.
- Analyses user and virtual host data, and failed requests.
- Hierarchical reports list subdirectories under directories, and
allow analysis of browser version numbers.
- Faster sorting of long reports.
- Floor and sort method made independent.
- "Last date" column in reports, and can floor and sort by
date.
- Busiest time period at bottom of time reports.
- "Not listed" line at bottom of other reports.
- Knows HTTP/1.1 status codes.
- General Summary can go anywhere in the report.
- General Summary and "Go To"s can now be turned on and
off independently.
- Status Code Report can be sorted in different ways.
- Time offset commands.
- Much better checking of invalid configuration options and invalid
logfile lines.
- Only reads logfiles it might need.
- Improvements in DNS functionality: can now read the DNS file
without further lookups; also, separate recheck intervals for
successful and failed lookups.
- Hash sizes now chosen automatically.
- More flexible language support.
- Mac version reads gzipped logfiles.
- Mac version supports drag-and-drop onto program icon.
- Readme files completely rewritten. Broken into lots of files,
with new sections on Starting to use analog and What the results mean,
as well as an index.
The following features have been abolished.
- No Error Report. The error log was always intended for humans
rather than computers to read. Moreover, its format varied from
server to server, and even between different versions of the same
server. The place of the Error Report has largely been taken by
the new reports, particularly the Failure Report.
- The approximate host counting has been abolished for the time
being. I can put it back if there is a significant demand for it.
- Only one * can now appear on the left-hand side of
aliases. This is to avoid ambiguities.
- For changes in the names and syntax of configuration options and
command line arguments, see the section about upgrading.
The following features are not yet present, but will be added by
version 3.
- The form interface.
- Most of the languages.
This section lists the new features which were in version 2 of analog.
- 2.11 (14-Mar-97)
- Minor bug fixes to yesterday's release.
- 2.1 (13-Mar-97)
- Language support rewritten, causing reduction in code size of 2200 lines.
New configuration command LANGFILE.
New Acorn RiscOS version.
Page requests per day reported.
Bug fix: CASE INSENSITIVE could cause %7E-type
conversions not to take place.
- 2.0.2 (04-Mar-97)
- DNS lookups and wildcards should now work in the Win32 version.
New configuration command PRINTVARS.
Fix for zero length hostnames after DNS lookups.
Minor corrections in French and Spanish translations.
- 2.0 (10-Feb-97)
- New native Win32 version.
Wildcards allowed in filenames on Mac.
Ignores browser "-".
- 1.93beta (18-Jan-97)
- New commands BROWALIAS, CONFIGFILE and
PROGRESSFREQ.
Form program can now call configuration files.
Form program now uses the default choices if none specified.
Domain report prints correctly in preformatted output.
Specifying +1 and +V2 doesn't crash the program.
-v reports dates correctly.
Trailing dots on hostnames removed.
Second argument to LOGFILE command can't be obliterated
by /../
- 1.92beta (08-Oct-96)
- DNS lookups added on Mac.
Netpresenz format understood on Mac.
New languages: Spanish, Italian and Danish.
Extra information when debugging turned on.
*.htm files are now pages on all machines.
A few small bugs fixed.
- 1.91beta4 (13-Jul-96)
- Cache file now includes page request information.
DNS bug fixed.
New command DNSHASHSIZE.
Bug in browser reports fixed.
- 1.91beta3 (09-Jul-96)
- BSD/OS compilation bug believed fixed.
Fixed HOSTALIAS which I broke yesterday.
DNS bug (causing too many lookups) identified,
although not yet fixed.
- 1.91beta2 (08-Jul-96)
- Some bug fixes (including: HOSTEXCLUDE and CASE
INSENSITIVE didn't work properly; selecting "no links"
failed on the form; less fussy about what can appear on the form).
Mac version no longer includes source code, so is much shorter.
- 1.91beta1 (05-Jul-96)
- Now DNS code doesn't look up a name twice, even if one is a failed
request.
- 1.91beta (05-Jul-96)
- Will now output in any of several languages.
Preformatted output introduced.
New File Type Report.
Can limit the number of rows in the time reports.
Number of requests for pages (as opposed to raw requests) now
calculated throughout.
DNS lookup returns, with caching across runs.
Logfiles can include wildcards.
Wildcards can include multiple *'s.
Can process case insensitive logfiles.
OUTPUTALIAS commands introduced.
New commands to specify exactly what is included, and what linked, in
the request report and referrer report.
FILEALIAS a a and FILEALIAS a b; FILEALIAS b c
now work.
New ALLOW options to cancel INCLUDES.
REPSEPCHAR and DECPOINT introduced.
DIRSUFFIX introduced.
Debugging reports number of corrupt lines in other logs.
Hash sizes can now be allocated at run time.
stdin can now be used for any input file, but not for two.
Macintosh version now quits automatically if no warnings have been
issued.
Form interface made more secure.
"Mozilla (compatible)" separated out in Browser Summary.
Major internal changes should improve speed.
Code for non-Unix platforms integrated into main code.
"Referrer" spelled correctly.
Licence introduced.
Update file introduced.
Readme updated to include non-Unix instructions.
- (19-Apr-96)
- First Mac version.
- 1.9beta6
- Two bug fixes (number of bytes was incorrectly reported in some cases,
and -v would overwrite the OUTFILE).
Documentation improved.
- 1.9beta5
- More bug fixes...
- 1.9beta4
- One important bug fix (I broke GRAPHICAL OFF in 1.9beta3).
New form cgi options: ch, gr and
ou=3.
Code shortened.
- (05-Mar-96)
- First DOS version.
- 1.9beta3
- Mainly bug fixes and improved documentation.
Browser and referer reports now include failed requests.
The WARNINGS option can now be specified on the form.
- 1.9beta2
- Small bug fixes
- 1.9beta (06-Feb-96)
- Lots of changes. The most important new features are
- Six new reports (hourly report, browser report, browser summary,
referer report, status code report and error report).
- Analysis of NCSA/Apache referer log, agent log and combined log
formats.
- Graphical time reports that still work on text-based browsers.
- Configurable columns in the time reports.
- Time reports can run backwards.
- Time graphs can be plotted by bytes instead of by requests.
- Can cache old data so that old logfiles need not be kept.
- Can process several logfiles.
- Can combine logfiles from several different hosts.
- Will uncompress compressed logfiles.
- All configuration options can now be specified on the commandline.
- Mandatory configuration file added.
- Lots of new options in the form processing program.
- Wildcards greatly improved throughout.
- Alphabetical host report right-aligned.
- Bytes now quoted as MBytes etc. instead of long number.
- Produces HTML2.0 compliant output.
- New sort method RANDOM (saves time for long reports).
- Floors for reports now work properly.
- Can now specify a report FROM 100 or more days ago.
- Option to turn off warnings.
- Considerable savings in code length over previous versions.
This section lists the new features which were in version 1 of analog.
- 1.2.6
- Minor bug fix; will only affect those with corrupt logfiles.
- 1.2.5
- Minor bug fix for weekly report.
- 1.2.4
- Patch for Spyglass server logfile format.
- 1.2.3
- A couple of bug fixes (wild subdomains sometimes caused crashes).
-v option now gives the version number.
- 1.2.2
- Patch for proxy servers: http:// not translated to
http:/
- 1.2 (11-Nov-95)
- Can configure columns in reports to give percentage requests and number
of bytes.
Wild subdomains (e.g., *.com).
Nameless subdomains.
Subdomains now listed in alphabetical order.
Proper support for numerical hostnames in HOSTIGNORE,
HOSTONLY, SUBDOMAIN and alphabetical sorting.
New BASEURL command allowing statistics to be
displayed on other servers.
Output always says how things are sorted.
"Last 7 days" now behaves sensibly with TO.
Filenames containing /../, /./ and
// translated.
Header and footer options removed from form (for security reasons).
- 1.1 (02-Oct-95)
- Form interface introduced.
ASCII output now possible as well as HTML.
Output file can now be specified in the configuration file.
FROM and TO commands more powerful.
DEBUG and BACKGROUND introduced.
One bug fix: alphabetical sorting doesn't now swap some hostnames.
List of primes included in distribution.
- 1.0 (12-Sep-95)
- Only minor changes since 0.94beta.
- 0.94beta (30-Aug-95)
- New configuration variables SEPCHAR and
REPORTORDER.
New configuration commands WITHARGS and
WITHOUTARGS.
New commandline options +-A and +-x.
(Config.: ALL and GENERAL).
Logfile entries with - as the return code are now regarded
as successes, not corrupt entries.
Fixed bugs in host report when aliases or numerical hosts are
present.
Documentation rewritten.
- 0.93beta (27-Jul-95)
- Approximate hostname counting now possible in fixed memory.
New configuration commands ISPAGE and
ISNOTPAGE.
New commandline option -v.
New configuration command WEEKBEGINSON.
Proper error message when memory exceeded.
Program split into several files.
- 0.92beta (11-Jul-95)
- New reports introduced: hostname, full daily, and weekly.
FROM and TO commands introduced.
Header and footer files introduced.
More helpful warning messages.
Ability to read configuration instructions from stdin.
Subdomain commands moved from domains file to configuration file.
Makefile provided.
- 0.91beta (04-Jul-95)
- Configuration file introduced, enabling many new options.
Some bug fixes and speed improvements.
Ability to print "top n" reports (rather than
"everything higher than n").
Request report can print only pages.
Ability to try and resolve numerical addresses.
Now less fussy about the format of the domains file.
Logo added.
Readme converted to HTML.
- 0.9beta
- More speed improvements, and some bug fixes.
Introduced -u option.
Introduced subdomain analysis.
Included "not modified" replies as successes, not
redirects.
First public release at 0.9beta3. (29-Jun-95)
- 0.89beta (21-Jun-95)
- Commandline arguments.
Efficiency improvements.
Host count and "last 7 day" statistics.
- 0.8beta (14-Jun-95)
- Initial program, just default options.
This section is a list of all of analog's configuration commands, together with a
quick reference to their syntax and some examples. It's designed for those who
are already familiar with the program, so it's pretty useless for learning
the program: to learn about the commands, read the section on
Customising analog instead, or consult the index for a reference. Command line
arguments aren't listed here, but there is a list of them in the index. Not all
commands are available on all platforms.
This section is divided into the following parts:
The syntax for each command is given using the following notation.
"stuff"        the word stuff
x y            x followed by y
(x | y)        x or y
[x]            optional x
subset("...")  any letters from the string, in any order
perm("...")    all the letters from the string, in any order
*x             x may contain wildcards * and ? (and is often a comma-separated list)
x := y         x is defined to be y
COMMAND        the command under discussion
In addition, I use the following names for different types of argument.
char          a single character
string        a string
digit         a digit
number        a non-negative integer (i.e. a string of digits)
real          a non-negative real number
regexp        a Perl-syntax regular expression
file          a filename within your server's filespace;
                e.g. /index.html
localfile     a filename within your system's filespace;
                e.g. /usr/local/analog.html
                  or analog.html
                if no directory specified, placed within suitable
                directory specified at compile-time
localfmtfile  as localfile, but may contain date codes;
                e.g. /usr/local/analog%y%M.html
referrer      a URL of a referring page;
                e.g. http://search.yahoo.com/
URL           a URL which may be absolute, or relative to the output page;
                e.g. images/ or /~fred/images/
                  or http://www.fred.com/images/
fmtURL        as URL, but may contain date codes
Note: I have occasionally opted for clarity above strict accuracy where I
don't think it will cause any confusion!
The syntax for commands in general was given
earlier: remember that an argument which contains a
hash or a space must be put in quotes or parentheses.
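The subset("...") and perm("...") notation above can be checked mechanically. The following Python sketch is only an illustration of the notation, not part of analog; the function names are made up for this example, and it assumes that no letter may be repeated in a subset.

```python
def is_subset(value, letters):
    """True if value uses only letters from the string, in any order.

    Assumption for this sketch: a letter may not be repeated.
    """
    return len(set(value)) == len(value) and set(value) <= set(letters)


def is_perm(value, letters):
    """True if value uses all the letters from the string, in any order."""
    return sorted(value) == sorted(letters)


# MONTHCOLS bRP is a valid subset("RrPpBb")
print(is_subset("bRP", "RrPpBb"))
# "ab" omits a letter, so it is not a valid perm("abc")
print(is_perm("ab", "abc"))
```

A time-report columns argument such as bRP passes the subset check, whereas an argument described with perm must mention every letter exactly once.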
- Syntax
-
LOGFILE (*localfile | "-" | "none") [prefix_string]
OUTFILE (localfmtfile | "-" | "none")
CACHEFILE (*localfile | "-" | "none")
CACHEOUTFILE (localfmtfile | "-" | "none")
UNCOMPRESS *localfile program
- Examples
-
LOGFILE /httpd/logs/*
LOGFILE c:\logs\log1,c:\logs\log2
OUTFILE "Hard Disk:Report%Y%M.html"
UNCOMPRESS *.gz "/usr/bin/gzip -cd"
- Syntax
format_string := (see documentation)
Apache_format_string := (see Apache documentation)
logformat := ("COMMON" | "COMBINED" | "REFERRER" | "BROWSER" | "EXTENDED" |
"MICROSOFT-NA" | "MICROSOFT-INT" | "WEBSITE-NA" | "WEBSITE-INT" |
"MS-EXTENDED" | "WEBSTAR-EXTENDED" | "MS-COMMON" | "NETSCAPE" |
"WEBSTAR" | "AUTO" | format_string)
LOGFORMAT logformat
DEFAULTLOGFORMAT logformat
APACHELOGFORMAT Apache_format_string
APACHEDEFAULTLOGFORMAT Apache_format_string
- Notes
- LOGFORMAT and APACHELOGFORMAT only affect logfiles
occurring later in the same configuration file.
- Examples
-
LOGFORMAT (%S - %u [%d/%M/%Y:%h:%n:%j %j] "%j %r %j" %c %b)
DEFAULTLOGFORMAT MS-EXTENDED
APACHELOGFORMAT (%h %l %u %t \"%r\" %s %b)
- 1. Commands (items)
-
FILEALIAS,
HOSTALIAS,
BROWALIAS,
REFALIAS,
USERALIAS,
VHOSTALIAS
- Syntax
-
COMMAND *olditem newitem
COMMAND ("REGEXP:" | "REGEXPI:")regexp newitem
- Notes
- Aliases item in all reports. Items with the same resultant name are
combined. newitem may contain $1, $2
etc., representing the *'s in olditem or the
bracketed subexpressions in regexp.
- Examples
- FILEALIAS /*/football/* /$1/soccer/$2
- USERALIAS REGEXP:^([^U].*) U$1
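To see how $1 and $2 pick up the *'s in a wildcard alias, here is a small Python sketch of the substitution rule described in the notes above. It is not analog's code: the function name is hypothetical, and it assumes that each * matches greedily and that ? matches exactly one character.

```python
import re


def wildcard_alias(olditem, newitem, name):
    """Apply one wildcard alias to name; return name unchanged if no match."""
    # Turn the wildcard pattern into a regular expression:
    # each * becomes a capturing group, ? matches a single character.
    pieces = []
    for ch in olditem:
        if ch == "*":
            pieces.append("(.*)")
        elif ch == "?":
            pieces.append(".")
        else:
            pieces.append(re.escape(ch))
    match = re.fullmatch("".join(pieces), name)
    if match is None:
        return name

    # Replace $1, $2, ... by the text matched by the corresponding *.
    def fill(ref):
        index = int(ref.group(1))
        if 1 <= index <= match.re.groups:
            return match.group(index)
        return ref.group(0)  # leave unknown references untouched

    return re.sub(r"\$(\d+)", fill, newitem)


# Mirrors: FILEALIAS /*/football/* /$1/soccer/$2
print(wildcard_alias("/*/football/*", "/$1/soccer/$2",
                     "/sports/football/scores.html"))
```

With the FILEALIAS example above, the name /sports/football/scores.html becomes /sports/soccer/scores.html, while a name that doesn't match the pattern is left alone.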
- 2. Commands (reports)
-
TYPEALIAS,
HOSTREPALIAS,
REDIRHOSTALIAS,
FAILHOSTALIAS,
REQALIAS,
REDIRALIAS,
FAILALIAS,
DIRALIAS,
DOMALIAS,
ORGALIAS,
REFREPALIAS,
REFSITEALIAS,
REDIRREFALIAS,
FAILREFALIAS,
BROWREPALIAS,
BROWSUMALIAS,
OSALIAS,
VHOSTREPALIAS,
REDIRVHOSTALIAS,
FAILVHOSTALIAS,
USERREPALIAS,
REDIRUSERALIAS,
FAILUSERALIAS
- Syntax
-
COMMAND *item string
COMMAND ("REGEXP:" | "REGEXPI:")regexp string
- Notes
- Aliases item on one line of one report only. string may contain
$1, $2 etc., representing the *'s in
item or the bracketed subexpressions in
regexp.
- Examples
- REQALIAS /football/ "/football/ (Main football page)"
- REFREPALIAS REGEXP:^(http://([^/]*\.)?(maths|stats)\.uxy\.edu.*) ([$3] $1)
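The regular expression form can be tried out in the same way. This Python sketch uses Python's re module, whose syntax agrees with Perl's for the patterns shown here; the function name is hypothetical, and it assumes that REGEXPI: means a case-insensitive match and that the result replaces the whole name.

```python
import re


def regexp_alias(regexp, string, name, ignore_case=False):
    """Apply one regexp alias to name; return name unchanged if no match."""
    flags = re.IGNORECASE if ignore_case else 0
    match = re.search(regexp, name, flags)
    if match is None:
        return name

    # Replace $1, $2, ... by the bracketed subexpressions of the regexp.
    def fill(ref):
        index = int(ref.group(1))
        if 1 <= index <= match.re.groups:
            return match.group(index) or ""  # unmatched optional group
        return ref.group(0)

    return re.sub(r"\$(\d+)", fill, string)


# Mirrors the REFREPALIAS example above.
pattern = r"^(http://([^/]*\.)?(maths|stats)\.uxy\.edu.*)"
print(regexp_alias(pattern, "[$3] $1", "http://www.maths.uxy.edu/index.html"))
```

Here $3 is the department name and $1 the whole URL, so the referrer is listed as [maths] http://www.maths.uxy.edu/index.html.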
- 3. Other commands: syntax
-
CASE ("SENSITIVE" | "INSENSITIVE")
USERCASE ("SENSITIVE" | "INSENSITIVE")
SEARCHCHARCONVERT ("ON" | "OFF")
DIRSUFFIX suffix
LOGTIMEOFFSET ["+" | "-"] number
TIMEOFFSET ["+" | "-"] number
304ISSUCCESS ("ON" | "OFF")
- Examples
-
CASE SENSITIVE
DIRSUFFIX index.htm
LOGTIMEOFFSET -300
- 1. Commands (items)
-
FILEINCLUDE,
FILEEXCLUDE,
HOSTINCLUDE,
HOSTEXCLUDE,
BROWINCLUDE,
BROWEXCLUDE,
REFINCLUDE,
REFEXCLUDE,
USERINCLUDE,
USEREXCLUDE,
VHOSTINCLUDE,
VHOSTEXCLUDE
- Syntax
-
COMMAND (*item | "")
COMMAND ("REGEXP:" | "REGEXPI:")regexp
- Notes
- Excludes all logfile entries containing an excluded item from all reports.
Includes and excludes are done after aliases, so the item is the
aliased name, if applicable.
- Examples
-
FILEINCLUDE /jim/*,/jane/*
FILEINCLUDE REGEXP:^/~[^/]*/$
HOSTEXCLUDE proxy*.aol.com
USEREXCLUDE ""
- 2. Syntax (including and excluding status codes)
-
range := (number | number "-" number | number "-" | "-" number | "*")
STATUSINCLUDE range [, ranges]
STATUSEXCLUDE range [, ranges]
- Notes
- All numbers must be in the range 100-599.
- Example
-
STATUSINCLUDE 200-299,304
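The range syntax above can be expanded into an explicit set of status codes. The Python sketch below is an illustration only, with a hypothetical function name; it assumes that an open-ended range such as 500- runs to the top of the permitted range (599), and that -199 starts from the bottom (100).

```python
LOWEST, HIGHEST = 100, 599  # all numbers must be in the range 100-599


def parse_status_ranges(spec):
    """Expand a comma-separated list of ranges into a set of status codes."""
    codes = set()
    for part in spec.split(","):
        part = part.strip()
        if part == "*":
            low, high = LOWEST, HIGHEST
        elif "-" in part:
            left, right = part.split("-", 1)
            # Assumption: a missing end means "to the edge of the range".
            low = int(left) if left else LOWEST
            high = int(right) if right else HIGHEST
        else:
            low = high = int(part)
        if not LOWEST <= low <= high <= HIGHEST:
            raise ValueError("bad status range: " + part)
        codes.update(range(low, high + 1))
    return codes


# Mirrors: STATUSINCLUDE 200-299,304
codes = parse_status_ranges("200-299,304")
print(304 in codes, 300 in codes)
```

For the example above, the set contains the hundred codes 200 to 299 together with 304, so 304 is included but 300 is not.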
- 3. Syntax (including and excluding dates)
-
partdate := ["+" | "-"] digit digit
date := partdate partdate partdate [":" partdate partdate]
FROM date
TO date
- Examples
-
FROM 990719:1200
TO -00-0101
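Dates such as -00-0101 are hard to read at a glance, so it can help to split them into their parts according to the grammar above. This Python sketch does only that splitting; it does not work out the resulting date. The function name and the labels year, month, day, hour and minute are this example's own, read off from the 990719:1200 example.

```python
import re

# partdate := ["+" | "-"] digit digit
PARTDATE = r"([+-]?)(\d\d)"
# date := partdate partdate partdate [":" partdate partdate]
DATE = re.compile(PARTDATE * 3 + "(?::" + PARTDATE * 2 + ")?")

LABELS = ["year", "month", "day", "hour", "minute"]


def split_date(text):
    """Split a FROM/TO date into (sign, number) pairs, one per part."""
    match = DATE.fullmatch(text)
    if match is None:
        raise ValueError("not a valid date: " + text)
    groups = match.groups()
    parts = {}
    for i, label in enumerate(LABELS):
        sign, digits = groups[2 * i], groups[2 * i + 1]
        if digits is not None:  # the time part is optional
            parts[label] = (sign, int(digits))
    return parts


print(split_date("990719:1200"))
print(split_date("-00-0101"))
```

The second example splits as year -00, month -01, day 01: the year and month carry a sign and the day does not.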
- 4. Commands (reports)
-
REQINCLUDE,
REQEXCLUDE,
REDIRINCLUDE,
REDIREXCLUDE,
FAILINCLUDE,
FAILEXCLUDE,
TYPEINCLUDE,
TYPEEXCLUDE,
DIRINCLUDE,
DIREXCLUDE,
HOSTREPINCLUDE,
HOSTREPEXCLUDE,
REDIRHOSTINCLUDE,
REDIRHOSTEXCLUDE,
FAILHOSTINCLUDE,
FAILHOSTEXCLUDE,
DOMINCLUDE,
DOMEXCLUDE,
ORGINCLUDE,
ORGEXCLUDE,
REFREPINCLUDE,
REFREPEXCLUDE,
REFSITEINCLUDE,
REFSITEEXCLUDE,
SEARCHQUERYINCLUDE,
SEARCHQUERYEXCLUDE,
SEARCHWORDINCLUDE,
SEARCHWORDEXCLUDE,
INTSEARCHQUERYINCLUDE,
INTSEARCHQUERYEXCLUDE,
INTSEARCHWORDINCLUDE,
INTSEARCHWORDEXCLUDE,
REDIRREFINCLUDE,
REDIRREFEXCLUDE,
FAILREFINCLUDE,
FAILREFEXCLUDE,
BROWREPINCLUDE,
BROWREPEXCLUDE,
BROWSUMINCLUDE,
BROWSUMEXCLUDE,
OSINCLUDE,
OSEXCLUDE,
VHOSTREPINCLUDE,
VHOSTREPEXCLUDE,
REDIRVHOSTINCLUDE,
REDIRVHOSTEXCLUDE,
FAILVHOSTINCLUDE,
FAILVHOSTEXCLUDE,
USERREPINCLUDE,
USERREPEXCLUDE,
REDIRUSERINCLUDE,
REDIRUSEREXCLUDE,
FAILUSERINCLUDE,
FAILUSEREXCLUDE
- Syntax
-
COMMAND *item
COMMAND ("REGEXP:" | "REGEXPI:")regexp
- Notes
- Excludes an excluded item from one report only.
- Example
- REQINCLUDE pages
- 5. Commands (hyperlinks)
- See below.
- 6. Syntax (miscellaneous)
-
PAGEINCLUDE *file
PAGEEXCLUDE *file
ARGSINCLUDE *file
ARGSEXCLUDE *file
REFARGSINCLUDE *referrer
REFARGSEXCLUDE *referrer
ROBOTINCLUDE *browser
ROBOTEXCLUDE *browser
- Notes
- These can be regular expressions too.
- Examples
-
PAGEINCLUDE *.jsp
ROBOTINCLUDE *crawler*
- Syntax
-
DNSFILE localfile
DNS ("NONE" | "READ" | "LOOKUP" | "WRITE")
DNSLOCKFILE localfile
DNSGOODHOURS number
DNSBADHOURS number
DNSTIMEOUT number
- Examples
-
DNSFILE dnscache.txt
DNS WRITE
DNSBADHOURS 48
- Syntax
-
SUBDIR *file
SUBDOMAIN *subdomain
SUBORG *subdomain
SUBTYPE *extension
SUBBROW *browser
REFDIR *referrer
- Examples
-
SUBDIR /jim/*/*
SUBTYPE *.gz
- Commands
- FILELOWMEM,
HOSTLOWMEM,
BROWLOWMEM,
REFLOWMEM,
USERLOWMEM,
VHOSTLOWMEM
- Syntax
-
COMMAND ("0" | "1" | "2" | "3")
- Example
-
HOSTLOWMEM 3
- Commands
- GENERAL,
ALL,
YEARLY,
QUARTERLY,
MONTHLY,
WEEKLY,
DAILYREP,
DAILYSUM,
HOURLYREP,
HOURLYSUM,
WEEKHOUR,
QUARTERREP,
QUARTERSUM,
FIVEREP,
FIVESUM,
HOST,
REDIRHOST,
FAILHOST,
ORGANISATION,
DOMAIN,
REQUEST,
DIRECTORY,
FILETYPE,
SIZE,
PROCTIME,
REDIR,
FAILURE,
REFERRER,
REFSITE,
SEARCHQUERY,
SEARCHWORD,
INTSEARCHQUERY,
INTSEARCHWORD,
REDIRREF,
FAILREF,
BROWSERREP,
BROWSERSUM,
OSREP,
VHOST,
REDIRVHOST,
FAILVHOST,
USER,
REDIRUSER,
FAILUSER,
STATUS
- Syntax
-
REPORTCOMMAND ("ON" | "OFF")
- Examples
-
ALL OFF
HOURLYREP ON
- Commands
- ALLGRAPH,
YEARGRAPH,
QUARTERLYGRAPH,
MONTHGRAPH,
WEEKGRAPH,
DAYREPGRAPH,
DAYSUMGRAPH,
HOURREPGRAPH,
HOURSUMGRAPH,
WEEKHOURGRAPH,
QUARTERREPGRAPH,
QUARTERSUMGRAPH,
FIVESUMGRAPH,
FIVEREPGRAPH
- Syntax
-
COMMAND ("R" | "r" | "P" | "p" | "B" | "b")
- Example
-
ALLGRAPH B
- Commands
- ALLBACK,
YEARBACK,
QUARTERLYBACK,
MONTHBACK,
WEEKBACK,
DAYREPBACK,
HOURREPBACK,
QUARTERREPBACK,
FIVEREPBACK
- Syntax
-
COMMAND ("ON" | "OFF")
- Example
-
ALLBACK ON
- Commands
- YEARROWS,
QUARTERLYROWS,
MONTHROWS,
WEEKROWS,
DAYREPROWS,
HOURREPROWS,
QUARTERREPROWS,
FIVEREPROWS
- Syntax
-
COMMAND number
- Example
-
QUARTERREPROWS 192
- 1. Commands (time reports)
-
TIMECOLS,
YEARCOLS,
QUARTERLYCOLS,
MONTHCOLS,
WEEKCOLS,
DAYREPCOLS,
DAYSUMCOLS,
HOURREPCOLS,
HOURSUMCOLS,
WEEKHOURCOLS,
QUARTERREPCOLS,
QUARTERSUMCOLS,
FIVEREPCOLS,
FIVESUMCOLS
- Syntax
- cols1 := subset("RrPpBb")
COMMAND cols1
- Example
-
MONTHCOLS bRP
- 2. Commands (most success reports)
-
HOSTCOLS,
ORGCOLS,
DOMCOLS,
DIRCOLS,
REFCOLS,
REFSITECOLS,
SEARCHQUERYCOLS,
SEARCHWORDCOLS,
INTSEARCHQUERYCOLS,
INTSEARCHWORDCOLS,
BROWREPCOLS,
BROWSUMCOLS,
OSCOLS,
VHOSTCOLS,
USERCOLS
- Syntax
- cols2 := subset("NDdEeRrSsPpQqBbCc")
COMMAND cols2
- Example
-
USERCOLS BD
- 3. Syntax (Request and File Type Reports)
-
REQCOLS subset("NDdEeRrSspqBbCc")
TYPECOLS subset("NDdEeRrSsBbCc")
- Example
-
TYPECOLS NRb
- 4. Commands (failure, redirection and Status Code reports)
-
REDIRCOLS,
FAILCOLS,
REDIRHOSTCOLS,
FAILHOSTCOLS,
REDIRREFCOLS,
FAILREFCOLS,
REDIRVHOSTCOLS,
FAILVHOSTCOLS,
REDIRUSERCOLS,
FAILUSERCOLS,
STATUSCOLS
- Syntax
- cols4 := subset("NDdEeRrSs")
COMMAND cols4
- Example
-
FAILCOLS D
- 5. Commands (Size and Processing Time Reports)
-
SIZECOLS,
PROCTIMECOLS
- Syntax
- cols5 := subset("DdEeRrSsPpQqBbCc")
COMMAND cols5
- Example
-
SIZECOLS RB
- 1. Commands (most success reports)
-
HOSTSORTBY,
ORGSORTBY,
DOMSORTBY,
DIRSORTBY,
REFSORTBY,
REFSITESORTBY,
SEARCHQUERYSORTBY,
SEARCHWORDSORTBY,
INTSEARCHQUERYSORTBY,
INTSEARCHWORDSORTBY,
BROWREPSORTBY,
BROWSUMSORTBY,
OSSORTBY,
VHOSTSORTBY,
USERSORTBY,
SUBDIRSORTBY,
SUBDOMSORTBY,
SUBORGSORTBY,
SUBBROWSORTBY,
SUBOSSORTBY,
REFDIRSORTBY,
REFARGSSORTBY
- Syntax
sortby1 := ("REQUESTS" | "REQUESTS7" | "PAGES" | "PAGES7" |
"BYTES" | "BYTES7" | "DATE" | "FIRSTDATE" |
"ALPHABETICAL" | "RANDOM")
COMMAND sortby1
- Example
-
DOMSORTBY ALPHABETICAL
- 2. Commands (Request and File Type Reports)
-
REQSORTBY,
TYPESORTBY,
REQARGSSORTBY,
SUBTYPESORTBY
- Syntax
sortby2 := ("REQUESTS" | "REQUESTS7" | "BYTES" | "BYTES7" |
"DATE" | "FIRSTDATE" | "ALPHABETICAL" | "RANDOM")
COMMAND sortby2
- Example
-
REQSORTBY REQUESTS
- 3. Commands (failure, redirection and Status Code reports)
-
REDIRSORTBY,
FAILSORTBY,
REDIRHOSTSORTBY,
FAILHOSTSORTBY,
REDIRREFSORTBY,
FAILREFSORTBY,
REDIRVHOSTSORTBY,
FAILVHOSTSORTBY,
REDIRUSERSORTBY,
FAILUSERSORTBY,
STATUSSORTBY,
REDIRARGSSORTBY,
FAILARGSSORTBY,
REDIRREFARGSSORTBY,
FAILREFARGSSORTBY
- Syntax
sortby3 := ("REQUESTS" | "REQUESTS7" | "DATE" | "FIRSTDATE" |
"ALPHABETICAL" | "RANDOM")
COMMAND sortby3
- Example
-
FAILSORTBY DATE
- Commands (top-level)
-
HOSTFLOOR,
REDIRHOSTFLOOR,
FAILHOSTFLOOR,
ORGFLOOR,
DOMFLOOR,
REQFLOOR,
DIRFLOOR,
TYPEFLOOR,
REDIRFLOOR,
FAILFLOOR,
REFFLOOR,
REFSITEFLOOR,
SEARCHQUERYFLOOR,
SEARCHWORDFLOOR,
INTSEARCHQUERYFLOOR,
INTSEARCHWORDFLOOR,
REDIRREFFLOOR,
FAILREFFLOOR,
BROWREPFLOOR,
BROWSUMFLOOR,
OSFLOOR,
VHOSTFLOOR,
REDIRVHOSTFLOOR,
FAILVHOSTFLOOR,
USERFLOOR,
REDIRUSERFLOOR,
FAILUSERFLOOR,
STATUSFLOOR
- Commands (lower levels)
-
REQARGSFLOOR,
REDIRARGSFLOOR,
FAILARGSFLOOR,
REFARGSFLOOR,
REDIRREFARGSFLOOR,
FAILREFARGSFLOOR,
SUBDIRFLOOR,
SUBDOMFLOOR,
SUBORGFLOOR,
SUBTYPEFLOOR,
SUBBROWFLOOR,
SUBOSFLOOR,
REFDIRFLOOR
- Syntax
-
partdate := ["+" | "-"] digit digit
date := partdate partdate partdate [":" partdate partdate]
COMMAND number ("r" | "s" | "p" | "q")
COMMAND number ["k" | "M" | "G" | "T" | "P" | "E" | "Z" | "Y"] ("b" | "c")
COMMAND real ("%" | ":") ("r" | "s" | "p" | "q" | "b" | "c")
COMMAND date ("d" | "e")
COMMAND "-" number ("r" | "s" | "p" | "q" | "b" | "c" | "d" | "e")
- Notes
- Actually, this syntax isn't quite correct. REQFLOOR,
TYPEFLOOR, REQARGSFLOOR and SUBTYPEFLOOR
aren't allowed to be of type "p" or "q"; and
REDIRFLOOR,
FAILFLOOR, REDIRHOSTFLOOR, FAILHOSTFLOOR,
REDIRREFFLOOR, FAILREFFLOOR, REDIRVHOSTFLOOR,
FAILVHOSTFLOOR, REDIRUSERFLOOR, FAILUSERFLOOR,
STATUSFLOOR, REDIRARGSFLOOR,
FAILARGSFLOOR, REDIRREFARGSFLOOR and
FAILREFARGSFLOOR aren't allowed to be of type "p",
"q", "b" or "c".
- Examples
-
TYPEFLOOR -20r
REQARGSFLOOR 0.1%b
- 1. Commands (most success reports)
-
HOSTCHART,
ORGCHART,
DOMCHART,
REQCHART,
DIRCHART,
REFCHART,
REFSITECHART,
SEARCHQUERYCHART,
SEARCHWORDCHART,
INTSEARCHQUERYCHART,
INTSEARCHWORDCHART,
BROWREPCHART,
BROWSUMCHART,
OSCHART,
VHOSTCHART,
USERCHART,
SIZECHART,
PROCTIMECHART
- Syntax
chart1 := ("ON" | "OFF" | "REQUESTS" | "REQUESTS7" | "PAGES" | "PAGES7" |
"BYTES" | "BYTES7")
COMMAND chart1
- Example
-
DOMCHART BYTES
- 2. Commands (failure, redirection and Status Code reports)
-
REDIRHOSTCHART,
FAILHOSTCHART,
REDIRCHART,
FAILCHART,
REDIRREFCHART,
FAILREFCHART,
REDIRVHOSTCHART,
FAILVHOSTCHART,
REDIRUSERCHART,
FAILUSERCHART,
STATUSCHART
- Syntax
chart2 := ("ON" | "OFF" | "REQUESTS" | "REQUESTS7")
COMMAND chart2
- Example
-
FAILCHART ON
- 3. Syntax (TYPECHART)
-
TYPECHART ("ON" | "OFF" | "REQUESTS" | "REQUESTS7" | "BYTES" | "BYTES7")
- 4. Syntax (ALLCHART)
-
ALLCHART ("ON" | "OFF")
- Syntax
-
REQLINKINCLUDE *file
REQLINKEXCLUDE *file
REDIRLINKINCLUDE *file
REDIRLINKEXCLUDE *file
FAILLINKINCLUDE *file
FAILLINKEXCLUDE *file
REFLINKINCLUDE *referrer
REFLINKEXCLUDE *referrer
REDIRREFLINKINCLUDE *referrer
REDIRREFLINKEXCLUDE *referrer
FAILREFLINKINCLUDE *referrer
FAILREFLINKEXCLUDE *referrer
BASEURL prefix_string
- Notes
- The LINK commands can be regular expressions too.
- Examples
-
REQLINKINCLUDE pages
REFLINKINCLUDE *.cgi,*.cgi?*
BASEURL http://www.mycompany.com
- Syntax
LANGUAGE ("ARMENIAN" | "BOSNIAN" | "BULGARIAN" | "CATALAN" | "SIMP-CHINESE" |
"TRAD-CHINESE" | "CROATIAN" | "CZECH" | "DANISH" | "DUTCH" |
"ENGLISH" | "US-ENGLISH" | "FINNISH" | "FRENCH" | "GERMAN" |
"GREEK" | "HUNGARIAN" | "ICELANDIC" | "ITALIAN" | "JAPANESE" |
"KOREAN" | "LATVIAN" | "LITHUANIAN" | "NORWEGIAN" | "NYNORSK" |
"POLISH" | "PORTUGUESE" | "BR-PORTUGUESE" | "ROMANIAN" | "RUSSIAN" |
"SERBIAN" | "SLOVAK" | "SLOVENE" | "SPANISH" | "SWEDISH" |
"TURKISH" | "UKRAINIAN")
LANGFILE localfile
DOMAINSFILE localfile
DESCFILE localfile
- Notes
- Actually, most of these languages have not yet been translated for version
5, and so are not available.
- Examples
-
LANGUAGE ITALIAN
LANGFILE hindi.lng
- Syntax
-
OUTPUT ("HTML" | "PLAIN" | "ASCII" | "LATEX" | "COMPUTER" | "NONE")
GOTOS ("ON" | "OFF" | "FEW")
RUNTIME ("ON" | "OFF")
DESCRIPTIONS ("ON" | "OFF")
REPORTSPAN ("ON" | "OFF")
REPORTSPANTHRESHOLD number
LASTSEVEN ("ON" | "OFF")
REPORTORDER perm("x1QmWDdHwh4657oZSlLujJkKfsNnBbpvRMcPztiEIYyr")
GENSUMLINES ("ALL" | ["+" | "-"] subset("BCDEFGHIJKLMN"))
IMAGEDIR URL
CHARTDIR fmtURL
LOCALCHARTDIR localfmtfile
NOROBOTS ("ON" | "OFF")
LOGO (URL | "none")
HOSTNAME string
HOSTURL (URL | "none")
HEADERFILE (localfile | "none")
FOOTERFILE (localfile | "none")
STYLESHEET (URL | "none")
SEPCHAR (char | "none")
REPSEPCHAR (char | "none")
DECPOINT char
COMPSEP string
RAWBYTES ("ON" | "OFF")
HTMLPAGEWIDTH number
PLAINPAGEWIDTH number
LATEXPAGEWIDTH number
BARSTYLE ("a" | "b" | "c" | "d" | "e" | "f" | "g" | "h")
MARKCHAR char
MINGRAPHWIDTH number
WEEKBEGINSON ("SUNDAY" | "MONDAY" | "TUESDAY" | "WEDNESDAY" | "THURSDAY" | "FRIDAY" | "SATURDAY")
SEARCHENGINE *referrer comma-separated-strings
INTSEARCHENGINE *file comma-separated-strings
- Examples
- Too many to list. See the documentation on each individual command.
- Syntax
-
SETTINGS ("ON" | "OFF")
DEBUG ("ON" | "OFF" | ["+" | "-"] subset("CDFSU"))
WARNINGS ("ON" | "OFF" | ["+" | "-"] subset("CDEFLMR"))
PROGRESSFREQ number
ERRFILE localfile
ERRLINELENGTH number
- Examples
-
DEBUG ON
DEBUG CF
WARNINGS -DL
PROGRESSFREQ 50000
This is the index for this Readme. Follow the numbers after each name to
find references to that command or concept. Families of commands
are indexed under the second part of the name: for example,
HOSTEXCLUDE is under *EXCLUDE, not under HOST.
This index includes all of analog's configuration commands: if a command you
used in previous versions is not here, see the section on
Upgrading from earlier versions.
All commands are also listed in the Quick
reference with their syntax and examples, and that section is not
cross-referenced from this index.
Acknowledgements [1]
Addresses, numerical [1]
*ALIAS [1]
Aliases [1]
ALL [1]
ALLBACK [1]
ALLGRAPH [1]
analog.cfg [1][2][3][4]
anlgform.html [1]
anlgform.pl [1]
anlghead.h [1][2]
Announcements [1]
APACHEDEFAULTLOGFORMAT [1]
APACHELOGFORMAT [1]
ARGSEXCLUDE [1]
*ARGSFLOOR [1]
ARGSINCLUDE [1]
*ARGSSORTBY [1]
Arguments in URLs [1][2]
ASCII output style [1]
*BACK [1]
Bar charts [1]
BARSTYLE [1]
BASEURL [1]
Basic commands [1]
Broken pipe [1][2]
BROW* commands - see under second part of name
Browser Report [1][2][3][4]
Browser Summary [1][2][3]
BROWREP* commands - see under second part of name
BROWSERREP [1]
BROWSERSUM [1]
BROWSUM* commands - see under second part of name
Bugs, reporting [1]
Bytes, how displayed [1]
Cache files [1]
CACHEOUTFILE [1]
CACHEFILE [1]
CASE [1]
CGI program [1]
*CHART [1]
CHARTDIR [1]
"Click-thru"s [1]
Colours [1]
*COLS [1][2]
Comma separated value output [1]
Command line arguments [1][2][3]
- logfile name (LOGFILE) [1]
- - (LOGFILE stdin) [1]
- 1 (Yearly Report) [1]
- 4 (Quarter-Hour Report) [1]
- 5 (Five-Minute Report) [1]
- 6 (Quarter-Hour Summary) [1]
- 7 (Five-Minute Summary) [1]
- A (All reports) [1]
- a (HTML/PLAIN output) [1]
- B (Browser Report) [1][2]
- b (Browser Summary) [1][2]
- c (Status Code Report) [1][2]
- C (Arbitrary configuration command) [1]
- D (Daily Report) [1]
- d (Daily Summary) [1]
- E (Redirection Report) [1][2]
- F (FROM date) [1]
- f (Referrer Report) [1][2]
- G (Default configuration file) [1]
- g (Other configuration files) [1]
- H (Hourly Report) [1]
- h (Hourly Summary) [1]
- I (Failure Report) [1][2]
- i (Directory Report) [1][2]
- J (User Failure Report) [1][2]
- j (User Redirection Report) [1][2]
- K (Failed Referrer Report) [1][2]
- k (Redirected Referrer Report) [1][2]
- L (Host Failure Report) [1][2]
- l (Host Redirection Report) [1][2]
- M (Virtual Host Failure Report) [1][2]
- m (Monthly Report) [1]
- N (Search Query Report) [1][2]
- n (Search Word Report) [1][2]
- O (Output file) [1]
- o (Domain Report) [1][2]
- P (Processing Time Report) [1]
- p (Operating System Report) [1][2]
- Q (Quarterly Report) [1]
- q (Warnings) [1]
- R (Virtual Host Redirection Report) [1][2]
- r (Request Report) [1][2]
- S (Host Report) [1][2]
- s (Referring Site Report) [1][2]
- settings (Settings of all variables) [1][2]
- T (TO date) [1]
- t (File Type Report) [1][2]
- U (Cache file) [1]
- u (User Report) [1][2]
- V (Debugging) [1]
- v (Virtual Host Report) [1][2]
- version (Just give the version number) [1]
- W (Weekly Report) [1]
- w (Hour of the Week Summary) [1]
- X (Goto's) [1]
- x (General Summary) [1]
- Y (Internal Search Query Report) [1][2]
- y (Internal Search Word Report) [1][2]
- Z (Organisation Report) [1][2]
- z (File Size Report) [1]
Compilation problems [1]
Compiling [1]
Compressed logfiles [1]
COMPSEP [1]
Computer-readable output style [1]
CONFIGFILE [1]
Configuration files [1][2][3][4]
Configuration file, default [1]
Configuration file, mandatory [1]
Contents [1]
Contributors [1]
Cookies [1]
Corrupt logfile lines, definition [1]
Countries [1]
Crashes [1]
CSV output [1]
Customising analog [1]
Daily Report [1][2][3]
Daily Summary [1][2][3]
DAILYREP [1]
DAILYSUM [1]
Date reports [1][2]
Dates, restricting [1]
DAYREP* commands - see under second part of name
DAYSUM* commands - see under second part of name
Debugging [1]
DECPOINT [1]
Default configuration file [1]
Default logfile format [1]
DEFAULTLOGFORMAT [1]
Definitions [1]
DESCFILE [1][2]
DESCRIPTIONS [1]
DIR* commands - see under second part of name
DIRECTORY [1]
Directory Report [1][2][3][4]
DIRSUFFIX [1]
DNS [1]
DNS lookups [1]
DNSBADHOURS [1]
DNSFILE [1]
DNSGOODHOURS [1]
DNSLOCKFILE [1]
DNSTIMEOUT [1]
DOM* commands - see under second part of name
DOMAIN [1]
Domain Report [1][2][3][4][5]
Domains file [1]
DOMAINSFILE [1]
ERRFILE [1]
ERRLINELENGTH [1]
error_log [1][2]
Error Report [1]
Errors [1]
Example reports [1]
Examples of each command [1]
*EXCLUDE [1]
Exclusions [1]
FAIL* commands - see under second part of name
Failed Referrer Report [1][2][3][4]
Failed requests, definition [1]
FAILHOST [1]
FAILHOST* commands - see under second part of name
FAILREF [1]
FAILREF* commands - see under second part of name
FAILURE [1]
Failure Report [1][2][3][4]
FAILUSER [1]
FAILUSER* commands - see under second part of name
FAILVHOST [1]
FAILVHOST* commands - see under second part of name
FAQ [1]
Fatal errors [1]
FILE* commands - see under second part of name
File, definition [1]
File Size Report [1][2][3]
File Type Report [1][2][3][4]
FILETYPE [1]
Filters [1]
First day of week [1]
FIVEREP [1]
FIVEREP* commands - see under second part of name
FIVESUM [1]
FIVESUM* commands - see under second part of name
Five-Minute Report [1][2][3]
Five-Minute Summary [1][2][3]
*FLOOR [1][2][3]
FOOTERFILE [1]
Form interface [1]
Frequently Asked Questions [1]
FROM [1]
GENERAL [1]
General Summary [1][2]
GENSUMLINES [1]
GOTOS [1]
*GRAPH [1]
Graphs [1]
HEADERFILE [1]
Helper applications [1]
Hierarchical reports [1]
Hits [1]
Home page [1]
HOST [1]
HOST* commands - see under second part of name
Host, definition [1]
Host Failure Report [1][2][3]
Host Redirection Report [1][2][3]
Host Report [1][2][3]
HOSTNAME [1]
Hostnames, numerical [1]
HOSTREP* commands - see under second part of name
HOSTURL [1]
Hour of the Week Summary [1][2][3]
Hourly Report [1][2][3]
Hourly Summary [1][2][3]
HOURLYREP [1]
HOURLYSUM [1]
HOURREP* commands - see under second part of name
HOURSUM* commands - see under second part of name
HTML output style [1]
HTMLPAGEWIDTH [1]
IMAGEDIR [1]
*INCLUDE [1]
Inclusions and exclusions [1]
Incremental processing [1]
Internal Search Query Report [1][2][3][4]
Internal Search Word Report [1][2][3][4]
Introduction [1]
INTSEARCHENGINE [1]
INTSEARCHQUERY [1]
INTSEARCHQUERY* commands - see under second part of name
INTSEARCHWORD [1]
INTSEARCHWORD* commands - see under second part of name
IP addresses [1]
LANGFILE [1]
LANGUAGE [1]
Languages [1][2]
LASTSEVEN [1]
LATEX output style [1]
Licence [1][2]
*LINKEXCLUDE [1]
*LINKINCLUDE [1]
LOCALCHARTDIR [1]
LOGFILE [1]
Logfile formats [1][2]
Logfile prefix [1]
Logfiles [1]
Logfiles, choosing [1]
Logfiles, compressed [1]
Logfiles, finding [1]
LOGFORMAT [1]
LOGO [1]
LOGTIMEOFFSET [1]
Low memory [1]
*LOWMEM [1][2]
Mailing lists [1]
Makefile [1]
Mandatory configuration file [1]
Map [1]
MARKCHAR [1]
Meaning of reports [1]
Memory, using less [1]
MINGRAPHWIDTH [1]
MONTH* commands - see under second part of name
MONTHLY [1]
Monthly Report [1][2][3]
Non-time reports [1][2]
NOROBOTS [1]
Numerical addresses [1]
Numerical hostnames [1]
Operating System Report [1][2][3][4]
ORG* commands - see under second part of name
ORGANISATION [1]
Organisations, definition [1]
Organisation Report [1][2][3][4]
OS Report [1][2][3][4]
OS* commands - see under second part of name
OSREP [1]
OUTFILE [1]
OUTPUT [1]
Output aliases [1]
OUTPUT COMPUTER [1][2]
Output, configuring [1]
Output style, computer-readable [1]
Output styles [1]
Page, definition [1]
PAGEEXCLUDE [1]
PAGEINCLUDE [1]
Pages, defining [1]
*PAGEWIDTH [1]
Path through site [1]
Pie charts [1]
PLAIN output style [1]
PLAINPAGEWIDTH [1]
Processing Time Report [1][2][3]
PROCTIME [1]
PROCTIME* commands - see under second part of name
PROGRESSFREQ [1]
QUARTERLY [1]
Quarterly Report [1][2][3]
QUARTERLY* commands - see under second part of name
QUARTERREP [1]
QUARTERREP* commands - see under second part of name
QUARTERSUM [1]
QUARTERSUM* commands - see under second part of name
Quarter-Hour Report [1][2][3]
Quarter-Hour Summary [1][2][3]
Quick reference [1]
RAWBYTES [1]
REDIR [1]
REDIR* commands - see under second part of name
Redirected Referrer Report [1][2][3][4]
Redirected requests, definition [1]
Redirection Report [1][2][3][4]
REDIRHOST [1]
REDIRHOST* commands - see under second part of name
REDIRREF [1]
REDIRREF* commands - see under second part of name
REDIRUSER [1]
REDIRUSER* commands - see under second part of name
REDIRVHOST [1]
REDIRVHOST* commands - see under second part of name
REF* commands - see under second part of name
REFARGSEXCLUDE [1]
REFARGSINCLUDE [1]
REFDIR [1]
Reference, quick [1]
REFERRER [1]
Referrer, definition [1]
Referrer Report [1][2][3][4]
Referring Site Report [1][2][3][4]
REFREP* commands - see under second part of name
REFSITE [1]
REFSITE* commands - see under second part of name
Regular expressions [1][2]
Report descriptions, in documentation [1]
Report descriptions, in output [1]
Report descriptions file [1][2]
Report.html [1][2]
Reporting bugs [1]
REPORTORDER [1]
Reports, list of [1][2]
REPORTSPAN [1]
REPORTSPANTHRESHOLD [1]
REPSEPCHAR [1]
REQ* commands - see under second part of name
REQUEST [1]
Request Report [1][2][3][4]
Requests, definition [1]
Requests for pages, defining [1]
Requests for pages, definition [1]
Requests, types of [1]
Robots, discouraging [1]
Robots, identifying [1]
ROBOTEXCLUDE [1]
ROBOTINCLUDE [1]
*ROWS [1]
RUNTIME [1]
Sample reports [1]
Search arguments [1][2] -- see also Search Query Report and Search Word Report below
Search Query Report [1][2][3][4]
Search Word Report [1][2][3][4]
SEARCHCHARCONVERT [1]
SEARCHENGINE [1]
SEARCHQUERY [1]
SEARCHQUERY* commands - see under second part of name
SEARCHWORD [1]
SEARCHWORD* commands - see under second part of name
Search engines, discouraging [1]
SEPCHAR [1]
SETTINGS [1][2]
SIZE [1]
SIZE* commands - see under second part of name
*SORTBY [1][2][3]
Source code [1]
Spiders, discouraging [1]
Starting to use analog [1]
Starting to use analog on a Mac [1]
Starting to use analog on Windows [1]
Starting to use analog on other platforms [1]
STATUS [1]
Status Code Report [1][2][3]
STATUS* commands - see under second part of name
STYLESHEET [1]
SUBBROW [1]
SUBDIR [1]
Subdirectories [1]
SUBDOMAIN [1]
Subdomains [1]
SUB*FLOOR [1]
SUBORG [1]
SUB*SORTBY [1]
SUBTYPE [1]
Successful requests, definition [1]
Syntax [1][2]
Time reports [1][2]
TIMECOLS [1]
TIMEOFFSET [1]
Times, restricting [1]
Title line [1][2]
TO [1]
Total requests, definition [1]
Translators [1]
Tree reports [1]
TYPE* commands - see under second part of name
UNCOMPRESS [1]
Unknown domains [1][2]
Unresolved numerical addresses [1][2]
Unwanted logfile entries, definition [1]
Upgrading from earlier versions [1]
USER [1]
USER* commands - see under second part of name
USERCASE [1]
User Failure Report [1][2][3]
User Redirection Report [1][2][3]
User Report [1][2][3]
USERREP* commands - see under second part of name
VHOST [1]
VHOST* commands - see under second part of name
VHOSTREP* commands - see under second part of name
Virtual domains/virtual hosts [1][2]
Virtual Host Failure Report [1][2][3]
Virtual Host Redirection Report [1][2][3]
Virtual Host Report [1][2][3]
Visitors [1]
Visits [1]
WARNINGS [1]
Warnings [1][2]
WEEK* commands - see under second part of name
WEEKBEGINSON [1]
WEEKHOUR [1]
WEEKHOUR* commands - see under second part of name
WEEKLY [1]
Weekly Report [1][2][3]
What was new? [1][2][3][4]
What's new? [1][2]
YEAR* commands - see under second part of name
YEARLY [1]
Yearly Report [1][2][3]
304ISSUCCESS [1]
[ A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | numbers ]
Go to the analog home page.
Stephen Turner
26 March 2001
Need help with analog? Use the analog-help mailing list.