LOGFILE logfilename
or just to put the logfile name on the command line without any arguments, e.g., analog logfilename. A - sign or the word stdin is interpreted as standard input: this is useful on Unix systems for constructing pipes. The word none means that the list of logfiles specified so far is erased. All logfiles must be on your local disk -- analog doesn't fetch them from across the network. In the Mac version, you can also analyse a particular single logfile by dragging it onto the analog icon.
You can have several LOGFILE commands. You can include wildcards in the logfile name (but not necessarily in the directory name: this is system-dependent), and you can use a list of logfiles separated by commas (without spaces). So the following commands would tell analog to read logfile1, c:\logs\logfile2, and all files ending in .log:
LOGFILE logfile1,*.log
LOGFILE c:\logs\logfile2
The LOGFILE commands are cumulative, except that any logfiles on the command line or in user-specified configuration files override any in the default configuration file, and are themselves overridden by any in the mandatory configuration file.
The reason for the "sometimes" in the previous paragraph is as follows. The Microsoft and Netpresenz formats are extremely badly designed in that the date can occur in either of the forms date/month/year or month/date/year, and they don't say which they're using. Analog will detect them automatically if it can tell which date format is being used (e.g., 13/2/98 or 2/13/98), but if it can't, it will tell you to use one of the LOGFORMAT strings below. Also, the NCSA browser log can only be detected if it includes the date.
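The date ambiguity can be illustrated with a short Python sketch. This is not analog's own detection code: the function name detect_date_order and the return values "INT" and "NA" are made up for the illustration. It only shows why a date such as 13/2/98 settles the question while 3/4/98 does not.

```python
def detect_date_order(dates):
    """Guess the date order from sample dates written as a/b/year.

    Returns "INT" (date/month/year), "NA" (month/date/year),
    or None when every sample is ambiguous.
    """
    for text in dates:
        first, second, _year = (int(part) for part in text.split("/"))
        if first > 12:
            # The first number cannot be a month, so it must be the date.
            return "INT"
        if second > 12:
            # The second number cannot be a month, so the first one is.
            return "NA"
    # Both numbers were at most 12 in every sample: no way to tell.
    return None


print(detect_date_order(["13/2/98"]))
print(detect_date_order(["2/13/98"]))
print(detect_date_order(["3/4/98", "5/6/98"]))
```

When the function returns None, you are in the situation where you have to name the date format yourself.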
There are two types of argument to the LOGFORMAT command: either you can specify a symbolic word, or a log format string. We'll look at the words first.
The command
LOGFORMAT COMMON
will select common format; you can replace COMMON with COMBINED, REFERRER, BROWSER, EXTENDED, MICROSOFT-NA (North American date format), MICROSOFT-INT (international date format), NETSCAPE, WEBSTAR, NETPRESENZ-NA (North American) or NETPRESENZ-INT (international) to get one of the above types of logfile. The command
LOGFORMAT AUTO
will return to automatic detection. The command LOGFILE none also returns the log format to AUTO.
If your logfile is not in one of the recognised formats, you can tell analog about your format using a log format string. You only ever need this if your logfile has lines which are not in one of the standard formats. The format string consists of a template for the logfile line, with the various fields and special characters replaced by codes as follows.
jay.bird.com - fred [14/Mar/1996:17:45:35 +0000] "GET /~sret1/ HTTP/1.0" 200 1243
can be represented by the LOGFORMAT command
LOGFORMAT (%S - %u [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b)
including two items, host and file. (The parentheses are needed because the argument contains spaces.)
Logfiles often contain lines in several different formats, so you can specify several log formats for one file. For example, the definition of common format should also include the line
LOGFORMAT (%S - %u [%d/%M/%Y:%h:%n:%j] "%j %r" %c %b)
to handle lines where the HTTP/1.0 part of the request is absent. Or you might use
LOGFORMAT COMMON
LOGFORMAT COMBINED
to represent a logfile which had lines in both those formats. Analog tries to match the line to the first format first, then if that fails the next, and so on, so the order of the formats is important. Usually you want to specify the most common one first, to minimise the time spent trying to match lines to inappropriate formats.
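The first-match-wins behaviour can be sketched in Python. The regular expressions below are hand-written stand-ins for two log format strings, and the names FORMATS and match_line are invented for this example; analog's own matcher works on LOGFORMAT strings, not on regular expressions.

```python
import re

# Hypothetical stand-ins for two log formats, listed most common first.
FORMATS = [
    ("with protocol",
     re.compile(r'^(\S+) \S+ (\S+) \[([^\]]+)\] "\S+ (\S+) \S+" (\d+) (\S+)$')),
    ("without protocol",
     re.compile(r'^(\S+) \S+ (\S+) \[([^\]]+)\] "\S+ (\S+)" (\d+) (\S+)$')),
]


def match_line(line):
    """Try each format in order and return (format name, fields) for the first hit."""
    for name, pattern in FORMATS:
        found = pattern.match(line)
        if found:
            return name, found.groups()
    return None  # no format matched: the line would count as corrupt


full = 'jay.bird.com - fred [14/Mar/1996:17:45:35 +0000] "GET /~sret1/ HTTP/1.0" 200 1243'
short = 'jay.bird.com - fred [14/Mar/1996:17:45:35 +0000] "GET /~sret1/" 200 1243'
print(match_line(full))
print(match_line(short))
```

Every line that only matches the second format costs one failed attempt against the first, which is why the commonest format belongs at the top of the list.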
The log formats which analog can handle are those which are known as instantaneously decipherable: this means that the character which terminates a string can never occur in the string. In the above example, if the hostname ever contained a space, the line would be marked as corrupt, because analog terminates the host at the first space, not at the first occurrence of space-dash-space, and then the rest of the line wouldn't match. Of course, hostnames should never contain spaces, so this shouldn't be a problem. There are a couple of other restrictions: if there is any date or time information, then the year, month, date, hour and minute must all be present: and the same information may not occur twice in the format (so you can't have both %m and %M, for example).
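To see what ending a field at the first terminating character means in practice, here is a minimal Python sketch. The helper names read_field and parse_host_user are hypothetical, and the code only imitates the start of the format above (host, then " - ", then user); it is not taken from analog.

```python
def read_field(line, start, terminator):
    """Read from start up to the first occurrence of terminator.

    Returns (field, position after the terminator), or None if the
    terminator never occurs.
    """
    end = line.find(terminator, start)
    if end == -1:
        return None
    return line[start:end], end + len(terminator)


def parse_host_user(line):
    """Parse 'host - user', ending the host at the very first space."""
    result = read_field(line, 0, " ")
    if result is None:
        return None
    host, pos = result
    # After the host and its space, the template demands a literal "- ".
    if not line.startswith("- ", pos):
        return None  # the rest of the line doesn't match: corrupt
    return host, line[pos + 2:]


print(parse_host_user("jay.bird.com - fred"))
print(parse_host_user("jay bird.com - fred"))
```

In the second call the host is cut off at "jay", the remainder "bird.com - fred" fails to match the template, and the line is rejected.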
Sometimes you need to read one of the fields in a logfile, but not analyse it. For example, if you have a separate common log and referrer log, the referrer log might look like
[14/Mar/1996:17:48:10] http://guide-p.infoseek.com/Titles -> /~sret1/analog/
But the requests for /~sret1/analog/ would already have been counted when reading the main logfile, so you don't want to count them again now. You get round this by specifying a * in that item in the format string, like this:
LOGFORMAT ([%d/%M/%Y:%h:%n:%j] %f -> %*r)
Any of the seven items can be treated in this way.
Here are the exact rules about which logfile gets which log formats. Log formats accumulate until a LOGFILE command intervenes; or until you specify LOGFORMAT AUTO to return to automatic detection; or until a LOGFILE none command or the end of the command line or of a configuration file, when the format is reset to AUTO implicitly. Conversely, if you specify several logfiles, they will all use the same formats, unless there's another LOGFORMAT command or an implicit return to AUTO format between them.
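These rules can be modelled roughly in Python. The sketch reflects one reading of the paragraph above and is not analog's code: assign_formats is a made-up name, commands are given as (name, argument) pairs, and the implicit reset at the end of a command line or configuration file is left out.

```python
def assign_formats(commands):
    """Map each logfile to the log formats in force when it was named."""
    assignments = {}
    formats = []          # an empty list stands for AUTO
    logfile_seen = False  # has a LOGFILE intervened since the last LOGFORMAT?
    for name, arg in commands:
        if name == "LOGFORMAT":
            if arg == "AUTO":
                formats = []                # explicit return to AUTO
            elif logfile_seen:
                formats = [arg]             # a LOGFILE intervened: start afresh
            else:
                formats = formats + [arg]   # keep accumulating
            logfile_seen = False
        elif name == "LOGFILE":
            if arg == "none":
                assignments.clear()         # forget the logfiles so far
                formats = []                # and fall back to AUTO
                logfile_seen = False
            else:
                for logfile in arg.split(","):
                    assignments[logfile] = list(formats) or ["AUTO"]
                logfile_seen = True
    return assignments


example = [
    ("LOGFORMAT", "COMMON"),
    ("LOGFORMAT", "COMBINED"),
    ("LOGFILE", "log1,log2"),
    ("LOGFILE", "log3"),
    ("LOGFORMAT", "REFERRER"),
    ("LOGFILE", "ref.log"),
    ("LOGFORMAT", "AUTO"),
    ("LOGFILE", "other.log"),
]
for logfile, formats in assign_formats(example).items():
    print(logfile, formats)
```

Here log1, log2 and log3 all share COMMON and COMBINED, ref.log gets REFERRER only, and other.log is back on automatic detection.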
LOGFILE log1,log2 http://www.%v.mydomain.com
would translate a filename /file.html with virtual host spam in log1 or log2 to http://www.spam.mydomain.com/file.html. If you are using the second argument to the LOGFILE command, you will probably want to use the SUBDIR command as well.
If %v is included in the argument and the line doesn't have virtual host, that line will be marked as corrupt. If VHOSTLOWMEM 3 is specified, the %v's will not be translated and will just appear as %v in the output.
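The translation amounts to a simple substitution, sketched here in Python. The function translate_filename is hypothetical and returns None to stand for a corrupt line; the VHOSTLOWMEM 3 case, where %v is left untranslated, is not modelled.

```python
def translate_filename(prefix, filename, vhost=None):
    """Prepend the LOGFILE prefix to a filename, filling in %v with the virtual host."""
    if "%v" in prefix:
        if vhost is None:
            # The prefix needs a virtual host but the line has none.
            return None
        prefix = prefix.replace("%v", vhost)
    return prefix + filename


print(translate_filename("http://www.%v.mydomain.com", "/file.html", "spam"))
print(translate_filename("http://www.%v.mydomain.com", "/file.html"))
```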
LOGTIMEOFFSET -300
LOGFILE summer*.log
LOGTIMEOFFSET -360
LOGFILE winter*.log
While we're on the subject of time offsets, there is one other similar command, which is not directly to do with logfiles. You can specify a TIMEOFFSET command to say how much analog should offset the time of the computer on which it is running, to get your local time.
UNCOMPRESS *.gz,*.Z /usr/bin/gzcat
whereas on Windows NT, you might use
UNCOMPRESS *.gz "c:\Program Files\gzip\gzip -cd"
and on VMS, it could be
UNCOMPRESS *.LOG-GZ;* "gunzip -c"
This would be a suitable command to include in the default configuration file.
If analog determines when it starts to uncompress a logfile that that file isn't wanted for the analysis, two undesirable things can happen. Either the program might pause until the logfile is fully uncompressed, or there might be a "broken pipe" error reported. This is system dependent, and out of analog's control.
The common logfile format is written by most servers. Its lines look like
jay.bird.com - fred [14/Mar/1996:17:45:35 +0000] "GET /~sret1/ HTTP/1.0" 200 1243
Specifying LOGFORMAT COMMON is the same as specifying the three commands
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b)
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j %r" %c %b)
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j" %c %b)
[14/Mar/1996:17:48:10] http://guide-p.infoseek.com/Titles -> /~sret1/analog/
and the browser (or agent) log looks like
[14/Mar/1996:17:45:08] Mozilla/2.0 (X11; I; HP-UX A.09.05 9000/735)
The respective LOGFORMAT commands are
LOGFORMAT ([%d/%M/%Y:%h:%n:%j] %f -> %*r)
LOGFORMAT ([%d/%M/%Y:%h:%n:%j] %B)
In both of these logfiles the date can be omitted, except if the date is omitted in the browser log, analog will not be able to detect the log format automatically. (It doesn't contain enough clues, so there is too much danger of confusing other log formats with it; just use "LOGFORMAT %B").
jay.bird.com - fred [14/Mar/1996:17:45:35 +0000] "GET /~sret1/ HTTP/1.0" 200 1243 "http://www.statslab.cam.ac.uk/" "Mozilla/2.0 (X11; I; HP-UX A.09.05 9000/735)"
except all on one line. If you are using the Apache server, you can generate this with the mod_log_config module, using the command
LogFormat "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-Agent}i\""
The corresponding LOGFORMAT commands are
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b "%f" "%B")
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j %r" %c %b "%f" "%B")
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j" %c %b "%f" "%B")
It is usually better to use the combined log than separate logs, because it stores more information in less space.
The extended log is described at http://www.w3.org/TR/WD-logfile.html. Its header line looks like
#Fields: date time cs-uri
In the rest of the logfile, the fields can be separated by spaces or tabs. The WebSTAR file has a header line like
!!LOG_FORMAT DATE TIME RESULT URL BYTES_SENT HOSTNAME
In the rest of the logfile, the fields are separated by tabs. Some other Mac servers also use the WebSTAR format, or something looking like it. Analog will understand these too. Finally, the Netscape header line looks like
format=%Ses->client.ip% [%SYSDATE%] "%Req->reqpb.clf-request%" %Req->srvhdrs.clf-status% %Req->srvhdrs.content-length%
Sometimes these three logfile formats can contain header lines which refer to the same item in two different ways. Analog doesn't know which one you want to count, so such header lines will generate a "corrupt format line" warning. You can then use a LOGFORMAT command to specify the format more precisely.
192.64.25.41, -, 21/02/97, 00:03:46, W3SVC, SPIDER, 192.16.225.10, 30, 303, 1455, 200, 0, GET, /siege.htm, -,
(except all on one line) or
LOGFORMAT (%S, %u, %d/%m/%y, %h:%n:%j, W3SVC, %j, %v, %j, %j, %b, %c, %j, %j, %r, %j,)
However, the format is extremely badly designed, in that the date follows local conventions: in other words, in North America the above example would have the date 02/21/97 instead. Analog will diagnose which form the logfile is in if possible: but if both the date and the month are at most 12, there is no way to tell which format it is. In this case, you need to use the LOGFORMAT command MICROSOFT-NA for North American date format, or MICROSOFT-INT for international date format.
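As a rough illustration of how the format string lines up with the sample line, here is a Python sketch that splits the line on commas and picks out the fields by position. The function parse_microsoft_line and its day_first parameter are invented for the example, and the field meanings assumed here (%S host, %u user, %v virtual host, %b bytes, %c status code, %r file) are an interpretation of the codes, not something this sketch can confirm.

```python
from datetime import datetime


def parse_microsoft_line(line, day_first):
    """Split a Microsoft-format line and read the date in the stated order.

    day_first=True corresponds to the international form (date/month/year),
    day_first=False to the North American form (month/date/year).
    """
    fields = [field.strip() for field in line.split(",")]
    date_format = "%d/%m/%y" if day_first else "%m/%d/%y"
    when = datetime.strptime(fields[2] + " " + fields[3], date_format + " %H:%M:%S")
    return {
        "host": fields[0],    # %S
        "user": fields[1],    # %u
        "time": when,         # %d/%m/%y, %h:%n:%j
        "vhost": fields[6],   # %v
        "bytes": int(fields[9]),   # %b
        "code": int(fields[10]),   # %c
        "file": fields[13],   # %r
    }


sample = ("192.64.25.41, -, 21/02/97, 00:03:46, W3SVC, SPIDER, 192.16.225.10, "
          "30, 303, 1455, 200, 0, GET, /siege.htm, -,")
record = parse_microsoft_line(sample, day_first=True)
print(record["time"], record["file"], record["code"], record["bytes"])
```

Reading the same line with day_first=False fails, because 21 is not a valid month; a date such as 03/04/97 would parse either way, which is exactly the ambiguous case.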
5:54 pm 14/11/96 134.87.19.110 HTTP get file Research.html Web:Research:Research.html Referer: http://guide-p.infoseek.com/Titles
The fields are separated by tabs. It is equivalent to four LOGFORMAT commands:
LOGFORMAT (%h:%n %aM\t%m/%d/%y\t%S\tHTTP\t\t%C\t%j\t\n%R\nReferer: %f)
LOGFORMAT (%h:%n %aM\t%m/%d/%y\t%S\tHTTP\t\t%C\t%j\t\n%R)
LOGFORMAT (%h:%n %aM\t%m/%d/%y\t%S\tHTTP\t\t%C\t%R)
LOGFORMAT (%j)
Again, the Netpresenz format uses local conventions for the date and time. Analog will diagnose it where it can: otherwise, you will have to use
LOGFORMAT NETPRESENZ-NA # dates like 9:14 AM 3/23/98 (upper case AM)
or
LOGFORMAT NETPRESENZ-INT # dates like 9:14 am 23/3/98 (lower case am)
It can even be that the date and time is in neither of these forms, in which case you will have to enter your own LOGFORMAT string.