LOGFILE logfilenameor just to put the logfile name on the command line without any arguments, e.g., analog logfilename. A - sign or the word stdin is interpreted as standard input: this is useful on Unix systems for constructing pipes. All logfiles must be within your computer's file system (on disk, or at least mounted under Unix, or on a mapped drive under NT) -- analog won't use FTP or HTTP to fetch them from the internet. In the Mac version, you can also analyse a particular single logfile by dragging it onto the analog icon.
You can have several LOGFILE commands. You can include wildcards in the logfile name (but not necessarily in the directory name: this is system-dependent), and you can use a list of logfiles separated by commas (without spaces). So the following commands would tell analog to read logfile1, c:\logs\logfile2, and all files ending in .log:
LOGFILE logfile1,*.log LOGFILE c:\logs\logfile2Or if you were on a Mac, you might use something like
LOGFILE "Hard Drive:Internet Applications:Analog:Logs:*"The LOGFILE commands are cumulative, except that any logfiles on the command line or in user-specified configuration files override any in the default configuration file, and are themselves overridden by any in the mandatory configuration file. There is also the special command
LOGFILE nonewhich erases the list of logfiles specified so far.
If your logfile is not in one of the standard formats, you will probably still be OK, because it is possible to tell analog about other formats using a LOGFORMAT command. This is explained in the next section. But most users don't ever need to know about this because they have logfiles in a standard format. So the best thing to do is just to try analysing your logfile and see if analog will understand it. If it does, you don't need to worry about LOGFORMATs.
If analog can't understand your logfile, it will warn you that it can't detect the format, or possibly that it found a lot of corrupt lines. There are basically four reasons why this might happen:
LOGFILE log1,log2 http://www.%v.mydomain.comwould translate a filename /file.html with virtual host host1 in log1 or log2 to http://www.host1.mydomain.com/file.html. If you are using the second argument to the LOGFILE command, you will probably want to use the SUBDIR command as well.
If %v is included in the argument and the logfile line doesn't have a virtual host, that line will be marked as corrupt. If VHOSTLOWMEM 3 is specified, the %v's will not be translated and will just appear as %v in the output.
UNCOMPRESS *.gz,*.Z /usr/bin/gzcatwhereas on Windows NT, you might use
UNCOMPRESS *.gz ("c:\Program Files\gzip\gzip" -cd)This would be a suitable command to include in the default configuration file.
If analog determines when it starts to uncompress a logfile that that file isn't wanted for the analysis, two undesirable things can happen. Either the program might pause until the logfile is fully uncompressed, or there might be a "broken pipe" error reported. This is system dependent, and out of analog's control.
The common logfile format is written by most servers. Its lines look like
jay.bird.com - fred [25/Dec/1998:17:45:35 +0000] "GET /~sret1/ HTTP/1.0" 200 1243(except all on one line). Some versions of Microsoft software have a buggy version of this with an extra quote mark before the HTTP like this:
jay.bird.com - fred [25/Dec/1998:17:45:35 +0000] "GET /~sret1/ "HTTP/1.0" 200 1243Analog will understand these, but (as with any two formats) it will reject lines if the format changes half way through.
[25/Dec/1998:17:45:35] http://www.site.com/ -> /~sret1/and the browser (or agent) log looks like
[25/Dec/1998:17:45:35] Mozilla/2.0 (X11; I; HP-UX A.09.05)In the referrer log, the date can be omitted.
jay.bird.com - fred [25/Dec/1998:17:45:35 +0000] "GET /~sret1/ HTTP/1.0" 200 1243 "http://www.site.com/" "Mozilla/2.0 (X11; I; HP-UX A.09.05)"(except all one line). If you are using the Apache server, you can generate this with the mod_log_config module, using the command
LogFormat "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-Agent}i\""It is usually better to use the combined log than separate logs, because it stores more information in less space.
192.64.25.41, -, 25/12/98, 17:45:35, W3SVC1, HOST1, 192.16.225.10, 2178, 303, 1243, 200, 0, GET, /~sret1/, -,(except all on one line). However, the format is extremely badly designed, in that the date follows local conventions: in other words, in North America the above example would have the date 12/25/98 instead. Analog will diagnose which form the logfile is in if possible: but if both the date and the month are at most 12, there is no way to tell which format it is. In this case, it will advise you to use the command LOGFORMAT MICROSOFT-NA for North American date format, or LOGFORMAT MICROSOFT-INT for international date format. In some countries, the date will not be in either of these formats, in which case you need to write your own LOGFORMAT command.
There are also various third-party extensions to the Microsoft format to include, for example, the browser and referrer. But they all do it in different ways, so analog can't automatically diagnose them, and again, you need to write a LOGFORMAT command for them.
12/25/98 17:45:35 jay.bird.com host1 Server fred GET /~sret1/ http://www.site.com/ Mozilla/2.0 (X11; I; HP-UX A.09.05) 200 1243 2178(except all on one line, and with the fields separated by tabs). It suffers from the same problem with ambiguous dates as the IIS logfile (above), so again you might have to use LOGFORMAT WEBSITE-NA or LOGFORMAT WEBSITE-INT, or even have to write your own LOGFORMAT command.
If analog finds that the header line is corrupt, it will usually tell you what was wrong with it. Here are two common problems. First, the header line musn't contain the same item twice, even under two different names. (This is because analog doesn't know which one you want to use.) If it does contain the same item twice, you will have to use a LOGFORMAT command to tell analog which one you want to ignore.
Secondly, you're not allowed the time without the date or vice versa -- in particular, having the date just at the top of the logfile is not sufficient; you must have it on each line. Microsoft servers produce extended logs with the date only at the top. But if the date changes during the logfile, the server doesn't then write a new date line. For this reason analog can't analyse such logfiles safely. There are some programs on the helper applications page to put the date on each line. If you already have such a logfile you might want to use one of these programs, but they have to assume that the date doesn't change during the logfile, so it would be safer to tell your server to log in a better format in future.
The extended log is described at http://www.w3.org/TR/WD-logfile.html. Its header line looks like
#Fields: date time cs-uriIn the rest of the logfile, the fields can be separated by spaces or tabs. There is also Microsoft's attempt at the extended format -- unfortunately they didn't read the spec., so they didn't enclose the browser and referrer in quotes, they replaced spaces in the browser name with +'s, and they put the time taken to serve the request in milliseconds instead of seconds. Extended logs always record the time in GMT, so you will probably need to use a LOGTIMEOFFSET command to convert to your local timezone.
The WebSTAR format is described at http://www.starnine.com/webstar/docs/ws4manual.3f.html. It has a header line like
!!LOG_FORMAT DATE TIME RESULT URL BYTES_SENT HOSTNAMEIn the rest of the logfile, the fields are separated by tabs. The WebSTAR server also records the time in GMT, so again you will probably need to use a LOGTIMEOFFSET command to convert to your local timezone. Some other Mac servers also use the WebSTAR format, or something looking like it. Analog will understand these too.
Finally, the Netscape header line looks like
format=%Ses->client.ip% [%SYSDATE%] "%Req->reqpb.clf-request%" %Req->srvhdrs.clf-status% %Req->srvhdrs.content-length%