The basic command to specify a log format looks like

    LOGFORMAT format

We'll discuss what the formats can be in a minute. The LOGFORMAT command only applies to logfiles specified with a LOGFILE command later in the same configuration file, so you must put the LOGFORMAT above the LOGFILE to which it refers. This way, different logfiles can have different formats, like this:

    LOGFILE log0
    LOGFORMAT format1
    LOGFILE log1
    LOGFORMAT format2
    LOGFILE log2
    LOGFILE log3

In this example, log1 is in format1, log2 and log3 are in format2, and log0 isn't in either format; analog will try to detect which format it's in.
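The placement rule can be modelled in a few lines of Python. This is a minimal sketch, not analog's code: representing the configuration as a list of (keyword, argument) pairs and the name `assign_formats` are assumptions made for illustration. It handles only one LOGFORMAT at a time, and it marks a logfile with no LOGFORMAT above it as undecided (`None`); later in this section the text explains what analog actually does in that case.

```python
def assign_formats(commands):
    """Map each LOGFILE to the LOGFORMAT most recently given above it.

    commands: hypothetical list of (keyword, argument) pairs in the order
    they appear in the configuration file.
    """
    current = None  # no LOGFORMAT seen yet
    result = {}
    for keyword, arg in commands:
        if keyword == "LOGFORMAT":
            current = arg           # applies to every LOGFILE below it
        elif keyword == "LOGFILE":
            result[arg] = current   # None means "not decided by LOGFORMAT"
    return result


config = [("LOGFILE", "log0"), ("LOGFORMAT", "format1"), ("LOGFILE", "log1"),
          ("LOGFORMAT", "format2"), ("LOGFILE", "log2"), ("LOGFILE", "log3")]
print(assign_formats(config))
# {'log0': None, 'log1': 'format1', 'log2': 'format2', 'log3': 'format2'}
```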
There are format words for all the built-in formats analog knows about. You might need one of these words if your logfile is in a standard format but analog can't detect which one for some reason: for example, the first line might be corrupt, or analog might not be able to tell whether you're using North American or international dates. So, for example,

    LOGFORMAT COMMON

selects common format. You can also use COMBINED, REFERRER, BROWSER, EXTENDED, MICROSOFT-NA (North American date format), MICROSOFT-INT (international date format), MS-EXTENDED (Microsoft's attempt at extended format), MS-COMMON (a buggy version of common format in some versions of Microsoft software), NETSCAPE or WEBSTAR. All these formats were defined at the end of the previous section. You can also use the special word AUTO to return to automatic detection.
If your logfile contains lines which are not in one of the recognised formats, you can tell analog about your format using a log format string; you never need this otherwise. The format string is a template for the logfile line, with the various fields and special characters replaced by codes such as %S for the hostname and %u for the username. Note that these codes are case sensitive: for example, %b is completely different from %B.
For example, a line such as

    jay.bird.com - fred [14/Mar/1996:17:45:35 +0000] "GET /~sret1/ HTTP/1.0" 200 1243

can be represented by the LOGFORMAT command

    LOGFORMAT (%S - %u [%d/%M/%Y:%h:%n:%j %j] "%j %r %j" %c %b)

In other words, it's just the sample line with the hostname replaced by %S, the username by %u, and so on. (The parentheses are needed because the argument contains spaces.) Or take another example: if you had lines which looked like
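    Fri 19/02/99 6:38pm, /index.html, proxy0576.isp.com, 200, 5076, http://www.ref.com, Mozilla/2.0 (compatible)

you could use the format

    LOGFORMAT (%j %d/%m/%y %h:%n%am, %r, %S, %c, %b, %f, %B)

Here %j skips the day of the week, %y is the two-digit year, and %a matches the a or p of am/pm, which is followed by a literal m.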
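To see how a format string describes a line, here is a hedged Python sketch that turns a format string (without its enclosing parentheses) into a regular expression. It is not analog's parser. The per-code patterns in `FIXED` (digits for %d, %m, %y, %Y, %h and %n, three letters for %M, a or p for %a) are assumptions based on the two sample lines above. Every other code is treated as free text that runs up to the next literal character in the template, and a code at the very end of the template takes the rest of the line. %j fields are matched but not returned. The names `compile_format`, `parse` and `to_24h` are hypothetical.

```python
import re

# Assumed patterns for the fixed-shape date/time codes (inferred from the
# sample lines in this section, not from analog's source).
FIXED = {
    "d": r"\d{1,2}",      # day of month, e.g. 14 or 19
    "m": r"\d{1,2}",      # month number, e.g. 02
    "M": r"[A-Za-z]{3}",  # month name, e.g. Mar
    "y": r"\d{2}",        # two-digit year, e.g. 99
    "Y": r"\d{4}",        # four-digit year, e.g. 1996
    "h": r"\d{1,2}",      # hour
    "n": r"\d{2}",        # minute
    "a": r"[ap]",         # the a or p of am/pm
}


def compile_format(fmt):
    """Turn a format string (without its parentheses) into (regex, codes)."""
    parts, codes = [], []
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            code = fmt[i + 1]
            i += 2
            if code in FIXED:
                pattern = FIXED[code]
            elif i == len(fmt):
                pattern = ".*"  # last field: take the rest of the line
            elif fmt[i] == "%":
                raise ValueError(f"%{code} has no terminating character")
            else:
                # Free-text field: stops at the next literal character,
                # which therefore may not occur inside the field.
                pattern = "[^" + re.escape(fmt[i]) + "]*"
            if code == "j":
                parts.append("(?:" + pattern + ")")  # matched, then discarded
            else:
                parts.append("(" + pattern + ")")
                codes.append(code)
        else:
            parts.append(re.escape(fmt[i]))  # literal template character
            i += 1
    return re.compile("".join(parts)), codes


def parse(fmt, line):
    """Return {code: text} for one log line, or None if it doesn't match."""
    regex, codes = compile_format(fmt)
    match = regex.fullmatch(line)
    if match is None:
        return None  # analog would count such a line as corrupt
    return dict(zip(codes, match.groups()))


def to_24h(hour, ampm):
    """Convert a 12-hour %h value plus %a ('a' or 'p') to a 24-hour hour."""
    return hour % 12 + (12 if ampm == "p" else 0)


common = parse('%S - %u [%d/%M/%Y:%h:%n:%j %j] "%j %r %j" %c %b',
               'jay.bird.com - fred [14/Mar/1996:17:45:35 +0000] '
               '"GET /~sret1/ HTTP/1.0" 200 1243')
other = parse('%j %d/%m/%y %h:%n%am, %r, %S, %c, %b, %f, %B',
              'Fri 19/02/99 6:38pm, /index.html, proxy0576.isp.com, 200, '
              '5076, http://www.ref.com, Mozilla/2.0 (compatible)')
print(common["S"], common["r"], common["b"])   # jay.bird.com /~sret1/ 1243
print(other["S"], to_24h(int(other["h"]), other["a"]))  # proxy0576.isp.com 18
```

Treating each free-text field as "everything up to the next literal character" is what makes a template like this unambiguous to read; two codes side by side only work here because %n and %a have fixed shapes.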
You can also give several LOGFORMAT commands in a row if a logfile contains lines in more than one format. For example, in

    LOGFORMAT COMMON
    LOGFORMAT COMBINED
    LOGFILE log1
    LOGFORMAT (%j %d/%m/%y %h:%n%am, %r, %S, %c, %b, %f, %B)
    LOGFILE log2
    LOGFILE log3

log1 has lines in both common and combined format, whereas log2 and log3 have lines only in the format from the previous example.
If you specify several formats, analog tries to match each line against the first format, then, if that fails, against the next, and so on, so the order of the formats matters. Usually you want to specify the most common format first, to minimise the time spent trying to match lines against inappropriate formats.
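The effect of ordering can be sketched in Python: formats are tried in turn and the first full match wins, so putting the most common format first means fewer failed attempts per line. The two regular expressions are deliberately simplified stand-ins for common and combined format (an assumption; analog's own matching is more detailed), and `match_line` is a hypothetical name, not part of analog.

```python
import re

# Simplified, hypothetical regexes for two built-in formats.
COMMON = r'\S+ \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} (?:\d+|-)'
COMBINED = COMMON + r' "[^"]*" "[^"]*"'


def match_line(line, formats):
    """Try (name, regex) pairs in order; return (name, attempts used)."""
    for attempts, (name, regex) in enumerate(formats, start=1):
        if regex.fullmatch(line):
            return name, attempts
    return None, len(formats)  # nothing matched: a corrupt line


line = ('jay.bird.com - fred [14/Mar/1996:17:45:35 +0000] '
        '"GET /~sret1/ HTTP/1.0" 200 1243')
common_first = [("COMMON", re.compile(COMMON)), ("COMBINED", re.compile(COMBINED))]
print(match_line(line, common_first))        # ('COMMON', 1)
print(match_line(line, common_first[::-1]))  # ('COMMON', 2)
```

If most lines in the file are in common format, listing COMMON first saves one failed match per line.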
So let's go back to the first example:
    LOGFILE log0
    LOGFORMAT format1
    LOGFILE log1
    LOGFORMAT format2
    LOGFILE log2
    LOGFILE log3

Here log0 actually gets the default log format. If there are no DEFAULTLOGFORMAT commands, the default is automatic detection. But if there are DEFAULTLOGFORMAT commands, even in another configuration file, they determine the format of log0.
You need DEFAULTLOGFORMAT instead of LOGFORMAT when you want to change the format of logfiles which aren't given in a LOGFILE command (for example, ones specified on the command line, dragged onto the program icon on a Mac, or compiled in). DEFAULTLOGFORMAT is also useful if your logfiles are always in the same format, because then you don't have to worry about putting enough LOGFORMATs in the right places.
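Putting these rules together, the Python sketch below works out which formats each logfile ends up with. It is an illustration under stated assumptions, not analog's implementation. In particular, the rule that several LOGFORMATs in a row accumulate while a LOGFORMAT after a LOGFILE starts a fresh list is inferred from the examples in this section, and so is the assumption that several DEFAULTLOGFORMAT commands combine into one list. Representing the configuration as (keyword, argument) pairs, the name `resolve_formats` and the `extra_files` parameter (standing for command-line or compiled-in logfiles) are hypothetical.

```python
def resolve_formats(commands, extra_files=()):
    """Return {logfile: [formats]} from (keyword, argument) config pairs.

    Assumed rules, inferred from the examples in this section:
      * consecutive LOGFORMATs accumulate into one list;
      * a LOGFORMAT that follows a LOGFILE starts a new list;
      * a LOGFILE with no LOGFORMAT above it, and any file not named in a
        LOGFILE command (extra_files), gets the DEFAULTLOGFORMAT list,
        which falls back to ["AUTO"] when there is none.
    """
    current, after_logfile = [], False
    defaults, assigned = [], {}
    for keyword, arg in commands:
        if keyword == "LOGFORMAT":
            if after_logfile:
                current = []  # a new group of LOGFORMATs begins
            current.append(arg)
            after_logfile = False
        elif keyword == "DEFAULTLOGFORMAT":
            defaults.append(arg)  # position in the file doesn't matter
        elif keyword == "LOGFILE":
            assigned[arg] = list(current)
            after_logfile = True
    default = defaults or ["AUTO"]
    result = {name: formats or default for name, formats in assigned.items()}
    for name in extra_files:
        result[name] = default
    return result


config = [("LOGFILE", "log0"), ("LOGFORMAT", "format1"), ("LOGFILE", "log1"),
          ("LOGFORMAT", "format2"), ("LOGFILE", "log2"), ("LOGFILE", "log3")]
print(resolve_formats(config)["log0"])  # ['AUTO']
print(resolve_formats(config + [("DEFAULTLOGFORMAT", "COMBINED")],
                      extra_files=["access.log"]))
```

Because DEFAULTLOGFORMAT values are collected wherever they appear, this sketch gives log0 the default even when the DEFAULTLOGFORMAT comes after the LOGFILE, which matches the behaviour described above.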
The log formats which analog can handle are those known as instantaneously decipherable: the character which terminates a field can never occur inside that field. For example, in common format, which looks like

    LOGFORMAT (%S - %u [%d/%M/%Y:%h:%n:%j %j] "%j %r %j" %c %b)

if the hostname ever contained a space, the line would be marked as corrupt, because analog terminates the host at the first space, not at the first occurrence of space-dash-space, and then the rest of the line wouldn't match. Of course, hostnames should never contain spaces, so this shouldn't be a problem. There are a couple of other restrictions. If there is any date or time information, then the year, month, date, hour and minute must all be present. And the same information may not occur twice in the format: you can't have both %m and %M, for example, because both represent the month (make one of them a %j to have it ignored).
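The two extra restrictions are mechanical enough to check with a short Python sketch like the one below. The grouping of codes into quantities (%m and %M both being the month, %y and %Y both being the year) comes from the examples in this section; the function name `check_format` and the wording of its messages are hypothetical, and analog's own error reporting may differ. %j is exempt because it may appear any number of times.

```python
import re

# Which piece of date/time information each code carries.
DATE_CODES = {"d": "date", "m": "month", "M": "month", "y": "year",
              "Y": "year", "h": "hour", "n": "minute"}
REQUIRED = {"year", "month", "date", "hour", "minute"}


def check_format(fmt):
    """Return a list of restriction violations in a format string."""
    problems, seen = [], {}
    for code in re.findall(r"%\*?(.)", fmt):
        if code == "j":
            continue  # %j (ignore) may appear any number of times
        quantity = DATE_CODES.get(code, "field %" + code)
        if quantity in seen:
            problems.append(f"{quantity} appears twice (%{seen[quantity]} and %{code})")
        else:
            seen[quantity] = code
    has_date = any(q in REQUIRED for q in seen)
    missing = REQUIRED - set(seen)
    if has_date and missing:
        problems.append("missing date/time fields: " + ", ".join(sorted(missing)))
    return problems


print(check_format('%S - %u [%d/%M/%Y:%h:%n:%j %j] "%j %r %j" %c %b'))  # []
print(check_format("%d/%m/%M/%Y %h:%n %r"))
# ['month appears twice (%m and %M)']
```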
Sometimes you need to read one of the fields in a logfile, but not analyse it. For example, if you have a separate common log and referrer log, the referrer log might look like
    http://guide-p.infoseek.com/Titles -> /~sret1/analog/

But the requests for /~sret1/analog/ would already have been counted when reading the main logfile, so you don't want to count them again now. You get round this by putting a * between the % and the code letter for that field in the format string, like this:

    LOGFORMAT (%f -> %*r)
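In terms of the format string, the * just marks a field as read but not analysed. A tiny hypothetical Python helper (`format_fields` is not part of analog) makes the distinction explicit by listing each field code with a flag saying whether it is analysed; %j is left out, since it is never analysed anyway.

```python
import re


def format_fields(fmt):
    """List (code, analysed) pairs for the field codes in a format string.

    '%*x' means field x is read (so the line can be matched) but not counted.
    %j is omitted because it is always ignored.
    """
    return [(code, star != "*")
            for star, code in re.findall(r"%(\*?)(\w)", fmt)
            if code != "j"]


print(format_fields("%f -> %*r"))  # [('f', True), ('r', False)]
```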
A tip: sometimes it is more efficient to specify two or more adjacent fields to ignore with a single %j, as long as the whole group ends with a recognisable character. So common format is actually given as
    LOGFORMAT (%S - %u [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b)

In the date and time [14/Mar/1996:17:45:35 +0000], the seconds and the timezone can both be ignored with a single %j, which extends up to the close-bracket.
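The equivalence of the two forms can be checked with Python's `re` module. The sketch below writes the bracketed date of common format both ways as hand-made regular expressions (an assumption for illustration; analog does not use Python regexes) and confirms that they extract the same date and time fields, with the single ignored field swallowing `35 +0000`. It says nothing about speed; any efficiency gain is a property of analog's own matcher.

```python
import re

stamp = "[14/Mar/1996:17:45:35 +0000]"

# Seconds and timezone as two ignored fields, like ":%j %j]".
two_fields = re.compile(r"\[(\d+)/(\w+)/(\d+):(\d+):(\d+):[^ ]* [^\]]*\]")
# Seconds and timezone as one ignored field, like ":%j]": everything up to ']'.
one_field = re.compile(r"\[(\d+)/(\w+)/(\d+):(\d+):(\d+):(?P<ignored>[^\]]*)\]")

a = two_fields.fullmatch(stamp)
b = one_field.fullmatch(stamp)
print(a.groups())                                         # ('14', 'Mar', '1996', '17', '45')
print(b.groups()[:5] == a.groups(), b.group("ignored"))  # True 35 +0000
```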