Important: For security reasons, you must not attempt to run analog itself as a CGI program, or even leave it in the directory or folder with your web files or CGI programs. When the form interface runs analog for you, it checks that analog isn't given any dangerous options. Without this check, your system could be vulnerable to attack.
Please don't try to set up the form until analog has been set up and is running properly on its own. Otherwise the form just adds another level of complexity to troubleshoot. And unlike analog itself, the form interface will not run "out of the box". You have to read this section to find out how to set it up.
The form interface is suitable for ordinary users to use, but it needs to be set up by a system administrator or other expert. In order to set it up, you have to be running a web server. You need to know what CGI programs are, where they live on your server, and how to set up their permissions properly. You also need to know how to write HTML forms. I shall assume this level of background knowledge for the rest of this section. And you have to be running Perl 5.001 or later: see Technical details below for other system requirements. (Actually, if you're on Windows and don't have Perl, you can download an executable version of the form interface from the helper applications page.)
Warning: CGI programs can contain security loopholes which allow an unscrupulous user to harm your system. (If you don't know about this, you shouldn't be running CGI programs at all. Read and understand the World Wide Web Security FAQ and the CGI Security FAQ first.) I have tried to make this form interface safe, but I cannot guarantee it. Even the most carefully-designed CGI programs can accidentally have serious security bugs. And I take no responsibility if anything goes wrong: you use it at your own risk. (See the licence.) Furthermore, you should be aware that unless you take special measures like password protection or limiting anlgform.pl to specific hostnames, setting up the form interface implies making analog executable, and your logfiles analysable, by anyone on the internet. There are more notes on security design in this program towards the end of this section.
The form interface consists of two parts: a form (called anlgform.html) to choose the options, and a CGI program (called anlgform.pl) to pass them to the analog program. Both anlgform.html and anlgform.pl must be configured to your system before they will work at all. There are instructions at the top of both files explaining how to do this.
The form which is distributed with the program should only be regarded as an example form. You can find forms in languages other than English in the lang directory. Or you can write your own if you prefer. In fact you don't actually need the form at all: if you just want to create a link to the CGI program, with the arguments passed after a question mark in the URL in the usual way, then that's fine.
Logfile name: <input type=text name="LOGFILE">
or maybe something like
<select name=LOGFILE size=1>
<option value="/var/log/apache/fred"> Fred's logfile
<option value="/var/log/apache/jane"> Jane's logfile
</select>
There are a few commands which you can't specify on the form for security or performance reasons. The full list is *LOGFORMAT, LANGFILE, HEADERFILE, FOOTERFILE, UNCOMPRESS, OUTFILE, CACHEOUTFILE, ERRFILE, DNS and SETTINGS; and the person setting up the form can add more. There are also certain arguments you can't give to commands: the most important is that you can't include the wildcard * in the LOGFILE. See the security notes below for the reasons for these exclusions, and for some more commands you might want to add to the forbidden list.
Alias this file: <input type=text name="FILEALIAS1">
To this one: <input type=text name="FILEALIAS2">
You can only specify one such pair this way; so there's no way to specify several of the same ALIAS, for example.
Then there are FLOOR commands. To avoid users of the form having to know the syntax of these commands, you can, if you want, specify them in two halves, FLOORA and FLOORB, and the two halves will be joined together. For example, the form distributed with the program specifies
<br>Include all domains with at least
<input type=TEXT name="DOMFLOORA" maxlength=6 size=6>
<select name="DOMFLOORB">
<option value=r>requests
<option value=p>requests for pages
<option value=b selected>bytes
</select>
If DOMFLOORA contains 5% and DOMFLOORB contains r, then DOMFLOOR 5%r will be sent to the program. (Or DOMFLOORA=5 and DOMFLOORB=%r would work too, if you chose to present the form that way.)
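The joining of the two FLOOR halves can be pictured with a minimal Python sketch. This is only an illustration and is not the code in anlgform.pl (which is written in Perl). The function name join_floor_halves is hypothetical, and skipping a pair whose A half is empty is an assumption, not documented behaviour.

```python
def join_floor_halves(fields):
    """Merge each xFLOORA/xFLOORB pair of form fields into one xFLOOR value.

    `fields` maps form field names to the strings the user submitted.
    """
    merged = {}
    for name, value in fields.items():
        if name.endswith("FLOORA"):
            if not value:
                continue  # assumption: an empty A half means "no FLOOR given"
            base = name[:-1]  # DOMFLOORA -> DOMFLOOR
            merged[base] = value + fields.get(base + "B", "")
        elif name.endswith("FLOORB"):
            continue  # only used together with its A half (assumption)
        else:
            merged[name] = value  # every other field passes through unchanged
    return merged
```

With this sketch, both DOMFLOORA=5% with DOMFLOORB=r and DOMFLOORA=5 with DOMFLOORB=%r produce the single value 5%r for DOMFLOOR.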
Secondly, you can specify other configuration files to be included at specific times. When analog is called by the CGI program, it first processes the default configuration file as usual. Then it processes any configuration file specified by an option with name cg. Then it processes all the other commands which the CGI program specifies. After that, it processes any configuration file specified by an option with name cm. Finally, it processes the mandatory configuration file as usual. (You may therefore want two copies of analog, one for form use and one for non-form use, with different configuration files compiled in.) Note that the commands in the default and mandatory configuration files will contribute to the configuration: some of them may even override options specified on the form. For example, if the default configuration file contains an INCLUDE command, this may cause INCLUDE and EXCLUDE commands specified on the form to behave unexpectedly.
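The order described above can be written down as a small Python sketch. It only lists the stages in sequence and does not run analog. The function name processing_order and the stage descriptions are illustrative, not part of the program.

```python
def processing_order(cg=None, cm=None):
    """Return the stages analog goes through when called by the CGI program.

    `cg` and `cm` are the configuration file names given by the form
    options of the same names, or None if the option was not given.
    """
    stages = ["default configuration file"]
    if cg:
        stages.append("configuration file named by cg: " + cg)
    stages.append("commands specified by the CGI program")
    if cm:
        stages.append("configuration file named by cm: " + cm)
    stages.append("mandatory configuration file")
    return stages
```

For example, processing_order(cg="before.cfg") places before.cfg after the default configuration file and before the commands that come from the form.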
There are a couple of commands which the form always sets. These may override what you have set elsewhere. First, it sets either DNS READ (if a DNSFILE is set on the form) or DNS NONE (otherwise). You can override this behaviour in the mandatory configuration file, but you are likely to run into timeout problems if you do. Secondly, it always sets WARNINGS FL, so that the less important warnings don't fill up your server's error log. You can override this by sending an explicit WARNINGS command from the form.
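As a rough illustration of these two rules, here is a minimal Python sketch. It models only the outcome described above, not the order in which anlgform.pl emits the commands, and the function name implied_settings is hypothetical.

```python
def implied_settings(fields):
    """Work out the DNS and WARNINGS values implied by the submitted form.

    `fields` maps form field names to the strings the user submitted.
    """
    # DNS READ if the form sets a DNSFILE, DNS NONE otherwise.
    dns = "READ" if fields.get("DNSFILE") else "NONE"
    # WARNINGS FL unless the form sends an explicit WARNINGS command.
    warnings = fields.get("WARNINGS") or "FL"
    return {"DNS": dns, "WARNINGS": warnings}
```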
There is one small point about compressed logfiles. For security reasons, when using the form interface you need to specify the full pathname to the uncompression command in the UNCOMPRESS command in your configuration file.
First, you can run anlgform.pl from the (DOS or Unix) command line. This is good enough to debug most problems. You can specify options in pairs like this:
anlgform.pl qv=1 LOGFILE=/some/log REQINCLUDE=pages
If you include qv=1 in the argument list as above, you will see what anlgform.pl is trying to send to analog. If you don't include qv=1, anlgform.pl will try to run analog.
If it still doesn't work, check the following points:
First, you should think about who can run the form interface. Unless you take special measures like password protection or limiting anlgform.pl to specific hostnames, adding the form interface to your site implies making analog executable, and your logfiles analysable, by anyone on the internet. There are obvious concerns both about privacy and about the load on your system.
Certain commands are ignored by anlgform.pl and not passed to analog. The list of them can be found at the top of anlgform.pl. Here are the reasons for them. HEADERFILE and FOOTERFILE would place any file on your system within the output. The *LOGFORMAT commands would also allow any file to be read, because someone could designate each line to be a single filename and then just list the filenames. OUTFILE, CACHEOUTFILE and ERRFILE would allow people to write to your filespace; ERRFILE would also divert errors away from your error log. UNCOMPRESS would allow a user to execute any command. DNS is forbidden because setting it higher than READ would normally cause the process to time out.
None of the above should be deleted (unless you are really, really sure that it's completely impossible for anyone other than yourself to run anlgform.pl). There are two other commands which are forbidden by default but which you could consider removing from the forbidden list. SETTINGS is included because it will give away the locations of some files on your system. But it is useful for diagnostic purposes, and you could consider removing it temporarily if you have trouble setting up the form. The other command which is included is LANGFILE, although I consider it to be a lower risk. It is included because it is theoretically possible that another file could be exactly the right number of lines long to be accepted as a language file, and then parts of it would get into the output. But it would have to be exactly the right length first. If that's a risk you're prepared to take, you can remove LANGFILE from the list.
There are other commands which you might consider adding to the list. For example, it is theoretically possible (though rather unlikely) that another file on your system could conform sufficiently closely to one of the predefined log formats that analog could be persuaded to analyse it and so reveal some of its contents. If you're worried about this, or even if you want to force only one particular logfile to be analysed from the form, you can add the LOGFILE command to the list of forbidden commands. And you could add DOMAINSFILE for similar reasons.
You can of course add any command you like to the list. For example, a user can use any configuration file on your system unless you add all of CONFIGFILE, CM and CG. Or if you wanted to stop a user having control of which warnings were written to the error log, you could add WARNINGS.
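The idea of a forbidden list can be sketched in Python as follows. This is not the code in anlgform.pl. The list contents are the defaults named earlier in this section, but the function names, the reading of *LOGFORMAT as "any command name ending in LOGFORMAT", and the case-insensitive comparison are all assumptions made for the illustration.

```python
# Default forbidden commands as listed in this section.
FORBIDDEN = [
    "*LOGFORMAT", "LANGFILE", "HEADERFILE", "FOOTERFILE", "UNCOMPRESS",
    "OUTFILE", "CACHEOUTFILE", "ERRFILE", "DNS", "SETTINGS",
]

def is_forbidden(command, forbidden=FORBIDDEN):
    """Return True if `command` matches an entry in the forbidden list."""
    command = command.upper()  # assumption: names compare case-insensitively
    for entry in forbidden:
        if entry.startswith("*"):
            # assumption: a leading * matches any prefix, e.g. APACHELOGFORMAT
            if command.endswith(entry[1:]):
                return True
        elif command == entry:
            return True
    return False

def drop_forbidden(fields, forbidden=FORBIDDEN):
    """Return a copy of `fields` without the forbidden commands."""
    return {name: value for name, value in fields.items()
            if not is_forbidden(name, forbidden)}
```

Adding a command to the list is then a one-line change: for instance, calling is_forbidden("LOGFILE", FORBIDDEN + ["LOGFILE"]) returns True, whereas LOGFILE is allowed by the default list.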
The arguments to LOGFILE and CACHEFILE commands are checked to make sure that they contain only certain allowed characters: letters, digits, the characters /\.:_ and space, and a hyphen only when it sits between two characters that are each a letter, digit or underscore. This is because they could match an UNCOMPRESS command and thus be passed to the shell when the uncompress command is popen()'ed.
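A character check of this kind could look like the following Python sketch. It is an illustration of the rule as stated, not the test used in anlgform.pl. The name is_safe_filename is hypothetical, and "letters" is assumed to mean the ASCII letters A-Z and a-z.

```python
import re

# One allowed character at a time: either a member of the plain allowed set,
# or a hyphen with a letter, digit or underscore on both sides.
_SAFE_FILENAME = re.compile(
    r"(?:[A-Za-z0-9/\\.:_ ]"
    r"|(?<=[A-Za-z0-9_])-(?=[A-Za-z0-9_]))*"
)

def is_safe_filename(value):
    """Return True if every character of `value` is allowed by the rule."""
    return _SAFE_FILENAME.fullmatch(value) is not None
```

Under this sketch /var/log/access-log is accepted, while /var/log/* is rejected because of the wildcard, and a value beginning with a hyphen is rejected because the hyphen has no letter, digit or underscore before it.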
Apart from that, command names are checked for containing only letters and the digits 1 and 2; and the arguments to commands are checked for not containing control characters (actually characters 0-32 and 127-159; in particular newline characters are prohibited). The length of the commands isn't checked by anlgform.pl, but buffer overflow shouldn't be an issue as configuration commands are checked for length by analog.
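These two general checks can be sketched in Python as below. The function names are hypothetical and this is not the code in anlgform.pl. The character ranges are taken literally from the paragraph above, so character 32 (the space) counts as prohibited here; the real script may treat some arguments differently. Rejecting an empty command name is also an assumption.

```python
import re

# Command names: letters and the digits 1 and 2 only (ASCII letters assumed).
_COMMAND_NAME = re.compile(r"[A-Za-z12]+")

def is_valid_command_name(name):
    """Return True if `name` consists only of letters and the digits 1 and 2."""
    return _COMMAND_NAME.fullmatch(name) is not None

def has_prohibited_char(value):
    """Return True if `value` contains a character in 0-32 or 127-159."""
    return any(ord(ch) <= 32 or 127 <= ord(ch) <= 159 for ch in value)
```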
By the way, the reason that I advise that analog itself shouldn't be used as a CGI program is that some servers, notably Microsoft IIS, allow users to pass command line arguments into a CGI program. And even if the program doesn't return the proper CGI headers, the output can be sent back to the user. This means that all the above checking of arguments is then thwarted. Of course, on servers on which you can't pass command line arguments to a CGI program, there are not the same security concerns, but then analog isn't very useful as a CGI program because if you can't pass any arguments, you can only get the default output.
On Windows, you have to associate the .pl extension with the Perl executable so that Perl scripts are executed by Perl.
anlgform.pl will understand the GET or POST methods of form submission. The HTML spec says that GET should be used when, as in this case, running the program has no side effects. However, section 15.1.3 of the HTTP spec says that POST should be used if some of the options being passed might be confidential. Also, very long URLs, formed by specifying lots of options, can cause trouble to some older servers. So anlgform.html uses the POST method by default. However, the GET method will also work. For example, you could make a normal link to anlgform.pl with options specified after a question mark in the usual GET way.
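If you build such GET links by program rather than by hand, the option values need to be URL-encoded. Here is a minimal Python sketch using the standard library; the function name analog_link and the /cgi-bin/ location in the example are hypothetical, and the option names are the ones used in the command-line example earlier in this section.

```python
from urllib.parse import urlencode

def analog_link(script_url, options):
    """Build a GET link to the form's CGI program.

    `script_url` is wherever anlgform.pl lives on your server, and
    `options` maps option names to values; values are URL-encoded.
    """
    return script_url + "?" + urlencode(options)

link = analog_link(
    "/cgi-bin/anlgform.pl",  # hypothetical location
    {"LOGFILE": "/var/log/apache/fred", "REQINCLUDE": "pages"},
)
```

Remember that options passed this way appear in the URL, so the earlier point about confidential options applies to links built like this.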