
Readme for analog 4.04

Form interface and CGI program

The form interface provides an HTML front end to analog, on Unix or Windows platforms (and maybe others). That means that users can select options from a web page, instead of having to create a configuration file.

Important: For security reasons, you must not attempt to run analog itself as a CGI program, or even leave it in the directory or folder with your web files or CGI programs. When the form interface runs analog for you, it checks that analog isn't given any dangerous options. Without this check, your system could be vulnerable to attack.

Please don't try to set up the form until analog has been set up and is running properly on its own: the form just adds another level of complexity to troubleshoot. And unlike analog itself, the form interface will not run "out of the box". You have to read this section to find out how to set it up.

The form interface is suitable for ordinary users to use, but it needs to be set up by a system administrator or other expert. In order to set it up, you have to be running a web server. You need to know what CGI programs are, where they live on your server, and how to set up their permissions properly. You also need to know how to write HTML forms. I shall assume this level of background knowledge for the rest of this section. And you have to be running Perl 5.001 or later: see Technical details below for other system requirements. (Actually, if you're on Windows and don't have Perl, you can download an executable version of the form interface from the helper applications page.)

Warning: CGI programs can contain security loopholes which allow an unscrupulous user to harm your system. (If you don't know about this, you shouldn't be running CGI programs at all. Read and understand the World Wide Web Security FAQ and the CGI Security FAQ first.) I have tried to make this form interface safe, but I cannot guarantee it. Even the most carefully-designed CGI programs can accidentally have serious security bugs. And I take no responsibility if anything goes wrong: you use it at your own risk. (See the licence.) Furthermore, you should be aware that unless you take special measures like password protection or limiting anlgform.pl to specific hostnames, setting up the form interface implies making analog executable, and your logfiles analysable, by anyone on the internet. There are more notes on security design in this program towards the end of this section.

The form interface consists of two parts: a form (called anlgform.html) to choose the options, and a CGI program (called anlgform.pl) to pass them to the analog program. Both anlgform.html and anlgform.pl must be configured for your system before they will work at all. There are instructions at the top of both files explaining how to do this.

The form which is distributed with the program should only be regarded as an example form. You can find forms in languages other than English in the lang directory. Or you can write your own if you prefer. In fact you don't actually need the form at all: if you just want to create a link to the CGI program, with the arguments passed after a question mark in the URL in the usual way, then that's fine.


Almost every analog configuration command can be specified on the form, just by including a form element with that name on the form. So, for example, if you wanted to add a field for users to choose a logfile, you could write
Logfile name: <input type=text name="LOGFILE">
or maybe something like
<select name=LOGFILE size=1>
  <option value="/var/log/apache/fred"> Fred's logfile
  <option value="/var/log/apache/jane"> Jane's logfile
</select>

There are a few commands which you can't specify on the form for security or performance reasons. The full list is *LOGFORMAT, LANGFILE, HEADERFILE, FOOTERFILE, UNCOMPRESS, OUTFILE, CACHEOUTFILE, ERRFILE, DNS and SETTINGS; and the person setting up the form can add more. There are also certain arguments you can't give to commands: the most important is that you can't include the wildcard * in the LOGFILE. See the security notes below for the reasons for these exclusions, and for some more commands you might want to add to the forbidden list.
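As a rough illustration of this filtering, here is a minimal Python sketch. It is not the Perl code in anlgform.pl: the function names are hypothetical, and it assumes that *LOGFORMAT means every command whose name ends in LOGFORMAT and that a rejected command is simply dropped.

```python
# Hypothetical sketch of the forbidden-command filter; not the actual Perl code.
FORBIDDEN = {
    "LANGFILE", "HEADERFILE", "FOOTERFILE", "UNCOMPRESS", "OUTFILE",
    "CACHEOUTFILE", "ERRFILE", "DNS", "SETTINGS",
}


def is_forbidden(name):
    """Return True if a command name is on the forbidden list."""
    upper = name.upper()
    # Assumption: "*LOGFORMAT" covers any command name ending in LOGFORMAT.
    return upper in FORBIDDEN or upper.endswith("LOGFORMAT")


def filter_commands(pairs):
    """Drop forbidden commands, and LOGFILE arguments containing '*'."""
    kept = []
    for name, value in pairs:
        if is_forbidden(name):
            continue
        if name.upper() == "LOGFILE" and "*" in value:
            continue  # wildcards are not allowed in LOGFILE
        kept.append((name, value))
    return kept
```

The person setting up the form would extend the FORBIDDEN set here in the same way that they would extend the list at the top of anlgform.pl.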


Some commands are most conveniently specified in two halves. First, there are commands which take two arguments (for example ALIASes). You can cope with these by sending two commands from the form, called COMMAND1 and COMMAND2. For example,
Alias this file: <input type=text name="FILEALIAS1">
To this one: <input type=text name="FILEALIAS2">
You can only specify one such pair this way; so there's no way to specify several of the same ALIAS, for example.
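The pairing can be pictured with a small Python sketch. This is only an illustration under assumptions, not the code in anlgform.pl: join_pair is a hypothetical name, the form data is modelled as a plain dictionary, and an incomplete pair is assumed to be skipped.

```python
def join_pair(form, command):
    """Combine COMMAND1 and COMMAND2 into one two-argument command line."""
    first = form.get(command + "1", "")
    second = form.get(command + "2", "")
    if not first or not second:
        return None  # assumption: an incomplete pair is skipped
    return f"{command} {first} {second}"


# The two fields from the example form above become a single command.
line = join_pair({"FILEALIAS1": "/index.html", "FILEALIAS2": "/"}, "FILEALIAS")
```

Because the dictionary holds only one value for each of FILEALIAS1 and FILEALIAS2, the sketch also shows why only one such pair can be sent.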

Then there are FLOOR commands. To avoid users of the form having to know the syntax of these commands, you can if you want specify them in two halves, FLOORA and FLOORB, and they will be stuck together. For example, the form distributed with the program specifies

<br>Include all domains with at least
<input type=TEXT name="DOMFLOORA" maxlength=6 size=6>
<select name="DOMFLOORB">
  <option value=r>requests
  <option value=p>requests for pages
  <option value=b selected>bytes
</select>
If DOMFLOORA contains 5% and DOMFLOORB contains r, then DOMFLOOR 5%r will be sent to the program. (Or DOMFLOORA=5 and DOMFLOORB=%r would work too, if you chose to present the form that way.)
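
The sticking together of the two halves can be sketched in Python as follows. The function name join_floor is hypothetical, the form data is modelled as a plain dictionary, and skipping the command when the A half is empty is an assumption rather than documented behaviour.

```python
def join_floor(form, command):
    """Concatenate the FLOORA and FLOORB halves into one FLOOR command."""
    a = form.get(command + "A", "")
    b = form.get(command + "B", "")
    if not a:
        return None  # assumption: nothing is sent if the first half is empty
    return f"{command} {a}{b}"
```

With this sketch, both ways of splitting the value described above give the same command.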

There are a couple of extra non-analog commands which can be sent from the form. First, if the option qv=1 is set, then analog is not run, but a list of the configuration commands which would have been sent to analog is printed instead. This is useful for checking that the CGI program is working properly. It can also allow users to produce a configuration file from form settings.

Secondly, you can specify other configuration files to be included at specific times. When analog is called by the CGI program, it first processes the default configuration file as usual. Then it processes any configuration file specified by an option with name cg. Then it processes all the other commands which the CGI program specifies. After that, it processes any configuration file specified by an option with name cm. Finally, it processes the mandatory configuration file as usual. (You may therefore want two copies of analog, one for form use and one for non-form use, with different configuration files compiled in.) Note that the commands in the default and mandatory configuration files will contribute to the configuration: some of them may even override options specified on the form. For example, if the default configuration file contains an INCLUDE command, this may cause INCLUDE and EXCLUDE commands specified on the form to behave unexpectedly.


anlgform.pl usually sends the commands to analog in the order in which it received them, which should be the same as the order they occurred in the form. But there are some exceptions. First, all commands of the same name are grouped together. So an interleaved sequence of INCLUDEs and EXCLUDEs won't work, for example. Secondly, even though the names of commands are case-insensitive, commands of the same name but in different cases may come in the wrong order. Keep them in the same case! Thirdly, WARNINGS and LOGTIMEOFFSET are sent first (and thus the LOGTIMEOFFSET applies to any logfiles specified on the form).
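The effect of these ordering rules can be illustrated with a short Python sketch. It is a model of the behaviour described above, not the Perl implementation: order_commands is a hypothetical name, and the relative order of WARNINGS and LOGTIMEOFFSET is an assumption. Names are compared exactly, so the same command in a different case forms a separate group.

```python
def order_commands(pairs):
    """Group commands by name in first-seen order, special commands first."""
    groups = {}  # dicts keep insertion order in Python 3.7+
    for name, value in pairs:
        groups.setdefault(name, []).append(value)
    # Assumption: WARNINGS comes before LOGTIMEOFFSET.
    first = [n for n in ("WARNINGS", "LOGTIMEOFFSET") if n in groups]
    rest = [n for n in groups if n not in first]
    return [(n, v) for n in first + rest for v in groups[n]]


# An interleaved INCLUDE / EXCLUDE / INCLUDE sequence loses its interleaving.
received = [
    ("FILEINCLUDE", "*.html"),
    ("FILEEXCLUDE", "/tmp/*"),
    ("FILEINCLUDE", "*.htm"),
    ("LOGTIMEOFFSET", "60"),
]
sent = order_commands(received)
```

In this example both FILEINCLUDE commands end up ahead of the FILEEXCLUDE, which is why an interleaved sequence does not behave as written.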

There are a couple of commands which the form always sets. These may override what you have set elsewhere. First, it sets either DNS READ (if a DNSFILE is set on the form) or DNS NONE (otherwise). You can override this behaviour in the mandatory configuration file, but you are likely to run into timeout problems if you do. Secondly, it always sets WARNINGS FL, so that the less important warnings don't fill up your server's error log. You can override this by sending an explicit WARNINGS command from the form.
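A minimal Python sketch of these two forced settings follows. The name forced_commands is hypothetical and the form data is modelled as a dictionary; the sketch only shows which commands are added, on the assumption that an explicit WARNINGS command sent later from the form takes precedence.

```python
def forced_commands(form):
    """Return the commands that are always added to the form's own commands."""
    # DNS READ if a DNSFILE was given on the form, DNS NONE otherwise.
    dns = "READ" if form.get("DNSFILE") else "NONE"
    return [("WARNINGS", "FL"), ("DNS", dns)]
```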

There is one small point about compressed logfiles. For security reasons, when using the form interface you need to specify the full pathname to the uncompression command in the UNCOMPRESS command in your configuration file.


Troubleshooting

Here is what to do if you are having problems setting up the form interface.

First, you can run anlgform.pl from the (DOS or Unix) command line. This is good enough to debug most problems. You can specify options in pairs like this:

anlgform.pl qv=1 LOGFILE=/some/log REQINCLUDE=pages
If you include qv=1 in the argument list as above, you will see what anlgform.pl is trying to send to analog. If you don't include qv=1, anlgform.pl will try and run analog.

If it still doesn't work, check the following points:

  1. Have you edited anlgform.pl and anlgform.html as instructed at the top of those files?
  2. Do other CGI programs work on your server? Is anlgform.pl in the right place to be recognised as a CGI program by the server?
  3. Look in the server's error log for clues.
  4. Are all relevant files (analog itself, logfiles, configuration files, auxiliary files such as domain files...) executable/readable by your web server?
  5. If some form options don't seem to take effect, then check whether they are being overridden by a command in a configuration file.
  6. If you get a long wait, then no data returned, the server is probably timing out the request before analog has finished. The remedy is to increase the timeout interval.
  7. As explained above, the form always sets DNS READ or DNS NONE, and WARNINGS FL, overriding your default configuration file.
  8. Again as explained above, uncompressing of compressed logfiles doesn't work unless you use the full pathname in the UNCOMPRESS command.

Security notes

As I said above, CGI programs can often contain security loopholes. Although I don't guarantee that the form interface is safe, I have done my best to make it so. Here I shall explain my design decisions. Comments on them are of course welcome: if they need to remain confidential, you can e-mail me privately at analog-author@lists.isite.net.

First, you should think about who can run the form interface. Unless you take special measures like password protection or limiting anlgform.pl to specific hostnames, adding the form interface to your site implies making analog executable, and your logfiles analysable, by anyone on the internet. There are obvious concerns both about privacy and about the load on your system.

Certain commands are ignored by anlgform.pl and not passed to analog. The list of them can be found at the top of anlgform.pl. Here are the reasons for them. HEADERFILE and FOOTERFILE would place any file on your system within the output. The *LOGFORMAT commands would also allow any file to be read, because someone could designate each line to be a single filename and then just list the filenames. OUTFILE, CACHEOUTFILE and ERRFILE would allow people to write to your filespace; ERRFILE would also divert errors away from your error log. UNCOMPRESS would allow a user to execute any command. DNS is forbidden because setting it higher than READ would normally cause the process to time out.

None of the above should be deleted (unless you are really, really sure that it's completely impossible for anyone other than yourself to run anlgform.pl). There are two other commands which are forbidden by default but which you could consider removing from the forbidden list. SETTINGS is included because it will give away the locations of some files on your system. But it is useful for diagnostic purposes, and you could consider removing it temporarily if you have trouble setting up the form. The other command which is included is LANGFILE, although I consider it to be a lower risk. It is included because it is theoretically possible that another file could be exactly the right number of lines long to be accepted as a language file, and then parts of it would get into the output. But it would have to be exactly the right length first. If that's a risk you're prepared to take, you can remove LANGFILE from the list.

There are other commands which you might consider adding to the list. For example, it is theoretically possible (though rather unlikely) that another file on your system could conform sufficiently closely to one of the predefined log formats that analog could be persuaded to analyse it and so reveal some of its contents. If you're worried about this, or even if you want to force only one particular logfile to be analysed from the form, you can add the LOGFILE command to the list of forbidden commands. And you could add DOMAINSFILE for similar reasons.

You can of course add any command you like to the list. For example, a user can use any configuration file on your system unless you add all of CONFIGFILE, CM and CG. Or if you wanted to stop a user having control of which warnings were written to the error log, you could add WARNINGS.


For those who know about CGI security issues, here are some more technical comments on my design. anlgform.pl sets the $PATH environment variable to be empty. It opens analog as a pipe in order to pass arguments into analog's standard input. User-specified data is not used for the open() function, only passed down the pipe. anlgform.pl is run with the -T flag on Unix. (Does anyone know how to get this working under Windows?)

The arguments to LOGFILE and CACHEFILE commands are checked for containing only certain allowed characters (specifically, letters, digits, /\.:_ space, and - between two {letter, digit, underscore}'s). This is because they could match an UNCOMPRESS command and thus be passed to the shell when the uncompress command is popen()'ed.
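
The character rule above can be expressed as a regular expression. The following Python sketch is one reading of that rule, not the pattern used in anlgform.pl: is_safe_path is a hypothetical name, "letters" is assumed to mean ASCII letters, and an empty argument is assumed to be rejected.

```python
import re

# Letters, digits, / \ . : _ and space are allowed anywhere; a hyphen is
# allowed only directly between two characters from {letter, digit, underscore}.
SAFE_PATH = re.compile(
    r"(?:[A-Za-z0-9/\\.:_ ]|(?<=[A-Za-z0-9_])-(?=[A-Za-z0-9_]))+"
)


def is_safe_path(value):
    """Return True if a LOGFILE or CACHEFILE argument uses only allowed characters."""
    return SAFE_PATH.fullmatch(value) is not None
```

The restriction on the hyphen means that something which looks like a command-line option, such as a hyphen following a space, never reaches the shell.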

Apart from that, command names are checked for containing only letters and the digits 1 and 2; and the arguments to commands are checked for not containing control characters (actually characters 0-32 and 127-159; in particular newline characters are prohibited). The length of the commands isn't checked by anlgform.pl, but buffer overflow shouldn't be an issue as configuration commands are checked for length by analog.
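
These two checks might look like this in Python. It is a sketch under assumptions and not the actual Perl code: the function names are hypothetical, letters are taken to be ASCII letters, and the character ranges are copied literally from the text. Note that 32 is the space character, which the LOGFILE check described earlier allows, so the real program may treat it differently.

```python
import re

# Command names: letters and the digits 1 and 2 only.
NAME_RE = re.compile(r"[A-Za-z12]+")


def valid_name(name):
    """Return True if a command name contains only letters and the digits 1 and 2."""
    return NAME_RE.fullmatch(name) is not None


def valid_argument(arg):
    """Return True if an argument has no characters in 0-32 or 127-159."""
    return not any(ord(c) <= 32 or 127 <= ord(c) <= 159 for c in arg)
```

Since the commands are passed down a pipe to analog's standard input, rejecting newlines presumably stops a single form field from carrying a second command.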

By the way, the reason that I advise that analog itself shouldn't be used as a CGI program is that some servers, notably Microsoft IIS, allow users to pass command line arguments into a CGI program. And even if the program doesn't return the proper CGI headers, the output can be sent back to the user. This means that all the above checking of arguments is then thwarted. Of course, on servers on which you can't pass command line arguments to a CGI program, there are not the same security concerns, but then analog isn't very useful as a CGI program because if you can't pass any arguments, you can only get the default output.


Technical details

You need to be running Perl 5.001 or later (unless you're on Windows and download the executable version of the form interface from the helper applications page). You can get the latest version of Perl free from www.perl.org (or from http://www.activestate.com/ActivePerl/ if you're on Windows). You also need the module CGI.pm, but this should have come with Perl anyway.

On Windows, you have to associate the .pl extension with the Perl executable so that Perl scripts are executed by Perl.

anlgform.pl will understand the GET or POST methods of form submission. The HTML spec says that GET should be used when, as in this case, running the program has no side effects. However, section 15.1.3 of the HTTP spec says that POST should be used if some of the options being passed might be confidential. Also, very long URLs, formed by specifying lots of options, can cause trouble to some older servers. So anlgform.html uses the POST method by default. However, the GET method will also work. For example, you could make a normal link to anlgform.pl with options specified after a question mark in the usual GET way.
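
As an illustration of such a GET link, this Python sketch builds the query string from a set of options. The script location /cgi-bin/anlgform.pl and the function name form_link are assumptions for the example; use whatever path your server gives the CGI program.

```python
from urllib.parse import urlencode


def form_link(script_url, options):
    """Build a GET link to the CGI program with the options after a question mark."""
    # urlencode percent-encodes characters such as "/" in the values.
    return script_url + "?" + urlencode(options)


link = form_link(
    "/cgi-bin/anlgform.pl",  # hypothetical location of the CGI program
    {"qv": "1", "LOGFILE": "/some/log", "REQINCLUDE": "pages"},
)
```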


Stephen Turner
Need help with analog? Subscribe to the analog-help mailing list
