Man page for Unix version of ick-proxy

NAME

ick-proxy - custom web proxy for rewriting URLs

SYNOPSIS

ick-proxy [ options ] [ subcommand | --multiuser ]
ick-proxy [ options ] -t test-url

DESCRIPTION

ick-proxy is a specialist web proxy whose job is to rewrite URLs and return 302 (Moved Temporarily) redirections for them.

You might use ick-proxy if there is a class of URL that you frequently follow links to, but that you would prefer to have modified before you visit it.

For example, some web sites provide their content in multiple formats, distinguished by some aspect of the URL. (E.g. BBC News provides low-graphics and high-graphics versions of all its news articles.) You might have a preferred style in which to read such pages, and wish to arrange that any page you read is shown in your style, even when following a link to that page from someone who had cited the other kind of URL. ick-proxy can solve this for you by automatically rewriting all such URLs into the form you wanted, no matter whether the URL was entered manually into the address bar or followed from some other unrelated web page.

(Of course, some web sites of this type provide their own internal cookie mechanism for accommodating your viewing preferences, in which case it's almost certainly simpler to use that. But some don't, and in that case ick-proxy can help.)

To configure ick-proxy, you provide a script written in the Ick language (described below), which implements a function called rewrite taking one string argument and returning a string value. This function should transform any URL you want rewriting into the URL it should be rewritten as. URLs that you do not want rewriting should be returned unchanged.

(Your script must be idempotent: rewriting a URL twice should give the same result as doing so once. In other words, the output from your rewrite function should always be unchanged if fed back to the function as input.)
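
As a sketch of what idempotence means in practice, the following rewrite function appends a query suffix only when it is not already present, so feeding its output back in changes nothing. The site prefix and the suffix are invented for this illustration; len and substr are described under ‘Standard library’ below.

```ick
string rewrite(string url)
{
    string pfx = "http://www.example.com/articles/";   // hypothetical site
    string sfx = "?view=plain";                        // hypothetical suffix

    if (len(url) >= len(pfx) && substr(url, 0, len(pfx)) == pfx) {
        // Append the suffix only if the URL does not already end
        // with it; a second pass therefore leaves the URL alone.
        if (len(url) < len(sfx) || substr(url, len(url) - len(sfx)) != sfx)
            url = url + sfx;
    }
    return url;
}
```

A version that appended the suffix unconditionally would keep growing the URL on every pass, which is exactly what the rule above forbids.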

ick-proxy will take this script, and optionally a .pac file describing your conventional web proxy requirements, and will output a replacement .pac file which you should configure your web browser to use. This .pac file will contain a JavaScript translation of your rewrite script, so that your web browser can identify URLs which require rewriting and pass them to ick-proxy, which will return 302 Moved Temporarily responses containing the rewritten URLs. Those URLs in turn, and any URLs which did not need rewriting in the first place, will be retrieved in the manner specified by your input .pac file, or by direct access if you did not supply an input .pac.

(ick-proxy has no capability to actually fetch web pages; the only thing it knows how to do is to return 302s. So if you ask it to proxy a URL which does not require rewriting, it will have no option but to return a 501 internal error code. Hence, the configuration it supplies to your browser must be careful to send no URL to ick-proxy which does not need redirection.)

INVOCATION

ick-proxy can be run in various different modes.

In its default mode, if invoked without a subcommand, --multiuser or -t, it will run as an X client. It reads your configuration files, writes out its custom .pac file, attaches to your X server, and forks off into the background. It will last as long as your X session does (unless it crashes or is killed), and when your X session terminates it will detect this and terminate as well.

$ ick-proxy

Alternatively, you can run it as a wrapper around a subcommand, by providing that command as arguments on the command line. In this mode ick-proxy will continue running until the subcommand terminates, and will then shut down. For example, you might run it as a wrapper around your web browser itself:

$ ick-proxy firefox

In both of the above modes ick-proxy will write its output .pac file to the file system, by default as ~/.ick-proxy/output.pac (though this location is configurable; see the next section). In order to actually enable URL rewriting in your browser, you would then configure the browser to read its proxy configuration from a URL along the lines of file:///home/username/.ick-proxy/output.pac. If you need ick-proxy to re-read its configuration files during its run, you can send it the SIGHUP signal.

ick-proxy also supports a third rather different operating mode: it can run as a system-wide daemon providing its service to all users of a system. You invoke this mode using the --multiuser option, typically as root:

$ ick-proxy --multiuser

In this mode, ick-proxy will no longer write its output .pac files into the file system. Instead it will allocate a central port to listen on (880 by default). On that port it will perform no proxying functions; all it will do is to serve its generated .pac files over HTTP. So a user wanting to use ick-proxy would then configure their web browser to retrieve its proxy configuration from a URL of the form http://localhost:880/pac/username.

When a multi-user ick-proxy is asked for a .pac file for a particular user, it will allocate a secondary port on which to perform proxying for that user; it will then read that user's configuration files out of their home directory, and return an output .pac which cites the secondary port it has allocated. Subsequent .pac requests for the same user will cause ick-proxy to re-read the user's configuration, but to re-use the same port number. .pac requests for other users will result in separate port numbers being allocated.

In practice, the author has found that the most convenient mode of use seems to be the default one: start ick-proxy without arguments from within your .xsession script, and then it will write a static .pac file into your home directory and run for the lifetime of your X session. However, it is rumoured that browsers are required to be able to retrieve .pac files over HTTP but not required to be able to read them from static files. This suggests that the multi-user mode may technically be the most standards-compliant and hence in principle the most likely to work on all browsers. Unfortunately, the author has found that in practice at least one browser has problems retrieving .pacs over HTTP but copes fine with static files, so draw your own conclusions...

OPTIONS AND ARGUMENTS

To select different running modes:

subcommand
Run in single-user mode as a wrapper around a specific process, and terminate when that process does.
--multiuser
Run in multi-user mode.
-t url
Run in rewriting-test mode: feed the provided URL through the rewrite mechanism, report the result on standard output, and terminate. Does not set up a web proxy at all.

If none of the above options is given, the default mode is to attach to your X session and run in single-user mode.

To configure single-user mode (whether X-attached or wrapping a subprogram):

-s script-file
Specify the location of the Ick source file containing the rewrite function. Default is ‘~/.ick-proxy/rewrite.ick’.
-i input-pac
Specify the location of the input .pac file specifying the user's conventional web proxy preferences. Default is ‘~/.ick-proxy/input.pac’. It is not an error for this file not to exist: if ick-proxy cannot read it, it will assume you did not wish to use a conventional web proxy at all.
-o output-pac
Specify the location where ick-proxy will write the output .pac file which configures the browser to use ick-proxy for URLs requiring rewriting. Default is ‘~/.ick-proxy/output.pac’.

To configure the X-client mode:

-display display
Specify an X display to connect to other than $DISPLAY.

To configure the multi-user mode:

-p port
Specify the port on which ick-proxy will listen in multi-user mode. Default is 880.
-u username
Cause ick-proxy in multi-user mode to drop root privileges by setting its user ID to that of username. This will be done after it binds to its primary port, since that port number may be below 1024, and binding to such a port normally requires root privileges.

LANGUAGE

This section describes the Ick language, in which rewrite scripts are written.

In brief: the Ick language is roughly C-like, but simplified, and in particular it has a very simple type system which supports no compound types at all but does support arbitrarily sized strings as a basic type.

Syntax

The Ick language has basically C-like syntax.

At the top level, a source file consists of function definitions, variable declarations, and nothing else. A function definition is of the form

return-type function-name ( [ type param [ , type param ... ] ] )
{
    variable-declarations
    statements
}

and a variable declaration is of the form

type varname [ = expression ] [ , varname [ = expression ] ... ];

The only valid types are string, int and bool. The pseudo-type void may also be used as the return type of a function (indicating that the function returns no value at all), but not for any variable or function parameter.

(To declare a function with no arguments, the word void may be used between the parentheses in place of the parameter list, as an alternative syntax to simply leaving the parentheses empty.)

ick-proxy requires that scripts written in this language provide a function called ‘rewrite’, taking one string argument and returning a string. So the simplest possible rewrite script, which does nothing at all, might look like this:

string rewrite(string url)
{
    return url;
}

Comments in Ick are like those in C and C++: either contained between ‘/*’ and ‘*/’ (without nesting), or between ‘//’ and the next newline.

Statements

Valid statements are listed below.

Expression statements

The statement

    expression;

has the effect of evaluating the expression, including any side effects, and ignoring its result (if any). This type of statement can be used to perform assignments, increments and decrements, function calls, or a combination of those.

Return statements

The statement

    return [ expression ];

immediately terminates the current instance of the function in which it is invoked. If an expression is supplied, then its value is the return value of the function (and the type of the expression must be the same as the function's return type, which must not be void). If no expression is supplied, then no value is returned (and the function's return type must be void).

Break and continue statements

The statements

    break;
    continue;

must be contained within at least one loop construction (while, for or do). Both of them immediately terminate the current iteration of the innermost loop containing them; break also terminates the entire loop, whereas continue merely causes the next iteration to begin.
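
For instance, a helper along the following lines uses break to stop scanning as soon as it has an answer. The name firstdigit is made up for this sketch and is not part of the standard library.

```ick
// Hypothetical helper: position of the first decimal digit in str,
// or -1 if there is none.
int firstdigit(string str)
{
    int i = 0, found = -1, c;

    while (i < len(str)) {
        c = ord(substr(str, i));    // code of the character at position i
        if (c >= 48 && c <= 57) {   // ASCII '0' to '9'
            found = i;
            break;                  // leave the while loop entirely
        }
        i++;
    }
    return found;
}
```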

If statements

The statement

    if (expression) then-statement [ else else-statement ]

evaluates expression (which must have boolean type). If the result is true, it runs then-statement; otherwise it runs else-statement if provided.

While statements

The statement

    while (expression) statement

evaluates expression (which must have boolean type). If the result is false, it does nothing further. If the result is true, it runs statement, and then starts all over again (evaluating expression again, and potentially continuing to loop).

Do statements

The statement

    do statement while (expression);

first runs statement. Then it evaluates expression (which must have boolean type). If the result is false, it does nothing further; otherwise, it starts all over again (running statement again, and potentially continuing to loop).
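
A do loop suits cases where the body has to run once before the test makes sense. The following sketch removes one doubled slash per pass until none remain. The function name squeeze is invented for this example, and it is meant for the path part of a URL only, since it would also damage the ‘//’ after a scheme name.

```ick
// Hypothetical helper: collapse every run of "/" in path to a single "/".
string squeeze(string path)
{
    int pos;

    do {
        pos = index(path, "//");
        if (pos >= 0)
            // Keep everything up to and including the first slash,
            // then skip the second one.
            path = substr(path, 0, pos) + substr(path, pos + 1);
    } while (pos >= 0);
    return path;
}
```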

For statements

The statement

    for ( [ expr1 ] ; [ expr2 ] ; [ expr3 ] ) statement

starts by evaluating expr1 and ignoring any result.

Next it evaluates expr2, which must have boolean type. If the result is false, it does nothing further. If the result is true, it runs statement, then evaluates expr3 and ignores any result, and then goes back to the evaluation of expr2, potentially continuing to loop.

A continue statement within statement does not skip the evaluation of expr3.
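
To illustrate that last point, here is a small sketch in which continue skips the rest of the loop body but the counter is still advanced by expr3. The function name stripunderscores is hypothetical.

```ick
// Hypothetical helper: copy of str with every '_' removed.
string stripunderscores(string str)
{
    int i;
    string out = "", c;

    for (i = 0; i < len(str); i++) {
        c = substr(str, i, i + 1);  // the single character at position i
        if (c == "_")
            continue;               // i++ still runs before the next test
        out += c;
    }
    return out;
}
```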

Statement blocks

Anywhere a single statement is syntactically valid, a braced block may appear instead:

    {
        variable-declarations
        statements
    }

Variables declared within this block are only valid within the block. If they include initialisers, they are initialised every time execution enters the block.
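
As a sketch of both rules, the variable hit below exists only inside the braces of the loop body, and its initialiser runs afresh on every iteration. The function name countslashes is invented for the example.

```ick
// Hypothetical helper: number of '/' characters in str.
int countslashes(string str)
{
    int i, total = 0;

    for (i = 0; i < len(str); i++) {
        bool hit = false;           // reset each time the block is entered

        if (substr(str, i, i + 1) == "/")
            hit = true;
        if (hit)
            total++;
    }
    return total;
}
```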

Expressions

Expressions use ordinary infix syntax, with a restricted subset of the usual C operators. The accepted operators are listed below. Each subheading indicates a group of operators with the same precedence, and the operators are listed from lowest to highest precedence.

The comma operator

The expression

    leftexpr , rightexpr

has the value of rightexpr, but before it evaluates rightexpr it first evaluates leftexpr and ignores the result.

leftexpr and rightexpr need not have the same type, and either or both may even be void. The type of the entire comma expression is the same as the type of rightexpr.

Assignment operators

The expression

    variable = expression

has the value of expression, and the side effect of copying that value into variable. variable and expression must have the same type, and of course the type of the expression as a whole is the same type again.

The compound assignment expressions

    variable += expression
    variable -= expression
    variable *= expression
    variable /= expression
    variable &&= expression
    variable ||= expression

are equivalent, respectively, to

    variable = variable + expression
    variable = variable - expression
    variable = variable * expression
    variable = variable / expression
    variable = variable && expression
    variable = variable || expression
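
Because + also concatenates strings, the equivalence above suggests that += can be used to build up a string piece by piece. A minimal sketch, with a made-up function name:

```ick
// Hypothetical helper: the string s repeated n times.
string repeat(string s, int n)
{
    string out = "";

    while (n > 0) {
        out += s;       // same as: out = out + s
        n -= 1;         // same as: n = n - 1
    }
    return out;
}
```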

The conditional operator

The expression

    condexpr ? trueexpr : falseexpr

has the value of trueexpr if condexpr evaluates to true, or of falseexpr if condexpr evaluates to false.

condexpr must have boolean type. Either or both of trueexpr and falseexpr may have void type, in which case the expression as a whole has void type as well; otherwise trueexpr and falseexpr must have the same type, which is also the type of the whole expression.
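
A short sketch of the common case, in which both branches are strings. The function name scheme is hypothetical.

```ick
// Hypothetical helper: choose the scheme for a rewritten URL.
string scheme(bool secure)
{
    return secure ? "https://" : "http://";
}
```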

Logical operators

The expressions

    leftexpr && rightexpr
    leftexpr || rightexpr

have, respectively, the value of the logical AND and logical OR of their operands. Both operands must have boolean type, and the expressions as a whole have boolean type too.

These operators are guaranteed to short-circuit: that is, if evaluating leftexpr leaves the value of the entire expression in no doubt (i.e. leftexpr is false in an && expression, or true in an || expression) then rightexpr is not evaluated at all (so its side effects, if any, will not occur).

The && and || operators have the same precedence, and associate with themselves, but may not associate with one another. That is, you can legally write either of

    expr1 && expr2 && expr3
    expr1 || expr2 || expr3

but it is an error to write either of

    expr1 && expr2 || expr3
    expr1 || expr2 && expr3

and you must instead use parentheses to disambiguate the relative priority of the operators.
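
For example, a test that combines the two operators might be written as follows. The function and variable names are invented for this sketch.

```ick
// Hypothetical helper: true if url starts with "http://" or "https://"
// and is more than 8 characters long.
bool isweb(string url)
{
    bool http = index(url, "http://") == 0;
    bool https = index(url, "https://") == 0;

    // The parentheses are required: 'http || https && ...' is an error.
    return (http || https) && len(url) > 8;
}
```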

Comparison operators

The expressions

    leftexpr < rightexpr
    leftexpr <= rightexpr
    leftexpr > rightexpr
    leftexpr >= rightexpr
    leftexpr == rightexpr
    leftexpr != rightexpr

return true if and only if leftexpr compares, respectively, less than, less than or equal to, greater than, greater than or equal to, equal to, or unequal to rightexpr.

leftexpr and rightexpr must both have the same type, which must be either string or integer. The expressions as a whole have boolean type.

Additive operators

The expressions

    leftexpr + rightexpr
    leftexpr - rightexpr

return, respectively, the sum and difference of leftexpr and rightexpr.

leftexpr and rightexpr must have the same type, and the expressions as a whole have the same type. That type must be integer for the - operator; for the + operator it may be either integer or string. In the latter case, the operation performed is string concatenation.
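
Since both operands must have the same type, an integer has to be converted with itoa (see ‘Standard library’) before it can be joined to a string. A minimal sketch, with a hypothetical function name:

```ick
// Hypothetical helper: build "host:port" from its parts.
string hostport(string host, int port)
{
    // host + ":" + port would be a type error; convert port first.
    return host + ":" + itoa(port);
}
```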

Multiplicative operators

The expressions

    leftexpr * rightexpr
    leftexpr / rightexpr

return, respectively, the product and quotient of leftexpr and rightexpr.

leftexpr and rightexpr must both have integer type, and the expressions as a whole have the same type.
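
The operators listed in this section do not include a remainder operator. If you need one, it can be sketched in terms of / and * as below. The function name rem is made up, and how / rounds is not stated here, so the only property claimed is that (a / b) * b + rem(a, b) equals a.

```ick
// Hypothetical helper: the remainder left over by a / b.
// b must not be zero.
int rem(int a, int b)
{
    return a - (a / b) * b;
}
```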

Unary operators

The expressions

    + expression
    - expression

have, respectively, the same value as expression and the arithmetic negative of the value of expression. expression must have integer type, and the expression as a whole has integer type too.

The expression

    ! expression

has the value of the boolean negation of the value of expression. expression must have boolean type, and the expression as a whole has boolean type too.

The expressions

    ++ variable
    -- variable

have, respectively, the effect of adding 1 to variable and subtracting 1 from it. Their value is the value of variable after it is modified. variable must have integer type, and the expression as a whole has integer type too.

The expressions

    variable ++
    variable --

have, respectively, the effect of adding 1 to variable and subtracting 1 from it. Their value is the value of variable before it is modified. variable must have integer type, and the expression as a whole has integer type too.
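
The difference between the prefix and postfix forms can be seen in a sketch like this one, which follows directly from the two descriptions above. The function name incdemo is hypothetical.

```ick
// Hypothetical demonstration; returns "5,7".
string incdemo(void)
{
    int n = 5, before, after;

    before = n++;       // before is 5; n is now 6
    after = ++n;        // n is now 7; after is 7
    return itoa(before) + "," + itoa(after);
}
```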

Core expression components

The expression

    ( expression )

has the same type and value as expression.

The expression

    function-name ( [ argument [ , argument ... ] ] )

has the effect of calling the named function, with its parameters set to the values of the argument expressions in order. The types of the argument expressions must match the types of the parameters of the function; the type of the expression as a whole is the return type of the function, and its value (if any) is the value supplied by the return statement that terminates the function.

Functions may be overloaded by the number and types of their parameters. That is, you can independently define two functions with the same name, as long as their lists of parameter types are distinct.
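
As a sketch, the two functions below share the invented name drop and are told apart by the type of their second argument.

```ick
// Hypothetical helper: s without its first n characters.
// n is assumed to lie between 0 and len(s).
string drop(string s, int n)
{
    return substr(s, n);
}

// Hypothetical helper: s without the prefix pfx, or s unchanged
// if it does not start with pfx.
string drop(string s, string pfx)
{
    if (index(s, pfx) == 0)
        return drop(s, len(pfx));   // calls the (string, int) version
    return s;
}
```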

The expression

    variable-name

has the type and value of the contents of the named variable.

The expressions

    true
    false

have boolean type, and their values are respectively boolean truth and boolean falsehood.

Finally, expressions can also be integer literals and string literals.

An integer literal consists of either a sequence of decimal digits starting with a non-zero one, or a sequence of octal digits starting with a zero, or a sequence of hexadecimal digits preceded by ‘0x’.

A string literal consists of a sequence of characters enclosed in double quotes. Within those quotes, the backslash character is special, and must introduce one of the following sequences:

\a
The alert or bell character (ASCII value 7).
\b
The backspace character (ASCII value 8).
\f
The form feed character (ASCII value 12).
\n
The new line or line feed character (ASCII value 10).
\r
The carriage return character (ASCII value 13).
\t
The horizontal tab character (ASCII value 9).
\v
The vertical tab character (ASCII value 11).
\\
A literal backslash.
\"
A literal double quote.
\ followed by a new line in the source
Causes the new line in the source to be ignored, so you can break a single string literal across multiple source lines.
\x followed by up to two hex digits
Encodes the character with the code given by the hex digits.
\ followed by up to three octal digits
Encodes the character with the code given by the octal digits.

Multiple string literals may also be specified in immediate succession, and will be automatically concatenated.
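
A short sketch combining several of these features, on the assumption that white space is allowed between adjacent literals as it is in C. The function name banner is hypothetical.

```ick
// Hypothetical function: returns a two-line string.
string banner(void)
{
    // The two literals are joined into one string. \x41 is "A",
    // \t is a tab, \n a line feed and \" a double quote.
    return "Name:\t\x41lice\n"
           "Quote:\t\"ick\"\n";
}
```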

Standard library

The Ick execution environment pre-defines a number of standard functions you can use for string processing. Those functions are listed below.

int len(string str)

Returns the length of str.

string substr(string str, int start, int end)

Returns the substring of str starting at character position start (counting the first character in the string as zero), and continuing until position end. The character at position start is included, but the one at position end is not.

string substr(string str, int start)

Returns the substring of str starting at character position start (counting the first character in the string as zero), and continuing until the end of the string.
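
As a worked illustration of both forms, the sketch below cuts a made-up URL into three pieces. The function name substrdemo is hypothetical.

```ick
// Hypothetical demonstration of the two forms of substr.
string substrdemo(void)
{
    string url = "http://example.com/a/b";
    string scheme, host, path;

    scheme = substr(url, 0, 7);     // "http://"     (positions 0 to 6)
    host = substr(url, 7, 18);      // "example.com" (positions 7 to 17)
    path = substr(url, 18);         // "/a/b"        (position 18 onwards)
    return scheme + host + path;    // the original url again
}
```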

int atoi(string str)

Interprets str as a sequence of decimal digits (with optional minus sign) encoding an integer, and returns that integer.

string itoa(int i)

Returns a string containing the decimal representation of i: a sequence of decimal digits, preceded by a minus sign if i is negative.
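
Since Ick does not mix strings and integers in arithmetic, these two functions are typically used as a pair. A minimal sketch, with an invented function name:

```ick
// Hypothetical helper: given a string of decimal digits such as "41",
// return the next number as a string ("42").
string nextnum(string digits)
{
    return itoa(atoi(digits) + 1);
}
```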

int ord(string str)

Returns the character code of the first character in str, or zero if str is empty.

string chr(int c)

Returns a string containing a single character with code c, or the empty string if c is zero.
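
Used together, ord and chr allow character-by-character transformations. The following sketch converts ASCII capital letters to lower case; the function name lower is made up for the example.

```ick
// Hypothetical helper: copy of str with 'A' to 'Z' converted to
// 'a' to 'z'. Other characters are copied unchanged.
string lower(string str)
{
    int i, c;
    string out = "";

    for (i = 0; i < len(str); i++) {
        c = ord(substr(str, i));    // code of the character at position i
        if (c >= 65 && c <= 90)     // ASCII 'A' to 'Z'
            c += 32;
        out += chr(c);
    }
    return out;
}
```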

int index(string haystack, string needle)

Searches for the string needle occurring anywhere in the string haystack. Returns the first position at which it occurs, or -1 if it does not occur at all.

int index(string haystack, string needle, int start)

Same as above, but only counts matches at or after the position start.

int rindex(string haystack, string needle)
int rindex(string haystack, string needle, int start)

Same as index, but returns the last position at which needle occurs rather than the first. (Still returns -1 if it does not occur at all.)
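
As a sketch of how these functions might be combined with substr to pick a URL apart (both function names are hypothetical):

```ick
// Hypothetical helper: the host part (including any port) of url,
// e.g. "example.com" for "http://example.com/a/b". Returns the
// empty string if url contains no "://".
string hostof(string url)
{
    int start = index(url, "://");
    int end;

    if (start < 0)
        return "";
    start += 3;                         // skip over "://"
    end = index(url, "/", start);       // first "/" after the scheme
    if (end < 0)
        end = len(url);                 // no path: host runs to the end
    return substr(url, start, end);
}

// Hypothetical helper: everything after the last "/" in url, or url
// unchanged if it contains no "/".
string lastsegment(string url)
{
    int pos = rindex(url, "/");

    if (pos < 0)
        return url;
    return substr(url, pos + 1);
}
```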

int min(int a, int b)

Returns the smaller of the two integers a and b.

int max(int a, int b)

Returns the larger of the two integers a and b.
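
One plausible use of min and max together is to clamp a position before passing it to substr. The function name head is invented for this sketch.

```ick
// Hypothetical helper: at most the first n characters of str.
// n is clamped to the range 0 to len(str) before use.
string head(string str, int n)
{
    return substr(str, 0, max(0, min(n, len(str))));
}
```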

Example

Here is a simple example of an ick-proxy configuration which rewrites BBC News article URLs so that they always reference the low-graphics version.

bool strprefix(string str, string pfx)
{
    return (len(str) >= len(pfx) &&
            substr(str, 0, len(pfx)) == pfx);
}

string rewrite(string url)
{
    if (strprefix(url, "http://news.bbc.co.uk/1/hi/")) {
        url = substr(url, 0, 24) + "low" + substr(url, 26);
    }
    return url;
}

BUGS

None currently known, other than the fact that the entire concept is utterly disgusting (hence the program's name).

LICENCE

ick-proxy is free software, distributed under the MIT licence. Type ick-proxy --licence to see the full licence text.


[$Id: manpage0.but 7894 2008-02-25 19:42:51Z simon $]
[$Id: icklang.but 7922 2008-03-12 21:24:08Z simon $]
[$Id: manpage1.but 7894 2008-02-25 19:42:51Z simon $]