For PGI, caching is far from a small performance boost for edge cases: it is caching which makes the PGI approach practical. Without caching, PGI would be a ridiculous approach, and no assembler should be considered PGI-compliant without caching. Similarly, a PGI porthole which does not consider caching should not expect good performance. Though caching can be complex, most situations are easily solved with simple solutions, and even relatively "dumb" schemes can be effective.
The caching described here is performed by the assembler. Caching allows the previous output of portholes to be used, rather than invoking the porthole itself. A shopping mechanism is used to allow dynamic content to be cached. In this cache, no substitution is pre-performed. PGI offers no opportunity for post-inclusion reorganisation by portholes, so that this caching can be effective (one can use the argument system instead). By default, pages are rebuilt for each request using cached PGI porthole outputs. A full-page cache may also be implemented using the mechanisms here, though this is far from a critical feature.
Caching is not performed for requests which are marked as not-for-caching. These can be specific URLs in the assembler's configuration file. Only GET requests are ever cached. Those requests which are not cached are created using the process described in earlier sections of this specification.
Shopping is the process of matching a for-caching request URL to a particular fragment output, given a particular key and arguments, for a certain session. Two sets of atomic entities are maintained for this process. The first are preferons. Preferons are associated with a PGI session. When the PGI session expires, the preferons go with it. Preferons can also be micromanaged during a session. Preferons are stored in the assembler so, with the proviso that a user can create a fresh session at any time, preferons can be rude (refusing access to things, and so on). The second atomic entities are products. A product is a particular rendering of a fragment, which may or may not be in the cache. It could be a permission denied fragment, a rendering with a particular skin, with particular facilities for a particular user class, and so on.
A sales line is a mapping from combinations of preferons to products. Each cachable porthole has a sales line describing its cachability. The sales line does not include the contents of each product; it merely names it. An incoming request has its session examined to determine the preferons for that session. Then, for each fragment, the preferons are matched against a porthole's sales line to determine the appropriate product. If that product is cached, it is served from the cache. If it is not in the cache, the porthole is invoked, and the cache is subsequently updated with the retrieved product.
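The lookup just described can be sketched as follows. This is a minimal illustration, not part of the specification: the names choose_product and serve_fragment are hypothetical, predicates are reduced to plain functions of the preferon set, the cache is an ordinary dictionary keyed by porthole key and product name, and raising an error when nothing matches is an assumption, since the text does not say what happens in that case.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical simplification: a sales line is an ordered list of
# (predicate, product name) pairs, and a predicate is a function of the
# session's preferon set. Real predicates are richer than this.
Predicate = Callable[[FrozenSet[str]], bool]
SalesLine = List[Tuple[Predicate, str]]


def choose_product(sales_line: SalesLine, preferons: FrozenSet[str]) -> str:
    """Return the product named by the first matching predicate."""
    for predicate, product in sales_line:
        if predicate(preferons):
            return product
    # Assumption: an unmatched request is simply treated as uncachable.
    raise LookupError("no predicate matched")


def serve_fragment(
    cache: Dict[Tuple[str, str], str],
    porthole_key: str,
    sales_line: SalesLine,
    preferons: FrozenSet[str],
    invoke_porthole: Callable[[], str],
) -> str:
    """Serve a product from the cache, invoking the porthole only on a miss."""
    product = choose_product(sales_line, preferons)
    slot = (porthole_key, product)
    if slot not in cache:
        # Cache miss: run the porthole and remember its output.
        cache[slot] = invoke_porthole()
    return cache[slot]
```

The sales line only names products; the fragment contents live solely in the cache, which is why the sketch keeps the two structures separate.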
Products can expire. They expire when one of a number of implementation-defined criteria is matched. Each assembler must implement not-used-for (unused for a particular period of time, of use in keeping the cache sizes down), last-checked (maximum age for cache contents, for keeping contents up-to-date) and check-file (for checking against a file's last update), and may implement others.
Preferons expire when a session expires. They can also be modified by any porthole using the preferon-set, preferon-add and preferon-del header lines. No preferon line is equivalent to an empty preferon line. A preferon is a key (alphanumeric and minus with the usual case insensitivity, including minus/underscore). Preferon lists are comma delimited. The preferon-set line replaces current preferons with the specified set, preferon-add unifies current preferons with the specified set, and preferon-del removes the specified preferons from the current set.
A product is simply a key used in a sales line (alnum and -_ case insensitive); it is a key-scoped name. For matching purposes, certain data, which is discussed in a later section, is suffixed to this key. Therefore this key is known as the product prefix, and the appended data as the suffix.
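As a rough sketch of how an assembler might apply these header lines, assuming that header names are matched case-insensitively, that minus and underscore are treated as the same character in preferon names, and that only the lines actually present in a response are applied (all assumptions; the function names are hypothetical):

```python
from typing import Iterable, Set, Tuple


def normalise(name: str) -> str:
    # Assumption: case-insensitive, with underscore folded onto minus.
    return name.strip().lower().replace("_", "-")


def parse_preferon_list(value: str) -> Set[str]:
    """Split a comma-delimited preferon list, ignoring empty items."""
    return {normalise(item) for item in value.split(",") if item.strip()}


def apply_preferon_headers(
    current: Set[str], headers: Iterable[Tuple[str, str]]
) -> Set[str]:
    """Return the session's preferons after applying the header lines in order."""
    result = set(current)
    for name, value in headers:
        items = parse_preferon_list(value)
        name = name.lower()
        if name == "preferon-set":
            result = items        # replace the whole set
        elif name == "preferon-add":
            result |= items       # union with the specified set
        elif name == "preferon-del":
            result -= items       # remove the specified preferons
    return result
```

For instance, a porthole that logs a user out might send preferon-del: permission while leaving any skin preferons untouched.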
A sales line is a complex description of the properties of a request which cause it to be mapped to a particular product. Sales lines are much simpler to write in practice than to describe fully. If an assembler has no sales line for a particular product, then caching cannot proceed, and a request is issued with PGI_GET_SALES set to 1. A PGI porthole should respond with a Sales-Line header (along with its output). A PGI porthole may also respond with a Sales-Line at any time, which will have authority immediately following the current page request. This facility is designed to be used sparingly, for example on PGI porthole code-updates, or else dumbly, always returning the same line, not for dynamic changes. Note that a sales line should never be used by an assembler on return to cache output.
The sales-line consists of a mapping from predicates to products. This mapping is implemented as a semicolon-delimited list. Each entry takes the form of a predicate, followed by an equals, followed by a product prefix. This is optionally followed by a colon and a lifetime list. A lifetime list is a comma-separated list of key-value pairs. It is described in detail in a following section. Tokens can be space-padded.
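The entry syntax can be illustrated with a small splitter. This is only a sketch under stated assumptions: the names Entry and parse_sales_line are hypothetical, predicates are kept as unparsed strings, a predicate is assumed to contain no equals sign, and a semicolon inside a predicate is assumed to be escaped with a backslash.

```python
import re
from typing import Dict, List, NamedTuple


class Entry(NamedTuple):
    predicate: str                  # left unparsed in this sketch
    product_prefix: str
    lifetime: Dict[str, List[str]]  # a key may repeat, so values are lists


def parse_sales_line(line: str) -> List[Entry]:
    """Split a sales line into (predicate, product prefix, lifetime) entries."""
    entries = []
    # Assumption: a literal semicolon inside an entry is written as \;
    for raw in re.split(r"(?<!\\);", line):
        if not raw.strip():
            continue
        # Assumption: the first '=' separates predicate from product.
        predicate, _, rest = raw.partition("=")
        product, _, lifetime_text = rest.partition(":")
        lifetime: Dict[str, List[str]] = {}
        for pair in lifetime_text.split(","):
            if not pair.strip():
                continue
            key, _, value = pair.partition("=")
            lifetime.setdefault(key.strip(), []).append(value.strip())
        entries.append(Entry(predicate.strip(), product.strip(), lifetime))
    return entries
```

Applied to the example sales line given towards the end of this document, the sketch yields two entries, with product prefixes denied and ok, each carrying a single last-checked lifetime.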
The first matching predicate is the associated product. A predicate is an assembly of request atoms. These atoms can be joined with comma denoting and, pipe denoting or and ! denoting not. There are also ^ and % unary operators, described later. Parentheses can be used for grouping. Or binds less closely than and.
A lifetime list handles product expiry. A key specifies a quantity to use in expiry, and the value specifies the value of that quantity. Currently three keys are defined. Two are not-used-for and last-checked. These both take a time period as an argument. The time period can be a number (seconds) or a string of space-separated integer-unit pairs; valid units are d, h, m and s, signifying days, hours, minutes and seconds, and the resulting quantities are summed. not-used-for denotes expiry due to lack of use; last-checked denotes expiry due to staleness of copy.
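A time period of this form might be converted to seconds as sketched below. The function name parse_period is hypothetical, and the sketch assumes that each unit letter is lower case and written directly after its integer, as in the 2d and 1h of the example sales line.

```python
import re

UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def parse_period(text: str) -> int:
    """Convert a time period such as '90' or '1h 30m' to seconds."""
    text = text.strip()
    if text.isdigit():
        return int(text)            # a bare number is already in seconds
    parts = text.split()
    if not parts:
        raise ValueError("empty time period")
    total = 0
    for part in parts:
        match = re.fullmatch(r"(\d+)([dhms])", part)
        if match is None:
            raise ValueError("bad time period component: %r" % part)
        total += int(match.group(1)) * UNIT_SECONDS[match.group(2)]
    return total
```

So last-checked=2d corresponds to 172800 seconds, and a period written as 1h 30m sums to 5400 seconds.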
The third is check-file. This checks that the modification time of a particular file is less than that of the cached copy. It takes a file path as an argument. Multiple check-files can be specified; if any has been updated, the product is marked as expired.
Values with dangerous punctuation should be URL escaped to prevent ambiguity (for example, with use of comma).
Intuitively, the percent operator means one distinct product for each of the enclosed, and caret has the additional suffix if any. A percent/caret as a unary operator has no truth-theoretic effect, but suffixes to the product name a discriminator based upon the value of that expression. If the expression is multivalued, this capturing affects all the values given inner assignments. Caret is syntactic sugar for a common, more complex operation: ^P = (%P|!P)
For computation purposes, a predicate is conceptually reduced to a more structured form prior to execution. It is in this simpler form that the semantics of a predicate are defined. Note that the process is designed to preserve the semantics of the expression, whilst expressing its meaning in a clearer form. The process is as follows.
At all stages double negations should be removed. First, the caret operator is expanded into its equivalent form. Then de Morgan's laws are used to reduce the expression to NNF (negation normal form). In other words, the transformations !(P|Q) --to-> (!P,!Q) and !(P,Q) --to-> (!P|!Q) are applied. Percent is commutative with respect to not for these purposes (!%P --to-> %!P). Then, all complex expressions contained within a % are replaced by a % prepended to each literal (for example %(P,Q|!R) --to-> %P,%Q|%!R). Any negated atoms which are operated upon by % then have their % removed (for example %P,%Q|%!R --to-> %P,%Q|!R).
Finally the expression is reordered around | and ,, for consistency of equivalent expressions. The ordering is as follows. Lesser expressions always live on the left. Atoms are ordered in ASCII ordering of their written form, with all optional whitespace removed. All atoms are lesser than all subexpressions. A negation (necessarily of an atom after the transformation) is lesser than all other subexpressions, and these are ordered by the order of their atoms. A percent is greater than negation, but lesser than all others. All and-outer expressions are lesser than all or-outer expressions. Each of these is internally ordered by the order of their leftmost element, and in the case of a tie, its next most right subexpression. Expressions with the same operator are considered at the same "level", e.g. a|(b|c)|d is to be considered as a|b|c|d.
For example, consider the predicate ^!(z|b,!(c|d)). First we replace the caret, so that we have (%!(z|b,!(c|d)))|(z|b,!(c|d)). Now we convert to NNF to give (%(!z,(!b|c|d)))|(z|b,!c,!d). Then %'s are pushed inward and removed from negations to give (!z,(!b|%c|%d))|z|b,!c,!d. Finally, the ordering process is applied to give z|b,!c,!d|!z,(!b|%c|%d).
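The three mandatory expiry criteria could be combined along the following lines. This is a sketch with hypothetical names: it assumes the assembler records, for each cached product, when it was stored and when it was last served (both as seconds since the epoch), and it assumes that a check-file which cannot be read counts as expired, which the text does not specify.

```python
import os
import time
from typing import Iterable, Optional


def is_expired(
    cached_at: float,
    last_used_at: float,
    not_used_for: Optional[float] = None,
    last_checked: Optional[float] = None,
    check_files: Iterable[str] = (),
    now: Optional[float] = None,
) -> bool:
    """Return True if any configured expiry criterion is met."""
    now = time.time() if now is None else now
    # not-used-for: the product has gone unused for too long.
    if not_used_for is not None and now - last_used_at > not_used_for:
        return True
    # last-checked: the cached copy is older than the permitted age.
    if last_checked is not None and now - cached_at > last_checked:
        return True
    # check-file: any listed file modified after the copy was cached.
    for path in check_files:
        try:
            if os.path.getmtime(path) > cached_at:
                return True
        except OSError:
            return True  # assumption: an unreadable file forces a refresh
    return False
```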
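The reduction steps up to, but not including, the final reordering can be sketched on a small expression tree. Everything here is illustrative: the tuple representation and the helper names are hypothetical, and the printer omits redundant parentheses, so its output differs from the hand-written intermediate form above only in the outer brackets around the first and-group.

```python
# Nodes: ("atom", name), ("not", e), ("pct", e), ("caret", e),
#        ("and", [e, ...]), ("or", [e, ...])

def expand_caret(e):
    """Rewrite ^P as (%P | !P) throughout the tree."""
    kind = e[0]
    if kind == "atom":
        return e
    if kind in ("and", "or"):
        return (kind, [expand_caret(x) for x in e[1]])
    inner = expand_caret(e[1])
    if kind == "caret":
        return ("or", [("pct", inner), ("not", inner)])
    return (kind, inner)


def nnf(e, negate=False):
    """Push negations down to atoms, removing double negations."""
    kind = e[0]
    if kind == "atom":
        return ("not", e) if negate else e
    if kind == "not":
        return nnf(e[1], not negate)
    if kind == "pct":
        return ("pct", nnf(e[1], negate))      # !%P becomes %!P
    if negate:                                  # de Morgan
        kind = "or" if kind == "and" else "and"
    return (kind, [nnf(x, negate) for x in e[1]])


def push_percent(e, inside=False):
    """Move % onto each literal, dropping it from negated atoms."""
    kind = e[0]
    if kind == "atom":
        return ("pct", e) if inside else e
    if kind == "not":
        return e
    if kind == "pct":
        return push_percent(e[1], True)
    return (kind, [push_percent(x, inside) for x in e[1]])


def flatten(e):
    """Merge nested uses of the same operator, e.g. a|(b|c) to a|b|c."""
    kind = e[0]
    if kind not in ("and", "or"):
        return e
    out = []
    for x in map(flatten, e[1]):
        out.extend(x[1] if x[0] == kind else [x])
    return (kind, out)


def reduce_predicate(e):
    return flatten(push_percent(nnf(expand_caret(e))))


def show(e, parent=None):
    """Print a tree, bracketing only an 'or' nested inside an 'and'."""
    kind = e[0]
    if kind == "atom":
        return e[1]
    if kind in ("not", "pct"):
        body = show(e[1])
        if e[1][0] in ("and", "or"):
            body = "(" + body + ")"
        return ("!" if kind == "not" else "%") + body
    text = ("," if kind == "and" else "|").join(show(x, kind) for x in e[1])
    return "(" + text + ")" if kind == "or" and parent == "and" else text


# ^!(z|b,!(c|d)) from the worked example
z, b, c, d = (("atom", n) for n in "zbcd")
example = ("caret", ("not", ("or", [z, ("and", [b, ("not", ("or", [c, d]))])])))
reduced = show(reduce_predicate(example))
```

Here reduced holds the unordered form of the worked example; implementing the ordering rules on top of this tree is left out of the sketch.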
The purpose of the ordering in reduction is principally to define precisely the format of the suffix appending performed upon a product prefix by the capturing process. A suffix is appended with a colon and then a series of captured results separated by semicolon. The order is the same as the order of captures in the reduced expression. Note that ordering is defined on a pattern (including glob characters etc, see later). During the matching process, the leftmost alternative is tried first.
A number of atoms take arguments, in square brackets separated by commas. Backslash, asterisk and semicolon must be escaped. Asterisks denote the 'glob' style greedy match. A pling (!) separates a match from things that it must not match; any number of exclusions can be specified, each separated by a pling. So a*!abba*!abacab represents any string beginning a, excluding those beginning abba and the string abacab.
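One way to read these pattern rules is sketched below. It assumes that escaping is done with a backslash, that matching is case-sensitive, and that a pattern must match the whole string; none of these is stated outright in the text, and the function names are hypothetical.

```python
import re
from typing import List


def split_unescaped(text: str, sep: str) -> List[str]:
    """Split on sep, leaving backslash-escaped characters untouched."""
    parts, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            current.append(text[i:i + 2])   # keep the escape pair intact
            i += 2
        elif ch == sep:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts


def glob_to_regex(glob: str) -> "re.Pattern[str]":
    """Translate a glob in which only an unescaped * is special."""
    out, i = [], 0
    while i < len(glob):
        ch = glob[i]
        if ch == "\\" and i + 1 < len(glob):
            out.append(re.escape(glob[i + 1]))  # escaped character is literal
            i += 2
        elif ch == "*":
            out.append(".*")
            i += 1
        else:
            out.append(re.escape(ch))
            i += 1
    return re.compile("".join(out), re.DOTALL)


def matches(pattern: str, value: str) -> bool:
    """True if value fits the first glob and none of the excluded globs."""
    wanted, *excluded = split_unescaped(pattern, "!")
    if glob_to_regex(wanted).fullmatch(value) is None:
        return False
    return all(glob_to_regex(x).fullmatch(value) is None for x in excluded)
```

With the pattern a*!abba*!abacab, the sketch accepts apple and abacabx but rejects abbatoir, abacab and banana.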
For example, %qs[skin-*,*] represents capturing any query string whose name begins skin-, which has any value. This should be seen as an anding of all the values which would match for a particular request. So with the query string ?skin-a=banana&skin-b=apple it would be equivalent to %qs[skin-a,banana],%qs[skin-b,apple] and would result (for product prefix foo) in the product foo:qs[skin-a,banana];qs[skin-b,apple].
For the purposes of glob-substituted patterns, they are each included in ASCII order with optional whitespace removed, in the position within the ordering of their glob.
Here are the various forms of currently defined atoms. Remember that a porthole need not use different sales lines for varying contents of nested portholes (as in the case of a relatively static porthole containing a highly dynamic one), as the cache stores unsubstituted products.
The specification of sales lines is complex, but their use is, in practice, quite simple. For each conceptual layout of a page, write a predicate, the overriding ones coming first. If there should be a distinct product for each value of an argument, use ^.
This is a sales line for a porthole which requires a preferon called permission, and is different depending on the value of every query parameter except useless, and which doesn't use (pgi-)arguments or path-info. There are also preferons beginning skin- which alter the layout of the porthole.
!pr[permission] = denied : last-checked=2d ; ^qv[*!useless,*],^pr[skin-*] = ok : last-checked=1h
This can be read, from left to right, as: If there is no permission preferon, use product denied (refresh every two days). Otherwise, use product ok, parameterised on any query arguments not called useless, and also on any skin preferons (refresh every hour).
Without preferon permission the product denied is served. With permission and skin-banana and query parameters q=x&r=y&useless=foo we would generate product name ok:pr[skin-banana];qv[q,x];qv[r,y].
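The worked example can be reproduced with a short sketch that hard-codes this one sales line rather than interpreting predicates in general. The function names are hypothetical; the order of captures (preferon captures before query captures, each group in ASCII order) and the bare prefix ok when nothing is captured are assumptions drawn from the example and the ordering rules above.

```python
import re
from typing import Dict, Set


def glob(pattern: str, value: str) -> bool:
    """Match 'wanted!excluded!...' globs where * matches any run of characters."""
    def hit(p: str) -> bool:
        regex = ".*".join(re.escape(piece) for piece in p.split("*"))
        return re.fullmatch(regex, value, re.DOTALL) is not None

    wanted, *excluded = pattern.split("!")
    return hit(wanted) and not any(hit(x) for x in excluded)


def product_name(preferons: Set[str], query: Dict[str, str]) -> str:
    """Name the product for the example sales line in this section."""
    # !pr[permission] = denied
    if "permission" not in preferons:
        return "denied"
    # ^qv[*!useless,*],^pr[skin-*] = ok
    skins = sorted("pr[%s]" % p for p in preferons if glob("skin-*", p))
    values = sorted(
        "qv[%s,%s]" % (k, v)
        for k, v in query.items()
        if glob("*!useless", k) and glob("*", v)
    )
    captures = skins + values
    # Assumption: with nothing captured, the bare prefix is the product name.
    return "ok:" + ";".join(captures) if captures else "ok"
```

Called with the preferons permission and skin-banana and the query parameters q=x, r=y and useless=foo, the sketch returns the product name given above.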
Scripts can take advantage, if they wish, of the inferred product and the preferon list from the assembler. This can save time authenticating, managing sessions, and so on, and so avoid duplicating work performed by the assembler. These are provided in the PGI environment to a script in PGI-Preferons: and PGI-Product:. The preferons value is comma delimited.
This project was executed as an infrastructure component of an
EPSRC CTA award at CARET.