PGI: Porthole Gateway Interface

PGI Architecture Organisation

Overall Organisation

For an overview of the nature of a porthole, portal or portlet, see the overview page.

The overall architecture of a PGI system is governed by a relatively thin process or web-server module known as the assembler. This process manages the tasks of assembling pages, caching, and communication between processes. The assembler calls a number of porthole chains, each a series of scripts that produces content fragments for a page. Typically, a porthole chain will contain only a generator, a script which generates content from sources external to the PGI mechanism. In some cases, this generator is followed by one or more pipes, which transform the output into another form.
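The flow through a porthole chain can be sketched as follows. This is only an illustration under stated assumptions, not the assembler's actual implementation: the function name run_chain and the two inline scripts are hypothetical, and each stage is simply run as a child process whose standard output feeds the next stage's standard input.

```python
import subprocess
import sys


def run_chain(chain, request_body=b""):
    """Run each stage in order, feeding one stage's stdout to the next stage's stdin."""
    data = request_body
    for argv in chain:
        result = subprocess.run(argv, input=data, stdout=subprocess.PIPE, check=True)
        data = result.stdout
    return data


# Hypothetical generator: produces a content fragment on standard output.
generator = [sys.executable, "-c", "print('<p>hello</p>')"]

# Hypothetical pipe: transforms the generator's output into another form.
upper_pipe = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]

# A chain holding only the generator is the typical case; here one pipe follows it.
fragment = run_chain([generator, upper_pipe])
```

A chain consisting of the generator alone is run the same way, by passing a one-element list.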

A CGI script can be configured for use as a generator. Unaware of the PGI additions, it generates a page which can be either emitted directly or subsumed into an assembled page. For scripts that are not PGI-aware, the assembler introduces an implicit pipe, which strips unneeded outer HTML elements from the CGI output to allow it to be embedded in a composite page.
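One way the implicit pipe might strip the outer elements is sketched below. The function name strip_outer_html is hypothetical, the regular expression is a deliberate simplification rather than a full HTML parser, and the handling of CGI response headers is left out.

```python
import re

# Matches the contents of the <body> element, ignoring case and spanning lines.
_BODY = re.compile(r"<body\b[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)


def strip_outer_html(page: str) -> str:
    """Return only the body contents of a full page; pass bare fragments through."""
    match = _BODY.search(page)
    if match is None:
        # No <body> element found: assume the script already emitted a fragment.
        return page.strip()
    return match.group(1).strip()


full_page = (
    "<html><head><title>News</title></head>"
    "<body class='plain'><p>Latest news</p></body></html>"
)
embeddable = strip_outer_html(full_page)
```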

[Figure: summary diagram of the above]

The Assembler

The assembler can be a persistent process; however, at least in initial implementations, it is more likely to be a CGI 'script', created and executed per request. The assembler maintains a context within a session in an implementation-defined manner, probably using cookies. The selection mechanism allows caching of fragments, avoiding excessive invocation of PGI scripts while still allowing dynamic content. PGIs can clear and reset the session through communication with the assembler. Though complete pages may be cached, the principal cache of an assembler is that of individual fragments returned by PGIs.
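A fragment cache of this kind might look roughly like the sketch below. The class FragmentCache, its method names and the max_age parameter are all assumptions made for illustration; the text above does not define how the selection mechanism decides what may be cached, so a simple per-fragment lifetime stands in for it here.

```python
import time


class FragmentCache:
    """Cache of individual fragments, keyed by an assembler-chosen key."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (expiry time, fragment)

    def get_or_generate(self, key, generate, max_age):
        """Return a cached fragment, or call generate() on a miss or expiry.

        A max_age of 0 marks the fragment as fully dynamic: it is never stored.
        """
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        fragment = generate()
        if max_age > 0:
            self._entries[key] = (now + max_age, fragment)
        return fragment

    def clear(self):
        """Drop every cached fragment, for example when a session is reset."""
        self._entries.clear()


calls = []


def generate_news():
    # Stands in for invoking a PGI script; records each invocation.
    calls.append("news")
    return "<ul><li>headline</li></ul>"


cache = FragmentCache()
first = cache.get_or_generate("news", generate_news, max_age=60)
second = cache.get_or_generate("news", generate_news, max_age=60)
```

In this example the second lookup is served from the cache, so the stand-in generator is invoked only once.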

The assembler also maintains a small area of storage, known as the telegraphics store, for communication of short messages and configurations between PGIs. Much of the telegraphics system is request-scoped and so does not need to be held in secondary storage; however, some of it has a larger scope and will be stored externally.

For an uncached fragment, the assembler uses a lookup table to determine the appropriate porthole chain to use. For top-level cache misses the chain is determined by the URL; for nested misses, by the key provided by the nesting element. A PGI invokes a nested PGI by returning a particular instruction at the relevant point in its return document; the instruction includes this key information (and optional arguments).
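The lookup and nesting described above can be illustrated with a small sketch. Everything named here is hypothetical: the pgi-include instruction syntax, the table names and the chain names are invented for the example, chains are modelled as plain Python callables rather than scripts, and caching and optional arguments are omitted.

```python
import re

# Hypothetical nesting instruction; the real syntax is not defined in this text.
NESTED = re.compile(r'<\?pgi-include key="([^"]+)"\?>')

# Top-level misses are resolved by URL, nested misses by key.
URL_TABLE = {"/home": "home-page"}
KEY_TABLE = {"sidebar": "sidebar-links"}

# Each porthole chain is modelled as a callable returning a fragment.
CHAINS = {
    "home-page": lambda: '<h1>Home</h1><?pgi-include key="sidebar"?>',
    "sidebar-links": lambda: "<ul><li>Links</li></ul>",
}


def expand(fragment):
    """Replace each nesting instruction with the output of the chain it names."""

    def replace(match):
        chain = CHAINS[KEY_TABLE[match.group(1)]]
        # Nested fragments may themselves contain further instructions.
        return expand(chain())

    return NESTED.sub(replace, fragment)


def assemble(url):
    """Resolve the top-level chain from the URL, then expand nested fragments."""
    return expand(CHAINS[URL_TABLE[url]]())


page = assemble("/home")
```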

Generators

A generator PGI is the principal source of content in a PGI system and is broadly equivalent to a CGI script in a CGI system. A generator can proceed as a CGI, reading standard input and the environment, and writing results to standard output and errors to standard error. Two additional file descriptors are passed to the exec-ed process, one for requests to the assembler, and one for the assembler's responses. These are principally used in the telegraphics system, but are also used to communicate other meta-information with the assembler.
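The two extra file descriptors could be set up along the lines of the sketch below, which runs on POSIX systems only. Several details are assumptions not taken from this text: the environment variable names PGI_REQUEST_FD and PGI_RESPONSE_FD used to tell the generator which descriptors to use, the line-based "GET title" message, and the fixed reply sent by the stand-in assembler.

```python
import os
import subprocess
import sys

# Hypothetical generator: asks the assembler a question, then emits a fragment.
GENERATOR = r"""
import os
request_fd = int(os.environ["PGI_REQUEST_FD"])
response_fd = int(os.environ["PGI_RESPONSE_FD"])
os.write(request_fd, b"GET title\n")      # request to the assembler
os.close(request_fd)
answer = os.read(response_fd, 1024).decode().strip()
print("<h1>" + answer + "</h1>")          # content goes to standard output
"""


def run_generator():
    """Exec a generator with request and response descriptors alongside stdio."""
    req_read, req_write = os.pipe()      # generator -> assembler
    resp_read, resp_write = os.pipe()    # assembler -> generator
    env = dict(
        os.environ,
        PGI_REQUEST_FD=str(req_write),
        PGI_RESPONSE_FD=str(resp_read),
    )
    proc = subprocess.Popen(
        [sys.executable, "-c", GENERATOR],
        env=env,
        pass_fds=(req_write, resp_read),  # keep these open in the child
        stdout=subprocess.PIPE,
    )
    # The assembler side closes the ends that belong to the generator.
    os.close(req_write)
    os.close(resp_read)

    request = os.read(req_read, 1024)
    os.write(resp_write, b"Welcome\n")    # fixed reply, for illustration only
    os.close(resp_write)

    output, _ = proc.communicate()
    os.close(req_read)
    return request, output


request, fragment = run_generator()
```

Passing the descriptor numbers through the environment is just one convention; fixed descriptor numbers would serve equally well.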

A PGI can be invoked once per fragment, but this is costly. Often a single PGI will handle a large proportion of the fragments on a page, and a single invocation reduces startup cost considerably. This is analogous to the advantage of mod_perl over CGI Perl scripts, though the performance issue here is even more acute. PGIs on a mature site should use this multiple-invocation mechanism. In principle, a persistent assembler could use the multiple-invocation system across multiple requests, though a script can elect to disable cross-request persistence.

Pipes

Pipes are intended to be of use only in particular restricted circumstances: when a generator is not easily modifiable into a PGI-compatible script. A simple implicit pipe is created for CGIs which do not identify themselves as PGI-aware. It resides within the assembler's address space and is not requested explicitly in configurations, but is otherwise a standard pipe. Pipes take, on standard input, the output of a script and produce, on standard output, the transformed output. They may use standard error for reporting. The PGI request and PGI response file descriptors are also available for interaction with the assembler, for example for the exchange of telegraphics.

Further Details

Note that the specification is not yet defined in the language typical of the standards community (owing principally to a lack of time). All four pages are required for a PGI system; the layers are simply a means of describing the system coherently, not a description of optional layers of service.


This project was executed as an infrastructure component of an EPSRC CTA award at CARET.