PGI: Porthole Gateway Interface

PGI Spec: Basic Specification

Introduction

A PGI porthole -- generator or pipe -- operates within an operating system environment which allows it to communicate with other components of the PGI system. The definitions below are probably UNIX-specific. It is envisaged that these will be augmented by similar specifications in other environments.

A porthole may be in one of a number of modes. These are given below. In the following sections, the environment can be seen to operate differently depending on the current mode. Also described are the means of changing mode.

PGI-CGI mode
A porthole can be placed in CGI mode by being configured as such in the assembler configuration. This is a compatibility mode for non-PGI-aware scripts. For each request a porthole in CGI mode is called only once, with multiple invocations being filled with the same output text. A pipe cannot be in this mode. By default, an implicit pipe is invoked in this mode to strip outer HTML elements, where necessary, to allow embedding. This implicit pipe can be disabled in the configuration.
PGI-single mode
A porthole in PGI-single mode is invoked no more than once per page request and each request for its content is filled with the same output text, as in CGI mode. A process is created for each page request, and destroyed subsequently. The implicit pipe is not invoked. There can be no inclusion. A pipe cannot be in this mode.
PGI-nonpersist mode
A porthole in PGI-nonpersist mode is invoked once per fragment instance within a page request, potentially being invoked multiple times per request. However, a process is created for each fragment request, and destroyed subsequently. A pipe in this mode is invoked once per entity in need of transformation.
PGI-normal mode
A porthole in PGI-normal mode is invoked one or more times per page request with requests for potentially multiple fragment instances upon a page. The assembler will, in general, attempt to queue fragments into a single request. A pipe in this mode is invoked once per invocation.
PGI-wide mode
A porthole in PGI-wide mode behaves as one in PGI-normal mode, but the invocation may persist between multiple requests. A pipe in this mode may be invoked less than once per request.
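As a reading aid, the mode rules can be collected into a small lookup table. This is only a sketch: the class and field names are hypothetical, and the table also folds in rules stated in later sections (inclusion is disabled in PGI-CGI and PGI-single; Content-Length and PGI-Id are mandatory on output in PGI-normal and PGI-wide).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeTraits:
    pipe_allowed: bool          # may a pipe run in this mode?
    inclusion_allowed: bool     # may output contain inclusion points?
    length_required: bool       # is Content-Length mandatory on output?
    id_required: bool           # is PGI-Id mandatory on output?
    may_outlive_request: bool   # may one invocation span several page requests?


# Hypothetical summary table keyed by the mode names used in this document.
MODES = {
    "PGI-CGI":        ModeTraits(False, False, False, False, False),
    "PGI-single":     ModeTraits(False, False, False, False, False),
    "PGI-nonpersist": ModeTraits(True,  True,  False, False, False),
    "PGI-normal":     ModeTraits(True,  True,  True,  True,  False),
    "PGI-wide":       ModeTraits(True,  True,  True,  True,  True),
}
```

Only PGI-wide sets the last flag, which matches the description of it as PGI-normal plus persistence between requests.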

Initially we specify generators; modifications for pipes are given later.

PGI-CGI Mode

In PGI-CGI mode the following sections, on standard input and standard output, do not apply. Instead the assembler operates as if it were a webserver invoking a CGI script. A full CGI environment is created, with its variables modified in the manner described in the Environment section below. The input from the request is placed on standard input, and the output on standard output is recorded, possibly filtered through the implicit pipe, and then used as output.

Standard Input and Standard Output

In modes other than PGI-CGI (the true modes), communication with a PGI script is by means of a line-oriented discourse carried out upon standard input and standard output. These lines occur in line blocks, as described in spec4. The exchange is half-duplex with respect to line blocks.

PGI mode

The first communication between the assembler and the PGI script is the establishing of a PGI mode. The PGI sends a PGI-Mode: header line alone in a block, and awaits the assembler's response. This response takes the form of a single line in a line block: PGI-Mode-Status: followed by will or wont. If wont is specified, the porthole may try again with a different mode.

An assembler which is incapable of wide mode because of the manner in which it is run (e.g. as a CGI script) must respond wont to a PGI-wide mode request. Even though an assembler may invoke even a PGI-wide porthole once per request, indistinguishably from PGI-normal mode, the wont signals to the script that the assembler cannot support wide mode, which may suggest optimisations to it. The PGI should simply follow the failed PGI-wide request with a PGI-normal one.

A PGI must have established a satisfactory mode before it continues.
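The negotiation above might be sketched as follows from the porthole's side. Several details are assumptions rather than part of this document: the block framing (lines ended by a blank line; the real rules are in spec4), the mode values wide and normal as they appear after PGI-Mode:, and the helper names read_block and negotiate_mode.

```python
import io


def read_block(stream):
    """Read lines up to a blank line (assumed block terminator) or end of stream."""
    lines = []
    while True:
        line = stream.readline()
        if line == "" or line.rstrip("\r\n") == "":
            return lines
        lines.append(line.rstrip("\r\n"))


def negotiate_mode(to_assembler, from_assembler, preferences=("wide", "normal")):
    """Request each mode in turn and return the first the assembler accepts."""
    for mode in preferences:
        to_assembler.write(f"PGI-Mode: {mode}\n\n")
        to_assembler.flush()
        status = None
        for line in read_block(from_assembler):
            name, _, value = line.partition(":")
            if name.strip().lower() == "pgi-mode-status":
                status = value.strip().lower()
        if status == "will":
            return mode
    raise RuntimeError("assembler refused every requested mode")


# In-memory streams stand in for the real stdin/stdout pipes.
demo_sent = io.StringIO()
demo_replies = io.StringIO("PGI-Mode-Status: wont\n\nPGI-Mode-Status: will\n\n")
demo_mode = negotiate_mode(demo_sent, demo_replies)
```

Here the first reply refuses wide mode, so the sketch falls back to the second preference, as the text recommends.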

Next Request

A request is composed of an environment, which is a set of key value pairs. In general, this will refer to a non-empty set of fragments to render for a particular page request. The next task is requested by the PGI sending a Request: next line in a block of its own to the assembler. The assembler responds with a block containing the environment of the request (detailed later).
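A minimal sketch of this exchange, as seen from the porthole, is given below. It assumes that a block ends with a blank line (the real framing is defined in spec4) and that a closed stream means no further work; neither is stated here, and the function name next_request is hypothetical.

```python
import io


def next_request(to_assembler, from_assembler):
    """Send Request: next and return the environment as (name, value) pairs."""
    to_assembler.write("Request: next\n\n")
    to_assembler.flush()
    pairs = []
    while True:
        line = from_assembler.readline()
        if line == "":
            # Assumption: end of stream; report None if nothing was received.
            return pairs or None
        line = line.rstrip("\r\n")
        if line == "":
            return pairs  # blank line closes the block (assumed framing)
        name, _, value = line.partition(":")
        pairs.append((name.strip(), value.strip()))


# In-memory streams stand in for the real stdin/stdout pipes.
demo_sent = io.StringIO()
demo_reply = io.StringIO(
    "Server-Software: pgia/0.0.0a\n"
    "PGI-Request: pgi-id=1,pgi-key=menu\n"
    "\n"
)
demo_environment = next_request(demo_sent, demo_reply)
```

The pairs are returned exactly as received; name folding (case, minus versus underscore) is left to the caller.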

CGI Standard Input

In the CGI specification, requests such as POST or PUT, which carry information back to the server, have their input presented on standard input. A PGI requests this data, if needed, by sending a Request: input line alone in a block. The response is a Request-Status: line which will contain either okay or an error message. If okay, this will be followed in the same block by a Content-Length: header with a byte count, a Content-Type: header, and possibly further headers. Once the block is terminated, the number of bytes given in Content-Length: is transmitted to the script.

First a series of headers is sent, including a Content-Length: and a Content-Type: header. After a terminating blank header line, exactly Content-Length bytes of type Content-Type are sent. The input is then done for this request, and the stream remains open for future requests. If Content-Length is missing, no data follows and the blank header line marks the end of the request. The headers are described in more detail in the environment section. Note that this spooling occurs only once per page request, not once per fragment.
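The input exchange could look roughly like this on the porthole side. It is a sketch under assumptions: binary streams are used so that Content-Length counts bytes, a blank line is taken as the block terminator, and the function name request_input is hypothetical.

```python
import io


def request_input(to_assembler, from_assembler):
    """Send Request: input; return (headers, body) or raise if not okay."""
    to_assembler.write(b"Request: input\n\n")
    to_assembler.flush()
    headers = {}
    while True:
        line = from_assembler.readline().rstrip(b"\r\n")
        if not line:
            break  # blank line (or end of stream) closes the header block
        name, _, value = line.partition(b":")
        headers[name.strip().lower().decode("ascii")] = value.strip().decode("utf-8")
    status = headers.get("request-status", "")
    if status.lower() != "okay":
        raise RuntimeError(f"input unavailable: {status}")
    # A missing Content-Length means that no data follows.
    length = int(headers.get("content-length", "0"))
    body = from_assembler.read(length) if length else b""
    if len(body) != length:
        raise EOFError("stream ended before Content-Length bytes arrived")
    return headers, body


# In-memory streams stand in for the real pipes; NEXT is unrelated later traffic.
demo_sent = io.BytesIO()
demo_reply = io.BytesIO(
    b"Request-Status: okay\nContent-Length: 5\nContent-Type: text/plain\n\nhelloNEXT"
)
demo_headers, demo_body = request_input(demo_sent, demo_reply)
```

Reading exactly Content-Length bytes leaves the stream positioned at the start of whatever follows, which is what lets it stay open for later requests.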

For pipes, standard input is in the same format, except that the body is the output of the previous pipe or generator, including any header lines they generated.

PGI Output

When a PGI script is ready to produce output it sends a header line, Request: output. Also in that line block are a Content-Length: and a Content-Type: header, and possibly other headers as defined in HTTP or the CGI spec. In particular, the content-type, status and location headers, as specified by the CGI specification, may be added here. In PGI-normal and PGI-wide modes (optionally in others) a PGI-Id header must be specified, which contains the ID of the fragment output being described, according to the pgi-id parameter in the PGI_REQUEST environment value. The headers burst and session are specified in spec2. The headers preferon-set, preferon-add, preferon-remove and sales-line are specified in spec3. There then follows the body of the output, exactly Content-Length bytes long.
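A porthole might emit one fragment along the following lines. The sketch assumes LF line endings and a blank line closing the header block (spec4 defines the real framing); the function name send_output and its parameters are hypothetical.

```python
import io


def send_output(to_assembler, body, content_type="text/html", pgi_id=None, extra_headers=()):
    """Write a Request: output block followed by exactly len(body) bytes."""
    lines = [
        "Request: output",
        f"Content-Length: {len(body)}",   # byte count, since body is bytes
        f"Content-Type: {content_type}",
    ]
    if pgi_id is not None:
        # Required in PGI-normal and PGI-wide modes, optional in the others.
        lines.append(f"PGI-Id: {pgi_id}")
    lines.extend(f"{name}: {value}" for name, value in extra_headers)
    to_assembler.write(("\n".join(lines) + "\n\n").encode("utf-8"))
    to_assembler.write(body)
    to_assembler.flush()


# An in-memory stream stands in for the porthole's stdout.
demo_stream = io.BytesIO()
send_output(demo_stream, b"<p>hi</p>", pgi_id="7")
```

Computing Content-Length from the encoded bytes rather than from a character count matters as soon as the body contains non-ASCII text.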

Response Header Interpretation

The outermost porthole specifying a status is used as the final page status, or 200 if none is specified. Other statuses are assembled into successes and failures; failures are replaced by standard markup to indicate component failure to the user.

Similarly, for the location header, the outermost, if any, takes precedence. Inner location headers are ignored. Content-type headers are also valid only for the outermost specifying porthole. Inner content-types are auto-converted, if possible, by the assembler (for example text/plain -> text/html); otherwise, content is marked as erroneous as per inner status failures above. This is modified by bursting, described in a subsequent document.
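The precedence rule for status and location can be illustrated with a short sketch on the assembler side. The representation is an assumption: each porthole's response headers are a dict with lower-case keys, and the list is ordered outermost first. The function name is hypothetical.

```python
def resolve_page_headers(responses):
    """Pick the page status and location from the outermost porthole giving one."""
    status = next((r["status"] for r in responses if "status" in r), "200")
    location = next((r["location"] for r in responses if "location" in r), None)
    return status, location


# Outermost first: the page frame gives no status, the first inner porthole does.
demo_responses = [
    {"content-type": "text/html"},
    {"status": "404 Not Found", "location": "/missing"},
    {"status": "200 OK", "location": "/ignored"},
]
demo_status, demo_location = resolve_page_headers(demo_responses)
```

Content-type conversion and the replacement of failed fragments by standard markup are not covered by this sketch.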

In PGI-single and PGI-nonpersist modes, a content-length header is optional. If present it must be used. If absent, a closing of the stdout fd acts in its place. In PGI-normal and PGI-wide modes, the content-length header is mandatory. After that many bytes, the stream reverts to the communication stream for the next fragment.

A pipe has the same output format as a generator.
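On the assembler side, the Content-Length rules above might be applied as in the following sketch. The header dict with lower-case keys, the mode names as strings, and the function name read_body are assumptions made for illustration.

```python
import io

# Modes in which the text above makes Content-Length mandatory.
LENGTH_REQUIRED_MODES = {"PGI-normal", "PGI-wide"}


def read_body(stream, headers, mode):
    """Read one fragment body from a porthole's stdout."""
    length = headers.get("content-length")
    if length is None:
        if mode in LENGTH_REQUIRED_MODES:
            raise ValueError(f"Content-Length is mandatory in {mode} mode")
        # PGI-single / PGI-nonpersist: closing stdout marks the end of the body.
        return stream.read()
    wanted = int(length)
    body = stream.read(wanted)
    if len(body) != wanted:
        raise EOFError("porthole closed stdout before Content-Length bytes")
    return body


# After the counted body, the stream carries the next block of the conversation.
demo_stream = io.BytesIO(b"<b>one</b>Request: output\n")
demo_body = read_body(demo_stream, {"content-length": "10"}, "PGI-normal")
demo_next_line = demo_stream.readline()
```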

Standard Error

Standard error is connected to the assembler's error reporting mechanisms. The assembler must add the invocation context, and time, and then record the message in some manner. Messages are CRLF delimited.
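One possible shape for the assembler's handling of standard error is sketched below. The output format, the context string, and the function name are assumptions; this document only requires that the invocation context and the time be added and that messages be split on CRLF.

```python
from datetime import datetime, timezone


def annotate_stderr(raw, context, now=None):
    """Split CRLF-delimited stderr bytes and prefix each message with time and context."""
    now = now or datetime.now(timezone.utc)
    stamp = now.isoformat()
    return [
        f"{stamp} [{context}] {message.decode('utf-8', 'replace')}"
        for message in raw.split(b"\r\n")
        if message  # skip the empty piece after a trailing CRLF
    ]


# A fixed time keeps the demonstration reproducible.
demo_time = datetime(2000, 1, 1, tzinfo=timezone.utc)
demo_records = annotate_stderr(b"first\r\nsecond\r\n", "porthole=menu", demo_time)
```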

Environment

The environment refers to the header lines supplied on stdin with each page request. The names are not case sensitive, and should be emitted in camel-caps for header lines. Similarly, there is no distinction between minus and underscore.
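Because names ignore case and treat minus and underscore alike, a reader of the environment will probably want to fold names to one form before lookup. The sketch below does only that; it does not try to produce the camel-caps form for emission, since names such as PGI-Revision: and Request-URI: show that form is not simple capitalisation. The function names are hypothetical.

```python
def header_key(name):
    """Fold a header name to a lookup key: no colon, lower case, minus for underscore."""
    return name.strip().rstrip(":").strip().replace("_", "-").lower()


def parse_environment(lines):
    """Turn 'Name: value' lines into a dict keyed by folded header name."""
    environment = {}
    for line in lines:
        name, _, value = line.partition(":")
        environment[header_key(name)] = value.strip()
    return environment


demo_environment = parse_environment([
    "Server-Software: pgia/0.0.0a",
    "PGI_REVISION: PGI/0.0",
    "http-cookie: session=abc",
])
```

With this folding, PGI_REQUEST and PGI-Request: refer to the same entry.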

Server-Software:
(mandatory) Name of the assembler for this request, in format name/version (eg pgia/0.0.0a).
Server-Name:, Server-Protocol:, Server-Port:, Request-Method:, Query-String:, Remote-Host:, Remote-Addr:, Auth-Type:, Remote-User:, Remote-Ident:, Content-Type:, Content-Length:, Http-...:
As specified in the CGI specification. For the HTTP_ variables, these encode header lines received by the assembler, for example Http-Cookie:.
Gateway-Interface:
Name of the CGI version to which this assembler adheres in PGI-CGI mode. (always CGI/1.0).
PGI-Revision:
(mandatory) Name of the PGI version to which the assembler adheres. (currently PGI/0.0).
Path-Info:
This variable contains any extra path arguments in a URL beyond those required to specify the top-level fragment. For a URL of the form http://example.com/cgi-bin/assembler/path/to/top-level/extra/parts, where path/to/top-level specifies a top-level fragment to the assembler, Path-Info: will be /extra/parts.
Path-Translated:
This variable is the document root prepended to the PATH_INFO variable shown above. It is mandatory if Path-Info: is specified. Though it is probably of little use, some scripts require it.
Script-Filename:
This variable specifies the path to the invoked porthole script, on the filesystem.
Script-Name:
This should be a URI, minus the host and protocol parts, which would allow regeneration of the current URL. Note that this is a greatly different matter to that of the similarly named Script-Filename:.
Request-URI:
This is Script-Name: followed by Path-Info:, followed by Query-String:.
PGI-Request:
This contains the request for the porthole to be rendered. In PGI-single and PGI-nonpersist modes only one request is presented. In PGI-normal and PGI-wide modes, multiple requests are (potentially) presented, separated by semicolons. Each semicolon is optionally padded by whitespace. A request takes the form of key-value pairs, separated by commas, each optionally padded by whitespace. The key-value pairs are separated into key and value by an equals, optionally padded by whitespace. The value is URL encoded, and is UTF-8 if it is text.

pgi-path contains the full path derived from pgi-name attributes (see later section). pgi-key contains the name of the key (porthole) to be rendered. Other attributes are the arguments to the porthole (which must be ignored in PGI-single mode), minus those with prepended paths. pgi-id contains the identifier to use when returning a fragment on stdout for PGI-normal and PGI-wide mode.

PGI-Fronted:
See spec2.
PGI-Get-Sales:, PGI-Preferons:, PGI-Product:
See spec3.
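The PGI-Request: format described above (requests separated by semicolons, pairs by commas, key and value by an equals sign, values URL encoded) can be parsed with a sketch like the following. It assumes that any literal semicolon, comma or equals sign inside a value is percent-encoded, and it does not treat + as a space; this document does not settle either point. The function name is hypothetical.

```python
from urllib.parse import unquote


def parse_pgi_request(value):
    """Parse a PGI-Request: value into a list of dicts, one per fragment request."""
    requests = []
    for chunk in value.split(";"):
        if not chunk.strip():
            continue  # tolerate a trailing semicolon
        fields = {}
        for pair in chunk.split(","):
            key, separator, raw = pair.partition("=")
            if not separator:
                continue  # skip anything that is not key=value
            # Values are URL encoded; text is decoded as UTF-8.
            fields[key.strip()] = unquote(raw.strip())
        requests.append(fields)
    return requests


demo_requests = parse_pgi_request(
    "pgi-id=1, pgi-key=menu, pgi-path=top.nav, title=Hello%20World ; "
    "pgi-id=2,pgi-key=menu,pgi-path=top.side,title=caf%C3%A9"
)
```

In PGI-normal and PGI-wide modes, the pgi-id of each parsed request is the value to echo in the PGI-Id header of the matching output.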

Requests and Responses

A PGI can further request various pieces of information by sending a line-block on stdout and waiting for a response block on stdin. The principal use for these requests is in the telegraphics system specified elsewhere.

Inclusion text

The byte sequence corresponding to the inclusion point varies in form between media types, though it contains the same semantic components. The syntax here is described for HTML (text/html), other syntaxes may be defined in time. Inclusion is disabled in PGI-CGI and PGI-single modes.

In HTML, the tag has name porthole and has a number of attributes. It can be closed explicitly, HTML style, or not at all. The close has no effect; inclusion is at the position of the open tag. Every inclusion must have a pgi-name attribute. This can be concatenated (using periods) to produce the path to the inclusion for a page. An inclusion must also include a pgi-key. The value of this attribute is a key which determines which porthole is to be invoked. Other attributes beginning pgi- are reserved.

Attributes not beginning pgi- are arguments. These are passed in the request to the subsequent PGI to render the inclusion. An attribute may be preceded by a path-part delimited by periods. Such attributes are not arguments to the current inclusion, but if a subpath matching the subpath of the argument begins at the present inclusion, then the attribute becomes an argument to that porthole. For example, the attribute a.b.c.d for a porthole with path m.n.p becomes the argument named d to m.n.p.a.b.c (if it exists).
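The attribute rules can be sketched as a function that sorts the attributes of one inclusion into three groups: reserved pgi- attributes, arguments for the inclusion itself, and path-prefixed arguments held for deeper portholes. The function name and return shape are assumptions, and checking whether the deeper path actually exists is left out.

```python
def split_attributes(current_path, attributes):
    """Sort inclusion attributes into (own, deferred, reserved).

    deferred maps a full target path to the arguments destined for it.
    """
    own, deferred, reserved = {}, {}, {}
    for name, value in attributes.items():
        if name.startswith("pgi-"):
            reserved[name] = value
        elif "." in name:
            # Everything before the last period is the subpath; the rest is the name.
            subpath, _, argument = name.rpartition(".")
            deferred.setdefault(f"{current_path}.{subpath}", {})[argument] = value
        else:
            own[name] = value
    return own, deferred, reserved


demo_own, demo_deferred, demo_reserved = split_attributes(
    "m.n.p",
    {"pgi-name": "p", "pgi-key": "menu", "colour": "red", "a.b.c.d": "1"},
)
```

This reproduces the example in the text: a.b.c.d on the inclusion at m.n.p is held as argument d for m.n.p.a.b.c.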

Line Protocols

Information on line protocols, escaping, and so on, can be found in spec4.



This project was executed as an infrastructure component of an EPSRC CTA award at CARET.