+
+@ \relax
+
+%%%--------------------------------------------------------------------------
+@* QP-Tries.
+
+@*1 Theory lesson.
+
+QP-tries are a data structure for keeping track of a dynamic set of
+\emph{strings}. It'll be best if we don't think of these strings as being
+made up of bytes or characters: those units are too large to be convenient.
+Instead, we'll work with a smaller set~$\Sigma$ of \emph{symbols}, which will
+usually be small fixed-size groups of bits, say four or five. The limiting
+factor will be that the number of symbols $\card\Sigma$ must be at most equal
+to the machine's word size $w$.
+
+There's another constraint, which is that the set of strings that we work
+with must be \emph{prefix-free}. This usually most easily arranged by ending
+each string with a \emph{terminating symbol}. For C~strings, the obvious
+choice is a null character, since C has a preference for null-terminated
+strings anyway.
+
+@ The trie \emph{root} points at a node. The trie structure consists of two
+kinds of nodes called \emph{branch} nodes and \emph{leaf} nodes. Both kinds
+of nodes are the same size. Both kinds of notes have a \emph{tag bit} which
+is clear in leaf nodes and set in branch nodes. (The opposite convention is
+possible, but doesn't work as well in practice.)
+
+In broad outline, searching QP-trie for a key proceeds as follows. We start
+at the root. A branch node has children corresponding to the symbols in
+$\Sigma$: if we encounter one, we move to the child selected by one of the
+symbols of the search key. A leaf node holds a key and its associated value:
+if we encounter one, then we compare the search key against the key in the
+leaf, and return the associated value if keys match, or report failure if
+they don't.
+
+The trie is compressed in two ways.
+\begin{itemize}
+\item Branch nodes which would have fewer than two children are elided.
+ Branch nodes contain an \emph{offset} to indicate which of the search key's
+ symbols to check.
+\item Children which don't have leaves are omitted. A branch node contains a
+ \emph{bitmap}, indexed by symbols in some convenient ordering: the bit
+ corresponding to a symbol is set if the node has a corresponding child. A
+ branch node also holds a pointer to a \emph{twigs} vector, which simply
+ holds the node's remaining children, in order.
+\end{itemize}
+The size of a branch node's twigs vector is equal to the Hamming weight --
+i.e., the number of set bits -- of its bitmap~$B$. Moreover, the index of
+the child corresponding to a symbol $s \in \Sigma$, if one exists, is equal
+to the number of present children corresponding to symbols preceding~$s$ in
+the ordering, which can conveniently be computed by
+\[ \hwt\bigparens{B \bitand \bigparens{(1 \lsl s) - 1}} \mpunct, \]
+where $\hwt(\cdot)$ denotes Hamming weight.
+
+This leads to the first algorithm, for lookup in a QP-trie.
+
+\begin{list}
+ {\textbf{Algorithm L} \emph{(Lookup in a QP-trie.)}}
+ {\leftmargin=0pt \rightmargin=0pt \labelwidth=-\labelsep}
+\item
+Given a QP-trie root pointer $\texttt{ROOT}$ and a key string~$k$ of
+length~$n$, find the value associated with~$k$ in the trie.
+
+Every node~$\texttt{N}$ has a tag bit $\texttt{TAG(N)}$. If
+$\texttt{TAG(N)} = 0$ then $\texttt{N}$ points to a leaf node, which has
+$\texttt{KEY}$ and $\texttt{VALUE}$ fields. If $\texttt{TAG(N)} = 1$ then
+$\texttt{N}$ points to a branch node, which has $\texttt{OFFSET}$,
+$\texttt{BITMAP}$, and $\texttt{TWIGS}$ fields.
+
+\begin{enumerate}[\bfseries L1.]
+\item{} [Empty tree?]
+ If $\texttt{ROOT} = \NULL$, then return failure.
+\item{} [Initialize.]
+ Set $\texttt{N} \gets \texttt{ROOT}$.
+\item{} [Check tag.] \label{en:qp.lookup.chk-tag}
+ If $\texttt{TAG(N)} = 0$ then then $\texttt{N}$ points to a leaf node: go to
+ step~L\ref{en:qp.lookup.cmp-key}. Otherwise, it points to a branch node: go
+ to step~L\ref{en:qp.lookup.chk-len}.
+\item{} [Check length.] \label{en:qp.lookup.chk-len}
+ If $\texttt{OFFSET(N)} \ge n$ then return failure. Otherwise, continue
+ with the next step.
+\item{} [Check symbol.]
+ Let $s \gets k[\texttt{OFFSET(N)}]$. If $\texttt{BITMAP(N)} \bitand (1 \lsl
+ s) = 0$ then return failure. Otherwise, set $\texttt{N}$ to be the address
+ of $\texttt{TWIGS(N)}[\hwt(\texttt{BITMAP(N)} \bitand ((1 \lsl s) - 1))]$
+ and go to step~L\ref{en:qp.lookup.chk-tag}.
+\item{} [Compare key.] \label{en:qp.lookup.cmp-key}
+ If the strings $k$ and $\texttt{KEY(N)}$ are equal then return
+ $\texttt{VALUE(N)}$ successfully; otherwise return failure.
+ \qed
+\end{enumerate}
+\end{list}
+
+\begin{figure}
+ \centering
+ \begin{tikzpicture}
+ [x = 48mm, y = 10mm]
+ \node[algcond] (emptyp) at (0, 8) {L1. Empty tree?};
+ \node[algstep] (init) at (0, 6) {L2. Initialize};
+ \node[algcond] (leafp) at (0, 4) {L3. Check tag};
+ \node[algcond] (len) at (1, 5.5) {L4. Check length};
+ \node[algcond] (sym) at (1, 3) {L5. Check symbol};
+ \node[algcond] (key) at (0, 2) {L6. Compare key};
+ \node (lose) at (2, 4) {\texttt{FAILURE}};
+ \node (win) at (0, 0) {\texttt{SUCCESS}};
+ \path[->]
+ (emptyp)
+ edge node[left] {$\texttt{ROOT} \ne \NULL$} (init)
+ edge [out = 0, in = 140]
+ node[above, sloped] {$\texttt{ROOT} = \NULL$} (lose)
+ (init)
+ edge (leafp)
+ (leafp)
+ edge node[left] {$\texttt{TAG(N)} = 0$} (key)
+ edge node[sloped, above] {$\texttt{TAG(N)} = 1$} (len)
+ (len)
+ edge node[left] {$\texttt{OFFSET(N)} < n$} (sym)
+ edge node[above, sloped] {$\texttt{OFFSET(N)} \ge n$} (lose)
+ (sym)
+ edge node[below, sloped, align = center]
+ {bit $s$ set in \\ $\texttt{BITMAP(N)}$} (leafp)
+ edge node[below, sloped, align = center]
+ {bit $s$ clear in \\ $\texttt{BITMAP(N)}$} (lose)
+ (key)
+ edge node[left] {$k$ matches $\texttt{KEY(N)}$} (win)
+ edge [out = 0, in = -125]
+ node[below] {$k$ doesn't match $\texttt{KEY(N)}$} (lose);
+ \end{tikzpicture}
+ \caption{QP-trie lookup algorithm} \label{fig:qp.lookup}
+\end{figure}
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+