3 %%% Description of the parsing machinery
5 %%% (c) 2015 Straylight/Edgeware
8 %%%----- Licensing notice ---------------------------------------------------
10 %%% This file is part of the Sensble Object Design, an object system for C.
12 %%% SOD is free software; you can redistribute it and/or modify
13 %%% it under the terms of the GNU General Public License as published by
14 %%% the Free Software Foundation; either version 2 of the License, or
15 %%% (at your option) any later version.
17 %%% SOD is distributed in the hope that it will be useful,
18 %%% but WITHOUT ANY WARRANTY; without even the implied warranty of
19 %%% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
20 %%% GNU General Public License for more details.
22 %%% You should have received a copy of the GNU General Public License
23 %%% along with SOD; if not, write to the Free Software Foundation,
24 %%% Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
26 \chapter{Parsing} \label{ch:parsing}
28 %%%--------------------------------------------------------------------------
29 \section{The parser protocol} \label{sec:parsing.proto}
31 For the purpose of Sod's parsing library, \emph{parsing} is the process of
32 reading a sequence of input items, in order, and computing an output value.
34 A \emph{parser} is an expression which consumes zero or more input items and
35 returns three values: a \emph{result}, a \emph{success flag}, and a
36 \emph{consumed flag}. The two flags are (generalized) booleans. If the
37 success flag is non-nil, then the parser is said to have \emph{succeeded},
38 and the result is the parser's output. If the success flag is nil then the
39 parser is said to have \emph{failed}, and the result is a list of
40 \emph{indicators}. Finally, the consumed flag is non-nil if the parser
41 consumed any input items.
43 %%%--------------------------------------------------------------------------
44 \section{File locations}
46 %%%--------------------------------------------------------------------------
47 \section{Scanners} \label{sec:parsing.scanner}
49 A \emph{scanner} is an object which keeps track of a parser's progress as it
50 works through its input. There's no common base class for scanners: a
51 scanner is simply any object which implements the scanner protocol described
54 A scanner maintains a sequence of items to read. It can step forwards
55 through the items, one at a time, until it reaches the end (if, indeed, the
56 sequence is finite, which it needn't be). Until that point, there is a
57 current item, though there's no protocol for accessing it at this level
58 because the nature of the items is left unspecified.
60 Some scanners support an additional \emph{place-capture} protocol which
61 allows rewinding the scanner to an earlier point in the input so that it can
64 \subsection{Basic scanner protocol} \label{sec:parsing.scanner.basic}
66 The basic protocol supports stepping the scanner forward through its input
67 sequence, and detecting the end of the sequence.
69 \begin{describe}{gf}{scanner-step @<scanner>}
70 Advance the @<scanner> to the next item, which becomes current.
72 It is an error to step the scanner if the scanner is at end-of-file.
75 \begin{describe}{gf}{scanner-at-eof-p @<scanner> @> @<generalized-boolean>}
76 Return non-nil if the scanner is at end-of-file, i.e., there are no more
79 If nil is returned, there is a current item, and it is safe to step the
80 scanner again; otherwise, it is an error to query the current item or to
84 \subsection{Place-capture scanner protocol} \label{sec:parsing.scanner.place}
86 The place-capture protocol allows rewinding to an earlier point in the
87 sequence. Not all scanners support the place-capture protocol.
89 To rewind a scanner to a particular point, that point must be \emph{captured}
90 as a \emph{place} when it's current -- so you must know in advance that this
91 is an interesting place that's worth capturing. The type of place returned
92 depends on the type of scanner. Given a captured place, the scanner can be
93 rewound to the position held in it.
95 Depending on how the scanner works, holding onto a captured place might
96 consume a lot of memory or case poor performance. For example, if the
97 scanner is reading from an input stream, having a captured place means that
98 data from that point on must be buffered in case the program needs to rewind
99 the scanner and read that data again. Therefore it's possible to
100 \emph{release} a place when it turns out not to be needed any more.
102 \begin{describe}{gf}{scanner-capture-place @<scanner> @> @<place>}
103 Capture the @<scanner>'s current position as a place, and return the place.
106 \begin{describe}{gf}{scanner-restore-place @<scanner> @<place>}
107 Rewind the @<scanner> to the state it was in when @<place> was captured.
108 In particular, the item that was current when the @<place> was captured
109 becomes current again.
111 It is an error to restore a @<place> that has been released, or if the
112 @<place> wasn't captured from the @<scanner>.
115 \begin{describe}{gf}{scanner-release-place @<scanner> @<place>}
116 Release the @<place>, to avoid having to maintaining the ability to restore
117 it after it's not needed any more..
119 It is an error if the @<place> wasn't captured from the @<scanner>.
122 \begin{describe}{mac}
123 {with-scanner-place (@<place> @<scanner>) @<body-form>^* @> @<value>^*}
124 Capture the @<scanner>'s current position as a place, evaluate the
125 @<body-form>s as an implicit progn with the variable @<place> bound to the captured
126 place. When control leaves the @<body-form>s, the place is released. The return
127 values are the values of the final @<body-form>.
130 \subsection{Scanner file-location protocol} \label{sec:parsing.scanner.floc}
132 Some scanners participate in the file-location protocol (\xref{sec:floc}).
133 They implement a method on @|file-location| which collects the necessary
134 information using scanner-specific functions described here.
136 \begin{describe}{fun}{scanner-file-location @<scanner> @> @<file-location>}
137 Return a @|file-location| object describing the current position of the
140 This calls the @|scanner-filename|, @|scanner-line| and @|scanner-column|
141 generic functions on the scanner, and uses these to fill in an appropriate
144 Since there are default methods on these generic functions, it is not an
145 error to call @|scanner-file-location| on any kind of value, but it might
146 not be very useful. This function exists to do the work of appropriately
147 specialized methods on @|file-location|.
150 \begin{describe}{gf}{scanner-filename @<scanner> @> @<string>}
151 Return the name of the file the scanner is currently processing, as a
152 string, or nil if the filename is not known.
155 \begin{describe}{meth}{scanner-filename (@<scanner> t) @> @<string>}
159 \begin{describe}{gf}{scanner-line @<scanner> @> @<integer>}
160 Return the line number of the @<scanner>'s current position, as an integer,
161 or nil if the line number is not known.
164 \begin{describe}{meth}{scanner-line (@<scanner> t) @> @<integer>}
168 \begin{describe}{gf}{scanner-column @<scanner> @> @<integer>}
169 Return the column number of the @<scanner>'s current position, as an
170 integer, or nil if the column number is not known.
173 \begin{describe}{meth}{scanner-column (@<scanner> t) @> @<integer>}
177 \subsection{Character scanners} \label{sec:parsing.scanner.char}
179 Character scanners are scanners which read sequences of characters.
181 \begin{describe}{cls}{character-scanner () \&key}
182 Base class for character scanners. This provides some very basic
185 Not all character scanners are subclasses of @|character-scanner|.
188 \begin{describe}{gf}{scanner-current-char @<scanner> @> @<character>}
189 Returns the current character.
192 \begin{describe}{gf}{scanner-unread @<scanner> @<character>}
193 Rewind the @<scanner> by one step. The @<chararacter> must be the previous
194 current character, and becomes the current character again. It is an error
195 if: the @<scanner> has reached end-of-file; the @<scanner> is never been
196 stepped; or @<character> was not the previous current character.
200 {scanner-interval @<scanner> @<place-a> \&optional @<place-b>
202 Return the characters in the @<scanner>'s input from @<place-a> up to (but
203 not including) @<place-b>.
205 The characters are returned as a string. If @<place-b> is omitted, return
206 the characters up to (but not including) the current position. It is an
207 error if @<place-b> precedes @<place-a> or they are from different
210 This function is a character-scanner-specific extension to the
211 place-capture protocol; not all character scanners implement the
212 place-capture protocol, and some that do may not implement this function.
215 \subsubsection{Stream access to character scanners}
216 Sometimes it can be useful to apply the standard Lisp character input
217 operations to the sequence of characters held by a character scanner.
219 \begin{describe}{gf}{make-scanner-stream @<scanner> @> @<stream>}
220 Returns a fresh input @|stream| object which fetches input characters from
221 the character scanner object @<scanner>. Reading characters from the
222 stream steps the scanner. The stream will reach end-of-file when the
223 scanner reports end-of-file. If the scanner implements the file-location
224 protocol then reading from the stream will change the file location in an
227 This is mostly useful for applying standard Lisp stream functions, most
228 particularly the @|read| function, in the middle of a parsing operation.
231 \begin{describe}{cls}{character-scanner-stream (stream) \&key :scanner}
232 A Common Lisp input @|stream| object which works using the character
233 scanner protocol. Any @<scanner> which implements the base scanner and
234 character scanner protocols is suitable. See @|make-scanner-stream|.
237 \subsection{String scanners} \label{sec:parsing.scanner.string}
239 A \emph{string scanner} is a simple kind of character scanner which reads
240 input from a string object. String scanners implement the character scanner
241 and place-capture protocols.
243 \begin{describe}{cls}{string-scanner}
244 The class of string scanners. The @|string-scanner| class is not a
245 subclass of @|character-scanner|.
248 \begin{describe}{fun}{string-scanner-p @<value> @> @<generalized-boolean>}
249 Return non-nil if @<value> is a @|string-scanner| object; otherwise return
253 \begin{describe}{fun}
254 {make-string-scanner @<string> \&key :start :end @> @<string-scanner>}
255 Construct and return a fresh @|string-scanner| object. The new scanner
256 will read characters from @<string>, starting at index @<start> (which
257 defaults to zero), and continuing until it reaches index @<end> (defaults
258 to the end of the @<string>).
261 \subsection{Character buffer scanners} \label{sec:parsing.scanner.charbuf}
263 A \emph{character buffer scanner}, or \emph{charbuf scanner} for short, is an
264 efficient scanner for reading characters from an input stream. Charbuf
265 scanners implements the basic scanner, character buffer, place-capture, and
266 file-location protocols.
268 \begin{describe}{cls}
269 {charbuf-scanner (character-scanner)
270 \&key :stream :filename :line :column}
271 The class of charbuf scanners. The scanner will read characters from
272 @<stream>. Charbuf scanners implement the file-location protocol: the
273 initial location is set from the given @<filename>, @<line> and @<column>;
274 the scanner will update the location as it reads its input.
277 \begin{describe}{cls}{charbuf-scanner-place}
278 The class of place objects captured by a charbuf scanner.
281 \begin{describe}{fun}
282 {charbuf-scanner-place-p @<value> @> @<generalized-boolean>}
283 Type predicate for charbuf scanner places: returns non-nil if @<value> is a
284 place captured by a charbuf scanner, and nil otherwise.
288 {charbuf-scanner-map @<scanner> @<func> \&optional @<fail>
289 \nlret @<result> @<successp> @<consumedp>}
290 Read characters from the @<scanner>'s buffers.
292 This is intended to be an efficient and versatile interface for reading
293 characters from a scanner in bulk. The function @<func> is invoked
296 (multiple-value-bind (@<donep> @<used>) \\ \ind\ind
297 (funcall @<func> @<buf> @<start> @<end>) \- \\
300 The argument @<buf> is a simple string; @<start> and @<end> are two
301 nonnegative fixnums, indicating that the subsequence of @<buf> between
302 @<start> (inclusive) and @<end> (exclusive) should be processed. If
303 @<func>'s return value @<donep> is nil then @<used> is ignored: the
304 function has consumed the entire buffer and wishes to read more. If
305 @<donep> is non-nil, then it must be a fixnum such that $@<start> \le
306 @<used> \le @<end>$: the function has consumed the buffer as far as @<used>
307 (exclusive) and has completed successfully.
309 If end-of-file is encountered before @<func> completes successfully then it
310 fails: the @<fail> function is called with no arguments, and is expected to
311 return two values. If omitted, @<fail> defaults to
317 The @|charbuf-scanner-map| function returns three values. The first value
318 is the non-nil @<donep> value returned by @<func> if @|charbuf-scanner-map|
319 succeeded, or the first value returned by @<fail>; the second value is @|t|
320 on success, or the second value returned by @<fail>; the third value is
321 non-nil if @<func> consumed any input, i.e., it returned with @<donep> nil
322 at least once, or with $@<used> > @<start>$.
325 \subsection{Token scanners} \label{sec:parsing.scanner.token}
327 \begin{describe}{cls}
328 {token-scanner () \&key :filename (:line 1) (:column 0)}
331 \begin{describe}{gf}{token-type @<scanner> @> @<type>}
334 \begin{describe}{gf}{token-value @<scanner> @> @<value>}
337 \begin{describe}{gf}{scanner-token @<scanner> @> @<type> @<value>}
340 \begin{describe}{ty}{token-scanner-place}
343 \begin{describe}{fun}
344 {token-scanner-place-p @<value> @> @<generalized-boolean>}
347 \subsection{List scanners}
349 \begin{describe}{ty}{list-scanner}
352 \begin{describe}{fun}{list-scanner-p @<value> @> @<generalized-boolean>}
355 \begin{describe}{fun}{make-list-scanner @<list> @> @<list-scanner>}
358 %%%--------------------------------------------------------------------------
359 \section{Parsing macros}
361 %%%--------------------------------------------------------------------------
362 \section{Lexical analyser}
364 %%%----- That's all, folks --------------------------------------------------
368 %%% TeX-master: "sod.tex"