1### -*-python-*-
2###
3### String formatting, with bells, whistles, and gongs
4###
5### (c) 2013 Mark Wooding
6###
7
8###----- Licensing notice ---------------------------------------------------
9###
10### This file is part of Chopwood: a password-changing service.
11###
12### Chopwood is free software; you can redistribute it and/or modify
13### it under the terms of the GNU Affero General Public License as
14### published by the Free Software Foundation; either version 3 of the
15### License, or (at your option) any later version.
16###
17### Chopwood is distributed in the hope that it will be useful,
18### but WITHOUT ANY WARRANTY; without even the implied warranty of
19### MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
20### GNU Affero General Public License for more details.
21###
22### You should have received a copy of the GNU Affero General Public
23### License along with Chopwood; if not, see
24### <http://www.gnu.org/licenses/>.
25
26from __future__ import with_statement
27
28import contextlib as CTX
29import re as RX
30from cStringIO import StringIO
31import sys as SYS
32
33import util as U
34
35###--------------------------------------------------------------------------
36### A quick guide to the formatting machinery.
37###
38### This is basically a re-implementation of Common Lisp's FORMAT function in
39### Python. It differs in a few respects.
40###
### * Most essentially, Python's object and argument-passing models aren't
### the same as Lisp's. In fact, for our purposes, they're a bit better:
### Python's sharp distinction between positional and keyword arguments
### is often extremely annoying, but here it becomes a clear benefit.
### Inspired by Python's own enhanced string-formatting machinery (the
### new `str.format' method and the `string.Formatter' class), we provide
### additional syntax to access keyword arguments by name, positional
### arguments by position (without moving the cursor as manipulated by
### `~*'), and for selecting individual elements of arguments by indexing
### or attribute lookup.
###
### * Unfortunately, Python's I/O subsystem is much less rich than Lisp's.
### We lack streams which remember their cursor position, and so can't
### implement the `~&' (fresh line) or `~T' (horizontal tab) operators
### usefully. Moreover, the Python pretty-printer is rather less well
### developed than the XP-based Lisp pretty-printer, so the
### pretty-printing operations are unlikely to be implemented any time
### soon.
58###
59### * This implementation is missing a number of formatting directives just
60### because they're somewhat tedious to write, such as the detailed
61### floating-point printing provided by `~E', `~F' and `~G'. These might
62### appear in time.
63###
64### Formatting takes place in two separable stages. First, a format string
65### is compiled into a formatting operation. Then, the formatting operation
66### can be applied to sets of arguments. State for these two stages is
67### maintained in fluid variable sets `COMPILE' and `FORMAT'.
68###
69### There are a number of protocols involved in making all of this work.
70### They're described in detail as we come across them, but here's an
71### overview.
72###
73### * Output is determined by formatting-operation objects, typically (but
74### not necessarily) subclasses of `BaseFormatOperation'. A format
75### string is compiled into a single compound formatting operation.
76###
77### * Formatting operations determine what to output from their own
78### internal state and from formatting arguments. The latter are
79### collected from argument-collection objects which are subclasses of
80### `BaseArg'.
81###
82### * Formatting operations can be modified using parameters, which are
83### supplied either through the format string or from arguments. To
84### abstract over this distinction, parameters are collected from
85### parameter-collection objects which are subclasses of `BaseParameter'.
86
87FORMAT = U.Fluid()
## State for format-time processing. The base state is established by the
## `format' function, though various formatting operations will rebind
## portions of the state while they perform recursive processing. The
## variables are as follows.
##
## argmap         The map (typically a dictionary) of keyword arguments to
##                be formatted. These can be accessed only through `=KEY'
##                or `!KEY' syntax.
##
## argpos         The index of the next positional argument to be
##                collected. The `~*' directive works by setting this
##                variable.
##
## argseq         The sequence (typically a list) of positional arguments
##                to be formatted. These are collected in order (as
##                modified by the `~*' directive), or may be accessed
##                through `=INDEX' or `!INDEX' syntax.
##
## escape         An escape procedure (i.e., usually created by `Escape()')
##                to be called by `~^'.
##
## last_multi_p   A boolean, indicating that there are no more lists of
##                arguments (e.g., from `~:{...~}'), so `~:^' should escape
##                if it is encountered.
##
## multi_escape   An escape procedure (i.e., usually created by `Escape()')
##                to be called by `~:^'.
##
## pushback       Some formatting operations, notably `~@[...~]', read
##                arguments without consuming them, so a subsequent
##                operation should collect the same argument. This works by
##                pushing the arguments onto the `pushback' list.
##
## write          A function which writes its single string argument to the
##                current output.
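
## The `U.Fluid' class comes from `util.py' and isn't shown in this file. As
## a rough illustration only, a fluid variable set with a `bind' context
## manager might behave like the sketch below. `SketchFluid' and `STATE' are
## hypothetical names, and this is an assumption about the behaviour, not the
## actual implementation.

```python
import contextlib

class SketchFluid(object):
  """Hypothetical stand-in for `util.Fluid': dynamically scoped attributes."""

  def __init__(self, **kw):
    self.__dict__.update(kw)

  @contextlib.contextmanager
  def bind(self, **kw):
    """Temporarily rebind the named variables, restoring them on exit."""
    missing = object()
    saved = dict((k, self.__dict__.get(k, missing)) for k in kw)
    self.__dict__.update(kw)
    try:
      yield
    finally:
      for k, v in saved.items():
        if v is missing: del self.__dict__[k]
        else: self.__dict__[k] = v

STATE = SketchFluid(argpos = 0)
with STATE.bind(argpos = 5, write = None):
  inner = STATE.argpos          # the rebinding is visible inside the block
outer = STATE.argpos            # ... and undone afterwards
```

## This is why operations can rebind `argseq' or `write' around a recursive
## call without disturbing the caller's state.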
122
123COMPILE = U.Fluid()
## State for compile-time processing. The base state is established by the
## `compile' function, though some formatting operations will rebind portions
## of the state while they perform recursive processing. The variables are
## as follows.
##
## control        The control string being parsed.
##
## delim          An iterable (usually a string) of delimiter directives.
##                See the `FormatDelimiter' class and the
##                `collect_subformat' function for details of this.
##
## end            The end of the portion of the control string being
##                parsed. There might be more of the string, but we should
##                pretend that it doesn't exist.
##
## opmaps         A list of operation maps, i.e., dictionaries mapping
##                formatting directive characters to the corresponding
##                formatting operation classes. The list is searched in
##                order, and the first match is used. This can be used to
##                provide local extensions to the formatting language.
##
## start          The current position in the control string. This is
##                advanced as pieces of the string are successfully parsed.
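
## The first-match search through `opmaps' can be sketched in isolation as
## follows. The function name `lookup_directive' and the placeholder string
## values are made up for illustration; the real maps hold operation classes.

```python
def lookup_directive(opmaps, ch):
  """Return the first entry for CH found in OPMAPS, or raise `KeyError'."""
  for opmap in opmaps:
    if ch in opmap: return opmap[ch]
  raise KeyError(ch)

base = {'A': 'FormatString', 'D': 'FormatDecimal'}    # placeholder values
local = {'A': 'MyFancyString'}                        # hypothetical extension
found_a = lookup_directive([local, base], 'A')        # the extension wins
found_d = lookup_directive([local, base], 'D')        # falls through to base
```

## Putting a local map ahead of the base map is enough to override a single
## directive while leaving the rest untouched.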
147
148###--------------------------------------------------------------------------
149### A few random utilities.
150
151def remaining():
152 """
153 Return the number of positional arguments remaining.
154
155 This will /include/ pushed-back arguments, so this needn't be monotonic
156 even in the absence of `~*' repositioning.
157 """
158 return len(FORMAT.pushback) + len(FORMAT.argseq) - FORMAT.argpos
159
160@CTX.contextmanager
161def bind_args(args, **kw):
162 """
163 Context manager: temporarily establish a different collection of arguments.
164
165 If the ARGS have a `keys' attribute, then they're assumed to be a mapping
166 object and are set as the keyword arguments, preserving the positional
167 arguments; otherwise, the positional arguments are set and the keyword
168 arguments are preserved.
169
170 Other keyword arguments to this function are treated as additional `FORMAT'
171 variables to be bound.
172 """
173 if hasattr(args, 'keys'):
174 with FORMAT.bind(argmap = args, **kw): yield
175 else:
176 with FORMAT.bind(argseq = args, argpos = 0, pushback = [], **kw): yield
177
178## Some regular expressions for parsing things.
179R_INT = RX.compile(r'[-+]?[0-9]+')
180R_WORD = RX.compile(r'[_a-zA-Z][_a-zA-Z0-9]*')
181
182###--------------------------------------------------------------------------
183### Format string errors.
184
185class FormatStringError (Exception):
186 """
187 An exception type for reporting errors in format control strings.
188
189 Its most useful feature is that it points out where the error is in a
190 vaguely useful way. Attributes are as follows.
191
192 control The offending format control string.
193
194 msg The error message, as a human-readable string.
195
196 pos The position at which the error was discovered. This might
197 be a little way from the actual problem, but it's usually
198 good enough.
199 """
200
201 def __init__(me, msg, control, pos):
202 """
203 Construct the exception, given a message MSG, a format CONTROL string,
204 and the position POS at which the error was found.
205 """
206 me.msg = msg
207 me.control = control
208 me.pos = pos
209
210 def __str__(me):
211 """
212 Present a string explaining the problem, including a dump of the
213 offending portion of the string.
214 """
215 s = me.control.rfind('\n', 0, me.pos) + 1
216 e = me.control.find('\n', me.pos)
217 if e < 0: e = len(me.control)
218 return '%s\n %s\n %*s^\n' % \
219 (me.msg, me.control[s:e], me.pos - s, '')
220
221def format_string_error(msg):
222 """Report an error in the current format string."""
223 raise FormatStringError(msg, COMPILE.control, COMPILE.start)
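
## As a standalone illustration of how the error report locates the problem,
## the sketch below reproduces the line-and-caret rendering used by
## `__str__'. The helper name `point_at' and the four-column indent are
## assumptions: the exact whitespace of the original format string isn't
## recoverable from this listing.

```python
def point_at(msg, control, pos, indent = 4):
  """Render MSG followed by the offending line of CONTROL and a caret."""
  s = control.rfind('\n', 0, pos) + 1     # start of the line containing POS
  e = control.find('\n', pos)             # end of that line
  if e < 0: e = len(control)
  pad = ' ' * indent
  return '%s\n%s%s\n%s%s^\n' % (msg, pad, control[s:e], pad, ' ' * (pos - s))

## Position 19 is the `Q' on the second line of the control string.
report = point_at('unknown directive', 'first line\nvalue: ~Q here', 19)
```

## Only the line containing the error position is shown, so multi-line
## control strings stay readable in the report.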
224
225###--------------------------------------------------------------------------
226### Argument collection protocol.
227
228## Argument collectors abstract away the details of collecting formatting
229## arguments. They're used both for collecting arguments to be output, and
230## for parameters designated using the `v' or `!ARG' syntaxes.
231##
232## There are a small number of primitive collectors, and some `compound
233## collectors' which read an argument using some other collector, and then
234## process it in some way.
235##
236## An argument collector should implement the following methods.
237##
238## get() Return the argument variable.
239##
240## pair() Return a pair of arguments.
241##
242## tostr(FORCEP)
243## Return a string representation of the collector. If FORCEP,
244## always return a string; otherwise, a `NextArg' collector
245## returns `None' to indicate that no syntax is required to
246## select it.
247
248class BaseArg (object):
249 """
250 Base class for argument collectors.
251
252 This implements the `pair' method by calling `get' and hoping that the
253 corresponding argument is indeed a sequence of two items.
254 """
255
256 def __init__(me):
257 """Trivial constructor."""
258 pass
259
260 def pair(me):
261 """
262 Return a pair of arguments, by returning an argument which is a pair.
263 """
264 return me.get()
265
266 def __repr__(me):
267 """Print a useful string representation of the collector."""
268 return '#<%s "=%s">' % (type(me).__name__, me.tostr(True))
269
270class NextArg (BaseArg):
271 """The default argument collector."""
272
273 def get(me):
274 """
275 Return the next argument.
276
277 If there are pushed-back arguments, then return the one most recently
278 pushed back. Otherwise, return the next argument from `argseq',
279 advancing `argpos'.
280 """
281 if FORMAT.pushback: return FORMAT.pushback.pop()
282 i = FORMAT.argpos
283 a = FORMAT.argseq[i]
284 FORMAT.argpos = i + 1
285 return a
286
287 def pair(me):
288 """Return a pair of arguments, by fetching two separate arguments."""
289 left = me.get()
290 right = me.get()
291 return left, right
292
293 def tostr(me, forcep):
294 """Convert the default collector to a string."""
295 if forcep: return '+'
296 else: return None
297
298NEXTARG = NextArg()
## Because `NextArg' collectors are used so commonly, and they're all the
## same, we make a distinguished one and try to use that instead. Nothing
## goes badly wrong if you don't use this, but you'll use more memory than
## strictly necessary.
303
class ThisArg (BaseArg):
  """Return the current positional argument without consuming it."""
  def _get(me, i):
    """Return the positional argument I on from the current position."""
    n = len(FORMAT.pushback)
    if n > i: return FORMAT.pushback[n - i - 1]
    else: return FORMAT.argseq[FORMAT.argpos + i - n]
  def get(me):
    """Return the next argument, without consuming it."""
    return me._get(0)
  def pair(me):
    """Return the next two arguments without consuming either."""
    return me._get(0), me._get(1)
  def tostr(me, forcep):
    """Convert the collector to a string."""
    return '@'
320
321THISARG = ThisArg()
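
## The interplay between consuming, pushing back and peeking is easier to see
## without the fluid state. The sketch below is a hypothetical miniature:
## `ArgCursor' is not part of this module, it just mirrors the logic of
## `NextArg.get', `ThisArg._get' and `remaining' on a private copy of the
## state.

```python
class ArgCursor(object):
  """Hypothetical miniature of the `argseq'/`argpos'/`pushback' state."""

  def __init__(self, argseq):
    self.argseq, self.argpos, self.pushback = list(argseq), 0, []

  def next(self):
    """Consume an argument: pushed-back ones first, most recent first."""
    if self.pushback: return self.pushback.pop()
    a = self.argseq[self.argpos]
    self.argpos += 1
    return a

  def peek(self, i = 0):
    """Look I arguments ahead without consuming anything."""
    n = len(self.pushback)
    if n > i: return self.pushback[n - i - 1]
    return self.argseq[self.argpos + i - n]

  def remaining(self):
    """Count the arguments still to come, including pushed-back ones."""
    return len(self.pushback) + len(self.argseq) - self.argpos

cur = ArgCursor(['a', 'b', 'c'])
first = cur.next()                      # consumes 'a'
cur.pushback.append(first)              # ... and un-reads it again
peeked = (cur.peek(0), cur.peek(1))     # pushed-back item, then `argseq'
left = cur.remaining()
```

## Peeking looks through the pushback list first and only then into the
## positional sequence, which is the same order in which `next' would
## deliver the arguments.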
322
323class SeqArg (BaseArg):
324 """
325 A primitive collector which picks out the positional argument at a specific
326 index.
327 """
328 def __init__(me, index): me.index = index
329 def get(me): return FORMAT.argseq[me.index]
330 def tostr(me, forcep): return '%d' % me.index
331
332class MapArg (BaseArg):
333 """
334 A primitive collector which picks out the keyword argument with a specific
335 key.
336 """
337 def __init__(me, key): me.key = key
338 def get(me): return FORMAT.argmap[me.key]
339 def tostr(me, forcep): return '%s' % me.key
340
341class IndexArg (BaseArg):
342 """
343 A compound collector which indexes an argument.
344 """
345 def __init__(me, base, index):
346 me.base = base
347 me.index = index
348 def get(me):
349 return me.base.get()[me.index]
350 def tostr(me, forcep):
351 return '%s[%s]' % (me.base.tostr(True), me.index)
352
353class AttrArg (BaseArg):
354 """
355 A compound collector which returns an attribute of an argument.
356 """
357 def __init__(me, base, attr):
358 me.base = base
359 me.attr = attr
360 def get(me):
361 return getattr(me.base.get(), me.attr)
362 def tostr(me, forcep):
363 return '%s.%s' % (me.base.tostr(True), me.attr)
364
365## Regular expression matching compound-argument suffixes.
366R_REF = RX.compile(r'''
367 \[ ( [-+]? [0-9]+ ) \]
368 | \[ ( [^]]* ) \]
369 | \. ( [_a-zA-Z] [_a-zA-Z0-9]* )
370''', RX.VERBOSE)
371
def parse_arg():
  """
  Parse an argument collector from the current format control string.

  The syntax of an argument is as follows.

        ARG ::= COMPOUND-ARG | `{' COMPOUND-ARG `}'

        COMPOUND-ARG ::= SIMPLE-ARG
                       | COMPOUND-ARG `[' INDEX `]'
                       | COMPOUND-ARG `.' WORD

        SIMPLE-ARG ::= INT | WORD | `+' | `@'

  Surrounding braces mean nothing, but may serve to separate the argument
  from a following alphabetic formatting directive.

  A `+' means `the next pushed-back or positional argument'. It's useful to
  be able to say this explicitly so that indexing and attribute references
  can be attached to it: for example, in `~={thing}@[~={+.attr}A~]'.

  An `@' means the same argument, but without consuming it.

  An integer argument selects the positional argument with that index; a
  negative index counts backwards from the end, as is usual in Python.

  A word argument selects the keyword argument with that key.
  """

  c = COMPILE.control
  s, e = COMPILE.start, COMPILE.end

  ## If it's delimited then pick through the delimiter.
  brace = None
  if s < e and c[s] == '{':
    brace = '}'
    s += 1

  ## Make sure there's something to look at.
  if s >= e: raise FormatStringError('missing argument specifier', c, s)

  ## Find the start of the breadcrumbs. The cases are mutually exclusive, so
  ## this must be a single `if' chain: otherwise a `+' would fall through
  ## and be rejected as an unknown specifier. A leading `-' introduces a
  ## negative index, as promised by the docstring.
  if c[s] == '+':
    getarg = NEXTARG
    s += 1
  elif c[s] == '@':
    getarg = THISARG
    s += 1
  elif c[s] == '-' or c[s].isdigit():
    m = R_INT.match(c, s, e)
    if not m: raise FormatStringError('bad integer argument specifier', c, s)
    getarg = SeqArg(int(m.group()))
    s = m.end()
  else:
    m = R_WORD.match(c, s, e)
    if not m: raise FormatStringError('unknown argument specifier', c, s)
    getarg = MapArg(m.group())
    s = m.end()

  ## Now parse indices and attribute references.
  while True:
    m = R_REF.match(c, s, e)
    if not m: break
    if m.group(1): getarg = IndexArg(getarg, int(m.group(1)))
    elif m.group(2): getarg = IndexArg(getarg, m.group(2))
    elif m.group(3): getarg = AttrArg(getarg, m.group(3))
    else: raise FormatStringError('internal error (weird ref)', c, s)
    s = m.end()

  ## Finally, check that we have the close delimiter we want.
  if brace:
    if s >= e or c[s] != brace:
      raise FormatStringError('missing close brace', c, s)
    s += 1

  ## Done.
  COMPILE.start = s
  return getarg
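
## To show what a compound argument reference ends up selecting, here is a
## self-contained sketch which parses and evaluates a reference in one step
## rather than building collector objects. The names `resolve', `R_SIMPLE',
## `R_SUFFIX' and `User' are illustrative assumptions. The `+' and `@' forms
## and the surrounding braces are left out for brevity.

```python
import re

R_SIMPLE = re.compile(r'[0-9]+|[_a-zA-Z][_a-zA-Z0-9]*')
R_SUFFIX = re.compile(
  r'\[([-+]?[0-9]+)\]|\[([^]]*)\]|\.([_a-zA-Z][_a-zA-Z0-9]*)')

def resolve(spec, argseq, argmap):
  """Evaluate an argument reference such as `user.groups[0]' or `0[name]'."""
  m = R_SIMPLE.match(spec)
  if not m: raise ValueError('unknown argument specifier: %r' % spec)
  head = m.group()
  if head.isdigit(): value = argseq[int(head)]    # positional, by index
  else: value = argmap[head]                      # keyword, by name
  pos = m.end()
  while pos < len(spec):
    m = R_SUFFIX.match(spec, pos)
    if not m: raise ValueError('junk after argument: %r' % spec[pos:])
    if m.group(1) is not None: value = value[int(m.group(1))]
    elif m.group(2) is not None: value = value[m.group(2)]
    else: value = getattr(value, m.group(3))
    pos = m.end()
  return value

class User(object):
  groups = ['wheel', 'staff']

by_attr = resolve('user.groups[-1]', [], {'user': User()})
by_key = resolve('0[name]', [{'name': 'mdw'}], {})
```

## A bracketed index which looks like an integer is used as an integer;
## anything else in brackets is used as a string key.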
447
448###--------------------------------------------------------------------------
449### Parameter collectors.
450
451## These are pretty similar in shape to argument collectors. The required
452## methods are as follows.
453##
454## get() Return the parameter value.
455##
456## tostr() Return a string representation of the collector. (We don't
457## need a FORCEP argument here, because there are no default
458## parameters.)
459
460class BaseParameter (object):
461 """
462 Base class for parameter collector objects.
463
464 This isn't currently very useful, because all it provides is `__repr__',
465 but the protocol might get more complicated later.
466 """
467 def __init__(me): pass
468 def __repr__(me): return '#<%s "%s">' % (type(me).__name__, me.tostr())
469
470class LiteralParameter (BaseParameter):
471 """
472 A literal parameter, parsed from the control string.
473 """
474 def __init__(me, lit): me.lit = lit
475 def get(me): return me.lit
476 def tostr(me):
477 if me.lit is None: return ''
478 elif isinstance(me.lit, (int, long)): return str(me.lit)
479 else: return "'%c" % me.lit
480
481## Many parameters are omitted, so let's just reuse a distinguished collector
482## for them.
483LITNONE = LiteralParameter(None)
484
485class RemainingParameter (BaseParameter):
486 """
487 A parameter which collects the number of remaining positional arguments.
488 """
489 def get(me): return remaining()
490 def tostr(me): return '#'
491
492## These are all the same, so let's just have one of them.
493REMAIN = RemainingParameter()
494
495class VariableParameter (BaseParameter):
496 """
497 A variable parameter, fetched from an argument.
498 """
499 def __init__(me, arg): me.arg = arg
500 def get(me): return me.arg.get()
501 def tostr(me):
502 s = me.arg.tostr(False)
503 if not s: return 'V'
504 else: return '!' + s
505VARNEXT = VariableParameter(NEXTARG)
506
507###--------------------------------------------------------------------------
508### Formatting protocol.
509
510## The formatting operation protocol is pretty straightforward. An operation
511## must implement a method `format' which takes no arguments, and should
512## produce its output (if any) by calling `FORMAT.write'. In the course of
513## its execution, it may collect parameters and arguments.
514##
515## The `opmaps' table maps formatting directives (which are individual
516## characters, in upper-case for letters) to functions returning formatting
517## operation objects. All of the directives are implemented in this way.
518## The functions for the base directives are actually the (callable) class
519## objects for subclasses of `BaseFormatOperation', though this isn't
520## necessary.
521##
522## The constructor functions are called as follows:
523##
524## FUNC(ATP, COLONP, GETARG, PARAMS, CHAR)
525## The ATP and COLONP arguments are booleans indicating respectively
526## whether the `@' and `:' modifiers were set in the control string.
527## GETARG is the collector for the operation's argument(s). The PARAMS
528## are a list of parameter collectors. Finally, CHAR is the directive
## character (so directives with similar behaviour can use the same
530## class).
531
532class FormatLiteral (object):
533 """
534 A special formatting operation for printing literal text.
535 """
536 def __init__(me, s): me.s = s
537 def __repr__(me): return '#<%s %r>' % (type(me).__name__, me.s)
538 def format(me): FORMAT.write(me.s)
539
class FormatSequence (object):
  """
  A special formatting operation for applying a collection of other
  operations in sequence.
  """
  def __init__(me, seq):
    me.seq = seq
  def __repr__(me):
    return '#<%s [%s]>' % (type(me).__name__,
                           ', '.join(repr(p) for p in me.seq))
  def format(me):
    for p in me.seq: p.format()
552
553class BaseFormatOperation (object):
554 """
555 The base class for built-in formatting operations (and, probably, most
556 extensions).
557
558 Subclasses should implement a `_format' method.
559
560 _format(ATP, COLONP, [PARAM = DEFAULT, ...])
561 Called to produce output. The ATP and COLONP flags are from
562 the constructor. The remaining function arguments are the
563 computed parameter values. Arguments may be collected using
564 the `getarg' attribute.
565
566 Subclasses can set class attributes to influence the constructor.
567
568 MINPARAM The minimal number of parameters acceptable. If fewer
569 parameters are supplied then an error is reported at compile
570 time. The default is zero.
571
572 MAXPARAM The maximal number of parameters acceptable. If more
573 parameters are supplied then an error is reported at compile
574 time. The default is zero; `None' means that there is no
575 maximum (but this is unusual).
576
577 Instances have a number of useful attributes.
578
579 atp True if an `@' modifier appeared in the directive.
580
581 char The directive character from the control string.
582
583 colonp True if a `:' modifier appeared in the directive.
584
585 getarg Argument collector; may be called by `_format'.
586
587 params A list of parameter collector objects.
588 """
589
590 ## Default bounds on parameters.
591 MINPARAM = MAXPARAM = 0
592
593 def __init__(me, atp, colonp, getarg, params, char):
594 """
595 Constructor: store information about the directive, and check the bounds
596 on the parameters.
597
598 A subclass should call this before doing anything fancy such as parsing
599 the control string further.
600 """
601
602 ## Store information.
603 me.atp = atp
604 me.colonp = colonp
605 me.getarg = getarg
606 me.params = params
607 me.char = char
608
609 ## Check the parameters.
610 bad = False
611 if len(params) < me.MINPARAM: bad = True
612 elif me.MAXPARAM is not None and len(params) > me.MAXPARAM: bad = True
613 if bad:
614 format_string_error('bad parameters')
615
616 def format(me):
617 """Produce output: call the subclass's formatting function."""
618 me._format(me.atp, me.colonp, *[p.get() for p in me.params])
619
  def tostr(me):
    """
    Convert the operation to a directive string.

    The pieces are written in the order in which `parse_operator' reads
    them: parameters, explicit argument, modifier flags, and finally the
    directive character.
    """
    return '~%s%s%s%s%s' % (
      ','.join(a.tostr() for a in me.params),
      (lambda s: s and '={%s}' % s or '')(me.getarg.tostr(False)),
      me.colonp and ':' or '',
      me.atp and '@' or '',
      me.char)
628
629 def __repr__(me):
630 """Produce a readable (ahem) version of the directive."""
631 return '#<%s "%s">' % (type(me).__name__, me.tostr())
632
class FormatDelimiter (BaseFormatOperation):
  """
  A fake formatting operation which exists to impose additional syntactic
  structure on control strings.

  No `_format' method is actually defined, so `FormatDelimiter' objects
  should never find their way into the output pipeline. Instead, they are
  typically useful in conjunction with the `collect_subformat' function. To
  this end, the constructor will fail if its directive character is not
  listed as an expected delimiter in `COMPILE.delim'.
  """

  def __init__(me, *args):
    """
    Constructor: make sure this delimiter is expected in the current context.
    """
    super(FormatDelimiter, me).__init__(*args)
    if me.char not in COMPILE.delim:
      format_string_error("unexpected close delimiter `~%s'" % me.char)
652
653###--------------------------------------------------------------------------
654### Parsing format strings.
655
656def parse_operator():
657 """
  Parse the next portion of the current control string and return a single
  formatting operator for it.

  If we have reached the end of the control string (as stored in
  `COMPILE.end') then return `None'.
663 """
664
665 c = COMPILE.control
666 s, e = COMPILE.start, COMPILE.end
667
668 ## If we're at the end then stop.
669 if s >= e: return None
670
671 ## If there's some literal text then collect it.
672 if c[s] != '~':
673 i = c.find('~', s, e)
674 if i < 0: i = e
675 COMPILE.start = i
676 return FormatLiteral(c[s:i])
677
678 ## Otherwise there's a formatting directive to collect.
679 s += 1
680
  ## First, collect parameters.
682 aa = []
683 while True:
684 if s >= e: break
685 if c[s] == ',':
686 aa.append(LITNONE)
687 s += 1
688 continue
689 elif c[s] == "'":
690 s += 1
691 if s >= e: raise FormatStringError('missing argument character', c, s)
692 aa.append(LiteralParameter(c[s]))
693 s += 1
694 elif c[s].upper() == 'V':
695 s += 1
696 aa.append(VARNEXT)
697 elif c[s] == '!':
698 COMPILE.start = s + 1
699 getarg = parse_arg()
700 s = COMPILE.start
701 aa.append(VariableParameter(getarg))
702 elif c[s] == '#':
703 s += 1
704 aa.append(REMAIN)
705 else:
706 m = R_INT.match(c, s, e)
707 if not m: break
708 aa.append(LiteralParameter(int(m.group())))
709 s = m.end()
710 if s >= e or c[s] != ',': break
711 s += 1
712
713 ## Maybe there's an explicit argument.
714 if s < e and c[s] == '=':
715 COMPILE.start = s + 1
716 getarg = parse_arg()
717 s = COMPILE.start
718 else:
719 getarg = NEXTARG
720
721 ## Next, collect the flags.
722 atp = colonp = False
723 while True:
724 if s >= e:
725 break
726 elif c[s] == '@':
727 if atp: raise FormatStringError('duplicate at flag', c, s)
728 atp = True
729 elif c[s] == ':':
730 if colonp: raise FormatStringError('duplicate colon flag', c, s)
731 colonp = True
732 else:
733 break
734 s += 1
735
736 ## We should now have a directive character.
737 if s >= e: raise FormatStringError('missing directive', c, s)
738 ch = c[s].upper()
739 op = None
740 for map in COMPILE.opmaps:
741 try: op = map[ch]
742 except KeyError: pass
743 else: break
744 else:
745 raise FormatStringError('unknown directive', c, s)
746 s += 1
747
748 ## Done.
749 COMPILE.start = s
750 return op(atp, colonp, getarg, aa, ch)
751
752def collect_subformat(delim):
753 """
754 Parse formatting operations from the control string until we find one whose
755 directive character is listed in DELIM.
756
757 Where an operation accepts multiple sequences of formatting directives, the
758 first element of DELIM should be the proper closing delimiter. The
759 traditional separator is `~;'.
760 """
761 pp = []
762 with COMPILE.bind(delim = delim):
763 while True:
764 p = parse_operator()
765 if not p:
766 format_string_error("missing close delimiter `~%s'" % delim[0])
767 if isinstance(p, FormatDelimiter) and p.char in delim: break
768 pp.append(p)
769 return FormatSequence(pp), p
770
771def compile(control):
772 """
773 Parse the whole CONTROL string, returning the corresponding formatting
774 operator.
775 """
776 pp = []
777 with COMPILE.bind(control = control, start = 0, end = len(control),
778 delim = ''):
779 while True:
780 p = parse_operator()
781 if not p: break
782 pp.append(p)
783 return FormatSequence(pp)
784
785###--------------------------------------------------------------------------
786### Formatting text.
787
788def format(out, control, *args, **kw):
789 """
790 Format the positional args and keywords according to the CONTROL, and write
791 the result to OUT.
792
793 The output is written to OUT, which may be one of the following.
794
795 `True' Write to standard output.
796
797 `False' Write to standard error.
798
799 `None' Return the output as a string.
800
801 Any object with a `write' attribute
802 Call `write' repeatedly with strings to be output.
803
804 Any callable object
805 Call the object repeatedly with strings to be output.
806
807 The CONTROL argument may be one of the following.
808
809 A string or unicode object
810 Compile the string into a formatting operation and use that.
811
812 A formatting operation
813 Apply the operation to the arguments.
814 """
815
816 ## Turn the output argument into a function which we can use easily. If
817 ## we're writing to a string, we'll have to extract the result at the end,
818 ## so keep track of anything we have to do later.
819 final = U.constantly(None)
820 if out is True:
821 write = SYS.stdout.write
822 elif out is False:
823 write = SYS.stderr.write
824 elif out is None:
825 strio = StringIO()
826 write = strio.write
827 final = strio.getvalue
828 elif hasattr(out, 'write'):
829 write = out.write
830 elif callable(out):
831 write = out
832 else:
    raise TypeError(out)
834
835 ## Turn the control argument into a formatting operation.
836 if isinstance(control, basestring):
837 op = compile(control)
838 else:
839 op = control
840
841 ## Invoke the formatting operation in the correct environment.
842 with FORMAT.bind(write = write, pushback = [],
843 argseq = args, argpos = 0,
844 argmap = kw):
845 op.format()
846
847 ## Done.
848 return final()
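
## The dispatch on the OUT argument can be tried out separately from the rest
## of the machinery. The sketch below is an illustration under assumptions:
## `make_writer' is a hypothetical helper, and it uses `io.StringIO' in place
## of `cStringIO' so that it also runs on newer Python versions.

```python
import io
import sys

def make_writer(out):
  """Return (WRITE, FINAL) for OUT, following the cases listed above."""
  final = lambda: None
  if out is True: write = sys.stdout.write
  elif out is False: write = sys.stderr.write
  elif out is None:
    buf = io.StringIO()
    write, final = buf.write, buf.getvalue
  elif hasattr(out, 'write'): write = out.write
  elif callable(out): write = out
  else: raise TypeError(out)
  return write, final

## Collect output into a string.
write, final = make_writer(None)
write(u'hello, ')
write(u'world')
collected = final()

## Send output to an arbitrary callable.
pieces = []
write, final = make_writer(pieces.append)
write(u'chunk')
```

## Only the string-collecting case has anything to return at the end; the
## other cases yield `None'.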
849
850###--------------------------------------------------------------------------
851### Standard formatting directives.
852
853## A dictionary, in which we'll build the basic set of formatting operators.
854## Callers wishing to implement extensions should include this in their
855## `opmaps' lists.
856BASEOPS = {}
857COMPILE.opmaps = [BASEOPS]
858
859## Some standard delimiter directives.
860for i in [']', ')', '}', '>', ';']: BASEOPS[i] = FormatDelimiter
861
862class SimpleFormatOperation (BaseFormatOperation):
863 """
864 Common base class for the `~A' (`str') and `~S' (`repr') directives.
865
866 These take similar parameters, so it's useful to deal with them at the same
867 time. Subclasses should implement a method `_convert' of one argument,
868 which returns a string to be formatted.
869
870 The parameters are as follows.
871
872 MINCOL The minimum number of characters to output. Padding is added
873 if the output string is shorter than this.
874
875 COLINC Lengths of padding groups. The number of padding characters
876 will be MINPAD more than a multiple of COLINC.
877
878 MINPAD The smallest number of padding characters to write.
879
880 PADCHAR The padding character.
881
882 If the `@' modifier is given, then padding is applied on the left;
883 otherwise it is applied on the right.
884 """
885
886 MAXPARAM = 4
887
888 def _format(me, atp, colonp,
889 mincol = 0, colinc = 1, minpad = 0, padchar = ' '):
890 what = me._convert(me.getarg.get())
891 n = len(what)
892 p = mincol - n - minpad + colinc - 1
893 p -= p%colinc
894 if p < 0: p = 0
895 p += minpad
896 if p <= 0: pass
897 elif atp: what = (p * padchar) + what
898 else: what = what + (p * padchar)
899 FORMAT.write(what)
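
## The padding arithmetic is compact enough to be worth a worked example.
## The sketch below repeats the calculation as a plain function; the name
## `pad' and its `left' flag (standing in for the `@' modifier) are
## assumptions made for illustration.

```python
def pad(what, mincol = 0, colinc = 1, minpad = 0, padchar = ' ',
        left = False):
  """Pad WHAT to at least MINCOL columns, in the style of `~A'."""
  p = mincol - len(what) - minpad + colinc - 1
  p -= p % colinc               # round down to a whole number of groups
  if p < 0: p = 0
  p += minpad
  if left: return p*padchar + what
  else: return what + p*padchar

right_padded = pad('abc', mincol = 8)
grouped = pad('abc', mincol = 8, colinc = 4, padchar = '.')
left_padded = pad('abc', mincol = 6, left = True)
```

## With COLINC set to 4, five columns of padding aren't enough to reach a
## whole group, so two groups (eight characters) are written and the result
## overshoots MINCOL.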
900
901class FormatString (SimpleFormatOperation):
902 """~A: convert argument to a string."""
903 def _convert(me, arg): return str(arg)
904BASEOPS['A'] = FormatString
905
906class FormatRepr (SimpleFormatOperation):
907 """~S: convert argument to readable form."""
908 def _convert(me, arg): return repr(arg)
909BASEOPS['S'] = FormatRepr
910
911class IntegerFormat (BaseFormatOperation):
912 """
913 Common base class for the integer formatting directives `~D', `~B', `~O~,
914 `~X', and `~R'.
915
916 These take similar parameters, so it's useful to deal with them at the same
917 time. There is a `_convert' method which does the main work. By default,
918 `_format' calls this with the argument and the value of the class attribute
919 `RADIX'; complicated subclasses might want to override this behaviour.
920
921 The parameters are as follows.
922
923 MINCOL Minimum column width. If the output is smaller than this
924 then it will be padded on the left. The default is 0.
925
926 PADCHAR Character to use to pad the output, should this be necessary.
927 The default is space.
928
929 COMMACHAR If the `:' modifier is present, then use this character to
930 separate groups of digits. The default is `,'.
931
932 COMMAINTERVAL If the `:' modifier is present, then separate groups of this
933 many digits. The default is 3.
934
935 If `@' is present, then a sign is always written; otherwise only `-' signs
936 are written.
937 """
938
939 MAXPARAM = 4
940
941 def _convert(me, n, radix, atp, colonp,
942 mincol = 0, padchar = ' ',
943 commachar = ',', commainterval = 3):
944 """
945 Convert the integer N into the given RADIX, under the control of the
946 formatting parameters supplied.
947 """
948
949 ## Sort out the sign. We'll deal with it at the end: for now it's just a
950 ## distraction.
951 if n < 0: sign = '-'; n = -n
952 elif atp: sign = '+'
953 else: sign = None
954
955 ## Build in `dd' a list of the digits, in reverse order. This will make
956 ## the commafication easier later. The general radix conversion is
957 ## inefficient but we can make that better later.
958 def revdigits(s):
959 l = list(s)
960 l.reverse()
961 return l
962 if radix == 10: dd = revdigits(str(n))
963 elif radix == 8: dd = revdigits('%o' % n)
964 elif radix == 16: dd = revdigits('%X' % n)
965 else:
966 dd = []
967 while n:
968 n, r = divmod(n, radix)
969 if r < 10: ch = chr(ord('0') + r)
970 elif r < 36: ch = chr(ord('A') - 10 + r)
971 else: ch = chr(ord('a') - 36 + r)
972 dd.append(ch)
973 if not dd: dd.append('0')
974
975 ## If we must commafy then do that.
976 if colonp:
977 ndd = []
978 i = 0
979 for d in dd:
980 if i >= commainterval: ndd.append(commachar); i = 0
981 ndd.append(d); i += 1
982 dd = ndd
983
984 ## Include the sign.
985 if sign: dd.append(sign)
986
987 ## Maybe we must pad the result.
988 s = ''.join(reversed(dd))
989 npad = mincol - len(s)
990 if npad > 0: s = npad*padchar + s
991
992 ## And we're done.
993 FORMAT.write(s)
994
995 def _format(me, atp, colonp, mincol = 0, padchar = ' ',
996 commachar = ',', commainterval = 3):
997 me._convert(me.getarg.get(), me.RADIX, atp, colonp, mincol, padchar,
998 commachar, commainterval)
999
1000class FormatDecimal (IntegerFormat):
1001 """~D: Decimal formatting."""
1002 RADIX = 10
1003BASEOPS['D'] = FormatDecimal
1004
1005class FormatBinary (IntegerFormat):
1006 """~B: Binary formatting."""
1007 RADIX = 2
1008BASEOPS['B'] = FormatBinary
1009
1010class FormatOctal (IntegerFormat):
1011 """~O: Octal formatting."""
1012 RADIX = 8
1013BASEOPS['O'] = FormatOctal
1014
1015class FormatHex (IntegerFormat):
1016 """~X: Hexadecimal formatting."""
1017 RADIX = 16
1018BASEOPS['X'] = FormatHex
1019
1020class FormatRadix (IntegerFormat):
1021 """~R: General integer formatting."""
1022 MAXPARAM = 5
1023 def _format(me, atp, colonp, radix = None, mincol = 0, padchar = ' ',
1024 commachar = ',', commainterval = 3):
1025 if radix is None:
1026 raise ValueError, 'Not implemented'
1027 me._convert(me.getarg.get(), radix, atp, colonp, mincol, padchar,
1028 commachar, commainterval)
1029BASEOPS['R'] = FormatRadix
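
### The integer conversion described above (sign, digits in reverse,
### optional digit grouping, left padding) can be sketched on its own. This
### is an illustration only: `format_integer' is a hypothetical name, it
### assumes a radix between 2 and 36, and it returns the string rather than
### writing it.

```python
def format_integer(n, radix=10, atp=False, colonp=False,
                   mincol=0, padchar=' ', commachar=',', commainterval=3):
    """Convert N to a string in RADIX (hypothetical helper, radix 2..36)."""
    # Set the sign aside; `@' forces a `+' on non-negative numbers.
    if n < 0:
        sign = '-'
        n = -n
    elif atp:
        sign = '+'
    else:
        sign = ''

    # Collect the digits least-significant first.
    digits = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    dd = []
    while n:
        n, r = divmod(n, radix)
        dd.append(digits[r])
    if not dd:
        dd.append('0')

    # With `:', insert a separator after every COMMAINTERVAL digits.
    if colonp:
        ndd = []
        for i, d in enumerate(dd):
            if i and i % commainterval == 0:
                ndd.append(commachar)
            ndd.append(d)
        dd = ndd

    # Restore the natural order, attach the sign, and pad on the left.
    s = sign + ''.join(reversed(dd))
    return (mincol - len(s)) * padchar + s

print(format_integer(1234567, colonp=True))
print(format_integer(255, 16))
```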
1030
1031class FormatSuppressNewline (BaseFormatOperation):
1032 """
1033 ~newline: suppressed newline and/or spaces.
1034
1035 Unless the `@' modifier is present, don't print the newline. Unless the
1036 `:' modifier is present, don't print the following string of whitespace
1037 characters either.
1038 """
1039 R_SPACE = RX.compile(r'\s*')
1040 def __init__(me, *args):
1041 super(FormatSuppressNewline, me).__init__(*args)
1042 m = me.R_SPACE.match(COMPILE.control, COMPILE.start, COMPILE.end)
1043 me.trail = m.group()
1044 COMPILE.start = m.end()
1045 def _format(me, atp, colonp):
1046 if atp: FORMAT.write('\n')
1047 if colonp: FORMAT.write(me.trail)
1048BASEOPS['\n'] = FormatSuppressNewline
1049
1050class LiteralFormat (BaseFormatOperation):
1051 """
1052 A base class for formatting operations which write fixed strings.
1053
1054 Subclasses should have an attribute `CHAR' containing the string (usually a
1055 single character) to be written.
1056
1057 These operations accept a single parameter:
1058
1059 COUNT The number of copies of the string to be written.
1060 """
1061 MAXPARAM = 1
1062 def _format(me, atp, colonp, count = 1):
1063 FORMAT.write(count * me.CHAR)
1064
1065class FormatNewline (LiteralFormat):
1066 """~%: Start a new line."""
1067 CHAR = '\n'
1068BASEOPS['%'] = FormatNewline
1069
1070class FormatTilde (LiteralFormat):
1071 """~~: Print a literal `~'."""
1072 CHAR = '~'
1073BASEOPS['~'] = FormatTilde
1074
1075class FormatCaseConvert (BaseFormatOperation):
1076 """
1077 ~(...~): Case-convert the contained output.
1078
1079 The material output by the contained directives is subject to case
1080 conversion as follows.
1081
1082 no modifiers Convert to lower-case.
1083 @ Make initial letter upper-case and remainder lower.
1084 : Make initial letters of words upper-case.
1085 @: Convert to upper-case.
1086 """
1087 def __init__(me, *args):
1088 super(FormatCaseConvert, me).__init__(*args)
1089 me.sub, _ = collect_subformat(')')
1090 def _format(me, atp, colonp):
1091 strio = StringIO()
1092 try:
1093 with FORMAT.bind(write = strio.write):
1094 me.sub.format()
1095 finally:
1096 inner = strio.getvalue()
1097 if atp:
1098 if colonp: out = inner.upper()
1099 else: out = inner.capitalize()
1100 else:
1101 if colonp: out = inner.title()
1102 else: out = inner.lower()
1103 FORMAT.write(out)
1104BASEOPS['('] = FormatCaseConvert
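
### The modifier table for `~(...~)' maps directly onto Python's string
### methods. A small standalone sketch follows; `case_convert' is a
### hypothetical name which takes the already-formatted inner text as a
### plain string.

```python
def case_convert(inner, atp=False, colonp=False):
    """Apply the `~(' case conversion to INNER (hypothetical helper)."""
    if atp and colonp:
        return inner.upper()        # `@:': everything upper-case
    if atp:
        return inner.capitalize()   # `@': first letter upper, rest lower
    if colonp:
        return inner.title()        # `:': first letter of each word upper
    return inner.lower()            # no modifiers: everything lower-case

print(case_convert('hello WORLD', colonp=True))
```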
1105
1106class FormatGoto (BaseFormatOperation):
1107 """
1108 ~*: Seek in positional arguments.
1109
1110 There may be a parameter N; the default value depends on which modifiers
1111 are present. Without `@', skip forwards or backwards by N (default
1112 1) places; with `@', move to argument N (default 0). With `:', negate N,
1113 so move backwards instead of forwards, or count from the end rather than
1114 the beginning. (Exception: `~@:0*' leaves no arguments remaining, whereas
1115 `~@-0*' is the same as `~@0*', and starts again from the beginning.)
1116
1117 BUG: The list of pushed-back arguments is cleared.
1118 """
1119 MAXPARAM = 1
1120 def _format(me, atp, colonp, n = None):
1121 if atp:
1122 if n is None: n = 0
1123 if colonp:
1124 if n > 0: n = -n
1125 else: n = len(FORMAT.argseq)
1126 if n < 0: n += len(FORMAT.argseq)
1127 else:
1128 if n is None: n = 1
1129 if colonp: n = -n
1130 n += FORMAT.argpos
1131 FORMAT.argpos = n
1132 FORMAT.pushback = []
1133BASEOPS['*'] = FormatGoto
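
### The position arithmetic of `~*' is easier to follow as a pure function.
### This is a sketch only: `goto_position' is a hypothetical name, and it
### takes the current position and the argument count explicitly instead of
### reading them from the formatting state.

```python
def goto_position(argpos, nargs, atp=False, colonp=False, n=None):
    """Return the new argument position for `~*' (hypothetical helper)."""
    if atp:
        # Absolute: move to argument N, counting from the end with `:'.
        if n is None:
            n = 0
        if colonp:
            if n > 0:
                n = -n
            else:
                n = nargs       # `~@:0*' leaves no arguments remaining
        if n < 0:
            n += nargs
        return n
    # Relative: skip N places, backwards with `:'.
    if n is None:
        n = 1
    if colonp:
        n = -n
    return argpos + n

print(goto_position(2, 5))
print(goto_position(2, 5, atp=True, colonp=True, n=1))
```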
1134
1135class FormatConditional (BaseFormatOperation):
1136 """
1137 ~[...[~;...]...[~:;...]~]: Conditional formatting.
1138
1139 There are three variants, which are best dealt with separately.
1140
1141 With no modifiers, apply the Nth enclosed piece, where N is either the
1142 parameter, or the argument if no parameter is provided. If there is no
1143 such piece (i.e., N is negative or too large) and the final piece is
1144 introduced by `~:;' then use that piece; otherwise produce no output.
1145
1146 With `:', there must be exactly two pieces: apply the first if the argument
1147 is false, otherwise the second.
1148
1149 With `@', there must be exactly one piece: if the argument is not `None'
1150 then push it back and apply the enclosed piece.
1151 """
1152
1153 MAXPARAM = 1
1154
1155 def __init__(me, *args):
1156
1157 ## Store the arguments.
1158 super(FormatConditional, me).__init__(*args)
1159
1160 ## Collect the pieces, and keep track of whether there's a default piece.
1161 pieces = []
1162 default = None
1163 nextdef = False
1164 while True:
1165 piece, delim = collect_subformat('];')
1166 if nextdef: default = piece
1167 else: pieces.append(piece)
1168 if delim.char == ']': break
1169 if delim.colonp:
1170 if default: format_string_error('multiple defaults')
1171 nextdef = True
1172
1173 ## Make sure the syntax matches the modifiers we've been given.
1174 if (me.colonp or me.atp) and default:
1175 format_string_error('default not allowed here')
1176 if (me.colonp and len(pieces) != 2) or \
1177 (me.atp and len(pieces) != 1):
1178 format_string_error('wrong number of pieces')
1179
1180 ## Store stuff.
1181 me.pieces = pieces
1182 me.default = default
1183
1184 def _format(me, atp, colonp, n = None):
1185 if colonp:
1186 arg = me.getarg.get()
1187 if arg: me.pieces[1].format()
1188 else: me.pieces[0].format()
1189 elif atp:
1190 arg = me.getarg.get()
1191 if arg is not None:
1192 FORMAT.pushback.append(arg)
1193 me.pieces[0].format()
1194 else:
1195 if n is None: n = me.getarg.get()
1196 if 0 <= n < len(me.pieces): piece = me.pieces[n]
1197 else: piece = me.default
1198 if piece: piece.format()
1199BASEOPS['['] = FormatConditional
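
### The unmodified form of `~[' selects a piece by index and falls back to
### the `~:;' default. A minimal sketch, with plain strings standing in for
### compiled pieces; `select_piece' is a hypothetical name.

```python
def select_piece(pieces, default, n):
    """Return the Nth piece, else DEFAULT (hypothetical helper)."""
    # An out-of-range N (negative or too large) selects the default piece,
    # which is `None' if the control string supplied no `~:;' clause.
    if 0 <= n < len(pieces):
        return pieces[n]
    return default

print(select_piece(['zero', 'one', 'two'], 'many', 1))
print(select_piece(['zero', 'one', 'two'], 'many', 7))
```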
1200
1201class FormatIteration (BaseFormatOperation):
1202 """
1203 ~{...~}: Repeated formatting.
1204
1205 Repeatedly apply the enclosed formatting directives to a sequence of
1206 different arguments. The directives may contain `~^' to escape early.
1207
1208 Without `@', an argument is fetched and is expected to be a sequence; with
1209 `@', the remaining positional arguments are processed.
1210
1211 Without `:', the enclosed directives are simply applied until the sequence
1212 of arguments is exhausted: each iteration may consume any number of
1213 arguments (even zero, though this is likely a bad plan) and any left over
1214 are available to the next iteration. With `:', each element of the
1215 sequence of arguments is itself treated as a collection of arguments --
1216 either positional or keyword depending on whether it looks like a map --
1217 and exactly one such element is consumed in each iteration.
1218
1219 If a parameter is supplied then perform at most this many iterations. If
1220 the closing delimiter bears a `:' modifier, and the parameter is not zero,
1221 then the enclosed directives are applied once even if the argument sequence
1222 is empty.
1223
1224 If the formatting directives are empty then a formatting string is fetched
1225 using the argument collector associated with the closing delimiter.
1226 """
1227
1228 MAXPARAM = 1
1229
1230 def __init__(me, *args):
1231 super(FormatIteration, me).__init__(*args)
1232 me.body, me.end = collect_subformat('}')
1233
1234 def _multi(me, body):
1235 """
1236 Treat the positional arguments as a sequence of argument sets to be
1237 processed.
1238 """
1239 args = NEXTARG.get()
1240 with U.Escape() as esc:
1241 with bind_args(args, multi_escape = FORMAT.escape, escape = esc,
1242 last_multi_p = not remaining()):
1243 body.format()
1244
1245 def _single(me, body):
1246 """
1247 Format arguments from a single argument sequence.
1248 """
1249 body.format()
1250
1251 def _loop(me, each, max):
1252 """
1253 Apply the function EACH repeatedly. Stop if no positional arguments
1254 remain; if MAX is not `None', then stop after that number of iterations.
1255 The EACH function is passed a formatting operation representing the body
1256 to be applied.
1257 """
1258 if me.body.seq: body = me.body
1259 else: body = compile(me.end.getarg.get())
1260 oncep = me.end.colonp
1261 i = 0
1262 while True:
1263 if max is not None and i >= max: break
1264 if (i > 0 or not oncep) and not remaining(): break
1265 each(body)
1266 i += 1
1267
1268 def _format(me, atp, colonp, max = None):
1269 if colonp: each = me._multi
1270 else: each = me._single
1271 with U.Escape() as esc:
1272 with FORMAT.bind(escape = esc):
1273 if atp:
1274 me._loop(each, max)
1275 else:
1276 with bind_args(me.getarg.get()):
1277 me._loop(each, max)
1278BASEOPS['{'] = FormatIteration
1279
1280class FormatEscape (BaseFormatOperation):
1281 """
1282 ~^: Escape from iteration.
1283
1284 Conditionally leave an iteration early.
1285
1286 There may be up to three parameters: call them X, Y and Z. If all three
1287 are present then exit unless Y is between X and Z (inclusive); if two are
1288 present then exit if X = Y; if only one is present, then exit if X is
1289 zero. Obviously these are more useful if at least one of X, Y and Z is
1290 variable.
1291
1292 With no parameters, exit if there are no positional arguments remaining.
1293 With `:', check the number of argument sets (as read by `~:{...~}') rather
1294 than the number of arguments in the current set, and escape from the entire
1295 iteration rather than from processing the current set.
1296 """
1297 MAXPARAM = 3
1298 def _format(me, atp, colonp, x = None, y = None, z = None):
1299 if z is not None: cond = x <= y <= z
1300 elif y is not None: cond = x != y
1301 elif x is not None: cond = x != 0
1302 elif colonp: cond = not FORMAT.last_multi_p
1303 else: cond = remaining()
1304 if cond: return
1305 if colonp: FORMAT.multi_escape()
1306 else: FORMAT.escape()
1307BASEOPS['^'] = FormatEscape
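
### The exit test for `~^' can be written as a single predicate. The sketch
### below follows the rules in the docstring above; `should_escape' is a
### hypothetical name, and its `remaining' flag stands in for the check on
### the formatting state that is made when no parameters are given.

```python
def should_escape(x=None, y=None, z=None, remaining=True):
    """Return true if `~^' should leave the iteration (hypothetical)."""
    if z is not None:
        return not (x <= y <= z)    # three parameters: exit unless X <= Y <= Z
    if y is not None:
        return x == y               # two parameters: exit if X = Y
    if x is not None:
        return x == 0               # one parameter: exit if X is zero
    return not remaining            # none: exit if no arguments remain

print(should_escape(1, 5, 3))
print(should_escape(remaining=False))
```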
1308
1309class FormatRecursive (BaseFormatOperation):
1310 """
1311 ~?: Recursive formatting.
1312
1313 Without `@', read a pair of arguments: use the first as a format string,
1314 and apply it to the arguments extracted from the second (which may be a
1315 sequence or a map).
1316
1317 With `@', read a single argument: use it as a format string and apply it to
1318 the remaining arguments.
1319 """
1320 def _format(me, atp, colonp):
1321 with U.Escape() as esc:
1322 if atp:
1323 control = me.getarg.get()
1324 op = compile(control)
1325 with FORMAT.bind(escape = esc): op.format()
1326 else:
1327 control, args = me.getarg.pair()
1328 op = compile(control)
1329 with bind_args(args, escape = esc): op.format()
1330BASEOPS['?'] = FormatRecursive
1331
1332###----- That's all, folks --------------------------------------------------