.\" -*-nroff-*-
.TH codec 3 "9 January 2009" "Straylight/Edgeware" "mLib utilities library"
.SH NAME
codec \- binary encoding and decoding
.\" @codec_class
.\" @codec_strerror
.\" @null_codec_class
.\" @base64_class
.\" @file64_class
.\" @base64url_class
.\" @base32_class
.\" @base32hex_class
.\" @hex_class
.SH SYNOPSIS
.nf
.B "#include <mLib/codec.h>"
.B "#include <mLib/base64.h>"
.B "#include <mLib/base32.h>"
.B "#include <mLib/hex.h>"

.B "codec_class null_codec_class;"
.B "codec_class base64_class, file64_class, base64url_class;"
.B "codec_class base32_class, base32hex_class;"
.B "codec_class hex_class;"

.BI "const char *codec_strerror(int " err ");"
.fi
.SH DESCRIPTION
The
.B codec
system provides an object-based interface to functions which encode
binary data as plain text and decode the result to recover the original
binary data.  The interface makes it easy to support multiple encodings
and select an appropriate one at runtime.
.SS "The codec_class structure"
The
.B codec_class
structure represents a particular encoding format.  The structure has
the following members.
.TP
.B "const char *name"
The name of the class, as a null-terminated string.  The name should not
contain whitespace characters.
.TP
.BI "codec *(*encoder)(unsigned " flags ", const char *" indent ", unsigned " maxline ")"
Pointer to a function which constructs a new encoder object, of type
.BR codec .
The
.I flags
configure the behaviour of the object; the
.I indent
string is written to separate lines of output; the integer
.I maxline
is the maximum length of line to be produced, or zero to forbid line
breaking.
.TP
.BI "codec *(*decoder)(unsigned " flags ")"
Pointer to a function which constructs a new decoder object, also of
type
.BR codec .
The
.I flags
configure the behaviour of the object.
.PP
The
.I flags
to the
.B encoder
and
.B decoder
functions have the following meanings.
.TP
.B CDCF_LOWERC
For codecs which produce output using a single alphabetic case (e.g.,
.BR base32 ,
.BR hex ),
emit and accept only lower case; the default is to emit and accept only
upper case, for compatibility with RFC4648.  If the codec usually
produces mixed-case output, then this flag is ignored.
.TP
.B CDCF_IGNCASE
For codecs which produce output using a single alphabetic case, ignore
the case of the input when decoding.  If the codec usually produces
mixed-case output, then this flag is ignored.
.TP
.B CDCF_NOEQPAD
For codecs which usually pad their output (e.g.,
.BR base64 ,
.BR base32 ),
do not emit or accept padding characters.  If the codec does not usually
produce padding, or the padding is not redundant, then this flag is
ignored.
.TP
.B CDCF_IGNEQPAD
For codecs which usually pad their output, do not treat incorrect (e.g.,
missing or excessive) padding as an error when decoding.  If the codec
does not usually produce padding, or the padding is required for
unambiguous decoding, then this flag is ignored.
.TP
.B CDCF_IGNEQMID
For codecs which usually pad their output, ignore padding characters
wherever they may appear when decoding.  Usually padding characters
indicate the end of the input, and further input characters are
considered erroneous.  If the codec does not usually produce padding, or
it is impossible to resume decoding correctly having seen padding
characters, then this flag is ignored.
.TP
.B CDCF_IGNZPAD
For codecs which need to pad their input, ignore unusual padding bits
when decoding.  (This is not at all the same thing as the padding
characters controlled by the flags above: they deal with padding the
length of the encoding
.I output
up to a suitable multiple of characters; this option deals with padding
of the
.I input
prior to encoding.)  If the codec does not add padding bits, or specific
values are required for unambiguous decoding, then this flag is ignored.
.TP
.B CDCF_IGNNEWL
Ignore newline (and carriage-return) characters when decoding: the
default for RFC4648 codecs is to reject newline characters.  If these
characters are significant in the encoding, then this flag is ignored.
.TP
.B CDCF_IGNSPC
Ignore whitespace characters (other than newlines) when decoding: the
default for RFC4648 codecs is to reject whitespace characters.  If these
characters are significant in the encoding, then this flag is ignored.
.TP
.B CDCF_IGNINVCH
Ignore any other invalid characters appearing in the input when
decoding.
.TP
.B CDCF_IGNJUNK
Ignore all `junk' in the input.  This should suppress almost all
decoding errors.
.PP
If you do not set any of the
.BR CDCF_IGN ...
flags, a decoder should only accept the exact encoding that the
corresponding encoder would produce (with
.I maxline
= 0 to inhibit line-breaking).
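.PP
As a rough illustration of how the case and newline flags change what a
decoder accepts, here is a self-contained sketch of a hex decoder.  It
is not the library's implementation: the
.B DEMO_
flag names and values and the
.B demo_
functions are hypothetical stand-ins, chosen only to mirror the
behaviour described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the flag values from <mLib/codec.h>. */
enum { DEMO_LOWERC = 1u, DEMO_IGNCASE = 2u, DEMO_IGNNEWL = 4u };

/* Value of hex digit ch under the given flags, or -1 if it is rejected. */
int demo_hexval(int ch, unsigned flags)
{
  if (ch >= '0' && ch <= '9') return ch - '0';
  if (ch >= 'A' && ch <= 'F')   /* upper case: the default, or any case */
    return ((flags & DEMO_IGNCASE) || !(flags & DEMO_LOWERC))
      ? ch - 'A' + 10 : -1;
  if (ch >= 'a' && ch <= 'f')   /* lower case: only if asked for */
    return (flags & (DEMO_IGNCASE | DEMO_LOWERC)) ? ch - 'a' + 10 : -1;
  return -1;
}

/* Decode NUL-terminated hex text into out (capacity cap bytes).
 * Returns the number of bytes written, or -1 if the input is rejected. */
long demo_hex_decode(const char *in, unsigned flags,
                     unsigned char *out, size_t cap)
{
  size_t n = 0;
  int hi = -1, v;

  assert(in && out);
  for (; *in; in++) {
    if ((flags & DEMO_IGNNEWL) && (*in == '\n' || *in == '\r')) continue;
    if ((v = demo_hexval((unsigned char)*in, flags)) < 0) return -1;
    if (hi < 0) hi = v;         /* first digit of a pair */
    else {
      if (n >= cap) return -1;
      out[n++] = (unsigned char)((hi << 4) | v);
      hi = -1;
    }
  }
  return hi < 0 ? (long)n : -1; /* a dangling digit means truncation */
}

/* Convenience wrapper: 1 if the text is accepted, 0 otherwise. */
int demo_hex_accepts(const char *in, unsigned flags)
{
  unsigned char buf[64];
  return demo_hex_decode(in, flags, buf, sizeof buf) >= 0;
}
```

.PP
With no flags set, the sketch accepts only upper-case digits and rejects
line breaks, matching the strict default; each flag relaxes exactly one
of those checks.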
.SS "The codec and codec_ops structures"
The
.B codec
structure represents the state of an encoder or decoder, as returned by
the
.B encoder
and
.B decoder
functions described above.  It contains a single member.
.TP
.B "const codec_ops *ops"
Pointer to a
.B codec_ops
structure which contains operations and metadata for use with the
encoder or decoder.
.PP
The
.B codec_ops
structure contains the following members.
.TP
.B "const codec_class *c"
Pointer back to the
.B codec_class
which was used to construct the
.B codec
object.
.TP
.BI "int (*code)(codec *" c ", const void *" p ", size_t " sz ", dstr *" d ")"
Encode or decode, using the codec
.IR c ,
the data in the buffer at address
.I p
and continuing for
.I sz
bytes, appending the output to the dynamic string
.I d
(see
.BR dstr (3)).
If the operation was successful, the function returns zero; otherwise it
returns a nonzero error code, as described below.
.TP
.BI "void (*destroy)(codec *" c ")"
Destroy the codec object
.IR c ,
freeing any resources it may hold.
.PP
A codec may buffer its input (e.g., if it needs to see more input in
order to decide what output to produce next); it may also need to take
special action at the end of the input (e.g., flushing buffers, and
applying padding).  To signal the codec that there is no more input,
call the
.B code
function with a null
.I p
pointer.  It will then write any final output to
.IR d .
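.PP
The buffering and end-of-input behaviour can be sketched with a toy
Base64 encoder built around structures of the same shape.  Everything
named
.B demo_
below is a hypothetical stand-in so that the example compiles without
the library; in particular
.B demo_out
is a fixed buffer standing in for
.BR dstr ,
and the stand-in ops structure omits the class back-pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for dstr: a small fixed buffer, kept NUL-terminated. */
typedef struct demo_out { char buf[256]; size_t len; } demo_out;

/* Stand-ins shaped like the codec and codec_ops structures. */
typedef struct demo_codec demo_codec;
typedef struct demo_ops {
  int (*code)(demo_codec *c, const void *p, size_t sz, demo_out *d);
  void (*destroy)(demo_codec *c);
} demo_ops;
struct demo_codec { const demo_ops *ops; };

/* Encoder state: up to two input bytes wait here for a full group. */
typedef struct demo_b64enc {
  demo_codec c;                 /* first member, so pointers convert */
  unsigned char pend[2];
  size_t npend;
} demo_b64enc;

static int demo_put(demo_out *d, char ch)
{
  if (d->len + 1 >= sizeof d->buf) return -1;
  d->buf[d->len++] = ch;
  d->buf[d->len] = 0;
  return 0;
}

/* Emit one group of n (1 to 3) input bytes as four characters. */
static int demo_group(demo_out *d, const unsigned char *g, size_t n)
{
  static const char tab[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                            "abcdefghijklmnopqrstuvwxyz0123456789+/";
  unsigned long v = (unsigned long)g[0] << 16;

  if (n > 1) v |= (unsigned long)g[1] << 8;
  if (n > 2) v |= g[2];
  return (demo_put(d, tab[(v >> 18) & 63]) ||
          demo_put(d, tab[(v >> 12) & 63]) ||
          demo_put(d, n > 1 ? tab[(v >> 6) & 63] : '=') ||
          demo_put(d, n > 2 ? tab[v & 63] : '=')) ? -1 : 0;
}

static int demo_b64_code(demo_codec *c, const void *p, size_t sz,
                         demo_out *d)
{
  demo_b64enc *e = (demo_b64enc *)c;
  const unsigned char *q = p;

  if (!p) {                     /* end of input: flush with padding */
    if (e->npend && demo_group(d, e->pend, e->npend)) return -1;
    e->npend = 0;
    return 0;
  }
  for (; sz; sz--, q++) {
    if (e->npend < 2) e->pend[e->npend++] = *q;   /* hold it back */
    else {
      unsigned char g[3];
      g[0] = e->pend[0]; g[1] = e->pend[1]; g[2] = *q;
      if (demo_group(d, g, 3)) return -1;
      e->npend = 0;
    }
  }
  return 0;
}

static void demo_b64_destroy(demo_codec *c) { free(c); }

static const demo_ops demo_b64_ops = { demo_b64_code, demo_b64_destroy };

demo_codec *demo_b64_encoder(void)
{
  demo_b64enc *e = malloc(sizeof *e);
  if (!e) return NULL;
  e->c.ops = &demo_b64_ops;
  e->npend = 0;
  return &e->c;
}

/* Encode two chunks as one stream, then signal end of input with a
 * null pointer.  Returns a static result, or NULL on failure. */
const char *demo_b64_two_chunks(const char *a, const char *b)
{
  static demo_out out;
  demo_codec *c = demo_b64_encoder();
  int err;

  assert(a && b);
  if (!c) return NULL;
  out.len = 0; out.buf[0] = 0;
  err = c->ops->code(c, a, strlen(a), &out);
  if (!err) err = c->ops->code(c, b, strlen(b), &out);
  if (!err) err = c->ops->code(c, NULL, 0, &out);   /* flush */
  c->ops->destroy(c);
  return err ? NULL : out.buf;
}
```

.PP
Because the chunk boundaries need not fall on three-byte groups, the
final call with a null pointer is what emits the held-back bytes and the
padding; omitting it would silently truncate the output.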
.PP
The following error conditions may be reported.
.TP
.B CDCERR_INVCH
An invalid character was encountered while decoding.  This includes
encountering padding characters if padding is disabled using the
.B CDCF_NOEQPAD
flag.
.TP
.B CDCERR_INVEQPAD
Invalid padding characters (e.g., wrong characters, or too few, too
many, or none at all) were found during decoding.  This may also
indicate that the input is truncated, even if the codec does not usually
perform output padding.
.TP
.B CDCERR_INVZPAD
Invalid padding bits were found during decoding.
.PP
The
.B codec_strerror
function converts these error codes to brief, (moderately)
human-readable strings.
.SS "Provided codecs"
The library provides a number of standard codecs.
.TP
.B base64
Implements Base64 encoding, as defined by RFC4648.  Output is
mixed-case, so the
.B CDCF_LOWERC
and
.B CDCF_IGNCASE
flags are ignored.
.TP
.B file64
Implements a variant of the Base64 encoding which uses
.RB ` % '
in place of
.RB ` / ',
so that its output is suitable for use as a Unix filename.
.TP
.B base64url
Implements the filename- and URL-safe variant of Base64 encoding, as
defined by RFC4648.
.TP
.B base32
Implements Base32 encoding, as defined by RFC4648.  Output is in upper
case by default.
.TP
.B base32hex
Implements the extended-hex variant of Base32, as defined by RFC4648.
This encoding has the property that the encoding preserves the ordering
of messages if padding is suppressed.
.TP
.B hex
Implements hex encoding, defined by RFC4648 under the name Base16.  For
compatibility with that specification, output is in upper case by
default.
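.PP
The ordering property claimed for
.B base32hex
can be checked with a small self-contained sketch.  This is an
independent toy encoder using the RFC4648 extended-hex alphabet without
padding, not the library's code; all
.B demo_
names are hypothetical, and inputs are limited to short C strings for
simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Encode n bytes as unpadded base32hex.  The out buffer must have room
 * for (8*n + 4)/5 characters plus a terminating NUL. */
void demo_b32hex(const unsigned char *p, size_t n, char *out)
{
  static const char tab[] = "0123456789ABCDEFGHIJKLMNOPQRSTUV";
  unsigned long acc = 0;
  unsigned nbits = 0;
  size_t i;

  assert(out);
  for (i = 0; i < n; i++) {
    acc = (acc << 8) | p[i]; nbits += 8;
    while (nbits >= 5) { nbits -= 5; *out++ = tab[(acc >> nbits) & 31]; }
    acc &= (1ul << nbits) - 1;  /* keep only the unconsumed bits */
  }
  if (nbits) *out++ = tab[(acc << (5 - nbits)) & 31];  /* zero-fill */
  *out = 0;
}

/* Encode a short C string into a static buffer (NULL if too long). */
const char *demo_b32hex_cstr(const char *s)
{
  static char buf[128];
  if (strlen(s) > 64) return NULL;
  demo_b32hex((const unsigned char *)s, strlen(s), buf);
  return buf;
}

static int demo_sign(int x) { return (x > 0) - (x < 0); }

/* 1 if comparing the encodings gives the same order as comparing the
 * original strings bytewise, 0 otherwise. */
int demo_order_preserved(const char *a, const char *b)
{
  char ea[128], eb[128];

  if (strlen(a) > 64 || strlen(b) > 64) return 0;
  demo_b32hex((const unsigned char *)a, strlen(a), ea);
  demo_b32hex((const unsigned char *)b, strlen(b), eb);
  return demo_sign(strcmp(a, b)) == demo_sign(strcmp(ea, eb));
}
```

.PP
The property follows from the alphabet being in ascending ASCII order
and the final partial group being filled with zero bits; with padding
characters appended, a shorter message could compare after a longer one.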
.SH "SEE ALSO"
.BR bincode (1),
.BR dstr (3),
.BR mLib (3).
.SH AUTHOR
Mark Wooding, <mdw@distorted.org.uk>