.\" -*-nroff-*-
.TH codec 3 "9 January 2009" "Straylight/Edgeware" "mLib utilities library"
.SH NAME
codec \- binary encoding and decoding
.\" @codec_class
.\" @codec_strerror
.\" @null_codec_class
.\" @base64_class
.\" @file64_class
.\" @base64url_class
.\" @base32_class
.\" @base32hex_class
.\" @hex_class
.SH SYNOPSIS
.nf
.B "#include <mLib/codec.h>"
.B "#include <mLib/base64.h>"
.B "#include <mLib/base32.h>"
.B "#include <mLib/hex.h>"

.B "codec_class null_codec_class;"
.B "codec_class base64_class, file64_class, base64url_class;"
.B "codec_class base32_class, base32hex_class;"
.B "codec_class hex_class;"

.BI "const char *codec_strerror(int " err ");"
.fi
.SH DESCRIPTION
The
.B codec
system provides an object-based interface to functions which encode
binary data as plain text and decode the result to recover the original
binary data. The interface makes it easy to support multiple encodings
and select an appropriate one at runtime.
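.PP
The following C sketch illustrates the idea of choosing an encoding by
name at runtime. It is self-contained and does not use mLib: the
.B toy_class
structure, the
.B toy_find
function and the two toy encoders are hypothetical, and are much
simpler than the
.B codec_class
interface described below.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a codec class: a name plus a
 * whole-buffer encoding function.  The real codec_class instead carries
 * encoder and decoder constructors. */
struct toy_class {
  const char *name;
  /* Encode sz bytes from p into out; return the number of characters. */
  size_t (*encode)(const unsigned char *p, size_t sz, char *out);
};

/* Toy hex encoder: two upper-case digits per byte. */
static size_t toy_hex(const unsigned char *p, size_t sz, char *out)
{
  static const char digits[] = "0123456789ABCDEF";
  size_t i;

  for (i = 0; i < sz; i++) {
    out[2*i] = digits[p[i] >> 4];
    out[2*i + 1] = digits[p[i] & 15];
  }
  out[2*sz] = 0;
  return (2*sz);
}

/* Toy identity encoder: copies its input unchanged. */
static size_t toy_copy(const unsigned char *p, size_t sz, char *out)
{
  memcpy(out, p, sz);
  out[sz] = 0;
  return (sz);
}

static const struct toy_class toy_hex_class = { "hex", toy_hex };
static const struct toy_class toy_copy_class = { "copy", toy_copy };

/* Null-terminated table of the available classes. */
static const struct toy_class *const toy_classes[] = {
  &toy_hex_class, &toy_copy_class, NULL
};

/* Find a class by name, returning a null pointer if there is none. */
static const struct toy_class *toy_find(const char *name)
{
  const struct toy_class *const *c;

  for (c = toy_classes; *c; c++)
    if (strcmp((*c)->name, name) == 0) return (*c);
  return (NULL);
}
```

.PP
Looking up
.B hex
and encoding the two bytes
.B Hi
yields
.BR 4869 ,
while an unknown name yields a null pointer.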
.SS "The codec_class structure"
The
.B codec_class
structure represents a particular encoding format. The structure has
the following members.
.TP
.B "const char *name"
The name of the class, as a null-terminated string. The name should not
contain whitespace characters.
.TP
.BI "codec *(*encoder)(unsigned " flags ", const char *" indent ", unsigned " maxline ")"
Pointer to a function which constructs a new encoder object, of type
.BR codec .
The
.I flags
configure the behaviour of the object; the
.I indent
string is written between lines of output; and the integer
.I maxline
is the maximum length of line to be produced, or zero to forbid line
breaking.
.TP
.BI "codec *(*decoder)(unsigned " flags ")"
Pointer to a function which constructs a new decoder object, also of
type
.BR codec .
The
.I flags
configure the behaviour of the object.
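.PP
The effect of
.I indent
and
.I maxline
can be pictured with the following C sketch. The helper
.B emit_wrapped
is hypothetical and is not part of mLib; it assumes that the indent
string is written verbatim at each line break, and never after the final
line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the encoded text p (sz characters) to out,
 * writing the indent string between lines whenever a line reaches
 * maxline characters.  A maxline of zero disables line breaking.  The
 * caller supplies a large enough buffer; the result is null-terminated.
 * Returns the number of characters written. */
static size_t emit_wrapped(const char *p, size_t sz, const char *indent,
                           unsigned maxline, char *out)
{
  size_t n = 0, col = 0, ilen = strlen(indent), i;

  for (i = 0; i < sz; i++) {
    if (maxline && col == maxline) {
      /* The current line is full and more output follows: write the
       * separator and start a new line. */
      memcpy(out + n, indent, ilen);
      n += ilen;
      col = 0;
    }
    out[n++] = p[i];
    col++;
  }
  out[n] = 0;
  return (n);
}
```

.PP
With an indent string of
.B |
and a
.I maxline
of 3, the eight characters
.B ABCDEFGH
come out as
.BR ABC|DEF|GH ;
with
.I maxline
zero they are copied unchanged.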
.PP
The
.I flags
to the
.B encoder
and
.B decoder
functions have the following meanings.
.TP
.B CDCF_LOWERC
For codecs which produce output using a single alphabetic case (e.g.,
.BR base32 ,
.BR hex ),
emit and accept only lower case; the default is to emit and accept only
upper case, for compatibility with RFC4648. If the codec usually
produces mixed-case output, then this flag is ignored.
.TP
.B CDCF_IGNCASE
For codecs which produce output using a single alphabetic case, ignore
the case of the input when decoding. If the codec usually produces
mixed-case output, then this flag is ignored.
.TP
.B CDCF_NOEQPAD
For codecs which usually pad their output (e.g.,
.BR base64 ,
.BR base32 ),
do not emit or accept padding characters. If the codec does not usually
produce padding, or the padding is not redundant, then this flag is
ignored.
.TP
.B CDCF_IGNEQPAD
For codecs which usually pad their output, do not treat incorrect (e.g.,
missing or excessive) padding as an error when decoding. If the codec
does not usually produce padding, or the padding is required for
unambiguous decoding, then this flag is ignored.
.TP
.B CDCF_IGNEQMID
For codecs which usually pad their output, ignore padding characters
wherever they may appear when decoding. Ordinarily, padding characters
indicate the end of the input, and any further input characters are
considered erroneous. If the codec does not usually produce padding, or
it is impossible to resume decoding correctly having seen padding
characters, then this flag is ignored.
.TP
.B CDCF_IGNZPAD
For codecs which need to pad their input, ignore unusual (e.g., nonzero)
padding bits when decoding. (This is not at all the same thing as the
padding characters controlled by the flags above: they deal with padding
the length of the encoded
.I output
up to a suitable multiple of characters; this option deals with padding
of the
.I input
prior to encoding.) If the codec does not add padding bits, or specific
values are required for unambiguous decoding, then this flag is ignored.
.TP
.B CDCF_IGNNEWL
Ignore newline (and carriage-return) characters when decoding: the
default for RFC4648 codecs is to reject newline characters. If these
characters are significant in the encoding, then this flag is ignored.
.TP
.B CDCF_IGNSPC
Ignore whitespace characters (other than newlines) when decoding: the
default for RFC4648 codecs is to reject whitespace characters. If these
characters are significant in the encoding, then this flag is ignored.
.TP
.B CDCF_IGNINVCH
Ignore any other invalid characters appearing in the input when
decoding.
.TP
.B CDCF_IGNJUNK
Ignore all `junk' in the input. This should suppress almost all
decoding errors.
.PP
If you do not set any of the
.BR CDCF_IGN ...
flags, a decoder should only accept the exact encoding that the
corresponding encoder would produce (with
.I maxline
= 0 to inhibit line-breaking).
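.PP
To make the case-handling flags concrete, here is a C sketch of a hex
digit decoder that follows the rules given for
.B CDCF_LOWERC
and
.BR CDCF_IGNCASE .
The names
.B TOY_LOWERC
and
.B TOY_IGNCASE
and their values are hypothetical stand-ins, not the definitions from
.BR <mLib/codec.h> ,
and the function is not mLib's implementation.

```c
#include <assert.h>

/* Hypothetical stand-ins for CDCF_LOWERC and CDCF_IGNCASE; these values
 * are NOT taken from <mLib/codec.h>. */
#define TOY_LOWERC  1u
#define TOY_IGNCASE 2u

/* Return the value of the hex digit ch under the flags f, or -1 if the
 * character is not acceptable. */
static int hexdigit(int ch, unsigned f)
{
  if (ch >= '0' && ch <= '9') return (ch - '0');
  if (ch >= 'A' && ch <= 'F') {
    /* Upper case is the default; with LOWERC it needs IGNCASE. */
    if (!(f & TOY_LOWERC) || (f & TOY_IGNCASE)) return (ch - 'A' + 10);
    return (-1);
  }
  if (ch >= 'a' && ch <= 'f') {
    /* Lower case needs either LOWERC or IGNCASE. */
    if ((f & TOY_LOWERC) || (f & TOY_IGNCASE)) return (ch - 'a' + 10);
    return (-1);
  }
  return (-1);
}
```

.PP
With no flags, only
.B B
is accepted as the digit eleven; with
.B TOY_LOWERC
only
.B b
is; and with
.B TOY_IGNCASE
either is.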
.SS "The codec and codec_ops structures"
The
.B codec
structure, which represents the state of an encoder or decoder as
returned by the
.B encoder
and
.B decoder
functions described above, contains a single member.
.TP
.B "const codec_ops *ops"
Pointer to a
.B codec_ops
structure which contains operations and metadata for use with the
encoder or decoder.
.PP
The
.B codec_ops
structure contains the following members.
.TP
.B "const codec_class *c"
Pointer back to the
.B codec_class
which was used to construct the
.B codec
object.
.TP
.BI "int (*code)(codec *" c ", const void *" p ", size_t " sz ", dstr *" d ")"
Encode or decode the
.I sz
bytes of data at address
.I p
using the codec
.IR c ,
appending the output to the dynamic string
.I d
(see
.BR dstr (3)).
If the operation was successful, the function returns zero; otherwise it
returns a nonzero error code, as described below.
.TP
.BI "void (*destroy)(codec *" c ")"
Destroy the codec object
.IR c ,
freeing any resources it may hold.
.PP
A codec may buffer its input (e.g., if it needs to see more in order to
decide what output to produce next); it may also need to take special
action at the end of the input (e.g., flushing buffers and applying
padding). To tell the codec that there is no more input, call the
.B code
function with a null
.I p
pointer. It will then write any final output to
.IR d .
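.PP
The buffering and end-of-input conventions can be sketched in C as a
small Base64 encoder object laid out like the
.B codec
and
.B codec_ops
structures. This is a hypothetical, self-contained illustration rather
than mLib's code: the
.B toy_dstr
type is a fixed-size stand-in for
.BR dstr (3),
the back-pointer to the class is omitted, and line breaking is not
implemented. The encoder holds back up to two bytes until it has a
complete group of three, and emits the final, padded group only when
.B code
is called with a null pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size stand-in for mLib's dstr. */
typedef struct { char buf[256]; size_t len; } toy_dstr;

static void toy_putc(toy_dstr *d, char ch)
{
  assert(d->len + 1 < sizeof(d->buf));
  d->buf[d->len++] = ch;
  d->buf[d->len] = 0;
}

/* Object layout modelled on codec and codec_ops (class pointer omitted). */
typedef struct toy_codec toy_codec;
typedef struct {
  int (*code)(toy_codec *c, const void *p, size_t sz, toy_dstr *d);
  void (*destroy)(toy_codec *c);
} toy_codec_ops;
struct toy_codec { const toy_codec_ops *ops; };

/* Encoder state: the generic header first, then up to 3 held bytes. */
typedef struct { toy_codec c; unsigned char hold[3]; size_t n; } b64enc;

static const char b64[] =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode n (1 to 3) bytes as four characters, padding with `=' as needed. */
static void b64group(const unsigned char *q, size_t n, toy_dstr *d)
{
  unsigned long w = (unsigned long)q[0] << 16;

  if (n > 1) w |= (unsigned long)q[1] << 8;
  if (n > 2) w |= q[2];
  toy_putc(d, b64[(w >> 18) & 63]);
  toy_putc(d, b64[(w >> 12) & 63]);
  toy_putc(d, n > 1 ? b64[(w >> 6) & 63] : '=');
  toy_putc(d, n > 2 ? b64[w & 63] : '=');
}

static int b64enc_code(toy_codec *c, const void *p, size_t sz, toy_dstr *d)
{
  b64enc *e = (b64enc *)c;
  const unsigned char *q = p;

  if (!p) {
    /* End of input: flush any held bytes as a padded final group. */
    if (e->n) b64group(e->hold, e->n, d);
    e->n = 0;
    return (0);
  }
  while (sz--) {
    e->hold[e->n++] = *q++;
    if (e->n == 3) { b64group(e->hold, 3, d); e->n = 0; }
  }
  return (0);
}

static void b64enc_destroy(toy_codec *c) { free(c); }

static const toy_codec_ops b64enc_ops = { b64enc_code, b64enc_destroy };

/* Construct a new encoder; returns a null pointer if out of memory. */
static toy_codec *b64enc_new(void)
{
  b64enc *e = malloc(sizeof(*e));

  if (!e) return (NULL);
  e->c.ops = &b64enc_ops;
  e->n = 0;
  return (&e->c);
}
```

.PP
For example, passing
.B fo
and then
.B obar
produces
.B Zm9vYmFy
(nothing is written by the first call), and a further
.B f
followed by a null pointer appends
.BR Zg== .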
.PP
The following error conditions may be reported.
.TP
.B CDCERR_INVCH
An invalid character was encountered while decoding. This includes
padding characters, if padding is disabled using the
.B CDCF_NOEQPAD
flag.
.TP
.B CDCERR_INVEQPAD
Invalid padding characters (e.g., wrong characters, or too few, too
many, or none at all) were found during decoding. This may also
indicate that the input is truncated, even if the codec does not usually
perform output padding.
.TP
.B CDCERR_INVZPAD
Invalid padding bits were found during decoding.
.PP
The
.B codec_strerror
function converts these error codes to brief, moderately
human-readable strings.
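.PP
As an illustration of how
.B CDCERR_INVEQPAD
and
.B CDCERR_INVZPAD
might arise, the C sketch below checks the final four-character group of
a Base64 encoding. When the last group encodes one byte, the second
digit carries four unused bits; when it encodes two bytes, the third
digit carries two. RFC4648 encoders set these bits to zero. The error
constants, the
.B TOY_IGNZPAD
flag, the messages returned by
.BR toy_strerror ,
and the checking function are all hypothetical; they are not mLib's
definitions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the CDCERR_... codes and CDCF_IGNZPAD;
 * the real values are defined in <mLib/codec.h>. */
enum { TOY_OK = 0, TOY_INVCH, TOY_INVEQPAD, TOY_INVZPAD };
#define TOY_IGNZPAD 1u

/* Hypothetical messages, in the spirit of codec_strerror. */
static const char *toy_strerror(int err)
{
  switch (err) {
    case TOY_OK: return ("no error");
    case TOY_INVCH: return ("invalid character");
    case TOY_INVEQPAD: return ("invalid padding characters");
    case TOY_INVZPAD: return ("invalid padding bits");
    default: return ("unknown error");
  }
}

/* Value of a Base64 digit, or -1 if ch is not one. */
static int b64val(char ch)
{
  static const char a[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  const char *p = ch ? strchr(a, ch) : NULL;

  return (p ? (int)(p - a) : -1);
}

/* Check the final four-character group q of a Base64 encoding. */
static int b64_check_last(const char *q, unsigned f)
{
  int i, v[4], npad = 0;

  for (i = 0; i < 4; i++) {
    if (q[i] == '=') { npad++; v[i] = 0; continue; }
    if (npad) return (TOY_INVEQPAD);   /* data after padding */
    if ((v[i] = b64val(q[i])) < 0) return (TOY_INVCH);
  }
  if (npad > 2) return (TOY_INVEQPAD);
  /* The unused low-order bits of the last data digit should be zero. */
  if (!(f & TOY_IGNZPAD)) {
    if (npad == 2 && (v[1] & 15)) return (TOY_INVZPAD);
    if (npad == 1 && (v[2] & 3)) return (TOY_INVZPAD);
  }
  return (TOY_OK);
}
```

.PP
For example,
.B Zg==
(the encoding of a single
.BR f )
is accepted, while
.B Zh==
is rejected with the hypothetical
.B TOY_INVZPAD
code unless
.B TOY_IGNZPAD
is set.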
.SS "Provided codecs"
The library provides a number of standard codecs.
.TP
.B base64
Implements Base64 encoding, as defined by RFC4648. Output is
mixed-case, so the
.B CDCF_LOWERC
and
.B CDCF_IGNCASE
flags are ignored.
.TP
.B file64
Implements a variant of the Base64 encoding which uses
.RB ` % '
in place of
.RB ` / ',
so that its output is suitable for use as a Unix filename.
.TP
.B base64url
Implements the filename- and URL-safe variant of Base64 encoding, as
defined by RFC4648.
.TP
.B base32
Implements Base32 encoding, as defined by RFC4648. Output is in upper
case by default.
.TP
.B base32hex
Implements the extended-hex variant of Base32, as defined by RFC4648.
If padding is suppressed, encoded strings sort in the same order as the
messages they encode.
.TP
.B hex
Implements hex encoding, defined by RFC4648 under the name Base16. For
compatibility with that specification, output is in upper case by
default.
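.PP
The ordering property of
.B base32hex
can be checked with a short C sketch. The function
.B b32_encode
is hypothetical (it is not mLib's API) and emits unpadded output using
whichever 32-character alphabet it is given. The byte 0x08 sorts before
0xD0, and their unpadded
.B base32hex
encodings,
.B 10
and
.BR Q0 ,
sort the same way; the
.B base32
encodings,
.B BA
and
.BR 2A ,
sort the other way, because the digits
.B 2
to
.B 7
come before the letters in ASCII.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The RFC4648 Base32 and extended-hex Base32 alphabets. */
static const char b32[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
static const char b32hex[] = "0123456789ABCDEFGHIJKLMNOPQRSTUV";

/* Hypothetical helper: encode sz bytes from p using alphabet a, without
 * padding, into out (null-terminated).  Returns the output length. */
static size_t b32_encode(const char *a, const unsigned char *p, size_t sz,
                         char *out)
{
  unsigned long acc = 0;  /* pending bits, at most 12 of them */
  unsigned nbits = 0;     /* number of pending bits in acc */
  size_t n = 0, i;

  for (i = 0; i < sz; i++) {
    acc = ((acc << 8) | p[i]) & 0xfffu;
    nbits += 8;
    while (nbits >= 5) {
      nbits -= 5;
      out[n++] = a[(acc >> nbits) & 31];
    }
  }
  /* Final partial digit: fill the missing low-order bits with zeros. */
  if (nbits) out[n++] = a[(acc << (5 - nbits)) & 31];
  out[n] = 0;
  return (n);
}
```

.PP
With the same input,
.B foobar
encodes as
.B CPNMUOJ1E8
in
.B base32hex
once the padding is stripped, matching the RFC4648 test vector.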
.SH "SEE ALSO"
.BR bincode (1),
.BR dstr (3),
.BR mLib (3).
.SH AUTHOR
Mark Wooding, <mdw@distorted.org.uk>