	1	.\" --nroff--
	2	.TH codec 3 "9 January 2009" "Straylight/Edgeware" "mLib utilities library"
	3	.SH NAME
	4	codec \- binary encoding and decoding
	5	.\" @codec_class
	6	.\" @codec_strerror
	7	.\" @null_codec_class
	8	.\" @base64_class
	9	.\" @file64_class
	10	.\" @base64url_class
	11	.\" @base32_class
	12	.\" @base32hex_class
	13	.\" @hex_class
	14	.SH SYNOPSIS
	15	.nf
	16	.B "#include <mLib/codec.h>"
	17	.B "#include <mLib/base64.h>"
	18	.B "#include <mLib/base32.h>"
	19	.B "#include <mLib/hex.h>"
	20
	21	.B "codec_class null_codec_class;"
	22	.B "codec_class base64_class, file64_class, base64url_class;"
	23	.B "codec_class base32_class, base32hex_class;"
	24	.B "codec_class hex_class;"
	25
	26	.BI "const char *codec_strerror(int " err ");"
	27	.fi
	28	.SH DESCRIPTION
	29	The
	30	.B codec
	31	system provides an object-based interface to functions which encode
	32	binary data as plain text and decode the result to recover the original
	33	binary data. The interface makes it easy to support multiple encodings
	34	and select an appropriate one at runtime.
	35	.SS "The codec_class structure"
	36	The
	37	.B codec_class
	38	structure represents a particular encoding format. The structure has
	39	the following members.
	40	.TP
	41	.B "const char *name"
	42	The name of the class, as a null-terminated string. The name should not
	43	contain whitespace characters.
	44	.TP
	45	.BI "codec *(*encoder)(unsigned " flags ", const char *" indent ", unsigned " maxline ")"
	46	Pointer to a function which constructs a new encoder object, of type
	47	.BR codec .
	48	The
	49	.I flags
	50	configure the behaviour of the object; the
	51	.I indent
	52	string is written to separate lines of output; the integer
	53	.I maxline
	54	is the maximum length of line to be produced, or zero to forbid line
	55	breaking.
	56	.TP
	57	.BI "codec *(*decoder)(unsigned " flags ")"
	58	Pointer to a function which constructs a new decoder object, also of
	59	type
	60	.BR codec .
	61	The
	62	.I flags
	63	configure the behaviour of the object.
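The roles of
indent
and
maxline
can be pictured with a small stand-alone sketch. It does not use mLib: the function name toy_wrap is hypothetical, and whether the real encoders count the indent string towards the line length is an assumption not settled by this page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative only: copy the encoded text `s' to `out', writing the
 * `indent' string between runs of at most `maxline' characters.  A
 * `maxline' of zero forbids line breaking.  The caller must make `out'
 * large enough for the text plus all the separators.
 */
void toy_wrap(const char *s, const char *indent, unsigned maxline, char *out)
{
  unsigned col = 0;

  assert(s && indent && out);
  for (; *s; s++) {
    if (maxline && col == maxline) {
      /* The current line is full: write the separator and start again. */
      strcpy(out, indent);
      out += strlen(indent);
      col = 0;
    }
    *out++ = *s;
    col++;
  }
  *out = 0;
}
```

With an indent of "\n  " and a maxline of 3, the text ABCDEFGH would come out as three lines, the second and third indented by two spaces.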
	64	.PP
	65	The
	66	.I flags
	67	to the
	68	.B encoder
	69	and
	70	.B decoder
	71	functions have the following meanings.
	72	.TP
	73	.B CDCF_LOWERC
	74	For codecs which produce output using a single alphabetic case (e.g.,
	75	.BR base32 ,
	76	.BR hex ),
	77	emit and accept only lower case; the default is to emit and accept only
	78	upper case, for compatibility with RFC4648. If the codec usually
	79	produces mixed-case output, then this flag is ignored.
	80	.TP
	81	.B CDCF_IGNCASE
	82	For codecs which produce output using a single alphabetic case, ignore
	83	the case of the input when decoding. If the codec usually produces
	84	mixed-case output, then this flag is ignored.
	85	.TP
	86	.B CDCF_NOEQPAD
	87	For codecs which usually pad their output (e.g.,
	88	.BR base64 ,
	89	.BR base32 ),
	90	do not emit or accept padding characters. If the codec does not usually
	91	produce padding, or the padding is not redundant, then this flag is
	92	ignored.
	93	.TP
	94	.B CDCF_IGNEQPAD
	95	For codecs which usually pad their output, do not treat incorrect (e.g.,
	96	missing or excessive) padding as an error when decoding. If the codec
	97	does not usually produce padding, or the padding is required for
	98	unambiguous decoding, then this flag is ignored.
	99	.TP
	100	.B CDCF_IGNEQMID
	101	For codecs which usually pad their output, ignore padding characters
	102	wherever they may appear when decoding. Usually padding characters
	103	indicate the end of the input, and further input characters are
	104	considered erroneous. If the codec does not usually produce padding, or
	105	it is impossible to resume decoding correctly having seen padding
	106	characters, then this flag is ignored.
	107	.TP
	108	.B CDCF_IGNZPAD
	109	For codecs which need to pad their input, ignore unusual padding bits
	110	when decoding. (This is not at all the same thing as the padding
	111	characters controlled by the flags above: they deal with padding the
	112	length of the encoding
	113	.I output
	114	up to a suitable multiple of characters; this option deals with padding
	115	of the
	116	.I input
	117	prior to encoding.) If the codec does not add padding bits, or specific
	118	values are required for unambiguous decoding, then this flag is ignored.
	119	.TP
	120	.B CDCF_IGNNEWL
	121	Ignore newline (and carriage-return) characters when decoding: the
	122	default for RFC4648 codecs is to reject newline characters. If these
	123	characters are significant in the encoding, then this flag is ignored.
	124	.TP
	125	.B CDCF_IGNINVCH
	126	Ignore any other invalid characters appearing in the input when
	127	decoding.
	128	.TP
	129	.B CDCF_IGNJUNK
	130	Ignore all `junk' in the input. This should suppress almost all
	131	decoding errors.
	132	.PP
	133	If you do not set any of the
	134	.BR CDCF_IGN ...
	135	flags, a decoder should only accept the exact encoding that the
	136	corresponding encoder would produce (with
	137	.I maxline
	138	= 0 to inhibit line-breaking).
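As a rough illustration of how the two case flags interact for a single-case codec such as hex, here is a stand-alone sketch. The TOY_ names and their bit values are hypothetical; the real CDCF_ constants come from <mLib/codec.h>, and their values are not given on this page.

```c
#include <assert.h>

/* Hypothetical bit values, for illustration only. */
#define TOY_LOWERC  1u   /* stands in for CDCF_LOWERC */
#define TOY_IGNCASE 2u   /* stands in for CDCF_IGNCASE */

/* Return the value of the hex digit `ch' as a decoder configured with
 * `flags' would see it, or -1 if the character is rejected.
 */
int toy_hexdigit(int ch, unsigned flags)
{
  if (ch >= '0' && ch <= '9')
    return ch - '0';
  /* Upper case is accepted by default, or whenever case is ignored. */
  if (ch >= 'A' && ch <= 'F' &&
      ((flags & TOY_IGNCASE) || !(flags & TOY_LOWERC)))
    return ch - 'A' + 10;
  /* Lower case is accepted if requested, or whenever case is ignored. */
  if (ch >= 'a' && ch <= 'f' && (flags & (TOY_IGNCASE | TOY_LOWERC)))
    return ch - 'a' + 10;
  return -1;
}

/* Return the character an encoder configured with `flags' would emit
 * for the digit value `v', which must be less than 16.
 */
int toy_hexchar(unsigned v, unsigned flags)
{
  assert(v < 16);
  return (flags & TOY_LOWERC) ? "0123456789abcdef"[v]
                              : "0123456789ABCDEF"[v];
}
```

With no flags set, only upper case round-trips; setting the ignore-case flag widens what the decoder accepts without changing what the encoder emits.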
	139	.SS "The codec and codec_ops structures"
	140	The
	141	.B codec
	142	structure represents the state of an encoder or decoder, as returned by
	143	the
	144	.B encoder
	145	and
	146	.B decoder
	147	functions described above. It contains a single member.
	148	.TP
	149	.B "const codec_ops *ops"
	150	Pointer to a
	151	.B codec_ops
	152	structure which contains operations and metadata for use with the
	153	encoder or decoder.
	154	.PP
	155	The
	156	.B codec_ops
	157	structure contains the following members.
	158	.TP
	159	.B "const codec_class *c"
	160	Pointer back to the
	161	.B codec_class
	162	which was used to construct the
	163	.B codec
	164	object.
	165	.TP
	166	.BI "int (*code)(codec *" c ", const void *" p ", size_t " sz ", dstr *" d ")"
	167	Encode or decode, using the codec
	168	.IR c ,
	169	the data in the buffer at address
	170	.I p
	171	and continuing for
	172	.I sz
	173	bytes, appending the output to the dynamic string
	174	.I d
	175	(see
	176	.BR dstr (3)).
	177	If the operation was successful, the function returns zero; otherwise it
	178	returns a nonzero error code, as described below.
	179	.TP
	180	.BI "void (*destroy)(codec *" c ")"
	181	Destroy the codec object
	182	.IR c ,
	183	freeing any resources it may hold.
	184	.PP
	185	A codec may buffer its input (e.g., if it needs to see more in order to
	186	decide what output to produce next); it may also need to take special
	187	action at the end of the input (e.g., flushing buffers, and applying
	188	padding). To signal the codec that there is no more input, call the
	189	.B code
	190	function with a null
	191	.I p
	192	pointer. It will then write any final output to
	193	.IR d .
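The calling pattern described above (feed chunks, then make a final call with a null pointer) can be sketched without mLib. Everything named toy_ or outbuf below is a hypothetical stand-in: outbuf replaces the real dstr, the error values are invented, and the decoder is a minimal upper-case hex decoder, not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a dynamic string: a small fixed buffer. */
typedef struct outbuf { char buf[64]; size_t len; } outbuf;

/* Hypothetical result codes; the real CDCERR_ values live in codec.h. */
enum { TOY_OK = 0, TOY_INVCH = 1, TOY_TRUNC = 2 };

typedef struct toy_codec toy_codec;
typedef struct toy_ops {
  int (*code)(toy_codec *c, const void *p, size_t sz, outbuf *d);
} toy_ops;
struct toy_codec { const toy_ops *ops; int have; unsigned hi; };

static int hexval(int ch)
{
  if (ch >= '0' && ch <= '9') return ch - '0';
  if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
  return -1;
}

/* Decode hex digits, buffering an unpaired digit between calls. */
static int toy_hexdec(toy_codec *c, const void *p, size_t sz, outbuf *d)
{
  const unsigned char *q = p;
  size_t i;
  int v;

  /* A null `p' marks the end of input: a pending digit is an error. */
  if (!p) return c->have ? TOY_TRUNC : TOY_OK;
  for (i = 0; i < sz; i++) {
    if ((v = hexval(q[i])) < 0) return TOY_INVCH;
    if (!c->have) { c->hi = (unsigned)v; c->have = 1; }
    else {
      assert(d->len < sizeof(d->buf));
      d->buf[d->len++] = (char)((c->hi << 4) | (unsigned)v);
      c->have = 0;
    }
  }
  return TOY_OK;
}

static const toy_ops toy_hexdec_ops = { toy_hexdec };

/* Decode `s' in two chunks split at `split', then signal end of input. */
int toy_decode_split(const char *s, size_t split, outbuf *d)
{
  toy_codec c = { &toy_hexdec_ops, 0, 0 };
  size_t n = strlen(s);
  int rc;

  d->len = 0;
  if (split > n) split = n;
  if ((rc = c.ops->code(&c, s, split, d)) != 0) return rc;
  if ((rc = c.ops->code(&c, s + split, n - split, d)) != 0) return rc;
  return c.ops->code(&c, 0, 0, d);
}
```

Splitting the input at an odd offset shows why the buffering matters: the decoder holds the unpaired digit until the next chunk arrives, and only the final null call can tell that the input was cut short.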
	194	.PP
	195	The following error conditions may be reported.
	196	.TP
	197	.B CDCERR_INVCH
	198	An invalid character was encountered while decoding. This includes
	199	encountering padding characters if padding is disabled using the
	200	.B CDCF_NOEQPAD
	201	flag.
	202	.TP
	203	.B CDCERR_INVEQPAD
	204	Invalid padding characters (e.g., wrong characters, or too few, too
	205	many, or none at all) were found during decoding. This may also
	206	indicate that the input is truncated, even if the codec does not usually
	207	perform output padding.
	208	.TP
	209	.B CDCERR_INVZPAD
	210	Invalid padding bits were found during decoding.
	211	.PP
	212	The
	213	.B codec_strerror
	214	function converts these error codes to brief, (moderately)
	215	human-readable strings.
	216	.SS "Provided codecs"
	217	The library provides a number of standard codecs.
	218	.TP
	219	.B base64
	220	Implements Base64 encoding, as defined by RFC4648. Output is
	221	mixed-case, so the
	222	.B CDCF_LOWERC
	223	and
	224	.B CDCF_IGNCASE
	225	flags are ignored.
	226	.TP
	227	.B file64
	228	Implements a variant of the Base64 encoding which uses
	229	.RB ` % '
	230	in place of
	231	.RB ` / ',
	232	so that its output is suitable for use as a Unix filename.
	233	.TP
	234	.B base64url
	235	Implements the filename- and URL-safe variant of Base64 encoding, as
	236	defined by RFC4648.
	237	.TP
	238	.B base32
	239	Implements Base32 encoding, as defined by RFC4648. Output is in upper
	240	case by default.
	241	.TP
	242	.B base32hex
	243	Implements the extended-hex variant of Base32, as defined by RFC4648.
	244	This encoding preserves the ordering of messages if padding is
	245	suppressed.
	246	.TP
	247	.B hex
	248	Implements hex encoding, defined by RFC4648 under the name Base16. For
	249	compatibility with that specification, output is in upper case by
	250	default.
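The ordering property claimed for base32hex follows from its alphabet (0 to 9, then A to V) being in ascending ASCII order. The stand-alone sketch below, which does not use mLib and whose function names are hypothetical, encodes without padding and compares the order of two strings before and after encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Encode `sz' bytes at `p' as unpadded base32hex (RFC 4648 alphabet)
 * into `out', which must hold at least (8*sz + 4)/5 + 1 characters.
 */
void b32hex_nopad(const unsigned char *p, size_t sz, char *out)
{
  static const char tab[] = "0123456789ABCDEFGHIJKLMNOPQRSTUV";
  unsigned long acc = 0;
  unsigned nbits = 0;
  size_t i;

  for (i = 0; i < sz; i++) {
    acc = (acc << 8) | p[i];
    nbits += 8;
    /* Emit complete five-bit groups, most significant first. */
    while (nbits >= 5) {
      nbits -= 5;
      *out++ = tab[(acc >> nbits) & 31];
    }
    acc &= (1ul << nbits) - 1;
  }
  /* Pad the final partial group with zero bits on the right. */
  if (nbits) *out++ = tab[(acc << (5 - nbits)) & 31];
  *out = 0;
}

static int sign(int x) { return (x > 0) - (x < 0); }

/* Return nonzero if the strings `a' and `b' (each under 40 bytes)
 * compare the same way before and after encoding.
 */
int order_preserved(const char *a, const char *b)
{
  char ea[65], eb[65];

  assert(strlen(a) < 40 && strlen(b) < 40);
  b32hex_nopad((const unsigned char *)a, strlen(a), ea);
  b32hex_nopad((const unsigned char *)b, strlen(b), eb);
  return sign(strcmp(a, b)) == sign(strcmp(ea, eb));
}
```

The plain base32 alphabet does not share this property, because its digits 2 to 7 stand for larger values than its letters yet sort before them in ASCII.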
	251	.SH "SEE ALSO"
	252	.BR bincode (1),
	253	.BR dstr (3),
	254	.BR mLib (3).
	255	.SH AUTHOR
	256	Mark Wooding, <mdw@distorted.org.uk>