.\" -*-nroff-*-
.de VS
.sp 1
.RS
.nf
.ft B
..
.de VE
.ft R
.fi
.RE
.sp 1
..
.TH sym 3 "8 May 1999" "Straylight/Edgeware" "mLib utilities library"
.SH NAME
sym \- symbol table manager
.\" @sym_create
.\" @sym_destroy
.\" @sym_find
.\" @sym_remove
.\" @sym_mkiter
.\" @sym_next
.\"
.\" @SYM_NAME
.\" @SYM_LEN
.\" @SYM_HASH
.\"
.SH SYNOPSIS
.nf
.B "#include <mLib/sym.h>"

.BI "void sym_create(sym_table *" t );
.BI "void sym_destroy(sym_table *" t );

.BI "void *sym_find(sym_table *" t ,
.BI "               const char *" n ", long " l ,
.BI "               size_t " sz ", unsigned *" f );
.BI "void sym_remove(sym_table *" t ", void *" b );

.BI "const char *SYM_NAME(const void *" p );
.BI "size_t SYM_LEN(const void *" p );
.BI "uint32 SYM_HASH(const void *" p );

.BI "void sym_mkiter(sym_iter *" i ", sym_table *" t );
.BI "void *sym_next(sym_iter *" i );
.fi
.SH "DESCRIPTION"
The
.B sym
functions implement a data structure often described as a dictionary, a
finite map, an associative array, or a symbol table.  It associates
.I values
with
.I keys
such that the value corresponding to a given key can be found quickly.
Additionally, all stored associations can be enumerated.
.PP
The interface provides an
.I intrusive
symbol table.  The data objects stored in the table must include a small
header used by the symbol table manager.  This reduces the amount of
pointer fiddling that needs to be done, and in practice doesn't seem to
be much of a problem.  It's also fairly easy to construct a
non-intrusive interface if you really want one.
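.PP
As a rough illustration of what
.I intrusive
means here, the following self-contained sketch puts a header at the
front of a value structure and converts between the two pointer types.
The
.B hdr
type and the
.B val_from_hdr
function are hypothetical stand-ins invented for this example; the real
.B sym_base
is opaque and its layout is not shown.
.VS
#include <stddef.h>

/* Hypothetical stand-in for a table header; not the real sym_base. */
typedef struct hdr {
  struct hdr *next;     /* Link used by the table manager */
  size_t len;           /* Length of the key */
} hdr;

/* A value type with the header as its first member. */
typedef struct val {
  hdr _base;            /* Must come first */
  unsigned type;        /* User data follows the header */
} val;

/* A pointer to a structure also points to its first member, so
 * the manager's view and the user's view are the same address.
 */
static val *val_from_hdr(hdr *h) { return (val *)h; }
.VE
Because the header is the first member, one address serves both as a
pointer to the header and as a pointer to the whole value, so no
separate link from one to the other is needed.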
.PP
There are three main data structures involved in the interface:
.TP
.B sym_table
Keeps track of the information associated with a particular table.
.TP
.B sym_base
The header which must be attached to the front of all the value
objects.
.TP
.B sym_iter
An iterator object, used for enumerating all of the associations stored
in a symbol table.
.PP
All of the above data structures should be considered
.IR opaque :
don't try looking inside.  Representations have changed in the past, and
they may change again in the future.
.SS "Creation and destruction"
The
.B sym_table
object itself needs to be allocated by the caller.  It is initialized by
passing it to the function
.BR sym_create .
After initialization, the table contains no entries.
.PP
Initializing a symbol table involves allocating some memory.  If this
allocation fails, an
.B EXC_NOMEM
exception is raised.
.PP
When a symbol table is no longer needed, the memory occupied by the
values and other maintenance structures can be reclaimed by calling
.BR sym_destroy .
Any user data attached to the values (for example, other memory they
point to) should be destroyed before calling
.BR sym_destroy .
.SS "Adding, searching and removing"
Most of the actual work is done by the function
.BR sym_find .
It does both lookup and creation, depending on its arguments.  To do its
job, it needs to know the following bits of information:
.TP
.BI "sym_table *" t
A pointer to a symbol table to manipulate.
.TP
.BI "const char *" n
The address of the
.I key
to look up or create.  Usually this will be a simple text string,
although it can actually be any arbitrary binary data.
.TP
.BI "long " l
The length of the key.  If this is \-1,
.B sym_find
assumes that the key is a null-terminated string, and calculates its
length itself.  This is entirely equivalent to passing
.BI strlen( n )\fR.
.TP
.BI "size_t " sz
The size of the value block to allocate if the key could not be found.
If this is zero, no value is allocated, and a null pointer is returned
to indicate an unsuccessful lookup.
.TP
.BI "unsigned *" f
The address of a `found' flag to set.  This is an output parameter.  On
exit,
.B sym_find
will set the value of
.BI * f
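to zero if the key could not be found, or nonzero if it was found.  This
can be used to tell whether the value returned has been newly allocated,
or whether it was already in the table.  A null pointer may be passed if
the caller doesn't need this information, as in the lookup example
below.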
.PP
A terminating null byte is appended to the copy of the symbol's name in
memory.  This is not considered to be a part of the symbol's name, and
does not contribute to the name's length as reported by the
.B SYM_LEN
macro.
.PP
A symbol can be removed from the table by calling
.BR sym_remove ,
passing the symbol table itself, and the value block that needs
removing.
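.PP
As a sketch of the create-on-demand behaviour, the following function
counts how many times each key has been seen.  The
.B count
structure and the
.B tally
function are invented for this example.  It assumes that a failed
allocation in
.B sym_find
is reported by raising an exception, as described above for
.BR sym_create ,
so the returned pointer is not checked.
.VS
#include <stddef.h>
#include <mLib/sym.h>

typedef struct count {
  sym_base _base;       /* Symbol header */
  unsigned long n;      /* Times this key has been seen */
} count;

/* Record one more sighting of the len-byte key at p.  Passing a
 * nonzero size asks sym_find to allocate a block if the key is
 * new; the flag says whether the key was already present.
 * Assumption: allocation failure raises an exception.
 */
static unsigned long tally(sym_table *t, const char *p, size_t len)
{
  unsigned f;
  count *c = sym_find(t, p, (long)len, sizeof(count), &f);

  if (!f)
    c->n = 0;           /* Newly allocated: initialize our field */
  return (++c->n);
}
.VE
Since the length is given explicitly rather than as \-1, the key may
contain null bytes.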
.SS "Enquiries about symbols"
Three macros are provided to enable simple enquiries about a symbol.
Given a pointer
.I s
to a symbol table entry,
.BI SYM_LEN( s )
returns the length of the symbol's name (excluding any terminating null
byte);
.BI SYM_NAME( s )
returns a pointer to the symbol's name; and
.BI SYM_HASH( s )
returns the symbol's hash value.
.SS "Enumerating symbols"
Enumerating the values in a symbol table is fairly simple.  Allocate a
.B sym_iter
object from somewhere.  Attach it to a symbol table by calling
.BR sym_mkiter ,
and passing in the addresses of the iterator and the symbol table.
Then, each call to
.B sym_next
will return a different value from the symbol table, until all of them
have been enumerated, at which point,
.B sym_next
returns a null pointer.
.PP
It's safe to remove the symbol you've just been returned by
.BR sym_next .
However, it's not safe to remove any other symbol.  So don't do that.
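.PP
For instance, a loop which discards some entries while walking the
table might look like the following sketch.  The
.B val
structure, its
.B type
field and the
.B purge
function are assumptions made for the example; only
.BR sym_mkiter ,
.B sym_next
and
.B sym_remove
come from this library.
.VS
#include <mLib/sym.h>

typedef struct val {
  sym_base _base;       /* Symbol header */
  unsigned type;        /* Hypothetical user field */
} val;

/* Remove every entry whose type field equals ty. */
static void purge(sym_table *t, unsigned ty)
{
  sym_iter i;
  val *v;

  for (sym_mkiter(&i, t); (v = sym_next(&i)) != 0; ) {
    if (v->type == ty)
      sym_remove(t, v);         /* v is the entry just returned */
  }
}
.VE
The loop never removes anything other than the entry most recently
returned, which is the one case described above as safe.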
.PP
When you've finished with an iterator, it's safe to just throw it away.
You don't need to call any functions beforehand.
.SS "Use in practice"
In normal use, the keys are simple strings (usually identifiers from
some language), and the values are nontrivial structures providing
information about types and values.
.PP
In this case, you'd define something like the following structure for
your values:
.VS
typedef struct val {
  sym_base _base;	/* Symbol header */
  unsigned type;	/* Type of this symbol */
  int dispoff;		/* Which display the variable is in */
  size_t frameoff;	/* Offset of variable in frame */
} val;
.VE
Given a pointer
.I v
to a
.BR val ,
you can find the variable's name by calling
.BI SYM_NAME( v )\fR.
.PP
You can look up a name in the table by saying something like:
.VS
val *v = sym_find(t, name, -1, 0, 0);
if (!v)
  error("unknown variable `%s'", name);
.VE
You can add in a new variable by saying something like:
.VS
unsigned f;
val *v = sym_find(t, name, -1, sizeof(val), &f);
if (f)
  error("variable `%s' already exists", name);
/* fill in v */
.VE
You can examine all the variables in your symbol table by saying
something like:
.VS
sym_iter i;
val *v;

for (sym_mkiter(&i, t); (v = sym_next(&i)) != 0; ) {
  /* ... */
}
.VE
That ought to be enough examples to be getting on with.
.SS Implementation
The symbol table is an extensible hash table, using the universal hash
function described in
.BR unihash (3)
and the global hashing key.  The hash chains are kept very short
(probably too short, actually).  Every time a symbol is found, its block
is promoted to the front of its bin chain so it gets found faster next
time.
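.PP
The promotion step can be pictured with a small self-contained sketch.
This is not the library's code: the
.B node
type and the
.B chain_find
function are hypothetical, and keys are compared with
.BR strcmp (3)
for brevity, whereas the real table handles arbitrary binary keys.
.VS
#include <stddef.h>
#include <string.h>

/* Hypothetical chain node; not the library's representation. */
typedef struct node {
  struct node *next;
  const char *key;
} node;

/* Search the chain at *head for key.  If it is found, unlink the
 * node and push it on the front, so that a repeated lookup for the
 * same key stops at the first node.  Returns null if not found.
 */
static node *chain_find(node **head, const char *key)
{
  node **pp, *n;

  for (pp = head; (n = *pp) != 0; pp = &n->next) {
    if (strcmp(n->key, key) == 0) {
      *pp = n->next;            /* Unlink from current position */
      n->next = *head;          /* Relink at the front */
      *head = n;
      return (n);
    }
  }
  return (0);
}
.VE
With this scheme, frequently used keys tend to sit near the front of
their chains.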
.SH SEE ALSO
.BR hash (3),
.BR unihash (3),
.BR mLib (3).
.SH AUTHOR
Mark Wooding, <mdw@distorted.org.uk>
9b5ac6ff	238	Mark Wooding, <mdw@distorted.org.uk>