[mLib] / man / sym.3

.\" -*-nroff-*-
.de VS
.sp 1
.RS
.nf
.ft B
..
.de VE
.ft R
.fi
.RE
.sp 1
..
.TH sym 3 "8 May 1999" mLib
.SH NAME
sym \- symbol table manager
.\" @sym_create
.\" @sym_destroy
.\" @sym_find
.\" @sym_remove
.\" @sym_mkiter
.\" @sym_next
.\"
.\" @SYM_NAME
.\"
.SH SYNOPSIS
.nf
.B "#include <mLib/sym.h>"

.BI "void sym_create(sym_table *" t );
.BI "void sym_destroy(sym_table *" t );

.BI "char *SYM_NAME(void *" p );
.BI "void *sym_find(sym_table *" t ,
.BI "               const char *" n ", long " l ,
.BI "               size_t " sz ", unsigned *" f );
.BI "void sym_remove(sym_table *" t ", void *" b );

.BI "void sym_mkiter(sym_iter *" i ", sym_table *" t );
.BI "void *sym_next(sym_iter *" i );
.fi
.SH "HOW IT WORKS"
The
.B sym
functions implement a data structure often described as a dictionary, a
finite map, an associative array, or a symbol table.  It associates
.I values
with
.I keys
such that the value corresponding to a given key can be found quickly.
Additionally, all stored associations can be enumerated.
.PP
The interface provides an
.I intrusive
symbol table.  The data objects stored in the table must include a small
header used by the symbol table manager.  This reduces the amount of
pointer fiddling that needs to be done, and in practice doesn't seem to
be much of a problem.  It's also fairly easy to construct a
non-intrusive interface if you really want one.
.PP
There are three main data structures involved in the interface:
.TP
.B sym_table
Keeps track of the information associated with a particular table.
.TP
.B sym_base
The header which must be attached to the front of all the value
objects.
.TP
.B sym_iter
An iterator object, used for enumerating all of the associations stored
in a symbol table.
.PP
All of the above data structures should be considered
.IR opaque :
don't try looking inside.  Representations have changed in the past, and
they may change again in the future.
.SH "CREATION AND DESTRUCTION"
The
.B sym_table
object itself needs to be allocated by the caller.  It is initialized by
passing it to the function
.BR sym_create .
After initialization, the table contains no entries.
.PP
Initializing a symbol table involves allocating some memory.  If this
allocation fails, an
.B EXC_NOMEM
exception is raised.
.PP
When a symbol table is no longer needed, the memory occupied by the
values and other maintenance structures can be reclaimed by calling
.BR sym_destroy .
Any bits of user data attached to the symbol table values should have
previously been destroyed.
.SH "ADDING, SEARCHING AND REMOVING"
Most of the actual work is done by the function
.BR sym_find .
It does both lookup and creation, depending on its arguments.  To do its
job, it needs to know the following bits of information:
.TP
.BI "sym_table *" t
A pointer to a symbol table to manipulate.
.TP
.BI "const char *" n
The address of the
.I key
to look up or create.  Usually this will be a simple text string,
although it can actually be any arbitrary binary data.
.TP
.BI "long " l
The length of the key.  If this is \-1,
.B sym_find
assumes that the key is a null-terminated string, and calculates its
length itself.
.TP
.BI "size_t " sz
The size of the value block to allocate if the key could not be found.
If this is zero, no value is allocated, and a null pointer is returned
to indicate an unsuccessful lookup.
.TP
.BI "unsigned *" f
The address of a `found' flag to set.  This is an output parameter.  On
exit,
.B sym_find
will set the value of
.BI * f
to zero if the key could not be found, or nonzero if it was found.  This
can be used to tell whether the value returned has been newly allocated,
or whether it was already in the table.
.PP
The macro
.B SYM_NAME
will return the key associated with a given value block.  You must use
this macro rather than fiddling inside the
.B sym_base
structure.
.PP
A symbol can be removed from the table by calling
.BR sym_remove ,
passing the symbol table itself, and the value block that needs
removing.
.SH ENUMERATION
Enumerating the values in a symbol table is fairly simple.  Allocate a
.B sym_iter
object from somewhere.  Attach it to a symbol table by calling
.BR sym_mkiter ,
and passing in the addresses of the iterator and the symbol table.
Then, each call to
.B sym_next
will return a different value from the symbol table, until all of them
have been enumerated, at which point,
.B sym_next
returns a null pointer.
.PP
It's safe to remove the symbol you've just been returned by
.BR sym_next .
However, it's not safe to remove any other symbol.  So don't do that.
.PP
When you've finished with an iterator, it's safe to just throw it away.
You don't need to call any functions beforehand.
.SH "USE IN PRACTICE"
In normal use, the keys are simple strings (usually identifiers from
some language), and the values are nontrivial structures providing
information about types and values.
.PP
In this case, you'd define something like the following structure for
your values:
.VS
typedef struct val {
  sym_base _base;	/* Symbol header */
  unsigned type;	/* Type of this symbol */
  int dispoff;		/* Which display variable is in */
  size_t frameoff;	/* Offset of variable in frame */
} val;
.VE
Given a pointer
.I v
to a
.BR val ,
you can find the variable's name by calling
.BI SYM_NAME( v )\fR.
.PP
You can look up a name in the table by saying something like:
.VS
val *v = sym_find(t, name, -1, 0, 0);
if (!v)
  error("unknown variable `%s'", name);
.VE
You can add in a new variable by saying something like
.VS
unsigned f;
val *v = sym_find(t, name, -1, sizeof(val), &f);
if (f)
  error("variable `%s' already exists", name);
/* fill in v */
.VE
You can examine all the variables in your symbol table by saying
something like:
.VS
sym_iter i;
val *v;

for (sym_mkiter(&i, t); (v = sym_next(&i)) != 0; ) {
  /* ... */
}
.VE
That ought to be enough examples to be getting on with.
.SH CAVEATS
The symbol table manager requires the suballocator (see
.BR sub (3)
for details).  You must ensure that
.B sub_init
has been called before using any symbol tables in your program.
.SH IMPLEMENTATION
The symbol table is an extensible hashtable, using a 32-bit CRC as the
hash function.  The hash chains are kept very short (probably too short,
actually).  Every time a symbol is found, its block is promoted to the
front of its bin chain so it gets found faster next time.
.SH SEE ALSO
.BR sub (3),
.BR mLib (3).
.SH AUTHOR
Mark Wooding, <mdw@nsict.org>
Commit	Line	Data
b6b9d458	1	.\" --nroff--
	2	.de VS
	3	.sp 1
d66d7727	4	.RS
b6b9d458	5	.nf
	6	.ft B
	7	..
	8	.de VE
	9	.ft R
	10	.fi
	11	.RE
	12	.sp 1
	13	..
08da152e	14	.TH sym 3 "8 May 1999" mLib
b6b9d458	15	.SH NAME
b6b9d458	16	sym \- symbol table manager
08da152e	17	.\" @sym_create
	18	.\" @sym_destroy
	19	.\" @sym_find
	20	.\" @sym_remove
	21	.\" @sym_mkiter
	22	.\" @sym_next
	23	.\"
	24	.\" @SYM_NAME
	25	.\"
b6b9d458	26	.SH SYNOPSIS
	27	.nf
	28	.B "#include <mLib/sym.h>"
	29
	30	.BI "void sym_create(sym_table *" t );
	31	.BI "void sym_destroy(sym_table *" t );
	32
	33	.BI "char SYM_NAME(void " p );
	34	.BI "void sym_find(sym_table " t ,
	35	.BI " const char *" n ", long " l ,
	36	.BI " size_t " sz ", unsigned *" f );
	37	.BI "void sym_remove(sym_table " t ", void " b );
	38
	39	.BI "void sym_mkiter(sym_iter " i ", sym_table " t );
	40	.BI "void sym_next(sym_iter " i );
	41	.fi
	42	.SH "HOW IT WORKS"
	43	The
	44	.B sym
	45	functions implement a data structure often described as a dictionary, a
	46	finite map, an associative array, or a symbol table. It associates
	47	.I values
	48	with
	49	.I keys
	50	such that the value corresponding to a given key can be found quickly.
	51	Additionally, all stored associations can be enumerated.
	52	.PP
	53	The interface provides an
	54	.I intrusive
	55	symbol table. The data objects stored in the table must include a small
	56	header used by the symbol table manager. This reduces the amount of
	57	pointer fiddling that needs to be done, and in practice doesn't seem to
	58	be much of a problem. It's also fairly easy to construct a
	59	non-intrusive interface if you really want one.
	60	.PP
	61	There are three main data structures involved in the interface:
	62	.TP
	63	.B sym_table
	64	Keeps track of the information associated with a particular table.
	65	.TP
	66	.B sym_base
	67	The header which must be attached to the front of all the value
	68	objects.
	69	.TP
	70	.B sym_iter
	71	An iterator object, used for enumerating all of the associations stored
	72	in a symbol table.
	73	.PP
	74	All of the above data structures should be considered
	75	.IR opaque :
	76	don't try looking inside. Representations have changed in the past, and
	77	they may change again in the future.
	78	.SH "CREATION AND DESTRUCTION"
	79	The
	80	.B sym_table
	81	object itself needs to be allocated by the caller. It is initialized by
	82	passing it to the function
	83	.BR sym_create .
	84	After initialization, the table contains no entries.
	85	.PP
	86	Initializing a symbol table involves allocating some memory. If this
d2a91066	87	allocation fails, an
b6b9d458	88	.B EXC_NOMEM
	89	exception is raised.
	90	.PP
	91	When a symbol table is no longer needed, the memory occupied by the
	92	values and other maintenance structures can be reclaimed by calling
	93	.BR sym_destroy .
	94	Any bits of user data attached to the symbol table values should have
	95	previously been destroyed.
	96	.SH "ADDING, SEARCHING AND REMOVING"
	97	Most of the actual work is done by the function
	98	.BR sym_find .
	99	It does both lookup and creation, depending on its arguments. To do its
	100	job, it needs to know the following bits of information:
	101	.TP
ff76c38f	102	.BI "sym_table *" t
b6b9d458	103	A pointer to a symbol table to manipulate.
b6b9d458	104	.TP
ff76c38f	105	.BI "const char *" n
b6b9d458	106	The address of the
	107	.I key
	108	to look up or create. Usually this will be a simple text string,
	109	although it can actually be any arbitrary binary data.
	110	.TP
ff76c38f	111	.BI "long " l
b6b9d458	112	The length of the key. If this is \-1,
	113	.B sym_find
	114	assumes that the key is a null-terminated string, and calculates its
	115	length itself.
	116	.TP
ff76c38f	117	.BI "size_t " sz
b6b9d458	118	The size of the value block to allocate if the key could not be found.
	119	If this is zero, no value is allocated, and a null pointer is returned
	120	to indicate an unsuccessful lookup.
	121	.TP
ff76c38f	122	.BI "unsigned *" f
b6b9d458	123	The address of a `found' flag to set. This is an output parameter. On
	124	exit,
	125	.B sym_find
	126	will set the value of
	127	.BI * f
	128	to zero if the key could not be found, or nonzero if it was found. This
	129	can be used to tell whether the value returned has been newly allocated,
	130	or whether it was already in the table.
	131	.PP
	132	The macro
	133	.B SYM_NAME
	134	will return the key associated with a given value block. You must use
	135	this macro rather than fiddling inside the
	136	.B sym_base
	137	structure.
	138	.PP
	139	A symbol can be removed from the table by calling
	140	.BR sym_remove ,
	141	passing the symbol table itself, and the value block that needs
	142	removing.
	143	.SH ENUMERATION
	144	Enumerating the values in a symbol table is fairly simple. Allocate a
	145	.B sym_iter
	146	object from somewhere. Attach it to a symbol table by calling
	147	.BR sym_mkiter ,
	148	and passing in the addresses of the iterator and the symbol table.
	149	Then, each call to
	150	.B sym_next
	151	will return a different value from the symbol table, until all of them
	152	have been enumerated, at which point,
	153	.B sym_next
	154	returns a null pointer.
	155	.PP
	156	It's safe to remove the symbol you've just been returned by
	157	.BR sym_next .
	158	However, it's not safe to remove any other symbol. So don't do that.
	159	.PP
	160	When you've finished with an iterator, it's safe to just throw it away.
	161	You don't need to call any functions beforehand.
	162	.SH "USE IN PRACTICE"
	163	In normal use, the keys are simple strings (usually identifiers from
	164	some language), and the values are nontrivial structures providing
	165	information about types and values.
	166	.PP
	167	In this case, you'd define something like the following structure for
	168	your values:
	169	.VS
	170	typedef struct val {
	171	sym_base _base; /* Symbol header */
	172	unsigned type; /* Type of this symbol */
	173	int dispoff; /* Which display variable is in */
	174	size_t frameoff; /* Offset of variable in frame */
	175	} val;
	176	.VE
	177	Given a pointer
	178	.I v
	179	to a
	180	.BR val ,
	181	you can find the variable's name by calling
	182	.BI SYM_NAME( v )\fR.
	183	.PP
	184	You can look up a name in the table by saying something like:
	185	.VS
	186	val *v = sym_find(t, name, -1, 0, 0);
187	if (!v)
188	error("unknown variable `%s'", name);
189	.VE
190	You can add in a new variable by saying something like
191	.VS
192	unsigned f;
193	val *v = sym_find(t, name, -1, sizeof(val), &f);
194	if (f)
195	error("variable `%s' already exists", name);
196	/* fill in v */
197	.VE
198	You can examine all the variables in your symbol table by saying
199	something like:
200	.VS
201	sym_iter i;
202	val *v;
203
204	for (sym_mkiter(&i, t); (v = sym_next(&i)) != 0; ) {
205	/* ... */
206	}
207	.VE
208	That ought to be enough examples to be getting on with.
209	.SH CAVEATS
210	The symbol table manager requires the suballocator (see
08da152e	211	.BR sub (3)
b6b9d458	212	for details). You must ensure that
	213	.B sub_init
	214	has been called before using any symbol tables in your program.
	215	.SH IMPLEMENTATION
	216	The symbol table is an extensible hashtable, using a 32-bit CRC as the
	217	hash function. The hash chains are kept very short (probably too short,
	218	actually). Every time a symbol is found, its block is promoted to the
	219	front of its bin chain so it gets found faster next time.
	220	.SH SEE ALSO
08da152e	221	.BR sub (3),
08da152e	222	.BR mLib (3).
b6b9d458	223	.SH AUTHOR
b6b9d458	224	Mark Wooding, <mdw@nsict.org>