1 #+TITLE: Xyla -- a fast, hard-mode binary tree library
3 #+EMAIL: <mdw@distorted.org.uk>
4 #+LaTeX_CLASS: strayman
5 #+LaTeX_HEADER: \usepackage[greek, english]{babel}
9 Xyla is a C library which implements a number of binary trees. A binary
10 trees is a versatile data structure which maintains a sequence of
11 /nodes/ and, when properly used, allow operations such as search,
12 insertion, deletion, splitting, and concatenation, in $O(\log n)$ time.
16 Xyla currently implements four kinds of binary trees:
18 + /Adelson-Velsky and Landis/ (`AVL') trees, which maintain a fairly
19 rigid height balance and offer good deterministic behaviour;
21 + /red-black/ trees, which maintain a slightly looser height balance
22 than AVL trees, but are otherwise fairly similar;
24 + /splay/ trees, which provide excellent /amortized/ performance, but
25 restructure the tree during searches, which makes them unsuitable
26 for concurrent access, and can offer poor performance for
27 /individual/ operations; and
29 + /treaps/, which provide good /expected/ performance, but require a
30 source of random bits which is independent of the data, and may
31 behave poorly with small but finite probability in any particular
34 ** Hard mode implementations
36 Xyla implements these trees in /hard mode/.
38 + Textbook search structures need to repeat a search to insert or
39 delete a node. In practice, it's common to want to want to update a
40 matching node if one exists, or create a new one if it doesn't, or
41 to have to example a node before taking the decision to remove it.
42 Xyla provides /probe/ operations, which search a tree and captures
43 the /path/ taken. The path from a failed search can be used to
44 insert a new node; the path from a successful search can be used to
45 remove the node found.
47 + Xyla places no requirements on node contents. For example, it
48 doesn't assume that nodes are sorted in any particular way.
50 + Xyla is written in obsessively portable C. The build system is
51 Unix-biased, but compiling Xyla is essentially trivial. Except for
52 the check/walk framework, intended for diagnostic purposes, Xyla
53 makes minimal demands on the C library, so it's suitable for use in
54 embedded environments. It uses only memory provided by the
55 application. By default, it makes use of only =memcpy= and
56 =assert=, but even these can be removed. See [[Freestanding build]]
59 + Xyla's algorithms are all nonrecursive.
61 + Xyla supports maintenance of arbitrary application-defined /summary
62 data/ for nodes, i.e., anything which can be computed efficiently
63 from a node itself and the summaries of its children. Summary data
64 can be used, for example, to implement determining the ordinal
65 position of a node within the tree (given its path), and search by
68 + Xyla implements join and split operations for all of its trees.
69 Trees can be joined with or without an explicit middle node; trees
70 can be split at any path, whether resulting from a successful or
71 unsuccessful search. It also implements set operations -- union,
72 intersection, and difference -- for trees which are ordered by a
77 Xyla is /fast/. This was kind of an accident: I was aiming at
78 `acceptable' performance and overachieved. In my testing, splay trees
79 offered speeds competitive with (i.e., closely behind, or actually
80 faster than) Rust's state-of-the-art =HashMap=; AVL trees and treaps
81 were somewhat slower, but still faster than Rust's =BTreeMap=,
82 competitive with GNU =libstdc++='s =std::unordered_map= (and
83 significantly faster on large data sets) and consistently faster than
84 =std::map=. Xyla's trees are consistently and significantly faster
85 than those in GNU =libavl=, and generally provide a more versatile,
86 capable, and flexible interface. Xyla's trees also compared favourably
87 with other less versatile search structures, such as QP tries.
92 I'm not actually selling anything here. I don't gain anything other
93 than a sense of personal satisfaction if you end up using Xyla in your
94 project. So why should you /not/ use Xyla?
96 ** Hash tables are fine, actually
98 Good hash tables are really fast. If your application can work with a
99 hash table and you have a good one lying about, you should totally use
100 that. That means that you don't care about maintaining elements in any
101 particular order; that you can tolerate an occasional long pause while
102 the hash table grows; and that you have a suitable hash function. This
103 last is not so easy: it needs to be fast, well distributed, and --
104 because everything needs more cryptography nowadays -- unpredictable by
109 Xyla isn't thread-safe. It doesn't have any global state or anything,
110 but you'll have to use read/write locks or something if you want to
111 access a Xyla tree from multiple concurrent threads. If you anticipate
112 enough contention that locking is a problem, or the thought of having to
113 slow down the read-only happy path by a couple of atomic operations
114 makes you sad, then use a data structure that's purpose-designed for the
117 Splay trees are particularly inappropriate for concurrent access,
118 because you'd need a fully exclusive lock even for read operations in
119 order to restructure the tree; otherwise, you don't get the good
120 amortized performance.
122 ** Some cool tricks are still out of reach
124 Xyla is hard-mode, but it's `taxing' rather than `mayhem'. It can't do
125 tricks like copy-on-write. (The reason it's difficult is that it
126 introduces a failure point into the middle of the fiddly tree
127 restructuring algorithms, which then need a recovery strategy. See
128 Tatham's [[https://www.chiark.greenend.org.uk/~sgtatham/tweak/btree.html][/An Efficient Data Structure for a Hex Editor/]] for why you
129 might want to do it anyway.)
131 ** You might hate the interface
133 If you like your interfaces abstract, high-level, type-safe, sealed up,
134 and pre-packaged, then Xyla isn't for you. Implementation details are
135 exposed all over. The interface makes use of =void *= arguments; I've
136 tried to minimize them, but type casts are necessary in some places.
137 Overall, I've found that this approach improves the interface
138 ergonomics, improves speed, keeps bloat down, and avoids potential
146 Xyla uses a traditional Autotools build system.
148 If you're building from Git, you'll need GNU Autoconf, Automake, and
149 Libtool, and the Autoconf Archive; even ancient versions of these will
150 be fine. To prepare the working tree, run
154 If you're building from a source tarball, then this has already been
157 The soak test needs Python. Even Python 2.7 will work.
162 : (cd build && ../configure)
164 : make -j12 -Cbuild/ check
165 : make -j12 -Cbuild/ install
167 The included =INSTALL= file explains how drive the =configure= script
168 and =Makefile= if you're unfamiliar with this.
170 This will install header files, static and dynamic libraries, and a
171 =pkg-config= file, which is the preferred way for applications to find
174 ** Building on other systems
176 A fairly basic CMake-based build system is provided for non-Unix
177 targets. This doesn't get as much love and attention as the primary
178 Autotools system, though it is capable of building the library and
179 running most of the tests.
181 There is a booby-trap which prevents the CMake-based system from working
182 on Unix-like systems. Disarming the trap is left as an easy exercise.
184 ** Freestanding build
186 Xyla supports building for /freestanding/ C environments which lack a
187 run-time library; for example, embedded systems, or operating system
188 kernels. To make this work, the following steps are needed.
190 + Build Xyla with the preprocessor symbol =XYLA_FREESTANDING= defined
191 (to any value). This will omit the checking framework, and replace
192 the calls to =memcpy= with simple loops. Don't try to compile the
193 =*-check.c= files containing the checking routines, which require
194 memory allocation and I/O.
196 * Optionally, define =XYLA__ASSERT= to be an =assert=-like macro,
197 i.e., a macro of a single argument which will abort the program if
198 its argument is zero or null. If you don't provide a definition,
199 then Xyla will run without assertions.
201 + If you want to use treaps, then define a function
202 =xyla__treap_boundfail= (with no arguments), which will abort the
203 program. This is used in the extremely unlikely event that a
204 treap's height exceeds the calculated bounds; unfortunately, a
205 treap's structure is rigidly determined by the order of its elements
206 and their weights, so rebalancing is impossible and the situation is
212 + I'm thinking about a =touch= node operation to be called before a
213 node is changed. It would be able to return a replacement node to
214 act on instead, to be be spliced into the tree. This could be used
215 to implement copy-on-write, for example.
217 + Additional kinds of trees.
219 - I'm considering a Bayer tree -- essentially, an encoding of 2-3
220 trees as binary trees in the same way that red-black trees
221 encode 2-3-4 trees. I think I'd make these asymmetric, in the
222 same way that Andersson trees are asymmetric. The differences
223 from an actual Andersson tree are that each node only records a
224 single bit rather than a height, and that updates is much more
225 similar to a red-black tree than the `split' and `skew'
226 operations that Andersson describes. The nice feature of such
227 trees is that the first element has the shortest path from the
228 root of any leaf or semileaf, which is nice for some kinds of
232 * Compatibility policy
234 Xyla is still stabilizing. There are a small number of issues which I
235 still want to nail down before I declare release 1.0.0. I don't expect
236 this to take more than a month or so.
238 Once 1.0.0 does drop, I'm committed to retaining source and binary
239 compatibility indefinitely. The cool kids seem to like removing things
240 all the time, but that's just not the way I roll.
245 The ancient Greek word `\foreignlanguage{greek}{ξύλον}' (`xulon')
246 doesn't have a direct English equivalent. Pretty much any woody thing
247 is a \foreignlanguage{greek}{ξύλον}. A thing made out of wood, like a
248 table or chair, a plank, or a club, is a \foreignlanguage{greek}{ξύλον},
249 but so is a naturally woody thing, like a tree. The (nominative and
250 accusative) plural is `\foreignlanguage{greek}{ξύλα}' (`xula').
252 The initial `x' is a `ks', not a `z' like the word's descendants have
253 ended up with in English. The `u' is fronted, like French `u', or
254 German `ü'. When the Romans started borrowing Greek vocabulary, they
255 realized that they didn't have a way to represent this sound, so they
256 added a new letter `y' for it. Hence, the Latin transliterations would
257 have been `xylon' and `xyla'.
259 The accent is on the first syllable, but ancient Greek had a pitch
260 accent, not the stress accent that English has, so the first syllable is
261 higher pitched than the second.
263 There's a more specific word `\foreignlanguage{greek}{δένδρον}'
264 (`dendron') which just means `tree', but \foreignlanguage{greek}{ξύλον}
265 is just an intrinsically better word.
270 Xyla is free software. You can modify and/or redistribute it under the
271 terms of the GNU Lesser General Public License, version 3, as published
272 by the Free Software Foundation, or, at your option, any later version.
275 * COMMENT Emacs cruft
277 # LocalWords: Adelson autoreconf AVL BTreeMap Cbuild cd CMake fis Landis
278 # LocalWords: libavl libstdc memcpy mkdir pre QP rebalancing treap's treaps
279 # LocalWords: Velsky xyla Xyla's Andersson