1 #+TITLE: Xyla -- a fast, hard-mode binary tree library
3 #+EMAIL: <mdw@distorted.org.uk>
4 #+LaTeX_CLASS: strayman
5 #+LaTeX_HEADER: \usepackage[greek, english]{babel}
9 Xyla is a C library which implements a number of binary trees. A binary
10 trees is a versatile data structure which maintains a sequence of
11 /nodes/ and, when properly used, allow operations such as search,
12 insertion, deletion, splitting, and concatenation, in $O(\log n)$ time.
16 Xyla currently implements four kinds of binary trees:
18 + /Adelson-Velsky and Landis/ (`AVL') trees, which maintain a fairly
19 rigid height balance and offer good deterministic behaviour;
21 + /red-black/ trees, which have slightly faster update algorithms than
22 AVL trees, at the cost of maintaining a somewhat looser height
23 balance, but are otherwise fairly similar;
25 + /splay/ trees, which provide excellent /amortized/ performance, but
26 restructure the tree during searches, which makes them unsuitable
27 for concurrent access, and can offer poor performance for
28 /individual/ operations; and
30 + /treaps/, which provide good /expected/ performance, but require a
31 source of random bits which is independent of the data, and may
32 behave poorly with small but nonzero probability in any particular
35 ** Hard mode implementations
37 Xyla implements these trees in /hard mode/.
39 + Textbook search structures need to repeat a search to insert or
40 delete a node. In practice, it's common to want to want to update a
41 matching node if one exists, or create a new one if it doesn't, or
42 to have to example a node before taking the decision to remove it.
43 Xyla provides /probe/ operations, which search a tree and captures
44 the /path/ taken. The path from a failed search can be used to
45 insert a new node; the path from a successful search can be used to
46 remove the node found.
48 + Xyla places no requirements on node contents. For example, it
49 doesn't assume that nodes are sorted in any particular way.
51 + Xyla is written in obsessively portable C. The build system is
52 Unix-biased, but compiling Xyla is essentially trivial. Except for
53 the check/walk framework, intended for diagnostic purposes, Xyla
54 makes minimal demands on the C library, so it's suitable for use in
55 embedded environments. It uses only memory provided by the
56 application. By default, it makes use of only =memcpy= and
57 =assert=, but even these can be removed. See [[Freestanding build]]
60 + Xyla's algorithms are all nonrecursive.
62 + Xyla supports maintenance of arbitrary application-defined /summary
63 data/ for nodes, i.e., anything which can be computed efficiently
64 from a node itself and the summaries of its children. Summary data
65 can be used, for example, to implement determining the ordinal
66 position of a node within the tree (given its path), and search by
69 + Xyla implements join and split operations for all of its trees.
70 Trees can be joined with or without an explicit middle node; trees
71 can be split at any path, whether resulting from a successful or
72 unsuccessful search. It also implements set operations -- union,
73 intersection, and difference -- for trees which are ordered by a
78 Xyla is /fast/. This was kind of an accident: I was aiming at
79 `acceptable' performance and overachieved. In my testing, splay trees
80 offered speeds competitive with (i.e., closely behind, or actually
81 faster than) Rust's state-of-the-art =HashMap=; AVL trees and treaps
82 were somewhat slower, but still faster than Rust's =BTreeMap=,
83 competitive with GNU =libstdc++='s =std::unordered_map= (and
84 significantly faster on large data sets) and consistently faster than
85 =std::map=. Xyla's trees are consistently and significantly faster
86 than those in GNU =libavl=, and generally provide a more versatile,
87 capable, and flexible interface. Xyla's trees also compared favourably
88 with other less versatile search structures, such as QP tries.
93 I'm not actually selling anything here. I don't gain anything other
94 than a sense of personal satisfaction if you end up using Xyla in your
95 project. So why should you /not/ use Xyla?
97 ** Hash tables are fine, actually
99 Good hash tables are really fast. If your application can work with a
100 hash table and you have a good one lying about, you should totally use
101 it. That means that you don't care about maintaining elements in any
102 particular order; that you can tolerate an occasional long pause while
103 the hash table grows; and that you have a suitable hash function. This
104 last is not so easy: it needs to be fast, well distributed, and --
105 because everything needs more cryptography nowadays -- unpredictable by
110 Xyla isn't thread-safe. It doesn't have any global state or anything,
111 but you'll have to use read/write locks or something if you want to
112 access a Xyla tree from multiple concurrent threads. If you anticipate
113 enough contention that locking is a problem, or the thought of having to
114 slow down the read-only happy path by a couple of atomic operations
115 makes you sad, then use a data structure that's purpose-designed for the
118 Splay trees are particularly inappropriate for concurrent access,
119 because you'd need a fully exclusive lock even for read operations in
120 order to restructure the tree; otherwise, you don't get the good
121 amortized performance.
123 ** Some cool tricks are still out of reach
125 Xyla is hard-mode, but it's `taxing' rather than `mayhem'. It can't do
126 tricks like copy-on-write. (The reason it's difficult is that it
127 introduces a failure point into the middle of the fiddly tree
128 restructuring algorithms, which then need a recovery strategy. See
129 Tatham's [[https://www.chiark.greenend.org.uk/~sgtatham/tweak/btree.html][/An Efficient Data Structure for a Hex Editor/]] for why you
130 might want to do it anyway.)
132 ** You might hate the interface
134 If you like your interfaces abstract, high-level, type-safe, sealed up,
135 and pre-packaged, then Xyla isn't for you. Implementation details are
136 exposed all over. The interface makes use of =void *= arguments. I've
137 tried to minimize them, but type casts are necessary in some places.
138 Overall, I've found that this approach improves the interface
139 ergonomics, improves speed, keeps bloat down, and avoids potential
147 Xyla uses a traditional Autotools build system.
149 If you're building from Git, you'll need GNU Autoconf, Automake, and
150 Libtool, and the Autoconf Archive; even ancient versions of these will
151 be fine. To prepare the working tree, run
155 If you're building from a source tarball, then this has already been
158 The soak test needs Python. Even Python 2.7 will work.
163 : (cd build && ../configure)
165 : make -j12 -Cbuild/ check
166 : make -j12 -Cbuild/ install
168 The included =INSTALL= file explains how drive the =configure= script
169 and =Makefile= if you're unfamiliar with this.
171 This will install header files, static and dynamic libraries, and a
172 =pkg-config= file, which is the preferred way for applications to find
175 ** Building on other systems
177 A fairly basic CMake-based build system is provided for non-Unix
178 targets. This doesn't get as much love and attention as the primary
179 Autotools system, though it is capable of building the library and
180 running most of the tests.
182 There is a booby-trap which prevents the CMake-based system from working
183 on Unix-like systems. Disarming the trap is left as an easy exercise.
184 Patches to make CMake more attractive on Unix systems are unwelcome.
186 ** Freestanding build
188 Xyla supports building for /freestanding/ C environments which lack a
189 run-time library; for example, embedded systems, or operating system
190 kernels. To make this work, the following steps are needed.
192 + Build Xyla with the preprocessor symbol =XYLA_FREESTANDING= defined
193 (to any value). This will omit the checking framework, and replace
194 the calls to =memcpy= with simple loops. Don't try to compile the
195 =*-check.c= files containing the checking routines, which require
196 memory allocation and I/O.
198 * Optionally, define =XYLA__ASSERT= to be an =assert=-like macro,
199 i.e., a macro of a single argument which will abort the program if
200 its argument is zero or null. If you don't provide a definition,
201 then Xyla will run without assertions.
203 + If you want to use treaps, then define a function
204 =xyla__treap_boundfail= (with no arguments), which will abort the
205 program. This is used in the extremely unlikely event that a
206 treap's height exceeds the calculated bounds; unfortunately, a
207 treap's structure is rigidly determined by the order of its elements
208 and their weights, so rebalancing is impossible and the situation is
214 + I'm thinking about a =touch= node operation to be called before a
215 node is changed. It would be able to return a replacement node to
216 act on instead, to be be spliced into the tree. This could be used
217 to implement copy-on-write, for example.
219 + Additional kinds of trees.
221 - I'm considering a Bayer tree -- essentially, an encoding of 2-3
222 trees as binary trees in the same way that red-black trees
223 encode 2-3-4 trees. I think I'd make these asymmetric, in the
224 same way that Andersson trees are asymmetric. The differences
225 from an actual Andersson tree are that each node only records a
226 single bit rather than a height, and that updates are much more
227 similar to a red-black tree than the `split' and `skew'
228 operations that Andersson describes. The nice feature of such
229 trees is that the first element has the shortest path from the
230 root of any leaf or semileaf, which is nice for some kinds of
234 * Compatibility policy
236 Xyla's programming interfaces are now stable.
238 I'm now committed to retaining source and binary compatibility
239 indefinitely. The cool kids these days seem to like changing and
240 removing things all the time, but that's just not the way I roll.
245 The ancient Greek word `\foreignlanguage{greek}{ξύλον}' (`xulon')
246 doesn't have a direct English equivalent. Pretty much any woody thing
247 is a \foreignlanguage{greek}{ξύλον}. A thing made out of wood, like a
248 table or chair, a plank, or a club, is a \foreignlanguage{greek}{ξύλον},
249 but so is a naturally woody thing, like a tree. The (nominative and
250 accusative) plural is `\foreignlanguage{greek}{ξύλα}' (`xula').
252 The initial `x' is a `ks', not a `z' like the word's descendants have
253 ended up with in English. The `u' is fronted, like French `u', or
254 German `ü'. When the Romans started borrowing Greek vocabulary, they
255 realized that they didn't have a way to represent this sound, so they
256 added a new letter `y' for it. Hence, the Latin transliterations would
257 have been `xylon' and `xyla'.
259 The accent is on the first syllable, but ancient Greek had a pitch
260 accent, not the stress accent that English has, so the first syllable is
261 higher pitched than the second.
263 There's a more specific word `\foreignlanguage{greek}{δένδρον}'
264 (`dendron') which just means `tree', but \foreignlanguage{greek}{ξύλον}
265 is just an intrinsically better word.
270 Xyla is free software. You can modify and/or redistribute it under the
271 terms of the GNU Lesser General Public License, version 3, as published
272 by the Free Software Foundation, or, at your option, any later version.
275 * COMMENT Emacs cruft
277 # LocalWords: Adelson autoreconf AVL BTreeMap Cbuild cd CMake fis Landis
278 # LocalWords: libavl libstdc memcpy mkdir pre QP rebalancing treap's treaps
279 # LocalWords: Velsky xyla Xyla's Andersson