documentation: parametrize the search binary data type sizes.
Needed in order to support more than 65k symbols or files larger than 16
MB. What I thought was "more than enough" during the initial design was
quickly outgrown by various projects, including my own Magnum Python
bindings.
To avoid either maintaining two separate formats with two separate
encoders/decoders or needlessly inflating the format for everyone, certain
data types are parametrized based on how large the data is:
* RESULT_ID_BYTES describes how many bytes are needed to store result
IDs. By default it's 2 (so 65536 results), but it can also be 3 (16M
results) or 4.
* FILE_OFFSET_BYTES describes how many bytes are needed to store file
offsets. By default it's 3 (so 16 MB), but it can also be 4.
* NAME_SIZE_BYTES describes how many bytes are needed to store various
name lengths (prefix and suffix lengths etc.). By default it's 1 (so 256
bytes at most), but it can also be 2.
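As an illustration, an encoder could pick the smallest allowed width for each parameter from the data it's about to write. This is a hypothetical sketch: `smallest_width()` and `choose_type_sizes()` are made-up names, not the actual encoder API, and all values are assumed to be unsigned.

```python
from __future__ import annotations


def smallest_width(max_value: int, allowed: tuple[int, ...]) -> int:
    """Return the smallest byte width from `allowed` that can hold max_value."""
    for width in allowed:
        if max_value < 1 << (8 * width):
            return width
    raise OverflowError(f"{max_value} doesn't fit into {max(allowed)} bytes")


def choose_type_sizes(result_count: int, file_size: int,
                      longest_name: int) -> dict[str, int]:
    """Hypothetical helper deriving the three parameters from the data."""
    return {
        # Result IDs go from 0 to result_count - 1
        "RESULT_ID_BYTES": smallest_width(max(result_count - 1, 0), (2, 3, 4)),
        # Assuming an offset may point up to (and including) the file end
        "FILE_OFFSET_BYTES": smallest_width(file_size, (3, 4)),
        # Longest prefix / suffix length that has to be stored
        "NAME_SIZE_BYTES": smallest_width(longest_name, (1, 2)),
    }
```

With 70000 results the IDs no longer fit into 2 bytes and the sketch switches to 3, while small files and short names keep the default widths. The decoder then has to learn the chosen widths from the file itself.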
At first I tried to preserve 32-bit alignment as much as possible, but
eventually realized it's completely unimportant in the browser
environment -- there are far worse performance pitfalls than reading an
unaligned value. This is also why there are 24-bit integer types, even
though they're quite annoying to pack from Python.
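The annoyance comes from Python's `struct` module having no format code for 3-byte integers. A minimal sketch of two common workarounds, assuming little-endian storage (the function names are hypothetical):

```python
import struct


def pack_uint(value: int, width: int) -> bytes:
    # int.to_bytes() accepts any width, including 3, and raises
    # OverflowError if the value doesn't fit
    return value.to_bytes(width, "little")


def pack_uint24_struct(value: int) -> bytes:
    # struct has no 3-byte code, so pack as 32-bit and drop the top byte
    if not 0 <= value < 1 << 24:
        raise OverflowError(f"{value} doesn't fit into 24 bits")
    return struct.pack("<I", value)[:3]


def unpack_uint(data: bytes, offset: int, width: int) -> int:
    # Read an unaligned little-endian value of arbitrary width
    return int.from_bytes(data[offset:offset + width], "little")
```

The `int.to_bytes()` variant generalizes to every width the parametrized format needs, which makes it the more convenient of the two.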
Furthermore, the original hack that reserved 11 bits for the result count
at the cost of having only 4 bits for the child count was replaced by
expanding the result count to a 15-bit value if there are more than 127
results. This involves some endianness tricks, but it's much cleaner than
before. I briefly considered having a global RESULT_COUNT_BYTES parameter
as well, but since >90% of result counts fit into 8 bits and only weird
outliers like Python __init__() need more, it would be a giant waste of
precious bytes.
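One way such a 1-or-2-byte result count could work is sketched below. It's an assumption, not necessarily the exact layout used here: counts up to 127 take a single byte, larger ones are stored as a big-endian 16-bit value with the top bit set, so the decoder can tell the two cases apart by looking at the first byte alone.

```python
from __future__ import annotations


def encode_result_count(count: int) -> bytes:
    if not 0 <= count < 1 << 15:
        raise OverflowError(f"{count} doesn't fit into 15 bits")
    if count <= 127:
        # Top bit clear: the whole count is in this byte
        return bytes([count])
    # Big-endian, so the 0x80 flag bit ends up in the first byte
    return (0x8000 | count).to_bytes(2, "big")


def decode_result_count(data: bytes, offset: int) -> tuple[int, int]:
    """Return the count and the number of bytes it occupied."""
    first = data[offset]
    if first & 0x80 == 0:
        return first, 1
    # Strip the flag bit and combine with the following byte
    return ((first & 0x7f) << 8) | data[offset + 1], 2
```

Compared to a global RESULT_COUNT_BYTES, the common case costs a single byte and only the rare large counts pay for the second one.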
The minor differences in the test file sizes are due to:
* The header expanding the symbol count from 16 to 32 bits (+2B)
* The header containing a type description and associated padding (+4B)
* The result map no longer packing flags and offsets together, thus
saving one byte on flags (-1B)
To ensure there are no hardcoded type size assumptions anymore, the tests
now go through all type size combinations.
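Going through all combinations amounts to a Cartesian product of the allowed widths, 3 × 2 × 2 = 12 in total. A minimal sketch using `unittest` subtests; `encode_entry()` and `decode_entry()` are hypothetical stand-ins for the real encoder and decoder, and little-endian storage is assumed:

```python
import itertools
import unittest

# All widths allowed for RESULT_ID_BYTES, FILE_OFFSET_BYTES, NAME_SIZE_BYTES
TYPE_SIZE_COMBINATIONS = list(itertools.product((2, 3, 4), (3, 4), (1, 2)))


def encode_entry(result_id, file_offset, name_length, sizes):
    rid, off, name = sizes
    return (result_id.to_bytes(rid, "little")
            + file_offset.to_bytes(off, "little")
            + name_length.to_bytes(name, "little"))


def decode_entry(data, sizes):
    rid, off, name = sizes
    return (int.from_bytes(data[:rid], "little"),
            int.from_bytes(data[rid:rid + off], "little"),
            int.from_bytes(data[rid + off:rid + off + name], "little"))


class TypeSizes(unittest.TestCase):
    def test_round_trip(self):
        for sizes in TYPE_SIZE_COMBINATIONS:
            with self.subTest(sizes=sizes):
                # Largest value each width can hold, to catch truncation
                entry = tuple((1 << (8 * w)) - 1 for w in sizes)
                data = encode_entry(*entry, sizes)
                self.assertEqual(len(data), sum(sizes))
                self.assertEqual(decode_entry(data, sizes), entry)
```

Using the largest representable value for each width makes any place that still assumes a fixed size fail loudly for at least one combination.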