documentation: parametrize the search binary data type sizes.
author Vladimír Vondruš <mosra@centrum.cz>
Sat, 8 Jan 2022 19:49:26 +0000 (20:49 +0100)
committer Vladimír Vondruš <mosra@centrum.cz>
Sun, 9 Jan 2022 15:51:50 +0000 (16:51 +0100)
commit b0cf44e4ddbf42ce79a8612563e84e00e8a75808
tree 8db744056778b82299ccf9b5e03324685e9b8d6a
parent c661a654d3d3837dfae92e48bfccde19ceea6dad

Needed in order to support more than 65k symbols or files larger than 16
MB. What I thought was "more than enough" during the initial design was
quickly exceeded by various projects, including my own Magnum Python
bindings.

To avoid having to either maintain two separate formats and two separate
en/decoders or needlessly inflate the format for everyone, certain data
types are parametrized based on how large the data is:

 * RESULT_ID_BYTES describes how many bytes are needed to store result
   IDs. By default it's 2 (so 65536 results), but it can also be 3 (16M
   results) or 4.
 * FILE_OFFSET_BYTES describes how many bytes are needed to store file
   offsets. By default it's 3 (so 16 MB), but it can also be 4.
 * NAME_SIZE_BYTES describes how many bytes are needed to store various
   name lengths (prefix, suffix lengths etc). By default it's 1 (so 256
   bytes at most), but it can also be 2.
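
As a rough illustration of the parametrization above, the following sketch
picks the smallest size for each of the three parameters. The function name
pick_type_sizes() and the exact boundary values are assumptions made for
this example, not the code in _search.py; in particular the real encoder
may reserve some values and switch to the larger type slightly earlier.

```python
def pick_type_sizes(result_count, file_size, max_name_length):
    """Return (RESULT_ID_BYTES, FILE_OFFSET_BYTES, NAME_SIZE_BYTES).

    Hypothetical helper: thresholds follow the limits quoted in the commit
    message, the real encoder may use slightly different boundaries.
    """
    # Result IDs: 2 bytes cover 65536 results, 3 bytes cover 16M, else 4
    if result_count <= 1 << 16:
        result_id_bytes = 2
    elif result_count <= 1 << 24:
        result_id_bytes = 3
    else:
        result_id_bytes = 4

    # File offsets: 3 bytes address up to 16 MB, else 4
    file_offset_bytes = 3 if file_size <= 1 << 24 else 4

    # Name lengths: a single byte can hold values 0..255, else 2 bytes
    name_size_bytes = 1 if max_name_length <= 255 else 2

    return result_id_bytes, file_offset_bytes, name_size_bytes
```

A small project would get (2, 3, 1), i.e. the defaults, so its search data
does not grow because of the larger projects.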

At first I tried to preserve 32-bit alignment as much as possible, but
eventually realized this is completely unimportant in the browser
environment -- there are far worse performance pitfalls than
reading an unaligned value. This is also why there are 24-bit integer
types, even though they're quite annoying to pack from Python.
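
To show why 24-bit integers are annoying to pack from Python: the struct
module has no 3-byte format code, so one possible workaround is to pack a
32-bit value and drop the top byte. This is only a sketch; the helper names
are hypothetical and little-endian byte order is an assumption here.

```python
import struct

def pack_uint24(value):
    """Pack an unsigned 24-bit integer into 3 little-endian bytes."""
    if not 0 <= value < 1 << 24:
        raise ValueError("value doesn't fit into 24 bits")
    # Pack as a 32-bit little-endian integer and cut off the (zero) top byte
    return struct.pack('<I', value)[:3]

def unpack_uint24(data, offset=0):
    """Read an unsigned 24-bit little-endian integer at the given offset."""
    # Pad with a zero byte so the 32-bit format code can be used
    return struct.unpack('<I', bytes(data[offset:offset + 3]) + b'\x00')[0]
```

The same result can be had with value.to_bytes(3, 'little') and
int.from_bytes(), which avoids the padding trick but doesn't combine with
other fields in a single struct format string.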

Furthermore, the original hack to reserve 11 bits for result count at
the cost of having only 4 bits for child count was changed to instead
expand the result count to a 15-bit value if there are more than 127
results. Some endianness tricks involved, but much cleaner than before.
I briefly
considered having a global RESULT_COUNT_BYTES parameter as well, but
considering >90% of result counts fit into 8 bits and this is only for
weird outliers like Python __init__(), it would be a giant waste of
precious bytes.
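
A minimal sketch of such a variable-width result count, assuming the top
bit of the first byte marks the two-byte form and the 15-bit value is
stored big-endian so that the flag is always read first. The exact bit
layout and byte order of the real format are not spelled out here, so treat
both function names and the layout as illustrative assumptions.

```python
def pack_result_count(count):
    """Encode a result count in 1 byte, or in 2 bytes if above 127."""
    if not 0 <= count < 1 << 15:
        raise ValueError("result count doesn't fit into 15 bits")
    if count <= 127:
        # Common case: a single byte with the top bit clear
        return bytes([count])
    # Rare case: set the top bit and store big-endian, so the flag ends up
    # in the first byte (assumed layout)
    return (count | 0x8000).to_bytes(2, 'big')

def unpack_result_count(data, offset=0):
    """Return (count, bytes consumed) for a count stored at the offset."""
    first = data[offset]
    if first & 0x80:
        return ((first & 0x7f) << 8) | data[offset + 1], 2
    return first, 1
```

With this scheme the common case still costs a single byte and only the
outliers pay for the second one.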

The minor differences in the test file sizes are due to:

 * The header expanding symbol count from 16 to 32 bits (+2B)
 * The header containing type description and associated padding (+4B)
 * The result map no longer packing flags and offsets together, thus
   saving one byte from flags (-1B)

To ensure there are no hardcoded type size assumptions anymore, the tests
now go through all type size combinations.
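
For reference, the combinations can be enumerated roughly like this. The
sketch only reproduces the ns/ri/fo suffixes visible in the test data file
names; the function name and the list constants are hypothetical, not the
ones used by the actual test code.

```python
import itertools

# Allowed sizes for each parametrized type, as listed in the commit message
NAME_SIZE_BYTES = [1, 2]
RESULT_ID_BYTES = [2, 3, 4]
FILE_OFFSET_BYTES = [3, 4]

def type_size_suffixes():
    """Return one file name suffix per type size combination."""
    return [
        'ns{}-ri{}-fo{}'.format(ns, ri, fo)
        for ns, ri, fo in itertools.product(
            NAME_SIZE_BYTES, RESULT_ID_BYTES, FILE_OFFSET_BYTES)
    ]

# Example: names of the test data files for the "empty" case
empty_files = ['empty-{}.bin'.format(s) for s in type_size_suffixes()]
```

That gives 2 x 3 x 2 = 12 variants per test data set, which is why each of
the empty, manyresults and searchdata files appears twelve times below.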
84 files changed:
documentation/_search.py
documentation/doxygen.py
documentation/python.py
documentation/search.js
documentation/test/_search_test_metadata.py
documentation/test/js-test-data/empty-ns1-ri2-fo3.bin [new file with mode: 0644]
documentation/test/js-test-data/empty-ns1-ri2-fo4.bin [new file with mode: 0644]
documentation/test/js-test-data/empty-ns1-ri3-fo3.bin [new file with mode: 0644]
documentation/test/js-test-data/empty-ns1-ri3-fo4.bin [new file with mode: 0644]
documentation/test/js-test-data/empty-ns1-ri4-fo3.bin [new file with mode: 0644]
documentation/test/js-test-data/empty-ns1-ri4-fo4.bin [new file with mode: 0644]
documentation/test/js-test-data/empty-ns2-ri2-fo3.bin [new file with mode: 0644]
documentation/test/js-test-data/empty-ns2-ri2-fo4.bin [new file with mode: 0644]
documentation/test/js-test-data/empty-ns2-ri3-fo3.bin [new file with mode: 0644]
documentation/test/js-test-data/empty-ns2-ri3-fo4.bin [new file with mode: 0644]
documentation/test/js-test-data/empty-ns2-ri4-fo3.bin [new file with mode: 0644]
documentation/test/js-test-data/empty-ns2-ri4-fo4.bin [new file with mode: 0644]
documentation/test/js-test-data/empty.bin [deleted file]
documentation/test/js-test-data/manyresults-ns1-ri2-fo3.bin [new file with mode: 0644]
documentation/test/js-test-data/manyresults-ns1-ri2-fo4.bin [new file with mode: 0644]
documentation/test/js-test-data/manyresults-ns1-ri3-fo3.bin [new file with mode: 0644]
documentation/test/js-test-data/manyresults-ns1-ri3-fo4.bin [new file with mode: 0644]
documentation/test/js-test-data/manyresults-ns1-ri4-fo3.bin [new file with mode: 0644]
documentation/test/js-test-data/manyresults-ns1-ri4-fo4.bin [new file with mode: 0644]
documentation/test/js-test-data/manyresults-ns2-ri2-fo3.bin [new file with mode: 0644]
documentation/test/js-test-data/manyresults-ns2-ri2-fo4.bin [new file with mode: 0644]
documentation/test/js-test-data/manyresults-ns2-ri3-fo3.bin [new file with mode: 0644]
documentation/test/js-test-data/manyresults-ns2-ri3-fo4.bin [new file with mode: 0644]
documentation/test/js-test-data/manyresults-ns2-ri4-fo3.bin [new file with mode: 0644]
documentation/test/js-test-data/manyresults-ns2-ri4-fo4.bin [new file with mode: 0644]
documentation/test/js-test-data/manyresults.bin [deleted file]
documentation/test/js-test-data/nested.bin
documentation/test/js-test-data/searchdata-ns1-ri2-fo3.b85 [new file with mode: 0644]
documentation/test/js-test-data/searchdata-ns1-ri2-fo3.bin [new file with mode: 0644]
documentation/test/js-test-data/searchdata-ns1-ri2-fo4.bin [new file with mode: 0644]
documentation/test/js-test-data/searchdata-ns1-ri3-fo3.bin [new file with mode: 0644]
documentation/test/js-test-data/searchdata-ns1-ri3-fo4.bin [new file with mode: 0644]
documentation/test/js-test-data/searchdata-ns1-ri4-fo3.bin [new file with mode: 0644]
documentation/test/js-test-data/searchdata-ns1-ri4-fo4.bin [new file with mode: 0644]
documentation/test/js-test-data/searchdata-ns2-ri2-fo3.bin [new file with mode: 0644]
documentation/test/js-test-data/searchdata-ns2-ri2-fo4.bin [new file with mode: 0644]
documentation/test/js-test-data/searchdata-ns2-ri3-fo3.bin [new file with mode: 0644]
documentation/test/js-test-data/searchdata-ns2-ri3-fo4.bin [new file with mode: 0644]
documentation/test/js-test-data/searchdata-ns2-ri4-fo3.bin [new file with mode: 0644]
documentation/test/js-test-data/searchdata-ns2-ri4-fo4.bin [new file with mode: 0644]
documentation/test/js-test-data/searchdata.b85 [deleted file]
documentation/test/js-test-data/searchdata.bin [deleted file]
documentation/test/js-test-data/short.bin
documentation/test/js-test-data/unicode.bin
documentation/test/js-test-data/wrong-magic.bin
documentation/test/js-test-data/wrong-result-id-bytes.bin [new file with mode: 0644]
documentation/test/js-test-data/wrong-version.bin
documentation/test/populate-js-test-data.py
documentation/test/test-search.js
documentation/test/test_search.py
documentation/test_doxygen/layout/pages.html
documentation/test_doxygen/layout_generated_doxyfile/index.html
documentation/test_doxygen/layout_minimal/index.html
documentation/test_doxygen/layout_search_binary/index.html
documentation/test_doxygen/layout_search_opensearch/index.html
documentation/test_doxygen/test_search.py
documentation/test_doxygen/test_undocumented.py
documentation/test_doxygen/undocumented/File_8h.html
documentation/test_doxygen/undocumented/annotated.html
documentation/test_doxygen/undocumented/classClass.html
documentation/test_doxygen/undocumented/dir_4b0d5f8864bf89936129251a2d32609b.html
documentation/test_doxygen/undocumented/files.html
documentation/test_doxygen/undocumented/group__group.html
documentation/test_doxygen/undocumented/namespaceNamespace.html
documentation/test_doxygen/undocumented/structNamespace_1_1ClassInANamespace.html
documentation/test_python/layout/index.html
documentation/test_python/layout_search_binary/index.html
documentation/test_python/layout_search_open_search/index.html
documentation/test_python/link_formatting/c.link_formatting.Class.Sub.html
documentation/test_python/link_formatting/c.link_formatting.Class.html
documentation/test_python/link_formatting/c.link_formatting.pybind.Foo.html
documentation/test_python/link_formatting/m.link_formatting.html
documentation/test_python/link_formatting/m.link_formatting.pybind.html
documentation/test_python/link_formatting/m.link_formatting.sub.html
documentation/test_python/link_formatting/p.page.html
documentation/test_python/link_formatting/s.classes.html
documentation/test_python/link_formatting/s.modules.html
documentation/test_python/link_formatting/s.pages.html
documentation/test_python/test_search.py