utf32_word_split() and utf8_word_split() split a string into words
using the UAX #29 word boundary algorithm. words() is therefore now a
wrapper around this. There is scope for improvement in the use of
this function as currently we do some needless converting back and
forth between encoding forms.
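
As a rough illustration of the layering described above, and of where the
back-and-forth conversion arises, here is a minimal Python sketch. The names
split_code_points() and split_utf8() are hypothetical, not the functions from
this commit, and the splitter is a deliberately simplified stand-in (runs of
alphanumeric code points), not the UAX #29 algorithm.

```python
def split_code_points(cps):
    """Hypothetical stand-in for a UTF-32 splitter.

    Groups consecutive alphanumeric code points into words. This is NOT
    UAX #29; it only marks where the real boundary logic would sit.
    """
    words, current = [], []
    for cp in cps:
        if chr(cp).isalnum():
            current.append(cp)
        elif current:
            words.append(current)
            current = []
    if current:
        words.append(current)
    return words


def split_utf8(data):
    """Hypothetical UTF-8 wrapper around the code-point splitter.

    The decode before the call and the encode after it are the kind of
    round trip between encoding forms that a caller already holding
    code points would not need.
    """
    cps = [ord(ch) for ch in data.decode("utf-8")]
    return ["".join(map(chr, word)).encode("utf-8")
            for word in split_code_points(cps)]
```

A caller holding UTF-8 pays for one decode and one encode per call; if it then
converts the results again for its own purposes, that is the needless work the
message refers to.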
casefold() now uses the compatibility case-folding algorithm, which
seems more appropriate for searching.
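
For readers unfamiliar with the term, the following Python sketch shows what
compatibility case folding does, following the Unicode Standard's definition
of a compatibility caseless match: NFKD(toCasefold(NFKD(toCasefold(NFD(X))))).
It uses Python's standard unicodedata module and str.casefold(); the function
name compat_casefold() is an assumption for illustration and this is not the
code from the commit.

```python
import unicodedata


def compat_casefold(s):
    """Sketch of compatibility case folding per the Unicode Standard."""
    # Canonical decomposition, then full case folding.
    s = unicodedata.normalize("NFD", s).casefold()
    # Compatibility decomposition can expose more foldable characters,
    # so fold again afterwards.
    s = unicodedata.normalize("NFKD", s).casefold()
    # Final compatibility decomposition gives the comparison form.
    return unicodedata.normalize("NFKD", s)
```

With this folding, fullwidth "ＡＢＣ" and plain "abc" compare equal, as do
"Straße" and "STRASSE", which is the behaviour one usually wants when matching
search terms.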
dbversions are now integers, not strings. Some dbversion=2
functionality can be selectively disabled for testing purposes.
README.dbversions documents the differences between the dbversions.
12 files changed: