DisOrder Database Versions
==========================

If no _dbversion global preference is found then database version 1 is
assumed.  Database versions 2 and above always have a _dbversion
global preference.

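The detection rule can be sketched as follows.  This is only an
illustration: it assumes the global preferences have already been read
into a plain dictionary, and detect_dbversion is a hypothetical name,
not a DisOrder function.

```python
def detect_dbversion(global_prefs):
    """Return the database version implied by the global preferences.

    Hypothetical helper: global_prefs stands in for however the real
    database exposes its global preferences.
    """
    raw = global_prefs.get("_dbversion")
    if raw is None:
        # No _dbversion preference at all: assume database version 1.
        return 1
    # Versions 2 and above always record their version explicitly.
    return int(raw)
```
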
Old database versions can be PARTIALLY emulated for testing purposes
by setting the undocumented dbversion configuration item.  Setting it
on a production system would be a terrible idea.

Database Version 1
------------------

Path names are in UTF-8, but with no normalization applied: you get
whatever the filesystem gives you.

Search terms are split according to the old words() function.
- "/", ".", "+", "&", ":", "_" and "-" are considered to be separators
- anything in General_Category Cc, Cf, Co, Cs, Zl, Zp, Zs, Pe or Ps
  is considered to be a separator
- anything else in General_Category Ll, Lm, Lo, Lt, Lu, Nd, Nl, No,
  Sc, Sk, Sm or So is considered to be part of a word
- everything else is ignored

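The splitting rules above can be approximated in a few lines of
Python.  This is a sketch of the rules as listed, not the original
words() implementation: split_words_v1 is a hypothetical name, the
separator set is assumed to include the space and paragraph separator
categories Zs and Zp, and "ignored" is taken to mean that the
character is dropped without ending the current word.

```python
import unicodedata

# Characters that always separate words, whatever their category.
SEPARATOR_CHARS = set("/.+&:_-")
# General_Category values treated as separators (Zp and Zs assumed).
SEPARATOR_CATEGORIES = {"Cc", "Cf", "Co", "Cs", "Zl", "Zp", "Zs", "Pe", "Ps"}
# General_Category values treated as part of a word.
WORD_CATEGORIES = {"Ll", "Lm", "Lo", "Lt", "Lu", "Nd", "Nl", "No",
                   "Sc", "Sk", "Sm", "So"}


def split_words_v1(text):
    """Split text into words following the version 1 rules (sketch)."""
    words = []
    current = []
    for ch in text:
        category = unicodedata.category(ch)
        if ch in SEPARATOR_CHARS or category in SEPARATOR_CATEGORIES:
            # A separator ends the current word, if there is one.
            if current:
                words.append("".join(current))
                current = []
        elif category in WORD_CATEGORIES:
            current.append(ch)
        # Anything else is ignored: dropped without splitting the word.
    if current:
        words.append("".join(current))
    return words
```

For example, split_words_v1("foo-bar/baz.ogg") yields the four words
foo, bar, baz and ogg, while the apostrophe in "don't" is ignored and
the word becomes dont.
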
Search terms are case-folded by applying the CaseFolding.txt mapping,
without any attempt at normalization.

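One consequence of folding without normalization is that canonically
equivalent spellings of a term stay distinct.  The sketch below uses
Python's str.casefold(), which applies Unicode full case folding, as a
stand-in; whether version 1 used exactly the same subset of
CaseFolding.txt is an assumption, and casefold_v1 is a hypothetical
name.

```python
def casefold_v1(term):
    """Case-fold a search term with no normalization step (sketch)."""
    return term.casefold()


# The same visible letter, once precomposed and once as base letter
# plus combining acute accent.
composed = casefold_v1("\u00c9")
decomposed = casefold_v1("E\u0301")

# Without normalization the two folded forms still differ.
still_distinct = composed != decomposed
```
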
Database Version 2
------------------

Path names are in UTF-8, normalized to NFC.

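As an illustration of what NFC normalization does to a path name, the
following sketch uses Python's unicodedata module.  normalize_path_v2
and the example path are hypothetical; DisOrder has its own
normalization code.

```python
import unicodedata


def normalize_path_v2(path):
    """Return the path with its text normalized to NFC (sketch)."""
    return unicodedata.normalize("NFC", path)


# A filesystem may hand back "e" followed by a combining acute accent;
# NFC composes the pair into the single code point U+00E9.
decomposed_path = "/music/Cafe\u0301.ogg"
normalized_path = normalize_path_v2(decomposed_path)
```
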
Search terms are split according to the default Unicode word boundary
detection algorithm.

Search terms are case-folded using the Unicode case-folding algorithm,
normalizing to NFKD.

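A rough Python equivalent of folding with NFKD normalization is shown
below.  It follows the Unicode Standard's definition of compatibility
caseless matching (normalize to NFD, fold, normalize to NFKD, fold
again, normalize to NFKD); that DisOrder performs exactly these steps
in this order is an assumption, and casefold_v2 is a hypothetical
name.

```python
import unicodedata


def casefold_v2(term):
    """Case-fold a search term and normalize it to NFKD (sketch)."""
    decomposed = unicodedata.normalize("NFD", term)
    folded = unicodedata.normalize("NFKD", decomposed.casefold())
    # Fold and normalize once more, since NFKD can expose characters
    # that still need folding.
    return unicodedata.normalize("NFKD", folded.casefold())


# Precomposed and decomposed spellings now fold to the same key.
same_key = casefold_v2("\u00c9") == casefold_v2("E\u0301")
```

Compatibility characters are also flattened by this process, so a term
containing the "fi" ligature U+FB01 folds to the same key as one
spelled with the two separate letters.
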
Things that haven't been done yet:
- undump support for new dbversion
- automatic upgrade from dbversion 1