DisOrder Database Versions
==========================

If no _dbversion global preference is found then database version 1 is
assumed.  Database versions 2 and above always have a _dbversion
global preference.
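
As a minimal sketch of the rule above, assuming the global preferences
are available as a plain mapping (the function name and the dict are
hypothetical, not part of DisOrder):

```python
def effective_dbversion(global_prefs):
    """Return the database version implied by the global preferences.

    `global_prefs` is a hypothetical dict standing in for the stored
    global preferences.  A missing _dbversion entry means version 1.
    """
    raw = global_prefs.get("_dbversion")
    if raw is None:
        return 1
    return int(raw)
```

For example, effective_dbversion({}) gives 1, while
effective_dbversion({"_dbversion": "2"}) gives 2.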

Old database versions can be PARTIALLY emulated for testing purposes
by setting the undocumented dbversion configuration item.  Setting it on
a production system would be a terrible idea.

Database Version 1
------------------

Path names are in UTF-8, but with no normalization applied: you get
whatever the filesystem gives you.

Search terms are split according to the old words() function.
  - "/", ".", "+", "&", ":", "_" and "-" are considered to be separators
  - anything in General_Category Cc, Cf, Co, Cs, Zl, Zp, Zs, Pe or Ps
    is considered to be a separator
  - anything else in General_Category Ll, Lm, Lo, Lt, Lu, Nd, Nl, No,
    Sc, Sk, Sm or So is considered to be part of a word
  - everything else is ignored
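
The rules above can be sketched roughly as follows.  This is an
illustration in Python, not the words() implementation itself: the
function name is hypothetical, the separator set uses the standard
category names Zl, Zp and Zs for line, paragraph and space separators,
and "ignored" is taken to mean that the character is dropped without
ending the current word.

```python
import unicodedata

# Punctuation treated as a separator regardless of its category.
EXPLICIT_SEPARATORS = set('/.+&:_-')

# General_Category values treated as separators.
SEPARATOR_CATEGORIES = {"Cc", "Cf", "Co", "Cs", "Zl", "Zp", "Zs", "Pe", "Ps"}

# General_Category values treated as part of a word.
WORD_CATEGORIES = {"Ll", "Lm", "Lo", "Lt", "Lu", "Nd", "Nl", "No",
                   "Sc", "Sk", "Sm", "So"}


def split_words_v1(text):
    """Split text into words following the version 1 rules (sketch)."""
    words = []
    current = []
    for ch in text:
        cat = unicodedata.category(ch)
        if ch in EXPLICIT_SEPARATORS or cat in SEPARATOR_CATEGORIES:
            # A separator ends the current word, if there is one.
            if current:
                words.append("".join(current))
                current = []
        elif cat in WORD_CATEGORIES:
            current.append(ch)
        # Anything else is ignored: dropped without ending the word
        # (an assumption about what "ignored" means here).
    if current:
        words.append("".join(current))
    return words
```

Under these assumptions, split_words_v1("Don't Stop/Me_Now (live)")
yields the words Dont, Stop, Me, Now and live: the apostrophe is
ignored, while the space, "/", "_" and the brackets act as separators.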

Search terms are case-folded by applying the CaseFolding.txt mapping,
without any attempt at normalization.
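
To see why folding without normalization matters, consider two
canonically equivalent spellings of the same letter.  The sketch below
uses Python's str.casefold() as a stand-in for the CaseFolding.txt
mapping; whether it matches the old mapping entry for entry is an
assumption.

```python
# Two canonically equivalent spellings of the same letter.
precomposed = "\u00c9"   # LATIN CAPITAL LETTER E WITH ACUTE
decomposed = "E\u0301"   # LATIN CAPITAL LETTER E + COMBINING ACUTE ACCENT

# Case-fold each one without normalizing first.
folded_precomposed = precomposed.casefold()
folded_decomposed = decomposed.casefold()

# The folded forms still differ as code point sequences, so a search
# term typed one way would not match a track name stored the other way.
terms_match = folded_precomposed == folded_decomposed
```

Here terms_match is False: each string is folded code point by code
point, and nothing brings the two forms together.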

Database Version 2
------------------

Path names are in UTF-8, normalized to NFC.
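
A minimal sketch of this normalization, using Python's unicodedata
module (the function name is hypothetical, and the real server does
this in its own code):

```python
import unicodedata


def normalize_path_v2(path_bytes):
    """Decode a UTF-8 path name and return it re-encoded in NFC."""
    text = path_bytes.decode("utf-8")
    return unicodedata.normalize("NFC", text).encode("utf-8")
```

A file name that the filesystem reports as "Cafe" followed by a
combining acute accent is stored with the single precomposed character
instead, so both spellings end up as the same path name.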

Search terms are split according to the default Unicode word boundary
detection algorithm.

Search terms are case-folded using the Unicode case-folding algorithm,
normalizing to NFKD.
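
One way to read this is as the compatibility caseless matching
transform defined by the Unicode Standard,
NFKD(toCasefold(NFKD(toCasefold(NFD(X))))).  That reading is an
assumption, as is the function name; the sketch uses Python's
str.casefold() for toCasefold.

```python
import unicodedata


def casefold_v2(term):
    """Fold a search term for compatibility caseless matching (sketch)."""
    nfd = unicodedata.normalize("NFD", term)
    once = unicodedata.normalize("NFKD", nfd.casefold())
    # Folding and decomposing a second time catches characters whose
    # decomposition exposes further case-foldable characters.
    return unicodedata.normalize("NFKD", once.casefold())
```

With this transform the precomposed and decomposed spellings of a
capital E with acute accent fold to the same string, and a term
containing the "fi" ligature (U+FB01) folds to plain "fi".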

Things that haven't been done yet:
  - undump support for new dbversion
  - automatic upgrade from dbversion 1