X-Git-Url: https://www.chiark.greenend.org.uk/ucgi/~yarrgweb/git?p=ypp-sc-tools.db-test.git;a=blobdiff_plain;f=pctb%2FREADME.charset;h=0d7f1623d00dbb735e43a56e4e5a293008cb4e4f;hp=c57f2f5187a537b85c77f793493eefd9f7f0a3e3;hb=21b1420b1f35ea2ae9440f9db9009093a8b6eae2;hpb=8d6cf0f224b5df9866eba9350343067edcee78dd

diff --git a/pctb/README.charset b/pctb/README.charset
index c57f2f5..0d7f162 100644
--- a/pctb/README.charset
+++ b/pctb/README.charset
@@ -54,10 +54,11 @@ the uppercase and lowercase dictionaries; if one matches and the other
 doesn't, or one matches a wider character than the other, we use it.
 If that fails to resolve the ambiguity we must ask.
 
-*Do not* make an entry in the character set dictionary mapping `vertical
-stick' to `l' or `I'. Instead, select enough of the whole word in
-question that no word would start with the other letter, and enter the
-whole word or part of it as a new glyph as a new Word.
+*Do not* make an entry in the character set dictionary mapping
+`vertical stick' to `l' or `I'. Instead, select enough of the whole
+word in question that no word would start with the other letter, and
+enter the whole word or part of it as a new glyph as a new entry in
+the Word dictionary.
 
 For example, in the supplied dictionary there is already a glyph for
 `Iron'; this is OK because there are no words which start `lron'.
@@ -91,7 +92,8 @@ Overlapping characters - ligatures
 
 Some of the characters in the font used overlap with the next
 character. When this happens, select both the characters and enter
-them together as one glyph with a multi-character definition.
+them together as one glyph with a multi-character definition, as a new
+entry in the Lower or Upper dictionary.
 
 For example `yw' is rendered with the top right corner of the `y' and
 the top left corner of the `w' overlapping. This is dealt with by
@@ -102,9 +104,12 @@ containing `yw' and define it as `yw'.
 Fixing mistakes
 ---------------
 
-The OCR query UI allows you to delete things from the glyph dictionary.
-However since you are not guaranteed to actually get an OCR query at
-all if the dictionary contains errors, you shouldn't rely on this.
+The OCR query UI allows you to delete things from the glyph
+dictionary. However since you are not guaranteed to actually get an
+OCR query at all (and since it is not possible to override the
+presence of an entry in the master database with the absence of one in
+the local database), if the dictionary contains errors, you shouldn't
+rely on this.
 
 If you think you have made mistakes answering OCR queries (for
 example, the recognised data is wrong), you should delete the file
@@ -129,14 +134,22 @@ display the part of the text it is having trouble with, showing where
 it has got to, and allow you to edit the character set dictionary it
 uses for recognising the text.
 
-*This is subtle* and it is important to understand the way the
+The process is subtle and it is important to understand the way the
 machinery works, and the possible mistakes you can make, before
-answering the program. *Please read this documentation*, which
+answering the program. So *please read this documentation*, which
 explains the meaning of the entries you make.
 
-Also, the character set updates you make will by default be submitted
-to my server so that they can be checked by me and shared with other
-users. See README.privacy.
+You must specify the dictionary to which the new glyph should be
+added, by selecting the appropriate radiobutton or by pressing one of
+U D L W for Upper, Digit, Lower, Word. Word is only correct
+if the match failure is a new word starting with l or I (see
+above). Upper or Lower is correct for single letters and for new
+ligatures. Use Upper for punctuation and Digit for `>' and
+digits.
+
+The character set updates you make will by default be submitted to my
+server so that they can be checked by me and shared with other users.
+See README.privacy.
 
 If you need help please ask me (ijackson@chiark.greenend.org.uk, or
 Aristarchus on Midnight in game if I'm on line, or ask any pirate of