X-Git-Url: http://www.chiark.greenend.org.uk/ucgi/~yarrgweb/git?a=blobdiff_plain;f=pctb%2FREADME.charset;h=770d3f3c29313a6b4d6c30f936e7fe7ecc6952e9;hb=41ac8cfeadd4eac0927b8fce086c805c2753ba77;hp=31e1221d7b4340c1f2bb1ed68fb0366f69c0478c;hpb=a16191c13cdc1eab1e43cc9662c2271481b3a9b8;p=ypp-sc-tools.main.git

diff --git a/pctb/README.charset b/pctb/README.charset
index 31e1221..770d3f3 100644
--- a/pctb/README.charset
+++ b/pctb/README.charset
@@ -140,11 +140,17 @@ errors.
 
 If you think you have made mistakes answering OCR queries (for
 example, the recognised data is wrong), you should delete the file
-#local-char*#.txt, which contains your local updates. It will then
+_local-char*.txt, which contains your local updates. It will then
 only use the centrally provided (and vetted) master file (which is
 automatically updated when you run the PCTB client, by default).
 
-Alternatively you can edit #local-char15#.txt with a text editor. The
+It is also possible to have the OCR system reject particular strings.
+If you put a regexp in _local-reject.txt, any OCR result which
+matches this string will instead cause an OCR failure, invoking the
+OCR dictionary editor if appropriate. _master-reject.txt is the
+centrally maintained version of this file.
+
+Alternatively you can edit _local-char*.txt with a text editor. The
 format is not documented at the moment.
 
 
@@ -155,7 +161,7 @@ Now that you have read this document, you should rerun your OCR job
 with the --edit-charset option. So run
   ./ypp-commodities --edit-charset
 In future, this option is not usually needed, because it is the
-default if there is a local character set dictionary #local-#.txt
+default if there is a local character set dictionary _local-.txt
 for the relevant character height.
 
 With --edit-charset, when the OCR finds characters it does not
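
The reject mechanism described in the added README paragraph can be illustrated with a small sketch. This is not the PCTB client's code. It assumes that a reject file holds one regexp per line, that blank lines are ignored, and that a match anywhere in the OCR result counts as a rejection; none of these details are stated in the README. The names OcrRejected, load_reject_patterns and check_ocr_result are hypothetical, and Python's re syntax may differ from the regexp dialect the real tool uses.

```python
import re


class OcrRejected(Exception):
    """Raised when an OCR result matches a reject pattern (hypothetical name)."""


def load_reject_patterns(lines):
    """Compile one regexp per non-blank line (assumed file layout)."""
    patterns = []
    for line in lines:
        line = line.rstrip("\n")
        if line:
            patterns.append(re.compile(line))
    return patterns


def check_ocr_result(text, patterns):
    """Return text unchanged, or raise OcrRejected if any pattern matches.

    Uses search(), so a pattern may match anywhere in the string
    (an assumption; the real tool may anchor differently).
    """
    for pattern in patterns:
        if pattern.search(text):
            raise OcrRejected(
                f"{text!r} matches reject pattern {pattern.pattern!r}"
            )
    return text


# Example reject list, standing in for the contents of _local-reject.txt.
# The patterns themselves are made up for illustration.
patterns = load_reject_patterns(["^Rurn$\n", "\n", "l1quor\n"])
```

With these made-up patterns, check_ocr_result("Rum", patterns) returns the string unchanged, while check_ocr_result("Rurn", patterns) raises OcrRejected. In the real client the rejection would be treated as an OCR failure, which is the point at which the OCR dictionary editor may be invoked.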