X-Git-Url: http://www.chiark.greenend.org.uk/ucgi/~yarrgweb/git?a=blobdiff_plain;f=pctb%2FREADME.charset;h=770d3f3c29313a6b4d6c30f936e7fe7ecc6952e9;hb=ce5b169e0b224fd9233736ef49eb9478d0b9f2cb;hp=49bc9c7eb0bae10f3e2e8c3e62aa64ced0d9ff33;hpb=d557fcda202bbf0217ceb2819c0adfc7c33a77fb;p=ypp-sc-tools.db-live.git

diff --git a/pctb/README.charset b/pctb/README.charset
index 49bc9c7..770d3f3 100644
--- a/pctb/README.charset
+++ b/pctb/README.charset
@@ -140,11 +140,17 @@ errors.
 
 If you think you have made mistakes answering OCR queries (for
 example, the recognised data is wrong), you should delete the file
-#local-char*#.txt, which contains your local updates. It will then
+_local-char*.txt, which contains your local updates. It will then
 only use the centrally provided (and vetted) master file (which is
 automatically updated when you run the PCTB client, by default).
 
-Alternatively you can edit #local-char15#.txt with a text editor. The
+It is also possible to have the OCR system reject particular strings.
+If you put a regexp in _local-reject.txt, any OCR result which
+matches this string will instead cause an OCR failure, invoking the
+OCR dictionary editor if appropriate. _master-reject.txt is the
+centrally maintained version of this file.
+
+Alternatively you can edit _local-char*.txt with a text editor. The
 format is not documented at the moment.
 
 
@@ -152,9 +158,11 @@ Enabling interactive character set update
 -----------------------------------------
 
 Now that you have read this document, you should rerun your OCR job
-with the --edit-charset option. So
-  ./ypp-commodities --edit-charset
-In future, always run it with the --edit-charset option.
+with the --edit-charset option. So run
+  ./ypp-commodities --edit-charset
+In future, this option is not usually needed, because it is the
+default if there is a local character set dictionary _local-.txt
+for the relevant character height.
 
 With --edit-charset, when the OCR finds characters it does not
 understand, it will put up an OCR resolution query window. This will