X-Git-Url: http://www.chiark.greenend.org.uk/ucgi/~yarrgweb/git?p=ypp-sc-tools.web-live.git;a=blobdiff_plain;f=pctb%2FREADME.charset;h=25eb5d88feb9e6d8ad603533610219eb34c682b5;hp=49bc9c7eb0bae10f3e2e8c3e62aa64ced0d9ff33;hb=276388b200fb1d938558ffea66938034e988da88;hpb=d557fcda202bbf0217ceb2819c0adfc7c33a77fb

diff --git a/pctb/README.charset b/pctb/README.charset
index 49bc9c7..25eb5d8 100644
--- a/pctb/README.charset
+++ b/pctb/README.charset
@@ -144,6 +144,12 @@ example, the recognised data is wrong), you should delete the file
 only use the centrally provided (and vetted) master file (which is
 automatically updated when you run the PCTB client, by default).
 
+It is also possible to have the OCR system reject particular strings.
+If you put a regexp in #local-reject#.txt, any OCR result which
+matches this string will instead cause an OCR failure, invoking the
+OCR dictionary editor if appropriate. #master-reject#.txt is the
+centrally maintained version of this file.
+
 Alternatively you can edit #local-char15#.txt with a text editor.
 The format is not documented at the moment.
 
@@ -152,9 +158,11 @@ Enabling interactive character set update
 -----------------------------------------
 
 Now that you have read this document, you should rerun your OCR job
-with the --edit-charset option. So
- ./ypp-commodities --edit-charset
-In future, always run it with the --edit-charset option.
+with the --edit-charset option. So run
+ ./ypp-commodities --edit-charset
+In future, this option is not usually needed, because it is the
+default if there is a local character set dictionary #local-#.txt
+for the relevant character height.
 
 With --edit-charset, when the OCR finds characters it does not
 understand, it will put up an OCR resolution query window. This will
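
Note on the first hunk: the added README text says that an OCR result
matching a regexp in #local-reject#.txt is turned into an OCR failure.
The following Python sketch illustrates that idea only. It is not the
ypp-sc-tools implementation: the names load_reject_patterns,
check_ocr_result and OcrRejected are hypothetical, the one-regexp-per-line
layout is an assumption (the README does not document the file format),
and Python's regexp syntax may differ from whatever the real tool uses.

```python
import re


class OcrRejected(Exception):
    """Hypothetical error standing in for an 'OCR failure'."""


def load_reject_patterns(text):
    # Assumption: one regexp per non-blank line of the reject file.
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            patterns.append(re.compile(line))
    return patterns


def check_ocr_result(result, patterns):
    # Treat any match anywhere in the string as a rejection.
    # Whether the real tool anchors its patterns is not stated in the README.
    for pattern in patterns:
        if pattern.search(result):
            raise OcrRejected(
                "%r matches reject pattern %r" % (result, pattern.pattern)
            )
    return result
```

In this sketch a caller would catch OcrRejected and then, if interactive
editing is enabled, hand the offending text to the dictionary editor
instead of accepting the recognised string.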