X-Git-Url: http://www.chiark.greenend.org.uk/ucgi/~yarrgweb/git?a=blobdiff_plain;f=pctb%2FREADME.charset;h=f2bfc7dcef96a34c9d5f1dcff3ac00b96d6664a7;hb=997f6b419424acb60c5277b8c525df6338422ab9;hp=bbabb057e6bedd1016f92dd1d2fc63acc41ac24f;hpb=2337ae5465a29659b44037dcbdaf6fa03eb46d84;p=ypp-sc-tools.db-live.git
diff --git a/pctb/README.charset b/pctb/README.charset
index bbabb05..f2bfc7d 100644
--- a/pctb/README.charset
+++ b/pctb/README.charset
@@ -1,19 +1,14 @@
-Character set query tool, and semantics of the glyphs
------------------------------------------------------
+Handling OCR failures
+---------------------
 
-Sometimes the OCR will not be able to recognise some text and you will
-have to help it out. It will display the part it is having trouble
-with, showing where it has got to, and allow you to edit the character
-set database it uses for recognising the text.
+Sometimes the OCR will not be able to recognise some text. By
+default, when this happens, the program will stop with a fatal error
+and refer you to this document.
 
-*This is subtle* and it is important to understand the way the
-machinery works, and the possible mistakes you can make, before
-answering the program. *Please read this documentation*
-
-If you need help please ask me (ijackson@chiark.greenend.org.uk, or
-Aristarchus on Midnight in game if I'm on line, or ask any pirate of
-the crew Special Circumstances if they happen to know where I am
-and/or can get in touch).
+It is possible to fix this by editing the character set database used
+by the OCR algorithm. But it is important to get these inputs right
+or your client may misrecognise text in future. You *must* read the
+documentation here first.
 
 Recognition algorithm
 ---------------------
@@ -115,6 +110,33 @@ copy of charset-15.txt from
 http://www.chiark.greenend.org.uk/~ijackson/ypp-sc-tools/master/pctb/charset-15.txt
 
 
+Enabling interactive character set update
+-----------------------------------------
+
+Now that you have read this document, you should rerun your OCR job
+with the --edit-charset option. You probably want to supply --same as
+well, to avoid having to wait for it to page through and recapture all
+the screenshots. So, this time,
+ ./ypp-commodities --edit-charset --same
+and in future, just always run it with the --edit-charset option.
+
+With --edit-charset, when the OCR finds characters it does not
+understand, it will put up an OCR resolution query window. This will
+display the part of the text it is having trouble with, showing where
+it has got to, and allow you to edit the character set database it
+uses for recognising the text.
+
+*This is subtle* and it is important to understand the way the
+machinery works, and the possible mistakes you can make, before
+answering the program. *Please read this documentation*, which
+explains the meaning of the entries you make.
+
+If you need help please ask me (ijackson@chiark.greenend.org.uk, or
+Aristarchus on Midnight in game if I'm on line, or ask any pirate of
+the crew Special Circumstances if they happen to know where I am
+and/or can get in touch).
+
+
 Send me your updates
 --------------------
 
@@ -123,3 +145,6 @@ characters, they are added there. If you do this, please email me
 your charset file (ijackson@chiark.greenend.org.uk) so that I can
 include your contributions in future versions. This will also let me
 check that they seem right :-).
+
+In future I may have the program phone home automatically so that I
+can double-check your answers and distribute them in the next version.