X-Git-Url: http://www.chiark.greenend.org.uk/ucgi/~yarrgweb/git?a=blobdiff_plain;f=pctb%2FREADME;h=e9f4f097b06adb1e6fe73a9c4a5a5b34766170a7;hb=2c6eeb55b7339bf639ab93826446ffb4390e8f78;hp=003df0ce009eff0690535e20554a2e846bf1ce39;hpb=e888c1dd3476ca49bccf82b93b4a3633587d400d;p=ypp-sc-tools.web-live.git

diff --git a/pctb/README b/pctb/README
index 003df0c..e9f4f09 100644
--- a/pctb/README
+++ b/pctb/README
@@ -12,6 +12,8 @@ To run it, change to this directory, type `make', and then:
 While it is capturing the screenshots, do not move the mouse or use
 the keyboard. Keyboard focus must stay in the YPP client window.
 
+You will probably need to turn off `Use antialiased font' in the YPP
+client. This is in the Ye panel, Options, tab `General'.
 
 Command-line options
 --------------------
@@ -28,8 +30,12 @@ Options to vary the processing:
   --screenshot-file F  Store or read screenshots in F rather than #pages#.pnm
   --window-id ID       Specified X window is the YPP client - do not search
   --edit-charset       Enable character set editing. See README.dictionary.
+  --find-island        Find and print the ocean and island. Suppresses OCR
+                       and output unless used with result processing option.
+  --test-servers       Set default servers to be the test servers, not
+                       the real live ones (doesn't affect explicit settings).
 
-Controlling what happens to the results:
+Controlling what happens to the results - only one at a time:
   --upload             (default) Upload to the PCTB server
   --tsv                Print data as clean tab-separated-values file
   --raw-tsv            Dump the raw (not deduped, unsorted) OCR'd data
@@ -69,17 +75,24 @@ The program reads and writes the following files:
    it. Don't try `display vid:#pages#.pnm' as this will consume truly
    stupendous quantities of RAM - it wedged my laptop.
 
- * charset-15.txt
+ * #master-char*#.txt  #local-char*#.txt
+   #master-pixmap#.txt #local-pixmap#.txt
 
-   Character set dictionary. For the semantics of the contents of this
-   file see README.charset. There is not currently any accurate
+   Character set and image dictionaries. For the semantics of the
+   char* files README.charset. There is not currently any accurate
    documentation of this dictionary format.
 
-   If you delete this file you'll have to re-enter a lot of glyph data
-   (and probably get it wrong and make the program misrecognise
-   things). If you want to undo any mistakes you may have made
-   answering OCR questions you can safely revert this to the version
-   I've supplied.
+   #master-*#.txt contain the centrally defined and approved data.
+   They are downloaded automatically from the SC PCTB server and
+   updated each run. You can safely delete this file, if everything
+   is online, if you want to fetch a fresh copy.
+
+   #local-*#.txt are a local copy of your submissions, so that they
+   will be used by your client pending approval by me. You can delete
+   this file if you think you may have made a mistake.
+
+   See README.privacy for details of the communications with the SC
+   server about the contents of these dictionaries.
 
  * #commodmap#.tsv
 
@@ -87,6 +100,12 @@ The program reads and writes the following files:
    server. This is fetched and updated automatically as necessary.
    It can safely be deleted as it will then be refetched.
 
+ * #upload-1#.html #upload-2#.html
+
+   We screenscrape the pages from the PCTB upload server. The actual
+   HTML returned from the upload server is left in these dropping
+   files for debugging etc.
+
  * .new
 
    When any of these tools overwrite one of the persistent dictionary
@@ -112,6 +131,7 @@ This program has quite a few dependencies:
  - pnm command line utilities for image manipulation    netpbm
  - X11 libraries, including dev files for building      libx11-dev
  - XTEST library, including dev files for building      libxtst-dev
+ - Perl-compatible regexp library, including dev files  libpcre3-dev
  - Tk interpreter /usr/bin/wish                         tk8.4
  - Perl module XML::Parser                              libxml-parser-perl
  - Perl module JSON::Parser                             libjson-perl
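
Note on the dictionary hunk: the new text says the #master-*#.txt files can be deleted to fetch a fresh copy. These names begin with `#`, which an unquoted shell word would treat as the start of a comment, so a scripted cleanup is less error-prone. The following is only a minimal sketch. The helper name `remove_master_dictionaries` is hypothetical, and it assumes the files sit directly in the pctb directory; neither comes from the tools themselves.

```python
import glob
import os


def remove_master_dictionaries(directory="."):
    """Delete cached #master-*#.txt files and return the names removed.

    Hypothetical helper, not part of the pctb tools.  It leaves the
    #local-*#.txt files alone, since those hold your own submissions.
    """
    # glob treats '#' as a literal character; only '*' is a wildcard here.
    # glob.escape() protects any wildcard characters in the directory name.
    pattern = os.path.join(glob.escape(directory), "#master-*#.txt")
    removed = []
    for path in sorted(glob.glob(pattern)):
        os.remove(path)
        removed.append(os.path.basename(path))
    return removed
```

Calling `remove_master_dictionaries()` from the pctb directory would clear the cached master files. According to the README text they are downloaded again on the next run, provided the server is reachable.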