Rename yppsc-ocr-resolver to dictionary-manager, and database and charset to dictiona...

author Ian Jackson <ian@liberator.relativity.greenend.org.uk>

Sat, 20 Jun 2009 09:14:45 +0000 (10:14 +0100)

committer Ian Jackson <ian@liberator.relativity.greenend.org.uk>

Sat, 20 Jun 2009 09:14:45 +0000 (10:14 +0100)
diff --git a/pctb/README b/pctb/README
index 9402240f4845ef5a7df0fde465dacc9def9597f0..12612966d17304005f21794ee100dab017659475 100644
--- a/pctb/README
+++ b/pctb/README
@@ -27,7 +27,7 @@ Options to vary the processing:
    --quiet               Suppress progress messages
    --screenshot-file F   Store or read screenshots in F rather than #pages#.pnm
    --window-id ID        Specified X window is the YPP client - do not search
-  --edit-charset        Enable character set editing.  See README.charset.
+  --edit-dictionary     Enable dictionary editing.  See README.dictionary.
  
  Controlling what happens to the results:
    --upload (default) Upload to the PCTB server
@@ -58,9 +58,9 @@ The program reads and writes the following files:
  
   * charset-15.txt
  
-   Character set database.  For the semantics of the contents of this
+   Character set dictionary.  For the semantics of the contents of this
     file see README.charset.  There is not currently any accurate
-   documentation of this database format.
+   documentation of this dictionary format.
  
     If you delete this file you'll have to re-enter a lot of glyph data
     (and probably get it wrong and make the program misrecognise
@@ -73,9 +73,10 @@ The program reads and writes the following files:
     Map from commodity names to the numbers required by the PCTB
     server.  This is fetched and updated automatically as necessary.
     It can safely be deleted as it will then be refetched.
+
   * <file>.new
  
-   When any of these tools overwrite one of the persistent database
+   When any of these tools overwrite one of the persistent dictionary
     files, they temporarily write to <file>.new.
  
  These files are all in the current working directory.  There is not
diff --git a/pctb/README.charset b/pctb/README.dictionary
similarity index 89%
rename from pctb/README.charset
rename to pctb/README.dictionary
index f2bfc7dcef96a34c9d5f1dcff3ac00b96d6664a7..4fdc37dba10aa0decfe662e89f00bf4cea52d8f5 100644
--- a/pctb/README.charset
+++ b/pctb/README.dictionary
@@ -5,7 +5,7 @@ Sometimes the OCR will not be able to recognise some text.  By
  default, when this happens, the program will stop with a fatal error
  and refer you to this document.
  
-It is possible to fix this by editing the character set database used
+It is possible to fix this by editing the character set dictionary used
  by the OCR algorithm.  But, it is important to get these inputs right
  or your client may misrecognise text in future.  You *must* read the
  documentation here first.
@@ -19,7 +19,7 @@ of `glyph' bitmaps, against the bitmap in each cell in the commodity
  table.  We match from left to right.
  
  We do not insist that each glyph is followed by whitespace, and nor do
-we insist that glyphs do not contain whitespace.  Our glyph database
+we insist that glyphs do not contain whitespace.  Our glyph dictionary
  can contain entries which are strict prefixes of other entries - that
  is, a glyph for (say) `v' which is the leftmost part of another glyph
  for (say) `w'.  We resolve these ambiguities by taking the longest
@@ -29,7 +29,7 @@ So you should not be surprised if the program has matched the
  left-hand half of some letter and thinks it is a different letter.  If
  the part that it did recognise does look like the letter in question,
  that isn't wrong.  All you need to do is insert the whole of the
-actual letter in the database - move the LH cursor to the start of the
+actual letter in the dictionary - move the LH cursor to the start of the
  letter, and the RH cursor to its end, and hit `return' and enter the
  correct character.  The longest match rule will mean it will prefer
  the entry you have just made.
@@ -38,7 +38,7 @@ the entry you have just made.
  Upper vs lower case - important note regarding `l' and `I'
  ----------------------------------------------------------
  
-We maintain separate databases for upper and lower case.  At the
+We maintain separate dictionaries for upper and lower case.  At the
  beginning of each cell in the table, we expect uppercase; in the
  middle of a word we expect lowercase; and, unfortunately, after an
  inter-word gap, we are not sure.
@@ -47,12 +47,12 @@ This is troublesome because `l' and `I' look identical on the screen.
  So any time we see a word starting with `l' or `I', the program has to
  ask about it.
  
-*Do not* make an entry in the character set database mapping `vertical
+*Do not* make an entry in the character set dictionary mapping `vertical
  stick' to `l' or `I'.  Instead, select enough of the whole word in
  question that no word would start with the other letter, and enter the
  whole word or part of it as a new glyph.
  
-For example, in the supplied database there is already a glyph for
+For example, in the supplied dictionary there is already a glyph for
  `Iron'; this is OK because there are no words which start `lron'.
  
  Do not make an entry for a string more than 7 characters long;
@@ -72,7 +72,7 @@ recognised as.
  
  *You should check the alleged context before entering a character*.
  If it is wrong, you should fix it, rather that just making an entry
-for the uppercase letter in the lowercase database.
+for the uppercase letter in the lowercase dictionary.
  
  Instead, make a new glyph for the last letter of the previous word
  plus the (unusually narrow) inter-word space, and end that entry with
@@ -100,9 +100,9 @@ containing `yw' and define it as `yw'.
  Fixing mistakes
  ---------------
  
-The OCR query UI allows you to delete things from the glyph database.
+The OCR query UI allows you to delete things from the glyph dictionary.
  However since you are not guaranteed to actually get an OCR query at
-all if the database contains errors, you shouldn't rely on this.
+all if the dictionary contains errors, you shouldn't rely on this.
  
  If you think you have made mistakes answering OCR queries (for
  example, the recognised data is wrong), you should download a fresh
@@ -123,7 +123,7 @@ and in future, just always run it with the --edit-charset option.
  With --edit-charset, when the OCR finds characters it does not
  understand, it will put up an OCR resolution query window.  This will
  display the part of the text it is having trouble with, showing where
-it has got to, and allow you to edit the character set database it
+it has got to, and allow you to edit the character set dictionary it
  uses for recognising the text.
  
  *This is subtle* and it is important to understand the way the
diff --git a/pctb/convert.c b/pctb/convert.c
index dd7f5d1b426dc32d5315ab37cb11793535f7c526..4cc45b4a7981d09821b76136490fad3bf5c3bb83 100644
--- a/pctb/convert.c
+++ b/pctb/convert.c
@@ -121,8 +121,8 @@ int main(int argc, char **argv) {
        o_single_page= 1;
      else if (!strcmp(arg,"--quiet"))
        o_quiet= 1;
-    else if (!strcmp(arg,"--edit-charset"))
-      o_resolver= "./yppsc-ocr-resolver";
+    else if (!strcmp(arg,"--edit-dictionary"))
+      o_resolver= "./dictionary-manager";
      else if (!strcmp(arg,"--raw-tsv"))
        o_outputmode= 0;
      else if (!strcmp(arg,"--upload") ||
diff --git a/pctb/yppsc-ocr-resolver b/pctb/dictionary-manager
similarity index 100%
rename from pctb/yppsc-ocr-resolver
rename to pctb/dictionary-manager
diff --git a/pctb/resolve.c b/pctb/resolve.c
index 052ec32fbb3981451ee25f3186a198c936f4c4fd..2c61e3b9c035622c844575c570105a5e82ed36c8 100644
--- a/pctb/resolve.c
+++ b/pctb/resolve.c
@@ -52,7 +52,7 @@ FILE *resolve_start(void) {
              DEBUGP(callout) ? "--debug" : "--noop-arg",
              "--automatic-1",
              (char*)0);
-      sysassert(!"execlp ocr-resolver failed");
+      sysassert(!"execlp dictionary-manager failed");
      }
      sysassert(! close(jobpipe[0]) );
      sysassert(! close(donepipe[1]) );
@@ -78,7 +78,7 @@ void resolve_finish(void) {
    }
  
    if (r==0) {
-    waitpid_check_exitstatus(resolver_pid, "character resolver");
+    waitpid_check_exitstatus(resolver_pid, "dictionary manager");
      fclose(resolver);
      close(resolver_done);
      resolver= 0;
pctb/README
pctb/README.dictionary	[moved from pctb/README.charset with 89% similarity]
pctb/convert.c
pctb/dictionary-manager	[moved from pctb/yppsc-ocr-resolver with 100% similarity]
pctb/resolve.c
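
Note on the longest-match rule mentioned in the README.dictionary hunks: the text says a glyph entry may be a strict prefix of another (for example `v' as the left part of `w') and that the ambiguity is resolved by taking the longest match. The following is only a minimal sketch of that idea, not the code in this tree. The struct glyph_entry layout, the sample_dict table and the encoding of a glyph as one character per pixel column are all assumptions made for illustration; the real dictionary format is not documented here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical dictionary entry.  A glyph is modelled as a string with
 * one character per pixel column, left to right.  This encoding is an
 * assumption for the sketch, not the format of charset-15.txt. */
struct glyph_entry {
  const char *columns;  /* pixel columns of the glyph */
  const char *text;     /* what the glyph is recognised as */
};

/* Illustrative data only: the entry for "v" is a strict prefix of the
 * entry for "w". */
static const struct glyph_entry sample_dict[] = {
  { "ab",   "v" },
  { "abcd", "w" },
};

/* Return the entry whose columns are the longest prefix of `input',
 * or NULL if no entry matches at the current position. */
static const struct glyph_entry *longest_match(
    const struct glyph_entry *dict, size_t n, const char *input) {
  const struct glyph_entry *best = NULL;
  size_t best_len = 0;
  size_t i;

  assert(dict != NULL && input != NULL);
  for (i = 0; i < n; i++) {
    size_t len = strlen(dict[i].columns);
    /* Only consider entries longer than the best so far; strncmp
     * fails if `input' ends before the entry does. */
    if (len > best_len && strncmp(input, dict[i].columns, len) == 0) {
      best = &dict[i];
      best_len = len;
    }
  }
  return best;
}
```

With this table, an input beginning "abcd" yields the `w' entry even though the `v' entry also matches its first two columns, which is why adding the whole letter to the dictionary is enough to override a shorter partial match.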