X-Git-Url: https://www.chiark.greenend.org.uk/ucgi/~yarrgweb/git?a=blobdiff_plain;f=pctb%2FREADME.charset;h=e1fd3ff5f86ba8f41e9843d2255e11edefe120b2;hb=cde017ed6b76840ce2ae1aa5fc740a6e06352f92;hp=bbabb057e6bedd1016f92dd1d2fc63acc41ac24f;hpb=2337ae5465a29659b44037dcbdaf6fa03eb46d84;p=ypp-sc-tools.web-live.git

diff --git a/pctb/README.charset b/pctb/README.charset
deleted file mode 100644
index bbabb05..0000000
--- a/pctb/README.charset
+++ /dev/null
@@ -1,125 +0,0 @@
-Character set query tool, and semantics of the glyphs
------------------------------------------------------
-
-Sometimes the OCR will not be able to recognise some text and you will
-have to help it out.  It will display the part it is having trouble
-with, showing where it has got to, and allow you to edit the character
-set database it uses for recognising the text.
-
-*This is subtle* and it is important to understand the way the
-machinery works, and the possible mistakes you can make, before
-answering the program.  *Please read this documentation.*
-
-If you need help please ask me (ijackson@chiark.greenend.org.uk, or
-Aristarchus on Midnight in game if I'm on line, or ask any pirate of
-the crew Special Circumstances if they happen to know where I am
-and/or can get in touch).
-
-
-Recognition algorithm
----------------------
-
-We recognise the text in the commodity screen by doing exact matching
-of `glyph' bitmaps, against the bitmap in each cell in the commodity
-table.  We match from left to right.
-
-We do not insist that each glyph is followed by whitespace, and nor do
-we insist that glyphs do not contain whitespace.  Our glyph database
-can contain entries which are strict prefixes of other entries - that
-is, a glyph for (say) `v' which is the leftmost part of another glyph
-for (say) `w'.  We resolve these ambiguities by taking the longest
-(widest) glyph which matches.
-
-So you should not be surprised if the program has matched the
-left-hand half of some letter and thinks it is a different letter.  If
-the part that it did recognise does look like the letter in question,
-that isn't wrong.  All you need to do is insert the whole of the
-actual letter in the database: move the left-hand cursor to the start
-of the letter and the right-hand cursor to its end, hit `return', and
-enter the correct character.  Because of the longest-match rule, the
-program will then prefer the entry you have just made.
-
-
-Upper vs lower case - important note regarding `l' and `I'
-----------------------------------------------------------
-
-We maintain separate databases for upper and lower case.  At the
-beginning of each cell in the table, we expect uppercase; in the
-middle of a word we expect lowercase; and, unfortunately, after an
-inter-word gap, we are not sure.
-
-This is troublesome because `l' and `I' look identical on the screen.
-So any time we see a word starting with `l' or `I', the program has to
-ask about it.
-
-*Do not* make an entry in the character set database mapping `vertical
-stick' to `l' or `I'.  Instead, select enough of the whole word in
-question that no word would start with the other letter, and enter the
-whole word or part of it as a new glyph.
-
-For example, in the supplied database there is already a glyph for
-`Iron'; this is OK because there are no words which start `lron'.
-
-Do not make an entry for a string more than 7 characters long;
-currently we cannot cope (and if you do, you will have to remove it
-manually from the charset-15.txt file).
-
-
-Short inter-word gaps
----------------------
-
-It can happen that the problem you are being asked about is caused by
-the program failing to spot an inter-word gap and mistakenly thinking
-that the next word is necessarily in lowercase, so that it fails to
-recognise an uppercase letter.  The context in which each glyph was
-recognised is shown on the screen, underneath the text which shows
-what it was recognised as.
-
-*You should check the alleged context before entering a character*.
-If it is wrong, you should fix it, rather than just making an entry
-for the uppercase letter in the lowercase database.
-
-Instead, make a new glyph for the last letter of the previous word
-plus the (unusually narrow) inter-word space, and end that entry with
-\x20 (yes, type \ x 20).
-
-For example, you might find that `y<space>G' is treated as
-`y<??lowercase>' and the G doesn't get matched.  Select the `y<space>'
-region of the bitmap and type `y\x20' into the string box.
-Sorry for this rather poor UI!
-
-
-Overlapping characters - ligatures
-----------------------------------
-
-Some of the characters in the font used overlap with the next
-character.  When this happens, select both the characters and enter
-them together as one glyph with a multi-character definition.
-
-For example `yw' is rendered with the top right corner of the `y' and
-the top left corner of the `w' overlapping.  This is dealt with by
-matching the whole merged thing - select the region of the screen
-containing `yw' and define it as `yw'.
-
-
-Fixing mistakes
----------------
-
-The OCR query UI allows you to delete things from the glyph database.
-However, since you are not guaranteed to actually get an OCR query at
-all if the database contains errors, you shouldn't rely on this.
-
-If you think you have made mistakes answering OCR queries (for
-example, the recognised data is wrong), you should download a fresh
-copy of charset-15.txt from
- http://www.chiark.greenend.org.uk/~ijackson/ypp-sc-tools/master/pctb/charset-15.txt
-
-
-Send me your updates
---------------------
-
-The character set is in the file `charset-15.txt'.  When you enter new
-characters, they are added there.  If you do this, please email me
-your charset file (ijackson@chiark.greenend.org.uk) so that I can
-include your contributions in future versions.  This will also let me
-check that they seem right :-).
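
The longest-match rule described under `Recognition algorithm' in the
removed README can be illustrated with a small sketch.  This is not the
pctb implementation.  It assumes, purely for illustration, that a cell
is a list of pixel columns, each encoded as an integer bitmask, and
that the glyph database is a dictionary from column tuples to strings.
The names `longest_match', `recognise', `V' and `W' are hypothetical,
and the upper/lower case contexts and inter-word gaps are not modelled.

```python
from typing import Dict, List, Optional, Tuple

# Assumed encoding: one integer bitmask per pixel column of a glyph.
Columns = Tuple[int, ...]


def longest_match(cell: List[int], pos: int,
                  glyphs: Dict[Columns, str]) -> Optional[Tuple[str, int]]:
    """Return (text, width) for the widest glyph matching at pos, or None."""
    max_width = max((len(key) for key in glyphs), default=0)
    # Try the widest candidate first, so that a glyph which is a strict
    # prefix of another (`v' inside `w') loses to the longer entry.
    for width in range(min(max_width, len(cell) - pos), 0, -1):
        text = glyphs.get(tuple(cell[pos:pos + width]))
        if text is not None:
            return text, width
    return None


def recognise(cell: List[int], glyphs: Dict[Columns, str]) -> str:
    """Match glyphs from left to right, always taking the longest match."""
    out = []
    pos = 0
    while pos < len(cell):
        hit = longest_match(cell, pos, glyphs)
        if hit is None:
            # The real tool would ask the user about this region instead.
            raise LookupError(f"no glyph matches at column {pos}")
        text, width = hit
        out.append(text)
        pos += width
    return "".join(out)


# Toy database: the bitmap for `v' is the leftmost part of the one for `w'.
V: Columns = (0b110, 0b001, 0b110)
W: Columns = V + (0b001, 0b110)
glyphs = {V: "v", W: "w"}
```

With this toy database, a cell holding the columns of `W' is read as
`w' rather than as `v' followed by something unrecognised, which is the
behaviour the README relies on when it tells you to add the whole
letter as a new, wider entry.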