Overview
--------

This tool can:
 - screenscrape the commodities trading screen
 - produce the results as a tab separated values file
 - **TODO** upload the results to PCTB

To run it, change to this directory, type `make', and then:

  ./ypp-commodities --tsv >commods.tsv

While it is capturing the screenshots, do not move the mouse or use
the keyboard.  Keyboard focus must stay in the YPP client window.


Command-line options
--------------------

Setting the operation mode:

 --find-window-only        Just check that we can find the YPP client window.
 --screenshot-only         Page through and take screenshots, do not OCR
 --analyse-only | --same   Process previously taken screenshots
 --everything (default)    Take screenshots and process them

Options to vary the processing:

 --single-page         One screenful, no paging - results will be incomplete
 --quiet               Suppress progress messages
 --screenshot-file F   Store or read screenshots in F rather than #pages#.pnm
 --window-id ID        Specified X window is the YPP client - do not search
 --edit-charset        Enable character set editing.  See README.charset.

Controlling what happens to the results:

 --upload (default)    Upload to the PCTB server
 --tsv                 Print data as clean tab-separated-values file
 --raw-tsv             Dump the raw (not deduped, unsorted) OCR'd data
 --best-prices         Print best buy and sell price for each commodity
 --arbitrage           Print arbitrage opportunities


Files we use and update
-----------------------

The program reads and writes the following files:

 * #pages#.pnm

   Contains one or more images (as raw ppms, end-to-end) which are the
   screenshots taken in the last run.  This is (over)written whenever
   we take screenshots from the YPP client.  You can reprocess an
   existing set of screenshots with the --same (aka --analyse-only)
   option; in that case we just read the screenshots file.  You can
   specify a different file with --screenshot-file.

   If you want to display the contents of this file, `display' can do it.
   Don't try `display vid:#pages#.pnm' as this will consume truly
   stupendous quantities of RAM - it wedged my laptop.

 * charset-15.txt

   Character set database.  For the semantics of the contents of this
   file see README.charset.  There is not currently any accurate
   documentation of this database format.

   If you delete this file you'll have to re-enter a lot of glyph data
   (and probably get it wrong and make the program misrecognise
   things).  If you want to undo any mistakes you may have made
   answering OCR questions you can safely revert this to the version
   I've supplied.

 * #commodmap#.tsv

   Map from commodity names to the numbers required by the PCTB
   server.  This is fetched and updated automatically as necessary.
   It can safely be deleted as it will then be refetched.

 * .new

   When any of these tools overwrite one of the persistent database
   files, they temporarily write to .new.

These files are all in the current working directory.  There is not
yet any feature to have them be somewhere else.

The helper programs
   yppsc-ocr-resolver
   yppsc-commod-processor
must (currently) also be in the current directory.

Future versions may have more helpers and more data files.


Installation requirements
-------------------------

This program has quite a few dependencies:

                                                      Package (Debian etch)
 - For building, C compiler and build environment      build-essential
 - pnm library, including dev files for building       libnetpbm10-dev
 - pnm command line utilities for image manipulation   netpbm
 - X11 libraries, including dev files for building     libx11-dev
 - XTEST library, including dev files for building     libxtst-dev
 - Tk interpreter /usr/bin/wish                        tk8.4
 - Perl module XML::Parser                             libxml-parser-perl
 - Perl module JSON::Parser                            libjson-perl
 - XTEST extension in the X server                     (part of X package)
 - Perl interpreter and basic modules                  perl (usu.installed)

On other Linux distros the packages may have different names, but
these should be roughly right for Debian and its derivatives.

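
Example: inspecting the screenshots file
----------------------------------------

The section on files says that #pages#.pnm holds one or more raw ppms
stored end-to-end.  The following is a minimal Python sketch (not part
of this tool) of how such a file could be split into its individual
pages without loading it into an image viewer.  It assumes that every
image uses the standard raw PPM header (magic `P6', then width, height
and maxval).  The function names read_token, split_ppms and
split_ppm_bytes are made up for this illustration.

```python
import io


def read_token(stream):
    """Return the next whitespace-delimited header token (b"" at end of file).

    '#' starts a comment that runs to the end of the line.  The single
    whitespace byte that terminates the token is consumed as well.
    """
    token = b""
    while True:
        ch = stream.read(1)
        if not ch:
            return token
        if ch == b"#":
            # Skip the rest of the comment line.
            while ch not in (b"\n", b"\r", b""):
                ch = stream.read(1)
            if token:
                return token
            continue
        if ch.isspace():
            if token:
                return token
            continue
        token += ch


def split_ppms(stream):
    """Yield (width, height, maxval, raster) for each raw PPM in a binary stream.

    Assumption: every image is a raw PPM with magic P6.
    """
    while True:
        magic = read_token(stream)
        if not magic:
            return  # clean end of file
        if magic != b"P6":
            raise ValueError("expected raw PPM magic P6, got %r" % magic)
        width = int(read_token(stream))
        height = int(read_token(stream))
        maxval = int(read_token(stream))
        # Three samples per pixel; samples are two bytes wide if maxval > 255.
        bytes_per_sample = 1 if maxval < 256 else 2
        size = width * height * 3 * bytes_per_sample
        raster = stream.read(size)
        if len(raster) != size:
            raise ValueError("truncated image data")
        yield width, height, maxval, raster


def split_ppm_bytes(data):
    """Convenience wrapper: split an in-memory bytes object into pages."""
    return list(split_ppms(io.BytesIO(data)))
```

To use it on a real run, open the screenshots file in binary mode
(for example open("#pages#.pnm", "rb")) and iterate over
split_ppms(f); each page is read in turn, so only one screenshot is
handled at a time.
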

Reporting problems
------------------

If you need to report a bug, for example an inability to recognise
something, please be sure to note the exact error message and
circumstances.

Also, for recognition problems there will probably be a very useful
screenshot file called `#pages#.pnm'.  This is likely to be very large
so don't just email it to me, but if you can put it up on a webpage
for me to download that will help.  At least keep a copy of it.

If the problem is a failure to cope with some particular YPP client
display and is reproducible, try running:

  ./ypp-commodities --raw-tsv --single-page

If this reproduces the problem, please email me the screenshot file
#pages#.pnm, which will consist only of the single screen, plus the
error message.  I'll then be able to understand what's wrong,
hopefully.


Phoning home - privacy
----------------------

The main purpose of this program is to connect to the PCTB server and
upload data.

The program does not currently phone home at all in modes other than
--upload, and when it does it connects to the PCTB server, not to a
system of mine.

However, there are some improvements which I may introduce in the
future which may change this.  I am considering:

 * Having the OCR character resolver talk to a server run by me to
   look for missing glyphs, and/or upload those glyphs back to that
   server so that they can be shared.

 * Having the upload client upload a copy of the data to a server run
   by me, when run in --upload mode.

If I do do this, these new functions may be enabled by default, but it
will be possible to turn them off, or direct them to different
servers, with command-line options, and they will be documented here.


 - Ian Jackson
   ijackson@chiark.greenend.org.uk
   Aristarchus on the Midnight ocean