OurDigitalWorld shared our digitization expertise at Digital Odyssey 2009. Three of our board members gave talks about various aspects of their work:
Loren Fantin talked about Planning and Managing a Digitization Project. Robert Keshen blogged about her talk for the OLITA blog:
When starting a digitization project, partnerships should also be sought out. Shared resources and regional representation can result from these partnerships. An example of a successful collaboration is Picture St Marys, where the library partnered with the museum to digitize their collection for the betterment of the community…
Promotion often slips people's minds, but it is very important. There are two types of promotion strategies: push and pull. Push strategies include press releases, brochures, emails, presentations and advertising. Pull strategies include collaboration, events, and user feedback. There is no need to wait until the project is over before promotion begins. Make it part of the process so that people are aware of what you are doing.
Walter Lewis discussed The Perfectability of Data:
Walter noted the key measures of perfectability:
- granularity of metadata (for example, a well-defined set of name fields; the payback: research for genealogists, last-name faceting, name re-ordering for citations, etc.)
- internal consistency of data (HINT! Use lookup tables to guard against typos and creative data entry staff, and note that punctuation should not be used to separate data)
- external consistency (so you play nice with others, and can be discovered and displayed in other spaces)
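To make the first two measures concrete, here is a minimal Python sketch, assuming a hypothetical record layout: names are stored as separate surname and given-name fields rather than as one punctuation-delimited string, and a small lookup table (the hypothetical `PLACE_LOOKUP`) is the only source of accepted place values. The class, function names and lookup values are illustrative, not taken from Walter's talk or any particular project.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary: data entry chooses from this table
# instead of typing free text, so typos cannot create new "places".
PLACE_LOOKUP = {"windsor": "Windsor", "essex": "Essex", "leamington": "Leamington"}


@dataclass
class PersonName:
    """Granular name fields instead of a single 'Smith, John' string."""
    surname: str
    given: str

    def citation_form(self) -> str:
        # Re-ordering for citations is trivial once the parts are separate fields.
        return f"{self.surname}, {self.given}"

    def display_form(self) -> str:
        return f"{self.given} {self.surname}"


def normalize_place(raw: str) -> str:
    """Return the canonical place name, or raise if it is not in the lookup table."""
    key = raw.strip().lower()
    if key not in PLACE_LOOKUP:
        raise ValueError(f"Unknown place {raw!r}; add it to the lookup table first")
    return PLACE_LOOKUP[key]


def surname_facets(names: list[PersonName]) -> dict[str, int]:
    """Count records per surname, e.g. to drive a last-name facet."""
    counts: dict[str, int] = {}
    for name in names:
        counts[name.surname] = counts.get(name.surname, 0) + 1
    return counts


names = [PersonName("Smith", "John"), PersonName("Smith", "Mary"), PersonName("Doe", "Jane")]
print(surname_facets(names))        # {'Smith': 2, 'Doe': 1}
print(names[0].citation_form())     # Smith, John
print(normalize_place("  ESSEX "))  # Essex
```

Because the surname lives in its own field, faceting and citation re-ordering need no string parsing, and no comma has to do double duty as both punctuation and a field separator.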
Digital rights is one area of metadata that is of growing use, and it is currently one of the least clearly expressed in any machine-readable format.
Art Rhyno covered OCR Options for Scanned Content:
As the publisher of the Essex Free Press, a small community newspaper with archives going back many years, Art focused his OCR comments and examples on heritage newspapers. The "problem" with newspaper content is the sheer volume of material. Reels and reels of microfilm are the usual source files (or, even worse, microfiche!). Many small community newspapers took part in the move to microfilm as a way of throwing away the backlog of print archives, so microfilm or fiche may be the only source available. And the volume of content is HUGE: an average small weekly paper typically runs 16 pages an issue, or about 800 pages per year, and all of it is dense text.
Once the microfilm reels are scanned, each reel takes about 27 hours of processing with current OCR engines (whether using the server or the desktop version), so there is a lot of waiting while the scanned content is processed.
One of the more interesting aspects of OCR engine licensing is that, because of the density of content and the size of a newspaper page, a single page actually uses up 3 to 4 pages of license! Even so, costs are quite reasonable. Samples of ABBYY OCR output and of the original microfilm were shared, and the common kinds of errors were noted.
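The 27-hours-per-reel figure adds up quickly. Here is a minimal Python sketch of the arithmetic, assuming reels are processed one after another around the clock and, when several machines are used, that the reels divide evenly among them. The function name and the 40-reel run are hypothetical, not figures from the talk.

```python
HOURS_PER_REEL = 27  # processing time per reel quoted in the talk


def ocr_backlog(reels: int, hours_per_reel: float = HOURS_PER_REEL,
                machines: int = 1) -> tuple[float, float]:
    """Return (total machine-hours, wall-clock days) for a batch of reels.

    Assumes the reels divide evenly across machines running 24 hours a day.
    """
    total_hours = reels * hours_per_reel
    wall_clock_days = total_hours / machines / 24
    return total_hours, wall_clock_days


hours, days = ocr_backlog(40)  # hypothetical 40-reel run
print(hours, days)             # 1080 45.0
```

Under these assumptions a single machine would be busy for about a month and a half on 40 reels, which is why the waiting time matters as much as the scanning.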
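Combining the volume and licensing figures gives a rough sense of how many license pages an archive consumes. This is a minimal Python sketch, assuming the 800-pages-per-year average and the 3-to-4 license pages per newspaper page hold across an entire run; the function name and the 50-year archive are hypothetical.

```python
PAGES_PER_YEAR = 800             # average small weekly, as quoted in the talk
LICENSE_PAGES_PER_PAGE = (3, 4)  # one newspaper page consumes 3 to 4 license pages


def license_pages_needed(years: int, pages_per_year: int = PAGES_PER_YEAR) -> tuple[int, int]:
    """Return the (low, high) estimate of license pages for `years` of archive."""
    pages = years * pages_per_year
    low, high = LICENSE_PAGES_PER_PAGE
    return pages * low, pages * high


print(license_pages_needed(1))   # (2400, 3200)
print(license_pages_needed(50))  # (120000, 160000)
```

A single year of a small weekly therefore needs several thousand license pages rather than 800, which is worth budgeting for before the first reel is scanned.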