We’ll be presenting on Friday, April 28th, at the Creative Commons Summit about moving cultural heritage into the Digital Commons. We’ll be talking with staff from the Internet Archive and people working on projects across the world!
Keep an eye on our Twitter account, where we’ll be sharing information from all the sessions most relevant to cultural heritage institutions in Ontario.
Here’s the abstract of the presentation:
Newspaper digitization can be very expensive for an organization. Since 2007, the University of Windsor in Ontario, Canada, and OurDigitalWorld/OurOntario (ODW) have been digitizing historical Ontario newspaper collections, often from microfilm and microfiche sources. With very little funding, the project has managed to assemble nearly 2 million pages of content. It has been necessary to find cost-effective strategies for every part of the digitization process, from scanning to Optical Character Recognition (OCR), right through to delivering newspapers online. This has been made possible by the richness and variety of Open Source solutions (particularly Tesseract for OCR, Olena for page segmentation, Hadoop for volume, and Solr/Lucene for indexing), and the cooperative nature of the project. Recently, ODW has tackled scanning directly, and constructed a prototype microform scanner using macro photography and a makerspace ethic. This session will outline the solutions identified from a decade of newspaper digitization and offer tips for making digitization an option on even the smallest of budgets.