We are working to move most of our tools and infrastructure to an open source, community-based model. We are not there yet, but we will continue to make our work available in our GitHub repository. Please contact us if you would like to get involved.
The metadata we aggregate and index from over 200 organizations via the OurOntario.ca portal is available as search results in various formats: Dublin Core, MODS, Solr, RDF, RSS and Atom.
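As an illustration of consuming one of these formats, here is a minimal Python sketch that reads a Dublin Core record with the standard library and groups its elements by field name. The sample record, its `<record>` wrapper and the `dc_fields` helper are hypothetical; actual OurOntario.ca search results may wrap records differently, so this is a sketch under that assumption rather than a description of the portal's output.

```python
import xml.etree.ElementTree as ET

# Standard Dublin Core element set namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical search result; the real wrapper element may differ.
SAMPLE = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example Newspaper Title</dc:title>
  <dc:date>1925-06-01</dc:date>
  <dc:subject>Local history</dc:subject>
  <dc:subject>Newspapers</dc:subject>
</record>"""


def dc_fields(xml_text: str) -> dict[str, list[str]]:
    """Collect Dublin Core elements into a field -> list of values mapping."""
    root = ET.fromstring(xml_text)
    fields: dict[str, list[str]] = {}
    prefix = "{" + DC_NS + "}"
    for elem in root.iter():
        # ElementTree exposes namespaced tags as "{namespace}localname".
        if elem.tag.startswith(prefix):
            name = elem.tag[len(prefix):]
            fields.setdefault(name, []).append((elem.text or "").strip())
    return fields


print(dc_fields(SAMPLE))
```

Repeatable elements such as `dc:subject` are kept as lists, since Dublin Core allows any element to occur more than once.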
We have pioneered several areas of Optical Character Recognition (OCR) development and deployment for newspaper digitization, with a focus on adding language support to OCR. In 2013, ODW trained Tesseract for Inuktitut to OCR the publication Inuit Today for the Multicultural Historical Society of Ontario, and in 2015 assisted Nunavut Tunngavik Inc. with an expanded training set.
DPLA Instance Pilot Project
In 2015, ODW hired intern Andrew Park (see posting) to work on a pilot project with Ryerson University Library and Archives and the Digital Public Library of America (DPLA) community to set up and test an instance of the DPLA platform. His work on building a front end is available online. We are now working with the British Columbia Provincial Digital Library (BCPDL) group and continuing to experiment with an open source, community-based approach to aggregating content. This work will be available on our GitHub site.
See our work using OpenRefine to normalize newspaper indexes before ingestion into the VITA Toolkit. Legacy indexes are often held in various systems and in structured or unstructured documents. These records are copied into OpenRefine, where, together with Excel, the record contents are broken out and rearranged to match the fields they are going into (e.g. machine-readable dates, subjects, titles, personal or corporate names, record type). Duplicate records are then identified and merged, inconsistencies in spelling and capitalization are corrected, and authorized Library of Congress Subject Headings and geolocations are added to provide dynamic links in the final display in VITA.
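Two of these steps, converting dates to a machine-readable form and merging duplicates, can be sketched in Python. This is a minimal illustration, not ODW's workflow (which uses OpenRefine and Excel): it assumes each record is a dict with hypothetical `title`, `date` and `subjects` fields and a hypothetical list of legacy date layouts. The `fingerprint` key imitates OpenRefine's fingerprint clustering method (fold accents to ASCII, lowercase, strip punctuation, sort unique tokens) but is not OpenRefine's own code; subject heading and geolocation enrichment are not shown.

```python
import string
import unicodedata
from datetime import datetime
from typing import Optional

# Hypothetical legacy date layouts; a real index needs its own list.
DATE_FORMATS = ("%d/%m/%Y", "%B %d, %Y", "%Y-%m-%d")


def to_iso_date(raw: str) -> Optional[str]:
    """Return an ISO 8601 date string, or None if no known layout matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def fingerprint(value: str) -> str:
    """Clustering key modelled on OpenRefine's fingerprint method."""
    # Fold accented characters to ASCII, lowercase, and drop punctuation.
    text = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode()
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    # Sort unique tokens so word order and repeated words don't matter.
    return " ".join(sorted(set(text.split())))


def merge_duplicates(records: list[dict]) -> list[dict]:
    """Merge records whose titles share a fingerprint and whose dates agree."""
    merged: dict[tuple, dict] = {}
    for rec in records:
        key = (fingerprint(rec["title"]), to_iso_date(rec["date"]))
        if key not in merged:
            # Keep the first-seen title, with stray whitespace collapsed.
            merged[key] = {
                "title": " ".join(rec["title"].split()),
                "date": key[1],
                "subjects": [],
            }
        # Union the subject lists of all duplicates, in first-seen order.
        subjects = merged[key]["subjects"]
        for subject in rec["subjects"]:
            if subject not in subjects:
                subjects.append(subject)
    return list(merged.values())


rows = merge_duplicates([
    {"title": "The Example  Gazette", "date": "01/06/1925", "subjects": ["Fires"]},
    {"title": "example gazette, the", "date": "June 1, 1925", "subjects": ["Fires", "Floods"]},
])
print(rows)
```

Here the two hypothetical entries collapse into a single record titled "The Example Gazette", dated 1925-06-01, with subjects Fires and Floods. Keying on both title and date is a deliberate choice so that different issues of the same title are not merged; in practice, candidate clusters would still be reviewed by hand before merging.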