Newspapers often represent the most extensive documentation of a community’s activities. The small weekly rural publication and the long-running daily newspaper in larger urban centres both bring forward a community’s priorities and perspectives in a way that no other material can. Large-scale digitization of newspapers, including by the now-defunct Google News project, has been accomplished only rarely, but ODW is proud to host the fourth-largest online newspaper collection. By enabling local organizations like public libraries, archives and museums to digitize and present their community newspapers online, ODW ensures open access to more than 2 million pages of heritage news, index records, vital statistics, clippings and more.
Largest collection of Ontario Community Newspapers online, from 1810 to present day
ODW’s two major newspaper portals, Ontario Community Newspapers (OCN) and INK, are gateways to hundreds of Ontario newspaper titles from more than 200 years of publishing history. The OCN portal brings together different kinds of newspaper content from across our VITA client collections: all Ontario-based VITA collections of newspaper indexes, clippings, or full-run digitized newspaper pages automatically flow into the OCN portal for a one-search end-user experience. The INK portal is a sister site of digitized newspapers. Its list of titles includes full-run digital newspapers from VITA collections plus others that have been digitized by the University of Windsor.
The range of materials available online through ODW is extensive. Newspaper titles range from the abolitionist newspapers the Provincial Freeman and Voice of the Fugitive to community newspapers from across Ontario, including the Border Cities Star, Georgetown Herald, Stouffville Sun-Tribune, and the British Whig. We endeavour to ensure that all legacy work is captured and linked wherever possible, bringing newspaper indexes online and matching them with digital page views of their parent publications.
As more libraries find a place to share their local history online, ODW has become a host for many Illinois collections. The Illinois Newspapers portal aggregates full-run digitized newspapers with index records, providing access to dozens of Chicago-area publications from Libertyville, Wilmette, Algonquin Area and more.
Large-scale, Multilingual Optical Character Recognition
ODW has pioneered several areas of Optical Character Recognition (OCR) development and deployment for newspaper digitization. OCR can be a tremendous bottleneck in newspaper digitization because of the text-heavy nature of newspapers. Newspaper pages often contain eight times the amount of text found on a book page, with more complex layouts and significant quality control challenges.
Building on open-source technologies, ODW has mapped out an infrastructure to achieve high-volume throughput and built expertise in adding new language support to OCR. In 2013, ODW successfully processed a collection of newspaper publications in 14 languages including Inuktitut, the language of one of Canada’s First Peoples, which is written in a syllabic script.