21st Oct 2016


EU book digitisation project needs 'Wikipedia'-style army of volunteer editors


An EU partnership with Israeli researchers at US computing firm IBM to digitise major European historical texts is seeking volunteers to help improve the accuracy of scanned texts, in a process expected to cut the time it takes to digitise a document from hours to minutes.

The goal of the digitisation project, dubbed Impact (ImProving ACcess to Text), is to increase the accuracy of scanned texts and to make them editable and searchable online.


A new web-based optical character recognition (OCR) technology, combined with online collaboration between institutions, aims to help recognise texts with faded ink or unusually shaped typefaces, which are currently scanned only as static images.

The project's researchers believe that the new system will deliver between 25 percent and 50 percent greater accuracy than standard recognition software.

According to the researchers, the project hopes its online collaborative correction system will attract volunteers to help with the process, much like the unpaid army of editors who correct Wikipedia entries, and that the system will then learn from the errors identified by humans.

The new technology speeds up the process of locating questionable text scans and then lets reviewers key in corrections to the text. Instead of being shown an entire scanned page, reviewers see only the letters or words in question. For example, a computer can sometimes find it difficult to distinguish the letter combination 'rn' from the letter 'm'. In such cases, the system collects a number of instances of the letter 'm' and places these samples next to the characters in question, making it much easier to determine their real identity.
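The article does not describe how the system works internally, but the idea of showing a doubtful character next to known-good samples can be illustrated with a short sketch. It assumes, hypothetically, that each recognised character carries a label, a confidence score and a reference to its cropped image; the names `Glyph` and `build_review_panels`, and the 0.8 confidence threshold, are invented for illustration and are not part of the Impact project.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """One recognised character image (hypothetical structure)."""
    image_id: str      # reference to the cropped image of the character
    label: str         # what the OCR engine thinks it is, e.g. "m"
    confidence: float  # engine confidence between 0.0 and 1.0


def build_review_panels(glyphs, threshold=0.8, samples_per_panel=5):
    """Pair each doubtful glyph with confident examples of its suggested label.

    Returns a list of (doubtful_glyph, reference_image_ids) tuples, so a
    reviewer sees the questionable character next to known-good samples
    instead of a whole scanned page.
    """
    # Gather confidently recognised examples, grouped by label.
    references = {}
    for g in glyphs:
        if g.confidence >= threshold:
            references.setdefault(g.label, []).append(g.image_id)

    # Every low-confidence glyph gets a panel of matching reference samples.
    panels = []
    for g in glyphs:
        if g.confidence < threshold:
            samples = references.get(g.label, [])[:samples_per_panel]
            panels.append((g, samples))
    return panels


glyphs = [
    Glyph("p1_c10", "m", 0.97),
    Glyph("p1_c42", "m", 0.95),
    Glyph("p2_c07", "m", 0.55),  # the engine suspects this may really be "rn"
]
for doubtful, samples in build_review_panels(glyphs):
    print(doubtful.image_id, samples)  # prints: p2_c07 ['p1_c10', 'p1_c42']
```

Grouping the reference samples by label once, before building panels, keeps the lookup cheap even when a book contains many thousands of characters.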

Where an entire word is suspect, it is added to a collection of other questionable words, which is then sorted alphabetically. Volunteer reviewers then simply accept or reject the suggested substitutes with a single keystroke.
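The word-level review step can be sketched in the same hedged way. This assumes, hypothetically, that each suspect word comes with its location, the text the OCR engine read, and a suggested substitute; the names `SuspectWord`, `build_word_queue` and `apply_decisions`, and the 'y'/'n' keys, are invented for illustration rather than taken from the project.

```python
from dataclasses import dataclass


@dataclass
class SuspectWord:
    """A word the OCR engine is unsure about (hypothetical structure)."""
    location: str    # page and word position in the scan
    ocr_text: str    # what the engine read
    suggestion: str  # proposed substitute, e.g. from a dictionary


def build_word_queue(suspects):
    """Sort questionable words alphabetically so identical errors sit together."""
    return sorted(suspects, key=lambda w: (w.ocr_text.lower(), w.location))


def apply_decisions(queue, keystrokes):
    """Apply one keystroke per word: 'y' accepts the suggestion, 'n' keeps the OCR text.

    Returns a mapping from each word's location to its final text.
    """
    if len(keystrokes) != len(queue):
        raise ValueError("expected exactly one keystroke per queued word")
    result = {}
    for word, key in zip(queue, keystrokes):
        if key == "y":
            result[word.location] = word.suggestion
        elif key == "n":
            result[word.location] = word.ocr_text
        else:
            raise ValueError(f"unknown keystroke: {key!r}")
    return result


queue = build_word_queue([
    SuspectWord("p3:w12", "rnodern", "modern"),
    SuspectWord("p1:w04", "Brusscls", "Brussels"),
    SuspectWord("p7:w30", "rnodern", "modern"),
])
print([w.location for w in queue])  # ['p1:w04', 'p3:w12', 'p7:w30']
print(apply_decisions(queue, "yyy"))
# {'p1:w04': 'Brussels', 'p3:w12': 'modern', 'p7:w30': 'modern'}
```

Sorting alphabetically places repeated misreadings such as "rnodern" next to each other, which is one plausible reason such a queue lets a reviewer work through them with a keystroke each.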

A small book normally takes four hours to key in manually, or one hour using standard OCR technology with manual correction. The new system cuts the process to 30 minutes. Researchers believe they will soon be able to reduce this to 15 minutes as the system enriches its dictionary by learning from the human volunteers.

Brussels and IBM announced on Thursday that they will extend the new technology to some two dozen national libraries, research institutes, universities and companies, including the British Library, the German National Library and the Poznan Supercomputing and Networking Centre in Poland.

The European Commission turned fresh attention to digitisation last year after Google said it planned to make millions of books available online, a move that unsettled some European publishers and copyright owners.
