Tuesday

17th Oct 2017

Focus

EU book digitisation project needs 'Wikipedia'-style army of volunteer editors

  • Significant European historical texts will be accessible and searchable online more rapidly (Photo: Flickr.com)

An EU partnership with Israeli researchers from US computing firm IBM to digitise major European historical texts is seeking volunteers to help boost the accuracy of scanned texts. The process is expected to cut the time it takes to digitise a document from hours to minutes.

The goal of the digitisation project, dubbed Impact (ImProving ACcess to Text), is to increase the accuracy of scanned texts and to make them editable and searchable online.

A new web-based optical character recognition (OCR) technology and an online collaboration of institutions aim to help with the recognition of texts with faded ink or unusually shaped typefaces, which are currently scanned only as static images.

The project's researchers believe that the new system will provide between 25 percent and 50 percent greater accuracy than standard recognition programmes.

According to them, the online collaborative correction system is meant to attract volunteers to aid in the process, much like the army of unpaid editors who correct Wikipedia entries, and then to learn from the errors those humans have recognised.

The new technology accelerates the process of locating questionable text scans and then enables reviewers to key in corrections to the text. Instead of displaying an entire scanned page, reviewers only see the actual letters or words in question. For example, the letter combination 'r' and 'n' can sometimes be difficult for a computer to distinguish from the letter 'm'. In these cases, the system collects a variety of instances of the letter 'm' and places these samples next to the letters in question, making it much easier to determine the letter's real identity.
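The article does not say how Impact detects these ambiguous cases. As a rough illustration of the ambiguity itself, the Python sketch below lists the alternative readings of a word when 'rn' and 'm' are treated as interchangeable. The confusion table and the function name are assumptions made for this example, not part of the project's software.

```python
# Hypothetical confusion table: character sequences an OCR engine may mix up.
CONFUSABLE = {"rn": "m", "m": "rn"}


def candidate_readings(word):
    """Return alternative readings of `word`, swapping one confusable sequence at a time."""
    candidates = set()
    for seq, replacement in CONFUSABLE.items():
        start = word.find(seq)
        while start != -1:
            # Replace only this occurrence and keep the rest of the word intact.
            candidates.add(word[:start] + replacement + word[start + len(seq):])
            start = word.find(seq, start + 1)
    candidates.discard(word)
    return sorted(candidates)
```

A reviewer shown the scanned word "modem" would, under this sketch, be offered "modern" as one possible reading to confirm or dismiss.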

And where an entire word is suspect, it is added to a collection of other questionable terms, which are then arranged in alphabetical order. Volunteer reviewers then just accept or reject suggested substitutes with one keystroke.
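The review step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the use of "y" as the accept keystroke and the plain dict standing in for the system's dictionary are all hypothetical.

```python
def build_review_queue(suspects):
    """Deduplicate (scanned, suggested) pairs and sort them alphabetically by scanned word."""
    return sorted(set(suspects), key=lambda pair: (pair[0].lower(), pair[1].lower()))


def review(queue, decisions, dictionary):
    """Apply one keystroke per pair: 'y' accepts the suggestion, anything else rejects it."""
    for (scanned, suggested), key in zip(queue, decisions):
        if key == "y":
            # Accepted corrections are remembered so later scans can reuse them.
            dictionary[scanned] = suggested
    return dictionary
```

Sorting the queue places similar misreadings next to each other, which is presumably what lets a volunteer work through them quickly.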

Previously, a small book would take four hours to key in manually, or one hour using standard OCR technology with manual correction. The new system cuts the process down to 30 minutes. Researchers believe they will soon be able to cut the time down to 15 minutes as the system enriches its dictionary, learning from the human volunteers.
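The figures quoted above imply the following speed-ups. The short Python calculation below uses only the times given in the article; the labels and the function name are chosen for this example.

```python
# Minutes needed to digitise a small book, as quoted in the article.
MINUTES = {
    "manual keying": 240,
    "standard OCR with manual correction": 60,
    "new system": 30,
    "new system (target)": 15,
}


def speedup(baseline, method):
    """Return how many times faster `method` is than `baseline`."""
    return MINUTES[baseline] / MINUTES[method]
```

On these numbers the new system is eight times faster than manual keying and twice as fast as standard OCR with manual correction. The 15-minute target would double both factors.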

Brussels and IBM announced on Thursday that they are to expand the new technology to some two dozen national libraries, research institutes, universities and companies, including the British Library, the German National Library and the Poznan Supercomputing and Networking Centre in Poland.

The European Commission turned fresh attention to digitisation last year after Google said it planned to make millions of books available online, a move that disquieted some European publishers and copyright owners.
