Friday

2nd Dec 2022

EU book digitisation project needs 'Wikipedia'-style army of volunteer editors

  • Significant European historical texts will be accessible and searchable online more rapidly (Photo: Flickr.com)

An EU partnership with Israeli researchers from US computing firm IBM to digitise major European historical texts is seeking volunteers to help boost the accuracy of scanned texts. The process is expected to cut the time it takes to digitise a document from hours to minutes.

The goal of the digitisation project, dubbed Impact (ImProving ACcess to Text), is to increase the accuracy of scanned texts and to make them editable and searchable online.


A new web-based optical character recognition (OCR) technology, combined with online collaboration between institutions, aims to help with the recognition of texts with faded ink or unusually shaped typefaces, which are currently scanned only as static images.

The project's researchers believe that the new system will provide between 25 percent and 50 percent greater accuracy than standard recognition programmes.

According to them, the online collaborative correction system is intended to attract volunteers to aid in the process, much like the unpaid army of editors that corrects Wikipedia entries. The system will then learn from the errors that humans have identified.

The new technology accelerates the process of locating questionable text scans and then enables reviewers to key in corrections to the text. Instead of being shown an entire scanned page, reviewers see only the letters or words in question. For example, the letter combination 'r' and 'n' can sometimes be difficult for a computer to distinguish from the letter 'm'. In these cases, the system collects a variety of instances of the letter 'm' and places these samples next to the letters in question, making it much easier to determine the letter's real identity.
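
The 'rn' versus 'm' ambiguity can be illustrated with a minimal Python sketch. It is not the code used by Impact or IBM: the confusion table, the function names and the sample dictionary are all hypothetical, chosen only to show how a system might flag a word whose letters have a plausible alternate reading.

```python
# Hypothetical confusion table: glyph sequences an OCR engine may mix up.
# Only the 'rn' / 'm' pair comes from the article; the structure is assumed.
CONFUSABLE = [("rn", "m"), ("m", "rn")]


def alternate_readings(word):
    """Return every spelling obtained by swapping one confusable occurrence."""
    readings = set()
    for seen, other in CONFUSABLE:
        start = word.find(seen)
        while start != -1:
            readings.add(word[:start] + other + word[start + len(seen):])
            start = word.find(seen, start + 1)
    readings.discard(word)
    return sorted(readings)


def is_questionable(word, dictionary):
    """Flag a word when at least one alternate reading is also a known word."""
    return any(alt in dictionary for alt in alternate_readings(word))


if __name__ == "__main__":
    known_words = {"modem", "modern", "book"}  # illustrative dictionary
    for scanned in ("modern", "book"):
        print(scanned, alternate_readings(scanned), is_questionable(scanned, known_words))
```

In this sketch, only words with a dictionary-valid alternate reading would be passed to a human reviewer, which is one plausible way to keep the review queue short.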

And where an entire word is suspect, it is added to a collection of other questionable terms, which are then arranged in alphabetical order. Volunteer reviewers then just accept or reject suggested substitutes with one keystroke.

Previously, a small book would normally take four hours to key in manually, or one hour using standard OCR technology with manual correction. The new system cuts the process down to 30 minutes. Researchers believe they will soon be able to cut the time down to 15 minutes as the system enriches its dictionary, learning from the human volunteers.
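
The review loop described above, in which suspect words are sorted alphabetically, accepted or rejected with one keystroke, and then used to enrich the dictionary, might look roughly like the following Python sketch. Every name in it (`Suspect`, `build_queue`, `review`, the 'y' key) is a hypothetical stand-in, and how the real system stores its dictionary is not described in the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suspect:
    scanned: str     # the word as the OCR engine read it
    suggestion: str  # the substitute offered to the reviewer


def build_queue(suspects):
    """Arrange suspect words in alphabetical order of the scanned form."""
    return sorted(suspects, key=lambda s: s.scanned.lower())


def review(queue, keystrokes, dictionary):
    """Apply one keystroke per suspect word.

    'y' accepts the suggested substitute (assumed key binding); any other
    key rejects it. Accepted substitutes are added to the dictionary so
    that later scans can benefit from the volunteer's decision.
    """
    corrections = {}
    for suspect, key in zip(queue, keystrokes):
        if key == "y":
            corrections[suspect.scanned] = suspect.suggestion
            dictionary.add(suspect.suggestion)
    return corrections


if __name__ == "__main__":
    queue = build_queue([Suspect("modem", "modern"), Suspect("bum", "burn")])
    learned = set()
    print(review(queue, "yn", learned), learned)
```

Sorting the queue means similar misreadings tend to sit next to one another, so a volunteer can work through them quickly, which is consistent with the time savings the researchers report.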

Brussels and IBM announced on Thursday that they are to expand the new technology to some two dozen national libraries, research institutes, universities and companies, including the British Library, the German National Library and the Poznan Supercomputing and Networking Centre in Poland.

The European Commission gave fresh attention to digitisation last year after Google said it planned to make millions of books available online, a move that disquieted some European publishers and copyright owners.
