Every day, billions of photos are shared online. These photos can be scraped and end up in facial recognition databases owned by commercial firms, without people’s consent or knowledge.
The practice, which allows police and intelligence services to identify individuals of interest by comparing images with biometric facial features held in the databases, is not always regulated.
For example, the UK’s data privacy law has been deemed inapplicable to foreign firms offering this service to foreign government agencies. However, the Information Commissioner’s Office (ICO) plans to appeal this decision, despite being denied permission to do so on its first attempt.
EU authorities, by contrast, have taken a hardline stance, imposing multiple sanctions on Clearview AI which, according to publicly available information, the company has not challenged.
Scraping is the automated extraction of information from websites and social media using programmes known as web crawlers. Crawlers harvest, analyse and structure photos, videos and associated metadata (e.g., links and data suggesting location and activities).
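To make the mechanics concrete, here is a minimal sketch of such a crawler in Python, assuming the widely used requests and BeautifulSoup libraries; the target URL is hypothetical, and real scrapers operate at vastly greater scale:

```python
# Minimal illustrative crawler: fetch a public page and harvest image
# URLs together with associated metadata (the URL below is hypothetical).
import requests
from bs4 import BeautifulSoup

def crawl_page(url):
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    records = []
    for img in soup.find_all("img"):
        records.append({
            "image_url": img.get("src"),   # the photo itself
            "alt_text": img.get("alt"),    # may describe people or activities
            "source_page": url,            # ties the photo back to a profile
        })
    return records

# e.g. crawl_page("https://example.com/public-profile")
```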
Companies like Clearview AI, which holds over 50 billion photos in its database, and PimEyes exploit scraping extensively for their facial recognition services.
Generally, scraping information from the web and social media can promote competition by helping businesses track pricing trends and consumer preferences to improve their products and services.
Journalists and researchers can leverage scraping for the public good, for instance by gathering data revealing corruption, environmental pollution and hate speech, while technology companies could use scraped data to train AI models that benefit society.
However, scraping also carries significant risks.
It raises serious privacy concerns, with people’s biometric data being processed without their consent. Privacy laws require scrapers to demonstrate a lawful basis for processing personal data, such as consent, and to comply with rules on transparency and the rights of access, rectification and erasure.
Last year, the French Data Protection Authority (CNIL) sanctioned Kaspr (a browser extension) for scraping LinkedIn users’ contact details in violation of consent and transparency requirements and restrictions on data retention.
But not all authorities successfully enforce privacy laws against scrapers. The inaccuracy of facial recognition also puts anyone whose photos are stored in a facial recognition database at risk.
In the US, there have been many documented wrongful arrests of innocent people misidentified by the technology. Lastly, scraping deepens surveillance, discouraging the free flow of information and the sharing of personal data necessary for socio-political and civic engagement.
The EU AI Act bans the untargeted scraping of online photos, videos and CCTV footage to create facial recognition databases. Several EU authorities have imposed prohibitive sanctions against Clearview AI for GDPR violations, including €20m fines each in France, Italy and Greece, and a €30.5m fine in the Netherlands.
In the UK, the ICO’s £7.5m fine was overturned on appeal.
The UK’s privacy rules, like their EU counterpart, apply to foreign entities, except foreign government agencies, if their data processing relates to behavioural monitoring of people in the UK.
The ICO concluded that both creating facial recognition profiles and the subsequent use of the database by police to identify people involve behavioural monitoring.
However, the appellate body ruled that the creation of profiles alone does not count as behavioural monitoring, as the automated processing involved in creating facial biometric data, potentially along with other metadata, does not reveal anything about the targeted person’s behaviour.
Although the appellate tribunal agreed that the subsequent use of Clearview AI’s facial recognition database could involve behavioural monitoring, it insisted that the ICO lacks jurisdiction over Clearview AI, because the GDPR does not cover activities of foreign government agencies.
The French authority interprets behavioural monitoring to include the use of profiling techniques that assess personal aspects such as preferences, behaviour or location, making the creation of the facial recognition database itself profiling and thus behavioural monitoring.
EU authorities don’t consider the end users of the database, i.e., government agencies, to be relevant to determining their jurisdiction. This seems reasonable.
Foreign surveillance technology firms providing purely commercial services should not bypass the law by posing as government agencies. The GDPR excludes foreign government agencies from its scope, based on the principle that states do not subject foreign governments to their domestic laws.
However, this doctrine is now being exploited by commercial firms, which stretch the definition of foreign government activities to include purely commercial services.
Such a flawed interpretation of the GDPR enables foreign firms to carry out surveillance activities without any accountability mechanism. The UK could extend the upcoming Data Protection and Digital Information Bill to foreign firms that conduct scraping in the UK, whether or not they conduct behavioural monitoring, making it irrelevant that foreign government agencies are their end users. Nevertheless, this does not seem to be a priority at present.
On an individual level, adjusting privacy settings on social media and installing anti-crawler extensions could mitigate scraping. Websites and social networks can deploy anti-crawler technologies to tackle scraping.
Unfortunately, their efforts in this regard hinge on whether scraping threatens their business interests, with privacy taking a back seat.
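A short sketch illustrates why such measures are effectively voluntary. Sites publish a robots.txt file telling crawlers what they may fetch; a well-behaved crawler checks it, for example with Python’s standard library parser, but nothing stops a scraper from ignoring the answer (the URLs and crawler name below are hypothetical):

```python
# Checking a site's robots.txt, the standard anti-crawler signal
# (the URLs and user-agent name are hypothetical).
from urllib.robotparser import RobotFileParser

parser = RobotFileParser("https://example.com/robots.txt")
parser.read()  # fetches and parses the robots.txt file

# A compliant crawler consults this before each request; a determined
# scraper can disregard it, since robots.txt is advisory, not enforced.
print(parser.can_fetch("ExampleCrawler/1.0", "https://example.com/photos"))
```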
Scraping requires a careful regulatory response. Banning it disproportionately stifles innovation and business, while leaving it unchecked poses significant risks.
Scraping fundamentally conflicts with privacy principles such as consent, transparency, purpose limitation and data minimisation. Personal data is scraped on a mass scale without our knowledge, making it infeasible to obtain individuals’ consent or to be transparent.
As the technology automatically extracts data that is potentially usable for multiple purposes, it is also nearly impossible to minimise the data collected or to limit its purpose.
Thus, complying with existing data protection laws is considerably challenging, requiring a new approach to regulating scraping.
Prominent privacy scholars Daniel Solove and Woodrow Hartzog propose allowing scraping only when the scraper can demonstrate a clear public benefit, without requiring consent or other legal grounds, and prohibiting it in all other cases.
This may provide a realistic path forward.
Dr Asress Adimi Gikay is senior lecturer in AI, disruptive innovation and law at Brunel University of London. He is also a board member of the Centre for AI: Social and Digital Innovation.