
UK Cops Scanned 1.7M Faces. The Algorithm Won't Hold Up in Court.


Here's something that should stop you mid-scroll: The Metropolitan Police scanned more than 1.7 million faces in the first months of 2026 alone — an 87% rise over the same period in 2025. Not a pilot. Not a trial. Routine operations. And somewhere in those 1.7 million scans, buried in fractions of a second, an algorithm is making a decision about whether your face belongs on a list.

TL;DR

Live facial recognition — now operational across 13 of 43 UK police forces — is a speed-and-volume watchlist tool that works nothing like the expert forensic comparison investigators use when analyzing case images, and mixing them up creates serious problems for evidence quality.

Thirteen of the 43 police forces in England and Wales now run live facial recognition as standard operational infrastructure, according to UK Parliament's POST briefing on facial recognition in policing. That's a significant jump from the handful of forces cautiously running camera trials just a few years ago. But the adoption numbers, while striking, aren't actually the most important thing to understand here. What matters — especially for investigators, legal professionals, and anyone who deals with facial image evidence — is that live facial recognition and forensic facial comparison are not two versions of the same thing. They're different technologies, built for different problems, with different failure modes. And right now, those differences are being systematically blurred.


How Live Facial Recognition Actually Works

Picture a busy shopping street in London. Mounted cameras scan the crowd continuously, pulling a face from the stream every fraction of a second. Each captured image is converted into a numerical representation — a mathematical map of facial geometry — and compared against a pre-loaded watchlist drawn from national police databases. Suspects wanted by courts. People on bail conditions. Missing persons, sometimes. This comparison happens fast. The system flags potential matches above a similarity threshold and routes them to a human operator for visual review. If the operator agrees there's a match, an investigating officer takes a second look before any action is taken.

That's the operational chain. Notice how many human checkpoints exist — because the algorithmic match is a lead, not a verdict. The system running this process is doing what's called one-to-many matching: one live face checked against thousands of stored templates simultaneously. Speed is everything. Precision at individual level matters less than throughput across the crowd.
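
To make the one-to-many architecture concrete, here's a minimal sketch in Python. Everything in it is hypothetical: embed_face stands in for whatever embedding model a real system uses, the watchlist is random vectors, and no vendor's API is implied. The point is the shape of the operation: one probe vector, compared against every stored template in a single pass, with anything over the threshold queued as a lead for a human operator.

```python
import numpy as np

# Illustrative sketch of one-to-many watchlist matching, not any vendor's
# API. `embed_face` stands in for a real face-embedding model; random
# unit vectors keep the example self-contained.

rng = np.random.default_rng(0)
EMBED_DIM = 512

def embed_face(_image) -> np.ndarray:
    """Placeholder for a face-embedding model returning a unit vector."""
    v = rng.normal(size=EMBED_DIM)
    return v / np.linalg.norm(v)

# Watchlist: thousands of pre-computed templates, one unit vector per row.
watchlist = rng.normal(size=(5000, EMBED_DIM))
watchlist /= np.linalg.norm(watchlist, axis=1, keepdims=True)

THRESHOLD = 0.64  # the similarity threshold reported for the Met

def scan_frame(frame) -> list[int]:
    """One-to-many: compare a single probe against every template at once."""
    probe = embed_face(frame)
    scores = watchlist @ probe        # cosine similarity via dot products
    hits = np.flatnonzero(scores >= THRESHOLD)
    return hits.tolist()              # leads for a human operator, not IDs

print(f"{len(scan_frame(frame=None))} candidate(s) queued for operator review")
```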

The accuracy thresholds are set deliberately conservative. According to Biometric Update, UK forces operate at similarity thresholds between 0.6 and 0.64; the Metropolitan Police sits at 0.64. At those settings, in documented deployments, 2,067 of 2,077 potential alerts were confirmed as true matches, and measured against the total volume of faces scanned, the false positives work out to a rate of roughly 0.0003%. On the surface, that sounds almost impossibly accurate.

4.7M
faces scanned by UK police live facial recognition cameras in 2024 — more than double the 2023 figure
Source: Liberty Investigates

But here's where scale turns good-sounding numbers into a real-world problem. Apply 0.0003% to 1.7 million scans and you still get roughly five false alerts requiring officer investigation; apply it to the 4.7 million scans logged in 2024 and you get around fourteen. Real people stopped, checked, and sent on their way. Each of those interactions has a cost. And that's before you factor in the bias data, which is considerably less comfortable than the headline accuracy figures.
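
The arithmetic is simple enough to check. A back-of-the-envelope sketch, using only the figures cited above:

```python
# Back-of-the-envelope check on how a tiny per-scan false positive rate
# scales with deployment volume. Figures are the ones cited in this article.
fpr = 0.0003 / 100                # 0.0003% as a fraction of faces scanned
met_scans_early_2026 = 1_700_000  # Met scans, first months of 2026
uk_scans_2024 = 4_700_000         # UK-wide scans in 2024

print(f"Expected false alerts, 1.7M scans: {fpr * met_scans_early_2026:.0f}")  # ~5
print(f"Expected false alerts, 4.7M scans: {fpr * uk_scans_2024:.0f}")         # ~14
```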


The Bias Problem Hiding Inside the Accuracy Numbers

Testing of the facial recognition technology used by UK forces found that Black women were subject to the highest rate of false positive identifications: 9.9% at a 0.8 similarity threshold, according to Liberty Investigates. A 2025 assessment of retrospective facial recognition algorithms likewise showed higher false positive rates for the faces of Black and Asian individuals. This isn't a surprising finding to anyone who has studied how these systems are built: if the training data skews toward one demographic, the model learns that demographic's features with more precision and everything else less so.

"If the data used to train AI lacks diversity, it can internalize bias in algorithms, which can in turn affect FRT systems used by police forces and have real-world effects." — UK Parliament POST Briefing, Parliamentary Office of Science and Technology

The phrasing "real-world effects" is doing a lot of quiet work in that sentence. Real-world effects means real people incorrectly flagged, stopped, and questioned — disproportionately from communities that are already over-policed. The aggregate accuracy figure of 0.0003% doesn't capture this distribution. A system can be highly accurate overall and still be systematically wrong about specific groups. That's not a paradox; it's just how averages work when the underlying distribution isn't uniform.
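
To see that averaging effect concretely, here's a deliberately hypothetical split of scan volume and per-group error rates; the numbers are invented purely to show how a tidy aggregate can coexist with a much worse subgroup experience:

```python
# Entirely hypothetical figures, chosen only to show how a weighted average
# can look reassuring while one subgroup fares far worse.
groups = {
    # group: (share of scans, per-scan false positive rate)
    "majority demographic":          (0.95, 0.000001),
    "under-represented demographic": (0.05, 0.000060),
}

aggregate = sum(share * rate for share, rate in groups.values())
print(f"Aggregate FPR: {aggregate:.6%}")  # small and tidy
for name, (share, rate) in groups.items():
    print(f"{name}: {rate:.6%} ({rate / aggregate:.1f}x the aggregate)")
```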



Forensic Facial Comparison: A Completely Different Discipline

Now flip the scenario entirely. A crime has occurred. There's CCTV footage of a suspect's face. An investigator wants to know if that face matches a specific person of interest. This is forensic facial comparison — and it has almost nothing in common with the live scanning system described above.

Think of it this way. Live facial recognition is like a border checkpoint with a wanted-person list, scanning every traveler's face against posted notices in real time. Forensic facial comparison is like a detective examining two photographs under a magnifying lamp, measuring the proportion of eye socket width to chin length, noting the exact curve of the ear helix, looking for consistent or inconsistent features across dozens of reference points. One is speed and volume. The other is precision and judgment. They answer different questions entirely.
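
The difference even shows up in the shape of the code. Here's a hedged sketch of the two question types; the function names are illustrative, not any product's API:

```python
import numpy as np

def verify(probe: np.ndarray, reference: np.ndarray) -> float:
    """One-to-one, the forensic question: how similar are THESE two faces?
    The score is one input to a slow, feature-by-feature expert examination,
    never a substitute for it."""
    return float(probe @ reference)  # cosine similarity of unit vectors

def identify(probe: np.ndarray, gallery: np.ndarray, threshold: float) -> list[int]:
    """One-to-many, the watchlist question: does this face resemble ANYONE
    on the list? Returns candidate leads, tuned for throughput."""
    return np.flatnonzero(gallery @ probe >= threshold).tolist()
```

Same dot product underneath, but the questions, the thresholds, the error trade-offs, and the role of the human are completely different.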

The retrospective side of this work is enormous. UK government figures show police forces carry out over 25,000 facial image searches every month on the Police National Database. The number of retrospective facial recognition searches nearly doubled in a single year, from 138,720 in 2023 to 252,798. This is the investigator's tool, post-event image matching, and it's being used at vastly higher rates than any live deployment. Yet it receives a fraction of the public debate.

Research published in Nature Scientific Reports found that trained forensic facial examiners don't just outperform algorithms on difficult comparison tasks — they outperform fingerprint examiners and untrained participants too. Their advantage isn't speed. It's that they are slow, deliberate, and strategically avoid the kinds of errors that come from overconfident pattern recognition. That's a feature, not a bug. Expert forensic comparison is built around controlled decision-making under uncertainty, not volume throughput.

The average person — including many professionals not specifically trained in facial comparison — makes errors on 20 to 30% of unfamiliar face identification tasks, according to research cited in PMC/NIH. Passport officers have shown similar error rates in controlled studies. The expertise of a trained forensic facial examiner is genuinely specialized and not easily replicated by either an untrained human or an algorithm optimized for something else.


The Misconception That Can Wreck a Case

Here's the belief that causes the most damage in practice: "If the algorithm returned a 99% match confidence, the identification is reliable." It's an understandable assumption. The number sounds authoritative. The machine sounds objective. And honestly, if a human said they were 99% sure, you'd probably trust them.

But accuracy scores describe algorithm performance under controlled conditions — clean images, frontal angle, good lighting, matched demographics in training data. Real investigations involve CCTV footage shot from above at 15 frames per second in a dimly lit car park. The algorithm's controlled-condition accuracy doesn't transfer automatically to that scenario.
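
One way to build intuition for why benchmark scores don't transfer: simulate what degraded capture does to a similarity score. The sketch below is pure numpy and models degradation as noise added to an embedding vector, a simplification rather than a claim about any real pipeline:

```python
import numpy as np

rng = np.random.default_rng(1)

def unit(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v)

clean = unit(rng.normal(size=512))  # embedding from a good reference photo

# Same person re-photographed: a small perturbation stands in for clean
# capture conditions, a much larger one for grainy, dim, overhead CCTV.
good_capture = unit(clean + 0.1 * unit(rng.normal(size=512)))
cctv_capture = unit(clean + 1.5 * unit(rng.normal(size=512)))

print(f"same person, clean capture: {clean @ good_capture:.3f}")  # ~0.99
print(f"same person, CCTV capture:  {clean @ cctv_capture:.3f}")  # ~0.55
# Identical ground truth, very different scores. A threshold or "confidence"
# tuned on clean benchmark images says little about what the same number
# means on degraded real-world footage.
```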

Worse, human and algorithmic errors compound. Georgetown Law's Center on Privacy and Technology documents a case where an officer manually copied facial features from high-resolution images and pasted them onto a low-quality suspect photo in software before running a database search. The algorithm then operated on manipulated input. The human error upstream contaminated the algorithmic output downstream — and the reported match confidence said nothing about this. An accuracy metric on a clean benchmark tells you nothing about what happened in that specific workflow.

What You Just Learned

  • 🧠 Live systems are one-to-many, forensic is one-to-one — different architectures built for different questions, with different failure modes at every step
  • 🔬 Tiny error rates become real problems at scale — 0.0003% applied to 1.7 million scans still produces a handful of false stops, and they fall unevenly across demographics
  • 📋 Retrospective searches dwarf live deployments in volume — over 252,000 database searches in one year, yet this gets far less scrutiny than cameras in the street
  • 💡 Algorithm confidence scores don't absorb human error — a manipulated input going into a clean algorithm still produces a corrupted output, and the match score won't tell you that

At CaraComp, we work at the intersection of these two disciplines every day — and the distinction between live detection and forensic comparison isn't academic. It shapes what evidence means, how it should be presented, and what questions an investigator should be asking before they accept a facial match as a lead worth acting on.

Key Takeaway

Live facial recognition and forensic facial comparison are not the same technology used at different speeds — they are structurally different tools built for different investigative purposes, with different accuracy characteristics, different bias profiles, and different standards for what "a match" actually means. An investigator who treats them as interchangeable will eventually get burned by the difference.

The real shift happening right now isn't just that more police forces are deploying cameras. It's that the use cases are separating — live detection, retrospective database search, and expert one-to-one comparison are increasingly treated as distinct disciplines by agencies, regulators, and courts. That separation is overdue, and it will matter enormously as the case volumes keep climbing.

So here's the question worth sitting with: If a court is shown a facial match from a retrospective database search run on CCTV footage, and the presenting officer doesn't know whether the algorithm was optimized for live detection or forensic comparison — does anyone in that courtroom actually know what the confidence score means? Because right now, in a significant number of cases, the honest answer is no.

As live facial recognition expands, what do you think investigators need most: better public understanding of the tech, clearer legal guardrails, or stronger standards for when each type of facial analysis should be used? Drop your perspective in the comments — this is a conversation worth having before the case volumes make it unavoidable.

Ready for forensic-grade facial comparison?

2 free comparisons with full forensic reports. Results in seconds.

Run My First Search