
When 99% Accurate Still Means Thousands of Wrong Arrests

Do the math. A system that is 99% accurate sounds, intuitively, like it's almost never wrong. Run it against one million comparisons — a realistic volume for any major metropolitan police database — and that 1% error rate quietly produces 10,000 false positives. Ten thousand times the system said yes when the correct answer was no. And somewhere inside that pile of errors, real people are getting arrested.
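The arithmetic above fits in a few lines. The 100-true-match prevalence in the second half is a hypothetical figure added purely to illustrate the base-rate effect; it is not a number from this article.

```python
# Back-of-envelope math from the article: a 99%-accurate system
# run against one million comparisons.
comparisons = 1_000_000
error_rate = 0.01  # the 1% the headline figure leaves over

false_results = int(comparisons * error_rate)
print(false_results)  # 10000 -- times the system gives a wrong answer

# Base-rate illustration (hypothetical prevalence): suppose only 100
# of those million comparisons involve the person actually being
# sought, and the system flags all of them. Treating the wrong
# answers as false positives, the share of "yes" results that are
# genuinely correct is tiny:
true_matches = 100
precision = true_matches / (true_matches + false_results)
print(f"{precision:.2%}")  # 0.99% -- roughly 1 correct hit per 100 alerts
```

The point of the second calculation: when real matches are rare relative to the volume of comparisons, even a small error rate means most positive results are wrong.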

TL;DR

High headline accuracy rates in biometric systems are genuinely impressive — but the real investigative risk isn't the technology's error rate, it's investigators treating a single facial match as sufficient evidence to build an entire case on.

This isn't a theoretical problem. It's documented, it's recurring, and it follows a pattern so consistent you could almost call it a playbook — except it's a playbook for catastrophically bad investigative methodology.

The Celebration and the Contradiction

Last month, Biometric Update reported that Brazil's Polícia Civil do Distrito Federal (PCDF) has achieved something genuinely remarkable. The Medical Examiner's Office in Brasília now processes over 1,700 bodies per year and has reached a 99% positive identification rate using fingerprint analysis, backed by the Innovatrics ABIS platform, which combines fingerprints, face biometrics, and advanced latent print analysis. They used it to crack cold cases. They identified the murder victims in the 2023 Itapuã family murder case even after the killers had attempted to accelerate decomposition to defeat forensic identification. That's the technology working exactly as intended: powerful, precise, serving justice.

Hold that image. Now travel to Delhi.

An investigation by The Wire and the Pulitzer Center uncovered something that sits in uncomfortable contrast to that Brazilian success story. In the early hours of a March morning in 2020, a man named Ali was arrested in the narrow alleys of Chand Bagh, a poor locality in Northeast Delhi. The evidence connecting him to the alleged crime? A facial recognition match. What came next was more than four and a half years of pre-trial incarceration, trapped in procedural limbo, waiting for a bail decision that would take years to arrive.

The Pulitzer Center investigation found that Ali's case was not an anomaly. It was part of a documented pattern: "individuals were arrested solely on the basis of facial recognition — without solid corroborating evidence or credible public witness testimonies." No independent evidence. No corroboration. Just a match — and then handcuffs.

4.5+
Years of pre-trial incarceration for Ali in Delhi, arrested solely on a facial recognition match with no corroborating evidence
Source: The Wire / Pulitzer Center Investigation, July 2025

New York Already Wrote This Chapter

Delhi isn't writing a new story. New York already did. ABC7 New York documented how a wrongful arrest put the NYPD's use of facial recognition under intense scrutiny — a case that followed the same structural failure: a facial match treated as confirmation, an investigation that stopped gathering corroborating evidence once the algorithm said yes, and a person detained on technology's word alone.

Over 100 U.S. police departments now subscribe to facial recognition services, according to The Regulatory Review. Modern systems measure up to 68 distinct facial datapoints — eye corners, nose bridge, jaw contours — to generate a faceprint comparison. The technology itself is not in question here. What's in question is the investigative culture around what happens after a match comes back positive.

"An investigation by The Wire and the Pulitzer Center uncovered troubling instances where individuals were arrested solely on the basis of facial recognition — without solid corroborating evidence or credible public witness testimonies." — Astha Savyasachi, Pulitzer Center

The Methodology Failure Nobody Wants to Talk About

Here's the thing that gets buried in every conversation about facial recognition accuracy: the technology didn't fail in any of these wrongful detention cases. The system returned a match. Maybe the match was even correct at a technical level — same face, different person in the wrong place. The failure happened after the result came back, in the room where investigators decided what to do next.

There's a well-documented psychological phenomenon at work here — call it authority bias applied to algorithms. When a system reports a "high confidence" match, investigators unconsciously shift from a posture of investigation to a posture of confirmation. The algorithm's output becomes the anchor, and everything after is filtered through the assumption that the suspect is already identified. Independent evidence-gathering slows. Alternative leads get deprioritized. The match becomes the case.

The National Institute of Standards and Technology has published guidance on exactly this failure mode, emphasizing that biometric matches should function as investigative leads — a starting point, not a destination. Some U.S. jurisdictions are now codifying this into policy. (The fact that it needs to be codified tells you something about how common the opposite practice is.)

The counterargument often raised by proponents of the technology is worth taking seriously: facial recognition, even with its error rate, outperforms eyewitness testimony, which carries a documented misidentification rate exceeding 25%. That's true. But "better than eyewitness testimony" is a remarkably low bar to clear, and clearing it doesn't make a single data point courtroom-ready on its own. Better than the worst evidence type isn't the same as sufficient evidence.

Why This Matters Right Now

  • 🧮 Scale amplifies the math problem — At 1 million comparisons, a 99% accurate system still generates 10,000 false positives; most investigators never see that number presented next to the "99%" headline
  • 📊 Lab accuracy ≠ field accuracy — Headline rates are measured under controlled conditions, not against the partial-angle, variable-lighting images that street cameras and CCTV actually produce
  • ⚖️ The liability gap is widening — As documented wrongful detention cases accumulate across multiple jurisdictions, the question in court is shifting from "did the system match?" to "was this the only evidence?"
  • 🔍 Binary outputs are the wrong format — A yes/no match result gives investigators none of the probabilistic context they need to calibrate how much weight it should carry relative to other evidence

The Output Format Is Part of the Problem

Dig into the technical side of this and a specific issue emerges. Many deployed systems return a binary result — match or no match — with a confidence label like "high" or "very high" attached. That sounds informative. It isn't, really, because it strips out the gradient. It tells an investigator the system is confident without telling them how that confidence was calculated, what the score differential was between the top candidate and the second candidate, or how that specific comparison performed relative to the system's baseline error rate for similar image quality.

Systems that return probability scores using something like Euclidean distance scoring — a quantified confidence gradient rather than a label — give investigators an actual number they can reason about and, critically, explain in a courtroom. "The system returned a match" is a statement. "The system returned a match with a confidence score placing it in the top 0.01% of all comparisons in this database, and we then verified against three independent corroborating sources" is a case.
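A minimal sketch of that contrast, using a toy Euclidean-distance comparison. The four-dimensional "faceprints," the candidate names, and the 0.10 threshold below are all invented for illustration; production systems use far higher-dimensional embeddings and calibrated operating points.

```python
import math

# Toy 4-dimensional "faceprints" -- real systems use hundreds of
# dimensions. These vectors and names are purely illustrative.
probe = [0.12, 0.80, 0.33, 0.55]
gallery = {
    "candidate_A": [0.11, 0.79, 0.35, 0.54],
    "candidate_B": [0.14, 0.76, 0.30, 0.60],
    "candidate_C": [0.90, 0.10, 0.70, 0.20],
}

def euclidean(a, b):
    """Euclidean distance between two feature vectors (smaller = more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Rank every gallery candidate by distance to the probe image.
ranked = sorted(gallery.items(), key=lambda kv: euclidean(probe, kv[1]))
(top_name, top_vec), (second_name, second_vec) = ranked[0], ranked[1]

top_dist = euclidean(probe, top_vec)
# Score differential between the best and second-best candidate --
# exactly the context a binary output throws away.
differential = euclidean(probe, second_vec) - top_dist

# A binary system collapses all of the above into a single label:
THRESHOLD = 0.10  # hypothetical operating point
label = "match" if top_dist < THRESHOLD else "no match"

print(label)               # what investigators usually see: "match"
print(round(top_dist, 4))  # what they could actually reason about
print(round(differential, 4))  # top-1 vs top-2 separation
```

The binary label alone is identical whether the top candidate is far ahead of the runner-up or barely distinguishable from it; the distance score and the differential are what let an investigator, or a courtroom, calibrate how much weight the result deserves.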

This is precisely why understanding the specific limitations of facial recognition software in operational contexts matters more than any headline accuracy statistic — the difference between a tool that starts an investigation and one that prematurely ends it often comes down to what kind of output the system returns and what protocols govern how that output gets used.

"The Medical Examiner's Office in Brasília can point to a 99 percent positive identification rate using fingerprint analysis as it examines over 1,700 bodies each year. This impressive identification rate rests not only on expertise but on the integration of modern biometric technologies incorporating fingerprints, face biometrics and advanced latent print analysis." — Lu-Hai Liang, Biometric Update

Notice something in that Brazil story: the success isn't just the biometric technology — it's the integration of multiple biometric tools working together. Fingerprints, face biometrics, latent print analysis. No single modality carrying the whole case. That's not an accident. That's exactly the methodology that produced a 99% identification rate instead of a 99% wrongful accusation rate.

Key Takeaway

A facial recognition match is an investigative lead — the beginning of a case, not the end of one. The headline accuracy rate of a biometric system tells you almost nothing about the risk you're accepting when that single match becomes the only evidence connecting a person to a crime. The technology isn't the liability. The methodology is.


Every investigator who has sat in a courtroom being cross-examined knows there is one question defense counsel will always ask. It doesn't matter what the technology is or how accurate the system claims to be. The question is always the same: "Was this the only evidence connecting my client to this event?"

Ali spent four and a half years in pre-trial detention in Delhi waiting for someone to answer that question correctly. The tragedy isn't that the facial recognition system was wrong. The tragedy is that nobody stopped to ask whether it needed to be right on its own.

Ready for forensic-grade facial comparison?

2 free comparisons with full forensic reports. Results in seconds.

Run My First Search