A Face Match Is a Lead, Not a Verdict — Here's Why That Distinction Saves Cases

Trevis Williams is eight inches taller and seventy pounds heavier than the man who committed the crime he was arrested for. His phone's location data put him on a highway driving from Connecticut to Brooklyn at the exact moment a different man was photographed flashing a woman in Manhattan's Union Square. Two months after that incident, officers arrested Williams anyway — because a facial recognition system flagged him as a match. He spent two days in jail. The case was dismissed.

TL;DR

A facial comparison result is a hypothesis, not a conclusion — and every documented wrongful arrest traced to facial AI failed at the same place: the corroboration step that should have come immediately after the match.

The technology didn't fail Williams. The workflow did. That's not a minor semantic distinction — it's the entire ballgame. And understanding exactly where that workflow breaks down, and how to build one that doesn't, is the difference between facial comparison being a powerful investigative tool and a civil liberties catastrophe.


What a "Match" Actually Means (It's Not What You Think)

Here's the thing most people get wrong the moment they see a facial comparison result: they read it as an answer. It isn't. It's a question with a very specific and useful probability attached to it.

Facial recognition systems don't "see" faces the way humans do. They measure. A modern algorithm maps up to 68 distinct data points — eye corners, nose bridge, jaw contours, the geometry between your pupils — and converts all of that into a mathematical vector. A "match" is what happens when two vectors are geometrically similar beyond a defined threshold. The system reports a similarity score. A score of 0.99 means the geometric profiles of two images are nearly identical. It does not mean the people in those images are the same person.
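To make the vector comparison concrete, here is a minimal sketch of how a similarity score is typically computed between two face embeddings. The four-dimensional vectors, the 0.90 threshold, and the function names are all illustrative assumptions (production systems use embeddings with hundreds of dimensions and vendor-tuned thresholds); the point is only that the output is a geometric score, not an identity.

```python
import math

def cosine_similarity(a, b):
    """Geometric similarity between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical tiny embeddings -- real systems map 68+ landmarks into
# vectors with 128-512 dimensions before comparing them.
probe = [0.12, 0.85, 0.33, 0.41]
candidate = [0.13, 0.84, 0.35, 0.40]

score = cosine_similarity(probe, candidate)
THRESHOLD = 0.90  # an operator-chosen cutoff, not a statement of identity

if score >= THRESHOLD:
    # High similarity means "geometrically alike", never "same person".
    print(f"candidate match, similarity={score:.4f}")
```

Note that nothing in this computation knows who either person is; the score only says how alike the two measured geometries are.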

NIST's Face Recognition Vendor Testing (FRVT) program — the most rigorous independent evaluation of facial comparison algorithms in existence — is explicit about this. FRVT outputs are similarity scores, not identifications. That interpretive leap from "high similarity" to "same person" is a human and investigative responsibility. The algorithm hands you a candidate. You have to do the rest.

"The man they were looking for, he was eight inches shorter than me and 70 pounds lighter." — Trevis Williams, wrongfully arrested New York City resident, ABC7 New York

The real kicker? Even a near-perfect similarity score doesn't rule out an entirely different person. Twin studies and doppelgänger research have repeatedly demonstrated that unrelated individuals can produce near-identical geometric facial profiles. The math can be perfect. The wrong person can still end up in handcuffs. That's not an algorithm problem. That's what happens when you skip the next three steps.


Why Image Quality Makes Everything Worse

Now add surveillance footage into the equation and the problem compounds fast. The images driving most real-world facial comparison work — CCTV grabs, social media screenshots, field photography — are almost never the clean, well-lit, front-facing shots that algorithms perform best on.

30–50%
Drop in facial comparison accuracy when working with low-resolution or degraded source images versus controlled-condition photography
Source: NIST Face Recognition Vendor Testing (FRVT), 2019 benchmark

A 2019 NIST benchmark found that low-resolution images significantly degrade algorithm performance — not by a few percentage points, but by 30 to 50 percent in controlled studies. Think about what that means practically. A "strong" match derived from a grainy parking lot camera at 2 a.m. carries dramatically less statistical weight than it appears to on a results screen. The confidence display doesn't always adjust to tell you that. The investigator has to know to ask.

Lighting angle alone matters enormously. The same face photographed under different lighting conditions can produce similarity scores that vary wildly — not because the algorithm is broken, but because facial geometry as captured is a function of shadow, resolution, and angle, not just bone structure. This is why understanding the technical limitations of facial recognition software isn't just academic — it directly changes how you weight a result when you're deciding what to do next.
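One way to internalize the quality problem is to discount a raw score for capture conditions before deciding what it is worth. The function below is a purely illustrative heuristic, not a calibrated model or any vendor's API: the discount tiers loosely mirror the 30–50 percent degradation range from the NIST FRVT low-resolution findings cited above, and the pixel cutoffs are assumptions.

```python
def effective_confidence(similarity, face_px, lighting_ok):
    """Illustrative heuristic: discount a raw similarity score for poor
    capture conditions. Tiers and cutoffs are assumptions, not a real model."""
    penalty = 0.0
    if face_px < 64:        # very low-resolution crop of the face region
        penalty += 0.40
    elif face_px < 128:
        penalty += 0.20
    if not lighting_ok:     # harsh shadows, extreme angle, night capture
        penalty += 0.15
    return similarity * (1.0 - min(penalty, 0.55))

# A "0.97 match" from a grainy parking-lot camera at 2 a.m. ...
weighted = effective_confidence(0.97, face_px=48, lighting_ok=False)
print(f"{weighted:.2f}")  # far weaker than the results screen implies
```

The exact numbers do not matter; the discipline of asking "what were the capture conditions?" before trusting a score does.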

Delhi's policing experience makes this painfully concrete. An investigation by The Wire and the Pulitzer Center uncovered cases where individuals were arrested solely on the basis of facial recognition — without solid corroborating evidence or credible witness testimony. One man, Ali, spent more than four and a half years in pre-trial incarceration after being arrested in the aftermath of Delhi's 2020 riots based on a facial match. Four and a half years. Before trial.

The Three Places Wrongful Arrest Cases Break Down

  • The match is treated as a conclusion — investigators stop investigating the moment a result appears, collapsing the corroboration step entirely
  • Physical descriptors are ignored — documented cases consistently show mismatches in height, weight, and build that were available from victim statements but never cross-referenced against the matched candidate
  • Location and timeline data is skipped — cell tower records, transaction data, and alibi witnesses can disprove a match within hours, but only if someone thinks to check before an arrest is made
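The second failure mode is the easiest to automate away. A cross-reference of witness descriptors against the matched candidate takes seconds, as this sketch shows; the tolerances and field names are assumptions for illustration, and the example numbers mirror the eight-inch, seventy-pound gap in the Williams case.

```python
def descriptor_mismatches(witness, candidate, height_tol_in=2, weight_tol_lb=25):
    """Flag the height/weight mismatches that documented wrongful-arrest cases
    show were available from victim statements but never checked.
    Tolerances are illustrative assumptions."""
    issues = []
    if abs(witness["height_in"] - candidate["height_in"]) > height_tol_in:
        issues.append("height mismatch")
    if abs(witness["weight_lb"] - candidate["weight_lb"]) > weight_tol_lb:
        issues.append("weight mismatch")
    return issues

# The Williams fact pattern: suspect described as 8 inches shorter
# and 70 pounds lighter than the matched candidate.
suspect_description = {"height_in": 66, "weight_lb": 160}
matched_candidate = {"height_in": 74, "weight_lb": 230}
print(descriptor_mismatches(suspect_description, matched_candidate))
# → ['height mismatch', 'weight mismatch']
```

Any non-empty result here should stop an arrest in its tracks pending further corroboration.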


The Three-Step Workflow That Actually Works

Georgetown Law's Center on Privacy & Technology has done formal analysis of documented facial comparison misidentification cases. Their finding is consistent across cases: the failure was procedural, not algorithmic. The workflow collapsed the corroboration step. So what does a workflow that doesn't collapse look like?

Think of a facial comparison result the way you'd think of a GPS pin drop. The GPS tells you approximately where to look. It does not confirm the address, verify the building number, or check that you're at the right door. Nobody arrests the GPS for sending them to the wrong street. The investigator is the last mile. Always.

Step one: AI face comparison. Run the comparison, review the similarity score, and treat the output as a shortlist of candidates — never a single confirmed identity. If the system returns a strong match, you now have a direction. That's genuinely useful. That's what the technology is for. But you are at the beginning of the investigative process, not the end.

Step two: Human review. A trained examiner — not the same person who ran the query — reviews the candidate result against the source image independently. This isn't redundancy for its own sake. It catches the errors that occur when confirmation bias sets in after a "strong" match result. The examiner should be asking: does this match hold up under different lighting conditions in the image? Are there distinguishing features the algorithm may have weighted incorrectly? Would I have landed on this candidate without the algorithmic result?

Step three: Independent corroboration. This is where cases are won or lost — and where Trevis Williams' case should have ended before he was ever arrested. Physical descriptors from the victim or witnesses must be cross-referenced against the candidate: height, weight, distinguishing marks, age range. Location data — cell tower records, transaction history, documented travel — must be checked against the timeline of the incident. At least one piece of independent evidence, wholly separate from the facial comparison, must place the candidate at the scene before any enforcement action is taken.
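The three steps above behave like gates in sequence: each one can only say "not yet", and all three must pass before enforcement. Here is a minimal sketch of that gating logic; the class, field names, and 0.90 threshold are illustrative assumptions, not a real case-management system's API.

```python
from dataclasses import dataclass, field

@dataclass
class Lead:
    candidate_id: str
    similarity: float
    examiner_confirmed: bool = False      # step 2: second, independent reviewer
    corroboration: list = field(default_factory=list)  # step 3: evidence items

def ready_for_enforcement(lead, threshold=0.90):
    """Sketch of the three gates; threshold and names are illustrative."""
    # Step 1: the comparison only produces a shortlist candidate.
    if lead.similarity < threshold:
        return (False, "below comparison threshold")
    # Step 2: independent human review by someone who didn't run the query.
    if not lead.examiner_confirmed:
        return (False, "awaiting independent examiner review")
    # Step 3: at least one piece of evidence wholly separate from the match.
    if not lead.corroboration:
        return (False, "no independent corroboration -- still just a lead")
    return (True, "corroborated lead")

lead = Lead("cand-042", similarity=0.97, examiner_confirmed=True)
print(ready_for_enforcement(lead))  # blocked: strong match, but no corroboration
lead.corroboration.append("cell tower places candidate at scene")
print(ready_for_enforcement(lead))  # now passes all three gates
```

The structural point: a 0.97 similarity score alone can never open the final gate, no matter how strong it looks.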

Notice what this workflow does. It doesn't distrust the technology. It uses the technology correctly — as a powerful narrowing tool — and then brings investigative discipline to bear on what the technology produced. Brazil's Polícia Civil do Distrito Federal offers a useful counterpoint: their biometric identification work achieves a 99 percent positive identification rate by integrating facial biometrics with fingerprint analysis and advanced latent print comparison — multiple independent evidence streams working together, not a single result standing alone.


Why This Protects Investigators, Not Just Suspects

Look, this isn't only about civil liberties — though that's obviously the most important part. It's also about what happens to investigators and agencies when the workflow fails. The NYPD is now facing demands for investigation from civil rights groups over the Williams arrest. Cases built on uncorroborated facial matches get dismissed, evidence gets suppressed, and affidavits get challenged. Prosecutors don't forget the agencies that handed them blown cases.

For solo investigators and private practitioners, the stakes are different but equally real. An affidavit that rests on a facial comparison without documented corroboration is an affidavit waiting to fall apart in cross-examination. The opposing attorney will ask one question: "And what independent evidence, other than the facial comparison result, places my client at the scene?" If the answer is nothing, the case is over.

Key Takeaway

A facial comparison result is the most useful first step in an identification workflow — and a dangerous last step. The technology narrows your candidate pool with mathematical precision. Corroborating that candidate with physical descriptors, timeline data, and independent evidence is what converts a lead into a case. Skip that step and you haven't used the technology wrong. You've just stopped using your judgment at the exact moment it matters most.

Here's the question worth sitting with: when you get a strong visual match between two photos, what is the very next piece of evidence you insist on before you trust it? If your answer is "another look at the photos," you're still inside the match. The corroboration that matters is everything outside it — the height, the timeline, the location, the witness. The algorithm got you to the door. Now go knock on it like a detective, not a machine.
