A 95% Match Score Sounds Certain. Here's the 3-Filter Process That Actually Makes It Trustworthy


This episode is based on our article:

Read the full article →


Full Episode Transcript


A facial match takes under two hundred and fifty milliseconds. A quarter of a second. But in that sliver of time, three separate filters decide whether the result you're looking at means anything at all — and most people never see any of them.



If you work anywhere near facial recognition — law enforcement, access control, corporate security — you've probably seen a confidence score pop up on screen. Maybe it said zero point nine five. And you probably assumed that meant ninety-five percent trustworthy. That assumption is wrong, and it's wrong in a way that can tank an investigation or lock out the right person. Over the next few minutes, we're going to walk through the three-filter process that actually determines whether a match is reliable — quality assessment, threshold tuning, and human review. So what really happens inside that quarter-second?

The first filter kicks in before the algorithm even touches your photo. It's called quality assessment, and it acts like a bouncer at the door. If your image has bad lighting, weird angles, or over-exposure, the system rejects it outright. No match attempt. No score. Just a rejection. And that quality check itself makes two kinds of mistakes. It can falsely reject a perfectly fine image, which wastes time and money. Or it can falsely accept a bad image, which poisons every result downstream. According to N.I.S.T.'s Face Analysis Technology Evaluation, poor photography can even create demographic effects. Under-exposure makes dark-skinned faces harder to match. Over-exposure does the same for fair-skinned faces. Even camera pitch matters — if the lens isn't adjusted for someone very tall or very short, the geometry shifts enough to degrade accuracy. So the photo you feed in isn't just a photo. It's the ceiling on how good your result can ever be.
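To make that bouncer analogy concrete, here is a minimal sketch in Python of what a pre-match quality gate might look like. The function name, the brightness cutoffs, and the pitch limit are all illustrative assumptions made for this sketch, not values from CaraComp, N.I.S.T., or any vendor.

```python
# Illustrative quality gate: reject an image before any matching is attempted.
# The thresholds below (brightness cutoffs, pitch limit) are assumed values for
# the sketch, not settings from any real system.
import numpy as np

def quality_gate(gray_image: np.ndarray, head_pitch_degrees: float):
    """Return (accepted, reason). gray_image is a 2-D array of 0-255 pixel values."""
    mean_brightness = float(gray_image.mean())

    if mean_brightness < 60:          # assumed under-exposure cutoff
        return False, "rejected: under-exposed"
    if mean_brightness > 200:         # assumed over-exposure cutoff
        return False, "rejected: over-exposed"
    if abs(head_pitch_degrees) > 20:  # assumed limit on camera/head pitch
        return False, "rejected: pitch angle too steep"

    return True, "accepted: passed quality assessment"

# Example: a dim frame from a poorly lit corridor camera never reaches the matcher.
dim_frame = np.full((112, 112), 40, dtype=np.uint8)
print(quality_gate(dim_frame, head_pitch_degrees=5.0))  # (False, 'rejected: under-exposed')
```

The point of the sketch is the ordering: the rejection happens before a score is ever produced, which is why the input photo sets the ceiling on the result.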

Now, assume the image passes. The algorithm runs and spits out a confidence score — a number between zero and one. And this is where the biggest misconception lives. Most investigators treat that score like a grade on a test. A zero point nine five feels like an A. It's easy to see why. The number looks like a percentage, and we're trained to think higher means better. But that score isn't a verdict. It's a probability estimate. What actually determines whether zero point nine five means anything is the threshold you've set underneath it. If your system's threshold sits at zero point five zero, you'll flag a lot of matches — but according to Microsoft's technical documentation on facial recognition, you'd misidentify roughly one face out of every ten. Crank that threshold up to zero point nine nine nine, and your false match rate drops to about one in a million. Same algorithm. Same faces. Completely different reliability. The tradeoff? Every time you tighten the threshold to block false matches, you create more false rejections — genuine matches the system now misses. It's like adjusting your speed on the road. You slow down through turns for safety, and you accelerate on the straightaway when you need to pass. Security teams slow down — they set strict thresholds because a false match is catastrophic. Convenience-focused systems speed up — they loosen thresholds because locking out a real user is the bigger problem. Neither setting is universally right. The threshold has to match the consequence.
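A tiny worked example makes the point. The sketch below runs the same zero point nine five score against two hypothetical thresholds; the threshold values and the profile labels are assumptions chosen to echo the operating points discussed above, not any product's defaults.

```python
# Same similarity score, different thresholds, different decisions.

def decide(similarity: float, threshold: float) -> str:
    return "match" if similarity >= threshold else "no match"

score = 0.95  # the "95%" confidence score from the matcher

for threshold, profile in [(0.50, "convenience-tuned"), (0.999, "security-tuned")]:
    print(f"{profile:18s} threshold={threshold:<6} -> {decide(score, threshold)}")

# convenience-tuned  threshold=0.5    -> match     (more false matches slip through)
# security-tuned     threshold=0.999  -> no match  (more genuine matches get rejected)
```

Same score, opposite outcomes. The number on screen never changed; only the consequence you were willing to accept did.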

So how good are these systems when everything's dialed in? According to N.I.S.T.'s Face Recognition Vendor Test, the top-performing identification system hit ninety-nine point eight eight percent authentication accuracy against a database of twelve million people. And the failure rate for searches dropped from five percent in twenty ten to just zero point two percent by twenty eighteen. That's a twenty-five-fold improvement in eight years. But N.I.S.T. runs those tests under controlled conditions — consistent lighting, good cameras, stable networks. Real-world performance varies enormously based on the algorithm you choose, the hardware you deploy, and the environment you're operating in. N.I.S.T. itself notes that capability across the industry still spans a very wide range. The best algorithm and the worst algorithm aren't even playing the same sport.


The Bottom Line

That's why the third filter exists — human review. Even after quality assessment and threshold tuning, a trained examiner looks at the result. The algorithm narrows the field. The human makes the call.

The score you see on screen isn't the answer. It's the last step of a three-stage process — and the two stages you never see are the ones that determine whether that number is worth trusting.

Every facial match result passes through three gates. First, a quality check decides if the photo is even usable. Second, a threshold setting controls how many false matches you're willing to accept. Third, a human reviewer confirms what the algorithm suggested. Skip any one of those gates, and your ninety-five percent score could mean almost nothing. Next time you see a confidence number on screen, don't ask how high it is. Ask what threshold produced it — and whether a human ever looked at the result. The full story's in the description if you want the deep dive.
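As a closing sketch, here is one way those three gates could be chained in code. Every function in it is a stand-in stub written for this example; none of it reflects a real matching API, and the zero point nine five score is hard-coded purely to illustrate the flow.

```python
# A minimal sketch of the three gates chained together. All three stage functions
# are placeholder stubs for illustration, not a real API.

def quality_gate(image) -> bool:
    """Gate 1 (stub): pretend every image passes the quality check."""
    return True

def matcher(probe, candidate) -> float:
    """Stub matcher: pretend the algorithm returned a 0.95 similarity score."""
    return 0.95

def human_review(probe, candidate) -> bool:
    """Gate 3 (stub): a trained examiner confirms or rejects the suggestion."""
    return True

def run_comparison(probe, candidate, threshold: float = 0.999):
    if not quality_gate(probe):                 # Gate 1: is the photo even usable?
        return "rejected at quality assessment"
    score = matcher(probe, candidate)           # the algorithm narrows the field
    if score < threshold:                       # Gate 2: the threshold decides
        return f"below threshold ({score:.2f} < {threshold})"
    if not human_review(probe, candidate):      # Gate 3: the human makes the call
        return "flagged by algorithm, not confirmed by examiner"
    return f"confirmed match (score {score:.2f})"

print(run_comparison("probe.jpg", "candidate.jpg"))
# -> "below threshold (0.95 < 0.999)" with the strict default; loosen the threshold
#    and the same 0.95 score moves on to human review instead.
```

Notice that the confidence score only matters at the second gate, and even then it never decides anything on its own. That's the whole argument in a dozen lines.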
