A 98% Match Score Can Still Mean a Fake: Why Liveness Detection Must Come First

Picture a lab bench with 1,500 attack attempts lined up and ready to go. Silicone masks molded to real faces. 3D-printed facial shells. Latex overlays. And AI-generated deepfake videos, indistinguishable to the naked eye from genuine captures. One biometric system. One question: how many of these does it let through?

According to testing reported by Biometric Update on Neurotechnology's MegaMatcher platform, the answer was zero. Not "nearly zero." Zero fake acceptances across all 1,500 sophisticated attack attempts — while incorrectly rejecting only 1 out of 550 legitimate users. That 1-in-550 figure matters more than it might seem, and we'll come back to it. But the bigger story isn't the score. It's what this test reveals about a fundamental error in how investigators have been evaluating evidence for the past decade.

TL;DR

A facial match score measures similarity between faces — not whether the face in your evidence is a real human capture. In a world of deepfakes and synthetic media, authenticity verification must happen first, or you're comparing a fiction to a fact and calling it forensics.

The Mistake Everyone Is Making (And Why It Made Sense Until Now)

For years, the investigator's mental model was clean and logical: collect the image or video, run facial comparison, get a confidence score, act on the result. A 95% match meant you likely had your person. A 60% match meant look elsewhere. The score was the answer.

Here's the problem. That workflow was designed for a world where the threat was wrong identity — was this the same person in two images? It was never designed to answer a different question entirely: is this image a genuine human capture at all? These are two separate questions requiring two separate tools, and conflating them is now an active liability.

A deepfake video can achieve a 98% similarity score against a real person in your database. The algorithm is doing exactly what it was built to do — comparing geometric relationships between facial landmarks with remarkable precision. It just has no mechanism to notice that the face it's measuring was assembled by a generative AI model at 2 a.m. and never existed in front of a camera. Similarity and authenticity are orthogonal. Always have been. We just didn't need to care about that distinction until synthetic media became cheap and convincing enough to show up in evidence files.


What PAD Levels Actually Test (This Is the Part Nobody Explains)

The industry has been quietly building a tiered defense system against exactly this problem. It's called Presentation Attack Detection, governed by the ISO/IEC 30107-3 standard, and it has three levels — each one calibrated to a different class of threat. Most people have heard the phrase "liveness detection" without ever understanding that there are meaningfully different grades of it.

Level 1 covers basic presentation attacks: printed photographs, phone or monitor screen replays, low-effort masks. If someone holds a photo of your CEO up to a camera to spoof an authentication system, Level 1 catches it. This is table stakes. Most commercial systems pass Level 1.

Level 2 is where it gets serious. Testing at this tier involves 2D paper masks with eye cutouts, curved 3D surface projections, balaclava-style overlays, shallow fakes (partial video manipulation), and AI-generated deepfakes. The attack artifacts at Level 2 are expensive and sophisticated — these aren't things someone improvises. They're professional fraud attempts. Passing Level 2 means a system has been tested against the actual threat profile that investigators and enterprise security teams face right now.

Level 3 enters lab-grade territory: hyper-realistic silicone prosthetics, fully synthetic faces generated by high-end models, attacks that would require significant resources to mount. Most operational deployments don't yet require this tier, but it exists and the standards are clear.

Here's the critical thing to understand: Level 1 certification tells you nothing about Level 2 performance. These aren't progressive grades on the same scale — they test fundamentally different attack categories. A system that blocks every photograph attack may completely fail against a competent deepfake. The two test regimes don't overlap. An investigator asking "does my tool have liveness detection?" without asking "what level is it certified for?" is roughly like asking "does this car have brakes?" without asking if they work above 20 mph.
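
To make the non-overlap concrete, here is a minimal policy-check sketch in Python. The enum values and the covers_threat function are illustrative assumptions, not any vendor's API: the point is that certifications are discrete attack categories, so a policy check should require the exact tier your threat model demands rather than treating the levels as a ladder.

```python
from enum import Enum

class PADLevel(Enum):
    """ISO/IEC 30107-3 presentation attack detection tiers (simplified)."""
    LEVEL_1 = "printed photos, screen replays, low-effort masks"
    LEVEL_2 = "3D masks, shallow fakes, AI-generated deepfakes"
    LEVEL_3 = "lab-grade silicone prosthetics, fully synthetic faces"

def covers_threat(certified: set[PADLevel], required: PADLevel) -> bool:
    # Require the exact tier: the levels test different attack categories,
    # so a Level 1 certification says nothing about Level 2 performance.
    return required in certified

# A tool certified only at Level 1, evaluated against a deepfake-era threat model:
tool_certifications = {PADLevel.LEVEL_1}
print(covers_threat(tool_certifications, PADLevel.LEVEL_2))  # False
```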

50B+ face liveness detection transactions projected annually by 2027
Source: HyperVerge liveness detection market analysis

That number — more than 50 billion liveness checks annually by 2027, up roughly 250% from 2025 totals, according to HyperVerge's market research — tells you something important about industry direction. Authenticity checking is no longer a niche concern for biometric hardware vendors. It's being baked into financial onboarding, border control, access management, and increasingly, forensic workflows. The field has already voted. The question is whether individual investigators have updated their practice to match.


The Crash Test Analogy That Makes This Click

Think of PAD certification tiers like vehicle crash-test ratings. Level 1 is the 5-mph barrier test — it verifies that the car handles low-speed parking impacts without structural damage. Every modern vehicle passes this. Level 2 is the 35-mph frontal collision, the one that actually determines whether occupants survive. These aren't the same test with a higher number attached. They measure completely different structural properties under completely different conditions.

A tool certified for Level 1 passes the basic test but will crumple at Level 2 attack speeds. An investigator can't compensate for this with skill or caution — if the tool's architecture wasn't built to detect mask artifacts and frequency-domain deepfake signatures, careful examination of the match score won't save you. The car's structure determines survivability. Your analysis workflow determines evidentiary validity.

"The MegaMatcher SDK and Toolkit now include Presentation Attack Detection (PAD) algorithms that have been independently tested and confirmed compliant with the ISO/IEC 30107-3 Presentation Attack Detection standard at Level 2." — Biometric Update, reporting on Neurotechnology's MegaMatcher update, Biometric Update

The New Two-Step Workflow — In the Right Order

The operational change required here is simple to describe and genuinely important to internalize. The old workflow was: evidence → facial comparison → similarity score → match/no match.

The new workflow is: evidence → liveness/authenticity check (first) → facial comparison → similarity score → match/no match.

If step two fails — if the media doesn't pass an authenticity check — everything downstream is invalid. You are not examining a face. You are examining a model's output. Feeding that into a facial comparison engine and trusting the result is like carbon-dating a piece of plastic and citing the result as historical evidence. The tool is working correctly. The input is the problem.
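
To make the ordering concrete, here is a minimal sketch of the gated pipeline in Python. The liveness_check and compare_faces callables are placeholders for whatever certified PAD tool and comparison engine you actually deploy; the names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class EvidenceResult:
    authentic: bool              # did the media pass the liveness/PAD gate?
    similarity: Optional[float]  # match score; None if the gate failed

def analyze(media: bytes,
            reference: bytes,
            liveness_check: Callable[[bytes], bool],
            compare_faces: Callable[[bytes, bytes], float]) -> EvidenceResult:
    # Step 1: authenticity gates everything downstream.
    if not liveness_check(media):
        return EvidenceResult(authentic=False, similarity=None)
    # Step 2: only a genuine capture earns a similarity score.
    return EvidenceResult(authentic=True,
                          similarity=compare_faces(media, reference))
```

If the gate fails, no similarity score ever exists to be misread downstream, which is the whole point of the ordering.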

That 1-in-550 false rejection rate from the Level 2 testing is worth revisiting here. In a forensic context, that statistic marks a threshold: roughly 0.18% of genuine video captures will be flagged by the automated system and need manual review. That's not a flaw — that's the system working as designed and telling you where human judgment is needed. Tools like CaraComp's AI face comparison platform operate in exactly this space — where automated analysis sets the boundaries and forensic expertise handles the edge cases that sit at those margins.
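
The arithmetic behind that threshold is worth seeing on the page. A quick back-of-envelope in Python, where the annual caseload figure is invented purely for illustration:

```python
# Back-of-envelope: what a 1-in-550 false rejection rate means operationally.
frr = 1 / 550                      # ~0.18% of genuine captures get flagged
caseload = 10_000                  # hypothetical annual volume of genuine media
expected_reviews = caseload * frr  # items routed to manual forensic review

print(f"FRR: {frr:.4%}")                                   # FRR: 0.1818%
print(f"Expected manual reviews: {expected_reviews:.0f}")  # 18
```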

What You Just Learned

  • 🧠 Match scores measure similarity, not authenticity — a deepfake can achieve a 98% confidence match against a real person in your database
  • 🔬 PAD Level 1 and Level 2 test completely different threats — Level 1 covers photos and screen replays; Level 2 covers masks, shallow fakes, and AI-generated deepfakes; they don't overlap
  • 📊 The industry has already shifted — 50+ billion annual liveness checks projected by 2027 means authenticity verification is now standard practice, not advanced technique
  • ⚠️ The correct workflow has two stages in a specific order — authenticity first, then comparison; reversing or skipping the first step invalidates the second entirely

The Question That Changes Everything

Three years ago, the first question an investigator asked about image evidence was: "Is this the same person?" Today, that question is still important — but it can only be asked second. The question that has to come first is: "Is this a real person, captured by a real camera, at a real moment in time?"

That's not a philosophical question. It's a technical one, with a testable answer, using tools that now have standardized certification tiers and published false acceptance rates. The methodology exists. The standards exist. The only remaining variable is whether the person reviewing the evidence knows to ask.

Key Takeaway

Facial comparison confidence scores measure how similar two faces are — they have no mechanism to detect whether either face is synthetic. In any case involving photo or video evidence, liveness and authenticity detection must happen before comparison, using tools certified to at least ISO/IEC 30107-3 Level 2. A match score without a prior authenticity check is not evidence of a real person. It's evidence that two mathematical representations resemble each other.

Here's the detail that should keep you up at night — or at least make you audit your current workflow. The 1,500 attacks that were blocked in Level 2 testing? Those weren't exotic laboratory curiosities. Silicone masks, 3D-printed faces, and AI deepfakes are commercially accessible today. The threat level assumed by Level 2 testing is already the operational environment. The only gap left is whether your tools — and your workflow — have caught up to it.

When you receive a key image or video in a case today, what's the very first thing you do to convince yourself it's genuine, not AI-generated or manipulated? That answer is worth examining carefully.
