Your 94% Face Match Just Became a €35M Problem

Here's a fact that stops most people cold: an AI facial comparison system can score a 94% match and still expose your organization to a €35 million fine. Not because the algorithm was wrong, but because nobody documented why a human trusted the result and acted on it.

TL;DR

A facial comparison score is evidence — not a verdict — and under EU law, using AI results without a documented audit trail and human review can cost more than the decision itself was worth.

This is the mistake that smart, well-meaning people make every single day with AI identity tools. They get a score. The score looks high. They act on it. They move on. What they don't realize is that the score is only one part of what makes an AI-assisted decision defensible — and under the EU AI Act, "defensible" now has a price tag attached to it.

How a Match Score Actually Works (And Why One Number Isn't the Whole Story)

Before we talk about what can go wrong, let's talk about what's happening inside the tool when it spits out "94% match."

Modern facial comparison doesn't look at a photo the way your eyes do. Instead, it converts a face into something called an embedding: a long list of numbers, produced by a trained model, that summarizes the face's distinguishing features. Think of it as turning your face into GPS coordinates. The system then does the same to a second photo and measures how far apart those two sets of coordinates are. A common way to measure that gap is Euclidean distance (the straight-line distance between two points, once you've mapped faces as points in space). Close together? Likely the same person. Far apart? Probably not.
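To make the "coordinates" idea concrete, here is a minimal sketch of the distance step. The four-number embeddings are invented toy values (real embeddings have hundreds of dimensions), and `euclidean_distance` is a hypothetical helper name, not any product's API.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy embeddings, invented for illustration only.
photo_a = [0.10, 0.80, 0.30, 0.50]
photo_b = [0.12, 0.79, 0.33, 0.48]  # assumed to be the same person as photo_a
photo_c = [0.90, 0.10, 0.70, 0.20]  # assumed to be a different person

near = euclidean_distance(photo_a, photo_b)
far = euclidean_distance(photo_a, photo_c)
print(f"A vs B distance: {near:.3f}")
print(f"A vs C distance: {far:.3f}")
```

The smaller number is the pair the system would treat as more likely to be the same person. Nothing in the calculation knows anything about identity; it only measures a gap.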

The confidence score you see — that "94% match" — is really just a way of saying: "these two face-maps are this close together." Simple enough, right?
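How a distance becomes a percentage varies by vendor and is often not disclosed. The sketch below assumes a simple linear scaling purely for illustration; `distance_to_score` and its `max_distance` parameter are hypothetical, not a real tool's formula.

```python
def distance_to_score(distance, max_distance):
    """Map an embedding distance to a 0-100 "match" percentage.

    Hypothetical linear mapping for illustration only; real products
    use their own calibration.
    """
    if distance < 0 or max_distance <= 0:
        raise ValueError("distance must be >= 0 and max_distance must be > 0")
    clipped = min(distance, max_distance)
    return 100.0 * (1.0 - clipped / max_distance)

same_distance = 0.12
score_a = distance_to_score(same_distance, max_distance=2.0)
score_b = distance_to_score(same_distance, max_distance=1.0)
print(f"{score_a:.0f}% vs {score_b:.0f}%")
```

Under these assumed scalings, the identical pair of photos reads as roughly 94% in one case and 88% in the other. The percentage depends on choices the score itself never shows you.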

Here's where it gets interesting. Someone, at some point, had to decide: how close is close enough? That decision is called setting a threshold, and moving it even slightly changes the outcome. Set it too tight, and you'll miss real matches (a legitimate person gets rejected). Set it too loose, and you'll flag false ones (two different people look like a match). According to 3DiVi, the balance between minimizing false positives and false negatives can be adjusted by setting thresholds based on specific requirements and use case. That adjustment only helps you if you know who set the threshold, for what purpose, and under what conditions.
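The trade-off can be seen with a handful of numbers. This is a sketch on invented data: each pair is a distance plus the (assumed known) truth about whether the two photos show the same person, and `error_counts` is a hypothetical helper.

```python
# (distance, same_person) pairs. Toy data invented for illustration.
pairs = [
    (0.20, True), (0.35, True), (0.50, True), (0.65, True),
    (0.45, False), (0.60, False), (0.80, False), (0.95, False),
]

def error_counts(pairs, threshold):
    """Count errors when any distance <= threshold is declared a match."""
    false_positives = sum(1 for d, same in pairs if d <= threshold and not same)
    false_negatives = sum(1 for d, same in pairs if d > threshold and same)
    return false_positives, false_negatives

for threshold in (0.40, 0.55, 0.70):
    fp, fn = error_counts(pairs, threshold)
    print(f"threshold={threshold:.2f}  false positives={fp}  false negatives={fn}")
```

On this toy data the tightest threshold produces no false positives but rejects two genuine matches, while the loosest does the reverse. No setting removes both kinds of error, which is why the choice has to be made, and recorded, for a specific use case.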

Your match score doesn't tell you any of that. It just tells you the result. Not the context behind it.

€35M

maximum penalty under the EU AI Act, reserved for violations of its ban on prohibited AI practices

Source: MD+DI

The Breathalyzer Problem

Think about how a breathalyzer works in a DUI stop. The device gives an officer a number — say, 0.09. But that number alone is never what shows up as evidence in court. The officer documents the equipment's calibration record, the time of day, the conditions of the test, the suspect's behavior, and every procedural step followed. The number is evidence. The paper trail is what makes it defensible evidence.

Facial comparison works exactly the same way. The match score is the number on the breathalyzer. The audit trail — who ran the comparison, which model version was used, what threshold was applied, which human reviewed it and signed off — is the calibration record. Without it, you don't have evidence. You have an assertion.

And unlike a breathalyzer reading, nobody's going to hand you a checklist at the scene.

"Audit trails transform AI from a black box into an accountable system with the evidence to prove it operates as intended." — Swept AI, on AI governance documentation

Why Smart People Get This Wrong

Nobody who trusts a match score too much is being reckless. They're being human.

A 94% confidence score sounds definitive. It's close to 100. It passed. Your brain does something very natural: it anchors on that single number and assumes the algorithm handled all the hard thinking already. The percentage feels like a grade — and a 94 is an A. Why would you question an A?

But here's the actual situation. According to research on explainable confidence scores in face verification published on arXiv, predictions that fall close to the matching threshold are inherently uncertain, even when the number looks high. Depending on where the threshold sits, a result of 94% may mean only "this pair of faces just cleared the match line." It does not mean "this is definitely the same person." The distance between those two ideas is where innocent people get caught in bad decisions.
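One way to act on that uncertainty is to label results by how far they sit from the threshold, not just which side they land on. The sketch below is an assumption-laden illustration: the threshold of 92, the margin of 3 points, and the `triage` function are all hypothetical values and names, not figures from the research or from any product.

```python
def triage(score, threshold, margin):
    """Label a score relative to the threshold, flagging near-threshold results."""
    if margin < 0:
        raise ValueError("margin must be non-negative")
    if abs(score - threshold) <= margin:
        return "borderline"
    return "above" if score > threshold else "below"

THRESHOLD = 92.0  # hypothetical match line
MARGIN = 3.0      # hypothetical width of the "uncertain" band

for score in (99.0, 94.0, 80.0):
    print(f"{score:.0f}% -> {triage(score, THRESHOLD, MARGIN)}")
```

With these assumed settings, the 94% result lands in the borderline band even though it looks like an A. A borderline label is a prompt for closer human scrutiny, not a substitute for it.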

There's also a second thing the number hides: which type of error the system is optimized to avoid. A false positive is when the system says two different people are a match (that's usually the dangerous error in identity work). A false negative is when the system fails to match two photos of the same person. These trade off against each other: reduce one and you typically increase the other. No single confidence score tells you which trade-off was chosen, or whether it was the right one for your situation.

This is why the misconception is so understandable, and also so consequential. The number looks like the answer. It's really just one data point in a decision that should also include context, human judgment, and documentation of all three.

What the EU AI Act Actually Cares About

The EU AI Act, which entered into force on August 1, 2024 and phases in its obligations over time, with most high-risk AI system obligations applying from August 2, 2026, is not primarily a law about accuracy. That surprises most people. They assume regulators care most about whether the tool gets the right answer.

Regulators care about something harder to fake: can you show your work?

Under the Act, high-risk AI systems (and facial comparison in professional identity contexts almost certainly qualifies) must be designed so that human oversight is possible, with appropriate levels of accuracy, and — critically — with a quality management system that documents how compliance is maintained. As MD+DI explains in their breakdown of the Act's phased timeline, a confidence score alone satisfies none of these requirements.

What satisfies them? According to Kiteworks, a compliant audit trail includes: the timestamp of the comparison, the identity of the user who ran it, the decision rationale (why this result was trusted), any human approvals or overrides, and records of errors or guardrail actions. That's not a policy binder sitting on a shelf. It's a living record generated by your process, every single time you use the tool.
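The fields in that list map naturally onto a small record written every time a comparison is run. This is a minimal sketch, not a compliance template: the `AuditRecord` class, its field names, and the sample values are all hypothetical.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class AuditRecord:
    """One entry per comparison. Field names are illustrative assumptions."""
    user_id: str                                   # who ran the comparison
    decision_rationale: str                        # why the result was trusted
    approvals: list = field(default_factory=list)  # human approvals or overrides
    errors: list = field(default_factory=list)     # errors or guardrail actions
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

record = AuditRecord(
    user_id="analyst-17",
    decision_rationale="Match consistent with supporting documentation.",
    approvals=["reviewer: Jane Smith, approved"],
)

# One JSON object per line is easy to append to a log file and search later.
line = json.dumps(asdict(record))
print(line)
```

The point of the rationale field is that it cannot be filled in by the tool. It is the part of the record that shows a person made a decision.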

Most current AI workflows don't produce this. Most generate some kind of log — but logs that record inputs and outputs aren't the same as records that document why a human decided to act on a result. That gap is exactly where the liability lives.

What You Just Learned

🧠 A match score is a distance measurement — it tells you how close two face-maps are, not whether two people are the same person
🔬 Thresholds determine what "close enough" means — and without knowing who set the threshold and why, the score has no context
💡 EU law measures governance, not just accuracy — documentation, human review, and explainability are the legal standard, not the percentage score
🧠 Most current AI logs don't pass audit — recording a result is not the same as documenting why a human trusted it

What a Defensible Process Actually Looks Like

So what does "doing it right" look like in practice? It's less complicated than the regulation makes it sound, but it requires changing a habit.

Before you act on an AI comparison result, five things need to be on record: the match score, the model version that produced it, the threshold that was applied (and who set it), the name of the human reviewer who assessed the result, and a plain-English note explaining why that reviewer trusted it enough to proceed. Not a checkbox. An actual reason.

That last part is the one most workflows skip. It feels redundant when the score is high. It feels obvious when the photos look similar. But six months later, in a dispute or an audit, "the score was 94%" is not an explanation. "The score was 94%, the threshold was validated for this use case, reviewer Jane Smith confirmed the match was consistent with supporting documentation" — that's an explanation.
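A simple gate can enforce the habit: refuse to proceed until all five items are on record. The sketch below is one possible shape, with hypothetical key names (the threshold and who set it are stored as two keys) and a hypothetical `missing_items` helper. It checks only that something was written, not that the reason is a good one.

```python
REQUIRED_ITEMS = (
    "match_score",
    "model_version",
    "threshold",
    "threshold_set_by",
    "reviewer",
    "rationale",
)

def missing_items(record):
    """Return the required items that are absent or blank in the record."""
    missing = []
    for key in REQUIRED_ITEMS:
        value = record.get(key)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(key)
    return missing

score_only = {"match_score": 94.0}
documented = {
    "match_score": 94.0,
    "model_version": "example-model-1.0",   # placeholder value
    "threshold": 92.0,                      # placeholder value
    "threshold_set_by": "compliance team",  # placeholder value
    "reviewer": "Jane Smith",
    "rationale": "Match consistent with supporting documentation.",
}

print("score only:", missing_items(score_only))
print("documented:", missing_items(documented))
```

A record holding only the score fails the check on every other item, which is exactly the situation "the score was 94%" describes.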

At CaraComp, this is the principle that shapes how we think about facial comparison in professional identity work: accuracy is a starting point, not a destination. The comparison shows you something worth examining. The human reviewer, the documented reasoning, and the audit record are what transform that observation into a defensible decision.

According to the Observer Research Foundation, bias evaluation and mitigation techniques make it possible to identify and remedy deviations, but only when documentation provides a trail for transparency and accountability. Documentation isn't the bureaucratic part of using AI. It is the work.

Key Takeaway

A facial comparison score tells you the algorithm's opinion. Documentation, human review, and a clear audit trail are what turn that opinion into something you can defend — in court, in a compliance audit, or in a conversation with someone whose identity was affected by the decision.

Here's the question worth sitting with: if you had to defend an AI-assisted identity result six months from now, would your notes explain why you trusted it — or would they only show the final score?

Because the fine isn't for getting the answer wrong. It's for not being able to show how you got the answer at all.
