That Facial Match Score Is Lying to Your Face
This episode is based on our article:
Read the full article → That Facial Match Score Is Lying to Your Face
Full Episode Transcript
Every time your phone unlocks with a glance, it isn't recognizing your face. It's measuring the distance between two points in a space with a hundred and twenty-eight dimensions. And that distance can lie.
That might sound abstract, but it touches everyone. If you've ever been tagged in a photo you didn't post, or watched your phone unlock for you in a split second, facial comparison math is already running on your data. And if the idea of an algorithm deciding whether you are who you say you are makes you uneasy — that's a reasonable response. Because most people, including many professionals who rely on these systems, don't understand what's actually happening behind that match score. Today we're going to open the hood on facial comparison — not the marketing version, but the actual math. How does a machine turn your face into numbers, and why do those numbers break down exactly when the stakes are highest?
The starting point is something called an embedding. A facial recognition system doesn't store a picture of your face. It converts your face into exactly a hundred and twenty-eight numerical values — a string of decimals like zero-point-two-three, negative zero-point-four-five, zero-point-seven-eight, and so on. According to the original Google FaceNet research published on arXiv, modern systems achieve their performance using only a hundred and twenty-eight bytes per face. That's it. Your entire identity, as far as the algorithm is concerned, is a list of a hundred and twenty-eight numbers.
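To make that concrete, here's a minimal sketch. It assumes the open-source `face_recognition` library (a dlib wrapper that happens to produce exactly this kind of 128-number embedding); the file name is a placeholder.

```python
# A minimal sketch: turning a photo into the 128-number list described above.
# Assumes the open-source `face_recognition` library; "my_face.jpg" is a placeholder.
import face_recognition

image = face_recognition.load_image_file("my_face.jpg")
encodings = face_recognition.face_encodings(image)  # one 128-value vector per face found

embedding = encodings[0]
print(embedding.shape)  # (128,)
print(embedding[:3])    # something like [ 0.23 -0.45  0.78]
```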
Two different photos of you will produce very similar lists. Two photos of you and a stranger will produce very different lists. The system then measures the gap between those two lists using something called Euclidean distance — basically, how far apart two points sit in a hundred-and-twenty-eight-dimensional space. A distance of zero means the faces are mathematically identical. A distance of four means they're completely different people. And somewhere in between, there's a cutoff line — a threshold — where the system decides: match or no match.
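If you want to see how little machinery is involved, here's a minimal sketch in plain numpy. The one-point-one cutoff is the threshold cited later in this episode; treat it as illustrative, not a universal constant.

```python
import numpy as np

def euclidean_distance(a, b):
    # Straight-line distance between two points in 128-dimensional space.
    return np.linalg.norm(a - b)

def is_match(emb_a, emb_b, threshold=1.1):
    # 0.0 would mean mathematically identical; around 4.0, completely different.
    # Everything hinges on which side of the threshold the distance falls.
    return euclidean_distance(emb_a, emb_b) < threshold
```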
The article from Holistic News puts it this way. Imagine converting a face into G.P.S. coordinates inside a vast city with a hundred and twenty-eight dimensions. Two photos of the same person land at nearly identical coordinates. Two photos of different people land miles apart. But if the photo is dark, or the face is turned, or half-hidden — the signal scrambles, and the algorithm drops the pin in the wrong neighborhood.
So where does the threshold come from? That's where the system learns. During training, the network looks at three faces at once — a reference face, a second photo of the same person, and a photo of someone else. It adjusts its internal math to push the matching pair closer together in that hundred-and-twenty-eight-dimensional space and shove the non-matching face further away. Millions of these triplet comparisons teach the system what "same" and "different" look like — geometrically. For anyone who's trained a dog with treats and corrections, the logic is similar. Reward closeness, penalize distance, repeat millions of times.
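The training signal itself fits in a few lines. Here's a numpy sketch of that triplet comparison; the zero-point-two margin is an illustrative value matching the one in the original FaceNet paper, and real systems compute this over large batches inside a deep-learning framework.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # anchor/positive: two embeddings of the same person.
    # negative: an embedding of someone else.
    d_same = np.sum((anchor - positive) ** 2)   # reward: should shrink
    d_diff = np.sum((anchor - negative) ** 2)   # penalty: should grow
    # Loss hits zero only once "different" is at least `margin` farther away than "same".
    return max(d_same - d_diff + margin, 0.0)
```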
Now, on clean, well-lit, front-facing photos, these systems are extraordinary. According to researchers using the Labeled Faces in the Wild benchmark, modern models score above ninety-nine percent accuracy. A threshold of about one-point-one in that hundred-and-twenty-eight-dimensional space classifies nearly every pair correctly under those conditions. That number is real. But it creates a dangerous illusion.
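That headline accuracy is just the fraction of labeled photo pairs a single threshold classifies correctly. A sketch of the scoring, assuming you already have the distances and ground-truth labels as numpy arrays:

```python
import numpy as np

def verification_accuracy(distances, same_person, threshold=1.1):
    # distances: embedding distance for each photo pair.
    # same_person: True where the pair really is the same person.
    predicted_same = distances < threshold
    return np.mean(predicted_same == same_person)
```

Run this on clean lab pairs and you get the ninety-nine percent. Nothing in the number tells you the pairs were clean.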
Vendors publish that number because it comes from clean, controlled lab conditions. And most buyers — agencies, companies, even courts — never think to ask what happens when the conditions aren't clean. According to peer-reviewed research published in ScienceDirect, when a face is partially blocked — by sunglasses, a hat, a hand, anything covering thirty to forty percent of the face — recognition rates collapsed below forty percent. Let that land. Below forty percent means the algorithm got it wrong more often than it got it right. More than half the images in the test database were wrongly matched. A partial face doesn't just make the embedding a little less accurate. It corrupts the embedding entirely. The hundred and twenty-eight numbers the system generates aren't slightly off — they can be fundamentally broken. And the system has no way to tell you that.
For an investigator comparing a grainy surveillance still to a clean mugshot, that's the difference between evidence and noise. For the rest of us, it means a system that works flawlessly when you look straight at your phone can fail spectacularly on a candid photo taken from the side in bad light.
Which brings us to the number that fools almost everyone — the confidence score. When a system says "ninety-five percent match," it feels like an A grade. It feels safe. But that score only describes one pair of images at one moment. It's a local measurement, not a global guarantee. Run that same algorithm across a database of a hundred thousand faces, and false positives start to stack. A ninety-five percent score on a single comparison tells you almost nothing about whether that match will hold up at scale. That's the gap between what the number says and what people hear.
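You can see the stacking effect with a back-of-the-envelope calculation. The false-match rate below is hypothetical, and the independence assumption is a simplification, but the shape of the result is the point:

```python
# Hypothetical per-comparison false-match rate; real rates vary by system.
p_false_match = 0.001      # 0.1% -- sounds tiny
database_size = 100_000

# Assuming independent comparisons (a simplification):
p_at_least_one = 1 - (1 - p_false_match) ** database_size
expected_false_hits = p_false_match * database_size

print(f"Chance of at least one false match: {p_at_least_one:.4f}")  # effectively 1.0
print(f"Expected false matches: {expected_false_hits:.0f}")         # about 100
```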
The Bottom Line
One more layer. Deepfakes make all of this harder. According to current detection benchmarks compiled by ScreenApp, the best deepfake detection tools reach ninety to ninety-six percent accuracy. Ninety-six percent sounds high — until you realize that means four out of every hundred synthetic faces slip through undetected. Tools like Intel's FakeCatcher try to catch fakes by analyzing something called photoplethysmography — tiny color changes in human skin caused by blood flow. Real skin pulses with each heartbeat in ways that are invisible to your eye but measurable by a sensor. Deepfake generators don't simulate blood flow, so its absence becomes a signal. That's a detection method no human could replicate just by looking.
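To be clear about what that kind of signal looks like: the sketch below is not Intel's FakeCatcher, just an illustration of the photoplethysmography idea. Average the green channel of a face crop over time and check for energy in the human heart-rate band. The array shapes, band limits, and scoring here are all assumptions.

```python
import numpy as np

def pulse_band_score(frames, fps=30.0):
    # frames: array of shape (n_frames, height, width, 3) -- a cropped face over time.
    # Blood flow faintly modulates skin color; the green channel carries most of it.
    green = frames[:, :, :, 1].mean(axis=(1, 2))
    green = green - green.mean()

    # Look for dominant frequencies in the human heart-rate band,
    # roughly 0.7-3 Hz (42-180 beats per minute).
    spectrum = np.abs(np.fft.rfft(green))
    freqs = np.fft.rfftfreq(len(green), d=1.0 / fps)
    in_band = (freqs >= 0.7) & (freqs <= 3.0)

    # Crude score: share of signal energy in the heart-rate band.
    # Real skin should score noticeably higher than a synthetic face.
    return spectrum[in_band].sum() / (spectrum.sum() + 1e-9)
```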
Your eye asks "does that look like the same person?" The algorithm asks "are these two points close enough in a hundred-and-twenty-eight-dimensional space, given these specific image conditions?" Those are fundamentally different questions — and only the second one is defensible.
So — three things to carry with you. First, facial recognition doesn't see your face. It converts it into a hundred and twenty-eight numbers and measures the distance between them. Second, that math works beautifully on clean photos and can fall apart completely when a face is angled, obscured, or poorly lit. Third, a high confidence score describes one comparison, not overall reliability — and it never tells you when its own data is corrupted. Whether you evaluate evidence for a living or you just want to know what your phone is actually doing when it looks at you, understanding the math is how you stop being at its mercy. The full story's in the description if you want the deep dive.
More Episodes
Every Image Is Guilty Until Proven Authentic
Deepfake Fraud Tripled to $1.1B. Your Evidence Workflow Didn't.
A Facial Recognition 'Match' Isn't Evidence Until It Survives These 4 Hidden Steps
