The Deepfake Type Investigators Keep Missing — and Why It's About to Dominate Fraud

Full Episode Transcript

A genuine video of someone talking produces a score of about zero point one six when you measure how tightly their lip movements sync with their voice. A lip-synced deepfake scores around zero point six three. That's roughly a four-fold gap, and almost nobody screening for fakes is measuring it.

If you've ever watched a badly dubbed movie, you already know what audio-visual mismatch feels like. Something's off, but you can't quite name it. Now imagine that mismatch is subtle enough to fool you in a video call, a courtroom exhibit, or a clip your kid shares on social media. That uneasy feeling you get when you wonder whether a video is real? It's not paranoia. According to fraud-monitoring platform Sumsub, deepfake fraud attempts jumped seven hundred percent year over year by twenty twenty-six. This isn't a future problem. It's already operating at industrial scale. So the question isn't just "is this video fake?" It's "what kind of fake is it?" — because the answer changes everything about how you catch it.

Most people lump all deepfakes into one bucket. That's understandable. The word "deepfake" gets used like it's a single technology, and consumer detection tools mostly train on celebrity face-swaps — the most visible, most dramatic type. But researchers who study manipulated video split human-face deepfakes into at least four distinct categories: face swapping, face reenactment, lip-syncing, and face animation. For practical detection, those four collapse into two forensic buckets. Entire-face synthesis — where the whole face gets replaced or generated from scratch. And partial manipulation — where only a piece of the face changes, usually the mouth. That distinction matters because each bucket leaves completely different traces.

Face-swaps are the ones you've probably seen debunked online. Someone pastes one person's face onto another person's body. The forensic clues live in geometry and behavior. The face might warp slightly at the edges. The expressions might not match the body language. Identity markers — the proportions between eyes, nose, and jawline — don't quite line up with how that person actually moves. Detection tools trained on face-swaps look for exactly those visual anomalies: warping artifacts, irregular facial movements, boundary inconsistencies.

Now — what happens when the face isn't swapped at all? In a lip-sync deepfake, the person's identity stays the same. Only the mouth region gets altered to match a different audio track. The face looks right because it is the real face. And that's exactly why investigators trained on face-swap detection keep missing these. They're scanning for the wrong artifacts. It's like searching a room for a broken window when the intruder picked the lock.

According to researchers publishing through IEEE and C.V.P.R., the forensic signature of a lip-sync fake lives in timing, not appearance. In natural speech, certain sounds force your mouth into specific shapes. The sounds P, B, and M all require your lips to close completely. If you muted a real video and tried to read the speaker's lips, then turned the sound back on, the words would match. In a lip-synced deepfake, that coordination breaks down because the mouth was reanimated after the original recording. A single freeze-frame might look perfectly fine. But across a sequence of frames, the mouth drifts out of alignment with the voice.
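For listeners who think in code, here is a rough sketch of that lip-closure idea. It is an illustration only, not the researchers' method. It assumes you already have two things the episode does not describe how to get: timestamps where a P, B, or M sound occurs, and a per-frame measurement of how far apart the lips are. The function name, the input shapes, and both threshold values are hypothetical.

```python
from typing import List, Optional, Tuple


def bilabial_closure_rate(
    bilabial_times: List[float],
    lip_gap_by_time: List[Tuple[float, float]],
    closed_below: float = 0.05,  # hypothetical: gap under this counts as "closed"
    tolerance: float = 0.05,     # hypothetical: seconds of slack around each sound
) -> Optional[float]:
    """Fraction of P/B/M moments where a nearby frame shows closed lips.

    bilabial_times: seconds at which a P, B, or M sound occurs (assumed input).
    lip_gap_by_time: (frame_time_seconds, normalized_lip_gap) pairs (assumed input).
    Returns None when there are no P/B/M moments to check.
    """
    if not bilabial_times:
        return None
    hits = 0
    for t in bilabial_times:
        # Collect lip gaps from frames close enough in time to this sound.
        nearby = [gap for ft, gap in lip_gap_by_time if abs(ft - t) <= tolerance]
        if nearby and min(nearby) < closed_below:
            hits += 1
    return hits / len(bilabial_times)


# Made-up measurements for illustration only.
synced_frames = [(0.00, 0.30), (0.04, 0.02), (0.08, 0.25)]
drifted_frames = [(0.00, 0.30), (0.04, 0.28), (0.08, 0.25)]

synced_rate = bilabial_closure_rate([0.04], synced_frames)
drifted_rate = bilabial_closure_rate([0.04], drifted_frames)
```

A low rate across many P, B, and M sounds would be a reason to look closer, not proof of manipulation on its own.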

Researchers quantified that drift. They measured something called audio-visual distance — basically a number representing how tightly the lips track the sound. Real videos cluster around a median of zero point one six. Lip-sync deepfakes land between zero point six three and zero point six six. Every authentic video in their dataset fell below a threshold of zero point five. And ninety-seven point five percent of synthetic lip-syncs scored above it. That's not a subtle difference. That's a measurable canyon — but only if you know to measure it.
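To make that threshold concrete, here is a minimal sketch of the screening step. It assumes the audio-visual distance has already been computed by some other tool, since the episode does not describe how the researchers calculate it. The zero point five cutoff is the one quoted above; the function names and the sample scores are made up for illustration.

```python
from statistics import median
from typing import Dict, List

# Cutoff quoted in the episode: every authentic video in the dataset scored below it.
AV_DISTANCE_THRESHOLD = 0.5


def flag_lip_sync(av_distance: float, threshold: float = AV_DISTANCE_THRESHOLD) -> bool:
    """Return True when a precomputed audio-visual distance exceeds the cutoff."""
    return av_distance > threshold


def summarize(scores: List[float]) -> Dict[str, float]:
    """Report the median score and how many clips would be flagged."""
    flagged = sum(1 for s in scores if flag_lip_sync(s))
    return {"median": median(scores), "flagged": flagged, "total": len(scores)}


# Hypothetical scores, chosen to resemble the ranges mentioned in the episode.
real_like = [0.12, 0.16, 0.21]
fake_like = [0.58, 0.63, 0.66]

real_summary = summarize(real_like)
fake_summary = summarize(fake_like)
```

The cutoff comes from one research dataset, so anyone applying it to other footage should treat it as a starting point to validate rather than a fixed rule.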

For anyone who's ever been on a video call and thought "something seemed off about that person," this is the mechanism underneath that instinct. Your brain detects audio-visual mismatch even when you can't articulate what's wrong.

One more layer makes lip-syncs especially dangerous. Because they only modify a small region of the face, they're lightweight enough to run in real time. A live video call, a streaming broadcast — the algorithm only needs to render the mouth, not an entire face. But speed costs quality. Teeth are one of the hardest things for these algorithms to animate convincingly under time pressure. In real-time lip-sync deepfakes, teeth often appear too uniform, too white, slightly blurry, or they shift position unnaturally mid-sentence. That's a speed-quality tradeoff baked into the technology. So if you're on a video call and the person's teeth look oddly perfect or seem to glitch during speech, that's worth noticing.

Commercial detection platforms have adapted by running multiple specialized models at once. One model targets face-swaps. Another targets lip-syncs. Another listens for cloned voices. Each model is tuned to different artifacts. But the ensemble only works if someone first classifies what type of manipulation they're dealing with. Running a face-swap detector on a lip-sync deepfake is like running a spell-checker to find math errors. The tool isn't broken. It's just pointed at the wrong problem.
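The classify-first idea can be sketched as a simple routing table. This is a toy illustration, not how any commercial platform is actually built. The three detector functions are hypothetical stand-ins that only return a description of what a real model would look for.

```python
from typing import Callable, Dict

# Each detector takes a clip path and returns a short report string.
Detector = Callable[[str], str]


def face_swap_detector(clip: str) -> str:
    # Stand-in for a model tuned to warping and boundary artifacts.
    return f"{clip}: checked geometry, warping, and face boundaries"


def lip_sync_detector(clip: str) -> str:
    # Stand-in for a model tuned to audio-visual timing.
    return f"{clip}: checked audio-visual timing of the mouth"


def voice_clone_detector(clip: str) -> str:
    # Stand-in for a model that listens for cloned voices.
    return f"{clip}: checked the audio track for voice cloning"


DETECTORS: Dict[str, Detector] = {
    "face_swap": face_swap_detector,
    "lip_sync": lip_sync_detector,
    "voice_clone": voice_clone_detector,
}


def route(clip: str, manipulation_type: str) -> str:
    """Send the clip to the detector that matches the suspected manipulation type."""
    detector = DETECTORS.get(manipulation_type)
    if detector is None:
        raise ValueError(f"no detector registered for {manipulation_type!r}")
    return detector(clip)


report = route("call_recording.mp4", "lip_sync")
```

The point of the table is the lookup itself: the right tool only gets picked if the manipulation type is named first.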

The Bottom Line

The shift isn't about "real versus fake." It's from "is this fake?" to "what kind of fake is it?" Classification comes before detection, and skipping that step is exactly how lip-sync fakes keep slipping through.

So — three things to carry with you. One: deepfakes aren't one thing. Face-swaps replace who you're looking at. Lip-syncs change what they appear to say. Two: each type breaks in a different place — geometry for face-swaps, audio-visual timing for lip-syncs. Three: before you trust any detection tool or your own eyes, ask which type you're dealing with first. Whether you analyze evidence for a living or you're just trying to figure out if a video in your group chat is real, that one question — "what kind of fake?" — is the one that actually protects you. The written version goes deeper — link's below.
