Combating Falsification of Speech Videos with Live Optical Signatures

Columbia University · Massachusetts Institute of Technology
ACM CCS 2025
[Figure: Overview of the VeriLight system]

VeriLight is a platform that protects high-profile speech videos from visual falsification. It creates signatures physically at live speech events using imperceptible modulated light, so that they are naturally embedded into all authentic recordings.

Motivation

Recent years have seen a surge in falsified videos of high-profile speech events, driven by the rise of deepfake technology and other advanced video editing tools. Digital techniques for authenticating videos, like watermarking or content signing, are a promising solution to this growing problem. But they succeed only if all parties – from recording audience members to downstream video editors – cooperate by adding and retaining the appropriate credentials in their videos. This work explores a complementary physical approach that ensures all authentic videos of a speech can be verified, with no assumptions of external cooperation.

Our Approach

VeriLight creates dynamic physical signatures at the speech site and embeds them into all video recordings via imperceptible modulated light. These physical signatures encode features unique to the event and are cryptographically secured to prevent spoofing. The signatures can be extracted from any video downstream and validated to check the content's integrity.

This work focuses on combating visual falsification of speaker identity and lip/facial motion – two particularly popular and consequential forms of manipulation. Experiments on extensive video datasets and five deepfake models show VeriLight achieves AUCs ≥ 0.99 and a true positive rate of 100% in detecting such falsifications, outperforming ten state-of-the-art passive deepfake detectors. Further, VeriLight is robust across recording conditions, video post-processing techniques, and white-box adversarial attacks.

How it works

A speaker places a low-cost VeriLight core unit at her event. The core unit continually observes the scene and extracts two forms of visual features: 1) identity features – facial representations capturing the speaker's identity, and 2) dynamic features – face and lip motion signals related to the delivered speech content (shown below).
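At a high level, the per-window extraction could look like the Python sketch below. Here embed_face and lip_landmarks are hypothetical placeholders of our own, not the system's actual models; any face-recognition embedder and facial-landmark tracker could fill these roles.

import numpy as np

# Hypothetical stand-ins for the real extractors (not specified here):
# any face-recognition embedder and facial-landmark tracker would do.
def embed_face(frame: np.ndarray) -> np.ndarray:
    """Return an identity embedding, e.g. a 512-d face descriptor."""
    raise NotImplementedError  # plug in a real face-recognition model

def lip_landmarks(frame: np.ndarray) -> np.ndarray:
    """Return lip/face landmark coordinates for one frame."""
    raise NotImplementedError  # plug in a real landmark tracker

def window_features(frames: list[np.ndarray]) -> tuple[np.ndarray, np.ndarray]:
    """Compute one identity vector and one motion signal per speech window."""
    identity = embed_face(frames[len(frames) // 2])        # identity features
    motion = np.stack([lip_landmarks(f) for f in frames])  # dynamic features
    return identity, motion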
The features are compressed using locality-sensitive hashing and secured with an HMAC to form a signature. The core unit encodes the signature data as modulated light that is invisible to the naked eye, yet manifests in videos as pixel-level signals that can be localized and decoded. The video below shows the decoding of one patch of the embedded signature.
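A minimal sketch of the signing step follows, assuming random-hyperplane LSH and an HMAC over the packed hash bits; the dimensions, key handling, and choice of SHA-256 are illustrative, not the paper's exact parameters.

import hmac
import hashlib
import numpy as np

def lsh_bits(features: np.ndarray, planes: np.ndarray) -> bytes:
    """Random-hyperplane LSH: one sign bit per hyperplane, packed into bytes.

    Nearby feature vectors land on the same side of most hyperplanes,
    so their hashes differ in only a few bits.
    """
    bits = (planes @ features) >= 0
    return np.packbits(bits).tobytes()

def sign_window(identity, motion, planes_id, planes_mo, key: bytes) -> bytes:
    """Compress both feature streams with LSH, then bind them with an HMAC
    so a forger without the event key cannot fabricate a valid signature."""
    digest = lsh_bits(identity, planes_id) + lsh_bits(motion.ravel(), planes_mo)
    tag = hmac.new(key, digest, hashlib.sha256).digest()
    return digest + tag  # payload broadcast via the modulated light

# Illustrative sizes: 64-bit hashes of a 512-d identity vector and a
# 30-frame x 40-value motion signal (e.g., 20 2-D lip landmarks per frame).
rng = np.random.default_rng(1)
planes_id = rng.standard_normal((64, 512))
planes_mo = rng.standard_normal((64, 30 * 40))
signature = sign_window(rng.standard_normal(512),
                        rng.standard_normal((30, 40)),
                        planes_id, planes_mo, key=b"demo-event-key")

The hyperplanes act as the locality-sensitive hash family: small, benign perturbations of the features flip few bits, while a different identity or different motion flips many.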
A published video can be verified by extracting its embedded signatures and comparing the recovered feature hashes to those computed on the portrayed speech. If the difference between a recovered hash and its recomputed counterpart exceeds a threshold, VeriLight reports a falsification. This comparison is made for successive windows of the speech, enabling temporal localization of manipulations.
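Verification then reduces to an HMAC check plus a Hamming-distance comparison per window, assuming the verifier holds the event key. In this sketch the threshold is an arbitrary placeholder, and the per-feature-stream comparison described above is folded into one combined hash for brevity.

import hmac
import hashlib

def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length hashes."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def verify_window(payload: bytes, recomputed_digest: bytes,
                  key: bytes, threshold: int = 8) -> bool:
    """Check one speech window of a published video.

    payload           -- LSH digest || HMAC tag, decoded from the light signal
    recomputed_digest -- LSH hashes recomputed on the portrayed speech
    """
    digest, tag = payload[:-32], payload[-32:]  # SHA-256 tag is 32 bytes
    expected = hmac.new(key, digest, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        return False  # signature spoofed or corrupted
    # Benign processing (compression, re-encoding) flips few LSH bits;
    # a lip-sync or face swap flips many, pushing the distance past the threshold.
    return hamming(digest, recomputed_digest) <= threshold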
Attack 1: lipsync deepfake. This form of falsification re-animates a speaker's face and lips to match different audio. In the video below, VeriLight detects the manipulation and pinpoints its start at time window #3.
Attack 2: identity swap deepfake. Here, the delivered content (and thus the lip/facial motion) remains the same, but the speaker's face is swapped. In the video below, VeriLight reports that the speaker's identity has been falsified throughout.

Acknowledgment

We sincerely thank our reviewers for their insightful feedback. This work is supported in part by the SEAS-KFAI Generative AI and Public Discourse Research program at Columbia. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the funding agency or others.

BibTeX

@inproceedings{schwartz2025combating,
  author    = {Schwartz, Hadleigh and Yan, Xiaofeng and Carver, Charles J. and Zhou, Xia},
  title     = {Combating Falsification of Speech Videos with Live Optical Signatures},
  booktitle = {Proceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security (CCS)},
  year      = {2025}
}