Advanced AI techniques enable the creation of deepfake videos that endanger digital security, spread misinformation, and threaten individual privacy. Detecting and classifying such manipulated content is an urgent priority for deep learning and computer vision research. Rapid advances in generative adversarial networks (GANs) and other deep learning models have made deepfakes more realistic than ever, rendering them hard to distinguish from genuine media for both human observers and standard detection systems. The problem is compounded by the easy availability of deepfake generation tools, the growing sophistication of forgeries, and their dangerous applications in political manipulation, financial fraud, and identity theft. Deepfake detection therefore requires robust AI-based solutions that can keep pace with evolving synthesis methods while generalizing accurately across diverse datasets. This paper investigates the performance of four deep learning models in detecting and classifying deepfake videos: the 3D Convolutional Neural Network (3DCNN), 3D Residual Network (3DResNet), Temporal Convolutional Network (TCN), and Variational Autoencoder (VAE). The models were trained and evaluated on the FaceForensics++ (FF++), Deepfake Detection Challenge (DFDC), and Celeb-DF (CDF) datasets, comprising more than 3,500 real and fake video samples. Experiments were run on an NVIDIA DGX A100 workstation for efficient model training. This work does not propose a new detection architecture; rather, it is a systematic benchmarking and generalization study. The experimental findings show that the 3DCNN achieves the highest test accuracy of 64.68%, outperforming 3DResNet, TCN, and VAE under cross-dataset conditions.
The analyses further indicate significant degradation in generalization and distinct failure behaviors when the models are presented with heterogeneous data distributions. These results shed light on the shortcomings of widely adopted spatiotemporal models and can guide the development of more reliable deepfake detection systems suitable for real-world deployment.
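As background for the spatiotemporal models compared above, the sketch below illustrates the core 3D convolution operation that 3DCNN- and 3DResNet-style detectors apply jointly across time and space in a frame stack. This is a minimal NumPy illustration, not the paper's implementation; the naive loop, the toy input shapes, and the function name `conv3d_valid` are assumptions chosen for clarity.

```python
import numpy as np

def conv3d_valid(volume, kernel):
    """Naive valid-mode 3D cross-correlation over a (T, H, W) video
    volume -- the spatiotemporal filtering at the heart of 3DCNN-style
    deepfake detectors (illustrative only, not an optimized kernel)."""
    T, H, W = volume.shape
    t, h, w = kernel.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                # Each output voxel pools information across several
                # consecutive frames as well as a spatial neighborhood.
                out[i, j, k] = np.sum(volume[i:i+t, j:j+h, k:k+w] * kernel)
    return out

video = np.random.rand(8, 16, 16)    # 8 frames of 16x16 grayscale (toy data)
kernel = np.ones((3, 3, 3)) / 27.0   # simple 3x3x3 averaging filter
features = conv3d_valid(video, kernel)
print(features.shape)  # (6, 14, 14)
```

Because the kernel spans multiple frames, such filters can pick up temporal inconsistencies (e.g., flicker between synthesized frames) that purely 2D, per-frame models miss, which is why spatiotemporal architectures dominate the comparison in this study.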
