Romanian Journal of Information Technology and Automatic Control / Vol. 35, No. 4, 2025
RED-SM: A reinforced encoder-decoder transformer for unsupervised video summarization
Venkatachalam ARULKUMAR, Rajendran LATHAMANJU, Rajamani THANGAM, Krishnamoorthy DURGA DEVI
The rapid growth of video content across online platforms has made it increasingly important to generate concise summaries that help users quickly understand and navigate long videos. However, creating high-quality video summaries typically requires large amounts of annotated data, which is costly and often unavailable. To address this challenge, the authors propose a fully unsupervised approach to video summarization built on Transformer architectures. The method introduces the Reinforced Encoder-Decoder Summarizer Model (RED-SM), which uses multi-head self-attention and feature extraction to identify informative video segments without human labels. RED-SM incorporates sparsity-promoting penalties and a reinforcement learning reward that balances diversity, representativeness, and temporal smoothness to guide frame selection. To further enhance summarization quality, RED-SM is integrated with a BERT-based text extractor, enabling multimodal fusion of visual and textual cues. The approach is evaluated on the SumMe and TVSum datasets, as well as a newly curated dataset of short videos spanning 30 categories. The experiments show that the method consistently produces concise, high-quality summaries across diverse domains. These results highlight RED-SM as an effective and scalable solution for unsupervised video summarization in real-world applications.
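The full paper is not reproduced on this page, but the reward structure named in the abstract (diversity, representativeness, temporal smoothness) follows a pattern common to unsupervised, reinforcement-learning-based summarizers. The sketch below is a minimal, illustrative reward function in Python; the function name, weighting, and specific formulas are assumptions for illustration only and do not reproduce the authors' RED-SM implementation.

import numpy as np

def summary_reward(features, selected, smooth_weight=0.5):
    """Toy reward for a binary frame-selection vector.

    features: (T, D) array of per-frame feature vectors.
    selected: (T,) binary array, 1 where a frame is kept.
    Returns a scalar combining diversity, representativeness,
    and a temporal-smoothness term (illustrative weights).
    """
    idx = np.flatnonzero(selected)
    if idx.size < 2:
        return 0.0

    # Normalize features so dot products become cosine similarities.
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)

    # Diversity: mean pairwise dissimilarity among selected frames.
    sel = f[idx]
    sim = sel @ sel.T
    np.fill_diagonal(sim, 0.0)
    r_div = 1.0 - sim.sum() / (idx.size * (idx.size - 1))

    # Representativeness: how well the selection covers all frames
    # (decays with the mean distance to the nearest selected frame).
    dists = np.linalg.norm(f[:, None, :] - sel[None, :, :], axis=2)
    r_rep = np.exp(-dists.min(axis=1).mean())

    # Temporal smoothness: favor evenly spread picks over erratic gaps.
    gaps = np.diff(idx)
    r_smooth = 1.0 / (1.0 + gaps.std())

    return r_div + r_rep + smooth_weight * r_smooth

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(120, 64))            # 120 frames, 64-dim features
    picks = (rng.random(120) < 0.15).astype(int)  # roughly 15% of frames kept
    print(f"reward = {summary_reward(feats, picks):.4f}")

In an RL training loop, a scalar of this kind would serve as the return used to update the frame-selection policy; the sparsity-promoting penalties mentioned in the abstract would be added as a separate regularization term.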
Keywords:
Stochastic Optimization, Reinforcement Learning, Video Summaries.
CITE THIS PAPER AS:
Venkatachalam ARULKUMAR, Rajendran LATHAMANJU, Rajamani THANGAM, Krishnamoorthy DURGA DEVI, "RED-SM: A reinforced encoder-decoder transformer for unsupervised video summarization", Romanian Journal of Information Technology and Automatic Control, ISSN 1220-1758, vol. 35(4), pp. 79-94, 2025. https://doi.org/10.33436/v35i4y202506