International Advanced Research Journal in Science, Engineering and Technology: A Monthly Peer-Reviewed Multidisciplinary Journal
ISSN Online 2393-8021 | ISSN Print 2394-1588 | Since 2014
Volume 12, Issue 11, November 2025

AI Based Real Time Video Transcript Extraction and Summarization

Chaitrashree R, Harshitha V, Sowrabha J N, Spandana J, Najibul Rehman

Abstract: The increasing reliance on digital classrooms, virtual meetings, and multimedia content has created strong demand for systems that can quickly convert long audio-video streams into structured, meaningful information. This paper introduces a unified, AI-driven transcription and summarization framework spanning three components: a Windows-based standalone desktop application for real-time system audio transcription using Stereo Mix; a Chrome browser extension that performs tab-level audio capture and streaming transcription through a floating overlay interface; and a Docker-containerized Flask web application, deployed on Google Cloud Run, that supports file uploads, URL processing, AI-driven summarization, translation, and subtitle generation (SRT/VTT). The system captures audio from multiple sources (system-level outputs, active browser tabs, uploaded media files, and external URLs) and transforms it into accurate transcripts through an optimized pipeline featuring chunk-based processing, adaptive buffering, low-latency data streaming, and efficient WebSocket/SSE communication. Real-time transcription is delivered through tokenized streaming, while Google Gemini generates multilingual summaries, context-aware descriptions, and synchronized subtitles. Reliability is strengthened through UUID-based storage, parallel chunk processing, and noise-resilient preprocessing. The entire pipeline is powered by Soniox Speech-to-Text (STT) and Google Gemini models. Experimental evaluation confirms that the architecture handles long-form recordings, noisy audio streams, browser restrictions, and fluctuating network conditions. The proposed solution provides a scalable and flexible platform suitable for students, educators, content creators, and accessibility-driven applications, enabling fast transcript generation, cross-platform usability, and AI-powered summarization.

Keywords: real-time transcription, audio processing, speech-to-text, multilingual summarization, Server-Sent Events (SSE), AI-based summarization, browser extension, Flask web application, desktop transcription application, WebSocket streaming, cloud deployment, Docker, Soniox STT.
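
The chunk-based processing and SRT subtitle generation mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Segment` structure, the function names, and the fixed-length chunking policy are all assumptions made for illustration, and no speech-to-text service is called. The sketch only shows how a recording might be split into windows and how timestamped transcript segments might be rendered as SRT cues.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """A transcribed span of audio (hypothetical structure, not from the paper)."""
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    text: str


def chunk_bounds(total_seconds: float, chunk_seconds: float) -> List[Tuple[float, float]]:
    """Split a recording into consecutive fixed-length (start, end) windows.

    Fixed-length windows are an assumption; the paper does not state its policy.
    """
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    bounds = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        bounds.append((start, end))
        start = end
    return bounds


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(cues)
```

In such a design, each window returned by `chunk_bounds` could be transcribed independently (and in parallel), with the resulting segments merged in time order before calling `to_srt`. A WebVTT variant would differ mainly in starting the file with a `WEBVTT` header line and using a period instead of a comma before the milliseconds.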

How to Cite:

[1] Chaitrashree R, Harshitha V, Sowrabha J N, Spandana J, Najibul Rehman, “AI Based Real Time Video Transcript Extraction and Summarization,” International Advanced Research Journal in Science, Engineering and Technology (IARJSET), DOI: 10.17148/IARJSET.2025.1211037

This work is licensed under a Creative Commons Attribution 4.0 International License.