International Advanced Research Journal in Science, Engineering and Technology (IARJSET): A Monthly Peer-Reviewed Multidisciplinary Journal
ISSN Online 2393-8021 | ISSN Print 2394-1588 | Since 2014
IARJSET follows the latest parameters suggested by the University Grants Commission (UGC) for peer-reviewed journals and is committed to research excellence, ethical publishing practices, and global scholarly impact.
Volume 12, Issue 7, July 2025

Image to Speech with GenAI – OmniRead AI

Mr. Jaydatta Dupade, Mr. Suyash Yadav, Mr. Nilesh Wani, Mr. Sanket More, Prof. C.T. Dhumal

Abstract: OmniRead AI is an integrated web-based system that converts images containing text or visual scenes into natural-sounding speech. It combines OCR and vision-language models with speech synthesis: users upload an image or enter text, the system extracts the content and optionally interprets it, and then reads the result aloud. The pipeline uses EasyOCR and Tesseract for text extraction, Moondream AI for prompt-driven vision-language understanding, and gTTS (Google Text-to-Speech) for audio generation. Implemented in Python with Flask, OmniRead AI demonstrates an end-to-end "image-to-voice" workflow. This paper describes the system's architecture (Fig. 1), preprocessing steps, OCR ensemble, generative AI integration, and TTS pipeline. We report example outputs and discuss application scenarios. OmniRead AI aims to improve accessibility for visually impaired users and to serve as a general-purpose assistive tool for education and productivity.
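
The flow summarized in the abstract (OCR ensemble, optional prompt-driven interpretation, then speech synthesis) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how the EasyOCR and Tesseract outputs are combined, so the confidence-based selection below is an assumption, and the names `OcrResult`, `pick_best`, and `image_to_speech` are hypothetical. The OCR, vision-language, and TTS backends are passed in as plain callables so the sketch runs without any third-party package.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class OcrResult:
    engine: str        # e.g. "easyocr" or "tesseract"
    text: str          # recognized text
    confidence: float  # mean confidence in [0, 1]


# Hypothetical backend signatures (assumptions, not taken from the paper).
OcrEngine = Callable[[bytes], OcrResult]
Describer = Callable[[bytes, str], str]   # vision-language model: (image, prompt) -> text
Synthesizer = Callable[[str], bytes]      # TTS: text -> audio bytes


def pick_best(results: Sequence[OcrResult]) -> OcrResult:
    """Assumed ensemble rule: keep the non-empty result with the highest confidence."""
    non_empty = [r for r in results if r.text.strip()]
    if not non_empty:
        return OcrResult("none", "", 0.0)
    return max(non_empty, key=lambda r: r.confidence)


def image_to_speech(
    image: bytes,
    engines: Sequence[OcrEngine],
    describe: Describer,
    synthesize: Synthesizer,
    prompt: Optional[str] = None,
) -> bytes:
    """Run OCR (or the vision-language model when a prompt is given), then TTS."""
    if prompt:
        text = describe(image, prompt)
    else:
        text = pick_best([engine(image) for engine in engines]).text
    text = " ".join(text.split())  # collapse stray whitespace before synthesis
    if not text:
        raise ValueError("no readable content found in the image")
    return synthesize(text)
```

In a real deployment, the callables would wrap the libraries named in the abstract, for example EasyOCR's `Reader.readtext`, pytesseract's `image_to_string`, and gTTS's `write_to_fp`; the Moondream AI client call is left unspecified here because the abstract gives no details about it.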

Keywords: Image-to-Speech, Optical Character Recognition, Vision-Language Model, Text-to-Speech, EasyOCR, Tesseract OCR, Moondream AI, gTTS, Generative AI, Accessibility.

How to Cite:

[1] Mr. Jaydatta Dupade, Mr. Suyash Yadav, Mr. Nilesh Wani, Mr. Sanket More, Prof. C.T. Dhumal, “Image to Speech with GenAI – OmniRead AI,” International Advanced Research Journal in Science, Engineering and Technology (IARJSET), DOI: 10.17148/IARJSET.2025.12734

This work is licensed under a Creative Commons Attribution 4.0 International License.