Image to Speech with GenAI – OmniRead AI

Mr. Jaydatta Dupade; Mr. Suyash Yadav; Mr. Nilesh Wani; Mr. Sanket More; Prof. C.T. Dhumal

DOI: 10.17148/IARJSET.2025.12734

Abstract: OmniRead AI is an integrated web-based system that converts images containing text or visual scenes into natural-sounding speech. It combines modern OCR and vision-language models with speech synthesis: the user uploads an image or enters text, the system extracts and optionally interprets the content, and then reads it aloud. The pipeline uses EasyOCR and Tesseract for text extraction, Moondream AI for prompt-driven vision-language understanding, and the gTTS (Google Text-to-Speech) library for audio generation. Implemented in Python with Flask, OmniRead AI demonstrates a seamless “image-to-voice” experience. This paper describes the system’s architecture (Fig. 1), preprocessing steps, OCR ensemble, generative AI integration, and TTS pipeline. We report example outputs and discuss application scenarios. OmniRead AI improves accessibility for visually impaired users and serves as a general-purpose assistive tool for education and productivity.
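
To make the OCR-to-speech path concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not say how the EasyOCR and Tesseract outputs are combined, so the sketch assumes the simplest ensemble rule: keep the text from whichever engine reports the higher mean confidence. The names `run_easyocr`, `run_tesseract`, `pick_best`, `synthesize`, and `image_to_speech` are hypothetical, and the Moondream AI step is omitted.

```python
from typing import Callable, List, Sequence, Tuple

# (extracted text, mean confidence scaled to [0, 1])
OcrResult = Tuple[str, float]


def run_easyocr(image_path: str) -> OcrResult:
    """Extract text with EasyOCR (assumes the `easyocr` package is installed)."""
    import easyocr  # lazy import so the module loads without the dependency

    reader = easyocr.Reader(["en"], gpu=False)
    detections = reader.readtext(image_path)  # list of (bbox, text, confidence)
    if not detections:
        return "", 0.0
    text = " ".join(d[1] for d in detections)
    conf = sum(d[2] for d in detections) / len(detections)
    return text, conf


def run_tesseract(image_path: str) -> OcrResult:
    """Extract text with Tesseract (assumes `pytesseract` and Pillow)."""
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(image_path), output_type=pytesseract.Output.DICT
    )
    # Tesseract reports per-word confidence on a 0-100 scale; -1 marks non-words.
    words = [
        (w, float(c))
        for w, c in zip(data["text"], data["conf"])
        if w.strip() and float(c) >= 0
    ]
    if not words:
        return "", 0.0
    text = " ".join(w for w, _ in words)
    conf = sum(c for _, c in words) / len(words) / 100.0
    return text, conf


def pick_best(results: List[OcrResult]) -> str:
    """Assumed ensemble rule: the non-empty result with the highest confidence."""
    non_empty = [r for r in results if r[0].strip()]
    if not non_empty:
        return ""
    return max(non_empty, key=lambda r: r[1])[0].strip()


def synthesize(text: str, out_path: str) -> None:
    """Write an MP3 using gTTS (assumes the `gTTS` package and network access)."""
    from gtts import gTTS

    gTTS(text=text, lang="en").save(out_path)


def image_to_speech(
    image_path: str,
    out_path: str,
    engines: Sequence[Callable[[str], OcrResult]] = (run_easyocr, run_tesseract),
    speak: Callable[[str, str], None] = synthesize,
) -> str:
    """Run every OCR engine, keep the best text, and synthesize it to audio."""
    text = pick_best([engine(image_path) for engine in engines])
    if not text:
        raise ValueError("no text could be extracted from the image")
    speak(text, out_path)
    return text
```

Passing the engines and the speech function as parameters keeps the selection logic testable without installing any OCR or TTS dependency; a call such as `image_to_speech("page.png", "page.mp3")` would use the real libraries.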

Keywords: Image-to-Speech, Optical Character Recognition, Vision-Language Model, Text-to-Speech, EasyOCR, Tesseract OCR, Moondream AI, gTTS, Generative AI, Accessibility.

How to Cite:

[1] Mr. Jaydatta Dupade, Mr. Suyash Yadav, Mr. Nilesh Wani, Mr. Sanket More, Prof. C.T. Dhumal, "Image to Speech with GenAI – OmniRead AI," International Advanced Research Journal in Science, Engineering and Technology (IARJSET), DOI: 10.17148/IARJSET.2025.12734
