International Advanced Research Journal in Science, Engineering and Technology
A Monthly Peer-Reviewed Multidisciplinary Journal
ISSN Online 2393-8021 | ISSN Print 2394-1588 | Since 2014
Volume 12, Issue 11, November 2025

Optimizing OCR Output: A Post-Processing Approach Using NLP

Ravi P, Thejashwini M A, Thanushree S R, Sonashree M S, Vignesh M G


Abstract: The performance of Optical Character Recognition (OCR) drops significantly on handwritten text, low-quality scans, and complex backgrounds, often producing fragmented, noisy, and syntactically incorrect output. These errors reduce the accuracy of downstream Natural Language Processing (NLP) tasks such as summarization, information extraction, and automated document analysis. To address these issues, this work proposes a combined OCR-NLP method that uses DenseNet-121 to detect the text type automatically and then applies either Tesseract or OCRSpace, depending on whether the input contains printed or handwritten text. The raw OCR output is then refined with the Phi-3 language model to correct grammar, improve readability, and restore contextual meaning. Experimental results on mixed printed and handwritten datasets show a substantial improvement in accuracy, with reductions in Character Error Rate (CER) and Word Error Rate (WER) after NLP post-processing. The proposed system provides a robust, scalable, and automated pipeline suitable for educational digitization, archival processing, and large-scale text-driven applications.

Keywords: OCR, NLP, DenseNet-121, Handwritten Recognition, Printed Text Recognition, Phi-3, Post-Processing.
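
The abstract reports results as Character Error Rate (CER) and Word Error Rate (WER). As a minimal sketch of how these metrics are commonly computed (edit distance between reference and OCR output, divided by the reference length), the following Python may help; the function names and the sample strings are illustrative assumptions, not the authors' evaluation code or data.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    # Made-up strings for illustration only, not taken from the paper.
    reference = "the quick brown fox"
    raw_ocr = "the qu1ck brwn fox"
    corrected = "the quick brown fox"
    print("raw       CER/WER:", cer(reference, raw_ocr), wer(reference, raw_ocr))
    print("corrected CER/WER:", cer(reference, corrected), wer(reference, corrected))
```

Comparing the two metrics before and after post-processing, as in the last lines of the sketch, is one way to quantify the kind of reduction the abstract describes.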

How to Cite:

[1] Ravi P, Thejashwini M A, Thanushree S R, Sonashree M S, Vignesh M G, “Optimizing OCR Output: A Post-Processing Approach Using NLP,” International Advanced Research Journal in Science, Engineering and Technology (IARJSET), DOI: 10.17148/IARJSET.2025.1211040

This work is licensed under a Creative Commons Attribution 4.0 International License.