Result: Improving Document Digitization with Machine Learning-Based OCR

Title:

Improving Document Digitization with Machine Learning-Based OCR

Authors:

Sri Charitha Pagadala, Pulletikurthi Nithisha, Pallikonda Rahul, Musiboina Ram Mohan Rao, Dr. Akkineni. Haritha

Source:

International Journal on Science and Technology. 16

Publisher Information:

International Research Publication and Journals, 2025.

Publication Year:

2025

Document Type:

Journal Article

ISSN:

2229-7677

DOI:

10.71097/ijsat.v16.i1.1890

Accession Number:

edsair.doi...........46d610eb0ead66ee80a560a51207f49a

Database:

OpenAIRE

Further Information

In today’s digital era, extracting text from unstructured formats such as images, PDFs, and handwritten documents is critical for digitization and automation. Traditional methods often struggle with scalability, complex layouts, and multi-language support. This project addresses these challenges by combining Machine Learning, Optical Character Recognition (OCR), the AWS Textract model, and a microservices architecture to create a robust, scalable, and efficient text extraction system. The proposed solution integrates Java Spring Boot for backend development, PostgreSQL for secure data storage, and containerized microservices for enhanced modularity and scalability. The system performs preprocessing to improve image quality and employs deep learning algorithms for accurate text recognition. Parallel processing and task queuing ensure high throughput and low latency for both real-time and bulk operations. By converting unstructured data into structured formats such as JSON or CSV, the system facilitates seamless integration into existing workflows. This study highlights the design, functionality, and benefits of this approach to text extraction, which improves efficiency in document management and automation.
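The flow described in the abstract (recognize pages in parallel, then emit structured JSON or CSV) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract names Java Spring Boot and AWS Textract, while the example below uses Python and a hypothetical stand-in function, `fake_ocr`, in place of a real OCR call. The record fields `page`, `line`, and `text` are likewise assumptions chosen for illustration.

```python
import csv
import io
import json
from concurrent.futures import ThreadPoolExecutor


def fake_ocr(page_image: bytes) -> list[dict]:
    """Hypothetical stand-in for an OCR call: treats the bytes as UTF-8 text
    and returns one record per non-empty line."""
    lines = [t.strip() for t in page_image.decode("utf-8").splitlines()]
    lines = [t for t in lines if t]
    return [{"line": i, "text": t} for i, t in enumerate(lines, start=1)]


def extract_document(pages: list[bytes], max_workers: int = 4) -> list[dict]:
    """Run the OCR stand-in on all pages in parallel and flatten the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so page numbers stay aligned.
        per_page = list(pool.map(fake_ocr, pages))
    records = []
    for page_no, lines in enumerate(per_page, start=1):
        for rec in lines:
            records.append({"page": page_no, **rec})
    return records


def to_json(records: list[dict]) -> str:
    """Serialize the extracted records as a JSON array."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def to_csv(records: list[dict]) -> str:
    """Serialize the extracted records as CSV with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["page", "line", "text"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


if __name__ == "__main__":
    sample_pages = [b"Invoice 42\nTotal: 10\n", b"Thank you\n"]
    extracted = extract_document(sample_pages)
    print(to_json(extracted))
    print(to_csv(extracted))
```

A thread pool suits this sketch because a real OCR request to a remote service would be I/O-bound; a production system of the kind the abstract describes would more likely place a task queue between services rather than a single in-process pool.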
