Result: A deep learning solution for industrial OCR applications

Title:
A deep learning solution for industrial OCR applications
Contributors:
Di Stefano, Luigi; Goncalves, Luis; Mambelli, Filippo
Publisher Information:
Alma Mater Studiorum - Università di Bologna
Publication Year:
2019
Collection:
Università di Bologna: AMS Tesi di Laurea (Alm@DL)
Document Type:
Dissertation master thesis
File Description:
application/pdf
Language:
English
Relation:
https://amslaurea.unibo.it/id/eprint/19777/1/Tesi%20Magistrale%20Lorenzo%20Lamberti.pdf; Lamberti, Lorenzo (2019) A deep learning solution for industrial OCR applications. [Laurea magistrale], Università di Bologna, Corso di Studio in Ingegneria elettronica [LM-DM270], restricted-access document.
Rights:
Free to read
Accession Number:
edsbas.4E1406F3
Database:
BASE

Further information

This thesis describes a project developed during a six-month internship at the Machine Vision Laboratory of Datalogic in Pasadena, California. The project aims to develop a deep learning system as a possible solution for industrial optical character recognition (OCR) applications. In particular, it focuses on You Only Look Once (YOLO), a general-purpose object detector based on convolutional neural networks that currently offers a state-of-the-art trade-off between speed and accuracy. YOLO is well known for its impressive processing speed, but its intrinsic structure makes it struggle to detect small objects clustered together, which unfortunately matches our scenario: we read alphanumeric codes by detecting each individual character and then reconstructing the final string. The final goal of this thesis is to overcome this drawback and push the accuracy of a general-purpose convolutional object detector to its limits, in order to meet the demanding requirements of industrial OCR applications. To accomplish this, YOLO's unique detection approach was first mastered in its original framework, Darknet, written in C and CUDA; all the code was then translated into Python for greater flexibility, which also allowed the deployment of a custom architecture. Four datasets of increasing complexity were used as case studies, and the final performance was surprising: accuracy ranges from 99.75% to 99.97% with a processing time of 15 ms for 1000 × 1000 images, largely outperforming in speed the deep learning solution currently deployed by Datalogic. On the downside, the training phase usually requires a very large amount of data and time, and YOLO also showed some memorization behaviour when not enough variability is provided at training time.