Result: Word level script identification for scanned document images

Title:

Word level script identification for scanned document images

Authors:

MA, Huanfeng; DOERMANN, David

Source:

Document recognition and retrieval XI (San Jose CA, 21-22 January 2004). SPIE proceedings series. 5296:124-135

Publisher Information:

Bellingham WA: SPIE, 2004.

Publication Year:

2004

Physical Description:

print, 19 ref

Original Material:

INIST-CNRS

Subject Terms:

Documentation, Electronics, Optics, Physics, Telecommunications, Exact sciences and technology, Sciences and techniques of general use, Information science. Documentation, Information retrieval systems. Information and document management system, Interfaces. Software, Information and communication sciences, Document retrieval system. Document and information management system, Arabic, Bilingualism, Chinese, Classifier, Korean, Dictionaries, Gabor filter, Identification, Multilingualism, Digitizing, Segmentation, Text, Digitized document, Hindi, Language

Document Type:

Conference Paper

File Description:

text

Language:

English

Author Affiliations:

Language and Media Processing Laboratory, Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, United States

Access URL:

http://pascal-francis.inist.fr/vibad/index.php?action=search&terms=16075713

Rights:

Copyright 2004 INIST-CNRS
CC BY 4.0
Unless otherwise stated above, the content of this bibliographic record may be used under a CC BY 4.0 licence by Inist-CNRS

Notes:

Sciences of information and communication. Documentation

FRANCIS

Accession Number:

edscal.16075713

Database:

PASCAL Archive

Further Information

In this paper, we compare the performance of three classifiers used to identify the script of words in scanned document images. In both training and testing, a Gabor filter is applied and 16 channels of features are extracted. Three classifiers, Support Vector Machines (SVM), Gaussian Mixture Models (GMM), and k-Nearest-Neighbor (k-NN), are used to identify different scripts at the word level, where a word is a group of glyphs separated by white space. The three classifiers are applied to a variety of bilingual dictionaries and their performance is compared. Experimental results show the capability of the Gabor filter to capture script features and the effectiveness of the three classifiers for script identification at the word level.
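
The pipeline described in the abstract (Gabor filtering into 16 feature channels, then classification of each word image) can be illustrated with a minimal sketch. This is not the authors' implementation. The record does not say how the 16 channels are formed, so the 4 wavelengths x 4 orientations layout, the choice of sigma, the mean-absolute-response feature, and the synthetic "word" images are all assumptions made for illustration. Only k-NN is shown, and the SVM and GMM classifiers are omitted for brevity. All function names are hypothetical.

```python
import numpy as np


def gabor_kernel(wavelength, theta):
    """Zero-mean real Gabor kernel; sigma = wavelength / 2 is an assumption."""
    sigma = 0.5 * wavelength
    half = int(np.ceil(2.5 * sigma))
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    kernel = envelope * np.cos(2.0 * np.pi * x_rot / wavelength)
    return kernel - kernel.mean()


def gabor_features(image, wavelengths=(3.0, 4.0, 6.0, 8.0), n_orient=4):
    """Mean absolute filter response per channel (4 x 4 = 16, assumed layout)."""
    spectrum = np.fft.fft2(image)
    feats = []
    for wavelength in wavelengths:
        for k in range(n_orient):
            kernel = gabor_kernel(wavelength, np.pi * k / n_orient)
            # Circular convolution via the FFT; the kernel is zero-padded
            # to the image size by the `s` argument.
            response = np.fft.ifft2(spectrum * np.fft.fft2(kernel, s=image.shape))
            feats.append(np.abs(response.real).mean())
    return np.array(feats)


def knn_predict(train_x, train_y, query_x, k=3):
    """Plain k-NN with Euclidean distance and majority vote."""
    dists = np.linalg.norm(train_x[None, :, :] - query_x[:, None, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return np.array([np.bincount(votes).argmax() for votes in train_y[nearest]])


def synthetic_word(rng, vertical, shape=(32, 32)):
    """Toy stand-in for a word image: six random strokes in one direction."""
    img = np.zeros(shape)
    for pos in rng.choice(shape[0], size=6, replace=False):
        if vertical:
            img[:, pos] = 1.0
        else:
            img[pos, :] = 1.0
    return img + 0.05 * rng.standard_normal(shape)


def demo(seed=0, n_per_class=20):
    """Train on the first half of the toy data, return accuracy on the rest."""
    rng = np.random.default_rng(seed)
    labels = np.array([0, 1] * n_per_class)  # 0 = horizontal, 1 = vertical
    feats = np.array([gabor_features(synthetic_word(rng, bool(lab)))
                      for lab in labels])
    split = n_per_class
    pred = knn_predict(feats[:split], labels[:split], feats[split:], k=3)
    return float((pred == labels[split:]).mean())


if __name__ == "__main__":
    print("toy k-NN accuracy:", demo())
```

In a real setting, the synthetic images would be replaced by word images segmented at white space from the scanned page, and the same 16-dimensional feature vectors could be passed to an SVM or GMM for comparison.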
