Title:
Automatic content extraction of filled form images based on clustering component block projection vectors
Authors:
Source:
Document recognition and retrieval XI (San Jose CA, 21-22 January 2004). SPIE proceedings series. 5296:204-212
Publisher Information:
Bellingham WA: SPIE, 2004.
Publication Year:
2004
Physical Description:
print, 16 ref
Original Material:
INIST-CNRS
Subject Terms:
Documentation, Electronics, Electronique, Optics, Optique, Physics, Physique, Telecommunications, Télécommunications, Sciences exactes et technologie, Exact sciences and technology, Sciences et techniques communes, Sciences and techniques of general use, Sciences de l'information. Documentation, Information science. Documentation, Systèmes de recherche d'informations. Système de gestion documentaire et d'information, Information retrieval systems. Information and document management system, Interfaces. Logiciels, Interfaces. Software, Sciences de l'information et de la communication, Information and communication sciences, Système de recherche documentaire. Système de gestion documentaire et d'information, Analyse, Analysis, Análisis, Automatisation, Automation, Automatización, Classification, Clasificación, Document imprimé, Printed document, Documento impreso, Extraction, Extracción, Méthode vectorielle, Vector method, Método vectorial, Numérisation, Digitizing, Numerización, Contenu informationnel, Informational content, Document numérisé, Digitized document, Formulaire, Form
Document Type:
Conference
Conference Paper
File Description:
text
Language:
English
Author Affiliations:
Computer Science and Mathematics Division, Oak Ridge National Lab., Office: A110 Life Science Building, 120 Green St., Athens, GA, 30605, United States
Computational Research Division, Lawrence Berkeley National Laboratory, University of California, Berkeley, CA, 94720, United States
Center for Cognitive Neuroscience, Duke University, Durham, NC, 27710, United States
Rights:
Copyright 2004 INIST-CNRS
CC BY 4.0
Unless otherwise stated above, the content of this bibliographic record may be used under a CC BY 4.0 licence by Inist-CNRS.
Notes:
Sciences of information and communication. Documentation
FRANCIS
Accession Number:
edscal.16091881
Database:
PASCAL Archive
Further Information
Automatic understanding of document images is a hard problem. Here we consider one sub-problem: automatically extracting content from filled form images. We propose a novel approach that relies on neither pre-selected templates nor sophisticated structural/semantic analysis, and is instead based on clustering component-block projection vectors. By combining spectral clustering with minimum spanning tree clustering, we generate highly accurate clusters, from which adaptive templates are constructed to extract the filled-in content. Our experiments show that this approach is effective on a set of 1040 US IRS tax form images belonging to 208 types.
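The abstract does not define the projection vectors or give the clustering procedure in detail, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a block is a small binary grid, that its projection vector is the concatenation of its row sums and column sums, and that all blocks have the same size. It illustrates only the spanning-tree half of the idea (build a minimum spanning tree over the vectors, then cut long edges); the spectral clustering step is omitted. The names `projection_vector`, `mst_clusters`, and `cut` are hypothetical.

```python
from math import dist


def projection_vector(block):
    """Concatenate row sums and column sums of a binary block (assumed definition)."""
    row_sums = [sum(row) for row in block]
    col_sums = [sum(col) for col in zip(*block)]
    return row_sums + col_sums


def mst_clusters(vectors, cut):
    """Build a minimum spanning tree with Prim's algorithm, drop edges longer
    than `cut`, and return one cluster label per input vector."""
    n = len(vectors)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known edge linking each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = dist(vectors[u], vectors[v])
                if d < best[v]:
                    best[v], parent[v] = d, u

    # Union-find over the tree edges that survive the cut.
    label = list(range(n))

    def find(i):
        while label[i] != i:
            label[i] = label[label[i]]
            i = label[i]
        return i

    for a, b, weight in edges:
        if weight <= cut:
            label[find(a)] = find(b)

    # Relabel clusters as 0, 1, 2, ... in order of first appearance.
    remap = {}
    return [remap.setdefault(find(i), len(remap)) for i in range(n)]


# Four toy 2x2 blocks: two mostly filled, two mostly empty.
blocks = [
    [[1, 1], [0, 0]],
    [[1, 1], [0, 1]],
    [[0, 0], [0, 0]],
    [[0, 0], [1, 0]],
]
vectors = [projection_vector(b) for b in blocks]
labels = mst_clusters(vectors, cut=2.0)
print(labels)
```

In this toy run the two mostly filled blocks share one label and the two mostly empty blocks share another. The cut threshold is chosen by hand here; how the paper selects clusters or combines them with the spectral step is not stated in this record.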