Search result: Automatic classification of company’s document stream: Comparison of two solutions

Title:

Automatic classification of company’s document stream: Comparison of two solutions

Authors:

Voerman, Joris; Souleiman Mahamoud, Ibrahim; Coustaty, Mickaël; Joseph, Aurélie; Poulain D’andecy, Vincent; Ogier, Jean-Marc

Contributors:

Laboratoire Informatique, Image et Interaction - EA 2118 (L3I), La Rochelle Université (ULR), Yooz ITESOFT-Yooz Group, ANR-18-LCV3-0008,IDEAS,Laboratoire d'ingénierie, d'analyse et de la sécurité documentaire(2018)

Source:

Pattern Recognition Letters. 172:181-187

Publisher Information:

CCSD; Elsevier, 2023.

Publication Year:

2023

Collection:

collection:UNIV-ROCHELLE
collection:ANR
collection:ELSEVIER

Subject Terms:

Document processing, Imbalanced classification, Neural network, [INFO]Computer Science [cs]

Original Identifier:

PII: S0167-8655(23)00191-5
HAL: hal-04678432

Document Type:

Journal article

Language:

English

ISSN:

0167-8655

Relation:

info:eu-repo/semantics/altIdentifier/doi/10.1016/j.patrec.2023.06.012

DOI:

10.1016/j.patrec.2023.06.012

Access URL:

https://hal.science/hal-04678432
https://hal.science/hal-04678432v1/document
https://hal.science/hal-04678432v1/file/S0167865523001915.pdf

Rights:

info:eu-repo/semantics/OpenAccess
URL: http://hal.archives-ouvertes.fr/licences/copyright/

Accession Number:

edshal.hal.04678432v1

Database:

HAL

Abstract:

Documents are essential nowadays and are present everywhere. To manage the vast number of documents handled by companies, a first step is to automatically determine the type (class) of each document. Although automatic classification has been widely studied in the literature, the strongly imbalanced context and industrial constraints raise new challenges that have not been studied until now: how can as many documents as possible be classified with the highest precision, in an imbalanced context and with some classes missing during training? To this end, this paper studies two different solutions to these issues. The first is a multimodal neural network, reinforced by an attention model and an adapted loss function, that is able to classify a wide variety of documents. The second is a combination method that uses a cascade of systems to offer a gradual solution to each issue. Both options give good results in an ideal context as well as in an imbalanced one. The comparison outlines their limitations and future challenges.
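The abstract does not detail how either system works. As a generic illustration only (not the authors' implementation), the idea of a cascade that favors precision can be sketched as follows: each stage returns a label and a confidence score, a document is accepted by the first stage whose confidence reaches that stage's threshold, and a document that no stage is confident about is rejected rather than misclassified. The stage functions, thresholds, and class names below (`keyword_stage`, `length_stage`, "invoice", "receipt", "letter") are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# A stage maps a document to (label, confidence in [0, 1]).
Stage = Callable[[object], Tuple[str, float]]


@dataclass
class CascadeStage:
    name: str
    predict: Stage
    threshold: float  # minimum confidence required to accept this stage's label


def classify_cascade(
    doc: object, stages: Sequence[CascadeStage]
) -> Tuple[Optional[str], Optional[str]]:
    """Return (label, stage_name) from the first confident stage, or (None, None) to reject."""
    for stage in stages:
        label, confidence = stage.predict(doc)
        if confidence >= stage.threshold:
            return label, stage.name
    # No stage was confident enough: reject so the document can be handled elsewhere.
    return None, None


# Hypothetical toy stages, for illustration only.
def keyword_stage(doc: object) -> Tuple[str, float]:
    # Precise but narrow rule: fires only on an explicit keyword.
    if "invoice" in str(doc).lower():
        return "invoice", 0.95
    return "unknown", 0.0


def length_stage(doc: object) -> Tuple[str, float]:
    # Weak fallback guess based only on text length.
    return ("receipt", 0.6) if len(str(doc)) < 40 else ("letter", 0.55)


stages = [
    CascadeStage("keywords", keyword_stage, threshold=0.9),
    CascadeStage("fallback", length_stage, threshold=0.58),
]

print(classify_cascade("Invoice no. 123", stages))           # ('invoice', 'keywords')
print(classify_cascade("Thanks for your purchase", stages))  # ('receipt', 'fallback')
print(classify_cascade("Dear Sir, " + "x" * 50, stages))     # (None, None)
```

In such a cascade, raising a stage's threshold tends to increase the precision of the labels it accepts but passes more documents on to later stages or to rejection, which lowers coverage; the per-stage thresholds are the main knob for trading one against the other.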
