Result: An automatic keyphrase extraction system for scientific documents

Title:
An automatic keyphrase extraction system for scientific documents
Source:
Knowledge and information systems (Print). 34(3):691-724
Publisher Information:
London: Springer, 2013.
Publication Year:
2013
Physical Description:
print, 48 ref
Original Material:
INIST-CNRS
Subject Terms:
Computer science, Informatique, Sciences exactes et technologie, Exact sciences and technology, Sciences appliquees, Applied sciences, Informatique; automatique theorique; systemes, Computer science; control theory; systems, Informatique théorique, Theoretical computing, Algorithmique. Calculabilité. Arithmétique ordinateur, Algorithmics. Computability. Computer arithmetics, Logiciel, Software, Organisation des mémoires. Traitement des données, Memory organisation. Data processing, Traitement des données. Listes et chaînes de caractères, Data processing. List processing. Character string processing, Systèmes d'information. Bases de données, Information systems. Data bases, Intelligence artificielle, Artificial intelligence, Reconnaissance et synthèse de la parole et du son. Linguistique, Speech and sound recognition and synthesis. Linguistics, Complexité calcul, Computational complexity, Complejidad computación, Efficacité, Efficiency, Eficacia, Evaluation performance, Performance evaluation, Evaluación prestación, Grosseur grain, Grain size, Grosor grano, Indexation automatique, Automatic indexing, Indización automática, Localisation, Localization, Localización, Recherche documentaire, Document retrieval, Búsqueda documental, Recherche information, Information retrieval, Búsqueda información, Recouvrement, Overlay, Recubrimiento, Traitement document, Document processing, Tratamiento documento, Candidate phrase identification, Keyphrases extraction, Scientific document processing
Document Type:
Academic journal Article
File Description:
text
Language:
English
Author Affiliations:
HEUDIASYC UMR CNRS 6599, Université de Technologie de Compiègne, Centre de Recherches de Royallieu, BP 20529, Compiègne, France
ISSN:
0219-1377
Rights:
Copyright 2014 INIST-CNRS
CC BY 4.0
Unless otherwise stated above, the content of this bibliographic record may be used under a CC BY 4.0 licence by Inist-CNRS
Notes:
Computer science; theoretical automation; systems
Accession Number:
edscal.26920208
Database:
PASCAL Archive

Further Information

Automatic keyphrase extraction techniques play an important role in many tasks, including indexing, categorizing, summarizing, and searching. In this paper, we develop and evaluate an automatic keyphrase extraction system for scientific documents. Compared with previous work, our system concentrates on two important issues: (1) more precise location of potential keyphrases: a new candidate phrase generation method based on the core word expansion algorithm is proposed, which reduces the size of the candidate set by about 75% without increasing the computational complexity; (2) overlap elimination for the output list: when a phrase and its sub-phrases coexist as candidates, an inverse document frequency feature is introduced to select the proper granularity. New features are also added for phrase weighting. Experiments on real-world datasets were carried out to evaluate the proposed system. The results show the efficiency and effectiveness of the refined candidate set and demonstrate that the new features improve the accuracy of the system. The overall performance of our system compares favorably with other state-of-the-art keyphrase extraction systems.
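
The abstract does not say how the inverse document frequency feature is computed or how it is turned into a granularity decision. As an illustration only, the sketch below assumes a common smoothed definition, IDF = log((N + 1) / (df + 1)), with the document frequency df counted at phrase level. The helper names `contains` and `phrase_idf` and the toy corpus are hypothetical and are not taken from the paper.

```python
import math

def contains(tokens, phrase):
    """True if the token list `phrase` occurs contiguously in `tokens`."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

def phrase_idf(phrase, corpus):
    """Smoothed IDF, log((N + 1) / (df + 1)); an assumed formula, not the paper's."""
    target = phrase.lower().split()
    # df = number of documents in which the whole phrase appears
    df = sum(contains(doc.lower().split(), target) for doc in corpus)
    return math.log((len(corpus) + 1) / (df + 1))

# Toy corpus, invented for illustration only.
corpus = [
    "keyphrase extraction for scientific documents",
    "feature extraction from images",
    "keyphrase extraction with new features",
    "document retrieval systems",
]

# Compare a candidate phrase with one of its sub-phrases.
for candidate in ("keyphrase extraction", "extraction"):
    print(candidate, round(phrase_idf(candidate, corpus), 3))
```

Every document that contains a phrase also contains each of its sub-phrases, so under this definition a phrase never scores a lower IDF than its sub-phrases. The size of the gap between the two values is one signal a system could use when choosing a granularity; the actual selection rule used by the authors is not given in this record.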