Copyright 1995 INIST-CNRS. CC BY 4.0. Unless otherwise stated above, the content of this bibliographic record may be used under a CC BY 4.0 licence by Inist-CNRS.
Notes:
Sciences of information and communication. Documentation
FRANCIS
Accession Number:
edscal.3553671
Database:
PASCAL Archive
Further information
In information retrieval, the content of a document may be represented as a collection of terms: words, stems, phrases, or other units derived or inferred from the text of the document. These terms are usually weighted to indicate their importance within the document, which can then be viewed as a vector in an N-dimensional space. In this paper we demonstrate that proper term weighting is at least as important as term selection, and that different types of terms (e.g., words, phrases, names) and terms derived by different means (e.g., statistical, linguistic) must be treated differently for maximum benefit in retrieval. We report results of selected experiments with our prototype natural language information retrieval system, performed in connection with the second Text REtrieval Conference (TREC-2) using a 550-MByte Wall Street Journal database and a 300-MByte San Jose Mercury database.
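The vector view described in the abstract can be illustrated with a minimal sketch. It is not the authors' weighting scheme, which the abstract does not specify: it assumes a generic tf-idf weight, and the per-type multipliers in `TYPE_BOOST`, the function names, and the toy document frequencies are all hypothetical values chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical per-type multipliers; the abstract gives no actual values.
TYPE_BOOST = {"word": 1.0, "phrase": 2.0, "name": 1.5}


def weight_vector(terms, doc_freq, n_docs):
    """Turn a list of (term, type) pairs into a sparse weighted vector.

    Each weight is a generic tf-idf score scaled by a type-specific
    multiplier, so different kinds of terms are treated differently.
    """
    counts = Counter(terms)
    vector = {}
    for (term, kind), tf in counts.items():
        idf = math.log(n_docs / doc_freq[term])
        vector[(term, kind)] = tf * idf * TYPE_BOOST[kind]
    return vector


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[key] for key, w in u.items() if key in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy collection statistics (assumed, not taken from the paper).
n_docs = 100
doc_freq = {"stock": 50, "market": 40, "stock market": 5, "IBM": 10}

doc_vec = weight_vector(
    [("stock", "word"), ("market", "word"), ("stock market", "phrase")],
    doc_freq,
    n_docs,
)
query_vec = weight_vector(
    [("stock market", "phrase"), ("IBM", "name")],
    doc_freq,
    n_docs,
)
similarity = cosine(doc_vec, query_vec)
```

Keying each dimension by the pair `(term, type)` keeps a phrase and a single word with the same surface form in separate dimensions, which is one simple way to let each type carry its own weight.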