Copyright 2008 INIST-CNRS CC BY 4.0 Sauf mention contraire ci-dessus, le contenu de cette notice bibliographique peut être utilisé dans le cadre d’une licence CC BY 4.0 Inist-CNRS / Unless otherwise stated above, the content of this bibliographic record may be used under a CC BY 4.0 licence by Inist-CNRS / A menos que se haya señalado antes, el contenido de este registro bibliográfico puede ser utilizado al amparo de una licencia CC BY 4.0 Inist-CNRS
Notes:
Computer science; theoretical automation; systems
Accession Number:
edscal.20081175
Database:
PASCAL Archive
Further Information
-The self-organizing map (SOM) is an efficient tool for visualizing high-dimensional data. In this paper, the clustering and visualization capabilities of the SOM, especially in the analysis of textual data, i.e., document collections, are reviewed and further developed. A novel clustering and visualization approach based on the SOM is proposed for the task of text mining. The proposed approach first transforms the document space into a multidimensional vector space by means of document encoding. Afterwards, a growing hierarchical SOM (GHSOM) is trained and used as a baseline structure to automatically produce maps with various levels of detail. Following the GHSOM training, the new projection method, namely the ranked centroid projection (RCP), is applied to project the input vectors to a hierarchy of 2-D output maps. The RCP is used as a data analysis tool as well as a direct interface to the data. In a set of simulations, the proposed approach is applied to an illustrative data set and two real-world scientific document collections to demonstrate its applicability.