Title:
A classification approach for less popular webpages based on latent semantic analysis and rough set model
Source:
Expert systems with applications. 42(1):642-648
Publisher Information:
Amsterdam: Elsevier, 2015.
Publication Year:
2015
Physical Description:
print, 1/4 p
Original Material:
INIST-CNRS
Subject Terms:
Computer science, Informatique, Sciences exactes et technologie, Exact sciences and technology, Sciences appliquees, Applied sciences, Informatique; automatique theorique; systemes, Computer science; control theory; systems, Informatique théorique, Theoretical computing, Recherche information. Graphe, Information retrieval. Graph, Logiciel, Software, Systèmes informatiques et systèmes répartis. Interface utilisateur, Computer systems and distributed systems. User interface, Organisation des mémoires. Traitement des données, Memory organisation. Data processing, Traitement des données. Listes et chaînes de caractères, Data processing. List processing. Character string processing, Intelligence artificielle, Artificial intelligence, Reconnaissance et synthèse de la parole et du son. Linguistique, Speech and sound recognition and synthesis. Linguistics, Affiliation, Afiliación, Analyse circuit, Network analysis, Análisis circuito, Analyse sémantique, Semantic analysis, Análisis semántico, Annotation, Anotación, Classification, Clasificación, Cognition, Cognición, Fouille donnée, Data mining, Busca dato, Indexation, Indexing, Indización, Internet, Langage naturel, Natural language, Lenguaje natural, Linguistique, Linguistics, Linguística, Modélisation, Modeling, Modelización, Mot clé, Keyword, Palabra clave, Navigation information, Information browsing, Navegacíon informacíon, Protocole internet, Internet protocol, Protocolo internet, Réseau social, Social network, Red social, Résumé, Abstract, Resumen, Site Web, Web site, Sitio Web, Structure document, Document structure, Estructura documental, Sémantique algébrique, Algebraic semantic, Semántica algebraica, Texte, Text, Texto, Graphe de terrain, Complex network, Red compleja, Théorie ensemble approximatif, Rough set theory, Teoría de los Conjuntos Aproximados, Complex network analysis, Latent semantic analysis, Rough set, Webpage classification
Document Type:
Academic journal Article
File Description:
text
Language:
English
Author Affiliations:
School of Economics and Management, BeiHang University, Beijing 100191, China
School of Accounting and Finance, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong-Kong
ISSN:
0957-4174
Rights:
Copyright 2015 INIST-CNRS
CC BY 4.0
Sauf mention contraire ci-dessus, le contenu de cette notice bibliographique peut être utilisé dans le cadre d’une licence CC BY 4.0 Inist-CNRS / Unless otherwise stated above, the content of this bibliographic record may be used under a CC BY 4.0 licence by Inist-CNRS / A menos que se haya señalado antes, el contenido de este registro bibliográfico puede ser utilizado al amparo de una licencia CC BY 4.0 Inist-CNRS
Notes:
Computer science; theoretical automation; systems
Accession Number:
edscal.28843430
Database:
PASCAL Archive

Further Information

With the explosive growth of web information, webpage classification faces great challenges. Computers have difficulty understanding the semantic meaning of textual or non-textual webpages. Fortunately, Web 2.0 collaborative tagging systems bring new opportunities to solve this problem: they abstract structured tags from the unstructured content of webpages. However, large numbers of webpages on the Internet are less popular. Their tagging information is sparse, which makes their topics unclear and leads to ambiguous classification. Because of this ambiguity, we call such less popular webpages hesitant webpages. In this paper, we propose an advanced approach for classifying hesitant webpages. First, hesitant webpages are divided into bridges, hubs and attached webpages according to their roles on the Internet. Second, attached webpages are classified by mining and extending their information from two perspectives. One is latent semantic analysis (LSA), which is applied to fully explore the semantic meaning of sparse tags and supports accurate recognition of webpages that are semantically close to attached webpages. The other is the proposed density-relation-based rough set model, which measures the affiliation degree of attached webpages in different categories. Experiments on real data show that our approach effectively classifies hesitant webpages based on their semantic meaning.
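
The idea of an affiliation degree can be illustrated with classical (Pawlak) rough sets. The sketch below is not the density-relation-based model proposed in the paper, whose definition is not given in this record. It assumes, purely for illustration, that two webpages are indiscernible when they carry exactly the same tags. The page names, tags, category, and function names are all hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical toy data: each webpage and its (sparse) set of tags.
PAGE_TAGS = {
    "p1": {"python", "tutorial"},
    "p2": {"python", "tutorial"},
    "p3": {"python", "tutorial"},
    "p4": {"recipe"},
    "p5": {"recipe"},
    "p6": {"travel"},
}

# Hypothetical category: pages already labelled "programming".
PROGRAMMING = {"p1", "p2", "p3", "p4"}


def equivalence_classes(page_tags):
    """Group pages that have identical tag sets (assumed indiscernibility)."""
    groups = defaultdict(set)
    for page, tags in page_tags.items():
        groups[frozenset(tags)].add(page)
    return list(groups.values())


def approximations(classes, category):
    """Return the classical lower and upper approximations of a category."""
    lower, upper = set(), set()
    for block in classes:
        if block <= category:   # class lies entirely inside the category
            lower |= block
        if block & category:    # class overlaps the category
            upper |= block
    return lower, upper


def rough_membership(page, classes, category):
    """Share of the page's equivalence class that belongs to the category."""
    block = next(b for b in classes if page in b)
    return Fraction(len(block & category), len(block))


if __name__ == "__main__":
    classes = equivalence_classes(PAGE_TAGS)
    lower, upper = approximations(classes, PROGRAMMING)
    print("certainly in category:", sorted(lower))
    print("possibly in category:", sorted(upper))
    for page in sorted(PAGE_TAGS):
        print(page, rough_membership(page, classes, PROGRAMMING))
```

Pages in the upper but not the lower approximation form the boundary region, which matches the intuition of an ambiguous, "hesitant" classification: their rough membership lies strictly between 0 and 1.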