Title:
Text-attributed community detection in complex networks through LLMs and GNNs: A powerful fusion of language and graphs.
Authors:
K.S., Sruthi (AUTHOR) sruthyksreedharan@gmail.com, Sreekumar, A. (AUTHOR) sreekumar@cusat.ac.in, Balakrishnan, Kannan (AUTHOR) mullayilkannan@gmail.com
Source:
Neurocomputing. Sep 2025, Vol. 647, pN.PAG-N.PAG. 1p.
Database:
Academic Search Index

Further Information

This paper introduces a novel framework for community detection in complex networks that leverages advanced embedding techniques to integrate textual information associated with nodes and edges with graph structure. We propose a Text-Attributed Graph (TAG) approach, in which textual data attached to nodes and edges, such as book descriptions and user reviews in book recommendation systems, is transformed into semantic embeddings using pre-trained language models (PLMs). Specifically, we employ state-of-the-art embedding models, including E5-Base, variants of BERT (SBERT, DistilBERT, BERT-Base, and BERT-Large), OpenAI's text-embedding-3-large, and the cost-effective text-embedding-ada-002, to enrich graph representations with meaningful contextual features as edge embeddings (a minimal embedding sketch follows the abstract). These embeddings are fed into graph neural networks (GNNs), enabling the model to exploit both structural and textual context to improve community detection performance. Integrating textual embeddings with several GNN architectures in this manner delivers promising performance on community detection tasks in complex networks, opening new possibilities for applications in recommendation systems, information retrieval, predictive tasks, and beyond.

• The paper proposes a unique approach to community detection in complex networks by combining textual embeddings derived from edge texts with the inherent structural information of the graph. This fusion allows GNNs to leverage both semantic and relational context, potentially leading to more accurate and insightful community predictions.

• Our work presents a comparative analysis of various state-of-the-art pre-trained language models (PLMs) for generating textual embeddings, including BERT variants and OpenAI's text-embedding models. It also explores the efficacy of different GNN architectures (GeneralConv, GINE, and Graph Transformers) in capturing complex patterns within the knowledge graph (see the GINE sketch after this list).

• The framework also employs Optuna, a Python package for automated hyperparameter tuning, to systematically explore a wide range of configurations and identify optimal settings (see the Optuna sketch below). The search covers not only traditional hyperparameters such as learning rate and dropout, but also the choice of GNN architecture and embedding method, further enhancing the model's performance and generalization capabilities. [ABSTRACT FROM AUTHOR]
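As a concrete illustration of the embedding step, the sketch below encodes edge texts with an SBERT-family model via the sentence-transformers library. The model name ("all-MiniLM-L6-v2") and the toy review strings are illustrative assumptions, not the paper's exact setup; the same pattern applies to E5-Base or the OpenAI embedding models.

```python
# Minimal sketch: turning edge texts (e.g., user reviews) into semantic
# edge embeddings with a pre-trained SBERT-family model. The model name
# and the toy reviews are illustrative, not taken from the paper.
from sentence_transformers import SentenceTransformer
import torch

model = SentenceTransformer("all-MiniLM-L6-v2")  # any SBERT checkpoint works

# Hypothetical edge texts: one user review per (user -> book) edge.
edge_texts = [
    "A gripping mystery with a satisfying twist ending.",
    "Too slow for my taste, but beautifully written.",
    "A perfect introduction to graph theory for beginners.",
]

# Encode into a dense [num_edges, dim] tensor usable as edge_attr in a GNN.
edge_attr = torch.tensor(model.encode(edge_texts))
print(edge_attr.shape)  # e.g., torch.Size([3, 384])
```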
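One plausible way to feed such text-derived edge embeddings into a GNN is PyTorch Geometric's GINEConv, one of the architectures the paper evaluates. The EdgeTextGNN class, all layer sizes, and the toy graph below are our assumptions for illustration, not the authors' exact model.

```python
# Minimal sketch: a two-layer GINE network that fuses node features with
# text-derived edge embeddings and emits per-node community logits.
import torch
import torch.nn as nn
from torch_geometric.nn import GINEConv

class EdgeTextGNN(nn.Module):  # hypothetical name, not from the paper
    def __init__(self, node_dim, edge_dim, hidden_dim, num_communities):
        super().__init__()
        # edge_dim tells GINEConv to project edge embeddings to node width,
        # so the semantic edge signal enters the message function directly.
        self.conv1 = GINEConv(
            nn.Sequential(nn.Linear(node_dim, hidden_dim), nn.ReLU(),
                          nn.Linear(hidden_dim, hidden_dim)),
            edge_dim=edge_dim)
        self.conv2 = GINEConv(
            nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                          nn.Linear(hidden_dim, hidden_dim)),
            edge_dim=edge_dim)
        self.classifier = nn.Linear(hidden_dim, num_communities)

    def forward(self, x, edge_index, edge_attr):
        x = self.conv1(x, edge_index, edge_attr).relu()
        x = self.conv2(x, edge_index, edge_attr)
        return self.classifier(x)  # [num_nodes, num_communities] logits

# Toy graph: 4 nodes, 3 edges, 384-dim text embeddings on the edges.
x = torch.randn(4, 16)
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])
edge_attr = torch.randn(3, 384)
logits = EdgeTextGNN(16, 384, 64, 5)(x, edge_index, edge_attr)
print(logits.shape)  # torch.Size([4, 5])
```

GINE is a natural fit for this fusion because, unlike plain GIN, it adds the (projected) edge features to neighbor messages before aggregation, so review semantics directly shape how structural information flows.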
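Finally, a minimal sketch of the kind of joint Optuna search the abstract describes, covering learning rate, dropout, GNN architecture, and embedding method in one study. The train_and_evaluate function is a hypothetical stub standing in for the real training and validation loop.

```python
# Minimal sketch: Optuna searching training hyperparameters together with
# the GNN architecture and the embedding model, as the abstract describes.
import random
import optuna

def train_and_evaluate(params):
    # Hypothetical stub: build the chosen model, train it, and return a
    # community-detection quality score (e.g., NMI) on validation data.
    return random.random()

def objective(trial):
    params = {
        # Traditional hyperparameters.
        "lr": trial.suggest_float("lr", 1e-4, 1e-1, log=True),
        "dropout": trial.suggest_float("dropout", 0.0, 0.6),
        # Architecture and embedding method searched jointly with the above.
        "gnn": trial.suggest_categorical(
            "gnn", ["GeneralConv", "GINE", "GraphTransformer"]),
        "embedding": trial.suggest_categorical(
            "embedding",
            ["E5-Base", "SBERT", "BERT-Large", "text-embedding-3-large"]),
    }
    return train_and_evaluate(params)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)
print(study.best_params)
```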