Title:
TGDataset: Collecting and Exploring the Largest Telegram Channels Dataset
Source:
Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1. :2325-2334
Publication Status:
Preprint
Publisher Information:
ACM, 2025.
Publication Year:
2025
Document Type:
Academic journal Article
Conference object
DOI:
10.1145/3690624.3709397
10.48550/arxiv.2303.05345
Rights:
CC BY
arXiv Non-Exclusive Distribution
Accession Number:
edsair.doi.dedup.....1b6f26d25f449b2276c202070e779d8d
Database:
OpenAIRE

Further Information

Telegram is one of the most popular instant messaging apps in today's digital age. In addition to providing a private messaging service, Telegram, through its channels, is an effective medium for rapidly broadcasting content to a large audience (e.g., COVID-19 announcements) but, unfortunately, also for disseminating radical ideologies and coordinating attacks (e.g., the Capitol Hill riot). This paper presents the TGDataset, a new dataset that includes 120,979 Telegram channels and over 400 million messages, making it, to the best of our knowledge, the largest collection of Telegram channels. After a brief introduction to the data collection process, we analyze the languages spoken within our dataset and the topics covered by English channels. Finally, we discuss some use cases in which our dataset can help to better understand the Telegram ecosystem and to study the diffusion of questionable news. In addition to the raw dataset, we released the scripts we used to analyze the dataset and the list of channels belonging to the network of a new conspiracy theory called Sabmyk.