Title:
Improvisation of Crawling on Web Maps Using Distributed Algorithms with Socket Programming Model
Source:
J-KOMA : Jurnal Ilmu Komputer dan Aplikasi. 8:10-17
Publisher Information:
Universitas Negeri Jakarta, 2025.
Publication Year:
2025
Document Type:
Academic journal Article
ISSN:
2620-4827
DOI:
10.21009/j-koma.v8i1.02
Rights:
CC BY
Accession Number:
edsair.doi...........07ee056c1f1178ecefe5bb6acc665917
Database:
OpenAIRE

Further Information

This research develops a distributed web crawler system that improves the efficiency of data collection across multiple devices using socket-based communication and a master-slave architecture. A tracker manages peer connections among devices with public and private IP addresses, using UDP hole punching to establish peer-to-peer communication behind NATs. A load balancer distributes URLs evenly among the crawler nodes, minimizing duplication and balancing the workload. The architecture consists of a tracker, a manager, and multiple clients; the private clients perform the crawling, which mitigates restrictions such as rate limiting. In a one-hour test starting from the initial URL "https://www.detik.com/", the distributed crawler collected 9175 unique URLs with no duplicates, about 30% more than the 7069 URLs collected by an individual crawler. The system achieves improved resource efficiency and data optimization, though its performance is influenced by network latency and coordination overhead. Overall, the distributed approach improves crawling performance and scalability compared to single-device crawlers.
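
The abstract does not give the load balancer's algorithm, so the following is only a minimal Python sketch of one way a manager could hand out URLs evenly without duplication. It assumes a least-loaded assignment policy and a simple URL normalization step. The names `normalize` and `UrlBalancer`, the node identifiers, and the `/news` path are hypothetical and do not come from the paper.

```python
from urllib.parse import urldefrag, urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Canonicalize a URL so trivially different spellings map to one key."""
    url, _fragment = urldefrag(url.strip())   # drop any "#fragment"
    parts = urlsplit(url)                     # scheme is lowercased by urlsplit
    path = parts.path or "/"                  # treat an empty path as "/"
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))


class UrlBalancer:
    """Hypothetical manager-side dispatcher (an assumption, not the paper's design).

    Each URL is deduplicated first, then queued on the node that currently
    holds the fewest pending URLs.
    """

    def __init__(self, node_ids):
        if not node_ids:
            raise ValueError("at least one crawler node is required")
        self.queues = {node: [] for node in node_ids}
        self.seen = set()

    def submit(self, url):
        """Queue url on the least-loaded node; return None if it was seen before."""
        key = normalize(url)
        if key in self.seen:
            return None
        self.seen.add(key)
        # Ties are broken by the order in which nodes were registered.
        node = min(self.queues, key=lambda n: len(self.queues[n]))
        self.queues[node].append(key)
        return node


balancer = UrlBalancer(["node-a", "node-b"])
for link in ("https://www.detik.com/",
             "https://www.detik.com/#top",
             "https://www.detik.com/news"):
    print(link, "->", balancer.submit(link))
```

Keeping the set of seen URLs in one place (here, the manager) is what would let the nodes avoid collecting the same URL twice. The trade-off is that every discovered link needs a round trip to the manager, which is one possible source of the coordination overhead the abstract mentions.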