Title:
FreeCore : an index system of summary of documents on an Distributed Hash Table (DHT) ; FreeCore : un système d'indexation de résumés de document sur une Table de Hachage Distribuée (DHT)
Authors:
Contributors:
DistributEd aLgorithms and sYStems (DELYS), Centre Inria de Paris, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-LIP6, Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS), Sorbonne Université, Université Cheikh Anta Diop (Dakar, Sénégal, 1957-.), Mesaac Makpangou, Samba Ndiaye
Source:
https://theses.hal.science/tel-01921587 ; Information Retrieval [cs.IR]. Sorbonne Université; Université Cheikh Anta Diop (Dakar, Sénégal ; 1957-.), 2018. French. ⟨NNT : 2018SORUS180⟩.
Publisher Information:
CCSD
Publication Year:
2018
Document Type:
Dissertation (doctoral or postdoctoral thesis)
Language:
French
Relation:
NNT: 2018SORUS180
Rights:
info:eu-repo/semantics/OpenAccess
Accession Number:
edsbas.FA88BC3C
Database:
BASE

Abstract:

This thesis examines the problem of indexing and searching in Distributed Hash Tables (DHTs). It provides a distributed system for storing document summaries based on their content. Concretely, the thesis uses Bloom filters (BFs) to represent document summaries and proposes an efficient method for inserting and retrieving documents represented by BFs in an index distributed over a DHT. Content-based storage has a dual advantage: it groups similar documents together and, by using Bloom filters for keyword searches, makes them faster to find and retrieve. However, processing a keyword query represented by a Bloom filter is a difficult operation and requires a mechanism to locate the Bloom filters that represent documents stored in the DHT. The thesis therefore goes on to propose two Bloom filter index schemes distributed over a DHT. The first index system combines the principles of content-based indexing and inverted lists, and addresses the large amount of data that content-based indexes store: by using long Bloom filters, it can store documents on a large number of servers while indexing them in less space. The second index system efficiently supports the processing of superset queries (keyword queries) using a prefix tree. It exploits the distribution of the data and proposes a configurable distribution function that indexes documents with a balanced binary tree, so that documents are distributed efficiently across the indexing servers. A third solution provides an efficient method for locating documents that contain a given set of keywords. Compared with solutions in the same category, it performs subset searches at a lower cost and can be considered a solid foundation for processing superset queries in index systems built on top of DHTs.
Finally, the thesis presents a prototype peer-to-peer system for content indexing and keyword search. ...
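As a rough illustration of the Bloom filter matching the abstract describes, the Python sketch below encodes keyword sets as Bloom filters and answers a keyword query with a superset test: a document may contain every query keyword only if its filter has every bit the query filter has. The filter length `M_BITS`, the hash count `K_HASHES`, the salted SHA-256 hashing, and the prefix-based placement helpers `placement_prefix` and `candidate_prefixes` are all hypothetical choices for illustration, not the scheme used in the thesis.

```python
import hashlib

# Hypothetical parameters (not taken from the thesis): filter length in bits
# and number of hash functions applied to each keyword.
M_BITS = 256
K_HASHES = 3


def bloom_filter(keywords, m=M_BITS, k=K_HASHES):
    """Summarize a set of keywords as an m-bit Bloom filter held in a Python int."""
    bits = 0
    for word in keywords:
        for i in range(k):
            # Simulate k independent hash functions by salting SHA-256 with i.
            digest = hashlib.sha256(f"{i}:{word}".encode("utf-8")).digest()
            position = int.from_bytes(digest[:8], "big") % m
            bits |= 1 << position
    return bits


def may_contain_all(doc_bf, query_bf):
    """Superset test: every bit set in the query filter is also set in the document filter.

    A True result may be a false positive; a False result is always correct.
    """
    return (doc_bf & query_bf) == query_bf


def placement_prefix(bf, prefix_bits=4, m=M_BITS):
    """Hypothetical content-based key: the leading prefix_bits bits of the filter."""
    return bf >> (m - prefix_bits)


def candidate_prefixes(query_bf, prefix_bits=4, m=M_BITS):
    """Prefixes a matching document filter could have: bitwise supersets of the query prefix."""
    q = placement_prefix(query_bf, prefix_bits, m)
    return [p for p in range(1 << prefix_bits) if (p & q) == q]


if __name__ == "__main__":
    docs = {
        "d1": bloom_filter({"dht", "bloom", "index"}),
        "d2": bloom_filter({"dht", "routing"}),
    }
    query = bloom_filter({"dht", "bloom"})
    for name, bf in docs.items():
        # d1 always matches; d2 matches only through a false positive.
        print(name, may_contain_all(bf, query), placement_prefix(bf))
    print("prefixes to visit:", candidate_prefixes(query))
```

Grouping filters by a leading bit prefix keeps documents with similar summaries on the same node, but a query has to visit every prefix that is a bitwise superset of its own; the fewer bits the query sets in that prefix, the more locations it must check. This hints at why locating matching filters in a DHT calls for a dedicated index structure rather than a single key lookup.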