Treffer: Comparison of spatial indexes

Title:
Comparison of spatial indexes
Contributors:
Équipe DIagnostic, Supervision et COnduite (LAAS-DISCO), Laboratoire d'analyse et d'architecture des systèmes (LAAS), Université Toulouse Capitole (UT Capitole), Communauté d'universités et établissements de Toulouse (Comue de Toulouse)-Communauté d'universités et établissements de Toulouse (Comue de Toulouse)-Institut National des Sciences Appliquées - Toulouse (INSA Toulouse), Institut National des Sciences Appliquées (INSA)-Communauté d'universités et établissements de Toulouse (Comue de Toulouse)-Institut National des Sciences Appliquées (INSA)-Communauté d'universités et établissements de Toulouse (Comue de Toulouse)-Université Toulouse - Jean Jaurès (UT2J), Communauté d'universités et établissements de Toulouse (Comue de Toulouse)-Université Toulouse III - Paul Sabatier (UT3), Communauté d'universités et établissements de Toulouse (Comue de Toulouse)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Communauté d'universités et établissements de Toulouse (Comue de Toulouse)-Université Toulouse Capitole (UT Capitole), Communauté d'universités et établissements de Toulouse (Comue de Toulouse), LAAS-CNRS
Source:
[Research Report] Rapport LAAS n° 16631, LAAS-CNRS. 2016, 13p
Publisher Information:
CCSD, 2016.
Publication Year:
2016
Collection:
collection:UNIV-TLSE2
collection:UNIV-TLSE3
collection:CNRS
collection:INSA-TOULOUSE
collection:LAAS
collection:UT1-CAPITOLE
collection:LAAS-DISCO
collection:LAAS-DECISION-ET-OPTIMISATION
collection:LARA
collection:INSA-GROUPE
collection:TOULOUSE-INP
collection:UNIV-UT3
collection:UT3-INP
collection:UT3-TOULOUSEINP
collection:TEST7-HALCNRS
Original Identifier:
HAL: hal-01701560
Document Type:
Report report<br />Reports
Language:
English
Rights:
info:eu-repo/semantics/OpenAccess
Accession Number:
edshal.hal.01701560v1
Database:
HAL

Weitere Informationen

In this report three different tree data structures were tested in order to find the index structure best suited to data groups retrieval for clustering purposes. Group retrieval is considered as the process of finding all the neighborhoods in the data set. This process involves a recursive implementation of radius-neighbor queries. Time and memory measures were taken for the tree construction process and for the group retrieval process. These measures are the average value over a hundred simulations. The selected structures are: the R*Tree [1], the BallTree [4] and the KDTree [2]. I Introduction The R*Tree, or rectangular Tree, was proposed in 1990 and has been largely used since. In clustering applications, the ClusTree algorithm uses an extension of this index to efficiently locate the right place for object insertion [3]. The key idea of this data structure is to group nearby objects and represent them with their minimum bounding rectangle in the next higher level of the tree. BallTrees is a completely binary tree in which each node refer to a region bounded by an d-dimensional hyper-sphere [4]. A parent node is hence the smallest hyper-sphere that contains all the hyper-spheres of its children. The BallTree is the only index, among the analyzed indexes, that accepts the intersection of sibling regions. A K-dimensional binary search tree (KDTree) partitions the data space into mutually exclusive hyper-rectangular regions. Each region is found by recursively splitting the space using axis-parallel hyperplanes. The splitting process finishes when each sub-region has a number of points less than or equal to a given threshold. The python implementations of the R*Tree 1 , the BallTree 2 , the KDTree 3 and the cKDTree 4 (A cython implementation of the KDTree), were tested using three different scenarios. The computation time and memory consumption (maximum resident set size used), were selected as comparative measures. II Performance with uniform distributions In the first scenario, data correspond to random samples extracted from a uniform distribution. The algorithms were tested for a data sets containing n data samples, where each sample is a d-dimensional vector. Tests were performed for n varying in the range n ∈ 10 1 , 10 2 , 10 3 , 10 4 and d in d ∈ 2 1 , 2 2 , 2 3 , 2 4 , 2 5. Figure 1 show the resources used for the tree building task. Computation time measured in seconds is presented in Figure 1a and the maximum resident set size measured in M b is shown in Figure 1b. It can be seen that the R*Tree implementation performs poorly with respect to the other indexes. The actual measurements of the computation time and maximum resident set size are presented in tables 1 and 2 respectively. The group retrieval task is computationally more expensive as can be seen in Figure 2, where the computation time for the worst case (R*Tree, n = 18,d = 10 4) passes from 33.12 seconds in the tree building task to 362.9 seconds for group retrieval (see Figure 2a). The best computation time is achieved by the cKDTree index but at cost of having the worst memory consumption (see Figure 2b). The recorded data for the group retrieval task is presented in tables 3 and 4. 1