Result: Lower Bounds on Performance of Metric Tree Indexing Schemes for Exact Similarity Search in High Dimensions

Title:
Lower Bounds on Performance of Metric Tree Indexing Schemes for Exact Similarity Search in High Dimensions
Authors:
Source:
Algorithmica. 66(2):310-328
Publisher Information:
Heidelberg: Springer, 2013.
Publication Year:
2013
Physical Description:
print, 43 ref
Original Material:
INIST-CNRS
Subject Terms:
Computer science, Exact sciences and technology, Applied sciences, Computer science; control theory; systems, Theoretical computing, Algorithmics. Computability. Computer arithmetics, Software, Memory organisation. Data processing, Data processing. List processing. Character string processing, Probability learning, Deterministic approach, Asymptotic approximation, Tree (graph), Lower bound, Automatic classification, Dimensionality, Lipschitz function, Decision function, Indexing, Probability distribution, Modeling, Metric, Nearest neighbour, Data structure, Algorithm performance, Algorithms and data structures, Indexing schemes, Similarity search, Vapnik-Chervonenkis theory
Document Type:
Journal Article
File Description:
text
Language:
English
Author Affiliations:
Department of Mathematics and Statistics, University of Ottawa, 585 King Edward Avenue, Ottawa, Ontario K1N 6N5, Canada
ISSN:
0178-4617
Rights:
Copyright 2014 INIST-CNRS
CC BY 4.0
Unless otherwise stated above, the content of this bibliographic record may be used under a CC BY 4.0 licence by Inist-CNRS
Notes:
Computer science; theoretical automation; systems
Accession Number:
edscal.27239168
Database:
PASCAL Archive

Further Information

Within a mathematically rigorous model, we analyse the curse of dimensionality for deterministic exact similarity search in the context of popular indexing schemes: metric trees. The datasets X are sampled randomly from a domain Ω, equipped with a distance, ρ, and an underlying probability distribution, μ. While performing an asymptotic analysis, we send the intrinsic dimension d of Ω to infinity, and assume that the size of a dataset, n, grows superpolynomially yet subexponentially in d. Exact similarity search refers to finding the nearest neighbour in the dataset X to a query point ω ∈ Ω, where the query points are subject to the same probability distribution μ as datapoints. Let F denote a class of all 1-Lipschitz functions on Ω that can be used as decision functions in constructing a hierarchical metric tree indexing scheme. Suppose the VC dimension of the class of all sets {ω: f(ω) > a}, f ∈ F, a ∈ ℝ, is o(n^{1/4} / log^2 n). (In view of a 1995 result of Goldberg and Jerrum, even a stronger complexity assumption d^{O(1)} is reasonable.) We deduce the Ω(n^{1/4}) lower bound on the expected average case performance of hierarchical metric-tree based indexing schemes for exact similarity search in (Ω, X). In particular, this bound is superpolynomial in d.
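
To make the role of 1-Lipschitz decision functions concrete, the following is a minimal sketch of one familiar kind of metric tree, a vantage-point tree, and is not taken from the article. It assumes Euclidean distance as a stand-in for ρ and uses f(ω) = ρ(ω, pivot) as the decision function at each node, which is 1-Lipschitz by the triangle inequality. The names `dist`, `Node`, `nearest` and the parameter `leaf_size` are hypothetical and chosen only for illustration.

```python
import math

def dist(x, y):
    """Euclidean distance, used here as a stand-in for the metric rho."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

class Node:
    """Node of a vantage-point tree (one illustrative kind of metric tree)."""

    def __init__(self, points, leaf_size=4):
        self.bucket = None      # points stored at a leaf
        self.pivot = None       # defines the decision function f(w) = dist(w, pivot)
        self.threshold = None   # the value a in the set {w : f(w) > a}
        self.inner = self.outer = None
        if len(points) <= leaf_size:
            self.bucket = points
            return
        pivot = points[0]
        values = sorted(dist(p, pivot) for p in points)
        threshold = values[len(values) // 2]
        inner = [p for p in points if dist(p, pivot) <= threshold]
        outer = [p for p in points if dist(p, pivot) > threshold]
        if not outer:           # many ties: fall back to a leaf
            self.bucket = points
            return
        self.pivot, self.threshold = pivot, threshold
        self.inner = Node(inner, leaf_size)
        self.outer = Node(outer, leaf_size)

def nearest(node, q, best=(math.inf, None), stats=None):
    """Exact nearest neighbour of q; stats['dist'] counts distance evaluations."""
    if stats is None:
        stats = {"dist": 0}
    if node.bucket is not None:
        for p in node.bucket:
            stats["dist"] += 1
            d = dist(p, q)
            if d < best[0]:
                best = (d, p)
        return best
    stats["dist"] += 1
    fq = dist(q, node.pivot)
    if fq <= node.threshold:
        near, far = node.inner, node.outer
    else:
        near, far = node.outer, node.inner
    best = nearest(near, q, best, stats)
    # Since f is 1-Lipschitz, every point p on the far side satisfies
    # dist(q, p) >= |f(q) - f(p)| >= |fq - threshold|, so the far branch
    # can be skipped whenever that gap is at least the best distance so far.
    if abs(fq - node.threshold) < best[0]:
        best = nearest(far, q, best, stats)
    return best
```

Calling `nearest(Node(points), q, stats=counter)` with a dictionary `counter = {"dist": 0}` returns the exact nearest neighbour and records how many distance evaluations were needed. The abstract's lower bound concerns how much work schemes of this general type must do on average as d grows; this sketch only shows the pruning mechanism and does not reproduce that analysis.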