Treffer: Lower Bounds on Performance of Metric Tree Indexing Schemes for Exact Similarity Search in High Dimensions
CC BY 4.0
Sauf mention contraire ci-dessus, le contenu de cette notice bibliographique peut être utilisé dans le cadre d’une licence CC BY 4.0 Inist-CNRS / Unless otherwise stated above, the content of this bibliographic record may be used under a CC BY 4.0 licence by Inist-CNRS / A menos que se haya señalado antes, el contenido de este registro bibliográfico puede ser utilizado al amparo de una licencia CC BY 4.0 Inist-CNRS
Weitere Informationen
Within a mathematically rigorous model, we analyse the curse of dimensionality for deterministic exact similarity search in the context of popular indexing schemes: metric trees. The datasets X are sampled randomly from a domain Ω, equipped with a distance, p, and an underlying probability distribution, μ. While performing an asymptotic analysis, we send the intrinsic dimension d of Ω to infinity, and assume that the size of a dataset, n, grows superpolynomially yet subexponentially in d. Exact similarity search refers to finding the nearest neighbour in the dataset X to a query point ω ∈ Ω, where the query points are subject to the same probability distribution μ as datapoints. Let F denote a class of all 1-Lipschitz functions on Ω that can be used as decision functions in constructing a hierarchical metric tree indexing scheme. Suppose the VC dimension of the class of all sets {ω: f(ω) > a}, a ∈ ℝ is o(n1/4/log2 n). (In view of a 1995 result of Goldberg and Jerrum, even a stronger complexity assumption dO(1) is reasonable.) We deduce the Ω(n1/4) lower bound on the expected average case performance of hierarchical metric-tree based indexing schemes for exact similarity search in (Ω, X). In paricular, this bound is superpolynomial in d.