Result: Tight lower bounds for query processing on streaming and external memory data

Title:
Tight lower bounds for query processing on streaming and external memory data
Source:
Automata, languages and programming (ICALP 2005)Theoretical computer science. 380(1-2):199-217
Publisher Information:
Amsterdam: Elsevier, 2007.
Publication Year:
2007
Physical Description:
print, 46 ref
Original Material:
INIST-CNRS
Subject Terms:
Computer science, Informatique, Sciences exactes et technologie, Exact sciences and technology, Sciences et techniques communes, Sciences and techniques of general use, Mathematiques, Mathematics, Logique mathématique, fondements, théorie des ensembles, Mathematical logic, foundations, set theory, Logique et fondements, Logic and foundations, Récursivité, Recursion theory, Topologie. Variétés et complexes cellulaires. Analyse globale et analyse sur variétés, Topology. Manifolds and cell complexes. Global analysis and analysis on manifolds, Analyse globale, analyse sur des variétés, Global analysis, analysis on manifolds, Sciences appliquees, Applied sciences, Informatique; automatique theorique; systemes, Computer science; control theory; systems, Informatique théorique, Theoretical computing, Divers, Miscellaneous, Logiciel, Software, Organisation des mémoires. Traitement des données, Memory organisation. Data processing, Systèmes d'information. Bases de données, Information systems. Data bases, Algorithme optimal, Optimal algorithm, Algoritmo óptimo, Base de données, Database, Base dato, Borne inférieure, Lower bound, Cota inferior, Couche, Layer, Capa, Entrée ordinateur, Input, Entrada ordenador, Evaluation, Evaluación, Filtrage, Filtering, Filtrado, Gestion, Management, Gestión, Hiérarchie, Hierarchy, Jerarquía, Informatique théorique, Computer theory, Informática teórica, Langage interrogation, Query language, Lenguaje interrogación, Mémoire tampon, Buffer memory, Memoria tampón, Méthode séquentielle, Sequential method, Método secuencial, Objet, Object, Objeto, Performance, Rendimiento, Produit, Product, Producto, Requête, Query, Pregunta documental, Stockage donnée, Data storage, Almacenamiento datos, Système mémoire, Storage system, Sistema memoria, Technologie, Technology, Tecnología, Temps accès, Access time, Tiempo acceso, Théorie base donnée, Database theory, Traitement de la requête, Query processing, Tratamiento pregunta, Traitement donnée, Data processing, Tratamiento datos, Triage, Sorting, Tría, 03D55, 58A25, 68P10, 68P15, 68Wxx, Complexité donnée, Data streams, External memory, Lower bounds, Machine models, XML query languages
Document Type:
Conference Conference Paper
File Description:
text
Language:
English
Author Affiliations:
Institut für Informatik, Humboldt-Universitdt zu Berlin, Unter den Linden 6, 10099 Berlin, Germany
Database Group, Universität des Saarlandes, Postfach 15 11 50, 66041 Saarbrücken, Germany
ISSN:
0304-3975
Rights:
Copyright 2008 INIST-CNRS
CC BY 4.0
Sauf mention contraire ci-dessus, le contenu de cette notice bibliographique peut être utilisé dans le cadre d’une licence CC BY 4.0 Inist-CNRS / Unless otherwise stated above, the content of this bibliographic record may be used under a CC BY 4.0 licence by Inist-CNRS / A menos que se haya señalado antes, el contenido de este registro bibliográfico puede ser utilizado al amparo de una licencia CC BY 4.0 Inist-CNRS
Notes:
Computer science; theoretical automation; systems

Mathematics
Accession Number:
edscal.18821622
Database:
PASCAL Archive

Further Information

It is generally assumed that databases have to reside in external, inexpensive storage because of their sheer size. Current technology for external storage systems presents us with a reality that, performance-wise, a small number of sequential scans of the data is strictly preferable over random data accesses. Database technology - in particular query processing technology -has developed around a notion of memory hierarchies with layers of greatly varying sizes and access times. It seems that the current technologies scale up to their tasks and are very successful, but on closer investigation it may appear that our theoretical understanding of the problems involved - and of optimal algorithms for these problems - is not quite as developed. Recently, data stream processing has become an object of study by the database management community, but from the viewpoint of database theory, this is really a special case of the query processing problem on data in external storage where we are limited to a single scan of the input data. In the present paper we study a clean machine model for external memory and stream processing. We establish tight bounds for the data complexity of Core XPath evaluation and filtering. We show that the number of scans of the external data induces a strict hierarchy (as long as internal memory space is sufficiently small, e.g., polylogarithmic in the size of the input). We also show that neither joins nor sorting are feasible if the product of the number r(n) of scans of the external memory and the size s(n) of the internal memory buffers is sufficiently small, i.e., of size o(n).