Result: ACME: A scalable parallel system for extracting frequent patterns from a very long sequence : Data-intensive cloud infrastructure

Title:
ACME: A scalable parallel system for extracting frequent patterns from a very long sequence : Data-intensive cloud infrastructure
Source:
The VLDB journal. 23(6):871-893
Publisher Information:
Heidelberg: Springer, 2014.
Publication Year:
2014
Physical Description:
print, 31 ref
Original Material:
INIST-CNRS
Subject Terms:
Computer science, Informatique, Sciences exactes et technologie, Exact sciences and technology, Sciences appliquees, Applied sciences, Informatique; automatique theorique; systemes, Computer science; control theory; systems, Logiciel, Software, Systèmes informatiques et systèmes répartis. Interface utilisateur, Computer systems and distributed systems. User interface, Organisation des mémoires. Traitement des données, Memory organisation. Data processing, Traitement des données. Listes et chaînes de caractères, Data processing. List processing. Character string processing, Gestion des mémoires et des fichiers (y compris la protection et la sécurité des fichiers), Memory and file management (including protection and security), Sciences biologiques et medicales, Biological and medical sciences, Sciences biologiques fondamentales et appliquees. Psychologie, Fundamental and applied biological sciences. Psychology, Generalites, General aspects, Mathématiques biologiques. Statistiques. Modèles. Métrologie. Informatique en biologie (généralités), Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects), Alphabet, Alfabeto, Analyse donnée, Data analysis, Análisis datos, Antémémoire, Cache memory, Antememoria, Bioinformatique, Bioinformatics, Bioinformática, Efficacité, Efficiency, Eficacia, Extensibilité, Scalability, Estensibilidad, Fichier log, Log file, Fichero actividad, Gestion mémoire, Storage management, Gestión memoria, Gestion ressources, Resource management, Gestión recursos, Génétique, Genetics, Genética, Indexation, Indexing, Indización, Motif structural, Structural unit, Motivo estructural, Méthode combinatoire, Combinatorial method, Método combinatorio, Méthode heuristique, Heuristic method, Método heurístico, Processeur multicoeur, Multicore processor, Procesador MultiNúcleo, Structure donnée, Data structure, Estructura datos, Suffixe, Suffix, Sufijo, Superordinateur, Supercomputer, Supercomputador, Système réparti, Distributed system, Sistema repartido, Série temporelle, Time series, Serie temporal, Unité centrale, Central unit, Unidad central, Informatique dans les nuages, Cloud computing, Computación en nube, Automatic tuning, Cache efficient, Cloud, Elastic, Motif, Suffix tree
Document Type:
Journal Article
File Description:
text
Language:
English
Author Affiliations:
King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
Qatar Computing Research Institute, Doha, Qatar
ISSN:
1066-8888
Rights:
Copyright 2015 INIST-CNRS
CC BY 4.0
Unless otherwise stated above, the content of this bibliographic record may be used under a CC BY 4.0 licence by Inist-CNRS
Notes:
Biological sciences. Generalities. Modelling. Methods

Computer science; theoretical automation; systems

Generalities in biological sciences
Accession Number:
edscal.28931086
Database:
PASCAL Archive

Further Information

Modern applications, including bioinformatics, time series, and web log analysis, require the extraction of frequent patterns, called motifs, from one very long (i.e., several gigabytes) sequence. Existing approaches are either heuristics that are error-prone, or exact (also called combinatorial) methods that are extremely slow and therefore applicable only to very small sequences (i.e., on the order of megabytes). This paper presents ACME, a combinatorial approach that scales to gigabyte-long sequences and is the first to support supermaximal motifs. ACME is a versatile parallel system that can be deployed on desktop multi-core systems, or on thousands of CPUs in the cloud. However, merely using more compute nodes does not guarantee efficiency, because of the related overheads. To address this, ACME introduces an automatic tuning mechanism that suggests the appropriate number of CPUs to utilize, in order to meet the user constraints in terms of run time, while minimizing the financial cost of cloud resources. Our experiments show that, compared to the state of the art, ACME supports three orders of magnitude longer sequences (e.g., DNA for the entire human genome); handles large alphabets (e.g., English alphabet for Wikipedia); scales out to 16,384 CPUs on a supercomputer; and supports elastic deployment in the cloud.