Treffer: A sampling-based framework for parallel data mining

Title:
A sampling-based framework for parallel data mining
Source:
PPoPP'05 (Proceedings of the 2005 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming), pp. 255-265
Publisher Information:
New York, NY: ACM Press, 2005.
Publication Year:
2005
Physical Description:
print; 30 references
Original Material:
INIST-CNRS
Document Type:
Conference Paper
File Description:
text
Language:
English
Author Affiliations:
Department of Computer Science University of Illinois, Urbana, IL 61801, United States
KAI Software Lab Intel Americas, Inc, Champaign, IL 61820, United States
Rights:
Copyright 2006 INIST-CNRS
CC BY 4.0
Unless otherwise stated above, the content of this bibliographic record may be used under a CC BY 4.0 licence by Inist-CNRS.
Notes:
Computer science; theoretical automation; systems
Accession Number:
edscal.18182689
Database:
PASCAL Archive

Further Information

The goal of data mining algorithms is to discover useful information embedded in large databases. Frequent itemset mining and sequential pattern mining are two important data mining problems with broad applications. Perhaps the most efficient way to solve these problems sequentially is to apply a pattern-growth algorithm, a divide-and-conquer approach [9, 10]. In this paper, we present a framework for parallel mining of frequent itemsets and sequential patterns based on the divide-and-conquer strategy of pattern growth. We then discuss the load balancing problem and introduce a sampling technique, called selective sampling, to address it. We implemented parallel versions of both frequent itemset and sequential pattern mining algorithms following our framework. The experimental results show that our parallel algorithms usually achieve excellent speedups.
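
To illustrate the general idea the abstract describes (estimating the cost of pattern-growth subtasks from a sample and balancing them across workers), here is a minimal, hypothetical sketch. The function names (estimate_subtask_costs, assign_subtasks), the cost proxy (projected database size), and the greedy assignment are illustrative assumptions, not the selective sampling method or implementation from the paper.

```python
# Illustrative sketch only: sample-based cost estimation and greedy load
# balancing for divide-and-conquer (pattern-growth) frequent itemset mining.
import random
from collections import Counter

def estimate_subtask_costs(transactions, min_support, sample_rate=0.1):
    """Mine a random sample and use the size of each frequent item's
    projected (conditional) database as a rough estimate of its full cost.
    (Assumed proxy; the paper's selective sampling may differ.)"""
    sample = [t for t in transactions if random.random() < sample_rate]
    counts = Counter(item for t in sample for item in t)
    frequent = {i for i, c in counts.items() if c >= min_support * sample_rate}
    costs = {}
    for item in frequent:
        projected = [set(t) - {item} for t in sample if item in t]
        costs[item] = sum(len(t) for t in projected)  # crude work estimate
    return costs

def assign_subtasks(costs, num_workers):
    """Greedy longest-processing-time assignment of projected databases."""
    loads = [0] * num_workers
    assignment = [[] for _ in range(num_workers)]
    for item, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
        w = loads.index(min(loads))  # give the subtask to the least-loaded worker
        assignment[w].append(item)
        loads[w] += cost
    return assignment

if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "c"], ["a", "d"], ["b", "c", "e"]] * 50
    costs = estimate_subtask_costs(db, min_support=20, sample_rate=0.5)
    print(assign_subtasks(costs, num_workers=4))
```

In such a scheme, each worker would then run the sequential pattern-growth algorithm independently on its assigned projected databases, which is the divide-and-conquer parallelization strategy the abstract refers to.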