Result: Mining data to find subsets of high activity

Title:

Mining data to find subsets of high activity

Authors:

Source:

Special issue in honor of John W. Tukey (1915-2000). Contemporary data analysis: Theory and methods. Journal of Statistical Planning and Inference. 122(1-2):23-41

Publisher Information:

Amsterdam; Lausanne; New York, NY: Elsevier Science, 2004.

Publication Year:

2004

Physical Description:

print, 18 ref

Original Material:

INIST-CNRS

Subject Terms:

Control theory, operational research, Computer science, Mathematics, Exact sciences and technology, Sciences and techniques of general use, Combinatorics. Ordered structures, Combinatorics, Classical combinatorial problems, Probability and statistics, Probability theory and stochastic processes, Combinatorial probability, Statistics, Applications, Biology, psychology, social sciences, Database, Biometrics, Statistical decision, Data mining, Partition method, Recursive method, Statistical method, Medical science, Predictive value, 62P10

Document Type:

Conference Paper

File Description:

text

Language:

English

Author Affiliations:

Johnson & Johnson Pharmaceutical Research & Development, Raritan, NJ 08807, United States
Department of Statistics, Rutgers University, Piscataway, NJ 08855, United States

ISSN:

0378-3758

Access URL:

http://pascal-francis.inist.fr/vibad/index.php?action=search&terms=15570311

Rights:

Copyright 2004 INIST-CNRS
CC BY 4.0
Unless otherwise stated above, the content of this bibliographic record may be used under a CC BY 4.0 licence by Inist-CNRS

Notes:

Mathematics

Accession Number:

edscal.15570311

Database:

PASCAL Archive

Further Information

Many data mining problems in biometrics research are concerned with identifying the characteristics of a subset of cases that responds substantially differently from the rest of the cases. For example, when studying the relationship between a response variable Y and a set of predictor variables, it is often of interest to determine what ranges of values of the predictor variables are associated with a high likelihood of Y = 1 (if Y is a Bernoulli variable) or with high values of Y (if Y is a continuous variable). We describe a criterion (H) and a recursive partitioning method (ARF) that directly address this question. A computational algorithm that makes ARF feasible for use even with very large datasets is presented. The basic version of ARF can be generalized to the case of multiple response variables, Y1,...,Yt, and to other settings. We illustrate the effectiveness of ARF by mining a structure activity database, a hospital database, and some other real and simulated datasets. We conclude by proposing a basic paradigm for data mining.
