Title:
PCA and PLS with very large data sets
Source:
Partial least squares. Computational statistics & data analysis. 48(1):69-85
Publisher Information:
Amsterdam: Elsevier Science, 2005.
Publication Year:
2005
Physical Description:
print, 1 p.1/4
Original Material:
INIST-CNRS
Document Type:
Conference Paper
File Description:
text
Language:
English
Author Affiliations:
Umetrics Inc., 17 Kiel Ave, Kinnelon, NJ 07405, United States
Research Group for Chemometrics, Umeå University, 901 87 Umeå, Sweden
ISSN:
0167-9473
Rights:
Copyright 2005 INIST-CNRS
CC BY 4.0
Unless otherwise stated above, the content of this bibliographic record may be used under a CC BY 4.0 licence by Inist-CNRS
Notes:
Mathematics
Accession Number:
edscal.16461607
Database:
PASCAL Archive

Further Information

Chemometrics began around 30 years ago to cope with the rapidly increasing volumes of data produced in chemical laboratories. A multivariate approach based on projections (PCA and PLS) was developed that adequately solved many of the problems at hand. However, with the further increase in the size of data sets seen today in all fields of science and technology, inadequacies are beginning to appear in our multivariate methods, both in their efficiency and in their interpretability. Starting from a few examples of complicated problems seen in RD&P (research, development, and production), possible extensions and generalizations of the existing multivariate projection methods, PCA and PLS, will be discussed. The following criteria will be treated as important: scalability of methods to increasing size of problems and data, increasing sophistication in the handling of noise and non-linearities, interpretability of results, and relative simplicity of use. The discussion will take the perspective of the evolution of scientific methodology as (a) driven by new technology, e.g., computers and graphical displays, and by the need to answer some basic, constantly recurring questions, and (b) constrained by the limitations of the human brain, i.e., our ability to understand and interpret scientific and data-analytic results.