Result: Approximate EM learning on large computer clusters

Title:
Approximate EM learning on large computer clusters
Authors:
Contributors:
The Pennsylvania State University CiteSeerX Archives
Collection:
CiteSeerX
Document Type:
Journal text
File Description:
application/pdf
Language:
English
Rights:
Metadata may be used without restriction as long as the OAI identifier remains attached to it.
Accession Number:
edsbas.7D259B93
Database:
BASE

Further Information

An important challenge in the field of unsupervised learning is not only to develop algorithms that infer model parameters from a given dataset, but also to implement them so that they can be applied to problems of realistic size and to sufficiently complex benchmark problems. We developed a lightweight, easy-to-use MPI (Message Passing Interface) based Python framework that can be used to parallelize a variety of Expectation Maximization (EM) based algorithms. We used this infrastructure to implement standard algorithms such as Mixtures of Gaussians (e.g., [1]), Sparse Coding [2], and probabilistic PCA [3, 4], as well as novel algorithms such as Maximal Causes Analysis [5, 6], Occlusive Causes Analysis [7], Binary Sparse Coding [8], and mixture models for visual object learning [9, 10]. Once integrated into the framework, the algorithms can be executed on large numbers of processor cores and applied to large datasets. Some of the numerical experiments we performed ran on InfiniBand-interconnected clusters and used up to 4,000 parallel processor cores, performing more than 10^17 floating point operations. Current experiments on a new cluster use even more cores (Loewe CSC, >10,000 cores). For reasonably balanced meta-parameters (number of data points vs. number of latent variables vs. number of model parameters to be
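The abstract describes a data-parallel EM scheme: each MPI process holds a shard of the data, computes E-step statistics locally, and the statistics are combined globally for the M-step. Below is a minimal sketch of that pattern for an isotropic Mixture of Gaussians using mpi4py and NumPy. It is not the authors' framework; the fixed variance, the random placeholder data, and all names are illustrative assumptions.

# Minimal data-parallel EM sketch for an isotropic Mixture of Gaussians.
# Each rank keeps its own data shard; E-step statistics are computed
# locally and summed across ranks with Allreduce for the M-step.
# NOT the authors' framework: names, the fixed variance, and the random
# data are illustrative assumptions.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

K, D, N_local = 3, 2, 1000              # components, dims, points per rank
rng = np.random.default_rng(seed=rank)
X = rng.normal(size=(N_local, D))       # placeholder data shard for this rank

# Identical initialization on every rank (broadcast from rank 0).
mu = comm.bcast(rng.normal(size=(K, D)) if rank == 0 else None, root=0)
pi = np.full(K, 1.0 / K)
var = 1.0                               # variance held fixed for brevity

for _ in range(50):
    # E-step (local): responsibilities r[n, k] of component k for point n.
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # (N, K)
    log_r = np.log(pi) - 0.5 * d2 / var
    log_r -= log_r.max(axis=1, keepdims=True)                  # stabilize exp
    r = np.exp(log_r)
    r /= r.sum(axis=1, keepdims=True)

    # Local sufficient statistics.
    Nk = r.sum(axis=0)                  # (K,)   effective counts
    Sx = r.T @ X                        # (K, D) responsibility-weighted sums

    # M-step (global): sum the statistics over all ranks, then update.
    comm.Allreduce(MPI.IN_PLACE, Nk, op=MPI.SUM)
    comm.Allreduce(MPI.IN_PLACE, Sx, op=MPI.SUM)
    mu = Sx / (Nk[:, None] + 1e-12)
    pi = Nk / Nk.sum()

In this pattern the per-iteration communication volume depends only on the model size (the K counts and the K x D sums), not on the number of data points, which is what allows such EM runs to scale to thousands of cores. A sketch like this would be launched with, e.g., mpiexec -n 4 python em_sketch.py (hypothetical filename).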