Approximate EM learning on large computer clusters
An important challenge in the field of unsupervised learning is not only to develop algorithms that infer model parameters from a given dataset, but also to implement them in a way that scales to problems of realistic size and to sufficiently complex benchmark problems. We developed a lightweight, easy-to-use Python framework based on MPI (Message Passing Interface) that can be used to parallelize a variety of Expectation Maximization (EM) based algorithms. We used this infrastructure to implement standard algorithms such as Mixtures of Gaussians (e.g., [1]), Sparse Coding [2], and probabilistic PCA [3, 4], as well as novel algorithms such as Maximal Causes Analysis [5, 6], Occlusive Causes Analysis [7], Binary Sparse Coding [8], and mixture models for visual object learning [9, 10]. Once integrated into the framework, the algorithms can be executed on large numbers of processor cores and applied to large sets of data. Some of the numerical experiments we performed ran on InfiniBand-interconnected clusters and used up to 4000 parallel processor cores, performing more than 10^17 floating-point operations. Current experiments on a new cluster (Loewe CSC, >10 000 cores) use still more cores. For reasonably balanced meta-parameters (number of data points vs. number of latent variables vs. number of model parameters to be
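The central pattern in such a framework is data parallelism over the E-step: each MPI rank holds a shard of the dataset and computes local sufficient statistics, the statistics are summed across all ranks, and every rank then performs the same M-step. The sketch below is a minimal, hypothetical illustration of this pattern using mpi4py and NumPy, not the framework's actual code; the toy model (a two-component 1-D Gaussian mixture with fixed unit variances and equal weights) and all names in it are our own assumptions.

    # Minimal sketch of a data-parallel EM iteration with mpi4py.
    # Each rank holds a shard of the data; sufficient statistics are
    # summed across ranks with Allreduce before the shared M-step.
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # Hypothetical toy data: each rank draws its own shard from a
    # two-component 1-D mixture (unit variances, equal weights).
    rng = np.random.default_rng(seed=rank)
    local_data = np.concatenate([rng.normal(-2.0, 1.0, 500),
                                 rng.normal(3.0, 1.0, 500)])

    mu = np.array([-1.0, 1.0])  # initial means, identical on every rank

    for _ in range(50):
        # E-step (local): responsibilities of each component per point
        log_p = -0.5 * (local_data[:, None] - mu[None, :]) ** 2
        resp = np.exp(log_p - log_p.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)

        # Local sufficient statistics: per-component responsibility
        # mass and responsibility-weighted data sums
        local_stats = np.array([resp.sum(axis=0),
                                (resp * local_data[:, None]).sum(axis=0)])

        # Sum statistics over all ranks; every rank receives the total
        global_stats = np.empty_like(local_stats)
        comm.Allreduce(local_stats, global_stats, op=MPI.SUM)

        # M-step (identical on all ranks, so parameters stay in sync)
        mu = global_stats[1] / global_stats[0]

    if rank == 0:
        print("estimated means:", mu)

Run, for example, with mpiexec -n 4 python em_sketch.py. Because every rank applies the identical M-step to the identical global statistics, the model parameters stay synchronized without an explicit broadcast; this is what lets the same algorithm code run unchanged on anything from a single core to thousands of cores.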