Result: CUDA-Level Performance with Python-Level Productivity for Gaussian Mixture Model Applications

Title:
CUDA-Level Performance with Python-Level Productivity for Gaussian Mixture Model Applications
Contributors:
The Pennsylvania State University CiteSeerX Archives
Source:
http://parlab.eecs.berkeley.edu/sites/all/parlab/files/CUDA-level Performance with Python-level Productivity for Gaussian Mixture Model Applications.pdf
Publication Year:
2011
Collection:
CiteSeerX
Document Type:
Journal article text
File Description:
application/pdf
Language:
English
Rights:
Metadata may be used without restrictions as long as the oai identifier remains attached to it.
Accession Number:
edsbas.30DAE983
Database:
BASE

Further Information

Typically, scientists with computational needs prefer to use high-level languages such as Python or MATLAB; however, large computationally-intensive problems must eventually be recoded in a low-level language such as C or Fortran by expert programmers in order to achieve sufficient performance. In addition, multiple strategies may exist for mapping a problem onto parallel hardware; unless the hardware geometry and problem dimensions are both taken into account, large factors of performance may be left on the table. We show how to preserve the productivity of high-level languages while obtaining the performance of the best low-level language code variant for a given hardware platform and problem size using SEJITS (Selective Embedded Just-in-Time Specialization), a set of techniques that leverages just-in-time code generation and compilation combined with reflection and metaprogramming. As a case study, we demonstrate our technique for Gaussian Mixture Model training using the EM algorithm. With the addition of one line of code to import our framework, a domain programmer using an existing Python GMM library can run her program unmodified on a GPU-equipped computer and achieve performance that meets or beats GPU code hand-crafted by a human expert. We also show that despite the overhead of allowing the domain expert's program to use Python and the overhead of just-in-time code generation and compilation, our approach still results in performance competitive with hand-crafted GPU code.
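For context on the case study: the paper specializes GMM training via the EM algorithm. The following is not the paper's SEJITS framework, but a minimal NumPy sketch of EM for a one-dimensional GMM, illustrating the kind of high-level Python computation the authors' framework would transparently specialize to GPU code (function and variable names here are illustrative, not from the paper):

```python
import numpy as np

def em_gmm_1d(x, k=2, n_iter=100):
    """Fit a 1-D Gaussian mixture with k components via EM."""
    n = x.size
    # Initialize means at spread-out quantiles of the data,
    # with uniform weights and the overall data variance.
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] = P(component j | x_i),
        # computed in log space for numerical stability.
        d = x[:, None] - mu[None, :]
        log_p = -0.5 * (d**2 / var + np.log(2 * np.pi * var)) + np.log(w)
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances
        # from the soft assignments.
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu[None, :]) ** 2).sum(axis=0) / nk
    return w, mu, var
```

The E-step and M-step above are embarrassingly data-parallel over the n samples, which is why GMM/EM maps well onto GPUs; the paper's point is that the best parallelization strategy depends on hardware geometry and on the problem dimensions (n, k), which a JIT specializer can select at runtime.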