Automated Transformation of OpenMP to CUDA Kernels Using AI Models.
The increasing demand for computational efficiency in high-performance computing (HPC) has driven research into automating the transformation of parallel programming paradigms. This paper investigates an AI-driven approach to translating OpenMP-based CPU parallel programs into CUDA-based GPU programs. Using omniCUDA, a custom fine-tuned large language model (LLM), functional CUDA kernels are generated directly from OpenMP code, bypassing traditional compiler optimization techniques. The training dataset consists of synthetic OpenMP-to-CUDA pairs and a selected subset of manually optimized algorithms from the PolyBench suite. Performance was evaluated on kernels that overlap only partially with the training set, allowing an assessment of the model's ability to generalize to unseen algorithms. Experimental results confirm that the model produces syntactically correct and compilable CUDA code that replicates the functional behavior of parallel loop structures. A performance evaluation on four benchmark algorithms, three of which were not included in the training dataset, shows that the model consistently outperforms the OpenMP implementations and, in some cases, surpasses even the manually optimized CUDA kernels from the PolyBench suite. The presented approach demonstrates the feasibility and competitiveness of AI-assisted OpenMP-to-CUDA transformation. The model generalizes beyond its training set, and ongoing work focuses on refining memory access strategies and kernel configurations to further improve performance across diverse parallel workloads.
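To make the kind of transformation concrete, the sketch below pairs a typical OpenMP parallel loop with a generic, hand-written CUDA equivalent of the sort such a model would be expected to emit. The kernel name, launch configuration, and use of unified memory are illustrative assumptions for a minimal example; this is not output of omniCUDA or code from the paper.

```cuda
// OpenMP source: a flat data-parallel loop over n elements.
//   #pragma omp parallel for
//   for (int i = 0; i < n; ++i)
//       y[i] = a * x[i] + y[i];

#include <cuda_runtime.h>
#include <cstdio>

// CUDA equivalent: each thread handles one loop iteration (hypothetical kernel).
__global__ void saxpy_kernel(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard the tail of the range
        y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;
    const float a = 2.0f;
    float *x, *y;

    // Unified memory keeps explicit host/device transfers out of the example.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // Launch configuration mirrors the flat iteration space of the OpenMP loop.
    int block = 256;
    int grid  = (n + block - 1) / block;
    saxpy_kernel<<<grid, block>>>(n, a, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);  // expected: 2*1 + 2 = 4
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

The mapping of one loop iteration to one GPU thread, plus the bounds check, is the baseline pattern for such translations; tuning of block sizes and memory access strategies is what the abstract identifies as ongoing work.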