Title:
DPDispatcher: Scalable HPC Task Scheduling for AI-Driven Science.
Authors:
Yuan F; Department of Physics, University of Alabama at Birmingham, Birmingham, Alabama 35205, United States.
Ding Z; DP Technology, Beijing 100080, P. R. China.
Liu YP; Laboratory of AI for Electrochemistry (AI4EC), IKKEM, Xiamen 361005, P. R. China.; State Key Laboratory of Physical Chemistry of Solid Surface, Collaborative Innovation Center of Chemistry for Energy Materials, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China.
Cao K; DP Technology, Beijing 100080, P. R. China.
Fan J; School of Mathematical Science, Peking University, Beijing 100871, P. R. China.; AI for Science Institute, Beijing 100080, P. R. China.
Nguyen CT; Department of Mechanical Engineering, Ulsan National Institute of Science and Technology, Ulsan 44919, South Korea.
Zhang Y; DP Technology, Beijing 100080, P. R. China.
Wang H; School of Physics, Hefei University of Technology, Hefei 230061, P. R. China.
Chen Y; Program of Applied and Computational Math, Princeton University, Princeton, New Jersey 08540, United States.
Huang J; School of Intelligence Science and Technology, Peking University, Beijing 100871, P. R. China.
Wen T; Center for Structural Materials, Department of Mechanical Engineering, The University of Hong Kong, Hong Kong 999077, P. R. China.
Liu M; Department of Mechanical Engineering, National University of Singapore, Singapore 117575, Singapore.
Li Y; Department of Chemistry, Princeton University, Princeton, New Jersey 08544, United States.
Zhuang YB; Preferred Networks, Inc., 1-6-1 Otemachi, Chiyoda-ku 100-0004, Tokyo, Japan.
Yu H; Department of Electrical and Computer Engineering, Boston University, Boston, Massachusetts 02215, United States.
Tuo P; Bakar Institute of Digital Materials for the Planet, University of California, Berkeley, Berkeley, California 94720, United States.
Zhang Y; Nanomaterials Research Institute, Kanazawa University, Kakuma-machi, Kanazawa 920-1192, Ishikawa, Japan.
Wang Y; DP Technology, Beijing 100080, P. R. China.
Zhang L; DP Technology, Beijing 100080, P. R. China.; AI for Science Institute, Beijing 100080, P. R. China.
Wang H; National Key Laboratory of Computational Physics, Institute of Applied Physics and Computational Mathematics, Fenghao East Road 2, Beijing 100094, P. R. China.; HEDPS, CAPT, College of Engineering, Peking University, Beijing 100871, P. R. China.
Zeng J; School of Artificial Intelligence and Data Science, University of Science and Technology of China, Hefei 230026, P. R. China.; Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou 215123, P. R. China.; Suzhou Big Data & AI Research and Engineering Center, Suzhou 215123, P. R. China.
Source:
Journal of chemical information and modeling [J Chem Inf Model] 2025 Nov 24; Vol. 65 (22), pp. 12155-12160. Date of Electronic Publication: 2025 Nov 03.
Publication Type:
Journal Article
Language:
English
Journal Info:
Publisher: American Chemical Society Country of Publication: United States NLM ID: 101230060 Publication Model: Print-Electronic Cited Medium: Internet ISSN: 1549-960X (Electronic) Linking ISSN: 15499596 NLM ISO Abbreviation: J Chem Inf Model Subsets: MEDLINE
Imprint Name(s):
Original Publication: Washington, D.C. : American Chemical Society, c2005-
Entry Date(s):
Date Created: 20251103 Date Completed: 20251124 Latest Revision: 20251124
Update Code:
20251124
DOI:
10.1021/acs.jcim.5c02081
PMID:
41183016
Database:
MEDLINE

Further Information

Artificial intelligence (AI) is reshaping computational science, but AI-driven workflows routinely span heterogeneous tasks executed across diverse high-performance computing (HPC) systems. We introduce DPDispatcher, an open-source Python framework for scalable, fault-tolerant task scheduling in such environments with an emphasis on lightweight submission, automatic retries, and robust resumption. DPDispatcher separates connection and file-staging concerns from scheduler control, supports multiple HPC job managers, and provides both local and secure shell (SSH) backends. DPDispatcher has been adopted by more than ten scientific packages. Representative use cases include active learning for machine-learning potentials, free-energy and thermodynamic integration workflows, large-scale materials screening, and large language model (LLM)-driven agents that launch HPC computations. Across these settings, DPDispatcher reduces operational overhead and error rates while improving portability and automation for reliable, high-throughput scientific computing.
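
The design described in the abstract, a connection layer kept apart from scheduler control and combined with automatic retries and resumption, can be illustrated with a minimal standard-library sketch. This is not DPDispatcher's API: the names `LocalContext`, `ShellScheduler`, `run_tasks`, and the `record.json` file are hypothetical and chosen only for illustration, and a real batch scheduler would submit and poll jobs rather than run them directly.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path


class LocalContext:
    """Hypothetical connection layer: where and how commands are executed."""

    def __init__(self, work_dir):
        self.work_dir = Path(work_dir)

    def run(self, argv):
        # An SSH-based context would run remotely and stage files instead.
        return subprocess.run(argv, cwd=self.work_dir).returncode


class ShellScheduler:
    """Hypothetical scheduler layer: how a task becomes a runnable job."""

    def __init__(self, context):
        self.context = context

    def execute(self, argv):
        # A batch scheduler would wrap argv in a job script and poll its state.
        return self.context.run(argv)


def run_tasks(tasks, scheduler, record_path, max_retries=3):
    """Run each named task, retrying failures and skipping recorded successes."""
    record_path = Path(record_path)
    done = set(json.loads(record_path.read_text())) if record_path.exists() else set()
    for name, argv in tasks.items():
        if name in done:
            continue  # resumption: this task finished in an earlier run
        for _ in range(max_retries):
            if scheduler.execute(argv) == 0:
                done.add(name)
                # Persist progress after every success so a crash loses nothing.
                record_path.write_text(json.dumps(sorted(done)))
                break
        else:
            raise RuntimeError(f"task {name!r} failed {max_retries} times")
    return sorted(done)


def demo():
    """Run one reliable task and one that fails on its first attempt only."""
    with tempfile.TemporaryDirectory() as tmp:
        flaky = (
            "import pathlib, sys; p = pathlib.Path('seen'); "
            "first = not p.exists(); p.touch(); sys.exit(1 if first else 0)"
        )
        tasks = {
            "ok": [sys.executable, "-c", "pass"],
            "flaky": [sys.executable, "-c", flaky],
        }
        scheduler = ShellScheduler(LocalContext(tmp))
        record = Path(tmp) / "record.json"
        first = run_tasks(tasks, scheduler, record)
        second = run_tasks(tasks, scheduler, record)  # nothing is re-run
        return first, second
```

Because the scheduler only calls `context.run`, swapping the local context for a remote one would not change the retry or resumption logic, which is the portability argument the abstract makes.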