Title:
Optimizing Spark job scheduling with distributional deep learning in cloud environments.
Source:
Journal of Cloud Computing (2192-113X); 10/27/2025, Vol. 14 Issue 1, p1-19, 19p
Database:
Complementary Index

Abstract:

Apache Spark has emerged as a leading in-memory big data processing framework, with cloud deployments offering scalability and cost efficiency. However, effective job scheduling in cloud-based Spark environments remains challenging due to resource heterogeneity, dynamic workloads, and strict deadline requirements. Existing schedulers often optimize isolated objectives, failing to address the complex trade-offs inherent in real-world deployments. To bridge this gap, we propose a distributional deep reinforcement learning (DRL) framework that jointly optimizes five key objectives: minimizing virtual machine (VM) cost, enhancing energy efficiency, ensuring deadline adherence, maximizing job throughput, and optimizing resource utilization. We implemented two DRL agents, Rainbow DQN and C51, within a Python-based simulation environment using TensorFlow, explicitly modeling Spark's distributed execution patterns. Our experimental results demonstrate that Rainbow DQN achieves superior convergence stability and scheduling efficiency, reducing early-stage VM costs by 66%, improving CPU utilization by 12.5%, and enhancing deadline compliance by 4.2% compared to C51. While C51 exhibits faster initial convergence, Rainbow DQN delivers more robust long-term performance. These findings highlight the trade-offs between learning speed and optimization quality in DRL-based schedulers, providing actionable insights for cloud-based Spark deployments. [ABSTRACT FROM AUTHOR]
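The record contains no code, but the abstract names concrete building blocks: a C51/Rainbow-style distributional value network implemented in TensorFlow and a scheduler that jointly optimizes five objectives. The Python sketch below illustrates, under stated assumptions, what such components could look like: a categorical (C51) value head and a weighted scalarization of the five objectives. The network sizes, action count, state dimension, atom support, and reward weights are illustrative assumptions, not values reported by the authors.

import numpy as np
import tensorflow as tf

NUM_ATOMS = 51            # C51 conventionally uses 51 return atoms
V_MIN, V_MAX = -10.0, 10.0  # assumed support of the return distribution
NUM_ACTIONS = 8           # hypothetical scheduling choices (e.g. VM type/slot)

def build_c51_head(state_dim: int) -> tf.keras.Model:
    """Categorical (C51) head: per-action probability distribution over atoms."""
    inputs = tf.keras.Input(shape=(state_dim,))
    x = tf.keras.layers.Dense(128, activation="relu")(inputs)
    x = tf.keras.layers.Dense(128, activation="relu")(x)
    logits = tf.keras.layers.Dense(NUM_ACTIONS * NUM_ATOMS)(x)
    logits = tf.keras.layers.Reshape((NUM_ACTIONS, NUM_ATOMS))(logits)
    probs = tf.keras.layers.Softmax(axis=-1)(logits)
    return tf.keras.Model(inputs, probs)

def multi_objective_reward(vm_cost, energy, deadline_met, throughput, cpu_util,
                           weights=(0.3, 0.2, 0.2, 0.15, 0.15)):
    """Hypothetical scalarization of the five objectives named in the abstract.
    Cost and energy enter negatively; the weights are illustrative only."""
    w_cost, w_energy, w_deadline, w_tput, w_util = weights
    return (-w_cost * vm_cost
            - w_energy * energy
            + w_deadline * float(deadline_met)
            + w_tput * throughput
            + w_util * cpu_util)

# Greedy action selection: take the expected value of each action's distribution.
atoms = np.linspace(V_MIN, V_MAX, NUM_ATOMS, dtype=np.float32)
model = build_c51_head(state_dim=16)                     # 16-dim state is assumed
state = np.random.rand(1, 16).astype(np.float32)         # placeholder cluster state
q_values = tf.reduce_sum(model(state) * atoms, axis=-1)  # shape (1, NUM_ACTIONS)
best_action = int(tf.argmax(q_values, axis=-1)[0])

In a full Rainbow DQN agent this head would be combined with the usual extensions (double Q-learning, prioritized replay, dueling and noisy layers, n-step targets); the sketch only shows the distributional output and reward shaping that distinguish the approach described in the abstract.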
