Title:
Scalable and Cost-Efficient ML Inference: Parallel Batch Processing with Serverless Functions
Source:
ICSOC 2024, 22nd International Conference on Service-Oriented Computing
Publication Year:
2025
Collection:
Computer Science
Document Type:
Report / Working Paper
Accession Number:
edsarx.2502.12017
Database:
arXiv

Further Information

As data-intensive applications grow, batch processing in resource-constrained environments faces scalability and resource-management challenges. Serverless computing offers a flexible alternative, enabling dynamic resource allocation and automatic scaling. This paper explores how serverless architectures can make large-scale ML inference tasks faster and more cost-effective by decomposing monolithic processes into parallel functions. Through a case study on sentiment analysis using the DistilBERT model and the IMDb dataset, we demonstrate that serverless parallel processing can reduce execution time by over 95% compared to a monolithic approach, at the same cost.
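The decomposition the abstract describes can be illustrated with a minimal sketch: split the batch into chunks and fan each chunk out to a concurrent worker standing in for one serverless function invocation. This is an assumption-laden illustration, not the paper's implementation; `infer_sentiment` is a hypothetical stand-in for a DistilBERT call, and a thread pool substitutes for real serverless invocations (e.g. cloud function calls).

```python
from concurrent.futures import ThreadPoolExecutor

def infer_sentiment(text):
    # Hypothetical stand-in for DistilBERT inference inside one function;
    # a real deployment would load the model and run the classifier here.
    return "positive" if "good" in text.lower() else "negative"

def invoke_function(chunk):
    # Stand-in for a single serverless invocation that processes one
    # chunk of the batch and returns its labels.
    return [infer_sentiment(t) for t in chunk]

def parallel_batch_inference(texts, chunk_size=2, max_workers=4):
    # Decompose the monolithic batch into chunks, dispatch them to
    # concurrent "function" invocations, and gather results in order.
    chunks = [texts[i:i + chunk_size] for i in range(0, len(texts), chunk_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_chunk = pool.map(invoke_function, chunks)
    return [label for chunk_labels in per_chunk for label in chunk_labels]

reviews = ["A good movie", "Terrible plot", "Really good acting", "Dull"]
print(parallel_batch_inference(reviews))
# → ['positive', 'negative', 'positive', 'negative']
```

The speedup claimed in the paper comes from running the chunk-level invocations concurrently rather than sequentially, while per-invocation billing keeps total compute cost roughly equal to the monolithic run.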