Production-Ready AI Inference for Healthcare with Triton, FastAPI, and Kubernetes
This document describes a production-ready AI inference architecture for healthcare and pharmaceutical applications, built on Triton Inference Server, FastAPI, and Kubernetes. The core components are a FastAPI gateway, an optional NLP/CV preprocessor, and a Triton Inference Server that can host diverse model types. Around these sit a model registry, CI/CD via GitHub Actions, Kubernetes orchestration, monitoring, and security controls, including optional PHI de-identification for regulated workloads.

The services communicate over well-defined ports: Triton exposes HTTP on 8000, gRPC on 8001, and metrics on 8002, while the preprocessor container listens on 8080 and is exposed through a Kubernetes Service on port 80.

Deployment is managed by Kubernetes manifests: `k8s.yaml` defines the deployments, `hpa.yaml` configures horizontal pod autoscaling, and `preprocessor.yaml` configures the preprocessor. Security practices are documented in `SECURITY.md`, and `architecture.png` provides a visual overview. Together, these pieces support a range of healthcare and pharma inference use cases with high availability and scalability in regulated environments.
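The port mapping described above could be expressed as Kubernetes Services along these lines (Service and selector names such as `triton` and `preprocessor` are illustrative, not taken from the referenced manifests):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: triton            # illustrative name
spec:
  selector:
    app: triton
  ports:
    - name: http
      port: 8000
      targetPort: 8000
    - name: grpc
      port: 8001
      targetPort: 8001
    - name: metrics
      port: 8002
      targetPort: 8002
---
apiVersion: v1
kind: Service
metadata:
  name: preprocessor      # illustrative name
spec:
  selector:
    app: preprocessor
  ports:
    - name: http
      port: 80            # Service port 80 ...
      targetPort: 8080    # ... routes to container port 8080
```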
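For the autoscaling piece, a minimal sketch of what `hpa.yaml` might contain, assuming the Triton deployment is named `triton` and scales on CPU utilization (the actual scaling targets and thresholds are not specified in this document):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: triton-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: triton          # illustrative deployment name
  minReplicas: 2          # keep at least two replicas for high availability
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # illustrative threshold
```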