Projects
Production ML infrastructure and security implementations. Chronological list of deployed systems and learning projects.
2025
Inference Gateway
FastAPI-based authentication and rate-limiting layer for a self-hosted Ollama LLM server. Solves the "everyone can DoS your GPU" problem.
Tech Stack: Python, FastAPI, Redis, Ollama
Why it matters: Self-hosted LLMs need production-grade access control.
Status: Production, serving requests daily.
Links: GitHub | Technical Writeup
Key Features:
- API key authentication
- Per-user rate limiting (sketched after this list)
- Request/response logging
- Health monitoring endpoints
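A minimal sketch of how the auth and rate-limit gate fit together as FastAPI dependencies. The key storage scheme, quotas, and endpoint path are illustrative assumptions, not the deployed code.

```python
# Minimal sketch: API-key auth + per-user rate limiting as FastAPI dependencies.
# Key names, limits, and the endpoint path are illustrative assumptions.
import time

import redis
from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

RATE_LIMIT = 30       # requests per window (assumed quota)
WINDOW_SECONDS = 60   # fixed-window length

def authenticate(x_api_key: str = Header(...)) -> str:
    # Assumed layout: API keys stored in a Redis hash, api_keys -> {key: user_id}.
    user = r.hget("api_keys", x_api_key)
    if user is None:
        raise HTTPException(status_code=401, detail="invalid API key")
    return user

def rate_limit(user: str = Depends(authenticate)) -> str:
    # Fixed-window counter keyed by user and current window number.
    window = int(time.time() // WINDOW_SECONDS)
    key = f"rl:{user}:{window}"
    count = r.incr(key)
    r.expire(key, WINDOW_SECONDS * 2)  # let stale windows age out
    if count > RATE_LIMIT:
        raise HTTPException(status_code=429, detail="rate limit exceeded")
    return user

@app.post("/v1/generate")
def generate(user: str = Depends(rate_limit)):
    # Proxying to the Ollama backend is omitted; this sketch only shows
    # the auth and rate-limit gate that sits in front of it.
    return {"user": user, "status": "accepted"}
```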
What I learned: Protecting GPU resources requires different thinking than traditional API rate limiting: cost scales with tokens generated, not requests served (see the token-budget sketch below).
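A token-based variant of the same idea, as a sketch: debit a per-user token budget by actual usage instead of counting requests. The budget size and Redis key scheme are hypothetical; Ollama does report prompt_eval_count and eval_count in its generate responses, which is where the usage number would come from.

```python
# Minimal sketch of token-based accounting: debit each user's budget by the
# tokens the model actually consumed, rather than incrementing a request count.
# Budget size and key scheme are illustrative assumptions.
import redis

r = redis.Redis(decode_responses=True)

TOKEN_BUDGET = 50_000  # tokens per user per day (assumed quota)

def charge_tokens(user: str, day: str, tokens_used: int) -> bool:
    """Debit the user's daily token budget; return False once exhausted.

    tokens_used would come from the backend's usage stats, e.g. the
    prompt_eval_count + eval_count fields in an Ollama generate response.
    """
    key = f"tokens:{user}:{day}"
    spent = r.incrby(key, tokens_used)
    r.expire(key, 2 * 86_400)  # keep yesterday's counter briefly, then drop
    return spent <= TOKEN_BUDGET
```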
MLSecOps Production Lab
3-node Proxmox cluster with segregated VLANs, ZFS storage, and a GPU workstation. Infrastructure-as-code for reproducible ML environments.
Tech Stack: Proxmox, ZFS, Ansible, VLANs 40/50/99x
Why it matters: Production ML needs production infrastructure.
Status: Operational, continuously evolving.
Links: Architecture Overview | Network Topology
Key Components:
- 3-node HA cluster with ZFS storage
- Segregated network zones for dev/prod/management
- GPU passthrough for ML workloads
- Automated backup and snapshot strategy (sketched below)
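A minimal sketch of the snapshot-rotation idea: take a timestamped ZFS snapshot, then prune the oldest beyond a retention depth. The dataset name and retention count are hypothetical; the deployed strategy may differ.

```python
# Minimal sketch of ZFS snapshot rotation via the zfs CLI.
# DATASET and RETAIN are illustrative assumptions.
import subprocess
from datetime import datetime, timezone

DATASET = "tank/vmdata"   # hypothetical pool/dataset
RETAIN = 14               # snapshots to keep

def snapshot_and_prune() -> None:
    # Create a timestamped snapshot, e.g. tank/vmdata@auto-20250101-000000.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    subprocess.run(["zfs", "snapshot", f"{DATASET}@auto-{stamp}"], check=True)

    # List this dataset's snapshots oldest-first, then destroy the excess.
    out = subprocess.run(
        ["zfs", "list", "-t", "snapshot", "-H", "-o", "name",
         "-s", "creation", "-d", "1", DATASET],
        check=True, capture_output=True, text=True,
    ).stdout.split()
    auto = [s for s in out if "@auto-" in s]
    for name in auto[:-RETAIN]:
        subprocess.run(["zfs", "destroy", name], check=True)

if __name__ == "__main__":
    snapshot_and_prune()
```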
What I learned: Enterprise-grade ML infrastructure is achievable at homelab cost with the right architecture decisions.