Projects

Production ML infrastructure and security implementations. Chronological list of deployed systems and learning projects.

2025

Inference Gateway

October 2025 | Active

FastAPI-based authentication and rate-limiting layer for a self-hosted Ollama LLM server. Solves the "anyone can DoS your GPU" problem.

Tech Stack: Python, FastAPI, Redis, Ollama

Why it matters: Self-hosted LLMs need production-grade access control.

Status: Production, serving requests daily.

Links: GitHub | Technical Writeup

Key Features:

  • API key authentication
  • Per-user rate limiting
  • Request/response logging
  • Health monitoring endpoints
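The admission check behind the authentication and rate-limiting features can be sketched as a per-key fixed-window counter. This is an illustrative sketch, not code from the repo: the class and parameter names are assumptions, and the production service would keep the counter in Redis (INCR plus EXPIRE) rather than a local dict so that all gateway workers share state.

```python
import time


class FixedWindowLimiter:
    """Per-API-key fixed-window request limiter (illustrative sketch).

    In the real gateway this counter would live in Redis (INCR + EXPIRE)
    so every worker process sees the same counts; a local dict is used
    here only to show the logic.
    """

    def __init__(self, limit, window_seconds):
        self.limit = limit          # max requests per window
        self.window = window_seconds
        self._counts = {}           # api_key -> (window_start, count)

    def allow(self, api_key, now=None):
        """Return True and record the request if the key is under its limit."""
        now = time.time() if now is None else now
        start, count = self._counts.get(api_key, (now, 0))
        if now - start >= self.window:   # window expired: start a fresh one
            start, count = now, 0
        if count >= self.limit:
            return False                 # over budget: reject (HTTP 429)
        self._counts[api_key] = (start, count + 1)
        return True
```

In the FastAPI layer this check would sit in a dependency that runs before the request is proxied to Ollama, returning 429 when `allow` is False.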

What I learned: Protecting GPU resources requires different thinking than traditional API rate limiting. LLM costs scale with tokens generated, not with request count, so a per-request limit lets one large completion do the damage of hundreds of small ones; limits have to be expressed in tokens.
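A minimal sketch of the token-based alternative, assuming a per-key token budget that refills continuously (the class name and parameters are hypothetical, not taken from the gateway):

```python
import time


class TokenBudgetLimiter:
    """Per-API-key token-bucket budget (illustrative sketch).

    Each call is charged by the number of LLM tokens it consumed, so
    one huge completion can exhaust a budget that a hundred tiny
    requests would barely dent. As with the request counter, production
    state would live in Redis rather than a local dict.
    """

    def __init__(self, tokens_per_minute):
        self.rate = tokens_per_minute / 60.0  # refill rate per second
        self.capacity = tokens_per_minute     # maximum stored budget
        self._buckets = {}                    # api_key -> (last_ts, available)

    def charge(self, api_key, tokens_used, now=None):
        """Deduct tokens_used from the key's budget; False if it can't afford it."""
        now = time.time() if now is None else now
        last, avail = self._buckets.get(api_key, (now, self.capacity))
        avail = min(self.capacity, avail + (now - last) * self.rate)  # refill
        if tokens_used > avail:
            self._buckets[api_key] = (now, avail)
            return False
        self._buckets[api_key] = (now, avail - tokens_used)
        return True
```

One wrinkle with token-based limits: the true token count is only known after the model responds, so the gateway either charges retroactively or estimates from the prompt length up front.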

MLSecOps Production Lab

October 2025 | Active

3-node Proxmox cluster with segregated VLANs, ZFS storage, and a GPU workstation. Infrastructure as code for reproducible ML environments.

Tech Stack: Proxmox, ZFS, Ansible, VLANs 40/50/99x

Why it matters: Production ML needs production infrastructure.

Status: Operational, continuously evolving.

Links: Architecture Overview | Network Topology

Key Components:

  • 3-node HA cluster with ZFS storage
  • Segregated network zones for dev/prod/management
  • GPU passthrough for ML workloads
  • Automated backup and snapshot strategy
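As one illustration of the snapshot side of that strategy, a keep-recent-dailies-plus-weeklies retention rule can be sketched in a few lines. The function name and keep counts are assumptions rather than the lab's actual policy, and the deletions themselves would be issued through the Proxmox API or `qm delsnapshot`.

```python
def snapshots_to_prune(snapshot_times, keep_daily=7, keep_weekly=4):
    """Return the snapshot timestamps to delete (illustrative sketch).

    Keeps the newest `keep_daily` snapshots outright, plus the newest
    snapshot in each of the next `keep_weekly` distinct ISO weeks;
    everything else is marked for pruning.
    """
    ordered = sorted(snapshot_times, reverse=True)   # newest first
    keep = set(ordered[:keep_daily])                 # recent dailies
    weeks_seen = set()
    for ts in ordered[keep_daily:]:
        week = ts.isocalendar()[:2]                  # (ISO year, ISO week)
        if week not in weeks_seen and len(weeks_seen) < keep_weekly:
            weeks_seen.add(week)
            keep.add(ts)                             # newest in that week
    return [ts for ts in ordered if ts not in keep]
```

Running this nightly (e.g. from an Ansible-scheduled cron job) keeps snapshot storage bounded while preserving both short-term and longer-term restore points.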

What I learned: Enterprise-grade ML infrastructure is achievable at homelab cost with the right architecture decisions.