r/softwarearchitecture • u/doublecore20 • 7d ago
Discussion/Advice Warm Pool vs KubeAPI
We have a debate at our workplace:
We're in the process of a big refactor of a monolithic project into microservices, which will be deployed with k8s on EKS (and k8s on prem). We use Traefik as our gateway (important for option #2).
Our use case is very specific and requires us to route a user to a specific pod that runs a user-specific, isolated workload. The pod serves only 1 user at a time. When the workload ends, the worker must be discarded (security requirement).
We have two options:
1. Use KubeAPI directly and spin up pods on demand, assigning a label and routing by label with a custom proxy. This allows "native" scaling per user request, with pods deleted when needed and manual monitoring also done via KubeAPI.
2. Have a warm pool of "workers" with HPA for elasticity, using a custom metric for min available workers. The state of the workers (workload pods) is managed in Redis (ZSET for heartbeat and O(1) allocation). Each worker has a random unique ID assigned on startup. Traefik (our gateway) can use Redis as an external provider and create HTTP routes dynamically based on worker state (worker allocated = heartbeat creates a KV entry in Redis, and this triggers an HTTP route creation). This allows us to route the user to a pod by the unique ID (Traefik routes to the pod IP by worker ID). Monitoring is done by querying Redis.
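A minimal sketch of the allocation logic in option 2, using an in-memory dict as a stand-in for the Redis ZSET (member = worker ID, score = last heartbeat). The class and method names are hypothetical, not from our codebase. In Redis the rough equivalents would be ZADD for the heartbeat, ZREMRANGEBYSCORE for pruning stale workers, and ZPOPMAX for an atomic pop; note that sorted-set pops are documented as O(log N) rather than strictly O(1).

```python
import time
import uuid


class WarmPool:
    """In-memory model of the idle-worker ZSET: worker ID -> last heartbeat."""

    def __init__(self, heartbeat_ttl=10.0, clock=time.monotonic):
        self.heartbeat_ttl = heartbeat_ttl  # seconds before a silent worker is dropped
        self.clock = clock                  # injectable so the logic can be tested
        self.idle = {}                      # worker_id -> last heartbeat timestamp
        self.allocated = {}                 # worker_id -> user_id

    def register(self):
        """Called by a worker on startup: pick a random unique ID and join the pool."""
        worker_id = uuid.uuid4().hex
        self.idle[worker_id] = self.clock()
        return worker_id

    def heartbeat(self, worker_id):
        """Refresh the score of an idle worker (ZADD in Redis)."""
        if worker_id in self.idle:
            self.idle[worker_id] = self.clock()

    def prune_stale(self):
        """Drop idle workers whose heartbeat is older than the TTL."""
        cutoff = self.clock() - self.heartbeat_ttl
        stale = [w for w, ts in self.idle.items() if ts < cutoff]
        for worker_id in stale:
            del self.idle[worker_id]
        return stale

    def allocate(self, user_id):
        """Hand the idle worker with the freshest heartbeat to a user, or None if empty."""
        self.prune_stale()
        if not self.idle:
            return None
        worker_id = max(self.idle, key=self.idle.get)
        del self.idle[worker_id]
        self.allocated[worker_id] = user_id
        return worker_id

    def release(self, worker_id):
        """End of workload: the worker is forgotten, never returned to the idle set."""
        return self.allocated.pop(worker_id, None)
```

The allocated map is what the gateway would read to build the per-worker route; a released worker is removed from it and is expected to be destroyed rather than reused.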
Option #1 is simple and easy to implement and, for the most part, to maintain (code-wise), but it couples us to k8s (we cannot be deployment-agnostic) and sounds like a total abuse of KubeAPI, especially at larger scale.
Option #2 is more complex in theory, but it avoids using KubeAPI for application-specific needs. It decouples infrastructure from application without requiring highly privileged RBAC policies, and lets the infrastructure support the application based on custom metrics and load.
The question: is option #2 really over-engineering, and is using KubeAPI not as bad as it sounds? (Controllers and Operators exist for a reason, but I am not sure they are used like that.)
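For comparison, here is a rough sketch of what option #1 would hand to KubeAPI per user request: a single-use Pod manifest carrying the label the custom proxy routes on. Only the manifest is built here; submitting it to the API server is left out. The function names, the `worker-id` label key, the `workers` namespace, and the image name are all assumptions for illustration.

```python
import uuid


def build_worker_pod(image, namespace="workers"):
    """Build a one-shot worker Pod manifest as a plain dict."""
    worker_id = uuid.uuid4().hex[:12]  # short random ID, valid as a label value
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"worker-{worker_id}",
            "namespace": namespace,
            "labels": {"app": "worker", "worker-id": worker_id},
        },
        "spec": {
            # Never restart in place: when the workload ends, the pod is done
            # and gets deleted instead of being recycled for another user.
            "restartPolicy": "Never",
            "containers": [{"name": "worker", "image": image}],
        },
    }


def label_selector(pod):
    """Selector string the proxy (or a monitor) would use to find this pod."""
    return f"worker-id={pod['metadata']['labels']['worker-id']}"
```

Creating, watching, and deleting pods like this is what requires the elevated RBAC permissions on the application side.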
u/Outrageous_Leek_6765 7d ago
Honestly the KubeAPI-abuse thing isn't what I'd worry about, and your instinct to keep it out of the connection path is right, just for a more practical reason than decoupling. If you hit KubeAPI synchronously on every user request, the API server's availability becomes your request path's availability, and at any real scale you'll hit client-side throttling and watch-cache pressure long before it's "abuse" in principle. So option 2 keeping the API server out of the hot path is sound regardless of the purity argument.
The thing I'd actually push on is your security requirement, since that's what's justifying this whole design. You need the worker destroyed after one use, but a warm pool is in some tension with that, because a warm worker existed before the user touched it, and in your exit-0 model the pod restarts in place and reuses the same pod object, node, and local scratch unless you're very deliberate about it. If the model needs a guaranteed-clean environment per user, restart-in-place is weaker than a fresh pod, so you'd want to force a reschedule and wipe any local volumes between uses rather than just exit-0 and recycle.
On scaling, I know you dismissed KEDA but it's worth another look, specifically because it replaces the custom Prometheus-adapter-querying-Redis path you're hand-rolling. It's got a native Redis scaler and does scale-to-zero properly, which plain HPA still doesn't outside alpha. It's not really another vendor so much as a CNCF project that's become the default for exactly the Redis-driven custom-metric scaling you described, and it'd let you delete the adapter glue instead of maintaining it. Your Traefik-Redis routing can stay completely separate from how you drive replica count.
So I think option 2 is the right call for your constraints. I'd just split the two decisions inside it, lean on KEDA for the elasticity, and tighten the per-use teardown, because exit-0-restart might not actually give you the isolation that's the whole point.
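To make the "split the two decisions" point concrete, here's a rough sketch of the replica-count side on its own, with no routing involved. It assumes the scaler can read the number of busy workers from Redis; the function and parameter names are made up for illustration. One thing to watch: the HPA scales up when a metric rises above its target, so a raw "idle workers" count moves in the wrong direction if you feed it in directly, and you'd likely want a metric that grows with load instead.

```python
def desired_replicas(busy, min_idle, min_replicas, max_replicas):
    """Target pool size: every busy worker plus a fixed buffer of idle ones.

    busy         -- workers currently allocated to a user
    min_idle     -- warm workers to keep available at all times
    min_replicas -- floor for the pool (hypothetical config value)
    max_replicas -- ceiling for the pool (hypothetical config value)
    """
    target = busy + min_idle
    # Clamp into the configured range so a burst cannot scale without bound.
    return max(min_replicas, min(max_replicas, target))
```

For example, with 7 busy workers and a buffer of 3, the target is 10 replicas; with nothing busy it falls back to the floor.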