r/softwarearchitecture 7d ago

Discussion/Advice: Warm Pool vs KubeAPI

We have a debate at our workplace:

We're in the process of a big refactor of a monolithic project into microservices, which will be deployed with k8s on EKS (and k8s on-prem). We use Traefik as our gateway (this is important for option #2).

Our use case is very specific: it requires us to route a user to a specific pod that runs a user-specific, isolated workload. Each pod serves only one user at a time. When the workload ends, the worker must be discarded (a security requirement).

We have two options:

1. Use the KubeAPI directly to spin up pods on demand, assign each one a label, and route by label with a custom proxy. This gives "native" scaling per user request and lets us delete pods when needed, with monitoring also done manually via the KubeAPI.

2. Keep a warm pool of "workers" with an HPA for elasticity, driven by a custom metric for the minimum number of available workers. Worker (workload pod) state is managed in Redis (a ZSET for heartbeats and O(1) allocation). Each worker is assigned a random unique ID on startup. Traefik (our gateway) can use Redis as an external provider and create HTTP routes dynamically based on worker state: when a worker is allocated, its heartbeat creates a KV entry in Redis, which triggers the creation of an HTTP route. This lets us route the user to a pod by its unique ID (Traefik routes to the pod IP by worker ID). Monitoring is done by querying Redis.
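A minimal sketch of the option #2 bookkeeping, in Python. It is an in-memory stand-in, not a Redis client: each method notes the Redis command it would map to, and the class, field names, the `/w/<id>` path scheme and the 15-second heartbeat TTL are all hypothetical. One detail differs slightly from the post: ZSET writes such as ZADD are O(log N), so the sketch keeps idle workers in a plain SET, where SPOP of a single member is O(1), and uses the ZSET only for heartbeats. The `traefik_route_kv` helper shows the key layout Traefik's KV providers (Redis included) read under the default `traefik` root key; check it against the documentation for your Traefik version.

```python
import secrets
import time
from typing import Optional


class WarmPool:
    """In-memory stand-in for the Redis state of option #2 (hypothetical layout).

    idle       ~ Redis SET of unallocated worker IDs (SPOP is O(1))
    heartbeats ~ Redis ZSET: worker ID -> last heartbeat timestamp
    allocated  ~ Redis HASH: worker ID -> user ID
    """

    def __init__(self, ttl_seconds: float = 15.0) -> None:
        self.ttl = ttl_seconds
        self.idle: set[str] = set()
        self.heartbeats: dict[str, float] = {}
        self.allocated: dict[str, str] = {}
        self.urls: dict[str, str] = {}

    def register(self, ip: str, port: int, now: Optional[float] = None) -> str:
        # Worker startup: random unique ID, then SADD idle + ZADD heartbeats.
        worker_id = secrets.token_hex(8)
        self.urls[worker_id] = f"http://{ip}:{port}"
        self.idle.add(worker_id)
        self.heartbeat(worker_id, now)
        return worker_id

    def heartbeat(self, worker_id: str, now: Optional[float] = None) -> None:
        # ZADD heartbeats <now> <worker_id>
        self.heartbeats[worker_id] = time.time() if now is None else now

    def allocate(self, user_id: str) -> Optional[str]:
        # SPOP idle: grab any idle worker, or None if the pool is exhausted.
        if not self.idle:
            return None
        worker_id = self.idle.pop()
        self.allocated[worker_id] = user_id
        return worker_id

    def discard(self, worker_id: str) -> None:
        # Workload finished or worker died: forget it entirely; it is never reused.
        self.idle.discard(worker_id)
        self.heartbeats.pop(worker_id, None)
        self.allocated.pop(worker_id, None)
        self.urls.pop(worker_id, None)

    def reap(self, now: Optional[float] = None) -> list[str]:
        # ZRANGEBYSCORE heartbeats -inf (now - ttl): workers that stopped heartbeating.
        now = time.time() if now is None else now
        stale = [w for w, ts in self.heartbeats.items() if ts < now - self.ttl]
        for worker_id in stale:
            self.discard(worker_id)
        return stale

    def idle_count(self) -> int:
        # SCARD idle: the "available workers" figure an HPA metric could be based on.
        return len(self.idle)


def traefik_route_kv(worker_id: str, url: str) -> dict[str, str]:
    """KV pairs for Traefik's Redis provider (default root key "traefik").

    Routes /w/<worker_id> to a single pod; the path scheme is a hypothetical choice.
    """
    name = f"worker-{worker_id}"
    return {
        f"traefik/http/routers/{name}/rule": f"PathPrefix(`/w/{worker_id}`)",
        f"traefik/http/routers/{name}/service": name,
        f"traefik/http/services/{name}/loadbalancer/servers/0/url": url,
    }
```

In a real deployment, allocation and the route write would need to be atomic (for example, one Lua script that does the SPOP, records the owner and sets the route keys), and `discard` would also delete the worker's route keys and terminate the pod so it is never handed to another user.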

Option #1 is simple and easy to implement and, mostly, to maintain (code-wise), but it couples us to k8s (it can't be deployment-agnostic) and sounds like a total abuse of the KubeAPI, especially at larger scale.
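For comparison, a sketch of what option #1 creates per user request: one Pod manifest, built here as a plain Python dict so it runs without a cluster. The label keys, the annotation key and the `session-` naming are hypothetical; `restartPolicy: Never` and `automountServiceAccountToken: false` are standard PodSpec fields, chosen to match the one-user, discard-after-use requirement.

```python
import secrets


def session_pod_manifest(user_id: str, image: str, namespace: str = "workers") -> dict:
    """Pod spec for one user session (option #1); labels and names are hypothetical."""
    session_id = secrets.token_hex(6)
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"session-{session_id}",
            "namespace": namespace,
            # The custom proxy routes by these labels.
            "labels": {"app": "session-worker", "session-id": session_id},
            # User IDs may not fit label-value rules, so keep them in an annotation.
            "annotations": {"example.com/user-id": user_id},
        },
        "spec": {
            # One user, one pod: never restart it in place once the workload exits.
            "restartPolicy": "Never",
            # The workload itself does not need to talk to the KubeAPI.
            "automountServiceAccountToken": False,
            "containers": [
                {"name": "worker", "image": image, "ports": [{"containerPort": 8080}]}
            ],
        },
    }


def selector_for(manifest: dict) -> str:
    # Label selector the custom proxy would use to find the pod (and its IP).
    sid = manifest["metadata"]["labels"]["session-id"]
    return f"app=session-worker,session-id={sid}"
```

With the official `kubernetes` Python client, such a dict can be passed as `body` to `CoreV1Api().create_namespaced_pod(namespace, body)`, found again with `list_namespaced_pod(namespace, label_selector=...)`, and removed with `delete_namespaced_pod(name, namespace)`. Whatever service makes those calls needs RBAC rights to create, list and delete pods, which is the privilege option #2 avoids.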

Option #2 is theoretically more complex, but it avoids using the KubeAPI for application-specific needs. It decouples the infrastructure from the application without requiring highly privileged RBAC policies, and lets the infrastructure scale to support the application based on custom metrics and load.
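The HPA side of option #2 can be reasoned about with the documented HPA formula, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), where no scaling happens if the ratio is within a tolerance of 1.0 (0.1 by default). Below is a sketch assuming the "min available workers" metric is exposed as a busy ratio (allocated workers / total workers) through some custom-metrics adapter; a target of 0.5 then aims to keep roughly half the pool idle. It ignores stabilization windows, scaling policies and pod readiness, which the real controller also applies.

```python
import math


def hpa_desired_replicas(current_replicas: int, busy_workers: int,
                         target_busy_ratio: float, tolerance: float = 0.1,
                         min_replicas: int = 1, max_replicas: int = 100) -> int:
    """Replica count the HPA formula yields for a busy-ratio metric (assumed metric)."""
    if current_replicas <= 0:
        # Nothing to measure; fall back to the floor (simplification).
        return min_replicas
    current_ratio = busy_workers / current_replicas   # currentMetricValue
    usage_ratio = current_ratio / target_busy_ratio    # relative to desiredMetricValue
    if abs(1.0 - usage_ratio) <= tolerance:
        # Close enough to the target: the HPA skips scaling.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * usage_ratio)
    # Clamp to the HPA's minReplicas / maxReplicas.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 10 workers with 8 busy against a 0.5 target gives ceil(10 * 1.6) = 16 replicas, while 5 busy out of 10 is exactly on target and leaves the count at 10.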

The question: is option #2 really over-engineering, and is using the KubeAPI not as bad as it sounds? (Controllers and Operators exist for a reason, but I am not sure they are meant to be used like that.)

u/musty_mage 7d ago

Then replace or refactor the internal lib?

u/doublecore20 7d ago

I wish it were that simple.

u/musty_mage 7d ago

How is it harder than refactoring the whole thing to microservices?

Now don't get me wrong, most in-house scalability / HA implementations are utter shit written by people who clearly thought waaayyy too highly of themselves. So switching to the one platform that actually works is probably a good idea. But if you can't solve the scalability issue in a monolith, what makes you think you have the skills to solve it in a distributed system, which is way harder?

K8s is getting VPA real soon now. A well-constructed monolith will always be faster on the same resources than a bunch of microservices.

u/doublecore20 7d ago edited 7d ago

This is exactly the case. A lib written almost a decade ago tried to do k8s before it was cool (I guess?). It does orchestration, internal service-to-service calls, and remote service calls, all in one process. You are basically at the mercy of your CPU and RAM, and you can't scale vertically forever. Also, it is very coupled to the host, so you cannot untangle this mess even if you wanted to.

The solution is to break it down, ditch this cluster-fuck lib and do it properly, with each service as a single unit. Only one feature, which is mission-critical, is currently under debate.

Regarding the skill question: well, with over a decade of experience I tend to believe I know what I am doing. Also, my team consists of very intelligent people who take this very seriously.

u/musty_mage 7d ago edited 7d ago

Yeah, you need to get rid of that library. You could, of course, ease the scaling problem by running on NUMA nodes, but that's just a stopgap solution. And fundamentally, having that kind of functionality inside the JVM (somehow I'm assuming this is Java :) is just the wrong layer for it.

The good thing is that, because that library does the internal RPC, your monolith has already been decoupled, at least to some extent.

As for your original question, I would maybe learn something from your current situation and not try to over-engineer in-house when there are well-established best practices on how to do HPA with Traefik.