r/devops 8d ago

Discussion Rego: yes or no? Are you a Rego hater?

2 Upvotes

I have a small CLI tool for linting OTel Collector configuration, written in Go and Rego (Rego handles the validation rules).

Lately I've been noticing some real Rego haters out there. Given how popular Kyverno has become, I'm starting to think OPA — and Rego along with it — might gradually fade out.

Are these concerns reasonable, or am I overthinking it? Should I refactor the tool and rip out Rego?


r/devops 9d ago

Discussion If you were just starting DevOps, how would you start differently than you did before?

33 Upvotes

I'm just getting into DevOps. What should I start with, and is getting a job guaranteed? What makes the difference between a good and a bad DevOps engineer? What should be avoided, and what should be done to land a job? I see people getting job-ready within six months. I'm sorry if I'm asking too many questions. I'm in my late 20s and confused about career paths, with people saying AI is everything. I know it is, but DevOps still seems like a good place to start before diving into AI. What would you suggest?


r/devops 9d ago

Discussion Trying to create a more collaborative environment, but everything feels urgent and important now

29 Upvotes

Improving collaboration between dev, infra and product was at the top of our list for this quarter. But somehow it is turning our Slack threads into incidents. One minute, a PM drops a quick question about a release timeline. The next minute, someone flags a deployment risk, infra asks for Terraform context and suddenly everyone is in a thread with no clear owner.

Real incidents now compete with normal delivery work for attention. How are you separating actual urgency from cross-functional collaboration without slowing everyone down?


r/devops 9d ago

Discussion How to organize dynamic domains for a project?

5 Upvotes

Hello everyone. Perhaps I can find help here with writing this system. I'd be very grateful for your help. Context:

I'm building a website where users can connect their own domain, so that the site (one of my frontend deployments) is reachable on that domain. The question arises when users connect their domain. Initially, it seemed very simple:

CNAME | @ | proxy.mydomain.com

But the problem is that a CNAME record can't be used at the apex of a domain. So, if I want a user to connect a root domain rather than a subdomain, I need to give them my server's actual IP address so they can create an A record. I don't want to expose the IP for two reasons: security, and the fact that I want to run the domain connection flow via the Domain Connect protocol, which uses templates that undergo verification. If the IP address changes in the future, I'll need to change the template. One option is to migrate the deployment to something like Vercel (so I could provide their IP), which costs money; another is Cloudflare for SaaS (which allows a CNAME on apex domains), which also costs money. I'd like to hear people's opinions; maybe I'm missing something.
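A minimal sketch of the record choice described above, assuming the user enters the host label the same way as in the DNS row (`@` for the apex). `proxy.mydomain.com` comes from the post; the function name, the `provider_supports_alias` flag and the placeholder IP are hypothetical. ALIAS/ANAME-style records are only an option at DNS providers that implement them, so treat that branch as an assumption rather than something every user can do.

```python
from dataclasses import dataclass

PROXY_TARGET = "proxy.mydomain.com"  # CNAME target from the post
FALLBACK_IP = "203.0.113.10"         # placeholder IP from the documentation range


@dataclass(frozen=True)
class DnsInstruction:
    record_type: str
    host: str
    value: str


def instruction_for(host: str, provider_supports_alias: bool = False) -> DnsInstruction:
    """Return the record a user should create for the given host label."""
    host = host.strip().lower() or "@"
    if host != "@":
        # Subdomains can point at the proxy with a plain CNAME.
        return DnsInstruction("CNAME", host, PROXY_TARGET)
    if provider_supports_alias:
        # Assumption: the user's DNS provider offers ALIAS/ANAME at the apex.
        return DnsInstruction("ALIAS", "@", PROXY_TARGET)
    # Otherwise the apex needs an A record with a fixed IP.
    return DnsInstruction("A", "@", FALLBACK_IP)
```

Only the last branch ties the template to an IP address, which is the case the post is trying to avoid.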


r/devops 9d ago

Security Cloud HSM Migration Basics

5 Upvotes

We’re a 6-person healthtech SaaS, mostly devs, no real security hire yet. We’ve used cloud secrets and basic KMS so far, but now hospital networks are all asking about Cloud HSM migration and cryptographic key lifecycle management: key gen, custody, rotation, RBAC, audit trails, break-glass, etc. Every. Single. Time.

So I want to know: when is managed HSM enough, and when do we call real specialists? Feels fine in MVP, then suddenly auditors rip it apart. Anyone been thru this mess?


r/devops 8d ago

Vendor / market research AI "Solve Rates" are a joke. We need a Safe-to-Merge metric.

0 Upvotes

AI coding tools love bragging about high "Solve Rates." But fixing a bug while silently breaking three other things isn't a success—it's a production incident.

Current benchmarks only check if the one targeted test passed. They completely ignore second-order regressions.

We're prototyping an open standard called Safe-to-Merge Rate (STMR). An agent's PR only qualifies if:

  1. The targeted bug fix passes.
  2. 100% of the existing test suite still passes (zero regressions).
  3. Linters and type-checkers throw zero new errors.
  4. The full CI/CD pipeline builds successfully end-to-end.
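The four criteria above could be scored roughly like this. This is only a sketch of one possible reading of STMR, not the proposed standard itself; the `PrResult` fields and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PrResult:
    target_fix_passes: bool    # 1. the targeted bug fix passes
    suite_failures: int        # 2. failing tests in the existing suite
    new_lint_type_errors: int  # 3. new linter / type-checker errors
    pipeline_green: bool       # 4. full CI/CD pipeline builds end-to-end


def is_safe_to_merge(r: PrResult) -> bool:
    """A PR qualifies only if all four criteria hold."""
    return (
        r.target_fix_passes
        and r.suite_failures == 0
        and r.new_lint_type_errors == 0
        and r.pipeline_green
    )


def solve_rate(results: Sequence[PrResult]) -> float:
    """Share of PRs where the targeted fix passes, ignoring regressions."""
    if not results:
        return 0.0
    return sum(r.target_fix_passes for r in results) / len(results)


def safe_to_merge_rate(results: Sequence[PrResult]) -> float:
    """Share of PRs that satisfy every criterion."""
    if not results:
        return 0.0
    return sum(is_safe_to_merge(r) for r in results) / len(results)
```

With four made-up PRs where three fix the target but one of those breaks three tests and another adds a lint error, `solve_rate` is 0.75 while `safe_to_merge_rate` is 0.25.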

Brutal feedback wanted: Is this a metric the industry actually needs, or is it just SWE-bench with extra steps? How will agents try to game it?


r/devops 9d ago

Career / learning Wait time for firewall inclusions is slowing me down. What am I doing wrong?

36 Upvotes

I'm in the process of laying down an infrastructure & CI/CD pipeline in our company (all of our deployments were manual until I got fed up with manual work and pitched CI/CD) for the rollout of a new version of a legacy app.

On multiple occasions I've been deep in a flow state, then I see "Connection refused" and realize I have to open a ticket and then physically visit 2-3 offices to get it approved within the hour (because otherwise I may have to wait a day or two).

I could be asking for all the ports at once. But later down the road I always go "Oh yeah, the VM also needs to access GitLab, not just my PC" or "Oh yeah, port 5050 needs to be opened as well for the container registry on GitLab". Maybe there's a certain methodology I'm missing; I'd like to hear people's thoughts.
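One way to batch those requests is a small preflight script, run from every machine that takes part in the pipeline (PC, VM, runner), that tries each endpoint you expect to need and lists the ones that fail. A minimal sketch, assuming plain TCP checks are enough; the hostname in the usage comment is hypothetical, and a failure can also mean DNS or a stopped service rather than the firewall.

```python
import socket
from typing import Iterable, List, Tuple

Endpoint = Tuple[str, int]


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution errors.
        return False


def blocked_endpoints(endpoints: Iterable[Endpoint], timeout: float = 2.0) -> List[Endpoint]:
    """Return the endpoints that could not be reached from this machine."""
    return [(h, p) for h, p in endpoints if not is_reachable(h, p, timeout)]


# Example usage (hypothetical host; ports taken from the post plus HTTPS/SSH):
# needed = [("gitlab.example.internal", 443),
#           ("gitlab.example.internal", 22),
#           ("gitlab.example.internal", 5050)]
# print(blocked_endpoints(needed))
```

The output from each machine can go into a single ticket, instead of one ticket per "Connection refused".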

P.S.: I'm a junior DevOps engineer (i.e. literally hired as a full-stack developer and ended up doing DevOps), so everything I'm doing at the moment (CI/CD, quadlets, Ansible, automated E2E, etc.) is either my first time ever working with the tool or my first time working with it in a production setting.


r/devops 8d ago

Vendor / market research How are you actually correlating a failed synthetic check to the trace and infra behind it?

0 Upvotes

Affiliation disclaimer first: I build a synthetic monitoring tool, so I have a horse in this race. Not linking it here, this is genuinely a "how do you all handle this" question because I keep going back and forth on whether the thing that bugs me bugs anyone else.

Bit of background on me: I've been a front end web perf nerd for years, the old O'Reilly Velocity / now Performance.now() crowd, and I've now worked on synthetic monitoring/RUM three times (NCC Group/Eggplant/Keysight, then Elastic, now my own thing). The actual monitoring has hardly changed in all that time. Check goes red, you get paged. That bit's solved.

What I can't get comfortable with is the tradeoff after the red today. If you're all in on Datadog or Dynatrace you actually get the halfway decent version of this. Failed check, click into the trace, click into the infra, all one pane. That genuinely works (for a price), fair play to them. But you only get it because you've bought the whole suite and your synthetic data lives inside their walls.

Go OTel-native instead, pull your traces and metrics onto your own stack like a lot of teams have (not everyone, Datadog's clearly still doing fine), and you seem to lose that. Your synthetic results end up stuck off in whatever standalone tool made them, away from the traces and infra that explain the failure. So checkout breaks and it's a red dot in one tool, then tab over to your traces squinting at which one matches by timestamp, then go poke at the infra separately. Three tools, doing the correlation somewhere (Slack, causal RCA, DIY dashboard, google doc etc). I don't really see why you should have to give up one to get the other.

Same thing that makes the agentic RCA stuff underwhelm imo. Hand it a green dot and a latency number and that's a data point, not context. It wants the enriched, already-joined-up version to be any use, and that's what the standalone synthetic tools mostly don't emit.

So, genuinely asking the people who run this stuff:

  • If you're on an OTel stack rather than an all-in-one suite, how are you correlating a failed check back to the trace today? Manual timestamp matching, traceparent propagation, or honestly just not?
  • Anyone cracked the full failure -> trace -> infra walk WITHOUT being all-in on Datadog/Dynatrace? Curious what the setup looks like.
  • Or is this a non-problem, you're happy in the big suites, and I've talked myself into something nobody else feels?
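For the traceparent propagation option mentioned above, a minimal sketch of what the synthetic check side could do: generate a W3C Trace Context `traceparent` value per run, send it as a request header, and store the trace ID alongside the check result so a failure can be looked up by ID instead of by timestamp. The function names and the result record shape are hypothetical, and this assumes the backend is instrumented to continue an incoming W3C trace context.

```python
import re
import secrets

# version "00", 32-hex trace-id, 16-hex parent-id, 2-hex trace-flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Build a `traceparent` header value for one synthetic check run."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def trace_id_of(traceparent: str) -> str:
    """Extract the trace ID, rejecting malformed header values."""
    match = TRACEPARENT_RE.match(traceparent)
    if match is None:
        raise ValueError(f"not a version-00 traceparent: {traceparent!r}")
    return match.group(1)


def check_result(name: str, passed: bool, traceparent: str) -> dict:
    """Record to store with the check so a red result links to its trace."""
    return {"check": name, "passed": passed, "trace_id": trace_id_of(traceparent)}
```

The check would send `{"traceparent": new_traceparent()}` with its request; the stored `trace_id` is then the join key into the tracing backend.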

No wrong answers, I'm trying to sanity-check my own assumptions here.


r/devops 9d ago

Weekly Self Promotion Thread

25 Upvotes

Hey r/devops, welcome to our weekly self-promotion thread!

Feel free to use this thread to promote any projects, ideas, or any repos you're wanting to share. Please keep in mind that we ask you to stay friendly, civil, and adhere to the subreddit rules!


r/devops 9d ago

Career / learning Is what I am doing DevOps or at least in line with it?

0 Upvotes

Hi, I'm fairly new to this subreddit. I am currently thinking of changing my career path, and becoming a DevOps engineer seems promising to me.

I started out as a typical web developer, doing both frontend and backend work. I am pretty confident in my skills on both sides, but here's the thing: outside of the typical development life cycle, I also handle deployment.

I do server setups, making sure that everything runs smoothly. I also configure DNS records for the applications that I develop. I study what server architecture the applications should be deployed on and how it should scale, and I fix things when everything goes south. Fixing things involves changing configurations on the server, debugging connectivity issues, and resolving dependency issues for the developers on my team.

I cannot confidently say that I do DevOps since we do not have an automated CI/CD pipeline, just a clear "what to do list" whenever a new release needs to be pushed to production.

I have read several articles and watched some videos online, and I do think what I do is related to DevOps. It's basically like this: if I am not around, my team cannot push anything to production.


r/devops 10d ago

Career / learning How are freshers going to survive this AI apocalypse? It's brutal

34 Upvotes

The market is brutal and it's getting worse every day.

New job openings are shrinking and all the freshers are competing for that one role.

On LinkedIn, Indeed, even Reddit, I can see the desperation for a job.

If it persists for a year or two, new grads will arrive and it will get twice as bad. It only gets worse as time goes on.

So what I'm thinking is: instead of targeting one specific role, just try every entry-level one to land something before time runs out.

But here is the problem: we need to get into something that is less impacted by AI.

Some people say DevOps is less likely to be impacted by AI; some say it's SOC.

To avoid this confusion I'm asking it here.

You guys are working and you know it well.

So kindly list those roles, it would be helpful for freshers like us.

Thank you


r/devops 10d ago

Career / learning How should I start learning DevOps as an absolute beginner in 2026? Is it still worth it?

111 Upvotes

I’m an absolute beginner interested in learning DevOps in 2026, but the amount of things to learn feels overwhelming. I keep seeing roadmaps with Linux, networking, Docker, Kubernetes, cloud, CI/CD, Terraform, scripting, monitoring, and more, and I honestly don’t know what I should focus on first. I wanted to ask people already in the field if DevOps is still worth learning in 2026, what the best roadmap would be for someone starting completely from zero, and what skills or projects actually help beginners stand out for internships or junior roles. I don’t want to spend months just watching tutorials without building real-world understanding, so I’d really appreciate advice on what you would personally learn first if you had to start over today.


r/devops 10d ago

Architecture Can you share your CI/CD pipeline approach?

61 Upvotes

Hi guys, can you share what tools you are using for your CI/CD pipeline? What modern best practices do you follow?

I have been working in a product-based company, and our tools aren't used anywhere outside our org.

Are any of you using Jenkins + Argo + K8s?


r/devops 9d ago

Career / learning LLM / chat recommendations / preferences?

0 Upvotes

I may have missed this topic if it has already been discussed here, so apologies, but I'm curious what you all use to help troubleshoot something you've been stuck on for ages, or something you don't normally use and now have to troubleshoot. I hop between the free version of ChatGPT and Google's AI in Google Search. I was hoping to hear whether there is a better or recommended option that you all use.

I don't use it daily, but I'm looking for something that can help pinpoint issues when I can't see them immediately, at least on the infrastructure side of things rather than the coding side.


r/devops 9d ago

Career / learning Learning DevOps in India from Vikas (clouddevopshub)

0 Upvotes

I have 6 years in IT as infra support and now want to switch to DevOps. I have zero coding knowledge and am planning to take classes from Mr. Vikas (clouddevopshub). Please let me know if anybody has taken classes from him. What are your views on his classes? I find YouTube videos boring, so please don't suggest videos. I need interactive sessions. Please let me know if any other tutor is better. Also consider job assistance.


r/devops 9d ago

Discussion How are you securing the DBs when product teams deploy LLM agents?

0 Upvotes

Product teams are starting to ship autonomous agents that have access to our internal APIs and databases. From an infra/security perspective, how are you standardizing the access control for this? Traditional OPA/Cedar policies work for static parameters, but they don't understand the semantic intent of an agent (which makes them vulnerable to prompt injection attacks where the agent technically does a "read", but it's fetching data it shouldn't). Are you forcing product teams to use a specific middleware pattern to inject user context, or just relying heavily on Postgres RLS and hoping they don't leak cross-tenant data?
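A minimal sketch of the "middleware injects user context, RLS enforces it" pattern mentioned above, assuming the agent never gets a raw connection and every query goes through one wrapper. The setting name `app.tenant_id`, the function name and the `execute` callable are hypothetical; `set_config(name, value, is_local)` and `current_setting(name)` are standard PostgreSQL functions. The `%s` placeholder style and the explicit `BEGIN`/`COMMIT` assume a psycopg-like driver in autocommit mode.

```python
from typing import Any, Callable, Sequence

# Anything with the shape of a DB cursor's execute(sql, params) method.
Execute = Callable[[str, Sequence[Any]], Any]

# Assumed RLS policy on each tenant-scoped table, for example:
#   CREATE POLICY tenant_isolation ON orders
#     USING (tenant_id = current_setting('app.tenant_id'));


def run_for_tenant(execute: Execute, tenant_id: str, sql: str,
                   params: Sequence[Any] = ()) -> Any:
    """Run one agent-issued statement with the tenant pinned for the transaction."""
    if not tenant_id:
        raise ValueError("tenant_id is required; refusing to run without context")
    execute("BEGIN", ())
    try:
        # is_local=true scopes the setting to this transaction only.
        execute("SELECT set_config('app.tenant_id', %s, true)", (tenant_id,))
        result = execute(sql, params)
        execute("COMMIT", ())
        return result
    except Exception:
        execute("ROLLBACK", ())
        raise
```

The tenant ID comes from the authenticated user session, never from the agent's own output, so a prompt-injected "read" is still filtered by the policy. This does nothing for the semantic-intent problem itself; it only bounds the blast radius.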


r/devops 9d ago

Discussion If your incident response strategy relies entirely on "everyone jumping into a loud Slack thread," you don’t have a strategy

0 Upvotes

Seeing the trending post about everything feeling urgent and important hits incredibly close to home.

It is so easy for teams to mask chaotic, unorganized communication channels as "cross-functional collaboration." A PM asks a question, an engineer raises an edge-case infrastructure risk, someone else links a Terraform trace, and suddenly a single channel has 45 people tag-pinging each other with absolutely zero clear ownership.

Slack and Discord are fantastic tools for real-time synchronization, but they are the absolute worst places for actual state tracking.

If a risk doesn't have an automated severity tier attached to it and a dedicated, logged ticket or paging alert with a single designated owner, it shouldn't be allowed to derail your team's current sprint. How did you guys successfully train your product and infrastructure teams to stop treating public chat rooms as a high-priority paging system?


r/devops 10d ago

Career / learning Is KodeKloud Standard enough?

3 Upvotes

I want to buy the KodeKloud Standard tier annual plan. I already bought the CKA, CKAD and CKS exams from the Linux Foundation, but I don't know if the Standard tier will be enough to prepare for these certifications. Can anyone from KodeKloud, or anyone who has bought it, tell me? Thanks.


r/devops 9d ago

Discussion Is it possible to point the finger based on error logs?

0 Upvotes

I've been tasked with creating a pipeline where, if the build process fails, it reports the compilation failure with a detailed error message. But the problem is, my manager wants the pipeline to somehow find which commit caused the problem, and I don't know if that's possible. Hell, maybe I misunderstood it or something.

My idea is a broad one. Once the pipeline hits an error, I'll have it written to a .txt file, published as an artifact, and somehow alert the authors of all commits made within the last 24 hours (the pipeline is supposed to be scheduled at 6 pm daily, checking the previous 24 hours).
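Narrowing the failure down to one commit is possible if the pipeline can rebuild individual commits. A minimal sketch of the idea as a binary search over the day's commits, assuming the build was green at the start of the window and stays broken once it breaks. `first_bad_commit` and `builds_ok` are hypothetical names; `builds_ok` stands in for "check out this commit and run the build".

```python
from typing import Callable, Optional, Sequence


def first_bad_commit(commits: Sequence[str],
                     builds_ok: Callable[[str], bool]) -> Optional[str]:
    """Return the earliest commit whose build fails, or None if the newest builds.

    `commits` must be ordered oldest to newest.
    """
    if not commits or builds_ok(commits[-1]):
        return None
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is known to be bad
    while lo < hi:
        mid = (lo + hi) // 2
        if builds_ok(commits[mid]):
            lo = mid + 1          # breakage was introduced after mid
        else:
            hi = mid              # mid is bad, so the culprit is mid or earlier
    return commits[lo]
```

The commit list could come from `git log --since="24 hours ago" --reverse --format=%H`, and the author to alert from `git show -s --format=%ae <sha>`. Git's own `git bisect run <build command>` automates the same search, so the sketch is mainly to show why only a handful of rebuilds are needed rather than one per commit.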

Really appreciate any advice, as my playbook is still thin (just starting out).


r/devops 11d ago

Discussion Everyone in my company is discovering that agentic workflows are just CI/CD workflows

596 Upvotes

With all the buzz, people are just building the same CI/CD workflows.


r/devops 10d ago

Discussion Impactful DevOps Project for a Portfolio

0 Upvotes

Hey everyone,

I've been working in DevOps for a year and a half, but I started out as an intern, so today I have practically no projects for a portfolio.

Even though I can show good numbers and deliverables at work, I sometimes feel a bit insecure because I still have little time in the market and no public projects to show.

I'd like to hear from the more experienced folks: what kind of project would you expect to find in the portfolio of a junior/mid-level DevOps engineer for them to stand out in interviews?

I'd like something that would really challenge me.


r/devops 10d ago

Career / learning Interview Advice

2 Upvotes

I have recently started looking for new opportunities and was wondering what the interview format is like these days. I have cleared the first round at 2-3 companies, and my next rounds are scheduled for next week.

I have been told that the next rounds will be coding rounds and technical discussions (50-50). My areas of expertise are platform development, cloud, Kubernetes and CI/CD, with 7 YOE.

I’m looking to understand what topics I should cover. What should I expect from the live coding rounds?


r/devops 11d ago

Career / learning What are the best, most practical Coursera courses to learn AWS, Terraform, K8s, and Prometheus?

33 Upvotes

Hey everyone,

I want to transition into DevOps and I’ve decided to use Coursera to learn the following stack: AWS, Docker, Terraform, Kubernetes, and Prometheus + Grafana.

My goal is to acquire high-density, hands-on skills as fast as possible. I want to avoid massive, overly theoretical courses that repeat basic concepts (like explaining Git or "what is the cloud") over and over.

If you had to build a custom learning path using Coursera, which specific courses or specializations would you recommend combining to cover this entire stack efficiently?

I’m currently looking at options from IBM, Packt, and KodeKloud on the platform, but I don't want to limit myself. What combination actually gives you the best terminal/labs practice to get job-ready?

Thanks in advance for the recommendations!


r/devops 10d ago

Discussion "I want to switch to DevOps" posts

0 Upvotes

I see these at least once a day nowadays. Not sure if they're real people or bots. Can we consolidate into a megathread or something so we don’t have to keep repeating the same advice over and over?