r/Network • u/JimmyDry2 • 16d ago

Alert fatigue is making our monitoring system less trustworthy

Our monitoring setup technically catches issues, but the amount of noisy alerts has become overwhelming. Minor spikes, temporary disconnects and duplicated notifications constantly trigger incidents that nobody reacts to anymore. The worst part is that real problems now get buried under all the noise. We spent months tuning thresholds and dependencies, but every adjustment seems to create another edge case somewhere else. Looking for ways to simplify alerting logic while still keeping proper visibility into infrastructure health.

3 Upvotes

https://www.reddit.com/r/Network/comments/1tggqjd/alert_fatique_is_making_our_monitoring_system/

72% Upvoted

u/Alfred20367 16d ago

We faced a very similar problem and eventually realized the issue was too much customization layered on over time. PRTG helped because the default sensor logic and dependency handling were much cleaner than what we had built manually before. Once we standardized monitoring templates and alert thresholds, false positives dropped significantly and operators started trusting alerts again. Having all monitoring in one interface also made root cause analysis much faster during incidents.

u/chickibumbum_byomde 16d ago

This happens when monitoring goes from “useful signal” to “constant activity.” Once people stop trusting alerts... forget about it, it becomes redundant. Even good monitoring becomes ineffective because real issues get lost in the noise.

The fix usually isn’t more complicated logic, it’s fewer, more meaningful alerts. Less is more. Many teams eventually realize they should alert on actual impact and sustained problems, not every short spike or transient event. If it’s unclear what action an alert calls for, or the issue usually resolves itself, it probably shouldn’t page anyone.

Most healthy monitoring setups are actually quieter than people expect. They still collect lots of data, but they alert on far less of it.
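
To make the "sustained problems, not short spikes" idea concrete, here is a minimal sketch of a rule that only fires once a metric has stayed over a threshold for a hold period. The class name, the 90% threshold, and the 300-second hold are all hypothetical values picked for illustration, not settings from any particular tool.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SustainedAlert:
    """Fire only when a metric stays over a threshold for a minimum duration."""
    threshold: float      # hypothetical limit, e.g. 90.0 for percent CPU
    hold_seconds: float   # how long the breach must persist before paging
    _breach_start: Optional[float] = field(default=None, init=False)

    def observe(self, timestamp: float, value: float) -> bool:
        """Return True once the current breach has lasted hold_seconds."""
        if value <= self.threshold:
            # Recovered: forget the breach so short spikes never accumulate.
            self._breach_start = None
            return False
        if self._breach_start is None:
            # First sample of a new breach: start the clock.
            self._breach_start = timestamp
        return timestamp - self._breach_start >= self.hold_seconds


# Example: one short spike at t=0, then a breach that lasts from t=120 onward.
rule = SustainedAlert(threshold=90.0, hold_seconds=300)
samples = [(0, 95), (60, 50), (120, 96), (180, 97), (240, 98), (420, 99)]
fired = [t for t, v in samples if rule.observe(t, v)]
print(fired)
```

The spike at t=0 never pages because the metric recovers at t=60. Only the breach that is still ongoing 300 seconds after it started produces an alert.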

u/magion 16d ago

Fix your alarm definitions. You provide no insight into what tooling you use to surface alerts.

u/MemeLordAscendant 16d ago

Try this one out www.hypernetworkmonitor.com

u/XxTh3g04txX 15d ago

SolarWinds NPM for almost 20 years. 4000 nodes. It's supported one of the larger hospitals in NorCal.

Very easy to write, tune, and mute alerts via filters.

u/wyohman Network/Design Professional 15d ago

Auvik ships with solid alerting with little noise

u/Dmelvin 15d ago

It honestly sounds like you're at the point where redesigning your alerting makes sense.

I'd go to the drawing board, and determine what sensor logic you want, draw it out in a flowchart, check for things that may conflict or cause false positives, then tear all of your existing logic out and replace it.

u/NPMGuru 14d ago

The duplicate notification problem is usually a correlation issue: one root cause firing 10 alerts. Static thresholds make it worse because every transient spike looks like an incident.
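
A rough sketch of what correlation can look like in its simplest form: given a parent/child dependency map, notify only for failed nodes that have no failed ancestor. The function name, the topology, and the node names are made up for illustration, and real tools handle this with their own dependency models.

```python
from typing import Dict, Iterable, List, Optional


def root_causes(down: Iterable[str],
                parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Return the failed nodes that have no failed ancestor."""
    down_set = set(down)
    roots = []
    for node in sorted(down_set):
        ancestor = parent_of.get(node)
        seen = {node}  # guards against accidental loops in the map
        suppressed = False
        while ancestor is not None and ancestor not in seen:
            if ancestor in down_set:
                # An upstream device is already down, so this alert is a symptom.
                suppressed = True
                break
            seen.add(ancestor)
            ancestor = parent_of.get(ancestor)
        if not suppressed:
            roots.append(node)
    return roots


# Hypothetical topology: two access points behind one distribution switch.
topology = {
    "core-sw": None,
    "dist-sw": "core-sw",
    "ap-1": "dist-sw",
    "ap-2": "dist-sw",
    "srv-1": "core-sw",
}
to_notify = root_causes(["dist-sw", "ap-1", "ap-2"], topology)
print(to_notify)
```

Here three devices are down but only `dist-sw` would generate a notification. The two access points are suppressed because their upstream switch already explains the outage.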

I recommend continuous synthetic monitoring with baseline-aware alerting. Way fewer false positives. You can use Obkio for that.
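
For anyone unsure what "baseline-aware" means in practice, below is a minimal sketch that compares each sample against a rolling mean and standard deviation instead of a fixed threshold. This is a generic illustration, not how Obkio or any other vendor implements it; the class name and the `window`, `k`, `min_samples`, and `min_sigma` parameters are assumptions.

```python
from collections import deque
from statistics import mean, pstdev


class BaselineDetector:
    """Flag samples that deviate strongly from a rolling baseline."""

    def __init__(self, window: int = 30, k: float = 3.0,
                 min_samples: int = 10, min_sigma: float = 0.5):
        self.history = deque(maxlen=window)  # recent "normal" samples
        self.k = k                           # allowed deviation in sigmas
        self.min_samples = min_samples       # warm-up before judging anything
        self.min_sigma = min_sigma           # floor so a flat baseline isn't hypersensitive

    def is_anomalous(self, value: float) -> bool:
        anomalous = False
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = max(pstdev(self.history), self.min_sigma)
            anomalous = abs(value - mu) > self.k * sigma
        if not anomalous:
            # Only normal samples update the baseline, so an ongoing
            # outage does not gradually become the new "normal".
            self.history.append(value)
        return anomalous


# Example: latency in ms hovering around 21, then one large jump.
detector = BaselineDetector()
warmup = [detector.is_anomalous(v) for v in [20, 22] * 10]
print(any(warmup), detector.is_anomalous(21), detector.is_anomalous(80))
```

A static threshold would need retuning for every link with a different normal latency. With a baseline, the same rule adapts to whatever "normal" looks like for each target.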

u/spuyet 10d ago

What are you using? We've been using Fivenines for 3 months now, and it works great.
