CodeMash 2024: Alerts Don’t Suck, YOUR Alerts Suck

If there’s anything you should know about Leon Adato, it’s that he has an energy that can easily light up a room. We’ve been following each other in the community for years, and we’ve finally been able to meet in person. Granted, why did we have to travel away from home to meet when we don’t live horribly far from each other?! 😁 (That’s how it works for many of my local speaker people.)

I was especially looking forward to this talk because it’s one I could have used in my past roles and it’s a topic I can talk with my husband about, since we have various alerts set up for our home. Yes, I know my alert game is weak – and I hope Leon’s talk gives tips and tricks that will help us with better alerts.

What’s an Alert?

Not monitoring! Seriously – people get alerts and monitoring confused. Let’s talk about them:

  • Monitoring is gathering data from monitored systems.
  • Alerting is notifying someone that an action needs to take place because a condition was met.
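To make the difference concrete, here’s a minimal Python sketch. It isn’t from Leon’s talk; the `Sample` type, the `collect` and `evaluate` functions, the `disk_used_pct` metric, and the 90% threshold are all hypothetical names and values I picked for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    host: str
    metric: str
    value: float


def collect(host: str, metric: str, value: float, store: List[Sample]) -> Sample:
    """Monitoring: record the data point and notify no one."""
    sample = Sample(host, metric, value)
    store.append(sample)
    return sample


def evaluate(sample: Sample, threshold: float, action: str) -> Optional[str]:
    """Alerting: produce a notification only when the condition is met."""
    if sample.metric == "disk_used_pct" and sample.value >= threshold:
        return f"{sample.host}: {sample.metric} at {sample.value}%; {action}"
    return None


store: List[Sample] = []
quiet = evaluate(collect("web-01", "disk_used_pct", 42.0, store), 90.0, "free up space")
loud = evaluate(collect("web-01", "disk_used_pct", 95.0, store), 90.0, "free up space")
```

Both samples land in `store` (that’s the monitoring), but only the second one produces a message, and that message names the action to take (that’s the alerting).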

Email Inbox Rules Are Jerks and CPU Alerts May Not Be Needed

Inbox rules are a symptom of bad alerting. Alerts should be defined specifically to get your attention and get you to take action, rather than being auto-sorted by a rule on autopilot and then forgotten. Remember: alerts are actionable. They shouldn’t lead to alert fatigue.

CPU alerts are a trigger too: Leon had some choice words about them. High CPU may be fine for an expected process that spikes periodically, such as processing payroll, so high CPU by itself doesn’t necessarily need to be an alert. When creating alerts around CPU usage, be specific: when is high CPU not supposed to be there, so that it should draw attention and review?
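Here’s one way that “be specific” idea could look in Python. This is my own rough sketch, not something shown in the talk: the Friday 01:00 to 03:00 payroll window, the 90% threshold, and the names `EXPECTED_BUSY`, `in_expected_window`, and `should_alert` are all assumptions for the example.

```python
from datetime import datetime, time

# Hypothetical schedule: payroll runs on Fridays between 01:00 and 03:00.
# Each entry is (weekday, start, end); datetime.weekday() uses Monday == 0.
EXPECTED_BUSY = [(4, time(1, 0), time(3, 0))]


def in_expected_window(when: datetime) -> bool:
    """True when a known, scheduled job explains heavy load at this moment."""
    return any(
        when.weekday() == day and start <= when.time() < end
        for day, start, end in EXPECTED_BUSY
    )


def should_alert(cpu_pct: float, when: datetime, threshold: float = 90.0) -> bool:
    """Alert only when CPU is high and no expected job accounts for it."""
    return cpu_pct >= threshold and not in_expected_window(when)


payroll_spike = should_alert(97.0, datetime(2024, 1, 12, 2, 0))    # Friday, 02:00
surprise_spike = should_alert(97.0, datetime(2024, 1, 10, 14, 0))  # Wednesday, 14:00
```

The same 97% reading is ignored during the payroll run and raised on a Wednesday afternoon, when nothing is supposed to be pegging the CPU.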

Monitoring vs Observability

Since these topics are tangentially related to alerts, Leon clarified what he means by monitoring versus what he means by observability.

With monitoring:

  • All cardinalities are welcome – unique or common events.
  • There are known unknowns. We have an understanding of “We know we don’t know.”
  • Correlations are manually made.
  • There are domain-specific signals.

With observability:

  • Events are incredibly unique – high cardinality.
  • This gets into the area of unknown unknowns – “We don’t know what we don’t know”.
  • Correlation is baked into the observability process.
  • There are golden signals: four signals that generally indicate when something is going wrong. They are latency, errors, traffic, and saturation.
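If you’re wondering what the four golden signals might look like as numbers, here’s a small Python sketch over a batch of request records. It’s an illustration under my own assumptions, not Leon’s definition: the `Request` type, the `golden_signals` function, and especially treating saturation as traffic divided by an assumed capacity are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    duration_ms: float
    ok: bool


def golden_signals(
    requests: List[Request], window_s: float, capacity_rps: float
) -> Dict[str, float]:
    """Summarize one time window of requests as the four golden signals."""
    count = len(requests)
    traffic = count / window_s  # requests per second
    error_rate = sum(not r.ok for r in requests) / count if count else 0.0
    latency = sum(r.duration_ms for r in requests) / count if count else 0.0
    # Assumption: saturation approximated as the share of capacity in use.
    saturation = traffic / capacity_rps
    return {
        "latency_ms": latency,
        "error_rate": error_rate,
        "traffic_rps": traffic,
        "saturation": saturation,
    }


batch = [
    Request(100.0, True),
    Request(300.0, False),
    Request(200.0, True),
    Request(200.0, True),
]
signals = golden_signals(batch, window_s=2.0, capacity_rps=10.0)
```

Real systems usually look at latency percentiles rather than a plain average, and measure saturation against whichever resource runs out first, so take this as a starting shape only.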

What Makes a Good Alert?

Remember that the goal of an alert is to catch your attention so that you take an action. Alerts should be actionable. Otherwise, YAGNI (you aren’t gonna need it)!

Conclusion

I’m glad this talk was at the end of the day. Leon’s energy in this talk was what I needed to get reenergized for gatherings at night. He is also good at conveying the important part of alerting: why you should care about alerts and why you should get them right. If you missed this talk and need some guidance on alerts, be sure to check out Leon’s post on Best practices for fixing your alerts.

By sadukie
