There has been a lot of buzz about observability (more precisely, system observability) in the DevOps world lately. It is a term borrowed from control theory, and in practice it boils down to monitoring production systems. Here is a much-needed rant about it.
So what is Observability?
Observability is an attribute of a system that lets its users understand how “it was doing” its job over some finite window in the past. In short, observability makes a system monitorable. There are two ways to monitor a system: 1. black-box monitoring and 2. white-box monitoring. Observability enables white-box monitoring, which is about understanding a system’s behavior through its internals. So if you are a stakeholder whose job is to monitor the system and report only its symptoms, black-box monitoring is good enough. But if you have to monitor, analyze, and fix issues, i.e., you want to know the granular details and have a possible path to the root cause, you need white-box monitoring. It is the “what is broken” vs. “why is it broken” question. Don’t assume one is superior to the other; they complement each other, and we still need symptom-based alerts.
Think of a system as a patient. What is broken is like a symptom, which DevOps can look out for; DevOps may then have to seek the help of a development engineer for a permanent fix. In healthcare parlance, DevOps is the primary care physician and the dev engineer is the specialist or surgeon.
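Here is a minimal Python sketch of the “what is broken” vs. “why is it broken” split. The `CheckoutService` class, its `db` and `payments` dependencies, and the counter names are hypothetical, made up for illustration; a real service would export such counters through a metrics library rather than keep them in an in-process `Counter`.

```python
from collections import Counter


class CheckoutService:
    """Hypothetical service (illustration only) with two dependencies."""

    def __init__(self, db_up: bool = True, payments_up: bool = True) -> None:
        self.deps = {"db": db_up, "payments": payments_up}
        # White-box instrumentation: counters maintained inside the service.
        self.metrics = Counter()

    def handle(self) -> int:
        """Serve one request; return an HTTP-like status code."""
        self.metrics["requests_total"] += 1
        for name, up in self.deps.items():
            if not up:
                # Record *which* internal dependency failed.
                self.metrics[f"dependency_errors_total:{name}"] += 1
                self.metrics["errors_total"] += 1
                return 503
        return 200


def black_box_probe(service: CheckoutService) -> str:
    """Answers 'what is broken?' using only the externally visible result."""
    status = service.handle()
    return "healthy" if status == 200 else f"broken (HTTP {status})"


def white_box_diagnosis(service: CheckoutService) -> list[str]:
    """Answers 'why is it broken?' by reading the service's own counters."""
    prefix = "dependency_errors_total:"
    return sorted(k[len(prefix):] for k, v in service.metrics.items()
                  if k.startswith(prefix) and v > 0)


svc = CheckoutService(payments_up=False)
print(black_box_probe(svc))      # broken (HTTP 503)
print(white_box_diagnosis(svc))  # ['payments']
```

The black-box probe can only report the symptom (a 503), while the white-box counters point at the failing dependency, which is where the dev engineer would start digging.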
The advantage of designing a system to be monitorable to the desired degree is that it makes dev engineers think about possible failure modes, or “what can break”, ahead of time, so failures become more predictable. This is great for proactive troubleshooting.
The 3 pillars of observability and white-box monitoring: metrics, logs, and traces.
(Image courtesy: https://peter.bourgon.org/)
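The three pillars are commonly named metrics, logs, and traces. Assuming that naming, the sketch below shows a single hypothetical request handler emitting all three using only the Python standard library; the in-memory `metrics`, `logs`, and `spans` stores are stand-ins for a real metrics backend, log pipeline, and tracing system. The shared `trace_id` is what lets you jump from a log line to the matching span.

```python
import json
import time
import uuid
from collections import Counter

# Pillar 1, metrics: cheap, aggregated numbers keyed by name and label.
metrics: Counter = Counter()
# Pillar 2, logs: discrete, structured event records (JSON lines here).
logs: list[str] = []
# Pillar 3, traces: timed spans linked by a shared trace ID.
spans: list[dict] = []


def handle_request(user_id: str, fail: bool = False) -> int:
    """Hypothetical handler that emits a metric, a log line, and a span."""
    trace_id = uuid.uuid4().hex
    start = time.monotonic()
    status = 500 if fail else 200  # real work would happen here
    duration_s = time.monotonic() - start

    metrics[("http_requests_total", status)] += 1
    logs.append(json.dumps({
        "level": "error" if fail else "info",
        "msg": "request handled",
        "user_id": user_id,
        "status": status,
        "trace_id": trace_id,  # correlates this log line with the span below
    }))
    spans.append({"trace_id": trace_id, "name": "handle_request",
                  "duration_s": duration_s})
    return status


handle_request("u-42")
handle_request("u-7", fail=True)
print(metrics[("http_requests_total", 500)])  # 1
print(json.loads(logs[1])["trace_id"] == spans[1]["trace_id"])  # True
```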
Monitoring = Viz + Alerting + Analysis
- Viz – Tells you how things look, but not why
- Alerting – Tells you something happened, but not why
- Analysis – Tells you why, but only if you know what to ask for
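To make the three parts of that equation concrete, here is a toy Python sketch over a handful of hypothetical request records. The record layout, the 25% error-rate threshold, and the `viz`/`alert`/`analyze` helpers are all assumptions for illustration, not any particular tool's API.

```python
from collections import Counter

# Hypothetical request records: (minute, endpoint, region, status).
REQUESTS = [
    (0, "/checkout", "us-east", 200), (0, "/search", "eu-west", 200),
    (1, "/checkout", "eu-west", 500), (1, "/search", "us-east", 200),
    (2, "/checkout", "eu-west", 500), (2, "/checkout", "eu-west", 500),
    (2, "/search", "eu-west", 200), (2, "/checkout", "us-east", 200),
]


def viz(records) -> str:
    """Viz: how things look (errors per minute as a text bar chart), not why."""
    errors = Counter(m for m, _, _, s in records if s >= 500)
    minutes = sorted({m for m, *_ in records})
    return "\n".join(f"min {m}: {'#' * errors[m]}" for m in minutes)


def alert(records, threshold: float = 0.25) -> bool:
    """Alerting: fires on a symptom (overall error rate), not on a cause."""
    errors = sum(1 for *_, s in records if s >= 500)
    return errors / len(records) > threshold


def analyze(records, label: str) -> Counter:
    """Analysis: breaks errors down by a label you must know to ask about."""
    index = {"endpoint": 1, "region": 2}[label]
    return Counter(r[index] for r in records if r[3] >= 500)


print(viz(REQUESTS))
print(alert(REQUESTS))              # True (3 errors out of 8 requests)
print(analyze(REQUESTS, "region"))  # Counter({'eu-west': 3})
```

Note that `analyze` only helps because we thought to ask about `region`; that is the “only if you know what to ask for” caveat.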
Observability is an attribute of a system that enables system monitoring and often separates a great system from a good one.
References and further reads
People you might wanna follow
- Cindy Sridharan rants a lot about observability on Medium and Twitter.