There has been a lot of movement around observability (more precisely, system observability) in the DevOps world. It is really just control systems terminology applied to monitoring production systems. Here is a much-needed rant about it.
So what is Observability?
Observability is a property of a system that lets its users understand how it is doing its job now and how it did its job over a finite window of the past. In short, observability makes a system monitorable.
There are two ways to monitor a system: black box monitoring and white box monitoring. Observability specifically enables white box monitoring, which means understanding a system's behavior through its internals.
Another way to put it: if you are a stakeholder whose job is to monitor the system and report only its symptoms, black box monitoring is enough. If you are a stakeholder who has to monitor, analyse and fix the issue, i.e. you want the granular details and a possible path to the root cause of the symptoms, you need white box monitoring. It is the “what is broken” question versus the “why is it broken” question. Don’t assume one is superior to the other. They complement each other, and we still need symptom-based alerts.
Think of a system as a patient. “What is broken” is the patient's symptom, which a DevOps engineer can watch for, while a permanent fix might need the help of a development engineer. In healthcare parlance, the DevOps engineer is the primary care physician and the dev engineer is the specialist or surgeon.
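To make the “what is broken” versus “why is it broken” split concrete, here is a minimal Python sketch. Everything in it is hypothetical and exists only for illustration: the `CheckoutService` class, its `stats` counters, and the two helper functions are assumptions, not part of any real library. The black box probe only sees whether a request worked from the outside; the white box report reads counters the service deliberately exposes.

```python
from collections import Counter


class CheckoutService:
    """Hypothetical service, used only to illustrate the two views."""

    def __init__(self, db_available=True):
        self.db_available = db_available
        # White box signals: internal counters the service exposes on purpose.
        self.stats = Counter()

    def checkout(self, cart_total):
        self.stats["requests"] += 1
        if cart_total <= 0:
            self.stats["errors.invalid_total"] += 1
            return False
        if not self.db_available:
            self.stats["errors.db_unavailable"] += 1
            return False
        self.stats["success"] += 1
        return True


def black_box_probe(service):
    """Outside view: reports the symptom only (did a checkout work?)."""
    return service.checkout(10)


def white_box_report(service):
    """Inside view: reads the error counters to see why requests fail."""
    return {k: v for k, v in service.stats.items() if k.startswith("errors.")}


if __name__ == "__main__":
    svc = CheckoutService(db_available=False)
    print("probe ok?", black_box_probe(svc))
    print("error counters:", white_box_report(svc))
```

The probe is enough to raise a symptom-based alert, while the counters are what point whoever has to fix the issue towards a cause.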
Another advantage of building a system that is monitorable to the desired degree is that it makes the dev engineers think about possible failure modes, or “what can break”, ahead of time, so the failure modes become more predictable. This can be great for proactive troubleshooting.
The 3 pillars of Observability and white box monitoring:
(Image courtesy: https://peter.bourgon.org/)
Monitoring = Viz + Alerting + Analysis
- Viz – Tells you how things look but not why
- Alerting – Tells you something happened but not why
- Analysis – Tells you why but only if you know how to ask for what
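The alerting and analysis parts of that equation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `EVENTS` data, the endpoint names, the outcome labels, and the 25% threshold are all made up. The alert fires on a symptom (overall error rate), and the breakdown is the follow-up question you have to know to ask.

```python
from collections import Counter

# Hypothetical request log: (endpoint, outcome) pairs, invented for this sketch.
EVENTS = [
    ("/checkout", "ok"),
    ("/checkout", "db_timeout"),
    ("/checkout", "db_timeout"),
    ("/search", "ok"),
    ("/search", "ok"),
    ("/checkout", "invalid_total"),
    ("/search", "ok"),
    ("/search", "ok"),
]


def error_rate(events):
    """Fraction of events whose outcome is anything other than 'ok'."""
    if not events:
        return 0.0
    failed = sum(1 for _, outcome in events if outcome != "ok")
    return failed / len(events)


def should_alert(events, threshold=0.25):
    """Alerting: says that something happened, not why."""
    return error_rate(events) > threshold


def breakdown(events, endpoint):
    """Analysis: answers 'why', but only for the endpoint you ask about."""
    return Counter(
        outcome for ep, outcome in events if ep == endpoint and outcome != "ok"
    )


if __name__ == "__main__":
    print("alert:", should_alert(EVENTS))
    print("checkout failures:", dict(breakdown(EVENTS, "/checkout")))
```

A dashboard would plot `error_rate` over time, which covers the Viz part: it shows how things look but, like the alert, says nothing about the cause.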
Observability is an attribute of a system that enables system monitoring and often separates a great system from a good one.
References and further reads
People you might wanna follow
- Cindy Sridharan rants a lot about Observability on Medium and Twitter.