Observability 101: A Beginner's Journey Free of Print Statements
Episode #22: Ready to level up your troubleshooting skills? Abandon your Print statements and embark on a long journey to master Observability.
It doesn't matter what stage of your career you are currently in.
At some point, you might have used Print statements to debug software.
While that might have seemed fine at the time, I'd argue it is not a practice you should rely on in 2024.
This article should help you abandon Print statements in favour of some form of Observability.
Before we go any further, what is Observability?
Observability is a measure of how well a system’s internal state can be understood by examining its external outputs, especially the data it emits.
Wait a second. You might argue that you have used logs and metrics in the past when it was still called Monitoring.
Is Observability just a different name? Is it just semantics? Why do we need a different term for Monitoring?
Logs are still logs, no matter what you call the practice of using them in your applications. Right?
We will discuss all of that in this article and more.
This article's audience spans from Junior Software Developers who have only used Print statements to more experienced Senior Software Engineers who want to brush up on best practices and get pointers to up their Observability game.
In this article:
Meaning of the front page image
Print statements as a Poor Man's Observability tool
Application Observability
Short history of Observability
Conclusion
What is out of scope:
A deep dive into Logs, Metrics and Traces. I plan to cover those three pillars of Observability in future articles.
Error tracking, application profiles, crash dumps and other more advanced forms of Observability.
Libraries and tools to implement Observability.
Meaning of the front page image
A software engineer is sitting on the bank of the river, watching wood logs floating down the river.
Some logs get stuck on the river bank and never reach the bottom of the valley.
This is a metaphor for how not all logs are useful for getting to the root cause of an incident.
I generated the above picture with Midjourney using the prompt below.
A software engineer on his laptop is immersed in nature on the bank of a river, watching wood logs being transported by the river current down to the valley. Use a comic style.
Print statements as a Poor Man's Observability tool
Countless articles for beginners suggest using Print statements to debug software.
In the rest of this article, by Print statement, I assume that in your language of choice, there is a way to print text to the standard output. It doesn't matter what the API looks like; for me, this will always be a practice to avoid as much as possible.
I don't suggest using Print statements, even if you are starting your journey into Software Engineering now.
The reason is that, especially at the beginning, once you learn to use a tool, you might use it everywhere. The famous phrase "if all you have is a hammer, everything looks like a nail" couldn't be more accurate in this case.
Instead of spending time on Print statements, beginner material should teach some form of logging straight away.
As I have already discussed in my article Exploring Structured Logging with slog in Go, the bare minimum in logging should be using a logging library for structured logs. We will discuss logging more in the future.
Spending extra time implementing some form of Observability or even just using a debugger with breakpoints always pays dividends in the long term.
I find it even more troubling when Senior or even Principal Engineers still suggest using Print statements because it is the easiest option over more complex setups like debuggers or logs.
Observability is still hard to master.
Hopefully, with this article, we can make things clearer.
Depending on what you are trying to achieve, a Print statement can be replaced with a better Observability option:
Tracking a specific event (e.g. an error or a warning) => Logs
Counting the number of times an event occurs (e.g. number of times a web application receives a request) => Metrics
Timing the duration of a function (e.g. how long does it take to query a database?) => Traces
Printing error Stacktraces => Error tracking software
Checking the internal state of variables while developing => Debugger with breakpoints
Application Observability
In this and future articles about Observability, we will primarily focus on application observability, even though the basic concepts are the same for other forms of Observability, like infrastructure observability.
Every Cloud-Native Engineer should be very familiar with application observability and at least have a basic understanding of infrastructure observability, since they will probably be involved in deploying apps on the cloud.
The concept of application observability was popularised by Etsy’s John Allspaw and Paul Hammond in 2003, but 20 years later, it is still somewhat misunderstood.
I briefly discussed application observability in a previous article titled Exploring Structured Logging with slog in Go.
A quote from that article:
You can think of Observability like the 2020s version of Monitoring, where the sheer amount of logs and metrics introduced with cloud-native distributed applications pose some huge scaling challenges on what you can collect, how much you can store, and what is just noise and can be discarded. These new challenges made us rethink the tools we use, introduce new ones or adapt the old ones.
Short History of Observability
I can imagine that it all started with a Print statement, the most basic way to observe an application's internal state.
As discussed earlier in this article, before Observability was a thing, we used to call the practice of using logs and metrics in our applications Monitoring.
So where does Observability come from, and why does it differ from Monitoring?
An in-depth discussion of the reasons Observability is different from Monitoring is outside the scope of this article. Here, I just want to mention some of those reasons that are more relevant to my audience.
Before the cloud-native movement started, there was a much clearer distinction between developers and System Administrators.
Once you were done writing the application code as a developer, you would metaphorically throw it over the fence to be deployed and monitored in production by a different set of people.
With the arrival of cloud-native applications, developers are now responsible not just for writing the code but also for deploying it to production, keeping it running via on-call rotations, and investigating issues via root cause analysis.
Things get very complicated when the application is not a process running on a single machine but a bundle of microservices running on multiple machines on the cloud. This is what is now called a cloud-native application.
What worked in the past with monitoring doesn't scale anymore with tens, hundreds or even thousands of microservices talking to each other over the network.
What was before a local syscall is now a network call.
New types of applications bring new challenges and call for different ways of doing things. Hence, the move from monitoring to observability.
It's not just semantics. It is a different discipline, with its own practices, challenges, and tools.
The essential components, the three pillars (also called the three signals) of logs, metrics and traces, might be the same, but they are implemented, interconnected, and processed in different ways.
You must keep observability in mind from the beginning when designing your applications.
Otherwise, if you are not careful, you may experience a cardinality explosion and some hefty bills at the end of the month.
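To see why cardinality can explode, consider that in a label-based metrics system every unique combination of label values typically becomes its own time series. The sketch below is only back-of-the-envelope arithmetic; the label names and value counts are hypothetical, and seriesCount is an illustrative helper, not part of any library.

```go
package main

import "fmt"

// seriesCount returns the worst-case number of distinct time series a
// single metric can produce: the product of the number of possible
// values of each label.
func seriesCount(labelValues map[string]int) int {
	total := 1
	for _, n := range labelValues {
		total *= n
	}
	return total
}

func main() {
	// Hypothetical labels for an HTTP request counter, all bounded.
	bounded := map[string]int{"method": 5, "status": 10, "endpoint": 20}
	fmt.Println(seriesCount(bounded)) // 5 * 10 * 20 = 1000

	// Adding a per-user label multiplies everything by the user count.
	unbounded := map[string]int{
		"method":   5,
		"status":   10,
		"endpoint": 20,
		"user_id":  100000,
	}
	fmt.Println(seriesCount(unbounded)) // 1000 * 100000 = 100000000
}
```

A single unbounded label, such as a user or request identifier, is enough to turn a thousand series into a hundred million. That is why label choices are worth deciding at design time.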
Observability can't be an afterthought; otherwise, it will bite you when an incident happens, as you won't have all the information you need to investigate the root cause.
For more information about Observability, I highly recommend the book Observability Engineering.
If you prefer a more practical approach to learning about the three pillars of observability and how to implement them in your application, have a look at the Observability whitepaper by the CNCF.
Conclusion
In this article, we just scratched the surface of what observability is and why it is necessary.
I plan to cover each pillar of observability (Logs, Metrics, and Traces) in its own article so that I have enough space to suggest best practices and tools.
Furthermore, I would also like to discuss some other forms of observability that go beyond those three pillars, like debugging, application profiles, and crash dumps.
In the interest of brevity, I prefer to split the discussion into multiple articles.