Master Observability with Logs: An In-Depth Guide for Beginners
Episode #23: Dive into the world of observability with logs through this beginner-friendly in-depth guide. Gain valuable insights from real-world experience.
This is the second article I wrote about the basics of observability for beginners.
Previously, I have discussed how it's best to abandon print statements, or whatever primitive your programming language supports for writing to standard output, and move to proper observability with logs, metrics, and traces.
To learn more about this topic, head to Observability 101: A Beginner's Journey Free of Print Statements.
I've also already introduced the concept of observability as a modern version of monitoring at Exploring Structured Logging with slog in Go.
Here is an excerpt to refresh your memory:
You can think of observability like the 2020s version of monitoring, where the sheer amount of logs and metrics introduced with cloud-native distributed applications pose enormous scaling challenges on what you can collect, how much you can store, and what is just noise and can be discarded. These new challenges made us rethink the tools we use, introduce new ones or adapt the old ones.
In this article, I would like to dive deeper into one of the three pillars of Observability: Logs.
Why start with Logs?
Logs are the most primitive form of observability.
You have already implemented some basic form of logging in your very first application and never called it observability.
In this article, you will discover the basics of logging, best practices and code snippets in Golang from the real-world knowledge I have collected during my career.
This article has the following sections:
Logs and Observability
Logs vs Metrics
Logs vs Traces
Logs vs Print statements
Unstructured Logs
Structured logs
Log levels
Log context
Conclusion
What is out of scope:
Metrics and Traces
In-depth discussion about the benefits of observability
Log collection
Log analysis
Want to connect?
👉 Follow me on LinkedIn and Twitter.
If you need 1-1 mentoring sessions, please check my Mentorcruise profile.
Logs and Observability
The challenges introduced by cloud-native applications and modern software concepts like observability have changed how we write and collect logs in 2024 compared to 10 years ago.
The biggest challenge with logging is to maximise observability by increasing the log context size and the amount of logs produced while at the same time keeping the costs down.
Since the cost is directly proportional to the amount of text produced, you only want to log the relevant events that might help a developer investigate an error.
But how do you know what is relevant and what is not?
As with anything in computer science, there is a trade-off between storage cost and increasing observability.
Since storage keeps getting cheaper, recent optimisations in logging have focused more on reducing the processing required to extract useful contextual information from logs than on the storage cost itself.
Hence, there is a shift from unstructured logs that need to be parsed and processed to machine-readable logs in JSON format ready to be stored.
Given the vast amount of logs that modern applications produce, logs are no longer something a software engineer can eyeball in real time to understand what the software is doing.
The recent trend is to use human-friendly colour-coded logs in console output for the development phase and switch to machine-readable logs in JSON format for production.
In practice, this means the output format needs to change depending on the environment the application runs in. Changing the default log level between environments is also common. These requirements call for a shift from writing logs with print statements to using a logging library whose behaviour can be configured for the environment the application runs in.
In the following sections, we will see practical implementations of the benefits of using a logging library over print statements by using code snippets in Golang.
Logs vs Metrics
How do Logs compare to the other pillars/signals of observability?
Metrics are collected at regular intervals (like with polling) even if the metric value hasn't changed from the last measurement.
Logs, instead, can happen at any time (similar to event-based systems).
From an observability point of view, log events (for example, errors) might happen less frequently than metric measurements.
Unless you sample logs, or logs go missing by mistake, you know that no other event occurred between two consecutive log entries.
With metrics instead, you have no visibility of what happened between two consecutive metric collections.
You need to set the metrics frequency high enough not to miss important information but low enough to avoid high storage costs.
Metrics are generally more compact than logs.
Metrics have a predictable cost since if you know the size of a metric and its frequency, you can easily calculate the storage requirements in a given time frame. A small caveat is that the metric storage cost can get out of hand if you have metric labels that might cause a metric cardinality explosion.
More on this topic in a future article about metrics.
Logs at the right logging level and verbosity can be a lot cheaper than metrics since they are event-based, but log frequency is a lot less predictable.
You might have spikes in log cost exactly when you need it the most, meaning when errors occur.
The total cost of logging depends a lot on the verbosity of each log entry, the configured log level, and the frequency of events at that log level.
Logs vs Traces
Traces can be considered an enhanced version of logs with a specific fixed structure.
Traces are mainly used to track the duration of operations and how operations are nested within each other.
Distributed traces are an extension of such logs where you can connect a trace from an application running on a node with another application running on another. This way, you can have a more complete picture of what is involved in a user transaction in a microservice architecture.
Logs vs Print Statements
Logging has historically been used for many purposes, from a simple form of debugging to a primitive form of tracing (like measuring the duration of a piece of code) to error reporting or even exposing system or application metrics.
Some common uses of logs are:
application logs, a log entry is created at every event in the application
audit logs, capture events by recording who performed an activity, what activity and how the system responded (e.g. exit code or response code)
infrastructure logs, record events that happen to the infrastructure (e.g. lambda invocations on AWS cloud or a pod starting in Kubernetes)
As I already discussed in a previous article Observability 101: A Beginner's Journey Free of Print Statements, you should abandon print statements in favour of one or more of the three pillars of observability.
In the following sections, I want to discuss in a bit more detail the benefits of a logging library over just using Print Statements with some real-world code snippets in Golang.
We will use code snippets that use a standard library called Slog, which I previously covered in my article Exploring Structured Logging with slog in Go.
Unstructured logging
Even in its simplest form of unstructured logging, using a library significantly simplifies the user experience.
The standard format for unstructured logs is as follows:
{Timestamp} {log_level} {message}
These three fields are present in various forms in every log entry.
The timestamp is automatically calculated from the library at the time of the function call, while the log level is encoded in the function call name.
In this specific library, the default log level is Info, so it can be omitted.
package main

import (
	"log"
	"log/slog"
)

func main() {
	log.Print("Info message")
	// 2024/01/03 10:24:22 Info message
	slog.Info("Info message")
	// 2024/01/03 10:24:22 INFO Info message
}
There is no standard format for the message field.
Given that each application has its own format, if you want to extract useful bits of information from Nginx web server logs, you need to consult the application documentation to understand which regex to use to extract fields from the plain-text message.
If you ingest Apache web server logs, the regex will be totally different.
For experts only:
Elastic, the company where I work, provides a complete Observability solution to collect, parse, and ingest unstructured logs from various famous applications thanks to integrations.
Each integration is aware of the log format of the application it is observing and can extract useful bits of information from the message field.
Unstructured logs are automatically parsed by integrations into JSON logs to be ingested into Elasticsearch.
In my two years working in the Observability department at Elastic, I wrote an integration for Istio to parse proxy logs and Istiod logs. You can find this integration at Istio.
Structured logging
Using a library instead of print statements allows the output format to be switched from unstructured to structured (in this case, JSON format) in a single place.
Logging, traditionally meant for human eyes, has historically been unstructured.
In recent years, given the vast amount of logs generated by cloud-native applications and the need for more observability, it is slowly becoming more machine-friendly.
package main

import (
	"log/slog"
	"os"
)

func main() {
	opts := &slog.HandlerOptions{
		Level: slog.LevelDebug,
	}
	handler := slog.NewJSONHandler(os.Stdout, opts)
	logger := slog.New(handler)
	logger.Info("Info message")
	// {"time":"2023-03-15T12:59:22.227408691+01:00",
	// "level":"INFO","msg":"Info message"}
}
Applications that historically have used unstructured logging are transitioning to support both structured and unstructured logging, with JSON logging being the default option.
For experts only:
Elastic can collect and ingest JSON logs without any parsing necessary. No need for integrations to parse the message field.
In support of what I mentioned above about the current trend to move to structured logs, Kubernetes has "recently" (in 2020) moved its application logs from unstructured text logs to structured JSON logs. More info at Kubernetes introduce structured logging.
Logging levels
Most logging libraries support changing the default logging level in a single statement.
The logging levels, ordered from most important to least important, are:
Error
Warn or Warning
Info
Debug
Trace
It is common to set the default logging level to WARN or WARNING in production to reduce logging costs, and to Trace or Debug during development.
If you set the default logging level to WARN, any call to lower logging levels will be handled as a NoOp and print nothing to the standard output.
package main

import (
	"log/slog"
	"os"
)

func main() {
	opts := &slog.HandlerOptions{
		Level: slog.LevelWarn,
	}
	handler := slog.NewJSONHandler(os.Stdout, opts)
	logger := slog.New(handler)
	logger.Debug("Debug message") // suppressed: below WARN
	logger.Info("Info message")   // suppressed: below WARN
	logger.Warn("Warning message")
	// {"time":"...","level":"WARN","msg":"Warning message"}
	logger.Error("Error message")
	// {"time":"2023-03-15T12:59:22.227472149+01:00",
	// "level":"ERROR","msg":"Error message"}
}
Log context
Logging libraries can avoid code duplication when some logging context needs to be shared between multiple log events.
As described below, you can create a child logger with the common context and use that logger instead for the different events (e.g. Info and Warn events here). Both events will have fields shared from the common context.
package main

import (
	"log/slog"
	"os"
	"runtime/debug"
)

func main() {
	handler := slog.NewJSONHandler(os.Stdout, nil)
	buildInfo, _ := debug.ReadBuildInfo()

	logger := slog.New(handler)
	child := logger.With(
		slog.Group("program_info",
			slog.Int("pid", os.Getpid()),
			slog.String("go_version", buildInfo.GoVersion),
		),
	)

	child.Info("image upload successful",
		slog.String("image_id", "39ud88"))
	child.Warn("storage is 90% full",
		slog.String("available_space", "900.1 mb"),
	)
}
Conclusion
Various Logging libraries can provide more features like hiding sensitive information (e.g. passwords or API keys) or adding stack traces when logging errors.
You can read more about Slog for Golang at Logging in Go with Slog: The Ultimate Guide.
I hope you enjoyed learning about logging and observability.
I might write future articles about the other pillars of Observability.