Logging in ML Systems

Introduction

Logging is the process of tracking and recording key events that occur in our applications. We want to log events so we can use them to inspect processes, fix issues, etc. They're a whole lot more powerful than print statements because they allow us to send specific pieces of information to specific locations, not to mention custom formatting, a shared interface with other Python packages, etc. This makes logging a key component in being able to surface insightful information from the internal processes of our application.

Components

There are a few overarching concepts to be aware of before we can create and use our loggers.

  • Logger: the main object that emits the log messages from our application.

  • Handler: used for sending log records to a specific location, along with the specifications for that location (name, size, etc.).

  • Formatter: used for the style and layout of the log records.

There is so much more to logging such as filters, exception logging, etc. but these are the basics someone will need to get started.
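To make these three components concrete, here is a minimal sketch that wires a Logger, a Handler, and a Formatter together by hand (the logger name "app" and the format string are just illustrative choices):

```python
import logging
import sys

# Logger: the main object that emits log messages
logger = logging.getLogger("app")
logger.setLevel(logging.DEBUG)

# Handler: sends log records to a specific location (here, the stdout stream)
handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.INFO)

# Formatter: controls the style and layout of each log record
formatter = logging.Formatter("%(levelname)s %(name)s - %(message)s")
handler.setFormatter(formatter)

# Attach the handler (with its formatter) to the logger
logger.addHandler(handler)
logger.info("Logger, handler and formatter wired together.")
```

A logger can have many handlers attached, each with its own formatter and level, which is exactly what the configuration later in this tutorial takes advantage of.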

Levels

A log level or log severity is a piece of information telling how important a given log message is. It is a simple, yet very powerful way of distinguishing log events from each other.

Let us define log levels by using basic configurations:

import logging
import sys

# Create a super basic logger that writes to stdout
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)

# Logging levels (from lowest to highest priority)
logging.debug("Used for debugging your code.")
logging.info("Informative messages from your code.")
logging.warning("Everything works but there is something to be aware of.")
logging.error("There's been a mistake with the process.")
logging.critical("There is something terribly wrong and process may terminate.")

These are the basic levels of logging, where DEBUG is the lowest priority and CRITICAL is the highest. We defined our logger using basicConfig to emit log messages to our stdout console (we also could've written to any other stream or even a file) and to be sensitive to log messages starting from level DEBUG. This means that all of our logged messages will be displayed since DEBUG is the lowest level. Had we set the level to ERROR, then only ERROR and CRITICAL log messages would be displayed.
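We can see this filtering in action with a minimal sketch that routes records to an in-memory buffer (the logger name "filter_demo" is arbitrary) and sets the level to ERROR:

```python
import io
import logging

# Route records to an in-memory buffer so we can inspect what gets through
buffer = io.StringIO()
logger = logging.getLogger("filter_demo")
logger.addHandler(logging.StreamHandler(buffer))
logger.setLevel(logging.ERROR)  # only ERROR and CRITICAL pass

logger.debug("dropped")
logger.info("dropped")
logger.warning("dropped")
logger.error("kept")
logger.critical("kept")

print(buffer.getvalue())
```

Only the two messages at or above ERROR make it to the handler; the lower-priority calls are silently discarded.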

Configuration

It is good practice to configure where our logs will be written to. It is also good to create a separate logs directory and add it to the .gitignore file so it won't be committed to git.

Below is an example of a configuration of loggers:

import logging
import sys
from pathlib import Path

# Directory for log files (must exist before the file handlers open them)
LOGS_DIR = Path("logs")
LOGS_DIR.mkdir(exist_ok=True)

# Logger configuration
logging_config = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "minimal": {"format": "%(message)s"},
        "detailed": {
            "format": "%(levelname)s %(asctime)s [%(filename)s:%(funcName)s:%(lineno)d]\n%(message)s\n"
        },
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "stream": sys.stdout,
            "formatter": "minimal",
            "level": logging.DEBUG,
        },
        "info": {
            "class": "logging.handlers.RotatingFileHandler",
            "filename": Path(LOGS_DIR, "info.log"),
            "maxBytes": 10485760,  # 10 MB
            "backupCount": 10,
            "formatter": "detailed",
            "level": logging.INFO,
        },
        "error": {
            "class": "logging.handlers.RotatingFileHandler",
            "filename": Path(LOGS_DIR, "error.log"),
            "maxBytes": 10485760,  # 10 MB
            "backupCount": 10,
            "formatter": "detailed",
            "level": logging.ERROR,
        },
    },
    "loggers": {
        "root": {
            "handlers": ["console", "info", "error"],
            "level": logging.INFO,
            "propagate": True,
        },
    },
}
  1. First part: define two different Formatters (determine format and style of log messages), minimal and detailed, which use various LogRecord attributes to create a formatting template for log messages.

  2. Second part: define the different Handlers (details about the location of where to send log messages):

    • console: sends log messages (using the minimal formatter) to the stdout stream for messages at or above level DEBUG.

    • info: sends log messages (using the detailed formatter) to logs/info.log (a file that can grow up to 10 MB, keeping the last 10 backups) for messages at or above level INFO.

    • error: sends log messages (using the detailed formatter) to logs/error.log (a file that can grow up to 10 MB, keeping the last 10 backups) for messages at or above level ERROR.

  3. Third part: attach our different handlers to our Logger.

The above example uses a dictionary configuration for our logger but there are other ways too such as coding directly in scripts, using a config file, etc.
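Once we have a dictionary like this, we apply it with logging.config.dictConfig. Here is a self-contained sketch using a stripped-down version of the configuration (a temporary directory stands in for LOGS_DIR so the example can run anywhere):

```python
import logging
import logging.config
import tempfile
from pathlib import Path

# Stand-in for the logs directory from the configuration above
LOGS_DIR = Path(tempfile.mkdtemp())

logging_config = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {"detailed": {"format": "%(levelname)s %(message)s"}},
    "handlers": {
        "info": {
            "class": "logging.handlers.RotatingFileHandler",
            "filename": Path(LOGS_DIR, "info.log"),
            "maxBytes": 10485760,  # 10 MB
            "backupCount": 10,
            "formatter": "detailed",
            "level": logging.INFO,
        },
    },
    "loggers": {"root": {"handlers": ["info"], "level": logging.INFO}},
}

# Apply the dictionary configuration and use the root logger
logging.config.dictConfig(logging_config)
logger = logging.getLogger()
logger.info("Hello from the configured logger.")

print(Path(LOGS_DIR, "info.log").read_text())
```

After the dictConfig call, every logging call in the application is routed through the handlers we declared, with no further wiring needed.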

Logs are an essential part of troubleshooting application and infrastructure performance. When our code is executed in a production environment on a remote machine, say Google Cloud, we can't really go there and start debugging. Instead, in such remote environments, we use logs to get a clear picture of what's going on. Logs not only capture the state of our program but also help us discover possible exceptions and errors.

Conclusion

In this tutorial, we discussed why it is important to track key events that happen when developing applications or models, and how to do that. In the next tutorial, we will demonstrate with an example how to log and track ML systems.

Connect with me on Twitter & LinkedIn 🤗