Canonical Observability Stack

Highly-integrated, low-operations observability stack powered by Juju and MicroK8s.

The Canonical Observability Stack gathers, processes, visualizes and alerts on telemetry signals generated by workloads running both within, and outside of, Juju.

By leveraging the topology model of Juju to contextualize the data, and charm relations to automate configuration and integration, it provides a low-ops observability suite based on best-in-class, open-source observability tools.
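To illustrate what contextualizing data with the Juju topology can mean in practice, here is a minimal sketch that merges topology information into the labels of a single metric sample. It is not the implementation used by the COS charms. The `JujuTopology` class, the `contextualize` function and the `juju_*` label names are assumptions made for this example.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class JujuTopology:
    """Hypothetical container for where a workload lives in a Juju model."""

    model: str
    model_uuid: str
    application: str
    unit: str

    def as_labels(self) -> Dict[str, str]:
        # Prefix every field with "juju_" (label names are an assumption).
        return {f"juju_{key}": value for key, value in asdict(self).items()}


def contextualize(name: str, labels: Dict[str, str], value: float,
                  topology: JujuTopology) -> str:
    """Render one sample in Prometheus text format with topology labels added."""
    merged = {**labels, **topology.as_labels()}
    rendered = ",".join(f'{k}="{v}"' for k, v in sorted(merged.items()))
    return f"{name}{{{rendered}}} {value}"


topology = JujuTopology(model="prod", model_uuid="1234",
                        application="api", unit="api/0")
print(contextualize("http_requests_total", {"code": "200"}, 42, topology))
```

With labels like these on every sample, a dashboard or alert rule could filter by model, application or unit without any per-workload configuration.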

For Site-Reliability Engineers, Canonical Observability Stack provides a turn-key, out-of-the-box solution for improved day 2 operational insight.

COS Lite

COS Lite, currently in beta, is designed for the edge and runs reliably with limited computing resources: around 4 GB of memory overall, including MicroK8s and the Juju controller, and limited CPU power.

COS Components

The charmed operators that make up COS are available as the pre-configured COS Lite bundle. COS Lite is made up of the following Juju charmed operators:

Additionally, there are charmed operators designed to work with COS to provide additional functionality:

  • The Prometheus Scrape Config charm allows you to tweak the scrape jobs resulting from relating a charmed operator exposing the /metrics endpoint with the Prometheus charmed operator.

  • The Prometheus Scrape Target charmed operator allows you to represent, in a Juju model, /metrics endpoints provided by software not managed by Juju, e.g., LXD or MAAS, so that the Prometheus charmed operator can scrape metrics from them.

  • The COS Proxy charmed operator is a machine charm designed to “translate” the relations supported by the previous iteration, LMA, to COS.

  • The COS Configuration provides a GitOps approach to manage Prometheus alerts, Loki alerts and Grafana dashboards that are specific to your Juju deployments, rather than any particular charm.

  • The Karma charmed operator runs the Karma UI on Kubernetes for you. Karma lets you visualize alerts from multiple Alertmanager clusters, e.g., when you deploy many COS instances at the edge or in different production environments and want to keep a centralized overview.

  • The Grafana Agent charmed operator provides a way to collect telemetry and dashboards from remote workloads and forward them to the observability stack.

Design goals

There are several design goals we want to accomplish with COS:

  • Provide a set of high-quality observability charmed operators that are designed to work well on their own, and better together.

  • Make COS run on Kubernetes, with specific focus on MicroK8s, to achieve a very “appliance-like” user experience.

  • Ensure a consistent, cohesive experience: all alerts go through Alertmanager, Grafana can plot all telemetry, etc.

  • Provide a highly-integrated observability stack with the simplest possible deployment experience.

  • Take the toil out of setting up monitoring of your Juju workloads: monitoring your Juju applications should be as simple as establishing a couple of relations with the COS charms.

  • Showcase the declarative power of the Juju model: for example, if something can be modelled as a relation rather than as a configuration option, it should be. Also, relations must be semantically meaningful: by looking at juju status --relations, you should intuitively understand what comes out of two charms relating with one another.

COS HA

COS HA (for “high-availability”) is to be worked on in late 2022 / early 2023, with the goal of monitoring large sites, with a high-availability setup through redundancy, a careful design of the architecture, and potentially a different set of charms involved (for example, likely swapping out Prometheus for Cortex).

Why a new stack?

At Canonical, LMA is the name we have been using for the system of machine charms currently in use to monitor Canonical and customer systems.

COS draws a lot of learning from years of operational experience with LMA, but it is also different enough that we felt we needed to make a distinction from the previous iteration.

Further reading

The “What is observability?” page provides an overview of the relation between observability and monitoring. An overview of the Canonical offerings related to observability and monitoring can be found on the Observability page. COS will join those pages when it goes GA 🙂

We have been keeping a sort of “development blog” as a series of blog posts about model-driven observability. Keep in mind that not everything that COS can do is already showcased in one of those blog entries, and we will provide proper documentation for COS both here and on the pages of the charms on Charmhub.io.

Project and community

The Canonical Observability Stack is a member of the Ubuntu family. It’s an open source project that warmly welcomes community projects, contributions, suggestions, fixes and constructive feedback.

Thinking about using the Canonical Observability Stack for your next project? Get in touch!

