loki-k8s

Loki

  By Canonical Observability
Channel           Revision  Published    Runs on
latest/stable     190       15 Apr 2025  Ubuntu 20.04
latest/candidate  192       15 Apr 2025  Ubuntu 20.04
latest/beta       194       15 Apr 2025  Ubuntu 20.04
latest/edge       195       08 May 2025  Ubuntu 24.04, Ubuntu 20.04
latest/edge       193       03 Apr 2025  Ubuntu 24.04, Ubuntu 20.04
1.0/stable        104       12 Dec 2023  Ubuntu 20.04
1.0/candidate     104       22 Nov 2023  Ubuntu 20.04
1.0/beta          104       22 Nov 2023  Ubuntu 20.04
1.0/edge          104       22 Nov 2023  Ubuntu 20.04
juju deploy loki-k8s

Troubleshooting missing logs

After relating loki to other charms, you may encounter situations where log lines appear to be missing.

Checklist

  • The source of the log files is related to loki.
  • The Loki URL (from the grafana-agent or promtail config files) is reachable from the source container.
  • Loki is not out of disk space.
  • You can manually post a log line to loki from within the loki pod (via localhost), from the host (via the pod IP), from the traefik container (via the Kubernetes FQDN) and from another model (via the ingress URL); see the sketch after this list.
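
A minimal sketch of these checks, assuming a machine-charm grafana-agent unit named grafana-agent/0 whose rendered config lives at /etc/grafana-agent.yaml, a loki unit loki/0 whose workload container is named loki, and the unit IP 10.1.166.94 used in the samples below (all of these are assumptions to adjust to your deployment):

# Find the Loki push URL grafana-agent was configured with
# (config path is an assumption for the machine charm).
juju ssh grafana-agent/0 grep "url:" /etc/grafana-agent.yaml

# Check free disk space in the loki workload container.
juju ssh --container loki loki/0 df -h

# Check readiness over localhost from within the loki pod
# (requires curl inside the workload container).
juju ssh --container loki loki/0 curl -s http://localhost:3100/ready

# Manually post a log line from the host via the pod IP.
curl -s -H "Content-Type: application/json" \
  -X POST http://10.1.166.94:3100/loki/api/v1/push \
  --data-raw "{\"streams\": [{ \"stream\": { \"origin\": \"troubleshooting\" }, \"values\": [ [ \"$(date +%s%9N)\", \"test log line\" ] ] }]}"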

Check status

You can curl the loki unit IP for status. In the sample output below, the ingester isn’t ready yet.

❯ curl 10.1.166.94:3100/ready
Ingester not ready: waiting for 15s after being ready
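
The loki unit IP used in these samples can be read from juju status; a quick sketch (the jq path is an assumption based on the default JSON layout):

# The pod IP is shown in the Address column.
juju status loki

# Or extract it directly.
juju status --format=json loki | jq -r '.applications.loki.units["loki/0"].address'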

❯ curl 10.1.166.94:3100/services
querier => Running
query-frontend-tripperware => Running
ring => Running
query-scheduler => Running
query-frontend => Running
ingester-querier => Running
compactor => Running
ruler => Running
ingester => Running
distributor => Running
server => Running
memberlist-kv => Running
analytics => Running
store => Running
cache-generation-loader => Running
query-scheduler-ring => Running

Confirm if Loki received anything at all

You can curl the loki unit IP for labels and alert rules. If Loki has not ingested anything yet, the labels query returns no data; once log lines arrive, the Juju topology labels show up.

❯ curl 10.1.166.94:3100/loki/api/v1/labels
{"status":"success"}

❯ curl 10.1.166.94:3100/loki/api/v1/labels
{"status":"success","data":["filename","job","juju_application","juju_charm","juju_model","juju_model_uuid","juju_unit"]}

❯ curl 10.1.166.94:3100/loki/api/v1/label/juju_unit/values
{"status":"success","data":["pg/0"]}

❯ curl 10.1.166.94:3100/loki/api/v1/rules
no rule groups found
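
Beyond individual labels, you can list every stream (label set) Loki currently knows about via the series endpoint; a sketch reusing the same unit IP:

# List all streams that carry a juju_model label.
curl -sG 10.1.166.94:3100/loki/api/v1/series \
  --data-urlencode 'match[]={juju_model=~".+"}' | jq '.data'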

Now that you know which labels exist, you can retrieve some logs:

❯ curl -sG 10.1.166.94:3100/loki/api/v1/query_range --data-urlencode 'query={juju_unit="pg/0"}' | jq '.data.result[0]'

You can query for the average logging rate. In the sample below, it is 0.1 log lines per second (6 log lines per minute).

❯ curl -sG 10.1.166.94:3100/loki/api/v1/query --data-urlencode 'query=rate({job=~".+"}[10m])' | jq '.data.result'
[
  {
    "metric": {
      "filename": "/var/log/postgresql/patroni.log",
      "job": "juju_test-bundle-iwfn_f427ffe2_pg",
      "juju_application": "pg",
      "juju_charm": "postgresql-k8s",
      "juju_model": "test-bundle-iwfn",
      "juju_model_uuid": "f427ffe2-9d96-482c-80c4-f200a20eb1bd",
      "juju_unit": "pg/0"
    },
    "value": [
      1715247333.466,
      "0.1"
    ]
  }
]

Query for particular log lines

If only a subset of logs is missing, you can confirm their existence in Loki by filtering on labels and/or content. In the sample below, loki is queried for log lines that contain “leader”.

❯ curl -sG 10.1.166.94:3100/loki/api/v1/query --data-urlencode 'query=({job=~".+"} |= "leader")' | jq '.data.result'
[
  {
    "stream": {
      "filename": "/var/log/postgresql/patroni.log",
      "job": "juju_test-bundle-iwfn_f427ffe2_pg",
      "juju_application": "pg",
      "juju_charm": "postgresql-k8s",
      "juju_model": "test-bundle-iwfn",
      "juju_model_uuid": "f427ffe2-9d96-482c-80c4-f200a20eb1bd",
      "juju_unit": "pg/0"
    },
    "values": [
      [
        "1715258886211320804",
        "2024-05-09 12:48:06 UTC [15]: INFO: no action. I am (pg-0), the leader with the lock "
      ],
      [
        "1715258876184953745",
        "2024-05-09 12:47:56 UTC [15]: INFO: no action. I am (pg-0), the leader with the lock "
      ],
      [
        "1715258866412113833",
        "2024-05-09 12:47:46 UTC [15]: INFO: no action. I am (pg-0), the leader with the lock "
      ]
    ]
  }
]
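
If the missing lines belong to a known time window, the range query can be bounded explicitly with start and end (Unix epoch nanoseconds or RFC3339 timestamps); a sketch covering the last two hours:

# Fetch "leader" lines logged by pg/0 during the last two hours.
curl -sG 10.1.166.94:3100/loki/api/v1/query_range \
  --data-urlencode 'query={juju_unit="pg/0"} |= "leader"' \
  --data-urlencode "start=$(date -d '2 hours ago' +%s%N)" \
  --data-urlencode "end=$(date +%s%N)" \
  | jq '.data.result[0].values'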

List active loggers

To obtain a list of all sources that logged something recently, query for streams whose log line count over the last minute exceeds a threshold:

❯ curl -sG 10.1.166.94:3100/loki/api/v1/query --data-urlencode 'query=count_over_time({filename=~".+"}[1m]) > 2' | jq '.data.result'
[
  {
    "metric": {
      "filename": "/var/log/postgresql/patroni.log",
      "job": "juju_test-bundle-iwfn_f427ffe2_pg",
      "juju_application": "pg",
      "juju_charm": "postgresql-k8s",
      "juju_model": "test-bundle-iwfn",
      "juju_model_uuid": "f427ffe2-9d96-482c-80c4-f200a20eb1bd",
      "juju_unit": "pg/0"
    },
    "value": [
      1715249068.007,
      "6"
    ]
  }
]
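
To see at a glance which units (rather than which files) logged recently, the same count can be aggregated by the juju_unit label; a sketch:

# Log line count per juju unit over the last 10 minutes.
curl -sG 10.1.166.94:3100/loki/api/v1/query \
  --data-urlencode 'query=sum by (juju_unit) (count_over_time({juju_model=~".+"}[10m]))' \
  | jq '.data.result'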

Logs pushed by grafana-agent or promtail

Confirm that logs are being sent out:

# grafana-agent
juju ssh grafana-agent/0 curl localhost:12345/metrics | grep "promtail_sent_"

# promtail
juju ssh mysql-router/0 curl localhost:9080/metrics | grep -E "promtail_read_|promtail_sent_"

If the values are zero (or constant for quite some time), make sure the monitored log files exist and are not empty.
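
For the postgresql example used earlier on this page, that check could look like the following sketch (the workload container name, postgresql, is an assumption; adjust the container and path to your charm):

# Confirm the scraped file exists and has recent content.
juju ssh --container postgresql pg/0 ls -l /var/log/postgresql/
juju ssh --container postgresql pg/0 tail -n 3 /var/log/postgresql/patroni.log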

Confirm Loki is reachable

A typical loki deployment may look like this:

app --- grafana-agent --- loki --- traefik

Important: If logs are pushed to loki via a cross-model relation, make sure loki has an ingress relation!
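
A sketch of adding such a relation, assuming a traefik-k8s application named trfk (the endpoint names are assumptions and may differ in your deployment):

# Relate loki to traefik so it gets a stable ingress URL.
juju integrate loki:ingress trfk:ingress-per-unit

# The traefik-provided URL is what a remote grafana-agent/promtail must reach.
juju run trfk/0 show-proxied-endpoints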

Note: In the code samples below, the loki/0 IP address is assumed to be 10.1.27.247.

First, confirm loki itself is ready:

~> curl 10.1.27.247:3100/ready
Ingester not ready: waiting for 15s after being ready

...

~> curl 10.1.27.247:3100/ready
ready

Confirm loki is in the traefik config

~> juju ssh --container traefik trfk/0 grep "url:" -B4 /opt/traefik/juju/juju_ingress_ingress-per-unit_36_loki.yaml
  services:
    juju-cos-loki-0-service:
      loadBalancer:
        servers:
        - url: http://loki-0.loki-endpoints.cos.svc.cluster.local:3100

and reachable from within traefik:

~> juju ssh --container traefik trfk/0 curl http://loki-0.loki-endpoints.cos.svc.cluster.local:3100/ready
ready

and reachable from the host via traefik:

~> juju run trfk/0 show-proxied-endpoints
Running operation 1 with 1 task
  - task 2 on unit-trfk-0

Waiting for task 2...
proxied-endpoints: '{"trfk": {"url": "http://10.167.177.193"}, "loki/0": {"url": "http://10.167.177.193/cos-loki-0"}}'

~> curl http://10.167.177.193/cos-loki-0/ready
ready

and reachable from within grafana-agent:

~> juju switch lxd
microk8s:admin/cos -> lxd:admin/welcome-lxd

~> juju ssh ga/6 curl http://10.167.177.193/cos-loki-0/ready
ready

context deadline exceeded when attempting to POST to loki

Sometimes POST requests to loki fail with context deadline exceeded:

caller=client.go:419 level=warn component=logs
  logs_config=log_file_scraper component=client host=10.84.208.194
  msg="error sending batch, will retry" status=-1 tenant= 
  error="Post \"http://10.84.208.194/cos-loki-0/loki/api/v1/push\": context deadline exceeded"

First, try to manually POST a short and simple log line to loki:

~> curl -H "Content-Type: application/json" \
  -s -X POST "http://10.167.177.193/cos-loki-0/loki/api/v1/push" \
  --data-raw "{\"streams\": [{ \"stream\": { \"foo\": \"bar2\" }, \"values\": [ [ \"$(date +%s%9N)\", \"fizzbuzz\" ] ] }]}"

If the above POST request succeeds, then the payload that fails may be too large.
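
Before adjusting timeouts, it can also be worth looking at the size and rate limits Loki is running with; a sketch using the same config endpoint (which keys are present depends on the Loki version):

# Dump the limits section of the running config; keys such as
# max_line_size, ingestion_rate_mb, per_stream_rate_limit and
# per_stream_rate_limit_burst bound how much can be pushed at once.
curl -s http://10.167.177.193/cos-loki-0/config | yq '.limits_config'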

Inspect the ingester timeout config and check whether increasing it helps (remember to restart the Pebble service after manually modifying the config); a sketch of the restart step follows the output below.

~> curl -s http://10.167.177.193/cos-loki-0/config \
  | yq '.ingester_client.remote_timeout'
5s
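
A sketch of the restart step, assuming the workload container and the Pebble service are both named loki, and the default socket path for sidecar charms:

# Open a shell in the loki workload container...
juju ssh --container loki loki/0
# ...then, inside the container, point the Pebble CLI at the workload's socket
# (container, socket path and service name are assumptions; list services first).
export PEBBLE_SOCKET=/charm/container/pebble.socket
/charm/bin/pebble services
/charm/bin/pebble restart loki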
