Loki not alerting Alertmanager

Published 2023-08-12 17:13:54, author: Oops!#

I finally got it working.

Below is my ruler config:

ruler:
  storage:
    type: local
    local:
      directory: /etc/loki/rulestorage
  rule_path: /etc/loki/rules
  alertmanager_url: http://alertmanager:9093
  ring:
    kvstore:
      store: inmemory
  enable_api: true
  enable_alertmanager_v2: true

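I created the following directories:

  • /etc/loki/rulestorage/fake
  • /etc/loki/rules/fake

Then I copied alert_rules.yaml into /etc/loki/rulestorage/fake and gave the loki user full permissions on /etc/loki/rulestorage/fake. (fake is the tenant ID Loki uses when auth_enabled is false, so the rule files have to live in that subdirectory.)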
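To double-check this layout, a small Python sketch like the one below can help. It assumes single-tenant mode, where rule files sit directly inside <directory>/fake/, and it only checks that this tenant directory exists, contains .yaml/.yml files, and that those files are non-empty and readable by the user running the script. The function name check_rule_dir is hypothetical, not part of Loki.

```python
import os
from pathlib import Path
from typing import List


def check_rule_dir(directory: str, tenant: str = "fake") -> List[str]:
    """Return problems found under <directory>/<tenant>; an empty list means OK."""
    tenant_dir = Path(directory) / tenant
    if not tenant_dir.is_dir():
        return [f"missing tenant directory: {tenant_dir}"]

    # Rule files are assumed to sit directly inside the tenant directory.
    rule_files = sorted(
        p for p in tenant_dir.iterdir()
        if p.is_file() and p.suffix in (".yaml", ".yml")
    )
    if not rule_files:
        return [f"no .yaml/.yml rule files in {tenant_dir}"]

    problems: List[str] = []
    for path in rule_files:
        # os.access uses the real uid/gid of this process, so run the check
        # as the loki user to see what Loki itself can read.
        if not os.access(path, os.R_OK):
            problems.append(f"not readable: {path}")
        elif path.stat().st_size == 0:
            problems.append(f"empty rule file: {path}")
    return problems


if __name__ == "__main__":
    # Same directory as ruler.storage.local.directory in the config.
    for line in check_rule_dir("/etc/loki/rulestorage") or ["rule layout looks OK"]:
        print(line)
```

Running it as the loki user (for example via su or docker exec -u) makes the readability check reflect Loki's own permissions rather than root's.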

 


The config looks good; it's similar to mine. I would troubleshoot it with the following steps:

  1. Exec into the Docker container and check that the rules file is not empty: cat /etc/loki/rules/rules.yaml

  2. Check the Loki logs. When the rules are loaded properly, lines like these appear:

level=info ts=2021-05-06T11:18:33.355446729Z caller=module_service.go:58 msg=initialising module=ruler
level=info ts=2021-05-06T11:18:33.355538059Z caller=ruler.go:400 msg="ruler up and running"
level=info ts=2021-05-06T11:18:33.356584674Z caller=mapper.go:139 msg="updating rule file" file=/data/loki/loki-stack-alerting-rules.yaml
  3. At runtime, Loki also logs info messages about your rule. Here is the one for the rule I am running, slightly shortened; notice status=200 and a non-zero total_bytes value:
level=info 
ts=... 
caller=metrics.go:83 
org_id=... 
traceID=... 
latency=fast 
query="sum(rate({component=\"kube-apiserver\"} |~ \"stderr F E.*failed calling webhook \\\"webhook.openpolicyagent.org\\\". an error on the server.*has prevented the request from succeeding\"[1m])) > 1" 
query_type=metric 
range_type=instant 
length=0s 
step=0s 
duration=9.028961ms 
status=200 
throughput=40MB 
total_bytes=365kB
  4. Then make sure you can reach Alertmanager at http://171.11.3.160:9093 from the Loki container without any issues (there could be a networking problem, or basic authentication may be set up in front of it, etc.).

  5. If the rule you set up (which you can test in Grafana's Explore view) exceeds the threshold you set for 1 minute, the alert should show up in Alertmanager. It will most likely be ungrouped, since you didn't add any labels to it.
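The log checks above (rule files being picked up, and each rule query logging status=200 with a non-zero total_bytes) can be done with a rough Python sketch instead of eyeballing every line. It assumes the logfmt field names visible in the sample output (msg, file, query, status, total_bytes) and that total_bytes starts with a number; the function names are hypothetical. You could feed it the output of docker logs <loki-container> 2>&1, split into lines.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# One key=value pair: the value is either a double-quoted string (which may
# contain backslash-escaped characters) or a run of non-space characters.
_PAIR = re.compile(r'(\w+)=(?:"((?:[^"\\]|\\.)*)"|(\S*))')


def parse_logfmt(line: str) -> Dict[str, str]:
    """Parse one logfmt-style log line into a dict of fields."""
    fields: Dict[str, str] = {}
    for match in _PAIR.finditer(line):
        key, quoted, bare = match.groups()
        if quoted is not None:
            # Undo backslash escapes (\" -> ", \\ -> \).
            fields[key] = re.sub(r'\\(.)', r'\1', quoted)
        else:
            fields[key] = bare
    return fields


def loaded_rule_file(fields: Dict[str, str]) -> Optional[str]:
    """Return the file path of an 'updating rule file' line, else None."""
    if fields.get("msg") == "updating rule file":
        return fields.get("file")
    return None


def rule_query_ok(fields: Dict[str, str]) -> bool:
    """True if a query line reports status=200 and total_bytes > 0."""
    size = re.match(r"\d+(?:\.\d+)?", fields.get("total_bytes", ""))
    return fields.get("status") == "200" and size is not None and float(size.group()) > 0


def summarize(lines: Iterable[str]) -> Tuple[List[str], int, int]:
    """Collect loaded rule files and count good/bad query lines."""
    rule_files: List[str] = []
    good = bad = 0
    for line in lines:
        fields = parse_logfmt(line)
        path = loaded_rule_file(fields)
        if path:
            rule_files.append(path)
        elif "status" in fields and "query" in fields:
            if rule_query_ok(fields):
                good += 1
            else:
                bad += 1
    return rule_files, good, bad
```

An empty rule_files list points back at the rules directory; query lines counted as bad (non-200 or zero bytes) suggest the rule's LogQL matches nothing or fails.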
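The Alertmanager reachability check can also be scripted. This sketch requests Alertmanager's /-/healthy endpoint, which answers HTTP 200 when the process is up. Run it from inside the Loki container, or from another container on the same Docker network, so it sees the same DNS names and routes as the ruler; if Python isn't available there, curl or wget against the same URL does the same job. The function name is hypothetical, and it sends no credentials, so a 401 is a hint that basic authentication is in the way.

```python
import urllib.error
import urllib.request


def alertmanager_reachable(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET <base_url>/-/healthy answers with HTTP 200."""
    url = base_url.rstrip("/") + "/-/healthy"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        # The server answered but refused: e.g. 401 if basic auth sits in
        # front of Alertmanager, 404 if the URL has a wrong path prefix.
        print(f"{url}: HTTP {err.code}")
        return False
    except OSError as err:
        # DNS failure, connection refused or timeout (URLError is an OSError).
        print(f"{url}: {err}")
        return False
```

For the config in the question, the call would be alertmanager_reachable("http://alertmanager:9093"), i.e. the same URL as alertmanager_url.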