day28: Loki-Based Log Collection System - Scenario Implementation and Optimization Based on Loki Features (9.8-9.9)

Published 2024-01-05 21:35:38 · Author: ikubernetesi

9.8 Loki-Based Log Collection System

1. EFK vs LPG

Architecture and components:

  • Loki: the Loki stack is an open-source, horizontally scalable log aggregation system composed of Promtail, Loki, and Grafana.
  • EFK: EFK is an integrated solution composed of Elasticsearch, Fluentd, and Kibana.

Storage and querying:

  • Loki: Loki stores logs as streams, writing log data into compressible chunk files and achieving high compression efficiency.
  • EFK: EFK uses Elasticsearch as centralized log storage and index storage.

Scalability and resource consumption:

  • Loki: Loki scales horizontally very well and can handle large volumes of log data.
  • EFK: Elasticsearch is a highly scalable storage system, but it is demanding on hardware resources, especially when storing log data at scale.

Configuration and complexity:

  • Loki: Loki is relatively simple to configure and deploy: Promtail collects the logs and Grafana handles querying and visualization, so you can get up and running fairly quickly.
  • EFK: EFK is comparatively complex to configure and deploy: you must configure Fluentd's input, filter, and output plugins, plus the cluster configuration for Elasticsearch and Kibana.

2. Introduction to LPG

Grafana Loki:https://grafana.com/docs/loki/latest/
Github Loki:https://github.com/grafana/helm-charts/tree/main/charts/loki-stack

2.1 Loki architecture

  1. Promtail (agent): Loki's default client; collects logs and ships them to Loki.
  2. Distributor: the entry point of Loki; receives log data from clients and distributes it across the ingester nodes.
  3. Ingester: receives and persists log data from the Distributor; it writes the data to local storage and sends index-related data to the index component.
  4. Index: the index component manages and maintains all of Loki's index data structures.
  5. Chunks: the physical storage form of log data in Loki.
  6. Querier: the component used to query log data stored in Loki.

2.2 How logs are collected

The Promtail client collects log data, which is indexed and stored in persistent backend storage.

Users can filter and retrieve specific log records with the LogQL query language, and analyze them visually through the Grafana integration.

3. Deployment and configuration

3.1 Chart configuration

Add the Loki Helm chart repository:

# helm repo add grafana https://grafana.github.io/helm-charts
"grafana" has been added to your repositories

# helm repo update

Fetch and unpack the loki-stack chart:

[root@master-1-230 9.8]# helm search repo loki
NAME                        	CHART VERSION	APP VERSION	DESCRIPTION                                       
bitnami/grafana-loki        	2.11.20      	2.9.3      	Grafana Loki is a horizontally scalable, highly...
grafana/loki                	5.41.4       	2.9.3      	Helm chart for Grafana Loki in simple, scalable...
grafana/loki-canary         	0.14.0       	2.9.1      	Helm chart for Grafana Loki Canary                
grafana/loki-distributed    	0.78.0       	2.9.2      	Helm chart for Grafana Loki in microservices mode 
grafana/loki-simple-scalable	1.8.11       	2.6.1      	Helm chart for Grafana Loki in simple, scalable...
grafana/loki-stack          	2.9.12       	v2.6.1     	Loki: like Prometheus, but for logs.              
grafana/fluent-bit          	2.6.0        	v2.1.0     	Uses fluent-bit Loki go plugin for gathering lo...
grafana/lgtm-distributed    	1.0.0        	6.59.4     	Umbrella chart for a distributed Loki, Grafana,...
grafana/promtail            	6.15.3       	2.9.2      	Promtail is an agent which ships the contents o...

helm pull grafana/loki-stack --untar --version 2.9.10

Edit values.yaml:

test_pod:
  enabled: true
  image: bats/bats:1.8.2
  pullPolicy: IfNotPresent
...

loki:
  enabled: true
  persistence:
    enabled: true
    storageClassName: nfs-storageclass
    accessModes:
      - ReadWriteOnce
    size: 30Gi
  isDefault: true
  url: http://{{(include "loki.serviceName" .)}}:{{ .Values.loki.service.port }}
...

promtail:
  enabled: true
  config:
    logLevel: info
    serverPort: 3101
    clients:
      - url: http://{{ .Release.Name }}:3100/loki/api/v1/push
    limits_config:
      ingestion_rate_strategy: local
      ingestion_rate_mb: 15
      ingestion_burst_size_mb: 20  
...

grafana:
  enabled: true
  persistence:
    enabled: true
    storageClassName: nfs-storageclass
    accessModes:
      - ReadWriteOnce
    size: 10Gi
 [root@master-1-230 loki-stack]# cat values.yaml |egrep -v "^$|#"
test_pod:
  enabled: true
  image: bats/bats:1.8.2
  pullPolicy: IfNotPresent
loki:
  enabled: true
  persistence:
    enabled: true
    storageClassName: nfs-storageclass
    accessModes:
      - ReadWriteOnce
    size: 30Gi
  isDefault: true
  url: http://{{(include "loki.serviceName" .)}}:{{ .Values.loki.service.port }}
  readinessProbe:
    httpGet:
      path: /ready
      port: http-metrics
    initialDelaySeconds: 45
  livenessProbe:
    httpGet:
      path: /ready
      port: http-metrics
    initialDelaySeconds: 45
  datasource:
    jsonData: "{}"
    uid: ""
promtail:
  enabled: true
  config:
    logLevel: info
    serverPort: 3101
    clients:
      - url: http://{{ .Release.Name }}:3100/loki/api/v1/push
    limits_config:
      ingestion_rate_strategy: local
      ingestion_rate_mb: 15
      ingestion_burst_size_mb: 20  
fluent-bit:
  enabled: false
grafana:
  enabled: true
  persistence:
    enabled: true
    storageClassName: nfs-storageclass
    accessModes:
      - ReadWriteOnce
    size: 10Gi
  sidecar:
    datasources:
      label: ""
      labelValue: ""
      enabled: true
      maxLines: 1000
  image:
    tag: 8.3.5
prometheus:
  enabled: false
  isDefault: false
  url: http://{{ include "prometheus.fullname" .}}:{{ .Values.prometheus.server.service.servicePort }}{{ .Values.prometheus.server.prefixURL }}
  datasource:
    jsonData: "{}"
filebeat:
  enabled: false
  filebeatConfig:
    filebeat.yml: |
      filebeat.inputs:
      - type: container
        paths:
          - /var/log/containers/*.log
        processors:
        - add_kubernetes_metadata:
            host: ${NODE_NAME}
            matchers:
            - logs_path:
                logs_path: "/var/log/containers/"
      output.logstash:
        hosts: ["logstash-loki:5044"]
logstash:
  enabled: false
  image: grafana/logstash-output-loki
  imageTag: 1.0.1
  filters:
    main: |-
      filter {
        if [kubernetes] {
          mutate {
            add_field => {
              "container_name" => "%{[kubernetes][container][name]}"
              "namespace" => "%{[kubernetes][namespace]}"
              "pod" => "%{[kubernetes][pod][name]}"
            }
            replace => { "host" => "%{[kubernetes][node][name]}"}
          }
        }
        mutate {
          remove_field => ["tags"]
        }
      }
  outputs:
    main: |-
      output {
        loki {
          url => "http://loki:3100/loki/api/v1/push"
        }
      }
proxy:
  http_proxy: ""
  https_proxy: ""
  no_proxy: ""

3.2 Deploy and verify

kubectl create ns logging

[root@master-1-230 loki-stack]# helm upgrade --install loki -n logging -f values.yaml . 
Release "loki" does not exist. Installing it now.
NAME: loki
LAST DEPLOYED: Fri Jan  5 20:55:30 2024
NAMESPACE: logging
STATUS: deployed
REVISION: 1
NOTES:
The Loki stack has been deployed to your cluster. Loki can now be added as a datasource in Grafana.

See http://docs.grafana.org/features/datasources/loki/ for more detail.

 

Check the result:

[root@master-1-230 loki-stack]#  kubectl get pods -n logging |grep loki 
loki-0                           1/1     Running   0              2m46s
loki-grafana-8f5f47b97-f7ptr     2/2     Running   0              2m46s
loki-promtail-2ncv9              1/1     Running   0              2m46s
loki-promtail-8lsf9              1/1     Running   0              2m46s
loki-promtail-fjz24              1/1     Running   0              2m46s
loki-promtail-sbkfn              1/1     Running   0              2m46s

[root@master-1-230 loki-stack]#  kubectl -n logging get svc |grep loki
loki                            ClusterIP   10.97.250.198   <none>        3100/TCP                     3m5s
loki-grafana                    ClusterIP   10.111.82.14    <none>        80/TCP                       3m5s
loki-headless                   ClusterIP   None            <none>        3100/TCP                     3m5s
loki-memberlist                 ClusterIP   None            <none>        7946/TCP                     3m5s

Retrieve the Grafana admin password:

[root@master-1-230 loki-stack]# kubectl get secret --namespace logging loki-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
tPyFvT8MHczPNrsll5YKfuCHfGREgDvMOuwxl5C2

Create an Ingress for Grafana:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: logging
  name: grafana-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: grafana-logging.ikubernetes.cloud
    http:
      paths:
        - pathType: Prefix
          backend:
            service:
              name: loki-grafana
              port:
                number: 80
          path: /
[root@master-1-230 loki-stack]# kubectl  apply -f grafana_ingress.yaml 
ingress.networking.k8s.io/grafana-ingress created

Test access:

[root@master-1-230 9.8]# curl -i grafana-logging.ikubernetes.cloud
HTTP/1.1 302 Found
Date: Fri, 05 Jan 2024 13:03:43 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 29
Connection: keep-alive
Cache-Control: no-cache
Expires: -1
Location: /login
Pragma: no-cache
Set-Cookie: redirect_to=%2F; Path=/; HttpOnly; SameSite=Lax
X-Content-Type-Options: nosniff
X-Frame-Options: deny
X-Xss-Protection: 1; mode=block

<a href="/login">Found</a>.

Log in to Grafana with the username admin and the password retrieved above.

Because the Helm chart has already configured the Loki datasource in Grafana, log data is available immediately.

Click the Explore menu on the left to filter and browse Loki log data.

Promtail installed via Helm comes preconfigured and is already optimized for Kubernetes:

[root@master-1-230 9.8]# kubectl get secret loki-promtail -n logging -o json | jq -r '.data."promtail.yaml"' | base64 --decode
server:
  log_level: info
  http_listen_port: 3101
  

clients:
  - url: http://loki:3100/loki/api/v1/push

positions:
  filename: /run/promtail/positions.yaml

scrape_configs:
  # See also https://github.com/grafana/loki/blob/master/production/ksonnet/promtail/scrape_config.libsonnet for reference
  - job_name: kubernetes-pods
    pipeline_stages:
      - cri: {}
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels:
          - __meta_kubernetes_pod_controller_name
        regex: ([0-9a-z-.]+?)(-[0-9a-f]{8,10})?
        action: replace
        target_label: __tmp_controller_name
      - source_labels:
          - __meta_kubernetes_pod_label_app_kubernetes_io_name
          - __meta_kubernetes_pod_label_app
          - __tmp_controller_name
          - __meta_kubernetes_pod_name
        regex: ^;*([^;]+)(;.*)?$
        action: replace
        target_label: app
      - source_labels:
          - __meta_kubernetes_pod_label_app_kubernetes_io_instance
          - __meta_kubernetes_pod_label_release
        regex: ^;*([^;]+)(;.*)?$
        action: replace
        target_label: instance
      - source_labels:
          - __meta_kubernetes_pod_label_app_kubernetes_io_component
          - __meta_kubernetes_pod_label_component
        regex: ^;*([^;]+)(;.*)?$
        action: replace
        target_label: component
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_node_name
        target_label: node_name
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: namespace
      - action: replace
        replacement: $1
        separator: /
        source_labels:
        - namespace
        - app
        target_label: job
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_name
        target_label: pod
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_container_name
        target_label: container
      - action: replace
        replacement: /var/log/pods/*$1/*.log
        separator: /
        source_labels:
        - __meta_kubernetes_pod_uid
        - __meta_kubernetes_pod_container_name
        target_label: __path__
      - action: replace
        regex: true/(.*)
        replacement: /var/log/pods/*$1/*.log
        separator: /
        source_labels:
        - __meta_kubernetes_pod_annotationpresent_kubernetes_io_config_hash
        - __meta_kubernetes_pod_annotation_kubernetes_io_config_hash
        - __meta_kubernetes_pod_container_name
        target_label: __path__
  
  

limits_config:
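The first relabel rule in the dump above strips the ReplicaSet hash suffix from the pod controller name before it is stored in `__tmp_controller_name`. A quick Python sanity check of that regex (the pod names below are hypothetical examples; `re.fullmatch` mirrors the anchored matching that Prometheus-style relabeling applies):

```python
import re

# Same regex as the __meta_kubernetes_pod_controller_name relabel rule above.
controller_re = re.compile(r'([0-9a-z-.]+?)(-[0-9a-f]{8,10})?')

def controller_name(pod_controller: str) -> str:
    """Return the controller name with any trailing ReplicaSet hash removed."""
    m = controller_re.fullmatch(pod_controller)
    return m.group(1) if m else pod_controller

# A ReplicaSet name "<deployment>-<hex hash>" collapses to the Deployment name,
# while a name without a hex-hash suffix passes through unchanged.
print(controller_name("nginx-deployment-7fb96c846b"))  # nginx-deployment
print(controller_name("loki-promtail"))                # loki-promtail
```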

4. Loki query examples

4.1 Log stream selectors

For the label part of a query expression, wrap it in curly braces {} and select labels with key-value syntax; separate multiple label expressions with commas:

= exactly equal.
!= not equal.
=~ regex matches.
!~ regex does not match.

# Find logs by job or app name
{app="ingress-nginx"}
{job="devops/metallb"}
{namespace="default",app="podstdr2"}
{app=~"kube-state-metrics|prometheus|zookeeper"}

4.2 Filtering with line filter expressions

After writing a log stream selector, you can filter the results further with a search expression:

|= line contains the string.
!= line does not contain the string.
|~ line matches the regex.
!~ line does not match the regex.

The regex accepts RE2 syntax. Matches are case-sensitive by default; prefix the regex with (?i) to make it case-insensitive.
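The (?i) prefix behaves the same way as the inline case-insensitivity flag in most regex engines, so an expression is easy to sanity-check outside Loki. A small Python sketch (the log line is made up):

```python
import re

line = "2024-01-05 21:00:00 ERROR something failed"

# Case-sensitive by default: "error" does not match "ERROR"...
print(bool(re.search(r"error", line)))      # False
# ...but the (?i) prefix makes the whole pattern case-insensitive.
print(bool(re.search(r"(?i)error", line)))  # True
```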

1. Exact search: logs in namespace logging, container zookeeper, containing the keyword INFO
{namespace="logging",container="zookeeper"} |= "INFO"

2. Regex search
{job="huohua/svc-huohua-batch"} |~ "(duration|latency)\s*(=|is|of)\s*[\d.]+"

3. Contains one string but not another
{job="mysql"} |= "error" != "timeout"
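In the third example the filters chain left to right: each expression narrows the result of the previous one. A Python sketch of those semantics (the sample log lines are made up):

```python
# LogQL line filters chain: {job="mysql"} |= "error" != "timeout"
lines = [
    "2024-01-05 error: connection refused",
    "2024-01-05 error: lock wait timeout exceeded",
    "2024-01-05 slow query logged",
]

# |= "error": keep only lines containing the substring...
step1 = [l for l in lines if "error" in l]
# ...then != "timeout": drop lines containing the substring.
step2 = [l for l in step1 if "timeout" not in l]

print(step2)  # ['2024-01-05 error: connection refused']
```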

5. Common problems

5.1 Problem 1

Promtail reports that it cannot find the log files under /var/log/pods and fails to tail them:

level=error ts=2023-07-17T03:22:11.682802445Z caller=filetarget.go:307 msg="failed to tail file, stat failed" error="stat /var/log/pods/kube-system_kube-apiserver-master3_a8daf137c2a2ea7ef925aaef1e82ac16/kube-apiserver/13.log: no such file or directory" filename=/var/log/pods/kube-system_kube-apiserver-master3_a8daf137c2a2ea7ef925aaef1e82ac16/kube-apiserver/13.log
level=error ts=2023-07-17T03:22:11.682823944Z caller=filetarget.go:307 msg="failed to tail file, stat failed" error="stat /var/log/pods/kube-system_kube-scheduler-master3_bdef86673f60f833d12eb8a3ad337fac/kube-scheduler/1.log: no such file or directory" filename=/var/log/pods/kube-system_kube-scheduler-master3_bdef86673f60f833d12eb8a3ad337fac/kube-scheduler/1.log

First, exec into the promtail container, check whether the file exists in that directory, and cat it to confirm it contains log data.

When Promtail is installed (the Helm chart does this by default), the host directories /var/log/pods and /var/lib/docker/containers are mounted into the promtail container as volumes.

If both Docker and Kubernetes were installed with default settings, there should be no problem reading the logs.

{
    "name": "docker",
    "hostPath": {
        "path": "/var/lib/docker/containers",
        "type": ""
    }
},
{
    "name": "pods",
    "hostPath": {
        "path": "/var/log/pods",
        "type": ""
    }
}

In our real production environment, however, Docker's data directory is on a disk mounted at /data, so the default volumes configuration has to be changed.

Edit values.yaml:

promtail:
  enabled: true
  extraVolumes:
    - name: docker
      hostPath:
        path: /data/docker/containers
  extraVolumeMounts:
    - name: docker
      mountPath: /data/docker/containers
      readOnly: true
  config:
    logLevel: info
    serverPort: 3101
    clients:
      - url: http://{{ .Release.Name }}:3100/loki/api/v1/push

Both volumes and volumeMounts must be changed, because the log files under /var/log/pods are actually symlinks pointing at the log files under docker/containers.
If you only change volumes, the promtail container can find the log files, but opening them yields nothing, because each one is just a symlink.

[root@node1 log]# ll /var/log/pods/monitoring_promtail-bs5cs_5bc5bc90-bac9-480d-b291-4caadeff2236/promtail/
total 4
lrwxrwxrwx 1 root root 162 Dec 17 14:04 0.log -> /data/docker/containers/db45d5118e9508817e1a2efa3c9da68cfe969a2b0a3ed42619ff61a29cc64e5f/db45d5118e9508817e1a2efa3c9da68cfe969a2b0a3ed42619ff61a29cc64e5f-json.log
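The effect is easy to reproduce: a symlink whose target directory is not mounted stats as missing, which is exactly the `stat ... no such file or directory` error Promtail logs. A minimal Python sketch with hypothetical paths:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Simulate /var/log/pods/...: a symlink pointing into a directory
    # that is NOT mounted inside the container (hypothetical layout).
    link = os.path.join(tmp, "0.log")
    os.symlink(os.path.join(tmp, "not-mounted", "app-json.log"), link)

    link_itself_exists = os.path.lexists(link)  # the symlink entry is there
    target_resolves = os.path.exists(link)      # stat follows it to a missing target

print(link_itself_exists, target_resolves)  # True False
```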

5.2 Problem 2

Loki log collection fails with HTTP 429 errors:

level=warn ts=2023-07-17T03:42:34.456086325Z caller=client.go:369 component=client host=loki:3100 msg="error sending batch, will retry" status=429 error="server returned HTTP status 429 Too Many Requests (429): Ingestion rate limit exceeded for user fake (limit: 4194304 bytes/sec) while attempting to ingest '5381' lines totaling '1048504' bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased"
level=warn ts=2023-07-17T03:42:35.144739805Z caller=client.go:369 component=client host=loki:3100 msg="error sending batch, will retry" status=429 error="server returned HTTP status 429 Too Many Requests (429): Ingestion rate limit exceeded for user fake (limit: 4194304 bytes/sec) while attempting to ingest '5381' lines totaling '1048504' bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased"
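The limit quoted in the message, 4194304 bytes/sec, is Loki's default per-tenant ingestion rate of 4 MB/s. A quick check of the numbers (plain arithmetic, nothing Loki-specific):

```python
MB = 1024 * 1024

# Default per-tenant ingestion rate: the 4194304 bytes/sec in the 429 error.
default_limit = 4 * MB
# Limit after setting ingestion_rate_mb: 15 in limits_config.
raised_limit = 15 * MB

print(default_limit, raised_limit)  # 4194304 15728640
```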

Too many logs are being collected, exceeding Loki's ingestion limit, hence the 429 errors. To raise the limit, modify the Loki configuration:

promtail:
  enabled: true
  extraVolumes:
    - name: docker
      hostPath:
        path: /data/docker/containers
  extraVolumeMounts:
    - name: docker
      mountPath: /data/docker/containers
      readOnly: true
  config:
    logLevel: info
    serverPort: 3101
    clients:
      - url: http://{{ .Release.Name }}:3100/loki/api/v1/push
    limits_config:
      # "local" enforces the ingestion rate limit per running Loki instance
      ingestion_rate_strategy: local
      # per-tenant ingestion rate limit, in MB per second
      ingestion_rate_mb: 15
      # per-tenant allowed ingestion burst size, in MB
      ingestion_burst_size_mb: 20
[root@master-1-230 loki-stack]# helm upgrade --install loki -n logging -f values.yaml . 
Release "loki" has been upgraded. Happy Helming!
NAME: loki
LAST DEPLOYED: Fri Jan  5 21:14:27 2024
NAMESPACE: logging
STATUS: deployed
REVISION: 2
NOTES:
The Loki stack has been deployed to your cluster. Loki can now be added as a datasource in Grafana.

See http://docs.grafana.org/features/datasources/loki/ for more detail.

9.9 Scenario Implementation and Optimization Based on Loki Features

1. Background

  • Some workloads run on virtual machines outside the Kubernetes cluster, each hosting a particular application or job.
  • Different applications write their logs to different directories.
  • A Loki log management system is already deployed in the current Kubernetes environment.

2. Collecting logs from VMs outside the Kubernetes cluster

2.1 Install and configure Promtail

Collection configuration: /mnt/config/promtail-config.yaml

server:         # serves HTTP/gRPC and receives requests from other components or clients
  http_listen_port: 9080
  grpc_listen_port: 0

positions:              # settings for tracking and syncing the read position in log files
  filename:  /tmp/positions.yaml        # path of the file where read positions are persisted
  sync_period: 10s

clients:                # client configuration for talking to the Loki server
  - url: http://loki.ikubernetes.cloud/loki/api/v1/push  # push endpoint

scrape_configs:         # targets and labels for log scraping
- job_name: system
  static_configs:       # list of static targets (e.g. hosts)
  - targets:
      - localhost
    labels:
      job: varlogs
      app: varlogs
      __path__: /var/log/*.log # log files to collect

Run Promtail as a container:

docker run -d --name promtail -v /mnt/config/:/mnt/config -v /var/log/:/var/log/ grafana/promtail:latest -config.file=/mnt/config/promtail-config.yaml

[root@node-1-231 data]# docker ps
CONTAINER ID   IMAGE                     COMMAND                   CREATED          STATUS          PORTS     NAMES
90a95a1e4ff3   grafana/promtail:latest   "/usr/bin/promtail -…"   29 seconds ago   Up 23 seconds             promtail

Expose the Loki service so that it can receive data from outside the cluster:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: logging
  name: loki-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: loki.ikubernetes.cloud
    http:
      paths:
        - pathType: Prefix
          backend:
            service:
              name: loki
              port:
                number: 3100
          path: /
[root@master-1-230 9.8]# kubectl  apply -f loki-ingress.yaml 
ingress.networking.k8s.io/loki-ingress created
[root@master-1-230 9.8]# kubectl  get ingress -n logging
NAME              CLASS   HOSTS                               ADDRESS         PORTS   AGE
grafana-ingress   nginx   grafana-logging.ikubernetes.cloud   192.168.1.204   80      19m
kibana-kibana     nginx   kibana.ikubernetes.cloud            192.168.1.204   80      5d3h
loki-ingress      nginx   loki.ikubernetes.cloud              192.168.1.204   80      27s

2.2 Test and verify

[root@node-1-231 data]# curl loki.ikubernetes.cloud/loki/api/v1/label 
{"status":"success","data":["app","component","container","filename","instance","job","namespace","node_name","pod","stream"]}

Log files on the VM:

[root@node-1-231 data]# ls /var/log/*.log
/var/log/boot.log  /var/log/yum.log

Verify in Grafana:

{job="varlogs",filename="/var/log/mysqld.log"}

{job="varlogs",filename="/var/log/yum.log"}

3. Using Promtail to collect Java application logs and ship them to Loki

The scenario is a Spring Boot Java application that uses Logback as its logging framework. The application uses Logback to write its logs to files on disk.

The Logback configuration is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<configuration debug="false" scan="false" scanPeriod="30 seconds">

    <contextName>myapp-name</contextName>
    
    <property name="logsPath" value="logs" />
    <timestamp key="currentMonth" datePattern="yyyyMM" />

    <appender name="FILE" class="ch.qos.logback.core.FileAppender">
      <file>${logsPath}/${currentMonth}/myapp.log</file>
      <encoder>
        <charset>UTF-8</charset>
        <pattern>
          <![CDATA[
          %d{yyyy-MM-dd HH:mm:ss.SSS,+00:00} [%t] ${SPRING_PROFILES_ACTIVE} %p %logger ${CONTEXT_NAME} - %m%n
          ]]>
        </pattern>
      </encoder>
    </appender>

    <logger name="com.myapp" additivity="false" level="INFO">
      <appender-ref ref="FILE" />
    </logger>

    <root level="WARN">
      <appender-ref ref="FILE" />
    </root>
</configuration>

The `FileAppender`'s `encoder.pattern` determines the format of each line in the log file. When processing Java application logs you need to handle multi-line entries, where a single log record spans several lines of text. For example:

2023-07-18 20:43:20 [pool-8-thread-1] dev ERROR com.myapp.ProjectQuery myapp-name - query error
org.mybatis.spring.MyBatisSystemException: nested exception is org.apache.ibatis.exceptions.TooManyResultsException: Expected one result (or null) to be returned by selectOne(), but found: 2
        at org.mybatis.spring.MyBatisExceptionTranslator.translateExceptionIfPossible(MyBatisExceptionTranslator.java:77)
        at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:446)
        at com.sun.proxy.$Proxy108.selectOne(Unknown Source)
        at org.mybatis.spring.SqlSessionTemplate.selectOne(SqlSessionTemplate.java:166)
        at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:83)
        at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:59)

In this example, the Java application writes logs to disk in a fixed format via Logback, and a Promtail instance on the same server node tails the log files, parses them, and ships them to Loki. The promtail.yaml configuration is as follows:

server:
  disable: true

clients:
- url: http://loki.ikubernetes.cloud/loki/api/v1/push
  tenant_id: org1

positions:
  filename: /app/logs/positions.yaml

target_config:
  sync_period: 10s

scrape_configs:
- job_name: java_logs
  static_configs:
  - targets:
      - localhost
    labels:
      job: java_logs
      __path__: /app/logs/*/*.log
  pipeline_stages:
  - multiline:
      firstline: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
      max_lines: 256
      max_wait_time: 5s
  - regex:
      expression: '^(?P<time>\d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) (?P<message>\[(?P<thread>.*?)\] (?P<profileName>[^\s]+) (?P<level>[^\s]+) (?P<logger>[^\s]+) (?P<contextName>[^\s]+) - [\s\S]*)'
  - labels:
      contextName:
      profileName:
      level:
  - timestamp:
      source: time
      format: '2006-01-02 15:04:05.999'
      location: "UTC"
  - drop:
      older_than: 120h
      drop_counter_reason: "line_too_old"
  - labeldrop:
    - filename
  - output:
      source: message

server.disable=true: disables Promtail's HTTP and gRPC listeners.
clients: configures how Promtail connects to the Loki instance: the loki write (push) address and the tenant id to use.
positions.filename: the file in which Promtail records its read position for each log file.
scrape_configs: defines a job java_logs that tails files matching /app/logs/*/*.log and runs seven pipeline stages: multiline, regex, labels, timestamp, drop, labeldrop, and output.

  • multiline: merges multiple lines into one multi-line block before passing it to the next stage in the pipeline.
  • regex: takes a regular expression and extracts the named capture groups for use in later stages.
  • labels: takes contextName, profileName, and level from the preceding regex stage and attaches them as log labels.
  • timestamp: parses a value from the extracted map and overrides the final timestamp Loki stores for the log; without this stage, Promtail would associate each entry with the time it was read.
  • drop: a filtering stage that can drop logs based on several criteria; here, entries older than 120 hours are dropped.
  • labeldrop: removes labels from the label set sent to Loki; here, the filename label is removed.
  • output: takes data from the extracted map and rewrites the log line that is sent to Loki.
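The multiline and regex stages can be sanity-checked outside Promtail, since both use RE2-compatible syntax that Python's re module also accepts. The log entry below is a hypothetical line following the Logback pattern shown earlier (with milliseconds, as the pattern requires):

```python
import re

# The firstline and expression patterns from the pipeline_stages above.
firstline = re.compile(r'^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}')
extract = re.compile(
    r'^(?P<time>\d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) '
    r'(?P<message>\[(?P<thread>.*?)\] (?P<profileName>[^\s]+) (?P<level>[^\s]+) '
    r'(?P<logger>[^\s]+) (?P<contextName>[^\s]+) - [\s\S]*)'
)

entry = "2023-07-18 20:43:20.123 [pool-8-thread-1] dev ERROR com.myapp.ProjectQuery myapp-name - query error"
stacktrace = "        at org.mybatis.spring.MyBatisExceptionTranslator..."

# The multiline stage starts a new block only on timestamp-prefixed lines.
print(bool(firstline.match(entry)))       # True
print(bool(firstline.match(stacktrace)))  # False

# The regex stage extracts the named groups used by labels/timestamp/output.
m = extract.match(entry)
print(m.group("level"), m.group("profileName"), m.group("contextName"))
# ERROR dev myapp-name
```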

4. Tuning Loki for production

4.1 Log retention in Loki

Once logs reach Loki, Loki stores them; they cannot be kept on the Loki server forever, so retention must be configured according to actual needs.

Add the following to the Loki configuration file:

limits_config:
  reject_old_samples: true   # whether to reject old samples
  reject_old_samples_max_age: 72h   # samples older than 72 hours are rejected

chunk_store_config:
  max_look_back_period: 72h  # to avoid querying past the retention period, must be <= the value below

table_manager:
  retention_deletes_enabled: true   # enable retention-based deletion
  retention_period: 72h  # chunk data older than 72h is deleted
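The comment on max_look_back_period encodes a real constraint: it must not exceed retention_period, or queries may reach for chunks that retention has already deleted. A trivial check of the values above, treating the durations as plain hours:

```python
def hours(duration: str) -> int:
    """Parse a simple '<n>h' duration string, as used in the config above."""
    assert duration.endswith("h")
    return int(duration[:-1])

retention_period = hours("72h")
max_look_back_period = hours("72h")

# Queries must not look back further than retention keeps data around.
print(max_look_back_period <= retention_period)  # True
```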

4.2 Number of Loki log lines shown in Grafana

Grafana shows at most 1000 Loki log lines by default. Many blog posts say you can simply raise the line limit in the query options; that works, but the queries take longer.

On the Loki side the default maximum is 5000 lines, yet Grafana still cannot display more than 1000 until you also raise the maximum-lines value in Grafana's data source settings (to at most 5000).
Go to: Data Sources / Loki

To query more than 5000 lines, you must also modify the Loki server configuration:

limits_config:
  # add this setting if it is absent, and set it to the maximum number of lines you want per query
  max_entries_limit_per_query: 9999

Finally, restart the service, then adjust the line limit in Grafana's Data Sources settings again.