Spring Cloud Sleuth - 526互联

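In a large distributed system, a single request may pass through many different systems and call many services. As the calls between services grow more complex, several questions come up: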

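How can problems be detected quickly?
How can the scope of a failure's impact be determined?
How can service dependencies be mapped out and checked for soundness?
How can call-chain performance problems be analyzed, and capacity planned, in real time?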

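Distributed tracing was introduced to locate and resolve such problems quickly. Distributed tracing reconstructs a distributed request into a call chain, logs it, monitors its performance, and presents the whole request in one place: for example, the time spent at each service node, the IP of the machine the request actually reached, and the response status at each node (200, 500, and so on). Most tracing products on the market are based on Google's Dapper paper. Distributed tracing components include:
1. Zipkin, open-sourced by Twitter. It follows Google's Dapper paper closely.
2. Pinpoint, from the Korean company Naver.
3. CAT, from Meituan-Dianping.
4. EagleEye, from Taobao.
5. SkyWalking, a newer entrant.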

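Things to consider in a tracing solution

Probe overhead. The instrumentation should affect the service itself as little as possible.
Ease of use. Developers should be able to integrate it quickly without much effort.
Data analysis. Analysis should happen in real time and cover enough dimensions.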

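Introduction to Sleuth

Sleuth is Spring Cloud's distributed tracing solution.

span: the basic unit of work. Each call along the chain creates a span.

A span is uniquely identified by a 64-bit ID. It also carries a description, timestamped events, its own span ID, and the ID of its parent span.

Timing information is recorded when a span is started and stopped. The initial span that starts a trace is called the root span; its span ID equals the trace ID.
trace: the tree of spans that share one root span. A trace also has a 64-bit ID, and every span in the trace shares that trace ID.
annotation: records that an event occurred. The core annotations mark the start and end of a request: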
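- CS (Client Send). The client sends the request; this marks the start of the span.
- SR (Server Received). The server receives the request and is about to process it. SR - CS = network latency.
- SS (Server Send). The server has finished processing and sends the response to the client. SS - SR = time the server spent processing the request.
- CR (Client Received). The client receives the server's response; this marks the end of the span. CR - CS = total time from the client sending the request to receiving the response.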
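The arithmetic behind the four core annotations can be sketched in a few lines of Java. This is only an illustration: the class `SpanTimings` and the sample timestamps are hypothetical and are not part of Sleuth's API.

```java
// Hypothetical helper, not part of Sleuth: derives the three durations
// from the four core annotation timestamps (all in milliseconds).
final class SpanTimings {
    final long cs; // Client Send
    final long sr; // Server Received
    final long ss; // Server Send
    final long cr; // Client Received

    SpanTimings(long cs, long sr, long ss, long cr) {
        this.cs = cs;
        this.sr = sr;
        this.ss = ss;
        this.cr = cr;
    }

    // SR - CS: time the request spent on the network
    long networkLatency() { return sr - cs; }

    // SS - SR: time the server spent processing the request
    long serverProcessing() { return ss - sr; }

    // CR - CS: total time seen by the client
    long totalRoundTrip() { return cr - cs; }

    public static void main(String[] args) {
        // Made-up timestamps for illustration only
        SpanTimings t = new SpanTimings(1000L, 1005L, 1045L, 1052L);
        System.out.println("network latency (SR-CS): " + t.networkLatency() + " ms");
        System.out.println("server processing (SS-SR): " + t.serverProcessing() + " ms");
        System.out.println("total round trip (CR-CS): " + t.totalRoundTrip() + " ms");
    }
}
```

Note that SR - CS compares timestamps taken on two different machines, so the result is only meaningful when their clocks are reasonably well synchronized.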

The underlying data structure is a tree that starts at the root span.
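A minimal sketch of that tree, assuming a simplified model rather than Sleuth's real classes: the `SpanNode` type, the span names, and the child span IDs below are all hypothetical. It shows the two rules stated earlier: the root span's ID equals the trace ID, and every child inherits the trace ID while recording its parent's span ID.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a trace as a tree of spans (not Sleuth's API).
final class SpanNode {
    final String traceId;
    final String spanId;
    final String parentId; // null for the root span
    final String name;
    final List<SpanNode> children = new ArrayList<>();

    private SpanNode(String traceId, String spanId, String parentId, String name) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentId = parentId;
        this.name = name;
    }

    // The root span's span ID doubles as the trace ID.
    static SpanNode root(String id, String name) {
        return new SpanNode(id, id, null, name);
    }

    // A child shares the trace ID and points back to its parent.
    SpanNode child(String childSpanId, String childName) {
        SpanNode c = new SpanNode(traceId, childSpanId, spanId, childName);
        children.add(c);
        return c;
    }

    // Number of spans in the subtree rooted at this span.
    int size() {
        int n = 1;
        for (SpanNode c : children) {
            n += c.size();
        }
        return n;
    }

    // Renders the subtree as indented text, one span per line.
    String render() {
        StringBuilder out = new StringBuilder();
        render("", out);
        return out.toString();
    }

    private void render(String indent, StringBuilder out) {
        out.append(indent).append(name).append(" [span=").append(spanId).append("]\n");
        for (SpanNode c : children) {
            c.render(indent + "  ", out);
        }
    }

    public static void main(String[] args) {
        SpanNode root = SpanNode.root("8543a55ad8d08b8b", "consumer receives request");
        SpanNode call = root.child("a1", "consumer calls producer");
        call.child("b2", "producer handles request");
        System.out.print(root.render());
    }
}
```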

To use Sleuth, add the following dependency to every service you want to trace:

    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-sleuth</artifactId>
    </dependency>

After adding the spring-cloud-starter-sleuth dependency to ConsumerByRibbon and Producer, start both services, visit http://localhost:8003/helloByFeign, and check the log:

2023-04-02 20:10:09.150  INFO [consumer,8543a55ad8d08b8b,8543a55ad8d08b8b,true] 266333 --- [nio-8003-exec-1] c.n.u.concurrent.ShutdownEnabledTimer    : Shutdown hook installed for:     NFLoadBalancer-PingTimer-producer
2023-04-02 20:10:09.150  INFO [consumer,8543a55ad8d08b8b,8543a55ad8d08b8b,true] 266333 --- [nio-8003-exec-1] c.netflix.loadbalancer.BaseLoadBalancer  : Client: producer instantiated a LoadBalancer: DynamicServerListLoadBalancer:{NFLoadBalancer:name=producer,current list of Servers=[],Load balancer stats=Zone stats: {},Server stats: []}ServerList:null
2023-04-02 20:10:09.166  INFO [consumer,8543a55ad8d08b8b,8543a55ad8d08b8b,true] 266333 --- [nio-8003-exec-1] c.n.l.DynamicServerListLoadBalancer      : Using serverListUpdater PollingServerListUpdater
2023-04-02 20:10:09.203  INFO [consumer,8543a55ad8d08b8b,8543a55ad8d08b8b,true] 266333 --- [nio-8003-exec-1] c.n.l.DynamicServerListLoadBalancer      : DynamicServerListLoadBalancer for client producer initialized: DynamicServerListLoadBalancer:{NFLoadBalancer:name=producer,current list of Servers=[192.168.31.148:8002, 192.168.31.148:8000],Load balancer stats=Zone stats: {defaultzone=[Zone:defaultzone;	Instance count:2;	Active connections count: 0;	Circuit breaker tripped count: 0;	Active connections per server: 0.0;]

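[consumer,8543a55ad8d08b8b,8543a55ad8d08b8b,true] means [service name, trace ID (unique within one request's call chain), span ID (the basic unit of work, such as fetching data), whether this information is exported to Zipkin for collection and display].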
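To make the four fields concrete, here is a small sketch that splits such a bracketed prefix apart. The `SleuthLogPrefix` class is a hypothetical helper written for this article, not something Sleuth provides, and it assumes the four-field format shown in the log above.

```java
// Hypothetical parser for the "[service,traceId,spanId,exportable]" log prefix.
final class SleuthLogPrefix {
    final String service;
    final String traceId;
    final String spanId;
    final boolean exportable; // whether the span is exported to Zipkin

    private SleuthLogPrefix(String service, String traceId, String spanId, boolean exportable) {
        this.service = service;
        this.traceId = traceId;
        this.spanId = spanId;
        this.exportable = exportable;
    }

    static SleuthLogPrefix parse(String prefix) {
        String trimmed = prefix.trim();
        if (!trimmed.startsWith("[") || !trimmed.endsWith("]")) {
            throw new IllegalArgumentException("expected [service,traceId,spanId,exportable]");
        }
        // Strip the brackets, then split into exactly four comma-separated fields.
        String[] parts = trimmed.substring(1, trimmed.length() - 1).split(",", -1);
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected 4 fields but got " + parts.length);
        }
        return new SleuthLogPrefix(parts[0].trim(), parts[1].trim(), parts[2].trim(),
                Boolean.parseBoolean(parts[3].trim()));
    }

    // In a root span the span ID equals the trace ID.
    boolean isRootSpan() {
        return traceId.equals(spanId);
    }

    public static void main(String[] args) {
        SleuthLogPrefix p = parse("[consumer,8543a55ad8d08b8b,8543a55ad8d08b8b,true]");
        System.out.println("service=" + p.service + " traceId=" + p.traceId
                + " spanId=" + p.spanId + " exportable=" + p.exportable
                + " rootSpan=" + p.isRootSpan());
    }
}
```

In the log lines above the trace ID and span ID are identical, which is what a root span looks like.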

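Zipkin

Reading Sleuth's output in the logs is tedious; Zipkin provides a friendly UI.

How it works:

Sleuth collects trace data and sends it to the Zipkin server over HTTP. Zipkin stores the data and exposes a RESTful API, which the Zipkin UI calls to display it.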
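To give a feel for what travels over HTTP, the sketch below builds one span in the JSON shape accepted by Zipkin's v2 API, which takes a JSON array of such objects via POST to /api/v2/spans. Sleuth builds and sends this payload itself, so the code is purely illustrative: the `ZipkinSpanJson` class, the span name, and the timestamp and duration values are assumptions, and only a subset of the span fields is shown.

```java
// Hypothetical builder for a single span in Zipkin v2 JSON format.
final class ZipkinSpanJson {

    // Minimal JSON string quoting: escapes backslashes and double quotes only.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // timestampMicros and durationMicros are in microseconds, as Zipkin v2 expects.
    static String toJson(String traceId, String spanId, String parentId, String name,
                         String kind, long timestampMicros, long durationMicros,
                         String serviceName) {
        StringBuilder sb = new StringBuilder("{");
        sb.append("\"traceId\":").append(quote(traceId));
        sb.append(",\"id\":").append(quote(spanId));
        if (parentId != null) { // root spans have no parentId field
            sb.append(",\"parentId\":").append(quote(parentId));
        }
        sb.append(",\"name\":").append(quote(name));
        sb.append(",\"kind\":").append(quote(kind)); // e.g. CLIENT or SERVER
        sb.append(",\"timestamp\":").append(timestampMicros);
        sb.append(",\"duration\":").append(durationMicros);
        sb.append(",\"localEndpoint\":{\"serviceName\":").append(quote(serviceName)).append("}}");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up values; the body of a POST to /api/v2/spans is an array of spans.
        String span = toJson("8543a55ad8d08b8b", "8543a55ad8d08b8b", null,
                "get /hellobyfeign", "SERVER", 1680437409150000L, 53000L, "consumer");
        System.out.println("[" + span + "]");
    }
}
```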

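Storage is in memory by default; MySQL, Elasticsearch, and other backends can be used instead.
The Zipkin repository is at https://github.com/openzipkin/zipkin.
Download it with curl -sSL https://zipkin.io/quickstart.sh | bash -s, or from the address above, then start it with java -jar zipkin.jar.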

To use Zipkin, add the following dependency to every service you want to trace:

    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-zipkin</artifactId>
    </dependency>

Configure application.properties:

spring.zipkin.base-url=http://localhost:9411/
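# Sampling probability; 1.0 samples every request (Sleuth 2.x property; 1.x used spring.sleuth.sampler.percentage)
spring.sleuth.sampler.probability=1.0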

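Open http://localhost:9411/zipkin/. The UI is empty at first; visit http://localhost:8003/helloByFeign and look again:

Click the show button on the right:

The call chain and the details of each service are now visible.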
