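Thoughts on the Redisson/Jedis "not enough threads" error

Background

Redis-related errors have been showing up at my company a lot lately.
!-_-! You can tell from my recent blog posts.

This particular error actually first turned up two years ago, during load testing over the Qingming Festival.
I did not dig into it properly at the time.
I have learned quite a bit since then, so I think I can now try to analyze it.

Error details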

Whitelabel Error Page
This application has no explicit mapping for /error, so you are seeing this as a fallback.

Fri Apr 02 11:38:27 GMT+08:00 2021
There was an unexpected error (type=Internal Server Error, status=500).
Unable to send command! Try to increase 'nettyThreads' and/or connection pool size settings Node source: NodeSource 
[slot=2958, addr=null, redisClient=null, redirect=null, entry=MasterSlaveEntry [masterEntry=[freeSubscribeConnectionsAmount=0,
freeSubscribeConnectionsCounter=value:50:queue:0, freeConnectionsAmount=29, freeConnectionsCounter=value:61:queue:0, freezed=false, 
freezeReason=null, client=[addr=redis://127.0.0.1:7000], nodeType=MASTER, firstFail=0]]], connection: RedisConnection@1228620800 
[redisClient=[addr=redis://127.0.0.1:7000], channel=[id: 0x6e9362f5, L:/127.0.0.1:55126 - R:127.0.0.1/127.0.0.1:7000], currentCommand=null],
command: (HMSET), params: [[99, 97, 102, 45, 115, 101, 115, 115, 105, 111, ...], [108, 97, 115, 116, 65, 99, 99, 101, 115, 115, ...], 
[-84, -19, 0, 5, 115, 114, 0, 14, 106, 97, ...]] after 3 retry attempts; nested exception is org.redisson.client.RedisTimeoutException:
Unable to send command! Try to increase 'nettyThreads' and/or connection pool size settings Node source: NodeSource [slot=2958, addr=null, 
redisClient=null, redirect=null, entry=MasterSlaveEntry [masterEntry=[freeSubscribeConnectionsAmount=0, freeSubscribeConnectionsCounter=value:50:queue:0,
freeConnectionsAmount=29, freeConnectionsCounter=value:61:queue:0, freezed=false, freezeReason=null, client=[addr=redis://127.0.0.1:7000], 
 nodeType=MASTER, firstFail=0]]], connection: RedisConnection@1228620800 [redisClient=[addr=redis://127.0.0.1:7000], 
channel=[id: 0x6e9362f5, L:/127.0.0.1:55126 - R:127.0.0.1/127.0.0.1:7000], currentCommand=null], command: (HMSET), params: 
[[99, 97, 102, 45, 115, 101, 115, 115, 105, 111, ...], [108, 97, 115, 116, 65, 99, 99, 101, 115, 115, ...], [-84, -19, 0, 5, 115, 114, 0, 14, 106, 97, ...]] 
after 3 retry attempts

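Thoughts on the problem

The core of the problem is simple:
under high concurrency the Jedis or Redisson client needs more connections than its connection pool can supply, and once the pool runs short the command fails with the exception above.
It also suggests that something is wrong with our Redis client or with how we use it: too many connections may be held exclusively for a long time instead of for short bursts, which ties up a large number of threads.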
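
The exhaustion described above can be reproduced in miniature. The sketch below is only an illustration in plain Java: `TinyPool` is a hypothetical stand-in built on a `Semaphore`, not a Jedis or Redisson class, and the pool size of 2 and the 50 ms wait are arbitrary. It shows that once every connection is held, the next borrower times out until something is returned.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class PoolExhaustionDemo {

    /** Hypothetical stand-in for a client-side connection pool (not a Jedis/Redisson class). */
    static final class TinyPool {
        private final Semaphore permits;

        TinyPool(int size) {
            this.permits = new Semaphore(size);
        }

        /** Tries to borrow a connection; returns false if none frees up within timeoutMs. */
        boolean borrow(long timeoutMs) {
            try {
                return permits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }

        /** Returns a previously borrowed connection to the pool. */
        void giveBack() {
            permits.release();
        }
    }

    public static void main(String[] args) {
        TinyPool pool = new TinyPool(2);      // assumed pool size, for illustration only

        boolean first = pool.borrow(50);      // succeeds
        boolean second = pool.borrow(50);     // succeeds, the pool is now empty
        boolean third = pool.borrow(50);      // nothing is returned, so this times out
        System.out.println(first + " " + second + " " + third);   // true true false

        pool.giveBack();                      // a caller finishes and returns its connection
        System.out.println(pool.borrow(50));  // true
    }
}
```

The real clients are more involved, but the shape is the same: a request that cannot get the resource it needs in time fails, and it keeps failing until long-running holders give something back.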

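The problem may also lead to several follow-on issues:
1. With many application servers, Redis ends up with a large number of clients. Once there are that many,
   Redis's epoll-based I/O multiplexing has more sockets to track and its event handling may slow down. The large number of TCP connections is also unfriendly to the operating system.
2. If the application server's netty threads are limited, errors occur sporadically and clients get poor responses.
   If they are raised enough to satisfy the application servers, each Redis node may receive a huge number of connections, possibly more than the Redis server's limit.
3. In cluster mode the application server has to talk to every node, so a cluster of 3 masters and 3 replicas will in theory use roughly six times as many threads.
   The thread count on the application server shoots up, context switching increases, and performance drops. It also puts unwelcome pressure on every Redis node.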
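
A quick back-of-the-envelope calculation makes points 2 and 3 concrete. The sketch below is a rough estimate, not a measurement: the figure of 20 application servers is hypothetical, the pool size of 512 is borrowed from the `masterConnectionPoolSize` value in the Redisson settings later in this post and is applied to every node for simplicity, and 10000 is the Redis default for `maxclients`.

```java
public class ConnectionBudget {

    /** Worst case seen by one Redis node: every application server fills its pool to that node. */
    static long connectionsPerRedisNode(int appServers, int poolSizePerNode) {
        return (long) appServers * poolSizePerNode;
    }

    /** Worst case held by one application server: one full pool per cluster node. */
    static long connectionsPerAppServer(int redisNodes, int poolSizePerNode) {
        return (long) redisNodes * poolSizePerNode;
    }

    public static void main(String[] args) {
        int appServers = 20;        // hypothetical number of application servers
        int redisNodes = 6;         // 3 masters + 3 replicas
        int poolSizePerNode = 512;  // simplification: same pool size for every node
        int maxClients = 10_000;    // Redis default for maxclients

        long perNode = connectionsPerRedisNode(appServers, poolSizePerNode);
        long perApp = connectionsPerAppServer(redisNodes, poolSizePerNode);

        System.out.println("per Redis node: " + perNode);             // per Redis node: 10240
        System.out.println("per app server: " + perApp);              // per app server: 3072
        System.out.println("over limit: " + (perNode > maxClients));  // over limit: true
    }
}
```

Under these assumptions a single Redis node could be asked for more connections than its default limit allows, and each application server could hold around three thousand sockets.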

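Resolving the error

1. In theory, the fix is to increase the number of threads for Redisson or Jedis.
2. Doing so is simple: it only takes a configuration change.

Two examples follow; the max values can be raised to increase the pool size and thread count.

In theory this solves the problem, but it only relieves the symptom, like adding more flour because you poured in too much water.
In theory Redis answers every request within milliseconds, so each use of a connection should be brief.
The real fix is to optimize the code so that it does not hold this many connections for a long time.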
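
Why hold time matters more than pool size can be estimated with Little's law: the average number of connections in use is the request rate multiplied by how long each request holds a connection. The sketch below is illustrative only; the rate of 2000 requests per second and the hold times of 1 ms and 200 ms are made-up numbers, not measurements from our system.

```java
public class HoldTimeEstimate {

    /** Little's law: average connections in use = arrival rate x time each connection is held. */
    static double connectionsInUse(double requestsPerSecond, double holdMillis) {
        return requestsPerSecond * holdMillis / 1000.0;
    }

    public static void main(String[] args) {
        double rate = 2000.0;   // hypothetical requests per second

        // Connection held only for the Redis round trip (about 1 ms).
        System.out.printf("short hold: %.0f%n", connectionsInUse(rate, 1.0));    // short hold: 2

        // Connection held while other slow work runs (about 200 ms).
        System.out.printf("long hold: %.0f%n", connectionsInUse(rate, 200.0));   // long hold: 400
    }
}
```

With the same traffic, shortening the hold time from 200 ms to 1 ms cuts the connections needed from about 400 to about 2, which is a far bigger win than raising any pool limit.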

Jedis settings

  redis:
    cluster:
      max-redirects: 3
      nodes: 10.110.139.190:8001,10.110.139.190:8002,10.110.139.190:8003,10.110.139.190:8004,10.110.139.190:8005,10.110.139.190:8006
# Addresses of the cluster nodes
    password: Testxxxxxxxx
# Redis must have a password set
    jedis:
      pool:
        min-idle: 1
        max-active: 100
        max-wait: -1
        max-idle: 50
# This section is Jedis-specific; it should let you configure the Jedis connection pool.

Redisson settings

caching-configuration:
  enableRedis: true
  redisManagers:
  - name: default
    mode: cluster
    nodes: 10.110.139.190:8001,10.110.139.190:8002,10.110.139.190:8003,10.110.139.190:8004,10.110.139.190:8005,10.110.139.190:8006
    password: Testxxxxxxxx
    idleConnectionTimeout: 100000
    pingTimeout: 100000
    connectTimeout: 100000
    timeout: 30000
    retryAttempts: 30
    retryInterval: 60000
    reconnectionTimeout: 60000
    failedAttempts: 3
    masterConnectionPoolSize: 512
    readMode: SLAVE
    scanInterval: 10000
    threads: 400
    nettyThreads: 400