1. Introduction to the profiling tool

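pprof is Go's built-in profiling tool. It can profile a specific piece of code (runtime/pprof) or collect runtime data from a running HTTP server (net/http/pprof).
Depending on the profile requested, it can analyze CPU usage, memory usage, blocking, and mutex contention:
- CPU Profile: samples the running program at a fixed frequency to find where it spends the most CPU time;
- Mem Profile: records the call stack when the program allocates heap memory, which also helps to track down memory leaks;
- Block Profile: records where goroutines block while waiting on synchronization;
- Mutex Profile: records where goroutines contend for mutexes.
Results can be viewed as files, in an interactive terminal, or in a web UI.
Combined with the graphing tool graphviz, the results can be rendered as call graphs, which makes bottlenecks easier to spot.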
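
As a rough illustration of the profile types listed above, the sketch below asks runtime/pprof which named profiles it exposes. hasProfile is a hypothetical helper written for this example. Note that the CPU profile is not a named profile: it is started and stopped explicitly.

```go
package main

import (
	"fmt"
	"runtime/pprof"
)

// hasProfile reports whether runtime/pprof exposes a named profile.
// (Hypothetical helper for this example.)
func hasProfile(name string) bool {
	return pprof.Lookup(name) != nil
}

func main() {
	// Print every named profile the runtime currently knows about.
	for _, p := range pprof.Profiles() {
		fmt.Printf("%-14s %d entries\n", p.Name(), p.Count())
	}
	// The CPU profile cannot be looked up by name.
	fmt.Println("cpu is a named profile:", hasProfile("cpu"))
}
```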

Installation guide: https://blog.csdn.net/qq_37085158/article/details/126421102

2. Profiling a short piece of code

2.1 Inserting profiling calls into the code

package main

import (
	"log"
	"math/rand"
	"os"
	"runtime/pprof"
	"time"
)

const (
	row = 10000
	col = 10000
)

func fillMatrix(m *[row][col]int) {
	s := rand.New(rand.NewSource(time.Now().UnixNano()))
	for i := 0; i < row; i++ {
		for j := 0; j < col; j++ {
			m[i][j] = s.Intn(100000)
		}
	}
}

func calculate(m *[row][col]int) {
	for i := 0; i < row; i++ {
		tmp := 0
		for j := 0; j < col; j++ {
			tmp += m[i][j]
		}
	}
}

func main() {
	// 1. CPU profile
	f, err := os.Create("cpu.prof")
	if err != nil {
		log.Fatal("could not create CPU profile:", err)
	}
	defer f.Close()
	if err := pprof.StartCPUProfile(f); err != nil { // start sampling
		log.Fatal("could not start CPU profile:", err)
	}
	defer pprof.StopCPUProfile()
	// Main workload
	x := [row][col]int{}
	fillMatrix(&x)
	calculate(&x)

	// 2. Heap (memory) profile
	f1, err := os.Create("mem.prof")
	if err != nil {
		log.Fatal("could not create memory profile:", err)
	}
	if err := pprof.WriteHeapProfile(f1); err != nil {
		log.Fatal("could not write memory profile:", err)
	}
	f1.Close()

	// 3. Goroutine profile
	f2, err := os.Create("goroutine.prof")
	if err != nil {
		log.Fatal("could not create goroutine profile:", err)
	}
	gProf := pprof.Lookup("goroutine")
	if gProf == nil {
		log.Fatal("could not find goroutine profile")
	}
	if err := gProf.WriteTo(f2, 0); err != nil {
		log.Fatal("could not write goroutine profile:", err)
	}
	f2.Close()
}

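The program initializes a two-dimensional array and sums each of its rows. It imports runtime/pprof and calls the following functions around the workload; the results are written to files:

pprof.StartCPUProfile(w): starts CPU profiling and writes the samples to the io.Writer w (here the file cpu.prof);
pprof.StopCPUProfile(): stops CPU profiling and flushes the data;
pprof.WriteHeapProfile(w): writes a snapshot of the heap profile to w;
pprof.Lookup("goroutine"): returns the goroutine profile, which is then written out with WriteTo;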
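
Because StartCPUProfile accepts any io.Writer, the start/stop pair can be wrapped around an arbitrary function. The following is a minimal sketch under that assumption; withCPUProfile, sumTo, and profileSize are hypothetical names introduced for this example and are not part of runtime/pprof.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"runtime/pprof"
)

// withCPUProfile runs work while a CPU profile is streamed to w.
// (Hypothetical helper, not part of runtime/pprof.)
func withCPUProfile(w io.Writer, work func()) error {
	if err := pprof.StartCPUProfile(w); err != nil {
		return err
	}
	defer pprof.StopCPUProfile() // stops sampling and flushes the profile to w
	work()
	return nil
}

// sumTo is a stand-in workload.
func sumTo(n int) int {
	total := 0
	for i := 1; i <= n; i++ {
		total += i
	}
	return total
}

// profileSize profiles sumTo(n) into memory and returns the profile size in bytes.
func profileSize(n int) (int, error) {
	var buf bytes.Buffer
	err := withCPUProfile(&buf, func() { _ = sumTo(n) })
	return buf.Len(), err
}

func main() {
	size, err := profileSize(50_000_000)
	if err != nil {
		log.Fatal("could not profile:", err)
	}
	fmt.Printf("CPU profile: %d bytes\n", size)
}
```

Writing to a bytes.Buffer keeps the example self-contained; in a real program w would usually be a file, as in the code above.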

2.2 Running the code with profiling enabled

# Build the code
go build prof.go
# Run it to generate the profile files
./prof
# Analyze a profile: go tool pprof <executable> <profile file>
go tool pprof prof cpu.prof
go tool pprof prof mem.prof
go tool pprof prof goroutine.prof

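To generate a graph, type svg at the interactive pprof prompt.

Opening the resulting SVG file in a browser shows which part of the program takes the most time, which makes it easier to decide where to optimize.

The profile can also be explored in pprof's web UI: go tool pprof -http=:8080 cpu.prof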

3. Profiling a program running as a server

3.1 Server code

package main

import (
	"fmt"
	"log"
	"net/http"
	_ "net/http/pprof" // must be imported: it registers the /debug/pprof handlers
)

// GetFibonacciSerie returns the first n Fibonacci numbers.
func GetFibonacciSerie(n int) []int {
	ret := make([]int, 2, n)
	ret[0] = 1
	ret[1] = 1
	for i := 2; i < n; i++ {
		ret = append(ret, ret[i-2]+ret[i-1])
	}
	return ret
}

func Index(w http.ResponseWriter, r *http.Request) {
	w.Write([]byte("Welcome!"))
}

func CreateFib(w http.ResponseWriter, r *http.Request) {
	var fbs []int
	for i := 0; i < 1000000; i++ {
		fbs = GetFibonacciSerie(50)
	}
	w.Write([]byte(fmt.Sprintf("%v", fbs)))
}

func main() {
	http.HandleFunc("/", Index)
	http.HandleFunc("/fb", CreateFib)
	log.Fatal(http.ListenAndServe(":8081", nil))
}
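
The server above relies on the blank import registering its handlers on http.DefaultServeMux. If a service uses its own mux, the handlers can be registered explicitly. The sketch below assumes that setup; newDebugMux and statusOf are hypothetical helpers, while pprof.Index, Cmdline, Profile, Symbol, and Trace are the handlers exported by net/http/pprof.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newDebugMux returns a mux that serves only the pprof endpoints.
// (Hypothetical helper for this example.)
func newDebugMux() *http.ServeMux {
	mux := http.NewServeMux()
	// Index also serves the named profiles such as heap, goroutine, block, and mutex.
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// statusOf issues an in-memory GET request and returns the HTTP status code.
func statusOf(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	mux := newDebugMux()
	fmt.Println("index:", statusOf(mux, "/debug/pprof/"))
	fmt.Println("heap: ", statusOf(mux, "/debug/pprof/heap"))
}
```

Serving the debug mux on a separate, non-public port is a common design choice, since the profiles expose internal details of the process.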

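3.2 Profiling

A request to localhost:8081/fb computes the first 50 Fibonacci numbers 1,000,000 times in a loop. Start the server and open http://localhost:8081/debug/pprof to view the profiling data.

In practice, however, the interactive command-line mode is used more often.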

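(1) go tool pprof http://localhost:8081/debug/pprof/profile?seconds=60 samples CPU usage for 60 seconds and then opens the profile. In the output of the top command, the columns mean:

flat: time spent in the function itself
flat%: flat as a percentage of the total sampled time
sum%: running total of flat% from the first row down to this one
cum: time spent in the function plus the functions it calls
cum%: cum as a percentage of the total sampled time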
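
The difference between flat and cum can be made concrete by counting samples by hand. The sketch below is a simplified model, not pprof's implementation: each sample is a call stack (caller first, leaf last), flat counts the samples where a function is the leaf, and cum counts the samples where it appears anywhere on the stack. The four stacks are made-up data for the Fibonacci server.

```go
package main

import "fmt"

// flatCum counts, per function, the samples in which it was executing
// (flat: it is the last frame) and the samples in which it appears
// anywhere on the stack (cum). Stacks are ordered caller first, leaf last.
func flatCum(stacks [][]string) (flat, cum map[string]int) {
	flat = make(map[string]int)
	cum = make(map[string]int)
	for _, stack := range stacks {
		if len(stack) == 0 {
			continue
		}
		flat[stack[len(stack)-1]]++
		seen := make(map[string]bool) // count a recursive function once per sample
		for _, fn := range stack {
			if !seen[fn] {
				seen[fn] = true
				cum[fn]++
			}
		}
	}
	return flat, cum
}

func main() {
	// Four hypothetical samples.
	stacks := [][]string{
		{"CreateFib", "GetFibonacciSerie"},
		{"CreateFib", "GetFibonacciSerie"},
		{"CreateFib", "fmt.Sprintf"},
		{"CreateFib"},
	}
	flat, cum := flatCum(stacks)
	for _, fn := range []string{"CreateFib", "GetFibonacciSerie", "fmt.Sprintf"} {
		fmt.Printf("%-18s flat=%d cum=%d\n", fn, flat[fn], cum[fn])
	}
}
```

In this model CreateFib has a small flat value but the largest cum value, which is the typical signature of a function that mostly waits on its callees.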

(2) go tool pprof http://localhost:8081/debug/pprof/heap analyzes memory usage.

(3) go tool pprof http://localhost:8081/debug/pprof/block analyzes blocking.

(4) go tool pprof http://localhost:8081/debug/pprof/mutex analyzes mutex contention.
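
One caveat worth knowing: the block and mutex profiles stay empty unless sampling is switched on in the program, because both are disabled by default. A minimal sketch using runtime.SetBlockProfileRate and runtime.SetMutexProfileFraction follows; the wrapper names and the chosen rates are assumptions for this example.

```go
package main

import (
	"fmt"
	"runtime"
)

// enableContentionProfiles turns on block and mutex sampling.
// blockRate is in nanoseconds of blocked time per sample (1 records every event);
// on average 1/fraction of mutex contention events are reported.
// (Hypothetical wrapper for this example.)
func enableContentionProfiles(blockRate, fraction int) {
	runtime.SetBlockProfileRate(blockRate)
	runtime.SetMutexProfileFraction(fraction)
}

// currentMutexFraction reads the mutex sampling fraction without changing it:
// a negative argument only returns the current value.
func currentMutexFraction() int {
	return runtime.SetMutexProfileFraction(-1)
}

func main() {
	// Call this early in main, before the HTTP server starts.
	enableContentionProfiles(1, 1)
	fmt.Println("mutex profile fraction:", currentMutexFraction())
}
```

Recording every event adds overhead, so a larger rate or fraction is usually a better fit outside of debugging sessions.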

4. Performance tuning

4.1 Code to be tuned

// struct.go
package testProfileUpdate

type Request struct {
	TransactionID string `json:"transaction_id"`
	PayLoad       []int  `json:"pay_load"`
}

type Response struct {
	TransactionID string `json:"transaction_id"`
	Expression    string `json:"expression"`
}



// optimization.go
package testProfileUpdate

import (
	"encoding/json"
	"strconv"
)

// createRequest builds a sample request.
func createRequest() string {
	payload := make([]int, 100, 100)
	for i := 0; i < 100; i++ {
		payload[i] = i
	}
	req := Request{"demo_transaction", payload}
	v, err := json.Marshal(&req)
	if err != nil {
		panic(err)
	}
	return string(v)
}

// processRequest handles the requests; this is the optimization target.
func processRequest(reqs []string) []string {
	reps := []string{}
	for _, req := range reqs {
		reqObj := &Request{}
		if err := json.Unmarshal([]byte(req), reqObj); err != nil {
			panic(err)
		}
		ret := ""
		for _, e := range reqObj.PayLoad {
			ret += strconv.Itoa(e) + ","
		}
		repObj := &Response{reqObj.TransactionID, ret}
		repJson, err := json.Marshal(repObj)
		if err != nil {
			panic(err)
		}
		reps = append(reps, string(repJson))
	}
	return reps
}

// optimization_test.go
package testProfileUpdate

import "testing"

func TestCreateRequest(t *testing.T) {
	str := createRequest()
	t.Log(str)
}

func TestProcessRequest(t *testing.T) {
	reqs := []string{}
	reqs = append(reqs, createRequest())
	reps := processRequest(reqs)
	t.Log(reps[0])
}

func BenchmarkProcessResultOld(b *testing.B) {
	reqs := []string{}
	reqs = append(reqs, createRequest())
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_ = processRequest(reqs)
	}
	b.StopTimer()
}

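4.2 Profiling the original code

(1) Run go test -bench=. -cpuprofile=cpu.prof to produce a CPU profile of the original code.

(2) Run go tool pprof cpu.prof to analyze it.

processRequest accounts for most of the time, so the next step is to look at it in detail.

(3) Run list processRequest at the pprof prompt to see the cost of each line of the function.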

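4.3 Tuning

(1) Replace Go's standard encoding/json package with the easyjson library:

After importing easyjson, first generate the easyjson marshaling code for the structs in struct.go;

(2) Write a new request-processing function that uses the generated methods: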

// processRequestNew handles the requests; version after the first optimization.
func processRequestNew(reqs []string) []string {
	reps := []string{}
	for _, req := range reqs {
		reqObj := &Request{}
		if err := reqObj.UnmarshalJSON([]byte(req)); err != nil { // generated by easyjson
			panic(err)
		}
		ret := ""
		for _, e := range reqObj.PayLoad {
			ret += strconv.Itoa(e) + ","
		}
		repObj := &Response{reqObj.TransactionID, ret}
		repJson, err := repObj.MarshalJSON() // generated by easyjson
		if err != nil {
			panic(err)
		}
		reps = append(reps, string(repJson))
	}
	return reps
}

(3) Optimize the string concatenation with strings.Builder (this also requires importing the strings package):

func processRequestNew(reqs []string) []string {
	reps := []string{}
	for _, req := range reqs {
		reqObj := &Request{}
		reqObj.UnmarshalJSON([]byte(req))
		var buf strings.Builder
		for _, e := range reqObj.PayLoad { // string concatenation is optimized here
			buf.WriteString(strconv.Itoa(e))
			buf.WriteString(",")
		}
		repObj := &Response{
			reqObj.TransactionID, buf.String(),
		}
		repJson, err := repObj.MarshalJSON()
		if err != nil {
			panic(err)
		}
		reps = append(reps, string(repJson))
	}
	return reps
}
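
To see why the builder helps, the two concatenation strategies can be isolated from the JSON handling and compared by allocation count. This is a rough sketch: joinConcat and joinBuilder are hypothetical names that mirror the inner loops above, and testing.AllocsPerRun reports the average number of heap allocations per call.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"testing"
)

// joinConcat mirrors the original loop: every += builds a new string.
func joinConcat(nums []int) string {
	ret := ""
	for _, n := range nums {
		ret += strconv.Itoa(n) + ","
	}
	return ret
}

// joinBuilder mirrors the optimized loop: bytes are appended to one growing buffer.
func joinBuilder(nums []int) string {
	var buf strings.Builder
	for _, n := range nums {
		buf.WriteString(strconv.Itoa(n))
		buf.WriteByte(',')
	}
	return buf.String()
}

func main() {
	nums := make([]int, 100)
	for i := range nums {
		nums[i] = i
	}
	concat := testing.AllocsPerRun(100, func() { _ = joinConcat(nums) })
	builder := testing.AllocsPerRun(100, func() { _ = joinBuilder(nums) })
	fmt.Println("allocations per call with +=:             ", concat)
	fmt.Println("allocations per call with strings.Builder:", builder)
}
```

The exact numbers depend on the Go version, so none are quoted here; the point is that the builder grows one buffer instead of copying the whole string on every iteration.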

(4) Performance before and after the optimization:

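5. Performance tuning: locks

Common synchronization primitives in Go include the mutex sync.Mutex and the read-write lock sync.RWMutex; sync.WaitGroup, used below, is not a lock but waits for a group of goroutines to finish.

5.1 When reads dominate, does a read lock matter?

The following benchmark compares reading with a read lock against reading with no lock at all in a read-heavy workload: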

package testReadLock

import (
	"fmt"
	"sync"
	"testing"
)

var cache map[string]string

const (
	NUM_OF_READER = 10000
)

func init() {
	cache = make(map[string]string)
	cache["a"] = "aa"
	cache["b"] = "bb"
}

// lockFreeAccess reads the map from many goroutines without any lock.
func lockFreeAccess() {
	var wg sync.WaitGroup
	wg.Add(NUM_OF_READER)
	for i := 0; i < NUM_OF_READER; i++ {
		go func() {
			for j := 0; j < NUM_OF_READER; j++ {
				_, ok := cache["a"]
				if !ok {
					fmt.Println("Nothing")
				}
			}
			wg.Done()
		}()
	}
	wg.Wait()
}

// lockAccess performs the same reads, each guarded by a read lock.
func lockAccess() {
	var wg sync.WaitGroup
	wg.Add(NUM_OF_READER)
	lock := new(sync.RWMutex)
	for i := 0; i < NUM_OF_READER; i++ {
		go func() {
			for j := 0; j < NUM_OF_READER; j++ {
				lock.RLock()
				_, ok := cache["a"]
				if !ok {
					fmt.Println("Nothing")
				}
				lock.RUnlock()
			}
			wg.Done()
		}()
	}
	wg.Wait()
}

func BenchmarkLockFree(b *testing.B) {
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		lockFreeAccess()
	}
	b.StopTimer()
}

func BenchmarkLock(b *testing.B) {
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		lockAccess()
	}
	b.StopTimer()
}

The results show that a read lock does have a cost, and the difference is several orders of magnitude.

5.2 Built-in maps are not goroutine-safe: which safe map to choose in each case

Commonly used goroutine-safe maps in Go include:

LockMap: a map combined with sync.Mutex;
RwLockMap: a map combined with sync.RWMutex;
sync.Map
concurrent-map

package testConcurrentMap

import (
	cmap "github.com/orcaman/concurrent-map"
	"strconv"
	"sync"
	"testing"
)

var NumOfReader int = 10000 // number of reader goroutines
var NumOfWriter int = 10    // number of writer goroutines

type LockMap struct {
	m map[string]interface{}
	sync.Mutex
}

type RwLockMap struct {
	m map[string]interface{}
	sync.RWMutex
}

type SyncMap struct {
	sync.Map
}

type ConcurrentMap struct {
	m cmap.ConcurrentMap
}

type Map interface {
	Set(key string, val interface{})
	Get(key string) (interface{}, bool)
	Del(key string)
}

// LockMap implementation
func (lm *LockMap) Set(key string, val interface{}) {
	lm.Lock()
	lm.m[key] = val
	lm.Unlock()
}

func (lm *LockMap) Get(key string) (interface{}, bool) {
	lm.Lock()
	res, ok := lm.m[key]
	lm.Unlock()
	return res, ok
}

func (lm *LockMap) Del(key string) {
	lm.Lock()
	delete(lm.m, key)
	lm.Unlock()
}

// RwLockMap implementation
func (rlm *RwLockMap) Set(key string, val interface{}) {
	rlm.Lock()
	rlm.m[key] = val
	rlm.Unlock()
}

func (rlm *RwLockMap) Get(key string) (interface{}, bool) {
	rlm.RLock()
	res, ok := rlm.m[key]
	rlm.RUnlock()
	return res, ok
}

func (rlm *RwLockMap) Del(key string) {
	rlm.Lock()
	delete(rlm.m, key)
	rlm.Unlock()
}

// SyncMap implementation
func (sm *SyncMap) Set(key string, val interface{}) {
	sm.Store(key, val)
}

func (sm *SyncMap) Get(key string) (interface{}, bool) {
	return sm.Load(key)
}

func (sm *SyncMap) Del(key string) {
	sm.Delete(key)
}

// ConcurrentMap implementation
func (cm *ConcurrentMap) Set(key string, val interface{}) {
	cm.m.Set(key, val)
}

func (cm *ConcurrentMap) Get(key string) (interface{}, bool) {
	return cm.m.Get(key)
}

func (cm *ConcurrentMap) Del(key string) {
	cm.m.Remove(key)
}

// CreateLockMap creates a LockMap.
func CreateLockMap() Map {
	return &LockMap{
		m: make(map[string]interface{}),
	}
}

// CreateRwLockMap creates an RwLockMap.
func CreateRwLockMap() Map {
	return &RwLockMap{
		m: make(map[string]interface{}),
	}
}

// CreateSyncMap creates a SyncMap.
func CreateSyncMap() Map {
	return &SyncMap{}
}

// CreateConcurrentMap creates a ConcurrentMap.
func CreateConcurrentMap() Map {
	return &ConcurrentMap{
		m: cmap.New(),
	}
}

func TestSyncMap(t *testing.T) {
	m := CreateSyncMap()
	m.Set("a", 1)
	m.Set("b", 2)
	t.Log(m.Get("a"))
}

func TestConcurrentMap(t *testing.T) {
	m := CreateConcurrentMap()
	m.Set("a", 1)
	m.Set("b", 2)
	t.Log(m.Get("a"))
}

func benchmarkMap(b *testing.B, hm Map) {
	b.ResetTimer()
	defer b.StopTimer()
	for i := 0; i < b.N; i++ {
		var wg sync.WaitGroup
		for j := 0; j < NumOfWriter; j++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				for i := 0; i < 100; i++ {
					hm.Set(strconv.Itoa(i), i*i)
					hm.Del(strconv.Itoa(i))
				}
			}()
		}
		for j := 0; j < NumOfReader; j++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				for i := 0; i < 100; i++ {
					hm.Get(strconv.Itoa(i))
				}
			}()
		}
		wg.Wait()
	}
}

func BenchmarkSyncmap(b *testing.B) {
	b.Run("map with Lock", func(b *testing.B) {
		hm := CreateLockMap()
		benchmarkMap(b, hm)
	})
	b.Run("map with RWLock", func(b *testing.B) {
		hm := CreateRwLockMap()
		benchmarkMap(b, hm)
	})
	b.Run("sync_map", func(b *testing.B) {
		hm := CreateSyncMap()
		benchmarkMap(b, hm)
	})
	b.Run("concurrent map", func(b *testing.B) {
		hm := CreateConcurrentMap()
		benchmarkMap(b, hm)
	})
}

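(1) Read-heavy workload:

With many reads and few writes, sync.Map and concurrent-map both perform better than the lock-based maps, and sync.Map is the faster of the two.

(2) Write-heavy workload:

With few reads and many writes, concurrent-map performs best.

(3) Balanced reads and writes:

With roughly equal reads and writes, concurrent-map still performs best.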

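(4) A brief look at how these goroutine-safe maps differ:

sync.Map trades space for time to reduce lock contention. It keeps two regions, read and dirty. A lookup first checks the read region without locking; only if the key is not found there does it check the dirty region, which requires taking a lock. New keys are written to the dirty region. Both regions store pointers to the actual data, so the values themselves are not duplicated. Because a write may cost an extra read on top of taking the lock, sync.Map suits workloads where reads far outnumber writes.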
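
A small usage sketch of the read-mostly pattern that sync.Map is meant for: a few keys are stored once and then read concurrently by many goroutines. newCache and lookup are hypothetical helpers for this example; Store, Load, LoadOrStore, and Delete are methods of sync.Map.

```go
package main

import (
	"fmt"
	"sync"
)

// newCache builds a sync.Map from alternating key/value arguments.
// (Hypothetical helper for this example.)
func newCache(pairs ...string) *sync.Map {
	m := &sync.Map{}
	for i := 0; i+1 < len(pairs); i += 2 {
		m.Store(pairs[i], pairs[i+1])
	}
	return m
}

// lookup returns the cached value for key, or def when the key is absent.
func lookup(m *sync.Map, key, def string) string {
	if v, ok := m.Load(key); ok {
		return v.(string)
	}
	return def
}

func main() {
	m := newCache("a", "aa", "b", "bb")

	// Many concurrent readers, no explicit lock in user code.
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				_ = lookup(m, "a", "")
			}
		}()
	}
	wg.Wait()

	// LoadOrStore keeps the existing value when the key is already present.
	actual, loaded := m.LoadOrStore("a", "zz")
	fmt.Println(actual, loaded) // aa true

	m.Delete("b")
	fmt.Println(lookup(m, "b", "missing")) // missing
}
```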

concurrent-map splits one large map into many small maps (shards), so locking one shard does not block reads and writes on the others.
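
The sharding idea can be sketched in a few dozen lines. This is a simplified illustration of lock striping, not the actual concurrent-map implementation: ShardedMap, the shard count of 32, and the use of FNV-1a hashing are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const shardCount = 32 // arbitrary choice for this sketch

// shard is one small map guarded by its own read-write lock.
type shard struct {
	sync.RWMutex
	items map[string]interface{}
}

// ShardedMap is a simplified illustration, not the concurrent-map library.
type ShardedMap []*shard

func NewShardedMap() ShardedMap {
	m := make(ShardedMap, shardCount)
	for i := range m {
		m[i] = &shard{items: make(map[string]interface{})}
	}
	return m
}

// shardFor hashes the key to pick the shard that owns it.
func (m ShardedMap) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return m[h.Sum32()%shardCount]
}

func (m ShardedMap) Set(key string, val interface{}) {
	s := m.shardFor(key)
	s.Lock() // only this shard is locked; other shards stay available
	s.items[key] = val
	s.Unlock()
}

func (m ShardedMap) Get(key string) (interface{}, bool) {
	s := m.shardFor(key)
	s.RLock()
	val, ok := s.items[key]
	s.RUnlock()
	return val, ok
}

func (m ShardedMap) Del(key string) {
	s := m.shardFor(key)
	s.Lock()
	delete(s.items, key)
	s.Unlock()
}

func main() {
	m := NewShardedMap()
	m.Set("a", 1)
	m.Set("b", 2)
	v, ok := m.Get("a")
	fmt.Println(v, ok) // 1 true
	m.Del("a")
	_, ok = m.Get("a")
	fmt.Println(ok) // false
}
```

Since the method set matches the Map interface used in the benchmark above, a type like this could be benchmarked alongside the other four implementations.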