Deploying a GPU Monitoring Endpoint (Windows)

Published: 2023-12-19 09:17:22  Author: 牛奶打倒小怪兽
  • Purpose:

Use nvidia_gpu_exporter together with Prometheus and Grafana to monitor GPU performance.

  • Environment setup:

Open Windows PowerShell ISE as administrator.

# Check whether TLS 1.2 is enabled
[Net.ServicePointManager]::SecurityProtocol

# If TLS 1.2 is enabled, continue to the next step; if not, enable it:
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12

  • Installation:

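# Allow local scripts to run for the current user
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
# Install Scoop (standard form, for a non-elevated session)
Invoke-RestMethod -Uri https://get.scoop.sh | Invoke-Expression
# In an elevated (administrator) session, use this form instead
iex "& {$(irm get.scoop.sh)} -RunAsAdmin"

# Scoop needs git to add buckets
scoop install git

# Install NSSM and the exporter system-wide
scoop install nssm --global
scoop bucket add nvidia_gpu_exporter https://github.com/utkuozdemir/scoop_nvidia_gpu_exporter.git
scoop install nvidia_gpu_exporter/nvidia_gpu_exporter --global
# Open the exporter's port in the firewall
New-NetFirewallRule -DisplayName "Nvidia GPU Exporter" -Direction Inbound -Action Allow -Protocol TCP -LocalPort 9835
# Register the exporter as a Windows service and start it
nssm install nvidia_gpu_exporter "C:\ProgramData\scoop\apps\nvidia_gpu_exporter\current\nvidia_gpu_exporter.exe"
Start-Service nvidia_gpu_exporter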
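
Once the firewall rule is in place and the service is running, it may be worth confirming that port 9835 accepts TCP connections from the machine where Prometheus will run. The following is a minimal Python sketch of such a check; the helper name `port_open` and the host name in the usage note are illustrative assumptions, not part of the exporter or of this setup.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it only tests reachability and
    does not verify that the exporter itself is answering.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures
        return False
```

For example, `port_open("gpu-host.example", 9835)` run from the Prometheus machine (with the placeholder replaced by the real address of the Windows host) should return `True` if the firewall rule and the service are working.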

  • Verifying the result:

Open http://localhost:9835/metrics in a browser.
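
The same check can be scripted instead of done in a browser. Below is a rough Python sketch that downloads the metrics page and keeps only the series whose names start with a given prefix. The function names are hypothetical, the `nvidia_smi_` prefix in the usage note is an assumption about how the exporter names its metrics, and the parser assumes lines of the form `name{labels} value` with no trailing timestamp.

```python
import urllib.request


def fetch_metrics(url="http://localhost:9835/metrics", timeout=5):
    """Download the raw metrics text from the exporter (URL taken from the step above)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text, prefix=""):
    """Return (series, value) pairs for sample lines whose name starts with prefix.

    Assumption: each sample line is 'name{labels} value' without a timestamp,
    so the value is whatever follows the last space on the line.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        if series.startswith(prefix):
            samples.append((series, float(value)))
    return samples
```

Calling `parse_metrics(fetch_metrics(), prefix="nvidia_smi_")` on the GPU host should return a non-empty list when the exporter is reporting GPU data; an empty list would suggest checking the actual metric names on the page.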