Environment setup
TVM environment
The installed conda version is 4.12.0, which corresponds to the Anaconda 2022.05 release.
wget https://repo.anaconda.com/archive/Anaconda3-2022.05-Linux-x86_64.sh
Official TVM guide for installing with conda:
https://tvm.apache.org/docs/install/from_source.html
git clone --recursive https://github.com/apache/tvm tvm
sudo apt-get update
sudo apt-get install -y python3 python3-dev python3-setuptools gcc libtinfo-dev zlib1g-dev build-essential cmake libedit-dev libxml2-dev
- Download the prebuilt LLVM 15.0.6 release: https://github.com/llvm/llvm-project/releases/download/llvmorg-15.0.6/clang+llvm-15.0.6-x86_64-linux-gnu-ubuntu-18.04.tar.xz
- Build with ninja, inside the nerf environment
- pybind11 not found: install it with conda
tvm tutorial
- tvmc Package "onnx" is not installed
tvmc
The TVM command-line tool.
Usage
tvmc --help
The most commonly used subcommands are compile, run, and tune.
Obtaining the model
tvmc supports Keras, ONNX, TensorFlow, TFLite, and Torch models. Use --model-format to state explicitly which format the model is in.
wget https://github.com/onnx/models/raw/b9a54e89508f101a1611cd64f4ef56b9cb62c7cf/vision/classification/resnet/model/resnet50-v2-7.onnx
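Compiling the ONNX model
Use tvmc compile to compile the model. The output is a tar archive containing a dynamic library for the target platform, which the TVM runtime can run on the target device.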
# This may take several minutes depending on your machine
tvmc compile \
--target "llvm" \
--input-shapes "data:[1,3,224,224]" \
--output resnet50-v2-7-tvm.tar \
resnet50-v2-7.onnx
Unpack the generated tar file:
mkdir model
tar -xvf resnet50-v2-7-tvm.tar -C model
ls model
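The archive contains three files:
- mod.so is the model, represented as a C++ library that the TVM runtime can load
- mod.json is a text file describing the TVM Relay computation graph
- mod.params contains the parameters of the pre-trained model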
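The same inspection can be done from Python without unpacking the archive to disk. The following is a minimal sketch that uses only the standard-library tarfile module; the helper name list_archive_members is made up for illustration, and the archive name is the one written by the tvmc compile command above.

```python
import os
import tarfile


def list_archive_members(archive_path):
    """Return the sorted names of the regular files stored in a tar archive."""
    with tarfile.open(archive_path, "r") as tar:
        return sorted(m.name for m in tar.getmembers() if m.isfile())


# Archive name taken from the tvmc compile step; skipped if it has not been built yet.
archive = "resnet50-v2-7-tvm.tar"
if os.path.exists(archive):
    for name in list_archive_members(archive):
        print(name)
```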
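Running the model compiled with TVMC
Running a model with tvmc requires two things:
- the compiled module, produced in the previous section
- valid input for the model
Each model expects particular tensor shapes, formats, and data types. For this reason, most models need some preprocessing and postprocessing to make sure the input is valid and to interpret the output. tvmc uses NumPy's .npz format for both input and output data.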
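As a quick illustration of the .npz format, the sketch below writes a named array to an archive and reads it back. The array name "data" and the shape follow the --input-shapes argument used in the compile step; the file name dummy_input.npz and the all-zero tensor are placeholders, not a real model input.

```python
import os
import tempfile

import numpy as np

# Placeholder tensor with the shape given to --input-shapes above.
dummy_input = np.zeros((1, 3, 224, 224), dtype="float32")

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "dummy_input.npz")
    # The keyword name ("data") becomes the array's name inside the archive.
    np.savez(path, data=dummy_input)
    with np.load(path) as archive:
        stored_names = list(archive.files)
        restored = archive["data"]

print(stored_names, restored.shape, restored.dtype)
```

An .npz file is a zip archive of named arrays, which is why a single file can carry every input or output tensor of a model.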
Preprocessing
The ResNet-50 v2 model expects its input in ImageNet format.
preprocess.py
#!python ./preprocess.py
from tvm.contrib.download import download_testdata
from PIL import Image
import numpy as np
img_url = "https://s3.amazonaws.com/model-server/inputs/kitten.jpg"
img_path = download_testdata(img_url, "imagenet_cat.png", module="data")
# Resize it to 224x224
resized_image = Image.open(img_path).resize((224, 224))
img_data = np.asarray(resized_image).astype("float32")
# ONNX expects NCHW input, so convert the array
img_data = np.transpose(img_data, (2, 0, 1))
# Normalize according to ImageNet
imagenet_mean = np.array([0.485, 0.456, 0.406])
imagenet_stddev = np.array([0.229, 0.224, 0.225])
norm_img_data = np.zeros(img_data.shape).astype("float32")
for i in range(img_data.shape[0]):
    norm_img_data[i, :, :] = (img_data[i, :, :] / 255 - imagenet_mean[i]) / imagenet_stddev[i]
# Add batch dimension
img_data = np.expand_dims(norm_img_data, axis=0)
# Save to .npz (outputs imagenet_cat.npz)
np.savez("imagenet_cat", data=img_data)
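The per-channel loop in the script can also be written with NumPy broadcasting. The sketch below is an alternative formulation, not part of the original tutorial: the function name normalize_imagenet is made up, and random pixels stand in for the resized kitten image so that the example runs without downloading anything.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype="float32")
IMAGENET_STDDEV = np.array([0.229, 0.224, 0.225], dtype="float32")


def normalize_imagenet(img_hwc):
    """Turn an HxWx3 image with values in 0..255 into a normalized 1x3xHxW float32 tensor."""
    # HWC -> CHW, matching the NCHW layout the ONNX model expects.
    chw = np.transpose(np.asarray(img_hwc, dtype="float32"), (2, 0, 1))
    # Reshape the per-channel statistics to (3, 1, 1) so they broadcast over H and W.
    mean = IMAGENET_MEAN.reshape(3, 1, 1)
    stddev = IMAGENET_STDDEV.reshape(3, 1, 1)
    normalized = (chw / 255.0 - mean) / stddev
    # Add the batch dimension.
    return np.expand_dims(normalized, axis=0).astype("float32")


# Random pixels stand in for the resized 224x224 image.
fake_image = np.random.default_rng(0).integers(0, 256, size=(224, 224, 3))
batch = normalize_imagenet(fake_image)
print(batch.shape, batch.dtype)
```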
Running the compiled module
With the model and the input data in hand, tvmc can now make a prediction.
tvmc run \
--inputs imagenet_cat.npz \
--output predictions.npz \
resnet50-v2-7-tvm.tar
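After the command above finishes, tvmc writes a new file, predictions.npz, which contains the output tensors in NumPy format.
Postprocessing the output
Render the output of ResNet-50 v2 in a more human-readable form.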
#!python ./postprocess.py
import os.path
import numpy as np
from scipy.special import softmax
from tvm.contrib.download import download_testdata
# Download a list of labels
labels_url = "https://s3.amazonaws.com/onnx-model-zoo/synset.txt"
labels_path = download_testdata(labels_url, "synset.txt", module="data")
with open(labels_path, "r") as f:
    labels = [l.rstrip() for l in f]
output_file = "predictions.npz"
# Open the output and read the output tensor
if os.path.exists(output_file):
    with np.load(output_file) as data:
        scores = softmax(data["output_0"])
        scores = np.squeeze(scores)
        ranks = np.argsort(scores)[::-1]
        for rank in ranks[0:5]:
            print("class='%s' with probability=%f" % (labels[rank], scores[rank]))
Running this script produces the following output:
python postprocess.py
# class='n02123045 tabby, tabby cat' with probability=0.610553
# class='n02123159 tiger cat' with probability=0.367179
# class='n02124075 Egyptian cat' with probability=0.019365
# class='n02129604 tiger, Panthera tigris' with probability=0.001273
# class='n04040759 radiator' with probability=0.000261
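The ranking step of the script can be isolated into a small function that depends only on NumPy. This is a sketch under stated assumptions: the names softmax and top_k are local helpers written for illustration (the script above uses scipy.special.softmax instead), softmax is taken over the last axis, and the three labels and logits are toy values rather than real model output.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def top_k(logits, labels, k=5):
    """Return the k most probable (label, probability) pairs, best first."""
    scores = np.squeeze(softmax(np.asarray(logits, dtype="float64")))
    ranks = np.argsort(scores)[::-1][:k]
    return [(labels[i], float(scores[i])) for i in ranks]


# Toy logits for three made-up classes, shaped (1, 3) like a batch of one.
toy_logits = np.array([[2.0, 1.0, 0.1]])
toy_labels = ["cat", "dog", "radiator"]
for label, prob in top_k(toy_logits, toy_labels, k=2):
    print("class='%s' with probability=%f" % (label, prob))
```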
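Automatically tuning the ResNet model
The previous model was compiled to run on the TVM runtime but did not include any platform-specific optimization. This section shows how to use tvmc to build a model optimized for the target platform.
Sometimes a compiled model does not reach the expected performance. Auto-tuning can find a better configuration for the model and improve performance. The tune subcommand runs the search and writes the results to a tuning-records file.
In its simplest form, tuning requires three things:
- the target platform the model will run on
- the path to the output file for the tuning records
- the path to the model to be tuned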
# The default search algorithm requires xgboost, see below for further
# details on tuning search algorithms
pip install xgboost
tvmc tune \
--target "llvm" \
--output resnet50-v2-7-autotuner_records.json \
resnet50-v2-7.onnx
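To make the mapping between the three required pieces and the command line explicit, the sketch below assembles the same invocation from Python. The helper name build_tune_command is hypothetical; only the --target and --output flags shown above are used, and the command is printed rather than executed because tuning can run for hours.

```python
import shlex


def build_tune_command(target, records_path, model_path):
    """Assemble the minimal tvmc tune command line from its three required pieces."""
    return [
        "tvmc", "tune",
        "--target", target,        # platform to tune for
        "--output", records_path,  # file the tuning records are written to
        model_path,                # model to be tuned
    ]


command = build_tune_command(
    "llvm", "resnet50-v2-7-autotuner_records.json", "resnet50-v2-7.onnx"
)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) on a machine where tvmc is installed.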
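tvmc tries many different parameter configurations for the operators in the model and keeps the ones that run fastest on the platform. The search can take several hours. Its results are saved to resnet50-v2-7-autotuner_records.json, which is later used to compile the optimized model.
By default, the search is guided by the XGBoost Grid algorithm.
The output looks similar to the following: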
tvmc tune \
--target "llvm -mcpu=broadwell" \
--output resnet50-v2-7-autotuner_records.json \
resnet50-v2-7.onnx
# [Task 1/24] Current/Best: 9.65/ 23.16 GFLOPS | Progress: (60/1000) | 130.74 s Done.
# [Task 1/24] Current/Best: 3.56/ 23.16 GFLOPS | Progress: (192/1000) | 381.32 s Done.
# [Task 2/24] Current/Best: 13.13/ 58.61 GFLOPS | Progress: (960/1000) | 1190.59 s Done.
# [Task 3/24] Current/Best: 31.93/ 59.52 GFLOPS | Progress: (800/1000) | 727.85 s Done.
# [Task 4/24] Current/Best: 16.42/ 57.80 GFLOPS | Progress: (960/1000) | 559.74 s Done.
# [Task 5/24] Current/Best: 12.42/ 57.92 GFLOPS | Progress: (800/1000) | 766.63 s Done.
# [Task 6/24] Current/Best: 20.66/ 59.25 GFLOPS | Progress: (1000/1000) | 673.61 s Done.
# [Task 7/24] Current/Best: 15.48/ 59.60 GFLOPS | Progress: (1000/1000) | 953.04 s Done.
# [Task 8/24] Current/Best: 31.97/ 59.33 GFLOPS | Progress: (972/1000) | 559.57 s Done.
# [Task 9/24] Current/Best: 34.14/ 60.09 GFLOPS | Progress: (1000/1000) | 479.32 s Done.
# [Task 10/24] Current/Best: 12.53/ 58.97 GFLOPS | Progress: (972/1000) | 642.34 s Done.
# [Task 11/24] Current/Best: 30.94/ 58.47 GFLOPS | Progress: (1000/1000) | 648.26 s Done.
# [Task 12/24] Current/Best: 23.66/ 58.63 GFLOPS | Progress: (1000/1000) | 851.59 s Done.
# [Task 13/24] Current/Best: 25.44/ 59.76 GFLOPS | Progress: (1000/1000) | 534.58 s Done.
# [Task 14/24] Current/Best: 26.83/ 58.51 GFLOPS | Progress: (1000/1000) | 491.67 s Done.
# [Task 15/24] Current/Best: 33.64/ 58.55 GFLOPS | Progress: (1000/1000) | 529.85 s Done.
# [Task 16/24] Current/Best: 14.93/ 57.94 GFLOPS | Progress: (1000/1000) | 645.55 s Done.
# [Task 17/24] Current/Best: 28.70/ 58.19 GFLOPS | Progress: (1000/1000) | 756.88 s Done.
# [Task 18/24] Current/Best: 19.01/ 60.43 GFLOPS | Progress: (980/1000) | 514.69 s Done.
# [Task 19/24] Current/Best: 14.61/ 57.30 GFLOPS | Progress: (1000/1000) | 614.44 s Done.
# [Task 20/24] Current/Best: 10.47/ 57.68 GFLOPS | Progress: (980/1000) | 479.80 s Done.
# [Task 21/24] Current/Best: 34.37/ 58.28 GFLOPS | Progress: (308/1000) | 225.37 s Done.
# [Task 22/24] Current/Best: 15.75/ 57.71 GFLOPS | Progress: (1000/1000) | 1024.05 s Done.
# [Task 23/24] Current/Best: 23.23/ 58.92 GFLOPS | Progress: (1000/1000) | 999.34 s Done.
# [Task 24/24] Current/Best: 17.27/ 55.25 GFLOPS | Progress: (1000/1000) | 1428.74 s Done.
Tuning can take a long time, so tvmc tune offers many options for customizing the process.
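To see where the tuning time goes, the progress lines can be summarized with a short script. The sketch below assumes the log format shown above and treats the time on the last line seen for each task as that task's elapsed time, which is an interpretation rather than something the log states. The regular expression and the function name summarize_tuning_log are written for this example only.

```python
import re

# Matches lines such as:
# [Task 1/24] Current/Best: 9.65/ 23.16 GFLOPS | Progress: (60/1000) | 130.74 s Done.
LOG_LINE = re.compile(
    r"\[Task\s+(\d+)/\s*(\d+)\]\s+Current/Best:\s*([\d.]+)/\s*([\d.]+)\s+GFLOPS"
    r"\s*\|\s*Progress:\s*\((\d+)/(\d+)\)\s*\|\s*([\d.]+)\s*s"
)


def summarize_tuning_log(lines):
    """Return {task index: (best GFLOPS, seconds)} from the last line seen per task."""
    summary = {}
    for line in lines:
        match = LOG_LINE.search(line)
        if match:
            task = int(match.group(1))
            summary[task] = (float(match.group(4)), float(match.group(7)))
    return summary


sample = [
    "# [Task 1/24] Current/Best: 9.65/ 23.16 GFLOPS | Progress: (60/1000) | 130.74 s Done.",
    "# [Task 1/24] Current/Best: 3.56/ 23.16 GFLOPS | Progress: (192/1000) | 381.32 s Done.",
    "# [Task 2/24] Current/Best: 13.13/ 58.61 GFLOPS | Progress: (960/1000) | 1190.59 s Done.",
]
summary = summarize_tuning_log(sample)
total_seconds = sum(seconds for _, seconds in summary.values())
print(summary)
print("total seconds:", round(total_seconds, 2))
```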
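Compiling an optimized model with the tuning data
The tuning run above produced tuning records, stored in resnet50-v2-7-autotuner_records.json. This file can be used in two ways:
- as input to further tuning, via tvmc tune --tuning-records
- as input to the compiler
The compiler uses these results to generate high-performance code, via tvmc compile --tuning-records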
tvmc compile \
--target "llvm" \
--tuning-records resnet50-v2-7-autotuner_records.json \
--output resnet50-v2-7-tvm_autotuned.tar \
resnet50-v2-7.onnx
Verify that the optimized model produces the same results:
tvmc run \
--inputs imagenet_cat.npz \
--output predictions.npz \
resnet50-v2-7-tvm_autotuned.tar
python postprocess.py
Check that the predictions are the same (up to small floating-point differences):
# class='n02123045 tabby, tabby cat' with probability=0.610550
# class='n02123159 tiger cat' with probability=0.367181
# class='n02124075 Egyptian cat' with probability=0.019365
# class='n02129604 tiger, Panthera tigris' with probability=0.001273
# class='n04040759 radiator' with probability=0.000261
Comparing the tuned and untuned models
tvmc provides basic tools for benchmarking models against each other.
tvmc run \
--inputs imagenet_cat.npz \
--output predictions.npz \
--print-time \
--repeat 100 \
resnet50-v2-7-tvm_autotuned.tar
mean (ms)   median (ms)   max (ms)   min (ms)   std (ms)
  90.9642       96.1843   492.3016     9.5005    64.9885
tvmc run \
--inputs imagenet_cat.npz \
--output predictions.npz \
--print-time \
--repeat 100 \
resnet50-v2-7-tvm.tar
mean (ms)   median (ms)   max (ms)   min (ms)   std (ms)
 116.9617       99.7234   498.3171    11.4159    64.9802
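The two summaries can be turned into a speedup figure with a few lines of Python. This is a small sketch that only restates the numbers reported above; the helper name speedup is made up for illustration.

```python
# Timing summaries copied from the two tvmc run --print-time outputs above (ms).
tuned = {"mean": 90.9642, "median": 96.1843, "max": 492.3016, "min": 9.5005, "std": 64.9885}
untuned = {"mean": 116.9617, "median": 99.7234, "max": 498.3171, "min": 11.4159, "std": 64.9802}


def speedup(baseline_ms, optimized_ms):
    """How many times faster the optimized run is than the baseline."""
    return baseline_ms / optimized_ms


for stat in ("mean", "median"):
    ratio = speedup(untuned[stat], tuned[stat])
    print("%s: %.4f ms -> %.4f ms (%.2fx)" % (stat, untuned[stat], tuned[stat], ratio))
```

On these numbers the tuned model is roughly 1.29x faster on the mean but only about 1.04x faster on the median. The standard deviation (about 65 ms in both runs) is of the same order as the mean, so the measurements are noisy and the comparison is worth repeating on an otherwise idle machine.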