tvmc-tutorial-526互联

Environment setup

TVM environment

The installed conda version is 4.12.0, which corresponds to the Anaconda 2022.05 release:
wget https://repo.anaconda.com/archive/Anaconda3-2022.05-Linux-x86_64.sh

Official TVM guide for installing from source with conda:
https://tvm.apache.org/docs/install/from_source.html

git clone --recursive https://github.com/apache/tvm tvm
sudo apt-get update
sudo apt-get install -y python3 python3-dev python3-setuptools gcc libtinfo-dev zlib1g-dev build-essential cmake libedit-dev libxml2-dev

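Download the prebuilt LLVM 15.0.6 release: https://github.com/llvm/llvm-project/releases/download/llvmorg-15.0.6/clang+llvm-15.0.6-x86_64-linux-gnu-ubuntu-18.04.tar.xz
Build with ninja, inside the nerf conda environment.
If the build reports "pybind11 not found", install pybind11 with conda.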

tvm tutorial

If tvmc reports: Package "onnx" is not installed, install the onnx Python package.

tvmc

tvmc is the TVM command-line tool.
Usage

tvmc --help

The most commonly used subcommands are compile, run, and tune.

Obtaining the model

tvmc supports Keras, ONNX, TensorFlow, TFLite, and Torch models. Use --model-format to state explicitly which format the model is in.

wget https://github.com/onnx/models/raw/b9a54e89508f101a1611cd64f4ef56b9cb62c7cf/vision/classification/resnet/model/resnet50-v2-7.onnx

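Compiling the ONNX model

Compile the model with tvmc compile. The output is a tar archive that contains a dynamic library for the target platform, which the TVM runtime can load and run on the target device.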

# This may take several minutes depending on your machine
tvmc compile \
--target "llvm" \
--input-shapes "data:[1,3,224,224]" \
--output resnet50-v2-7-tvm.tar \
resnet50-v2-7.onnx
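
The compile step can also be scripted. The sketch below only assembles the argument list for the command shown above. build_compile_cmd is a hypothetical helper name, not part of tvmc, and it uses only the flags already mentioned in these notes (--target, --input-shapes, --output, --model-format).

```python
import shlex


def build_compile_cmd(model, output, target="llvm", input_shapes=None, model_format=None):
    """Assemble a `tvmc compile` argument list (hypothetical helper, not a tvmc API)."""
    cmd = ["tvmc", "compile", "--target", target]
    if input_shapes:
        cmd += ["--input-shapes", input_shapes]
    if model_format:
        cmd += ["--model-format", model_format]
    cmd += ["--output", output, model]
    return cmd


cmd = build_compile_cmd(
    "resnet50-v2-7.onnx",
    "resnet50-v2-7-tvm.tar",
    input_shapes="data:[1,3,224,224]",
)
# Print a shell-quoted version of the command for inspection.
print(shlex.join(cmd))
```

When tvmc is installed, the list can be handed to subprocess.run(cmd, check=True). Passing a list instead of a single string avoids shell quoting problems around the brackets in the shape argument.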

Unpack the generated tar file to see what it contains:

mkdir model
tar -xvf resnet50-v2-7-tvm.tar -C model
ls model

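The archive contains three files:

mod.so is the model, represented as a C++ library that the TVM runtime can load
mod.json is a text file that describes the TVM Relay computation graph
mod.params contains the parameters of the pre-trained model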
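
The archive can also be inspected without unpacking it. This is a minimal sketch using Python's standard tarfile module. list_archive is a name chosen here for illustration, and the archive path assumes the output name used in the compile command above.

```python
import os
import tarfile


def list_archive(path):
    """Return the member names of a tar archive, sorted alphabetically."""
    with tarfile.open(path) as tar:
        return sorted(tar.getnames())


archive = "resnet50-v2-7-tvm.tar"  # assumed: output of the compile step above
if os.path.exists(archive):
    for name in list_archive(archive):
        print(name)
```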

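Running the model compiled with TVMC

Running a model with tvmc requires two things:

The compiled module, produced in the previous section
A valid input for the model
Each model has its own expectations for tensor shape, layout, and data type. For this reason, most models need some pre-processing and post-processing to make sure the input is valid and to interpret the output. tvmc uses NumPy's .npz format for both input and output data.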
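
To make the .npz convention concrete, here is a small round-trip sketch. It assumes the input tensor is named data, as in the --input-shapes argument used when compiling, and it writes to an in-memory buffer instead of a real file such as imagenet_cat.npz.

```python
import io

import numpy as np

# Placeholder tensor with the shape used in these notes: one NCHW image.
dummy_input = np.zeros((1, 3, 224, 224), dtype="float32")

buffer = io.BytesIO()  # stands in for a file on disk
np.savez(buffer, data=dummy_input)  # the keyword becomes the tensor name
buffer.seek(0)

with np.load(buffer) as archive:
    names = list(archive.files)
    shape = archive["data"].shape
    dtype = archive["data"].dtype

print(names, shape, dtype)
```

An .npz file is a collection of named arrays, so the key written by np.savez has to match the input name the model expects.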

Preprocessing

The ResNet-50 v2 model expects its input in ImageNet format.
preprocess.py

#!python ./preprocess.py
from tvm.contrib.download import download_testdata
from PIL import Image
import numpy as np

img_url = "https://s3.amazonaws.com/model-server/inputs/kitten.jpg"
img_path = download_testdata(img_url, "imagenet_cat.png", module="data")

# Resize it to 224x224
resized_image = Image.open(img_path).resize((224, 224))
img_data = np.asarray(resized_image).astype("float32")

# ONNX expects NCHW input, so convert the array
img_data = np.transpose(img_data, (2, 0, 1))

# Normalize according to ImageNet
imagenet_mean = np.array([0.485, 0.456, 0.406])
imagenet_stddev = np.array([0.229, 0.224, 0.225])
norm_img_data = np.zeros(img_data.shape).astype("float32")
for i in range(img_data.shape[0]):
    norm_img_data[i, :, :] = (img_data[i, :, :] / 255 - imagenet_mean[i]) / imagenet_stddev[i]

# Add batch dimension
img_data = np.expand_dims(norm_img_data, axis=0)

# Save to .npz (outputs imagenet_cat.npz)
np.savez("imagenet_cat", data=img_data)
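
The per-channel loop in the script can also be written with NumPy broadcasting. This is an equivalent sketch, not a change to the script above. normalize_chw is a name introduced here, and random data stands in for the resized image so that the block runs without downloading anything.

```python
import numpy as np


def normalize_chw(img_chw):
    """Normalize a CHW image with values in [0, 255] using the ImageNet mean and stddev."""
    mean = np.array([0.485, 0.456, 0.406], dtype="float32").reshape(3, 1, 1)
    std = np.array([0.229, 0.224, 0.225], dtype="float32").reshape(3, 1, 1)
    return ((img_chw.astype("float32") / 255.0 - mean) / std).astype("float32")


# Assumption: random pixel values replace the 224x224 RGB kitten image.
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(3, 224, 224)).astype("float32")

# Add the batch dimension, giving the NCHW shape the model expects.
batched = np.expand_dims(normalize_chw(fake_image), axis=0)
print(batched.shape, batched.dtype)
```

Reshaping the mean and stddev to (3, 1, 1) lets NumPy apply each value to its whole channel plane in one expression.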

Running the compiled module

With both the model and the input data in hand, we can now use tvmc to make a prediction:

tvmc run \
--inputs imagenet_cat.npz \
--output predictions.npz \
resnet50-v2-7-tvm.tar

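After the command finishes, tvmc writes a new file, predictions.npz, which contains the output tensors in NumPy format.

Post-processing the output

The following script renders the ResNet-50 v2 output in a more human-readable form.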

#!python ./postprocess.py
import os.path
import numpy as np

from scipy.special import softmax

from tvm.contrib.download import download_testdata

# Download a list of labels
labels_url = "https://s3.amazonaws.com/onnx-model-zoo/synset.txt"
labels_path = download_testdata(labels_url, "synset.txt", module="data")

with open(labels_path, "r") as f:
    labels = [l.rstrip() for l in f]

output_file = "predictions.npz"

# Open the output and read the output tensor
if os.path.exists(output_file):
    with np.load(output_file) as data:
        scores = softmax(data["output_0"])
        scores = np.squeeze(scores)
        ranks = np.argsort(scores)[::-1]

        for rank in ranks[0:5]:
            print("class='%s' with probability=%f" % (labels[rank], scores[rank]))

Running the script produces the following output:

python postprocess.py
# class='n02123045 tabby, tabby cat' with probability=0.610553
# class='n02123159 tiger cat' with probability=0.367179
# class='n02124075 Egyptian cat' with probability=0.019365
# class='n02129604 tiger, Panthera tigris' with probability=0.001273
# class='n04040759 radiator' with probability=0.000261
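
The post-processing step comes down to a softmax followed by a top-k selection. The sketch below reimplements both with plain NumPy to show the idea. It is not how scipy.special.softmax is written, and the logits are made-up values for a 4-class problem, not model output.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def top_k(scores, k=5):
    """Return (index, probability) pairs for the k highest scores, best first."""
    flat = np.squeeze(scores)
    order = np.argsort(flat)[::-1][:k]
    return [(int(i), float(flat[i])) for i in order]


# Hypothetical logits shaped (1, 4), like a batched model output.
logits = np.array([[2.0, 1.0, 0.1, -1.0]])
probs = softmax(logits)
print(top_k(probs, k=2))
```

Subtracting the maximum before exponentiating leaves the result unchanged but avoids overflow for large logits.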

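Auto-tuning the ResNet model

The previous model was compiled to run on the TVM runtime, but the compilation did not include any platform-specific optimization. This section shows how to use tvmc to build a model optimized for the target platform.

Sometimes a compiled model does not deliver the expected performance. Auto-tuning can find a better configuration for the model and improve performance. Tuning is done with the tune subcommand, which writes its results to a tuning records file.

In its simplest form, tuning requires three things:

The target platform the model will run on
The path of the output file for the tuning records
The path of the model to be tuned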

# The default search algorithm requires xgboost, see below for further
# details on tuning search algorithms
pip install xgboost

tvmc tune \
--target "llvm" \
--output resnet50-v2-7-autotuner_records.json \
resnet50-v2-7.onnx

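tvmc tries different parameters and operator configurations for the model and keeps the ones that run fastest on the platform. The search can take several hours. Its results are saved to resnet50-v2-7-autotuner_records.json, which is later used to compile the optimized model.
By default, the search is guided by the XGBoost Grid algorithm.

The output looks similar to the following (this example run targets llvm -mcpu=broadwell):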

tvmc tune \
--target "llvm -mcpu=broadwell" \
--output resnet50-v2-7-autotuner_records.json \
resnet50-v2-7.onnx
# [Task  1/24]  Current/Best:    9.65/  23.16 GFLOPS | Progress: (60/1000) | 130.74 s Done.
# [Task  1/24]  Current/Best:    3.56/  23.16 GFLOPS | Progress: (192/1000) | 381.32 s Done.
# [Task  2/24]  Current/Best:   13.13/  58.61 GFLOPS | Progress: (960/1000) | 1190.59 s Done.
# [Task  3/24]  Current/Best:   31.93/  59.52 GFLOPS | Progress: (800/1000) | 727.85 s Done.
# [Task  4/24]  Current/Best:   16.42/  57.80 GFLOPS | Progress: (960/1000) | 559.74 s Done.
# [Task  5/24]  Current/Best:   12.42/  57.92 GFLOPS | Progress: (800/1000) | 766.63 s Done.
# [Task  6/24]  Current/Best:   20.66/  59.25 GFLOPS | Progress: (1000/1000) | 673.61 s Done.
# [Task  7/24]  Current/Best:   15.48/  59.60 GFLOPS | Progress: (1000/1000) | 953.04 s Done.
# [Task  8/24]  Current/Best:   31.97/  59.33 GFLOPS | Progress: (972/1000) | 559.57 s Done.
# [Task  9/24]  Current/Best:   34.14/  60.09 GFLOPS | Progress: (1000/1000) | 479.32 s Done.
# [Task 10/24]  Current/Best:   12.53/  58.97 GFLOPS | Progress: (972/1000) | 642.34 s Done.
# [Task 11/24]  Current/Best:   30.94/  58.47 GFLOPS | Progress: (1000/1000) | 648.26 s Done.
# [Task 12/24]  Current/Best:   23.66/  58.63 GFLOPS | Progress: (1000/1000) | 851.59 s Done.
# [Task 13/24]  Current/Best:   25.44/  59.76 GFLOPS | Progress: (1000/1000) | 534.58 s Done.
# [Task 14/24]  Current/Best:   26.83/  58.51 GFLOPS | Progress: (1000/1000) | 491.67 s Done.
# [Task 15/24]  Current/Best:   33.64/  58.55 GFLOPS | Progress: (1000/1000) | 529.85 s Done.
# [Task 16/24]  Current/Best:   14.93/  57.94 GFLOPS | Progress: (1000/1000) | 645.55 s Done.
# [Task 17/24]  Current/Best:   28.70/  58.19 GFLOPS | Progress: (1000/1000) | 756.88 s Done.
# [Task 18/24]  Current/Best:   19.01/  60.43 GFLOPS | Progress: (980/1000) | 514.69 s Done.
# [Task 19/24]  Current/Best:   14.61/  57.30 GFLOPS | Progress: (1000/1000) | 614.44 s Done.
# [Task 20/24]  Current/Best:   10.47/  57.68 GFLOPS | Progress: (980/1000) | 479.80 s Done.
# [Task 21/24]  Current/Best:   34.37/  58.28 GFLOPS | Progress: (308/1000) | 225.37 s Done.
# [Task 22/24]  Current/Best:   15.75/  57.71 GFLOPS | Progress: (1000/1000) | 1024.05 s Done.
# [Task 23/24]  Current/Best:   23.23/  58.92 GFLOPS | Progress: (1000/1000) | 999.34 s Done.
# [Task 24/24]  Current/Best:   17.27/  55.25 GFLOPS | Progress: (1000/1000) | 1428.74 s Done.
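
The progress lines are regular enough to summarize with a script. This is a sketch that assumes the exact line layout shown above. PROGRESS_RE and parse_progress are names introduced here, and the sample lines are copied from the log.

```python
import re

PROGRESS_RE = re.compile(
    r"\[Task\s+(?P<task>\d+)/(?P<total>\d+)\]\s+"
    r"Current/Best:\s+(?P<current>[\d.]+)/\s*(?P<best>[\d.]+) GFLOPS\s+\|\s+"
    r"Progress: \((?P<done>\d+)/(?P<trials>\d+)\)\s+\|\s+(?P<seconds>[\d.]+) s"
)


def parse_progress(lines):
    """Return {task number: (best GFLOPS, elapsed seconds)}, keeping the last line per task."""
    result = {}
    for line in lines:
        match = PROGRESS_RE.search(line)
        if match:
            result[int(match["task"])] = (float(match["best"]), float(match["seconds"]))
    return result


sample = [
    "# [Task  1/24]  Current/Best:    9.65/  23.16 GFLOPS | Progress: (60/1000) | 130.74 s Done.",
    "# [Task  1/24]  Current/Best:    3.56/  23.16 GFLOPS | Progress: (192/1000) | 381.32 s Done.",
    "# [Task  2/24]  Current/Best:   13.13/  58.61 GFLOPS | Progress: (960/1000) | 1190.59 s Done.",
]
summary = parse_progress(sample)
total_seconds = sum(seconds for _, seconds in summary.values())
print(summary, total_seconds)
```

Summing the elapsed seconds per task gives a rough idea of how long a full tuning run took on the machine that produced the log.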

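Tuning can take a long time, so tvmc tune offers many options for customizing the tuning process.

Compiling an optimized model with the tuning data

The tuning process above produced tuning records, stored in resnet50-v2-7-autotuner_records.json. This file can be used in two ways:

As input to further tuning, via tvmc tune --tuning-records
As input to the compiler, via tvmc compile --tuning-records, in which case the compiler uses the records to generate high-performance code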

tvmc compile \
--target "llvm" \
--tuning-records resnet50-v2-7-autotuner_records.json  \
--output resnet50-v2-7-tvm_autotuned.tar \
resnet50-v2-7.onnx

Verify that the optimized model produces the same results:

tvmc run \
--inputs imagenet_cat.npz \
--output predictions.npz \
resnet50-v2-7-tvm_autotuned.tar

python postprocess.py

The predictions should match the earlier ones:

# class='n02123045 tabby, tabby cat' with probability=0.610550
# class='n02123159 tiger cat' with probability=0.367181
# class='n02124075 Egyptian cat' with probability=0.019365
# class='n02129604 tiger, Panthera tigris' with probability=0.001273
# class='n04040759 radiator' with probability=0.000261
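
The comparison can be done programmatically instead of by eye. The sketch below checks that two score vectors agree on their top classes within a tolerance. same_top_k and the 1e-4 tolerance are choices made here for illustration, and the arrays hold only the five probabilities printed in these notes, not full model outputs.

```python
import numpy as np


def same_top_k(scores_a, scores_b, k=5, atol=1e-4):
    """True if both score vectors rank the same k classes first with close values."""
    a, b = np.squeeze(scores_a), np.squeeze(scores_b)
    top_a = np.argsort(a)[::-1][:k]
    top_b = np.argsort(b)[::-1][:k]
    return bool(np.array_equal(top_a, top_b) and np.allclose(a[top_a], b[top_b], atol=atol))


# Top-5 probabilities printed before and after tuning.
untuned = np.array([0.610553, 0.367179, 0.019365, 0.001273, 0.000261])
tuned = np.array([0.610550, 0.367181, 0.019365, 0.001273, 0.000261])
print(same_top_k(untuned, tuned))
```

In practice both runs above write to predictions.npz, so comparing real outputs this way would require giving each run its own --output file.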

Comparing the tuned and untuned models

tvmc provides basic tools for benchmarking performance across models:

tvmc run \
--inputs imagenet_cat.npz \
--output predictions.npz  \
--print-time \
--repeat 100 \
resnet50-v2-7-tvm_autotuned.tar


mean (ms)   median (ms)   max (ms)   min (ms)   std (ms)
  90.9642       96.1843   492.3016     9.5005    64.9885
  
tvmc run \
--inputs imagenet_cat.npz \
--output predictions.npz  \
--print-time \
--repeat 100 \
resnet50-v2-7-tvm.tar

mean (ms)   median (ms)   max (ms)   min (ms)   std (ms)
 116.9617       99.7234   498.3171    11.4159    64.9802
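
To turn the two timing tables into a single comparison, the numbers can be parsed and divided. This is a sketch that assumes the two-line table layout printed above. parse_timing is a name introduced here, and the values are the ones from these notes.

```python
import re


def parse_timing(header, values):
    """Map each column name in a two-line tvmc timing table to its value in milliseconds."""
    names = re.findall(r"(\w+) \(ms\)", header)
    return dict(zip(names, (float(v) for v in values.split())))


header = "mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)"
tuned = parse_timing(header, "90.9642      96.1843      492.3016      9.5005      64.9885")
untuned = parse_timing(header, "116.9617     99.7234      498.3171     11.4159      64.9802")

speedup = untuned["mean"] / tuned["mean"]
reduction = 1 - tuned["mean"] / untuned["mean"]
print(f"mean speedup: {speedup:.2f}x, mean latency reduction: {reduction:.1%}")
```

With these numbers the mean improves by roughly 1.29x, while the medians (96.18 ms and 99.72 ms) differ far less. The standard deviation of about 65 ms in both runs suggests the measurements were noisy, so the mean alone may overstate the gain.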
