Using TVM's high-level API: TVMC Python

Download the model

mkdir myscripts
cd myscripts
wget https://github.com/onnx/models/raw/b9a54e89508f101a1611cd64f4ef56b9cb62c7cf/vision/classification/resnet/model/resnet50-v2-7.onnx
mv resnet50-v2-7.onnx my_model.onnx
touch tvmcpythonintro.py

step 0: Imports

from tvm.driver import tvmc

step 1: Load model

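Import the model into TVMC. This step converts a machine learning model from a supported framework into Relay, TVM's high-level, graph-level intermediate representation. This gives every model in TVMC a unified starting point. The frameworks currently supported are Keras, ONNX, TensorFlow, TFLite, and PyTorch.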

model = tvmc.load('my_model.onnx')  # step 1: load

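To inspect the Relay, call model.summary().
All frameworks support overriding the input shapes with a shape_dict argument. For most frameworks this is optional, but for PyTorch it is required, because TVM cannot find the input shapes automatically.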

#model = tvmc.load('my_model.onnx', shape_dict={'input1' : [1, 2, 3, 4], 'input2' : [1, 2, 3, 4]}) #Step 1: Load + shape_dict

A recommended way to find the input names and shapes for shape_dict is Netron: open the model, then click the first node to see its name and shape.
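
As a small illustration of the shape_dict format used in the commented tvmc.load call above, the sketch below builds such a dictionary from (name, shape) pairs as they might be read off Netron. The helper name build_shape_dict is hypothetical and not part of TVMC, and the rule that every dimension must be a positive integer is an assumption of this sketch rather than a documented TVMC requirement.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def build_shape_dict(
    inputs: Iterable[Tuple[str, Sequence[int]]]
) -> Dict[str, List[int]]:
    """Turn (input_name, shape) pairs into a shape_dict-style mapping.

    Hypothetical helper, not a TVMC function: it only checks the values
    and returns a plain dict of the form {'input1': [1, 2, 3, 4]}.
    """
    shape_dict: Dict[str, List[int]] = {}
    for name, shape in inputs:
        if name in shape_dict:
            raise ValueError(f"duplicate input name: {name!r}")
        # Assumption of this sketch: only concrete, positive dimensions.
        if not all(isinstance(d, int) and d > 0 for d in shape):
            raise ValueError(f"input {name!r} has a non-concrete shape: {list(shape)}")
        shape_dict[name] = list(shape)
    return shape_dict
```

The result could then be passed as tvmc.load('my_model.onnx', shape_dict=build_shape_dict(...)), mirroring the commented line above.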

step 2: Compile

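Now that the model is in Relay, the next step is to compile it for the hardware it will run on. Compilation translates the Relay model into a lower-level representation that the target machine can understand and execute.
Compiling requires a tvm.target string; see the TVM documentation for the available targets.
Some examples: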

cuda (Nvidia GPU)
llvm (CPU)
llvm -mcpu=cascadelake (Intel CPU)

package = tvmc.compile(model, target="llvm") #Step 2: Compile

The compile step returns a package.
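
To make the choice of target string a little more concrete, here is a minimal sketch that maps a hardware label to one of the example targets listed above. The labels ("cpu", "nvidia-gpu", "intel-cascadelake") and the function name pick_target are made up for this sketch; only the target strings themselves come from the examples.

```python
# Target strings copied from the examples above. The dictionary keys are
# illustrative labels chosen for this sketch, not TVM identifiers.
EXAMPLE_TARGETS = {
    "nvidia-gpu": "cuda",
    "cpu": "llvm",
    "intel-cascadelake": "llvm -mcpu=cascadelake",
}


def pick_target(hardware: str) -> str:
    """Return the example target string for a hardware label."""
    try:
        return EXAMPLE_TARGETS[hardware]
    except KeyError:
        known = ", ".join(sorted(EXAMPLE_TARGETS))
        raise ValueError(f"unknown hardware label {hardware!r}; known: {known}") from None
```

The returned string would be used as tvmc.compile(model, target=pick_target("cpu")).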

step 3: Run

The compiled package can now be run on the target hardware. The options for the device argument are: CPU, Cuda, CL, Metal, and Vulkan.

result = tvmc.run(package, device="cpu") #Step 3: Run

Use print(result) to print the results.
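
Steps 1 to 3 can be collected into one function. The sketch below is one possible arrangement, not something TVMC provides: run_pipeline is a hypothetical name, and the tvmc module is passed in as an argument so that the function has no import-time dependency on TVM. It uses only the load, compile, and run calls shown above.

```python
def run_pipeline(tvmc, model_path, target="llvm", device="cpu"):
    """Load, compile, and run a model with the three calls shown above.

    `tvmc` is expected to be the module obtained from
    `from tvm.driver import tvmc`; it is a parameter here so the sketch
    can be read and exercised without TVM installed.
    """
    model = tvmc.load(model_path)                  # step 1: load
    package = tvmc.compile(model, target=target)   # step 2: compile
    return tvmc.run(package, device=device)        # step 3: run
```

With TVM installed it would be called as run_pipeline(tvmc, 'my_model.onnx') after the import from step 0.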

step 1.5: Tune [Optional & Recommended]

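Run speed can be improved further by tuning. This optional step uses machine learning to look at each operator in the model and try to find a faster way to run it. This is done with a cost model.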

The target is the same as in the compile step.

tvmc.tune(model, target="llvm")  #Step 1.5: Optional Tune

Example output:

[Task  1/13]  Current/Best:   82.00/ 106.29 GFLOPS | Progress: (48/769) | 18.56 s
[Task  1/13]  Current/Best:   54.47/ 113.50 GFLOPS | Progress: (240/769) | 85.36 s
.....
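
When tuning runs for hours, it can help to pull the numbers out of these progress lines. The following is a sketch that assumes the exact line layout shown in the sample above; the names TuneProgress and parse_progress are invented for illustration, and other TVM versions may print a different format.

```python
import re
from typing import NamedTuple, Optional

# Pattern written against the sample lines above; treat it as an assumption
# about the log layout, not a stable format.
_PROGRESS_RE = re.compile(
    r"\[Task\s+(?P<task>\d+)/\s*(?P<tasks>\d+)\]\s+"
    r"Current/Best:\s+(?P<current>[\d.]+)/\s*(?P<best>[\d.]+)\s+GFLOPS\s+\|\s+"
    r"Progress:\s+\((?P<done>\d+)/(?P<total>\d+)\)\s+\|\s+(?P<elapsed>[\d.]+)\s+s"
)


class TuneProgress(NamedTuple):
    task: int
    tasks: int
    current_gflops: float
    best_gflops: float
    trials_done: int
    trials_total: int
    elapsed_s: float


def parse_progress(line: str) -> Optional[TuneProgress]:
    """Parse one progress line; return None if the line does not match."""
    m = _PROGRESS_RE.search(line)
    if m is None:
        return None
    return TuneProgress(
        task=int(m["task"]),
        tasks=int(m["tasks"]),
        current_gflops=float(m["current"]),
        best_gflops=float(m["best"]),
        trials_done=int(m["done"]),
        trials_total=int(m["total"]),
        elapsed_s=float(m["elapsed"]),
    )
```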

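UserWarnings may be printed; they can be ignored. Tuning should make the final model run faster, but the tuning itself can take hours to complete.
The tuning results can be saved to a file and passed to the compile step: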

#tvmc.compile(model, target="llvm", tuning_records = "records.log") #Step 2: Compile

Save the script and run it:

python tvmcpythonintro.py

Example output:

Time elapsed for training: 18.99 s
Execution time summary:
mean (ms)   max (ms)   min (ms)   std (ms)
  25.24      26.12      24.89       0.38


Output Names:
['output_0']
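
If the timing numbers are needed programmatically from captured text like the sample above, a small parser is enough. This sketch assumes the two-line layout shown in the sample (a header row of "name (ms)" columns followed by one row of values); parse_time_summary is a hypothetical helper, and the real output may contain more columns or a different layout.

```python
import re
from typing import Dict


def parse_time_summary(text: str) -> Dict[str, float]:
    """Extract {column name: milliseconds} from an execution time summary."""
    lines = [ln.strip() for ln in text.splitlines()]
    for i, line in enumerate(lines):
        # The header row looks like: mean (ms)   max (ms)   min (ms)   std (ms)
        names = re.findall(r"(\w+) \(ms\)", line)
        if names and i + 1 < len(lines):
            values = [float(v) for v in lines[i + 1].split()]
            if len(values) != len(names):
                raise ValueError("header and value columns do not line up")
            return dict(zip(names, values))
    raise ValueError("no execution time table found")
```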

Additional TVMC features

Saving the model

After loading the model, save its Relay form for later use:

model = tvmc.load('my_model.onnx') #Step 1: Load
model.save(desired_model_path)

Saving the package

Save the result of the compile step (the package) so it can be loaded again later:

tvmc.compile(model, target="llvm", package_path="whatever") #Step 2: Compile

new_package = tvmc.TVMCPackage(package_path="whatever")
result = tvmc.run(new_package, device="cpu") #Step 3: Run
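
The save-then-reload pattern above can be wrapped so that compilation only happens when no package file exists yet. This is a sketch under assumptions: load_or_compile is a hypothetical helper, the tvmc module is passed in as an argument, and a package found on disk is trusted as-is even if the model or target has changed since it was built.

```python
import os


def load_or_compile(tvmc, model, target, package_path):
    """Reuse a saved package when one exists, otherwise compile and save it.

    Uses only the calls shown above: tvmc.compile(..., package_path=...)
    and tvmc.TVMCPackage(package_path=...).
    """
    if not os.path.exists(package_path):
        # No cached package yet: compile and write it to package_path.
        tvmc.compile(model, target=target, package_path=package_path)
    # Load the package from disk, whether it was just written or found.
    return tvmc.TVMCPackage(package_path=package_path)
```

The returned package can be passed to tvmc.run(package, device="cpu") as in step 3.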

Using the Autoscheduler

Enable TVM's next-generation auto-tuning, the autoscheduler:

tvmc.tune(model, target="llvm", enable_autoscheduler = True)

Saving the tuning results

method 1:

log_file = "hello.json"

# Run tuning
tvmc.tune(model, target="llvm", tuning_records=log_file)

...

# Later run tuning and reuse tuning results
tvmc.tune(model, target="llvm", prior_records=log_file)

method 2:

# Run tuning
tuning_records = tvmc.tune(model, target="llvm")

...

# Later run tuning and reuse tuning results
tvmc.tune(model, target="llvm", prior_records=tuning_records)
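
Method 1 can be reduced to a single call site by choosing the keyword argument from whether the log file already exists. The helper below is a hypothetical sketch (tune_kwargs is not a TVMC function); it only builds the keyword dictionary and follows method 1 literally, so on a resumed run it passes prior_records and nothing else.

```python
import os
from typing import Dict


def tune_kwargs(log_file: str) -> Dict[str, str]:
    """Pick the tvmc.tune keyword for a log file, following method 1 above.

    First run (no file yet): write records to log_file.
    Later runs (file exists): reuse it as prior records.
    """
    if os.path.exists(log_file):
        return {"prior_records": log_file}
    return {"tuning_records": log_file}
```

It would be used as tvmc.tune(model, target="llvm", **tune_kwargs("hello.json")).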

Tuning a more complex model

If the printed progress contains many T's, for example .........T.T..T..T..T.T.T.T.T.T., increase the search time frame:

tvmc.tune(model, trials=10000, timeout=10)
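
To judge how many T's is "too many", the progress string can be summarised as a ratio. This sketch assumes that each "." stands for a measurement that finished and each "T" for one that hit the timeout, which is an interpretation of the note above rather than something stated in it; timeout_ratio is a hypothetical helper.

```python
def timeout_ratio(progress: str) -> float:
    """Fraction of 'T' marks among the '.' and 'T' marks in a progress string.

    Assumption: '.' is a completed measurement and 'T' a timed-out one.
    Any other characters are ignored.
    """
    marks = [c for c in progress if c in ".T"]
    if not marks:
        return 0.0
    return marks.count("T") / len(marks)
```

A high ratio would suggest raising timeout in the tvmc.tune call; where to draw the line is a judgment call.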

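Compiling a model for a remote device

A remote procedure call (RPC) is useful when you want to compile for hardware that is not on your local machine. To start an RPC server, first see the TVM documentation.
In the TVMC script, include the following: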

tvmc.tune(
     model,
     target=target, # Compilation target as string // Device to compile for
     target_host=target_host, # Host processor
     hostname=host_ip_address, # The IP address of an RPC tracker, used when benchmarking remotely.
     port=port_number, # The port of the RPC tracker to connect to. Defaults to 9090.
     rpc_key=your_key, # The RPC tracker key of the target device. Required when rpc_tracker is provided
)
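
The three RPC-related arguments can be kept together in one small object so they are passed consistently. This is a sketch only: RpcSettings and as_tune_kwargs are hypothetical names, and the keyword names (hostname, port, rpc_key) and the default port 9090 are taken from the call and comments above.

```python
from typing import Any, Dict, NamedTuple


class RpcSettings(NamedTuple):
    """Connection details for an RPC tracker (illustrative container)."""

    hostname: str     # IP address of the RPC tracker
    rpc_key: str      # tracker key of the target device
    port: int = 9090  # default taken from the comment above

    def as_tune_kwargs(self) -> Dict[str, Any]:
        """Return the keyword arguments used in the tvmc.tune call above."""
        if not self.hostname or not self.rpc_key:
            raise ValueError("hostname and rpc_key must both be set")
        return {
            "hostname": self.hostname,
            "port": self.port,
            "rpc_key": self.rpc_key,
        }
```

It would be used as tvmc.tune(model, target=target, target_host=target_host, **settings.as_tune_kwargs()), where settings is an RpcSettings instance filled in with your own tracker address and key.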