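Reinforcement Learning-Based Quantization Tuning

Model quantization tuning addresses two problems: edge inference devices vary widely in type and compute capability, and model optimization often lacks hardware awareness and an automated optimization process. It automates quantization tuning against the accuracy, compute, latency, and memory constraints of the edge deployment.

Reinforcement learning-based quantization compression currently supports models including, but not limited to, those listed in Table 1.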

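Table 1 Models verified with automatic quantization tuning
Type	Name	Framework
Image classification	ResNet50	MindSpore, PyTorch
Image classification	MobileNetV2	MindSpore, PyTorch
Image classification	VIT	MindSpore
Image segmentation	DeepLabV3	MindSpore
Object detection	FasterRCNN	MindSpore
Object detection	YoloV5	MindSpore, PyTorch
Object detection	YoloV4	PyTorch
Object detection	YoloV3	MindSpore
Object detection	YoloV3-Tiny	PyTorch
Object detection	SSD	MindSpore, PyTorch
Object detection	RetinaNet	PyTorch
Natural language processing	BERT-Base	MindSpore
Natural language processing	ERNIE	MindSpore
Natural language processing	Transformer	MindSpore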

Prerequisites

Run the following command to install the fvcore and onnx packages.

pip3 install fvcore onnx --user
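
Before starting a tuning task, it can help to confirm that both packages are importable by the interpreter that runs the task. The following is a minimal sketch using only the Python standard library; the helper name missing_packages is hypothetical and not part of the toolkit.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(["fvcore", "onnx"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("fvcore and onnx are available.")
```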

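Reinforcement learning-based model quantization tuning process

The quantization tuning process (the nas pipeline step) consists of the following steps: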

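1. Read the input model and build the Graph IR, which is used to locate the operators to quantize, optimize the graph structure, and extract operator features. This step requires you to provide an interface that returns a model instance.
2. Extract model features, including each operator's kernel parameters, shape parameters, and surrounding graph context.
3. Search the quantization bit widths with a reinforcement learning algorithm to build a candidate bit-width policy.
4. Build the quantized model structure and its Graph IR for the candidate bit-width policy.
5. Apply the quantization algorithm to optimize the graph structure and to calibrate and correct the quantization operator parameters for weights and activations. For complex models, such as image detection and segmentation models and transformer models, you must provide a calibration interface that calibrates the quantization parameters of the activations. The interface mainly runs one epoch of inference on part of the training set, with no backpropagation.
6. Evaluate the accuracy of the quantized model. For complex models, such as image detection and segmentation models and transformer models, you must provide an accuracy evaluation interface that returns the accuracy of the quantized model.
7. Generate a deployable quantized model, evaluate it in the loop, and feed the evaluation results back to the bit-width search algorithm.
8. Repeat steps 3 to 6 until the search finds a quantized model that meets the specified accuracy, compression ratio, and latency targets.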
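
The loop described above can be illustrated with a toy sketch. This is not the toolkit's algorithm: random sampling stands in for the reinforcement learning agent, the layer list and the accuracy model are invented for illustration, and latency is left out. It only shows how a per-layer bit-width policy is proposed, scored, and kept when it meets the targets.

```python
import random

BIT_CANDIDATES = [8, 32]  # mirrors network.bit_candidates in the sample configuration

# Hypothetical layers as (name, weight count); not taken from a real model.
LAYERS = [("conv1", 9_408), ("layer1", 215_808), ("layer2", 1_219_584), ("fc", 2_048_000)]


def compress_percent(policy):
    """Size reduction in percent relative to storing every weight in 32 bits."""
    full_bits = sum(count * 32 for _, count in LAYERS)
    quant_bits = sum(count * bits for (_, count), bits in zip(LAYERS, policy))
    return 100.0 * (1 - quant_bits / full_bits)


def toy_accuracy_loss(policy):
    """Stand-in for calibration and evaluation: each 8-bit layer costs 0.2 points."""
    return 0.2 * sum(1 for bits in policy if bits == 8)


def search(max_episode=30, acc_threshold=0.5, compress_threshold=40, seed=234):
    """Sample bit-width policies and keep the feasible one with the highest compression."""
    rng = random.Random(seed)
    best = None
    for _ in range(max_episode):
        # Propose one bit width per layer (random choice instead of an RL agent).
        policy = [rng.choice(BIT_CANDIDATES) for _ in LAYERS]
        # Score the candidate with the toy metrics.
        loss = toy_accuracy_loss(policy)
        ratio = compress_percent(policy)
        feasible = loss <= acc_threshold and ratio >= compress_threshold
        if feasible and (best is None or ratio > best["compress_percent"]):
            best = {"policy": policy, "accuracy_loss": loss, "compress_percent": ratio}
    return best


if __name__ == "__main__":
    print(search(max_episode=200))
```

In this toy setup a policy is feasible only when the large fc layer is quantized to 8 bits, which shows why a search over per-layer bit widths can beat a uniform setting when accuracy and compression targets pull in opposite directions.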

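Model quantization tuning procedure (using ResNet50 on PyTorch as an example)

Go to the {CANN installation path}/ascend-toolkit/latest/tools/ascend_automl/examples/pytorch/quant/classification directory, which contains the sample file resnet_rl_quant.yml. Copy the file to your current working directory and configure the fields below as appropriate for your environment.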

general:
    backend: pytorch   # pytorch | tensorflow
    device_category: NPU
    task:
      local_base_path: /path/of/workdir       # working directory
      task_id: "quant_resnet50_pytorch"
    device_evaluate_before_train: False

pipeline: [nas]

nas:
    pipe_step:
        type: SearchPipeStep
    model:
        model_desc:
            type: ResNet50_ModelZoo    # this model must be defined as described in custom model registration
            version: resnet50
            config: classic
        input_shape: [ 1, 3, 224, 224 ]
    dataset:
        type:  Imagenet
        common:
            data_path: /path/of/dataset      # dataset path
            drop_last: False
            shuffer: False
        …

    search_algorithm:
        …

        #choice: acc_first | compress_first
        reward_type: 'acc_first'   # priority metric for quantization compression. acc_first: accuracy first; compress_first: compression ratio first
        custom_reward: False
        latency_acc_ratio: 0.5   # Ratio of latency to accuracy in the reward.
                                 # If custom_reward is False, this value does not need to be configured.
                                 # If custom_reward is True, you can set latency_acc_ratio to 0 so that latency is not used in the reward;
                                 # otherwise, a value greater than 0.1 is recommended.
        stop_early: False   # whether to stop the task as soon as a result meets acc_threshold, latency_threshold, and compress_threshold together
        acc_threshold: 0.5  # accuracy loss threshold (%)
        latency_threshold: 5   # latency reduction (%)
        compress_threshold: 40   # compression ratio threshold (%)
    search_space:
        type: SearchSpace
        hyperparameters:
            -   key: network.bit_candidates
                type: CATEGORY
                range: [8, 32]
    trainer:
        type: Trainer
        epochs: 1
        seed: 234
        callbacks: [QuantPTQCallback, ConvertOnnxCallback]
        pretrained_model_file: /path/of/pretrain/model/file   # ResNet50 pretrained weight file
        …

    evaluator:
        type: Evaluator
        device_evaluator:
            type: DeviceEvaluator
            custom: QuantCustomEvaluator
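            remote_host: http://xxx.xxx.xxx.xxx:port/  # URL of the remote inference server, ending with the port number. If running "vega-config -q sec" on the inference server returns "True", change "http" to "https"
            backend: 'pytorch'
            om_input_shape: '1,3,224,224'
            delete_eval_model: True   # whether to delete the quantized models found by the search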

To quantize an image classification model, taking MobileNetV2 on PyTorch as an example, the yml file to configure is {CANN installation path}/ascend-toolkit/latest/tools/ascend_automl/examples/pytorch/quant/classification/mobilenetv2_quant.yml.
To quantize a segmentation model, taking DeepLabV3 on PyTorch as an example, the yml file to configure is {CANN installation path}/ascend-toolkit/latest/tools/ascend_automl/examples/pytorch/quant/segmentation/deeplabv3_quant.yml.
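
The stop_early option in the sample configuration stops the task once a result meets all three thresholds. The sketch below spells out one plausible reading of that check. The function name targets_met and the direction of each comparison are assumptions inferred from the field comments (accuracy loss at most the threshold, latency reduction and compression ratio at least the threshold); the toolkit's actual implementation is not shown in this document.

```python
def targets_met(acc_loss_pct, latency_drop_pct, compress_pct,
                acc_threshold=0.5, latency_threshold=5, compress_threshold=40):
    """Return True when all three targets are satisfied at once (assumed semantics)."""
    return (
        acc_loss_pct <= acc_threshold               # accuracy loss must stay small
        and latency_drop_pct >= latency_threshold   # latency must drop enough
        and compress_pct >= compress_threshold      # model must shrink enough
    )


print(targets_met(0.3, 6.0, 45.0))   # all three targets met
print(targets_met(0.8, 6.0, 45.0))   # accuracy loss too large
```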

Start the model quantization compression task.
```
vega resnet_rl_quant.yml -d NPU
```
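After the task finishes, the search log is written to the log folder under the specified working directory. If the delete_eval_model field of device_evaluator in the evaluator section is set to "False", the model for each search result is written to the output/nas folder under the specified working directory.

To run quantization tuning on a model that has already been pruned by following the pruning tuning based on training scripts, you do not need to register a custom model or provide a model script. Instead, provide a model description file (.json). For details, modify {CANN installation path}/ascend-toolkit/latest/tools/ascend_automl/examples/pytorch/quant/classification/resnet_prune_rl_quant.yml as a starting point.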

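Model quantization tuning procedure (using YoloV5 on MindSpore as an example)

Go to the {CANN installation path}/ascend-toolkit/latest/tools/ascend_automl/examples/mindspore/quant/detection/yolov5 directory, which contains the sample file yolov5_ms_quant.yml. Copy the file to your current working directory and configure the fields below as appropriate for your environment.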

general:
    backend: mindspore  # pytorch | mindspore
    device_category: NPU
    task:
      local_base_path: ./tasks/
      task_id: "quant_yolov5"
    device_evaluate_before_train: False
    parallel_search: True
    logger:
      level: info
    worker:
      timeout: 720000000
pipeline: [nas]

nas:
    pipe_step:
        type: SearchPipeStep
    model:
        model_desc:
          type: PruneModel
          model_file_path: /home/examples/yolov5/src/yolo.py   # model script; it must provide a get_model method that returns the model instance to quantize
          pkg_path: /home/examples/yolov5   # directory containing the user's training scripts
        pretrained_model_file: "/home/examples/yolov5/pre_train/0-300_274800.1130.ckpt"
        input_shape:
          - type: fp32
            tensor: True
            shape: [ 1,12,320,320 ]
          - type: int32
            tensor: False
            shape: [ 640,640 ]

    search_algorithm:
        type: MsQuantRL
        codec: QuantRLCodec
        policy:
            max_episode: 30   # Maximum number of episodes. Recommended value: > 100. Larger is better, but learning takes longer.
                              # If a suitable value cannot be determined, set a large value, then
                              # set quantization targets and stop learning early so that the exact value matters less.
            num_warmup: 10    # Warm-up phase that only fills the replay memory without training. Recommended: 10-20.
        objective_keys: [ 'accuracy','compress_ratio','latency' ]

        #choice: acc_first | compress_first
        reward_type: 'compress_first'  # priority metric for quantization compression. acc_first: accuracy first; compress_first: compression ratio first

        custom_reward: False
        latency_acc_ratio: 0.5  # Ratio of latency to accuracy in the reward.
                             # If custom_reward is False, this value does not need to be configured.
                             # If custom_reward is True, you can set latency_acc_ratio to 0 so that latency is not used in the reward;
                             # otherwise, a value greater than 0.1 is recommended.
        stop_early: False   # whether to stop the task as soon as a result meets acc_threshold, latency_threshold, and compress_threshold together
        acc_threshold: 0.5  # accuracy loss threshold (%)
        latency_threshold: 5  # latency reduction (%)
        compress_threshold: 40  # compression ratio threshold (%)


    search_space:
        type: SearchSpace
        hyperparameters:
            -   key: network.bit_candidates
                type: CATEGORY
                range: [ 8, 32]

    trainer:
      type: OriTrainer
      seed: 234
      callbacks: [MsQuantPTQCallback, CustomMetricCallback, CustomExportCallback]
      calib_portion: 0.01
      custom_calib:    # forward-only calibration of the quantization factors
        pkg_path: /home/examples/yolov5/    # package path containing the calib_func interface
        path: /home/examples/yolov5/train.py    # file containing the interface
        func: calib_func
        # Detection models require a calibration interface. For a reference calibration method, see
        # {CANN installation path}/ascend-toolkit/latest/tools/ascend_automl/examples/pytorch/quant/classification/resnet_custom_func.py
      custom_eval:     # accuracy evaluation
        pkg_path: /home/examples/yolov5/          # package path containing the evaluation interface
        path: /home/examples/yolov5/eval.py       # file containing the interface
        func: run_eval     # Detection models require an evaluation interface; for a reference method, see the resnet_custom_func.py script named above
        metric_name: "mAP"     # metric to optimize

    evaluator:
        type: Evaluator
        device_evaluator:
          type: DeviceEvaluator
          custom: QuantCustomEvaluator
          om_input_shape: 'input_0:1,12,320,320'
          backend: mindspore
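          delete_eval_model: False   # whether to delete the quantized models found by the search
          hardware: "Davinci"
          remote_host: "http://x.x.x.x:xxxx"  # URL of the remote inference server, ending with the port number. If running "vega-config -q sec" on the inference server returns "True", change "http" to "https"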
          repeat_times: 1
          muti_input: True
          save_intermediate_file: True

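Detection models require a calibration interface that runs forward passes over the training dataset to calibrate the quantization parameters. No backward pass is needed, so backpropagation-related code, such as the learning rate, loss, and optimizer, can be removed. For the configuration of the calibration interface, see the custom_calib field of OriTrainer.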
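
To illustrate why calibration needs only forward passes, the sketch below records the minimum and maximum activation seen during inference and derives an 8-bit scale and zero point from them. This is a generic min/max calibration example in plain Python, not the toolkit's method. The signature of calib_func, the MinMaxObserver class, and toy_model are all assumptions for illustration; the interface the toolkit actually expects is shown in the resnet_custom_func.py reference script.

```python
class MinMaxObserver:
    """Tracks the smallest and largest activation value seen during calibration."""

    def __init__(self):
        self.lo = float("inf")
        self.hi = float("-inf")

    def update(self, values):
        for value in values:
            self.lo = min(self.lo, value)
            self.hi = max(self.hi, value)

    def qparams(self, num_bits=8):
        """Return (scale, zero_point) for asymmetric unsigned quantization."""
        qmax = 2 ** num_bits - 1
        lo, hi = min(self.lo, 0.0), max(self.hi, 0.0)  # the range must contain zero
        scale = (hi - lo) / qmax if hi > lo else 1.0
        zero_point = round(-lo / scale)
        return scale, zero_point


def calib_func(model, batches, observer):
    """Forward-only calibration: no loss, optimizer, or learning rate involved."""
    for batch in batches:
        observer.update(model(batch))
    return observer


def toy_model(batch):
    # Hypothetical stand-in for a network's forward pass.
    return [2.0 * x - 1.0 for x in batch]


observer = calib_func(toy_model, [[0.0, 0.5], [1.0, 2.0]], MinMaxObserver())
scale, zero_point = observer.qparams()
print(observer.lo, observer.hi, zero_point)  # -1.0 3.0 64
```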

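Detection models also require an evaluation interface that computes the model's accuracy metrics and returns the evaluation result as a dict. For the configuration of the evaluation interface, see the custom_eval field of OriTrainer. The value of metric_name must be one of the keys in the returned result.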
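
The relationship between the returned dict and metric_name can be sketched as follows. The example uses plain classification accuracy instead of mAP to stay short. The signature of run_eval, the toy model, and the sample data are assumptions for illustration only.

```python
METRIC_NAME = "accuracy"  # plays the role of metric_name in custom_eval


def toy_model(value):
    # Hypothetical stand-in for a quantized model: predicts 1 for non-negative input.
    return int(value >= 0)


def run_eval(model, samples):
    """Return the evaluation result as a dict, keyed by metric name."""
    correct = sum(1 for inputs, label in samples if model(inputs) == label)
    return {"accuracy": correct / len(samples)}


samples = [(-1.0, 0), (0.5, 1), (2.0, 1), (-0.2, 1)]
result = run_eval(toy_model, samples)
print(result[METRIC_NAME])  # 0.75
```

If the dict does not contain the configured key, the tool has no value to optimize, so the key and metric_name must be kept in sync.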

Start the model quantization tuning task.
```
vega yolov5_ms_quant.yml -d NPU
```
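After the task finishes, the search log is written to the log folder under the specified working directory. If the delete_eval_model field of device_evaluator in the evaluator section is set to "False", the model for each search result is written to the output/nas folder under the specified working directory.

To run quantization tuning on a model that has already been pruned by following the pruning tuning based on training scripts, you do not need to provide a model script; a model description file (.json) is sufficient. For details, modify {CANN installation path}/ascend-toolkit/latest/tools/ascend_automl/examples/mindspore/quant/detection/yolov5/yolov5_prune_ms_quant.yml as a starting point.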

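Quantization tuning procedure for transformer models (using ERNIE on MindSpore as an example)

Go to the {CANN installation path}/ascend-toolkit/latest/tools/ascend_automl/examples/mindspore/quant/nlp/ernie directory.
Follow ernie_chnsenticorp_quant.md to download the ERNIE source code and data, and to build the model getter function, the model input function, the calibration function, and the evaluation function.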
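
As a rough orientation, the four functions could be laid out as in the skeleton below. The function names and keyword arguments are taken from the sample ernie_chnsenticorp_quant.yml, but the signatures, the way the model is passed in, and all function bodies are placeholder assumptions; ernie_chnsenticorp_quant.md remains the authoritative guide.

```python
# Hypothetical skeleton of run_ernie_classifier.py. Signatures are assumptions.


def get_model(checkpoint_path):
    """Build the network and load weights from checkpoint_path (placeholder body)."""
    def forward(token_ids):
        # Placeholder classifier: label 1 if even token ids are the majority.
        evens = sum(1 for t in token_ids if t % 2 == 0)
        return int(evens * 2 > len(token_ids))
    return forward


def get_input(eval_batch_size, eval_data_file_path):
    """Return example inputs in the shape the model expects (placeholder data)."""
    return [[0] * 8 for _ in range(eval_batch_size)]


def calib_func(model, train_batch_size, train_data_file_path):
    """Forward-only pass over part of the training data; nothing is returned."""
    batch = [[i, i + 1, i + 2] for i in range(train_batch_size)]  # placeholder data
    for token_ids in batch:
        model(token_ids)


def eval_func(model, eval_batch_size, eval_data_file_path):
    """Return a dict whose key matches metric_name in custom_eval."""
    samples = [([2, 4, 6], 1), ([1, 3, 5], 0), ([2, 3, 5], 0), ([1, 2, 4], 0)]  # placeholder
    correct = sum(1 for token_ids, label in samples if model(token_ids) == label)
    return {"accuracy": correct / len(samples)}
```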

In the ernie_chnsenticorp_quant.yml file, configure the fields below as appropriate for your environment.

general:
    backend: mindspore
    device_category: NPU
    device_evaluate_before_train: False
    task:
        local_base_path: ./tasks/  # working directory
        task_id: "ernie_chnsenticorp_quant"
    logger:
        level: info
    worker:
        timeout: 720000000
pipeline: [nas]

register:
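    pkg_path: [ "/xxx/ERNIE_for_MindSpore_1.6_code" ]   # ERNIE source code path
    modules:
        - module: "run_ernie_classifier"  # module to import
          script_network: "get_model"   # function that returns the model instance
          script_network_input: ["get_input"]   # function that returns the model inputs
          ori_correct_func: ["calib_func"]    # function that calibrates the quantization parameters
          ori_eval_func: ["eval_func"]    # function that evaluates the accuracy of the quantized model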

nas:
    pipe_step:
        type: SearchPipeStep
    model:
        model_desc:
            type: ScriptModelGen
            common:
                network:
                    type: get_model
                    config:
                        checkpoint_path: "/xxx/chnsenticorp-0-10_400.ckpt"   # model weight file path
                multiple_inputs:
                    type: get_input
                    config:
                        eval_batch_size: 1
                        eval_data_file_path: "/xxx/chnsenticorp_test.mindrecord"   # inference dataset path



    search_algorithm:
        type: MsQuantRL
        codec: QuantRLCodec
        policy:
            max_episode: 30   # Maximum number of episodes. Recommended value: > 100. Larger is better, but learning takes longer.
                              # If a suitable value cannot be determined, set a large value, then
                              # set quantization targets and stop learning early so that the exact value matters less.
            num_warmup: 10    # Warm-up phase that only fills the replay memory without training. Recommended: 10-20.
        objective_keys: [ 'accuracy','compress_ratio','latency' ]

        #choice: acc_first | compress_first
        reward_type: 'compress_first'

        custom_reward: False
        latency_acc_ratio: 0.5  # Ratio of latency to accuracy in the reward.
                                # If custom_reward is False, this value does not need to be configured.
                                # If custom_reward is True, you can set latency_acc_ratio to 0 so that latency is not used in the reward;
                                # otherwise, a value greater than 0.1 is recommended.
        stop_early: False
        acc_threshold: 0.5
        latency_threshold: 5
        compress_threshold: 40


    search_space:
        type: SearchSpace
        hyperparameters:
            -   key: network.bit_candidates
                type: CATEGORY
                range: [8, 32]

    trainer:
        type: QuantTrainer
        seed: 234
        calib_portion: 0.1
        callbacks: [OptExportCallback]
        custom_calib:
            type: calib_func   # registered calibration function
            config:
                train_batch_size: 32
                train_data_file_path: "/xxx/chnsenticorp_train.mindrecord"   # training dataset path
        custom_eval:
            type: eval_func   # registered evaluation function
            metric_name: "accuracy"
            config:
                eval_batch_size: 1
                eval_data_file_path: "/xxx/chnsenticorp_test.mindrecord"   # inference dataset path

    evaluator:
        type: Evaluator
        device_evaluator:
            type: DeviceEvaluator
            custom: QuantCustomEvaluator
            backend: mindspore
            delete_eval_model: True
            hardware: "Davinci"
            remote_host: "http://xx.xx.xx.xx:xxxx"    # URL of the remote inference server, ending with the port number. If running "vega-config -q sec" on the inference server returns "True", change "http" to "https"
            repeat_times: 1
            muti_input: True

Start the model quantization task.
```
vega ernie_chnsenticorp_quant.yml -d NPU
```
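After the task finishes, the search log is written to the log folder under the specified working directory. If "save_intermediate_file: True" is configured under device_evaluator in the evaluator section, the model for each search result is written to the output/nas folder under the specified working directory.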
