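Uniform quantization means that the quantized data is distributed fairly evenly over a numeric range. INT8 quantization, for example, represents 32-bit FP32 data or 16-bit FP16 data with 8-bit INT8 data and converts the FP32/FP16 convolution (multiply-accumulate operations) into an INT8 convolution, which speeds up computation and compresses the model. In uniform INT8 quantization, the quantized data is distributed fairly evenly over the INT8 range [-128, 127].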
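To make the mapping concrete, here is a minimal sketch of uniform (affine) INT8 quantization in plain Python. The function names `quantize_int8` and `dequantize_int8` and the symmetric scale of 1/127 are assumptions chosen for illustration; this is the generic textbook mapping, not necessarily the exact formula AMCT applies internally.

```python
def quantize_int8(values, scale, offset=0):
    """Map floats to INT8: q = clamp(round(x / scale) + offset, -128, 127)."""
    quantized = []
    for x in values:
        q = round(x / scale) + offset
        quantized.append(max(-128, min(127, q)))  # saturate to the INT8 range
    return quantized


def dequantize_int8(quantized, scale, offset=0):
    """Recover approximate floats: x ~= (q - offset) * scale."""
    return [(q - offset) * scale for q in quantized]


data = [-1.0, -0.3, 0.0, 0.2, 1.0, 2.0]
scale = 1.0 / 127  # assumed symmetric scale covering the range [-1, 1]
q = quantize_int8(data, scale)
restored = dequantize_int8(q, scale)
```

Values outside the representable range, such as 2.0 here, saturate at 127. Every in-range value is recovered to within half a quantization step, which is the error that uniform quantization trades for the smaller data type.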
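If the accuracy of the model after uniform quantization does not meet requirements, perform quantization aware training, accuracy-based automatic quantization, or manual tuning.
The layers that support uniform quantization and their constraints are listed below. For a quantization example, see Obtaining More Samples > resnet50 > Performing Uniform Quantization.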
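| Quantization method | Supported layer type | Constraints |
|---|---|---|
| Uniform quantization | InnerProduct: fully connected layer | transpose is false and axis is 1 |
| | Convolution: convolution layer | filter has 4 dimensions |
| | Deconvolution: deconvolution layer | dilation is 1 and filter has 4 dimensions |
| | Pooling: average pooling layer | pooling method is AVE and the layer is not global pooling |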
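The constraints in the table can be expressed as a small predicate. The sketch below is only an illustration: `supports_uniform_quantization` and the dict keys (`type`, `transpose`, `axis`, `filter_dims`, `dilation`, `pool`, `global_pooling`) are hypothetical names, not part of Caffe or AMCT, and the fallback values are assumed to mirror the usual Caffe defaults.

```python
def supports_uniform_quantization(layer):
    """Return True if a layer description meets the constraints in the table.

    `layer` is a hypothetical plain dict, not a Caffe or AMCT object.
    """
    layer_type = layer.get("type")
    if layer_type == "InnerProduct":
        # transpose must be false and axis must be 1
        return layer.get("transpose", False) is False and layer.get("axis", 1) == 1
    if layer_type == "Convolution":
        # the filter must be 4-dimensional
        return layer.get("filter_dims") == 4
    if layer_type == "Deconvolution":
        # dilation must be 1 and the filter must be 4-dimensional
        return layer.get("dilation", 1) == 1 and layer.get("filter_dims") == 4
    if layer_type == "Pooling":
        # only average pooling that is not global pooling qualifies
        return layer.get("pool") == "AVE" and not layer.get("global_pooling", False)
    return False  # every other layer type is outside the table


conv_ok = supports_uniform_quantization({"type": "Convolution", "filter_dims": 4})
max_pool_ok = supports_uniform_quantization({"type": "Pooling", "pool": "MAX"})
```

A check of this kind could be used to decide which layer names to pass as skip layers before generating the quantization configuration.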
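Figure 1 shows the API call flow for uniform quantization. Uniform quantization cannot run on multiple GPUs at the same time.
The dataset is used to test the accuracy of the quantized data when the model runs inference in the Caffe environment. The calibration set is used to generate the quantization factors that preserve accuracy.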
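As a rough illustration of how a calibration set can yield quantization factors, the sketch below derives a scale and an offset from the minimum and maximum values seen across calibration batches. This plain min/max rule is a deliberately simplified stand-in and is not the algorithm AMCT uses; `calibrate_min_max` and the list-of-lists input format are assumptions made for this example.

```python
def calibrate_min_max(batches):
    """Derive an asymmetric INT8 scale and offset from calibration batches."""
    lo = min(min(batch) for batch in batches)
    hi = max(max(batch) for batch in batches)
    # Include 0.0 in the range so that zero maps exactly onto an integer.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    if hi == lo:
        return 1.0, 0  # degenerate all-zero data; any scale works
    scale = (hi - lo) / 255.0  # 256 INT8 levels span 255 steps
    offset = -128 - round(lo / scale)  # makes the minimum map to -128
    return scale, offset


# Two small calibration batches of activation values (made-up numbers).
scale, offset = calibrate_min_max([[0.0, 1.0], [2.55, 0.5]])
```

More calibration batches give the range estimate more data to work with, at the cost of a longer calibration run.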
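This section walks through the template code for post-training quantization. Reading the code shows how AMCT works and what its workflow is, so that you can adapt the template to quantize other network models.
See resnet50 > Performing Uniform Quantization for the sample code of this section. Post-training quantization consists of the following main steps:
The following procedure shows how to write a script that calls the AMCT APIs to quantize a model.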
```python
import amct_caffe as amct
```
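AMCT supports a CPU running mode and a GPU running mode, set with the set_cpu_mode and set_gpu_mode APIs respectively. The GPU mode depends on the Caffe framework: in this mode, the GPU device is selected through the Caffe APIs caffe.set_mode_gpu() and caffe.set_device(args.gpu_id), so configure the Caffe device mode first and then the AMCT device mode. Because the running device is already specified here, the model inference function does not need to configure it again. A code example follows: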
```python
# args.cpu is the flag defined in the Args class
if args.gpu_id is not None and not args.cpu:
    caffe.set_mode_gpu()
    caffe.set_device(args.gpu_id)
    amct.set_gpu_mode()
else:
    caffe.set_mode_cpu()
```
```python
# Run original model without quantize test
if args.pre_test:
    run_caffe_model(args.model_file, args.weights_file, args.iterations)
    print('[INFO]Run %s without quantize success!' % (args.model_name))
    return
```
```python
# Generate quantize configurations
config_json_file = 'tmp/config.json'
batch_num = 2
if args.cfg_define is not None:
    amct.create_quant_config(config_json_file,
                             args.model_file,
                             args.weights_file,
                             config_defination=args.cfg_define)
else:
    skip_layers = []
    amct.create_quant_config(config_json_file,
                             args.model_file,
                             args.weights_file,
                             skip_layers,
                             batch_num)
```
```python
# Phase0: Init amct task
scale_offset_record_file = 'tmp/scale_offset_record.txt'
graph = amct.init(config_json_file,
                  args.model_file,
                  args.weights_file,
                  scale_offset_record_file)
```
```python
# Phase1: Do conv+bn+scale fusion, weights calibration and fake
#         quantize, insert data-quantize layer
modified_model_file = 'tmp/modified_model.prototxt'
modified_weights_file = 'tmp/modified_model.caffemodel'
amct.quantize_model(graph, modified_model_file, modified_weights_file)
```
```python
# Phase2: run caffe model to do activation calibration
run_caffe_model(modified_model_file, modified_weights_file, batch_num)
```
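If the message "IfmrQuantWithOffset scale is illegal" is displayed during calibration, see the troubleshooting topic for the message "IfmrQuantCalibration with offset scale is illegal" or "IfmrQuantCalibration without offset scale is illegal" displayed during calibration.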
```python
# Phase3: save final model, one for caffe do fake quant test, one
#         deploy model for ATC
result_path = 'results/%s' % (args.model_name)
amct.save_model(graph, 'Both', result_path)
```
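If the message "Error: Cannot find scale_d of layer '**' in record file" is displayed when the model is saved, see the troubleshooting topic for the message "Error: Cannot find scale_d of layer '**' in record file" displayed during quantization.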
```python
# Phase4: if need test quantized model, uncomment to do final fake quant
#         model test.
fake_quant_model = 'results/{0}_fake_quant_model.prototxt'.format(args.model_name)
fake_quant_weights = 'results/{0}_fake_quant_weights.caffemodel'.format(args.model_name)
run_caffe_model(fake_quant_model, fake_quant_weights, args.iterations)
```
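To quantize your own model with the sample code above, modify parts of the code as follows:
This is used to pass in the execution arguments that AMCT uses. (This step is optional: you can implement the same function in any other way, or write the arguments directly into the sample code.) A code example follows: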
```python
class Args(object):
    """struct for Args"""
    def __init__(self):
        self.model_name = ''    # Caffe model name as prefix to save model
        self.model_file = ''    # user caffe model txt define file
        self.weights_file = ''  # user caffe model binary weights file
        self.cpu = True         # If True, force to CPU mode, else set to False
        self.gpu_id = 0         # Set the gpu id to use
        self.pre_test = False   # Set true to run original model test, set
                                # False to run quantize with amct_caffe tool
        self.iterations = 5     # Iteration to run caffe model
        self.cfg_define = None  # If None use

args = Args()
#############################user modified start#########################
"""User set basic info to use amct_caffe tool
"""
# e.g.
args.model_name = 'ResNet50'
args.model_file = 'pre_model/ResNet-50-deploy.prototxt'
args.weights_file = 'pre_model/ResNet-50-model.caffemodel'
args.cpu = True
args.gpu_id = None
args.pre_test = False
args.iterations = 5
args.cfg_define = None
#############################user modified end###########################
```
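Since the arguments can be supplied in any way, one alternative is to read them from the command line. The sketch below uses the standard `argparse` module to produce an object with the same field names as the Args class; the function name `parse_args`, the flag spellings, and the defaults are assumptions made for this example and are not taken from the AMCT sample.

```python
import argparse


def parse_args(argv=None):
    """Build the same fields as the Args class from command-line flags."""
    parser = argparse.ArgumentParser(description="amct_caffe sample arguments")
    parser.add_argument("--model_name", default="ResNet50",
                        help="prefix used when saving the quantized model")
    parser.add_argument("--model_file", required=True,
                        help="Caffe model definition (.prototxt)")
    parser.add_argument("--weights_file", required=True,
                        help="Caffe binary weights (.caffemodel)")
    parser.add_argument("--cpu", action="store_true",
                        help="force CPU mode")
    parser.add_argument("--gpu_id", type=int, default=None,
                        help="GPU device id to use")
    parser.add_argument("--pre_test", action="store_true",
                        help="only run the original model, without quantization")
    parser.add_argument("--iterations", type=int, default=5,
                        help="number of forward iterations")
    parser.add_argument("--cfg_define", default=None,
                        help="optional simplified configuration file")
    return parser.parse_args(argv)


# Passing a list makes the function easy to try without a real command line.
args = parse_args(["--model_file", "pre_model/ResNet-50-deploy.prototxt",
                   "--weights_file", "pre_model/ResNet-50-model.caffemodel",
                   "--gpu_id", "0"])
```

Calling `parse_args()` with no argument reads `sys.argv` instead, so the same script can be driven from a shell.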
A code example follows:
```python
def run_caffe_model(model_file, weights_file, iterations):
    """run caffe model forward"""
    net = caffe.Net(model_file, weights_file, caffe.TEST)
    #############################user modified start#########################
    """User modified to execute caffe model forward
    """
    # # e.g.
    # for iter_num in range(iterations):
    #     data = get_data()
    #     forward_kwargs = {'data': data}
    #     blobs_out = net.forward(**forward_kwargs)
    #     # if have label and need check network forward result
    #     post_process(blobs_out)
    # return
    #############################user modified end###########################
```
The code is explained below. You need to implement inference for the passed-in model according to your own network:
```python
net = caffe.Net(model_file, weights_file, caffe.TEST)
```

```python
data = get_data()
forward_kwargs = {'data': data}
```

```python
blobs_out = net.forward(**forward_kwargs)
```

```python
post_process(blobs_out)
```
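The four fragments above fit together as a single loop. The sketch below shows that loop with the network, the data source, and the post-processing passed in as arguments, so it runs without Caffe installed. `run_model`, `FakeNet`, `top1`, and the `'prob'` output name are hypothetical stand-ins; in a real script, `net` would be the `caffe.Net` object and the input blob name would have to match your model.

```python
def run_model(net, get_data, post_process, iterations):
    """Run `iterations` forward passes and collect the post-processed results."""
    results = []
    for _ in range(iterations):
        data = get_data()
        forward_kwargs = {'data': data}  # key must match the model's input blob
        blobs_out = net.forward(**forward_kwargs)
        results.append(post_process(blobs_out))
    return results


class FakeNet:
    """Stand-in for caffe.Net so the sketch runs without Caffe installed."""

    def forward(self, **kwargs):
        # Pretend the network returns one score per input value.
        return {'prob': [x * 2 for x in kwargs['data']]}


def top1(blobs_out):
    """Index of the highest score in the hypothetical 'prob' output blob."""
    scores = blobs_out['prob']
    return max(range(len(scores)), key=scores.__getitem__)


predictions = run_model(FakeNet(), lambda: [0.1, 0.7, 0.2], top1, iterations=3)
```

Keeping data loading and post-processing as separate callables means the same loop can serve both the calibration run and the final accuracy test, with only the iteration count and the model files changing.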