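Estimator Migration

Introduction to Estimator

The Estimator API is a high-level TensorFlow API, introduced in TensorFlow 1.10 (released in 2018), that greatly simplifies machine learning programming. Its advantages include good support for distributed training, simpler model creation, and easier code sharing among model developers.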

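The workflow for developing a training script with Estimator is:

Data preprocessing: create the input function input_fn.
Model construction: build the model function model_fn.
Run configuration: instantiate the Estimator and pass in a RunConfig object as the run parameters.
Training: call Estimator.train() on the Estimator to train the model on the specified input for a fixed number of steps.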
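
The control flow of these four steps can be sketched in plain Python. This is only an illustration of how input_fn, model_fn, the run configuration, and train() fit together: ToyEstimator and its config dictionary are hypothetical stand-ins written for this sketch, not the TensorFlow Estimator or RunConfig classes.

```python
# Plain-Python stand-in for the four-step workflow. ToyEstimator is
# hypothetical and only mimics the call pattern; it is not
# tf.estimator.Estimator.

def input_fn():
    """Step 1: yield (feature, label) pairs; here label = 3 * feature."""
    data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
    while True:  # repeat the data forever, similar to dataset.repeat()
        for pair in data:
            yield pair


def model_fn(feature, label, weight):
    """Step 2: return the squared-error loss and its gradient w.r.t. weight."""
    error = weight * feature - label
    return error * error, 2.0 * error * feature


class ToyEstimator:
    def __init__(self, model_fn, config):
        self.model_fn = model_fn
        self.config = config  # Step 3: run configuration (a plain dict here)
        self.weight = 0.0

    def train(self, input_fn, steps):
        """Step 4: run a fixed number of gradient-descent steps."""
        samples = input_fn()
        for _ in range(steps):
            feature, label = next(samples)
            _, grad = self.model_fn(feature, label, self.weight)
            self.weight -= self.config["learning_rate"] * grad
        return self


estimator = ToyEstimator(model_fn, config={"learning_rate": 0.05})
estimator.train(input_fn, steps=200)
print(round(estimator.weight, 6))
```

In a real script the same roles are played by tf.estimator.Estimator, tf.estimator.RunConfig, and a tf.data-based input_fn.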

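The following describes how to migrate an Estimator training script so that it can train on the Ascend AI Processor.

Adding the Import

In each Python file modified in the following steps, add the following import statement to bring in the NPU-related libraries.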

from npu_bridge.npu_init import *

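Data Preprocessing

In most cases this part of the code needs no changes. Adaptation is required in the following case:

When the original script uses dataset.batch(batch_size), the batch dimension is dynamic: the number of samples left at the end of the data stream may be smaller than the batch size, so the shape in the last step differs from the shape in the earlier steps, and the network goes through the dynamic-shape compilation flow. To improve compilation performance, set drop_remainder to True. This discards the last few samples in the file and ensures that every step has the same shape.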

  dataset = dataset.batch(batch_size, drop_remainder=True)
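
The effect of drop_remainder on batch shapes can be illustrated without TensorFlow. The batch() helper below is a hypothetical plain-Python imitation of the batching behavior described above, not the tf.data implementation.

```python
def batch(samples, batch_size, drop_remainder=False):
    """Split samples into consecutive batches, optionally dropping a short tail."""
    batches = [samples[i:i + batch_size]
               for i in range(0, len(samples), batch_size)]
    if drop_remainder and batches and len(batches[-1]) < batch_size:
        batches.pop()  # discard the final, smaller batch
    return batches


samples = list(range(10))

# Without drop_remainder the last batch is smaller, so its shape differs.
dynamic_sizes = [len(b) for b in batch(samples, 4)]

# With drop_remainder every batch has the same size.
static_sizes = [len(b) for b in batch(samples, 4, drop_remainder=True)]

print(dynamic_sizes)  # [4, 4, 2]
print(static_sizes)   # [4, 4]
```

With 10 samples and a batch size of 4, the last 2 samples are never seen in that pass when drop_remainder is enabled.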

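Note, however, that during inference, if the last iteration has fewer samples than the batch size, you need to pad the data up to the batch size with blank samples. Otherwise the leftover samples are dropped, and some scripts end with an assertion that the number of results equals the number of evaluation examples.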

 assert num_written_lines == num_actual_predict_examples
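
One way to keep a fixed batch shape and still satisfy such an assertion is to pad the last batch and then discard the predictions that belong to the padding. The sketch below shows the idea in plain Python; predict_in_fixed_batches, fake_predict_batch, and the pad value 0 are hypothetical names and choices made for this illustration, and a real script would pad tensors or TFRecord examples instead of list elements.

```python
BATCH_SIZE = 4


def fake_predict_batch(batch):
    """Hypothetical stand-in for a model that only accepts full batches."""
    assert len(batch) == BATCH_SIZE
    return [2 * value for value in batch]


def predict_in_fixed_batches(examples, batch_size, predict_batch, pad_value=0):
    """Pad the last batch to batch_size, then keep only the real predictions."""
    results = []
    for start in range(0, len(examples), batch_size):
        chunk = examples[start:start + batch_size]
        num_real = len(chunk)
        padded = chunk + [pad_value] * (batch_size - num_real)
        outputs = predict_batch(padded)
        results.extend(outputs[:num_real])  # drop outputs for padded entries
    return results


examples = [1, 2, 3, 4, 5]
predictions = predict_in_fixed_batches(examples, BATCH_SIZE, fake_predict_batch)

# Mirrors the kind of check found at the end of some scripts.
assert len(predictions) == len(examples)
print(predictions)  # [2, 4, 6, 8, 10]
```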

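Model Construction

In most cases this part of the code needs no changes. Adaptation is required in the following cases:

For dropout in the original network, replacing it with the corresponding CANN API is recommended for better performance, but check the impact on network accuracy.
- If tf.nn.dropout is used, change it to: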
```
layers = npu_ops.dropout()
```
- If tf.layers.dropout, tf.layers.Dropout, tf.keras.layers.Dropout, tf.keras.layers.SpatialDropout1D, tf.keras.layers.SpatialDropout2D, or tf.keras.layers.SpatialDropout3D is used, add the following import statement:
```
from npu_bridge.estimator.npu import npu_convert_dropout
```

For gelu in the original network, replacing it with the corresponding CANN API is recommended for better performance.

Original TensorFlow code:

import numpy as np
import tensorflow as tf

def gelu(x):
  # tanh approximation of GELU
  cdf = 0.5 * (1.0 + tf.tanh(
     (np.sqrt(2 / np.pi) * (x + 0.044715 * tf.pow(x, 3)))))
  return x * cdf
layers = gelu(x)

Migrated code:

layers = npu_unary_ops.gelu(x)
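
When checking accuracy after the replacement, it can help to know how the tanh formula in the original code compares with the exact, erf-based definition of GELU. The plain-Python sketch below evaluates both on a few scalars. Which of the two forms npu_unary_ops.gelu computes internally is not stated here, so treat this only as a reference for the original formula.

```python
import math


def gelu_tanh(x):
    """tanh approximation of GELU, the same formula as the original TensorFlow code."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def gelu_exact(x):
    """Exact GELU: x times the standard normal CDF evaluated at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


points = (-2.0, -1.0, 0.0, 1.0, 2.0)
max_gap = max(abs(gelu_tanh(x) - gelu_exact(x)) for x in points)

for x in points:
    print(x, gelu_tanh(x), gelu_exact(x))
```

On these sample points the two forms differ by less than 1e-3, so small numerical differences after migration are not unusual.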

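Run Configuration

TensorFlow configures run parameters through RunConfig; you need to migrate RunConfig to NPURunConfig. Because NPURunConfig inherits from RunConfig, the script can be modified directly as in the following example, and most parameters stay unchanged.

Original TensorFlow code: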

config=tf.estimator.RunConfig(
  model_dir=FLAGS.model_dir, 
  save_checkpoints_steps=FLAGS.save_checkpoints_steps,
  session_config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=False))

Migrated code:

npu_config=NPURunConfig(
  model_dir=FLAGS.model_dir,
  save_checkpoints_steps=FLAGS.save_checkpoints_steps,
  # If the original network uses tf.device, add the session setting allow_soft_placement=True so that TensorFlow can assign devices automatically.
  session_config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=False) 
  )

However, some parameters (train_distribute, device_fn, protocol, eval_distribute, and experimental_distribute) are not supported by NPURunConfig. If the original script uses any of them, remove them.
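
If the RunConfig arguments are collected in a dictionary before the object is built, the unsupported ones can be filtered out in one place. The helper below is a hypothetical plain-Python sketch: split_run_config_kwargs is not part of npu_bridge, and the list of names is taken from the sentence above.

```python
# Parameter names listed above as unsupported by NPURunConfig.
UNSUPPORTED_PARAMS = (
    "train_distribute",
    "device_fn",
    "protocol",
    "eval_distribute",
    "experimental_distribute",
)


def split_run_config_kwargs(kwargs):
    """Return (kept, dropped): kwargs without unsupported keys, and the removed keys."""
    kept = {k: v for k, v in kwargs.items() if k not in UNSUPPORTED_PARAMS}
    dropped = sorted(k for k in kwargs if k in UNSUPPORTED_PARAMS)
    return kept, dropped


original_kwargs = {
    "model_dir": "/tmp/model",       # illustrative value
    "save_checkpoints_steps": 1000,  # illustrative value
    "train_distribute": None,
    "device_fn": None,
}
kept, dropped = split_run_config_kwargs(original_kwargs)
print(kept)     # {'model_dir': '/tmp/model', 'save_checkpoints_steps': 1000}
print(dropped)  # ['device_fn', 'train_distribute']
```

The kept dictionary could then be passed on as NPURunConfig(**kept), and dropped can be logged so that the removal is visible.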

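If the original network uses tf.device, add the session setting allow_soft_placement=True so that TensorFlow can assign devices automatically.

NPURunConfig also adds parameters that can improve training performance and accuracy, such as iterations_per_loop and precision_mode. For details, see the NPURunConfig constructor.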

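Creating the Estimator

You need to migrate TensorFlow's Estimator to NPUEstimator. Because NPUEstimator inherits from Estimator, you can simply switch the interface as in the following example; the parameters can stay unchanged.

Original TensorFlow code: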

mnist_classifier=tf.estimator.Estimator(
  model_fn=cnn_model_fn,
  config=config,
  model_dir="/tmp/mnist_convnet_model")

Migrated code:

mnist_classifier=NPUEstimator(
  model_fn=cnn_model_fn,
  config=npu_config,
  model_dir="/tmp/mnist_convnet_model"
  )

Running Training

Train the model on the specified input. This part of the code needs no changes.

mnist_classifier.train(
  input_fn=train_input_fn,
  steps=20000,
  hooks=[logging_hook])
