Baichuan 13B-PyTorch

baichuan-13B

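Overview

Introduction

LLaMA Factory is an easy-to-use LLM fine-tuning framework with a simple web interface. Self-cognition fine-tuning of a model takes as little as 10 minutes. It supports many mainstream open-source models such as LLaMA and Falcon, and offers customization options such as language and model path. After training, you can evaluate the model and export it for use in other systems.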

Reference implementation:

url=https://github.com/hiyouga/LLaMA-Factory/commits/v0.2.0
commit_id=7a5318804870b1f2bedec8d4a676e465b48d5c3e

Implementation adapted for Ascend AI processors:

url=https://gitee.com/ascend/ModelZoo-PyTorch.git
code_path=PyTorch/built-in/foundation

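Preparing the Training Environment

Preparing the Environment

The PyTorch version supported by this model and its known third-party dependencies are listed in the table below.

Table 1 Supported versions

Torch_Version	Third-party dependency versions
PyTorch 2.0	transformers==4.31.0; accelerate==0.21.0

Environment preparation guide.

Refer to "PyTorch Framework Training Environment Preparation" (《Pytorch框架训练环境准备》).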

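Install dependencies.

In the root directory of the model source package, run the following commands to install the dependencies required by the model's PyTorch version.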

  # python3.8
  conda create -n test python=3.8
  conda activate test

  # install torch and torch_npu
  pip install torch2.0.1-cp38-XXX.whl
  pip install torch_npu-2.0.1-XXX.whl
  pip install apex-0.1_ascend_XXX.whl

  # install deepspeed and deepspeed_npu
  pip install deepspeed==0.9.2
  git clone https://gitee.com/ascend/DeepSpeed.git -b v0.9.2 deepspeed_npu
  cd deepspeed_npu
  pip3 install -e ./
  cd ..


  # install other packages
  pip install -r requirements.txt
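
After installation it can help to confirm that the pinned packages from Table 1 are what actually ended up in the environment. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `find_mismatches` are illustrative and not part of this repository.

```python
from importlib import metadata

# Pins taken from Table 1; adjust if your environment differs.
REQUIRED = {"transformers": "4.31.0", "accelerate": "0.21.0"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(required, lookup=installed_version):
    """Return {package: (expected, found)} for every pin that is not satisfied."""
    mismatches = {}
    for package, expected in required.items():
        found = lookup(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in find_mismatches(REQUIRED).items():
        print(f"{package}: expected {expected}, found {found}")
```

An empty result means both pins are satisfied; a `None` in the "found" position means the package is not installed at all.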

Preparing the Dataset

The datasets required for pre-training already exist under the project's "./data" path.

data/
├── alpaca_data_en_52k.json
├── alpaca_data_zh_51k.json
├── alpaca_gpt4_data_en.json
├── alpaca_gpt4_data_zh.json
├── belle_multiturn
│   └── belle_multiturn.py
├── comparison_gpt4_data_en.json
├── comparison_gpt4_data_zh.json
├── dataset_info.json
├── example_dataset
│   ├── example_dataset.py
│   └── examples.json
├── hh_rlhf_en
│   └── hh_rlhf_en.py
├── lima.json
├── oaast_rm.json
├── oaast_rm_zh.json
├── oaast_sft.json
├── oaast_sft_zh.json
├── README.md
├── README_zh.md
├── self_cognition.json
├── sharegpt_zh_27k.json
├── ultra_chat
│   └── ultra_chat.py
└── wiki_demo.txt
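
Before training, a quick sanity check of a dataset file can catch truncated or malformed downloads. The sketch below assumes that files such as alpaca_data_en_52k.json are JSON arrays of Alpaca-style records with `instruction`, `input`, and `output` fields; this README does not state the schema, so treat the field names as an assumption and adjust them to the actual data.

```python
import json
from pathlib import Path

# Assumed field names of an Alpaca-style record; not stated in this README.
EXPECTED_KEYS = ("instruction", "input", "output")


def count_incomplete_records(records, keys=EXPECTED_KEYS):
    """Count records that are not dicts or lack any of the expected keys."""
    return sum(
        1
        for record in records
        if not isinstance(record, dict) or any(key not in record for key in keys)
    )


def check_dataset_file(path):
    """Load a JSON array from `path` and return (total, incomplete)."""
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    return len(records), count_incomplete_records(records)
```

For example, `check_dataset_file("data/alpaca_data_en_52k.json")` returns the number of records and how many of them are missing an expected field.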

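Preparing the Pretrained Weights

Obtain the model configuration files and weight files from the link and place them in the model directory. Fine-tuning depends on these weights. The folder contents are as follows: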

├──model
    ├── config.json
    ├── configuration_baichuan.py
    ├── generation_config.json
    ├── modeling_baichuan.py
    ├── pytorch_model-00001-of-00003.bin
    ├── pytorch_model-00002-of-00003.bin
    ├── pytorch_model-00003-of-00003.bin
    ├── pytorch_model.bin.index.json
    ├── quantizer.py
    ├── requirements.txt
    ├── special_tokens_map.json
    ├── tokenization_baichuan.py
    ├── tokenizer_config.json
    └── tokenizer.model
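
Because the weights are split across three shard files, an incomplete download is easy to miss. As a rough sketch, the shard names can be read from pytorch_model.bin.index.json (whose `weight_map` maps each parameter name to the shard that stores it) and compared with what is on disk. The function names here are illustrative only.

```python
import json
from pathlib import Path


def referenced_shards(index):
    """Return the sorted, de-duplicated shard file names listed in a sharded-checkpoint index."""
    return sorted(set(index["weight_map"].values()))


def missing_weight_shards(model_dir):
    """Return shard file names referenced by the index but absent from model_dir."""
    model_dir = Path(model_dir)
    index_path = model_dir / "pytorch_model.bin.index.json"
    index = json.loads(index_path.read_text(encoding="utf-8"))
    return [name for name in referenced_shards(index) if not (model_dir / name).is_file()]
```

Calling `missing_weight_shards("../model")` should return an empty list when all shards are present.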

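Configuring the Two-Machine Communication Environment

1. Install pdsh. URL: https://github.com/chaos/pdsh/tree/pdsh-2.29

Installation

wget https://github.com/chaos/pdsh/archive/refs/tags/pdsh-2.29.tar.gz

tar -zxvf pdsh-2.29.tar.gz
# If the archive extracts to a different directory name, cd into that directory instead.
cd pdsh-2.29
./configure --with-ssh --with-rsh --with-mrsh --with-mqshell --with-qshell  --with-dshgroups --with-machines=/etc/pdsh/machines  --without-pam

make
make install

After installation, run pdsh -h. Output like the following indicates that the installation succeeded.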

# pdsh -h
Usage: pdsh [-options] command ...
-S                return largest of remote command return values
-h                output usage menu and quit
-V                output version information and quit
-q                list the option settings and quit
-b                disable ^C status feature (batch mode)
-d                enable extra debug information from ^C status
-l user           execute remote commands as user
-t seconds        set connect timeout (default is 10 sec)
-u seconds        set command timeout (no default)
-f n              use fanout of n nodes
-w host,host,...  set target node list on command line
-x host,host,...  set node exclusion list on command line
-R name           set rcmd module to name
-M name,...       select one or more misc modules to initialize first
-N                disable hostname: labels on output lines
-L                list info on all loaded modules and exit
-g groupname      target hosts in dsh group "groupname"
-X groupname      exclude hosts in dsh group "groupname"
-a                target all nodes
available rcmd modules: ssh,rsh,exec (default: rsh)

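2. Two-machine communication configuration

First, edit the /etc/hosts file on both servers and add both servers' IP addresses, replacing ip1 and ip2 with the actual IP addresses.

vim /etc/hosts

ip1 node1
ip2 node2

Next, run the following command to generate an SSH key.

ssh-keygen -t rsa

Then copy the SSH key to every node, including the local machine.

ssh-copy-id root@ip1
ssh-copy-id root@ip2

Finally, run the following commands on each node. The first time, you must type yes manually, then run exit to log out. If running the commands again does not prompt for a password, the configuration succeeded.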

ssh node1
ssh node2
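
The manual check above can also be scripted. The sketch below runs `ssh` with `BatchMode=yes`, which makes the client fail instead of prompting, so a non-zero exit code indicates that a password or host-key confirmation is still required. The helper names and the 5-second connect timeout are illustrative choices, not part of this repository.

```python
import subprocess


def ssh_check_command(host):
    """Build an ssh command that fails instead of prompting for input."""
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"]


def passwordless_ssh_ok(host, run=subprocess.run):
    """Return True if `ssh host true` succeeds without any interactive input."""
    try:
        result = run(ssh_check_command(host), capture_output=True, timeout=15)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # ssh is not installed, or the host did not answer in time.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    for node in ("node1", "node2"):
        print(node, "ok" if passwordless_ssh_ok(node) else "needs attention")
```

Run it on each node after the first manual login, since the initial host-key confirmation still has to be accepted by hand.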

Starting Training

Preparing the Code

git clone https://gitee.com/ascend/ModelZoo-PyTorch.git
cd ModelZoo-PyTorch/PyTorch/built-in/foundation/Baichuan-13B

git clone https://github.com/hiyouga/LLaMA-Factory.git
cd ${model_folder_name}
git checkout 7a5318804870b1f2bedec8d4a676e465b48d5c3e

Then:

Replace train_bash.py under ./${model_folder_name}/src with train_bash.py from the utils directory.
Replace misc.py under ./${model_folder_name}/src/llmtuner/extras with misc.py from the utils directory.
Replace modeling_baichuan.py under ../model with modeling_baichuan.py from the utils directory.

cp ../run_baichuan_sft_1m.sh .
cp ../ds_config_zero3.json .

cp ../utils/train_bash.py ./src
cp ../utils/misc.py ./src/llmtuner/extras
cp ../utils/modeling_baichuan.py ../model

Single-Machine Launch

1. Copy run_baichuan_sft_1m.sh and ds_config_zero3.json to the ${model_folder_name} path.

cp ../run_baichuan_sft_1m.sh .
cp ../ds_config_zero3.json .

2. Launch the script. First configure run_baichuan_sft_1m.sh.

# Set MODEL_PATH
MODEL_PATH="../model"

Then run the following command to start training.

sh run_baichuan_sft_1m.sh

Two-Machine Launch

1. Copy run_baichuan_sft_2m.sh, ds_config_zero2.json, and hostfile to the ${model_folder_name} path.

cp ../run_baichuan_sft_2m.sh .
cp ../ds_config_zero2.json .
cp ../hostfile .

2. Launch the script. First configure run_baichuan_sft_2m.sh.

# Set MODEL_PATH
MODEL_PATH="../model"

Then run the following command to start fine-tuning on two machines with 16 cards.

sh run_baichuan_sft_2m.sh
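
The contents of the hostfile are not shown in this README. DeepSpeed hostfiles normally list one host per line in the form `hostname slots=N`, so a two-machine, 16-card setup would plausibly look like the example string below; treat it as an assumption and compare it with the hostfile shipped in the repository. The sketch parses such a file and totals the slots.

```python
def parse_hostfile(text):
    """Parse DeepSpeed-style hostfile text into {hostname: slots}."""
    hosts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, slots = line.split()
        key, value = slots.split("=")
        if key != "slots":
            raise ValueError(f"unexpected field: {slots}")
        hosts[name] = int(value)
    return hosts


# Assumed example: two nodes with 8 devices each (16 cards in total).
EXAMPLE_HOSTFILE = """
node1 slots=8
node2 slots=8
"""

if __name__ == "__main__":
    hosts = parse_hostfile(EXAMPLE_HOSTFILE)
    print(hosts, "total devices:", sum(hosts.values()))
```

The host names must resolve on every node, which is why /etc/hosts was edited earlier.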

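Some of the training parameters are described below:

--deepspeed                     // Use the DeepSpeed distributed training framework.
--dataset                       // Training dataset to use.
--finetuning_type               // Fine-tuning type.
--output_dir                    // Output directory.
--per_device_train_batch_size   // Training batch size per device.
--gradient_accumulation_steps   // Number of gradient accumulation steps.
--learning_rate                 // Learning rate.
--num_train_epochs              // Number of training epochs.
--fp16                          // Train with fp16 floating-point precision.

Note: Two-machine training can also be run with the zero3 strategy. To ensure two-machine training succeeds, keep the environment and paths identical on both machines, including the project path, conda environment, CANN, and drivers. After training, the weight files are saved under the path given by --output_dir, and training information is printed.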
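
Two of the parameters above, `--per_device_train_batch_size` and `--gradient_accumulation_steps`, combine with the number of devices to give the number of samples consumed per optimizer step. A small sketch of that arithmetic follows; the numeric values are hypothetical and are not taken from the launch scripts.

```python
def global_batch_size(per_device_train_batch_size, gradient_accumulation_steps, num_devices):
    """Samples consumed per optimizer step in data-parallel training."""
    return per_device_train_batch_size * gradient_accumulation_steps * num_devices


if __name__ == "__main__":
    # Hypothetical values for illustration: 2 samples per device,
    # 4 accumulation steps, 16 devices (two machines with 8 cards each).
    print(global_batch_size(2, 4, 16))
```

When moving from one machine to two, halving `--gradient_accumulation_steps` keeps this product, and therefore the effective batch size, unchanged.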

Training Results

Table 2 Training results

Device	Torch_Version	total epochs	train loss	train samples per second	train steps per second
16p-NPUs	2.0.1	10.0	0.903	11.378	0.022
16p-Competitor	2.0.1	10.0	0.903	9.3	0.018
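
To read Table 2 as a relative speed, divide the NPU throughput by the competitor throughput. The short sketch below does this for the samples-per-second column; the variable names are illustrative.

```python
# Throughput values copied from Table 2 (train samples per second).
NPU_SAMPLES_PER_SECOND = 11.378
COMPETITOR_SAMPLES_PER_SECOND = 9.3


def throughput_ratio(measured, baseline):
    """Return how many times faster `measured` is than `baseline`."""
    return measured / baseline


if __name__ == "__main__":
    ratio = throughput_ratio(NPU_SAMPLES_PER_SECOND, COMPETITOR_SAMPLES_PER_SECOND)
    print(f"{ratio:.2f}x")
```

With the table values the ratio is about 1.22, at the same reported train loss.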

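Inference

Setting Up the Inference Environment

Set up the inference environment in the same way as the training environment above.
Prepare the inference weights. Obtain the model configuration files and weight files from the link and place them in the Baichuan-13B-Chat directory.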

├── config.json
├── configuration_baichuan.py
├── generation_config.json
├── generation_utils.py
├── handler.py
├── modeling_baichuan.py
├── pytorch_model-00001-of-00003.bin
├── pytorch_model-00002-of-00003.bin
├── pytorch_model-00003-of-00003.bin
├── pytorch_model.bin.index.json
├── quantizer.py
├── README.md
├── requirements.txt
├── special_tokens_map.json
├── tokenization_baichuan.py
├── tokenizer_config.json
└── tokenizer.model

Inference Script

1) Run vim infer.py to create the inference script, write the code below into infer.py, then press Esc and type :wq to save and exit.

import torch
import torch_npu
from torch_npu.contrib import transfer_to_npu
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig

model_weight_path = 'Baichuan-13B-Chat/'
tokenizer = AutoTokenizer.from_pretrained(model_weight_path, use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_weight_path, device_map="npu:1", torch_dtype=torch.float16, trust_remote_code=True)
model.generation_config = GenerationConfig.from_pretrained(model_weight_path)
messages = []
messages.append({"role": "user", "content":"解释一下“温故而知新”" })
response = model.chat(tokenizer, messages)

print(response)

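Configuration parameters in infer.py:

# Point the model weight path at the folder of weights and configuration files downloaded above.
model_weight_path = 'Baichuan-13B-Chat/'     # model weights
device_map="npu:1"                           # NPU card to run on

2) Run the following command to execute the inference task.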

 python infer.py
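
infer.py sends a single user message. For a multi-turn conversation, one plausible approach is to append the model's reply and the next question to the same `messages` list before calling `model.chat` again. The sketch below only builds the list; the `assistant` role name and the `add_turn` helper are assumptions for illustration and are not confirmed by this README.

```python
def add_turn(messages, role, content):
    """Return a new message list with one more turn appended."""
    if role not in ("user", "assistant"):
        raise ValueError(f"unsupported role: {role!r}")
    return messages + [{"role": role, "content": content}]


# First turn, same message shape as in infer.py.
messages = add_turn([], "user", "first question")

# After `response = model.chat(tokenizer, messages)` in infer.py, a follow-up
# turn could be built like this (left commented out because it needs the model):
# messages = add_turn(messages, "assistant", response)
# messages = add_turn(messages, "user", "follow-up question")
# response = model.chat(tokenizer, messages)
```

The helper returns a new list rather than mutating its argument, so earlier conversation states stay available for logging or retries.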

Inference Results

嘃 温故而知新,可以为师矣。 解释:温习学过的知识,从而得到新的理解和体会。也指回忆过去,能更好地认识现在。 温故而知新,可以为师矣。 解释:复习旧的知识,能够从中有新的收获。这样的人就可以做老师了。 温故而知新,可以为师矣。 解释:复习旧的知识,能够从中有新的收获。这样的人就可以做老师了。

Evaluation

Preparing the Evaluation Task Datasets

The evaluation task datasets already exist in the evaluation directory:

evaluation
├── ceval
│   ├── ceval.py
│   ├── ceval.zip
│   └── mapping.json
├── cmmlu
│   ├── cmmlu.py
│   ├── cmmlu.zip
│   └── mapping.json
└── mmlu
    ├── mapping.json
    ├── mmlu.py
    └── mmlu.zip

Running the Evaluation Task

Run vim evaluation.sh to create the evaluation script, write the code below into evaluation.sh, then press Esc and type :wq to save and exit.

#!/bin/bash

MODEL_NAME_OR_PATH=./model_weight
CHECKPOINT=./model_weight


ASCEND_RT_VISIBLE_DEVICES=1 python src/evaluate.py \
    --model_name_or_path $MODEL_NAME_OR_PATH \
    --finetuning_type full \
    --checkpoint_dir $CHECKPOINT \
    --template default \
    --task ceval \
    --split validation \
    --lang en \
    --n_shot 5 \
    --batch_size 4

Then run the following command to execute the evaluation task.

bash evaluation.sh
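
If the same evaluation needs to be repeated with a different device or task, the command line from evaluation.sh can be assembled programmatically. The sketch below reuses exactly the flags shown in the script; the `build_eval_command` helper is hypothetical, and whether the `validation` split and `en` language setting suit the other task folders (cmmlu, mmlu) is an assumption to verify before use.

```python
import shlex


def build_eval_command(model_path, task="ceval", device_id=1, split="validation",
                       lang="en", n_shot=5, batch_size=4):
    """Build the shell command used in evaluation.sh as a single string."""
    args = [
        "python", "src/evaluate.py",
        "--model_name_or_path", model_path,
        "--finetuning_type", "full",
        "--checkpoint_dir", model_path,
        "--template", "default",
        "--task", task,
        "--split", split,
        "--lang", lang,
        "--n_shot", str(n_shot),
        "--batch_size", str(batch_size),
    ]
    # shlex.join quotes arguments that contain spaces or shell metacharacters.
    return f"ASCEND_RT_VISIBLE_DEVICES={device_id} " + shlex.join(args)


if __name__ == "__main__":
    print(build_eval_command("./model_weight"))
```

The printed string can be pasted into a shell or written to a script file.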

Evaluation Results

Table 3 Evaluation results

Task	Model	Ascend value	Reference value	Community value
CEval	Baichuan-13B	43.98	42.72	--

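FAQ

To adapt to the v0.2.0 code, make the following changes after configuring the runtime environment:

1. Check for the following Python packages and install the specified versions.

pip install trl==0.7.2
pip install transformers==4.31.0
pip install transformers_stream_generator decorator absl-py cloudpickle synr==0.5.0 tornado

2. Modify the deepspeed version check.

Comment out the deepspeed version check at line 65 of ${conda_env_path}/lib/python3.8/site-packages/transformers/deepspeed.py.
Change line 289 of ${conda_env_path}/lib/python3.8/site-packages/accelerate/accelerator.py to if compare_versions("deepspeed", "<", "0.9.2"):

3. If an ssh error is reported, replace node1 and node2 in the hostfile with the actual IP addresses.

4. If a timeout error is reported, add the environment variable export HCCL_ALGO="level1:H-D_R".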
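
FAQ item 2 changes the accelerate check to compare against "0.9.2", presumably so that the installed DeepSpeed 0.9.2 is not rejected as too old. The sketch below illustrates the comparison with a simplified tuple-based parser for plain dotted versions; it is not accelerate's own `compare_versions` implementation.

```python
def parse_version(text):
    """Turn a plain dotted version such as '0.9.2' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_older(installed, minimum):
    """Return True if `installed` is strictly older than `minimum`."""
    return parse_version(installed) < parse_version(minimum)


if __name__ == "__main__":
    # With the modified check, an installed 0.9.2 is not "older than 0.9.2".
    print(is_older("0.9.2", "0.9.2"))
```

Comparing tuples of integers avoids the string-comparison pitfall where "0.10.0" would sort before "0.9.2".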

Citation

@Misc{llama-factory,
  title = {LLaMA Factory},
  author = {hiyouga},
  howpublished = {\url{https://github.com/hiyouga/LLaMA-Factory}},
  year = {2023}
}

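Before using the model resources and services, please read and fully understand the "Ascend Deep Learning Model License Agreement 3.0" (《昇腾深度学习模型许可协议 3.0》).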
