Environment Installation and Service Startup

Prerequisites

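The driver and firmware have been installed as described in "Installing the Driver and Firmware" in the MindIE Installation Guide.
CANN, Python 3.10.2, the PyTorch 2.1.0 framework, and the Torch_NPU 2.1.0 plugin have been installed as described in "Installing the Development Environment" in the MindIE Installation Guide.
MindIE has been installed as described in "Deploying MindIE on a Physical Machine" in the MindIE Installation Guide.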

Installation Procedure

Install Rust and the required packages:

# Use aarch64 for 64-bit ARM CPUs; for 64-bit x86 CPUs, replace aarch64 with x86_64 in the commands below
wget https://static.rust-lang.org/dist/rust-1.79.0-aarch64-unknown-linux-gnu.tar.gz --no-check-certificate
tar -xvf rust-1.79.0-aarch64-unknown-linux-gnu.tar.gz
cd rust-1.79.0-aarch64-unknown-linux-gnu
bash install.sh

sudo apt update
sudo apt install pkg-config
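
The architecture substitution described in the comment above can be sketched in Python. This is a minimal illustration only: the helper name rust_tarball_url and the alias table are assumptions introduced here, and only the URL pattern comes from the wget command above.

```python
import platform

RUST_DIST_BASE = "https://static.rust-lang.org/dist"

# Values that platform.machine() may report, mapped to the Rust target
# architecture names used in the tarball file name (hypothetical table).
_ARCH_ALIASES = {
    "aarch64": "aarch64",
    "arm64": "aarch64",
    "x86_64": "x86_64",
    "amd64": "x86_64",
}


def rust_tarball_url(version="1.79.0", machine=None):
    """Return the Rust standalone-installer URL for a Linux host."""
    machine = (machine or platform.machine()).lower()
    try:
        arch = _ARCH_ALIASES[machine]
    except KeyError:
        raise ValueError(f"unsupported CPU architecture: {machine}") from None
    return f"{RUST_DIST_BASE}/rust-{version}-{arch}-unknown-linux-gnu.tar.gz"
```

Calling rust_tarball_url() with no arguments uses the architecture of the current host; the resulting URL can then be passed to wget as shown above.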

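Set the required environment variables:

First, start python on the command line and use the path reported by torch.__file__ to locate the directory that contains Protoc. Taking Python 3.10.2 as an example: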

Python 3.10.2 (main, Sep 23 2024, 08:51:58) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__file__
'/usr/local/python3.10.2/lib/python3.10/site-packages/torch/__init__.py'

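Protoc is located in the bin subdirectory of the directory that contains the __init__.py shown in the console output. Next, add the Cargo executable directory and the Protoc directory to $PATH (the Cargo directory may be empty or may not exist until the next step has been completed):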

# Directory of the executables built by Cargo
export PATH=$PATH:~/.cargo/bin/
# Directory that contains protoc
export PATH=/usr/local/python3.10.2/lib/python3.10/site-packages/torch/bin:$PATH
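
The path derivation above (take the directory of torch's __init__.py and append bin) can be written as a small helper. This is a sketch; the function name protoc_dir_from_torch_file is hypothetical and is not part of PyTorch or MindIE.

```python
from pathlib import PurePosixPath


def protoc_dir_from_torch_file(torch_file):
    """Return the bin directory that sits next to torch's __init__.py."""
    # torch_file is the string printed by `torch.__file__`.
    return str(PurePosixPath(torch_file).parent / "bin")
```

In an environment where PyTorch is installed, protoc_dir_from_torch_file(torch.__file__) after import torch gives the directory to export; for the path shown in the console output above it yields /usr/local/python3.10.2/lib/python3.10/site-packages/torch/bin.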

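Set up the directory layout and code as described in the sample code, then run the one-step installation script install.sh to install the environment and compile the code. The command is as follows: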

bash install.sh

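After the installation completes, you can check that the executables and packages are present.

1) The ~/.cargo/bin/ directory contains two executables: text-generation-launcher and text-generation-router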

ll ~/.cargo/bin/
-rwxr-xr-x 1 root root  22714304 Nov  8 15:51 text-generation-launcher*
-rwxr-xr-x 1 root root 113330192 Nov  8 15:49 text-generation-router*

2) Two Python packages are installed: text-generation-server and tgi_npu

pip show text-generation-server
Name: text-generation-server
Version: 2.0.4
Summary: Text Generation Inference Python gRPC Server
Home-page: 
Author: Olivier Dehaene
Author-email: olivier@huggingface.co

pip show tgi_npu
Name: tgi-npu
Version: 0.1.0
Summary: NPU MindIE Adapter for TGI v2.0.4
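
The two manual checks above can also be scripted. The following is a minimal sketch that assumes the default ~/.cargo/bin/ location and the package names listed above; the helper names missing_binaries and missing_packages are introduced here for illustration only.

```python
import os
from importlib import metadata

EXPECTED_BINARIES = ("text-generation-launcher", "text-generation-router")
EXPECTED_PACKAGES = ("text-generation-server", "tgi_npu")


def missing_binaries(bin_dir="~/.cargo/bin", names=EXPECTED_BINARIES):
    """Return the names that are not executable files inside bin_dir."""
    missing = []
    for name in names:
        path = os.path.join(os.path.expanduser(bin_dir), name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(name)
    return missing


def missing_packages(names=EXPECTED_PACKAGES):
    """Return the package names for which no installed distribution is found."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Both functions return an empty list when everything expected is present, so a non-empty result indicates which item to investigate.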

Install and start Nginx:

apt update
apt install nginx
service nginx start

Edit the TGI proxy configuration file:

vi /etc/nginx/sites-available/tgi_proxy

Paste the following content into the tgi_proxy configuration file:

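server {
    listen 12346 ssl; # Nginx listening port with SSL enabled (the HTTPS default is 443). This port must differ from the port of the TGI service, and requests must be sent to this port (modify as needed)
    server_name localhost; # Nginx server name. localhost is the local host name, which resolves to 127.0.0.1. Requests should be sent using this name (modify as needed)

    ssl_certificate /path/to/your/certificate.crt; # Path to the certificate used by Nginx (a .crt file issued by a trusted authority) (change to the actual path)
    ssl_certificate_key /path/to/your/key.key; # Path to the private key that matches the certificate (the .key file paired with it) (change to the actual path)

    location / {
        proxy_pass http://127.0.0.1:12347; # IP address and port on which the TGI service is started (modify as needed)
        proxy_set_header Host $host; # Preserve the original Host header (no change needed)
        proxy_set_header X-Real-IP $remote_addr; # Pass the client IP address to the backend service (no change needed)
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; # Pass the chain of proxies that the request went through (no change needed)
        proxy_set_header X-Forwarded-Proto $scheme; # Pass the protocol of the original request (no change needed)
    }
}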
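
When the ports or certificate paths change often, the configuration above can be generated rather than edited by hand. The sketch below renders the same server block and rejects a listen port equal to the TGI port, matching the note in the listen comment. The function name render_tgi_proxy and its parameters are hypothetical.

```python
def render_tgi_proxy(listen_port, backend_port, certificate, certificate_key,
                     server_name="localhost", backend_host="127.0.0.1"):
    """Render the tgi_proxy server block as a string."""
    # The proxy and the TGI service cannot share a port on the same host.
    if listen_port == backend_port:
        raise ValueError("Nginx listen port must differ from the TGI service port")
    lines = [
        "server {",
        f"    listen {listen_port} ssl;",
        f"    server_name {server_name};",
        "",
        f"    ssl_certificate {certificate};",
        f"    ssl_certificate_key {certificate_key};",
        "",
        "    location / {",
        f"        proxy_pass http://{backend_host}:{backend_port};",
        "        proxy_set_header Host $host;",
        "        proxy_set_header X-Real-IP $remote_addr;",
        "        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;",
        "        proxy_set_header X-Forwarded-Proto $scheme;",
        "    }",
        "}",
    ]
    return "\n".join(lines) + "\n"
```

The returned text can be written to /etc/nginx/sites-available/tgi_proxy in place of the manual paste.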

Enable the tgi_proxy configuration file by creating a symbolic link, then restart Nginx:

ln -s /etc/nginx/sites-available/tgi_proxy /etc/nginx/sites-enabled/
service nginx restart

On the server, use the startup script to launch the TGI online inference service:

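# Fraction of device memory that the framework may use
export CUDA_MEMORY_FRACTION=0.9
# IDs of the NPU devices visible to the system
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
# Local path to the model weights, or the model's location in the Hugging Face repository
model_path=/home/data/models/qwen2_7B_instruct
# The launch parameters below are the same as in native TGI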
text-generation-launcher \
 --model-id $model_path \
 --port 12347 \
 --max-input-length 2048 \
 --max-total-tokens 2560 \
 --sharded true \
 --num-shard 8 \
 --max-batch-prefill-tokens 8192 \
 --max-waiting-tokens 20 \
 --max-concurrent-requests 256 \
 --waiting-served-ratio 1.2
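
The launch parameters above depend on each other, so it can help to check them before starting the service. The sketch below builds the core of the command line and applies two checks that, to the best of our understanding, the native TGI launcher also enforces (input length below total tokens, prefill budget at least the input length). It additionally assumes one shard per visible device, which matches this example but is an assumption rather than a documented rule. The function name build_launcher_args is hypothetical.

```python
def build_launcher_args(model_path, port, max_input_length, max_total_tokens,
                        max_batch_prefill_tokens, visible_devices):
    """Build a text-generation-launcher argument list after basic sanity checks."""
    # visible_devices uses the same comma-separated form as
    # ASCEND_RT_VISIBLE_DEVICES, for example "0,1,2,3".
    devices = [d.strip() for d in visible_devices.split(",") if d.strip()]
    if not devices:
        raise ValueError("at least one visible device is required")
    if max_input_length >= max_total_tokens:
        raise ValueError("max-input-length must be smaller than max-total-tokens")
    if max_batch_prefill_tokens < max_input_length:
        raise ValueError("max-batch-prefill-tokens must be at least max-input-length")
    num_shard = len(devices)  # assumption: one shard per visible device
    return [
        "text-generation-launcher",
        "--model-id", model_path,
        "--port", str(port),
        "--max-input-length", str(max_input_length),
        "--max-total-tokens", str(max_total_tokens),
        "--sharded", "true" if num_shard > 1 else "false",
        "--num-shard", str(num_shard),
        "--max-batch-prefill-tokens", str(max_batch_prefill_tokens),
    ]
```

The remaining flags from the script above (for example --max-waiting-tokens) can be appended to the returned list, which can then be passed to subprocess.run.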

On the client, use curl, requests, or a similar tool to send HTTPS inference requests to the server and receive the responses:

curl  https://127.0.0.1:12346/generate -X POST -d '{"inputs":"Please introduce yourself.","parameters":{"max_new_tokens":64,"repetition_penalty":1.2}}' -H 'Content-Type: application/json'
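
The same request can be sent from Python. The sketch below uses only the standard library (urllib) instead of requests so that it has no extra dependencies; the helper names build_generate_request and generate are hypothetical. It assumes the server certificate is trusted by the system certificate store, since urllib verifies certificates by default.

```python
import json
import urllib.request


def build_generate_request(base_url, prompt, max_new_tokens=64,
                           repetition_penalty=1.2):
    """Build a POST request for the /generate endpoint."""
    payload = {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "repetition_penalty": repetition_penalty,
        },
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url, prompt, timeout=60, **parameters):
    """Send the request and return the decoded JSON response."""
    request = build_generate_request(base_url, prompt, **parameters)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, generate("https://127.0.0.1:12346", "Please introduce yourself.") sends the same body as the curl command above through the Nginx proxy.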

Native TGI serves inference over HTTP. The request command is as follows (not recommended, because HTTP is insecure):

curl  http://127.0.0.1:12347/generate -X POST -d '{"inputs":"Please introduce yourself.","parameters":{"max_new_tokens":64,"repetition_penalty":1.2}}' -H 'Content-Type: application/json'
