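Usage Example
Restrictions
- This feature is supported on Atlas 800I A2 inference products and Atlas 300I Duo inference cards.
- LLaMA series, Qwen1, Qwen1.5, and Qwen2 series models support this feature.
- KV Cache for common-prefix tokens is reused only when the number of prefix tokens shared across sessions is greater than or equal to the block size used by Page Attention.
- This feature supports only single-node (non-distributed) service deployment.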
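The block-size rule above can be modelled with a small shell function. This is a simplified sketch that assumes the reused length is just the common-prefix length rounded down to a whole number of blocks; `reusable_prefix_tokens` is a hypothetical helper, not part of MindIE Service.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of MindIE Service): estimates how many prompt
# tokens can reuse cached KV blocks, assuming reuse is rounded down to whole
# Page Attention blocks.
#   $1: number of prefix tokens shared with an earlier request
#   $2: block size
reusable_prefix_tokens() {
  local prefix_tokens=$1 block_size=$2
  # Integer division drops the partial last block, so a prefix shorter
  # than one block yields 0, i.e. no reuse.
  echo $(( prefix_tokens / block_size * block_size ))
}

reusable_prefix_tokens 164 128   # prints 128
reusable_prefix_tokens 100 128   # prints 0
```

The first call corresponds to the 164-token prompt with a block size of 128 described in the procedure; the second shows a shared prefix too short to trigger any reuse.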
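Procedure
This section uses a multi-turn conversation as an example to show how to use Prefix Cache.
- Configure the service parameters. For descriptions of the service parameters, see the section on configuration parameters.
cd ${mindie-service installation path}
vi conf/config.json
The Prefix Cache feature requires the following additional parameters:
- Under ModelConfig in ModelDeployConfig, add the following parameter: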
"plugin_params": "{\"plugin_type\":\"prefix_cache\"}"
- In ScheduleConfig, add the following parameter:
"enablePrefixCache": true
Save the modified configuration, then start the service:
./bin/mindieservice_daemon
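Before starting the daemon, a quick textual check can confirm that both parameters were actually saved to the file. This is a rough sketch: `check_prefix_cache_config` is a hypothetical helper, and because it only searches the text it does not verify that each key sits in the correct section (ModelConfig or ScheduleConfig).

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of MindIE Service): greps the config
# for the two Prefix Cache parameters. It is a plain text search, so it does
# not verify that each key sits in the right section.
check_prefix_cache_config() {
  local config=${1:-conf/config.json}
  if [[ ! -r "$config" ]]; then
    echo "cannot read $config" >&2
    return 1
  fi
  # plugin_params in ModelConfig should name the prefix_cache plugin.
  if ! grep -q 'plugin_type.*prefix_cache' "$config"; then
    echo "plugin_params with prefix_cache not found in $config" >&2
    return 1
  fi
  # ScheduleConfig should contain "enablePrefixCache": true.
  if ! grep -Eq '"enablePrefixCache"[[:space:]]*:[[:space:]]*true' "$config"; then
    echo "\"enablePrefixCache\": true not found in $config" >&2
    return 1
  fi
  echo "Prefix Cache parameters found in $config"
}

# Example, run from the mindie-service installation directory:
# check_prefix_cache_config conf/config.json && ./bin/mindieservice_daemon
```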
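- Send the first request with the following command. The prompt is the first-round question.
To benefit from Prefix Cache, the prompt of the second request must share a common prefix with the first prompt that is long enough to fill at least one block (see the restrictions above). Typical scenarios include multi-turn conversations and few-shot learning.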
curl https://127.0.0.1:1025/generate \
    -H "Content-Type: application/json" \
    --cacert ca.pem --cert client.pem --key client.key.pem \
    -X POST \
    -d '{
    "inputs": "Question: Parents have complained to the principal about bullying during recess. The principal wants to quickly resolve this, instructing recess aides to be vigilant. Which situation should the aides report to the principal?\na) An unengaged girl is sitting alone on a bench, engrossed in a book and showing no interaction with her peers.\nb) Two boys engaged in a one-on-one basketball game are involved in a heated argument regarding the last scored basket.\nc) A group of four girls has surrounded another girl and appears to have taken possession of her backpack.\nd) Three boys are huddled over a handheld video game, which is against the rules and not permitted on school grounds.\nAnswer:",
    "parameters": {"max_new_tokens":512}
}'
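- Send the second request. Its prompt is the first-round question + the first-round answer + the second-round question, so the first-round question is the reusable common prefix. The reused portion may not cover the entire first-round prompt: because the cache is managed in blocks, Prefix Cache stores prefixes in multiples of the block size. For example, if the first-round prompt is 164 tokens long and the block size is 128, only the first 128 tokens are reused.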
curl https://127.0.0.1:1025/generate \
    -H "Content-Type: application/json" \
    --cacert ca.pem --cert client.pem --key client.key.pem \
    -X POST \
    -d '{
    "inputs": "Question: Parents have complained to the principal about bullying during recess. The principal wants to quickly resolve this, instructing recess aides to be vigilant. Which situation should the aides report to the principal?\na) An unengaged girl is sitting alone on a bench, engrossed in a book and showing no interaction with her peers.\nb) Two boys engaged in a one-on-one basketball game are involved in a heated argument regarding the last scored basket.\nc) A group of four girls has surrounded another girl and appears to have taken possession of her backpack.\nd) Three boys are huddled over a handheld video game, which is against the rules and not permitted on school grounds.\nAnswer:c) A group of four girls has surrounded another girl and appears to have taken possession of her backpack.\nExplanation: The principal wants to quickly resolve this, instructing recess aides to be vigilant. The principal is concerned about bullying during recess. The principal wants the aides to report any bullying behavior to him. The principal is not concerned about the other situations.\nQuestion: If the aides confront the group of girls from situation (c) and they deny bullying, stating that they were merely playing a game, what specific evidence should the aides look for to determine if this is a likely truth or a cover-up for bullying?\nAnswer:",
    "parameters": {"max_new_tokens":512}
}'
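The second prompt is the first prompt with the first answer and the next question appended, so the first prompt is a literal prefix of the second. The sketch below builds a follow-up prompt that way and checks the prefix relationship. `build_next_prompt` and `shares_prefix` are hypothetical helpers; the `Question: ...\nAnswer:` layout is copied from the example requests, the sample texts are shortened, and the strings are not JSON-escaped, so newlines and double quotes still need escaping before they go into the `-d` body.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of MindIE Service).

# Build the next-turn prompt: previous prompt + previous answer +
# "\nQuestion: <next question>\nAnswer:", mirroring the example requests.
build_next_prompt() {
  local prev_prompt=$1 prev_answer=$2 next_question=$3
  printf '%s%s\nQuestion: %s\nAnswer:' "$prev_prompt" "$prev_answer" "$next_question"
}

# Succeed if $2 starts with $1 ("$1" is quoted, so it is compared literally).
shares_prefix() {
  [[ "$2" == "$1"* ]]
}

# Shortened versions of the example texts.
prompt1=$'Question: Which situation should the aides report to the principal?\nAnswer:'
answer1='c) A group of four girls has surrounded another girl and appears to have taken possession of her backpack.'
prompt2=$(build_next_prompt "$prompt1" "$answer1" \
  'What specific evidence should the aides look for?')

if shares_prefix "$prompt1" "$prompt2"; then
  echo "prompt1 is a prefix of prompt2"
fi
```

The common prefix is counted in tokens rather than characters, so tokenization around the point where the answer is appended can differ slightly; combined with block rounding, this is why the reused length can be shorter than the first prompt.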