Text/streaming inference interface: set the stream parameter in the request body to false for text inference, or to true for streaming inference:
curl -H "Accept: application/json" -H "Content-type: application/json" \
  --cacert ca.pem --cert client.pem --key client.key.pem \
  -X POST \
  -d '{
    "inputs": "My name is Olivier and I",
    "stream": true,
    "parameters": {
      "temperature": 0.5,
      "top_k": 10,
      "top_p": 0.95,
      "do_sample": true,
      "seed": null,
      "repetition_penalty": 1.03,
      "details": true
    }
  }' \
  https://127.0.0.1:1025/infer
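The same request body can also be assembled in Python. The following is a minimal sketch using only the standard library, not an official client: the helper names build_infer_request and make_tls_context are hypothetical, and the URL, certificate file names, and parameter values are simply those from the curl example. It only builds the request and does not interpret the response, whose format is not described here.

```python
import json
import ssl
import urllib.request

# Endpoint taken from the curl example; adjust to your deployment.
INFER_URL = "https://127.0.0.1:1025/infer"


def build_infer_request(prompt, stream, url=INFER_URL):
    """Build (but do not send) a POST request for the inference endpoint.

    Hypothetical helper: stream=False asks for text inference,
    stream=True asks for streaming inference.
    """
    body = {
        "inputs": prompt,
        "stream": stream,
        "parameters": {
            "temperature": 0.5,
            "top_k": 10,
            "top_p": 0.95,
            "do_sample": True,
            "seed": None,  # serialized as JSON null
            "repetition_penalty": 1.03,
            "details": True,
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def make_tls_context(cafile="ca.pem", certfile="client.pem",
                     keyfile="client.key.pem"):
    """Mirror curl's --cacert/--cert/--key options (file names assumed)."""
    context = ssl.create_default_context(cafile=cafile)
    context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return context
```

To send the request, pass it to urllib.request.urlopen(request, context=make_tls_context()) and read the returned response object. Switching between the two modes only requires changing the stream argument.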
For other interfaces, see the "Self-Developed Interfaces" section.
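For how to create a MindIE Client, see "Using Triton-Compatible Interfaces". Once the client is created, you can use the MindIE Client Python interface to terminate a request early.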
import sys

from utils import create_client

if __name__ == "__main__":
    # get argument and create client
    mindie_client = create_client()

    # create input
    prompt = "My name is Olivier and I"
    model_name = "llama_65b"
    parameters = {
        "do_sample": True,
        "temperature": 0.5,
        "top_k": 10,
        "top_p": 0.9,
        "truncate": 5,
        "typical_p": 0.9,
        "seed": 1,
        "repetition_penalty": 1,
        "watermark": True,
        "details": True,
    }

    # apply model inference
    results = mindie_client.generate_stream(
        model_name,
        prompt,
        request_id="1",
        parameters=parameters,
    )

    # stop early: cancel the request once the 10th streamed result arrives
    generated_text = ""
    index = 0
    for cur_res in results:
        index += 1
        if index == 10:
            flag = mindie_client.cancel(model_name, "1")
            if flag:
                print("Test cancel api succeed!")
                sys.exit(0)
            else:
                print("Test cancel api failed!")
                sys.exit(1)
        # format the result into the message (the original passed it as a second print argument)
        print("current result: %s" % cur_res)
For other MindIE Client interfaces, see the "class MindIEHTTPClient" section.