Using the Custom Interfaces

Text/streaming inference interface: set the stream parameter in the request body to false for text inference, or to true for streaming inference:

curl -H "Accept: application/json" -H "Content-Type: application/json" --cacert ca.pem --cert client.pem --key client.key.pem -X POST -d '{
 "inputs": "My name is Olivier and I",
 "stream": true,
 "parameters": {
  "temperature": 0.5,
  "top_k": 10,
  "top_p": 0.95,
  "do_sample": true,
  "seed": null,
  "repetition_penalty": 1.03,
  "details": true
 }
}' https://127.0.0.1:1025/infer
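
The same request can also be sent from Python. The following is a minimal sketch (not from the MindIE interface reference), assuming the third-party requests package is installed and the ca.pem, client.pem, and client.key.pem files from the curl example above are available. Setting "stream" to False performs text inference and returns one complete response; setting it to True performs streaming inference, and the results are read line by line.

import json

import requests

URL = "https://127.0.0.1:1025/infer"  # same endpoint as the curl example

body = {
    "inputs": "My name is Olivier and I",
    "stream": False,  # False: text inference; True: streaming inference
    "parameters": {
        "temperature": 0.5,
        "top_k": 10,
        "top_p": 0.95,
        "do_sample": True,
        "seed": None,
        "repetition_penalty": 1.03,
        "details": True,
    },
}

response = requests.post(
    URL,
    json=body,
    cert=("client.pem", "client.key.pem"),  # client certificate and private key
    verify="ca.pem",                        # CA certificate used to verify the server
    stream=body["stream"],                  # also stream the HTTP body for streaming inference
)

if body["stream"]:
    # Streaming inference: print each event line as it arrives.
    for line in response.iter_lines():
        if line:
            print(line.decode("utf-8"))
else:
    # Text inference: the full result arrives in a single JSON response.
    print(json.dumps(response.json(), ensure_ascii=False, indent=2))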

For the other interfaces, see the Custom Interfaces section.

For how to create a MindIE Client, see Using the Triton-Compatible Interfaces. The MindIE Client Python interface can then be used to terminate a request early:

import sys

from utils import create_client
if __name__ == "__main__":
    # get argument and create client
    mindie_client = create_client()
    # create input
    prompt = "My name is Olivier and I"
    model_name = "llama_65b"
    parameters = {
        "do_sample": True,
        "temperature": 0.5,
        "top_k": 10,
        "top_p": 0.9,
        "truncate": 5,
        "typical_p": 0.9,
        "seed": 1,
        "repetition_penalty": 1,
        "watermark": True,
        "details": True,
    }
    # apply model inference
    results = mindie_client.generate_stream(
        model_name,
        prompt,
        request_id="1",
        parameters=parameters,
    )
    # stop early
    generated_text = ""
    index = 0
    for cur_res in results:
        index += 1
        if index == 10:
            flag = mindie_client.cancel(model_name, "1")
            if flag:
                print("Test cancel api succeed!")
                sys.exit(0)
            else:
                print("Test cancel api failed!")
                sys.exit(1)
        print("current result: %s", cur_res)

For the other MindIE Client interfaces, see the class MindIEHTTPClient section.