Inference API

API function

Provides text (non-streaming) inference and streaming inference.

API format

Operation type: POST

URL: https://{ip}:{port}/v1/chat/completions

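Request parameters

| Parameter | Required | Description | Value constraints |
| --- | --- | --- | --- |
| model | Yes | Model name. This field is currently not validated. | - |
| messages | Yes | Message structure of the inference request. | List type. |
| messages[].role | No | Role of the message. This field is currently not processed. | - |
| messages[].content | Yes | Text of the inference request. | Non-empty, 0 < number of characters <= 16000. Chinese and English are supported. The number of tokens after tokenization must be <= maxSeqLen - maxIterTimes (read from the configuration file). |
| max_tokens | No | Maximum number of new tokens to generate, that is, the upper bound on the number of generated tokens added to the final output. This field is bounded by the maxIterTimes parameter in the GIMIS configuration file: the number of output tokens is <= maxIterTimes. | int type, range (0, maxIterTimes]. Default: 16. |
| presence_penalty | No | Penalizes new tokens according to whether they have already appeared in the text so far. Positive values penalize tokens that have already been used, which makes the model more likely to move on to new topics. | float type, range [-2.0, 2.0]. Default: 0.0. |
| frequency_penalty | No | Penalizes new tokens according to how often they already occur in the text so far. Positive values penalize frequently used tokens, which makes the model less likely to repeat the same wording. | float type, range [-2.0, 2.0]. Default: 0.0. |
| seed | No | Random seed for the inference process. Using the same seed makes the inference result reproducible; using different seeds makes the results vary. | int64 type, range (0, 9223372036854775807]. If the parameter is not passed, the system generates a random seed. |
| temperature | No | Controls the randomness of generation. Higher values produce more diverse output. | float type, greater than 0. Default: 1.0. A value of 1.0 means that no temperature computation is performed; a value greater than 1.0 increases the randomness of the output. |
| top_p | No | Controls the range of candidate tokens considered during generation, and therefore the diversity of the result. Candidate tokens are selected by cumulative probability until the cumulative probability exceeds the given threshold. | float type, range (0, 1]. Default: 1.0. |
| stream | No | Specifies whether the result is returned as text inference (one complete response) or as streaming inference. | bool type. Default: false. |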
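
The value constraints above can be checked on the client before a request is sent. The following is a minimal sketch, not the server's actual validation logic: the function name `validate_request` and the `max_iter_times` argument (standing in for maxIterTimes from the configuration file) are hypothetical, a null `seed` is treated as "not passed", and characters are counted with Python's `len`. The token-count limit (maxSeqLen - maxIterTimes) is not checked because it requires the model's tokenizer.

```python
from typing import Any, Dict, List

INT64_MAX = 9223372036854775807


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _is_int(value: Any) -> bool:
    return isinstance(value, int) and not isinstance(value, bool)


def validate_request(body: Dict[str, Any], max_iter_times: int) -> List[str]:
    """Return a list of violations of the documented constraints (empty if none)."""
    errors: List[str] = []

    if "model" not in body:
        errors.append("model is required")

    messages = body.get("messages")
    if not isinstance(messages, list):
        errors.append("messages is required and must be a list")
        messages = []
    for i, message in enumerate(messages):
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, str) or not 0 < len(content) <= 16000:
            errors.append(f"messages[{i}].content must be 1 to 16000 characters")

    max_tokens = body.get("max_tokens", 16)  # documented default
    if not _is_int(max_tokens) or not 0 < max_tokens <= max_iter_times:
        errors.append("max_tokens must be in (0, maxIterTimes]")

    for name in ("presence_penalty", "frequency_penalty"):
        value = body.get(name, 0.0)
        if not _is_number(value) or not -2.0 <= value <= 2.0:
            errors.append(f"{name} must be in [-2.0, 2.0]")

    seed = body.get("seed")  # assumption: null means "not passed"
    if seed is not None and (not _is_int(seed) or not 0 < seed <= INT64_MAX):
        errors.append("seed must be in (0, 9223372036854775807]")

    temperature = body.get("temperature", 1.0)
    if not _is_number(temperature) or temperature <= 0:
        errors.append("temperature must be greater than 0")

    top_p = body.get("top_p", 1.0)
    if not _is_number(top_p) or not 0 < top_p <= 1:
        errors.append("top_p must be in (0, 1]")

    if not isinstance(body.get("stream", False), bool):
        errors.append("stream must be a bool")

    return errors
```

An empty return value means that the body satisfies every constraint this sketch checks; the server may still reject a request whose prompt exceeds the token limit.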

Usage example

Request example:

POST https://{ip}:{port}/v1/chat/completions

Request body:

{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "system",
      "content": "You are a student who is good at math."
    },
    {
      "role": "user",
      "content": "what is your hobby?"
    }
  ],
  "max_tokens": 20,
  "presence_penalty": 1.03,
  "frequency_penalty": 1.0,
  "seed": null,
  "temperature": 0.5,
  "top_p": 0.95,
  "stream": false
}
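
A request like the one above could be assembled with Python's standard library as in the sketch below. The helper name `build_chat_request` is hypothetical, and the `Content-Type: application/json` header is an assumption, since this section does not state which headers the service expects. The sketch only builds the request object; sending it is left to the caller.

```python
import json
import urllib.request
from typing import Any, Dict


def build_chat_request(ip: str, port: int, body: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/chat/completions."""
    url = f"https://{ip}:{port}/v1/chat/completions"
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        # Assumption: the service accepts a JSON body with this content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen`. Depending on the deployment, an `ssl.SSLContext` configured for the server's certificate may also be needed; that setup is outside the scope of this sketch.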

Response example for text inference ("stream": false):

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-3.5-turbo-0613",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "\n\nHello there, how may I assist you today?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
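
A text inference response of this shape could be read as in the following sketch. The function name `summarize_completion` and the keys of the returned dictionary are hypothetical. The check that total_tokens equals prompt_tokens plus completion_tokens reflects the sample above (9 + 12 = 21) and the field descriptions, not a guarantee stated by the service.

```python
import json
from typing import Any, Dict


def summarize_completion(raw: str) -> Dict[str, Any]:
    """Extract the generated text and basic usage facts from a response body."""
    response = json.loads(raw)
    choice = response["choices"][0]
    usage = response["usage"]
    return {
        "text": choice["message"]["content"],
        # finish_reason "length" means generation stopped at max_tokens.
        "truncated": choice["finish_reason"] == "length",
        "total_tokens": usage["total_tokens"],
        "usage_consistent": usage["prompt_tokens"] + usage["completion_tokens"]
        == usage["total_tokens"],
    }
```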

Response example for streaming inference ("stream": true), returned in SSE format:

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":"am"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" a"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" French"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":"man"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" living"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" in"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" the"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" UK"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":"."},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" I"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" am"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" a"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" keen"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" photograph"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":"er"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" and"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" I"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" have"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" been"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{},"finish_reason":"length"}]}
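
The SSE events above can be reassembled into the full text by concatenating each `delta.content` until a chunk carries a non-null `finish_reason`. The sketch below assumes that the caller has already split the response into lines; the function name `collect_stream` is hypothetical. Skipping a `data: [DONE]` line is a defensive assumption borrowed from other OpenAI-style servers; the sample above does not show such a line.

```python
import json
from typing import Iterable, List, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Concatenate delta.content from SSE data lines; return (text, finish_reason)."""
    parts: List[str] = []
    finish_reason: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumption: not shown in the sample, handled defensively
        choice = json.loads(payload)["choices"][0]
        # The last chunk has an empty delta, so content may be absent.
        parts.append(choice["delta"].get("content", ""))
        if choice["finish_reason"] is not None:
            finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason
```

Applied to the sample stream, the returned finish_reason is "length", which indicates that generation stopped because max_tokens was reached.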

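Output description

Table 1 Text inference result

| Parameter | Type | Description |
| --- | --- | --- |
| id | string | Request ID. |
| created | integer | Timestamp of the inference request, accurate to the second. |
| model | string | Inference model used. |
| object | string | Result type. Currently always "chat.completion". |
| usage | object | Statistics of the inference result. |
| usage.completion_tokens | int | Number of tokens generated by inference. |
| usage.prompt_tokens | int | Number of tokens in the request text after tokenization. |
| usage.total_tokens | int | Total number of request tokens plus generated tokens. |
| choices | list | List of inference results. |
| choices[].finish_reason | string | stop: generation ended naturally on a stop condition. length: max_tokens was reached. |
| choices[].index | integer | Index of the message in choices, starting from 0. |
| choices[].message | object | Inference message. |
| choices[].message.role | string | Role. Currently always "assistant". |
| choices[].message.content | string | Generated text. |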

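Table 2 Streaming inference result

| Parameter | Type | Description |
| --- | --- | --- |
| id | string | Request ID. |
| created | integer | Timestamp of the inference request, accurate to the second. |
| model | string | Inference model used. |
| object | string | Result type. Currently always "chat.completion.chunk". |
| choices | list | Streaming inference results. |
| choices[].finish_reason | string | stop: generation ended naturally on a stop condition. length: max_tokens was reached. This value is returned only in the last message. |
| choices[].index | integer | Index of the message in choices, starting from 0. |
| choices[].delta | object | Inference result of this chunk. Empty in the last response. |
| choices[].delta.role | string | Role. Currently always "assistant". |
| choices[].delta.content | string | Generated text. |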