Interface Function

Provides text (non-streaming) and streaming inference.

Interface Format

Operation type: POST

URL: https://{ip}:{port}/v1/chat/completions

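Request Parameters

Parameter	Required	Description	Value constraints
model	Required	Model name. This field is currently not validated.	-
messages	Required	Message structure of the inference request.	List type.
role	Optional	Role of the inference request message. This field is currently not processed.	-
content	Required	Text of the inference request.	Non-empty, 0 < character count <= 16000; Chinese and English are supported. The number of tokens after tokenization must be <= maxSeqLen-maxIterTimes (read from the configuration file).
max_tokens	Optional	Maximum number of new tokens to generate. Limits how many generated tokens are included in the final output. This field is bounded by the maxIterTimes parameter in the GIMIS configuration file: the inference output length in tokens is <= maxIterTimes.	int type, range (0, maxIterTimes]. Default: 16.
presence_penalty	Optional	Presence penalty, between -2.0 and 2.0. It penalizes new tokens based on whether they have already appeared in the text so far. Positive values penalize tokens that have already been used, making the model more likely to talk about new topics.	float type, range [-2.0, 2.0]. Default: 0.0.
frequency_penalty	Optional	Frequency penalty, between -2.0 and 2.0. It penalizes new tokens based on how often they already occur in the text so far. Positive values penalize tokens that have already been used frequently, making the model less likely to repeat the same wording.	float type, range [-2.0, 2.0]. Default: 0.0.
seed	Optional	Random seed for the inference process. The same seed value makes inference results reproducible; different seed values make the results vary.	int_64 type, range (0, 9223372036854775807]. If this parameter is not passed, the system generates a random seed value.
temperature	Optional	Controls the randomness of generation; higher values produce more diverse output.	float type, greater than 0. Default: 1.0. A value of 1.0 means no temperature computation is performed; values greater than 1.0 increase output randomness.
top_p	Optional	Controls the range of tokens considered during generation: candidate tokens are selected by cumulative probability until the cumulative probability exceeds the given threshold. This also controls the diversity of the generated result.	float type, range (0, 1]. Default: 1.0.
stream	Optional	Specifies whether the result is returned as text inference or streaming inference.	bool type. Default: false.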
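
The value constraints in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not the service's own validation logic: the function name `validate_request` and the `max_iter_times` argument (standing in for the server-side maxIterTimes setting) are hypothetical, and treating a null value as "parameter not passed" is an assumption. The token-count limit on content is not checked because it requires the model's tokenizer.

```python
INT64_MAX = 9223372036854775807


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_request(params, max_iter_times):
    """Return a list of violations of the documented value constraints."""
    errors = []
    if "model" not in params:
        errors.append("model is required")

    messages = params.get("messages")
    if not isinstance(messages, list):
        errors.append("messages must be a list")
    else:
        for i, msg in enumerate(messages):
            content = msg.get("content") if isinstance(msg, dict) else None
            # Documented limit: non-empty, at most 16000 characters.
            if not isinstance(content, str) or not 0 < len(content) <= 16000:
                errors.append(f"messages[{i}].content must be 1 to 16000 characters")

    def check(name, ok):
        value = params.get(name)
        # Assumption: a null/None value is treated like an omitted parameter.
        if value is not None and not ok(value):
            errors.append(f"{name} is out of range or has the wrong type")

    check("max_tokens", lambda v: _is_int(v) and 0 < v <= max_iter_times)
    check("presence_penalty", lambda v: _is_number(v) and -2.0 <= v <= 2.0)
    check("frequency_penalty", lambda v: _is_number(v) and -2.0 <= v <= 2.0)
    check("seed", lambda v: _is_int(v) and 0 < v <= INT64_MAX)
    check("temperature", lambda v: _is_number(v) and v > 0)
    check("top_p", lambda v: _is_number(v) and 0 < v <= 1)
    check("stream", lambda v: isinstance(v, bool))
    return errors
```

An empty list means the request satisfies every constraint that can be checked without the tokenizer; the server may still reject a request whose token count exceeds its configured limit.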

Usage Example

Request example:

POST https://<ip>:<port>/v1/chat/completions

Request body:

{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "system",
      "content": "You are a student who is good at math."
    },
    {
      "role": "user",
      "content": "what is your hobby?"
    }
  ],
  "max_tokens": 20,
  "presence_penalty": 1.03,
  "frequency_penalty": 1.0,
  "seed": null,
  "temperature": 0.5,
  "top_p": 0.95,
  "stream": false
}
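
A request like the one above could be assembled with Python's standard library. This is a sketch under stated assumptions: the helper name `build_chat_request` is hypothetical, and the `Content-Type: application/json` header is assumed because this page does not list the required headers. TLS certificates and any authentication depend on the deployment and are not covered here.

```python
import json
import urllib.request


def build_chat_request(ip, port, model, messages, **options):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    body = {"model": model, "messages": messages}
    # Optional parameters such as max_tokens, temperature, top_p, stream.
    body.update(options)
    return urllib.request.Request(
        url=f"https://{ip}:{port}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        # Assumption: the service expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_chat_request(
        "127.0.0.1", 1025, "gpt-3.5-turbo",
        [{"role": "user", "content": "what is your hobby?"}],
        max_tokens=20, temperature=0.5, top_p=0.95, stream=False,
    )
    # Sending is left to the caller, for example:
    #   with urllib.request.urlopen(request) as response:
    #       print(response.read().decode("utf-8"))
    print(request.full_url)
```

Keeping construction separate from sending makes the payload easy to inspect or log before it reaches the server. The address and port in the usage block are placeholders.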

Response example for text inference ("stream": false):

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-3.5-turbo-0613",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "\n\nHello there, how may I assist you today?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
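
To illustrate how a client might read a text inference response of this shape, the sketch below extracts the generated text, the finish reason, and the token usage. The function name `summarize_completion` and the keys of the returned dictionary are hypothetical; only the response fields shown in the sample above are relied on.

```python
import json


def summarize_completion(response_text):
    """Extract the generated text and token usage from a response body."""
    data = json.loads(response_text)
    choice = data["choices"][0]
    usage = data["usage"]
    return {
        "text": choice["message"]["content"],
        # "length" means generation stopped because max_tokens was reached.
        "truncated": choice["finish_reason"] == "length",
        "prompt_tokens": usage["prompt_tokens"],
        "completion_tokens": usage["completion_tokens"],
        "total_tokens": usage["total_tokens"],
    }
```

Checking `truncated` lets a caller decide whether to retry with a larger `max_tokens` value.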

Response example for streaming inference ("stream": true), returned in SSE format:

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":"am"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" a"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" French"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":"man"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" living"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" in"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" the"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" UK"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":"."},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" I"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" am"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" a"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" keen"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" photograph"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":"er"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" and"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" I"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" have"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" been"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{},"finish_reason":"length"}]}
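
A client can rebuild the full text from a stream like the one above by concatenating each chunk's `delta.content` until a chunk carries a non-null `finish_reason`. The following is a minimal sketch: the function name `collect_stream` is hypothetical, it takes already-decoded text lines as input, and it handles only the `data:` lines shown in this sample, not other SSE fields or any additional end-of-stream marker.

```python
import json


def collect_stream(lines):
    """Join delta contents from SSE lines; return (text, finish_reason)."""
    parts = []
    finish_reason = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip the blank separator lines between events
        chunk = json.loads(line[len("data:"):].strip())
        choice = chunk["choices"][0]
        # The final chunk has an empty delta, so content may be absent.
        parts.append(choice["delta"].get("content", ""))
        if choice["finish_reason"] is not None:
            finish_reason = choice["finish_reason"]
            break
    return "".join(parts), finish_reason
```

Applied to the sample stream, the returned finish reason would be `length`, indicating that output stopped because the token limit was reached.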

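Output Description

Table 1 Text inference result fields
Parameter	Type	Description
id	string	Request ID.
created	integer	Timestamp of the inference request, in seconds.
model	string	Inference model used.
object	string	Result type. Currently always "chat.completion".
usage	object	Inference result statistics.
completion_tokens	int	Number of tokens generated by inference.
prompt_tokens	int	Number of tokens in the request text after tokenization.
total_tokens	int	Total number of request and inference tokens.
choices	list	List of inference results.
finish_reason	string	stop: generation stopped naturally on a stop condition. length: max_tokens was reached.
index	integer	Index of the choices message, starting from 0.
message	object	Inference message.
role	string	Role. Currently always "assistant".
content	string	Inference text result.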

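Table 2 Streaming inference result fields
Parameter	Type	Description
id	string	Request ID.
created	integer	Timestamp of the inference request, in seconds.
model	string	Inference model used.
object	string	Result type. Currently always "chat.completion.chunk".
choices	list	Streaming inference results.
finish_reason	string	stop: generation stopped naturally on a stop condition. length: max_tokens was reached. This value is returned only in the last message.
index	integer	Index of the choices message, starting from 0.
delta	object	Inference result returned; empty in the last response.
role	string	Role. Currently always "assistant".
content	string	Inference text result.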
