Query TGI Endpoint Information

The response format is kept as compatible as possible with the TGI interface. Fields that MindIE Server does not support are returned as null.

Interface Function

Queries TGI endpoint information.

Interface Format

Operation type: GET

URL: https://{ip}:{port}/info

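The {ip} field is read first from the environment variable MIES_CONTAINER_MANAGEMENT_IP. If that environment variable is not set, the "managementIpAddress" parameter in the configuration file is used. If the configuration file has no "managementIpAddress" parameter, the "ipAddress" parameter in the configuration file is used.
The {port} field is read first from the "managementPort" parameter in the configuration file. If the configuration file has no "managementPort" parameter, the "port" parameter in the configuration file is used.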
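
The precedence rules above can be sketched in Python as follows. This is only an illustration, not MindIE Server's own code: the function name resolve_info_url is hypothetical, and the configuration file is assumed to have already been loaded into a flat dict that contains the parameters named above (the real file layout may nest them differently).

```python
import os


def resolve_info_url(config, env=None):
    """Build the /info URL from the precedence rules described above.

    `config` is assumed to be a flat dict holding the configuration-file
    parameters; `env` defaults to the process environment.
    """
    env = os.environ if env is None else env

    # IP: environment variable first, then managementIpAddress, then ipAddress.
    if "MIES_CONTAINER_MANAGEMENT_IP" in env:
        ip = env["MIES_CONTAINER_MANAGEMENT_IP"]
    elif "managementIpAddress" in config:
        ip = config["managementIpAddress"]
    else:
        ip = config["ipAddress"]

    # Port: managementPort first, then port.
    if "managementPort" in config:
        port = config["managementPort"]
    else:
        port = config["port"]

    return f"https://{ip}:{port}/info"
```

Passing env explicitly, as in resolve_info_url(config, env={}), makes it easy to check the fallback order without touching the real environment.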

Request Parameters

None

Usage Example

Request example:

GET https://{ip}:{port}/info
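
For readers who want to issue this request from a script, a minimal Python sketch using only the standard library might look like the following. The function names build_info_request and fetch_info are hypothetical, and the TLS setup (certificates, client authentication) depends on the deployment, so it is left to an optional ssl.SSLContext supplied by the caller.

```python
import json
import urllib.request


def build_info_request(ip, port):
    """Create the GET request object for the /info endpoint."""
    return urllib.request.Request(f"https://{ip}:{port}/info", method="GET")


def fetch_info(ip, port, context=None):
    """Send the request and return (status_code, parsed_json_body).

    `context` is an optional ssl.SSLContext; how it must be configured
    is deployment-specific and is an assumption left to the caller.
    """
    request = build_info_request(ip, port)
    with urllib.request.urlopen(request, context=context) as response:
        body = json.loads(response.read().decode("utf-8"))
        return response.status, body
```

A call such as fetch_info("127.0.0.1", 1026) uses placeholder values; substitute the address and port resolved for the actual deployment.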

Response example:

{
    "docker_label": null,
    "max_batch_total_tokens": 8192,
    "max_best_of": 1,
    "max_concurrent_requests": 200,
    "max_stop_sequences": null,
    "max_waiting_tokens": null,
    "sha": null,
    "validation_workers": null,
    "version": "1.0.RC3",
    "waiting_served_ratio": null,
    "models": [
        {
            "model_device_type": "npu",
            "model_dtype": "float16",
            "model_id": "llama_65b",
            "model_pipeline_tag": "text-generation",
            "model_sha": null,
            "max_total_tokens": 2560
        }
    ],
    "max_input_length": 2048
}

Response status code: 200
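
Because unsupported fields come back as null, a client may want to separate them from the populated ones. The sketch below parses the sample response shown above and lists the null fields at the top level and per model. The helper name null_fields is hypothetical and is not part of any MindIE or TGI client library.

```python
import json

# The sample response from this section, embedded as a string.
SAMPLE_RESPONSE = """
{
    "docker_label": null,
    "max_batch_total_tokens": 8192,
    "max_best_of": 1,
    "max_concurrent_requests": 200,
    "max_stop_sequences": null,
    "max_waiting_tokens": null,
    "sha": null,
    "validation_workers": null,
    "version": "1.0.RC3",
    "waiting_served_ratio": null,
    "models": [
        {
            "model_device_type": "npu",
            "model_dtype": "float16",
            "model_id": "llama_65b",
            "model_pipeline_tag": "text-generation",
            "model_sha": null,
            "max_total_tokens": 2560
        }
    ],
    "max_input_length": 2048
}
"""


def null_fields(info):
    """Return (top-level null keys, {model_id: null keys}) for an /info body."""
    top_level = sorted(key for key, value in info.items() if value is None)
    per_model = {
        model["model_id"]: sorted(k for k, v in model.items() if v is None)
        for model in info.get("models") or []
    }
    return top_level, per_model


info = json.loads(SAMPLE_RESPONSE)  # JSON null becomes Python None
top_level_nulls, model_nulls = null_fields(info)
```

With the sample above, top_level_nulls holds the six unsupported top-level fields and model_nulls maps "llama_65b" to ["model_sha"].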

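Output Description

Parameter	Type	Description
docker_label	string	Not supported yet; returns null by default.
max_batch_total_tokens	int	Takes the value of maxPrefillTokens.
max_best_of	int	The best_of parameter is not supported yet; returns 1 by default, meaning each request returns only one inference result.
max_concurrent_requests	int	Maximum number of concurrent requests; takes the value of maxBatchSize.
max_stop_sequences	int	Not supported yet; returns null by default.
max_waiting_tokens	int	Not supported yet; returns null by default.
sha	string	Not supported yet; returns null by default.
validation_workers	int	Not supported yet; returns null by default.
version	string	Version number.
waiting_served_ratio	float	Not supported yet; returns null by default.
models	list	Model configuration.
model_device_type	string	Device type the model runs on; returns "npu" by default.
model_dtype	string	Model data type, read from the torch_dtype field of the config.json file in the weight configuration directory.
model_id	string	Model name.
model_pipeline_tag	string	Model task type; returns "text-generation" by default.
model_sha	string	Not supported yet; returns null by default.
max_total_tokens	int	Maximum total number of tokens for inference; takes the value of maxSeqLen.
max_input_length	int	Maximum input length; takes the value of maxInputTokenLen.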

Parent topic: Interfaces Compatible with TGI 0.9.4