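Elastic Deployment API Documentation

New endpoints: Set the Scheduling Blacklist, Get Regional GPU Stock

Enterprise verification is required before you can use the elastic deployment API. For an introduction to elastic deployment, see the elastic deployment documentation.

The API server host is: https://api.autodl.com

Authentication

Where to get the token: Console -> Settings -> Developer Token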

headers = {"Authorization": "token"}
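The header above can be built once and reused for every request. The following is a minimal sketch, not an official client: the helper name build_headers and the environment variable AUTODL_TOKEN are assumptions made for illustration, and the Content-Type header follows the Python examples later in this document.

```python
import os


def build_headers(token=None):
    """Build the request headers used by the examples in this document.

    AUTODL_TOKEN is a hypothetical environment variable name chosen for
    this sketch; pass the token explicitly if you store it elsewhere.
    """
    token = token or os.environ.get("AUTODL_TOKEN", "")
    if not token:
        raise ValueError("no developer token provided")
    return {
        "Authorization": token,
        "Content-Type": "application/json",
    }
```

Reading the token from the environment keeps it out of source files; any other secret store would work equally well.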

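Get Images

Images are custom images created and saved in AutoDL; both steps can be done on the autodl.com website. Importing images from outside the platform is not yet supported. To use the public base images provided by the platform, see the appendix at the end of this document.

Request

POST /api/v1/dev/image/private/list

Put the request parameters in the body. Parameter details:

Parameter | Type | Required | Notes
page_index | Int | | Page number
page_size | Int | | Number of entries per page
offset | Int | | Starting offset of the query

Sample:

{
    "page_index": 1,
    "page_size": 10
}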

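Response

Response parameters:

Parameter | Type | Notes
code | String | Response code; Success on success
msg | String | Error message; empty on success
data -> list | List<Response object> |

Response object parameters:

Parameter | Type | Notes
id | Int | Image ID
image_name | String | Image name
image_uuid | String | Image UUID

Sample: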

{
    "code": "Success",
    "msg": "",
    "data": {
        "list": [
            {
                "id": 111,
                "created_at": "2022-01-20T18:34:08+08:00",
                "updated_at": "2022-01-20T18:34:08+08:00",
                "image_uuid": "image-db8346e037",
                "name": "image name",
                "status": "finished"
            }
        ],
        "page_index": 1,
        "page_size": 10,
        "max_page": 1,
        "offset": 0
    }
}

Python example:

import requests
headers = {
    "Authorization": "your token",
    "Content-Type": "application/json"
}
url = "https://api.autodl.com/api/v1/dev/image/private/list"
body = {
    "page_index": 1,
    "page_size": 10,
}
response = requests.post(url, json=body, headers=headers)
print(response.content.decode())
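When there are more images than fit on one page, the request has to be repeated with an increasing page_index. The sketch below assumes that page_index starts at 1 (as in the samples) and that max_page in the response is the number of the last page; neither is stated explicitly here. The function name iter_private_images and the injected post callable are illustrative, not part of the API.

```python
def iter_private_images(post, page_size=10):
    """Yield every image entry, following max_page from each response.

    `post` is any callable that takes a request body (dict) and returns
    the decoded JSON response (dict). It is injected so the paging loop
    can be exercised without network access.
    """
    page_index = 1
    while True:
        result = post({"page_index": page_index, "page_size": page_size})
        if result.get("code") != "Success":
            raise RuntimeError(f"API error: {result.get('msg')}")
        data = result["data"]
        yield from data["list"]
        # Stop once the last page reported by the server has been read.
        if page_index >= data.get("max_page", page_index):
            break
        page_index += 1


# Usage with requests (not executed here), reusing url and headers
# from the example above:
#   post = lambda body: requests.post(url, json=body, headers=headers).json()
#   for image in iter_private_images(post):
#       print(image["image_uuid"])
```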

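Create a Deployment

Request

POST /api/v1/dev/deployment

Put the request parameters in the body. Parameter details:

Parameter | Type | Required | Notes
name | String | | Deployment name
deployment_type | String | | Deployment type. Supported values: ReplicaSet, Job, Container
replica_num | Int | Required for ReplicaSet and Job | Number of container replicas to create
parallelism_num | Int | Required for Job | For Job deployments, the number of containers running at the same time
reuse_container | Bool | | Whether to reuse stopped containers, which can significantly speed up container creation
container_template | Container Template object | |

Container Template object:

Parameter | Type | Required | Notes
region_sign | String | | Region where containers may be scheduled. For the values, see the appendix at the end of this document
cuda_v | Int | | Containers are scheduled onto hosts whose GPU driver supports this CUDA version. For the allowed values and further notes, see the appendix at the end of this document
gpu_name_set | List<String> | | GPU models that may be scheduled. Use the GPU model names shown on the web page when creating an elastic deployment
gpu_num | Int | | Number of GPUs required per container
memory_size_from | Int | | Lower bound of the schedulable container memory size, in GB
memory_size_to | Int | | Upper bound of the schedulable container memory size, in GB
cpu_num_from | Int | | Lower bound of the schedulable number of CPU cores, in vCPUs
cpu_num_to | Int | | Upper bound of the schedulable number of CPU cores, in vCPUs
price_from | Int | | Lower bound of the schedulable price range, in yuan * 1000; for example, enter 100 for 0.1 yuan
price_to | Int | | Upper bound of the schedulable price range, in yuan * 1000
image_uuid | String | | UUID of a private image or of a platform public base image (see the appendix at the end of this document)
cmd | String | | Command that starts the container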
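The price fields use an integer unit of yuan * 1000 while memory is given in GB, which is easy to mix up. As a rough sketch, the helper below converts prices and assembles a Container Template dictionary; the function names and default values are assumptions chosen for illustration (the defaults mirror the sample values used in this document).

```python
def yuan_to_price_units(yuan_per_hour):
    """Convert a price in yuan to the API's integer unit (yuan * 1000)."""
    return int(round(yuan_per_hour * 1000))


def make_container_template(region_sign, gpu_names, image_uuid, cmd,
                            min_yuan, max_yuan, cuda_v=113, gpu_num=1,
                            cpu_range=(1, 100), memory_gb_range=(1, 256)):
    """Assemble a Container Template dict from human-friendly units."""
    return {
        "region_sign": region_sign,
        "cuda_v": cuda_v,
        "gpu_name_set": list(gpu_names),
        "gpu_num": gpu_num,
        "memory_size_from": memory_gb_range[0],  # GB
        "memory_size_to": memory_gb_range[1],    # GB
        "cpu_num_from": cpu_range[0],
        "cpu_num_to": cpu_range[1],
        "price_from": yuan_to_price_units(min_yuan),
        "price_to": yuan_to_price_units(max_yuan),
        "image_uuid": image_uuid,
        "cmd": cmd,
    }


template = make_container_template(
    "suqianDC1", ["RTX A5000"], "image-db8346e037", "sleep 100",
    min_yuan=0.1, max_yuan=9,
)
print(template["price_from"], template["price_to"])  # 100 9000
```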

Sample:

{
    "name": "created via API",
    "deployment_type": "ReplicaSet",
    "replica_num": 2,
    "reuse_container": true,
    "container_template": {
        "region_sign": "suqianDC1",
        "gpu_name_set": [
            "RTX A5000"
        ],
        "cuda_v": 113,
        "gpu_num": 1,
        "cpu_num_from": 1,
        "cpu_num_to": 100,
        "memory_size_from": 1,
        "memory_size_to": 256,
        "cmd": "sleep 100",
        "price_from": 100,
        "price_to": 9000,
        "image_uuid": "image-db8346e037"
    }
}

In this sample, price_from 100 is a base price of 0.1 yuan/hour and price_to 9000 is a base price of 9 yuan/hour.

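Response

Response parameters:

Parameter | Type | Notes
code | String | Response code; Success on success
msg | String | Error message; empty on success
data | Response object |

Response object parameters:

Parameter | Type | Notes
deployment_uuid | String | Deployment UUID

Sample: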

{
    "code": "Success",
    "msg": "",
    "data": {
        "deployment_uuid": "833f1cd5a764fa3"
    }
}

Python example:

import requests
headers = {
    "Authorization": "your token",
    "Content-Type": "application/json"
}
url = "https://api.autodl.com/api/v1/dev/deployment"

# Create a ReplicaSet deployment
body = {
    "name": "created via API",
    "deployment_type": "ReplicaSet",
    "replica_num": 2,
    "reuse_container": True,
    "container_template": {
        "region_sign": "suqianDC1",
        "gpu_name_set": ["RTX A5000"],
        "gpu_num": 1,
        "cuda_v": 113,
        "cpu_num_from": 1,
        "cpu_num_to": 100,
        "memory_size_from": 1,
        "memory_size_to": 256,
        "cmd": "sleep 100",
        "price_from": 10,
        "price_to": 9000,
        "image_uuid": "image-db8346e037",
    },
}
response = requests.post(url, json=body, headers=headers)
print(response.content.decode())

# Additional bodies:
# To create a Job deployment, use this body:
job_body = {
    "name": "created via API",
    "deployment_type": "Job",
    "replica_num": 4,
    "parallelism_num": 1,
    "reuse_container": True,
    "container_template": {
        "region_sign": "suqianDC1",
        "gpu_name_set": ["RTX A5000"],
        "gpu_num": 1,
        "cuda_v": 113,
        "cpu_num_from": 1,
        "cpu_num_to": 100,
        "memory_size_from": 1,
        "memory_size_to": 256,
        "cmd": "sleep 10",
        "price_from": 10,
        "price_to": 9000,
        "image_uuid": "image-db8346e037",
    },
}

# To create a Container deployment, use this body:
container_body = {
    "name": "created via API",
    "deployment_type": "Container",
    "reuse_container": True,
    "container_template": {
        "region_sign": "neimengDC1",
        "gpu_name_set": ["RTX A5000"],
        "gpu_num": 1,
        "cuda_v": 113,
        "cpu_num_from": 1,
        "cpu_num_to": 100,
        "memory_size_from": 1,
        "memory_size_to": 256,
        "cmd": "sleep 100",
        "price_from": 10,
        "price_to": 9000,
        "image_uuid": "image-db8346e037",
    },
}
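Because the required fields differ by deployment type, it can help to check a body locally before sending it. The following sketch encodes only the rules given in the parameter table (replica_num for ReplicaSet and Job, parallelism_num for Job); the from/to range check and the function name validate_deployment_body are assumptions added for illustration, and the server may apply further rules.

```python
def validate_deployment_body(body):
    """Return a list of problems found in a create-deployment body."""
    problems = []
    deployment_type = body.get("deployment_type")
    if deployment_type not in ("ReplicaSet", "Job", "Container"):
        problems.append(f"unsupported deployment_type: {deployment_type!r}")
    if deployment_type in ("ReplicaSet", "Job") and "replica_num" not in body:
        problems.append("replica_num is required for ReplicaSet and Job")
    if deployment_type == "Job" and "parallelism_num" not in body:
        problems.append("parallelism_num is required for Job")

    # Assumption of this sketch: each *_from bound should not exceed *_to.
    template = body.get("container_template", {})
    for low, high in (("cpu_num_from", "cpu_num_to"),
                      ("memory_size_from", "memory_size_to"),
                      ("price_from", "price_to")):
        if low in template and high in template and template[low] > template[high]:
            problems.append(f"{low} must not exceed {high}")
    return problems
```

An empty list means no problem was detected by these local checks.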

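Get the Deployment List

Request

POST /api/v1/dev/deployment/list

Put the request parameters in the body. Parameter details:

Parameter | Type | Required | Notes
page_index | Int | | Page number
page_size | Int | | Number of entries per page
deployment_uuid | String | | Optional; filters by deployment UUID

Sample:

{
    "page_index": 1,
    "page_size": 10
}

Response

The fields have the same meaning as the corresponding request fields of Create a Deployment. Note that memory_size_from and memory_size_to appear in bytes in the sample below.

Sample: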

{
    "code": "Success",
    "data": {
        "list": [
            {
                "id": 214,
                "uid": 58,
                "uuid": "53a677bb3e281b8",
                "name": "xxxx",
                "deployment_type": "Container",
                "status": "stopped",
                "replica_num": 1,
                "parallelism_num": 1,
                "reuse_container": true,
                "starting_num": 0,
                "running_num": 0,
                "finished_num": 2,
                "image_uuid": "image-db8346e037",
                "template": {
                    "region_sign": "xxxxx",
                    "region_sign_list": [
                        "xxxxx",
                        "xxxxx"
                    ],
                    "gpu_name_set": [
                        "Tesla V100-SXM2-32GB"
                    ],
                    "gpu_num": 1,
                    "image_uuid": "image-db8346e037",
                    "image_name": "xxxx",
                    "cmd": "sleep 100",
                    "memory_size_from": 1073741824,
                    "memory_size_to": 274877906944,
                    "cpu_num_from": 1,
                    "cpu_num_to": 100,
                    "price_from": 10,
                    "price_to": 9000,
                    "cuda_v": 118
                },
                "price_estimates": 0,
                "created_at": "2023-01-05T20:34:07+08:00",
                "updated_at": "2023-01-05T20:34:07+08:00",
                "stopped_at": null
            }
        ],
        "page_index": 1,
        "page_size": 10,
        "offset": 0,
        "max_page": 1,
        "result_total": 3,
        "page": 1
    },
    "msg": ""
}

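Query Container Events

You can poll this endpoint for the latest container events by setting the offset parameter in the request.

Request

POST /api/v1/dev/deployment/container/event/list

Put the request parameters in the body. Parameter details:

Parameter | Type | Required | Notes
deployment_uuid | String | | Deployment UUID
deployment_container_uuid | String | | Container UUID; optional
page_index | Int | | Page number
page_size | Int | | Number of entries per page
offset | Int | | Starting offset of the query

Sample: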

{
    "deployment_uuid": "da497aea1eb8343", 
    "deployment_container_uuid": "", 
    "page_index": 1, 
    "page_size": 10,
    "offset": 0
}

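Response

Response parameters:

Parameter | Type | Notes
code | String | Response code; Success on success
msg | String | Error message; empty on success
data -> list | List<Response object> |

Response object parameters:

Parameter | Type | Notes
deployment_container_uuid | String | Container UUID
status | String | Container status type
created_at | String | Time the status occurred

Sample: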

{
    "code": "Success",
    "data": {
        "list": [
            {
                "deployment_container_uuid": "da497aea1eb8343-f94411a60c-1502e6e2",
                "status": "shutdown",
                "created_at": "2022-12-13T16:42:45+08:00"
            },
            {
                "deployment_container_uuid": "da497aea1eb8343-f94411a60c-1502e6e2",
                "status": "shutting_down",
                "created_at": "2022-12-13T16:42:40+08:00"
            },
            {
                "deployment_container_uuid": "da497aea1eb8343-f94411a60c-1502e6e2",
                "status": "running",
                "created_at": "2022-12-13T16:34:57+08:00"
            },
            {
                "deployment_container_uuid": "da497aea1eb8343-f94411a60c-1502e6e2",
                "status": "oss_merged",
                "created_at": "2022-12-13T16:34:55+08:00"
            },
            {
                "deployment_container_uuid": "da497aea1eb8343-f94411a60c-1502e6e2",
                "status": "starting",
                "created_at": "2022-12-13T16:34:55+08:00"
            },
            {
                "deployment_container_uuid": "da497aea1eb8343-f94411a60c-1502e6e2",
                "status": "created",
                "created_at": "2022-12-13T16:34:54+08:00"
            },
            {
                "deployment_container_uuid": "da497aea1eb8343-f94411a60c-1502e6e2",
                "status": "creating",
                "created_at": "2022-12-13T16:34:47+08:00"
            }
        ],
        "page_index": 1,
        "page_size": 10,
        "offset": 0,
        "max_page": 1
    },
    "msg": ""
}

Python example:

import requests
headers = {
    "Authorization": "your token",
    "Content-Type": "application/json"
}
url = "https://api.autodl.com/api/v1/dev/deployment/container/event/list"
body = {
    "deployment_uuid": "424446e02893b5f",
    "deployment_container_uuid": "",
    "page_index": 0,
    "page_size": 10,
}
response = requests.post(url, json=body, headers=headers)
print(response.content.decode())
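Polling for new events can be sketched as a loop that remembers which events it has already seen. The exact semantics of offset are not spelled out here, so this sketch does not rely on it and deduplicates on the client side instead. The function name poll_new_events, the injected fetch_events callable, and the polling interval are assumptions for illustration; sorting relies on created_at being an ISO 8601 string with a fixed UTC offset, as in the sample.

```python
import time


def poll_new_events(fetch_events, seen, interval_seconds=5.0,
                    max_polls=3, sleep=time.sleep):
    """Poll repeatedly and return events not seen before, oldest first.

    `fetch_events` is any callable returning the `data.list` entries of
    one event-list response. `seen` is a set that the caller keeps
    between calls so that already-reported events are skipped.
    """
    fresh = []
    for attempt in range(max_polls):
        for event in fetch_events():
            key = (event["deployment_container_uuid"],
                   event["status"],
                   event["created_at"])
            if key not in seen:
                seen.add(key)
                fresh.append(event)
        if attempt < max_polls - 1:
            sleep(interval_seconds)
    # Timestamps with the same UTC offset sort correctly as strings.
    fresh.sort(key=lambda e: e["created_at"])
    return fresh


# Usage with requests (not executed here), reusing url, body and headers
# from the example above:
#   fetch = lambda: requests.post(url, json=body, headers=headers).json()["data"]["list"]
#   seen = set()
#   for event in poll_new_events(fetch, seen):
#       print(event["created_at"], event["status"])
```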

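Query Containers

To get the container's UUID from inside the container, read the value of the environment variable AutoDLContainerUUID.

Request

POST /api/v1/dev/deployment/container/list

Put the request parameters in the body. Parameter details:

Parameter | Type | Required | Notes
deployment_uuid | String | | Deployment UUID
container_uuid | String | | Filter by container UUID
date_from | String | | Start of the container creation time range to filter by
date_to | String | | End of the container creation time range to filter by
gpu_name | String | | Filter by GPU model
cpu_num_from | Int | | Lower bound of the CPU core count to filter by
cpu_num_to | Int | | Upper bound of the CPU core count to filter by
memory_size_from | Int | | Lower bound of the container memory size to filter by
memory_size_to | Int | | Upper bound of the container memory size to filter by
price_from | Float | | Lower bound of the container base price to filter by
price_to | Float | | Upper bound of the container base price to filter by
released | Bool | | Whether to query instances that have already been released
page_index | Int | | Defaults to 0
page_size | Int | | Defaults to 10
offset | Int | | Starting offset of the query

Sample: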

{
    "deployment_uuid": "da497aea1eb8343", 
    "page_index": 1, 
    "page_size": 10
}

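Response

Response parameters:

Parameter | Type | Notes
code | String | Response code; Success on success
msg | String | Error message; empty on success
data -> list | List<Response object> |

Response object parameters:

Parameter | Type | Notes
uuid | String | Container UUID
deployment_uuid | String | Deployment UUID
machine_id | String | Host UUID
status | String | Container status
gpu_name | String | GPU model
gpu_num | Int | Number of GPUs
cpu_num | Int | Number of CPUs
memory_size | Int | Memory size, in bytes
image_uuid | String | Image UUID
price | Float | Base price, in yuan * 1000
info | Info object |
started_at | String | Time the container started running
stopped_at | String | Stop time
created_at | String | Creation time
updated_at | String | Update time

Info object:

Parameter | Type | Notes
ssh_command | String | SSH login command
root_password | String | SSH password
service_url | String | Custom service URL
proxy_host | String | (Deprecated; use service_url) Custom service host
custom_port | Int | (Deprecated; use service_url) Custom service port

Sample: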

{
    "code": "Success", 
    "msg": "",
    "data": {
        "list": [
            {
                "id": 195,
                "uuid": "53a677bb3e281b8-f94411a60c-63c24009",
                "machine_id": "f94411a60c",
                "deployment_uuid": "da497aea1eb8343",
                "status": "running",
                "gpu_name": "TITAN Xp",
                "gpu_num": 1,
                "cpu_num": 4,
                "memory_size": 2147483648,
                "image_uuid": "image-db8346e037",
                "price": 1881,
                "info": {
                    "ssh_command": "ssh -p 21305 root@region-1.autodl.com",
                    "root_password": "xxxxxxxxxx",
                    "service_url": "https://region-1.autodl.com:21294",
                    "proxy_host": "region-1.autodl.com",
                    "custom_port": 21294
                },
                "started_at": "2022-12-13T16:43:03+08:00",
                "stopped_at": null,
                "created_at": "2022-12-13T16:42:50+08:00",
                "updated_at": "2022-12-13T16:43:03+08:00"
            }
        ],
        "page_index": 1,
        "page_size": 10,
        "max_page": 1
    }
}

Python example:

import requests
headers = {
    "Authorization": "your token",
    "Content-Type": "application/json"
}
url = "https://api.autodl.com/api/v1/dev/deployment/container/list"
body = {
    "deployment_uuid": "424446e02893b5f",
    "container_uuid": "",
    "date_from": "",
    "date_to": "",
    "gpu_name": "",
    "cpu_num_from": 0,
    "cpu_num_to": 0,
    "memory_size_from": 0,
    "memory_size_to": 0,
    "price_from": 0,
    "price_to": 0,
    "released": False,

    "page_index": 1,
    "page_size": 10,
}
response = requests.post(url, json=body, headers=headers)
print(response.content.decode())
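A common follow-up is to collect the service addresses of the containers that are actually running. The sketch below works on a decoded response of this endpoint and uses only fields listed in the tables above; the function name running_service_urls is an assumption for illustration, and "running" is taken from the status value shown in the sample.

```python
def running_service_urls(response):
    """Map container UUID to service_url for containers in 'running' status."""
    if response.get("code") != "Success":
        raise RuntimeError(f"API error: {response.get('msg')}")
    urls = {}
    for container in response["data"]["list"]:
        info = container.get("info") or {}
        if container.get("status") == "running" and info.get("service_url"):
            urls[container["uuid"]] = info["service_url"]
    return urls


# Usage with the response from the example above (not executed here):
#   print(running_service_urls(response.json()))
```

The deprecated proxy_host and custom_port fields are ignored on purpose, since the table recommends service_url instead.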

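Stop a Container

Besides setting the replica count and letting the system scale and manage container lifecycles automatically, this endpoint lets you stop a specific container. If you do not want a new container to be started to maintain the replica count after the stop, pass decrease_one_replica_num=true: the container is stopped and replica_num is reduced by 1 at the same time. Note that decrease_one_replica_num only takes effect for ReplicaSet deployments.

Request

PUT /api/v1/dev/deployment/container/stop

Put the request parameters in the body. Parameter details:

Parameter | Type | Required | Notes
deployment_container_uuid | String | | UUID of the deployment's container
decrease_one_replica_num | Boolean | | For ReplicaSet deployments, whether to also reduce replica_num by 1

Sample:

{
     "deployment_container_uuid": "da497aea1eb8343-f94411a60c-a394fb30",
     "decrease_one_replica_num": false
}

Response

Response parameters:

Parameter | Type | Notes
code | String | Response code; Success on success
msg | String | Error message; empty on success
data | |

Sample:

{
    "code": "Success",
    "msg": "",
    "data": null
}

Python example:

import requests
headers = {
    "Authorization": "your token",
    "Content-Type": "application/json"
}
url = "https://api.autodl.com/api/v1/dev/deployment/container/stop"
body = {
    "deployment_container_uuid": "da497aea1eb8343-f94411a60c-ec630659",
    "decrease_one_replica_num": False
}
response = requests.put(url, json=body, headers=headers)
print(response.content.decode())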

Set the Replica Count

Request

PUT /api/v1/dev/deployment/replica_num

Put the request parameters in the body. Parameter details:

Parameter | Type | Required | Notes
deployment_uuid | String | | Deployment UUID
replica_num | Int | | Replica count. Only supported for ReplicaSet deployments

Sample:

{
    "deployment_uuid": "xxx",
    "replica_num": 10
}

Response

Response parameters:

Parameter | Type | Notes
code | String | Response code; Success on success
msg | String | Error message; empty on success
data | |

Sample:

{
    "code": "Success",
    "msg": "",
    "data": null
}

Python example:

import requests
headers = {
    "Authorization": "your token",
    "Content-Type": "application/json"
}
url = "https://api.autodl.com/api/v1/dev/deployment/replica_num"
body = {
    "deployment_uuid": "5be3045703152b9",
    "replica_num": 16
}
response = requests.put(url, json=body, headers=headers)
print(response.content.decode())

Stop a Deployment

Request

PUT /api/v1/dev/deployment/operate

Put the request parameters in the body. Parameter details:

Parameter | Type | Required | Notes
deployment_uuid | String | | Deployment UUID
operate | String | | Operation type. Currently the only supported value is "stop"

Sample:

{
    "deployment_uuid": "xxx",
    "operate": "stop"
}

Response

Response parameters:

Parameter | Type | Notes
code | String | Response code; Success on success
msg | String | Error message; empty on success
data | |

Sample:

{
    "code": "Success",
    "msg": "",
    "data": null
}

Python example:

import requests
headers = {
    "Authorization": "your token",
    "Content-Type": "application/json"
}
url = "https://api.autodl.com/api/v1/dev/deployment/operate"
body = {
    "deployment_uuid": "5be3045703152b9",
    "operate": "stop"
}
response = requests.put(url, json=body, headers=headers)
print(response.content.decode())

Delete a Deployment

If you delete a deployment that has not been stopped, the system stops it and then deletes it.

Request

DELETE /api/v1/dev/deployment

Put the request parameters in the body. Parameter details:

Parameter | Type | Required | Notes
deployment_uuid | String | | Deployment UUID

Sample:

{
    "deployment_uuid": "xxx"
}

Response

Response parameters:

Parameter | Type | Notes
code | String | Response code; Success on success
msg | String | Error message; empty on success
data | |

Sample:

{
    "code": "Success",
    "msg": "",
    "data": null
}

Python example:

import requests
headers = {
    "Authorization": "your token",
    "Content-Type": "application/json"
}
url = "https://api.autodl.com/api/v1/dev/deployment"
body = {
    "deployment_uuid": "5be3045703152b9"
}
response = requests.delete(url, json=body, headers=headers)
print(response.content.decode())

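Set the Scheduling Blacklist

If a container shows unexplained problems while being scheduled or used, you can mark the host it runs on as non-schedulable. Once set, none of your deployments will be scheduled onto that host for the next 24 hours, after which the ban is lifted automatically.

Request

POST /api/v1/dev/deployment/blacklist

Put the request parameters in the body. Parameter details:

Parameter | Type | Required | Notes
deployment_container_uuid | String | | Container UUID
comment | String | | Comment

Sample:

{
    "deployment_container_uuid": "xxx",
    "comment": "Slow startup; do not schedule containers on this host"
}

Response

Response parameters:

Parameter | Type | Notes
code | String | Response code; Success on success
msg | String | Error message; empty on success
data | |

Sample:

{
    "code": "Success",
    "msg": "",
    "data": null
}

Python example:

import requests
headers = {
    "Authorization": "your token",
    "Content-Type": "application/json"
}
url = "https://api.autodl.com/api/v1/dev/deployment/blacklist"
body = {
    "deployment_container_uuid": "da497aea1eb8343-f94411a60c-1502e6e2",
    "comment": "Slow startup; do not schedule containers on this host"
}
# The endpoint is documented as POST, so use requests.post here.
response = requests.post(url, json=body, headers=headers)
print(response.content.decode())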

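Get Elastic Deployment GPU Stock

Request

POST /api/v1/dev/machine/region/gpu_stock

Put the request parameters in the body. Parameter details:

Parameter | Type | Required | Notes
region_sign | String | | Region identifier; see the region list in the appendix
cuda_v | Int | | Filters for hosts whose GPU driver supports this CUDA version. For the allowed values and further notes, see the appendix at the end of this document

Sample:

{
    "region_sign": "westDC2",
    "cuda_v": 117
}

Response

Response parameters:

Parameter | Type | Notes
code | String | Response code; Success on success
msg | String | Error message; empty on success
data | List<Response object> |

Response object parameters:

Parameter | Type | Notes
GPU model name | Stock object | Each list entry maps one GPU model name to its stock object

Stock object parameters:

Parameter | Type | Notes
idle_gpu_num | Int | Number of idle GPUs
total_gpu_num | Int | Total number of GPUs

Sample: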

{
    "code": "Success",
    "msg": "",
    "data": [
        {
            "RTX 4090": {
                "idle_gpu_num": 215,
                "total_gpu_num": 2285
            }
        },
        {
            "RTX 3080 Ti": {
                "idle_gpu_num": 20,
                "total_gpu_num": 392
            }
        },
        {
            "RTX A4000": {
                "idle_gpu_num": 6,
                "total_gpu_num": 24
            }
        }
    ]
}

Python example:

import requests
headers = {
    "Authorization": "your token",
    "Content-Type": "application/json"
}
url = "https://api.autodl.com/api/v1/dev/machine/region/gpu_stock"
body = {
    "region_sign": "westDC2",
    "cuda_v": 117
}
response = requests.post(url, json=body, headers=headers)
print(response.content.decode())
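The stock response is a list of single-key objects, which is awkward to query directly. As a small sketch, the helpers below flatten it into a dictionary and list the GPU models that currently have enough idle GPUs, for example to fill gpu_name_set before creating a deployment. The function names are assumptions for illustration, and the response shape is taken from the sample above.

```python
def idle_gpus_by_model(response):
    """Flatten the stock list into {GPU model name: idle_gpu_num}."""
    stock = {}
    for entry in response["data"]:
        for model, counts in entry.items():
            stock[model] = counts["idle_gpu_num"]
    return stock


def models_with_capacity(response, needed):
    """Return GPU models with at least `needed` idle GPUs, most idle first."""
    stock = idle_gpus_by_model(response)
    candidates = [model for model, idle in stock.items() if idle >= needed]
    return sorted(candidates, key=lambda model: stock[model], reverse=True)


sample = {
    "code": "Success",
    "msg": "",
    "data": [
        {"RTX 4090": {"idle_gpu_num": 215, "total_gpu_num": 2285}},
        {"RTX 3080 Ti": {"idle_gpu_num": 20, "total_gpu_num": 392}},
        {"RTX A4000": {"idle_gpu_num": 6, "total_gpu_num": 24}},
    ],
}
print(models_with_capacity(sample, needed=10))  # ['RTX 4090', 'RTX 3080 Ti']
```

Stock can change between this query and the create request, so the result is a hint rather than a guarantee.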

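Appendix

1. region_sign values for creating a deployment

Region | region_sign value
Northwest Enterprise Zone (recommended) | westDC2
South China Enterprise Zone (recommended) | southDC1
Suqian Enterprise Zone (recommended) | suqianDC1
Northwest Zone B | westDC3
Beijing Zone A | beijingDC1
Beijing Zone B | beijingDC2
Beijing Zone C | beijingDC4
South China Zone A (formerly Beijing Zone C) | beijingDC3
Wuhu Zone | wuhuDC1
Inner Mongolia Zone A | neimengDC1
Foshan Zone | foshanDC1
Northwest Zone A | westDC1
Southwest Zone A | westDC4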

2. Public base image UUIDs

Image UUID | Framework | Image
base-image-12be412037 | PyTorch | cuda11.1-cudnn8-devel-ubuntu18.04-py38-torch1.9.0
base-image-u9r24vthlk | PyTorch | cuda11.3-cudnn8-devel-ubuntu20.04-py38-torch1.10.0
base-image-l374uiucui | PyTorch | cuda11.3-cudnn8-devel-ubuntu20.04-py38-torch1.11.0
base-image-l2t43iu6uk | PyTorch | cuda11.8-cudnn8-devel-ubuntu20.04-py38-torch2.0.0
base-image-0gxqmciyth | TensorFlow | cuda11.2-cudnn8-devel-ubuntu18.04-py38-tf2.5.0
base-image-uxeklgirir | TensorFlow | cuda11.2-cudnn8-devel-ubuntu20.04-py38-tf2.9.0
base-image-4bpg0tt88l | TensorFlow | cuda11.4-py38-tf1.15.5
base-image-mbr2n4urrc | Miniconda | cuda11.6-cudnn8-devel-ubuntu20.04-py38
base-image-qkkhitpik5 | Miniconda | cuda10.2-cudnn7-devel-ubuntu18.04-py38
base-image-h041hn36yt | Miniconda | cuda11.1-cudnn8-devel-ubuntu18.04-py38
base-image-7bn8iqhkb5 | Miniconda | cudagl11.3-cudnn8-devel-ubuntu20.04-py38
base-image-k0vep6kyq8 | Miniconda | cuda9.0-cudnn7-devel-ubuntu16.04-py36
base-image-l2843iu23k | TensorRT | cuda11.8-cudnn8-devel-ubuntu20.04-py38-trt8.5.1
base-image-l2t43iu6uk | TensorRT | cuda11.8-cudnn8-devel-ubuntu20.04-py38-torch2.0.0
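
3. cuda_v values

CUDA version | cuda_v value (integer) | Notes
11.1 | 111 | Hosts whose GPU driver supports a highest CUDA version >= 11.1 can be scheduled
11.3 | 113 | Hosts whose GPU driver supports a highest CUDA version >= 11.3 can be scheduled
11.7 | 117 | Hosts whose GPU driver supports a highest CUDA version >= 11.7 can be scheduled
11.8 | 118 | Hosts whose GPU driver supports a highest CUDA version >= 11.8 can be scheduled
12.0 | 120 | Hosts whose GPU driver supports a highest CUDA version >= 12.0 can be scheduled
12.1 | 121 | Hosts whose GPU driver supports a highest CUDA version >= 12.1 can be scheduled
12.2 | 122 | Hosts whose GPU driver supports a highest CUDA version >= 12.2 can be scheduled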

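Note: if your framework uses a CUDA version that is not among the options above, for example 11.5, choose the lowest listed option that is compatible with the version you need, which for 11.5 is 11.7. Because newer drivers are compatible with older CUDA versions, this works normally. Choosing a version that is too high, however, narrows the set of schedulable machines and reduces the number of GPUs available.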
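The selection rule described above (pick the lowest listed option that is not below the CUDA version you need) can be sketched as follows. The option list is copied from the cuda_v table; the function name choose_cuda_v is an assumption for illustration, and the encoding major * 10 + minor is inferred from the table (11.1 becomes 111), so it only holds for single-digit minor versions.

```python
# cuda_v options copied from the table in this appendix.
CUDA_V_OPTIONS = [111, 113, 117, 118, 120, 121, 122]


def choose_cuda_v(required_version):
    """Return the lowest cuda_v option that covers `required_version`.

    `required_version` is a string such as "11.2". The encoding
    major * 10 + minor is inferred from the table, not from a spec.
    """
    major, minor = (int(part) for part in required_version.split(".")[:2])
    wanted = major * 10 + minor
    for option in sorted(CUDA_V_OPTIONS):
        if option >= wanted:
            return option
    raise ValueError(f"no listed cuda_v option supports CUDA {required_version}")


print(choose_cuda_v("11.2"))  # 113
print(choose_cuda_v("12.0"))  # 120
```

Picking the lowest matching option keeps the pool of schedulable hosts as large as possible.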