When an AI model needs to go live quickly for validation, you can initially skip model acceleration and concurrency concerns: package the Python + PyTorch + CUDA (or CPU) inference code into a Docker image and expose it as an HTTP inference service.
There are two ways to build the Docker image:
1. Start from NVIDIA's official CUDA base image and install Python, PyTorch, and the other third-party libraries the model depends on, step by step
2. Start from a community image that already bundles CUDA, Python, and PyTorch, and use it directly

Example Dockerfiles for both approaches are given below.

1.1 Dockerfile based on NVIDIA's official CUDA image

An example Dockerfile:

# Use NVIDIA CUDA 11.7 as the base image
FROM nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04

# Set non-interactive mode
ENV DEBIAN_FRONTEND=noninteractive

# Set the timezone
ENV TZ=Asia/Shanghai
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

# Switch apt sources to the Aliyun mirror
RUN echo "deb http://mirrors.aliyun.com/ubuntu/ jammy main restricted universe multiverse" > /etc/apt/sources.list && \
    echo "deb-src http://mirrors.aliyun.com/ubuntu/ jammy main restricted universe multiverse" >> /etc/apt/sources.list && \
    echo "deb http://mirrors.aliyun.com/ubuntu/ jammy-updates main restricted universe multiverse" >> /etc/apt/sources.list && \
    echo "deb-src http://mirrors.aliyun.com/ubuntu/ jammy-updates main restricted universe multiverse" >> /etc/apt/sources.list && \
    echo "deb http://mirrors.aliyun.com/ubuntu/ jammy-backports main restricted universe multiverse" >> /etc/apt/sources.list && \
    echo "deb-src http://mirrors.aliyun.com/ubuntu/ jammy-backports main restricted universe multiverse" >> /etc/apt/sources.list && \
    echo "deb http://mirrors.aliyun.com/ubuntu/ jammy-security main restricted universe multiverse" >> /etc/apt/sources.list && \
    echo "deb-src http://mirrors.aliyun.com/ubuntu/ jammy-security main restricted universe multiverse" >> /etc/apt/sources.list

# Python-related environment variables
ENV PYTHON_VERSION=3.9.18
ENV PYTHON_ROOT=/opt/python/${PYTHON_VERSION}
ENV PATH=${PYTHON_ROOT}/bin:${PATH}

# Install build and runtime dependencies
RUN apt-get update && \
    apt-get install -y \
        wget build-essential zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev \
        libreadline-dev libffi-dev libsqlite3-dev libbz2-dev liblzma-dev \
        gcc-11 g++-11 \
        tmux libgl1-mesa-glx \
        libglib2.0-dev \
        ffmpeg \
        locales && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Set the default locale to Chinese
RUN locale-gen zh_CN.UTF-8
ENV LANG=zh_CN.UTF-8
ENV LANGUAGE=zh_CN:zh
ENV LC_ALL=zh_CN.UTF-8

# Build and install the requested Python version into PYTHON_ROOT
RUN cd /tmp && \
    wget https://www.python.org/ftp/python/${PYTHON_VERSION}/Python-${PYTHON_VERSION}.tgz && \
    tar -xvf Python-${PYTHON_VERSION}.tgz && \
    cd Python-${PYTHON_VERSION} && \
    ./configure --enable-optimizations --prefix=${PYTHON_ROOT} && \
    make -j"$(nproc)" && make install && \
    cd .. && rm Python-${PYTHON_VERSION}.tgz && rm -r Python-${PYTHON_VERSION} && \
    ln -s ${PYTHON_ROOT}/bin/python3 /usr/local/bin/python && \
    ln -s ${PYTHON_ROOT}/bin/pip3 /usr/local/bin/pip && \
    python -m pip install --upgrade pip && \
    rm -rf /root/.cache/pip

# Make gcc-11 and g++-11 the default compilers
RUN update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-11 110 \
    && update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-11 110

# Verify the Python version
RUN python3 --version

# Install PyTorch 1.13.1 (CUDA 11.7 build)
RUN pip3 install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117

# Verify the installation (torch.cuda.is_available() is usually False during
# `docker build`, since GPUs are only attached at run time)
RUN python3 -c "import torch; print('PyTorch version:', torch.__version__); print('CUDA available:', torch.cuda.is_available())" \
    && gcc --version \
    && g++ --version

# Set the working directory
WORKDIR /app

# Copy the project files
COPY . .

# Install the dependencies listed in requirements_docker.txt
RUN pip install -r requirements_docker.txt

# Expose the service port
EXPOSE 8000

# Make the script executable (optional)
RUN chmod +x main.py

# Start the FastAPI service
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
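With the Dockerfile in place, building and running the image might look like the following (image and container names are arbitrary; `--gpus all` requires the NVIDIA Container Toolkit on the host, and the `/docs` smoke test assumes a default FastAPI app):

```shell
# Build the image from the directory containing the Dockerfile
docker build -t model-inference:latest .

# Run the service detached, with GPU access, publishing port 8000
docker run --gpus all -d -p 8000:8000 --name model-inference model-inference:latest

# Smoke-test from the host: FastAPI serves interactive docs at /docs by default
curl http://localhost:8000/docs
```

For a CPU-only deployment, drop `--gpus all`; the same image works as long as the code falls back to CPU when `torch.cuda.is_available()` is False.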

1.2 Custom Dockerfile based on an existing community image

Community-built images are available in this GitHub repository: https://github.com/cnstark/pytorch-docker
The repository provides pure PyTorch Docker images for various combinations of operating system, CUDA, and Python versions, as shown in the figure below.

[Figure: list of available pytorch-docker image tags]

Based on one of these images, we can write a custom Dockerfile for model inference:

# Use a pre-built base image; pick the tag that matches your needs
FROM cnstark/pytorch:1.13.1-py3.9.16-cuda11.7.1-ubuntu20.04

# Set the timezone
ENV TZ=Asia/Shanghai
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

# Write a new sources.list pointing at the Aliyun mirror
RUN echo "deb http://mirrors.aliyun.com/ubuntu/ focal main restricted universe multiverse" > /etc/apt/sources.list && \
    echo "deb-src http://mirrors.aliyun.com/ubuntu/ focal main restricted universe multiverse" >> /etc/apt/sources.list && \
    echo "deb http://mirrors.aliyun.com/ubuntu/ focal-updates main restricted universe multiverse" >> /etc/apt/sources.list && \
    echo "deb-src http://mirrors.aliyun.com/ubuntu/ focal-updates main restricted universe multiverse" >> /etc/apt/sources.list && \
    echo "deb http://mirrors.aliyun.com/ubuntu/ focal-backports main restricted universe multiverse" >> /etc/apt/sources.list && \
    echo "deb-src http://mirrors.aliyun.com/ubuntu/ focal-backports main restricted universe multiverse" >> /etc/apt/sources.list && \
    echo "deb http://mirrors.aliyun.com/ubuntu/ focal-security main restricted universe multiverse" >> /etc/apt/sources.list && \
    echo "deb-src http://mirrors.aliyun.com/ubuntu/ focal-security main restricted universe multiverse" >> /etc/apt/sources.list

# Configure pip to use the Aliyun mirror
RUN pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/ && \
    pip config set install.trusted-host mirrors.aliyun.com

# Avoid interactive prompts during package installation
ENV DEBIAN_FRONTEND=noninteractive

# Update the apt index and install runtime utilities in one layer
RUN apt-get update && \
    apt-get install -y tmux libgl1-mesa-glx libglib2.0-dev ffmpeg locales && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Set the default locale to Chinese
RUN locale-gen zh_CN.UTF-8
ENV LANG=zh_CN.UTF-8
ENV LANGUAGE=zh_CN:zh
ENV LC_ALL=zh_CN.UTF-8


# Set the working directory
WORKDIR /app

# Copy the project files
COPY . .

# Install the dependencies listed in requirements_docker.txt
RUN pip3 install -r requirements_docker.txt

# Expose the service port
EXPOSE 8000

# Make the script executable (optional)
RUN chmod +x main.py

# Start the FastAPI service
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
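Both Dockerfiles install application dependencies from a `requirements_docker.txt` shipped alongside the code. Its exact contents depend on the model; for a FastAPI + uvicorn service it might contain at least the following (the version pins are illustrative, not taken from the original project):

```text
fastapi==0.95.2
uvicorn[standard]==0.22.0
numpy==1.24.4
```

Note that `torch` should not be re-listed here without a matching pin: both base setups already provide PyTorch 1.13.1, and an unpinned `torch` entry would let pip replace it with a different version.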
