1 Introduction

In deep learning, simply stacking more layers to increase network depth does not by itself improve performance. Moreover, deep networks are prone to the "vanishing gradient" problem during training: as gradients are back-propagated to earlier layers, repeated multiplication can make them extremely small.

ResNet introduces residual learning to address this degradation problem. For a stack of a few layers, denote the features it learns for an input x as F(x). Now add a branch that skips the stack and connects the input directly to its output, so the final output becomes H(x) = F(x) + x, as shown in the figure below.

[Figure: residual block with a shortcut connection, H(x) = F(x) + x]

This kind of skip connection is called a shortcut connection (analogous to a short circuit in electronics). The two-layer structure above is called a BasicBlock and is generally used in ResNet18 and ResNet34, while ResNet50 and deeper variants use a three-layer residual structure called a Bottleneck.
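
Conceptually, the shortcut simply adds the input to the output of the stacked layers. The snippet below is only an illustrative sketch (the 64 channels and the two 3 \times 3 convolutions are assumptions), not the blocks implemented later in this article:

import torch
from torch import nn

# F(x): a placeholder stack of layers that preserves channel count and spatial size
F = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, padding=1)
)
x = torch.randn(1, 64, 56, 56)
h = F(x) + x                 # H(x) = F(x) + x: the shortcut adds the input back
print(h.shape)               # torch.Size([1, 64, 56, 56])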

Before reading this article, it is assumed that you already know what image convolution and fully connected layers are, and that you are proficient with Python and PyTorch.

2 Convolution in PyTorch

2.1 torch.nn.Conv2d

For details, see the official documentation: https://pytorch.org/docs/1.8.0/generated/torch.nn.Conv2d.html?highlight=nn%20conv2d#torch.nn.Conv2d

1. Function signature

torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')

2. Parameters

  • in_channels: number of input channels
  • out_channels: number of output channels
  • kernel_size: size of the convolution kernel
  • stride: stride of the convolution
  • padding: amount of zero padding added to the input
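
As a quick check of these parameters (the concrete values below are purely illustrative), the layer's weight has shape (out_channels, in_channels, kernel_size, kernel_size) and its bias has shape (out_channels,):

from torch import nn

conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=7, stride=2, padding=3)
print(conv.weight.shape)   # torch.Size([64, 3, 7, 7])
print(conv.bias.shape)     # torch.Size([64])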

2.2 Convolution output size

In PyTorch, when convolving a batch of images, we usually work with a Tensor of shape (b, c, h, w),
where

  • b is the batch size
  • c is the number of channels
  • h is the image height
  • w is the image width

After the convolution, the output image size is computed with the following formula:

\begin{array}{c}
output = \left\lfloor \dfrac{input + 2 \times padding - kernel}{stride} \right\rfloor + 1
\end{array}

For example, suppose the input image size is (64, 64), with padding=1, kernel=3, and stride=2. Applying the formula above, the image size after the convolution is (32, 32).
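
A minimal sketch verifying this (the batch size, the 3 input channels, and the 16 output channels below are arbitrary choices):

import torch
from torch import nn

x = torch.randn(1, 3, 64, 64)                       # (b, c, h, w)
conv = nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1)
print(conv(x).shape)                                # torch.Size([1, 16, 32, 32])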

3 ResNet

ResNet comes in several variants, such as ResNet18, ResNet34, and ResNet50, as shown in the table below.

[Table: ResNet architectures (ResNet18, ResNet34, ResNet50, ResNet101, ResNet152)]

In the red-boxed part of the table above, 3 \times 3 is the size of the convolution kernel, 64 is the number of filters, and \times 2 is the number of times the block is repeated.

ResNet has three main components:

  • The input layer (conv1 + max pooling), usually referred to as layer 0
  • The ResBlocks, layers 1 through 4
  • Average pooling + a fully connected layer, as the final layer

3.1 ResNet18

Let's start with the simplest variant, ResNet18. Assume the input image size is 224 \times 224 and the output has 1000 classes.

The structure of ResNet18 is shown below,
[Figure: ResNet18 network structure]

or

[Figure: ResNet18 network structure (alternative view)]

3.1.1 ResBlock

From the two ResNet18 structure diagrams, a ResBlock comes in two forms: one convolves with stride=2, i.e. downsampling, and the other uses a stride of 1. Both cases need to be handled in the code. The ResBlock code is as follows:

class ResBlock(nn.Module):
    def __init__(self, in_channels, out_channels, downsample):
        super().__init__()
        if downsample:
            self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=2, padding=1)
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=2),
                nn.BatchNorm2d(out_channels)
            )
        else:
            self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1)
            self.shortcut = nn.Sequential()

        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)

    def forward(self, input):
        shortcut = self.shortcut(input)
        input = nn.ReLU()(self.bn1(self.conv1(input)))
        input = nn.ReLU()(self.bn2(self.conv2(input)))
        input = input + shortcut
        return nn.ReLU()(input)
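
As a quick sanity check (the feature-map size of 56 \times 56 below is just an assumption), a downsampling ResBlock should double the channels and halve the spatial size, while a non-downsampling one should preserve the shape:

import torch

x = torch.randn(1, 64, 56, 56)
print(ResBlock(64, 128, downsample=True)(x).shape)    # torch.Size([1, 128, 28, 28])
print(ResBlock(64, 64, downsample=False)(x).shape)    # torch.Size([1, 64, 56, 56])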

3.1.2 Input layer (layer 0)

The input layer, i.e. layer 0, consists of a 7 \times 7 convolution layer and a 3 \times 3 max pooling layer. The code is as follows:

self.layer0 = nn.Sequential(
    nn.Conv2d(in_channels, 64, kernel_size=7, stride=2, padding=3),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU()
)
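
Applying the formula from Section 2.2, a 224 \times 224 input becomes floor((224 + 2 \times 3 - 7) / 2) + 1 = 112 after the 7 \times 7 stride-2 convolution, and floor((112 + 2 \times 1 - 3) / 2) + 1 = 56 after the 3 \times 3 stride-2 max pooling, so layer0 maps a (b, 3, 224, 224) tensor to (b, 64, 56, 56).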

3.1.3 Final layer

The final layer consists of a global average pooling layer and a fully connected layer. The code is as follows:

self.gap = torch.nn.AdaptiveAvgPool2d(1)
self.fc = torch.nn.Linear(512, outputs)  # 512 = number of channels after layer4 in ResNet18

Global average pooling is illustrated in the figure below:

[Figure: global average pooling]
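
A small sketch of what these two layers do to the feature map (the (1, 512, 7, 7) input shape assumes the output of layer4 for a 224 \times 224 image):

import torch
from torch import nn

x = torch.randn(1, 512, 7, 7)
y = nn.AdaptiveAvgPool2d(1)(x)           # torch.Size([1, 512, 1, 1])
y = torch.flatten(y, start_dim=1)        # torch.Size([1, 512]), keeps the batch dimension
print(nn.Linear(512, 1000)(y).shape)     # torch.Size([1, 1000])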

3.1.4 ResNet18 code

Combining the code above, we obtain the ResNet18 code:

class ResNet18(nn.Module):
    def __init__(self, in_channels, resblock, outputs=1000):
        super().__init__()
        self.layer0 = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=7, stride=2, padding=3),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU()
        )

        self.layer1 = nn.Sequential(
            resblock(64, 64, downsample=False),
            resblock(64, 64, downsample=False)
        )

        self.layer2 = nn.Sequential(
            resblock(64, 128, downsample=True),
            resblock(128, 128, downsample=False)
        )

        self.layer3 = nn.Sequential(
            resblock(128, 256, downsample=True),
            resblock(256, 256, downsample=False)
        )


        self.layer4 = nn.Sequential(
            resblock(256, 512, downsample=True),
            resblock(512, 512, downsample=False)
        )

        self.gap = torch.nn.AdaptiveAvgPool2d(1)
        self.fc = torch.nn.Linear(512, outputs)

    def forward(self, input):
        input = self.layer0(input)
        input = self.layer1(input)
        input = self.layer2(input)
        input = self.layer3(input)
        input = self.layer4(input)
        input = self.gap(input)
        input = torch.flatten(input, start_dim=1)  # keep the batch dimension
        input = self.fc(input)

        return input

We can use torchsummary to print a summary of the model's structure,

from torchsummary import summary

resnet18 = ResNet18(3, ResBlock, outputs=1000)
resnet18.to(torch.device("cuda:0" if torch.cuda.is_available() else "cpu"))
summary(resnet18, (3, 224, 224))
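
A forward pass with a small batch (the batch size of 4 is arbitrary) should return one score vector per image; this relies on torch.flatten(..., start_dim=1) keeping the batch dimension:

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
x = torch.randn(4, 3, 224, 224, device=device)
print(resnet18(x).shape)   # torch.Size([4, 1000])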

3.2 ResNet34

ResNet34 has the same input layer and final layer as ResNet18; we only need to change the number of ResBlocks repeated in layer1 through layer4. The ResNet34 code is as follows:

class ResNet34(nn.Module):
    def __init__(self, in_channels, resblock, outputs=1000):
        super().__init__()
        self.layer0 = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=7, stride=2, padding=3),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU()
        )

        self.layer1 = nn.Sequential(
            resblock(64, 64, downsample=False),
            resblock(64, 64, downsample=False),
            resblock(64, 64, downsample=False)
        )

        self.layer2 = nn.Sequential(
            resblock(64, 128, downsample=True),
            resblock(128, 128, downsample=False),
            resblock(128, 128, downsample=False),
            resblock(128, 128, downsample=False)
        )

        self.layer3 = nn.Sequential(
            resblock(128, 256, downsample=True),
            resblock(256, 256, downsample=False),
            resblock(256, 256, downsample=False),
            resblock(256, 256, downsample=False),
            resblock(256, 256, downsample=False),
            resblock(256, 256, downsample=False)
        )


        self.layer4 = nn.Sequential(
            resblock(256, 512, downsample=True),
            resblock(512, 512, downsample=False),
            resblock(512, 512, downsample=False),
        )

        self.gap = torch.nn.AdaptiveAvgPool2d(1)
        self.fc = torch.nn.Linear(512, outputs)

    def forward(self, input):
        input = self.layer0(input)
        input = self.layer1(input)
        input = self.layer2(input)
        input = self.layer3(input)
        input = self.layer4(input)
        input = self.gap(input)
        input = torch.flatten(input, start_dim=1)  # keep the batch dimension
        input = self.fc(input)

        return input

We can likewise use torchsummary to inspect the network,

resnet34 = ResNet34(3, ResBlock, outputs=1000)
resnet34.to(torch.device("cuda:0" if torch.cuda.is_available() else "cpu"))
summary(resnet34, (3, 224, 224))

3.3 ResNet50, ResNet101, ResNet152

[Figure: BasicBlock (left) and Bottleneck (right) residual structures]

As shown in the figure above, the structure on the left is the ResBlock we used in ResNet18 and ResNet34, while the one on the right is the Bottleneck structure used in ResNet50, ResNet101, and ResNet152.

3.3.1 ResBottleneckBlock

The most obvious difference between ResNet34 and ResNet50 is the residual block, as shown in the figure above. We need to rewrite this component as a new module named ResBottleneckBlock.

class ResBottleneckBlock(nn.Module):
    def __init__(self, in_channels, out_channels, downsample):
        super().__init__()
        self.downsample = downsample
        self.conv1 = nn.Conv2d(in_channels, out_channels//4, kernel_size=1, stride=1)
        self.conv2 = nn.Conv2d(out_channels//4, out_channels//4, kernel_size=3, stride=2 if downsample else 1, padding=1)
        self.conv3 = nn.Conv2d(out_channels//4, out_channels, kernel_size=1, stride=1)
        self.shortcut = nn.Sequential()

        if self.downsample or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=2 if self.downsample else 1),
                nn.BatchNorm2d(out_channels)
            )

        self.bn1 = nn.BatchNorm2d(out_channels//4)
        self.bn2 = nn.BatchNorm2d(out_channels//4)
        self.bn3 = nn.BatchNorm2d(out_channels)

    def forward(self, input):
        shortcut = self.shortcut(input)
        input = nn.ReLU()(self.bn1(self.conv1(input)))
        input = nn.ReLU()(self.bn2(self.conv2(input)))
        input = nn.ReLU()(self.bn3(self.conv3(input)))
        input = input + shortcut
        return nn.ReLU()(input)

[Figure: three ResBottleneckBlock configurations]

Combining the code above with the figure, we can create three different ResBottleneckBlock configurations (a quick shape check follows the list):

  • Left:ResBottleneckBlock(64, 256, downsample=False)
  • Middle:ResBottleneckBlock(256, 256, downsample=False)
  • Right:ResBottleneckBlock(256, 512, downsample=True)
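
A small sanity check of the three configurations (the 56 \times 56 feature-map size below is an assumption):

import torch

x = torch.randn(1, 64, 56, 56)
print(ResBottleneckBlock(64, 256, downsample=False)(x).shape)    # torch.Size([1, 256, 56, 56])
y = torch.randn(1, 256, 56, 56)
print(ResBottleneckBlock(256, 256, downsample=False)(y).shape)   # torch.Size([1, 256, 56, 56])
print(ResBottleneckBlock(256, 512, downsample=True)(y).shape)    # torch.Size([1, 512, 28, 28])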

3.3.2 Putting it all together

To support all of the ResNet variants with a single class, we add a new boolean argument useBottleneck to the ResNet18/34 code above to specify whether bottleneck blocks are used. Set useBottleneck to False to build ResNet18/34, and to True to build ResNet50/101/152.

The final code is as follows:

class ResNet(nn.Module):
    def __init__(self, in_channels, resblock, repeat, useBottleneck=False, outputs=1000):
        super().__init__()
        self.layer0 = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=7, stride=2, padding=3),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU()
        )

        if useBottleneck:
            filters = [64, 256, 512, 1024, 2048]
        else:
            filters = [64, 64, 128, 256, 512]

        self.layer1 = nn.Sequential()
        self.layer1.add_module('conv2_1', resblock(filters[0], filters[1], downsample=False))
        for i in range(1, repeat[0]):
            self.layer1.add_module('conv2_%d' % (i+1,), resblock(filters[1], filters[1], downsample=False))

        self.layer2 = nn.Sequential()
        self.layer2.add_module('conv3_1', resblock(filters[1], filters[2], downsample=True))
        for i in range(1, repeat[1]):
            self.layer2.add_module('conv3_%d' % (i+1,), resblock(filters[2], filters[2], downsample=False))

        self.layer3 = nn.Sequential()
        self.layer3.add_module('conv4_1', resblock(filters[2], filters[3], downsample=True))
        for i in range(1, repeat[2]):
            self.layer3.add_module('conv4_%d' % (i+1,), resblock(filters[3], filters[3], downsample=False))

        self.layer4 = nn.Sequential()
        self.layer4.add_module('conv5_1', resblock(filters[3], filters[4], downsample=True))
        for i in range(1, repeat[3]):
            self.layer4.add_module('conv5_%d' % (i+1,), resblock(filters[4], filters[4], downsample=False))

        self.gap = torch.nn.AdaptiveAvgPool2d(1)
        self.fc = torch.nn.Linear(filters[4], outputs)

    def forward(self, input):
        input = self.layer0(input)
        input = self.layer1(input)
        input = self.layer2(input)
        input = self.layer3(input)
        input = self.layer4(input)
        input = self.gap(input)
        input = torch.flatten(input, start_dim=1)
        input = self.fc(input)

        return input

We can construct the different ResNet variants as follows,

# resnet18
resnet18 = ResNet(3, ResBlock, [2, 2, 2, 2], useBottleneck=False, outputs=1000)
resnet18.to(torch.device("cuda:0" if torch.cuda.is_available() else "cpu"))
summary(resnet18, (3, 224, 224))

# resnet34
resnet34 = ResNet(3, ResBlock, [3, 4, 6, 3], useBottleneck=False, outputs=1000)
resnet34.to(torch.device("cuda:0" if torch.cuda.is_available() else "cpu"))
summary(resnet34, (3, 224, 224))

# resnet50
resnet50 = ResNet(3, ResBottleneckBlock, [3, 4, 6, 3], useBottleneck=True, outputs=1000)
resnet50.to(torch.device("cuda:0" if torch.cuda.is_available() else "cpu"))
summary(resnet50, (3, 224, 224))

# resnet101
resnet101 = ResNet(3, ResBottleneckBlock, [3, 4, 23, 3], useBottleneck=True, outputs=1000)
resnet101.to(torch.device("cuda:0" if torch.cuda.is_available() else "cpu"))
summary(resnet101, (3, 224, 224))

# resnet152
resnet152 = ResNet(3, ResBottleneckBlock, [3, 8, 36, 3], useBottleneck=True, outputs=1000)
resnet152.to(torch.device("cuda:0" if torch.cuda.is_available() else "cpu"))
summary(resnet152, (3, 224, 224))

4 Complete ResNet code

The complete ResNet code is as follows:

import torch
from torchsummary import summary
from torch import nn

class ResBlock(nn.Module):
    def __init__(self, in_channels, out_channels, downsample):
        super().__init__()
        if downsample:
            self.conv1 = nn.Conv2d(
                in_channels, out_channels, kernel_size=3, stride=2, padding=1)
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=2),
                nn.BatchNorm2d(out_channels)
            )
        else:
            self.conv1 = nn.Conv2d(
                in_channels, out_channels, kernel_size=3, stride=1, padding=1)
            self.shortcut = nn.Sequential()

        self.conv2 = nn.Conv2d(out_channels, out_channels,
                               kernel_size=3, stride=1, padding=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)

    def forward(self, input):
        shortcut = self.shortcut(input)
        input = nn.ReLU()(self.bn1(self.conv1(input)))
        input = nn.ReLU()(self.bn2(self.conv2(input)))
        input = input + shortcut
        return nn.ReLU()(input)

class ResBottleneckBlock(nn.Module):
    def __init__(self, in_channels, out_channels, downsample):
        super().__init__()
        self.downsample = downsample
        self.conv1 = nn.Conv2d(in_channels, out_channels//4,
                               kernel_size=1, stride=1)
        self.conv2 = nn.Conv2d(
            out_channels//4, out_channels//4, kernel_size=3, stride=2 if downsample else 1, padding=1)
        self.conv3 = nn.Conv2d(out_channels//4, out_channels, kernel_size=1, stride=1)

        if self.downsample or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1,
                          stride=2 if self.downsample else 1),
                nn.BatchNorm2d(out_channels)
            )
        else:
            self.shortcut = nn.Sequential()

        self.bn1 = nn.BatchNorm2d(out_channels//4)
        self.bn2 = nn.BatchNorm2d(out_channels//4)
        self.bn3 = nn.BatchNorm2d(out_channels)

    def forward(self, input):
        shortcut = self.shortcut(input)
        input = nn.ReLU()(self.bn1(self.conv1(input)))
        input = nn.ReLU()(self.bn2(self.conv2(input)))
        input = nn.ReLU()(self.bn3(self.conv3(input)))
        input = input + shortcut
        return nn.ReLU()(input)

class ResNet(nn.Module):
    def __init__(self, in_channels, resblock, repeat, useBottleneck=False, outputs=1000):
        super().__init__()
        self.layer0 = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=7, stride=2, padding=3),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU()
        )

        if useBottleneck:
            filters = [64, 256, 512, 1024, 2048]
        else:
            filters = [64, 64, 128, 256, 512]

        self.layer1 = nn.Sequential()
        self.layer1.add_module('conv2_1', resblock(filters[0], filters[1], downsample=False))
        for i in range(1, repeat[0]):
            self.layer1.add_module('conv2_%d' % (i+1,), resblock(filters[1], filters[1], downsample=False))

        self.layer2 = nn.Sequential()
        self.layer2.add_module('conv3_1', resblock(filters[1], filters[2], downsample=True))
        for i in range(1, repeat[1]):
            self.layer2.add_module('conv3_%d' % (i+1,), resblock(filters[2], filters[2], downsample=False))

        self.layer3 = nn.Sequential()
        self.layer3.add_module('conv4_1', resblock(filters[2], filters[3], downsample=True))
        for i in range(1, repeat[2]):
            self.layer3.add_module('conv4_%d' % (i+1,), resblock(filters[3], filters[3], downsample=False))

        self.layer4 = nn.Sequential()
        self.layer4.add_module('conv5_1', resblock(filters[3], filters[4], downsample=True))
        for i in range(1, repeat[3]):
            self.layer4.add_module('conv5_%d' % (i+1,), resblock(filters[4], filters[4], downsample=False))

        self.gap = torch.nn.AdaptiveAvgPool2d(1)
        self.fc = torch.nn.Linear(filters[4], outputs)

    def forward(self, input):
        input = self.layer0(input)
        input = self.layer1(input)
        input = self.layer2(input)
        input = self.layer3(input)
        input = self.layer4(input)
        input = self.gap(input)
        # torch.flatten()
        # https://stackoverflow.com/questions/60115633/pytorch-flatten-doesnt-maintain-batch-size
        input = torch.flatten(input, start_dim=1)
        input = self.fc(input)

        return input

1. ResNet18

resnet18 = ResNet(3, ResBlock, [2, 2, 2, 2], useBottleneck=False, outputs=1000)
resnet18.to(torch.device("cuda:0" if torch.cuda.is_available() else "cpu"))
summary(resnet18, (3, 224, 224))

2. ResNet34

resnet34 = ResNet(3, ResBlock, [3, 4, 6, 3], useBottleneck=False, outputs=1000)
resnet34.to(torch.device("cuda:0" if torch.cuda.is_available() else "cpu"))
summary(resnet34, (3, 224, 224))

3. ResNet50

resnet50 = ResNet(3, ResBottleneckBlock, [
                  3, 4, 6, 3], useBottleneck=True, outputs=1000)
resnet50.to(torch.device("cuda:0" if torch.cuda.is_available() else "cpu"))
summary(resnet50, (3, 224, 224))

4. ResNet101

resnet101 = ResNet(3, ResBottleneckBlock, [
                   3, 4, 23, 3], useBottleneck=True, outputs=1000)
resnet101.to(torch.device("cuda:0" if torch.cuda.is_available() else "cpu"))
summary(resnet101, (3, 224, 224))

5. ResNet152

resnet152 = ResNet(3, ResBottleneckBlock, [
                   3, 8, 36, 3], useBottleneck=True, outputs=1000)
resnet152.to(torch.device("cuda:0" if torch.cuda.is_available() else "cpu"))
summary(resnet152, (3, 224, 224))

See also: https://github.com/weiaicunzai/pytorch-cifar100/blob/master/models/resnet.py

References