AlexNet
Although AlexNet is rarely used in practice today, it is historically significant: it was the first CNN architecture to win the ImageNet competition.

AlexNet architecture

Winning results on ImageNet
VGG
VGGNet is, like AlexNet, simply a stack of convolutional layers, just with more of them. Commonly used variants are VGG11 (configuration A), VGG16, and VGG19. The structure is shown below:
 
VGG architecture
# VGG16
Sequential(
  (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace)
  (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace)
  (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (6): ReLU(inplace)
  (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (8): ReLU(inplace)
  (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (11): ReLU(inplace)
  (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (13): ReLU(inplace)
  (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (15): ReLU(inplace)
  (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (18): ReLU(inplace)
  (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (20): ReLU(inplace)
  (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (22): ReLU(inplace)
  (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (25): ReLU(inplace)
  (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (27): ReLU(inplace)
  (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (29): ReLU(inplace)
  (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)

# Convs
0.conv1_1  [3, 64, 3, 3]
2.conv1_2  [64, 64, 3, 3]
5.conv2_1  [64, 128, 3, 3]
7.conv2_2  [128, 128, 3, 3]
10.conv3_1 [128, 256, 3, 3]
12.conv3_2 [256, 256, 3, 3]
14.conv3_3 [256, 256, 3, 3]
17.conv4_1 [256, 512, 3, 3]
19.conv4_2 [512, 512, 3, 3]
21.conv4_3 [512, 512, 3, 3]
24.conv5_1 [512, 512, 3, 3]
26.conv5_2 [512, 512, 3, 3]
28.conv5_3 [512, 512, 3, 3]

# VGG19
Sequential(
  (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace)
  (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace)
  (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (6): ReLU(inplace)
  (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (8): ReLU(inplace)
  (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (11): ReLU(inplace)
  (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (13): ReLU(inplace)
  (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (15): ReLU(inplace)
  (16): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (17): ReLU(inplace)
  (18): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (19): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (20): ReLU(inplace)
  (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (22): ReLU(inplace)
  (23): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (24): ReLU(inplace)
  (25): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (26): ReLU(inplace)
  (27): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (29): ReLU(inplace)
  (30): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (31): ReLU(inplace)
  (32): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (33): ReLU(inplace)
  (34): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (35): ReLU(inplace)
  (36): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)

# Convs
0.conv1_1  [3, 64, 3, 3]
2.conv1_2  [64, 64, 3, 3]
5.conv2_1  [64, 128, 3, 3]
7.conv2_2  [128, 128, 3, 3]
10.conv3_1 [128, 256, 3, 3]
12.conv3_2 [256, 256, 3, 3]
14.conv3_3 [256, 256, 3, 3]
16.conv3_4 [256, 256, 3, 3]
19.conv4_1 [256, 512, 3, 3]
21.conv4_2 [512, 512, 3, 3]
23.conv4_3 [512, 512, 3, 3]
25.conv4_4 [512, 512, 3, 3]
28.conv5_1 [512, 512, 3, 3]
30.conv5_2 [512, 512, 3, 3]
32.conv5_3 [512, 512, 3, 3]
34.conv5_4 [512, 512, 3, 3]

Total number of parameters: 20024384
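The listings above are the convolutional part (features) of the VGG models as printed by PyTorch. A small sketch of how to reproduce them, assuming torchvision is installed:

import torchvision.models as models

vgg16 = models.vgg16()   # random weights are fine just for inspecting the structure
vgg19 = models.vgg19()

print(vgg16.features)    # the Sequential of Conv2d/ReLU/MaxPool2d layers shown above
print(vgg19.features)

# parameter count of the VGG19 convolutional layers only
n_conv_params = sum(p.numel() for p in vgg19.features.parameters())
print(n_conv_params)     # 20024384, the "Total number of parameters" above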
ResNet
  arXiv: https://arxiv.org/abs/1512.03385
  VGG made the stack of convolutional layers deeper. Does that mean a network always performs better the more convolutional layers it has?
  As the network gets deeper, vanishing and exploding gradients appear during backpropagation. Initializing the weights from a normal distribution and using Batch Normalization alleviate this somewhat, yet accuracy still stops improving. The problem ResNet solves is how to keep improving accuracy while deepening the network (or at least keep it from dropping).
  To this end the authors introduce the residual mapping: let H(x) = F(x) + x. As illustrated below, F(x) is a plain stack of weight layers, two or three of them in the paper (a single layer shows no obvious benefit); adding F(x) to x forms the "shortcut connection".
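A minimal sketch of such a two-layer residual block in PyTorch (the class name BasicBlock and the fixed channel count are illustrative assumptions; the paper's actual blocks also handle downsampling and channel changes):

import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Two stacked 3x3 conv layers F(x) plus the identity shortcut: H(x) = F(x) + x."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))  # first weight layer
        out = self.bn2(self.conv2(out))           # second weight layer -> F(x)
        return self.relu(out + x)                 # shortcut connection: F(x) + x

# if the block learns F(x) -> 0, it degenerates to the identity and cannot hurt accuracy
block = BasicBlock(64)
y = block(torch.randn(1, 64, 56, 56))  # output shape matches the input: (1, 64, 56, 56)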

  Ideally, a deeper network should never perform worse than a shallower one (in practice it does, as the figure below shows). ResNet adds the input to the block's output, so once the network's accuracy has saturated, any extra layers can simply be optimized towards F(x) → 0, and the added depth no longer degrades performance. In fact, ResNet readily gains accuracy from greatly increased depth.
 ResNet is much deeper than earlier networks; commonly used variants are ResNet-50 (50 layers) and ResNet-101 (101 layers). ResNet has fewer parameters than VGG yet achieves better results.
DenseNet
DenseNet borrows ResNet's idea: within a dense block, every layer is connected to all preceding layers, which increases feature reuse and reduces the redundancy that comes with depth.

DenseNet illustration

DenseNet architecture
Taking DenseNet-121 as an example, the first dense block has 6 layers. Each layer consists of a 1×1 convolution followed by a 3×3 convolution, first expanding and then shrinking the channel count. Each layer takes all preceding layers (concatenated) as input, so layer i maps (64+32×i) channels down to 32. The structure as printed by PyTorch is as follows:
Dense(
  (conv0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (dense_block1): _DenseBlock(  # dense_block1 has 6 layers
    (denselayer1): _DenseLayer(
      (conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    )  # 64 in, 32 out
    (denselayer2): _DenseLayer(
      (conv1): Conv2d(96, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    )  # (64+32) in, 32 out
    (denselayer3): _DenseLayer(
      (conv1): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    )  # (64+32×2) in, 32 out
    (denselayer4): _DenseLayer(
      (conv1): Conv2d(160, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    )  # (64+32×3) in, 32 out
    (denselayer5): _DenseLayer(
      (conv1): Conv2d(192, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    )  # (64+32×4) in, 32 out
    (denselayer6): _DenseLayer(
      (conv1): Conv2d(224, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    )  # (64+32×5) in, 32 out
  )
  (dense_block2): ...
)
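A minimal sketch of one such dense layer and of the concatenation that drives the (64+32×i) channel growth (class and variable names are illustrative, and the BatchNorm/ReLU of the real implementation are omitted for brevity):

import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """1x1 bottleneck up to 128 channels, then 3x3 conv down to the growth rate (32)."""
    def __init__(self, in_channels, growth_rate=32, bottleneck=128):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, bottleneck, kernel_size=1, bias=False)
        self.conv2 = nn.Conv2d(bottleneck, growth_rate, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        return self.conv2(self.conv1(x))  # always outputs growth_rate (32) channels

# dense block: each layer sees the concatenation of all earlier feature maps
features = [torch.randn(1, 64, 56, 56)]                 # output of conv0
layers = [DenseLayer(64 + 32 * i) for i in range(6)]    # 6 layers in dense_block1
for layer in layers:
    new_feat = layer(torch.cat(features, dim=1))        # input has 64 + 32*i channels
    features.append(new_feat)                           # 32 more channels for the next layer
print(torch.cat(features, dim=1).shape)                 # (1, 256, 56, 56): 64 + 6*32 channels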
Parameter comparison
Model        n_params       receptive field*
Alexnet       61,100,840         51
Vgg16        138,357,544         27
Vgg19        143,667,240         43
Res50         25,557,032        983
Res101        44,549,160      2,071
Res152        60,192,808      2,967
Dense121       7,978,856        239
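The parameter counts in the first column can be reproduced with torchvision (a sketch, assuming torchvision.models is available; the receptive-field column is not computed here):

import torchvision.models as models

# instantiate each architecture with random weights and count its parameters
for name, ctor in [("Alexnet", models.alexnet), ("Vgg16", models.vgg16),
                   ("Vgg19", models.vgg19), ("Res50", models.resnet50),
                   ("Res101", models.resnet101), ("Res152", models.resnet152),
                   ("Dense121", models.densenet121)]:
    model = ctor()
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name:10s} {n_params:,}")   # e.g. Res50 25,557,032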
 
    
Ta-da~