PyTorch-YOLO Pitfall Notes

Contents

1 Preface
2 PyTorch Implementation of YOLO
3 The Detection DataLoader
4 The YOLO Loss Function
5 Training on Your Own Dataset

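Preface

　　The name YOLO stands for You Only Look Once: one look at the image is enough to produce the result. This sets it apart from earlier two-stage methods such as Faster R-CNN, which first use an RPN to propose candidate regions and then classify each of them.
　　YOLO has now reached v4. v1 and v2 were published at CVPR, and v3 was released as an arXiv tech report. The author, Joseph Redmon, is nicknamed Rainbow 🌈 Pony. Why the nickname 🤔? One look at his résumé explains it:

　　[Click to view Rainbow 🌈 Pony's résumé]

　　Even more striking, though, is his unconventional (read: beyond treatment) writing style. The introduction of the YOLOv3 paper brags and chats at the same time: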

Introduction
　　Sometimes you just kinda phone it in for a year, you know? I didn’t do a whole lot of research this year. Spent a lot of time on Twitter. Played around with GANs a little. I had a little momentum left over from last year [12] [1]; I managed to make some improvements to YOLO. But, honestly, nothing like super interesting, just a bunch of small changes that make it better. I also helped out with other people’s research a little.

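　　Rainbow Pony recently left the CV field, and YOLOv4 was written by different authors. It feels like the end of an era..

　　Enough chit-chat, let's look at the code 👀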

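PyTorch Implementation of YOLO

　　I found a PyTorch implementation that actually runs: https://github.com/andy-yun/pytorch-0.4-yolov3. It is adapted from marvis's implementation and currently supports v2 and v3. marvis's version is rather old and only runs on Python 2 with PyTorch versions before 0.4.

　　Opening the code shows what is clever about YOLO's design: there is no complicated network structure, only the most basic convolution, pooling and (leaky) ReLU layers. The network is also not hard-coded in a Python file but defined in a cfg configuration file. So although YOLO was originally implemented in Darknet, porting it to other languages and platforms is easy. An excerpt of such a cfg file: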

[net]
batch=1
height=416
width=416
channels=3
momentum=0.9
decay=0.0005

learning_rate=0.001
burn_in=1000
max_batches = 802000
policy=steps
steps=-1,500,40000,60000
scales=0.1,10,.1,.1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

...
[region]
anchors =  1.3221, 1.73145, 3.19275, 4.00944, 5.05587, 8.09892, 9.47112, 4.84053, 11.2364, 10.0071
bias_match=1
classes=20  # VOC has 20 classes
coords=4
num=5
softmax=1
jitter=.3
rescore=1

absolute=1
thresh = .6
random=1
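
　　To make the "structure lives in a cfg file" idea concrete, here is a minimal sketch of how such a file could be read into a list of blocks. The function name parse_cfg_text and the sample string are my own illustration, not code from the repository, and the parser only handles the simple key=value layout shown above.

```python
def parse_cfg_text(text):
    """Parse Darknet-style cfg text into a list of dicts, one per [section]."""
    blocks = []
    for raw in text.splitlines():
        line = raw.split('#', 1)[0].strip()  # drop inline comments and whitespace
        if not line:
            continue
        if line.startswith('[') and line.endswith(']'):
            # A new section such as [net] or [convolutional] starts a new block.
            blocks.append({'type': line[1:-1]})
        else:
            # Every other line is key=value and belongs to the latest section.
            key, value = line.split('=', 1)
            blocks[-1][key.strip()] = value.strip()
    return blocks


# Hypothetical, shortened sample in the same layout as the cfg above.
sample = """
[net]
height=416
width=416

[convolutional]
batch_normalize=1
filters=32
activation=leaky

[maxpool]
size=2
stride=2
"""

blocks = parse_cfg_text(sample)
layer_types = [b['type'] for b in blocks]
print(layer_types)
```

　　A model builder can then walk over blocks and create one layer per entry, which is why the same cfg can drive implementations in different frameworks.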

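The Detection DataLoader

　　YOLO's DataLoader returns a (data, target) tuple. data is the batch of input images, with shape [batch, 3, 416, 416] (416 is the default image size). target holds the bounding boxes and class labels, with shape [batch, 250].

　　Where does 250 come from? Each object needs 5 values to describe it (e.g. [14.0000, 0.2294, 0.4599, 0.0563, 0.1423]: the first is the class, and the other four are the normalized box position). YOLO also has a default parameter max_boxes=50, meaning each image can carry at most 50 objects, so target has 50 × 5 = 250 entries.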
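
　　The layout of one row of target can be sketched in plain Python as follows. This is only an illustration: build_target is a hypothetical helper, the second label is made up, and padding the unused slots with zeros is my assumption about how the fixed length of 250 is reached.

```python
MAX_BOXES = 50       # default cap on objects per image, as described above
VALUES_PER_BOX = 5   # class id, x center, y center, width, height


def build_target(boxes, max_boxes=MAX_BOXES):
    """Flatten up to max_boxes labels into one fixed-length, zero-padded list."""
    target = [0.0] * (max_boxes * VALUES_PER_BOX)
    for i, box in enumerate(boxes[:max_boxes]):  # extra boxes are dropped
        start = i * VALUES_PER_BOX
        target[start:start + VALUES_PER_BOX] = [float(v) for v in box]
    return target


labels = [
    [14, 0.2294, 0.4599, 0.0563, 0.1423],  # the example object from the text
    [11, 0.5000, 0.5000, 0.2000, 0.3000],  # made-up second object
]
target = build_target(labels)
print(len(target))
```

　　A fixed length is what lets images with different numbers of objects be stacked into one [batch, 250] tensor.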

The YOLO Loss Function

　　The YOLO loss has three parts:

loss_coord = nn.BCELoss(reduction='sum')(coord[0:2], tcoord[0:2])/nB + \
             nn.MSELoss(reduction='sum')(coord[2:4], tcoord[2:4])/nB
loss_conf  = nn.BCELoss(reduction='sum')(conf*conf_mask, tconf*conf_mask)/nB
loss_cls   = nn.BCEWithLogitsLoss(reduction='sum')(cls, tcls)/nB

loss = loss_coord + loss_conf + loss_cls

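　　The first part is the bounding-box loss, covering the error of the predicted center coordinates and the error of the predicted width and height. The center coordinates use BCE, while the width and height use MSE. Box size needs care because a given deviation is much less tolerable on a small bbox than on a large one. The original YOLO paper handled this with a neat trick: it regresses the square roots of the box width and height instead of the raw values. From v2 on, width and height are predicted as log-scale offsets relative to the anchors, which has a similar effect.

　　The second part is the confidence loss, whose target is determined by the IoU between the predicted box and the ground truth.
　　The third part is the classification loss, which uses cross-entropy (BCEWithLogitsLoss in the code above).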
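
　　The point about small versus large boxes is easier to see with numbers. The sketch below is a toy calculation, not the repository's loss: the box widths and the 5-pixel offset are made up, and it simply compares the squared error on raw, square-root and log-encoded widths.

```python
import math


def sq_err(pred, truth):
    """Squared error between one predicted value and its target."""
    return (pred - truth) ** 2


small_true, large_true = 10.0, 100.0  # made-up box widths in pixels
offset = 5.0                          # both predictions are off by 5 pixels

# Raw widths: the small and the large box are penalized identically.
raw_small = sq_err(small_true + offset, small_true)
raw_large = sq_err(large_true + offset, large_true)

# Square-root encoding (the YOLOv1 trick): the small box is penalized more.
sqrt_small = sq_err(math.sqrt(small_true + offset), math.sqrt(small_true))
sqrt_large = sq_err(math.sqrt(large_true + offset), math.sqrt(large_true))

# Log encoding (in the spirit of v2/v3): the gap grows even larger.
log_small = sq_err(math.log(small_true + offset), math.log(small_true))
log_large = sq_err(math.log(large_true + offset), math.log(large_true))

print(raw_small, raw_large)
print(sqrt_small, sqrt_large)
print(log_small, log_large)
```

　　With either encoding, the same 5-pixel mistake costs more on the 10-pixel box than on the 100-pixel box, which is the behavior the loss is after.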

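Training on Your Own Dataset

　　Training on your own dataset takes roughly these steps:

　　① Prepare your dataset, preferably in the same format as VOC. Create your own .names file in the data directory and list every class in it, one per line.

　　② Create your own .data file and .cfg file in the cfg directory. The former specifies the training and test datasets, the latter the network structure.

　　③ Determine the number of classes in your dataset and update classes in the config file accordingly. You also need to change the number of filters in the last convolutional layer. This value is 425 for the 80-class COCO and 125 for the 20-class VOC. The formula is num_anchors × (5 + num_classes), where the 5 stands for the four predicted bbox coordinates plus the confidence. num_anchors is the number of anchors specified in the config file; for example, the cfg file above has 5 anchors, with sizes [1.3221×1.73145, 3.19275×4.00944, 5.05587×8.09892, 9.47112×4.84053, 11.2364×10.0071].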
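
　　The filters formula from step ③ is easy to get wrong by hand, so here is a small sketch that computes it. last_conv_filters is a hypothetical helper name of my own; the anchors string is copied from the [region] block shown earlier.

```python
def last_conv_filters(num_anchors, num_classes):
    """filters = num_anchors * (4 box coordinates + 1 confidence + num_classes)."""
    return num_anchors * (5 + num_classes)


# The anchors line from the [region] block: width/height pairs, flattened.
anchors_line = ("1.3221, 1.73145, 3.19275, 4.00944, 5.05587, 8.09892, "
                "9.47112, 4.84053, 11.2364, 10.0071")
values = [float(v) for v in anchors_line.split(',')]
num_anchors = len(values) // 2  # two numbers (width, height) per anchor

print(num_anchors)
print(last_conv_filters(num_anchors, 80))  # COCO
print(last_conv_filters(num_anchors, 20))  # VOC
```

　　Note that this counts all anchors of a v2-style [region] layer. In the v3 cfg files each detection layer uses only 3 of the anchors, so num_anchors there would be 3 per layer.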

　　The commands for training YOLO and measuring mAP are:

# Train starting from pretrained weights
python train.py -d cfg/mytask.data -c cfg/yolo-mytask.cfg -w yolo.weights
# Train from scratch
python train.py -d cfg/mytask.data -c cfg/yolo-mytask.cfg

# Generate detection results (saved in the results folder)
python valid.py cfg/mytask.data cfg/yolo-mytask.cfg backup/000004.weights
# Evaluate mAP
python eval_voc.py results/comp4_det_test_ mytask_test.txt data/mytask.names
