Paddle Object Detection

V1.8

paddlepaddle 1.8.3
paddlehub 1.8.3

Prediction

Test image:

image-20210420212635571

Code (calParams computes the parameter count):

import os
import paddlehub as hub
import numpy as np


def calParams(model):
    Total_params = 0
    Trainable_params = 0
    NonTrainable_params = 0
    _, _, main_prog = model.context()
    for block in main_prog.blocks:
        for var in block.vars:
            # skip temporary variables
            if 'tmp' in var:
                continue
            param = block.vars[var]
            shape = param.shape
            array = np.asarray(shape)
            array[array == -1] = 1  # treat dynamic dims as 1
            mulValue = np.prod(array)

            Total_params += mulValue
            if param.persistable and not param.stop_gradient:
                Trainable_params += mulValue
            else:
                NonTrainable_params += mulValue

    print('Total params: {}'.format(Total_params))
    print('Trainable params: {}'.format(Trainable_params))
    print('Non-trainable params: {}'.format(NonTrainable_params))


def run(img_name):
    # note: named ssd, but this actually loads a Faster R-CNN FPN module
    ssd = hub.Module(directory="faster_rcnn_resnet50_fpn_coco2017")
    path = "./images/"

    result = ssd.object_detection(paths=[os.path.join(path, img_name)])
    print(result[0]['save_path'])
    return result[0]['save_path']


if __name__ == "__main__":
    # ssd = hub.Module(directory="faster_rcnn_resnet50_fpn_coco2017")
    # calParams(ssd)
    run('test_img_cat.jpg')
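
The return value is a list with one entry per input image. Besides save_path, each entry normally also carries the raw detections. A minimal sketch of reading them, where the 'data' key and its 'label'/'confidence' fields are assumptions about this module's output schema rather than a verified API:

import paddlehub as hub

module = hub.Module(directory="faster_rcnn_resnet50_fpn_coco2017")
result = module.object_detection(paths=["./images/test_img_cat.jpg"])

# 'data' is assumed to hold the per-box detections; .get() keeps this
# safe if the schema differs in your paddlehub version
for det in result[0].get('data', []):
    print(det.get('label'), det.get('confidence'))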

Result:

image-20210420212751987

V2.0.2

paddlepaddle-gpu 2.0.2

finetune

Reference docs: Object Detection Full-Pipeline Tutorial

How to Train a Custom Dataset

How to Prepare Training Data

Custom dataset

VOC format

First, the official docs are inconsistent. In the full-pipeline tutorial, under the data preparation section, the file layout for a VOC-format dataset is given as:

├── annotations
│   ├── road0.xml
│   ├── road1.xml
│   ├── road10.xml
│   |   ...
├── images
│   ├── road0.jpg
│   ├── road1.jpg
│   ├── road2.jpg
│   |   ...
├── label_list.txt
├── train.txt
└── valid.txt

Whereas the How to Train a Custom Dataset and How to Prepare Training Data docs give the VOC layout as:

VOCdevkit
├── VOC2007 (or VOC2012)
│   ├── Annotations
│   │   ├── xxx.xml
│   ├── JPEGImages
│   │   ├── xxx.jpg
│   ├── ImageSets
│   │   ├── Main
│   │   │   ├── trainval.txt
│   │   │   ├── test.txt

The processing required:

image-20210420221319250

Summary:

It is not necessary to follow the original VOC layout strictly: the folders don't have to be named exactly Annotations, ImageSets, and JPEGImages, and the test.txt, train.txt, val.txt files normally kept under ImageSets/Main can all be dropped. However, the dataset root (VOC2007) must contain train.txt and test.txt files.

If you do follow the VOC format strictly, the dataset is best organized as:

VOCdevkit
├── VOC2007 (or VOC2012)
│   ├── Annotations
│   │   ├── xxx.xml
│   ├── JPEGImages
│   │   ├── xxx.jpg
│   ├── ImageSets
│   │   ├── Main
│   │   │   ├── trainval.txt
│   │   │   ├── test.txt
│   ├── train.txt
│   ├── val.txt
│   ├── label_list.txt

The trainval.txt and test.txt under ImageSets/Main contain only file stems (no extension), for example:

c000001
c000002
...

The train.txt and val.txt at the dataset root, on the other hand, should contain the relative paths of each image and its xml file:

JPEGImages/train/c000001.jpg Annotations/train/c000001.xml
JPEGImages/train/c000002.jpg Annotations/train/c000002.xml
...

Note: do not prefix these paths with './', i.e. do not write them as ./JPEGImages/train/c000001.jpg.
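
A minimal sketch of generating these list files, assuming the images and xml files were already split into train/ and val/ subfolders as in the example paths above; the class names written to label_list.txt are placeholders:

import os

root = "VOCdevkit/VOC2007"  # dataset root, matching the layout above

for split in ("train", "val"):
    img_dir = os.path.join(root, "JPEGImages", split)
    lines = []
    for name in sorted(os.listdir(img_dir)):
        stem = os.path.splitext(name)[0]
        # relative paths, deliberately without a leading './'
        lines.append("JPEGImages/{0}/{1}.jpg Annotations/{0}/{1}.xml".format(split, stem))
    with open(os.path.join(root, split + ".txt"), "w") as f:
        f.write("\n".join(lines) + "\n")

# label_list.txt: one class name per line (placeholder classes here)
with open(os.path.join(root, "label_list.txt"), "w") as f:
    f.write("holothurian\nechinus\n")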

xml

The xml files should be processed into the following format:

<annotation>
	<frame>c000001</frame>
	<object>
		<name>holothurian</name>
		<bndbox>
			<xmin>712</xmin>
			<ymin>415</ymin>
			<xmax>838</xmax>
			<ymax>573</ymax>
		</bndbox>
		<difficult>0</difficult>
	</object>
	<object>
		<name>echinus</name>
		<bndbox>
			<xmin>129</xmin>
			<ymin>940</ymin>
			<xmax>300</xmax>
			<ymax>1080</ymax>
		</bndbox>
		<difficult>0</difficult>
	</object>
</annotation>

Within each object, difficult is required. If the original xml lacks difficult, add it with the following code:

import xml.etree.ElementTree as ET
import os


# Run indent() on the root node before writing: it inserts the newlines
# and tab indentation shown in the xml example above.
def indent(elem, level=0):
    i = "\n" + level * "\t"
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + "\t"
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for elem in elem:
            indent(elem, level + 1)
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i


def cvt(ori_dir, out_dir):
    for file in os.listdir(ori_dir):
        tree = ET.parse(os.path.join(ori_dir, file))
        root = tree.getroot()

        # append a <difficult>0</difficult> child to every object, in place
        for i in root.findall('object'):
            d = ET.SubElement(i, 'difficult')
            d.text = '0'

        indent(root, 0)
        tree = ET.ElementTree(root)  # root has been modified in place
        tree.write(os.path.join(out_dir, file))


if __name__ == '__main__':
    ori_dir = './train/box/'
    out_dir = './VOC/temp_Annotations/'
    cvt(ori_dir, out_dir)
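
A quick sanity check on the converted files, verifying that every object now carries the required difficult tag:

import os
import xml.etree.ElementTree as ET

out_dir = './VOC/temp_Annotations/'
for file in os.listdir(out_dir):
    root = ET.parse(os.path.join(out_dir, file)).getroot()
    for obj in root.findall('object'):
        # every object must now have a <difficult> child
        assert obj.find('difficult') is not None, file
print('all objects carry <difficult>')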

Environment setup

Reference: the installation guide

Follow the instructions to install. Note that PaddleDetection must be run from inside the cloned directory (cd into it first); it is not a pip package.

NCCL installation (abandoned)

Download: nccl

  • Local installer

    Choose the matching local installer package:

image-20210420224935252

    Install the repository: sudo dpkg -i xxxxxxxx.deb

    Update the apt index: sudo apt update

    Install: sudo apt install libnccl2=2.9.6-1+cuda11.0 libnccl-dev=2.9.6-1+cuda11.0

  • Network installer

    Installation guide

    First run:

    sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
    sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"

    This raised "No module named 'apt_pkg'"; fix: no module named apt_pkg (running steps 1 and 2 there is enough)

    Next it raised "cannot import name '_gi' from 'gi'"; fix: cannot import name '_gi'

    After that, apt update failed with the following error:

    image-20210421170527846

    I tried several fixes; only one worked: solution

    Install the ping command, then ping developer.download.nvidia.com to get an IP for it, and add that address to the hosts file: 45.43.38.238 developer.download.nvidia.cn

    nccl-tests
    git clone https://github.com/NVIDIA/nccl-tests.git
    cd nccl-tests
    make
    ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 4

Modify the config file

Start from one of the files under configs, modify it, and save the result back into the configs folder; for example, base it on configs/faster_rcnn_r50_fpn_1x.yml and save it as underwater.yml.

  • max_iters: the maximum number of iterations; each iter consumes batch_size * device_num images. (For converting a target number of epochs into max_iters, see the sketch after this list.)

    Notes:
    (1) LearningRate.schedulers.milestones must be updated whenever max_iters changes.

    (2) milestones belongs to the PiecewiseDecay schedule: once training reaches each iteration count listed in milestones, the learning rate is decayed by the factor gamma.
    (3) 1x means training for 12 epochs, one epoch being a full pass over the training data. Because the YOLO family converges slowly, on COCO the YOLO series works out to roughly 270 epochs and PP-YOLO to roughly 380.

    Adjust max_iters to your actual dataset and, following the demo under How to Train a Custom Dataset, adjust the other parameters such as milestones and base_lr.

  • Change metric to match the dataset format; here it becomes VOC.

  • snapshot_iter: how many iters between checkpoint saves.

  • weights: the model path used for evaluation and prediction; it may be a remote path.
    A local path means the model weights file with the .pdparams suffix.

    Note that this path must identify one specific weights file, so pointing at a folder is not enough. For example, if the model is saved to output/underwater, that folder contains model_final.pdmodel, model_final.pdopt (optimizer state), and model_final.pdparams (model weights); weights must then be set to output/underwater/model_final.

    It must spell out the model_final part.

  • num_classes: the number of classes in the model. Note that FasterRCNN requires with_background=true and num_classes = dataset classes + 1.
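
As referenced in the max_iters bullet above, a back-of-the-envelope conversion between epochs and iterations; the dataset size, batch size, and GPU count below are hypothetical placeholders:

# hypothetical numbers -- substitute your own dataset and hardware
num_train_images = 5000
batch_size = 2
device_num = 4

iters_per_epoch = num_train_images / (batch_size * device_num)
epochs = 12  # the "1x" schedule
max_iters = int(epochs * iters_per_epoch)

# the standard 1x schedule decays after epochs 8 and 11, i.e. at
# roughly 2/3 and 11/12 of max_iters
milestones = [int(max_iters * 8 / 12), int(max_iters * 11 / 12)]
print(max_iters, milestones)  # 7500 [5000, 6875]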

Finally, set up the Readers:

TrainReader:
  batch_size: 2
  dataset:
    !VOCDataSet
    dataset_dir: /home/paddle2.0/underwater/dataset/VOCdevkit/VOC2007
    anno_path: train.txt
    use_default_label: false
    with_background: false

EvalReader:
  inputs_def:
    fields: ['image', 'im_info', 'im_id', 'im_shape', 'gt_bbox', 'gt_class', 'is_difficult']

  dataset:
    !VOCDataSet
    dataset_dir: /home/paddle2.0/underwater/dataset/VOCdevkit/VOC2007
    anno_path: val.txt
    use_default_label: false
    with_background: false

TestReader:
  dataset:
    !ImageFolder
    anno_path: /home/paddle2.0/underwater/dataset/VOCdevkit/VOC2007/label_list.txt
    use_default_label: false
    with_background: false

  • For TrainReader and EvalReader, anno_path is a relative path, i.e. the train.txt and val.txt under the dataset root. The anno_path in TestReader, however, must not be omitted, or infer will be unable to load the label_list.

  • Under EvalReader you must manually add the following (it is missing from the stock faster_rcnn_r50_fpn_1x.yml), otherwise evaluation will error out:

    inputs_def:
      fields: ['image', 'im_info', 'im_id', 'im_shape', 'gt_bbox', 'gt_class', 'is_difficult']

For the remaining parameters, see the Object Detection Full-Pipeline Tutorial.

underwater_fasterrcnn

Based on faster_rcnn_r50_fpn_1x.yml:

architecture: FasterRCNN
max_iters: 40000
use_gpu: false
snapshot_iter: 2000
log_iter: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
metric: VOC
weights: output/underwater/model_final
num_classes: 5

FasterRCNN:
  backbone: ResNet
  fpn: FPN
  rpn_head: FPNRPNHead
  roi_extractor: FPNRoIAlign
  bbox_head: BBoxHead
  bbox_assigner: BBoxAssigner

ResNet:
  norm_type: bn
  norm_decay: 0.
  depth: 50
  feature_maps: [2, 3, 4, 5]
  freeze_at: 2

FPN:
  min_level: 2
  max_level: 6
  num_chan: 256
  spatial_scale: [0.03125, 0.0625, 0.125, 0.25]

FPNRPNHead:
  anchor_generator:
    anchor_sizes: [32, 64, 128, 256, 512]
    aspect_ratios: [0.5, 1.0, 2.0]
    variance: [1.0, 1.0, 1.0, 1.0]
  anchor_start_size: 32
  min_level: 2
  max_level: 6
  num_chan: 256
  rpn_target_assign:
    rpn_batch_size_per_im: 256
    rpn_fg_fraction: 0.5
    rpn_positive_overlap: 0.7
    rpn_negative_overlap: 0.3
    rpn_straddle_thresh: 0.0
  train_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    pre_nms_top_n: 2000
    post_nms_top_n: 2000
  test_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    pre_nms_top_n: 1000
    post_nms_top_n: 1000

FPNRoIAlign:
  canconical_level: 4
  canonical_size: 224
  min_level: 2
  max_level: 5
  box_resolution: 7
  sampling_ratio: 2

BBoxAssigner:
  batch_size_per_im: 512
  bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
  bg_thresh_lo: 0.0
  bg_thresh_hi: 0.5
  fg_fraction: 0.25
  fg_thresh: 0.5

BBoxHead:
  head: TwoFCHead
  nms:
    keep_top_k: 100
    nms_threshold: 0.5
    score_threshold: 0.05

TwoFCHead:
  mlp_dim: 1024

LearningRate:
  base_lr: 0.02
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [60000, 80000]
  - !LinearWarmup
    start_factor: 0.1
    steps: 1000

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0001
    type: L2

_READER_: 'faster_fpn_reader.yml'
TrainReader:
  batch_size: 2
  dataset:
    !VOCDataSet
    dataset_dir: /home/paddle2.0/underwater/dataset/VOCdevkit/VOC2007
    anno_path: train.txt
    use_default_label: false
    with_background: false

EvalReader:
  inputs_def:
    fields: ['image', 'im_info', 'im_id', 'im_shape', 'gt_bbox', 'gt_class', 'is_difficult']
  dataset:
    !VOCDataSet
    dataset_dir: /home/paddle2.0/underwater/dataset/VOCdevkit/VOC2007
    anno_path: val.txt
    use_default_label: false
    with_background: false

TestReader:
  dataset:
    !ImageFolder
    anno_path: /home/paddle2.0/underwater/dataset/VOCdevkit/VOC2007/label_list.txt
    use_default_label: false
    with_background: false

Training

python3 tools/train.py -c configs/underwater.yml -o use_gpu=false --eval

Config entries can be overridden on the command line via -o:

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
# parameters whose shape differs from the loaded weights are simply not loaded
python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/underwater.yml -o use_gpu=true

Evaluation and prediction

Reference: the evaluation and prediction docs

Evaluation:

# with save_prediction_only=true, the prediction results are written to bbox.json in the current directory
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/yolov3_mobilenet_v1_roadsign.yml -o use_gpu=true save_prediction_only=true

Prediction:

python3 tools/infer.py -c configs/underwater.yml -o weights=output/underwater/model_final --infer_img=../underwater/dataset/test-A-image/000001.jpg

Inference writes an annotated image to the output folder.

GPU+infer_dir:

CUDA_VISIBLE_DEVICES=3 python3 tools/infer.py -c configs/underwater.yml -o weights=output/underwater/best_model use_gpu=true  --infer_dir=../underwater/dataset/test-A-image/

Source modifications

  • PaddleDetection/ppdet/utils/coco_eval.py

    At line 342, add a box in [xmin, ymin, xmax, ymax] format to coco_res:

    'xxyy_bbox': [xmin, ymin, xmax, ymax]
  • PaddleDetection/tools/infer.py

    At line 42, import the clip_bbox function:
    from ppdet.utils.coco_eval import clip_bbox

    At line 227, add the code that writes a csv:

    import pandas as pd

    # print(image_path)
    # print(bbox_results)
    df = pd.DataFrame(columns=('name', 'image_id', 'confidence', 'xmin', 'ymin', 'xmax', 'ymax'))
    df_image_id = str(image_path.split('/')[-1].split('.')[0])
    for i in bbox_results:
        df_temp_name = catid2name[i['category_id']]
        # note: height and width are swapped here, since image.size is (w, h)
        xmin, ymin, xmax, ymax = clip_bbox(i['xxyy_bbox'], [image.size[1], image.size[0]])
        df.loc[len(df)] = [df_temp_name, df_image_id, i['score'], int(xmin), int(ymin), int(xmax), int(ymax)]

    df.to_csv('/home/paddle2.0/PaddleDetection/output/result.csv', mode='a', header=False, index=False)
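
For context on the 'xxyy_bbox' field added above: COCO-style results store boxes as [xmin, ymin, w, h], so the added field is just the corner-format equivalent. A generic illustration of the conversion (not the exact coco_eval.py code):

def xywh_to_xxyy(box):
    # COCO-style [xmin, ymin, w, h] -> corner-format [xmin, ymin, xmax, ymax]
    xmin, ymin, w, h = box
    return [xmin, ymin, xmin + w, ymin + h]

print(xywh_to_xxyy([712, 415, 126, 158]))  # -> [712, 415, 838, 573]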