[2022 AI 경진대회] Object Detection Competition Review (Code Review)
DETR Fine-Tuning¶
Competition: 2022 AI 경진대회 (2022 AI Competition)
Period: 2022/06/07 ~ 2022/06/21
Author's GitHub: https://github.com/withAnewWorld/ai-challenge
I entered this competition, which ran for what felt like both a short and a long two weeks, because I wanted to apply what I had learned so far to a practical project.
On the final day I tried to upload the test images (zip) to Drive for submission, but the estimated upload time was over 10 hours, so I missed the deadline and failed to submit....
Still, since this was my first AI project, I'm using this post as an opportunity to study what I built.
# ref: cs231n/assignments
from google.colab import drive
import sys
drive.mount('/content/drive')
FOLDERNAME = '2022 AI 경진대회/'
sys.path.append('/content/drive/My Drive/{}'.format(FOLDERNAME))
%cd /content/drive/My\ Drive/$FOLDERNAME
Mounted at /content/drive /content/drive/My Drive/2022 AI 경진대회
Environment¶
Since this is a competition, the organizers restrict the environment so that all submissions run uniformly under the same setup. The cells below configure that environment.
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
    print('Not connected to a GPU')
else:
    print(gpu_info)
Fri Jun 17 14:53:51 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... Off | 00000000:00:04.0 Off | 0 |
| N/A 36C P0 27W / 250W | 0MiB / 16280MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
# check CUDA compiler version
!nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
# check python version
import sys
print(sys.version)
3.7.13 (default, Apr 24 2022, 01:04:09) [GCC 7.5.0]
!pip uninstall torch torchvision
!pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
# check pytorch version
import torch
print(torch.__version__)
1.8.0+cu111
DETR GitHub clone¶
# !git clone https://github.com/facebookresearch/detr.git
Cloning into 'detr'... remote: Enumerating objects: 260, done. remote: Total 260 (delta 0), reused 0 (delta 0), pack-reused 260 Receiving objects: 100% (260/260), 12.85 MiB | 14.55 MiB/s, done. Resolving deltas: 100% (142/142), done.
Unzip data¶
Directory structure
2022 AI 경진대회
├── data
│   ├── Train_imgs
│   ├── Test_imgs
│   ├── Train_label.json
│   └── Test_images_info.json (empty JSON)
│
├── detr (github clone)
│
├── checkpoint (saved models)
│
├── work_dirs
│   ├── log.txt
│   ├── tune.txt
│   └── best_tune.txt
│
├── Train.ipynb
│
└── Inference.ipynb
ref: 2022 AI 경진대회 baseline
# train unzip
# import os
# train_root = FOLDERNAME + '/data/Train_imgs'
# %cd /content/drive/My\ Drive/$train_root
# !unzip -qq "/content/drive/My Drive/2022 AI 경진대회/data/train"
/content/drive/My Drive/imgDetection/2022 AI 경진대회/data
# test unzip
# import os
# test_root = FOLDERNAME + '/data/Test_imgs'
# %cd /content/drive/My\ Drive/$test_root
# !unzip -qq "/content/drive/My Drive/2022 AI 경진대회/data/test.zip"
/content/drive/My Drive/2022 AI 경진대회/data/Test_imgs
[/content/drive/My Drive/2022 AI 경진대회/data/test.zip]
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of /content/drive/My Drive/2022 AI 경진대회/data/test.zip or
/content/drive/My Drive/2022 AI 경진대회/data/test.zip.zip, and cannot find /content/drive/My Drive/2022 AI 경진대회/data/test.zip.ZIP, period.
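In hindsight, a quick integrity check on the uploaded archive would have caught the truncated upload before the deadline. A minimal check with Python's standard zipfile module (my suggestion, not part of the baseline):

import zipfile
path = '/content/drive/My Drive/2022 AI 경진대회/data/test.zip'
# is_zipfile() returns False for a truncated or corrupted upload like the one above
print(zipfile.is_zipfile(path))
if zipfile.is_zipfile(path):
    with zipfile.ZipFile(path) as zf:
        # testzip() returns the name of the first corrupt member, or None if all are OK
        print(zf.testzip())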
import¶
import os
import math
from pathlib import Path
import numpy as np
import pandas as pd
import random
import json
import cv2
import matplotlib.pyplot as plt
from PIL import Image
import copy
# Torch
import torch
import torch.nn as nn
from torch.utils.data import Dataset,DataLoader
from torch.utils.data.sampler import SequentialSampler, RandomSampler
################# DETR FUNCTIONS FOR LOSS ########################
sys.path.append('./detr/')
from detr.models.matcher import HungarianMatcher
from detr.models.detr import SetCriterion
from detr.datasets import coco
from detr.models.detr import PostProcess
from detr.datasets.coco_eval import CocoEvaluator
from detr.datasets import get_coco_api_from_dataset
import detr.util.misc as utils
config, data, model¶
1. config¶
max_objects(root): returns the maximum number of objects in any single training image.
DETR allocates a fixed number of prediction slots (num_queries) and classifies each slot as either background or an object, so if an image contains more objects than num_queries, the model cannot detect them all. I therefore wrote a small routine to estimate the maximum object count per image from the training set.
Of course this is not a sound approach for real-world deployment (e.g., autonomous driving), but in a competition what matters is squeezing the best result out of the given data...
The check found at most roughly 30 objects per image. According to the DETR authors, as long as the maximum number of objects per image does not exceed the default number of queries (100), the default keeps performance high while still detecting thin objects effectively.
ref: https://github.com/facebookresearch/detr/issues/126
# find the maximum number of objects over the whole train image set
def max_objects(root):
    '''
    inputs:
      - root(str or Path): root folder of the train label file
    returns:
      - max_num(int): maximum number of objects in any single train image
    '''
    with open(os.path.join(root, 'Train_label.json')) as j:
        file = json.load(j)
    anns = file['annotations']  # assumed to be grouped by image_id
    max_num = 0
    obj = 0
    prev = None
    for ann in anns:
        now = ann['image_id']
        if prev != now:
            obj = 1  # the first annotation of a new image counts as one object
        else:
            obj += 1
        if obj > max_num:
            max_num = obj
        prev = now
    return max_num
config = {
    "LR": 0.0009967324530542163,
    "null_class_coef": 0.9,
    "init_embed": True,
    "freeze": True,
    "batch_size": 2,
    "max_norm": 0.18121613279447868,
    "optim": "AdamW",
    "start_epoch": 4,
    "end_epoch": 4
}
num_classes = 14 + 1 # 14: true num classes, 1: no_object(background)
# checkpoint
# checkpoint = os.path.join(os.getcwd(), 'checkpoint')
# data root
root = Path('/content/drive/My Drive/' + FOLDERNAME + '/data')
num_queries = max_objects(root) * 2  # the test images may contain more objects per image than the train set
'''
code taken from github repo detr , 'code present in engine.py'
'''
matcher = HungarianMatcher()
weight_dict = {'loss_ce': 1, 'loss_bbox': 1, 'loss_giou': 1}
losses = ['labels', 'boxes', 'cardinality']
# postprocessors
postprocessors = {'bbox': PostProcess()}
Dataset¶
The standard COCO dataset directory layout looks like this:
path/to/coco/
  annotations/  # annotation json files
  train2017/    # train images
  val2017/      # val images
The organizers instead provided every training image in a single folder, with one annotation file covering all of them.
Ideally that file would be split into matching train/val annotation files, but I could not find a suitable way to do it at the time (a sketch of one approach follows after get_dataset below).
So I load the full training set twice, once with the train transform and once with the val transform, and split the two resulting datasets with the same random permutation.
ref: https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html
def get_dataset(root, split_ratio=0.8, mini=False):
    '''
    Custom dataset.
    For the validation set I reuse the same images and annotation file with a
    different transformation, then split the two datasets with the same random
    permutation indices (code from the PyTorch object detection tutorial).
    inputs:
      - root(Path): data root folder
      - split_ratio(float): ratio for splitting the data into train & val
      - mini(bool): whether to shrink the dataset (usually to overfit quickly as a sanity check)
    returns:
      - dataset_train(dataset): dataset with the train transformation
      - dataset_val(dataset): dataset with the val transformation
    '''
    PATHS = {
        'train': (root / 'Train_imgs', root / 'Train_label.json'),
        'val': (root / 'Train_imgs', root / 'Train_label.json'),
    }
    img_folder, ann_file = PATHS['train']
    dataset_train = coco.CocoDetection(img_folder, ann_file, transforms=coco.make_coco_transforms('train'), return_masks=False)
    dataset_val = coco.CocoDetection(img_folder, ann_file, transforms=coco.make_coco_transforms('val'), return_masks=False)
    # split the dataset into train and val
    dset_size = len(dataset_train)
    indices = torch.randperm(dset_size).tolist()
    dataset_train = torch.utils.data.Subset(dataset_train, indices[:int(dset_size * split_ratio)])
    dataset_val = torch.utils.data.Subset(dataset_val, indices[int(dset_size * split_ratio):])
    if mini:
        dataset_train = torch.utils.data.Subset(dataset_train, torch.arange(120))
        dataset_val = torch.utils.data.Subset(dataset_val, torch.arange(40))
    return dataset_train, dataset_val
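For reference, here is a minimal sketch of the split I could not find at the time: partition the image list of the single COCO-style annotation file and filter the annotations by image_id into separate train/val files. The helper name split_coco and the output file names are my own, untested against the competition data.

import json
import os
import random

def split_coco(ann_path, out_dir, split_ratio=0.8, seed=42):
    # hypothetical helper: split one COCO-format annotation file into two
    with open(ann_path) as f:
        coco_dict = json.load(f)
    images = list(coco_dict['images'])
    random.Random(seed).shuffle(images)
    n_train = int(len(images) * split_ratio)
    for name, subset in [('train', images[:n_train]), ('val', images[n_train:])]:
        ids = {img['id'] for img in subset}
        anns = [a for a in coco_dict['annotations'] if a['image_id'] in ids]
        with open(os.path.join(out_dir, f'{name}_label.json'), 'w') as f:
            json.dump(dict(coco_dict, images=subset, annotations=anns), f)

With the files split this way, each CocoDetection could point at its own annotation file instead of sharing one.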
Model(DETR)¶
To fine-tune the model:
1) load the pretrained model
2) re-initialize class_embed
3) re-initialize query_embed if init_embed
4) set requires_grad = False if freeze (except for class_embed and, when init_embed, query_embed)
(cf. class_embed: the nn.Linear layer that predicts classes; query_embed: the embedding whose size, num_queries, determines how many bounding boxes are predicted)
class DETRModel(nn.Module):
    def __init__(self, num_classes, num_queries, init_embed=False):
        '''
        Build a pretrained DETR model and modify its number of classes
        and (optionally) number of queries for fine-tuning.
        inputs:
          - num_classes(int)
          - num_queries(int)
          - init_embed(bool): whether to re-initialize query_embed with num_queries
        '''
        super(DETRModel, self).__init__()
        self.num_classes = num_classes
        self.num_queries = num_queries
        self.model = torch.hub.load('facebookresearch/detr', 'detr_resnet50', pretrained=True)
        self.in_features = self.model.class_embed.in_features
        self.model.class_embed = nn.Linear(in_features=self.in_features, out_features=self.num_classes)
        torch.nn.init.normal_(self.model.class_embed.weight, mean=0.0, std=0.02)
        torch.nn.init.zeros_(self.model.class_embed.bias)
        if init_embed:
            self.hidden_dim = self.model.query_embed.embedding_dim
            self.model.query_embed = nn.Embedding(self.num_queries, self.hidden_dim)

    def forward(self, images):
        '''
        inputs:
          - images(list of Tensors)
        returns:
          - output(dict): 'pred_logits' and 'pred_boxes'
        '''
        return self.model(images)
def get_model(num_classes=14+1, num_queries=100, freeze=True, load=False, init_embed=False):
    '''
    inputs:
      - freeze(bool): freeze the whole network except the classification head
                      (class_embed) and, when init_embed, the query embedding
      - load(bool): whether to load weights from a checkpoint
    returns:
      - model
    '''
    # note: init_embed must be forwarded, otherwise query_embed is never re-initialized
    model = DETRModel(num_classes=num_classes, num_queries=num_queries, init_embed=init_embed)
    if freeze:
        for name, param in model.named_parameters():
            if init_embed:
                param.requires_grad = 'query_embed' in name or 'class_embed' in name
            else:
                param.requires_grad = 'class_embed' in name
    if load:
        PATH = os.path.join(os.getcwd(), 'detr_best_.pth')
        model.load_state_dict(torch.load(PATH))
    return model
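A quick sanity check of the freeze logic (my own check, not from the baseline): list which parameters still require gradients after get_model.

# with freeze=True and init_embed=False, only the classification head should remain trainable
m = get_model(freeze=True, init_embed=False)
print([n for n, p in m.named_parameters() if p.requires_grad])
# expected: ['model.class_embed.weight', 'model.class_embed.bias']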
util functions¶
Most of this is copy & paste from detr/engine.py.
def train_fn(data_loader, model, criterion, optimizer, device, epoch, max_norm):
    model.train()
    criterion.train()
    metric_logger = utils.MetricLogger(delimiter="  ")
    metric_logger.add_meter('lr', utils.SmoothedValue(window_size=1, fmt='{value:.6f}'))
    metric_logger.add_meter('class_error', utils.SmoothedValue(window_size=1, fmt='{value:.2f}'))
    header = 'Epoch: [{}]'.format(epoch)
    print_freq = 10
    for images, targets in metric_logger.log_every(data_loader, print_freq, header):
        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        output = model(images)
        loss_dict = criterion(output, targets)
        weight_dict = criterion.weight_dict
        losses = sum(loss_dict[k] * weight_dict[k] for k in loss_dict.keys() if k in weight_dict)
        # reduce losses over all GPUs for logging purposes
        loss_dict_reduced = utils.reduce_dict(loss_dict)
        loss_dict_reduced_unscaled = {f'{k}_unscaled': v
                                      for k, v in loss_dict_reduced.items()}
        loss_dict_reduced_scaled = {k: v * weight_dict[k]
                                    for k, v in loss_dict_reduced.items() if k in weight_dict}
        losses_reduced_scaled = sum(loss_dict_reduced_scaled.values())
        loss_value = losses_reduced_scaled.item()
        optimizer.zero_grad()
        losses.backward()
        if max_norm > 0:
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
        optimizer.step()
        metric_logger.update(loss=loss_value, **loss_dict_reduced_scaled, **loss_dict_reduced_unscaled)
        metric_logger.update(class_error=loss_dict_reduced['class_error'])
        metric_logger.update(lr=optimizer.param_groups[0]["lr"])
    # gather the stats from all processes
    metric_logger.synchronize_between_processes()
    print("Averaged stats:", metric_logger)
    return {k: meter.global_avg for k, meter in metric_logger.meters.items()}
def eval_fn(data_loader, model, criterion, postprocessors, device, base_ds):
    model.eval()
    criterion.eval()
    metric_logger = utils.MetricLogger(delimiter="  ")
    metric_logger.add_meter('class_error', utils.SmoothedValue(window_size=1, fmt='{value:.2f}'))
    header = 'Test:'
    iou_types = tuple(k for k in ('segm', 'bbox') if k in postprocessors.keys())
    coco_evaluator = CocoEvaluator(base_ds, iou_types)
    coco_evaluator = None  # for speed; comment this line out to compute IoU and the other COCO metrics
    with torch.no_grad():
        for images, targets in metric_logger.log_every(data_loader, 10, header):
            images = list(image.to(device) for image in images)
            targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
            outputs = model(images)
            loss_dict = criterion(outputs, targets)
            weight_dict = criterion.weight_dict
            loss_dict_reduced = utils.reduce_dict(loss_dict)
            loss_dict_reduced_scaled = {k: v * weight_dict[k]
                                        for k, v in loss_dict_reduced.items() if k in weight_dict}
            loss_dict_reduced_unscaled = {f'{k}_unscaled': v
                                          for k, v in loss_dict_reduced.items()}
            metric_logger.update(loss=sum(loss_dict_reduced_scaled.values()),
                                 **loss_dict_reduced_scaled,
                                 **loss_dict_reduced_unscaled)
            metric_logger.update(class_error=loss_dict_reduced['class_error'])
            orig_target_sizes = torch.stack([t["orig_size"] for t in targets], dim=0)
            results = postprocessors['bbox'](outputs, orig_target_sizes)
            res = {target['image_id'].item(): output for target, output in zip(targets, results)}
            if coco_evaluator is not None:
                coco_evaluator.update(res)
    metric_logger.synchronize_between_processes()
    print("Averaged stats:", metric_logger)
    if coco_evaluator is not None:
        coco_evaluator.synchronize_between_processes()
        coco_evaluator.eval_imgs['bbox'] = [coco_evaluator.eval_imgs['bbox']]
    if coco_evaluator is not None:
        coco_evaluator.accumulate()
        coco_evaluator.summarize()
    stats = {k: meter.global_avg for k, meter in metric_logger.meters.items()}
    if coco_evaluator is not None:
        if 'bbox' in postprocessors.keys():
            stats['coco_eval_bbox'] = coco_evaluator.coco_eval['bbox'].stats.tolist()
    return stats, coco_evaluator
def collate_fn(batch):
    # keep images/targets as tuples of per-sample items (images have varying sizes)
    return tuple(zip(*batch))
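Since every image has a different size and number of boxes, the default collate (which stacks a batch into one tensor) would fail; this collate simply transposes the batch into a tuple of images and a tuple of target dicts. A tiny illustration with made-up shapes:

batch = [
    (torch.rand(3, 800, 600), {'labels': torch.tensor([1])}),
    (torch.rand(3, 640, 480), {'labels': torch.tensor([2, 3])}),
]
images, targets = collate_fn(batch)
print(len(images), images[0].shape, images[1].shape)  # two differently sized images survive
print(targets[1]['labels'])                           # per-image targets stay intact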
1) load & save
Colab runtimes disconnect frequently, so I had to keep checkpointing the model. I reused the start_epoch / end_epoch design from the detr training code for resuming.
2) dataset
Because train and val are split from the same image set, re-running the random permutation on every restart would leak samples between the two splits. Since I am not doing cross-validation, I fix the seed to prevent this (a quick check follows below).
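A quick check of that seeding idea: with the same seed, torch.randperm returns the same permutation, so the split indices inside get_dataset come out identical on every restart.

torch.manual_seed(42)
a = torch.randperm(10)
torch.manual_seed(42)
b = torch.randperm(10)
assert torch.equal(a, b)  # same permutation -> same train/val split after a restart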
def run(config):
    torch.manual_seed(42)  # fixed seed so the train/val split stays identical across restarts
    model = get_model(num_classes=num_classes,
                      num_queries=num_queries,
                      freeze=config['freeze'],
                      load=True,
                      init_embed=config['init_embed'])
    batch_size = config['batch_size']
    # dataset
    dataset_train, dataset_val = get_dataset(root, mini=False)
    # make data loaders
    sampler_train = torch.utils.data.RandomSampler(dataset_train)
    batch_sampler_train = torch.utils.data.BatchSampler(sampler_train,
                                                        batch_size, drop_last=True)
    sampler_val = torch.utils.data.SequentialSampler(dataset_val)
    data_loader_train = DataLoader(dataset_train, batch_sampler=batch_sampler_train,
                                   collate_fn=collate_fn, num_workers=0)
    data_loader_val = DataLoader(dataset_val, batch_size, sampler=sampler_val,
                                 drop_last=False, collate_fn=collate_fn, num_workers=0)
    device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
    if config['optim'] == 'momentum':
        optimizer = torch.optim.SGD(model.parameters(), lr=config['LR'], momentum=0.9)
    elif config['optim'] == 'AdamW':
        optimizer = torch.optim.AdamW(model.parameters(), lr=config['LR'])
    # resume from the previous epoch's checkpoint if one exists
    ckpt_path = os.path.join(os.getcwd(), f"checkpoint/detr{config['start_epoch']-1:04}.pth")
    if os.path.isfile(ckpt_path):
        print('model, optimizer loading from %s' % ckpt_path)
        ckpt = torch.load(ckpt_path)
        model.load_state_dict(ckpt['model_state_dict'])
        optimizer.load_state_dict(ckpt['optimizer_state_dict'])
        # move the loaded optimizer state onto the GPU; https://developers-shack.tistory.com/6
        for state in optimizer.state.values():
            for k, v in state.items():
                if torch.is_tensor(v):
                    state[k] = v.to(device)
    model = model.to(device)
    criterion = SetCriterion(num_classes - 1, matcher, weight_dict, eos_coef=config['null_class_coef'], losses=losses)
    criterion = criterion.to(device)
    lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100)
    if os.path.isfile(os.path.join(os.getcwd(), 'checkpoint/detr_best.pth')):
        checkpoint = torch.load(os.path.join(os.getcwd(), 'checkpoint/detr_best.pth'))
        best_loss = checkpoint['loss']
    else:
        best_loss = math.inf
    print('#' * 100)
    print("best_loss: ", best_loss)
    print('#' * 100)
    for epoch in range(config['start_epoch'], config['end_epoch'] + 1):
        train_stats = train_fn(data_loader_train, model, criterion, optimizer, device, epoch, config['max_norm'])
        lr_scheduler.step()
        test_stats, coco_evaluator = eval_fn(data_loader_val, model, criterion, postprocessors, device, base_ds=get_coco_api_from_dataset(dataset_val))
        log_stats = {**{f'train_{k}': v for k, v in train_stats.items()},
                     **{f'test_{k}': v for k, v in test_stats.items()},
                     'epoch': epoch}
        torch.save({
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'loss': test_stats['loss']
        }, os.path.join(os.getcwd(), f'checkpoint/detr{epoch:04}.pth'))
        if best_loss > test_stats['loss']:
            best_loss = test_stats['loss']
            torch.save({
                'epoch': copy.deepcopy(epoch),
                'model_state_dict': copy.deepcopy(model.state_dict()),
                'optimizer_state_dict': copy.deepcopy(optimizer.state_dict()),
                'loss': copy.deepcopy(test_stats['loss'])
            }, os.path.join(os.getcwd(), 'checkpoint/detr_best.pth'))
            print('#' * 100)
            print("best_loss: ", best_loss)
            print('#' * 100)
        with open(os.path.join(os.getcwd(), "work_dirs/log.txt"), "a") as f:
            f.write(json.dumps(log_stats) + "\n")
    # return model, test_stats  # for hyperparameter tuning
hyperparameter tuning¶
To push the model toward its best performance, I sampled random combinations of hyperparameters and compared the resulting losses. I also tried to overfit a mini dataset, but the loss did not drop as much as expected. I see two likely explanations:
1) a problem with my model setup
2) a property of DETR itself (it needs many epochs to reach high performance)
(cf. the paper's training setup: 16 V100 GPUs, 300 epochs (500 for the comparable Faster R-CNN), about 72 hours;
ref: https://keyog.tistory.com/32)
Every result is appended to log.txt and tune.txt so it can be analyzed afterwards.
# # hyperparameter tuning
'''
hyperparameter search space (tune)
search_space = {
    "LR": [10 ** random.uniform(-6, -1) for _ in range(50)],
    'null_class_coef': [0.5, 0.3, 0.7, 0.9],
    'init_embed': [True, False],
    'freeze': [True, False],
    'batch_size': [1, 2],
    'max_norm': [random.uniform(0.05, 0.2) for _ in range(5)],
    'optim': ['momentum', 'AdamW'],
    'start_epoch': [0],  # fixed values still need to be lists for random.choice below
    'end_epoch': [40]
}
'''
# num_sample = 10
# best_loss = math.inf
# best_cfg = None
# best_model = None
# for _ in range(num_sample):
#     sample = {}
#     for k, v in search_space.items():
#         sample[k] = random.choice(v)
#     DETR, stats = run(sample)  # requires the return at the end of run() to be uncommented
#     if best_loss > stats['loss']:
#         best_loss = stats['loss']
#         best_cfg = sample
#         best_model = DETR
#     with open(os.path.join(os.getcwd(), "work_dirs/tune.txt"), "a") as f:
#         f.write(json.dumps(sample) + "\n")
#         f.write(json.dumps(stats) + '\n')
#         f.write('#' * 500 + '\n')
# with open(os.path.join(os.getcwd(), 'work_dirs/best_tune.txt'), "a") as f:
#     f.write(json.dumps(best_cfg) + '\n')
train & eval¶
Total train dataset size: 24,650 images (1920 x 1080)
train split: 24,650 x 0.8
val split: 24,650 x 0.2
1 epoch (train & val): roughly 4-5 hours (Colab Pro)
run(config)
visualize sample¶
To get a rough sense of what the model learned, this section draws the predicted bounding boxes and labels next to the ground truth on the val dataset.
The steps:
1) load the model from a checkpoint (best or latest)
2) get the (val) dataset
3) convert boxes from (c_x, c_y, w, h) to (x1, y1, x2, y2) (see the helper sketch below)
4) map label indices to class names
5) visualize the predicted output and the ground truth
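The conversion in step 3 is the standard DETR transform from normalized (c_x, c_y, w, h) boxes to corner coordinates; the detr repo ships it as box_cxcywh_to_xyxy in detr/util/box_ops.py, and it amounts to the following:

def box_cxcywh_to_xyxy(boxes):
    # boxes: (N, 4) tensor in (center_x, center_y, width, height)
    x_c, y_c, w, h = boxes.unbind(-1)
    return torch.stack([x_c - 0.5 * w, y_c - 0.5 * h,
                        x_c + 0.5 * w, y_c + 0.5 * h], dim=-1)

Multiplying the result by [W, H, W, H] then scales it back to pixel coordinates, as the cells below do inline.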
# load best model, optimizer
best_model = get_model()
if config['optim'] == 'momentum':
    optimizer = torch.optim.SGD(best_model.parameters(), lr=config['LR'], momentum=0.9)
elif config['optim'] == 'AdamW':
    optimizer = torch.optim.AdamW(best_model.parameters(), lr=config['LR'])
# checkpoint
ckpt_path = os.path.join(os.getcwd(), 'checkpoint/detr0003.pth')
ckpt = torch.load(ckpt_path)
best_model.load_state_dict(ckpt['model_state_dict'])
optimizer.load_state_dict(ckpt['optimizer_state_dict'])
epoch = ckpt['epoch']
loss = ckpt['loss']
Using cache found in /root/.cache/torch/hub/facebookresearch_detr_main
# label list
# 0: 'NO_OBJECT'
# 1: '세단(승용차)' (sedan)
# 2: 'SUV'
# 3: '승합차' (van)
# 4: '버스' (bus)
# 5: '학원차량(통학버스)' (school bus)
# 6: '트럭' (truck)
# 7: '택시' (taxi)
# 8: '성인' (adult)
# 9: '어린이' (child)
# 10: '오토바이' (motorcycle)
# 11: '전동킥보드' (electric kickboard)
# 12: '자전거' (bicycle)
# 13: '유모차' (stroller)
# 14: '쇼핑카트' (shopping cart)
batch_size = 2
# transform for inference
import torchvision.transforms as T
transform = T.Compose([
    T.Resize(800),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
_, dataset_val = get_dataset(root, mini=True)
sampler_val = torch.utils.data.SequentialSampler(dataset_val)
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
data_loader_val = DataLoader(dataset_val, batch_size, sampler=sampler_val,
                             drop_last=False, collate_fn=collate_fn, num_workers=0)
imgs, tgts = {}, {}
for i, (x, y) in enumerate(data_loader_val):
    imgs[i], tgts[i] = x, y
loading annotations into memory... Done (t=0.89s) creating index... index created! loading annotations into memory... Done (t=1.49s) creating index... index created!
threshold = 0.3  # sanity-check a few threshold values visually before submitting
torch.manual_seed(42)
images = list(img.to(device) for img in imgs[10])
targets = [{k: v.to(device) for k, v in t.items()} for t in tgts[10]]
image_id = targets[0]['image_id']
with open(os.path.join(os.getcwd(), 'data/Train_label.json'), 'r') as file:
    json_file = json.load(file)
images_info = json_file['images']
file_name = images_info[image_id]['file_name']
img_path = os.path.join(os.getcwd(), 'data/Train_imgs/' + file_name)
img = Image.open(img_path).convert("RGB")
W, H = img.size
with torch.no_grad():
    best_model.eval()
    best_model.to(device)
    im = transform(img)
    im = im.unsqueeze(0)
    outputs = best_model(im.to(device))
probas = outputs['pred_logits'].softmax(-1)[0, :, :-1]
keep = probas.max(-1).values > threshold
# use English class names: cv2.putText cannot render Korean (Hangul)
classes = ['NO_OBJECT', 'sedan', 'SUV', 'van',
           'bus', 'school bus', 'truck', 'taxi',
           'adult', 'child', 'motorcycle', 'kickboard',
           'bicycle', 'stroller', 'shopping cart']
id_to_c = {i: cls for i, cls in enumerate(classes)}
labels = torch.argmax(outputs['pred_logits'][0, keep], dim=-1)
pred_labels = [id_to_c[label] for label in labels.tolist()]
gt_labels = [id_to_c[label.cpu().tolist()] for label in targets[0]['labels']]
score, _ = torch.max(probas[keep], dim = -1)
x_c, y_c, w, h = outputs['pred_boxes'][0, keep].unbind(1)
b = [x_c - 0.5 * w, y_c - 0.5 * h,
     x_c + 0.5 * w, y_c + 0.5 * h]
boxes = torch.stack(b, dim=1)
preds = boxes * torch.tensor([W, H, W, H], dtype=torch.float32, device = device)
gt_boxes = targets[0]['boxes'] * torch.tensor([W, H, W, H], dtype = torch.float32, device = device)
gt_boxes_np= gt_boxes.detach().cpu().numpy()
c_x, c_y, w, h = gt_boxes_np[:, 0], gt_boxes_np[:, 1], gt_boxes_np[:, 2], gt_boxes_np[:, 3]
x1, y1, x2, y2 = (c_x - w/2).astype(np.int64), (c_y - h/2).astype(np.int64), (c_x + w/2).astype(np.int64), (c_y + h/2).astype(np.int64)
pred = np.array(preds.cpu(), dtype=int)
plt.figure(figsize = (100,50))
img_copy = img.copy()
img_np = np.array(img_copy)
# predicted boxes (red)
for i, p in enumerate(pred):
    cv2.rectangle(img_np,
                  (p[0], p[1]),
                  (p[2], p[3]),
                  color=(255, 0, 0), thickness=5)
    cv2.putText(img_np,
                pred_labels[i],
                (p[0], p[1] - 10),
                cv2.FONT_HERSHEY_SIMPLEX,
                0.9,
                (255, 0, 0),
                2)
# ground truth boxes (green)
for i in range(len(x1)):
    cv2.rectangle(img_np,
                  (x1[i].astype(int), y1[i].astype(int)),
                  (x2[i].astype(int), y2[i].astype(int)),
                  color=(0, 255, 0), thickness=10)
    cv2.putText(img_np,
                gt_labels[i],
                ((x1[i] + x2[i]) // 2, (y1[i] + y2[i]) // 2),
                cv2.FONT_HERSHEY_SIMPLEX,
                0.9,
                (0, 255, 0),
                2)
plt.imshow(img_np)
plt.show()
Output hidden; open in https://colab.research.google.com to view.
Inference and submit¶
1) access the test images via Test_images_info.json (image info, COCO format)
2) feed transform(image) to the model
3) append each prediction to a list as a dictionary
4) write the list out as JSON
cf) Test_images_info.json
{
  "images": [
    {
      "file_name": "image1.png",
      "license": null,
      "coco_url": null,
      "height": 1080,
      "width": 1920,
      "data_captured": null,
      "flickr_url": null,
      "id": 0
    },
    {
      "file_name": "image2.png",
      "license": null,
      "coco_url": null,
      "height": 1080,
      "width": 1920,
      "data_captured": null,
      "flickr_url": null,
      "id": 1
    },
    ...
  ]
}
threshold = 0.3
with open(os.path.join(os.getcwd(), 'data/Test_images_info.json'), 'r') as j:
    image_info = json.load(j)
submission_anno = list()
for img_info in image_info['images']:
    file_name = img_info['file_name']
    img_path = os.path.join(os.getcwd(), 'data/test/images/' + file_name)
    img = Image.open(img_path).convert("RGB")
    W, H = img.size
    with torch.no_grad():
        best_model.eval()  # the fine-tuned model loaded above
        im = transform(img)
        outputs = best_model(im.unsqueeze(0).to(device))
    probas = outputs['pred_logits'].softmax(-1)[0, :, :-1]
    keep = probas.max(-1).values > threshold
    pred_labels = torch.argmax(outputs['pred_logits'][0, keep], dim=-1).tolist()
    scores, _ = torch.max(probas[keep], dim=-1)
    if len(pred_labels) == 0:
        continue
    # COCO submission format: (x, y, w, h) with (x, y) at the top-left corner
    x_c, y_c, w, h = outputs['pred_boxes'][0, keep].unbind(1)
    b = [x_c - 0.5 * w, y_c - 0.5 * h, w, h]
    boxes = torch.stack(b, dim=1).cpu()
    preds_b = boxes * torch.tensor([W, H, W, H], dtype=torch.float32)
    for i in range(len(pred_labels)):
        tmp_dict = dict()
        tmp_dict['image_id'] = img_info['id']
        tmp_dict['bbox'] = preds_b[i].tolist()
        tmp_dict['category_id'] = pred_labels[i]
        tmp_dict['score'] = scores[i].item()
        tmp_dict['segmentation'] = []
        submission_anno.append(tmp_dict)
with open('./sample_submission.json', 'w', encoding='utf-8') as f:
    json.dump(submission_anno, f, ensure_ascii=False)
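A cheap final check before uploading (my addition, not part of the baseline): reload the file and verify every record carries the expected COCO-result keys.

with open('./sample_submission.json') as f:
    subs = json.load(f)
required = {'image_id', 'bbox', 'category_id', 'score', 'segmentation'}
assert all(required <= set(d) for d in subs), 'malformed submission record'
print(len(subs), 'detections written')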
Regrets¶
Personal
- I tackled the project alone while underprepared, so it turned into copy & paste of code rather than genuine study
- I never solved the directory/annotation split
- I never even managed to submit
- The final performance came out very poor
- I'm left wondering what results a less GPU-hungry model than a transformer (DETR) would have produced
Organizers
- Providing the data as zip files made loading take far too long
- Providing so many high-resolution images made training very expensive in time and resources (a real burden for a student or a small business)
- As the train dataset shows, some files are not annotated correctly
Future study directions¶
This project led me through a lot of open-source detection code on GitHub.
What I noticed is that these repos mainly rely on:
1) argument parsing via the argparse library (a small sketch follows this list)
2) progress output via a metric logger
3) data conversion via the COCO dataset libraries
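As an example of point 1, the argparse pattern shared by these repos (detr/main.py included) looks roughly like this minimal sketch; the specific flags here are illustrative:

import argparse

def get_args_parser():
    # the usual pattern: a parser factory that the training script composes
    parser = argparse.ArgumentParser('DETR-style training', add_help=False)
    parser.add_argument('--lr', default=1e-4, type=float)
    parser.add_argument('--batch_size', default=2, type=int)
    parser.add_argument('--epochs', default=300, type=int)
    parser.add_argument('--num_queries', default=100, type=int)
    return parser

args = argparse.ArgumentParser(parents=[get_args_parser()]).parse_args([])
print(args.lr, args.num_queries)  # defaults when no CLI flags are given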
While studying algorithms for the various tasks (classification, detection, segmentation, NLP, ...), I feel it is time to study deep learning as a whole more systematically.
Code references
1) https://github.com/facebookresearch/detr.git
2) https://www.kaggle.com/code/tanulsingh077/end-to-end-object-detection-with-transformers-detr/notebook
3) https://keyog.tistory.com/32
4) https://developers-shack.tistory.com/6
5) https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html
cf) All of the data has been deleted: the organizers' terms restrict redistribution, and it cannot be uploaded to GitHub anyway.