Master's Thesis

Enhancing Object Detection Models Via Synthetic Datasets

Object detection performed by deep neural networks is a popular research task with diverse real-life applications; these target applications typically involve semantic classes outside the label maps of publicly available, pre-trained models. Consequently, custom object detection training is necessary to meet the needs of each potential use case. Moreover, training a robust model is limited not only by the design of advanced network architectures but also by inadequate datasets. Object detection datasets require tedious manual work to annotate every object instance in an image with its bounding box and class label. A promising approach to this challenge is to use automatically labeled synthetic data generated by rendering 3D object models. Graphics rendering can be applied to dataset generation in the classic, non-differentiable manner and, more recently, through differentiable neural rendering. How to use synthetic data to train deep neural networks that operate effectively on real data remains an open research problem, which motivates the study of an effective approach to close this gap. This thesis contributes (1) a method to generate annotated datasets of rendered objects; (2) a pipeline for adversarial learning that combines neural rendering, iterative FGSM, and object detection models; and (3) an investigation into the effectiveness of these approaches by fine-tuning object detection models and evaluating them on a real image dataset. Concatenating the synthetic dataset produced by (1) or (2) with an additional real image dataset outperforms the baseline model trained only on the real dataset by 1.5 and 1.7 points of mAP@0.50:0.05:0.95, respectively.
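Iterative FGSM is named above but not defined. The following is a minimal sketch of the generic technique (the basic iterative method: repeated signed-gradient steps that increase a loss, each followed by projection back into an L-infinity ball of radius epsilon around the original input). It assumes only that the gradient of the loss with respect to the input is available. The abstract does not say what the thesis pipeline perturbs (for example, image pixels or rendering parameters), so this sketch perturbs a generic input vector. The toy logistic-regression loss, the weights `w`, and the values of `epsilon`, `alpha`, and `steps` are hypothetical and are not taken from the thesis.

```python
import numpy as np


def iterative_fgsm(x, grad_fn, epsilon=0.03, alpha=0.01, steps=10,
                   clip_min=0.0, clip_max=1.0):
    """Generic iterative FGSM: ascend the loss along the gradient sign,
    staying within an L-infinity ball of radius epsilon around x."""
    x_adv = x.copy()
    for _ in range(steps):
        g = grad_fn(x_adv)                                # d(loss)/d(input)
        x_adv = x_adv + alpha * np.sign(g)                # one FGSM step
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)  # project into epsilon-ball
        x_adv = np.clip(x_adv, clip_min, clip_max)        # keep a valid value range (assumed [0, 1])
    return x_adv


# Hypothetical toy "model": logistic regression with fixed weights and
# true label y = 1, so the loss is L(x) = -log(sigmoid(w . x)).
w = np.array([1.0, -2.0, 0.5])


def loss(x):
    return -np.log(1.0 / (1.0 + np.exp(-(w @ x))))


def grad_loss(x):
    s = 1.0 / (1.0 + np.exp(-(w @ x)))
    return -(1.0 - s) * w                                 # analytic gradient of L w.r.t. x


x = np.array([0.5, 0.2, 0.8])
x_adv = iterative_fgsm(x, grad_loss)
print(x_adv, loss(x), loss(x_adv))
```

Because the step size `alpha` is smaller than `epsilon`, several steps are taken before the projection becomes active; using the gradient sign rather than its magnitude makes every step change each input component by exactly `alpha` before clipping.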
