Industry project: Evaluation of Deep Learning Data Augmentation Techniques

Potential supervisors: 
Research groups/keywords: 



Deep learning has the potential to substantially improve the performance of radar systems. In many areas, common obstacles to applying deep learning successfully are the limited availability of training data, uneven sampling of classes in the training data, and poorly labelled training data. Several methods exist for synthetically augmenting limited training data. A more recent technique is based on generative adversarial networks (GANs). It is of interest to study the characteristics of both the GAN-based and the established methods, and their applicability to, e.g., classification problems in radar systems.


Training a GAN to generate synthetic data for the target classification problem is interesting to explore because real-world measured data is often scarce and expensive to collect in sufficient amounts. Furthermore, such data is time-consuming both to collect and to label. Hence, GAN-based data augmentation could potentially ease this process, and access to larger amounts of training data, both real-world and synthetic, could result in a better classifier than the one achieved in a previous Master's thesis project, which used only the available real-world data.



The scope of the MSc thesis is to characterize and evaluate the discrepancies between a synthetically generated data distribution and the true, or empirical, data distribution. The empirical distribution can be based on the training dataset. Several methods can be used to compare distributions. These can come from the statistical domain (e.g. the Kolmogorov-Smirnov test or the Anderson-Darling test), from information theory (e.g. Kullback-Leibler divergence, mutual information or related measures), or from other areas. The distributions can also be compared through their effect in a classification setting. Some papers describe this as a ‘covariate shift’ between the true and synthetic data distributions. More formally, with $x$ denoting the input data and $y$ the class label, the classification distributions may be identical, i.e. $p_{\mathrm{true}}(y \mid x) = p_{\mathrm{syn}}(y \mid x)$, but this does not mean that the data distributions are identical, i.e. $p_{\mathrm{true}}(x) = p_{\mathrm{syn}}(x)$. For the classification problem, however, this may not pose a problem if classification performance is identical.
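As an illustration of the statistical and information-theoretic comparisons mentioned above, the following is a minimal sketch that compares one scalar feature of a real and a synthetic sample set. It uses only the Python standard library; the function names, the number of bins and the smoothing constant `eps` are assumptions made for this example and are not part of the project description. In practice, library routines such as `scipy.stats.ks_2samp` and `scipy.stats.anderson_ksamp` could be used instead, since they also return significance levels.

```python
import bisect
import math
import random


def ks_statistic(real, synthetic):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a, b = sorted(real), sorted(synthetic)
    gap = 0.0
    for v in a + b:
        # Empirical CDF value = fraction of samples that are <= v.
        cdf_a = bisect.bisect_right(a, v) / len(a)
        cdf_b = bisect.bisect_right(b, v) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def histogram_kl(real, synthetic, bins=20, eps=1e-9):
    """Histogram estimate of KL(real || synthetic) in nats, using a shared binning."""
    lo = min(min(real), min(synthetic))
    hi = max(max(real), max(synthetic))
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def probs(xs):
        counts = [0] * bins
        for x in xs:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        # eps-smoothing avoids log(0) for empty bins (an assumption of this sketch).
        total = len(xs) + eps * bins
        return [(c + eps) / total for c in counts]

    p, q = probs(real), probs(synthetic)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Toy data standing in for one feature of measured and generated tracks.
rng = random.Random(0)
real = [rng.gauss(0.0, 1.0) for _ in range(2000)]
good_synth = [rng.gauss(0.0, 1.0) for _ in range(2000)]     # same distribution
shifted_synth = [rng.gauss(1.0, 1.0) for _ in range(2000)]  # mean shifted by 1

print("KS good/shifted:", ks_statistic(real, good_synth), ks_statistic(real, shifted_synth))
print("KL good/shifted:", histogram_kl(real, good_synth), histogram_kl(real, shifted_synth))
```

Both measures should be clearly larger for the shifted generator than for the matching one. Note that the histogram-based KL estimate depends on the binning and the smoothing, so it is better suited to ranking generators than to reporting absolute values.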


Data augmentation using a GAN also has known issues (for example, mode collapse in the generative model), but their impact on classification may be negligible. Moreover, GANs have mostly been used in the image domain. For this thesis, the data consists of regularly sampled time sequences (e.g. the radar provides measurements at a regular update rate). This can affect the selection of suitable network architectures for the problem.


An alternative to GANs is normalizing flows, a family of generative models that produce tractable distributions for which both sampling and density evaluation can be efficient and exact. Support is available in the TensorFlow framework via the Bijector class.
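To make the idea of exact sampling and density evaluation concrete, the following is a minimal sketch of the simplest possible flow: a one-dimensional affine map applied to a standard normal base distribution. The class `AffineFlow` and its parameters are hypothetical and written in plain Python for illustration; they are not the TensorFlow Bijector API, and a practical flow would stack many learned invertible layers.

```python
import math
import random


class AffineFlow:
    """One-dimensional flow x = shift + scale * z, with base variable z ~ N(0, 1)."""

    def __init__(self, shift, scale):
        if scale == 0:
            raise ValueError("scale must be non-zero for the map to be invertible")
        self.shift = shift
        self.scale = scale

    def forward(self, z):
        """Map a base sample z to data space."""
        return self.shift + self.scale * z

    def inverse(self, x):
        """Map a data point x back to the base space."""
        return (x - self.shift) / self.scale

    def sample(self, rng):
        """Exact sampling: draw z from the base distribution and push it forward."""
        return self.forward(rng.gauss(0.0, 1.0))

    def log_prob(self, x):
        """Exact log-density via the change-of-variables formula."""
        z = self.inverse(x)
        base_log_prob = -0.5 * (z * z + math.log(2.0 * math.pi))
        # log p_x(x) = log p_z(z) - log|dx/dz|, and dx/dz = scale for an affine map.
        return base_log_prob - math.log(abs(self.scale))


flow = AffineFlow(shift=2.0, scale=3.0)
rng = random.Random(0)
samples = [flow.sample(rng) for _ in range(5)]
print(samples)
print(flow.log_prob(2.0))
```

Because the density is available in closed form, a trained flow can be evaluated directly on held-out real data by its log-likelihood, which is not possible with a standard GAN.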



The data consists of recordings from multiple test campaigns with radar from Saab, including sensor registrations of drones and birds. The data set available for this project consists of pre-recorded samples from various locations, all recorded on the same type of radar system. A single sample is a track of varying length, consisting of its associated plots. The plots include all features extracted by the tracker. The features include all measurable physical properties, such as dimension and velocity in three dimensions, RCS, range extension and spectral widening.
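Since tracks have varying length, they typically need to be brought to a common shape before being batched for a sequence model. The following is a minimal sketch of one way to represent tracks and pad them with a mask. The `Track` class, the `pad_tracks` helper and the feature names in `FEATURES` are hypothetical placeholders; the actual tracker output format is not specified in this description.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical feature order; the real tracker output is not specified here.
FEATURES = ("vx", "vy", "vz", "rcs")


@dataclass
class Track:
    label: str                # e.g. "bird" or "drone"
    plots: List[List[float]]  # one feature vector per radar update


def pad_tracks(
    tracks: List[Track], pad_value: float = 0.0
) -> Tuple[List[List[List[float]]], List[List[int]]]:
    """Pad tracks to a common length; return (batch, mask) where mask is 1 for real plots."""
    max_len = max(len(t.plots) for t in tracks)
    n_feat = len(FEATURES)
    batch, mask = [], []
    for t in tracks:
        n_pad = max_len - len(t.plots)
        padding = [[pad_value] * n_feat for _ in range(n_pad)]
        batch.append([list(p) for p in t.plots] + padding)
        mask.append([1] * len(t.plots) + [0] * n_pad)
    return batch, mask


tracks = [
    Track("bird", [[1.0, 0.5, 0.1, -20.0], [1.1, 0.4, 0.1, -21.0], [1.2, 0.4, 0.0, -19.5]]),
    Track("drone", [[5.0, 0.0, 0.2, -15.0]]),
]
batch, mask = pad_tracks(tracks)
print(mask)
```

The mask lets a recurrent model or a loss function ignore the padded positions, so that padding does not influence either the classifier or any distribution comparison computed per plot.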


The project is suitable for 2 students with a strong background in machine learning and/or data science.

Contact at Saab: Warston Håkan,



Classification between birds and UAVs using recurrent neural networks, MSc Thesis - Henrik Andersson & Chi Thong Luong, 2019

Date range: 
November, 2019 to November, 2024