DIFFUSE: Disentanglement of Features For Utilization in Systematic Evaluation

Training and validating machine-learning-based methods commonly requires large datasets. Unfortunately, these datasets can only approximate reality and are in many cases not globally applicable: data collected in one place on Earth may not be representative of other places. The ambition of the DIFFUSE project is to develop methods for data generation that allow increased control over what datasets contain and, by extension, what they validate.

One challenge that remains in dataset generation is achieving a good combination of realism, control and variation. In the DIFFUSE project we propose to improve current data generation algorithms by developing their ability to disentangle features in the input. That is to say, a specific part of the input should control a specific and understandable part of the output data. This increases the understanding of what a generated dataset contains and gives a clearer picture of the situations in which a network trained on it can be expected to work.
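
As a rough illustration of what disentanglement means here, the following toy sketch uses a hand-written generator in which each input component controls exactly one property of the output. It is not the project's method: the function names, the two-component input and the one-dimensional "bump" signal are all assumptions chosen only to make the idea concrete.

```python
import numpy as np

def toy_generator(z):
    """Hypothetical disentangled generator (illustrative only).

    z[0] sets the position of a bump, z[1] sets its height.
    """
    position, height = z
    x = np.arange(32)
    # Gaussian-shaped bump centred at `position` and scaled by `height`.
    return height * np.exp(-0.5 * ((x - position) / 2.0) ** 2)

def measure(signal):
    """Read back the two output properties: peak location and peak value."""
    return int(np.argmax(signal)), float(np.max(signal))

base = toy_generator(np.array([10.0, 1.0]))
moved = toy_generator(np.array([20.0, 1.0]))   # only z[0] changed
taller = toy_generator(np.array([10.0, 3.0]))  # only z[1] changed

print(measure(base), measure(moved), measure(taller))
```

Changing the first input component moves the bump without altering its height, and changing the second scales the height without moving it. A learned generator with this property would let a dataset designer state which situations a generated dataset covers simply by listing the input ranges that were sampled.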

The work is divided into five work packages: Administration & Dissemination, Feature Disentanglement, Authentication, Data Generation and Evaluation.