Abstract: Training a deep neural network on limited data without overfitting is challenging. To address this issue, this article proposes a data augmentation network called the Adversarial Data Augmentation Network (ADAN). The ADAN is based on Generative Adversarial Networks (GANs) and consists of a GAN, an autoencoder, and an auxiliary classifier. These networks are trained adversarially to synthesize class-dependent feature vectors in both the latent space and the original feature space, which can then be used to augment the real training data for training classifiers. Instead of the conventional cross-entropy loss, adversarial training uses the Wasserstein divergence to produce high-quality synthetic samples. Making the synthetic and real latent vectors share a common representation largely alleviates the vanishing gradient problem. The proposed networks were applied to speech emotion recognition, with EmoDB and IEMOCAP as the evaluation datasets. Results show that the augmented data generated by the proposed networks are rich in emotional information.
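
To make the Wasserstein divergence objective mentioned in the abstract concrete, the following is a minimal sketch of a critic loss of that form, not the ADAN implementation. Everything in it is an assumption for illustration: the toy critic `f(x) = tanh(w . x)` stands in for a neural network, the names `critic`, `critic_grad_norm`, and `wdiv_critic_loss` are hypothetical, the penalty points are sampled as random interpolates between real and synthetic vectors, and the defaults `k = 2`, `p = 6` follow values commonly used with this loss rather than anything stated in the abstract. The sign convention (the critic minimizes the loss) is one of several in use.

```python
import math
import random


def critic(x, w):
    """Toy critic f(x) = tanh(w . x); a hypothetical stand-in for a network."""
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))


def critic_grad_norm(x, w):
    """Euclidean norm of df/dx, which equals (1 - tanh^2(w . x)) * ||w|| here."""
    f = critic(x, w)
    return abs(1.0 - f * f) * math.sqrt(sum(wi * wi for wi in w))


def wdiv_critic_loss(real, fake, w, k=2.0, p=6.0, rng=None):
    """Critic loss: E[f(fake)] - E[f(real)] + k * E[||grad f(x_hat)||^p].

    Assumption: x_hat is a random interpolate between a real and a
    synthetic vector; k and p are illustrative defaults.
    """
    rng = rng or random.Random(0)
    mean_real = sum(critic(x, w) for x in real) / len(real)
    mean_fake = sum(critic(x, w) for x in fake) / len(fake)

    pairs = list(zip(real, fake))
    penalty = 0.0
    for x_real, x_fake in pairs:
        t = rng.random()
        x_hat = [t * a + (1.0 - t) * b for a, b in zip(x_real, x_fake)]
        penalty += critic_grad_norm(x_hat, w) ** p
    penalty /= len(pairs)

    return mean_fake - mean_real + k * penalty


if __name__ == "__main__":
    real = [[1.0, 0.0], [0.5, 0.5]]   # made-up "real" feature vectors
    fake = [[0.0, 1.0], [0.2, 0.1]]   # made-up "synthetic" feature vectors
    print(wdiv_critic_loss(real, fake, w=[0.3, -0.2]))
```

Unlike a cross-entropy discriminator loss, this objective does not saturate when real and synthetic samples are easy to tell apart; the gradient term penalizes large critic gradients without forcing their norm to a fixed value. In practice the critic would be a network and the gradient would come from automatic differentiation rather than a closed form.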


DOI: 10.17148/IARJSET.2023.10836
