IMPROVING SPEECH EMOTION RECOGNITION WITH ADVERSARIAL DATA AUGMENTATION NETWORK

Vismaya.S; Sandeep. NK

doi:10.17148/IARJSET.2023.10836

Volume 10, Issue 8, August 2023

Abstract: Training a deep neural network on limited data without overfitting is difficult. To address this, the article proposes a data augmentation network called the Adversarial Data Augmentation Network (ADAN). ADAN is based on Generative Adversarial Networks (GANs) and consists of a GAN, an autoencoder, and an auxiliary classifier. These networks are trained adversarially to synthesize class-dependent feature vectors in both the latent space and the original feature space; the synthetic vectors can then be added to the real training data when training classifiers. Instead of the conventional cross-entropy loss for adversarial training, the Wasserstein divergence is used to produce high-quality synthetic samples. The proposed networks were applied to speech emotion recognition, with EmoDB and IEMOCAP as the evaluation datasets. Making the synthetic and real latent vectors share a common representation largely alleviates the gradient vanishing problem. Results show that the augmented data generated by the proposed networks are rich in emotional information.
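
The augmentation step described in the abstract, in which class-dependent synthetic feature vectors are added to the real training data, might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: `synthesize` is a hypothetical stand-in for a trained class-conditional generator (it draws Gaussian noise shifted by the class label), and the names `augment` and `per_class` as well as the toy feature values are invented for the example.

```python
import random
from collections import Counter


def synthesize(class_label, n, dim, rng):
    """Hypothetical stand-in for a trained class-conditional generator.

    A real generator would map noise plus a class label to a feature
    vector. Here we only draw Gaussian noise centred on the label value
    so that the example runs without any trained model.
    """
    return [[rng.gauss(float(class_label), 1.0) for _ in range(dim)]
            for _ in range(n)]


def augment(real_x, real_y, per_class, rng):
    """Return the real data followed by `per_class` synthetic vectors per class."""
    dim = len(real_x[0])
    aug_x, aug_y = list(real_x), list(real_y)
    for label in sorted(set(real_y)):
        fake = synthesize(label, per_class, dim, rng)
        aug_x.extend(fake)
        aug_y.extend([label] * per_class)
    return aug_x, aug_y


# Toy data (assumed): three 3-dimensional feature vectors, two classes.
rng = random.Random(0)
real_x = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.2], [1.1, 0.9, 1.0]]
real_y = [0, 0, 1]

aug_x, aug_y = augment(real_x, real_y, per_class=4, rng=rng)
counts = Counter(aug_y)  # per-class sample counts after augmentation
```

The real samples are kept first and unchanged, so a classifier trained on `aug_x` and `aug_y` sees the original data plus an equal number of synthetic vectors for every class. In this toy setup the minority class gains proportionally more samples.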

How to Cite:

[1] Vismaya.S, Prof. Sandeep. NK, “IMPROVING SPEECH EMOTION RECOGNITION WITH ADVERSARIAL DATA AUGMENTATION NETWORK,” International Advanced Research Journal in Science, Engineering and Technology (IARJSET), DOI: 10.17148/IARJSET.2023.10836

This work is licensed under a Creative Commons Attribution 4.0 International License.