A Comprehensive Analysis of Data Augmentation Methods for Speech Emotion Recognition

dc.contributor.author Umut Avci
dc.date.accessioned 2025-10-06T16:19:53Z
dc.date.issued 2025
dc.description.abstract The limited availability of labeled emotional speech data remains a significant challenge in the development of robust speech emotion recognition systems. This paper presents a comprehensive investigation of the effectiveness of diverse data augmentation strategies for enhancing emotion recognition performance. Three different data augmentation categories were examined: audio-based transformations, image-based modifications, and feature-level synthesis. Seventeen transformations were used in audio-based data augmentation to change the time and frequency content of the raw audio signal. Eight transformations, such as shifting, rotating, and zooming, were applied to the spectrogram images for image-based data augmentation. The SpecAugment method was also used to transform the spectrograms into versions with masked time and frequency axes. In feature-space-based approaches, new feature vectors were generated using five oversampling algorithms and a generative adversarial network. Experimental results from the EMO-DB and IEMOCAP datasets demonstrate that the data augmentation approaches enhance emotion classification performance by up to six percent. Empirical evidence indicates that training sets augmented through combinations of audio-based transformations yield the highest performance gains. In contrast, the GAN-based approach fails to improve the classification performance.
dc.identifier.doi 10.1109/ACCESS.2025.3578143
dc.identifier.issn 2169-3536
dc.identifier.uri http://dx.doi.org/10.1109/ACCESS.2025.3578143
dc.identifier.uri https://gcris.yasar.edu.tr/handle/123456789/6068
dc.language.iso English
dc.publisher IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
dc.relation.ispartof IEEE Access
dc.source IEEE ACCESS
dc.subject Data augmentation, speech emotion recognition, supervised learning
dc.subject MODEL
dc.title A Comprehensive Analysis of Data Augmentation Methods for Speech Emotion Recognition
dc.type Article
dspace.entity.type Publication
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C5
gdc.coar.type text::journal::journal article
gdc.collaboration.industrial true
gdc.description.endpage 111669
gdc.description.startpage 111647
gdc.description.volume 13
gdc.identifier.openalex W4411143102
gdc.index.type WoS
gdc.oaire.accesstype GOLD
gdc.oaire.diamondjournal false
gdc.oaire.impulse 0.0
gdc.oaire.influence 2.3811355E-9
gdc.oaire.isgreen false
gdc.oaire.keywords Data augmentation
gdc.oaire.keywords speech emotion recognition
gdc.oaire.keywords Electrical engineering. Electronics. Nuclear engineering
gdc.oaire.keywords supervised learning
gdc.oaire.keywords TK1-9971
gdc.oaire.popularity 2.5970819E-9
gdc.oaire.publicfunded false
gdc.openalex.collaboration International
gdc.openalex.fwci 4.7229
gdc.openalex.normalizedpercentile 0.95
gdc.openalex.toppercent TOP 10%
gdc.opencitations.count 0
gdc.plumx.mendeley 11
gdc.plumx.newscount 1
gdc.plumx.scopuscites 4
oaire.citation.endPage 111669
oaire.citation.startPage 111647
publicationvolume.volumeNumber 13
relation.isOrgUnitOfPublication ac5ddece-c76d-476d-ab30-e4d3029dee37
relation.isOrgUnitOfPublication.latestForDiscovery ac5ddece-c76d-476d-ab30-e4d3029dee37

Files

Original bundle

Name: A_Comprehensive_Analysis_of_Data_Augmentation_Methods_for_Speech_Emotion_Recognition.pdf
Size: 2.21 MB
Format: Adobe Portable Document Format