A Comprehensive Analysis of Data Augmentation Methods for Speech Emotion Recognition
| dc.contributor.author | Umut Avci | |
| dc.contributor.author | Avci, Umut | |
| dc.date.accessioned | 2025-10-06T17:48:46Z | |
| dc.date.issued | 2025 | |
| dc.description.abstract | The limited availability of labeled emotional speech data remains a significant challenge in the development of robust speech emotion recognition systems. This paper presents a comprehensive investigation of the effectiveness of diverse data augmentation strategies for enhancing emotion recognition performance. Three data augmentation categories were examined: audio-based transformations, image-based modifications, and feature-level synthesis. Seventeen transformations were used in audio-based data augmentation to change the time and frequency content of the raw audio signal. Eight transformations, such as shifting, rotating, and zooming, were applied to the spectrogram images for image-based data augmentation. The SpecAugment method was also used to transform the spectrograms into versions with masked time and frequency axes. In feature-space-based approaches, new feature vectors were generated using five oversampling algorithms and a generative adversarial network. Experimental results from the EMO-DB and IEMOCAP datasets demonstrate that the data augmentation approaches enhance emotion classification performance by up to six percent. Empirical evidence indicates that training sets augmented through combinations of audio-based transformations yield the highest performance gains. In contrast, the GAN-based approach fails to improve the classification performance. © 2025 Elsevier B.V. All rights reserved. | |
| dc.identifier.doi | 10.1109/ACCESS.2025.3578143 | |
| dc.identifier.issn | 21693536 | |
| dc.identifier.issn | 2169-3536 | |
| dc.identifier.scopus | 2-s2.0-105008014125 | |
| dc.identifier.uri | https://www.scopus.com/inward/record.uri?eid=2-s2.0-105008014125&doi=10.1109%2FACCESS.2025.3578143&partnerID=40&md5=14de59bda9bba1dc24cf98e56068446b | |
| dc.identifier.uri | https://gcris.yasar.edu.tr/handle/123456789/8100 | |
| dc.identifier.uri | https://doi.org/10.1109/ACCESS.2025.3578143 | |
| dc.language.iso | English | |
| dc.publisher | Institute of Electrical and Electronics Engineers Inc. | |
| dc.relation.ispartof | IEEE Access | |
| dc.rights | info:eu-repo/semantics/openAccess | |
| dc.source | IEEE Access | |
| dc.subject | Data Augmentation, Speech Emotion Recognition, Supervised Learning, Emotion Recognition, Image Processing, Labeled Data, Psychology Computing, Spectrographs, Speech Analysis, Speech Communication, Analysis Of Data, Audio-based, Augmentation Methods, Classification Performance, Comprehensive Analysis, Image-based, Spectrograms, Time And Frequencies | |
| dc.subject | Emotion Recognition, Image processing, Labeled data, Psychology computing, Spectrographs, Speech analysis, Speech communication, Analysis of data, Audio-based, Augmentation methods, Classification performance, Comprehensive analysis, Data augmentation, Image-based, Spectrograms, Speech emotion recognition, Time and frequencies, Supervised learning | |
| dc.subject | Data Augmentation | |
| dc.subject | Speech Emotion Recognition | |
| dc.subject | Supervised Learning | |
| dc.title | A Comprehensive Analysis of Data Augmentation Methods for Speech Emotion Recognition | |
| dc.type | Article | |
| dspace.entity.type | Publication | |
| gdc.author.id | Avcı, Umut/0000-0002-7433-8704 | |
| gdc.author.institutional | Avci, Umut (35486827300) | |
| gdc.author.scopusid | 35486827300 | |
| gdc.bip.impulseclass | C5 | |
| gdc.bip.influenceclass | C5 | |
| gdc.bip.popularityclass | C5 | |
| gdc.coar.type | text::journal::journal article | |
| gdc.collaboration.industrial | true | |
| gdc.description.department | ||
| gdc.description.departmenttemp | [Avci, Umut] Yasar Univ, Dept Software Engn, TR-35100 Bornova, Izmir, Turkiye | |
| gdc.description.endpage | 111669 | |
| gdc.description.publicationcategory | Makale - Uluslararası Hakemli Dergi - Kurum Öğretim Elemanı | |
| gdc.description.startpage | 111647 | |
| gdc.description.volume | 13 | |
| gdc.description.woscitationindex | Science Citation Index Expanded | |
| gdc.identifier.openalex | W4411143102 | |
| gdc.identifier.wos | WOS:001522922600037 | |
| gdc.index.type | Scopus | |
| gdc.index.type | WoS | |
| gdc.oaire.accesstype | GOLD | |
| gdc.oaire.diamondjournal | false | |
| gdc.oaire.impulse | 0.0 | |
| gdc.oaire.influence | 2.3811355E-9 | |
| gdc.oaire.isgreen | false | |
| gdc.oaire.keywords | Data augmentation | |
| gdc.oaire.keywords | speech emotion recognition | |
| gdc.oaire.keywords | Electrical engineering. Electronics. Nuclear engineering | |
| gdc.oaire.keywords | supervised learning | |
| gdc.oaire.keywords | TK1-9971 | |
| gdc.oaire.popularity | 2.5970819E-9 | |
| gdc.oaire.publicfunded | false | |
| gdc.openalex.collaboration | International | |
| gdc.openalex.fwci | 4.7229 | |
| gdc.openalex.normalizedpercentile | 0.95 | |
| gdc.openalex.toppercent | TOP 10% | |
| gdc.opencitations.count | 0 | |
| gdc.plumx.mendeley | 11 | |
| gdc.plumx.newscount | 1 | |
| gdc.plumx.scopuscites | 4 | |
| gdc.scopus.citedcount | 4 | |
| gdc.virtual.author | Avci, Umut | |
| gdc.wos.citedcount | 2 | |
| oaire.citation.endPage | 111669 | |
| oaire.citation.startPage | 111647 | |
| person.identifier.scopus-author-id | Avci, Umut (35486827300) | |
| publicationvolume.volumeNumber | 13 | |
| relation.isAuthorOfPublication | eef28c59-7e0f-4fac-8038-3669108015bc | |
| relation.isAuthorOfPublication.latestForDiscovery | eef28c59-7e0f-4fac-8038-3669108015bc | |
| relation.isOrgUnitOfPublication | ac5ddece-c76d-476d-ab30-e4d3029dee37 | |
| relation.isOrgUnitOfPublication.latestForDiscovery | ac5ddece-c76d-476d-ab30-e4d3029dee37 |
Files
Original bundle
- Name: A_Comprehensive_Analysis_of_Data_Augmentation_Methods_for_Speech_Emotion_Recognition.pdf
- Size: 2.21 MB
- Format: Adobe Portable Document Format
