Scopus Indexed Publications Collection
Permanent URI for this collection: https://gcris.yasar.edu.tr/handle/123456789/11290
Browsing the Scopus Indexed Publications Collection by Institution Author "Avci, Umut (35486827300)"
Now showing 1 - 3 of 3
Article | Citation - WoS: 2 | Citation - Scopus: 4
A Comprehensive Analysis of Data Augmentation Methods for Speech Emotion Recognition (Institute of Electrical and Electronics Engineers Inc., 2025) Avci, Umut
The limited availability of labeled emotional speech data remains a significant challenge in the development of robust speech emotion recognition systems. This paper presents a comprehensive investigation of the effectiveness of diverse data augmentation strategies for enhancing emotion recognition performance. Three data augmentation categories were examined: audio-based transformations, image-based modifications, and feature-level synthesis. Seventeen transformations were used in audio-based data augmentation to change the time and frequency content of the raw audio signal. Eight transformations, such as shifting, rotating, and zooming, were applied to the spectrogram images for image-based data augmentation. The SpecAugment method was also used to transform the spectrograms into versions with masked time and frequency axes. In feature-space-based approaches, new feature vectors were generated using five oversampling algorithms and a generative adversarial network. Experimental results on the EMO-DB and IEMOCAP datasets demonstrate that the data augmentation approaches enhance emotion classification performance by up to six percent. Empirical evidence indicates that training sets augmented through combinations of audio-based transformations yield the highest performance gains. In contrast, the GAN-based approach fails to improve classification performance.

Article | Citation - WoS: 1 | Citation - Scopus: 1
A Pattern Mining Approach for Improving Speech Emotion Recognition (WORLD SCIENTIFIC PUBL CO PTE LTD, 2022) Avci, Umut
Speech-driven user interfaces are becoming more common in our lives. To interact with such systems naturally and effectively, machines need to recognize the emotional states of users and respond to them accordingly. At the heart of the emotion recognition research done to this end lies the emotion representation that enables machines to learn and predict emotions. Speech emotion recognition studies use a wide range of low- to high-level acoustic features for representation purposes, such as LLDs, their functionals, and BoAW. In this paper, we present a new method for extracting a novel set of high-level features for classifying emotions. For this purpose, we (1) reduce the dimension of discrete-time speech signals, (2) perform a quantization operation on the new signals and assign a distinct symbol to each quantization level, (3) use the symbol sequences representing the signals to extract discriminative patterns that are capable of distinguishing different emotions from each other, and (4) generate a separate set of features for each emotion from the extracted patterns. Experimental results show that the pattern features outperform the Energy, Voicing, MFCC, Spectral, and RASTA feature sets. We also demonstrate that combining the pattern-based features and the acoustic features further improves the classification performance.

Conference Object
Speech Emotion Recognition Using Spectrogram Patterns as Features (Springer Science and Business Media Deutschland GmbH, 2020) Avci, Umut; A. Karpov; R. Potapova
In this paper, we tackle the problem of identifying emotions from speech by using features derived from spectrogram patterns. Towards this goal, we create a spectrogram for each speech signal. Produced spectrograms are divided into non-overlapping partitions based on different frequency ranges. After performing a discretization operation on each partition, we mine partition-specific patterns that discriminate an emotion from all other emotions. A classifier is then trained with features obtained from the extracted patterns. Our experimental evaluations indicate that the spectrogram-based patterns outperform the standard set of acoustic features. It is also shown that the results can be further improved with an increasing number of spectrogram partitions.
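The first entry above examines audio-based transformations of the raw signal. As a rough illustration of what such augmentations look like, the following numpy-only sketch implements three common examples (additive noise, time shift, and naive time stretch); the paper itself uses seventeen transformations, and the function names and parameters here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def add_noise(signal, snr_db=20.0, rng=None):
    """Additive white Gaussian noise at a target signal-to-noise ratio (dB)."""
    rng = rng or np.random.default_rng(0)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return signal + rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)

def time_shift(signal, shift):
    """Circularly shift the waveform by `shift` samples."""
    return np.roll(signal, shift)

def time_stretch(signal, rate):
    """Naive time stretch by linear resampling (note: also changes pitch)."""
    n_out = int(len(signal) / rate)
    old_idx = np.linspace(0, len(signal) - 1, num=n_out)
    return np.interp(old_idx, np.arange(len(signal)), signal)

# Example: augment a synthetic 1-second 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
augmented = [add_noise(tone), time_shift(tone, sr // 10), time_stretch(tone, 1.2)]
```

Each augmented waveform would then be added to the training set alongside its original label.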
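The second entry describes a four-step pipeline: dimension reduction, quantization into symbols, discriminative pattern mining, and per-emotion feature generation. The sketch below is an illustrative reconstruction of those steps under simplifying assumptions: it uses plain decimation for step (1), uniform bins for step (2), and fixed-length n-gram frequency as a stand-in for the paper's pattern-mining and discriminativeness criteria in steps (3) and (4).

```python
import numpy as np
from collections import Counter

def to_symbols(signal, n_levels=8, factor=4):
    """Steps (1)-(2): decimate the signal, then quantize each sample into
    one of `n_levels` uniform bins, assigning an integer symbol per bin."""
    reduced = signal[::factor]                      # crude dimension reduction
    edges = np.linspace(reduced.min(), reduced.max(), n_levels + 1)[1:-1]
    return np.digitize(reduced, edges)              # one symbol per sample

def mine_patterns(symbols, length=3, top_k=5):
    """Step (3), simplified: collect the most frequent fixed-length symbol
    n-grams as candidate patterns for one emotion class."""
    grams = [tuple(symbols[i:i + length]) for i in range(len(symbols) - length + 1)]
    return [g for g, _ in Counter(grams).most_common(top_k)]

def pattern_features(symbols, patterns, length=3):
    """Step (4): build a feature vector with one occurrence count per pattern."""
    grams = Counter(tuple(symbols[i:i + length])
                    for i in range(len(symbols) - length + 1))
    return np.array([grams[p] for p in patterns])
```

In the paper's setup, patterns would be mined separately per emotion so that each emotion contributes its own feature set; here a single symbol sequence stands in for one class.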
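The third entry partitions each spectrogram into non-overlapping frequency bands before mining partition-specific patterns. A minimal numpy sketch of that front end, assuming a framed real FFT with a Hann window (the paper's exact spectrogram parameters are not given here):

```python
import numpy as np

def spectrogram(signal, n_fft=256, hop=128):
    """Magnitude spectrogram via a framed real FFT with a Hann window.
    Returns an array of shape (frequency_bins, time_frames)."""
    window = np.hanning(n_fft)
    frames = [signal[i:i + n_fft] * window
              for i in range(0, len(signal) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(frames, axis=1)).T

def frequency_partitions(spec, n_parts=4):
    """Split the spectrogram into `n_parts` non-overlapping frequency bands,
    one partition per band, each to be discretized and mined separately."""
    return np.array_split(spec, n_parts, axis=0)
```

Per the abstract, increasing `n_parts` gave further improvements in the reported experiments; each partition would then go through discretization and pattern mining analogous to the pipeline above.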

