Duygusal Konuşma Tanımada Yapay Veri Kullanımı (Use of Artificial Data in Emotional Speech Recognition)

Avcı, Umut

Date

2025

Authors

Avcı, Umut

Open Access Color

GOLD

Green Open Access

No

Publicly Funded

No

Impulse

Average

Influence

Average

Popularity

Average

Abstract

This study investigates how data augmentation affects emotion recognition in Turkish speech, using the BUEMODB and ITUDB datasets. Preprocessing removed silent segments and normalized the audio signals. For the baseline, the audio was converted into mel spectrograms, six feature sets were extracted, and seven supervised learning classifiers were trained. The baseline F1 scores were 56.3% for BUEMODB and 65.2% for ITUDB. In subsequent experiments, data augmentation expanded the training data fivefold, using audio transformations such as Noise Injection and Pitch Shift alongside image transformations such as Zoom Range and Height Shift Range. Audio-based augmentation improved classification: combining Air Absorption and Time Stretch raised the F1 scores to 57.6% for BUEMODB and 71.3% for ITUDB. Image-based augmentation performed better still, with F1 scores of 60.0% for BUEMODB and 73.2% for ITUDB. Finally, a hybrid approach combined the best-performing audio and image transformations and reached F1 scores of 59.7% for BUEMODB and 75.1% for ITUDB, a gain of nearly 10 percentage points over the baseline for ITUDB (3.4 points for BUEMODB). The findings indicate that carefully selected data augmentation techniques, particularly image-based and hybrid ones, can substantially improve emotion recognition accuracy while mitigating the drawbacks of excessive transformations.
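The abstract names the pipeline stages but not their implementation. The following is a minimal sketch, in standard-library Python, of what preprocessing and a fivefold expansion could look like. All function names, the amplitude threshold, and the SNR levels are assumptions made for illustration, not the author's code. Noise Injection stands in for the audio transformations, and a zero-padded row shift stands in for the Height Shift Range image transformation applied to a spectrogram.

```python
import math
import random


def peak_normalize(signal):
    """Scale samples so the largest absolute value is 1.0 (no-op for silence)."""
    peak = max((abs(s) for s in signal), default=0.0)
    return list(signal) if peak == 0.0 else [s / peak for s in signal]


def trim_silence(signal, threshold=0.01):
    """Drop leading and trailing samples whose magnitude is below threshold.

    The threshold is an assumed value; the abstract does not state one.
    """
    loud = [i for i, s in enumerate(signal) if abs(s) >= threshold]
    return signal[loud[0]:loud[-1] + 1] if loud else []


def add_noise(signal, snr_db, rng):
    """Noise injection: add Gaussian noise at roughly the requested SNR in dB."""
    power = sum(s * s for s in signal) / len(signal)
    sigma = math.sqrt(power / (10 ** (snr_db / 10)))
    return [s + rng.gauss(0.0, sigma) for s in signal]


def height_shift(spectrogram, rows):
    """Shift a 2-D spectrogram image down by `rows` rows, padding with zeros."""
    n, width = len(spectrogram), len(spectrogram[0])
    shifted = []
    for r in range(n):
        src = r - rows
        shifted.append(list(spectrogram[src]) if 0 <= src < n else [0.0] * width)
    return shifted


def expand_fivefold(signal, rng, snr_levels=(30, 25, 20, 15)):
    """Return the original clip plus four noisy copies (five clips in total)."""
    return [list(signal)] + [add_noise(signal, snr, rng) for snr in snr_levels]


# Synthetic clip: near-silent padding around a sine tone (illustrative data only).
rng = random.Random(0)
raw = [0.0, 0.0, 0.001] + [0.5 * math.sin(0.1 * t) for t in range(1, 200)] + [0.0, 0.0]
clean = peak_normalize(trim_silence(raw))
augmented = expand_fivefold(clean, rng)
print(len(augmented), len(clean))
```

In practice the mel spectrograms and the named transformations would more likely come from dedicated audio and image libraries; the sketch only shows how each original clip can yield five training examples.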

ORCID

0000-0002-7433-8704

Keywords

Computer Science, Artificial Intelligence

OpenCitations Citation Count

N/A

Source

Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi

Volume

27

Issue

81

Start Page

359

End Page

375

URI

https://hdl.handle.net/123456789/14127
https://search.trdizin.gov.tr/en/yayin/detay/1357042

Collections

TR-Dizin Indexed Publications Collection

PlumX Metrics

Captures

Mendeley Readers: 2

OpenAlex FWCI

0.0
