Duygusal Konuşma Tanımada Yapay Veri Kullanımı (Artificial Data Usage in Recognizing Emotional Speech)

Avcı, Umut


dc.contributor.author	Avcı, Umut
dc.date.accessioned	2026-04-07T11:55:35Z
dc.date.available	2026-04-07T11:55:35Z
dc.date.issued	2025
dc.description.abstract	This study investigates the role of data augmentation techniques in improving emotion recognition in Turkish speech, using the BUEMODB and ITUDB datasets. After a preprocessing phase in which silent segments were removed and the audio signals were normalized, the audio was converted into mel spectrograms, six feature sets were extracted, and baseline classification was performed with seven supervised learning algorithms. The initial experiments yielded baseline F1 scores of 56.3% for BUEMODB and 65.2% for ITUDB. In subsequent experiments, data augmentation was used to expand the training data fivefold, applying audio transformations such as Noise Injection and Pitch Shift as well as image transformations such as Zoom Range and Height Shift Range. Audio-based augmentation improved classification: combining Air Absorption and Time Stretch raised the F1 scores to 57.6% for BUEMODB and 71.3% for ITUDB. Image-based augmentation performed better still, reaching F1 scores of 60.0% for BUEMODB and 73.2% for ITUDB. Finally, a hybrid approach combining the best-performing audio and image transformations was explored. It achieved F1 scores of 59.7% for BUEMODB and 75.1% for ITUDB, an improvement of nearly 10 percentage points over the ITUDB baseline. The findings show that carefully selected data augmentation techniques, particularly image-based and hybrid ones, can substantially improve emotion recognition accuracy while mitigating the drawbacks of excessive transformations.	en_US
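The preprocessing and audio-augmentation steps summarized in the abstract can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the author's implementation: the function names, the frame-energy silence criterion, peak normalization, and the SNR values are hypothetical choices, and only Noise Injection is shown (Pitch Shift, Time Stretch and Air Absorption are not reproduced here). "Fivefold" is interpreted here, as an assumption, as one clean copy plus four augmented copies of each training clip.

```python
import math
import random

# Hypothetical parameters: frame_len, threshold_db and the SNR values are
# illustrative choices, not settings reported in the study.


def remove_silence(signal, frame_len=256, threshold_db=-40.0):
    """Keep only frames whose RMS lies within threshold_db of the loudest frame."""
    frames = [signal[i:i + frame_len] for i in range(0, len(signal), frame_len)]
    rms = [math.sqrt(sum(x * x for x in f) / len(f)) for f in frames]
    peak = max(rms, default=0.0)
    if peak == 0.0:
        return []
    limit = peak * 10 ** (threshold_db / 20)
    # Concatenate the samples of every frame that is louder than the limit.
    return [x for f, r in zip(frames, rms) if r > limit for x in f]


def peak_normalize(signal, target=1.0):
    """Scale the signal so that its largest absolute sample equals target."""
    peak = max((abs(x) for x in signal), default=0.0)
    if peak == 0.0:
        return list(signal)
    return [x * target / peak for x in signal]


def add_noise(signal, snr_db, rng):
    """Noise injection: add white Gaussian noise at the requested SNR (in dB)."""
    power = sum(x * x for x in signal) / len(signal)
    sigma = math.sqrt(power / 10 ** (snr_db / 10))
    return [x + rng.gauss(0.0, sigma) for x in signal]


def augment_fivefold(dataset, snr_choices=(10, 15, 20, 25), seed=0):
    """Return each (signal, label) pair plus one noisy copy per SNR value.

    With four SNR values, every clip appears five times: once clean, four times noisy.
    """
    rng = random.Random(seed)
    augmented = []
    for signal, label in dataset:
        augmented.append((signal, label))
        for snr in snr_choices:
            augmented.append((add_noise(signal, snr, rng), label))
    return augmented


if __name__ == "__main__":
    # Synthetic clip: 256 silent samples, 512 samples of a sine tone, 256 silent samples.
    tone = [0.5 * math.sin(2 * math.pi * i / 32) for i in range(512)]
    clip = [0.0] * 256 + tone + [0.0] * 256
    cleaned = peak_normalize(remove_silence(clip))
    train = augment_fivefold([(cleaned, "neutral")])
    print(len(cleaned), len(train))
```

Augmentation is applied only to the training data, so evaluation clips remain unmodified; in a full pipeline each augmented clip would then be converted to a mel spectrogram before feature extraction.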
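The reported F1 scores can be compared against each dataset's baseline with a few lines of arithmetic. The sketch below only restates the numbers given in the abstract; the dictionary layout and the setting labels are illustrative ("audio" stands for the Air Absorption plus Time Stretch combination). It shows that the gain of roughly 10 points applies to ITUDB (from 65.2% to 75.1%, +9.9 points, about 15% relative), whereas BUEMODB gains 3.4 points with the hybrid approach and peaks with image-based augmentation (+3.7 points).

```python
# F1 scores (%) as reported in the abstract.
SCORES = {
    "BUEMODB": {"baseline": 56.3, "audio": 57.6, "image": 60.0, "hybrid": 59.7},
    "ITUDB": {"baseline": 65.2, "audio": 71.3, "image": 73.2, "hybrid": 75.1},
}


def gains(dataset):
    """Return {setting: (absolute gain in points, relative gain in %)} over the baseline."""
    base = SCORES[dataset]["baseline"]
    result = {}
    for setting, score in SCORES[dataset].items():
        if setting == "baseline":
            continue
        points = round(score - base, 1)
        relative = round(100 * (score - base) / base, 1)
        result[setting] = (points, relative)
    return result


for name in SCORES:
    for setting, (points, relative) in gains(name).items():
        print(f"{name:8s} {setting:6s} +{points:.1f} points ({relative:.1f}% relative)")
```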
dc.identifier.doi	10.21205/deufmd.2025278104
dc.identifier.issn	1302-9304
dc.identifier.issn	2547-958X
dc.identifier.uri	https://hdl.handle.net/123456789/14127
dc.identifier.uri	https://search.trdizin.gov.tr/en/yayin/detay/1357042
dc.language.iso	tr
dc.relation.ispartof	Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	Computer Science, Artificial Intelligence
dc.title	Duygusal Konuşma Tanımada Yapay Veri Kullanımı	tr
dc.title	Artificial Data Usage in Recognizing Emotional Speech	en_US
dc.type	Article
dspace.entity.type	Publication
gdc.author.id	0000-0002-7433-8704
gdc.author.institutional	Avcı, Umut
gdc.bip.impulseclass	C5
gdc.bip.influenceclass	C5
gdc.bip.popularityclass	C5
gdc.collaboration.industrial	false
gdc.description.department
gdc.description.departmenttemp	[Avcı, Umut] Yaşar Üniversitesi
gdc.description.endpage	375
gdc.description.issue	81
gdc.description.publicationcategory	Article - International Peer-Reviewed Journal - Institutional Faculty Member
gdc.description.startpage	359
gdc.description.volume	27
gdc.identifier.openalex	W4414587811
gdc.identifier.trdizinid	1357042
gdc.index.type	TR-Dizin
gdc.oaire.accesstype	GOLD
gdc.oaire.diamondjournal	false
gdc.oaire.impulse	0.0
gdc.oaire.influence	2.3811355E-9
gdc.oaire.isgreen	false
gdc.oaire.popularity	2.5970819E-9
gdc.oaire.publicfunded	false
gdc.openalex.collaboration	National
gdc.openalex.fwci	0.0
gdc.openalex.normalizedpercentile	0.34
gdc.opencitations.count	0
gdc.plumx.mendeley	2
gdc.virtual.author	Avci, Umut
relation.isAuthorOfPublication	eef28c59-7e0f-4fac-8038-3669108015bc
relation.isAuthorOfPublication.latestForDiscovery	eef28c59-7e0f-4fac-8038-3669108015bc
relation.isOrgUnitOfPublication	ac5ddece-c76d-476d-ab30-e4d3029dee37
relation.isOrgUnitOfPublication.latestForDiscovery	ac5ddece-c76d-476d-ab30-e4d3029dee37

Collections

TR-Dizin Indexed Publications Collection
