Fully Convolutional Event-camera Voice Activity Detection Based on Event Intensity

dc.contributor.author Savran, Arman
dc.date.accessioned 2025-10-06T17:49:35Z
dc.date.issued 2023
dc.description.abstract The use of visual signals to detect vocally active durations is quite helpful under severe acoustic noise, and can even be the only option if the audio channel is missing. There has been significant progress in video-based voice activity detection (VAD). However, while the recently emerging event camera (EC) technology has demonstrated great benefits for applications in robotics, drones, autonomous vehicles, and mobile devices, including visual speech recognition, it has not yet been explored as a vision-only VAD front-end. In this work, we propose an event intensity-based method, designing a fully convolutional network to efficiently realize an EC-VAD that segments vocally active durations. Efficiency stems from pooling the data over the mouth area, reducing the dimensions by entirely collapsing local spatial information, as well as from one-stage detection by a fully temporal convolutional network. Experimental evaluations show successful detection of voice activity, with about 0.91 area under the receiver operating characteristic curve, over a dataset featuring high speech-content variability and different types of facial actions. © 2023 Elsevier B.V. All rights reserved.
dc.description.sponsorship Yaşar University Project Evaluation Commission (BAP112)
dc.description.sponsorship Supported by the Yaşar University Project Evaluation Commission for the project “Dynamic Facial Analysis with Neuromorphic Camera” [grant number: BAP112].
dc.identifier.doi 10.1109/ASYU58738.2023.10296754
dc.identifier.isbn 9798350306590
dc.identifier.scopus 2-s2.0-85178262391
dc.identifier.uri https://www.scopus.com/inward/record.uri?eid=2-s2.0-85178262391&doi=10.1109%2FASYU58738.2023.10296754&partnerID=40&md5=947d732b1a1eb6c99546515ddb3a5ff0
dc.identifier.uri https://gcris.yasar.edu.tr/handle/123456789/8519
dc.identifier.uri https://doi.org/10.1109/ASYU58738.2023.10296754
dc.language.iso English
dc.publisher Institute of Electrical and Electronics Engineers Inc.
dc.relation.ispartof 2023 Innovations in Intelligent Systems and Applications Conference, ASYU 2023
dc.rights info:eu-repo/semantics/closedAccess
dc.subject Event Camera, Fully Convolutional Network, Lip Activity, Visual Speech, Voice Activity Detection, Acoustic Noise, Audio Acoustics, Convolution, Speech Recognition, Audio Channels, Autonomous Vehicles, Camera Technology, Convolutional Networks, Visual Signals, Voice-activity Detections, Cameras
dc.subject Event Camera
dc.subject Visual Speech
dc.subject Fully Convolutional Network
dc.subject Lip Activity
dc.subject Voice Activity Detection
dc.title Fully Convolutional Event-camera Voice Activity Detection Based on Event Intensity
dc.type Conference Object
dspace.entity.type Publication
gdc.author.institutional Savran, Arman (14032056900)
gdc.author.scopusid 14032056900
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C4
gdc.coar.type text::conference output
gdc.collaboration.industrial false
gdc.description.department
gdc.description.departmenttemp [Savran A.] Yaşar University, Department of Computer Engineering, İzmir, Turkey
gdc.description.endpage 6
gdc.description.publicationcategory Conference Item - International - Institutional Academic Staff
gdc.description.startpage 1
gdc.identifier.openalex W4388038797
gdc.index.type Scopus
gdc.oaire.diamondjournal false
gdc.oaire.impulse 3.0
gdc.oaire.influence 2.5462394E-9
gdc.oaire.isgreen false
gdc.oaire.popularity 4.118417E-9
gdc.oaire.publicfunded false
gdc.openalex.fwci 1.2226
gdc.openalex.normalizedpercentile 0.8
gdc.opencitations.count 5
gdc.plumx.mendeley 11
gdc.plumx.scopuscites 6
gdc.scopus.citedcount 6
gdc.virtual.author Savran, Arman
person.identifier.scopus-author-id Savran, Arman (14032056900)
project.funder.name Supported by the Yaşar University Project Evaluation Commission for the project “Dynamic Facial Analysis with Neuromorphic Camera” [grant number: BAP112].
relation.isAuthorOfPublication ec3245ee-803e-4537-8ade-40b369fad1c3
relation.isAuthorOfPublication.latestForDiscovery ec3245ee-803e-4537-8ade-40b369fad1c3
relation.isOrgUnitOfPublication ac5ddece-c76d-476d-ab30-e4d3029dee37
relation.isOrgUnitOfPublication.latestForDiscovery ac5ddece-c76d-476d-ab30-e4d3029dee37
