Fully Convolutional Event-camera Voice Activity Detection Based on Event Intensity

dc.contributor.author Savran, Arman
dc.date.accessioned 2025-10-06T17:49:35Z
dc.date.issued 2023
dc.description.abstract The use of visual signals to detect vocally active durations is quite helpful under severe acoustic noise, and can even be the only option if the audio channel is missing. There has been significant progress in video-based voice activity detection (VAD). However, while the recently emerging event camera (EC) technology has demonstrated great benefits for applications in robotics, drones, autonomous vehicles, and mobile devices, including visual speech recognition, it has not yet been explored as a vision-only VAD front-end. In this work, we propose an event intensity-based method, designing a fully convolutional network to efficiently realize an EC-VAD that segments vocally active durations. Efficiency stems from pooling the data over the mouth area, reducing the dimensions by entirely collapsing local spatial information, as well as from one-stage detection by a fully temporal convolutional network. Experimental evaluations show successful detection of voice activity, with about 0.91 area under the receiver operating characteristic curve, over a dataset featuring high speech-content variability and different types of facial actions. © 2023 Elsevier B.V. All rights reserved.
dc.description.sponsorship Yaşar University Project Evaluation Commission (BAP112)
dc.description.sponsorship Supported by the Yaşar University Project Evaluation Commission for the project “Dynamic Facial Analysis with Neuromorphic Camera” [grant number: BAP112].
dc.identifier.doi 10.1109/ASYU58738.2023.10296754
dc.identifier.isbn 9798350306590
dc.identifier.scopus 2-s2.0-85178262391
dc.identifier.uri https://www.scopus.com/inward/record.uri?eid=2-s2.0-85178262391&doi=10.1109%2FASYU58738.2023.10296754&partnerID=40&md5=947d732b1a1eb6c99546515ddb3a5ff0
dc.identifier.uri https://gcris.yasar.edu.tr/handle/123456789/8519
dc.identifier.uri https://doi.org/10.1109/ASYU58738.2023.10296754
dc.language.iso English
dc.publisher Institute of Electrical and Electronics Engineers Inc.
dc.relation.ispartof 2023 Innovations in Intelligent Systems and Applications Conference, ASYU 2023
dc.rights info:eu-repo/semantics/closedAccess
dc.subject Event Camera, Fully Convolutional Network, Lip Activity, Visual Speech, Voice Activity Detection, Acoustic Noise, Audio Acoustics, Convolution, Speech Recognition, Audio Channels, Autonomous Vehicles, Camera Technology, Convolutional Networks, Visual Signals, Voice-activity Detections, Cameras
dc.subject Event Camera
dc.subject Visual Speech
dc.subject Fully Convolutional Network
dc.subject Lip Activity
dc.subject Voice Activity Detection
dc.title Fully Convolutional Event-camera Voice Activity Detection Based on Event Intensity
dc.type Conference Object
dspace.entity.type Publication
gdc.author.institutional Savran, Arman (14032056900)
gdc.author.scopusid 14032056900
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C4
gdc.coar.type text::conference output
gdc.collaboration.industrial false
gdc.description.department
gdc.description.departmenttemp [Savran A.] Yaşar University, Department of Computer Engineering, İzmir, Turkey
gdc.description.endpage 6
gdc.description.publicationcategory Conference Item - International - Institutional Academic Staff
gdc.description.startpage 1
gdc.identifier.openalex W4388038797
gdc.index.type Scopus
gdc.oaire.diamondjournal false
gdc.oaire.impulse 3.0
gdc.oaire.influence 2.5462394E-9
gdc.oaire.isgreen false
gdc.oaire.popularity 4.118417E-9
gdc.oaire.publicfunded false
gdc.openalex.fwci 1.2226
gdc.openalex.normalizedpercentile 0.8
gdc.opencitations.count 5
gdc.plumx.mendeley 11
gdc.plumx.scopuscites 6
gdc.scopus.citedcount 6
gdc.virtual.author Savran, Arman
person.identifier.scopus-author-id Savran, Arman (14032056900)
project.funder.name Supported by the Yaşar University Project Evaluation Commission for the project “Dynamic Facial Analysis with Neuromorphic Camera” [grant number: BAP112].
relation.isAuthorOfPublication ec3245ee-803e-4537-8ade-40b369fad1c3
relation.isAuthorOfPublication.latestForDiscovery ec3245ee-803e-4537-8ade-40b369fad1c3
relation.isOrgUnitOfPublication ac5ddece-c76d-476d-ab30-e4d3029dee37
relation.isOrgUnitOfPublication.latestForDiscovery ac5ddece-c76d-476d-ab30-e4d3029dee37
