Evaluation of Convolutional Networks for Event Camera Face Pose Alignment

Arman Savran, Burhan Burak Oral, Alptuğ Çakıcı
2025-10-22
ISSN: 2822-2385
DOI: 10.21541/apjess.1417068
https://gcris.yasar.edu.tr/handle/123456789/10406
https://search.trdizin.gov.tr/en/yayin/detay/1315376

Event cameras offer substantial advantages over conventional video cameras through their efficiency, extremely high temporal resolution, low latency, and high dynamic range. These benefits have led to applications in various vision domains. Recently, they have been applied in facial recognition tasks as well. However, while significant advantages of event cameras in some facial processing tasks have been demonstrated, the initial stage in almost any task, i.e., face alignment, is not on par with conventional cameras. This study investigates face alignment convolutional networks for event camera processing with regard to both performance and complexity. Our aim is event camera face pose alignment that can serve as an efficient preprocessor for facial tasks. Therefore, we comparatively evaluate simple convolutional coordinate regression against a hybrid of coordinate and heatmap regression known as pixel-in-pixel regression. Our experimental results reveal the superior performance of the hybrid method.
However, we also show that when computation is the bottleneck, simple convolutional coordinate regression is preferable for its low resource requirements, though at the expense of some performance loss.

English
info:eu-repo/semantics/openAccess
Computer Science, Artificial Intelligence
Imaging Science and Photographic Technology
Article