Savran, Arman
Job Title: Assoc. Prof. Dr. (Doç.Dr.)
Main Affiliation: 01.01.09.01. Department of Computer Engineering (Bilgisayar Mühendisliği Bölümü)
Status: Current Staff
Sustainable Development Goals
| Goal | Research Products |
|---|---|
| 1 No Poverty | 0 |
| 2 Zero Hunger | 0 |
| 3 Good Health and Well-Being | 0 |
| 4 Quality Education | 0 |
| 5 Gender Equality | 0 |
| 6 Clean Water and Sanitation | 0 |
| 7 Affordable and Clean Energy | 1 |
| 8 Decent Work and Economic Growth | 0 |
| 9 Industry, Innovation and Infrastructure | 0 |
| 10 Reduced Inequalities | 0 |
| 11 Sustainable Cities and Communities | 0 |
| 12 Responsible Consumption and Production | 0 |
| 13 Climate Action | 0 |
| 14 Life Below Water | 0 |
| 15 Life on Land | 0 |
| 16 Peace, Justice and Strong Institutions | 0 |
| 17 Partnerships for the Goals | 0 |

Documents: 24; Citations: 1397; h-index: 13

Documents: 19; Citations: 1057

| Metric | Value |
|---|---|
| Scholarly Output | 9 |
| Articles | 4 |
| Views / Downloads | 0/0 |
| Supervised MSc Theses | 3 |
| Supervised PhD Theses | 0 |
| WoS Citation Count | 11 |
| Scopus Citation Count | 24 |
| Patents | 0 |
| Projects | 0 |
| WoS Citations per Publication | 1.22 |
| Scopus Citations per Publication | 2.67 |
| Open Access Source | 3 |
| Supervised Theses | 3 |

| Journal / Conference | Count |
|---|---|
| 2023 Innovations in Intelligent Systems and Applications Conference ASYU 2023 | 1 |
| 8th International Conference on Computer Science and Engineering UBMK 2023 | 1 |
| Academic Platform Journal of Engineering and Smart Systems | 1 |
| Computer Vision and Image Understanding | 1 |
| Journal of Intelligent Systems: Theory and Applications | 1 |

Scholarly Output Search Results (9 results)

Master Thesis (2022)
Saptayıcı-güdümlü konuşma arka planı gürültüsünün evrişimsel ağlar ile giderilmesi [Detector-guided removal of speech background noise with convolutional networks]
Ayar, Cem; Savran, Arman
Speech background noise is a common problem that has become especially important with the growing popularity of online meetings and live internet streaming. Recently, deep neural networks (DNNs) have been shown to suppress a wide range of background noise types with high performance, without requiring multiple microphones. However, such deep networks consume substantial resources, which makes many real-life applications expensive, cumbersome, or sometimes impractical. To alleviate this problem, this thesis proposes a detector-guided denoising approach that disables a high-performance DNN whenever there is no significant noise. First, a modern time-domain convolutional neural network (CNN) known as Conv-TasNet is optimized with respect to efficiency and performance. Then, a CNN-based noisy speech detector is designed and evaluated for the detector-guided design in variants of different size and resolution. The optimum detector is found to have only 2% of the computational load of the optimum Conv-TasNet, and to cause only a negligible performance drop with a very low noisy-speech miss rate. Thus, by successfully detecting noisy speech at this insignificant computational cost, we confirm that our detector-guided approach can be used for potentially significant efficiency gains. This efficiency gain is inversely proportional to the probability of noise occurrence. In addition, we show that automatically identifying speech that is already clean avoids the slight degradations caused by occasional processing artifacts.

Master Thesis (2024)
Olay kamerası ile yüz pozu hızalama için evrişimsel ağların kullanılması [Using convolutional networks for face pose alignment with an event camera]
Oral, Burhan Burak; Savran, Arman
Event cameras offer substantial advantages over conventional video cameras in efficiency, extremely high temporal resolution, low latency, and high dynamic range. These benefits have led to applications in various vision domains. Recently, they have been applied in facial recognition tasks as well. However, while significant advantages of event cameras in some facial processing tasks have been demonstrated, the initial stage in almost any task, i.e., face alignment, is not on par with that of conventional cameras. This study investigates the use of face alignment convolutional networks regarding both performance and complexity for event camera processing. Our aim is event camera face pose alignment that can be used as an efficient preprocessor for facial tasks. Therefore, we comparatively evaluate simple convolutional coordinate regression with a hybrid of coordinate and heatmap regression, known as pixel-in-pixel regression. Our experimental results reveal the superior performance of the hybrid method. However, we also show that if there is a computation bottleneck, simple convolutional coordinate regression is preferable for its low resource requirements, though at the expense of some performance loss.

Article (2024)
Olay Kamerası ile Verimli Konuşma Sesi Tespiti için Zamansal Evrişimsel Ağlar [Temporal Convolutional Networks for Efficient Voice Activity Detection with an Event Camera]
Savran, Arman
Voice activity detection (VAD) is a necessary pre-processing step widely used in human-computer interfaces. The presence of complex acoustic background noise necessitates large deep neural networks, at the cost of a heavy computational load. Vision-based VAD is an alternative approach that can be preferred because it is free of the background noise problem. Where audio data is not accessible, the visual channel is the only option anyway. However, visual VAD, which is generally expected to run continuously for long periods, causes significant energy consumption due to video camera hardware and video data processing requirements. This study investigates the use, for vision-based VAD, of the event camera, whose neuromorphic technology makes it considerably more efficient than a conventional video camera. Because the event camera senses at high temporal resolutions, the spatial dimension is collapsed entirely, and extremely lightweight yet successful models are designed that learn only patterns in the time dimension. The designs combine different types of convolution dilation, downsampling methods, and convolution separation techniques, with the temporal receptive field widths taken into account. The experiments measure the robustness of VAD against various facial actions. The results show that downsampling is necessary for high performance and efficiency, and that max-pooling achieves better performance for this purpose than downsampling by strided convolution. The best-performing standard design obtained this way runs at 1.57 million floating-point operations (MFLOPS). Applying dilation with a fixed factor and combining it with downsampling is found to cut the computational requirement by more than half at similar performance. Furthermore, by also applying depthwise separation, the computational requirement is reduced to 0.30 MFLOPS, i.e., less than one fifth of that of the standard model.

Article (ACADEMIC PRESS INC ELSEVIER SCIENCE, 2023). Citations: WoS 1, Scopus 2
Multi-timescale boosting for efficient and improved event camera face pose alignment
Savran, Arman
The success of event camera (EC) vision in certain types of applications has been steadily shown thanks to energy-efficient sparse sensing, high dynamic range, and extremely high temporal resolution. However, the utilization of ECs for facial processing tasks has remained rather limited. To enable high energy efficiency for large face pose alignment, which is a crucial facial pre-processing stage, we aim at leveraging EC by effective adaptation of the processing rate proportional to facial movement intensity. For this purpose, we propose a novel alternative to the commonly employed constant time frame and event count frame strategies, which combines their advantages and provides the benefits of supervised learning. This is realized by a multi-timescale boosting framework that can generate highly sparse pose-events at a variable rate via detection-based online timescale selection. Although detectors of multiple scales with boosted sensitivities operate as a cascade, our method provides minimal delay, essential for real-time applications. Comprehensive evaluations show that the proposed multi-timescale processing substantially improves the performance-efficiency trade-off over single-timescale frames and markedly over event count frames. Mega-floating-point-operations-per-second ranges from 2.5 at the moderate motion clips to 6.5 at the intense motion clips, with negligible computation in the absence of activity. Also, alignment errors are considerably reduced by online selection of small timescales at fast head motion and of bigger timescales at slower motion or local activity of lips and eyes.
Being orthogonal and complementary to spatial domain techniques, the proposed approach can also be conveniently integrated with future advances for further performance/efficiency improvements or for alignment extensions.

Master Thesis (2025)
Olay kamerası için yinelemeli evrişimsel ağ yoluyla tümleşik yüz ve referans noktası yeri belirleme [Joint face and landmark localization for event cameras via a recurrent convolutional network]
Kılıç, Giray; Savran, Arman
The event camera's high temporal resolution, low power consumption, and wide dynamic range make it increasingly popular in robotics, surveillance, and emerging facial applications such as identity recognition, driver monitoring, and visual speech recognition. Localizing the face and landmarks is an essential first step in facial applications. We employ a joint network with a multi-task loss to realize both tasks, avoiding the need for redundant separate models. This is achieved through the multi-task head attached to the neck, which includes a context module designed with deformable convolutions to accommodate the non-rigid variability of facial shapes. The necks are connected to the feature pyramid network (FPN), which receives input from the recurrent convolutional network layers. By experimenting with two datasets of varying characteristics, the ECFacePose and FES datasets, we demonstrate that FPN with spatio-temporal features outperforms previous face localization approaches, achieving superior landmark localization and effective small face detection. Our experiments confirm the performance benefits of the deformable convolution-based context module, temporal consistency loss, and the Rectified Wing Loss. Furthermore, we explore 12 convolutional backbones, categorizing them into lightweight, middleweight, and heavyweight classes, and demonstrate that the middleweight InceptionV3 and DenseNet backbones deliver impressive performance-efficiency trade-offs. Our study illustrates that while FPN is crucial for landmark and small face detection and enhances larger face detection, it also increases FLOPs fourfold and doubles memory usage.

Article (MDPI AG, 2020). Citations: WoS 10, Scopus 15
Face pose alignment with event cameras
Savran, Arman; Bartolozzi, Chiara
The event camera (EC) emerges as a bio-inspired sensor which can be an alternative or complementary vision modality, with the benefits of energy efficiency, high dynamic range, and high temporal resolution, coupled with activity-dependent sparse sensing. In this study, we investigate with ECs the problem of face pose alignment, which is an essential pre-processing stage for facial processing pipelines. EC-based alignment can unlock all these benefits in facial applications, especially where motion and dynamics carry the most relevant information, due to the temporal change event sensing. We specifically aim at efficient processing by developing a coarse alignment method to handle large pose variations in facial applications. For this purpose, we have prepared, by multiple human annotations, a dataset of extreme head rotations with varying motion intensity. We propose a motion detection based alignment approach in order to generate activity-dependent pose-events, which prevents unnecessary computations in the absence of pose change. The alignment is realized by cascaded regression of extremely randomized trees. Since EC sensors perform temporal differentiation, we characterize the performance of the alignment in terms of different levels of head movement speeds and face localization uncertainty ranges, as well as face resolution and predictor complexity. Our method obtained 2.7% alignment failure on average, whereas annotator disagreement was 1%. The promising coarse alignment performance on EC sensor data, together with a comprehensive analysis, demonstrates the potential of ECs in facial applications. © 2020 Elsevier B.V.
All rights reserved.

Article (2025)
Evaluation of Convolutional Networks for Event Camera Face Pose Alignment
Savran, Arman; Oral, Burhan Burak; Çakıcı, Alptuğ
Event cameras offer substantial advantages over conventional video cameras in efficiency, extremely high temporal resolution, low latency, and high dynamic range. These benefits have led to applications in various vision domains. Recently, they have been applied in facial recognition tasks as well. However, while significant advantages of event cameras in some facial processing tasks have been demonstrated, the initial stage in almost any task, i.e., face alignment, is not on par with that of conventional cameras. This study investigates the use of face alignment convolutional networks regarding both performance and complexity for event camera processing. Our aim is event camera face pose alignment that can be used as an efficient preprocessor for facial tasks. Therefore, we comparatively evaluate simple convolutional coordinate regression with a hybrid of coordinate and heatmap regression, known as pixel-in-pixel regression. Our experimental results reveal the superior performance of the hybrid method. However, we also show that if there is a computation bottleneck, simple convolutional coordinate regression is preferable for its low resource requirements, though at the expense of some performance loss.

Conference Object (Institute of Electrical and Electronics Engineers Inc., 2023). Citations: Scopus 1
Comparison of Timing Strategies for Face Pose Alignment with Event Camera, Olay Kamerası ile Yüz Pozu Hizalama için Zamanlama Stratejilerin Karşılaştırılması
Savran, Arman
The event camera, which has recently started to increase in use, can surpass the traditional camera in certain areas with its efficiency, level of detail in the time dimension, and high dynamic range. In order to process vision data practically with the event camera, time intervals must first be determined to transform the pixel-event data into a structure suitable for processing. For this purpose, two basic timing strategies are applied in the literature: constant time frame and constant event count frame. This study compares these two approaches in order to determine the appropriate time intervals in the event camera face pose alignment problem. Experimental results showed that the constant event count strategy was superior in terms of face pose alignment performance and efficiency. It was also seen that the time frame midpoint achieved a lower error rate than the median and mean timing. © 2023 Elsevier B.V. All rights reserved.

Conference Object (Institute of Electrical and Electronics Engineers Inc., 2023). Citations: Scopus 6
Fully Convolutional Event-camera Voice Activity Detection Based on Event Intensity
Savran, Arman
The use of visual signals to detect vocally active duration is quite helpful when there is severe acoustic noise, and can even be the only option if the audio channel is missing. There has been significant progress in video-based voice activity detection (VAD). On the other hand, while recently emerging event camera (EC) technology has demonstrated great benefits for applications in robotics, drones, autonomous vehicles, and mobile devices, including visual speech recognition topics, it has not been explored as a vision-only VAD front-end. In this work, we propose an event intensity-based method by designing a fully convolutional network to efficiently realize an EC-VAD that segments vocally active duration. Efficiency is due to pooling the data over the mouth area, reducing the dimensions by totally collapsing local spatial information, as well as due to one-stage detection by a fully temporal convolutional network. Experimental evaluations show successful detection of voice activity, with about 0.91 area under the receiver operating characteristic curve, over a dataset including high speech content variability and different types of facial actions. © 2023 Elsevier B.V. All rights reserved.

