Savran, Arman

Profile URL

https://gcris.yasar.edu.tr/handle/123456789/12592

Job Title

Assoc. Prof. Dr.

Main Affiliation

01.01.09.01. Department of Computer Engineering

Status

Current Staff

Sustainable Development Goals

No.	Goal	Research Products
1	NO POVERTY	0
2	ZERO HUNGER	0
3	GOOD HEALTH AND WELL-BEING	0
4	QUALITY EDUCATION	0
5	GENDER EQUALITY	0
6	CLEAN WATER AND SANITATION	0
7	AFFORDABLE AND CLEAN ENERGY	1
8	DECENT WORK AND ECONOMIC GROWTH	0
9	INDUSTRY, INNOVATION AND INFRASTRUCTURE	0
10	REDUCED INEQUALITIES	0
11	SUSTAINABLE CITIES AND COMMUNITIES	0
12	RESPONSIBLE CONSUMPTION AND PRODUCTION	0
13	CLIMATE ACTION	0
14	LIFE BELOW WATER	0
15	LIFE ON LAND	0
16	PEACE, JUSTICE AND STRONG INSTITUTIONS	0
17	PARTNERSHIPS FOR THE GOALS	0

Scopus	Documents: 24	Citations: 1397	h-index: 13
WoS	Documents: 19	Citations: 1057

Scholarly Output

9

Articles

4

Views / Downloads

0/0

Supervised MSc Theses

3

Supervised PhD Theses

0

WoS Citation Count

11

Scopus Citation Count

24

Patents

0

Projects

0

WoS Citations per Publication

1.22

Scopus Citations per Publication

2.67

Open Access Source

3

Supervised Theses

3
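
The two per-publication figures above appear to be the repository citation counts divided by the 9 scholarly outputs listed on this page, rather than by the Scopus or WoS document totals. This is an inference from the numbers, not something the page states; a minimal check under that assumption:

```python
# Counts copied from the metrics above; the divisor is an assumption.
scholarly_output = 9      # "Scholarly Output"
wos_citations = 11        # "WoS Citation Count"
scopus_citations = 24     # "Scopus Citation Count"

wos_per_pub = round(wos_citations / scholarly_output, 2)
scopus_per_pub = round(scopus_citations / scholarly_output, 2)

print(wos_per_pub, scopus_per_pub)  # prints: 1.22 2.67
```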

Journal	Count
2023 Innovations in Intelligent Systems and Applications Conference ASYU 2023	1
8th International Conference on Computer Science and Engineering UBMK 2023	1
Academic Platform Journal of Engineering and Smart Systems	1
Computer Vision and Image Understanding	1
Journal of Intelligent Systems: Theory and Applications	1

Scholarly Output Search Results

Now showing 1 - 9 of 9

Detector-guided removal of speech background noise with convolutional networks, Saptayıcı-güdümlü konuşma arka planı gürültüsünün evrişimsel ağlar ile giderilmesi
(2022) Ayar, Cem; Savran, Arman
Speech background noise is a common problem that has become especially important with the growing popularity of online meetings and live internet broadcasts. Recently, deep neural networks (DNNs) have been shown to achieve high performance in suppressing a wide range of background noise types without requiring multiple microphones. However, such resource-hungry deep networks make many real-life applications expensive, cumbersome, or sometimes impractical. To alleviate this problem, this thesis proposes a detector-guided noise removal approach that disables a high-performance DNN whenever there is no significant noise. First, a modern time-domain convolutional neural network (CNN) known as Conv-TasNet is optimized with respect to efficiency and performance. Then, a CNN-based noisy speech detector is designed and evaluated for the detector-guided design with variations in size and resolution. The optimum detector is found to have only 2% of the computational load of the optimum Conv-TasNet and to cause only a negligible performance drop, with a very low noisy-speech miss rate. Thus, by successfully detecting noisy speech at this insignificant computational load, we confirm that our detector-guided approach can be used for potentially significant efficiency gains. This efficiency gain is inversely proportional to the probability of noise occurrence. In addition, we show that, by automatically identifying speech that is already clean, the slight distortions caused by occasional processing artifacts can be avoided.
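
The gating idea in this abstract can be illustrated with a simplified cost model. The sketch below is not the thesis implementation: `relative_cost`, `gated_denoise`, `is_noisy`, and `denoise` are hypothetical names, and the model assumes the detector runs on every frame, the denoiser runs only on frames flagged as noisy, and detection is perfect. Only the 2% detector-to-denoiser cost ratio comes from the abstract.

```python
def relative_cost(p_noisy: float, detector_fraction: float = 0.02) -> float:
    """Expected compute per frame, relative to always running the denoiser.

    Assumption: the detector always runs (cost = detector_fraction) and the
    denoiser (cost = 1) runs only on the fraction p_noisy of frames.
    """
    if not 0.0 <= p_noisy <= 1.0:
        raise ValueError("p_noisy must be a probability in [0, 1]")
    return detector_fraction + p_noisy


def gated_denoise(frames, is_noisy, denoise):
    """Leave clean frames untouched; denoise only the frames flagged as noisy."""
    return [denoise(frame) if is_noisy(frame) else frame for frame in frames]


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        print(f"p_noisy={p:.1f} -> relative cost {relative_cost(p):.2f}")
```

Under these assumptions, the saving shrinks as noise becomes more frequent, and gating costs slightly more than always denoising once nearly every frame is noisy.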
Use of convolutional networks for face pose alignment with an event camera, Olay kamerası ile yüz pozu hizalama için evrişimsel ağların kullanılması
(2024) Oral, Burhan Burak; Savran, Arman
Event cameras offer substantial advantages over conventional video cameras through their efficiency, extremely high temporal resolution, low latency, and high dynamic range. These benefits have led to applications in various vision domains. Recently, they have been applied in facial recognition tasks as well. However, while significant advantages of event cameras in some facial processing tasks have been demonstrated, the initial stage in almost any task, i.e., face alignment, is not on par with conventional cameras. This study investigates the use of face alignment convolutional networks regarding both performance and complexity for event camera processing. Our aim is event camera face pose alignment that can be used as an efficient preprocessor for facial tasks. Therefore, we comparatively evaluate simple convolutional coordinate regression against a hybrid of coordinate and heatmap regression, known as pixel-in-pixel regression. Our experimental results reveal the superior performance of the hybrid method. However, we also show that if there is a computation bottleneck, simple convolutional coordinate regression is preferable for its low resource requirements, though at the expense of some performance loss.
Temporal Convolutional Networks for Efficient Voice Activity Detection with an Event Camera, Olay Kamerası ile Verimli Konuşma Sesi Tespiti için Zamansal Evrişimsel Ağlar
(2024) Savran, Arman
Voice activity detection (VAD) is a necessary pre-processing step widely used in human-computer interfaces. The presence of complex acoustic background noise necessitates the use of large deep neural networks at the cost of a heavy computational load. Visual VAD is an alternative approach that can be preferred because it has no background noise problem. It is, in any case, the only option when the audio data cannot be accessed. However, visual VAD, which is generally expected to run continuously for long periods, causes significant energy consumption because of video camera hardware and video data processing requirements. This study examines the use of the event camera for visual VAD, since its efficiency is considerably higher than that of a conventional video camera thanks to neuromorphic technology. Because the event camera senses at high temporal resolutions, the spatial dimension is collapsed entirely, and extremely lightweight yet successful models are designed that rely on learning patterns in the time dimension only. The designs combine different convolution dilation types, down-sampling methods, and convolution separation techniques, taking temporal receptive field widths into account. The experiments measure the robustness of VAD against various facial actions. The results show that down-sampling is necessary for high performance and efficiency, and that max-pooling achieves better performance than down-sampling with strided convolution. The best-performing standard design obtained this way runs with 1.57 million floating-point operations (MFLOPS). Combining convolution dilation at a fixed factor with down-sampling is found to reduce the computational requirement by more than half at similar performance. In addition, by applying depthwise separation, the computational requirement is reduced to 0.30 MFLOPS, that is, to less than one fifth of the standard model.
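
The abstract refers to temporal receptive field widths when combining dilation and down-sampling. As a minimal sketch, the standard receptive-field recurrence for stacked 1-D layers is shown below; the two layer stacks are hypothetical examples chosen for illustration, not the configurations evaluated in the paper.

```python
def receptive_field(layers):
    """Temporal receptive field of stacked 1-D convolution/pooling layers.

    Each layer is a (kernel_size, dilation, stride) tuple. Uses the standard
    recurrence: rf += (kernel_size - 1) * dilation * jump; jump *= stride.
    """
    rf, jump = 1, 1
    for kernel_size, dilation, stride in layers:
        rf += (kernel_size - 1) * dilation * jump
        jump *= stride
    return rf


# Hypothetical stacks (assumptions, not taken from the paper).
# Dilation doubles at each layer, no down-sampling:
dilated = [(3, 1, 1), (3, 2, 1), (3, 4, 1)]
# Convolutions interleaved with stride-2 max-pooling:
pooled = [(3, 1, 1), (2, 1, 2), (3, 1, 1), (2, 1, 2), (3, 1, 1)]

print(receptive_field(dilated))  # prints: 15
print(receptive_field(pooled))   # prints: 18
```

Both routes widen the temporal context, but down-sampling also shortens the sequence that later layers must process, which is one plausible reason it helps efficiency.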
Citation - WoS: 1
Citation - Scopus: 2
Multi-timescale boosting for efficient and improved event camera face pose alignment
(ACADEMIC PRESS INC ELSEVIER SCIENCE, 2023) Savran, Arman
The success of event camera (EC) vision in certain types of applications has been steadily shown, thanks to energy-efficient sparse sensing, high dynamic range, and extremely high temporal resolution. However, the utilization of ECs for facial processing tasks has remained rather limited. To enable high energy efficiency for large face pose alignment, which is a crucial facial pre-processing stage, we aim at leveraging EC by effective adaptation of the processing rate proportional to facial movement intensity. For this purpose, we propose a novel alternative to the commonly employed constant time frame and event count frame strategies, which combines their advantages and provides the benefits of supervised learning. This is realized by a multi-timescale boosting framework that can generate highly sparse pose-events at a variable rate via detection-based online timescale selection. Although detectors of multiple scales with boosted sensitivities operate as a cascade, our method provides minimal delay, essential for real-time applications. Comprehensive evaluations show that the proposed multi-timescale processing substantially improves the performance-efficiency trade-off over single-timescale frames and markedly over event count frames. Mega-floating-point-operations-per-second ranges from 2.5 at the moderate motion clips to 6.5 at the intense motion clips, with negligible computation in the absence of activity. Also, alignment errors are considerably reduced by online selection of small timescales at fast head motion and of bigger timescales at slower motion or local activity of lips and eyes. Being orthogonal and complementary to spatial domain techniques, the proposed approach can also be conveniently integrated with future advances for further performance/efficiency improvements or for alignment extensions.
Joint face and landmark localization via a recurrent convolutional network for event cameras, Olay kamerası için yinelemeli evrişimsel ağ yoluyla tümleşik yüz ve referans noktası yeri belirleme
(2025) Kılıç, Giray; Savran, Arman
The event camera's high temporal resolution, low power consumption, and wide dynamic range make it increasingly popular in robotics, surveillance, and emerging facial applications such as identity recognition, driver monitoring, and visual speech recognition. Localizing the face and landmarks is an essential first step in facial applications. We employ a joint network with a multi-task loss to realize both tasks, avoiding the need for redundant separate models. This is achieved through the multi-task head attached to the neck, which includes a context module designed with deformable convolutions to accommodate the non-rigid variability of facial shapes. The necks are connected to the feature pyramid network (FPN), which receives input from the recurrent convolutional network layers. By experimenting with two datasets of varying characteristics, the ECFacePose and FES datasets, we demonstrate that FPN with spatio-temporal features outperforms previous face localization approaches, achieving superior landmark localization and effective small face detection. Our experiments confirm the performance benefits of the deformable convolution-based context module, temporal consistency loss, and the Rectified Wing Loss. Furthermore, we explore 12 convolutional backbones, categorizing them into lightweight, middleweight, and heavyweight classes, and demonstrate that the middleweight InceptionV3 and DenseNet backbones deliver impressive performance-efficiency trade-offs. Our study illustrates that while FPN is crucial for landmark and small face detection and enhances larger face detection, it also increases FLOPs fourfold and doubles memory usage.
Citation - WoS: 10
Citation - Scopus: 15
Face pose alignment with event cameras
(MDPI AG, 2020) Savran, Arman; Bartolozzi, Chiara
The event camera (EC) emerges as a bio-inspired sensor which can be an alternative or complementary vision modality, with the benefits of energy efficiency, high dynamic range, and high temporal resolution coupled with activity-dependent sparse sensing. In this study, we investigate with ECs the problem of face pose alignment, which is an essential pre-processing stage for facial processing pipelines. EC-based alignment can unlock all these benefits in facial applications, especially where motion and dynamics carry the most relevant information, due to the temporal change event sensing. We specifically aim at efficient processing by developing a coarse alignment method to handle large pose variations in facial applications. For this purpose, we have prepared, by multiple human annotations, a dataset of extreme head rotations with varying motion intensity. We propose a motion-detection-based alignment approach in order to generate activity-dependent pose-events that prevent unnecessary computations in the absence of pose change. The alignment is realized by cascaded regression of extremely randomized trees. Since EC sensors perform temporal differentiation, we characterize the performance of the alignment in terms of different levels of head movement speeds and face localization uncertainty ranges, as well as face resolution and predictor complexity. Our method obtained 2.7% alignment failure on average, whereas annotator disagreement was 1%. The promising coarse alignment performance on EC sensor data, together with a comprehensive analysis, demonstrates the potential of ECs in facial applications.
Evaluation of Convolutional Networks for Event Camera Face Pose Alignment
(2025) Savran, Arman; Oral, Burhan Burak; Çakıcı, Alptuğ
Event cameras offer substantial advantages over conventional video cameras through their efficiency, extremely high temporal resolution, low latency, and high dynamic range. These benefits have led to applications in various vision domains. Recently, they have been applied in facial recognition tasks as well. However, while significant advantages of event cameras in some facial processing tasks have been demonstrated, the initial stage in almost any task, i.e., face alignment, is not on par with conventional cameras. This study investigates the use of face alignment convolutional networks regarding both performance and complexity for event camera processing. Our aim is event camera face pose alignment that can be used as an efficient preprocessor for facial tasks. Therefore, we comparatively evaluate simple convolutional coordinate regression against a hybrid of coordinate and heatmap regression, known as pixel-in-pixel regression. Our experimental results reveal the superior performance of the hybrid method. However, we also show that if there is a computation bottleneck, simple convolutional coordinate regression is preferable for its low resource requirements, though at the expense of some performance loss.
Citation - Scopus: 1
Comparison of Timing Strategies for Face Pose Alignment with Event Camera, Olay Kamerası ile Yüz Pozu Hizalama için Zamanlama Stratejilerin Karşılaştırılması
(Institute of Electrical and Electronics Engineers Inc., 2023) Savran, Arman
The event camera, which has recently started to see increasing use, can surpass the traditional camera in certain areas with its efficiency, level of detail in the time dimension, and high dynamic range. To process vision data practically with the event camera, time intervals must first be determined to transform the pixel-event data into a structure suitable for processing. For this purpose, two basic timing strategies are applied in the literature: constant time frame and constant event count frame. This study compares these two approaches in order to determine the appropriate time intervals in the event camera face pose alignment problem. Experimental results showed that the constant event count strategy was superior in terms of face pose alignment performance and efficiency. It was also seen that the time frame midpoint achieved a lower error rate than the median and mean timing.
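
The two timing strategies named in this abstract can be sketched on a plain list of event timestamps. This is an illustrative sketch, not the paper's code: the function names are hypothetical, timestamps are assumed sorted, and the frame interval used for the midpoint is assumed to span the first to the last event in the frame.

```python
from statistics import mean, median


def constant_time_frames(timestamps, duration):
    """Group sorted timestamps into frames of fixed duration.

    Time windows that contain no events produce no frame in this sketch.
    """
    if not timestamps:
        return []
    t0 = timestamps[0]
    frames = {}
    for t in timestamps:
        frames.setdefault(int((t - t0) // duration), []).append(t)
    return [frames[k] for k in sorted(frames)]


def constant_count_frames(timestamps, count):
    """Group sorted timestamps into frames of a fixed number of events."""
    return [timestamps[i:i + count] for i in range(0, len(timestamps), count)]


def time_references(frame):
    """Candidate timestamps to assign to a frame (midpoint, median, mean)."""
    return {
        "midpoint": (frame[0] + frame[-1]) / 2,
        "median": median(frame),
        "mean": mean(frame),
    }


events = [0, 1, 2, 10, 11, 30]  # hypothetical timestamps in milliseconds
print(constant_time_frames(events, 10))   # prints: [[0, 1, 2], [10, 11], [30]]
print(constant_count_frames(events, 2))   # prints: [[0, 1], [2, 10], [11, 30]]
```

Fixed-duration frames hold a varying number of events, while fixed-count frames span a varying amount of time, so the frame rate of the latter follows the activity in the scene.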
Citation - Scopus: 6
Fully Convolutional Event-camera Voice Activity Detection Based on Event Intensity
(Institute of Electrical and Electronics Engineers Inc., 2023) Savran, Arman
The use of visual signals to detect vocally active duration is quite helpful when there is severe acoustic noise, and can even be the only option if the audio channel is missing. There has been significant progress in video-based voice activity detection (VAD). On the other hand, while recently emerging event camera (EC) technology has demonstrated great benefits for applications in robotics, drones, autonomous vehicles, and mobile devices, including visual speech recognition topics, it has not been explored as a vision-only VAD front-end. In this work, we propose an event intensity-based method by designing a fully convolutional network to efficiently realize an EC-VAD that segments vocally active duration. Efficiency is due to pooling the data over the mouth area, reducing the dimensions by totally collapsing local spatial information, as well as due to one-stage detection by a fully temporal convolutional network. Experimental evaluations show successful detection of voice activity with about 0.91 area under the receiver operating curve over a dataset including high speech content variability and different types of facial actions.
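
The spatial pooling step described in this abstract can be sketched as counting events inside a mouth region per time bin, which yields a 1-D signal that a temporal convolution can then process. The event tuple layout, the rectangular region, and the averaging kernel below are assumptions made for illustration; the paper's network uses learned kernels, which are not reproduced here.

```python
def event_intensity(events, roi, bin_width, n_bins):
    """Count events inside a rectangular region for each time bin.

    events: iterable of (t, x, y) tuples (assumed layout).
    roi: (x_min, y_min, x_max, y_max), bounds inclusive.
    """
    x_min, y_min, x_max, y_max = roi
    signal = [0] * n_bins
    for t, x, y in events:
        b = int(t // bin_width)
        if 0 <= b < n_bins and x_min <= x <= x_max and y_min <= y <= y_max:
            signal[b] += 1
    return signal


def conv1d(signal, kernel):
    """Valid-mode 1-D cross-correlation, the core operation of a temporal conv layer."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


# Hypothetical events: (time in ms, x, y); one falls outside the mouth region.
events = [(0, 5, 5), (1, 5, 5), (12, 5, 5), (13, 50, 50), (25, 6, 4)]
signal = event_intensity(events, roi=(0, 0, 10, 10), bin_width=10, n_bins=3)
print(signal)                   # prints: [2, 1, 1]
print(conv1d(signal, [1, 1]))   # prints: [3, 2]
```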
