Om Rajendra Kathalkar, supervised by Dr. Sachin Chaudhari, received his Master of Science in Electronics and Communication Engineering (ECE). Here is a summary of his research work on Camera-Based Deep Learning Framework for AQI Estimation: Dataset and Methodology:
Urban air quality monitoring faces a fundamental scalability crisis. With only 800 monitoring stations serving over 4,000 Indian cities, traditional sensor networks cannot capture pollution variations that occur every 300-500 meters in complex urban environments. The sparse coverage, combined with deployment costs of INR 40 lakhs to 1.6 crores per station, renders comprehensive monitoring economically unfeasible. Image-based air quality estimation emerges as a transformative alternative, yet existing approaches are constrained by limited datasets and poor generalization across cities.
This thesis introduces two contributions that establish a new paradigm for scalable air quality monitoring. TRAQID (Traffic-Related Air Quality Image Dataset) provides the first comprehensive multi-view dataset, with 26,678 synchronized front-rear traffic image pairs, co-located environmental sensors, and systematic coverage across three seasons in Indian urban environments. Benchmark evaluation reveals that existing state-of-the-art methods achieve only 75% accuracy on TRAQID, exposing critical limitations in current approaches.
AQIFormer, our transformer-based architecture, performs image-based air quality estimation through weather-aware attention mechanisms, adaptive dual-view fusion, and multi-task temporal learning. The architecture achieves 89.96% accuracy on TRAQID, an improvement of 14.96 percentage points over existing methods. Most significantly, AQIFormer generalizes well across cities: it maintains 81.67% accuracy on independent Nagpur data with minimal adaptation (5% local samples), a degradation of only 8.29 percentage points, compared with the drops of 30% or more typical of existing approaches.
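The gaps quoted above are differences between accuracy figures, so they are measured in percentage points; the corresponding relative changes are slightly larger. The short Python check below reproduces both from the three accuracies given in this summary. The constant and function names are illustrative choices, not taken from the thesis.

```python
# Accuracy figures quoted in the summary, in percent.
BASELINE_TRAQID = 75.0     # existing state-of-the-art methods on TRAQID
AQIFORMER_TRAQID = 89.96   # AQIFormer on TRAQID
AQIFORMER_NAGPUR = 81.67   # AQIFormer on Nagpur data with 5% local samples


def point_gap(a, b):
    """Absolute difference a - b in percentage points."""
    return round(a - b, 2)


def relative_change(a, b):
    """Change of a relative to b, expressed in percent of b."""
    return round((a - b) / b * 100, 2)


gain_points = point_gap(AQIFORMER_TRAQID, BASELINE_TRAQID)
gain_relative = relative_change(AQIFORMER_TRAQID, BASELINE_TRAQID)
drop_points = point_gap(AQIFORMER_TRAQID, AQIFORMER_NAGPUR)
drop_relative = relative_change(AQIFORMER_NAGPUR, AQIFORMER_TRAQID)

print(f"Gain over baseline: {gain_points} points ({gain_relative}% relative)")
print(f"Cross-city drop:    {drop_points} points ({drop_relative}% relative)")
```

The point gaps match the 14.96 and 8.29 figures in the summary. The relative values are derived here only for comparison and are not reported in the source.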
Comprehensive analysis validates each innovation: dual-view integration contributes an 11.46% improvement, multi-task learning enhances discriminability by 11.57%, and attention visualization reveals consistent focus on physically meaningful pollution sources across diverse urban environments. The architecture processes standard traffic imagery in real time, enabling integration with existing camera infrastructure without specialized hardware.
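To give a rough idea of what combining front and rear views can look like, here is a minimal sketch of a gated weighted fusion of two feature vectors. This is an assumption made for illustration only: the summary does not describe how AQIFormer's adaptive dual-view fusion is actually implemented, and the function names, the per-view scores, and the toy feature values are all hypothetical. In a trained model the scores would be produced by learned layers rather than passed in by hand.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_views(front_feat, rear_feat, front_score, rear_score):
    """Blend two equal-length feature vectors (hypothetical helper).

    The per-view scores are turned into weights that sum to 1, so the
    view with the higher score dominates the fused representation.
    """
    if len(front_feat) != len(rear_feat):
        raise ValueError("front and rear features must have the same length")
    w_front, w_rear = softmax([front_score, rear_score])
    return [w_front * f + w_rear * r for f, r in zip(front_feat, rear_feat)]


# Equal scores give a plain average of the two views.
fused_equal = fuse_views([1.0, 2.0], [3.0, 4.0], 0.0, 0.0)

# A much higher front score makes the result follow the front view.
fused_front = fuse_views([1.0, 2.0], [3.0, 4.0], 50.0, 0.0)
```

With equal scores the fused vector is the element-wise mean of the two inputs. As one score grows, the output approaches that view's features, which is the sense in which such a fusion adapts to whichever view is more informative.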
This research establishes the practical foundation for city-scale air quality monitoring at unprecedented spatial resolution and dramatically reduced costs. The robust cross-city performance enables immediate deployment across multiple urban environments, transforming air quality monitoring from expensive, sparse sensor networks to comprehensive, camera-based systems that can protect public health through enhanced environmental awareness.
November 2025

