Optimization Techniques For Edge AI 

In a Q&A about the emergence of edge computing, Dr. Suresh Purini explains its scope, the challenges confronting Edge ML, and briefly describes the active edge AI research underway at IIITH.

Why has ML on the Edge emerged?
Many applications from domains such as agriculture, healthcare, retail, and Industry 4.0 require intelligence on the edge. Real-time response, privacy, and intermittent connectivity are among the important factors that have necessitated this move from the cloud to the edge. Further, by moving computation closer to the data, rather than vice versa, we save network bandwidth and the power consumed by communication modules. Such distributed computation also reduces the compute capacity required of centralised data centers, saving on both capital and operational expenses such as real estate and power consumption. However, deploying modern machine learning (ML) models, which are both compute- and memory-intensive, on resource-constrained edge devices is fraught with challenges.

What are some of the key challenges of edge computing?
Edge devices often have limited computational power, memory, and storage compared to centralised servers. Cloud-centric ML models therefore need to be retargeted so that they fit within the available resource budget. Further, many edge devices run on batteries, making energy efficiency a critical consideration. The hardware diversity of edge devices, ranging from microcontrollers to powerful edge servers, each with different capabilities and architectures, requires different model refinement and retargeting strategies. And as the field of machine learning evolves at a rapid pace, it becomes increasingly challenging for hardware accelerator designers and the associated software stack, including compilers and runtime systems, to keep up with and efficiently support the latest state-of-the-art models.

What are the typical strategies used to overcome the challenges?
Designing power-efficient hardware accelerators for ML inferencing is an active area of research in both industry and academia. Industry platforms include the NVIDIA Jetson Nano, Google Coral, Intel Movidius, Raspberry Pi, Qualcomm QCS605, and ARM Neoverse. However, there are also software tools and techniques for retargeting machine learning models to compute- and memory-constrained edge devices operating within a tight power envelope. While porting models to the edge, a typical trade-off arises between latency and accuracy: higher accuracy requirements often lead to increased model complexity and, consequently, greater latency, and vice versa. Some of the well-known and emerging techniques that help manage this accuracy-latency trade-off are:

Model Compression: This involves transforming cloud-centric machine learning models into edge-friendly ones using techniques such as architecture simplification, lightweight layers, pruning, quantization, and knowledge distillation. For example, these methods have been used to optimise the popular object detection model YOLOv5 into edge-friendly variants such as YOLOv5s (small) and YOLOv5n (nano).
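
As a concrete illustration, here is a minimal sketch of two of these techniques, magnitude pruning and post-training dynamic quantization, using PyTorch's built-in utilities; the small model below is a stand-in for illustration, not YOLOv5 itself.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A stand-in model; in practice this would be a trained network.
model = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Magnitude pruning: zero out the 30% smallest weights of each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the pruning permanent

# Dynamic quantization: store Linear weights as int8 for a smaller,
# faster model at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

print(quantized)  # quantized layers replace the float Linear layers
```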

Dynamic Model Selection: While model compression techniques focus on optimising individual models, we can envision a repository of models for a given task with varying accuracy-latency trade-offs under different conditions. Depending on the ambient conditions, we can dynamically switch between high- and low-complexity models as needed. For instance, under rainy conditions we might switch from a lighter object detection model that performs well in clear weather to a more complex one. A meta-model that performs a cost-benefit analysis of model switching, possibly using reinforcement learning, can decide whether to switch and which replacement model to use. This type of model switching requires built-in support from the MLOps stack.
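
A minimal sketch of what the switching logic might look like; the model repository, the latency and accuracy numbers, and the hand-written cost-benefit rule below are all illustrative stand-ins for the learned meta-model described above.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ModelEntry:
    name: str
    latency_ms: float      # measured average inference latency
    accuracy: float        # validation accuracy under matching conditions
    predict: Callable      # the actual inference function

# Hypothetical repository: same task, different accuracy-latency trade-offs.
REPOSITORY = {
    "clear": ModelEntry("detector-nano", 12.0, 0.78, lambda frame: ...),
    "rain":  ModelEntry("detector-large", 95.0, 0.91, lambda frame: ...),
}

def select_model(condition: str, latency_budget_ms: float) -> ModelEntry:
    """Simple cost-benefit rule: prefer the condition-matched model,
    but fall back to the lighter one if it would blow the latency budget."""
    candidate = REPOSITORY.get(condition, REPOSITORY["clear"])
    if candidate.latency_ms > latency_budget_ms:
        return REPOSITORY["clear"]
    return candidate

active = select_model(condition="rain", latency_budget_ms=50.0)
print(f"Switching to {active.name}")  # falls back: budget is too tight
```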

Distributed Inference: Many use cases involve the distributed deployment of numerous IoT or edge devices, such as CCTV cameras, working collaboratively towards specific objectives. These applications often have built-in redundancy, making them tolerant to failures, malfunctions, or less accurate inference results from a subset of edge devices. Algorithms can be employed to recover from missing, incorrect, or less accurate inputs by utilising the global information available. This approach allows for the combination of high and low accuracy models to optimise resource costs while maintaining the required global accuracy through the available redundancy.
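
As an illustration of how redundancy can absorb weaker or missing inferences, here is a hypothetical fusion step that combines per-camera people counts, weighting each device by the accuracy of the model it runs, so a few low-accuracy or failed inputs do not dominate the global estimate.

```python
# Hypothetical per-device reports for the same overlapping scene:
# (device_id, people_count, model_accuracy). Some devices run light,
# less accurate models; the fusion step compensates using redundancy.
reports = [
    ("cam-01", 14, 0.92),    # heavy model
    ("cam-02", 11, 0.70),    # light model
    ("cam-03", 15, 0.92),    # heavy model
    ("cam-04", None, 0.70),  # missing/failed inference
]

def fuse_counts(reports):
    """Accuracy-weighted average over devices that produced a result."""
    valid = [(count, acc) for _, count, acc in reports if count is not None]
    if not valid:
        raise ValueError("no usable reports")
    total_weight = sum(acc for _, acc in valid)
    return sum(count * acc for count, acc in valid) / total_weight

print(f"Fused estimate: {fuse_counts(reports):.1f} people")
```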

Online Training and Refinement: Most of the discussion so far has focused on use cases where a model is trained at a centralised location and then retargeted for edge devices. However, some applications require a deployed edge model to be further refined based on ambient conditions using online training techniques. In such cases, an edge device can periodically send a sample of input data to a master model on the cloud and use the master's outputs to refine itself. This approach is akin to a distributed knowledge-distillation process.
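
A minimal sketch of one refinement step, assuming the classic knowledge-distillation objective; the teacher logits are presumed to have been fetched from the cloud master model for the sampled batch.

```python
import torch
import torch.nn.functional as F

def distill_step(student, optimizer, batch, teacher_logits, T=4.0):
    """One online refinement step: match the student's softened outputs
    to the cloud teacher's softened outputs (standard KD objective)."""
    student_logits = student(batch)
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # standard temperature scaling of the KD gradient
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Periodically on the edge device, with teacher logits from the cloud:
#   loss = distill_step(edge_model, opt, sampled_inputs, teacher_logits)
```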

Federated Learning: Large-scale deployment of distributed edge devices is inherently suited to federated learning, where a centralised master model in the cloud is trained using locally adapted models from the edge devices. By communicating model weights instead of raw data, power and communication bandwidth are conserved, and data privacy is preserved. However, this too faces numerous challenges, including device heterogeneity, intermittent network connectivity, bandwidth limitations, power constraints, privacy risks, and security concerns. Addressing these challenges requires substantial research in privacy-preserving methods, energy-efficient algorithms, and secure systems.
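
A minimal sketch of the aggregation step at the heart of federated averaging (FedAvg): the cloud combines model weights from edge devices in proportion to their local sample counts. The client states and counts below are placeholders, and the sketch assumes floating-point parameters.

```python
import torch

def federated_average(client_states, client_sample_counts):
    """FedAvg: weighted average of client model state_dicts.
    Only weights travel over the network, never the raw data."""
    total = sum(client_sample_counts)
    averaged = {}
    for key in client_states[0]:
        averaged[key] = sum(
            state[key] * (n / total)
            for state, n in zip(client_states, client_sample_counts)
        )
    return averaged

# Usage sketch: states collected from three edge devices this round.
# global_model.load_state_dict(
#     federated_average([s1, s2, s3], [1200, 800, 400])
# )
```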

Can you briefly describe IIITH’s strides in edge AI research?
Several research groups at IIITH are exploring various aspects of edge AI research, with some of these initiatives falling under the broader Smart City project.

Digitising Water Meters: Prof. Sachin Chaudhari and his team have developed an economical IoT-based retrofitting setup for digitising analog water meters to make them smart. The setup consists of a Raspberry Pi and a Pi camera mounted on top of the analog water meter to capture images of its dial. The captured images are then preprocessed and passed to ML/DL models that estimate the meter reading.
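
Purely as an illustration of this kind of retrofit pipeline, and not the team's actual code, a preprocessing step might look as follows with OpenCV; the digit model and camera capture call are hypothetical.

```python
import cv2  # assumes OpenCV is installed on the Raspberry Pi

def preprocess(frame):
    """Illustrative cleanup of a captured meter image before inference."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)  # suppress sensor noise
    _, binary = cv2.threshold(
        gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU
    )  # separate dark digits from the dial face
    return cv2.resize(binary, (224, 224))  # match the model's input size

# reading = digit_model.predict(preprocess(camera.capture()))  # hypothetical
```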

Air Pollution Monitoring using Images: Prof. Sachin Chaudhari’s efforts have led to an IoT-based real-time air quality index (AQI) estimation technique that uses images and weather sensors on Indian roads. A mixture of image features (traffic density and visibility) and sensor features (temperature and humidity) is used to predict the AQI. A deep learning (DL) method based on object detection and localisation, along with image processing techniques, extracts the image features, while a machine learning (ML) model trained on these features estimates the AQI.
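
To make the feature-fusion idea concrete, here is a hypothetical sketch that trains a regressor on combined image and sensor features; the feature values and AQI labels below are illustrative only, not the project's data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical fused feature vectors:
# [traffic_density, visibility, temperature_C, humidity_pct] -> AQI.
# Real features would come from the DL detector and the weather sensors.
X = np.array([
    [0.82, 0.35, 34.0, 61.0],
    [0.21, 0.90, 29.5, 48.0],
    [0.55, 0.60, 31.2, 55.0],
])
y = np.array([212.0, 74.0, 138.0])  # illustrative AQI values only

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
print(model.predict([[0.70, 0.40, 33.0, 58.0]]))  # estimated AQI
```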

Model Balancers: Earlier, we discussed the concept of dynamic model selection for resource efficiency. In a similar vein, Prof. Karthik Vaidhyanathan and his team have developed a self-adaptive system that switches between different ML models; it has been deployed and successfully demonstrated on Qualcomm edge devices.

Large-Scale Distributed CCTV Camera Analytics: In this work, our team built a scalable distributed video analytics framework that can process thousands of video streams from sources such as CCTV cameras using semantic scene analysis. The main idea is to deploy deep learning pipelines on fog nodes that generate semantic scene description records (SDRs) for the video feeds of the associated CCTV cameras. The SDRs, rather than raw video frames, are transmitted to the cloud, saving network bandwidth. Using the SDRs stored in the cloud database, we can answer many complex queries and perform rich video analytics with very low latency; there is no need to scan and process the video streams again for each query. The software architecture on the fog nodes allows new deep learning pipelines to be integrated dynamically into the existing system, thereby supporting novel analytics and queries.
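
To make this concrete, here is a hypothetical shape for an SDR and the kind of cloud-side query it enables without re-scanning any video; the record fields are illustrative, not the project's actual schema.

```python
# Hypothetical SDR emitted by a fog-node deep learning pipeline for one
# time window; only this compact record travels to the cloud, not frames.
sdr = {
    "camera_id": "cctv-1043",
    "window": ("2024-06-01T09:00:00Z", "2024-06-01T09:00:10Z"),
    "objects": [
        {"label": "car", "count": 7, "avg_speed_kmph": 32.5},
        {"label": "person", "count": 12},
    ],
}

# A cloud-side query becomes a cheap lookup over stored SDRs,
# e.g. "cameras that saw more than 10 people in a window":
def crowded_cameras(sdrs, threshold=10):
    for record in sdrs:
        people = next(
            (o["count"] for o in record["objects"] if o["label"] == "person"),
            0,
        )
        if people > threshold:
            yield record["camera_id"], people

print(list(crowded_cameras([sdr])))  # [('cctv-1043', 12)]
```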

Model Selection and Placement: In large-scale deployments of distributed edge devices, it is not always necessary to process every data stream with a highly complex and accurate model. In this work, we strategically deploy models of varying complexity and accuracy by exploiting the redundancy across data streams and leveraging domain-specific meta-information, achieving the same quality of analytics at a lower resource cost. This technique has significant practical applications and presents numerous research opportunities.
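
One way to picture this strategy is a greedy placement sketch: within each hypothetical group of cameras with overlapping views, one device gets a heavy model and the rest get light ones, with redundancy preserving overall analytics quality. The groups and model names below are illustrative.

```python
# Hypothetical greedy placement over groups of cameras covering the
# same area; redundancy lets light models fill in for heavy ones.
HEAVY, LIGHT = "detector-large", "detector-nano"

overlap_groups = [
    ["cam-01", "cam-02", "cam-03"],  # same junction, overlapping views
    ["cam-07", "cam-08"],
]

def place_models(groups):
    placement = {}
    for group in groups:
        anchor, *rest = group
        placement[anchor] = HEAVY       # one authoritative, accurate stream
        placement.update({cam: LIGHT for cam in rest})
    return placement

print(place_models(overlap_groups))
```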

Overall, we are witnessing the advent of the edge AI/ML era, which is transforming the way we see and experience the world as intelligent edge devices bridge the human-machine continuum. However, realising this potential requires numerous innovations in hardware, system software, and algorithm optimization for machine learning, as well as scalable distributed systems and algorithms.

This article was initially published in the June edition of TechForward Dispatch.

Dr Suresh Purini is an Associate Professor of Computer Science at IIIT-Hyderabad. He leads the Computer Systems Group. He has wide research interests spanning compilers, architecture, parallel and distributed systems, and most recently Systems for AI/ML. He practises Heartfulness Meditation, and is a certified yoga and meditation trainer.
