Divija Swetha Gadiraju, supervised by Dr. Lalitha Vadlamani, received her doctorate in Electronics and Communication Engineering (ECE). Here’s a summary of her research work on Scaling Blockchain using Codes and DRL based Approach for Blockchain and UAV:
Blockchain and Reinforcement Learning (RL) are two game-changing research areas that have received a lot of
attention recently. In recent years, significant advances in RL have resulted in tremendous success in solving various
sequential decision-making problems in machine learning. Two of the most successful RL applications, unmanned
aerial vehicles and blockchain, are discussed in this work. Blockchain is a distributed ledger technology with its first
application in Bitcoin. The main challenge in blockchain-based cryptocurrencies is to provide a distributed trust
environment with high security like in a centralized financial system. The current throughput of Bitcoin is around
4 to 7 transactions per second and confirmation latency is about one hour. For Bitcoin to go mainstream, the
throughput has to be on the order of thousands of transactions per second with very low latency, on the order of
a few seconds. Recent advances in blockchain research have proposed consensus algorithms that scale Bitcoin, such as
sharding and Prism-based blockchain. However, the security of Bitcoin is so high that it can tolerate up to 50%
adversarial nodes and avoid double-spending attacks. The current blockchain size is over 260 GB and is growing at
an astonishing rate, imposing a huge storage requirement on the nodes. Recent developments improving the Bitcoin
consensus have shown that there is a trade-off between decentralization, scaling, and security. In order to scale
blockchain, we leverage coding theory and RL in this thesis.
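The throughput and latency figures quoted above can be reproduced with back-of-the-envelope arithmetic. The sketch below assumes a 1 MB block, a 10-minute block interval, average transaction sizes of 250 to 400 bytes, and the common six-confirmation rule. None of these parameters are stated in this summary, and the function names are illustrative only.

```python
def bitcoin_throughput_tps(block_size_bytes=1_000_000,
                           avg_tx_bytes=250,
                           block_interval_s=600):
    """Rough transactions per second for a fixed-size block.

    All defaults are assumptions for illustration, not values from the thesis.
    """
    txs_per_block = block_size_bytes // avg_tx_bytes
    return txs_per_block / block_interval_s


def confirmation_latency_s(confirmations=6, block_interval_s=600):
    """Expected waiting time for a given number of confirmations."""
    return confirmations * block_interval_s


if __name__ == "__main__":
    # Small transactions (250 bytes) give the upper end of the range.
    print(round(bitcoin_throughput_tps(avg_tx_bytes=250), 2))
    # Larger transactions (400 bytes) give the lower end.
    print(round(bitcoin_throughput_tps(avg_tx_bytes=400), 2))
    # Six confirmations at ten minutes each, expressed in minutes.
    print(confirmation_latency_s() / 60)
```

Under these assumptions the estimate lands between roughly 4 and 7 transactions per second with a one-hour confirmation time, which is three orders of magnitude short of the thousands of transactions per second targeted for mainstream use.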
Due to the increasing storage requirement for blockchains, the computation can be afforded by only a few miners.
Sharding has been proposed to scale blockchains so that the storage and transaction efficiency of the blockchain
improves at the cost of a security guarantee. Incorporating coding theory into existing consensus algorithms has
demonstrated improvements in terms of storage efficiency and low latency. A Secure-Repair-Blockchain (SRB) is
proposed which aims to decrease the storage cost at the miners. In addition, SRB also decreases the bootstrapping
cost, which allows new miners to easily join a sharded blockchain. To reduce storage, coding-theoretic
techniques are used in SRB. To decrease the amount of data that is transferred to a new node joining
a shard, the concept of exact-repair secure regenerating codes is used. The proposed blockchain protocol achieves
lower storage than those that do not use coding and achieves lower bootstrapping costs compared to the different
baselines.
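To illustrate why regenerating codes can lower both storage and bootstrapping cost, the sketch below evaluates the standard minimum-storage (MSR) and minimum-bandwidth (MBR) operating points for a file of size M stored across n nodes, where any k nodes suffice to recover the file and a joining node contacts d helpers. This is a generic illustration that ignores the secrecy constraint of secure regenerating codes; it is not the SRB construction itself, and the parameter values are hypothetical.

```python
from fractions import Fraction


def _check(n, k, d):
    # A joining node can contact at most n - 1 helpers and needs at least k.
    if not (1 <= k <= d <= n - 1):
        raise ValueError("require 1 <= k <= d <= n - 1")


def msr_point(M, n, k, d):
    """Per-node storage and repair download at the minimum-storage point."""
    _check(n, k, d)
    alpha = Fraction(M, k)                    # stored by each node
    gamma = Fraction(M * d, k * (d - k + 1))  # downloaded by a joining node
    return alpha, gamma


def mbr_point(M, n, k, d):
    """Per-node storage and repair download at the minimum-bandwidth point."""
    _check(n, k, d)
    gamma = Fraction(2 * M * d, k * (2 * d - k + 1))
    return gamma, gamma                       # storage equals download at MBR


if __name__ == "__main__":
    M, n, k, d = 260, 10, 5, 9  # hypothetical: 260 GB ledger, 10 nodes
    print("full replication: store", M, "download", M)
    for name, point in (("MSR", msr_point), ("MBR", mbr_point)):
        alpha, gamma = point(M, n, k, d)
        print(name, "store", float(alpha), "download", float(gamma))
```

With these hypothetical values, a node stores 52 GB at the MSR point instead of the full 260 GB, and a joining node downloads about 94 GB (MSR) or about 67 GB (MBR) rather than the whole ledger.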
Prism is a recent blockchain algorithm that achieves the physical limit on throughput and latency without
compromising security. However, like the traditional blockchain systems, Prism also has a trade-off between security,
latency, and cost. Recently, reinforcement learning approaches have been investigated in traditional blockchains
to improve performance. In this work, we apply Deep Reinforcement Learning (DRL) to one of the promising
blockchain protocols, Prism, to optimize its performance. We propose a Deep Reinforcement Learning-based Prism
Blockchain (DRLPB) scheme which dynamically optimizes the parameters of the Prism blockchain and helps
achieve better performance. In DRLPB, we apply two widely used DRL algorithms, Dueling Deep Q Networks
(DDQN) and Proximal Policy Optimization (PPO). This work presents a novel approach to applying DDQN and
PPO to a blockchain protocol and comparing their performance. An analysis of Prism in terms of latency and security
level, considering other blockchain parameters, is provided. Using the analysis, the DRLPB scheme adapts the Prism
blockchain parameters to enhance the security by up to 84% over Prism, while still preserving the performance
guarantees of Prism.
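For readers unfamiliar with the two algorithms named above, the sketch below shows the core computation that distinguishes each one: the dueling aggregation of a state value and per-action advantages into Q-values, and the PPO clipped surrogate objective for a single sample. These are the textbook forms only. How DRLPB defines its states, actions, and rewards over Prism parameters is not described in this summary, so nothing here should be read as the thesis implementation.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) as Q(s, a) = V(s) + A(s, a) - mean(A)."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


def ppo_clipped_objective(ratio, advantage, clip_eps=0.2):
    """PPO surrogate for one sample: min(r * A, clip(r, 1-eps, 1+eps) * A).

    ratio is the new-policy probability of the action divided by the
    old-policy probability; clip_eps=0.2 is a commonly used default.
    """
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Three hypothetical actions, e.g. lower, keep, or raise a mining rate.
    print(dueling_q_values(1.0, [1.0, 2.0, 3.0]))
    # A large policy change with positive advantage is capped by the clip.
    print(ppo_clipped_objective(ratio=1.5, advantage=2.0))
```

The clipping keeps each policy update close to the previous policy, which is one reason PPO is often preferred when the environment, here a blockchain with changing conditions, is costly to simulate.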
The recent advancements in the field of Internet of Things (IoT) motivate the development of a secure
infrastructure for storing and sharing vast amounts of data. Blockchain, a distributed and immutable ledger, is
best known as a potential solution to data security and privacy for IoT. The scalability of blockchain, which should
optimize the throughput and handle the dynamics of the IoT environment, becomes a challenge due to the enormous
amount of IoT data. The critical challenge in scaling blockchain is to guarantee decentralization, latency, and security
of the system while optimizing the transaction throughput. This paper presents a deep reinforcement learning (DRL)-
based performance optimization for blockchain-enabled IoT. We consider one of the recent promising blockchains,
Prism, as the underlying blockchain system because of its good performance guarantees. We integrate the IoT data
into the Prism blockchain and optimize the performance of the system by leveraging the Proximal Policy Optimization (PPO)
method. The DRL method helps to optimize blockchain parameters such as mining rate and mined blocks to adapt to
the environment dynamics of the IoT system. Our results show that the proposed method can improve the throughput
of Prism blockchain-based IoT systems while preserving Prism performance guarantees. Our scheme can achieve
1.5 times more system rewards than IoT-integrated Prism and improve the average throughput of the system by
about 6,000 transactions per second.
Unmanned aerial vehicles (UAVs) are widely used for missions in dynamic environments. DRL can find
effective strategies for multiple agents that need to cooperate to complete the task. The challenge of controlling
the movement of a group of UAVs is addressed by Multi-Agent Deep Reinforcement Learning (MARL). The
collaborative movement of the UAV fleet can be controlled either centrally or in a decentralized fashion; both are
studied in this work. We consider a dynamic military environment with a group of UAVs, whose task is to destroy
the targets while avoiding obstacles like mines. The UAVs inherently come with a limited fuel capacity, directing
our research to focus on the minimum task completion time. GLIDE, a continuous-time based PPO algorithm, is
leveraged in which the UAVs coordinate among themselves and communicate with the central base to choose the
best possible action. A simulator called UAV SIM is developed for our experimentation in which the mines
are placed at random locations unknown to the UAVs at the beginning of each episode. The performance of the
proposed scheme is evaluated through extensive simulations and a comparison of the centralized action control and
the decentralized action control is presented.
September 2023