Shreya Gupta received her MS Dual Degree in Electronics and Communication Engineering (ECE). Her research work was supervised by Dr. Lalitha Vadlamani. Here’s a summary of her research work on Rack-Aware Cooperative Regenerating Codes and Hadoop Framework for Clay Codes:
In distributed storage systems, two models are used for repair when multiple erasures occur. In the centralized model, a central node downloads information from all the helper nodes, reconstructs the failed nodes, and then sends the reconstructed data to the corresponding nodes. In the cooperative model, repair happens in two rounds, and each reconstructing node downloads some information from the helper nodes and from the other reconstructing nodes. In rack-aware distributed storage systems, transferring symbols within a rack is assumed to incur no cost. Hence, the repair bandwidth takes into account only cross-rack transfers. Rack-aware regenerating codes for the case of single node failures have been studied, and their storage-repair bandwidth tradeoff has been characterized.
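The rack-aware cost model described above can be illustrated with a minimal sketch. The node names, rack layout, symbol counts, and the function `cross_rack_bandwidth` are all hypothetical and chosen only for illustration; they are not taken from the thesis. The sketch simply counts the symbols that cross a rack boundary and ignores those that stay within a rack.

```python
from typing import Dict, List, Tuple

# A transfer is (source_node, destination_node, number_of_symbols).
Transfer = Tuple[str, str, int]


def cross_rack_bandwidth(transfers: List[Transfer], rack_of: Dict[str, int]) -> int:
    """Sum the symbols sent between nodes that sit in different racks."""
    total = 0
    for src, dst, symbols in transfers:
        # Intra-rack transfers are treated as free in the rack-aware model.
        if rack_of[src] != rack_of[dst]:
            total += symbols
    return total


# Hypothetical layout: three racks with two nodes each.
rack_of = {"n1": 0, "n2": 0, "n3": 1, "n4": 1, "n5": 2, "n6": 2}

# Hypothetical repair of n1: n2 shares its rack, n3 and n5 do not.
transfers = [("n2", "n1", 4), ("n3", "n1", 2), ("n5", "n1", 2)]

repair_bandwidth = cross_rack_bandwidth(transfers, rack_of)
print(repair_bandwidth)
```

In this toy example only the transfers from `n3` and `n5` contribute to the repair bandwidth, since the transfer from `n2` stays inside rack 0.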
In this thesis, we consider the framework of rack-aware cooperative regenerating codes for the case of multiple node failures, where the node failures are uniformly distributed among a certain number of racks. We characterize the storage-repair bandwidth tradeoff and derive its minimum storage and minimum repair bandwidth points. We make use of both the centralized and the cooperative models to arrive at the tradeoff and at the minimum storage and minimum bandwidth points. We also provide constructions of minimum bandwidth rack-aware cooperative regenerating codes for all parameters.
In the next part of the thesis, we present our implementation of Clay Codes in Hadoop. Hadoop is a software platform that provides a framework for distributed storage. The Hadoop file system can store data according to different erasure coding algorithms. Some schemes, such as replication, XOR coding, and Reed-Solomon codes, are already implemented in the Hadoop source code. Clay Codes are vector codes, in which the data within a node is further divided into chunks. They are a type of high-rate minimum storage regenerating (MSR) code. For our work, we attempted an implementation of Clay Codes in Hadoop and discuss the challenges faced.
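As a point of reference for the simplest of the schemes mentioned above, here is a minimal sketch of single-parity XOR coding. It is a standalone illustration, not Hadoop's implementation and not a Clay Code; the function names `xor_chunks` and `recover` and the sample data are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_chunks(chunks: List[bytes]) -> bytes:
    """Bytewise XOR of equal-length chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def recover(stripe: List[Optional[bytes]]) -> bytes:
    """Rebuild the single missing chunk (marked None) from the survivors."""
    survivors = [chunk for chunk in stripe if chunk is not None]
    if len(survivors) != len(stripe) - 1:
        raise ValueError("single-parity XOR can repair exactly one erasure")
    # XOR of all surviving chunks (data and parity) equals the missing chunk.
    return xor_chunks(survivors)


# Hypothetical stripe: three equal-length data chunks plus one parity chunk.
data = [b"rack", b"clay", b"code"]
parity = xor_chunks(data)
stripe: List[Optional[bytes]] = data + [parity]

# Simulate the loss of the second data chunk and repair it.
stripe[1] = None
recovered = recover(stripe)
print(recovered)
```

Note that this repair reads every surviving chunk in full. Regenerating codes such as the MSR codes discussed above aim to reduce the amount of data downloaded during repair.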