December 2022
Kanak Garg received his Master of Science – Dual Degree in Computational Natural Sciences (CNS). His research work was supervised by Dr. Nita Parekh. Here’s a summary of his research work on Deep-StRIP: Deep Learning Approach for Structural Repeat Identification in Proteins:
Internal repeats are observed to be three times more likely in eukaryotic proteins than in prokaryotic ones, suggesting that they provide possible advantages to the organism. Proteins with eukaryote-specific functions, such as connective tissue proteins, cytoskeletal proteins, ribonucleoproteins, muscle proteins, brain and synaptic proteins, and cell adhesion proteins, are observed to have internal repeats. Another major reason to study protein repeats is their ability to confer multiple binding and structural roles on proteins. Tandem repeat proteins occur at a high frequency in humans (30%) and have been linked to various complex diseases, e.g., Leucine-rich repeats in Parkinson’s disease and ANK, HEAT, and ARM repeats in cancer. A large surface-to-volume ratio, good target affinity, smaller size, and a stable shape make repeat proteins important for biomedical applications. For example, designed ankyrin repeat proteins (DARPins) are being developed for targeted therapies. Because repeats are less constrained at the sequence level than at the structure level, detecting them at the structure level is desirable. Here we propose a deep learning-based approach for detecting two major classes of structural repeats, Class III and Class IV in Kajava’s classification, which together cover over 90% of known structural repeats, and for identifying their repeat regions. We approach the problem by exploiting the structural information present in the protein distance matrix, which captures the internal distances between residues in a protein chain. Performance is evaluated by comparison with other state-of-the-art methods and with annotations in UniProt. Lastly, we discuss the improvements made to the NAPS portal (Network Analysis of Protein Structures), which researchers use to visualize and analyze different protein structures.
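To make the idea of a protein distance matrix concrete, here is a minimal sketch in Python. It is an illustration, not the Deep-StRIP implementation: it assumes one representative 3D coordinate per residue (for instance the C-alpha atom, which the summary does not specify), and the function name `distance_matrix` and the toy coordinates are hypothetical.

```python
import math


def distance_matrix(coords):
    """Return the symmetric matrix of pairwise Euclidean distances.

    coords: list of (x, y, z) tuples, one per residue (assumed here to be
    C-alpha positions in angstroms; the summary does not state which atom is used).
    """
    n = len(coords)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            matrix[i][j] = d  # fill both triangles: the matrix is symmetric
            matrix[j][i] = d
    return matrix


# Hypothetical coordinates for a four-residue toy chain (not real protein data).
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
toy_matrix = distance_matrix(toy_coords)

for row in toy_matrix:
    print(" ".join(f"{value:6.2f}" for value in row))
```

In a matrix like this, structurally repeated units tend to show up as recurring patterns, which is presumably what makes it a convenient image-like input for a deep learning model.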