[month] [year]

Kritika Agrawal – Dual Degree CSE

Kritika Agrawal received her MS Dual Degree in Computer Science and Engineering (CSE). Her research work was supervised by Dr. Vikram Pudi. Here’s a summary of Kritika Agrawal’s MS thesis, “Scalable, Semi-Supervised Extraction of Structured Information from Scientific Literature”, as explained by her:

Scientific knowledge is one of the greatest assets of humankind. This knowledge is recorded and disseminated in scientific publications, and the body of scientific literature is growing at a tremendous rate. As scientific communities grow and evolve, there is high demand for better methods of finding relevant papers, comparing papers on similar topics, and studying trends in the research community. When researchers start working on a problem, they want to know whether it has been solved before, which methods have been used to solve it, how important it is, and what its applications are. Automatic methods for processing and cataloging this information are needed to help scientists navigate it and to facilitate automated reasoning, discovery, and decision-making on the data. All these tasks share a common problem: extracting structured information from scientific articles. This creates a need for automatic ways of extracting such structured information from the vast body of raw scientific literature, which can help summarize both individual research papers and the research community.
This thesis focuses on processing scientific articles and creating structured repositories such as knowledge graphs to find new information and make scientific discoveries. We propose a novel, scalable, semi-supervised method for extracting relevant structured information from raw scientific literature. We extract the fundamental concepts of aim, method, and result from scientific articles and use them to construct a knowledge graph. The algorithm makes use of domain-based word embeddings and a bootstrapping framework. In addition to the title and abstract, on which most prior work has relied, our approach also uses citation context. We show how the extracted concepts and the available citation graph can be used to represent the research community as a knowledge graph.
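To make the knowledge graph idea concrete, here is a minimal sketch of how extracted aim, method, and result concepts could be combined with citation links into (subject, relation, object) triples. It is only an illustration under stated assumptions: the sample records, the relation names (`has_aim`, `has_method`, `has_result`, `cites`), and the helper functions are hypothetical and are not taken from the thesis, which may represent its graph quite differently.

```python
# Hypothetical extracted records; IDs, concepts, and citations are illustrative only.
papers = [
    {"id": "P1", "aim": "named entity recognition",
     "method": "conditional random field", "result": "improved F1", "cites": []},
    {"id": "P2", "aim": "named entity recognition",
     "method": "bidirectional LSTM", "result": "improved F1", "cites": ["P1"]},
]


def build_knowledge_graph(papers):
    """Return a set of (subject, relation, object) triples.

    Assumed schema: each paper links to its aim, method, and result
    concepts, and to every paper it cites.
    """
    triples = set()
    for paper in papers:
        for relation in ("aim", "method", "result"):
            concept = paper.get(relation)
            if concept:
                triples.add((paper["id"], f"has_{relation}", concept))
        for cited in paper.get("cites", []):
            triples.add((paper["id"], "cites", cited))
    return triples


def papers_with(triples, relation, concept):
    """List the papers connected to a concept through the given relation."""
    return sorted(s for s, r, o in triples if r == relation and o == concept)


graph = build_knowledge_graph(papers)
same_aim = papers_with(graph, "has_aim", "named entity recognition")
```

With triples like these, a question such as "which papers share this aim?" becomes a simple lookup over the graph.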
We demonstrate our method on a sizeable multi-domain dataset built with the help of the DBLP citation network. Our experiments show the domain independence of our algorithm and that our system achieves precision and recall comparable to the state of the art. The tremendous number of research publications available online aim to solve many interesting problems. Some fields have been studied well, and their research problems have been solved over time. However, a few hard research problems are not yet entirely solved and continue to interest many researchers. In this thesis, we also aim to identify research fields that are saturated and fields that have yet to be explored, by performing temporal analysis on top of the resulting knowledge graph.
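The temporal analysis can be pictured with a small sketch that counts how often a concept appears as a paper's aim in each year and compares early and recent activity. Everything here is an assumption made for illustration: the sample records, the `split_year` parameter, and the labels "growing", "declining", and "stable" are hypothetical, and the thesis's actual criteria for a saturated or under-explored field are not stated in this summary.

```python
from collections import Counter

# Hypothetical (publication year, aim concept) pairs read off a knowledge graph.
records = [
    (2010, "spam filtering"), (2011, "spam filtering"),
    (2011, "spam filtering"), (2016, "spam filtering"),
    (2015, "question answering"), (2016, "question answering"),
    (2017, "question answering"), (2017, "question answering"),
]


def yearly_counts(records, concept):
    """Count papers per year whose aim matches the given concept."""
    return Counter(year for year, aim in records if aim == concept)


def trend(records, concept, split_year):
    """Compare activity before and from split_year onward (a crude proxy)."""
    counts = yearly_counts(records, concept)
    early = sum(n for year, n in counts.items() if year < split_year)
    late = sum(n for year, n in counts.items() if year >= split_year)
    if late > early:
        return "growing"
    if late < early:
        return "declining"
    return "stable"


spam_trend = trend(records, "spam filtering", 2014)
qa_trend = trend(records, "question answering", 2014)
```

A field whose activity declines after years of heavy attention might be a candidate for "saturated", while a recently growing one might still be open; a real analysis would need far more data and a more careful definition.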