Ramkishore S received his MS Dual Degree in Computational Linguistics (CL). His research was supervised by Prof. Vasudeva Varma. Here’s a summary of his research work on Approaches to solving Single Document Extractive Summarisation:
Summarization involves reducing the size of a document without losing its important content; the reduced content is called the summary of the document. Summarization falls into two broad categories: extractive and abstractive. In extractive summarization, the summary is constructed by extracting phrases and sentences from the document itself. This work is mainly concerned with single-document extractive text summarization and with deep-neural-network-based solutions to it. On the surface, extractive summarization looks trivial, since it only involves copying content from the source document; in reality, it is far from it. One major source of difficulty is the absence of gold labels indicating which sentences or phrases to extract. Extractive systems are typically evaluated against human-written summaries, but these summaries cannot be used directly to train neural networks, so indirect methods of labeling or training are commonly used.

Extractive summarization can be divided into two steps: document understanding and document pruning. A document is not simply a collection of sentences; there is an underlying structure to how the sentences are arranged that makes the document coherent and meaningful, and summarization can be said to exploit exactly this structure (among other things) to determine which sentences are most important and relevant. Most works assume that the structure of a document is linear and use recurrent networks to encode it, but Rhetorical Structure Theory (RST) showed that documents can be represented by tree-like structures. One of the first things we explore is whether non-linear, tree-based networks can encode the document. Since learning to summarize involves implicitly learning to understand documents, we also examine whether such structures can be captured implicitly while learning to summarize, and whether they aid summarization.

Another source of difficulty is that extractive summarization differs from sentence ranking. Sentence ranking orders sentences by how well each one, on its own, represents the document; in summarization, however, the selection of each sentence also depends on the other sentences selected. This makes the task very complex, because the number of possible subsets of sentences grows exponentially. Each such subset can be called a candidate summary of the document: a document of 25 sentences already has over two thousand candidate summaries counting only those of two or three sentences (25 choose 2 = 300 and 25 choose 3 = 2,300, for 2,600 in all). This combinatorial explosion must be overcome by indirect means, and many works deal with it in their own way. Here we explore several novel methods for doing so. We introduce a new pipelined architecture for extractive summarization that addresses precisely this problem. We also show how complex reinforcement learning (RL) formulations can be adapted to extractive summarization. Most works train their models with one of two formulations, policy-gradient-based algorithms or deep Q-networks; in this work, we show how both can be improved. We also experiment with neural-network-based loss functions for summarization.
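To make the policy-gradient formulation concrete, here is a minimal REINFORCE sketch in PyTorch. It is not the thesis's actual model: the linear scorer, the per-sentence Bernoulli selection policy, and the unigram-overlap reward standing in for ROUGE are all illustrative assumptions. The point it shows is that the reward is computed on the sampled summary as a whole, so no per-sentence gold labels are needed.

```python
import torch
import torch.nn as nn

class SentenceScorer(nn.Module):
    """Toy per-sentence scorer; a real model would first encode the document."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, 1)

    def forward(self, sent_embs):                  # (num_sents, dim)
        return self.linear(sent_embs).squeeze(-1)  # (num_sents,)

def unigram_f1(cand_tokens, ref_tokens):
    """Crude unigram-overlap F1, assumed here as a stand-in for ROUGE."""
    cand, ref = set(cand_tokens), set(ref_tokens)
    overlap = len(cand & ref)
    if overlap == 0:
        return 0.0
    p, r = overlap / len(cand), overlap / len(ref)
    return 2 * p * r / (p + r)

def reinforce_step(scorer, optimizer, sent_embs, sentences, reference):
    # Policy: include each sentence independently with prob sigmoid(score).
    probs = torch.sigmoid(scorer(sent_embs))
    dist = torch.distributions.Bernoulli(probs)
    actions = dist.sample()                        # one sampled extract (0/1 per sentence)
    extract = [s for s, a in zip(sentences, actions.tolist()) if a > 0.5]
    # Summary-level reward: scored against the human-written summary as a whole.
    reward = unigram_f1(" ".join(extract).split(), reference.split())
    # REINFORCE: minimize -reward * log pi(sampled extract).
    # (Most works subtract a baseline reward to reduce variance; omitted here.)
    loss = -reward * dist.log_prob(actions).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward

torch.manual_seed(0)
sents = ["the cat sat on the mat", "stocks fell sharply",
         "the cat later slept", "the weather was mild"]
embs = torch.randn(len(sents), 16)                 # stand-in sentence embeddings
scorer = SentenceScorer(16)
opt = torch.optim.Adam(scorer.parameters(), lr=1e-2)
for _ in range(200):
    reinforce_step(scorer, opt, embs, sents, "the cat sat and then slept")
```

In practice the sampled reward is usually compared against a baseline, such as the score of the greedily decoded summary (self-critical training), since raw REINFORCE gradients are very high-variance.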
Finally, we look at the problems that can arise when training extractive summarization models while trying to circumvent this combinatorial explosion. Many works simplify their training objective by approximating summarization as sentence classification, or train their models on a very limited set of samples. This creates serious issues in what the model actually learns. We demonstrate this through experiments and analysis, and also sketch a possible solution.
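The usual reduction to sentence classification builds "oracle" labels greedily: repeatedly add whichever document sentence most improves ROUGE against the human-written summary, then train a classifier on the resulting binary labels. A minimal sketch of this labeling step, again assuming a unigram-overlap F1 in place of a real ROUGE implementation:

```python
def unigram_f1(cand_tokens, ref_tokens):
    """Crude unigram-overlap F1, assumed here as a stand-in for ROUGE."""
    cand, ref = set(cand_tokens), set(ref_tokens)
    overlap = len(cand & ref)
    if overlap == 0:
        return 0.0
    p, r = overlap / len(cand), overlap / len(ref)
    return 2 * p * r / (p + r)

def greedy_oracle_labels(sentences, reference, max_sents=3):
    """Greedily pick sentences that most improve overlap with the reference,
    and return one binary label per sentence."""
    ref_tokens = reference.split()
    selected, best_score = [], 0.0
    while len(selected) < max_sents:
        best_gain, best_idx = 0.0, None
        for i in range(len(sentences)):
            if i in selected:
                continue
            cand = " ".join(sentences[j] for j in selected + [i]).split()
            gain = unigram_f1(cand, ref_tokens) - best_score
            if gain > best_gain:
                best_gain, best_idx = gain, i
        if best_idx is None:    # no remaining sentence improves the score
            break
        selected.append(best_idx)
        best_score += best_gain
    return [int(i in selected) for i in range(len(sentences))]

doc = ["the cat sat on the mat", "stocks fell sharply",
       "the cat later slept", "the weather was mild"]
print(greedy_oracle_labels(doc, "the cat sat and then slept"))  # [1, 0, 1, 0]
```

Because the greedy oracle commits to a single extract, sentences belonging to equally good alternative summaries are labeled negative, which is one way such simplified objectives can distort what the model learns.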