Arnav Kapoor received his MS Dual Degree in Computer Science and Engineering (CSE). His research work was supervised by Prof. Ponnurangam Kumaraguru. Here’s a summary of his research work on “Justice delayed is justice denied: enabling legal artificial intelligence via bail prediction on Hindi case documents”:
Many populous countries, including India, are burdened with a considerable backlog of legal cases. Automated systems that process legal documents and assist legal practitioners could help mitigate this backlog. However, there is a dearth of the high-quality corpora needed to develop such data-driven systems. The problem is even more pronounced for low-resource languages such as Hindi. Additionally, bail cases are among the most common and time-sensitive cases handled by the courts.
In this Thesis, we first introduce the Hindi Legal Documents Corpus (HLDC), a corpus of more than 900K legal documents in Hindi. Documents are cleaned and structured to enable the development of downstream applications. We then introduce and tackle the task of bail prediction. We select the bail cases from HLDC and further extract the facts and arguments, as well as the judge’s summary. We experiment with a battery of models and propose a Multi-Task Learning (MTL) based model for this task. Our MTL model uses summarization as an auxiliary task along with bail prediction as the main task.
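As a rough illustration of how a main task and an auxiliary task can be trained together, the sketch below combines a bail-prediction loss with a sentence-salience loss as a weighted sum. This is not the thesis's implementation: the function names, the framing of summarization as per-sentence salience labels, and the `aux_weight` value are all assumptions made for this example.

```python
import math


def binary_cross_entropy(prob: float, label: int, eps: float = 1e-12) -> float:
    """Negative log-likelihood of a binary label under a predicted probability."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def multitask_loss(bail_prob, bail_label, salience_probs, salience_labels,
                   aux_weight=0.5):
    """Weighted sum of a main-task loss and an auxiliary-task loss.

    bail_prob / bail_label: predicted probability and gold label for bail
        granted (1) or denied (0), i.e. the main task.
    salience_probs / salience_labels: per-sentence probabilities and gold
        labels for "this sentence belongs in the summary" (the auxiliary
        task, assumed here to be extractive).
    aux_weight: hypothetical hyperparameter balancing the two tasks.
    """
    if len(salience_probs) != len(salience_labels):
        raise ValueError("salience_probs and salience_labels must align")
    main = binary_cross_entropy(bail_prob, bail_label)
    if salience_probs:
        aux = sum(
            binary_cross_entropy(p, y)
            for p, y in zip(salience_probs, salience_labels)
        ) / len(salience_probs)
    else:
        aux = 0.0
    return main + aux_weight * aux
```

In a setup like this, setting `aux_weight` to zero recovers a plain bail classifier, which makes it straightforward to compare the model with and without the auxiliary signal.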
The intermediate summarization step is a novel introduction that serves two purposes. First, it reduces the document size without compromising on the information. Since many transformer models have a constraint on input length, sending a summarized version of the documents allows us to overcome this barrier. Second, it builds towards explainable legal NLP systems, as it allows us to identify salient sentences.
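The two purposes above can be sketched with a small extractive selection routine: given a salience score for each sentence, keep the highest-scoring sentences that fit within an input budget and return them in their original order. This is an illustrative sketch only. The function name, the greedy selection strategy, and the use of whitespace-separated words as a stand-in for a model's tokenizer are assumptions, and the scores would come from whatever salience model is used.

```python
def select_salient_sentences(sentences, scores, max_tokens):
    """Greedily pick the highest-scoring sentences that fit in max_tokens.

    sentences: list of sentence strings from one document.
    scores: hypothetical salience scores, one per sentence.
    max_tokens: input budget; whitespace-separated words are used here as
        a rough proxy for the subword count a real tokenizer would give.
    Returns the selected sentences in their original document order.
    """
    if len(sentences) != len(scores):
        raise ValueError("sentences and scores must have the same length")

    # Visit sentence indices from most to least salient.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)

    chosen = []
    used = 0
    for i in ranked:
        length = len(sentences[i].split())
        if used + length <= max_tokens:
            chosen.append(i)
            used += length

    # Restore document order so the shortened text still reads coherently.
    return [sentences[i] for i in sorted(chosen)]
```

The selected sentences double as a simple explanation aid, since they show which parts of the document were passed on to the downstream classifier.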
This Thesis lays the foundation for research in legal NLP for Hindi court documents. The multitude of legal NLP tasks and challenges indicates the need for further research in this area.