Abdur Rahman Bin Mohammed Faizullah, supervised by Dr. Rahul Mishra, received his Master of Science in Computer Science and Engineering (CSE). Here is a summary of his research work on LimGen: Probing the LLMs for Generating Suggestive Limitations of Research Papers:
Examining limitations is a crucial step in the scholarly review process, revealing aspects where a study may fall short or require further work. This practice helps readers consider broader implications and identify potential areas for improvement and future investigation. In this thesis, we introduce the novel and challenging task of Suggestive Limitation Generation (SLG) for research papers.
To facilitate this task, we compile a comprehensive dataset, LimGen, which encompasses 4,068 research papers along with their associated limitations from the ACL Anthology. The LimGen dataset represents a significant resource for the academic community, providing a structured and extensive collection of limitations that can be used to train and evaluate models aimed at generating meaningful and constructive limitations.
We investigate several approaches to harness large language models (LLMs) for producing suggestive limitations. This involves a thorough examination of related challenges, practical insights, and potential opportunities. Our research delves into various techniques and methodologies, assessing their effectiveness and exploring ways to enhance the generation of suggestive limitations.
The challenges addressed in this thesis include the inherent difficulty of understanding complex research content, the need for generating limitations that are both relevant and insightful, and the integration of domain-specific knowledge to improve the quality of the generated limitations. We also discuss the practical implications of our findings, providing valuable insights for researchers and practitioners looking to leverage LLMs for this purpose.
We experimented with various setups, including summarization-specific pre-trained models such as BART and PEGASUS, and generative models such as T5, Cerebras-GPT, and Llama 2. To address the length constraint on input papers, we employed the Dense Passage Retrieval (DPR) approach, which retrieves the passages most relevant to limitation generation so that context is preserved within the model's input budget. Additionally, we implemented a chain modeling technique that iteratively refines the generated limitations for better coherence and relevance.
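The retrieval step above can be illustrated with a minimal sketch. This is not the thesis code: DPR actually uses dense BERT-based query and passage encoders, which are replaced here by toy bag-of-words vectors so the ranking logic stands alone; the function names and the example passages are illustrative assumptions.

```python
from collections import Counter
from math import sqrt

def embed(text):
    # Toy bag-of-words vector; real DPR uses dense neural encoders.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_passages(paper_passages, query, top_k=2):
    # Rank paper passages by similarity to the query and keep the
    # top-k, so the concatenated input fits the model's context window.
    q = embed(query)
    ranked = sorted(paper_passages, key=lambda p: cosine(embed(p), q), reverse=True)
    return ranked[:top_k]
```

In the full pipeline, the retained passages would be concatenated and passed to the generator in place of the full paper text.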
A comprehensive evaluation of the proposed models was conducted using automatic (ROUGE-based), human (manual expert-based), and model-assisted (LLM-based) approaches. The results indicate that models trained specifically for summarization do not effectively generate insightful limitations, highlighting the complexity of the SLG task compared to traditional summarization.
Our contributions include the development of the LimGen dataset, the proposal of several LLM-based schemes for SLG, and a thorough evaluation of these methods. The LimGen dataset and accompanying code are made available to the research community to foster further exploration and development in this area. Our work aims to contribute to the advancement of automated tools for research evaluation, ultimately aiding in the improvement of research quality and the acceleration of scientific discovery.
July 2025

