The Web Conference

Prateek Sancheti, working with Dr. Kavita Vemuri, presented a paper titled "LLM Driven Web Profile Extraction for Identical Names" at the Information Retrieval Meets Large Language Models Workshop at The Web Conference (A*), held in Singapore from 13 to 17 May 2024. The other author of the paper is Prof. Kamalakar Karlapalem. Here is a summary of the research work, as explained by the authors:

The number of individuals with identical names on the internet is increasing, which makes searching for a specific individual tedious. The user must sift through many profiles with identical names to reach the individual of interest. An individual's online presence forms that individual's profile. We need a solution that helps users by consolidating the profiles of such individuals: retrieving factual information available on the web and presenting it as a single result.

We present a novel solution that retrieves web profiles belonging to people who share an identical full name through an end-to-end pipeline. Our solution involves information retrieval from the web (extraction), LLM-driven Named Entity Extraction (retrieval), and standardization of facts using Wikipedia, which returns profiles with fourteen multi-valued attributes. Profiles that correspond to the same real-world individual are then determined. We accomplish this by identifying similarities among profiles based on the extracted facts using a Prefix Tree-inspired data structure (validation) and by utilizing ChatGPT's contextual comprehension (revalidation). The system offers three levels of strictness when consolidating these profiles: strict, relaxed, and loose matching. The novelty of our solution lies in the innovative use of GPT, a highly powerful yet unpredictable tool, for such a nuanced task. A study involving twenty participants, along with other results, found that one could effectively authenticate information for a specific individual.
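
To make the consolidation step more concrete, here is a minimal sketch of how profiles with multi-valued attributes might be grouped under three strictness levels. It is not the authors' implementation. The summary above does not define the strict, relaxed, and loose rules, so the thresholds below are assumptions chosen for illustration. The sketch also swaps the paper's Prefix Tree-inspired structure for a plain pairwise comparison with union-find grouping. All names (`same_person`, `consolidate`) and all attribute values are hypothetical.

```python
from typing import Dict, List, Set

# A profile maps an attribute name to a set of values (multi-valued attributes).
Profile = Dict[str, Set[str]]


def shared_attributes(a: Profile, b: Profile) -> int:
    """Count attributes on which both profiles have at least one common value."""
    return sum(1 for key in a.keys() & b.keys() if a[key] & b[key])


def conflicting_attributes(a: Profile, b: Profile) -> int:
    """Count attributes filled in both profiles that have no value in common."""
    return sum(
        1 for key in a.keys() & b.keys()
        if a[key] and b[key] and not (a[key] & b[key])
    )


def same_person(a: Profile, b: Profile, mode: str = "strict") -> bool:
    """Decide whether two profiles match. The rules are illustrative assumptions."""
    shared = shared_attributes(a, b)
    conflicts = conflicting_attributes(a, b)
    if mode == "strict":   # assumed rule: two agreeing attributes, no conflicts
        return shared >= 2 and conflicts == 0
    if mode == "relaxed":  # assumed rule: two agreeing attributes, conflicts tolerated
        return shared >= 2
    if mode == "loose":    # assumed rule: a single agreeing attribute is enough
        return shared >= 1
    raise ValueError(f"unknown mode: {mode}")


def consolidate(profiles: List[Profile], mode: str = "strict") -> List[List[int]]:
    """Group profile indices that transitively match, using union-find."""
    parent = list(range(len(profiles)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            if same_person(profiles[i], profiles[j], mode):
                parent[find(j)] = find(i)

    groups: Dict[int, List[int]] = {}
    for i in range(len(profiles)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up profiles for three search results that share one full name.
profiles = [
    {"employer": {"Acme Labs"}, "field": {"robotics"}, "city": {"Pune"}},
    {"employer": {"Acme Labs"}, "field": {"robotics"}, "city": {"Delhi"}},
    {"employer": {"Globex"}, "city": {"Delhi"}},
]

for mode in ("strict", "relaxed", "loose"):
    print(mode, consolidate(profiles, mode))
```

With these made-up profiles, strict matching keeps all three apart, relaxed matching merges the first two, and loose matching merges all three. That is the kind of trade-off the three levels are meant to expose, though the paper's actual criteria may differ.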

Conference page: https://www2024.thewebconf.org/

Link to full paper: https://dl.acm.org/doi/10.1145/3589335.3651946

August 2024