Information and The Art of Relevant Retrieval

Who amongst us is not guilty of running a quick internet search in the middle of an argument to prove a point? And which millennial parent has not turned to the Internet to answer their child’s queries? Simply put, the process by which relevant answers are fetched for us is known as Information Retrieval (IR). The field has advanced rapidly, and its applications can be found across diverse domains.

International Conference on IR

Last month saw the conclusion of the 40th annual European Conference on Information Retrieval (ECIR) in Grenoble, France. This is the premier European forum for the presentation of new research results in the field of Information Retrieval. According to the official website, ECIR has traditionally had a strong student focus, and papers whose sole or main author is a postgraduate student or postdoctoral researcher are especially welcome.

In keeping with this focus, a delegation of students from IIIT-H, led by Prof. Vasudeva Varma from the Information Retrieval and Extraction (IRE) Lab, attended the conference and made a total of 8 presentations. Two of the student delegates, Raksha Jalan and Pinkesh Badjatiya, were also accepted under the ECIR-Grenoble grant program, which provides stipends to cover registration fees and travel expenses for full-time students wishing to attend ECIR 2018 in Grenoble.

Demos, Posters and Paper Presentations

Over the 4 days of the conference, a total of 39 full papers and 39 short papers were presented, together with 6 demos, 5 workshops, and 3 tutorials. These were reviewed and selected from over 300 submissions. Accepted papers covered the state of the art in information retrieval, including topics such as topic modeling, deep learning, evaluation, user behaviour, document representation, recommendation systems, retrieval methods, learning and classification, and micro-blogs. The student delegates were unanimous that the conference was a great opportunity not only to showcase their path-breaking research in the area but also to interact and network with well-known researchers such as Fernando Diaz, Director of Research at Spotify, and Bhaskar Mitra, Principal Applied Scientist at Microsoft Research, to name a few.

Here’s a brief summary of the presentations made by IIIT-H students, which included demos, posters, long paper presentations and workshop talks.

SIREN

Fourth year dual degree student Shriyansh Agrawal successfully demonstrated a security-focused Domain Specific Search Engine (DSSE) named SIREN, an acronym for Security Information Retrieval and Extraction eNgine. He says that, building on the growing popularity of domain-specific search engines (or vertical search engines), which focus on one area of knowledge, the team came up with a security-based search engine. “Our interactions with Chief Information Security Officers in the banking sector and other security experts reiterated the need for a security information search site.”

According to the paper, authored by Lalit. S. Mohan, SIREN is a security search engine “that aims to provide details on (i) vulnerabilities, threats, incidents, controls and advisories (ii) disambiguated and relevant search results ranked based on the credibility.”

Medical Forum Question Classification Using Deep Learning

Raksha Jalan, a 2nd year MS by Research student, built a model that automatically classifies questions asked on health forums according to the user’s intention. She says that thousands of questions on public forums remain unanswered or get very late responses. An automatic question classifier can direct questions to specific experts according to their topic preferences, so that users get quicker and better responses. Although the model was designed primarily for health forums, it can be used for other Question Answering platforms.

Attention-based Neural Text Segmentation

Pinkesh Badjatiya says that the idea of the research on “Text Segmentation” is to teach machines to create logical paragraphs in a text by looking at thousands or millions of examples of sections already created by humans on Wikipedia. This task is useful in numerous applications, such as Web page understanding, question-answering, creating summaries of conversations, and extracting snippets of important text from emails for notifications. The study attempts to teach machines to focus selectively on sentences based on their importance in the text, just as humans would, to improve the quality of the generated snippets.
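For readers curious about what "selectively focusing" on sentences can look like in practice, here is a minimal sketch of the general attention idea. It is not the model from the paper: the function names, the toy 2-d sentence vectors and the scores are all hypothetical, chosen only to show how raw importance scores can be turned into weights that decide how much each sentence contributes.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(sentence_vectors, scores):
    """Return attention weights and the weighted average of the vectors."""
    weights = softmax(scores)
    dim = len(sentence_vectors[0])
    context = [
        sum(w * vec[i] for w, vec in zip(weights, sentence_vectors))
        for i in range(dim)
    ]
    return weights, context

# Hypothetical toy input: three sentences as 2-d vectors with made-up scores.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [2.0, 0.5, 0.5]
weights, context = attend(vectors, scores)
```

In this toy setup the first sentence has the highest score, so it receives the largest weight and dominates the combined representation. In a trained model the scores would be learned from data rather than set by hand.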

Explicit Modelling of the Implicit Short Term User Preferences for Music Recommendation

Kartik Gupta presented this paper, which proposes a new model for capturing a user’s short-term interests while listening to music. The model uses the Last.fm tags of songs the user has listened to and recommends the next song based on them. The approach can also state explicitly which features of a song the user is giving importance to at any given point in time.
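To make the idea of tag-based, short-term recommendation concrete, here is a rough sketch. It is an illustration only and not the model proposed in the paper: the recency decay, the scoring rule, the function names and the song and tag data are all assumptions made up for this example.

```python
from collections import Counter

def short_term_profile(recent_songs, tags_by_song, decay=0.5):
    """Weight tags from recently played songs; newer plays count more."""
    profile = Counter()
    weight = 1.0
    for song in reversed(recent_songs):  # most recent play first
        for tag in tags_by_song.get(song, []):
            profile[tag] += weight
        weight *= decay
    return profile

def recommend_next(recent_songs, tags_by_song, candidates):
    """Pick the candidate whose tags best match the short-term profile."""
    profile = short_term_profile(recent_songs, tags_by_song)

    def score(song):
        return sum(profile[tag] for tag in tags_by_song.get(song, []))

    best = max(candidates, key=score)
    # The top-weighted tags act as an explanation of the choice.
    return best, profile.most_common(2)

# Hypothetical songs and tags, invented for this example.
tags_by_song = {
    "song_a": ["rock", "guitar"],
    "song_b": ["rock", "live"],
    "song_c": ["jazz", "piano"],
    "song_d": ["rock", "guitar", "live"],
    "song_e": ["jazz", "sax"],
}
history = ["song_c", "song_a", "song_b"]  # oldest to newest
best, top_tags = recommend_next(history, tags_by_song, ["song_d", "song_e"])
```

Because the two most recent plays are tagged "rock", the sketch picks song_d, and the highest-weighted tags show which features drove that choice.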

Extracting relevant medical info from Tweets

In the field of health applications, Shashank Gupta and his team proposed a method of retrieving data on Adverse Drug Reactions from Tweets. The study, titled “Co-training for Extraction of Adverse Drug Reaction Mentions from Tweets”, says that current adverse drug reaction (ADR) surveillance systems often involve a substantial time lag before such events are officially published. Online social media such as Twitter, on the other hand, contain information about ADR events in real time, well before any official reporting. A poster was also presented on “Multi-Task Learning for Extraction of Adverse Drug Reaction Mentions from Tweets”. In it, the authors proposed two methods based on multi-task learning to tackle the scarcity of labeled data for extracting adverse drug reaction mentions.

Second International Workshop on Recent Trends in News Information Retrieval

As per the abstract posted on the website, “Although IR and NLP have been applied to news for decades, the changing nature of the space requires fresh approaches and a closer collaboration with our colleagues from the journalism environment.” The goal of the workshop was therefore to stimulate discussion between the communities and to share interesting approaches to solving real user problems, such as users being overwhelmed by the volume and diversity of news now available and being unaware of how the stories they see in various feeds are selected.

There were two workshop presentations in this field of news IR: Neural Content-Collaborative Filtering for News Recommendation and Estimating Credibility of News Authors from their WIKI Validated Predictions. The former was guided by Dr. Vasudeva Varma and the latter by Dr. Kamalakar Karlapalem.

Student Impressions

Shriyansh Agrawal mentions that Grenoble, like many other European cities, has a good number of people from the Indian diaspora. He says that they were particularly delighted to connect with IIIT-H alumni at the conference. “There were a few PhD students at the University of Grenoble who had done their Masters from IIIT-H. A team from Flipkart comprising IIIT-H graduates presented their poster too. We had a great time together with Prof. Vasudeva, and talks with him over the conference dinner gave us some very good insights on a lot of things, from academia to taking important life decisions,” he signs off.

Sarita Chebbi

Sarita Chebbi is a compulsive early riser. Devourer of all news. Kettlebell enthusiast. Nit-picker of the written word especially when it’s not her own.
