[month] [year]

Nikhil Pinnaparaju – Dual Degree CL

Nikhil Pinnaparaju received his MS (Dual Degree) in Computational Linguistics (CL). His research work was supervised by Prof. Vasudeva Varma. Here's a summary of Nikhil Pinnaparaju's thesis, Exploring Content and Spread of Misinformation Online:

With the rise in popularity of social networks, news is no longer monopolized by newspapers, TV and radio. Most young people get their information about daily events from Twitter's Trending page, Facebook's News Updates and other such online forums, and traditional print media is no longer the preferred way to access information. However, this shift brings an unprecedented problem: content verification. Social media platforms have empowered many by giving a voice to people in remote corners of the world, but they have also become a double-edged sword. Anyone can now tweet or post whatever they want online, without any factual evidence. Such posts can gain traction and spread misleading information, confusing unsuspecting readers. Twitter alone sees an average of around 6,000 tweets per second, which translates to roughly 500 million tweets per day. Manually fact-checking content at this scale is impossible, and that is for just one platform. Hence we turn to automated approaches to the problem.

While the sheer volume already poses a huge challenge, the problem is compounded by the way fake news spreads. A study co-authored by Sinan Aral of the MIT Sloan School of Management showed that falsehood diffused significantly farther, faster, deeper and more broadly than the truth on Twitter. This echoes the quote often attributed to Mark Twain that "a lie can travel halfway around the world while the truth is putting on its shoes." It also makes automated detection harder: because the problem is time-sensitive, we need a system that is not only accurate but also fast.

Recent advances in machine learning and deep learning have produced strong results on classification problems, and we explore how effective these techniques are against fake news and misinformation. In this thesis, we build $T^3N$ (Text and Temporal Tree Network), a novel architecture for fast and accurate fake news detection. $T^3N$ uses deep learning based architectures to encode text, user and tree information in a temporal-aware manner. In Chapter 4, we look at identifying fake news spreaders on social media. Empirical observations showed that the tweets a user shares are a significant signal of whether that user is a fake news spreader, so we identify spreaders based on the content they share. Chapter 5 studies multi-modal fake news, consisting of text and accompanying images, and examines the relationship between a sample's images and its text. Finally, in Chapter 6, we look at the problem in the Indian context. India, a country of 122 major languages and 1,599 other languages, is bound to see fake news in regional languages. We put forward a first Indian-language dataset for fake news and examine whether it can help push the frontier of work in regional languages.