Suhan Prabhu received his MS-Dual Degree in Computational Linguistics (CL). His research was supervised by Dr. Manish Shrivastava.
Here’s a summary of Suhan Prabhu’s thesis, "Exploring Event Extraction through Languages", as explained by him:
The word "event" is commonplace in any language. It usually refers to something happening, used when we need to single out that happening specifically. News stories often cover "current events" as a serialized list of occurrences that together give a self-sufficient understanding of a real-world situation, abstracted to just one question: "What took place?"
As a trilingual speaker of languages from three different language groups (English, Hindi, and Kannada), I find the diversity in how temporal phenomena are represented fascinating. From morphology all the way to how the notion of time itself is treated, almost everything differs significantly across these languages. That is not to say, however, that they share no common properties. Because event representation in all these languages aims to capture temporally significant phenomena, some common properties emerge from the various syntactic, semantic, and morphological features used in these representations.
This thesis is the culmination of work on event detection, annotation, and analysis. It presents large-scale event detection for low-resource languages from two perspectives: linguistic and computational. From the linguistic perspective, we discuss the creation of language-specific event annotation and representation tasks for Kannada, a morphologically rich, resource-poor Dravidian language, and Hindi, a widely spoken Indo-Aryan language. The Kannada annotation is one of the first attempts at large-scale, discourse-level annotation for the language, and it can be reused for semantic annotation and corpus development for other tasks. From the computational perspective, we investigate leveraging information from resource-rich languages, using transfer learning to detect events in resource-poor settings. We present ALINED, a Language Invariant Neural Event Detection architecture, which aggregates sub-word-level features with lexical and structural information.
This thesis adds to the literature, corpora, and machine learning models available for Indian languages, which are known to be resource-starved. It also provides a mechanism for leveraging resource-rich languages to improve the state of resource-poor ones. It is my hope that this work can serve as a stepping stone towards further progress in NLP for Indian languages.