Sriram Chaudhury received his doctorate in Computational Linguistics (CL). His research work was supervised by Prof. Dipti Mishra Sharma. Here’s a summary of his research work on Mutual-Bootstrapping for Language Resource Development:
Language is the most important medium of communication and education. In a multilingual country like India, which has 22 scheduled languages, it is a matter of national policy to endorse and promote the development of all these languages and to reduce the language barrier. Giving people universal access to information in their own language is a primary concern and need of society. In the current technological revolution, access to information and communication technology (ICT) and human language technology (HLT) will make it easier for people to share in its benefits. Although several NLP tools and approaches are available, the scarcity of good-quality benchmark data hampers the development of practically usable systems.
Machine Translation (MT) systems play a major role in bridging the language gap. However, MT has several pitfalls that arise from linguistic, cultural, and social factors. Hence, there is a need for an Accessor system that guarantees faithful translation. In this thesis, we have proposed and implemented an architecture for a usable system called ‘Anusaaraka’. The use of an expert system to build this combined Accessor and MT system is unique, and it helps ordinary users contribute to the further improvement of the system.
As noted above, no usable NLP application can be developed without good-quality linguistic resources. To address this issue, we have proposed a unique concept of ‘mutual-bootstrapping’, in which various linguistic and NLP tools interact with each other to bootstrap the development of high-quality linguistic resources. We have demonstrated the proposed model on the task of word and phrase alignment of a parallel corpus by building an ‘Alignment Tool’. We discuss methods for building high-quality linguistic resources, such as bilingual dictionaries, multiword expression dictionaries, treebanks, domain dictionaries, and technical dictionaries, through mutual interactions with the Alignment Tool.
Neural Machine Translation (NMT) systems now produce state-of-the-art results, but their output still contains errors and idiosyncrasies. We have applied the ‘mutual-bootstrapping’ approach to improve the accuracy of NMT, and we discuss in detail how direct pre-processing and post-processing with the Alignment Tool bootstrap NMT to produce high-quality translations. Throughout the thesis, we have emphasized the concept of ‘mutual-bootstrapping’ in developing various NLP tools and resources. The results of the tools are compared with other state-of-the-art results, and an error analysis is presented.