Praveen Krishnan received his doctorate in Computer Science and Engineering. His research work was supervised by Prof. CV Jawahar. Here’s a summary of Praveen Krishnan’s thesis, Learning Representations for Word Images, as explained by him:
Reading and writing documents are among the primary skills with which we gather and communicate information. With the emergence of Artificial Intelligence (AI), researchers are in constant pursuit of intelligent algorithms that can bring our physical and digital worlds closer together. One important domain in this effort is document image analysis, where we address the problem of understanding content from scanned document image collections. Considering “words” as the basic unit in understanding a document, this thesis addresses the problem of finding the best possible representation for word images.
Representation learning has been a key area of investigation for AI problems. The primary goal of this thesis is to learn efficient representations for word images that encode their content. An ideal representation should be invariant to different fonts and handwriting styles, and less sensitive to noise and distortions. In the past, representations have been handcrafted, specific to a modality (printed or handwritten), and sensitive to the complexities of handwriting in multi-writer scenarios. In this work, we choose the paradigm of learning from data using deep neural networks. We take our inspiration from the fact that, given large amounts of annotated data, modern deep neural networks can learn better representations. In this thesis, we also relax the need for large annotated datasets by relying heavily on synthetically generated images. We also introduce a novel problem of learning semantic representations for word images, which encode the semantics of the word and reduce the vocabulary gap that exists between the query and the retrieved results.