To make its products more relevant to Indians, Google is working on a host of initiatives customized for its Indian audience. From working with Indian language publishers, to bringing more relevant content online, to launching a new feature in Google Go that lets you listen to web pages, these efforts show that it has Indian users in mind. As part of the initiative to make its products available to more of the Indian population, Google India recently conducted a Hackathon focused on ‘building for India’. Open to amateur as well as professional developers across industry and academia, this day-long event, held simultaneously at the Google Hyderabad and Bangalore locations, saw various social initiative projects across two tracks: an Android track, where participants were invited to build mobile applications in areas such as education, healthcare, agriculture, traffic, pollution, water, fintech and infrastructure, and a Machine Learning track, where participants were provided with datasets focused on India and ideas for projects related to them. A host of goodies and giveaways, such as Chromecasts and Google Home devices, were announced for participants and winners.
Text Classification and Waste Segregation: ML Track
The Machine Learning track had problem statements that included creating ML models to categorize waste items from an image captured using a mobile phone, and building a model to predict farm output based on geographical and seasonal factors and recommend crops to farmers. A student team from the Search and Information Extraction Lab (SIEL), working under Prof. Vasudeva Varma, chose the problem statement of Text Classification, an area close to their own field of research. Their mandate was to build ML models to categorize customer support queries in the context of a public service provider in India. In this case, the focus was on the rural sector in general and the Indian farmer in particular. Bakhtiyar Syed, a dual degree student, and Vijayasaradhi Indurthi, a part-time student who works full-time at Teradata, formed the Z-Research team and decided to focus on query classification for Indian farmers.
Isha Dua, an MS by Research student with the Centre for Visual Information Technology (CVIT) who works under Prof. C.V. Jawahar, chose the problem of waste segregation. Apart from the general categorization of waste into dry waste [recyclables] and wet waste [food], the problem required further sub-categorizing the dry waste into paper, cans, plastics and others. Training data was provided for all models.
Z-Research
Speaking of the rationale behind the name, Bakhtiyar laughs as he says they wanted to stand out as a research team. “While the ‘Z’ doesn’t really mean anything, we wanted our team name to be classy. There were other team names such as ‘Looking For Jobs And Internships’ and so on. Ours stood for our work,” he says. Explaining what they did, Bakhtiyar says that, given a query from a farmer on Google, they had to automatically classify it into one of 10 categories, such as fertilizer use and availability, weather, varieties of crops and so on. Isha Dua, who did not hack away under a moniker, says that she chose the problem of waste segregation because of the necessity to deal with waste in today’s world. “Tons of plastic along with other waste is removed from the sea every day. Segregating that into different categories becomes challenging, so building a system that can automatically segregate waste from visual images becomes important. As soon as I saw this problem I developed a curiosity to solve it.”
Specific datasets were made available to all the teams, and the ML track was hosted on Kaggle, a website that hosts data science competitions.
Challenges
“The challenges we faced included dealing with the skewed datasets. It may sound easy in theory to create a baseline and apply it to the dataset but in reality, we had to account for the fact that not all users may be proficient in English. For example, a farmer could query a weather-related topic by typing in a misspelt ‘wether’, or maybe insert Hindi words in the English query. The dataset itself also had variations of Hindi words such as ‘nahi’ and ‘nahin’,” explains Bakhtiyar. Adding to his teammate’s explanation, Vijayasaradhi says, “People may be querying the same problem but the expression of their query differs. They could have compact queries such as ‘sugarcane crop output Summer’ rather than a complete sentence.”
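The two difficulties the team describes, noisy spelling and skewed classes, can be illustrated with a minimal sketch. This is not the Z-Research team's actual method, which the article does not detail. The variant table, the vocabulary, and the function names `normalize_query` and `class_weights` are all hypothetical, chosen only to mirror the examples quoted above.

```python
import difflib
import re
from collections import Counter

# Hypothetical table mapping transliteration variants to one canonical form.
VARIANTS = {"nahin": "nahi"}

# Hypothetical domain vocabulary used to repair misspelt English words.
VOCAB = ["weather", "fertilizer", "sugarcane", "crop", "output", "summer"]


def normalize_query(query):
    """Lowercase a query, unify known variants, and repair near-miss spellings."""
    tokens = re.findall(r"[a-z]+", query.lower())
    cleaned = []
    for tok in tokens:
        tok = VARIANTS.get(tok, tok)
        if tok not in VOCAB:
            # Replace the token only if a vocabulary word is very similar to it.
            match = difflib.get_close_matches(tok, VOCAB, n=1, cutoff=0.8)
            if match:
                tok = match[0]
        cleaned.append(tok)
    return cleaned


def class_weights(labels):
    """Inverse-frequency weights, so that rare categories count for more."""
    counts = Counter(labels)
    total, num_classes = len(labels), len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}
```

With this sketch, a query such as "Wether nahin good" normalizes to the tokens `weather`, `nahi`, `good`, and a label list with six weather queries and two fertilizer queries gives the fertilizer class three times the weight of the weather class. A classifier trained on the cleaned tokens could then use such weights to offset the skew.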
“For me, being a one-person team, I had less time to run more experiments and do more analysis. That was the biggest challenge,” says Isha. The problem came with approximately 2,500 images across 5 classes. “As the data was very limited, I tried data augmentation to increase the size to 5,000 images. These images were then normalized and I tried a few methods to classify them into 5 categories. I also did an analysis of why results can’t improve beyond a certain limit,” says Isha.
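As a rough illustration of how augmentation can double a dataset before normalization, consider the sketch below. The article does not say which augmentations or which normalization Isha used, so the horizontal flip and the scaling of 8-bit pixel values to the range 0 to 1 are assumptions, as are the function names. Images are represented as plain nested lists of greyscale values to keep the example self-contained.

```python
def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def augment(dataset):
    """Return the original (image, label) pairs plus one flipped copy of each.

    Assumption: a single flip per image, which exactly doubles the dataset.
    """
    augmented = list(dataset)
    for image, label in dataset:
        augmented.append((horizontal_flip(image), label))
    return augmented


def normalize(image):
    """Scale 8-bit pixel values (0 to 255) into the range 0.0 to 1.0."""
    return [[pixel / 255.0 for pixel in row] for row in image]


# Tiny 2x2 stand-in for a photograph of a waste item.
dataset = [([[0, 255], [128, 64]], "paper")]
doubled = augment(dataset)
prepared = [(normalize(image), label) for image, label in doubled]
```

Applied to roughly 2,500 labelled images, one flipped copy per image would yield about 5,000 training examples, matching the scale described above. A flip keeps the label valid because a mirrored can is still a can.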
IIIT-H’s Role
Walking away with a coveted Google Home as their prize, the Z-Research team reveals that it faced the most intense competition from its own IIIT-H colleagues. “There were many teams from IIIT-H itself. And some from our own lab too. Being in the Information Retrieval lab and under the guidance of Vasudeva Varma sir, we were motivated to give this Hackathon a shot. Almost everyone from IIIT-H who participated in the competition took part in the ML track. This speaks volumes about the quality of research that we’re doing out here,” says Bakhtiyar. Vijayasaradhi agrees, saying that the group setting inculcates a strong competitive spirit that rubs off, prompting participation in such events. “I didn’t have a love for Hackathons. But a love for machine learning, yes!”, beams Bakhtiyar.
Age is just a Number
For 37-year-old Vijayasaradhi, anything that constitutes a text classification problem is his area of expertise. “From fake news to bizarre news, click-baits, short text classification, very similar to the problem statement we had in the Hackathon,” says Vijayasaradhi. 21-year-old Bakhtiyar’s area of research deals with content repurposing or style transfer, where content from, say, a research paper is converted into a blog post, or the same piece of information is presented in a different fashion to different end-users. When not playing or watching football, Bakhtiyar confesses to being a gym rat. The more soft-spoken of the two, Vijayasaradhi, unwinds by playing the piano and escaping into the world of books. With the two sharing the same sense of nerdiness, the difference in their ages seems immaterial. The duo agrees that by working together they made up for their individual weaknesses, thereby creating a formidable team.
“Not only have we been working together on common research problems for a while now, we also have respect towards each other’s ideas. Our discussions were driven only by the strength of the ideas and age did not play any role in this matter”, remarks Vijayasaradhi sagely.
Thinking Out-Of-The-Box
24-year-old Isha says that she enjoys hacking, labelling it a “fun activity”. Having participated in Microsoft.Code.Fun.do earlier this year and walked away with first place there too, she is certainly no newbie hacker. “The organisers liked the approach I used to solve the problem which involved methods ranging from baseline to deep neural network. There were many strong teams I was competing with. In fact I was not even at the top of the leaderboard but I won because of the procedure I followed,” she says. Listing sketching, cooking, and exploring new things among her non-academic interests, Isha says she enjoys interacting with others, especially at events like Hackathons, “because it is fun to know others’ perspectives”. Debunking the myth that fewer women participate in coding events such as these, Isha remarks that a sizeable number of the participants were women.