How Personal Experiences And Conviction Are Helping This Startup Bridge Linguistic Barriers With Tech

Winner of the Bhashini Grand Innovation Challenge by MeitY, Govt. of India, eBhasha Setu is taking ‘knowledge for all’ very seriously with its language processing technology services.

There’s a rather poignant moment in the movie “12th Fail” where Manoj Kumar, the aspiring IAS candidate, discovers that he’s goofed up yet another shot at the UPSC exam by inadvertently writing a long essay on “Terrorism” in lieu of “Tourism in India”. Rashid Ahmad, CEO and Co-founder of eBhasha Setu – a language processing technology services company for Indian languages – understands this feeling of frustration and despondency only too well. As a vernacular student, he had his first brush with the linguistic divide in India when he was trying to enrol for a BSc in Mathematics at Aligarh Muslim University (AMU). Not only was the competitive exam experience painful, he says, but there was a sense of alienation after enrolment too. “I knew most of the mathematical concepts and was familiar with their terminology in Hindi, but in examinations, I would get flummoxed by the questions in English,” he recounts.

The Universe Conspires
As destiny would have it, a few years later, Rashid found himself in the thick of things at Sampark – an Indian Language to Indian Language Machine Translation (ILMT) project that had its genesis on the IIITH campus at its Language Technology Research Centre (LTRC). The project, steered by Prof. Rajeev Sangal, Prof. Dipti Misra Sharma and Dr. Vineet Chaitanya, pioneers of NLP in India, was run by a consortium of more than 11 Indian institutes and sought to create language technology for 9 Indian languages, resulting in the machine translation of 18 language pairs. It was under Prof. Sangal’s mentorship that Rashid was motivated to enrol for an MS by Research degree and later a PhD at LTRC. Pawan Kumar, co-founder, eBhasha Setu, explains that he and Rashid joined the ILMT project as part of a software engineering group representing Expert Software: “The project had a government mandate requiring an industry partner to ‘integrate heterogeneous research prototypes into a field deployable machine translation system’ – a lab to land journey.” Pawan’s conviction about language and language technologies stems from the belief that “if a nation wants to consider itself as a knowledge society, people at the grassroots should have access to information and knowledge in their local language that they can read, write, and understand in.” When the Sampark project officially came to an end in 2017, it made perfect sense for Pawan, Rashid and Sanket, another colleague, to transfer the technology and launch their own language technology startup at the Centre for Innovation and Entrepreneurship (CIE) at IIITH. They did this with the active encouragement of their mentors Dr. Mukul Sinha and Prof. Rajeev Sangal, and of MeitY, as well as initial seed funding from Dr. Sinha and angel investor Mr. Yogesh Andlay.

What They Do
“Our mission is to build a platform such that Indian languages can be the vehicle for transmission of knowledge and Science – भारतीय भाषा में ज्ञान-विज्ञान,” states Pawan. According to the team, knowledge or information is for all only if it is accessible, affordable and understandable. They believe that while technology has largely addressed accessibility and affordability, the problem of understanding needs significant effort and cannot be solved by technology alone. “Knowledge or access to knowledge has been made possible by cheaper devices like smartphones. Affordability has also been almost made possible by continuously falling prices of data networks, but making knowledge understandable is only possible by enabling communication or content in local languages,” says Pawan, while highlighting that all their language technology services are supplemented with an element of human intervention.

Language Solutions
Apart from using the Sampark machine translation system to translate one Indian language into another, the eBhasha Setu platform also utilises language technology components available in open source and builds NLP applications on top of them. Cognisant of the fact that no machine translation system available today is 100% accurate, Pawan emphasises that eBhasha Setu wants to provide trustworthy, publication-grade language services for text-to-text processing. Currently eBhasha Setu has built multiple language services platforms for different types of language processing tasks, including Transzaar – an end-to-end translation management system, Videozaar – a platform for video-to-video translation, Webzaar – a platform for website localization and Avataar – a digitization and translation platform that involves optical character recognition (OCR). For all these platforms they remain “technology agnostic”. “Whatever technology is available, we will plug it into our platform to provide productivity, accuracy and turnaround time benefits,” he says. To ensure accuracy, they have incorporated a human-in-the-loop step that raises the quality of the output to an acceptable level.

Indic Language Market
“Being a multilingual nation, there’s a huge requirement for language technology in India in healthcare, education and the judiciary,” remarks Dipti Misra Sharma, Prof. Emeritus, Machine Translation and Linguistics, IIITH. When CMC, Vellore approached her for help in translating Patient Consent Forms and Information Sheets into multiple Indian languages, she instantly referred the hospital to eBhasha Setu. “The information sheet typically tells the patient what they are getting into and the consent form is an official acknowledgement of this knowledge which they can then sign,” says Prof. Sharma. There is a legal mandate to make information about clinical trials or procedures accessible and understandable to patients in their own native language. For the illiterate or for those who are often confronted with these documents in an unfamiliar language and sign without understanding the implications, the eBhasha Setu team has successfully translated the documents into Bangla, Telugu, Malayalam, Tamil and Hindi.

The startup has also been a part of Swayam – a Ministry of Education initiative to host all courses from Grade 9 through post-graduation online, so that they are available free of cost to anyone who wishes to learn. While the original content of these courses was in English, the demand was for Indian languages. To make the courses accessible to the larger populace, a translation of all of them was mooted. “IIITH took up the project and translated content which was in the form of video lectures into 8 Indian languages. This meant that it first had to be transcribed, then translated, and we were supposed to provide subtitles. eBhasha worked with us and on top of machine translation, we had to manually edit and correct the machine-translated output,” narrates Prof. Sharma. Based on her experience with the project, she observes that though the eBhasha team took longer than other agencies to deliver the output, the quality of translation was top-notch. “For other agencies who were faster in translating, we had to do a lot of corrections. Of course, when you are trying to maintain quality, the time that you take is a little longer,” she muses.

What Makes Them Unique
The eBhasha team avers that their platform fills the gaps in translation accuracy left by popular machine translation engines. “When we say we are into NLP tech, we are not being puritanical. We are all for bringing in English vocabulary into the vernacular domain especially where it is needed. For instance, the word ‘tumour’ is widely accepted, so why translate that? Why not use the same word instead? The only thing we insist on is that it should be understandable, and it should improve the user’s productivity,” states Pawan emphatically. Terming translation a very ‘cognitive’ and ‘creative’ task, he mentions that they have used several engineering approaches to make their translation services more like a ‘factory mode of production’ to enable scale of operations.

Bhashini Grand Innovation Challenge
The startup recently won the Bhashini Grand Innovation Challenge, bagging 25 Lakhs and a contract with the government for translation and digitisation of records in 10 Indian languages. The competition is part of the National Language Technology Mission, which aims to provide an impetus to startups that provide language technology solutions. Pre-trained AI models in various Indian languages are made available to the competing startups as APIs via the Bhashini platform. The challenge involved two problem statements inviting solutions that used these APIs. One was live speech-to-speech translation and the other was document text translation of official governmental communication that is typically available in multiple regional languages. eBhasha Setu competed in the latter category. “We basically had two tasks – digitising the official documents which could be either printed or hand-written and then translating them into a familiar language before responding to them in the original language,” explains Rashid. The team says that their innovation, which they labelled ‘Avataar’, lay in scanning the documents and extracting relevant information so that it could be searchable in the digital medium and eventually translatable. The challenge was conducted in 4 phases spread over a period of 8 months.

The team demonstrated 4 different use cases of their OCR + translation solution. “Let’s say you have studied in a university in Hyderabad which has given you certificates in Telugu and now you want to go to another state for higher studies which requires the certificates in English or in another language. Our solution can do that. We demonstrated similar requirements in the case of companies dealing with consumer goods invoices in multiple languages, or real estate firms formulating lengthy rental agreements, or law firms with documented legal proceedings running into hundreds of pages where a lower court judgement in Hindi needs to be handed over to the High Court (which requires it) in English,” narrates Pawan.      

Language Of The People
“When the team first started off around 10 years ago, not many people appreciated translation technology because it was not very well known, so they struggled initially. The need was always there. It is only now that the big tech companies are investing big time in India because they know there’s a large market where you need to do everything in the language of the people,” remarks Prof. Sharma. The bootstrapped eBhasha Setu team sees the potential for language technologies in many domains too, but as Pawan remarks wryly, “There are a lot of users but no consumers.” He adds, “It’s not because they don’t want to pay for it but because they can’t pay for it.” This is where institutional intervention assumes importance, and it’s gratifying to see the Government stepping in with various initiatives for boosting language technologies research to make knowledge accessible and equitable for all.

Sarita Chebbi is a compulsive early riser. Devourer of all news. Kettlebell enthusiast. Nit-picker of the written word especially when it’s not her own.
