“Practical Natural Language Processing”, a book co-authored by IIITians, has caught the eye of the global AI community, snagging a place in the top 1% of all general and technical books sold on Amazon. Anuj Gupta, Harshit Surana and Sowmya Vajjala shared the backstory on how their book was published by industry giant O’Reilly Media, a rare feat for an all-Indian team of writers.
Practical NLP follows a DIY model, bridging the chasm between the classical textbook and the ‘down in the trenches’ action in the industry. The book focuses on real-life scenarios and case studies to help readers crack everyday challenges that AI teams face in their daily work. The writers bring their varied expertise in building and scaling NLP systems, from their own experiences with early-stage start-ups and multinational giants.
Hard-fought practical advice from the field
If you are keen on ‘building and deploying NLP in the wild’, this book is an answer to your prayer. As a first handbook for the new initiate or a reference guide for a software engineer, data scientist, ML engineer, product manager or a VP, the book has something for every practitioner.
Beyond the cookie-cutter textbook and self-help cookbook, it “shows you how to do ML/NLP right, and get the most ROI from your efforts”, explain the authors. Whether it is building models based on quantity and quality of data available or navigating the minefield of deep learning, the compendium is packed with practical guidelines on how to build a good ML team and deliver AI projects successfully.
In a glowing testimonial, Zachary Lipton, Professor at Carnegie Mellon University (and author of Dive into Deep Learning), says, “While many great books focus on ML’s algorithmic fundamentals, this book exposes the anatomy of real-world systems: from e-commerce applications to virtual assistants. Painting a realistic picture of modern production systems, the book teaches not only deep learning but also the heuristics and patchwork pipelines that define the (actual) state of the art for deployed NLP systems. The authors zoom out, teaching problem formulation, and aren’t afraid to zoom in on the grimy details, including handling messy data and sustaining live systems”.
The book received rave reviews from Tier 1 organizations like Google AI, Microsoft Research, Amazon, Facebook, KPMG and Intel. The reader base includes a stellar list of start-up founders, researchers, practitioners, students and academicians.
“From healthcare to e-commerce and finance, the book covers many of the most sought-after domains where NLP is being put to use and walks through core tasks clearly and understandably. Overall, the book is a great manual on how to get the most out of current NLP in your industry”, summed up Sebastian Ruder, Scientist at Google DeepMind and a very popular blogger on natural language processing, machine learning, and deep learning.
The book already has a low-cost subcontinent edition and a Chinese edition, and is available in a digital format. Various universities globally have adopted the book for their applied AI/NLP courses. With the launch of the Indian edition, the team hopes to see the book in the curriculum of the IIITs, IITs and IISc. “This book is a must for all aspiring NLP engineers, entrepreneurs who want to build companies around language technologies and academic researchers who would like to see their inventions reach real users”, said Monojit Choudhury, Principal Researcher at Microsoft Research and faculty at IIT Kharagpur, who used the book to teach a course at Plaksha University.
Harshit Surana is a serial entrepreneur, currently co-founder and CTO of Chaos Genius. It was a tough call for him to break away from the family’s paints and hardware trade to make a mark in the nebulous world of core technology. After working as a research scientist at LTRC under Prof. Dipti Misra Sharma and Dr. Anil Kumar Singh, Harshit went on to complete his Master’s at Carnegie Mellon University and worked with MIT Media Lab on knowledge graphs.
Anuj Gupta heads the AI work at Vahan Inc, an organization that connects millions of Indians with blue-collar jobs at large organizations like Amazon, Big Basket, Zomato, etc. It was his passion for cryptography that led Anuj to IIIT H, where he joined the MS program in Theoretical CS in 2006 and worked with Prof. Srinathan at the CSTAR Lab. In the course of his career, Anuj has incubated multiple AI teams and led AI efforts at both early-stage start-ups and Fortune 100 companies.
Since 2016, Anuj has been conducting workshops and boot camps for professionals in India and abroad, sharing his expertise in building AI systems. Harshit and Anuj’s paths crossed at a conference in Bangalore, and thus began an enduring friendship. Harshit helped fine-tune the course material, and the idea for the NLP book evolved from a discussion on concretizing the learnings from these workshops. The third author, young Bodhisattwa Majumder, a Ph.D. scholar at the University of California at San Diego, joined them, and in 2017 the book project commenced.
“In early 2018, Anuj pinged me to review one of the chapters that they were writing. For anyone who asks me to review, I generally write a long critique”, chuckles Sowmya Vajjala, the fourth author. “They invited me to join the team around mid-2018”. Sowmya was already plugged into the tech publishing ecosystem through reviewing for various publishers including Manning.
Sowmya started working in NLP at the LTRC labs in 2006 in the MS program, where she first met Anuj. She credits her advisor, Prof. Vasudev Verma, and her brother, Halley Kalyan, with sparking joy in the speech and language translation domain. Her career trajectory included a Ph.D. at the Universität Tübingen, an Assistant Professor position at Iowa State University, and a year in the software industry. She is currently a full-time researcher at Canada’s National Research Council.
Three years in writing exile
Before the accolades came, there was a lot of hard work. The authors had met several publishers and gone through multiple proposal rewrites before they landed the leading publishing house of O’Reilly. “We drew up a proposal for O’Reilly and that worked out very well since we already had the benefit of robust feedback from the other publishers”, explains Harshit.
From June 2017, O’Reilly brought extreme rigor to the entire writing process until the book went to print in May 2020. The team had to juggle their writing schedules with work commitments and personal challenges. Bodhi was applying for his Ph.D. program and had back issues, Harshit suffered a repetitive strain injury in his hands, Anuj nursed a tennis elbow and Sowmya was expecting her first child. “The day Stanford’s NLP group retweeted one of our tweets on NLP, we knew we had written something worthy”, says Anuj.
Fruit that didn’t fall far from the IIIT Tree
It is quite an endorsement that a small institute like IIIT H, with 5000 alumni, has more than 100 entrepreneurs. “IIIT H is still my spiritual and intellectual home”, observes Harshit. “The exhaustiveness of the references that we used comes from the meticulousness that we learned there; something that my advisor Luis von Ahn, creator of CAPTCHA, loves about the Institute.”
Anuj believes that it is the Institute’s DNA that binds them all together. There is an ecosystem on campus that demands that students think independently, and the brand name inspires the confidence to take risks and start something from scratch. “The roots and the water that nourished us came from that banyan tree”, reflects Anuj.
Codebase associated with the book: https://github.com/practical-nlp/practical-nlp
Book website: http://www.practicalnlp.ai