In a boost to India-centric clinical research and development, IIITH, in collaboration with Nizam’s Institute of Medical Sciences (NIMS), Hyderabad, has unveiled publicly available datasets comprising digitised histopathological images of brain cancer and kidney disease (lupus nephritis).
The India Pathology Dataset (IPD) project is a multi-stakeholder joint venture between academia, hospitals, industry, and the government to digitise slide images of tissue biopsies. The expected benefits range from a reduced risk of damaging physical slides to improved clinical decision-making, faster turnaround times, and better research opportunities with the help of AI.
As part of the initiative supported by the Technological Innovation Hub for Data Banks, Data Services, and Data Analytics (TiH-Data), IIITH installed a whole slide digital scanner at the premises of NIMS, Hyderabad. “Traditionally, tissue samples and biopsies are visualised under the microscope. But by digitising these slides, computers can be used to visualise these images and they can be shared across locations for a collaborative diagnosis with other pathologists,” remarks Prof. Vinod P.K, who is leading the curation of datasets on various cancers.
Curated Brain Tumour Dataset
One of the first datasets to be released is IPD-Brain, described in Nature Scientific Data, a prestigious, open-access, online-only journal for descriptions of scientifically valuable datasets. With its focus on Indian demographics, it comprises 547 high-resolution H&E slides from 367 patients, making it one of the largest in Asia. “The effective management of all cancers relies on precise typing, sub-typing, and grading,” remarks Dr. Megha Uppin, Department of Pathology, NIMS. The dataset is therefore a first step in brain tumour research: machine learning models trained on it can be used not only to explore regional and ethnic disease variations but also to enhance diagnostic precision by identifying cancer subtypes. “The diagnosis of brain tumors is now largely based on molecular genetics. Plus, the number of techniques required for giving an accurate WHO diagnosis of brain tumors is ever increasing with pathologists having to standardize and implement these in routine practice. AI in brain tumors can bridge this gap to diagnose the molecular abnormalities to make a cost effective and accurate diagnosis,” says Dr. Megha. Moreover, given the current shortage of specialised neuropathologists across the country, AI can help peripheral institutes and hospitals obtain help from specialised doctors through digital pathology.
While a beginning has been made by curating a dataset on brain tumours, efforts are underway to expand the collection to include other cancers such as breast, lung, colorectal, oral and cervical cancers. NIMS is also contributing to the curation of the lung cancer dataset.
Dataset on Lupus Nephritis
In addition to the cancer datasets, the project has also compiled a dataset on lupus nephritis, a kidney disease that occurs when the immune system attacks the kidneys. “It’s an autoimmune disease that disproportionately affects women in India and there’s a high incidence of it in Telangana with a sizable number of patients approaching NIMS for treatment,” says Prof. Vinod. He adds that understanding the different classes of the disease and prescribing appropriate treatment plans requires a high level of expertise from nephropathologists, of whom there are very few in India. With NIMS being one of the centres with such skilled personnel, the idea was to come up with a diagnostic tool that helps them interpret the slides and classify the disease, which in turn directs the appropriate treatment for these patients. “AI also helps in overcoming the problem of interobserver variations in class subtyping of lupus nephritis,” observes Dr. Megha Uppin.
AI To Predict Molecular Changes
While subtyping and grading of cancers are routine, time-consuming tasks traditionally performed by histopathologists, predicting molecular markers from H&E slide images is a task that human observers cannot perform. “The pathologist can’t see what are the underlying molecular changes happening at the DNA level which gets reflected in the tissue morphology,” says Prof. Vinod, explaining that there is a correlation between DNA alterations and the morphological changes seen at the tissue level. Molecular profiles are traditionally obtained from genetic labs where such testing is done, or by performing immunohistochemistry (IHC). The group has instead attempted to predict molecular details from tissue morphology itself, using H&E slide images. One such effort involves predicting IDH mutations, which play a critical role in the diagnosis and prognosis of brain tumour patients.
Why Histopathological Datasets Are Vital
The relevance of IPD is manifold. The open-source nature of the dataset makes it an excellent resource for other researchers who want to analyse the data or go further and create new AI models. Terming it one of the first few instances of open-source medical data from India for “human good”, Prof. Vinod says that a second whole slide scanner system has also been set up on the IIITH campus and is available to anyone who wishes to use it. Dental colleges as well as corporate hospitals are currently using the scanner. The dataset itself can also double as a valuable aid in education. “For example, a pathologist who’s studying an MD in Pathology can use these digitised histopathological images to get an in-depth understanding,” he says. Speaking about the future, Prof. Vinod mentions that many more such datasets are in the pipeline, with the dataset on breast cancer currently in progress. “We are also working to include many other collaborators. What’s unique about this project is that it’s India-specific. Up until now, researchers had to rely on datasets like the TCGA (The Cancer Genome Atlas), which are based on the U.S. population, for histopathology studies. This is truly the first-of-its-kind for India.” On the whole, IPD serves as a complete database for teaching, learning and research.