Akshaya Karthikeyan received her MS Dual Degree in Computational Natural Sciences (CNS). Her research was supervised by Prof. Deva Priyakumar. Here is a summary of her thesis, "Machine Learning for Pulmonology: Disease Prognosis and Evaluation Metric for Generated Data":
Artificial Intelligence (AI) has grown significantly over the past decade across a wide range of domains. In healthcare, machine learning (ML) methods have been applied to tasks such as disease diagnosis and the prediction of disease progression and outcomes. ML aids healthcare systems by performing medical tasks at expert level in less time, which enables optimized resource allocation, targeted care and faster decision-making. This thesis presents two applications of AI in pulmonology, the branch of medicine that deals with respiratory diseases.
The first application uses ML for early mortality prediction in COVID-19 patients. Coronavirus disease 2019 (COVID-19), caused by the SARS-CoV-2 virus, is an acute respiratory disease that the World Health Organization (WHO) has classified as a pandemic. The sudden spike in infections and the high mortality rate have put immense pressure on public healthcare systems, so identifying the key factors that predict mortality is crucial for optimizing patient treatment strategies. Because routine blood test results are more widely available than other forms of data such as X-rays, CT scans and ultrasounds, this thesis focuses on ML methods that predict COVID-19 mortality risk from blood test data. A combination of five features, namely neutrophils, lymphocytes, lactate dehydrogenase (LDH), high-sensitivity C-reactive protein (hs-CRP) and age, predicts mortality with 96% accuracy. Several ML models (neural networks, logistic regression, XGBoost, random forests, SVMs and decision trees) were trained and compared to find the model that maintains consistently high accuracy across the full course of the disease. The best-performing method, which combines XGBoost feature importance with neural network classification, predicts the outcome with 90% accuracy as early as 16 days before it occurs. Robust testing on three cases defined by the number of days to outcome confirms the strong predictive performance and practicality of the proposed model. A detailed analysis of trends in these key biomarkers provides insights that make the predictions easier to interpret and apply. Together, these results can help healthcare systems decide on focused medical treatment in an accurate, early and reliable manner.
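The two-stage idea described above (rank candidate features by a gradient-boosted model's feature importance, then classify with a neural network trained on the top-ranked features) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: it uses scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost, a small `MLPClassifier` as the neural network, and randomly generated placeholder data with hypothetical feature names `f0` to `f9` instead of real blood test records.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def rank_features(X, y, names, k=5, seed=0):
    """Return the k features with the highest gradient-boosting importance.

    GradientBoostingClassifier stands in for XGBoost here (assumption)."""
    booster = GradientBoostingClassifier(random_state=seed).fit(X, y)
    top = np.argsort(booster.feature_importances_)[::-1][:k]
    return [names[i] for i in top], top


def train_nn(X, y, seed=0):
    """Fit a small feed-forward network on standardized features."""
    model = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=seed),
    )
    return model.fit(X, y)


# Placeholder data: 400 hypothetical patients, 10 hypothetical features,
# and a binary outcome that depends on a few of the columns.
rng = np.random.default_rng(0)
names = [f"f{i}" for i in range(10)]
X = rng.normal(size=(400, 10))
y = (X[:, 0] - X[:, 3] + 0.5 * X[:, 7] + rng.normal(scale=0.5, size=400) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Stage 1: feature selection by importance; stage 2: neural network classifier.
selected, idx = rank_features(X_train, y_train, names, k=5)
clf = train_nn(X_train[:, idx], y_train)
accuracy = clf.score(X_test[:, idx], y_test)
print("selected features:", selected)
print("held-out accuracy:", round(accuracy, 3))
```

Swapping in `xgboost.XGBClassifier`, which also exposes a `feature_importances_` attribute, would bring the sketch closer to the method named in the text.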
In the second application, we devise an evaluation metric called the FXD score to evaluate methods for generating synthetic chest X-ray (CXR) images. CXRs are used to diagnose multiple disease conditions, and their importance has increased immensely during the COVID-19 pandemic. Generative models are widely used for data augmentation in tasks with insufficient data, but evaluating generative models is inherently challenging. Fréchet Inception Distance (FID) and other sample-based evaluation metrics compare the distributions of real and generated images in the latent space of a model pre-trained on the ImageNet dataset, such as the pre-trained Inception v3 (Iv3) model. These metrics work well for synthetic images of common entities but generally fail to assess the quality of images from niche domains. This thesis therefore proposes an improved metric for evaluating the quality of synthetic CXR images: the Fréchet distance computed on the representations of the pre-trained TorchXRayVision (XRV) model, called the FXD score.
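For reference, the Fréchet distance used by FID compares two multivariate Gaussians fitted to the embeddings of real images (mean \mu_r, covariance \Sigma_r) and generated images (mean \mu_g, covariance \Sigma_g). Assuming FXD keeps this standard formula and only replaces the Iv3 embeddings with XRV representations, the score is:

```latex
d^2\big((\mu_r,\Sigma_r),(\mu_g,\Sigma_g)\big)
  = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)
```

Lower values mean the generated distribution is closer to the real one; identical distributions give a distance of zero.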
We perform a series of carefully designed experiments to test the conditions that a meaningful metric must satisfy. These experiments show that the proposed metric is more discriminative, more robust to image transformations, and more sensitive to mode dropping, mode collapse and overfitting. The FXD score is therefore proposed as a better alternative to metrics based on Iv3 embeddings. Using CXRs as a case study, we demonstrate the importance of choosing domain-specific representations when evaluating the quality of synthetic images. The FXD score will help researchers build generative models more efficiently, which in turn can help analyze infected patients more accurately.
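Sensitivity to mode dropping, one of the properties tested above, can be illustrated with a small NumPy-only sketch. It computes the Fréchet distance between Gaussians fitted to two sets of feature vectors, ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2(S_a S_b)^{1/2}), and compares a "generator" that covers both modes of a two-cluster real distribution with one that drops a mode. The random vectors are placeholders for model embeddings (in FXD they would come from the XRV model, whose API is not shown here), and the cluster layout, dimension and sample sizes are illustrative assumptions rather than the thesis's experimental setup.

```python
import numpy as np


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two (n_samples, dim) arrays."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    # Symmetric square root of cov_a via its eigendecomposition.
    vals, vecs = np.linalg.eigh(cov_a)
    sqrt_a = (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

    # sqrt_a @ cov_b @ sqrt_a is symmetric PSD and has the same eigenvalues as
    # cov_a @ cov_b, so Tr((cov_a cov_b)^{1/2}) is the sum of their square roots.
    middle = sqrt_a @ cov_b @ sqrt_a
    tr_sqrt = np.sqrt(np.clip(np.linalg.eigvalsh(middle), 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


def two_mode_sample(rng, n, dim, keep_both=True):
    """Sample around two cluster centres (-3 and +3), or only around +3."""
    centres = np.array([-3.0, 3.0]) if keep_both else np.array([3.0])
    picks = rng.choice(centres, size=n)
    return picks[:, None] + rng.normal(size=(n, dim))


rng = np.random.default_rng(0)
dim = 8
real = two_mode_sample(rng, 2000, dim)
gen_full = two_mode_sample(rng, 2000, dim)                      # covers both modes
gen_dropped = two_mode_sample(rng, 2000, dim, keep_both=False)  # drops one mode

d_full = frechet_distance(real, gen_full)
d_dropped = frechet_distance(real, gen_dropped)
print(f"both modes: {d_full:.2f}, one mode dropped: {d_dropped:.2f}")
```

Many FID implementations compute the matrix square root with `scipy.linalg.sqrtm`; the eigendecomposition route used here gives the same trace while avoiding the SciPy dependency.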