Abhiroop Talasila, supervised by Prof. Deva Priyakumar U, received his Master of Science in Computer Science and Engineering (CSE). Here’s a summary of his research work on Self-supervision and contrastive learning to enable interpretable out-of-distribution performance:
Early detection and effective management of medical conditions significantly improve patient outcomes, yet challenges persist due to limited resources, particularly the scarcity of specialised medical expertise. While supervised deep learning excels at medical image segmentation, the shortage of labelled data poses significant hurdles for clinical applications. In response, self-supervised learning (SSL) systems have gained traction, leveraging unlabelled medical images to extract essential properties. Among SSL methods, contrastive learning stands out: it minimizes the distance between representations of similar images while maximizing the distance between dissimilar ones, thereby enhancing segmentation accuracy (see the sketch after this summary). Current SSL methods still exhibit limitations, prompting the advancement of unsupervised learning techniques that exploit the inherent structure of the data to learn representations without labelled examples. A considerable challenge with unsupervised pre-training is domain shift, which occurs when the distribution of the data used for pre-training differs from that of the data used for fine-tuning.

Addressing a crucial challenge in thoracic disease diagnosis from X-ray imaging, our work in the second chapter examines the need for anatomically accurate findings and why they should not be overlooked in interpretability assessments. We propose a self-supervised and weakly supervised pre-training pipeline paired with an auxiliary loss, followed by supervised fine-tuning, to tackle these limitations. Leveraging the ChestX-ray14 dataset for pre-training and the CheXpert dataset for fine-tuning ensures model stability and generalizability across diverse data sources. Our approach yields a 31% improvement in Intersection over Union on the NIH CXR dataset and enables precise classification across 14 chest X-ray categories. While our focus lies on chest X-rays, the proposed approach extends to other imaging modalities and clinical workflows, underscoring its broad applicability in medical imaging and diagnostics.

In the third chapter, we extend the Swin Transformer architecture to learn from multiple medical imaging modalities, improving downstream performance. Our Swin-FUSE (Swin Multi-Modal Fusion for UnSupervised Enhancement) framework offers several advantages: it uses CTs and MRIs during pre-training to create complementary feature representations, incorporates a domain-invariance module (DIM) to improve domain adaptability, and generalizes well beyond the pre-training tasks. Empirical results on publicly available 3D segmentation datasets show a marginal 1-2% performance trade-off compared to single-modality models while achieving a 27% improvement on out-of-distribution modalities.

In the fourth chapter, we introduce an attention-based model for the 2023 ISBI X-ray Projectomic Reconstruction Challenge, which aims to foster exploration of XNH imaging challenges, with a particular focus on the white matter region of the rodent brain. Our proposed AGU-Net model merges segmented regions more effectively and produces accurate segmentations with fewer errors, as evidenced by its higher scores on the Normalized ERL and Rand merge metrics. The model achieved an XPRESS score of 0.8159 on the test set, securing 3rd place.

In summary, this thesis presents solutions that leverage self-supervised pre-training and contrastive learning to enhance representation learning in medical imaging.
These advancements extend to domain adaptation tasks and ultimately enhance the interpretability of conventional black-box machine learning models, paving the way for more reliable and clinically relevant diagnostic tools.
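To make the contrastive objective mentioned above concrete, the snippet below sketches a standard InfoNCE/NT-Xent-style loss that pulls together embeddings of two augmented views of the same image and pushes apart embeddings of different images. This is a minimal illustration, not the exact loss or pipeline used in the thesis; the toy embeddings, the `temperature` value, and the symmetric formulation are assumptions made for clarity.

```python
# Minimal sketch of a contrastive (InfoNCE / NT-Xent-style) objective.
# Illustrative only: not the exact loss used in the thesis; the temperature
# and the toy random embeddings are assumptions for demonstration.
import torch
import torch.nn.functional as F


def info_nce_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Pull two views of the same image together, push other images apart.

    z1, z2: (batch, dim) embeddings of two augmented views of the same batch.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                      # pairwise cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)    # positives lie on the diagonal
    # Symmetric cross-entropy: each view should identify its counterpart.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    # Toy usage with random tensors standing in for encoder outputs.
    batch, dim = 8, 128
    view1, view2 = torch.randn(batch, dim), torch.randn(batch, dim)
    print(info_nce_loss(view1, view2).item())
```

In practice the two views would come from an encoder (e.g. a Swin Transformer backbone) applied to two augmentations of the same medical image, and the loss would be combined with any auxiliary or weakly supervised terms used during pre-training.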
September 2024