Sagar Sandeep Joshi - Content Generation

Sagar Sandeep Joshi received his Master of Science in Computer Science and Engineering (CSE). His research work was supervised by Prof. Vasudeva Varma. Here’s a summary of his research work on Methods in Legal Contractual Content Generation:

In recent years, there has been an increase in the generation of health care documents such as clinical trials, discharge summaries, and Electronic Health Records (EHRs). These documents contain a lot of actionable data, including the events and activities that occur in health care processes. This valuable information has widened the scope for research on biomedical literature. However, most of the data resides in free text, which makes it difficult to extract useful information. The thesis develops methods to automatically extract semantics from health care documents in order to check the conformance of treatment processes with standard treatment guidelines.

The discharge summary is one of the major sources of information about the treatment process. It records one or more of a patient’s encounters with health care service providers and is stored electronically so that it can be shared across stakeholders in the health care system. It summarizes a wide range of information, such as chief complaint, physical examination, vital signs, lab test results, recommended medications, and discharge status. Text analytics over these documents has various applications and would help caregivers provide better health care. The main theme of the thesis is to automatically extract medical semantics from discharge summaries. We focus on semantics such as medical entities, attributes of medical entities, and relationships between these entities. Further, we illustrate an application of these semantics in a Treatment Process Conformance checking use case.

Automatically identifying medical entities in biomedical literature is referred to as Biomedical Named Entity Recognition (BMNER), one of the important tasks in biomedical text mining. Most prior work on BMNER was based on feature-dependent machine learning techniques and focused only on continuous named entities, not discontinuous ones. Discontinuous entities consist of two or more non-consecutive components. The traditional BIO tagging schema cannot tag sentences with discontinuous entities. In this thesis, we propose a novel, systematic BIODT tagging schema to identify both continuous and discontinuous named entities. We explore deep learning models that require limited feature engineering for tagging the entities. Our results illustrate that the BIODT tagging schema performs better than the traditional BIO schema and other tagging schemas, and that it overcomes the label sparsity problem in identifying both continuous and discontinuous biomedical entities. We also show that our neural network model with the BIODT tagging schema outperforms state-of-the-art methods, which were based on feature-dependent machine learning techniques, on the CLEF 2013 and SemEval 2013 datasets.

Medical entities alone do not give enough information to understand the condition of the patient. In a given context, the characteristics of a medical entity depend on attributes such as temporal information, severity, and progression of the disease. In this work, we consider ten attributes that capture the main details of the patient’s condition: Negation Indicator, Subject Class, Uncertainty Indicator, Course Class, Severity Class, Conditional Class, Generic Class, Body Location, DocTime Class, and Temporal Expression. We present a methodology that uses rule-based and machine learning approaches to identify each of these attributes. We evaluate our methodology on the ShARe/CLEF eHealth Evaluation Lab 2014 Challenge dataset in terms of attribute-level and system-level accuracy.

Mining relationships between treatment(s), test(s), and medical problem(s) is vital in the biomedical domain. It supports applications such as decision support systems, safety surveillance, and new treatment discovery. In this thesis, we propose a deep learning approach that utilizes both word-level and sentence-level representations to extract the relationships between treatment and problem. Because deep learning techniques demand a large amount of training data, we use a rule-based system for relationship classes with fewer samples. Our final relations are derived by combining the results of the deep learning and rule-based models. Our system achieved promising performance on the relationship classes of the i2b2 2010 relation extraction task.
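
The limitation of BIO tagging for discontinuous entities, noted earlier, can be made concrete with a small sketch. The example sentence, its tags, and the `decode_bio` helper below are illustrative assumptions and not code from the thesis. The BIODT schema itself is not reproduced here, because this summary does not define its tags.

```python
from typing import List, Tuple


def decode_bio(tags: List[str]) -> List[Tuple[int, int]]:
    """Decode B/I/O tags into contiguous (start, end) token spans, end exclusive."""
    spans: List[Tuple[int, int]] = []
    start = None  # index where the currently open span began, if any
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # "B" always opens a new span; a stray "I" is treated the same way.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            # "O" closes whatever span is open.
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


# Hypothetical sentence: the disorder mention is "left atrium ... dilated",
# whose components are separated by "is moderately".
tokens = ["left", "atrium", "is", "moderately", "dilated"]
gold_entity = ["left", "atrium", "dilated"]

# The closest BIO labelling marks both components, but as two separate spans.
tags = ["B", "I", "O", "O", "B"]

mentions = [tokens[s:e] for s, e in decode_bio(tags)]
print(mentions)                 # [['left', 'atrium'], ['dilated']]
print(gold_entity in mentions)  # False
```

Since every BIO span is contiguous by construction, the decoder can only return the two fragments and never the single three-token mention, which is why a schema with additional tags is needed for such cases.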

Finally, we run the above pipeline of tasks (Biomedical Named Entity Recognition, Medical Entity Attribute extraction, and Medical Relation Extraction) on a single dataset and leverage the extracted information for the Conformance Checking use case. Conformance checking requires the extracted medical entities and relationships to be structured as the treatment process described in the discharge summary. We propose a workflow representation of the patient’s discharge summary, which is referred to as a workflow instance. The goal is to check the conformance of the workflow instance against the standard treatment plan. Standard treatment plans are extracted from the treatment guidelines provided in health care sources such as the National Comprehensive Cancer Network, WebMD, and Mayo Clinic. For each disease, these guidelines are curated, aggregated, and represented as a workflow specification.

We present multiple measures to compute the conformance of a workflow instance with a workflow specification. We validate our end-to-end pipeline, from extracting semantics to conformance checking, on discharge summary data for three diseases, namely colon cancer, coronary artery disease, and brain tumor, collected from the THYME corpus and the MIMIC-III clinical database. Our approach and solution can be used by hospitals and patients to determine adherence, gaps, and additions relative to standard treatment plans. Further, our work can help identify common errors and strengths in the actual enactment of treatment plans, which can in turn lead to refinement of standard treatment plans.
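
As a rough illustration of what adherence, gaps, and additions could look like in practice, the sketch below compares a workflow instance with a workflow specification. This summary does not state the thesis's actual measures, so the set-based comparison, the `conformance_report` function, and the step names are all assumptions made up for this example.

```python
from typing import Dict, List, Union

Report = Dict[str, Union[List[str], float]]


def conformance_report(instance: List[str], specification: List[str]) -> Report:
    """Compare the steps found in a discharge summary with a standard plan.

    Assumption: each workflow is reduced to a list of step names, and
    ordering between steps is ignored.
    """
    instance_steps = set(instance)
    spec_steps = set(specification)
    adhered = [s for s in specification if s in instance_steps]
    gaps = [s for s in specification if s not in instance_steps]
    additions = [s for s in instance if s not in spec_steps]
    # Fraction of specified steps that appear in the instance.
    coverage = len(adhered) / len(specification) if specification else 1.0
    return {
        "adhered": adhered,
        "gaps": gaps,
        "additions": additions,
        "coverage": coverage,
    }


# Hypothetical step names, invented purely for illustration.
specification = ["colonoscopy", "biopsy", "ct_scan", "surgery", "chemotherapy"]
instance = ["colonoscopy", "biopsy", "surgery", "physiotherapy"]

report = conformance_report(instance, specification)
print(report["adhered"])    # ['colonoscopy', 'biopsy', 'surgery']
print(report["gaps"])       # ['ct_scan', 'chemotherapy']
print(report["additions"])  # ['physiotherapy']
print(report["coverage"])   # 0.6
```

This sketch deliberately ignores the order of steps and any branching in the plan. A measure defined over workflows would presumably account for both, so the numbers here only indicate the kind of output a conformance check might report.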

June 2023
