Neha Soorma received her MS Dual Degree in Computer Science and Engineering (CSE). Her research work was supervised by Prof. Kamal Karlapalem. Here’s a summary of her research work on A Workflow Driven Approach For Knowledge Discovery Process:
As we march into the digital information age, data overload is one of the most significant concerns in almost all domains. Massive volumes of data are generated and accumulated every day. With the rise in data volumes arises the need for sophisticated computational techniques and tools to support the extraction of useful knowledge from data. This data can provide value only if we can extract hidden patterns and knowledge from it and use them to achieve business or operational goals. The field of Knowledge Discovery in Databases (KDD) aims at finding structure and meaningful information in raw, unstructured data. KDD is an iterative process consisting of multiple phases and generally requires domain knowledge and data analysis expertise. With the inclusion of information technology in all spheres of life, significant work has been done in the field of KDD to utilize computational techniques.
Today we have a plethora of software systems to support KDD. These systems support KDD specification and execution through an interactive user interface or through API integration with external systems. However, most of them support the KDD process in a fixed, predetermined manner. The specification and execution procedures are coded as a fixed set of rules and algorithms, lacking the dynamism that the fuzzy, complicated knowledge discovery process needs. We propose a solution that eases this rigidity by using the concept of meta workflows. We show that the KDD process can be modeled as a workflow, which can then be coupled with Meta Workflows, leading to a flexible KDD management system in which the specification and execution procedures can be modified dynamically at runtime. Our solution uses two Meta Workflows for KDD: a Meta Specification Workflow and a Meta Execution Workflow. A Meta Specification Workflow captures the process of specifying a KDD process. Similarly, a Meta Execution Workflow defines the execution engine of a KDD process. Meta Execution Workflow instances couple themselves with KDD workflow instances and execute them.
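To make the coupling idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the design from the thesis: the class names `KDDWorkflow` and `MetaExecutionWorkflow`, the `MetaStep` signature, and the toy `select` and `mine` tasks are all hypothetical. It shows a KDD process expressed as a workflow, and a separate meta workflow that decides how each KDD task is run and that can be edited while the system is live.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Task = Callable[[Context], Context]
# A meta step decides what happens around one KDD task (hypothetical signature).
MetaStep = Callable[[str, Task, Context], Context]


@dataclass
class KDDWorkflow:
    """A KDD process modeled as an ordered list of (name, task) pairs."""
    tasks: List[Tuple[str, Task]]


@dataclass
class MetaExecutionWorkflow:
    """Describes how KDD tasks are executed; its steps are mutable at runtime."""
    steps: List[MetaStep] = field(default_factory=list)

    def execute(self, workflow: KDDWorkflow, context: Context) -> Context:
        # Couple with the KDD workflow: apply every meta step to every task.
        for name, task in workflow.tasks:
            for step in self.steps:
                context = step(name, task, context)
        return context


def run(name: str, task: Task, ctx: Context) -> Context:
    """Meta step: actually run the KDD task."""
    return task(ctx)


def record(name: str, task: Task, ctx: Context) -> Context:
    """Meta step: append the task name to an execution log."""
    return {**ctx, "log": ctx.get("log", []) + [name]}


# Toy KDD tasks (assumed for illustration only).
def select(ctx: Context) -> Context:
    return {**ctx, "data": [x for x in ctx["raw"] if x is not None]}


def mine(ctx: Context) -> Context:
    return {**ctx, "pattern": max(ctx["data"])}


kdd = KDDWorkflow(tasks=[("selection", select), ("mining", mine)])
meta = MetaExecutionWorkflow(steps=[run])
result = meta.execute(kdd, {"raw": [3, None, 7, 1]})

# Change the execution procedure at runtime without touching the KDD workflow.
meta.steps.insert(0, record)
logged = meta.execute(kdd, {"raw": [3, None, 7, 1]})
```

The point of the sketch is the separation: the KDD workflow stays the same while the execution procedure around it is changed by editing `meta.steps`.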
We build a prototype KDD system with a Web user interface by combining the flexibility of Meta Workflow logic with the power of existing KDD tools such as RapidMiner. We establish adaptable execution logic by maintaining multiple Meta Execution Workflows in our system, which provides flexibility, automation, and exception handling in KDD specification and execution. We use a Task Scheduler, which continuously watches for ready tasks and executes them by coupling them with Meta Execution Workflow instances. We propose the execution algorithms used in our system to support this model. We also demonstrate how our solution works in real-life KDD scenarios.
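The scheduler described above can be pictured with a small Python sketch. This is a simplified illustration, not the execution algorithms proposed in the thesis: `ScheduledTask`, `run_plain`, `run_with_retry`, the `META_EXECUTION` registry, and `schedule` are hypothetical names, no RapidMiner call is made, and the loop ends once every task is done, whereas a real scheduler would keep polling.

```python
from typing import Callable, Dict, List, Sequence


class ScheduledTask:
    """A unit of KDD work with dependencies and a chosen meta execution workflow."""

    def __init__(self, name: str, action: Callable[[], None],
                 depends_on: Sequence[str] = (), meta: str = "default") -> None:
        self.name = name
        self.action = action
        self.depends_on = set(depends_on)
        self.meta = meta
        self.status = "pending"


def run_plain(task: ScheduledTask) -> None:
    """Meta execution workflow that runs the task once."""
    task.action()


def run_with_retry(task: ScheduledTask, retries: int = 2) -> None:
    """Meta execution workflow that handles exceptions by retrying."""
    for attempt in range(retries + 1):
        try:
            task.action()
            return
        except Exception:
            if attempt == retries:
                raise


# Several meta execution workflows are kept side by side (assumed registry).
META_EXECUTION: Dict[str, Callable[[ScheduledTask], None]] = {
    "default": run_plain,
    "retry": run_with_retry,
}


def schedule(tasks: List[ScheduledTask]) -> List[str]:
    """Repeatedly pick ready tasks and execute them via their meta workflow."""
    done: set = set()
    order: List[str] = []
    while len(done) < len(tasks):
        ready = [t for t in tasks
                 if t.status == "pending" and t.depends_on <= done]
        if not ready:
            raise RuntimeError("no ready tasks; check for cyclic dependencies")
        for task in ready:
            META_EXECUTION[task.meta](task)
            task.status = "done"
            done.add(task.name)
            order.append(task.name)
    return order


attempts = {"n": 0}


def flaky_mining() -> None:
    # Fails on the first call to exercise the retry path.
    attempts["n"] += 1
    if attempts["n"] < 2:
        raise RuntimeError("transient failure")


tasks = [
    ScheduledTask("mine", flaky_mining, depends_on=["clean"], meta="retry"),
    ScheduledTask("load", lambda: None),
    ScheduledTask("clean", lambda: None, depends_on=["load"]),
]
order = schedule(tasks)
```

Here each task names the meta execution workflow it should be coupled with, so the exception handling policy can differ per task without changing the scheduler loop.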