[month] [year]

Komallapalli Kaushik – Dynamic Databases

Komallapalli Kaushik received his MS Dual Degree in  Computer Science and Engineering (CSE). His research work was supervised by Prof. P. Krishna Reddy. Here’s a summary of his research work on A Comprehensive Incremental Framework to Extract Coverage Patterns from Dynamic Databases:

Pattern mining is an important task of data mining and involves the extraction of interesting associations from large databases. In the literature, pattern mining approaches have been investigated to extract the different kinds of patterns such as frequent patterns, sequential patterns, periodic patterns, utility patterns and coverage patterns from transactional databases. The pattern mining approaches have been widely employed to improve the performance of applications in the areas of recommendation systems, e-commerce, customer relation management, internet advertising, decision support systems, social media and so on. One of the key issues of developing pattern mining algorithms is as follows: a pattern mining algorithm has to process a huge volume of transactional dataset. Research efforts are going on to develop pattern mining algorithms to operate on large datasets by exploiting different parallel processing paradigms, namely shared memory, shared disk, shared-nothing, GPUs, incremental, MapReduce and Grid. 

Typically, a given transactional database D gets updated due to the addition and/or deletion of transactions. Consequently, some of the previously discovered patterns may become invalid, while some new patterns may emerge. To extract the new knowledge we should mine the database every time it gets updated. One of the main issues is to mine huge Dynamic Databases for every time instant from scratch is computationally costly. The concept of incremental mining is evolved to solve the issue of mining patterns for every instant efficiently for Dynamic Databases and has become an active research area. When the database is updated, with Incremental mining, there is an opportunity to improve the performance by limiting the computation to updated parts of the database and with the required additional processing. In the literature, Incremental mining approaches have been investigated to extract different kinds of patterns such as frequent patterns, sequential patterns, periodic patterns and utility patterns from transactional databases. In this thesis, we propose an improved approach for extracting Coverage Patterns for Dynamic Databases under the incremental paradigm.

Given a transactional database, the coverage pattern mining involves the extraction of multiple sets of items, where each set (covers a certain percentage of transactions) satisfies relative frequency, coverage support and the overlap ratio constraints. The applicability of coverage patterns has been demonstrated in improving the performance of search engine advertising, banner advertising and visibility mining. In the literature, a level-wise iterative coverage pattern mining (CPM) and a pattern growth approach have been proposed.

In this thesis, we propose a Comprehensive Coverage Pattern Mining (CCPM) approach, for efficiently extracting Coverage Patterns under the incremental paradigm, when a set of transactions are added only, set of transactions are deleted only and both set of transactions are added and deleted from the original database. For each case, we have identified a set of rules by identifying the potential opportunities to avoid unnecessary processing as compared to the rudimentary option of the processing from the scratch. Furthermore, it has been observed that, when a set of transactions are added and/or deleted from the original database, if the order of items in the frequency list (Flist) of the database is changed, additional patterns have to be validated and require additional processing. Based on the rules and by considering the issue of change in the order of items in the Flist, we have proposed improved approaches to extract coverage patterns, by considering the case of (i) addition of transactions (ii) deletion of transactions and (iii) addition and deletion of transactions. We have also performed extensive experiments on two real click-stream datasets and one synthetic dataset and demonstrated the overall effectiveness of our proposed approach.