The Data Science and Analytics Centre (DSAC) organised a talk by Prof. Philippe Fournier-Viger, Harbin Institute of Technology, Shenzhen, China, titled "Algorithms to Discover High Utility Patterns in Symbolic Data" on 15 December.
Retail and online stores collect a large amount of data every day about the transactions made by their customers. This data can be viewed as symbolic data in which the items are the products purchased by customers. Analyzing customer transactions can reveal interesting patterns that can be used for decision making. A traditional way of discovering patterns in symbolic data is to apply algorithms that discover frequent patterns, which represent sets of values appearing frequently in the data (e.g. products frequently purchased together by customers). Although this model has been widely applied to data analysis and many other applications, it relies on the unrealistic assumption that a pattern appearing frequently in a database is interesting. In real life, other measures of interest, such as the profit yielded by patterns, are often more suitable.
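To make the notion of a frequent pattern concrete, the following is a minimal brute-force sketch, not an algorithm presented in the talk. The toy transactions, the `frequent_itemsets` function and the `min_support` threshold are all hypothetical names and values chosen for illustration; practical algorithms prune the search space rather than enumerating every combination.

```python
from itertools import combinations

# Hypothetical toy data (not from the talk): each transaction is the set
# of products bought together by one customer.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "caviar"},
    {"bread", "milk"},
    {"caviar", "wine"},
]


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset that appears in at
    least `min_support` transactions (exhaustive enumeration)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            # Support = number of transactions containing all items.
            support = sum(1 for t in transactions if set(candidate) <= t)
            if support >= min_support:
                result[candidate] = support
    return result


frequent = frequent_itemsets(transactions, min_support=3)
for itemset, support in frequent.items():
    print(itemset, support)
```

In this toy data only bread and milk reach the threshold, even though they may contribute little profit, which is the limitation the paragraph above describes.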
To address this issue, a lot of attention has recently been given to the task of discovering high utility patterns. It consists of discovering the sets of items (products or values) that yield a high profit (or have a high importance) when purchased (appearing) together. Although many algorithms have been designed for identifying high utility itemsets in transactions, many of them have important limitations, such as not considering the time dimension and finding itemsets containing items that are weakly correlated. In this talk, Prof. Fournier-Viger discussed the problem of high utility itemset mining and extensions that have recently been proposed to discover more interesting patterns, such as periodic high utility patterns (patterns representing recurring customer behavior that yields a high profit), peak high utility itemsets (sets of products that yield a high profit during a specific time period, e.g. Christmas), and the problem of discovering correlated items that yield a high profit. Finally, he briefly mentioned other problems related to the discovery of high utility patterns, as well as the SPMF data mining library.
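The idea of a high utility itemset can be illustrated with a small brute-force sketch. It is only an illustration under stated assumptions, not one of the algorithms discussed in the talk and not the SPMF API: the transactions, the `unit_profit` table, the `utility` and `high_utility_itemsets` functions and the `min_utility` threshold are all hypothetical. Here the utility of an itemset in a transaction is taken to be the sum of purchased quantity times unit profit over its items, counted only when the transaction contains every item of the itemset.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps a product to the quantity bought.
transactions = [
    {"bread": 2, "milk": 1},
    {"bread": 1, "milk": 1, "caviar": 1},
    {"bread": 1, "milk": 2},
    {"caviar": 2, "wine": 1},
]
# Hypothetical profit earned per unit sold of each product.
unit_profit = {"bread": 1, "milk": 1, "caviar": 50, "wine": 10}


def utility(itemset, transaction):
    """Profit generated by `itemset` in one transaction, or 0 if the
    transaction does not contain every item of the itemset."""
    if not all(item in transaction for item in itemset):
        return 0
    return sum(transaction[item] * unit_profit[item] for item in itemset)


def high_utility_itemsets(transactions, min_utility):
    """Return {itemset: total utility} for every itemset whose total
    utility over all transactions is at least `min_utility`."""
    items = sorted(unit_profit)
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            total = sum(utility(candidate, t) for t in transactions)
            if total >= min_utility:
                result[candidate] = total
    return result


high_utility = high_utility_itemsets(transactions, min_utility=100)
for itemset, total in high_utility.items():
    print(itemset, total)
```

With these made-up numbers, the itemsets that qualify are the rarely purchased ones containing caviar, while the most frequently purchased pair (bread and milk) yields a total of only 8. Enumerating every combination is exponential in the number of items. Because utility is neither monotone nor anti-monotone, a superset may have a lower or a higher utility than its subsets, so dedicated algorithms are needed to prune the search space.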
Prof. Philippe Fournier-Viger is a Canadian researcher at the Harbin Institute of Technology, Shenzhen, China, and an adjunct professor at the University of Moncton, Moncton, Canada. He has received the title of national talent from the National Science Foundation of China. His research interests include data mining, frequent pattern mining, sequence analysis and prediction, big data, and applications. He has published more than 250 research papers in refereed international conferences and journals, which have received more than 4,500 citations. He is the founder of the SPMF open-source data mining library (http://www.philippe-fournier-viger.com/spmf/), which has been used in more than 630 research papers since 2010. He is also editor-in-chief of the Data Mining and Pattern Recognition journal, co-organizer of the workshop on utility mining at KDD 2018 and IEEE ICDM 2019, and editor of the book “High Utility Pattern Mining: Theory, Algorithms and Applications”, published by Springer in 2019.