Constrained Logistic Regression for Discriminative Pattern Mining
Analyzing differences between multivariate datasets is a challenging problem. Earlier work addressed it by characterizing differences in the data distributions, either through patterns formed as conjunctions of attribute-value pairs or through univariate statistical analysis of each attribute. Such methods focus only on changes in the attributes and do not explicitly consider the class labels associated with the data. In this paper, we pose the problem in a supervised setting, where the change in the data distribution is measured in terms of the change in the corresponding classification boundary. We propose a new constrained logistic regression model that measures the difference between multivariate data distributions based on the predictive models induced on them. Specifically, the difference is quantified by the changes in the classification boundaries of the constrained models. We demonstrate the advantages of the proposed approach over other methods available in the literature using both synthetic and real-world datasets.
Keywords: Logistic regression, constrained learning, discriminative pattern mining, change detection
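The idea of comparing two labelled datasets through their classification boundaries can be illustrated with a small sketch. The abstract does not state the form of the constraints or the distance measure, so everything specific here is an assumption made for illustration: a base model is fitted on the pooled data, each dataset's model is restricted to a box of half-width `eps` around the base weights, and the difference is reported as the Euclidean distance between the two constrained weight vectors. The names `fit_logreg`, `boundary_difference`, `make_data`, and `eps` are hypothetical and are not taken from the paper.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg(X, y, lower=None, upper=None, lr=0.5, steps=300):
    """Fit logistic regression by (projected) gradient descent.

    If lower/upper are given, every weight is clipped into
    [lower[j], upper[j]] after each step (an assumed box constraint).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    if lower is not None:
        # Start from the point of the box closest to the origin.
        w = [min(max(0.0, lo), hi) for lo, hi in zip(lower, upper)]
    for _ in range(steps):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad[j] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
        if lower is not None:
            w = [min(max(wj, lo), hi) for wj, lo, hi in zip(w, lower, upper)]
    return w


def boundary_difference(data_a, data_b, eps=1.0):
    """Return (w_a, w_b, distance) for two labelled datasets (X, y)."""
    (Xa, ya), (Xb, yb) = data_a, data_b
    base = fit_logreg(Xa + Xb, ya + yb)  # shared reference model on pooled data
    lower = [b - eps for b in base]
    upper = [b + eps for b in base]
    w_a = fit_logreg(Xa, ya, lower, upper)
    w_b = fit_logreg(Xb, yb, lower, upper)
    dist = math.sqrt(sum((p - q) ** 2 for p, q in zip(w_a, w_b)))
    return w_a, w_b, dist


def make_data(true_w, n, rng):
    """Sample n points [1, x1, x2] with labels drawn from a logistic model."""
    X = [[1.0, rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    y = [
        1 if rng.random() < sigmoid(sum(w * x for w, x in zip(true_w, xi))) else 0
        for xi in X
    ]
    return X, y


rng = random.Random(0)
data_a = make_data([0.0, 2.0, 0.0], 200, rng)  # label driven by x1
data_b = make_data([0.0, 0.0, 2.0], 200, rng)  # label driven by x2
w_a, w_b, dist = boundary_difference(data_a, data_b)
print("constrained weights A:", [round(v, 3) for v in w_a])
print("constrained weights B:", [round(v, 3) for v in w_b])
print("boundary difference:", round(dist, 3))
```

In this synthetic setup the two constrained models should disagree mainly on the weights of `x1` and `x2`, which is the kind of per-attribute signal a boundary-based comparison is meant to surface. Because both models share the same box, the distance is bounded by `2 * eps * sqrt(d)`, so `eps` controls how far the two boundaries are allowed to drift from the pooled model.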