Abstract
Digital marketing strategies can help businesses achieve a better return on investment (ROI). Big data and predictive modelling are key to identifying the specific customers to target. However, abundant and mostly irrelevant attributes (features) adversely affect predictive-modelling performance, both computationally and qualitatively, so selecting relevant features is a crucial task for marketing applications. Feature selection is very time consuming because of the large data volume and the high dimensionality of the feature space. In this paper, we propose to reduce the computation time by regularizing the feature search process with expert knowledge. We also combine the regularized search with a generative filtering step, which addresses potential problems with the regularized search and further speeds up the process. In addition, we build a progressive-sampling, coarse-to-fine selection framework to further lower the space and time requirements.
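The coarse-to-fine idea sketched in the abstract can be illustrated with a toy example. This is not the paper's actual algorithm, only a minimal sketch under simplifying assumptions: a plain correlation filter is scored first on a small row sample (the coarse, cheap pass), and only the surviving features are re-scored on the full data (the fine pass). All names here (`coarse_to_fine_select`, `corr`) are illustrative.

```python
import random

def corr(xs, ys):
    """Pearson correlation between two equal-length lists (pure Python)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def coarse_to_fine_select(X, y, k_keep, k_final, sample_frac=0.2, seed=0):
    """Coarse-to-fine filter: rank all features on a small row sample,
    keep the top k_keep, then re-rank only those survivors on the full data."""
    rng = random.Random(seed)
    n = len(y)
    idx = rng.sample(range(n), max(2, int(sample_frac * n)))
    # Coarse stage: score every feature, but only on the sampled rows.
    scores = [(abs(corr([X[i][j] for i in idx], [y[i] for i in idx])), j)
              for j in range(len(X[0]))]
    survivors = [j for _, j in sorted(scores, reverse=True)[:k_keep]]
    # Fine stage: re-score the few survivors on all rows.
    fine = [(abs(corr([row[j] for row in X], y)), j) for j in survivors]
    return [j for _, j in sorted(fine, reverse=True)[:k_final]]
```

The cheap coarse pass touches every feature but only a fraction of the rows, so the expensive full-data scoring runs on just `k_keep` survivors instead of all features.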
Notes
1. C(n) depends on the modelling algorithm. It is linear w.r.t. n for many commonly used algorithms, such as logistic regression and random forests [8]. In this case, the complexity term \(C(1)+C(2)+ \cdots +C(n) \propto n^2\). Without loss of generality, we use Eq. 1 to represent the complexity; the derivation in Sect. 4.1 holds either way.
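The quadratic bound in the note can be made explicit. Assuming each fit costs \(C(k) = c\,k\) for some constant \(c\) (the linear case mentioned above), the summed cost is a triangular number:

\[
\sum_{k=1}^{n} C(k) \;=\; c \sum_{k=1}^{n} k \;=\; \frac{c\,n(n+1)}{2} \;\in\; \Theta(n^2).
\]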
References
Berrendero, J.R., Cuevas, A., Torrecilla, J.L.: The mRMR variable selection method: a comparative study for functional data. J. Stat. Comput. Simul. 86(5), 891–907 (2016)
Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001)
Deng, K.: Omega: On-line memory-based general purpose system classifier. Ph.D. dissertation, Carnegie Mellon University (1998)
Farahat, A.K., Ghodsi, A., Kamel, M.S.: An efficient greedy method for unsupervised feature selection. In: ICDM, pp. 161–170 (2011)
Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003)
Hsu, H.H., Hsieh, C.W., Lu, M.D.: Hybrid feature selection by combining filters and wrappers. Expert Syst. Appl. 38(7), 8144–8150 (2011)
Huda, S., Yearwood, J., Stranieri, A.: Hybrid wrapper-filter approaches for input feature selection using maximum relevance-minimum redundancy and artificial neural network input gain measurement approximation. In: ACSC, pp. 43–52 (2011)
Iyer, K.: Computational complexity of data mining algorithms used in fraud detection. Ph.D. dissertation, Pennsylvania State University (2005)
Kotsiantis, S.: Feature selection for machine learning classification problems: a recent overview. Artif. Intell. Rev. 42, 1–20 (2011)
Kroeger, P.R.: Analyzing Grammar: An Introduction. Cambridge University Press, Cambridge (2005)
Lee, C.P., Leu, Y.: A novel hybrid feature selection method for microarray data analysis. Appl. Soft Comput. 11(1), 208–213 (2011)
Masaeli, M., Yan, Y., Cui, Y., Dy, J.: Convex principal feature selection. In: Proceedings of the 2010 SIAM International Conference on Data Mining, pp. 619–628 (2010)
Manikandan, P., Venkateswaran, C.J.: Feature selection algorithms: literature review. Smart Comput. Rev. 4(3) (2014)
Minka, T.P.: A comparison of numerical optimizers for logistic regression. Unpublished draft (2003)
Nguyen, X.V., Chan, J., Romano, S., Bailey, J.: Effective global approaches for mutual information based feature selection. In: KDD, pp. 512–521 (2014)
Peng, H., Long, F., Ding, C.: Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27(8), 1226–1238 (2005)
Senliol, B., Gulgezen, G., Yu, L., Cataltepe, Z.: Fast correlation based filter (FCBF) with a different search strategy. In: 23rd International Symposium on Computer and Information Sciences, pp. 1–4 (2008)
Shao, W., He, L., Lu, C., Wei, X., Yu, P.: Online unsupervised multi-view feature selection. In: ICDM, pp. 1203–1208 (2016)
Tang, J., Alelyani, S., Liu, H.: Feature selection for classification: a review (2014)
Tang, J., Hu, X., Gao, H., Liu, H.: Unsupervised feature selection for multi-view data in social media. In: SDM, pp. 270–278 (2013)
Torkkola, K.: Feature extraction by non-parametric mutual information maximization. J. Mach. Learn. Res. 3, 1415–1438 (2003)
Venkateswara, H., Lade, P., Lin, B., Ye, J., Panchanathan, S.: Efficient approximate solutions to mutual information based global feature selection. In: ICDM, pp. 1009–1014 (2015)
Vinzamuri, B., Padthe, K.K., Reddy, C.K.: Feature grouping using weighted l1 norm for high-dimensional data. In: ICDM, pp. 1233–1238. IEEE (2016)
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this paper
Zhang, W., Bose, S., Kobeissi, S., Tomko, S., Challis, C. (2018). Efficient Feature Selection Framework for Digital Marketing Applications. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science(), vol 10939. Springer, Cham. https://doi.org/10.1007/978-3-319-93040-4_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93039-8
Online ISBN: 978-3-319-93040-4
eBook Packages: Computer Science (R0)