Abstract
Click-Through Rate (CTR) prediction is one of the key techniques in computational advertising. At present, CTR prediction is commonly performed by linear models with \(L_1\) regularization, which depend on prior feature engineering such as feature normalization and cross combination; in this setting, the model cannot learn features automatically. Drawing on ensemble methods, this paper proposes a feature selection algorithm based on gradient boosting. The algorithm combines Gradient Boosting Decision Trees (GBDT) with Logistic Regression (LR) and is evaluated empirically on the Kaggle display-advertising CTR prediction dataset. The experimental results verify the feasibility and validity of the feature selection method and show that it improves the performance of the CTR prediction model, whose AUC reaches 0.908.
© 2016 Springer International Publishing AG
Cite this paper
Wang, Z., Yu, Q., Shen, C., Hu, W. (2016). Feature Selection in Click-Through Rate Prediction Based on Gradient Boosting. In: Yin, H., et al. Intelligent Data Engineering and Automated Learning – IDEAL 2016. IDEAL 2016. Lecture Notes in Computer Science(), vol 9937. Springer, Cham. https://doi.org/10.1007/978-3-319-46257-8_15
DOI: https://doi.org/10.1007/978-3-319-46257-8_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46256-1
Online ISBN: 978-3-319-46257-8
eBook Packages: Computer Science; Computer Science (R0)