
Feature Selection in Click-Through Rate Prediction Based on Gradient Boosting

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9937)

Abstract

Click-Through Rate (CTR) prediction is one of the key techniques in computational advertising. At present, CTR prediction is commonly carried out with linear models combined with \(L_1\) regularization, which depend on prior feature engineering such as feature normalization and cross combination; as a result, the model cannot learn features automatically. Drawing on ensemble methods, this paper proposes a feature selection algorithm based on gradient boosting. The algorithm combines Gradient Boosting Decision Trees (GBDT) with Logistic Regression (LR) and is evaluated empirically on the Kaggle CTR prediction dataset for display ads. The experimental results verify the feasibility and validity of the feature selection method and show an improvement in the performance of the CTR prediction model, whose AUC reaches 0.908.
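As a concrete illustration of the pipeline sketched in the abstract, the following Python/scikit-learn snippet shows one common way to use gradient boosting for feature selection and then fit an \(L_1\)-regularized logistic regression, reporting AUC. The synthetic data, the median importance threshold, and all hyperparameters are illustrative assumptions; this is a minimal sketch, not the paper's exact algorithm, dataset, or result.

# Minimal sketch (not the paper's exact algorithm): rank features with a GBDT,
# keep the most important ones, then fit an L1-regularized logistic regression
# and report AUC. Data and hyperparameters are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a CTR dataset: many features, few of them informative,
# with a class imbalance resembling click vs. no-click.
X, y = make_classification(n_samples=20000, n_features=100, n_informative=15,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

# Step 1: fit a GBDT and use its feature importances to select features.
gbdt = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=0)
selector = SelectFromModel(gbdt, threshold="median")  # keep the top half
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# Step 2: L1-regularized logistic regression on the selected features.
lr = make_pipeline(StandardScaler(),
                   LogisticRegression(penalty="l1", solver="liblinear", C=1.0))
lr.fit(X_train_sel, y_train)

# Step 3: evaluate with AUC on held-out data.
auc = roc_auc_score(y_test, lr.predict_proba(X_test_sel)[:, 1])
print(f"AUC on held-out data: {auc:.3f}")

A related variant common in the CTR literature keeps the GBDT as a feature transformer instead, one-hot encoding the index of the leaf each sample reaches in every tree and feeding those indicators to the LR; which variant the paper adopts should be checked against the full text.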



Author information


Corresponding author

Correspondence to Zheng Wang.



Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Wang, Z., Yu, Q., Shen, C., Hu, W. (2016). Feature Selection in Click-Through Rate Prediction Based on Gradient Boosting. In: Yin, H., et al. (eds.) Intelligent Data Engineering and Automated Learning – IDEAL 2016. Lecture Notes in Computer Science, vol. 9937. Springer, Cham. https://doi.org/10.1007/978-3-319-46257-8_15

  • DOI: https://doi.org/10.1007/978-3-319-46257-8_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46256-1

  • Online ISBN: 978-3-319-46257-8

