Factorized weight interaction neural networks for sparse feature prediction

Zou, Dafang; Sheng, Mengmeng; Yu, Hui; Mao, Jiafa; Chen, Shengyong; Sheng, Weiguo

doi:10.1007/s00521-019-04470-9

Original Article
Published: 16 September 2019

Volume 32, pages 9567–9579, (2020)

Neural Computing and Applications

Dafang Zou¹, Mengmeng Sheng¹, Hui Yu¹, Jiafa Mao¹, Shengyong Chen¹ & Weiguo Sheng² (ORCID: orcid.org/0000-0001-9680-5126)

Abstract

Non-contiguous, categorical sparse feature data are widespread on the Internet. To build a machine learning system on such data, it is important to properly model the interactions among features. In this paper, we propose a factorized weight interaction neural network (INN) with a new network structure, called the weight-interaction layer, that learns patterns from feature interactions together with factorized weight parameters for each feature interaction. The proposed INN can greatly reduce the dimensionality of sparse data via the weight-interaction layer, while the multi-layer neural network captures high-order latent feature patterns. Experimental results on two real datasets show that the proposed method effectively improves the prediction accuracy and generalization performance of the model and consistently outperforms the related methods it is compared against.
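
The abstract does not give the layer's equations, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It assumes a factorization-machine-style design in which each active feature has an embedding and a separate weight-factor vector, the scalar weight of a feature pair is the inner product of the two weight factors, and the weighted element-wise products are sum-pooled into one dense vector that feeds a small multi-layer network. All names (`weight_interaction_layer`, `emb`, `wfac`, `predict`) and all sizes are hypothetical.

```python
import numpy as np

def weight_interaction_layer(active_idx, emb, wfac):
    """Pool weighted pairwise interactions of the active features.

    active_idx: (n,) integer indices of the non-zero features
    emb:        (n_features, k) feature embeddings (assumed)
    wfac:       (n_features, d) per-feature weight factors (assumed)
    Returns a dense vector of length k, whatever n_features is.
    """
    e = emb[active_idx]    # (n, k)
    w = wfac[active_idx]   # (n, d)
    pooled = np.zeros(emb.shape[1])
    n = len(active_idx)
    for i in range(n):
        for j in range(i + 1, n):
            pair_weight = w[i] @ w[j]              # factorized scalar weight
            pooled += pair_weight * (e[i] * e[j])  # weighted element-wise product
    return pooled

def predict(active_idx, emb, wfac, W1, b1, w2, b2):
    """One hidden ReLU layer on top of the pooled vector, then a sigmoid."""
    h = weight_interaction_layer(active_idx, emb, wfac)
    h = np.maximum(0.0, W1 @ h + b1)
    logit = w2 @ h + b2
    return 1.0 / (1.0 + np.exp(-logit))

# Toy, randomly initialized parameters (untrained; for shape checking only).
rng = np.random.default_rng(0)
n_features, k, d, hidden = 1000, 8, 4, 16
emb = rng.normal(scale=0.1, size=(n_features, k))
wfac = rng.normal(scale=0.1, size=(n_features, d))
W1 = rng.normal(scale=0.1, size=(hidden, k))
b1 = np.zeros(hidden)
w2 = rng.normal(scale=0.1, size=hidden)
b2 = 0.0

# A sparse example with three active features out of 1000.
active = np.array([3, 257, 998])
p = predict(active, emb, wfac, W1, b1, w2, b2)
print(p)
```

In this sketch the 1000-dimensional sparse input is reduced to a vector of length `k` before the dense layers, which is one way to read the abstract's claim about dimensionality reduction; the paper itself should be consulted for the actual layer definition.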

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants 61573316, 61873082 and U1509207.

Author information

Authors and Affiliations

Zhejiang University of Technology, Hangzhou, 310023, China
Dafang Zou, Mengmeng Sheng, Hui Yu, Jiafa Mao & Shengyong Chen
Hangzhou Normal University, Hangzhou, 311121, China
Weiguo Sheng

Corresponding author

Correspondence to Weiguo Sheng.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zou, D., Sheng, M., Yu, H. et al. Factorized weight interaction neural networks for sparse feature prediction. Neural Comput & Applic 32, 9567–9579 (2020). https://doi.org/10.1007/s00521-019-04470-9

Received: 12 February 2019
Accepted: 27 August 2019
Published: 16 September 2019
Issue Date: July 2020
DOI: https://doi.org/10.1007/s00521-019-04470-9
