Factorized weight interaction neural networks for sparse feature prediction

Abstract

Non-contiguous, categorical sparse feature data are widespread on the Internet. To build a machine learning system on such data, it is important to model the interactions among features properly. In this paper, we propose a factorized weight interaction neural network (INN) with a new network structure, called the weight-interaction layer, which learns patterns from feature interactions together with factorized weight parameters for each feature interaction. The weight-interaction layer greatly reduces the dimension of the sparse data, while the multi-layer neural network captures high-order latent feature patterns. Experimental results on two real datasets show that the proposed method improves both prediction accuracy and generalization performance, and consistently outperforms the related methods it is compared with.
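The abstract does not give the exact form of the weight-interaction layer, so the following is only a rough sketch of the general idea under stated assumptions, not the authors' formulation. It assumes a factorization-machine-style embedding table `V`, and it assumes that the weight of each pairwise interaction is itself factorized as the dot product of two low-rank vectors taken from a second table `U`. All names (`weight_interaction_layer`, `mlp_predict`, `V`, `U`, the sizes) are hypothetical.

```python
import numpy as np

def weight_interaction_layer(active_idx, V, U):
    """Illustrative sketch only; NOT the paper's exact layer.

    active_idx: indices of the non-zero (one-hot) features of one sample.
    V: (n_features, k) embedding table (assumed, FM-style).
    U: (n_features, r) low-rank factors (assumed); the weight of the
       interaction between features i and j is modelled as U[i] @ U[j].
    Returns a dense k-dimensional vector pooling all pairwise interactions.
    """
    emb = V[active_idx]        # (m, k) embeddings of the active features
    fac = U[active_idx]        # (m, r) weight factors of the active features
    weights = fac @ fac.T      # (m, m) factorized interaction weights
    out = np.zeros(V.shape[1])
    m = len(active_idx)
    for i in range(m):
        for j in range(i + 1, m):
            # Weighted element-wise product of the two embeddings.
            out += weights[i, j] * emb[i] * emb[j]
    return out

def mlp_predict(x, W1, b1, w2, b2):
    """One hidden ReLU layer followed by a sigmoid output (assumed head)."""
    h = np.maximum(0.0, x @ W1 + b1)
    logit = h @ w2 + b2
    return 1.0 / (1.0 + np.exp(-logit))

# Toy usage: 1000 sparse features, 3 of them active in this sample.
rng = np.random.default_rng(0)
V = rng.normal(scale=0.1, size=(1000, 8))
U = rng.normal(scale=0.1, size=(1000, 4))
dense = weight_interaction_layer([3, 250, 999], V, U)  # 1000-dim -> 8-dim
W1 = rng.normal(scale=0.1, size=(8, 16))
prob = mlp_predict(dense, W1, np.zeros(16), rng.normal(scale=0.1, size=16), 0.0)
```

In this sketch the sparse input over 1000 features is reduced to an 8-dimensional dense vector before the multi-layer network sees it, which mirrors the dimension-reduction role the abstract attributes to the weight-interaction layer.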


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants 61573316, 61873082 and U1509207.

Author information

Corresponding author

Correspondence to Weiguo Sheng.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Zou, D., Sheng, M., Yu, H. et al. Factorized weight interaction neural networks for sparse feature prediction. Neural Comput & Applic 32, 9567–9579 (2020). https://doi.org/10.1007/s00521-019-04470-9

Keywords

  • Neural network
  • Sparse data
  • Factorization machine
  • Feature interaction