Analysis of the prediction capability of web search data based on the HE-TDC method ‒ prediction of the volume of daily tourism visitors

  • Geng Peng
  • Ying Liu
  • Jiyuan Wang
  • Jifa Gu


Web search query data reflect social hot spots and can serve as novel economic indicators. When query data are high-dimensional, it is critical to select keywords that have plausible predictive ability and thereby reduce dimensionality. This paper presents a new integrative method that combines Hurst exponent (HE) and time difference correlation (TDC) analysis to select keywords with strong predictive ability. The method, called the HE-TDC screening method, requires a predictive keyword to satisfy two conditions: high correlation with the target series, and fluctuation memorability similar to that of the target series. An empirical study applies the method to predicting the daily volume of tourism visitors in the Jiuzhai Valley scenic area. The study shows that keywords selected with the HE-TDC method produce a model with better robustness and predictive ability.
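The two screening conditions named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the Hurst exponent is estimated or which thresholds are used, so the classical rescaled-range (R/S) estimator, the restriction to lags where the keyword leads the target, and the names and defaults `hurst_rs`, `time_difference_correlation`, `he_tdc_screen`, `min_corr`, `max_hurst_gap` and `max_lag` are all assumptions made for illustration.

```python
import math

def _mean(values):
    return sum(values) / len(values)

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    ma, mb = _mean(a), _mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def hurst_rs(series, min_window=8):
    """Estimate the Hurst exponent by rescaled-range (R/S) analysis.

    Assumption: the classical R/S estimator over dyadic window sizes;
    the paper may use a different estimator.
    """
    x = list(series)
    n = len(x)
    log_sizes, log_rs = [], []
    size = min_window
    while size <= n // 2:
        ratios = []
        for start in range(0, n - size + 1, size):
            chunk = x[start:start + size]
            m = _mean(chunk)
            std = math.sqrt(sum((v - m) ** 2 for v in chunk) / size)
            if std == 0:
                continue  # a constant chunk has no defined R/S value
            cum, lo, hi = 0.0, 0.0, 0.0
            for v in chunk:  # range of the cumulative deviations from the mean
                cum += v - m
                lo, hi = min(lo, cum), max(hi, cum)
            ratios.append((hi - lo) / std)
        if ratios:
            log_sizes.append(math.log(size))
            log_rs.append(math.log(_mean(ratios)))
        size *= 2
    if len(log_sizes) < 2:
        raise ValueError("series too short for R/S analysis")
    # Least-squares slope of log(R/S) against log(window size).
    ms, mr = _mean(log_sizes), _mean(log_rs)
    num = sum((s - ms) * (r - mr) for s, r in zip(log_sizes, log_rs))
    den = sum((s - ms) ** 2 for s in log_sizes)
    return num / den

def time_difference_correlation(target, keyword, max_lag=7):
    """Return (lag, r) maximising |corr(keyword[t - lag], target[t])| for 0 <= lag <= max_lag.

    Assumption: only lags where the keyword leads the target are scanned.
    """
    n = len(target)
    best_lag, best_r = 0, 0.0
    for lag in range(max_lag + 1):
        r = pearson(keyword[:n - lag], target[lag:])
        if abs(r) > abs(best_r):
            best_lag, best_r = lag, r
    return best_lag, best_r

def he_tdc_screen(target, keywords, min_corr=0.6, max_hurst_gap=0.1, max_lag=7):
    """Keep keywords with high lagged correlation and a Hurst exponent close to the target's.

    The thresholds are hypothetical defaults, not values from the paper.
    """
    h_target = hurst_rs(target)
    selected = {}
    for name, series in keywords.items():
        lag, r = time_difference_correlation(target, series, max_lag)
        h = hurst_rs(series)
        if abs(r) >= min_corr and abs(h - h_target) <= max_hurst_gap:
            selected[name] = (lag, r, h)
    return selected
```

Here `keywords` would be a dictionary mapping each candidate query term to its daily search-volume series, aligned day by day with the visitor-volume series passed as `target`; the result maps each retained keyword to its best lag, its correlation at that lag, and its estimated Hurst exponent.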


Keywords: tourism visitor volume prediction; web-search data; HE-TDC method; Jiuzhai Valley; time series; Hurst exponent





Copyright information

© Systems Engineering Society of China and Springer-Verlag Berlin Heidelberg 2017

Authors and Affiliations

  1. School of Economics and Management, University of Chinese Academy of Sciences, Beijing, China
  2. Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing, China
  3. Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
