The VLDB Journal, Volume 28, Issue 1, pp. 123–145

Exploring market competition over topics in spatio-temporal document collections

  • Kaiqi Zhao
  • Gao Cong
  • Jin-Yao Chin
  • Rong Wen
Regular Paper


With the prominence of location-based services and social networks in recent years, large spatio-temporal document collections (e.g., geo-tagged tweets) have been generated. These collections often reflect users’ opinions of different products, and can therefore help business owners explore the hot topics around their brands, as well as how those brands compete with other brands, in different spatial regions and time periods. In this work, we aim to mine the topics, and the market competition among brands on each topic, for a category of business (e.g., coffeehouses) from the spatio-temporal documents within a user-specified region and time period. To support such spatio-temporal search online in an exploratory manner, we propose a novel framework equipped with (1) a generative model for mining topics and market competition, (2) an Octree-based offline pre-training method for the model, and (3) an efficient algorithm that combines pre-trained models to return the topics, and the market competition on each topic, for a user-specified region and time span. Extensive experiments show that our framework improves runtime by up to an order of magnitude compared with baselines while achieving similar model quality in terms of training log-likelihood.
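The abstract does not say how the Octree is built or how pre-trained models are selected for a query. As a minimal sketch only, assume each Octree node covers an axis-aligned box in (x, y, t) space and stores a model pre-trained on the documents inside that box. A user-specified region and time span then form a query box, which can be decomposed into nodes lying fully inside the query (whose pre-trained models could be reused directly) and leaf nodes that only partly overlap it (which would need extra on-line work). The names `Box`, `decompose_query`, and `max_depth` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical box in (x, y, t) space: (x0, x1, y0, y1, t0, t1),
# lower bounds inclusive, upper bounds exclusive.
Box = Tuple[float, float, float, float, float, float]


def contains(outer: Box, inner: Box) -> bool:
    """True if `inner` lies entirely within `outer` on all three axes."""
    return all(outer[2 * i] <= inner[2 * i] and inner[2 * i + 1] <= outer[2 * i + 1]
               for i in range(3))


def intersects(a: Box, b: Box) -> bool:
    """True if the two half-open boxes overlap with positive volume."""
    return all(a[2 * i] < b[2 * i + 1] and b[2 * i] < a[2 * i + 1] for i in range(3))


def children(cell: Box) -> List[Box]:
    """Split a cell at its midpoint on x, y and t, giving the 8 Octree children."""
    x0, x1, y0, y1, t0, t1 = cell
    xm, ym, tm = (x0 + x1) / 2, (y0 + y1) / 2, (t0 + t1) / 2
    return [(xa, xb, ya, yb, ta, tb)
            for xa, xb in ((x0, xm), (xm, x1))
            for ya, yb in ((y0, ym), (ym, y1))
            for ta, tb in ((t0, tm), (tm, t1))]


def _decompose(cell: Box, query: Box, depth: int, max_depth: int,
               covered: List[Box], partial: List[Box]) -> None:
    if not intersects(cell, query):
        return  # node is irrelevant to this query
    if contains(query, cell):
        covered.append(cell)  # fully inside: its pre-trained model could be reused
        return
    if depth == max_depth:
        partial.append(cell)  # leaf that only partly overlaps the query
        return
    for child in children(cell):
        _decompose(child, query, depth + 1, max_depth, covered, partial)


def decompose_query(root: Box, query: Box, max_depth: int) -> Tuple[List[Box], List[Box]]:
    """Return (fully covered nodes, partially overlapping leaves) for a query box."""
    covered: List[Box] = []
    partial: List[Box] = []
    _decompose(root, query, 0, max_depth, covered, partial)
    return covered, partial


if __name__ == "__main__":
    # Root covers an 8 x 8 area over 8 time units; the query is the
    # lower-left 4 x 4 region over the whole time span.
    covered, partial = decompose_query((0, 8, 0, 8, 0, 8), (0, 4, 0, 4, 0, 8), max_depth=3)
    print(len(covered), len(partial))
```

In this example the query aligns with Octree boundaries, so it is answered by two fully covered nodes and no partial leaves. Queries that cut through cells leave partial leaves at `max_depth`; how such leaves are handled, and how the selected pre-trained models are actually combined, is specific to the paper's algorithm and is not modeled here.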


Keywords: Topic model · Exploratory search · Algorithms · Gibbs sampling · Spatio-temporal data



This work was supported in part by a MOE Tier-2 grant MOE2016-T2-1-137, a MOE Tier-1 grant RG31/17, and NSFC under the grant 61772537. It was also partially supported under the A*STAR TSRP fund 1424200021.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Nanyang Technological University, Singapore, Singapore
  2. Singapore Institute of Manufacturing Technology, Singapore, Singapore
