Skip to main content

Temporal Topic Modeling of Scholarly Publications for Future Trend Forecasting

  • Conference paper
Big Data Analytics (BDA 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10721)

Abstract

The volume of scholarly articles published every year has grown exponentially. With this growth in both core and interdisciplinary areas of research, analyzing research trends can help new researchers and organizations geared towards collaborative work. Existing approaches have used unsupervised learning methods such as clustering to group articles with similar characteristics for topic discovery, but with low accuracy. Efficient and fast topic discovery models and future trend forecasters can support intelligent applications such as recommender systems for scholarly articles. This paper proposes a novel approach that automatically discovers topics (latent factors) from a large set of text documents by applying association rule mining to frequent itemsets. Temporal correlation analysis is used to find correlations among a set of topics, which improves prediction. To predict the popularity of a topic in the near future, time series analysis is performed on a set of topic vectors. For experimental validation, a dataset of 17 years' worth of computer science scholarly articles published in standard IEEE conferences was used, and the proposed approach achieved meaningful results.
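The abstract does not specify the mining algorithm, the support and confidence thresholds, or the text preprocessing, so the following is only a minimal sketch of topic discovery through association rules on frequent itemsets, not the authors' implementation. It assumes each article has already been reduced to a set of keywords, uses a plain Apriori-style level-wise search as a stand-in miner, and reads each high-confidence rule's full keyword set as one candidate topic. The function names (`frequent_itemsets`, `association_rules`, `candidate_topics`), thresholds, and toy documents are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style level-wise search for all itemsets with support >= min_support.

    Assumption: each transaction is one document's keyword set; support is the
    fraction of documents that contain every keyword of the itemset.
    """
    n = len(transactions)
    docs = [frozenset(t) for t in transactions]

    # Level 1: frequent single keywords.
    counts = {}
    for doc in docs:
        for item in doc:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {k: c / n for k, c in counts.items() if c / n >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Join step: unions of two frequent (k-1)-itemsets with exactly k items.
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count the surviving candidates against the documents.
        current = {}
        for cand in candidates:
            support = sum(1 for doc in docs if cand <= doc) / n
            if support >= min_support:
                current[cand] = support
        result.update(current)
        k += 1
    return result


def association_rules(itemsets, min_confidence):
    """Derive rules X -> Y with confidence = support(X u Y) / support(X)."""
    rules = []
    for itemset, support in itemsets.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for ante in combinations(itemset, r):
                antecedent = frozenset(ante)
                # Every subset of a frequent itemset is frequent, so this lookup succeeds.
                confidence = support / itemsets[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, support, confidence))
    return rules


def candidate_topics(rules):
    """Hypothetical grouping step: read each rule's full keyword set as one topic."""
    return {antecedent | consequent for antecedent, consequent, _, _ in rules}


if __name__ == "__main__":
    # Toy keyword sets standing in for preprocessed article abstracts.
    docs = [
        {"topic", "model", "lda"},
        {"topic", "model", "text"},
        {"topic", "lda", "text"},
        {"neural", "network", "image"},
        {"neural", "network", "text"},
    ]
    itemsets = frequent_itemsets(docs, min_support=0.4)
    rules = association_rules(itemsets, min_confidence=0.9)
    for topic in candidate_topics(rules):
        print(sorted(topic))
```

With these toy documents and thresholds, the high-confidence rules group `topic` with `model` and with `lda`, and `neural` with `network`. Lowering `min_confidence` to 0.6 would also admit weaker rules such as `topic -> text`, so in practice the thresholds control how coarse or fine the discovered topics are.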

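The abstract also does not name the correlation measure or the forecasting model. As a minimal sketch, assuming each discovered topic has already been turned into a yearly popularity series (for example, the number of articles per year that match the topic), Pearson correlation with an optional lag can stand in for the temporal correlation step, and an ordinary least-squares linear trend can stand in for the time series forecaster. These are swapped-in techniques chosen for illustration; all function names and the toy series are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def lagged_correlation(leader, follower, lag):
    """Correlate leader[t] with follower[t + lag]: does `leader` anticipate `follower`?"""
    if lag == 0:
        return pearson(leader, follower)
    return pearson(leader[:-lag], follower[lag:])


def correlated_pairs(popularity, threshold, lag=0):
    """Return (topic_i, topic_j, r) for every ordered pair with |r| >= threshold."""
    pairs = []
    for ti, xi in popularity.items():
        for tj, xj in popularity.items():
            if ti != tj:
                r = lagged_correlation(xi, xj, lag)
                if abs(r) >= threshold:
                    pairs.append((ti, tj, r))
    return pairs


def linear_trend_forecast(series, steps=1):
    """Stand-in forecaster: fit y = a + b*t by least squares, extrapolate `steps` ahead."""
    n = len(series)
    t = list(range(n))
    mt, my = mean(t), mean(series)
    b = sum((ti - mt) * (yi - my) for ti, yi in zip(t, series)) / sum(
        (ti - mt) ** 2 for ti in t
    )
    a = my - b * mt
    return [a + b * (n - 1 + s) for s in range(1, steps + 1)]


if __name__ == "__main__":
    # Hypothetical yearly article counts per topic (assumed topic vectors).
    popularity = {
        "topic_a": [1, 2, 3, 4, 5],
        "topic_b": [2, 4, 6, 8, 10],
        "topic_c": [5, 4, 3, 2, 1],
    }
    print(correlated_pairs(popularity, threshold=0.9))
    print(linear_trend_forecast(popularity["topic_a"], steps=2))
```

A positive lag asks whether one topic's rise tends to precede another's, which is one way correlated topics could feed a forecast. A richer model (for example ARIMA or an ensemble of regressors) could replace `linear_trend_forecast` without changing how the popularity series are prepared.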
Author information

Correspondence to Amol P. Bhopale.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Bhopale, A.P., Shevgoor, S.K. (2017). Temporal Topic Modeling of Scholarly Publications for Future Trend Forecasting. In: Reddy, P., Sureka, A., Chakravarthy, S., Bhalla, S. (eds) Big Data Analytics. BDA 2017. Lecture Notes in Computer Science, vol 10721. Springer, Cham. https://doi.org/10.1007/978-3-319-72413-3_10

  • DOI: https://doi.org/10.1007/978-3-319-72413-3_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-72412-6

  • Online ISBN: 978-3-319-72413-3

  • eBook Packages: Computer Science, Computer Science (R0)
