Temporal Topic Modeling of Scholarly Publications for Future Trend Forecasting

Bhopale, Amol P.; Shevgoor, Sowmya Kamath

doi:10.1007/978-3-319-72413-3_10


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10721)

Included in the following conference series:

International Conference on Big Data Analytics

Abstract

The volume of scholarly articles published each year has grown exponentially. With this growth in both core and interdisciplinary areas of research, analyzing research trends can help new researchers and organizations geared towards collaborative work. Existing approaches use unsupervised learning methods such as clustering to group articles with similar characteristics for topic discovery, but with low accuracy. Efficient and fast topic discovery models and future trend forecasters can help in building intelligent applications such as recommender systems for scholarly articles. This paper proposes a novel approach that automatically discovers topics (latent factors) from a large set of text documents using association rule mining on frequent itemsets. Temporal correlation analysis is used to find correlations among a set of topics, for improved prediction. To predict the popularity of a topic in the near future, time series analysis based on a set of topic vectors is performed. For experimental validation, a dataset comprising 17 years' worth of computer science scholarly articles published through standard IEEE conferences was used, and the proposed approach achieved meaningful results.
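
The three stages named in the abstract (frequent-itemset topics, temporal correlation between topics, and popularity forecasting) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the function names, the support threshold, and the use of a plain linear trend as the forecaster are all assumptions made for illustration, and only keyword pairs are mined rather than itemsets of arbitrary size.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical toy corpus: (publication year, keyword set) per article.
DOCS = [
    (2014, {"cloud", "security"}),
    (2014, {"cloud", "storage"}),
    (2015, {"cloud", "security"}),
    (2015, {"deep", "learning"}),
    (2015, {"cloud", "security", "privacy"}),
    (2016, {"deep", "learning"}),
    (2016, {"deep", "learning", "vision"}),
    (2016, {"cloud", "security"}),
]


def frequent_pairs(docs, min_support):
    """Keyword pairs that co-occur in at least min_support documents."""
    counts = Counter()
    for _, words in docs:
        counts.update(combinations(sorted(words), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


def confidence(docs, antecedent, consequent):
    """Confidence of the association rule antecedent -> consequent."""
    with_antecedent = [w for _, w in docs if antecedent in w]
    if not with_antecedent:
        return 0.0
    return sum(consequent in w for w in with_antecedent) / len(with_antecedent)


def yearly_series(docs, topic, years):
    """Per-year count of documents containing every keyword of the topic."""
    return [sum(1 for y, w in docs if y == year and set(topic) <= w)
            for year in years]


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if one is constant)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def linear_forecast(series):
    """Extrapolate a least-squares line one step ahead (stand-in forecaster).

    Assumes the series has at least two points.
    """
    n = len(series)
    mt, my = (n - 1) / 2, sum(series) / n
    slope = (sum((t - mt) * (y - my) for t, y in enumerate(series))
             / sum((t - mt) ** 2 for t in range(n)))
    return my + slope * (n - mt)


if __name__ == "__main__":
    years = [2014, 2015, 2016]
    topics = frequent_pairs(DOCS, min_support=3)
    series = {t: yearly_series(DOCS, t, years) for t in topics}
    for topic, counts in series.items():
        print(topic, counts, "next year:", linear_forecast(counts))
    a, b = list(series.values())
    print("correlation between the two topics:", pearson(a, b))
```

Each frequent pair is treated here as one candidate topic, and its yearly document count serves as a one-dimensional stand-in for the topic vectors mentioned in the abstract. A real pipeline would first tokenize, tag, and stem the article text, and would replace the linear trend with a proper time series model.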

Author information

Authors and Affiliations

Department of Information Technology, National Institute of Technology Karnataka, Surathkal, India
Amol P. Bhopale & Sowmya Kamath Shevgoor

Authors

Amol P. Bhopale
Sowmya Kamath Shevgoor

Corresponding author

Correspondence to Amol P. Bhopale.

Editor information

Editors and Affiliations

International Institute of Information Technology, Hyderabad, India
P. Krishna Reddy
Rajiv Gandhi Education City, Sonepat, India
Ashish Sureka
University of Texas at Arlington, Arlington, Texas, USA
Sharma Chakravarthy
University of Aizu, Aizu-Wakamatsu, Japan
Subhash Bhalla

Cite this paper

Bhopale, A.P., Shevgoor, S.K. (2017). Temporal Topic Modeling of Scholarly Publications for Future Trend Forecasting. In: Reddy, P., Sureka, A., Chakravarthy, S., Bhalla, S. (eds) Big Data Analytics. BDA 2017. Lecture Notes in Computer Science, vol 10721. Springer, Cham. https://doi.org/10.1007/978-3-319-72413-3_10

DOI: https://doi.org/10.1007/978-3-319-72413-3_10
Published: 25 November 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-72412-6
Online ISBN: 978-3-319-72413-3
eBook Packages: Computer Science, Computer Science (R0)
