A statistical approach for modeling inter-document semantic relationships in digital libraries

Muralikumar, Jeyavaishnavi; Seelan, Sri Ananda; Vijayakumar, Narendranath; Balasubramanian, Vidhya

doi:10.1007/s10844-016-0423-6


Published: 18 July 2016

Journal of Intelligent Information Systems, volume 48, pages 477–498 (2017)


Abstract

E-learning repositories and digital libraries are fast becoming important sources of information and learning material. Such systems must therefore provide services that support the learning needs of their users. When a retrieval system shows how its documents relate to each other semantically, users are free to choose among different materials and to direct their study in a focused manner. This calls for a model that identifies types of document relationships that address different aspects of learning. This article defines three such types and a unique statistical model that can automatically identify them in technical and scientific documents. The model defines measures that quantify the degree of relatedness based on distinct statistical patterns exhibited by the terms common to a pair of documents. This approach does not strictly require a knowledge base or hypertext to identify the characteristic relationship between two documents. Such a statistical model can be extended to cover further relatedness types and can be used alongside various other techniques in digital library recommendation engines. Our experiments over a large number of technical documents show that our techniques effectively extract the different types of relationships between documents.
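The abstract does not state the model's actual measures, so the following is only a minimal illustrative sketch of the general idea of examining the terms common to a pair of documents. The function names (`tokenize`, `common_term_profile`, `coverage`), the regular-expression tokenizer, and the use of relative frequency are all assumptions made for illustration, not the authors' method.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic tokens (an assumed, naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def common_term_profile(doc_a, doc_b):
    """Map each term found in both documents to (relative freq in A, relative freq in B)."""
    counts_a = Counter(tokenize(doc_a))
    counts_b = Counter(tokenize(doc_b))
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    shared = counts_a.keys() & counts_b.keys()
    # If either document is empty, `shared` is empty and no division happens.
    return {
        term: (counts_a[term] / total_a, counts_b[term] / total_b)
        for term in sorted(shared)
    }


def coverage(profile):
    """Fraction of each document's tokens accounted for by the shared terms."""
    cov_a = sum(freq_a for freq_a, _ in profile.values())
    cov_b = sum(freq_b for _, freq_b in profile.values())
    return cov_a, cov_b


# Hypothetical example documents.
doc_a = "sorting algorithms compare keys sorting is fundamental"
doc_b = "quicksort is a sorting algorithm that partitions keys"

profile = common_term_profile(doc_a, doc_b)
for term, (freq_a, freq_b) in profile.items():
    print(f"{term}: {freq_a:.3f} in A, {freq_b:.3f} in B")
print("coverage:", coverage(profile))
```

In this sketch, a shared term that is frequent in one document but rare in the other shows up as an asymmetric pair of frequencies, which is one kind of pattern a relatedness measure could build on. A practical pipeline would likely filter stop words such as "is" and restrict attention to technical terms before computing any statistic.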



Author information

Authors and Affiliations

Department of Computer Science and Engineering, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, Amrita University, Coimbatore, India
Jeyavaishnavi Muralikumar, Sri Ananda Seelan, Narendranath Vijayakumar & Vidhya Balasubramanian


Corresponding author

Correspondence to Vidhya Balasubramanian.


About this article

Cite this article

Muralikumar, J., Seelan, S.A., Vijayakumar, N. et al. A statistical approach for modeling inter-document semantic relationships in digital libraries. J Intell Inf Syst 48, 477–498 (2017). https://doi.org/10.1007/s10844-016-0423-6


Received: 07 February 2016
Revised: 01 July 2016
Accepted: 03 July 2016
Published: 18 July 2016
Issue Date: June 2017
DOI: https://doi.org/10.1007/s10844-016-0423-6
