Abstract
E-learning repositories and digital libraries are fast becoming important sources of information and learning material. Such systems must therefore provide services that support the learning needs of their users. When a retrieval system shows how its documents relate to each other semantically, users are free to choose among different materials and to direct their study in a focused manner. This calls for a model that identifies types of document relationships, each addressing a different aspect of learning. This article defines three such types and a statistical model that can automatically identify them in technical and scientific documents. The model defines measures that quantify the degree of relatedness based on distinct statistical patterns exhibited by the terms common to a pair of documents. The approach does not strictly require a knowledge base or hypertext to identify the characteristic relationship between two documents. The model can be extended to further relatedness types and used alongside other techniques in digital library recommendation engines. Experiments over a large number of technical documents show that the techniques effectively extract the different types of relationships between documents.
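The abstract does not state the relatedness measures themselves, so the following is only a minimal sketch of the general idea of scoring a document pair from the statistics of its common terms. The regex tokenizer, the function names, and the use of plain cosine similarity over term counts are all assumptions made for illustration; they are a stand-in and not the article's model.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Count lowercase alphabetic tokens (a simplifying assumption)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def common_term_profile(doc_a, doc_b):
    """Map each term found in both documents to (count in a, count in b)."""
    a, b = term_counts(doc_a), term_counts(doc_b)
    return {t: (a[t], b[t]) for t in a.keys() & b.keys()}


def cosine_relatedness(doc_a, doc_b):
    """Hypothetical stand-in score: cosine similarity of raw term counts.

    Only the common terms contribute to the numerator; terms unique to
    one document only lengthen that document's vector.
    """
    a, b = term_counts(doc_a), term_counts(doc_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0
```

A profile such as the one returned by `common_term_profile` is the kind of raw input from which pair-level statistics could be derived. A single symmetric score like this one cannot by itself separate different relationship types.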
About this article
Cite this article
Muralikumar, J., Seelan, S.A., Vijayakumar, N. et al. A statistical approach for modeling inter-document semantic relationships in digital libraries. J Intell Inf Syst 48, 477–498 (2017). https://doi.org/10.1007/s10844-016-0423-6
DOI: https://doi.org/10.1007/s10844-016-0423-6