Volume 102, Issue 1, pp 905–927

Identifying the landscape of Alzheimer’s disease research with network and content analysis

  • Min Song
  • Go Eun Heo
  • Dahee Lee


Alzheimer’s disease (AD) is a degenerative brain disease whose cause is hard to diagnose accurately. As the number of AD patients has increased, researchers have strived to understand the disease and develop treatments through approaches such as medical experiments and literature analysis. In the area of literature analysis, several earlier studies analyzed the literature at the macro level, by author, journal, and institution. However, analyzing the literature at both the macro and micro levels allows a better understanding of the AD research field. Therefore, in this study we adopt a more comprehensive approach to analyzing the AD literature, consisting of productivity analysis (year, journal/proceeding, author, and Medical Subject Heading terms), network analysis (co-occurrence frequency, centrality, and community), and content analysis. To this end, we collect metadata for 96,081 articles retrieved from PubMed. Specifically, we perform concept graph-based network analysis, applying five centrality measures after mapping the semantic relationships between UMLS concepts in the AD literature. We also analyze time-series topical trends using the Dirichlet multinomial regression topic modeling technique. The results indicate that 2013 is the most productive year and the Journal of Alzheimer’s Disease the most productive journal. Among the core biological entities and relationships found in the AD-related PubMed literature, the relationship with glycogen storage disease is the most frequently mentioned. In addition, we analyze 16 main topics of the AD literature and find a noticeable increasing trend in the topic of transgenic mouse.
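The co-occurrence and centrality steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. The concept lists are hypothetical toy data rather than UMLS output, the function names `cooccurrence_counts` and `degree_centrality` are invented for illustration, and degree centrality is used only as an example because the abstract does not name the five measures applied.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count how many documents mention each unordered pair of concepts."""
    counts = Counter()
    for concepts in docs:
        # Deduplicate within a document and sort so (a, b) and (b, a) collapse.
        for a, b in combinations(sorted(set(concepts)), 2):
            counts[(a, b)] += 1
    return counts


def degree_centrality(counts):
    """Fraction of the other concepts that each concept co-occurs with."""
    neighbors = {}
    for a, b in counts:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


# Hypothetical toy input: one list of extracted concepts per article.
docs = [
    ["amyloid beta", "tau protein", "transgenic mouse"],
    ["amyloid beta", "tau protein"],
    ["amyloid beta", "apolipoprotein E"],
]

counts = cooccurrence_counts(docs)
centrality = degree_centrality(counts)

# Print the concepts from most to least connected.
for concept, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{concept}: {score:.2f}")
```

In this toy graph the edge weights in `counts` play the role of co-occurrence frequency, and a concept that appears alongside every other concept receives the maximum degree centrality of 1.0.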


Keywords: Alzheimer’s disease (AD); Bibliometrics; Document representation; Concept graph; Topic modeling



This work was supported by the Bio-Synergy Research Project (2013M3A9C4078138) of the Ministry of Science, ICT and Future Planning through the National Research Foundation.



Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2014

Authors and Affiliations

  1. Department of Library and Information Science, Yonsei University, Seoul, Korea
