MeKG: Building a Medical Knowledge Graph by Data Mining from MEDLINE

  • Thuan Pham (corresponding author)
  • Xiaohui Tao
  • Ji Zhang
  • Jianming Yong
  • Xujuan Zhou
  • Raj Gururajan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11976)


Mining data at the knowledge level can help improve the performance of decision support systems. This study built a knowledge graph from MEDLINE, a database containing a large number of articles in the medical domain. MEDLINE indexes its documents with Medical Subject Headings (MeSH). Using MeSH, articles corresponding to particular medical subjects were extracted from MEDLINE. With MeSH serving as the backbone of the knowledge base, the MEDLINE articles were used to generate instances that populate it. This approach yielded a knowledge graph capable of discovering hidden knowledge among MeSH concepts. The knowledge graph had a significant effect on improving the quality of healthcare. The main contribution of the research is a framework for building knowledge bases. Moreover, the approach provides healthcare researchers with an essential resource at the knowledge level.
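The abstract describes the approach only at a high level. As a rough illustration of the general idea (MeSH concepts as the backbone, MEDLINE articles as the instances that populate it), here is a minimal sketch, not the authors' implementation. The article records and `pmid-*` identifiers are hypothetical toy data. Linking two concepts by how often they are indexed together in the same article is an assumption chosen for illustration; the abstract does not specify how the paper discovers relations between concepts.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: each MEDLINE article (keyed by a made-up ID)
# is indexed with a set of MeSH descriptor names.
ARTICLES = {
    "pmid-1": {"Diabetes Mellitus", "Insulin", "Obesity"},
    "pmid-2": {"Diabetes Mellitus", "Obesity"},
    "pmid-3": {"Hypertension", "Obesity"},
}


def build_graph(articles):
    """Build a simple graph with MeSH concepts as the backbone nodes.

    Returns:
      instances: MeSH concept -> set of article IDs indexed with it
      edges: (concept_a, concept_b) -> number of articles indexed with both
             (co-occurrence weighting is an illustrative assumption)
    """
    instances = defaultdict(set)
    edges = defaultdict(int)
    for article_id, headings in articles.items():
        # Each article becomes an instance of every concept it is indexed with.
        for heading in headings:
            instances[heading].add(article_id)
        # Sorting gives each unordered concept pair one canonical key.
        for a, b in combinations(sorted(headings), 2):
            edges[(a, b)] += 1
    return dict(instances), dict(edges)


def related(edges, concept):
    """List concepts linked to `concept`, strongest co-occurrence first."""
    scores = {}
    for (a, b), weight in edges.items():
        if a == concept:
            scores[b] = weight
        elif b == concept:
            scores[a] = weight
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    instances, edges = build_graph(ARTICLES)
    print(sorted(instances["Obesity"]))  # ['pmid-1', 'pmid-2', 'pmid-3']
    print(related(edges, "Obesity"))
    # [('Diabetes Mellitus', 2), ('Hypertension', 1), ('Insulin', 1)]
```

Keeping concept-to-article links (`instances`) separate from concept-to-concept links (`edges`) mirrors the split described in the abstract between the MeSH backbone and the article instances populating it. Indirect associations, such as a concept pair that never co-occurs but shares many neighbours, would need a further step that this sketch does not attempt.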


Keywords: MeSH, MEDLINE, Knowledge graph, Data mining


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Thuan Pham (corresponding author), Xiaohui Tao, Ji Zhang
    School of Sciences, University of Southern Queensland, Toowoomba, Australia
  • Jianming Yong, Xujuan Zhou, Raj Gururajan
    School of Management and Enterprise, University of Southern Queensland, Toowoomba, Australia