Discovering Emerging Research Topics Based on SPO Predications

  • Zhengyin Hu (corresponding author)
  • Rong-Qiang Zeng
  • Lin Peng
  • Hongseng Pang
  • Xiaochu Qin
  • Cheng Guo
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1027)


With the rapid growth of the scientific literature, it is important to discover implicit knowledge in this vast body of information accurately and efficiently. To this end, we propose a percolation approach to discovering emerging research topics that combines text mining and scientometric methods based on Subject-Predication-Object (SPO) predications, each of which consists of a subject argument, an object argument, and the relation that binds them. First, SPO predications are extracted from the content of the literature and cleaned to construct SPO semantic networks. Then, community detection is performed on these networks. Next, two indicators, Research Topic Age (RTA) and Research Topic Authors Number (RTAN), are combined by a hypervolume-based selection (HBS) algorithm to identify potential emerging research topics among the communities. Finally, the scientific literature on stem cells is used as a case study, and the results indicate that the approach can discover emerging research topics effectively and accurately.
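The pipeline described in the abstract can be illustrated with a small sketch. It is not the authors' implementation, and nearly every detail is an assumption made for illustration: the record layout and toy data are hypothetical; connected components stand in for a real community detection algorithm; RTA is assumed to be a reference year minus the mean publication year of a community's predications; RTAN is assumed to be the number of distinct authors; a lower RTA and a higher RTAN are assumed to be preferable; and the hypervolume step ranks communities by their exclusive contribution to a two-objective hypervolume with an arbitrarily chosen reference point and no normalization.

```python
from collections import defaultdict

# Hypothetical records: (subject, predicate, object, publication_year, authors)
records = [
    ("iPSC", "TREATS", "Parkinson disease", 2016, {"A", "B"}),
    ("iPSC", "PRODUCES", "Neurons", 2017, {"B", "C"}),
    ("Neurons", "AFFECTS", "Parkinson disease", 2017, {"D"}),
    ("HSC", "TREATS", "Leukemia", 2005, {"E", "F", "H"}),
    ("HSC", "LOCATION_OF", "CD34", 2006, {"E", "I", "J"}),
    ("MSC", "AFFECTS", "Inflammation", 2012, {"G"}),
]

def communities(records):
    """Label each concept with its connected component.

    Assumption: components are only a stand-in for community detection.
    """
    adj = defaultdict(set)
    for s, _, o, _, _ in records:
        adj[s].add(o)
        adj[o].add(s)
    label = {}
    for start in sorted(adj):
        if start in label:
            continue
        label[start] = start
        stack = [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            for nb in adj[node]:
                if nb not in label:
                    label[nb] = start
                    stack.append(nb)
    return label

def indicators(records, label, now):
    """Return {community: (assumed RTA, assumed RTAN)}."""
    years, authors = defaultdict(list), defaultdict(set)
    for s, _, _, year, auth in records:
        years[label[s]].append(year)
        authors[label[s]] |= auth
    return {c: (now - sum(y) / len(y), len(authors[c])) for c, y in years.items()}

def hypervolume(points, ref):
    """Area dominated by `points` above `ref`, both objectives maximized."""
    area, best_f2 = 0.0, ref[1]
    for f1, f2 in sorted(points, reverse=True):  # sweep from largest f1 down
        if f2 > best_f2:
            area += (f1 - ref[0]) * (f2 - best_f2)
            best_f2 = f2
    return area

def rank_by_contribution(scores):
    """Rank communities by exclusive hypervolume contribution, largest first."""
    # Negate age so that both objectives are maximized (assumed preference).
    objs = {c: (-age, n) for c, (age, n) in scores.items()}
    ref = (min(f1 for f1, _ in objs.values()) - 1,
           min(f2 for _, f2 in objs.values()) - 1)
    total = hypervolume(objs.values(), ref)
    contrib = {
        c: total - hypervolume([p for k, p in objs.items() if k != c], ref)
        for c in objs
    }
    return sorted(contrib, key=contrib.get, reverse=True)

label = communities(records)
scores = indicators(records, label, now=2019)
ranking = rank_by_contribution(scores)
print(ranking)
```

In this toy data the iPSC community is young and has several authors, so it contributes most of the hypervolume and ranks first, while the MSC community is dominated on both indicators and contributes nothing. Because the two indicators are on different scales, a practical version would likely normalize them before computing hypervolume; that choice is left open here.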


Emerging research topics, Subject-Predication-Object, Community detection, Hypervolume-based selection, Stem cell



The work in this paper was supported by the Informationization Special Project of Chinese Academy of Sciences “E-Science Application for Knowledge Discovery in Stem Cells” (Grant No: XXH13506-203) and the Fundamental Research Funds for the Central Universities (Grant No. A0920502051815-69).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Zhengyin Hu (1, corresponding author)
  • Rong-Qiang Zeng (1, 2)
  • Lin Peng (1)
  • Hongseng Pang (3)
  • Xiaochu Qin (4)
  • Cheng Guo (4)
  1. Chengdu Library and Information Center, Chinese Academy of Sciences, Chengdu, People’s Republic of China
  2. School of Mathematics, Southwest Jiaotong University, Chengdu, People’s Republic of China
  3. Shenzhen University, Shenzhen, People’s Republic of China
  4. Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou, People’s Republic of China
