Abstract
Conclusive association entities (CAEs) in the title and abstract of an article are the biomedical entities (e.g., genes, diseases, and chemicals) that are the specific targets of the conclusive findings on associations reported in that article. Identifying CAEs is essential for analyzing conclusive associations, a task routinely conducted by many biomedical scientists. However, CAE identification is challenging: the specific entities must first be identified, and then the conclusiveness of the findings about them must be estimated. In this paper, we present an association mining technique to improve CAE identification. The technique is based on the hypothesis that two candidate entities in an article are likely to be CAEs of the article if a strong association between them is mined from a collection of articles. Experimental results show that integrating the technique with representative keyword identification indicators significantly improves CAE identification. The results are of technical and practical significance to the indexing, curation, and exploration of conclusive associations reported in the biomedical literature.
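The hypothesis can be illustrated with a minimal sketch. The abstract does not state which association measure the technique uses, so the Jaccard co-occurrence measure below is an assumption chosen for simplicity, and the function names `mine_associations` and `score_candidates`, as well as the toy collection, are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_associations(collection):
    """Return a Jaccard co-occurrence strength for every entity pair.

    `collection` is a list of entity sets, one set per article.
    Jaccard is an assumed stand-in for the paper's association measure.
    """
    doc_freq = Counter()   # number of articles mentioning each entity
    pair_freq = Counter()  # number of articles mentioning both entities
    for entities in collection:
        doc_freq.update(entities)
        pair_freq.update(frozenset(p) for p in combinations(sorted(entities), 2))
    # Jaccard = |A and B| / (|A| + |B| - |A and B|)
    return {
        pair: n / (sum(doc_freq[e] for e in pair) - n)
        for pair, n in pair_freq.items()
    }


def score_candidates(candidates, strengths):
    """Score each candidate by its strongest mined association
    with another candidate entity of the same article."""
    return {
        entity: max(
            (strengths.get(frozenset((entity, other)), 0.0)
             for other in candidates if other != entity),
            default=0.0,
        )
        for entity in candidates
    }


# Toy collection (hypothetical): entities recognized in five articles.
collection = [
    {"BRCA1", "breast cancer"},
    {"BRCA1", "breast cancer", "tamoxifen"},
    {"BRCA1", "breast cancer"},
    {"tamoxifen", "hot flashes"},
    {"TP53", "breast cancer"},
]
strengths = mine_associations(collection)

# Candidate entities found in the title and abstract of a target article.
scores = score_candidates({"BRCA1", "breast cancer", "tamoxifen"}, strengths)
for entity, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{entity}: {score:.2f}")
```

In this toy case the pair BRCA1 and breast cancer receives the highest score because the two entities co-occur in most articles that mention either one. Such a score could then serve as one feature alongside keyword identification indicators in a learned ranker.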
Notes
1. Update information for CTD can be found at http://ctdbase.org/help/faq/;jsessionid=92111C8A6B218E4B2513C3B0BEE7E63F?p=6422623 (accessed, September 2018).
2. A large number of biomedical scientists take part in the curation tasks of GHR; see http://ghr.nlm.nih.gov/ExpertReviewers (accessed, September 2018).
3. OMIM updates association information on a daily basis; see http://www.omim.org/about (accessed, September 2018).
4. MeSH (available at https://www.ncbi.nlm.nih.gov/mesh) is a controlled vocabulary for indexing biomedical articles.
5. SVMrank is available at http://www.cs.cornell.edu/people/tj/svm_light/svm_rank.html.
6. In one of our previous projects (ID: MOST 105-2221-E-320-004), we also employed articles from CTD as experimental data.
7. More information about the customized vocabulary is available at http://ctdbase.org/help/faq/;jsessionid=92111C8A6B218E4B2513C3B0BEE7E63F?p=6422623, http://ctdbase.org/help/geneDetailHelp.jsp, and http://ctdbase.org/help/diseaseDetailHelp.jsp (accessed, May 2017).
Acknowledgment
This research was supported by the Ministry of Science and Technology, Taiwan (grant ID: MOST 107-2221-E-320-004).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, RL. (2019). Identification of Conclusive Association Entities by Biomedical Association Mining. In: Nguyen, N., Gaol, F., Hong, TP., Trawiński, B. (eds) Intelligent Information and Database Systems. ACIIDS 2019. Lecture Notes in Computer Science, vol 11431. Springer, Cham. https://doi.org/10.1007/978-3-030-14799-0_9
DOI: https://doi.org/10.1007/978-3-030-14799-0_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-14798-3
Online ISBN: 978-3-030-14799-0
eBook Packages: Computer Science, Computer Science (R0)