Frontier knowledge discovery and visualization in cancer field based on KOS and LDA
Scientific research journals report the latest developments in research across many fields. However, interpreting and using biomedical information remains a complicated problem. A major challenge is finding practical methods to convert biomedical literature into structured data and to analyze that data in a form we can understand. In this paper, a frontier knowledge discovery model based on a knowledge organization system (KOS) and latent Dirichlet allocation (LDA) is proposed and applied to detecting burst topics and their semantic relationships in the cancer field. Experiments show that the model is effective for topic recognition, topic evolution recognition, and visualization. Furthermore, combining KOS with LDA effectively removes noisy concepts at the semantic layer.
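The abstract does not spell out how KOS and LDA are combined. As a minimal sketch under stated assumptions (not the authors' implementation), the Python below assumes topic-word weights have already been estimated by some LDA implementation. It treats any topic word missing from a KOS vocabulary as noise, and it flags a topic as bursting when its document share in the latest period is at least a fixed multiple of its mean share over earlier periods. The vocabulary `KOS_TERMS`, the functions `filter_topic_terms` and `is_burst`, and the threshold `ratio=2.0` are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical toy KOS vocabulary (e.g. preferred terms from a thesaurus
# such as MeSH); a real system would load the full controlled vocabulary.
KOS_TERMS: Set[str] = {"glioblastoma", "temozolomide", "methylation",
                       "immunotherapy"}


def filter_topic_terms(topic_words: Dict[str, float],
                       vocabulary: Set[str],
                       top_n: int = 5) -> List[str]:
    """Return up to top_n highest-weighted topic words that are KOS concepts.

    Words absent from the vocabulary are treated as noise and dropped.
    """
    # Rank the LDA topic's words by weight, highest first.
    ranked = sorted(topic_words.items(), key=lambda kv: kv[1], reverse=True)
    # Keep only words the KOS recognizes, then truncate to top_n.
    return [word for word, _ in ranked if word in vocabulary][:top_n]


def is_burst(shares: List[float], ratio: float = 2.0) -> bool:
    """Flag a topic whose share in the latest period is at least `ratio`
    times its mean share over all earlier periods (a simple heuristic)."""
    if len(shares) < 2:
        return False  # no history to compare against
    history = shares[:-1]
    baseline = sum(history) / len(history)
    if baseline == 0:
        # A topic that appears for the first time counts as a burst.
        return shares[-1] > 0
    return shares[-1] / baseline >= ratio


if __name__ == "__main__":
    # Hypothetical LDA topic: "study" and "result" are generic noise words.
    topic = {"glioblastoma": 0.12, "study": 0.10, "temozolomide": 0.08,
             "result": 0.07, "methylation": 0.05}
    print(filter_topic_terms(topic, KOS_TERMS))
    # ['glioblastoma', 'temozolomide', 'methylation']
    print(is_burst([0.02, 0.03, 0.025, 0.06]))  # True (about 2.4x baseline)
```

Exact string matching is the simplest possible mapping; a fuller system would likely map words to KOS concepts through synonyms and entry terms, and might replace the single ratio with a more principled burst detector.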
Keywords: Knowledge organization system (KOS); Latent Dirichlet allocation (LDA); Frontier knowledge; Topic evolution
This project is supported by the National Natural Science Foundation of China (Grant No. 61502402), the Fundamental Research Funds for the Central Universities (Grant No. 20720180073), the State Key Laboratory of Virtual Reality Technology and Systems of China (Grant No. BUAA-VR-15 KF-09), and Xiamen University (Grant No. 20720150081).