Probabilistic Approach Processing Scheme Based on BLAST for Improving Search Speed of Bioinformatics
As heuristic algorithms have been studied increasingly in bioinformatics, information management in its application fields (new drug development, medical diagnosis, agricultural product improvement, etc.) has centered on the BLAST algorithm. However, many of the algorithms used with large genome databases rely on a complete sorting procedure, so searching the database for protein or nucleic acid sequences takes a long time, which causes many problems when processing large amounts of biological information. We propose a BLAST-based probabilistic access processing method that can manage, analyze, and process large amounts of distributed biological data on top of information and communication infrastructure and IT technology. The proposed method aims to improve data accessibility by linking weighted bioinformatics information with probability factors, so that large-capacity biological data can be accessed easily. In addition, the proposed scheme classifies the priority information allocated to the bioinformatics information through hierarchical grouping according to degree of similarity, which ensures high accuracy of search results; at the same time, it aims for low processing time by classifying information (type, attribute, priority, etc.) into weights by property. Previous work has suggested clustering algorithms for fragments of genetic information to solve the haplotype assembly problem in genetics, or has proposed particle swarm optimization methods, similar to existing genetic algorithms, that use a heuristic clustering method based on the MEC model. In the performance evaluation, the proposed method improved accuracy by an average of 13.5% and data retrieval efficiency by an average of 19.7% compared with the previous scheme. The overhead of bioinformatics information processing was 8.8% lower, and the processing time was 13.5% lower on average.
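The abstract describes the scheme only at a high level, so the following Python sketch is an illustration under stated assumptions, not the authors' implementation. It assumes that each record carries a numeric priority weight, that similarity can be approximated by Jaccard overlap of k-mer sets (a simple stand-in for a BLAST score), and that records are grouped into tiers by similarity to a reference sequence. All names (`Record`, `group_by_similarity`, `access_probabilities`, `prioritized_search`) and all thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Record:
    seq_id: str
    sequence: str
    weight: float  # hypothetical priority weight, not defined in the abstract


def kmer_set(seq, k=3):
    """Return the set of length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of k-mer sets (assumed stand-in for a BLAST score)."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def group_by_similarity(records, reference, thresholds=(0.5, 0.2)):
    """Place each record in a tier; tier 0 holds the most similar records.

    thresholds must be in descending order; the values are illustrative.
    """
    tiers = [[] for _ in range(len(thresholds) + 1)]
    for rec in records:
        score = similarity(rec.sequence, reference)
        tier = next(
            (i for i, t in enumerate(thresholds) if score >= t),
            len(thresholds),
        )
        tiers[tier].append(rec)
    return tiers


def access_probabilities(records):
    """Normalize weights into access probabilities that sum to 1."""
    total = sum(rec.weight for rec in records)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {rec.seq_id: rec.weight / total for rec in records}


def prioritized_search(query, tiers, min_score=0.3):
    """Scan tiers in priority order, heaviest records first.

    Returns the id of the first record whose similarity to the query
    reaches min_score, or None if no record qualifies.
    """
    for tier in tiers:
        for rec in sorted(tier, key=lambda r: r.weight, reverse=True):
            if similarity(query, rec.sequence) >= min_score:
                return rec.seq_id
    return None
```

Because the search stops at the first acceptable hit in the highest-priority tier, it can avoid scanning the whole database when a likely match is ranked early. This is one plausible reading of how weighting and grouping could reduce processing time, and it trades exhaustiveness for speed.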
Keywords: Bioinformatics, BLAST, Probability, Distributed data management, Algorithm, Cloud, Networking, Computing
This research was supported by the Tongmyong University Research Grants 2016 (2016A013).