Probabilistic Approach Processing Scheme Based on BLAST for Improving Search Speed of Bioinformatics

  • Yoon-Su Jeong
  • Seung-Soo Shin


As heuristic algorithms have been increasingly studied in bioinformatics, information management in various bioinformatics fields (new drug development, medical diagnosis, agricultural product improvement, etc.) has centered mainly on the BLAST algorithm. However, many of the algorithms used with large genome databases rely on a complete sorting procedure, so searching the database for protein or nucleic acid sequences takes a long time, which causes many problems when processing large amounts of bio information. We propose a BLAST-based probabilistic access processing method that can manage, analyze, and process large amounts of distributed bio data on top of information and communication infrastructure and IT technology. The proposed method aims to improve data accessibility by linking weighted bioinformatics information with probability factors, making large-capacity bio data easier to access. In addition, the proposed scheme classifies the priority information assigned to bioinformatics information by hierarchical grouping according to degree of similarity, which ensures high accuracy of the search results; at the same time, it aims for low processing time by classifying information (type, attribute, priority, etc.) into weights by property. Previous researchers have suggested clustering algorithms for the fragmentation of genetic information to solve the haplotype assembly problem in genetics, or have proposed particle swarm optimization methods, similar to existing genetic algorithms, that use a heuristic clustering method based on the MEC (minimum error correction) model. In the performance evaluation, the proposed method improved accuracy by an average of 13.5% and data retrieval efficiency by an average of 19.7% compared with the previous scheme. The overhead of bioinformatics information processing was 8.8% lower, and the processing time was 13.5% lower on average.
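The abstract does not give the scheme's formulas, so the following is only a minimal sketch of the general idea of hierarchical grouping by similarity combined with probability weights. Everything in it is an assumption made for illustration: the function names, the k-mer Jaccard similarity (a simple stand-in, not BLAST scoring), the tier thresholds, and the tier weights are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def kmer_set(seq, k=3):
    """Return the set of overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of k-mer sets (assumed stand-in for a BLAST score)."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def group_by_similarity(query, database, thresholds=(0.5, 0.2)):
    """Place each entry in a tier; tier 0 holds the most similar entries.

    The thresholds are hypothetical and must be in descending order.
    """
    groups = defaultdict(list)
    for name, seq in database.items():
        score = similarity(query, seq)
        tier = len(thresholds)  # lowest-priority tier by default
        for level, cutoff in enumerate(thresholds):
            if score >= cutoff:
                tier = level
                break
        groups[tier].append((name, score))
    return groups


def access_probabilities(groups, tier_weights=(0.6, 0.3, 0.1)):
    """Split each tier's (hypothetical) weight evenly among its members,
    then normalize so the probabilities sum to 1."""
    raw = {}
    for tier, members in groups.items():
        for name, _score in members:
            raw[name] = tier_weights[tier] / len(members)
    total = sum(raw.values())
    return {name: p / total for name, p in raw.items()}


if __name__ == "__main__":
    db = {"seqA": "ACGTACGT", "seqB": "ACGTACGA", "seqC": "TTTTTTTT"}
    groups = group_by_similarity("ACGTACGT", db)
    probs = access_probabilities(groups)
    # Visit the entries with the highest access probability first.
    for name in sorted(probs, key=probs.get, reverse=True):
        print(name, round(probs[name], 3))
```

In this sketch, entries in higher-priority tiers are examined first, which is one way a search could avoid a complete sort of the database; whether the proposed scheme works this way is not stated in the abstract.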


Keywords: Bioinformatics, BLAST, Probability, Distributed data management, Algorithm, Cloud, Networking, Computing



This research was supported by the Tongmyong University Research Grants 2016 (2016A013).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Information Communication Engineering, Mokwon University, Daejeon, Korea
  2. Department of Information Security, Tongmyong University, Busan, Korea
