Abstract
Biological sequence comparison is one of the most important tasks in bioinformatics. As biological databases grow, sequence comparison is becoming an important challenge for high-performance computing, especially when very long sequences are compared. The Smith-Waterman (SW) algorithm is an exact method based on dynamic programming that quantifies local similarity between sequences. The large inherent parallelism of the algorithm makes it well suited to architectures that support multiple dimensions of parallelism: thread-level (TLP), data-level (DLP) and instruction-level (ILP). In this work, we show how long-sequence comparison takes advantage of current and future multicore architectures. We analyze two different SW implementations on the CellBE and use simulation tools to study performance scalability in a multicore architecture. We also study the memory organization that delivers the maximum bandwidth at the minimum cost. Our results show that a heterogeneous architecture is a valid alternative for executing challenging bioinformatics workloads.
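For readers unfamiliar with the dynamic-programming recurrence behind SW, the following is a minimal sketch of the score computation. It is not the implementation evaluated in the paper: the function name `smith_waterman_score`, the linear gap model, and the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) are assumptions chosen only for illustration. It keeps two rows of the matrix in memory, which is one common way to handle long sequences when only the best score is needed.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    Scoring values and the linear gap penalty are illustrative assumptions,
    not parameters taken from the paper.
    """
    prev = [0] * (len(b) + 1)  # row i-1 of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # row i of the DP matrix
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Each cell depends on its diagonal, upper, and left neighbours;
            # the floor at 0 is what makes the alignment local.
            curr[j] = max(0,
                          prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                          prev[j] + gap,      # gap in b
                          curr[j - 1] + gap)  # gap in a
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("GGACGTCC", "TTACGTAA"))
```

Because each cell depends only on its left, upper, and upper-left neighbours, all cells on the same anti-diagonal are independent of one another. That dependency pattern is the source of the parallelism the abstract refers to.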
Keywords
- Local Storage
- Single Instruction Multiple Data
- Memory Organization
- Multicore Architecture
- Synchronization Overhead
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sánchez, F., Cabarcas, F., Ramirez, A., Valero, M. (2010). Long DNA Sequence Comparison on Multicore Architectures. In: D’Ambra, P., Guarracino, M., Talia, D. (eds) Euro-Par 2010 - Parallel Processing. Euro-Par 2010. Lecture Notes in Computer Science, vol 6272. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15291-7_24
DOI: https://doi.org/10.1007/978-3-642-15291-7_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15290-0
Online ISBN: 978-3-642-15291-7
eBook Packages: Computer Science, Computer Science (R0)