Abstract
De Bruijn graph-based algorithms are widely used in genome assembly and metagenomic assembly. The scale of these graphs, in some cases billions of vertices and edges, poses challenges for genome assembly. In this paper, a one-step bi-directed graph is used to abstract the genome assembly problem. A small world asynchronous parallel model (SWAP) is then proposed to handle the edge merging operation predefined on the graph. SWAP aims to exploit the locality of computation and communication to expose parallelism in graph algorithms. Based on this graph abstraction and the SWAP model, an assembler is developed, and experimental results show a 20-fold speedup when the number of processors scales from 10 to 640 on C. elegans data.
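To make the edge merging operation concrete, the following is a minimal sequential sketch, not the assembler described in the paper. It assumes a plain single-strand de Bruijn graph (no reverse complements, so it is not bi-directed) and runs on one process rather than under SWAP. The function names `build_de_bruijn` and `merge_simple_paths` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def merge_simple_paths(graph):
    """Merge chains of edges through non-branching vertices into contig strings.

    Assumption: isolated cycles made only of non-branching vertices are skipped.
    """
    indeg = defaultdict(int)
    for succs in graph.values():
        for v in succs:
            indeg[v] += 1
    nodes = set(graph) | set(indeg)

    def is_inner(n):
        # A vertex with exactly one incoming and one outgoing edge can be merged away.
        return indeg[n] == 1 and len(graph.get(n, ())) == 1

    contigs = []
    for u in sorted(nodes):
        if is_inner(u):
            continue  # paths start only at branching or terminal vertices
        for v in sorted(graph.get(u, ())):
            path = u + v[-1]
            while is_inner(v):
                (v,) = graph[v]  # follow the single outgoing edge
                path += v[-1]
            contigs.append(path)
    return sorted(contigs)


if __name__ == "__main__":
    g = build_de_bruijn(["ATGGC", "TGGA"], k=3)
    print(merge_simple_paths(g))
```

In this toy input the vertex `GG` has two successors, so the merged path from `AT` stops there and the two branches remain separate edges. A distributed version would additionally have to coordinate merges whose endpoints live on different processors, which is the part the abstract attributes to SWAP.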
Copyright information
© 2012 IFIP International Federation for Information Processing
About this paper
Cite this paper
Meng, J., Yuan, J., Cheng, J., Wei, Y., Feng, S. (2012). Small World Asynchronous Parallel Model for Genome Assembly. In: Park, J.J., Zomaya, A., Yeo, SS., Sahni, S. (eds) Network and Parallel Computing. NPC 2012. Lecture Notes in Computer Science, vol 7513. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35606-3_17
DOI: https://doi.org/10.1007/978-3-642-35606-3_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35605-6
Online ISBN: 978-3-642-35606-3
eBook Packages: Computer Science (R0)