Simultaneous Scheduling of Replication and Computation for Bioinformatic Applications on the Grid

  • Conference paper
Biological and Medical Data Analysis (ISBMDA 2005)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3745)

Abstract

One of the first motivations for using grids came from applications that manage large data sets, for example in High Energy Physics or the Life Sciences. To improve the global throughput of software environments, replicas are usually placed at carefully selected sites. Moreover, computation requests have to be scheduled among the available resources. To get the best performance, scheduling and data replication have to be tightly coupled, which is not always the case in existing approaches.

This paper presents an algorithm that handles data management and scheduling simultaneously, using a steady-state approach. Our theoretical results are validated using simulation and logs from a large life science application (ACI GRID GriPPS). The PattInProt application searches for protein sites and signatures in databanks of protein sequences.

This work was supported in part by the ACI GRID of the French Department of Research.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Desprez, F., Vernois, A., Blanchet, C. (2005). Simultaneous Scheduling of Replication and Computation for Bioinformatic Applications on the Grid. In: Oliveira, J.L., Maojo, V., Martín-Sánchez, F., Pereira, A.S. (eds) Biological and Medical Data Analysis. ISBMDA 2005. Lecture Notes in Computer Science, vol 3745. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573067_27

  • DOI: https://doi.org/10.1007/11573067_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29674-4

  • Online ISBN: 978-3-540-31658-9

  • eBook Packages: Computer Science, Computer Science (R0)
