Chapter 15: Search Computing and the Life Sciences

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5950)

Abstract

Search Computing has been proposed to support the integration of the results of search engines with other data and computational resources. A key feature of the resulting integration platform is direct support for multi-domain ordered data, reflecting the fact that search engines produce ranked outputs, which should be taken into account when the results of several requests are combined. In the life sciences, ranked data take many forms and may represent quite different phenomena, including physical ordering within a genome, algorithmically assigned scores that represent levels of sequence similarity, and experimentally measured values such as expression levels. This chapter explores the extent to which the search computing functionalities designed for use with search engine results may be applicable to the different forms of ranked data encountered when carrying out data integration in the life sciences. It does so by classifying the types of ranked data found in the life sciences, giving examples of rankings and ranking integration needs, identifying issues in the integration of such ranked data, and discussing techniques for drawing conclusions from diverse rankings.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Masseroli, M., Paton, N.W., Spasić, I. (2010). Chapter 15: Search Computing and the Life Sciences. In: Ceri, S., Brambilla, M. (eds) Search Computing. Lecture Notes in Computer Science, vol 5950. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12310-8_15

  • DOI: https://doi.org/10.1007/978-3-642-12310-8_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12309-2

  • Online ISBN: 978-3-642-12310-8

  • eBook Packages: Computer Science, Computer Science (R0)
