Chapter 15: Search Computing and the Life Sciences

Masseroli, Marco; Paton, Norman W.; Spasić, Irena

doi:10.1007/978-3-642-12310-8_15

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5950)

Abstract

Search Computing has been proposed to support the integration of search engine results with other data and computational resources. A key feature of the resulting integration platform is direct support for multi-domain ordered data: search engines produce ranked outputs, and these rankings should be taken into account when the results of several requests are combined. Ranked data in the life sciences take many forms and represent many different phenomena, including physical ordering within a genome, algorithmically assigned scores that represent levels of sequence similarity, and experimentally measured values such as expression levels. This chapter explores the extent to which the search computing functionalities designed for search engine results may be applicable to the different forms of ranked data encountered when carrying out data integration in the life sciences. It does so by classifying the types of ranked data in the life sciences, giving examples of rankings and ranking integration needs, identifying issues in the integration of such ranked data, and discussing techniques for drawing conclusions from diverse rankings.
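As a rough illustration of what combining ranked results from several requests can involve, the following minimal sketch applies a Borda count, a standard positional rank aggregation technique. It is not the method developed in the chapter; the function name `borda_aggregate` and the gene identifiers are hypothetical, and the three input lists merely stand in for the kinds of ranking named in the abstract (sequence similarity scores, expression levels, genomic position).

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Combine several ranked lists (best item first) into one consensus list.

    Assumption: an item absent from a ranking receives no points from it.
    """
    # Collect every item that appears in at least one ranking.
    items = set()
    for ranking in rankings:
        items.update(ranking)
    n = len(items)

    # An item at position p (0-based) in a ranking earns n - p points.
    scores = defaultdict(int)
    for ranking in rankings:
        for position, item in enumerate(ranking):
            scores[item] += n - position

    # Highest total first; ties are broken alphabetically for determinism.
    return sorted(items, key=lambda item: (-scores[item], item))


# Hypothetical rankings of the same candidate genes by three criteria.
by_similarity = ["geneA", "geneB", "geneC"]
by_expression = ["geneB", "geneA", "geneD"]
by_position = ["geneB", "geneC", "geneA"]

consensus = borda_aggregate([by_similarity, by_expression, by_position])
print(consensus)
```

The Borda count treats every input ranking as equally trustworthy and ignores the underlying scores, which is a simplification: rankings derived from measured values and from algorithmic scores may call for weighting or for score-based combination instead.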

Author information

Authors and Affiliations

Dipartimento di Elettronica e Informazione, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133, Milano, Italy
Marco Masseroli
School of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
Norman W. Paton & Irena Spasić

Editor information

Editors and Affiliations

Dipartimento di Elettronica e Informazione, Politecnico di Milano, Piazza L. Da Vinci, 32, I20133, Milano, Italy
Stefano Ceri & Marco Brambilla

About this chapter

Cite this chapter

Masseroli, M., Paton, N.W., Spasić, I. (2010). Chapter 15: Search Computing and the Life Sciences. In: Ceri, S., Brambilla, M. (eds) Search Computing. Lecture Notes in Computer Science, vol 5950. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12310-8_15

DOI: https://doi.org/10.1007/978-3-642-12310-8_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12309-2
Online ISBN: 978-3-642-12310-8
eBook Packages: Computer Science (R0)
