Passage extraction and result combination for genomics information retrieval

Journal of Intelligent Information Systems

Abstract

In this paper, we first propose passage extraction algorithms for building indices that yield more accurate passages as query answers. Second, we propose a basic and an improved result combination method, which merge the results retrieved from different indices in order to select relevant passages as output. For passage extraction, we propose three new algorithms: paragraphParsed, sentenceParsed and wordSentenceParsed. For result combination, we propose a novel method that uses factor analysis to generate a better baseline result for combination. It finds hidden common factors that can be used to estimate the importance of keywords and keyword associations. Finally, we report experimental results that confirm the effectiveness and superiority of the factor-analysis-based method for result combination. Our approaches achieve excellent results on the TREC 2006 and 2007 Genomics data sets, suggesting a promising avenue for building high-performance information retrieval systems in biomedicine.
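The abstract does not give the combination formulas. Purely as a point of reference, the sketch below shows CombSUM, a standard score-level fusion technique, which is not the authors' basic, improved, or factor-analysis-based method. Each index's run is min-max normalized to [0, 1], and the normalized scores are summed per passage with optional per-index weights. The index names, passage ids, and scores are hypothetical.

```python
from collections import defaultdict


def min_max_normalize(run):
    """Rescale one run's scores to [0, 1]; a run with constant scores maps to 1.0."""
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {pid: 1.0 for pid in run}
    return {pid: (score - lo) / (hi - lo) for pid, score in run.items()}


def comb_sum(runs, weights=None):
    """Weighted CombSUM fusion (a standard technique, used here for illustration).

    runs: dict mapping an index name to {passage_id: retrieval score}.
    weights: optional dict mapping an index name to its weight (default 1.0).
    Returns (passage_id, fused_score) pairs sorted by descending score,
    with ties broken by passage id so the order is deterministic.
    """
    weights = weights or {}
    fused = defaultdict(float)
    for name, run in runs.items():
        w = weights.get(name, 1.0)
        for pid, score in min_max_normalize(run).items():
            fused[pid] += w * score
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores from two passage indices (names and values are illustrative).
runs = {
    "paragraph_index": {"p1": 12.0, "p2": 8.0, "p3": 4.0},
    "sentence_index": {"p2": 3.0, "p3": 2.5, "p4": 1.0},
}
ranking = comb_sum(runs)
print(ranking)  # [('p2', 1.5), ('p1', 1.0), ('p3', 0.75), ('p4', 0.0)]
```

Normalization is needed because raw scores from different indices are not on a comparable scale. Passages retrieved by several indices accumulate evidence, which is why p2 overtakes p1 in this example.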

Notes

  1. The “error” is a statistical term for the part of an individual observation that the common factors do not account for, i.e., how far the observation deviates from the value the common factors predict.
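As general background (standard factor-analysis notation, assumed here rather than taken from the paper), the common factor model writes each of $p$ observed variables $x_i$ as a linear combination of $m$ common factors $F_j$ plus an error term $\varepsilon_i$:

```latex
x_i = \mu_i + \sum_{j=1}^{m} \lambda_{ij} F_j + \varepsilon_i, \qquad i = 1, \dots, p
```

Here $\lambda_{ij}$ are the factor loadings and $\varepsilon_i$ is the error in the sense of this note. If the factors are assumed to be mutually uncorrelated with unit variance, and the errors to be uncorrelated with the factors and with each other, the covariance matrix of the observed variables decomposes as $\Sigma = \Lambda\Lambda^{\top} + \Psi$, where $\Lambda = (\lambda_{ij})$ and $\Psi$ is the diagonal matrix of error variances.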

Acknowledgements

This research is supported in part by a research grant from the Natural Sciences and Engineering Research Council (NSERC) of Canada and by the Early Researcher Award/Premier’s Research Excellence Award. We would like to thank Ming Zhong and Luo Si for their contributions at the early stage of this project. The authors are also grateful to the anonymous reviewers for their constructive comments, which helped improve the quality of the paper.

Author information

Correspondence to Jimmy Xiangji Huang.

About this article

Cite this article

Hu, Q., Huang, J.X. Passage extraction and result combination for genomics information retrieval. J Intell Inf Syst 34, 249–274 (2010). https://doi.org/10.1007/s10844-009-0097-4
