Joint Learning Using Multiple Types of Data and Knowledge

Chapter in: Medical Informatics

Part of the book series: Integrated Series in Information Systems ((ISIS,volume 8))

Chapter Overview

This chapter discusses joint learning research in biomedical domains. It briefly reviews the field, with emphasis on the large-scale data and knowledge resources used for learning and on the central biological questions involved. Two representative case studies are then presented with algorithmic details. They cover two representative joint learning tasks, protein function classification and regulatory network learning, and two important algorithmic frameworks for joint learning, the kernel-based framework and probabilistic graphical models. Both studies draw on a wide range of biological data and existing knowledge.

Suggested Readings

  • Baldi, P. and S. Brunak. 2001. Bioinformatics: The Machine Learning Approach, The MIT Press, Cambridge.

  • Buntine, W. 1996. “A guide to the literature on learning probabilistic networks from data,” IEEE Transactions on Knowledge and Data Engineering, 8(2), 195–210.

  • Cheng, J., R. Greiner, J. Kelly, D. A. Bell and W. Liu. 2002. “Learning Bayesian networks from data: an information-theory based approach,” Artificial Intelligence, 137, 43–90.

  • Chrisman, L., P. Langley, S. Bay and A. Pohorille. 2003. “Incorporating biological knowledge into evaluation of causal regulatory hypotheses,” In the Proceedings of Pacific Symposium on Biocomputing, 8, 128–139.

  • De Jong, H. 2002. “Modeling and simulation of genetic regulatory systems: a literature review,” Journal of Computational Biology, 9, 67–103.

  • Friedman, N. 2004. “Inferring cellular networks using probabilistic graphical models,” Science, 303(5659), 799–805.

  • Hartemink, A. J., D. K. Gifford, T. S. Jaakkola and R. A. Young. 2002. “Combining location and expression data for principled discovery of genetic regulatory network models,” In the Proceedings of Pacific Symposium on Biocomputing, 7, 437–449.

  • Imoto, S., T. Higuchi, T. Goto, K. Tashiro, S. Kuhara and S. Miyano. 2004. “Combining microarrays and biological knowledge for estimating gene networks via Bayesian networks,” Journal of Bioinformatics and Computational Biology, 2(1), 77–98.

  • Lanckriet, G. R. G., M. Deng, N. Cristianini, M. I. Jordan and W. S. Noble. 2004. “Kernel-based data fusion and its application to protein function prediction in yeast,” In the Proceedings of Pacific Symposium on Biocomputing, 9, 300–311.

  • Segal, E., H. Wang and D. Koller. 2003. “Discovering molecular pathways from protein interaction and gene expression data,” Bioinformatics, 19(Suppl 1), i264–i272.

  • Speed, T. 2003. Statistical Analysis of Gene Expression Microarray Data, CRC Press.

  • Tamada, Y., S. Kim, H. Bannai, S. Imoto, K. Tashiro, S. Kuhara and S. Miyano. 2003. “Estimating gene networks from gene expression data by combining Bayesian network model with promoter element detection,” Bioinformatics, 19(Suppl 2), ii227–ii236.

© 2005 Springer Science+Business Media, Inc.

About this chapter

Cite this chapter

Huang, Z., Su, H., Chen, H. (2005). Joint Learning Using Multiple Types of Data and Knowledge. In: Chen, H., Fuller, S.S., Friedman, C., Hersh, W. (eds) Medical Informatics. Integrated Series in Information Systems, vol 8. Springer, Boston, MA. https://doi.org/10.1007/0-387-25739-X_21

  • DOI: https://doi.org/10.1007/0-387-25739-X_21

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-24381-8

  • Online ISBN: 978-0-387-25739-6

  • eBook Packages: Medicine, Medicine (R0)
