Chapter Overview
This chapter discusses joint learning research in biomedical domains. It briefly reviews the field, with emphasis on the large-scale data and knowledge resources used for learning and on the central biological questions involved. Two representative case studies are then presented with algorithmic details. They cover two representative joint learning tasks, protein function classification and regulatory network learning, and two important algorithmic frameworks for joint learning, the kernel-based framework and probabilistic graphical models. Both studies draw on a wide range of biological data and existing knowledge.
Suggested Readings
Baldi, P. and S. Brunak. 2001. Bioinformatics: The Machine Learning Approach, The MIT Press, Cambridge.
Buntine, W. 1996. “A guide to the literature on learning probabilistic networks from data,” IEEE Transactions on Knowledge and Data Engineering, 8(2), 195–210.
Cheng, J., R. Greiner, J. Kelly, D. A. Bell and W. Liu. 2002. “Learning Bayesian networks from data: an information-theory based approach,” The Artificial Intelligence Journal, 137, 43–90.
Chrisman, L., P. Langley, S. Bay and A. Pohorille. 2003. “Incorporating biological knowledge into evaluation of causal regulatory hypotheses,” In the Proceedings of Pacific Symposium on Biocomputing, 8, 128–139.
De Jong, H. 2002. “Modeling and simulation of genetic regulatory systems: a literature review,” Journal of Computational Biology, 9, 67–103.
Friedman, N. 2004. “Inferring cellular networks using probabilistic graphical models,” Science, 303(5659), 799–805.
Hartemink, A. J., D. K. Gifford, T. S. Jaakkola and R. A. Young. 2002. “Combining location and expression data for principled discovery of genetic regulatory network models,” In the Proceedings of Pacific Symposium on Biocomputing, 7, 437–449.
Imoto, S., T. Higuchi, T. Goto, K. Tashiro, S. Kuhara and S. Miyano. 2004. “Combining microarrays and biological knowledge for estimating gene networks via Bayesian networks,” Journal of Bioinformatics and Computational Biology, 2(1), 77–98.
Lanckriet, G. R. G., M. Deng, N. Cristianini, M. I. Jordan and W. S. Noble. 2004. “Kernel-based data fusion and its application to protein function prediction in yeast,” In the Proceedings of Pacific Symposium on Biocomputing, 9, 300–311.
Segal, E., H. Wang and D. Koller. 2003. “Discovering molecular pathways from protein interaction and gene expression data,” Bioinformatics, 19(Suppl 1), i264–i272.
Speed, T. 2003. Statistical Analysis of Gene Expression Microarray Data, CRC Press.
Tamada, Y., S. Kim, H. Bannai, S. Imoto, K. Tashiro, S. Kuhara and S. Miyano. 2003. “Estimating gene networks from gene expression data by combining Bayesian network model with promoter element detection,” Bioinformatics, 19(Suppl 2), II227–II236.
Copyright information
© 2005 Springer Science+Business Media, Inc.
About this chapter
Cite this chapter
Huang, Z., Su, H., Chen, H. (2005). Joint Learning Using Multiple Types of Data and Knowledge. In: Chen, H., Fuller, S.S., Friedman, C., Hersh, W. (eds) Medical Informatics. Integrated Series in Information Systems, vol 8. Springer, Boston, MA. https://doi.org/10.1007/0-387-25739-X_21
DOI: https://doi.org/10.1007/0-387-25739-X_21
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-24381-8
Online ISBN: 978-0-387-25739-6
eBook Packages: Medicine, Medicine (R0)