Abstract
In the era of information overflow, data mining and machine learning are indispensable tools for retrieving information and knowledge from data. Incorporating several data sources into an analysis is known as data fusion. It can be beneficial in three ways: it reduces noise, it improves statistical significance, and it exploits the interactions and correlations between data sources to obtain more refined, higher-level information [50]. In bioinformatics, considerable effort has been devoted to genomic data fusion, an emerging topic relevant to many applications. High-throughput techniques now generate terabytes of data at an increasing rate, and in data fusion these terabytes are further multiplied by the number of data sources or species. Building a statistical model that describes such data is therefore not easy. An effective way to tackle this challenge is to treat the data as the output of a complex, unknown black box, and to look for a function or algorithm that operates on an input to predict the output. About 15 years ago, Boser [8] and Vapnik [51] introduced the support vector method, which makes use of kernel functions. This method has opened many opportunities for solving complicated problems. It has also raised many interdisciplinary challenges in statistics, optimization theory, and their applications [40].
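The following is a minimal sketch in pure Python that makes kernel-based data fusion concrete. It makes several assumptions:

* each data source gets its own RBF kernel;
* the per-source kernel matrices are combined by a weighted sum with fixed, hand-chosen weights;
* the fused kernel trains a least squares SVM classifier in the formulation of Suykens and Vandewalle [47].

The chapter learns the kernel weights through Ln-norm multiple kernel learning, and this sketch does not. The data, weights, kernel width, regularization constant, and helper names (`rbf_kernel`, `fuse_kernels`, `lssvm_train`, `lssvm_predict`) are all hypothetical and chosen for illustration only.

```python
import math


def rbf_kernel(A, B, sigma2):
    """RBF kernel matrix: K[i][j] = exp(-||a_i - b_j||^2 / sigma2)."""
    return [[math.exp(-sum((p - q) ** 2 for p, q in zip(u, v)) / sigma2) for v in B]
            for u in A]


def fuse_kernels(kernels, weights):
    """Combine per-source kernel matrices by a weighted sum (weights assumed fixed)."""
    rows, cols = len(kernels[0]), len(kernels[0][0])
    return [[sum(w * K[i][j] for K, w in zip(kernels, weights)) for j in range(cols)]
            for i in range(rows)]


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x


def lssvm_train(K, y, gamma):
    """Solve the LS-SVM classifier dual system
        [ 0   y^T               ] [ b     ]   [ 0 ]
        [ y   Omega + I / gamma ] [ alpha ] = [ 1 ],  Omega[i][j] = y_i y_j K[i][j].
    """
    n = len(y)
    A = [[0.0] + [float(v) for v in y]]
    for i in range(n):
        A.append([float(y[i])] + [y[i] * y[j] * K[i][j] + (1.0 / gamma if i == j else 0.0)
                                  for j in range(n)])
    sol = solve(A, [0.0] + [1.0] * n)
    return sol[0], sol[1:]  # bias b, Lagrange multipliers alpha


def lssvm_predict(K_test, y, alpha, b):
    """Labels sign(sum_i alpha_i y_i K(x, x_i) + b), one per row of K_test."""
    return [1 if sum(a * yi * k for a, yi, k in zip(alpha, y, row)) + b >= 0 else -1
            for row in K_test]


# Two hypothetical data sources describing the same four samples.
X1 = [[0.0], [0.2], [1.0], [1.2]]
X2 = [[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.1]]
y = [-1, -1, 1, 1]
weights = [0.5, 0.5]  # assumed, not learned

K_train = fuse_kernels([rbf_kernel(X1, X1, 1.0), rbf_kernel(X2, X2, 1.0)], weights)
b, alpha = lssvm_train(K_train, y, gamma=10.0)

# Two new samples, again observed in both sources.
T1 = [[0.1], [1.1]]
T2 = [[0.05, 0.05], [0.95, 1.05]]
K_test = fuse_kernels([rbf_kernel(T1, X1, 1.0), rbf_kernel(T2, X2, 1.0)], weights)
print(lssvm_predict(K_test, y, alpha, b))  # prints [-1, 1]
```

The LS-SVM replaces the inequality constraints of the standard SVM with equalities. As a result, training reduces to a single linear system rather than a quadratic program. Once the sources are fused, the combined kernel matrix is used exactly as a single kernel would be.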
References
Aerts, S., Lambrechts, D., Maity, S., Van Loo, P., Coessens, B., De Smet, F., Tranchevent, L.C., De Moor, B., Marynen, P., Hassan, B., Carmeliet, P., Moreau, Y.: Gene prioritization through genomic data fusion. Nature Biotechnology 24, 537–544 (2006)
Aerts, S., Van Loo, P., Thijs, G., Mayer, H., de Martin, R., Moreau, Y., De Moor, B.: TOUCAN 2: the all-inclusive open source workbench for regulatory sequence analysis. Nucleic Acids Research 33, W393–W396 (2005)
Aizerman, M., Braverman, E., Rozonoer, L.: Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control 25, 821–837 (1964)
Andersen, E.D., Andersen, K.D.: The MOSEK interior point optimizer for linear programming: an implementation of the homogeneous algorithm. In: High Perf. Optimization, pp. 197–232. Kluwer Academic Publishers, New York (2000)
Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., Harris, M.A., Hill, D.P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J.C., Richardson, J.E., Ringwald, M., Rubin, G.M., Sherlock, G.: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genetics 25, 25–29 (2000)
Bach, F.R., Lanckriet, G.R.G., Jordan, M.I.: Multiple kernel learning, conic duality, and the SMO algorithm. In: Proceedings of the 21st International Conference on Machine Learning. ACM Press, New York (2004)
van den Bosch, T., Daemen, A., Gevaert, O., Timmerman, D.: Mathematical decision trees versus clinician based algorithms in the diagnosis of endometrial disease. In: Proc. of the 17th World Congress on Ultrasound in Obstetrics and Gynecology (ISUOG), vol. 412 (2007)
Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. In: Proceedings of the 5th Annual ACM Workshop on COLT, pp. 144–152. ACM Press, New York (1992)
Bottomley, C., Daemen, A., Mukri, F., Papageorghiou, A.T., Kirk, E., Pexsters, A., De Moor, B., Timmerman, D., Bourne, T.: Functional linear discriminant analysis: a new longitudinal approach to the assessment of embryonic growth. Human Reproduction 24, 278–283 (2009)
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
Cawley, G.C.: Leave-One-Out Cross-Validation Based Model Selection Criteria for Weighted LS-SVMs. In: Proc. of 2006 International Joint Conference on Neural Networks, pp. 1661–1668. IEEE Press, Los Alamitos (2006)
Condous, G., Okaro, E., Khalid, A., Timmerman, D., Lu, C., Zhou, Y., Van Huffel, S., Bourne, T.: The use of a new logistic regression model for predicting the outcome of pregnancies of unknown location. Human Reproduction 21, 278–283 (2004)
Daemen, A., De Moor, B.: Development of a kernel function for clinical data. In: Proc. of the 31st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 5913–5917. IEEE Press, Los Alamitos (2009)
Daemen, A., Gevaert, O., Ojeda, F., Debucquoy, A., Suykens, J.A.K., Sempoux, C., Machiels, J.P., Haustermans, K., De Moor, B.: A kernel-based integration of genome-wide data for clinical decision support. Genome Medicine 1, 39 (2009)
De Bie, T., Tranchevent, L.C., Van Oeffelen, L., Moreau, Y.: Kernel-based data fusion for gene prioritization. Bioinformatics 23, i125–i132 (2007)
Eeles, R.A., Kote-Jarai, Z., Giles, G.G., Olama, A.A.A., Guy, M., Jugurnauth, S.K., Mulholland, S., Leongamornlert, D.A., Edwards, S.M., Morrison, J., et al.: Multiple newly identified loci associated with prostate cancer susceptibility. Nature Genetics 40, 316–321 (2008)
Flicek, P., Aken, B.L., Beal, K., Ballester, B., Caccamo, M., Chen, Y., Clarke, L., Coates, G., Cunningham, F., Cutts, T., Down, T., Dyer, S.C., Eyre, T., Fitzgerald, S., Fernandez-Banet, J., Gräf, S., Haider, S., Hammond, R., Holland, R., Howe, K.L., Howe, K., Johnson, N., Jenkinson, A., Kähäri, A., Keefe, D., Kokocinski, F., Kulesha, E., Lawson, D., Longden, I., Megy, K., Meidl, P., Overduin, B., Parker, A., Pritchard, B., Prlic, A., Rice, S., Rios, D., Schuster, M., Sealy, I., Slater, G., Smedley, D., Spudich, G., Trevanion, S., Vilella, A.J., Vogel, J., White, S., Wood, M., Birney, E., Cox, T., Curwen, V., Durbin, R., Fernandez-Suarez, X.M., Herrero, J., Hubbard, T.J.P., Kasprzyk, A., Proctor, G., Smith, J., Ureta-Vidal, A., Searle, S.: Ensembl 2008. Nucleic Acids Research 36, D707–D714 (2008)
Gevaert, O., De Smet, F., Timmerman, D., Moreau, Y., De Moor, B.: Predicting the prognosis of breast cancer by integrating clinical and microarray data with Bayesian networks. Bioinformatics 22, e184–e190 (2006)
Grant, M., Boyd, S.: Graph implementations for nonsmooth convex programs. Recent Advances in Learning and Control 371, 95–110 (2008)
Grant, M., Boyd, S.: CVX: Matlab Software for Disciplined Convex Programming, version 1.21 (2010), http://cvxr.com/cvx
Gudmundsson, J., Sulem, P., Rafnar, T., Bergthorsson, J.T., Manolescu, A., Gudbjartsson, D., Agnarsson, B.A., Sigurdsson, A., Benediktsdottir, K.R., Blondal, T., et al.: Common sequence variants on 2p15 and Xp11.22 confer susceptibility to prostate cancer. Nature Genetics 40, 281–283 (2008)
Hettich, R., Kortanek, K.O.: Semi-infinite programming: theory, methods, and applications. SIAM Review 35, 380–429 (1993)
Kaliski, J., Haglin, D., Roos, C., Terlaky, T.: Logarithmic barrier decomposition methods for semi-infinite programming. International Transactions in Operations Research 4, 285–303 (1997)
Kanehisa, M., Araki, M., Goto, S., Hattori, M., Hirakawa, M., Itoh, M., Katayama, T., Kawashima, S., Okuda, S., Tokimatsu, T., Yamanishi, Y.: KEGG for linking genomes to life and the environment. Nucleic Acids Research 36, D480–D484 (2008)
Kim, S.J., Magnani, A., Boyd, S.: Optimal kernel selection in kernel Fisher discriminant analysis. In: Proceedings of the 23rd International Conference on Machine Learning. ACM Press, New York (2006)
Kloft, M., Brefeld, U., Laskov, P., Sonnenburg, S.: Non-sparse multiple kernel learning. In: NIPS 2008 Workshop: Kernel Learning: Automatic Selection of Optimal Kernels (2008)
Kloft, M., Brefeld, U., Sonnenburg, S., Laskov, P., Müller, K.R., Zien, A.: Efficient and Accurate Lp-norm Multiple Kernel Learning. In: Advances in Neural Information Processing Systems, vol. 22. MIT Press, Cambridge (2009)
Kowalski, M., Szafranski, M., Ralaivola, L.: Multiple indefinite kernel learning with mixed norm regularization. In: Proc. of the 26th International Conference on Machine Learning. ACM Press, New York (2009)
Lanckriet, G.R.G., Cristianini, N., Bartlett, P., Ghaoui, L.E., Jordan, M.I.: Learning the Kernel Matrix with Semidefinite Programming. Journal of Machine Learning Research 5, 27–72 (2004)
Lanckriet, G.R.G., De Bie, T., Cristianini, N., Jordan, M.I., Noble, W.S.: A statistical framework for genomic data fusion. Bioinformatics 20, 2626–2635 (2004)
Leslie, C., Eskin, E., Weston, J., Noble, W.S.: The spectrum kernel: a string kernel for SVM protein classification. In: Proc. of the Pacific Symposium on Biocomputing 2002, pp. 564–575 (2002)
Matys, V., Fricke, E., Geffers, R., Gößling, E., Haubrock, M., Hehl, R., Hornischer, K., Karas, D., Kel, A.E., Kel-Margoulis, O.V., Kloos, D.-U., Land, S., Lewicki-Potapov, B., Michael, H., Münch, R., Reuter, I., Rotert, S., Saxel, H., Scheer, M., Thiele, S., Wingender, E.: TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Research 31, 374–378 (2003)
Moult, J., Fidelis, K., Kryshtafovych, A., Rost, B., Tramontano, A.: Critical assessment of methods of protein structure prediction - Round VIII. Proteins 69(S8), 3–9 (2009)
Mulder, N.J., Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Binns, D., Bork, P., Buillard, V., Cerutti, L., Copley, R., Courcelle, E., Das, U., Daugherty, L., Dibley, M., Finn, R., Fleischmann, W., Gough, J., Haft, D., Hulo, N., Hunter, S., Kahn, D., Kanapin, A., Kejariwal, A., Labarga, A., Langendijk-Genevaux, P.S., Lonsdale, D., Lopez, R., Letunic, I., Madera, M., Maslen, J., McAnulla, C., McDowall, J., Mistry, J., Mitchell, A., Nikolskaya, A.N., Orchard, S., Orengo, C., Petryszak, R., Selengut, J.D., Sigrist, C.J.A., Thomas, P.D., Valentin, F., Wilson, D., Wu, C.H., Yeats, C.: New developments in the InterPro database. Nucleic Acids Research 35, D224–D228 (2007)
Ng, A.Y.: Feature selection, L1 vs. L2 regularization, and rotational invariance. In: Proceedings of the 21st International Conference on Machine Learning. ACM Press, New York (2004)
Osuna, E., Freund, R., Girosi, F.: Support vector machines: Training and applications. Tech. Rep. AIM-1602 (1997)
Reemtsen, R.: Some other approximation methods for semi-infinite optimization problems. Journal of Computational and Applied Mathematics 53, 87–108 (1994)
Schölkopf, B., Platt, J.C., Shawe-Taylor, J., Smola, A.J., Williamson, R.C.: Estimating the support of a high-dimensional distribution. Neural Computation 13, 1443–1471 (2001)
Sturm, J.F.: Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optimization Methods and Software 11/12, 625–653 (1999)
Shawe-Taylor, J., Cristianini, N.: Kernel methods for pattern analysis. Cambridge University Press, Cambridge (2004)
Son, C.G., Bilke, S., Davis, S., Greer, B.T., Wei, J.S., Whiteford, C.C., Chen, Q.R., Cenacchi, N., Khan, J.: Database of mRNA gene expression profiles of multiple human organs. Genome Research 15, 443–450 (2005)
Sonnenburg, S., Rätsch, G., Schäfer, C., Schölkopf, B.: Large scale multiple kernel learning. Journal of Machine Learning Research 7, 1531–1565 (2006)
Su, A.I., Cooke, M.P., Ching, K.A., Hakak, Y., Walker, J., Wiltshire, T., Orth, A.P., Vega, R.G., Sapinoso, L.M., Moqrich, A., Patapoutian, A., Hampton, G.M., Schultz, P.G., Hogenesch, J.B.: Large-scale analysis of the human and mouse transcriptomes. PNAS 99, 4465–4470 (2002)
Suykens, J.A.K., De Brabanter, J., Lukas, L., Vandewalle, J.: Weighted least squares support vector machines: robustness and sparse approximation. Neurocomputing, Special issue on fundamental and information processing aspects of neurocomputing 48, 85–105 (2002)
Suykens, J.A.K., Van Gestel, T., De Brabanter, J., De Moor, B., Vandewalle, J.: Least Squares Support Vector Machines. World Scientific Press, Singapore (2002)
Suykens, J.A.K., Vandewalle, J.: Multiclass Least Squares Support Vector Machines. In: Proc. of IJCNN 1999. IEEE, Los Alamitos (1999)
Suykens, J.A.K., Vandewalle, J.: Least squares support vector machine classifiers. Neural Processing Letters 9, 293–300 (1999)
Tax, D.M.J., Duin, R.P.W.: Support vector domain description. Pattern Recognition Letters 20, 1191–1199 (1999)
Thomas, G., Jacobs, K.B., Yeager, M., Kraft, P., Wacholder, S., Orr, N., Yu, K., Chatterjee, N., Welch, R., Hutchinson, A., et al.: Multiple loci identified in a genome-wide association study of prostate cancer. Nature Genetics 40, 310–315 (2008)
Tretyakov, K.: Methods of genomic data fusion: An overview. Internal Report, Institute of Computer Science, University of Tartu (2006)
Vapnik, V.: The Nature of Statistical Learning Theory, 2nd edn. Springer, New York (1999)
Veropoulos, K., Cristianini, N., Campbell, C.: Controlling the sensitivity of support vector machines. In: Proc. of the IJCAI 1999, pp. 55–60. Morgan Kaufmann Press, San Francisco (1999)
Ye, J., McGinnis, S., Madden, T.L.: BLAST: improvements for better sequence analysis. Nucleic Acids Research 34, W6–W9 (2006)
Ye, J.P., Ji, S.H., Chen, J.H.: Multi-class discriminant kernel learning via convex programming. Journal of Machine Learning Research 9, 719–758 (2008)
Yu, S., Tranchevent, L.-C., De Moor, B., Moreau, Y.: Gene prioritization and clustering by multi-view text mining. BMC Bioinformatics 11, 1–48 (2010)
Yu, S., Tranchevent, L.-C., Liu, X., Glänzel, W., Suykens, J.A.K., De Moor, B., Moreau, Y.: Optimized data fusion for kernel K-means clustering. Internal Report, K.U.Leuven (2008) (submitted for publication)
Yu, S., Van Vooren, S., Tranchevent, L.-C., De Moor, B., Moreau, Y.: Comparison of vocabularies, representations and ranking algorithms for gene prioritization by text mining. Bioinformatics 24, i119–i125 (2008)
Yu, S., Tranchevent, L.-C., Liu, X., Glänzel, W., Suykens, J.A.K., De Moor, B., Moreau, Y.: Optimized data fusion for kernel K-means clustering. Internal Report 08-200, ESAT-SISTA, K.U.Leuven, Lirias number: 242275 (2008) (submitted for publication)
Zheng, Y., Yang, X., Beddoe, G.: Reduction of False Positives in Polyp Detection Using Weighted Support Vector Machines. In: Proc. of the 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 4433–4436. IEEE Press, Los Alamitos (2007)
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Yu, S., Tranchevent, L.-C., De Moor, B., Moreau, Y. (2011). Ln-norm Multiple Kernel Learning and Least Squares Support Vector Machines. In: Kernel-based Data Fusion for Machine Learning. Studies in Computational Intelligence, vol 345. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19406-1_3
DOI: https://doi.org/10.1007/978-3-642-19406-1_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19405-4
Online ISBN: 978-3-642-19406-1
eBook Packages: Engineering, Engineering (R0)