Abstract
Kernel methods have now witnessed more than a decade of increasing popularity in the bioinformatics community. In this article, we will compactly review this development, examining the areas in which kernel methods have contributed to computational biology and describing the reasons for their success.
1 Introduction
Kernel methods are a family of algorithms from statistical machine learning (61; 67). These include the Support Vector Machine (SVM) for classification and regression as well as methods for principal component analysis (62), feature selection (72), clustering (94), two-sample tests (7; 19), and dimensionality reduction (93). These kernel methods have witnessed a huge surge in popularity in bioinformatics over the last decade. To illustrate this popularity: PubMed, the search engine for biomedical literature, lists 1,710 hits for ‘kernel methods’ and 1,798 hits for ‘SVM’ (as of May 28, 2009).
The goal of this article is to review which problems in bioinformatics have been tackled using kernel methods, and to explain their popularity in this field. Section 2 summarises the central terminology of kernel methods. Section 3 describes how kernels can be used for data integration. Section 4 illustrates the power of kernel methods in dealing with structured objects such as strings or graphs. Section 5 presents an overview of applications of Support Vector Machines in bioinformatics, and Sect. 6 reviews applications of kernel methods in bioinformatics beyond SVM-based classification or regression. The interested reader is referred to Chaps. ?? and ?? of Schölkopf et al. (63) for primers on molecular biology and kernel methods, to an introduction to Support Vector Machines and kernel methods in computational biology (4), and to a primer on Support Vector Machines for biologists (49).
2 Terminology
A kernel function is an inner product between two objects x, x′ ∈ 𝒳 in a feature space ℋ:

k(x, x′) = ⟨ϕ(x), ϕ(x′)⟩_ℋ,

where ϕ : 𝒳 → ℋ maps the data points from the input space 𝒳 to the feature space ℋ. k(x, x′) is referred to as the kernel value of x and x′. If this kernel function is applied to all pairs of objects from a set of objects, one obtains a matrix of kernel values, the kernel matrix K. K is always positive semi-definite (see Note 1), that is, all its eigenvalues are non-negative. Intuitively, a kernel function can be thought of as a similarity function between x and x′: k(x, x′) can be thought of as a similarity score, and the matrix K as a similarity matrix, that is, a matrix of similarity scores.
The idea underlying kernel methods is to map the original input data, on which statistical inference is to be performed, to a higher dimensional space, the so-called feature space, and to perform inference in this feature space. Naively, this procedure would comprise two steps: (1) mapping the data points to feature space via a mapping ϕ, (2) performing the prediction or computing the statistics of interest in this feature space. Kernel methods manage to perform this procedure in one single step: rather than separating mapping and prediction into two steps, inference is performed by evaluating kernel functions on the objects in input space. By means of these kernel functions, one implicitly solves the problem in feature space, but without explicitly computing the mapping ϕ. Hence any algorithm that solves a learning problem by accessing the data points only by means of kernel functions is a kernel method.
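This implicit one-step procedure can be illustrated with a minimal sketch, assuming a homogeneous polynomial kernel of degree 2 as the example kernel: evaluating k(x, x′) = ⟨x, x′⟩² directly yields exactly the inner product that the explicit degree-2 feature map ϕ would produce, without ever computing ϕ.

```python
import math

def kernel(x, xp):
    """Homogeneous polynomial kernel of degree 2: k(x, x') = <x, x'>^2."""
    return sum(a * b for a, b in zip(x, xp)) ** 2

def phi(x):
    """Explicit feature map of the same kernel (2D input -> 3D feature space)."""
    x1, x2 = x
    return (x1 * x1, x2 * x2, math.sqrt(2) * x1 * x2)

x, xp = (1.0, 2.0), (3.0, -1.0)
implicit = kernel(x, xp)                                # one kernel evaluation
explicit = sum(a * b for a, b in zip(phi(x), phi(xp)))  # map first, then inner product
assert abs(implicit - explicit) < 1e-12
```

For higher-degree kernels or the Gaussian RBF kernel, the explicit feature space becomes very high- or even infinite-dimensional, while the kernel evaluation stays cheap; this is the computational advantage exploited by all kernel methods.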
3 Data Integration
One major reason for the popularity of kernel methods in bioinformatics is their power in data integration. This attractiveness is due to the closure properties which kernels possess:
1. k₁, k₂ are kernels ⇒ k = k₁ + k₂ is a kernel
2. k₁, k₂ are kernels ⇒ k = k₁ ∗ k₂ is a kernel
3. k₁ is a kernel, λ is a positive scalar ⇒ k = λ ∗ k₁ is a kernel
Hence kernels can easily be combined in linear combinations or products. For instance, to compare two proteins, one can define a kernel on their sequences and on their 3D structures and then combine these into a joint sequence–structure kernel for proteins (40).
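These closure properties translate directly into operations on kernel matrices that preserve positive semi-definiteness (the product kernel corresponds to the element-wise product of the kernel matrices). A minimal numerical sketch, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_kernel_matrix(n):
    """A random PSD matrix K = B B^T, i.e., a valid kernel matrix."""
    B = rng.normal(size=(n, 4))
    return B @ B.T

def is_psd(K, tol=1e-9):
    """Check positive semi-definiteness via the eigenvalues."""
    return np.all(np.linalg.eigvalsh(K) >= -tol)

K1, K2 = random_kernel_matrix(6), random_kernel_matrix(6)

# The three closure properties, checked numerically on kernel matrices:
assert is_psd(K1 + K2)     # sum of kernels
assert is_psd(K1 * K2)     # element-wise (Hadamard) product of kernels
assert is_psd(2.5 * K1)    # scaling by a positive scalar
```

The Hadamard-product case is the matrix form of property 2 and follows from the Schur product theorem.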
The goal of multiple kernel learning is to optimise the weights in a linear combination of kernels for a particular prediction task (34); a related technique is referred to as hyperkernels (50). Lack of runtime efficiency turned out to be a limitation of early approaches to multiple kernel learning and triggered further research that addressed this problem (54; 75). In bioinformatics, this kernel learning technique was applied to protein function prediction (35) by optimally combining kernels on genome-wide data sets, including amino acid sequences, hydropathy profiles, gene expression data and known protein–protein interactions. Tsuda et al. (84) present an efficient variant of multiple kernel learning for protein function prediction from multiple networks, such as physical interaction networks and metabolic networks.
4 Analysing Structured Data
A second advantage of kernel methods is that they can easily be applied to structured data (22), for instance, graphs, sets, time series, and strings. The single requirement is that one can define a positive definite kernel on two structured objects which, intuitively speaking, quantifies the similarity between these two objects. As strings are abundant in bioinformatics as nucleotide and amino acid sequences, and biological networks steadily gain more attention, this applicability to structured data is another reason for the popularity of kernel methods in bioinformatics. In the following, we will describe the basic concepts underlying string and graph kernels.
4.1 String Kernels
The classic kernel for measuring the similarity of two strings s and s′ over an alphabet Σ is the spectrum kernel (36), which counts common substrings of length n in the two strings:

k(s, s′) = ∑_{q ∈ Σⁿ} #(q ⊆ s) · #(q ⊆ s′),

where #(q ⊆ s) is the frequency of substring q in string s. The kernel can be computed in O(|s| + |s′|) time (89), where |s| denotes the length of string s.
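A direct implementation of the spectrum kernel takes only a few lines; the following sketch hashes n-mers with a dictionary rather than using the linear-time suffix-tree approach of (89):

```python
from collections import Counter

def spectrum_kernel(s, sp, n=3):
    """Spectrum kernel: k(s, s') = sum over all n-mers q of #(q in s) * #(q in s')."""
    counts_s = Counter(s[i:i + n] for i in range(len(s) - n + 1))
    counts_sp = Counter(sp[i:i + n] for i in range(len(sp) - n + 1))
    # Only n-mers occurring in both strings contribute to the sum.
    return sum(c * counts_sp[q] for q, c in counts_s.items())

# Two toy DNA sequences sharing the 3-mers "GAT" and "ATT":
print(spectrum_kernel("GATTACA", "GATTTT", n=3))  # → 2
```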
As nucleotide and protein sequences are prone to mutations, insertions, deletions and other changes over time, the spectrum kernel was extended in several ways to allow for mismatches (37) as well as substitutions, gaps and wildcards (38). Recently, the runtime of these string kernels with inexact matching was sped up significantly by Kuksa et al. (32). Approaches such as (74) make it possible to perform SVM training on very large string datasets.
4.2 Graph Kernels
The classic kernel for quantifying the similarity of two graphs is the random-walk graph kernel (17; 28), which counts matching walks in two graphs. It can be computed elegantly by means of the direct product graph, also referred to as tensor or categorical product (26).
Definition 1.
Let G = (V, E, ℒ) be a graph with vertex set V, edge set E and a label function ℒ : V ∪ E → ℝ. The direct product of two graphs G = (V, E, ℒ) and G′ = (V′, E′, ℒ′) shall be denoted as G× = G × G′. The node set V× and edge set E× of the direct product graph are defined as:

V× = {(v, v′) ∈ V × V′ : ℒ(v) = ℒ′(v′)},
E× = {((u, u′), (v, v′)) ∈ V× × V× : (u, v) ∈ E ∧ (u′, v′) ∈ E′ ∧ ℒ(u, v) = ℒ′(u′, v′)}.
Using this product graph, the random walk kernel (also known as product graph kernel) can be defined as follows.
Definition 2.
Let G and G′ be two graphs, let A× denote the adjacency matrix of their product graph G×, and let V× denote the node set of the product graph G×. With a sequence of weights λ = λ₀, λ₁, … (λᵢ ∈ ℝ; λᵢ ≥ 0 for all i ∈ ℕ), the product graph kernel is defined as

k×(G, G′) = ∑_{i,j=1}^{|V×|} [∑_{k=0}^{∞} λₖ A×ᵏ]_{ij},

if the limit exists.
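For the common geometric choice of weights, λₖ = λᵏ, the infinite series can be evaluated in closed form through a matrix inverse. A sketch assuming NumPy, node-labelled graphs given as adjacency matrices, and λ small enough for the series to converge:

```python
import numpy as np

def product_graph_adjacency(A1, labels1, A2, labels2):
    """Adjacency matrix of the direct product of two node-labelled graphs.

    Product nodes are pairs (v, v') with matching labels; two product nodes
    are adjacent iff both component pairs are adjacent in their graphs.
    """
    nodes = [(i, j) for i in range(len(labels1)) for j in range(len(labels2))
             if labels1[i] == labels2[j]]
    Ax = np.zeros((len(nodes), len(nodes)))
    for a, (i, j) in enumerate(nodes):
        for b, (k, l) in enumerate(nodes):
            if A1[i][k] and A2[j][l]:
                Ax[a, b] = 1.0
    return Ax

def random_walk_kernel(A1, labels1, A2, labels2, lam=0.1):
    """Geometric random walk kernel, summed in closed form as
    1^T (I - lam * Ax)^{-1} 1 (valid when the geometric series converges)."""
    Ax = product_graph_adjacency(A1, labels1, A2, labels2)
    n = Ax.shape[0]
    return float(np.ones(n) @ np.linalg.inv(np.eye(n) - lam * Ax) @ np.ones(n))

# Two identically labelled triangles; the kernel sums their matching walks.
A = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(random_walk_kernel(A, ["C", "N", "O"], A, ["C", "N", "O"], lam=0.1))  # → 3.75
```

The O(n³) Sylvester-equation speed-up mentioned below avoids materialising the n² × n² product-graph matrix that this naive sketch constructs.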
Naively implemented, random walk kernels scale as O(n⁶), where n is the number of nodes in the larger of the two graphs, but their runtime was reduced to O(n³) by means of Sylvester equations (91). As random walk kernels are limited in their ability to detect common (non-path-shaped) substructures, a family of graph kernels has been proposed that count other types of matching subgraph patterns, for instance shortest paths (8), cycles (24), subtrees (55), and limited-size subgraphs (69).
In recent work (68), a highly scalable graph kernel was presented based on so-called subtree patterns or tree-walks. Its runtime scales as O(Nhm), where N is the number of graphs in the dataset, h the height of the subtree patterns and m the number of edges per graph. This graph kernel is orders of magnitude faster than previous approaches, while leading to competitive or better results on several benchmark datasets.
5 Support Vector Machines in Bioinformatics
The ultimate reason why kernel methods became a central branch of statistical bioinformatics was the Support Vector Machine, which matched or exceeded the accuracy of state-of-the-art classifiers on numerous prediction tasks in computational biology. For a comprehensive review of Support Vector Machines in computational biology up to the year 2004, the interested reader is referred to Noble (48).
Support Vector Machines were originally defined for binary classification problems (11; 85): Given two classes of data points, a positive and a negative class, one wants to correctly predict the class membership of new, unlabeled data points. Support Vector Machines tackle this task by introducing a hyperplane that separates the positive from the negative class and maximises the margin, that is, the minimal distance to any point from the positive or negative class. New data points are then predicted to be members of the positive or negative class depending on which half-space they are located in with respect to the separating hyperplane. The enormous impact of Support Vector Machines was triggered by the observation that the dual form of the Support Vector Machine optimization problem accesses the data points only by means of inner products (60), and that this inner product can be replaced by any other inner product, that is, by another kernel function.
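This "data only through inner products" property is not specific to SVMs. The following sketch uses a kernel perceptron rather than a full SVM solver, since it is the simplest dual algorithm: both training and prediction touch the data exclusively via kernel evaluations, so swapping in an RBF kernel makes a problem separable that is not linearly separable in input space.

```python
import math

def rbf(x, xp, gamma=1.0):
    """Gaussian RBF kernel on scalars (any other kernel could be plugged in)."""
    return math.exp(-gamma * (x - xp) ** 2)

def train_kernel_perceptron(X, y, kernel, epochs=10):
    """Dual perceptron: the model is a coefficient vector alpha, and the data
    points are accessed only through kernel evaluations."""
    alpha = [0.0] * len(X)
    for _ in range(epochs):
        for i, (xi, yi) in enumerate(zip(X, y)):
            f = sum(a * yj * kernel(xj, xi) for a, xj, yj in zip(alpha, X, y))
            if yi * f <= 0:      # mistake-driven update
                alpha[i] += 1.0
    return alpha

def predict(x, X, y, alpha, kernel):
    f = sum(a * yj * kernel(xj, x) for a, xj, yj in zip(alpha, X, y))
    return 1 if f >= 0 else -1

# A 1D problem that is not linearly separable in input space (class +1 sits
# in the middle), but becomes separable in the RBF feature space:
X = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
y = [-1, -1, 1, 1, 1, -1, -1]
alpha = train_kernel_perceptron(X, y, rbf)
print([predict(x, X, y, alpha, rbf) for x in X])  # → [-1, -1, 1, 1, 1, -1, -1]
```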
Over the following decade, a multitude of applications of Support Vector Machines in bioinformatics emerged, which can be divided into three large branches: SVM applications on DNA/RNA sequences, proteins, and gene expression profiles. These branches differ in the biological objects or data types that they study, but they often make use of the same computational techniques. String kernels, for example, can be applied both to DNA/RNA and protein sequences.
5.1 DNA and RNA Sequences
Classification of DNA and RNA sequences via Support Vector Machines is one of the prime applications of SVMs in computational biology.
5.1.1 DNA Sequences
Several SVM-based prediction problems on DNA sequences have been studied in the literature, including secondary structure prediction from DNA sequence by an RBF kernel (25), but gene finding is the central prediction task on genomic sequences that SVMs have been applied to over recent years.
Support Vector Machines were successfully applied to various tasks in gene finding, in particular to splice site recognition. The prediction task here is to discriminate between sequences that contain a true splice site and sequences with a decoy splice site (73). The string kernel employed is the weighted degree shift kernel. It builds upon the spectrum kernel, counting matching n-mers in two strings, but the n-mers must occur at similar positions within the sequence, not at arbitrary positions as in the spectrum kernel. Multiple kernel learning techniques were employed in Sonnenburg et al. (76) to determine the sequence motifs that are predictive of true splice sites (see also Sect. 6.2). In Rätsch et al. (56), this technique was further extended to the recognition of alternatively spliced exons. It was applied both to known exons to detect alternatively spliced ones, and to introns in order to check whether they might contain a yet unknown alternatively spliced exon. In Sonnenburg et al. (78), SVMs were employed for promoter recognition in humans. The SVM used a combination of kernels on weak indicators of promoter presence, including string kernels on specific sequence motifs and properties, and a linear kernel on the stacking energy and the twistedness of the DNA. These algorithmic components were assembled into a complete system for gene finding that was used to assay and improve the accuracy of the genome annotation of the nematode Caenorhabditis elegans (57), correctly identifying all exons and introns in 87% (coding and untranslated regions) and 95% (coding regions only) of all genes tested in several out-of-sample evaluations. A kernel-based approach was also presented for the identification of regulatory modules in euchromatic sequences (64). The prediction task here is to decide whether a promoter region is the target of a transcription factor or not. The kernel designed for this task compares the sequence regions around the best matches of a set of motifs within the sequence, together with their relative positions to the transcription start site.
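The position-dependent matching of the weighted degree kernel (here without the shift extension) can be sketched in a few lines; the weights β_d = 2(D − d + 1)/(D(D + 1)) follow the standard choice, which gives more weight to short matches:

```python
def weighted_degree_kernel(s, sp, D=3):
    """Weighted degree kernel: counts matching d-mers (d = 1..D) that occur at
    the SAME position in both sequences, weighted by beta_d."""
    assert len(s) == len(sp)
    value = 0.0
    for d in range(1, D + 1):
        beta = 2.0 * (D - d + 1) / (D * (D + 1))
        value += beta * sum(s[i:i + d] == sp[i:i + d]
                            for i in range(len(s) - d + 1))
    return value

# Two sequences differing at a single position; only d-mers covering that
# position fail to match.
print(weighted_degree_kernel("GATTACA", "GATTTCA"))
```

In contrast to the spectrum kernel, shifting one sequence by a few bases destroys most matches here, which is exactly the positional sensitivity needed for splice site recognition.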
5.1.2 RNA Sequences
Support Vector Machines have also been applied in RNA research. A major classification problem that arises in this field is to decide whether an RNA sequence is a member of a functional RNA family. For this task, special-purpose kernels on RNA sequences have been defined, so-called stem kernels, which compare the stem structures that appear in the secondary structure of two RNA sequences (58; 59). The stem kernel examines all possible common base pairs and stem structures of arbitrary length, including pseudoknots, between two RNA sequences, and calculates the inner product of common stem structure counts. Other typical applications of SVMs in RNA research include distinguishing protein-coding from non-coding RNA (42) and predicting target genes for microRNAs (31; 92).
5.2 Proteins
A second large area of SVM applications in biology is proteomics, in particular in protein structure, function and interaction prediction.
5.2.1 Protein Sequence Comparison
Protein comparison tries to establish the similarity of two proteins in order to find proteins that belong to the same structural or functional class. This comparison can focus on different aspects of the protein: its amino acid sequence, (approximated) physicochemical properties, or its 3D structure.
Comparing and classifying protein sequences is one of the classic tasks in bioinformatics, and one step towards goals such as protein function prediction, protein structure prediction, fold recognition, or remote homology detection. Kernels on sequences, in combination with Support Vector Machines, contributed to the field of sequence comparison by enabling discriminative classification of sequences. This area of kernel methods in bioinformatics has witnessed a lot of work on kernel design, resulting in a number of conceptually different kernels, which we describe in the following.
The Fisher kernel combines Support Vector Machines with Hidden Markov Models for protein remote homology detection (27). The Hidden Markov Model is trained on protein sequences from the positive class and then applied to all proteins in the training and test set to derive a feature vector representation of each protein in terms of a gradient vector. This Fisher kernel, used within an SVM, outperformed classic sequence alignment techniques such as BLAST (1) in protein homology detection. The Fisher kernel was later generalised to the class of marginalised kernels on sequences (82): these kernels apply to all objects that are generated from latent variable models (e.g., HMMs). The central idea is to first define a joint kernel for the complete data, which includes both visible and hidden variables. The marginalised kernel for visible data is then obtained by taking the expectation with respect to the hidden variables.
Ding and Dubchak (14) derived feature vector representations of the physicochemical properties of proteins from their amino acid sequence and then used these vectors, a kernel on vectors and SVMs to predict SCOP fold membership of proteins (47). The physicochemical properties for these composition kernels were derived by means of amino acid indices (30): These indices are tables which map each amino acid type to one scalar that approximately describes a physicochemical property of this amino acid, for instance, its polarity, polarizability, van der Waals volume, or hydrophobicity. Cai et al. (13) used a similar approach to classify proteins into structural classes.
Motif kernels, as defined by Logan et al. (43) and Ben-Hur and Brutlag (2), are an alternative way of representing a protein sequence by a vector whose components indicate motif occurrence or absence. Logan et al. (43) use weight matrix motifs from the BLOCKS database (23), which are derived from multiple sequence alignments and occur in highly conserved, and often functionally important, regions of the proteins. These motifs are compared to proteins and the resulting scores are used as feature vector representations of the proteins. Ben-Hur and Brutlag (2) employ motifs from the eBLOCKS database of discrete sequence motifs (80), and show how to efficiently compute the resulting motif kernel using a trie data structure.
Liao and Noble (41) defined a different feature vector representation of protein sequences, resulting in an empirical kernel that directly uses existing sequence alignment techniques: for a set of n proteins, they first compute an n × n matrix of sequence similarity scores (for instance, Smith-Waterman scores (70)) and then represent each protein by its corresponding vector of sequence similarity scores in this matrix.
The most recent class of protein sequence kernels are string kernels, which count common substrings in two strings (see Sect. 4.1). These kernels either require exact matches (36), allow for a limited number of mismatches (39), or allow for substitutions, gaps or wildcards (38).
Further kernels on sequences have been defined which take local properties of the sequence (44) and local alignments (88) into account for specific prediction tasks, such as subcellular localisation prediction.
5.2.2 Protein Structure Comparison
With the ability to determine protein structures advancing more rapidly than our ability to study their function, function prediction from protein structure has gained more and more attention in computational biology. Dobson and Doig (15) described 1,178 protein structures as vectors by means of simple features such as secondary-structure content, amino acid propensities, surface properties and ligands, and then classified them into enzymes and non-enzymes via Support Vector Machines. Borgwardt et al. (9) modeled proteins from the same dataset as graphs, in which nodes represent secondary structure elements and edges indicate that these elements are neighbours along the amino acid chain or in 3D space. They then employed a random walk graph kernel on these graph models to perform function prediction and improved over the results achieved by Dobson and Doig (15). On other benchmark datasets for functional and structural classification, Qiu et al. (52) showed that a kernel that employs similarity scores based on the structural alignment tool MAMMOTH (51) outperforms the previous vector- and graph-based approaches.
5.2.3 Protein Interaction Prediction
A third central topic in computational proteomics is the prediction of protein–protein interactions, due to the numerous false-positive and false-negative edges in currently known protein–protein interaction networks. This problem can be cast as a binary classification problem: a pair of proteins is predicted to interact (positive class) or not (negative class). Bock and Gough (5) defined the first Support Vector Machine approach to this problem, in which they represented each pair of proteins as a concatenated feature vector of physicochemical and surface properties of these two proteins. Ben-Hur and Noble (3) further refined this approach by defining a pairwise tensor kernel k_tensor on two pairs of proteins (a, b) and (c, d):

k_tensor((a, b), (c, d)) = k_single(a, c) · k_single(b, d) + k_single(a, d) · k_single(b, c),

where k_single measures the similarity between two proteins based on their sequences, gene ontology annotations, local properties of the network, and homologous interactions in other species. Two pairs of proteins are similar under this kernel if, for each protein in one pair, a protein with similar properties can be found in the other pair.
A drawback of the tensor product kernel is that the similarity or dissimilarity of the proteins within one pair is not taken into account. This changed when the metric learning pairwise kernel k_mlpk was defined (87):

k_mlpk((a, b), (c, d)) = [(ϕ(a) − ϕ(b))ᵀ (ϕ(c) − ϕ(d))]²,

which directly compares the within-pair difference vectors (ϕ(a) − ϕ(b)) and (ϕ(c) − ϕ(d)) to each other and improves upon the prediction accuracy of the tensor kernel.
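Both pairwise kernels reduce to a handful of kernel evaluations on single proteins. A sketch where k_single is a stand-in linear kernel on toy feature vectors (in practice it would be a sequence or network kernel, and ϕ its feature map; for a linear kernel, ϕ is the identity):

```python
def k_single(x, xp):
    """Stand-in base kernel on single proteins: a linear kernel on toy vectors."""
    return sum(a * b for a, b in zip(x, xp))

def k_tensor(pair1, pair2):
    """Pairwise tensor kernel:
    k((a,b),(c,d)) = k(a,c) k(b,d) + k(a,d) k(b,c)."""
    (a, b), (c, d) = pair1, pair2
    return k_single(a, c) * k_single(b, d) + k_single(a, d) * k_single(b, c)

def k_mlpk(pair1, pair2):
    """Metric learning pairwise kernel (linear phi):
    k((a,b),(c,d)) = (<a - b, c - d>)^2, comparing within-pair differences."""
    (a, b), (c, d) = pair1, pair2
    diff1 = [ai - bi for ai, bi in zip(a, b)]
    diff2 = [ci - di for ci, di in zip(c, d)]
    return sum(u * v for u, v in zip(diff1, diff2)) ** 2

p1 = ((1.0, 0.0), (0.0, 1.0))
p2 = ((1.0, 1.0), (0.0, 2.0))
print(k_tensor(p1, p2), k_mlpk(p1, p2))  # → 2.0 4.0
```

The symmetrisation in k_tensor makes the kernel invariant to the order of the proteins within a pair, while k_mlpk additionally depends on how similar the two proteins of each pair are to each other.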
The pairwise tensor kernel and a Gaussian Radial Basis Function (RBF) kernel that considers within-pair similarity of proteins were used in a recent study to predict co-complex membership of protein pairs in yeast (53). The tensor kernel was based on a kernel k_single, a weighted sum of kernels including three kernels on protein sequences and three diffusion kernels which measure proximity of the proteins within a physical or genetic interaction network. The Gaussian RBF kernel was computed on features that reflect coexpression, coregulation, colocalisation, similar gene ontology annotation and interologs of the proteins within a pair.
All the kernel methods for protein-interaction prediction via SVMs have in common that they treat the existence of interactions as pairwise independent events, that is, the existence of one interaction does not make the existence of other interactions more or less likely.
5.2.4 Other Kernel Applications in Proteomics
Other applications of SVMs in proteomics mainly involve protein function prediction from data sources other than sequence or structure; we describe some representative examples here. In one of the early studies in this direction, a kernel on trees was defined for function prediction from phylogenetic profiles of proteins (86). Tsuda and Noble (83) present an approach for predicting the function of unannotated proteins in protein-interaction or metabolic networks. Their method uses a locally constrained diffusion kernel, which maximises the von Neumann entropy of the network, to measure similarity between nodes, and a Support Vector Machine for annotating proteins of unknown function.
5.3 Gene Expression Profiles
Another popular field of SVM applications is prediction based on microarray gene expression measurements. Existing kernels on vectors, such as the linear, polynomial and Gaussian RBF kernels, can be readily applied here without involved kernel design.
5.3.1 Diagnosis and Prognosis
The most common task in this field is to predict the phenotype of a patient based on his or her gene expression levels, primarily for disease diagnosis or drug response prediction. The first study of this kind was conducted by Mukherjee et al. (46) on the dataset of gene expression levels of two classes of leukemia patients from Golub et al. (18), telling these two subtypes of leukemia apart using a linear kernel and an SVM. Many similar studies followed, each of them focusing on a particular task of diagnosis or prognosis. The first kernel for time series of microarrays was defined in Borgwardt et al. (10): gene expression profiles of multiple sclerosis patients were compared by means of a dynamical systems kernel (90) to predict their response to treatment with the drug beta-interferon.
5.3.2 Function Prediction
SVMs on gene expression levels were also used for gene function prediction. Here, a gene is represented as a vector of its expression levels across different conditions, tissues or patients. The underlying assumption is that two genes are functionally related if they exhibit similar expression levels under different external conditions. The first study in this direction (12) predicted the membership of 6,000 yeast genes in five functional classes from the MIPS Yeast Genome Database (45).
6 Kernel Methods Beyond Classification
While Support Vector Machines are clearly the most popular kernel method in bioinformatics, there are also learning problems in bioinformatics that require algorithmic machinery and statistical tests different from classification or regression.
6.1 Data Integration for Network Inference
First, several kernel methods for data integration, in particular on networks, were defined.
Kato et al. (29) model protein interaction prediction as a kernel matrix completion problem. In their setting, they are given a large dataset of proteins with different types of information on these proteins, including gene expression levels, protein localisation, and phylogenetic profiles. They represent each of these data types by a ‘large’ kernel matrix. They are also given the true protein interactions between a small subset of all proteins, which they convert into a ‘small’ kernel matrix. They then define an algorithm that completes the small kernel matrix by means of the information from the large kernel matrices and thereby infers the missing, unknown interactions.
Yamanishi et al. (95) also define a supervised approach to protein network inference from multiple types of data including gene expression, localisation information and phylogenetic profiles. They combine ideas from spectral clustering and kernel canonical correlation analysis to derive features that are indicative of protein interaction. This technique is further refined in Yamanishi et al. (96) for enzyme network inference by enforcing chemical constraints to be fulfilled by the resulting network structure.
6.2 Feature Selection
Second, feature selection is an important problem in computational biology, since identifying the features that are relevant for an accurate prediction is essential for understanding the underlying biological process.
A typical example for the relevance of feature selection in bioinformatics is gene selection from microarray data. Support Vector Machine-based approaches to feature selection were defined early on, which recursively eliminate irrelevant features (21) or iteratively downscale the less informative features (94).
Borgwardt et al. (9) and Sonnenburg et al. (76) employed multiple kernel learning for feature selection, to weight different kernels used by a Support Vector Machine. In (9), hyperkernels were used to determine which node attributes in a graph model of protein structure were most important for correct protein function prediction. These nodes represented alpha-helices or beta-sheets in the tertiary structure of the protein, and their attributes were their length in amino acids and Angstroms, and statistics on their hydrophobicity, polarity, polarizability and van der Waals volume. Among all these attributes, hyperkernel learning assigned the largest weight to the amino acid length.
In (76), multiple kernel learning was used to determine the sequence motifs that are most relevant for correct splice site recognition. Each kernel represented one single sequence motif at a specific sequence position, and multiple kernel learning determined the weight for each of these motifs, resulting in a set of position-specific sequence patterns that are associated with true splice sites. This technique was further refined in Sonnenburg et al. (77), now taking the overlap in sequence between different substrings into account and making it possible to assess the importance of (consensus) sequence motifs for correct prediction, even if they do not occur in the given collection of sequences.
Song et al. (71) define a kernel-based approach to gene selection from microarray data. They show that many of the vast number of feature selection algorithms from the microarray literature are indeed instances of this framework, which are obtained by a different choice of kernel and/or a particular type of normalisation. New gene selection algorithms can easily be derived from this framework, even for regression and multi-class settings, and existing techniques can be objectively compared to each other, by replacing one kernel by another, while keeping other properties fixed, such as the normalisation technique employed.
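The core of this framework is the Hilbert-Schmidt Independence Criterion (HSIC) between a kernel on the expression values and a kernel on the labels; genes are then ranked by their HSIC score. A simplified sketch, assuming NumPy, linear kernels, and the biased HSIC estimator (the full algorithm of Song et al. eliminates features recursively rather than scoring them once):

```python
import numpy as np

def hsic(K, L):
    """Biased HSIC estimate tr(K H L H) / (n-1)^2 between two kernel matrices."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def score_genes(X, y):
    """Score each gene (column of X) by the HSIC between a linear kernel on
    its expression values and a linear kernel on the class labels."""
    L = np.outer(y, y).astype(float)
    return [hsic(np.outer(X[:, j], X[:, j]), L) for j in range(X.shape[1])]

rng = np.random.default_rng(1)
y = np.array([1] * 10 + [-1] * 10)      # two phenotype classes
X = rng.normal(size=(20, 5))            # 20 samples, 5 toy "genes"
X[:, 2] += 3.0 * y                      # make gene 2 strongly class-dependent
scores = score_genes(X, y)
print(int(np.argmax(scores)))           # → 2
```

Replacing the linear kernels by other kernels, or changing the normalisation, recovers different feature selection criteria from the microarray literature, which is precisely the unifying point of the framework.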
6.3 Statistical Tests
Third, a recent development in machine learning is kernel-based statistical testing (19; 20), which has led to a first application in bioinformatics: Borgwardt et al. (7) define a kernel-based statistical test to check the cross-platform comparability of microarray data. This two-sample test, whose goal is to establish whether two samples were drawn from the same distribution, computes the distance between the means of the two samples in a universal reproducing kernel Hilbert space (79) as its test statistic. The larger this distance, the smaller the probability that the two samples originate from the same distribution. In experiments on microarray cross-platform comparability, the test clearly distinguishes between samples of microarray measurements generated on the same platform and those from different platforms.
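The test statistic itself is simple to compute from kernel evaluations. A sketch assuming an RBF kernel on scalar toy samples and the biased estimator (without the significance threshold derived in (7)):

```python
import math

def rbf(x, xp, gamma=0.5):
    return math.exp(-gamma * (x - xp) ** 2)

def mmd2(X, Y, kernel):
    """Biased estimate of the squared Maximum Mean Discrepancy: the squared
    distance between the feature-space means of the two samples,
    MMD^2 = mean k(x,x') + mean k(y,y') - 2 mean k(x,y)."""
    kxx = sum(kernel(a, b) for a in X for b in X) / (len(X) ** 2)
    kyy = sum(kernel(a, b) for a in Y for b in Y) / (len(Y) ** 2)
    kxy = sum(kernel(a, b) for a in X for b in Y) / (len(X) * len(Y))
    return kxx + kyy - 2 * kxy

same = [0.1, -0.2, 0.05, 0.3, -0.1]
shifted = [2.1, 1.8, 2.3, 2.0, 1.9]
# Identical samples give a statistic of zero; shifted samples a large one:
print(mmd2(same, same, rbf), mmd2(same, shifted, rbf))
```

Intuitively, two samples from the same platform behave like `same` versus `same`, while a platform effect shifts the distribution and inflates the statistic, as with `shifted`.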
6.4 Kernel Methods for Structured Output
Fourth, another recent development in kernel machine learning is kernel methods for structured output domains. The classic Support Vector Machine was designed for binary classification problems with data objects drawn i.i.d. (independent and identically distributed) from an underlying distribution. However, many prediction problems in biology are multi-class problems, and predictions on different objects can depend highly on each other.
For instance, if one wants to annotate a DNA sequence in gene finding, the predicted label of a nucleotide (e.g., exonic or intronic) is highly dependent on those of the neighbouring bases. This is often referred to as the label sequence learning problem: given a sequence of n letters, one wants to predict a sequence of n class labels. Hidden Markov Models are the classic tool for this problem in computational biology (16). Conditional random fields (Lafferty et al. (33)) were developed as a discriminative alternative to the generative model that Hidden Markov Models are based upon. Kernel-based discriminative approaches to this problem have recently been defined in machine learning as well, and employed successfully for sequence alignment (6; 65), gene finding and genome annotation (57; 66), and tiling array analysis (97; 98). A general approach to Support Vector Machine classification in multiclass and structured output domains was proposed by Tsochantaridis et al. (81), and promises to trigger further research in this direction in computational biology.
6.5 Outlook
In our opinion, the success story of kernel methods in bioinformatics will continue over the next decade. The strength of kernels in dealing with structured objects will lead to more applications of kernels in biological network analysis. Their ability to elegantly handle high-dimensional data and to integrate various data sources will make them an attractive tool for tasks such as genome-wide association studies. Furthermore, the ability to encode prior knowledge in the kernel function will foster the use of kernel methods in various specialised prediction tasks in computational biology.
Notes
1. The machine learning community often (incorrectly) uses the term positive definite rather than positive semi-definite.
References
Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic local alignment search tool. Journal of Molecular Biology, 215(3), 403–410.
Ben-Hur, A., & Brutlag, D. (2003). Remote homology detection: A motif based approach. Bioinformatics, 19 (Suppl. 1), i26–i33. URL http://www.ncbi.nlm.nih.gov/pubmed/12855434. PMID: 12855434
Ben-Hur, A., & Noble, W. S. (2005). Kernel methods for predicting protein-protein interactions. Bioinformatics (Oxford, England), 21 (Suppl. 1), i38–i46. DOI10.1093/bioinformatics/bti1016. URL http://www.ncbi.nlm.nih.gov/pubmed/15961482. PMID: 15961482
Ben-Hur, A., Ong, C. S., Sonnenburg, S., Schölkopf, B., & Rätsch, G. (2008). Support vector machines and kernels for computational biology. PLoS Computational Biology, 4(10), e1000,173. DOI10.1371/journal.pcbi.1000173. URL http://www.ncbi.nlm.nih.gov/pubmed/18974822. PMID: 18974822
Bock, J. R., & Gough, D. A. (2001). Predicting protein–protein interactions from primary structure. Bioinformatics (Oxford, England), 17(5), 455–460. URL http://www.ncbi.nlm.nih.gov/pubmed/11331240. PMID: 11331240
Bona, F. D., Ossowski, S., Schneeberger, K., & Rätsch, G. (2008). Optimal spliced alignments of short sequence reads. Bioinformatics (Oxford, England), 24(16), i174–i180. DOI: 10.1093/bioinformatics/btn300. URL http://www.ncbi.nlm.nih.gov/pubmed/18689821. PMID: 18689821
Borgwardt, K. M., Gretton, A., Rasch, M. J., Kriegel, H. P., Schölkopf, B., & Smola, A. J. (2006). Integrating structured biological data by kernel maximum mean discrepancy. Bioinformatics (ISMB), 22(14), e49–e57.
Borgwardt, K. M., & Kriegel, H. P. (2005). Shortest-path kernels on graphs. In ICDM (pp. 74–81). IEEE Computer Society.
Borgwardt, K. M., Ong, C. S., Schönauer, S., Vishwanathan, S. V. N., Smola, A. J., & Kriegel, H. P. (2005). Protein function prediction via graph kernels. Bioinformatics, 21(Suppl 1), i47–i56.
Borgwardt, K. M., Vishwanathan, S. V. N., & Kriegel, H. P. (2006). Class prediction from time series gene expression profiles using dynamical systems kernels. In R. B. Altman, T. Murray, T. E. Klein, A. K. Dunker, & L. Hunter (Eds.), Pacific symposium on biocomputing (pp. 547–558). World Scientific.
Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In D. Haussler (Ed.), Proceedings of the annual conference on computational learning theory (pp. 144–152). Pittsburgh, PA: ACM.
Brown, M. P. S., Grundy, W. N., Lin, D., Cristianini, N., Sugnet, C., Furey, T. S., et al. (2000). Knowledge-based analysis of microarray gene expression data using support vector machines. Proceedings of the National Academy of Sciences of the United States of America, 97(1), 262–267.
Cai, Y. D., Liu, X. J., Xu, X. B., & Chou, K. C. (2002). Prediction of protein structural classes by support vector machines. Computational Chemistry, 26(3), 293–296.
Ding, C. H., & Dubchak, I. (2001). Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics, 17(4), 349–358.
Dobson, P. D., & Doig, A. J. (2003). Distinguishing enzyme structures from non-enzymes without alignments. Journal of Molecular Biology, 330(4), 771–783.
Durbin, R., Eddy, S., Krogh, A., & Mitchison, G. (1998). Biological sequence analysis: Probabilistic models of proteins and nucleic acids. Cambridge, UK: Cambridge University Press.
Gärtner, T., Flach, P. A., & Wrobel, S. (2003). On graph kernels: Hardness results and efficient alternatives. In B. Schölkopf & M. K. Warmuth (Eds.), COLT, Lecture Notes in Computer Science (Vol. 2777, pp. 129–143). Springer.
Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., et al. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286(5439), 531–537.
Gretton, A., Borgwardt, K., Rasch, M., Schölkopf, B., & Smola, A. (2007). A kernel method for the two-sample-problem. In Advances in neural information processing systems (Vol. 19, pp. 513–520). Cambridge, MA: MIT.
Gretton, A., Fukumizu, K., Teo, C. H., Song, L., Schölkopf, B., & Smola, A. J. (2007). A kernel statistical test of independence. In J. C. Platt, D. Koller, Y. Singer, & S. T. Roweis (Eds.), NIPS. MIT Press.
Guyon, I., Weston, J., Barnhill, S., & Vapnik, V. (2002). Gene selection for cancer classification using support vector machines. Machine Learning, 46, 389–422.
Haussler, D. (1999). Convolution kernels on discrete structures. Tech. Rep., UCSC-CRL-99-10. UC Santa Cruz: Computer Science Department.
Henikoff, S., & Henikoff, J. G. (1991). Automated assembly of protein blocks for database searching. Nucleic Acids Research, 19, 6565–6572.
Horváth, T., Gärtner, T., & Wrobel, S. (2004). Cyclic pattern kernels for predictive graph mining. In W. Kim, R. Kohavi, J. Gehrke, & W. DuMouchel (Eds.), KDD (pp. 158–167). ACM.
Hua, S., & Sun, Z. (2001). A novel method of protein secondary structure prediction with high segment overlap measure: Support vector machine approach. Journal of Molecular Biology, 308(2), 397–407. DOI: 10.1006/jmbi.2001.4580. URL http://www.ncbi.nlm.nih.gov/pubmed/11327775. PMID: 11327775
Imrich, W., & Klavzar, S. (2000). Product graphs: Structure and recognition. In Wiley Interscience Series in Discrete Mathematics. New York: Wiley VCH.
Jaakkola, T., Diekhans, M., & Haussler, D. (1999). Using the fisher kernel method to detect remote protein homologies. In T. Lengauer, R. Schneider, P. Bork, D. L. Brutlag, J. I. Glasgow, H. W. Mewes, et al. (Eds.), ISMB (pp. 149–158). AAAI.
Kashima, H., Tsuda, K., & Inokuchi, A. (2003). Marginalized kernels between labeled graphs. In Proceedings of the 20th International Conference on Machine Learning (ICML). Washington, DC: United States.
Kato, T., Tsuda, K., & Asai, K. (2005). Selective integration of multiple biological data for supervised network inference. Bioinformatics (Oxford, England), 21(10), 2488–2495. DOI: 10.1093/bioinformatics/bti339. URL http://www.ncbi.nlm.nih.gov/pubmed/15728114. PMID: 15728114
Kawashima, S., Ogata, H., & Kanehisa, M. (1999). AAindex: Amino acid index database. Nucleic Acids Research, 27(1), 368–369.
Kim, S., Nam, J., Rhee, J., Lee, W., & Zhang, B. (2006). miTarget: microRNA target gene prediction using a support vector machine. BMC Bioinformatics, 7, 411. DOI: 10.1186/1471-2105-7-411. URL http://www.ncbi.nlm.nih.gov/pubmed/16978421. PMID: 16978421
Kuksa, P. P., Huang, P. H., & Pavlovic, V. (2008). Scalable algorithms for string kernels with inexact matching. In D. Koller, D. Schuurmans, Y. Bengio, & L. Bottou (Eds.), NIPS (pp. 881–888). MIT.
Lafferty, J. D., McCallum, A., & Pereira, F. C. N. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In C. E. Brodley & A. P. Danyluk (Eds.), ICML (pp. 282–289). Morgan Kaufmann.
Lanckriet, G., Cristianini, N., Bartlett, P., Ghaoui, L. E., & Jordan, M. I. (2004). Learning the kernel matrix with semi-definite programming. Journal of Machine Learning Research, 5, 27–72.
Lanckriet, G. R. G., Bie, T. D., Cristianini, N., Jordan, M. I., & Noble, W. S. (2004). A statistical framework for genomic data fusion. Bioinformatics, 20(16), 2626–2635. DOI: 10.1093/bioinformatics/bth294. URL http://www.ncbi.nlm.nih.gov/pubmed/15130933. PMID: 15130933
Leslie, C., Eskin, E., & Noble, W. S. (2002). The spectrum kernel: A string kernel for SVM protein classification. In Proceedings of the pacific symposium on biocomputing (pp. 564–575).
Leslie, C., Eskin, E., Weston, J., & Noble, W. S. (2002). Mismatch string kernels for SVM protein classification. In S. Becker, S. Thrun, & K. Obermayer (Eds.), Advances in neural information processing systems (Vol. 15). Cambridge, MA: MIT.
Leslie, C. S., & Kuang, R. (2003). Fast kernels for inexact string matching. In B. Schölkopf & M. K. Warmuth (Eds.), COLT, Lecture Notes in Computer Science (Vol. 2777, pp. 114–128). Springer.
Leslie, C. S., Eskin, E., Cohen, A., Weston, J., & Noble, W. S. (2004). Mismatch string kernels for discriminative protein classification. Bioinformatics (Oxford, England), 20(4), 467–476. DOI: 10.1093/bioinformatics/btg431. URL http://www.ncbi.nlm.nih.gov/pubmed/14990442. PMID: 14990442
Lewis, D. P., Jebara, T., & Noble, W. S. (2006). Support vector machine learning from heterogeneous data: An empirical analysis using protein sequence and structure. Bioinformatics (Oxford, England), 22(22), 2753–2760. DOI: 10.1093/bioinformatics/btl475. URL http://www.ncbi.nlm.nih.gov/pubmed/16966363. PMID: 16966363
Liao, L., & Noble, W. S. (2002). Combining pairwise sequence similarity and support vector machines for remote protein homology detection. In RECOMB (pp. 225–232).
Liu, J., Gough, J., & Rost, B. (2006). Distinguishing protein-coding from non-coding RNAs through support vector machines. PLoS Genetics, 2(4), 529–536.
Logan, B., Moreno, P., Suzek, B., Weng, Z., & Kasif, S. (2001). A study of remote homology detection. Tech. Rep., Cambridge Research Laboratory.
Matsuda, S., Vert, J., Saigo, H., Ueda, N., Toh, H., & Akutsu, T. (2005). A novel representation of protein sequences for prediction of subcellular location using support vector machines. Protein Science: A Publication of the Protein Society, 14(11), 2804–2813. DOI: 10.1110/ps.051597405. URL http://www.ncbi.nlm.nih.gov/pubmed/16251364. PMID: 16251364
Mewes, H. W., Frishman, D., Gruber, C., Geier, B., Haase, D., Kaps, A., et al. (2000). MIPS: A database for genomes and protein sequences. Nucleic Acids Research, 28(1), 37–40. URL http://www.ncbi.nlm.nih.gov/pubmed/10592176. PMID: 10592176
Mukherjee, S., Tamayo, P., Slonim, D., Verri, A., Golub, T., Mesirov, J.P., et al. (2000). Support vector machine classification of microarray data. Tech. Rep., Artificial Intelligence Laboratory, Massachusetts Institute of Technology.
Murzin, A. G., Brenner, S. E., Hubbard, T., & Chothia, C. (1995). SCOP: A structural classification of proteins database for the investigation of sequences and structures. Journal of Molecular Biology, 247(4), 536–540. DOI: 10.1006/jmbi.1995.0159. URL http://www.ncbi.nlm.nih.gov/pubmed/7723011. PMID: 7723011
Noble, W. (2004). Support vector machine applications in computational biology. In B. Schölkopf, K. Tsuda, & J. P. Vert (Eds.), Kernel methods in computational biology. Cambridge, MA: MIT.
Noble, W. S. (2006). What is a support vector machine? Nature Biotechnology, 24(12), 1565–1567. DOI10.1038/nbt1206-1565. URL http://dx.doi.org/10.1038/nbt1206-1565
Ong, C. S., & Smola, A. J. (2003). Machine learning with hyperkernels. In T. Fawcett & N. Mishra (Eds.), ICML (pp. 568–575). AAAI.
Ortiz, A. R., Strauss, C. E. M., & Olmea, O. (2002). MAMMOTH (matching molecular models obtained from theory): An automated method for model comparison. Protein Science: A Publication of the Protein Society, 11(11), 2606–2621. DOI: 10.1110/ps.0215902. URL http://www.ncbi.nlm.nih.gov/pubmed/12381844. PMID: 12381844
Qiu, J., Hue, M., Ben-Hur, A., Vert, J., & Noble, W. S. (2007). A structural alignment kernel for protein structures. Bioinformatics (Oxford, England), 23(9), 1090–1098. DOI: 10.1093/bioinformatics/btl642. URL http://www.ncbi.nlm.nih.gov/pubmed/17234638. PMID: 17234638
Qiu, J., & Noble, W. S. (2008). Predicting co-complexed protein pairs from heterogeneous data. PLoS Computational Biology, 4(4), e1000054. DOI: 10.1371/journal.pcbi.1000054. URL http://www.ncbi.nlm.nih.gov/pubmed/18421371. PMID: 18421371
Rakotomamonjy, A., Bach, F., Canu, S., & Grandvalet, Y. (2007). More efficiency in multiple kernel learning. In Z. Ghahramani (Ed.), ICML, ACM International Conference Proceeding Series (Vol. 227, pp. 775–782). ACM.
Ramon, J., & Gärtner, T. (2003). Expressivity versus efficiency of graph kernels. Tech. Rep., First International Workshop on Mining Graphs, Trees and Sequences (held with ECML/PKDD’03).
Rätsch, G., Sonnenburg, S., & Schölkopf, B. (2005). RASE: Recognition of alternatively spliced exons in C. elegans. Bioinformatics, 21 (Suppl. 1), i369–i377.
Rätsch, G., Sonnenburg, S., Srinivasan, J., Witte, H., Müller, K., Sommer, R., et al. (2007). Improving the Caenorhabditis elegans genome annotation using machine learning. PLoS Computational Biology, 3(2), e20. PMID: 17319737
Sakakibara, Y., Popendorf, K., Ogawa, N., Asai, K., & Sato, K. (2007). Stem kernels for RNA sequence analyses. Journal of Bioinformatics and Computational Biology, 5(5), 1103–1122. URL http://www.ncbi.nlm.nih.gov/pubmed/17933013. PMID: 17933013
Sato, K., Mituyama, T., Asai, K., & Sakakibara, Y. (2008). Directed acyclic graph kernels for structural RNA analysis. BMC Bioinformatics, 9, 318. DOI: 10.1186/1471-2105-9-318. URL http://www.ncbi.nlm.nih.gov/pubmed/18647390. PMID: 18647390
Schölkopf, B. (1997). Support vector learning. München: R. Oldenbourg Verlag. PhD thesis, TU Berlin. Download: http://www.kernel-machines.org
Schölkopf, B., & Smola, A. J. (2002). Learning with Kernels. Cambridge, MA: MIT.
Schölkopf, B., Smola, A. J., & Müller, K. R. (1997). Kernel principal component analysis. In W. Gerstner, A. Germond, M. Hasler, & J. D. Nicoud (Eds.), Artificial neural networks ICANN’97 (Vol. 1327, pp. 583–588). Berlin: Springer Lecture Notes in Computer Science.
Schölkopf, B., Tsuda, K., & Vert, J. P. (2004). Kernel Methods in Computational Biology. Cambridge, MA: MIT.
Schultheiss, S. J., Busch, W., Lohmann, J. U., Kohlbacher, O., & Rätsch, G. (2009). KIRMES: kernel-based identification of regulatory modules in euchromatic sequences. Bioinformatics (Oxford, England), DOI: 10.1093/bioinformatics/btp278. URL http://www.ncbi.nlm.nih.gov/pubmed/19389732. PMID: 19389732
Schulze, U., Hepp, B., Ong, C. S., & Rätsch, G. (2007). PALMA: mRNA to genome alignments using large margin algorithms. Bioinformatics (Oxford, England), 23(15), 1892–1900. DOI: 10.1093/bioinformatics/btm275. URL http://www.ncbi.nlm.nih.gov/pubmed/17537755. PMID: 17537755
Schweikert, G., Zien, A., Zeller, G., Behr, J., Dieterich, C., Ong, C. S., et al. (2009). mGene: Accurate SVM-based gene finding with an application to nematode genomes. Genome Research, 19(11), 2133–2143. DOI: 10.1101/gr.090597.108. URL http://www.ncbi.nlm.nih.gov/pubmed/19564452. PMID: 19564452
Shawe-Taylor, J., & Cristianini, N. (2004). Kernel methods for pattern analysis. Cambridge, UK: Cambridge University Press.
Shervashidze, N., & Borgwardt, K. M. (2009). Fast subtree kernels on graphs. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, & A. Culotta (Eds.), NIPS (pp. 1660–1668). Cambridge, MA: MIT.
Shervashidze, N., Vishwanathan, S., Petri, T., Mehlhorn, K., & Borgwardt, K. M. (2009). Efficient graphlet kernels for large graph comparison. In D. van Dyk & M. Welling (Eds.), Proceedings of the twelfth international conference on artificial intelligence and statistics. Clearwater Beach, Florida.
Smith, T. F., & Waterman, M. S. (1981). Identification of common molecular subsequences. Journal of Molecular Biology, 147(1), 195–197. URL http://www.ncbi.nlm.nih.gov/pubmed/7265238. PMID: 7265238
Song, L., Bedo, J., Borgwardt, K., Gretton, A., & Smola, A. (2007). Gene selection via the BAHSIC family of algorithms. Bioinformatics, 23(13), i490–i498.
Song, L., Smola, A., Gretton, A., Borgwardt, K., & Bedo, J. (2007). Supervised feature selection via dependence estimation. In Z. Ghahramani (Ed.), ICML, ACM International Conference Proceeding Series (Vol. 227). ACM.
Sonnenburg, S., Rätsch, G., Jagota, A. K., & Müller, K. R. (2002). New methods for splice site recognition. In Proceedings of the International Conference on Artificial Neural Networks (ICANN) (pp. 329–336).
Sonnenburg, S., Rätsch, G., & Rieck, K. (2007). Large-scale learning with string kernels. In L. Bottou, O. Chapelle, D. DeCoste, & J. Weston (Eds.), Large-scale kernel machines (pp. 73–104). Cambridge, MA: MIT.
Sonnenburg, S., Rätsch, G., & Schäfer, C. (2005). A general and efficient multiple kernel learning algorithm. In NIPS.
Sonnenburg, S., Rätsch, G., & Schäfer, C. (2005). Learning interpretable SVMs for biological sequence classification. In RECOMB 2005, LNBI 3500 (pp. 389–407). Berlin, Heidelberg: Springer-Verlag.
Sonnenburg, S., Zien, A., Philips, P., & Rätsch, G. (2008). POIMs: positional oligomer importance matrices — understanding support vector machine based signal detectors. Bioinformatics, 24(13), i6–i14. URL http://bioinformatics.oxfordjournals.org/cgi/content/full/24/13/i6
Sonnenburg, S., Zien, A., & Rätsch, G. (2006). ARTS: Accurate recognition of transcription starts in human. Bioinformatics (Oxford, England), 22(14), e472–e480. DOI: 10.1093/bioinformatics/btl250. URL http://www.ncbi.nlm.nih.gov/pubmed/16873509. PMID: 16873509
Steinwart, I. (2002). Support vector machines are universally consistent. Journal of Complexity, 18, 768–791.
Su, Q. J., Lu, L., Saxonov, S., & Brutlag, D. L. (2005). eBLOCKs: Enumerating conserved protein blocks to achieve maximal sensitivity and specificity. Nucleic Acids Research, 33(Database issue), D178–D182. DOI: 10.1093/nar/gki060. URL http://www.ncbi.nlm.nih.gov/pubmed/15608172. PMID: 15608172
Tsochantaridis, I., Joachims, T., Hofmann, T., & Altun, Y. (2005). Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6, 1453–1484.
Tsuda, K., Kin, T., & Asai, K. (2002). Marginalized kernels for biological sequences. Bioinformatics (Oxford, England), 18 (Suppl. 1), S268–S275. URL http://www.ncbi.nlm.nih.gov/pubmed/12169556. PMID: 12169556
Tsuda, K., & Noble, W. S. (2004). Learning kernels from biological networks by maximizing entropy. Bioinformatics (Oxford, England), 20 (Suppl. 1), i326–i333. DOI: 10.1093/bioinformatics/bth906. URL http://www.ncbi.nlm.nih.gov/pubmed/15262816. PMID: 15262816
Tsuda, K., Shin, H., & Schölkopf, B. (2005). Fast protein classification with multiple networks. Bioinformatics, 21 (Suppl. 2), ii59–ii65.
Vapnik, V. (1998). Statistical learning theory. New York: Wiley.
Vert, J. (2002). A tree kernel to analyse phylogenetic profiles. Bioinformatics, 18, S276–S284.
Vert, J., Qiu, J., & Noble, W. S. (2007). A new pairwise kernel for biological network inference with support vector machines. BMC Bioinformatics, 8 (Suppl. 10), S8. DOI: 10.1186/1471-2105-8-S10-S8. URL http://www.ncbi.nlm.nih.gov/pubmed/18269702. PMID: 18269702
Vert, J. P., Saigo, H., & Akutsu, T. (2004). Local alignment kernels for biological sequences. In B. Schölkopf, K. Tsuda, & J. P. Vert (Eds.), Kernel methods in computational biology (pp. 261–274). Cambridge, MA: MIT.
Vishwanathan, S., & Smola, A. (2003). Fast kernels for string and tree matching. In K. Tsuda, B. Schölkopf, & J. Vert (Eds.), Kernels and bioinformatics. Cambridge, MA: MIT. Forthcoming
Vishwanathan, S. V., Smola, A. J., & Vidal, R. (2007). Binet-Cauchy kernels on dynamical systems and its application to the analysis of dynamic scenes. International Journal of Computer Vision, 73(1), 95–119. URL http://portal.acm.org/citation.cfm?id=1227529
Vishwanathan, S. V. N., Borgwardt, K., & Schraudolph, N. N. (2007). Fast computation of graph kernels. In B. Schölkopf, J. Platt, & T. Hofmann (Eds.), Advances in neural information processing systems (Vol. 19). Cambridge MA: MIT.
Wang, X., & Naqa, I. M. E. (2008). Prediction of both conserved and nonconserved microRNA targets in animals. Bioinformatics (Oxford, England), 24(3), 325–332. DOI: 10.1093/bioinformatics/btm595. URL http://www.ncbi.nlm.nih.gov/pubmed/18048393. PMID: 18048393
Weinberger, K. Q., Sha, F., & Saul, L. K. (2004). Learning a kernel matrix for nonlinear dimensionality reduction. In Proceedings of the 21st international conference on machine learning. Banff, Canada.
Weston, J., Mukherjee, S., Chapelle, O., Pontil, M., Poggio, T., & Vapnik, V. (2000). Feature selection for svms. In T. K. Leen, T. G. Dietterich, V. Tresp (Eds.), NIPS (pp. 668–674). MIT.
Yamanishi, Y., Vert, J., & Kanehisa, M. (2004). Protein network inference from multiple genomic data: A supervised approach. Bioinformatics (Oxford, England), 20 (Suppl. 1), i363–i370. DOI: 10.1093/bioinformatics/bth910. URL http://www.ncbi.nlm.nih.gov/pubmed/15262821. PMID: 15262821
Yamanishi, Y., Vert, J., & Kanehisa, M. (2005). Supervised enzyme network inference from the integration of genomic data and chemical information. Bioinformatics (Oxford, England), 21 (Suppl 1), i468–i477. DOI: 10.1093/bioinformatics/bti1012. URL http://www.ncbi.nlm.nih.gov/pubmed/15961492. PMID: 15961492
Zeller, G., Clark, R. M., Schneeberger, K., Bohlen, A., Weigel, D., & Rätsch, G. (2008). Detecting polymorphic regions in Arabidopsis thaliana with resequencing microarrays. Genome Research, 18(6), 918–929.
Zeller, G., Henz, S. R., Laubinger, S., Weigel, D., & Rätsch, G. (2008). Transcript normalization and segmentation of tiling array data. In R. B. Altman, A. K. Dunker, L. Hunter, T. Murray, & T.E. Klein (Eds.), Pacific symposium on biocomputing (pp. 527–538). World Scientific.
© 2011 Springer-Verlag Berlin Heidelberg
Borgwardt, K.M. (2011). Kernel Methods in Bioinformatics. In: Lu, HS., Schölkopf, B., Zhao, H. (eds) Handbook of Statistical Bioinformatics. Springer Handbooks of Computational Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16345-6_15
Print ISBN: 978-3-642-16344-9
Online ISBN: 978-3-642-16345-6