In this paper, we investigate data analysis methods for discovering genomic data that are useful for predicting protein function. Non-SIM based bioinformatics methods are becoming increasingly popular. One such method is Data Mining Prediction (DMP). DMP combines evidence from amino-acid attributes, predicted structure and phylogenetic patterns, and uses a combination of Inductive Logic Programming data mining and decision trees to produce prediction rules for functional class. To test the DMP predictions, we examined the scientific literature for direct experimental derivations of open reading frame (ORF) function, and these derivations confirmed the predictions. Accuracy varied between rules and with the level of detail of the prediction, but the rules were generally significantly better than random. DMP is, to the best of our knowledge, the first non-SIM based prediction method to have been tested directly on new data.
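The claim that a prediction rule is "better than random" can be made concrete with a small sketch. The Python below is not the DMP implementation: the attribute names (`lysine_frac`, `helix_frac`), the toy ORFs, the example rule and the use of the class frequency as the random baseline are all assumptions made for illustration. It scores one hand-written rule on labelled ORFs and computes a one-sided binomial tail probability against that baseline.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def evaluate_rule(rule, orfs, target_class):
    """Score one rule: (hits, fired, accuracy, prior, p_value).

    'prior' is the frequency of target_class among all ORFs, used here
    as the chance that a random guess is correct (an assumption).
    """
    fired = [o for o in orfs if rule(o)]
    hits = sum(1 for o in fired if o["function"] == target_class)
    prior = sum(1 for o in orfs if o["function"] == target_class) / len(orfs)
    accuracy = hits / len(fired) if fired else 0.0
    p_value = binomial_tail(hits, len(fired), prior) if fired else 1.0
    return hits, len(fired), accuracy, prior, p_value


# Hypothetical toy data: (lysine fraction, predicted helix fraction, known class).
ORFS = [
    {"id": f"orf{i}", "lysine_frac": k, "helix_frac": h, "function": f}
    for i, (k, h, f) in enumerate(
        [
            (0.12, 0.60, "transport"),
            (0.11, 0.55, "transport"),
            (0.13, 0.70, "transport"),
            (0.14, 0.65, "metabolism"),
            (0.05, 0.60, "metabolism"),
            (0.12, 0.30, "metabolism"),
            (0.04, 0.20, "regulation"),
            (0.06, 0.40, "regulation"),
            (0.03, 0.10, "metabolism"),
            (0.02, 0.45, "regulation"),
        ],
        start=1,
    )
]


def transport_rule(orf) -> bool:
    """Hypothetical rule: lysine-rich and mostly helical implies 'transport'."""
    return orf["lysine_frac"] > 0.10 and orf["helix_frac"] > 0.50


result = evaluate_rule(transport_rule, ORFS, "transport")
hits, fired, accuracy, prior, p_value = result
print(f"fired={fired} hits={hits} accuracy={accuracy:.2f} "
      f"prior={prior:.2f} p_value={p_value:.4f}")
```

A rule whose accuracy exceeds the class prior with a small tail probability would count as better than random under these assumptions. With so few ORFs the test has little power, so a real evaluation would need far more held-out data.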


Functional Class, Vote Rule, Assigned Function, Predict Protein Function, Unambiguous Function





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Changxin Song (1)
  • Ke Ma (2)
  1. Department of Computer, Qinghai Normal University, Xining, P.R. China
  2. Network Center, Qinghai Normal University, Xining, P.R. China
