Predicting Protein Function by Genomic Data-Mining
In this paper, we investigated data analysis methods for discovering genomic data that are useful for predicting protein function. Non-SIM based bioinformatics methods are becoming increasingly popular, and one such method is Data Mining Prediction (DMP). DMP combines evidence from amino-acid attributes, predicted structure, and phylogenic patterns, and uses Inductive Logic Programming data mining together with decision trees to produce prediction rules for functional class. To test the DMP predictions, we examined the scientific literature for direct experimental derivations of ORF function, and these derivations confirmed the predictions. Accuracy varied between rules and with the level of detail of the prediction, but it was generally significantly better than random. DMP is, to the best of our knowledge, the first non-SIM based prediction method to have been tested directly on new data.
Keywords: Functional Class, Vote Rule, Assigned Function, Predict Protein Function, Unambiguous Function
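The abstract's claim that rule accuracy was "significantly better than random" can be made concrete with a small sketch. The code below is not the authors' evaluation procedure. It assumes a simple one-sided binomial test in which a rule's correct predictions are compared against the chance of guessing the class from its background frequency. The function names, the class label, and the toy data are all hypothetical.

```python
from math import comb


def binomial_tail(successes: int, trials: int, p_random: float) -> float:
    """Return P(X >= successes) for X ~ Binomial(trials, p_random)."""
    return sum(
        comb(trials, k) * p_random**k * (1 - p_random) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def evaluate_rule(predicted_class: str, true_classes: list[str], class_prior: float):
    """Score one prediction rule on the ORFs it fired for.

    predicted_class: the functional class the rule assigns (hypothetical label).
    true_classes:    experimentally derived classes of those ORFs.
    class_prior:     assumed background frequency of predicted_class,
                     i.e. the accuracy expected from random assignment.
    """
    correct = sum(1 for c in true_classes if c == predicted_class)
    trials = len(true_classes)
    accuracy = correct / trials
    # One-sided p-value: chance of doing at least this well by random guessing.
    p_value = binomial_tail(correct, trials, class_prior)
    return accuracy, p_value


# Toy data (illustrative only): a rule predicting "metabolism" for 10 ORFs,
# of which 8 were later confirmed; the class is assumed to cover 30% of ORFs.
observed = ["metabolism"] * 8 + ["transport", "unknown"]
accuracy, p_value = evaluate_rule("metabolism", observed, class_prior=0.3)
print(f"accuracy={accuracy:.2f}, p-value={p_value:.4f}")
```

Under these assumptions, a small p-value indicates that the rule's accuracy is unlikely to arise from random class assignment. A real evaluation would also need to account for testing many rules at once.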