Intelligent Extraction Versus Advanced Query: Recognize Transcription Factors from Databases
Many entries in major biological databases have incomplete functional annotation, so it is often difficult to identify the entries that belong to a specific functional category. We combined information on protein functional domains with gene ontology descriptions to identify transcription factor (TF) entries in the Swiss-Prot and Entrez Gene databases with high accuracy. Our method uses support vector machines to separate TF entries from non-TF entries. In 10-fold cross-validation, the predictions achieved on average a positive predictive value of 97.5% and a sensitivity of 93.4%. Using this method, we scanned the whole Swiss-Prot and Entrez Gene databases and extracted 13826 unique TF entries. In a separate manual check of 500 randomly chosen extracted TF entries, 2% turned out to be non-TF (erroneous) entries.
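The two reported metrics can be made concrete with a short sketch. It assumes binary labels (1 = TF, 0 = non-TF) and that the per-fold values are combined by a simple arithmetic mean; the abstract does not state how the averaging was done, and the function names `ppv_and_sensitivity` and `mean_over_folds` are hypothetical, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def ppv_and_sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (positive predictive value, sensitivity) for binary labels.

    Assumed label convention: 1 = TF entry, 0 = non-TF entry.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # TF predicted as TF
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-TF predicted as TF
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # TF predicted as non-TF
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return ppv, sensitivity


def mean_over_folds(fold_metrics: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Average per-fold (PPV, sensitivity) pairs.

    Assumption: a plain arithmetic mean over the folds.
    """
    n = len(fold_metrics)
    mean_ppv = sum(m[0] for m in fold_metrics) / n
    mean_sens = sum(m[1] for m in fold_metrics) / n
    return mean_ppv, mean_sens
```

In a 10-fold setting, `ppv_and_sensitivity` would be called once per held-out fold and the ten resulting pairs passed to `mean_over_folds`. A positive predictive value of 97.5% then means that about 2.5% of the entries predicted as TFs in the held-out data were not annotated as TFs.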
Keywords: Gene Ontology; Transcription Factor Activity; Pfam Domain; Gene Ontology Annotation; Human Protein Reference Database