Journal of Biosciences, Volume 29, Issue 3, pp 245–259

Enhanced functional and structural domain assignments using remote similarity detection procedures for proteins encoded in the genome of Mycobacterium tuberculosis H37Rv

  • Seema Namboori
  • Natasha Mhatre
  • Sentivel Sujatha
  • Narayanaswamy Srinivasan
  • Shashi Bhushan Pandit


The sequencing of the Mycobacterium tuberculosis (MTB) H37Rv genome has facilitated deeper insights into the biology of MTB, yet the functions of many MTB proteins are unknown. We have used sensitive profile-based search procedures to assign functional and structural domains and thereby infer the functions of gene products encoded in MTB. These domain assignments have been made using a compendium of sequence and structural domain families. Functions are predicted for 78% of the encoded gene products. For 69% of these, functions can be inferred from domain assignments; the functions of the rest are deduced from their homology to proteins of known function. Superfamily relationships between families of unknown and known structures have increased structural information by ∼11%. Remote similarity detection methods have enabled domain assignments for 1325 ‘hypothetical proteins’. The most populated families in MTB are involved in lipid metabolism and in the entry and survival of the bacillus in the host. Interestingly, for 353 proteins, which we refer to as MTB-specific, no homologues have been identified. Numerous previously unannotated hypothetical proteins have been assigned domains, and some of these could be possible chemotherapeutic targets. MTB-specific proteins might include factors responsible for virulence. Importantly, these assignments could be valuable for experimental endeavors. The detailed results are publicly available at∼dots.
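The two percentages quoted above are nested, which is easy to misread. The short sketch below works out what they imply as fractions of all encoded gene products. It assumes that "69% of these" refers to the 78% of gene products with a predicted function; the function name `coverage_breakdown` and the dictionary keys are illustrative only and are not taken from the paper. No gene counts are used, because the abstract does not state the total.

```python
def coverage_breakdown(predicted, via_domains_given_predicted):
    """Split all gene products into three shares, as fractions of the total.

    predicted: fraction of gene products with a predicted function.
    via_domains_given_predicted: fraction of those predicted gene products
        whose function is inferred from domain assignments (assumed reading
        of "69% of these" in the abstract).
    """
    via_domains = predicted * via_domains_given_predicted
    via_homology = predicted - via_domains   # the rest of the predicted set
    unassigned = 1.0 - predicted             # no function predicted
    return {
        "via_domains": via_domains,
        "via_homology": via_homology,
        "unassigned": unassigned,
    }


# Figures quoted in the abstract: 78% predicted, 69% of those via domains.
breakdown = coverage_breakdown(0.78, 0.69)

if __name__ == "__main__":
    for label, fraction in breakdown.items():
        print(f"{label}: {fraction:.1%}")
```

Under this reading, roughly 54% of all gene products receive a function through domain assignments, about 24% through homology to proteins of known function, and 22% remain without a predicted function.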


Genome data analysis; hypothetical proteins; Mycobacterium tuberculosis; protein structures; structural and functional domains

Abbreviations used

  • adenosine 3′,5′-cyclic monophosphate
  • guanosine 3′,5′-cyclic monophosphate
  • cyclopropane mycolic acid
  • cyclic nucleotide monophosphate
  • hidden Markov model
  • low-complexity regions
  • mycobacterial cell entry
  • non-redundant database
  • phosphoglycerate mutase
  • position-specific scoring matrices
  • short chain dehydrogenase/reductases
  • universal stress protein





Copyright information

© Indian Academy of Sciences 2004

Authors and Affiliations

  • Seema Namboori (1)
  • Natasha Mhatre (1)
  • Sentivel Sujatha (1, 2)
  • Narayanaswamy Srinivasan (1)
  • Shashi Bhushan Pandit (1)

  1. Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
  2. Reddy US Therapeutics, Norcross, USA
