A Database and Evaluation for Classification of RNA Molecules Using Graph Methods

  • Conference paper
Graph-Based Representations in Pattern Recognition (GbRPR 2019)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11510)

Abstract

In this paper, we introduce a new graph dataset based on the representation of RNA. The dataset includes 3178 RNA chains, labelled in 8 classes according to their reported biological functions. The goal of this database is to provide a platform for investigating the classification of RNA using graph-based methods. Each molecule is represented by a graph encoding the sequence and base-pairs of the RNA, with a number of labelling schemes using base labels and local shape. We report the results of a number of state-of-the-art graph-based methods on this dataset as a baseline comparison and investigate how these methods can be used to categorise RNA molecules by type and function. The methods applied are the Weisfeiler-Lehman and optimal assignment kernels, the shortest-paths kernel, and the all-paths and cycles method. We also compare against the standard Needleman-Wunsch algorithm used in bioinformatics for DNA and RNA comparison, and demonstrate the superiority of graph kernels even on a string representation. The highest classification rate is obtained by the WL-OA algorithm using base labels and base-pair connections.
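
As a rough illustration of the representation described in the abstract, the sketch below builds a graph whose nodes carry base labels and whose edges are backbone (sequence) links and base-pair links, then compares two such graphs with a basic Weisfeiler-Lehman subtree kernel. It is a minimal sketch under stated assumptions, not the authors' implementation: the dot-bracket input format, the function names `rna_graph`, `wl_features` and `wl_subtree_kernel`, and the choice not to distinguish backbone edges from base-pair edges are all assumptions made here for illustration. The WL-OA variant reported as the best performer in the abstract is not shown.

```python
from collections import Counter


def rna_graph(sequence, dot_bracket):
    """Build (adjacency, labels) from a sequence and a dot-bracket string.

    Assumption: the secondary structure is given in dot-bracket notation
    without pseudoknots; the dataset itself may use another format.
    """
    if len(sequence) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    n = len(sequence)
    adj = {i: set() for i in range(n)}
    # Backbone edges join consecutive nucleotides in the chain.
    for i in range(n - 1):
        adj[i].add(i + 1)
        adj[i + 1].add(i)
    # Base-pair edges join matching brackets.
    stack = []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure string")
            j = stack.pop()
            adj[i].add(j)
            adj[j].add(i)
    if stack:
        raise ValueError("unbalanced structure string")
    labels = {i: sequence[i] for i in range(n)}  # base labels: A, C, G, U
    return adj, labels


def wl_features(adj, labels, iterations=2):
    """Count node labels over successive Weisfeiler-Lehman refinements."""
    features = Counter(labels.values())
    current = dict(labels)
    for _ in range(iterations):
        refined = {}
        for node, neighbours in adj.items():
            # New label = own label plus the sorted multiset of neighbour labels.
            neighbour_labels = sorted(current[m] for m in neighbours)
            refined[node] = current[node] + "(" + ",".join(neighbour_labels) + ")"
        current = refined
        features.update(current.values())
    return features


def wl_subtree_kernel(g1, g2, iterations=2):
    """Dot product of the WL label-count vectors of two graphs."""
    f1 = wl_features(*g1, iterations=iterations)
    f2 = wl_features(*g2, iterations=iterations)
    return sum(count * f2[label] for label, count in f1.items())


# Toy example (made-up sequences, not taken from the dataset).
hairpin = rna_graph("GCAUGC", "(....)")
unpaired = rna_graph("GCAUGC", "......")
print(wl_subtree_kernel(hairpin, unpaired, iterations=1))
```

The resulting kernel values could be assembled into a Gram matrix and passed to any kernel classifier; labelling schemes based on local shape would only change how `labels` is filled in.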

Author information

Corresponding author

Correspondence to Enes Algul.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Algul, E., Wilson, R.C. (2019). A Database and Evaluation for Classification of RNA Molecules Using Graph Methods. In: Conte, D., Ramel, JY., Foggia, P. (eds) Graph-Based Representations in Pattern Recognition. GbRPR 2019. Lecture Notes in Computer Science, vol 11510. Springer, Cham. https://doi.org/10.1007/978-3-030-20081-7_8

  • DOI: https://doi.org/10.1007/978-3-030-20081-7_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-20080-0

  • Online ISBN: 978-3-030-20081-7

  • eBook Packages: Computer Science (R0)
