Abstract
While the biomedical literature is growing rapidly, text classification remains a challenge for researchers, curators and librarians. In this work, we use the Caipirini service (http://caipirini.org) to explore a literature corpus related to the G1, S, G2 and M phases of the human cell cycle. Using Support Vector Machines (SVMs) and a well-studied dataset, we compare each cell cycle phase against all others in order to find abstracts related to one specific phase at a time. We then measure performance using the standard accuracy, precision and recall metrics. We find differences among the results for the four phases and compare them with previous findings from related work. We conclude that the results concur and help interpret the observed classification performance.
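The one-phase-against-all-others evaluation described in the abstract can be illustrated with a minimal sketch. It assumes each abstract carries a single true phase label and a single predicted label. The function name `one_vs_rest_metrics` and the eight example labels are hypothetical and are not taken from the paper or from Caipirini. No classifier is trained here; only the scoring step is shown.

```python
from typing import Dict, List

PHASES = ["G1", "S", "G2", "M"]


def one_vs_rest_metrics(y_true: List[str], y_pred: List[str], phase: str) -> Dict[str, float]:
    """Score one phase against all others as a binary problem."""
    pairs = list(zip(y_true, y_pred))
    # Count the confusion-matrix cells, treating `phase` as the positive class.
    tp = sum(1 for t, p in pairs if t == phase and p == phase)
    fp = sum(1 for t, p in pairs if t != phase and p == phase)
    fn = sum(1 for t, p in pairs if t == phase and p != phase)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical labels for eight abstracts (illustrative only, not data from the paper).
y_true = ["G1", "G1", "S", "S", "G2", "G2", "M", "M"]
y_pred = ["G1", "S", "S", "S", "G2", "M", "M", "M"]

results = {phase: one_vs_rest_metrics(y_true, y_pred, phase) for phase in PHASES}
for phase, scores in results.items():
    print(phase, scores)
```

Reporting the three metrics per phase, rather than a single pooled score, is what makes differences between phases visible: a phase can reach high precision while its recall stays low, or the reverse.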
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Soldatos, T.G., Pavlopoulos, G.A. (2012). Mining Cell Cycle Literature Using Support Vector Machines. In: Maglogiannis, I., Plagianakos, V., Vlahavas, I. (eds) Artificial Intelligence: Theories and Applications. SETN 2012. Lecture Notes in Computer Science, vol 7297. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30448-4_35
DOI: https://doi.org/10.1007/978-3-642-30448-4_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30447-7
Online ISBN: 978-3-642-30448-4
eBook Packages: Computer Science, Computer Science (R0)