Abstract
This paper presents methods developed by the Methods of Semantic Recognition of Scientific Documents group as part of its research within the SYNAT project. It describes a document representation format together with a proof-of-concept system that converts scientific articles in PDF format into this representation. It also presents an experiment in clustering documents by style.
The authors are supported by grant N N516 077837 from the Ministry of Science and Higher Education of the Republic of Poland and by the National Centre for Research and Development (NCBiR) under Grant No. SP/I/1/77065/10, within the strategic scientific research and experimental development program “Interdisciplinary System for Interactive Scientific and Scientific-Technical Information”.
Copyright information
© 2012 Springer-Verlag GmbH Berlin Heidelberg
About this chapter
Cite this chapter
Betliński, P., Gora, P., Herba, K., Nguyen, T.T., Stawicki, S. (2012). Semantic Recognition of Digital Documents. In: Bembenik, R., Skonieczny, L., Rybiński, H., Niezgodka, M. (eds) Intelligent Tools for Building a Scientific Information Platform. Studies in Computational Intelligence, vol 390. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24809-2_7
DOI: https://doi.org/10.1007/978-3-642-24809-2_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24808-5
Online ISBN: 978-3-642-24809-2
eBook Packages: Engineering, Engineering (R0)