Automatic Document Annotation with Data Mining Algorithms

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 930)

Abstract

By combining semantically annotated documents with semantically annotated services, digital solutions can automatically retrieve documents and assign them not only to their own services but also to those provided by others, thereby improving the experience of their users. Most of the information exchanged within and between services, however, still travels on paper or by email, is largely unstructured, and lacks any form of annotation. Manual and semi-automatic approaches cannot cope with the large volumes of heterogeneous, constantly flowing data found in this scenario, which raises the need for automatic annotation. In this paper, three data mining algorithms are used to annotate a set of documents, and their results are compared with manually provided annotations.

Acknowledgements

The present work was developed under the EUREKA - ITEA2 Project INVALUE (ITEA-13015) and the INVALUE Project (ANI|P2020 17990), and has received funding from FEDER Funds through the NORTE2020 program and from National Funds through FCT under the project UID/EEA/00760/2013.

Author information

Corresponding author

Correspondence to Alda Canito.

Copyright information

© 2019 Springer Nature Switzerland AG

Cite this paper

Canito, A., Marreiros, G., Corchado, J.M. (2019). Automatic Document Annotation with Data Mining Algorithms. In: Rocha, Á., Adeli, H., Reis, L., Costanzo, S. (eds) New Knowledge in Information Systems and Technologies. WorldCIST'19 2019. Advances in Intelligent Systems and Computing, vol 930. Springer, Cham. https://doi.org/10.1007/978-3-030-16181-1_7
