Journal of Intelligent Information Systems, Volume 46, Issue 2, pp 369–389

SAUText - a system for analysis of unstructured textual data

  • Grzegorz Protaziuk
  • Jacek Lewandowski
  • Robert Bembenik


Semantic lexical resources such as ontologies are becoming increasingly important in many systems, particularly those providing access to unstructured textual data. Such resources are typically built from existing repositories and by analyzing available texts. In practice, however, building new resources of this type, or enriching existing ones, cannot be accomplished without an appropriate tool. This paper presents SAUText, a new system that provides the infrastructure for research involving semantic resources and the analysis of unstructured textual data. The system uses a dedicated repository for storing various kinds of text data and exploits parallelization to speed up the analysis. As an example of a knowledge discovery method available in the system, a new approach to synonym discovery is introduced.


Text mining, Text analysis system, Ontology enrichment, Synonym discovery



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Grzegorz Protaziuk (1, corresponding author)
  • Jacek Lewandowski (1)
  • Robert Bembenik (1)

  1. Institute of Computer Science, Warsaw University of Technology, Warsaw, Poland
