SAUText - a system for analysis of unstructured textual data

Journal of Intelligent Information Systems

Abstract

Semantic lexical resources such as ontologies are becoming increasingly important in many systems, particularly those that provide access to unstructured textual data. Such resources are typically built from existing repositories and by analyzing available texts. In practice, however, building new resources of this type, or enriching existing ones, is not feasible without an appropriate tool. This paper presents SAUText, a new system that provides the infrastructure for research involving semantic resources and the analysis of unstructured textual data. The system uses a dedicated repository for storing various kinds of text data and exploits parallelization to speed up the analysis. As an example of a knowledge discovery method available in the system, a new approach to synonym discovery is introduced.

Notes

  1. Apache Solr, https://lucene.apache.org/solr/

  2. Apache Cassandra, http://cassandra.apache.org/

  3. Akka, http://akka.io/

  4. accessed June 2015

Author information

Correspondence to Grzegorz Protaziuk.

Cite this article

Protaziuk, G., Lewandowski, J. & Bembenik, R. SAUText - a system for analysis of unstructured textual data. J Intell Inf Syst 46, 369–389 (2016). https://doi.org/10.1007/s10844-015-0384-1

  • DOI: https://doi.org/10.1007/s10844-015-0384-1
