SAUText - a system for analysis of unstructured textual data

Protaziuk, Grzegorz; Lewandowski, Jacek; Bembenik, Robert

doi:10.1007/s10844-015-0384-1

Published: 23 September 2015

Volume 46, pages 369–389, (2016)

Journal of Intelligent Information Systems

Grzegorz Protaziuk¹,
Jacek Lewandowski¹ &
Robert Bembenik¹

Abstract

Semantic lexical resources such as ontologies are becoming increasingly important in many systems, particularly those providing access to unstructured textual data. Such resources are typically built from existing repositories and by analyzing available texts. In practice, however, building new resources of this type, or enriching existing ones, requires an appropriate tool. This paper presents SAUText, a new system that provides the infrastructure for research involving semantic resources and the analysis of unstructured textual data. The system uses a dedicated repository for storing various kinds of text data and exploits parallelization to speed up the analysis. As an example of a knowledge discovery method available in the system, a new approach to synonym discovery is introduced.

Notes

Apache Solr, https://lucene.apache.org/solr/
Apache Cassandra, http://cassandra.apache.org/
Akka, http://akka.io/
accessed June 2015

Author information

Authors and Affiliations

Institute of Computer Science, Warsaw University of Technology, Nowowiejska 15/19, 00-665, Warsaw, Poland
Grzegorz Protaziuk, Jacek Lewandowski & Robert Bembenik

Corresponding author

Correspondence to Grzegorz Protaziuk.

About this article

Cite this article

Protaziuk, G., Lewandowski, J. & Bembenik, R. SAUText - a system for analysis of unstructured textual data. J Intell Inf Syst 46, 369–389 (2016). https://doi.org/10.1007/s10844-015-0384-1

Received: 14 January 2015
Revised: 31 August 2015
Accepted: 01 September 2015
Published: 23 September 2015
Issue Date: April 2016
DOI: https://doi.org/10.1007/s10844-015-0384-1
