Knowledge discovery in multidimensional knowledge representation framework

An integrative approach for the visualization of text analytics results

  • Original Article
Iran Journal of Computer Science

Abstract

Visualizing results is one of the central challenges in big data analytics and integrative text mining. As the volume of unstructured data grows and perspectives on big data multiply, knowledge graphs struggle to represent and visualize all analyzed dimensions of knowledge simultaneously. This paper proposes integrative text mining as a way to combine the results of analyses along different dimensions in a multidimensional knowledge representation (MKR) for knowledge discovery and visualization. Results from named entity recognition, topic detection, sentiment analysis, and the extraction of semantic relationships are integrated into a common representation structure. The implementation part of this research introduces an application that uses MKR, built on the results of these text mining methods applied to a German and English news data set. The application uses state-of-the-art visualizations, and MKR adaptively transforms the visualization type of the knowledge graph according to the context selected for knowledge discovery.
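
The idea of integrating the four analysis dimensions into one common structure can be illustrated with a minimal sketch. It is not the authors' implementation: the class names `DocumentAnalysis` and `MKR`, the method `entity_context`, the field layout, and the sentiment range of [-1, 1] are all assumptions made for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class DocumentAnalysis:
    """Per-document results of the four analysis dimensions (hypothetical layout)."""
    doc_id: str
    entities: List[str] = field(default_factory=list)   # named entity recognition
    topics: List[str] = field(default_factory=list)     # topic detection
    sentiment: float = 0.0                               # assumed score in [-1, 1]
    relations: List[Tuple[str, str, str]] = field(default_factory=list)  # (subject, predicate, object)


class MKR:
    """Hypothetical common structure that indexes documents by entity and topic."""

    def __init__(self) -> None:
        self.docs: Dict[str, DocumentAnalysis] = {}
        self.by_entity: Dict[str, Set[str]] = defaultdict(set)
        self.by_topic: Dict[str, Set[str]] = defaultdict(set)

    def add(self, analysis: DocumentAnalysis) -> None:
        """Store one document and update the per-dimension indexes."""
        self.docs[analysis.doc_id] = analysis
        for entity in analysis.entities:
            self.by_entity[entity].add(analysis.doc_id)
        for topic in analysis.topics:
            self.by_topic[topic].add(analysis.doc_id)

    def entity_context(self, entity: str) -> dict:
        """Combine all dimensions for one selected entity."""
        doc_ids = sorted(self.by_entity.get(entity, ()))
        docs = [self.docs[d] for d in doc_ids]
        topics = sorted({t for d in docs for t in d.topics})
        # Keep only relations in which the entity is subject or object.
        relations = [r for d in docs for r in d.relations if entity in (r[0], r[2])]
        mean_sentiment = sum(d.sentiment for d in docs) / len(docs) if docs else 0.0
        return {
            "documents": doc_ids,
            "topics": topics,
            "relations": relations,
            "mean_sentiment": mean_sentiment,
        }
```

Selecting an entity then yields a single view that joins the documents, topics, relations, and average sentiment associated with it. An application could pass such a view to whichever visualization suits the selected context.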

Author information

Corresponding author

Correspondence to Johannes Zenkert.

About this article

Cite this article

Zenkert, J., Klahold, A. & Fathi, M. Knowledge discovery in multidimensional knowledge representation framework. Iran J Comput Sci 1, 199–216 (2018). https://doi.org/10.1007/s42044-018-0019-0

  • DOI: https://doi.org/10.1007/s42044-018-0019-0
