TextVis: An integrated visual environment for text mining

Landau, David; Feldman, Ronen; Aumann, Yonatan; Fresko, Moshe; Lindell, Yehuda; Lipshtat, Orly; Zamir, Oren

doi:10.1007/BFb0094805

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1510)

Included in the following conference series:

European Symposium on Principles of Data Mining and Knowledge Discovery

Abstract

TextVis is a visual data mining system for document collections. Such a collection represents an application domain, and the primary goal of the system is to derive patterns that provide knowledge about this domain. The derived patterns can also be used to browse the collection. TextVis takes a multi-strategy approach to text mining and lets the user define complex analysis schemas from basic components provided by the system. An analysis schema is constructed by dragging functional icons from a tool palette onto the workspace and connecting them according to the desired flow of information. The system provides a large collection of basic analysis tools, including frequent sets, associations, concept distributions, and concept correlations. The discovered patterns are presented in a visual interface that allows the user to operate on the results and to access the associated documents. TextVis is a complete text mining system: it uses agent technology to access various online information sources, text preprocessing tools to extract relevant information from the documents, a variety of data mining algorithms, and a set of visual browsers to view the results. This paper provides an overview of the TextVis system. We describe the system’s architecture and its tools, and discuss the advantages of our visual environment for mining large document collections.
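
To make two of the basic tools named in the abstract more concrete, the following is a minimal Python sketch of frequent sets and associations over documents that have already been reduced to sets of concept labels. It is not the TextVis implementation, which the abstract does not specify: the function names `frequent_sets` and `association_rules`, the level-wise (Apriori-style) search, the thresholds, and the toy documents are all assumptions made for illustration.

```python
from itertools import combinations


def frequent_sets(docs, min_support):
    """Return {frozenset_of_concepts: count} for sets found in >= min_support docs.

    Hypothetical helper: a plain level-wise (Apriori-style) search.
    """
    # Level 1: count how many documents mention each single concept.
    counts = {}
    for doc in docs:
        for concept in doc:
            key = frozenset([concept])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Candidate k-sets: unions of two frequent (k-1)-sets with exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count the surviving candidates against the collection.
        current = {}
        for cand in candidates:
            count = sum(1 for doc in docs if cand <= doc)
            if count >= min_support:
                current[cand] = count
        frequent.update(current)
        k += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, confidence) rules with a single concept on the right."""
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue
        for rhs in itemset:
            lhs = itemset - {rhs}
            # lhs is a subset of a frequent set, so it is frequent itself.
            confidence = count / frequent[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, confidence))
    return rules


# Toy collection (invented): each document is the set of concepts it mentions.
docs = [
    {"iran", "oil", "opec"},
    {"iran", "oil"},
    {"oil", "opec"},
    {"iran", "usa"},
]

frequent = frequent_sets(docs, min_support=2)
rules = association_rules(frequent, min_confidence=0.9)
for lhs, rhs, conf in rules:
    print(sorted(lhs), "->", rhs, f"(confidence {conf:.2f})")
```

In a system of the kind the abstract describes, each discovered set or rule would also keep a link to its supporting documents so that the user can browse from a pattern back to the texts; the sketch omits that bookkeeping.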

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, Bar-Ilan University, Ramat-Gan, Israel
David Landau, Ronen Feldman, Yonatan Aumann, Moshe Fresko, Yehuda Lindell & Orly Lipshtat
Department of Computer Science, University of Washington, Seattle, WA
Oren Zamir

Editor information

Jan M. Żytkow, Mohamed Quafafou

Cite this paper

Landau, D. et al. (1998). TextVis: An integrated visual environment for text mining. In: Żytkow, J.M., Quafafou, M. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 1998. Lecture Notes in Computer Science, vol 1510. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0094805

DOI: https://doi.org/10.1007/BFb0094805
Published: 19 October 2006
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65068-3
Online ISBN: 978-3-540-49687-8
eBook Packages: Springer Book Archive
