Efficient Visualization of Document Streams

Conference paper in Discovery Science (DS 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6332)

Abstract

In machine learning and data mining, multidimensional scaling (MDS) and MDS-like methods are extensively used for dimensionality reduction and for gaining insights into overwhelming amounts of data through visualization. With the growth of the Web and the activities of Web users, the amount of data not only grows exponentially but is also becoming available in the form of streams, where new data instances constantly flow into the system, requiring the algorithm to update the model in near-real time. This paper presents an algorithm for document stream visualization through an MDS-like distance-preserving projection onto a 2D canvas. The visualization algorithm is essentially a pipeline employing several methods from machine learning. Experimental verification shows that each stage of the pipeline is able to process a batch of documents in constant time. It is shown that, in the experimental setting with a limited buffer capacity and a constant document batch size, it is possible to process roughly 2.5 documents per second, which corresponds to approximately 25% of the entire blogosphere rate and should be sufficient for most real-life applications.
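
To make the idea of an MDS-like distance-preserving projection concrete, the following is a minimal sketch, not the pipeline described in the paper. It assumes documents are already given as small dense term-weight vectors, uses cosine distance as the target dissimilarity, and minimizes raw stress with plain gradient descent. The function names (`cosine_distance`, `stress`, `project_2d`) and all parameter values are illustrative assumptions.

```python
import math
import random


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def stress(points, target):
    """Raw stress: sum of squared gaps between 2D and target distances."""
    total = 0.0
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.hypot(points[i][0] - points[j][0],
                           points[i][1] - points[j][1])
            total += (d - target[i][j]) ** 2
    return total


def project_2d(vectors, iterations=300, step=0.02, seed=0):
    """Place one 2D point per vector so that pairwise 2D distances
    approximate the pairwise cosine distances (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(vectors)
    target = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)]
              for i in range(n)]
    # Random initial layout on the canvas.
    points = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
              for _ in range(n)]
    initial_stress = stress(points, target)
    for _ in range(iterations):
        grads = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dx = points[i][0] - points[j][0]
                dy = points[i][1] - points[j][1]
                d = math.hypot(dx, dy) or 1e-9  # avoid division by zero
                # Gradient of (d - target)^2 with respect to point i.
                coeff = 2.0 * (d - target[i][j]) / d
                grads[i][0] += coeff * dx
                grads[i][1] += coeff * dy
        for i in range(n):
            points[i][0] -= step * grads[i][0]
            points[i][1] -= step * grads[i][1]
    return points, initial_stress, stress(points, target)


if __name__ == "__main__":
    # Toy term-weight vectors: two near-duplicate pairs and one outlier.
    docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
            [0.0, 1.0, 0.0], [0.0, 0.9, 0.1],
            [0.0, 0.0, 1.0]]
    layout, before, after = project_2d(docs)
    print("stress before:", before, "after:", after)
    for point in layout:
        print(point)
```

This sketch recomputes the whole layout from scratch at quadratic cost per iteration, so it only illustrates the objective being optimized. A streaming setting such as the one in the abstract would instead need to update the layout incrementally as batches of documents enter and leave a bounded buffer.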

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Grčar, M., Podpečan, V., Juršič, M., Lavrač, N. (2010). Efficient Visualization of Document Streams. In: Pfahringer, B., Holmes, G., Hoffmann, A. (eds) Discovery Science. DS 2010. Lecture Notes in Computer Science, vol 6332. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16184-1_13

  • DOI: https://doi.org/10.1007/978-3-642-16184-1_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16183-4

  • Online ISBN: 978-3-642-16184-1

  • eBook Packages: Computer Science, Computer Science (R0)