
A Machine Learning Approach to Identifying Database Sessions Using Unlabeled Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3589)

Abstract

In this paper, we describe a novel co-training-based algorithm for identifying database user sessions from database traces. The algorithm incrementally learns to identify positive data (session boundaries) and negative data (non-session boundaries) by using two methods interactively over several iterations. In each iteration, the previously identified positive and negative data are used to build better models, which in turn label some new data and improve the performance of subsequent iterations. We also present experimental results.
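The abstract does not name the two methods or the features they use, so the following is only a minimal sketch of generic co-training in the style of Blum and Mitchell, not the authors' algorithm. It assumes that each candidate boundary (the gap between two consecutive queries in a trace) can be described by two hypothetical views, a time gap in seconds and a query-content dissimilarity score, and it uses a deliberately simple threshold learner per view. In each iteration, each view's model is rebuilt from all labels gathered so far and then labels the unlabeled candidates it is most confident about, which enlarges the training set for the next iteration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A possible session boundary: the gap between two consecutive queries.

    Both features are hypothetical views chosen for illustration.
    """
    time_gap: float       # view 1 (assumed): seconds since the previous query
    content_shift: float  # view 2 (assumed): dissimilarity of adjacent queries, 0..1


def fit_threshold(values, labels):
    """Return the midpoint between the mean positive and mean negative value.

    Assumes larger values indicate a boundary (label 1) and that both
    classes are present in the labeled data.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def co_train(labeled, unlabeled, iterations=3, per_view=1):
    """Generic co-training loop over two (hypothetical) feature views.

    labeled:   list of (Candidate, label) pairs, label 1 = boundary, 0 = not
    unlabeled: list of Candidate objects
    Returns the enlarged labeled list and the candidates left unlabeled.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    views = [lambda c: c.time_gap, lambda c: c.content_shift]

    for _ in range(iterations):
        if not pool:
            break
        newly_labeled = []
        for view in views:
            # Rebuild this view's model from every label collected so far.
            threshold = fit_threshold([view(c) for c, _ in labeled],
                                      [y for _, y in labeled])
            # Confidence = distance from the threshold; most confident first.
            ranked = sorted(pool, key=lambda c: abs(view(c) - threshold),
                            reverse=True)
            for c in ranked[:per_view]:
                newly_labeled.append((c, 1 if view(c) > threshold else 0))
                pool.remove(c)
        # New labels from both views become training data for the next round.
        labeled.extend(newly_labeled)
    return labeled, pool
```

Starting from one labeled boundary and one labeled non-boundary, `co_train` gradually assigns labels to the unlabeled candidates, with each view contributing the examples it is most certain about. A real system would substitute stronger per-view classifiers; the sketch is only meant to show the iterative loop in which labels produced by one model become training data for the other.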

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yao, Q., Huang, X., An, A. (2005). A Machine Learning Approach to Identifying Database Sessions Using Unlabeled Data. In: Tjoa, A.M., Trujillo, J. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2005. Lecture Notes in Computer Science, vol 3589. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11546849_25

  • DOI: https://doi.org/10.1007/11546849_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28558-8

  • Online ISBN: 978-3-540-31732-6

  • eBook Packages: Computer Science, Computer Science (R0)