A Machine Learning Approach to Identifying Database Sessions Using Unlabeled Data

Yao, Qingsong; Huang, Xiangji; An, Aijun

doi:10.1007/11546849_25

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3589)

Abstract

In this paper, we describe a novel co-training-based algorithm for identifying database user sessions from database traces. The algorithm incrementally learns to identify positive data (session boundaries) and negative data (non-session boundaries) by using two methods that interact over several iterations. In each iteration, the previously identified positive and negative data are used to build better models, which in turn label new data and improve the performance of subsequent iterations. We also present experimental results.
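The abstract describes the co-training loop but does not name the two methods. As a rough illustration only, the Python sketch below assumes two hypothetical views of each candidate boundary: the idle gap before a query, and the transition between the previous and current query templates. These features, the scoring rules, and the names `Point`, `GapView`, `PairView`, `co_train`, and `predict` are illustrative assumptions, not the authors' method. In each iteration both views are refit on the current labeled pool, and each one then moves the unlabeled points it is most confident about into that pool as new positive or negative examples.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Point:
    """A candidate session boundary in a trace (hypothetical features)."""
    gap: float               # view A (assumed): seconds since the previous query
    pair: tuple[str, str]    # view B (assumed): (previous template, current template)


class GapView:
    """Hypothetical view A: a single threshold on the idle gap."""

    def fit(self, pts, labels):
        pos = mean(p.gap for p, y in zip(pts, labels) if y)
        neg = mean(p.gap for p, y in zip(pts, labels) if not y)
        self.threshold = (pos + neg) / 2
        self.scale = max(abs(pos - neg), 1e-9)

    def score(self, p):
        # Signed confidence: > 0 means "boundary"; magnitude is confidence.
        return (p.gap - self.threshold) / self.scale


class PairView:
    """Hypothetical view B: smoothed boundary rate of each template transition."""

    def fit(self, pts, labels):
        self.counts = defaultdict(lambda: [0, 0])  # pair -> [non-boundary, boundary]
        for p, y in zip(pts, labels):
            self.counts[p.pair][int(y)] += 1

    def score(self, p):
        neg, pos = self.counts.get(p.pair, (0, 0))
        # Laplace-smoothed P(boundary | pair), shifted so 0 means "undecided".
        return (pos + 1) / (pos + neg + 2) - 0.5


def co_train(labeled, unlabeled, iterations=5, per_view=1):
    """Co-training loop; `labeled` must contain at least one example of each class."""
    pts = [p for p, _ in labeled]
    ys = [y for _, y in labeled]
    pool = list(unlabeled)
    views = [GapView(), PairView()]
    for _ in range(iterations):
        if not pool:
            break
        for view in views:
            view.fit(pts, ys)  # rebuild each model from all labels found so far
        for view in views:
            ranked = sorted(pool, key=view.score)
            # Only take points whose sign agrees with the label being assigned.
            negs = [p for p in ranked[:per_view] if view.score(p) < 0]
            poss = [p for p in ranked[::-1][:per_view] if view.score(p) > 0]
            for p, y in [(p, False) for p in negs] + [(p, True) for p in poss]:
                pool.remove(p)
                pts.append(p)
                ys.append(y)
    for view in views:
        view.fit(pts, ys)
    return views, list(zip(pts, ys))


def predict(views, p):
    # Naive combination (an assumption): sum the signed scores of both views.
    return sum(v.score(p) for v in views) > 0
```

Here `per_view` caps how many points each view self-labels per round, and the sign check keeps a view from assigning a label its own score does not support, which matters once the unlabeled pool is nearly empty.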

Author information

Authors and Affiliations

York University, Toronto, M3J 1P3, Canada
Qingsong Yao, Xiangji Huang & Aijun An

Editor information

Editors and Affiliations

Institute of Software Technology and Interactive Systems, Vienna University of Technology, Favoritenstr. 9-11/188, A-1040, Wien, Austria
A Min Tjoa
Department of Software and Computing Systems, University of Alicante, Spain
Juan Trujillo

About this paper

Cite this paper

Yao, Q., Huang, X., An, A. (2005). A Machine Learning Approach to Identifying Database Sessions Using Unlabeled Data. In: Tjoa, A.M., Trujillo, J. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2005. Lecture Notes in Computer Science, vol 3589. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11546849_25

DOI: https://doi.org/10.1007/11546849_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28558-8
Online ISBN: 978-3-540-31732-6
eBook Packages: Computer Science, Computer Science (R0)