Abstract
In this paper, we describe a novel co-training-based algorithm for identifying database user sessions from database traces. The algorithm learns to identify positive data (session boundaries) and negative data (non-session boundaries) incrementally by using two methods interactively over several iterations. In each iteration, previously identified positive and negative data are used to build better models, which in turn label new data and improve the performance of subsequent iterations. We also present experimental results.
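The iterative labeling loop outlined in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not name the two methods, so the sketch assumes two hypothetical views of each candidate boundary (the idle time before a query and a similarity score to the previous query) and substitutes a simple one-dimensional threshold classifier for each method. The names `fit_threshold`, `score`, and `co_train`, and the sample data, are all illustrative assumptions.

```python
from statistics import mean


def fit_threshold(values, labels):
    """Fit a 1-D classifier: the midpoint between the two class means.

    Returns (threshold, sign); sign is +1 when positives lie above the
    threshold and -1 when they lie below it.
    Assumes at least one positive and one negative label are present.
    """
    pos = [v for v, y in zip(values, labels) if y]
    neg = [v for v, y in zip(values, labels) if not y]
    mu_pos, mu_neg = mean(pos), mean(neg)
    return (mu_pos + mu_neg) / 2, (1 if mu_pos >= mu_neg else -1)


def score(model, value):
    """Signed distance from the threshold; positive means 'session boundary'."""
    threshold, sign = model
    return sign * (value - threshold)


def co_train(view_a, view_b, seed_labels, per_round=1, max_rounds=10):
    """Grow a labeled set by letting two views label points in turn.

    view_a, view_b: per-candidate feature values (hypothetical views).
    seed_labels: dict {index: bool} holding the initial labeled examples.
    """
    labels = dict(seed_labels)
    for _ in range(max_rounds):
        unlabeled = [i for i in range(len(view_a)) if i not in labels]
        if not unlabeled:
            break
        added = {}
        for view in (view_a, view_b):
            idx = list(labels)
            model = fit_threshold([view[i] for i in idx],
                                  [labels[i] for i in idx])
            # Each view labels only the points it is most confident about.
            ranked = sorted(unlabeled,
                            key=lambda i: abs(score(model, view[i])),
                            reverse=True)
            for i in ranked[:per_round]:
                added.setdefault(i, score(model, view[i]) > 0)
        # Newly labeled points feed the models built in the next round.
        labels.update(added)
    return labels


# Hypothetical trace: idle seconds before each query, and similarity to the
# previous query (a low similarity is assumed to suggest a new session).
idle_gaps = [2, 3, 600, 1, 900, 4, 750, 2]
similarity = [0.9, 0.8, 0.1, 0.95, 0.05, 0.85, 0.2, 0.7]
seeds = {0: False, 2: True}  # one known non-boundary, one known boundary

result = co_train(idle_gaps, similarity, seeds)
boundaries = sorted(i for i, is_boundary in result.items() if is_boundary)
print(boundaries)
```

Each round fits both models on the current labeled set before any new labels are merged, so the two views contribute independently within a round and benefit from each other's labels only in later rounds.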
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yao, Q., Huang, X., An, A. (2005). A Machine Learning Approach to Identifying Database Sessions Using Unlabeled Data. In: Tjoa, A.M., Trujillo, J. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2005. Lecture Notes in Computer Science, vol 3589. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11546849_25
DOI: https://doi.org/10.1007/11546849_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28558-8
Online ISBN: 978-3-540-31732-6
eBook Packages: Computer Science, Computer Science (R0)