Segmenting and Labeling Query Sequences in a Multidatabase Environment

Acar, Aybar C.; Motro, Amihai

doi:10.1007/978-3-642-25109-2_24

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7044)

Included in the following conference series:

OTM Confederated International Conferences "On the Move to Meaningful Internet Systems"

Abstract

When gathering information from multiple independent data sources, users generally pose a sequence of queries to each source and then combine (union) or cross-reference (join) the results to obtain the information they need. Gathering information also involves a fair amount of trial and error, in which queries are recursively refined according to the results of a previous query in the sequence. To an outside observer, the aim of such a sequence of queries may not be immediately obvious.

We investigate the problem of isolating and characterizing subsequences that represent coherent information retrieval goals within a sequence of queries sent by a user to different data sources over a period of time. The problem has two sub-problems: segmenting the sequence into subsequences, each representing a discrete goal; and labeling each query in these subsequences according to how it contributes to the goal. We propose a method in which a discriminative probabilistic model (a Conditional Random Field) is trained with pre-labeled sequences. We have tested the accuracy with which such a model can infer labels and segmentation on novel sequences. Results show that the approach is very accurate (> 95% accuracy) when there are no spurious queries in the sequence and moderately accurate even in the presence of substantial noise (~70% accuracy when 15% of queries in the sequence are spurious).
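
As a rough illustration of how per-query labels can carry both the segmentation and the labeling, the sketch below decodes the best label sequence for a linear-chain model with the Viterbi algorithm and then groups queries into goals. It is not the authors' implementation: the label set ("B" starts a goal, "I" continues it, "O" is a spurious query), the function names, and the hand-set scores are all assumptions made for this example. In a trained Conditional Random Field the scores would come from learned feature weights.

```python
from typing import Dict, List, Tuple

# Hypothetical label set (an assumption, not taken from the paper):
# "B" starts a new goal, "I" continues the current goal,
# "O" marks a spurious query that belongs to no goal.
LABELS = ["B", "I", "O"]


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the highest-scoring label sequence for a linear-chain model.

    emissions[t][y] is the score of label y for the query at position t;
    transitions[(a, b)] is the score of label a being followed by label b.
    """
    if not emissions:
        return []
    best = [{y: emissions[0][y] for y in LABELS}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in LABELS:
            # Best previous label for ending at label y in position t.
            prev = max(LABELS, key=lambda p: best[-1][p] + transitions[(p, y)])
            scores[y] = best[-1][prev] + transitions[(prev, y)] + emissions[t][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def segments(labels: List[str]) -> List[List[int]]:
    """Group query positions into goals; "O" queries are left out."""
    result: List[List[int]] = []
    for i, y in enumerate(labels):
        if y == "B" or (y == "I" and not result):
            result.append([i])
        elif y == "I":
            result[-1].append(i)
    return result


# Hand-set scores for three queries, with neutral transitions.
flat = {(a, b): 0.0 for a in LABELS for b in LABELS}
scores = [
    {"B": 1.0, "I": 0.0, "O": 0.0},
    {"B": 0.0, "I": 1.0, "O": 0.0},
    {"B": 0.0, "I": 0.0, "O": 1.0},
]
decoded = viterbi(scores, flat)
goals = segments(["B", "I", "O", "I", "B", "I"])
```

Here `decoded` holds one label per query and `goals` lists the query positions that belong to each inferred goal. Under this assumed scheme a spurious query can sit between two queries of the same goal without splitting it.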

Author information

Authors and Affiliations

Department of Computer Engineering, Bilkent University, Ankara, 06800, Turkey
Aybar C. Acar
Department of Computer Science, George Mason University, Fairfax, VA, 22030, USA
Amihai Motro

Editor information

Editors and Affiliations

STAR Lab, Vrije Universiteit Brussel (VUB), Bldg G/10, Pleinlaan 2, 1050, Brussel, Belgium
Robert Meersman
DEBII, Curtin University of Technology, Technology Park, De Laeter Way, 6102, Bentley, WA, Australia
Tharam Dillon
Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo S/N, 28660, Boadilla del Monte, Madrid, Spain
Pilar Herrero
Smeal College of Business, Pennsylvania State University, University Park, PA 16802, U.S.A.
Akhil Kumar
Institute of Databases and Information Systems, Ulm University, Germany
Manfred Reichert
City University of Hong Kong, Hong Kong
Li Qing
National University of Singapore (NUS), Singapore
Beng-Chin Ooi
Dipartimento Tecnologie dell’Informazione, Università degli Studi di Milano, Via Bramante 65, 26013, Crema, Italy
Ernesto Damiani
VU Station B #1829, Vanderbilt University, 2015 Terrace Place, TN 37203, Nashville, USA
Douglas C. Schmidt
Virginia Tech, 24060, Blacksburg, VA, USA
Jules White
Digital Enterprise Research Institute (DERI), National University of Ireland, IDA Business Park, Lower Dangan, Galway, Ireland
Manfred Hauswirth
Kno.e.sis Center, Wright State University, Dayton, Ohio, USA
Pascal Hitzler
IBM India Research Lab, 4, Block C, Institutional Area, Vasant Kunj, 110 070, New Delhi, India
Mukesh Mohania

About this paper

Cite this paper

Acar, A.C., Motro, A. (2011). Segmenting and Labeling Query Sequences in a Multidatabase Environment. In: Meersman, R., et al. On the Move to Meaningful Internet Systems: OTM 2011. OTM 2011. Lecture Notes in Computer Science, vol 7044. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25109-2_24

DOI: https://doi.org/10.1007/978-3-642-25109-2_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25108-5
Online ISBN: 978-3-642-25109-2
eBook Packages: Computer Science, Computer Science (R0)
