Abstract
This paper presents the Unoporuno system, an application of natural language processing methods to the sociology of migration. Our approach extracts names of people from a database of scientific publications, refines Web search queries using bibliographical data, and decides on the international mobility category of a person by analyzing the locations mentioned in those snippets classified as mobility traces. To identify mobility traces, snippets are filtered with a name validation grammar, analyzed with mobility-related semantic features, and classified with a support vector machine. This automatic classification method is complemented by a semi-automatic one, in which Unoporuno selects five snippets to help a sociologist decide on the mobility status of authors. Empirical evidence for the automatic person classification task suggests that Unoporuno classified 78% of the mobile persons in the right mobility category, with F=0.71. We also present empirical evidence for the semi-automatic task: in 80% of the cases, sociologists are able to choose the right category from the five-snippet selection, with a moderate level of inter-rater agreement (0.60).
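The stages named in the abstract (name validation, mobility trace detection, location-based decision) can be illustrated with a small Python sketch. This is not the authors' implementation: the regular expression is only a crude stand-in for the name validation grammar, the keyword rule in `is_mobility_trace` replaces the support vector machine, and the cue words, the place lexicon, the majority rule, and the category labels `mobile`, `local`, and `unknown` are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical cue words and place lexicon (assumptions for illustration;
# the abstract does not list the actual semantic features or gazetteer).
MOBILITY_CUES = {"phd", "postdoc", "professor", "university", "institute", "researcher"}
PLACES = {"paris": "FR", "france": "FR", "montevideo": "UY", "uruguay": "UY"}


def tokens(text):
    """Lowercase alphabetic tokens of a snippet."""
    return re.findall(r"[a-z]+", text.lower())


def mentions_name(snippet, first, last):
    """Crude stand-in for the name validation grammar.

    Accepts 'First Last', 'Last, First' and 'F. Last' (case-insensitive).
    """
    f, l, initial = re.escape(first), re.escape(last), re.escape(first[0])
    pattern = rf"\b(?:{f}\s+{l}|{l},\s*{f}|{initial}\.\s*{l})\b"
    return re.search(pattern, snippet, flags=re.IGNORECASE) is not None


def is_mobility_trace(snippet):
    """Keyword rule standing in for the SVM: a cue word plus a known place."""
    toks = set(tokens(snippet))
    return bool(toks & MOBILITY_CUES) and bool(toks & set(PLACES))


def mobility_category(snippets, first, last, home_country):
    """Decide a (hypothetical) category from the countries seen in mobility traces."""
    traces = [s for s in snippets
              if mentions_name(s, first, last) and is_mobility_trace(s)]
    # Count each country at most once per trace snippet.
    countries = Counter(
        country
        for s in traces
        for country in {PLACES[t] for t in tokens(s) if t in PLACES}
    )
    if not countries:
        return "unknown"
    top_country, _ = countries.most_common(1)[0]
    return "mobile" if top_country != home_country else "local"


example_snippets = [
    "Juan Perez is a postdoc researcher at the Institut Pasteur in Paris.",
    "J. Perez, professor, University of Paris, France",
    "Cheap flights to Paris for Juan Perez",
    "Perez, Juan. University of Montevideo.",
]
category = mobility_category(example_snippets, "Juan", "Perez", home_country="UY")
```

In this sketch the third snippet passes the name check but is discarded because it contains no cue word, so only the remaining three contribute to the location count. A trained classifier over richer features would take the place of the keyword rule in a real system.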
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Flores, J.J.G., Zweigenbaum, P., Yue, Z., Turner, W. (2012). Tracking Researcher Mobility on the Web Using Snippet Semantic Analysis. In: Isahara, H., Kanzaki, K. (eds) Advances in Natural Language Processing. JapTAL 2012. Lecture Notes in Computer Science, vol 7614. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33983-7_18
DOI: https://doi.org/10.1007/978-3-642-33983-7_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33982-0
Online ISBN: 978-3-642-33983-7
eBook Packages: Computer Science, Computer Science (R0)