Tracking Researcher Mobility on the Web Using Snippet Semantic Analysis

  • Conference paper
Advances in Natural Language Processing (JapTAL 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7614)

Abstract

This paper presents the Unoporuno system, an application of natural language processing methods to the sociology of migration. Our approach extracts names of people from a scientific publications database, refines Web search queries using bibliographical data, and determines a person's international mobility category from a location analysis of the snippets classified as mobility traces. To identify mobility traces, snippets are filtered with a name validation grammar, analyzed with mobility-related semantic features, and classified with a support vector machine. This automatic classification method is complemented by a semi-automatic one, in which Unoporuno selects 5 snippets to help a sociologist decide on the mobility status of authors. Empirical evidence for the automatic person classification task suggests that Unoporuno classified 78% of the mobile persons in the right mobility category, with F=0.71. We also present empirical evidence for the semi-automatic task: in 80% of the cases, sociologists are able to choose the right category based on the 5-snippet selection, with a moderate level of inter-rater agreement (0.60).
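
The flow described in the abstract (filter snippets by name, keep those that look like mobility traces, then decide a category from the locations they mention) can be illustrated with a minimal Python sketch. It is not the Unoporuno implementation: the regular expression only stands in for the name validation grammar, a keyword threshold replaces the support vector machine, and the cue list, the gazetteer, the function names and the category labels ("mobile", "local", "unknown") are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical keyword list standing in for the mobility-related semantic
# features; the actual feature set is not given in the abstract.
MOBILITY_CUES = ("university", "institute", "professor", "phd",
                 "researcher", "laboratory")

# Hypothetical toy gazetteer mapping place mentions to countries.
PLACES = {"paris": "France", "lyon": "France",
          "montevideo": "Uruguay", "boston": "USA"}


def name_matches(snippet, first, last):
    """Crude stand-in for a name validation grammar: accepts 'First Last',
    'First M. Last', 'F. Last' or 'Last, First' (case-insensitive)."""
    f, l, initial = re.escape(first), re.escape(last), re.escape(first[0])
    pattern = rf"\b(?:{f}(?:\s+\w\.)?\s+{l}|{initial}\.\s*{l}|{l},\s*{f})\b"
    return re.search(pattern, snippet, flags=re.IGNORECASE) is not None


def is_mobility_trace(snippet, min_cues=1):
    """Keyword threshold used here in place of the SVM classifier."""
    text = snippet.lower()
    return sum(cue in text for cue in MOBILITY_CUES) >= min_cues


def mobility_category(snippets, first, last, home_country):
    """Assumed decision rule: compare the most frequent country found in
    the retained snippets with the person's home country."""
    countries = Counter()
    for snippet in snippets:
        if name_matches(snippet, first, last) and is_mobility_trace(snippet):
            for word in re.findall(r"[a-z]+", snippet.lower()):
                if word in PLACES:
                    countries[PLACES[word]] += 1
    if not countries:
        return "unknown"
    top_country, _ = countries.most_common(1)[0]
    return "mobile" if top_country != home_country else "local"


if __name__ == "__main__":
    # Invented example snippets, not data from the paper.
    example = [
        "Dr. Ana Perez, professor at the University of Paris",
        "A. Perez joined a laboratory in Lyon",
        "Perez Hotel in Boston, great rooms",
        "Ana Perez loves football in Montevideo",
    ]
    print(mobility_category(example, "Ana", "Perez", "Uruguay"))
```

In this sketch the third snippet is discarded by the name filter and the fourth by the keyword threshold, so only the two France-related snippets contribute to the decision. A trained classifier over richer features, as the abstract describes, would replace `is_mobility_trace`.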

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Flores, J.J.G., Zweigenbaum, P., Yue, Z., Turner, W. (2012). Tracking Researcher Mobility on the Web Using Snippet Semantic Analysis. In: Isahara, H., Kanzaki, K. (eds) Advances in Natural Language Processing. JapTAL 2012. Lecture Notes in Computer Science, vol 7614. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33983-7_18

  • DOI: https://doi.org/10.1007/978-3-642-33983-7_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33982-0

  • Online ISBN: 978-3-642-33983-7

  • eBook Packages: Computer Science, Computer Science (R0)
