Patents and Publications

The Lexical Connection
  • Elise Bassecoulard
  • Michel Zitt


The quantitative appraisal of science-technology connections, partly through bibliometrics, has made great progress in the last decade. In this chapter we investigate the lexical linkage between articles and patents, an alternative to the systematic exploitation of patent citations to scientific papers. In particular, we explore whether correspondence tables can be established between patent classifications and scientific categories. After a reminder of the methodological background (S&T linkages, lexical methods, statistical measures), we report an exploratory study based on a subset of the Chemical Abstracts database (CA), which covers both articles and patents with a very precise indexing system. Connection measures were established first on controlled vocabulary and then on some natural-language fields. The comparison shows some robustness of the lexical approach, with clear limitations at the micro level: topic sharing between a particular article and a particular patent cannot, in the general case, be interpreted as the sharing of a research question. At the macro level, for example between IPC sub-classes and ISI subject categories, the lexical approach is an appealing technique that complements the usual citation-based analysis, which is built on very sparse matrices, because the informetric performance of lexical methods can be tuned over a wide range of precision-recall trade-offs. Extending the approach to databases that cover only articles or only patents requires language processing, a burden that can be alleviated if only macro-level correspondence is sought.


Control Vocabulary; Patent System; Patent Office; International Patent Classification; Bibliographic Coupling





Copyright information

© Kluwer Academic Publishers 2004

Authors and Affiliations

  • Elise Bassecoulard (1)
  • Michel Zitt (1, 2)
  1. Lereco, Institut National de la Recherche Agronomique (INRA), Nantes, France
  2. Observatoire des Sciences et des Techniques (OST), Paris, France
