User Interface for Managing and Refining Related Patent Terms

  • Girish Showkatramani
  • Arthi Krishna (email author)
  • Ye Jin
  • Aaron Pepe
  • Naresh Nula
  • Greg Gabel
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 850)


A crucial part of the patent examination process is assessing the patentability of an invention by performing extensive keyword-based searches to identify related existing inventions (or the lack thereof). Identifying the most effective keywords is a critical skill and a time-intensive step in the examination process. Recently, word embedding [1] techniques have demonstrated value in identifying related words. In word embedding, the vector representation of an individual word is computed from its context, so words with similar meanings have similar vector representations. Using a number of alternative data sources and word embedding techniques, we generated a variety of word embedding models. For example, we first clustered patent data by area of interest, such as Computer Architecture or Biology, and used this data to train Word2Vec [2] and fastText [3] models. Although the resulting word embedding models were reliable and scalable, no single model was sophisticated enough to match an expert's choice of keywords.
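The idea that words with similar meanings end up with similar vectors can be illustrated with a minimal sketch. The three-dimensional vectors below are invented by hand for illustration and are not taken from the models described here; `TOY_VECTORS` and `related_terms` are hypothetical names. A trained Word2Vec or fastText model would supply learned vectors of much higher dimension, but a nearest-neighbour lookup by cosine similarity would work along the same lines.

```python
import math

# Hand-made toy vectors (an assumption for illustration only); real
# embedding models learn these values from the context of each word.
TOY_VECTORS = {
    "processor": [0.9, 0.1, 0.0],
    "cpu": [0.8, 0.2, 0.1],
    "microcontroller": [0.7, 0.3, 0.0],
    "enzyme": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def related_terms(query, vectors, top_n=2):
    """Return the top_n terms whose vectors are closest to the query's."""
    query_vec = vectors[query]
    scored = [
        (term, cosine_similarity(query_vec, vec))
        for term, vec in vectors.items()
        if term != query
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    for term, score in related_terms("processor", TOY_VECTORS):
        print(f"{term}: {score:.3f}")
```

With these toy values, the computing-related terms rank well above "enzyme" for the query "processor", which is the behaviour a keyword-suggestion tool relies on.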

In this study, we developed a user interface (Fig. 1) that allows domain experts to quickly evaluate several word embedding models and curate a more sophisticated set of related patent terms by combining results from several models or, in some cases, augmenting them by hand. The application thereby seeks to provide a functional, usable, and centralized interface for searching and identifying related terms in the patent domain.
Fig. 1. Related patent terms.
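The curation step described above, combining results from several models and augmenting them by hand, might be sketched as follows. This is not the authors' implementation: the model names, scores, and the functions `merge_candidates` and `curate` are hypothetical, and the ranking rule (terms proposed by more models first, then by best score) is just one plausible choice.

```python
def merge_candidates(model_results):
    """Union the (term, score) lists of several models.

    model_results maps a model name to a list of (term, score) pairs.
    Returns a dict: term -> {"best_score": float, "models": [names]}.
    """
    merged = {}
    for model_name, results in model_results.items():
        for term, score in results:
            entry = merged.setdefault(term, {"best_score": score, "models": []})
            entry["best_score"] = max(entry["best_score"], score)
            entry["models"].append(model_name)
    return merged


def curate(merged, rejected=(), added=()):
    """Apply an expert's manual decisions to the merged candidates.

    Rejected terms are dropped; remaining terms are ranked by how many
    models proposed them, then by best score; hand-added terms go last.
    """
    rejected_set = set(rejected)
    kept = {t: e for t, e in merged.items() if t not in rejected_set}
    ranked = sorted(
        kept,
        key=lambda t: (-len(kept[t]["models"]), -kept[t]["best_score"], t),
    )
    return ranked + [t for t in added if t not in kept]


if __name__ == "__main__":
    # Invented example output for the query "processor" (illustrative only).
    results = {
        "word2vec_computer_architecture": [("cpu", 0.91), ("core", 0.74)],
        "fasttext_computer_architecture": [("cpu", 0.88), ("processors", 0.93)],
    }
    merged = merge_candidates(results)
    print(curate(merged, rejected=["core"], added=["microprocessor"]))
```

Recording which models proposed each term lets an interface show where a suggestion came from, which can help an expert judge how much to trust it.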


Keywords: Human-computer interaction, Natural language processing, Patent, Synonyms, Word embedding, Patent search, Word similarity, Clustering



The authors would like to thank David Chiles, David Landrith and Thomas Beach for their support of this effort.


  1. Yitan, L., Linli, X.: Word embedding revisited: a new representation learning and explicit matrix factorization perspective. In: International Joint Conference on Artificial Intelligence (2015)
  2. Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representation of words and phrases and their compositionality. In: NIPS: Proceedings of Neural Information Processing Systems, Nevada, USA, pp. 3111–3119 (2013)
  3. Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification (2016)
  4. Bojanowski, P., Joulin, A., Mikolov, T.: Alternative structures for character-level RNNs (2015)
  5. Zhang, X., Zhao, J., Lecun, Y.: Character-level convolutional networks for text classification. In: Neural Information Processing Systems (2015)
  6. Yoon, K., Jernite, Y., Sontag, D., Rush, A.M.: Character-aware neural language models (2015)
  7. Bengio, Y., Ducharme, R., Vincent, P., Jauvin, C.: A neural probabilistic language model. J. Mach. Learn. Res. 3, 1137–1155 (2003)
  8. Harris, Z.: Distributional structure. Word 10(23), 146–162 (1954)
  9. Turney, P.D., Pantel, P.: From frequency to meaning: vector space models of semantics. J. Artif. Intell. Res. 37, 141–188 (2010)
  10. Baroni, M., Lenci, A.: Distributional memory: a general framework for corpus-based semantics. Comput. Linguist. 36(4), 673–721 (2010)
  11. Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12, 2493–2537 (2011)
  12. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: ICLR: Proceeding of the International Conference on Learning Representations Workshop Track, Arizona, USA (2013)
  13. Merkel, D.: Docker: lightweight linux containers for consistent development and deployment (2014)
  14. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., et al.: TensorFlow: large-scale machine learning on heterogeneous systems (2015)

Copyright information

© 2018. This is a U.S. government work and its text is not subject to copyright protection in the United States; however, its text may be subject to foreign copyright protection.

Authors and Affiliations

  • Girish Showkatramani (1)
  • Arthi Krishna (1), email author
  • Ye Jin (1)
  • Aaron Pepe (1)
  • Naresh Nula (1)
  • Greg Gabel (1)

  1. United States Patent and Trademark Office, Alexandria, USA
