User Interface for Managing and Refining Related Patent Terms
One of the crucial aspects of the patent examination process is assessing the patentability of an invention by performing extensive keyword-based searches to identify related existing inventions (or the lack thereof). Identifying the most effective keywords is both a critical skill and a time-intensive step in the examination process. Recently, word embedding techniques have demonstrated value in identifying related words. In word embedding, the vector representation of an individual word is computed from its context, so words with similar meanings exhibit similar vector representations. Using a number of alternative data sources and word embedding techniques, we are able to generate a variety of word embedding models. For example, we initially clustered patent data by area of interest, such as Computer Architecture or Biology, and used this data to train Word2Vec and fastText models. Even though the generated word embedding models were reliable and scalable, none of the models by itself was sophisticated enough to match an expert's choice of keywords.
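The core idea the abstract relies on, that words with similar meanings have similar vectors, can be sketched with a nearest-neighbor lookup over cosine similarity. The tiny vectors below are made up for illustration only; a trained Word2Vec or fastText model would supply learned vectors with hundreds of dimensions.

```python
import numpy as np

# Toy 3-d "embeddings" (invented for illustration; real models such as
# Word2Vec or fastText learn these vectors from word context).
embeddings = {
    "processor": np.array([0.90, 0.10, 0.00]),
    "cpu":       np.array([0.85, 0.15, 0.05]),
    "enzyme":    np.array([0.10, 0.90, 0.20]),
    "protein":   np.array([0.05, 0.95, 0.15]),
}

def cosine(u, v):
    """Cosine similarity: near 1.0 for similar directions, near 0 for unrelated."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def related_terms(query, k=2):
    """Rank all other vocabulary terms by similarity to the query term."""
    q = embeddings[query]
    scored = [(w, cosine(q, v)) for w, v in embeddings.items() if w != query]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:k]

# "cpu" points in nearly the same direction as "processor", so it ranks first;
# the biology terms score much lower.
print(related_terms("processor"))
```

Ranking candidate terms this way is how an embedding model can propose related search keywords to an examiner, who then refines the list.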
Keywords: Human-computer interaction · Natural language processing · Patent synonyms · Word embedding · Patent search · Word similarity · Clustering
The authors would like to thank David Chiles, David Landrith and Thomas Beach for their support of this effort.