Patent Literatures Translation System Based on Hadoop

Zhang, Di; Huang, Heyan; Huang, Yonggang

doi:10.1007/978-3-642-55038-6_20

Conference paper

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 309)

Abstract

To address the slow response times caused by the massive volume of patent literature, this paper proposes a patent literature translation system based on Hadoop. The paper presents a hybrid storage structure and a parallel translation model for massive patent collections. The hierarchical storage structure combines HDFS (Hadoop Distributed File System), which stores the patent documents, with HBase, which stores the directories of those documents. This hybrid structure enables faster retrieval through the distributed file system. Translation uses the Hadoop MapReduce framework: the MapReduce computation model not only translates each patent document with a high degree of parallelism but also processes multiple documents simultaneously. The experimental results show that the proposed machine translation system achieves better translation performance than the conventional machine translation approach.
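
The two ideas in the abstract can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation and does not call any Hadoop, HDFS, or HBase API. Two in-memory dictionaries stand in for the HBase directory table and the HDFS file store, a thread pool stands in for the MapReduce workers, and the word-for-word German-to-English lexicon is a toy placeholder for a real translation engine. The sentence-level split, the language pair, and every name below (`put_document`, `translate_corpus`, and so on) are assumptions made for illustration; the abstract does not specify them.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins, not real Hadoop APIs.
directory = {}   # patent_id -> storage path (plays the HBase directory role)
file_store = {}  # storage path -> document text (plays the HDFS role)


def put_document(patent_id, text):
    """Store the text and record its location in the directory."""
    path = f"/patents/{patent_id}.txt"
    file_store[path] = text
    directory[patent_id] = path


def get_document(patent_id):
    """Look up the path in the directory, then read the document."""
    return file_store[directory[patent_id]]


# Toy lexicon standing in for a real machine translation engine.
LEXICON = {"eine": "a", "pumpe": "pump", "mit": "with", "ventil": "valve",
           "das": "the", "ist": "is", "offen": "open"}


def translate_sentence(sentence):
    """Word-for-word lookup; unknown words pass through unchanged."""
    return " ".join(LEXICON.get(word.lower(), word) for word in sentence.split())


def split_sentences(patent_id):
    """Emit one (patent_id, position, sentence) task per sentence."""
    text = get_document(patent_id)
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return [(patent_id, i, s) for i, s in enumerate(sentences)]


def map_translate(task):
    """Map step: translate one sentence, keyed by its document."""
    patent_id, position, sentence = task
    return patent_id, (position, translate_sentence(sentence))


def reduce_join(pairs):
    """Reduce step: group by document and restore sentence order."""
    grouped = defaultdict(list)
    for patent_id, item in pairs:
        grouped[patent_id].append(item)
    return {pid: ". ".join(text for _, text in sorted(items)) + "."
            for pid, items in grouped.items()}


def translate_corpus(patent_ids, workers=4):
    """Translate sentences from all documents concurrently."""
    tasks = [t for pid in patent_ids for t in split_sentences(pid)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_translate, tasks))
    return reduce_join(mapped)


put_document("DE1", "Eine Pumpe mit Ventil. Das Ventil ist offen.")
put_document("DE2", "Das Ventil ist offen.")
print(translate_corpus(["DE1", "DE2"]))
```

Because tasks from every document share one pool, sentences within a document and sentences from different documents are translated concurrently, which mirrors the two levels of parallelism the abstract describes.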

Author information

Authors and Affiliations

Beijing Engineering Research Center of High Volume Language Information Processing & Cloud Computing Applications, Beijing Institute of Technology, Beijing, 100081, China
Di Zhang & Yonggang Huang
School of Computer Science & Technology, Beijing Institute of Technology, Beijing, 100081, China
Heyan Huang

Corresponding author

Correspondence to Di Zhang.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Seoul National University of Science and Technology (SeoulTech), Seoul, Korea, Republic of (South Korea)
James J. (Jong Hyuk) Park
Department of Computer Science, Georgia State University, Atlanta, Georgia, USA
Yi Pan
Digital Media Engineering, Anyang University, Anyang, Korea, Republic of (South Korea)
Cheon-Shik Kim
Information & Communication Technologies, Swinburne University of Technology, Melbourne, Victoria, Australia
Yun Yang

About this paper

Cite this paper

Zhang, D., Huang, H., Huang, Y. (2014). Patent Literatures Translation System Based on Hadoop. In: Park, J., Pan, Y., Kim, CS., Yang, Y. (eds) Future Information Technology. Lecture Notes in Electrical Engineering, vol 309. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55038-6_20

DOI: https://doi.org/10.1007/978-3-642-55038-6_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-55037-9
Online ISBN: 978-3-642-55038-6
eBook Packages: Engineering, Engineering (R0)
