Abstract
To address the slow response times caused by the massive volume of patent literature, this paper proposes a patent literature translation system based on Hadoop. The paper presents a hybrid storage structure and a parallel translation model for large patent collections. The storage structure combines HDFS (Hadoop Distributed File System), which stores the patent documents, with HBase, which stores the directories of that data. This hybrid structure enables faster retrieval through the distributed file system. For translation, the Hadoop MapReduce framework is used. The MapReduce computation model translates each patent document in a highly parallel manner and can also process multiple documents simultaneously. The experimental results show that the proposed machine translation system achieves better translation performance than the conventional machine translation approach.
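The division of labour described in the abstract (a directory lookup, document storage, and a map/reduce translation pass) can be illustrated with a small in-process sketch. It is not the authors' implementation and does not use Hadoop, HDFS, or HBase: the names `DIRECTORY`, `FILE_STORE`, `LEXICON`, `translate_sentence`, `map_phase`, `reduce_phase`, and `run_job` are all hypothetical, plain dictionaries stand in for the storage layers, and a toy word lexicon stands in for the translation engine.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins: one dict plays the role of the HBase directory table
# (document id -> file path), another plays the role of HDFS (path -> text).
DIRECTORY = {"doc1": "/patents/doc1.txt", "doc2": "/patents/doc2.txt"}
FILE_STORE = {
    "/patents/doc1.txt": "method and device. system and method.",
    "/patents/doc2.txt": "device.",
}

# Toy word-for-word lexicon; a real system would call an MT engine here.
LEXICON = {
    "method": "méthode",
    "device": "dispositif",
    "system": "système",
    "and": "et",
}


def translate_sentence(sentence):
    """Placeholder translator: look each word up in the toy lexicon."""
    return " ".join(LEXICON.get(word, word) for word in sentence.split())


def map_phase(doc_id):
    """Map: split one document into sentences, emit (doc_id, (index, translation))."""
    text = FILE_STORE[DIRECTORY[doc_id]]  # directory lookup, then file read
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return [(doc_id, (i, translate_sentence(s))) for i, s in enumerate(sentences)]


def reduce_phase(doc_id, values):
    """Reduce: restore sentence order and join the translated document."""
    ordered = [translation for _, translation in sorted(values)]
    return doc_id, ". ".join(ordered) + "."


def run_job(doc_ids, workers=4):
    # Documents are mapped concurrently, mimicking parallel map tasks.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, doc_ids))
    # Shuffle: group the intermediate pairs by document id.
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    for doc_id, translated in run_job(["doc1", "doc2"]).items():
        print(doc_id, "->", translated)
```

Keying the intermediate pairs by document id and carrying the sentence index in the value is one way to let sentences be translated independently while still reassembling each document in its original order.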
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, D., Huang, H., Huang, Y. (2014). Patent Literatures Translation System Based on Hadoop. In: Park, J., Pan, Y., Kim, CS., Yang, Y. (eds) Future Information Technology. Lecture Notes in Electrical Engineering, vol 309. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55038-6_20
DOI: https://doi.org/10.1007/978-3-642-55038-6_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-55037-9
Online ISBN: 978-3-642-55038-6
eBook Packages: Engineering, Engineering (R0)