Patent Literatures Translation System Based on Hadoop

  • Conference paper

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 309)

Abstract

To address the slow response times caused by the massive volume of patent literature, this paper proposes a patent literature translation system based on Hadoop. It presents a hybrid storage structure and a parallel translation model for massive patent collections. The hybrid storage structure combines HDFS (Hadoop Distributed File System), which stores the patent documents, with HBase, which stores the directories of those documents. This structure enables faster retrieval through the distributed file system. Translation uses the Hadoop MapReduce framework: the MapReduce computation model not only translates each patent document in a highly parallel manner but also processes multiple documents simultaneously. Experimental results show that the proposed machine translation system achieves better translation performance than the conventional machine translation approach.
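The abstract gives no implementation details, so the following is only a minimal in-process sketch of the general idea, not the authors' system. It assumes that documents are split into sentences, that each sentence is translated independently in the map step, and that the reduce step reassembles each document in order. The names `DIRECTORY`, `FILE_STORE`, `TOY_LEXICON`, and `run_job` are hypothetical; plain dictionaries stand in for the HBase directory and for HDFS, and a word lookup stands in for a real translation engine.

```python
from collections import defaultdict

# Hypothetical stand-ins: a dict plays the role of the HBase directory
# (document ID -> file path) and another dict plays the role of HDFS
# (file path -> document text).
DIRECTORY = {"P1": "/patents/P1.txt", "P2": "/patents/P2.txt"}
FILE_STORE = {
    "/patents/P1.txt": "Ein Verfahren. Zur Herstellung.",
    "/patents/P2.txt": "Zur Herstellung.",
}

# Hypothetical toy lexicon standing in for a real machine translation engine.
TOY_LEXICON = {"ein": "a", "verfahren": "method", "zur": "for", "herstellung": "production"}


def translate_sentence(sentence):
    """Translate word by word; unknown words are passed through unchanged."""
    return " ".join(TOY_LEXICON.get(word.lower(), word) for word in sentence.split())


def map_phase(doc_id, text):
    """Emit (doc_id, (position, translated sentence)) for every sentence."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    for position, sentence in enumerate(sentences):
        yield doc_id, (position, translate_sentence(sentence))


def shuffle(pairs):
    """Group the mapped values by document ID, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(doc_id, values):
    """Restore sentence order and join the sentences into one document."""
    ordered = [sentence for _, sentence in sorted(values)]
    return doc_id, ". ".join(ordered) + "."


def run_job(doc_ids):
    """Look up each document via the directory, then map, shuffle, and reduce."""
    pairs = []
    for doc_id in doc_ids:
        text = FILE_STORE[DIRECTORY[doc_id]]  # directory lookup, then file read
        pairs.extend(map_phase(doc_id, text))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    for doc_id, translation in run_job(["P1", "P2"]).items():
        print(doc_id, translation)
```

In this sketch the sentences of one document are independent map inputs, and several documents pass through the same job, which mirrors the two levels of parallelism the abstract mentions. On a real cluster the map and reduce functions would run as distributed tasks rather than in a single process.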

Author information

Corresponding author

Correspondence to Di Zhang.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, D., Huang, H., Huang, Y. (2014). Patent Literatures Translation System Based on Hadoop. In: Park, J., Pan, Y., Kim, CS., Yang, Y. (eds) Future Information Technology. Lecture Notes in Electrical Engineering, vol 309. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55038-6_20

  • DOI: https://doi.org/10.1007/978-3-642-55038-6_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-55037-9

  • Online ISBN: 978-3-642-55038-6

  • eBook Packages: Engineering, Engineering (R0)
