
Leveraging High-Performance Computing Infrastructures to Web Data Analytic Applications by Means of Message-Passing Interface

  • Alexey Cheptsov
  • Bastian Koller
Part of the Modeling and Optimization in Science and Technologies book series (MOST, volume 4)

Abstract

Modern computing technologies are becoming increasingly data-centric, addressing a variety of challenges in storing, accessing, processing, and streaming massive amounts of structured and unstructured data effectively. An important analytical task in many scientific and technological domains is to retrieve information from these data in order to gain deeper insight into their content and to derive useful, often not explicitly stated knowledge and facts related to a particular domain of interest. The major issue is the size, structural complexity, and update frequency of the analyzed data (i.e., the ‘big data’ aspect), which render traditional analysis techniques, tools, and infrastructures ineffective. We introduce an innovative approach to parallelising data-centric applications based on the Message-Passing Interface. In contrast to other known parallelisation technologies, our approach enables a very high utilization rate and thus low costs of using production high-performance computing and Cloud computing infrastructures. The advantages of the technique are demonstrated on a challenging Semantic Web application that performs web-scale reasoning.

Keywords

Data-as-a-Service, Performance, Parallelisation, MPI, OMPIJava

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. High Performance Computing Center Stuttgart, Stuttgart, Germany
