Application of High Performance Computing Techniques to the Semantic Data Transformation

  • José Antonio Bernabé-Díaz
  • María del Carmen Legaz-García
  • José M. García
  • Jesualdo Tomás Fernández-Breis
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 745)

Abstract

The growth of the Life Science Semantic Web is illustrated by the increasing number of resources available in the Linked Open Data Cloud. Our SWIT tool supports the generation of semantic repositories, and it has been successfully applied in the field of orthology resources, helping to achieve objectives of the Quest for Orthologs consortium. However, our experience with SWIT reveals that, although the computational complexity of the algorithm is linear in the size of the dataset, the time required to generate the datasets is longer than desired.

The goal of this work is to apply High Performance Computing (HPC) techniques to speed up the generation of semantic datasets using SWIT. For this purpose, the SWIT kernel was reimplemented and its algorithm was adapted to facilitate parallelization; the parallelization techniques were then designed and implemented.

An experimental analysis of the speedup of the transformation process has been performed using the ortholog database InParanoid, which provides many files of orthology relations between pairs of species. The results show that we have been able to obtain speedups of up to 7000x.

The performance of SWIT has improved substantially, which will increase its usefulness for creating large semantic datasets and shows that HPC techniques should play an important role in increasing the performance of semantic tools.
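
As an illustration only, and not the SWIT implementation, the idea of transforming many independent input files concurrently and reporting a speedup can be sketched as follows. The sketch assumes each file is a list of tab-separated identifier pairs; the function names, the input layout, and the `orthologOf` predicate are hypothetical and do not come from the paper.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


def transform_file(lines: Iterable[str]) -> List[str]:
    """Hypothetical per-file transformation: turn each tab-separated
    pair of identifiers into one triple-like statement."""
    out = []
    for line in lines:
        left, right = line.rstrip("\n").split("\t")
        # 'orthologOf' is a placeholder predicate, not a real vocabulary term.
        out.append(f"<{left}> <orthologOf> <{right}> .")
    return out


def transform_all(files: List[List[str]], workers: int = 4) -> List[List[str]]:
    """Transform independent files concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(transform_file, files))


def speedup(t_baseline: float, t_optimized: float) -> float:
    """Speedup = baseline wall-clock time / optimized wall-clock time."""
    if t_optimized <= 0:
        raise ValueError("optimized time must be positive")
    return t_baseline / t_optimized
```

Under this definition, a 7000x speedup means the optimized run takes 1/7000 of the baseline wall-clock time. For CPU-bound work in CPython, `ProcessPoolExecutor` from the same module would usually be the better fit than threads; the thread pool is used here only to keep the sketch short.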

Keywords

Semantic web, Data transformation, High Performance Computing

Notes

Acknowledgements

This work has been partially funded by the Spanish Ministry of Economy, Industry and Competitiveness, the European Regional Development Fund (ERDF) Programme, and the Fundación Séneca through grants TIN2014-53749-C2-2-R and 19371/PI/14.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • José Antonio Bernabé-Díaz (1, 3)
  • María del Carmen Legaz-García (2)
  • José M. García (3)
  • Jesualdo Tomás Fernández-Breis (1)

  1. Departamento de Informática y Sistemas, Universidad de Murcia, IMIB-Arrixaca, Murcia, Spain
  2. Biomedical Informatics and Bioinformatics Platform, Fundación para la Formación e Investigación Sanitarias de la Región de Murcia, IMIB-Arrixaca, Murcia, Spain
  3. Departamento de Ingeniería y Tecnología de Computadores, Universidad de Murcia, Murcia, Spain