Application of High Performance Computing Techniques to the Semantic Data Transformation
The growth of the Life Science Semantic Web is illustrated by the increasing number of resources available in the Linked Open Data Cloud. Our SWIT tool supports the generation of semantic repositories, and it has been successfully applied in the field of orthology resources, helping to achieve objectives of the Quest for Orthologs consortium. However, our experience with SWIT reveals that, although the computational complexity of the algorithm is linear in the size of the dataset, the time required to generate the datasets is longer than desired.
The goal of this work is to apply High Performance Computing techniques to speed up the generation of semantic datasets with SWIT. To this end, the SWIT kernel was reimplemented and its algorithm was adapted to facilitate parallelization; the parallelization techniques were then designed and implemented.
An experimental analysis of the speedup of the transformation process has been performed using the orthologs database InParanoid, which provides many files of orthology relations between pairs of species. The results show that we have been able to obtain speedups of up to 7000x.
The performance of SWIT has been greatly improved, which will certainly increase its usefulness for creating large semantic datasets and shows that HPC techniques should play an important role in increasing the performance of semantic tools.
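Because InParanoid distributes its orthology relations as many separate files, one per pair of species, one natural way to parallelize a transformation of this kind is to treat each file as an independent task. The following is a minimal sketch of that idea only, not the SWIT implementation: the function names, the two-column tab-separated input layout, and the `urn:example:` identifiers are all assumptions made for illustration.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Iterable, List


def transform_pair_file(lines: Iterable[str]) -> List[str]:
    """Hypothetical stand-in for a per-file transformation.

    Each input line is assumed to hold two tab-separated gene identifiers,
    and each valid line becomes one N-Triples-like statement. The actual
    SWIT mapping rules are not reproduced here.
    """
    triples = []
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        a, b = parts
        triples.append(
            f"<urn:example:{a}> <urn:example:orthologOf> <urn:example:{b}> ."
        )
    return triples


def transform_all(files, workers=4, executor_cls=ProcessPoolExecutor):
    """Transform independent inputs concurrently, preserving input order.

    `files` is a list in which each element holds the lines of one
    (hypothetical) species-pair file.
    """
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(transform_pair_file, files))


if __name__ == "__main__":
    # Toy data standing in for two species-pair files.
    toy_files = [["g1\tg2", "g3\tg4"], ["g5\tg6"]]
    for triples in transform_all(toy_files, workers=2):
        print(len(triples), "triples")
```

Since the tasks share no state, the achievable speedup in a sketch like this is bounded by the number of workers and by how evenly the file sizes are distributed among them.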
Keywords: Semantic web, Data transformation, High Performance Computing
This work has been partially funded by the Spanish Ministry of Economy, Industry and Competitiveness, the European Regional Development Fund (ERDF) Programme, and the Fundación Séneca through grants TIN2014-53749-C2-2-R and 19371/PI/14.