Using SIMD Instructions to Accelerate Sequence Similarity Searches Inside a Database System
Database systems are optimized for managing large data sets, but they have struggled to make an impact in the life sciences, where typical use cases involve much more complex analytical algorithms than those found in traditional OLTP or OLAP scenarios. Although many database management systems (DBMS) can be extended with stored procedures that implement transactions or complex algorithms, these stored procedures usually cannot leverage the built-in optimizations provided by the query engine, so other optimization avenues must be explored.
In this paper, we investigate how sequence alignment algorithms, one of the most common operations carried out on a bioinformatics or genomics database, can be implemented efficiently close to the data within an extensible database system. We investigate the use of single instruction, multiple data (SIMD) extensions to accelerate logic inside a DBMS, and we compare this approach with implementations of the same logic outside the DBMS.
Our implementation of an SIMD-accelerated Smith-Waterman sequence-alignment algorithm shows an order-of-magnitude improvement over a non-accelerated version while running inside a DBMS. Our SIMD-accelerated version also runs with little to no overhead inside the DBMS compared to the same logic running outside the DBMS.
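For readers unfamiliar with the algorithm, the following is a minimal scalar sketch of Smith-Waterman local alignment scoring, roughly the kind of non-accelerated baseline that a SIMD version would be measured against. It is not the implementation evaluated in the paper: the function name, the scoring values (match, mismatch, gap) and the linear gap penalty are assumptions chosen for illustration only.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of two sequences.

    Hypothetical helper for illustration; the scoring parameters and the
    linear gap penalty are assumptions, not values taken from the paper.
    """
    # Only the previous row of the dynamic programming matrix is needed.
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]  # column 0 is always 0 in local alignment
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap        # gap in the target
            left = curr[j - 1] + gap  # gap in the query
            # Clamping at 0 lets an alignment restart anywhere (local).
            cell = max(0, diag, up, left)
            curr.append(cell)
            if cell > best:
                best = cell
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "ACT"))
```

Each cell depends only on its upper, left and upper-left neighbours, so cells on the same anti-diagonal are independent of one another. That independence is one of the properties that SIMD formulations of the algorithm can exploit by computing several cells per instruction.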
Keywords: Sequence databases, Stored procedures, SIMD acceleration