We are pleased to present this special issue of the Journal of Signal Processing Systems on Application-specific Systems, Architectures and Processors.

Our first article, titled “Loop Parallelization Techniques for FPGA Accelerator Synthesis” (10.1007/s11265-017-1229-7) by Oliver Reiche, Mehmet Akif Özkan, Moritz Schmid, Frank Hannig, and Jürgen Teich, demonstrates the promising technique of loop coarsening for enhancing data-level parallelism when synthesizing image processing kernels for FPGAs. With design productivity remaining the major hurdle for FPGA accelerator development, the ability to develop highly efficient hardware accelerators through high-level synthesis is not only promising, but also imperative. This work is an important milestone that advances the research field in this direction.

In the article “A Modular Architecture for Structured Long Block-Length LDPC Decoders” (10.1007/s11265-017-1232-z), authors Andrew Wong, Saied Hemati and Warren Gross present a modular hardware architecture to implement LDPC decoders with long block lengths. Their partially parallel architecture allows complex LDPC decoders to be designed efficiently in hardware, while maintaining superior performance and area tradeoffs.

The article “Run-time Reconfigurable Acceleration for Genetic Programming Fitness Evaluation in Trading Strategies” (10.1007/s11265-017-1244-8) by Ingrid Funie, Paul Grigoras, Pavel Burovskiy, Wayne Luk and Mark Salmon applies FPGA acceleration to fitness evaluation in genetic programming for financial trading strategies. The authors use a run-time reconfigurable approach that reduces resource usage. Their design shows significant speedup compared to software implementations.

In the article “A hybrid CPU-GPU Multifrontal Optimizing Method in Sparse Cholesky Factorization” (10.1007/s11265-017-1227-9), Yong Chen, Hai Jin, Ran Zheng, Yuandong Liu, and Wei Wang propose a method for accelerating the distributed solution of large sparse systems of linear equations. The authors present a hybrid CPU-GPU implementation of sparse Cholesky factorization. They demonstrate improvement in speed over existing solutions.

The authors of the paper “LightSpMV: Faster CUDA-Compatible Sparse Matrix-Vector Multiplication Using Compressed Sparse Rows” (10.1007/s11265-016-1216-4), Yongchao Liu and Bertil Schmidt, present a parallelized sparse matrix-vector multiplication implementation based on the compressed sparse row format. They propose an efficient mapping to CUDA-enabled GPUs. Experiments on a range of sparse matrices show that their solution outperforms existing solutions.

We would like to thank all of the authors for their contributions to this special issue, and the anonymous reviewers for their efforts in ensuring the quality of the papers. We hope that you enjoy reading this special issue.