The aim of this special issue is to provide an introduction to the burgeoning topic of big data networking. As digital devices of all kinds penetrate our lives, data are being generated at an explosively increasing rate, and it is widely agreed that the big data era is arriving. The significance and promising future of big data have made it a hot topic in the research community. The development of big data imposes a heavy burden on the underlying infrastructure, as it requires vast computation, storage and communication resources. In particular, as the plumbing of big data, networking plays a critical role because it deeply influences the performance of big data applications. The three Vs of big data (volume, variety and velocity) are all directly related to networks, including access, backbone and datacenter networks. In addition, big data applications raise concerns about security, privacy and trust at the networking layer.

Many new networking technologies, such as software-defined networking (SDN), network function virtualization (NFV) and cloud radio access network (CRAN), have recently emerged as promising solutions to the challenges brought by big data applications. These technologies substantially improve the elasticity, flexibility, customizability and controllability of networks and therefore naturally fit the requirements of big data. Yet how to exploit these technologies to address the challenges introduced by big data is still not well studied.

Hence, this special issue is devoted to addressing various open challenges related to big data networking. Following an open call for papers, we received 41 submissions and accepted 12 articles after a rigorous peer-review process, for an overall acceptance ratio of approximately 29.3%. The 12 articles cover various areas within this theme, such as architecture innovation, security and privacy, and algorithm design and evaluation. We briefly discuss the main contributions of these 12 papers as follows:

The first article, “Secure and private key management scheme in big data networking”, addresses security, privacy and trust at the networking layer for big data applications. Fan et al. propose a secure hierarchical key management scheme, in which the upper-layer keys encrypt the lower-layer keys, to protect users’ data and privacy.

The second article, “Distributed correlation model mining from remote sensing big data based on gene expression programming” by Yang et al., targets remote sensing data mining and proposes a distributed correlation model based on gene expression programming combined with cloud computing, using an abnormal value recognition algorithm and a global model generation algorithm. The proposed model fits cloud computing environments well, showing good speed-up as computing resources increase.

The third article, “Training deep neural network on multiple GPUs with a model averaging method” by Yao et al., observes that existing methods for exploiting multiple GPUs via data parallelism are not well suited to deep neural networks. In response, the authors propose a new framework to coordinate the GPUs on a common training task, showing a large speed-up ratio compared with data parallelism models.
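As a rough illustration of the general idea of model averaging (not necessarily the exact procedure of Yao et al.), the following minimal Python sketch assumes that each GPU trains its own replica and that the replicas are periodically combined by taking the element-wise mean of their parameters. The function name `average_replicas`, the flat-list parameter layout and the sample values are all hypothetical.

```python
from typing import List


def average_replicas(replicas: List[List[float]]) -> List[float]:
    """Return the element-wise mean of the parameter vectors of all replicas."""
    if not replicas:
        raise ValueError("at least one replica is required")
    size = len(replicas[0])
    if any(len(r) != size for r in replicas):
        raise ValueError("all replicas must hold the same number of parameters")
    count = len(replicas)
    # zip(*replicas) groups the i-th parameter of every replica together.
    return [sum(values) / count for values in zip(*replicas)]


# Hypothetical parameters held by three GPU replicas after local training.
gpu_params = [
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [2.0, 2.0, 2.0],
]
averaged = average_replicas(gpu_params)
```

In a real training loop, the averaged vector would be copied back to every replica before the next round of local updates.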

In “Data aggregation with end-to-end confidentiality and integrity for large-scale wireless sensor networks”, the authors present a secure, energy-saving data aggregation scheme that ensures the end-to-end confidentiality and integrity of data aggregation in large-scale wireless sensor networks. They implement the proposed method on sensor motes to verify its correctness and performance.

In “Big program code dissemination scheme for emergency software-define wireless sensor networks”, the authors propose a fast dissemination scheme, based on an adjustable duty cycle, for big program code in software-defined wireless sensor networks. The scheme adjusts the duty cycle of the sensor nodes that receive program code so as to reduce the dissemination delay.

Chen et al., in their article “Migrating big video data to cloud: a peer-assisted approach for VoD”, suggest migrating both client/server-based and peer-assisted Video-on-Demand (VoD) services to the hybrid cloud and edge peers in a fog computing environment, and propose three migration strategies (active, reactive and smart).

In the next article, “Distribution-aware cache replication for cooperative road side units in VANETs”, Chen et al. investigate the cache replication strategy for distributed roadside units (RSUs) in vehicular ad hoc networks and find that naïve replication of the most popular content may not always be the best solution. They propose a distribution-aware cooperative replication strategy that adapts well to various scenarios with diverse request demands.

Wang et al., in their article “Energy balancing RPL protocol with multipath for wireless sensor networks”, propose a life cycle index that combines various factors (e.g., node energy, node hops, throughput and packet loss) as the objective function for path selection, optimizing the routing protocol for low-power and lossy networks (RPL) to achieve better network performance.

Zhang et al., in their article “Optimizing power consumption of mobile devices for video streaming over 4G LTE networks”, show the importance of saving energy in the network component by exploiting the transmission patterns of 4G LTE networks. They develop a self-adaptive method that allows flexible parameter tuning to minimize the energy consumed by video streaming on mobile devices.

In the article “Self-adaptive bat algorithm for large scale cloud manufacturing service composition”, Xu et al. note that selecting the appropriate services to complete a manufacturing task is important in the cloud manufacturing mode. To address the manufacturing service composition (MSC) problem, which has been proven NP-hard, they propose a self-adaptive bat algorithm and evaluate it through extensive statistical analysis.

In the article “FTLLS: A fault tolerant, low latency, distributed scheduling approach based on sparrow”, Li et al. target the fault-tolerance limitation of Sparrow, a leading distributed task scheduler, which stems from its adoption of sample-based techniques. To address this problem, they present Fault Tolerant, Low Latency Sparrow (FTLLS), which extends Sparrow with an assistant machine to handle worker failures and make better scheduling decisions.
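For readers unfamiliar with sample-based scheduling, the general idea can be sketched as follows: the scheduler probes a small random subset of workers and places the task on the least loaded one. This is a simplified Python illustration under stated assumptions, not the implementation of Sparrow or FTLLS; the function `pick_worker`, the `queue_lengths` mapping and the worker names are hypothetical.

```python
import random
from typing import Dict, Optional


def pick_worker(queue_lengths: Dict[str, int], probes: int = 2,
                rng: Optional[random.Random] = None) -> str:
    """Probe `probes` randomly chosen workers and return the least loaded one."""
    if not queue_lengths:
        raise ValueError("no workers available")
    rng = rng or random.Random()
    # Sample without replacement; never probe more workers than exist.
    k = min(probes, len(queue_lengths))
    candidates = rng.sample(sorted(queue_lengths), k)
    return min(candidates, key=lambda worker: queue_lengths[worker])


# Hypothetical queue lengths reported by three workers.
queues = {"w1": 5, "w2": 0, "w3": 3}
chosen = pick_worker(queues, probes=2, rng=random.Random(0))
```

Because only the sampled workers are consulted, the scheduler holds no global view of worker state, which is one plausible reason why worker failures are hard to handle in a purely sample-based design.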

In the article “Key frame extraction scheme based on sliding window and features”, Yu et al. investigate how to quickly examine big video data to obtain the main contents. They propose a key frame extraction algorithm and evaluate it from both subjective and objective perspectives.

These 12 contributions encompass a wide range of state-of-the-art research results in big data networking and should therefore appeal both to experts in the field and to those who want an overview of the current breadth of big data networking research.

Xiaofei Liao, Song Guo, Deze Zeng, Kun Wang

Jan. 4th, 2018