
1 Introduction

Over the last decade, many efforts have been made to use robots in place of humans in a wide range of complex and dangerous activities, such as search and rescue operations [1, 2], land monitoring [3,4,5], and military missions [6, 7]. In particular, the use of small-scale robots has been progressively promoted thanks to the following aspects: cost reduction, risk reduction for humans, and the reduction of failures caused by human factors (e.g., carelessness, inaccuracy, tiredness). Significant examples are the defusing of Improvised Explosive Devices (IEDs) placed on the ground and the constant monitoring of wide areas at low altitudes. In these contexts, rovers and UAVs, respectively, equipped with a vision-based system can be efficiently adopted, also optimizing the missions in terms of performance, speed, and security. Other examples are tasks such as object recognition in outdoor environments and change detection in indoor environments, where small-scale robots support mobile video-surveillance systems that automatically detect novelties in the acquired video streams.

Often, especially in the military field, protecting the data acquired during the exploration of areas of interest from intruders is a crucial task. This is because the acquired data may contain sensitive information, such as faces of persons, images of restricted areas, or strategic targets. To prevent information from being leaked or stolen from images, this paper presents a client-server system that exploits visual cryptography to encrypt the acquired visual data. In detail, the first step consists in using visual cryptography to generate two shares from a target image. Following a public key cryptography approach, only one share, i.e., the private key, is stored on the server. Later, the small-scale robot used to explore the area of interest captures the scene with an RGB camera and hides the data contained in it by using the same visual cryptography algorithm. Finally, only one share, i.e., the public key, is sent to the server and made available for subsequent decryption processes.

This paper improves and expands the work presented in [8] by introducing the following extensions:

  • In addition to the small-scale rover, a small-scale UAV is also used to enrich the tests, thus confirming the validity of the previous results;

  • In addition to the single indoor environment, other challenging indoor environments (for the rover) and some challenging outdoor environments (for the UAV) are used to stress the capabilities of the proposed method;

  • In addition to the previously adopted target objects, areas of interest are also used to test the proposed visual cryptography algorithm. In particular, parts of the acquired images are used as invisible markers to identify the areas.

As for the system presented in [8], the small-scale rover exhaustively explores the area of interest by means of a Simultaneous Localization and Mapping (SLAM) algorithm, while the small-scale UAV is manually piloted. This choice is due to the fact that implementing a SLAM algorithm for a UAV introduces additional complex issues that are not the focus of the present paper. Notice that, as in [8], the SLAM algorithm for the small-scale rover is inherited from the method reported in [9].

The rest of the paper is structured as follows. In Sect. 2, a brief overview of visual cryptography is provided. In Sect. 3, the system architecture and the visual cryptography algorithm are presented. In Sect. 4, the experimental results are shown. Finally, Sect. 5 concludes the paper.

2 Visual Cryptography Overview

As is well known, the term visual cryptography refers to a family of techniques used to encrypt an image by splitting it into n images called shares. The shares do not reveal any information about the original image unless they are combined together. This means that if only one share is available, the source data is inaccessible. To the best of our knowledge, the only work in the literature that combines small-scale robots, visual cryptography, and a SLAM algorithm (at least for the small-scale rover) is the one presented in [8]. In particular, in that work, the authors proposed a client-server rover-based system to search for encrypted objects in unknown environments.

Fig. 1. System architecture. The server initialization stage shows the storing of the private key. The environment exploration stage shows the generation of the public key. Finally, the server stage shows the decryption of the data.

The visual cryptography technique was introduced by Naor and Shamir [10]. The method they proposed has never been heavily modified, but some improvements and variants can be found in the current literature [11]. The authors in [12], for example, proposed an improvement of the perfect black visual cryptography scheme, thus achieving real-time performance in the decryption step. Another example is reported in [13], where the authors proposed a verifiable multi-toned visual cryptography scheme to securely transmit confidential images (e.g., medical, forensic) over the web. Since, in visual cryptography, the image resulting from the algorithm is decrypted by human sight, techniques to enhance the quality of the decrypted images have also been proposed. In particular, the most popular technique is dithering (or halftoning) [14, 15], which creates halftone images. Some works that present a visual cryptography technique for halftone images are reported in [16,17,18]. In the state-of-the-art, works that combine several approaches to improve the ciphering performance [19, 20] or that integrate visual cryptography with traditional protection schemes to enhance them [21,22,23] can also be found.

Concerning environment exploration, the current literature is based on SLAM approaches [24,25,26]. The aim of these approaches is to maximize the area coverage during the exploration of the environment and, at the same time, to make the robot aware of its absolute position within it. SLAM approaches can be used with several sensors, such as depth/time-of-flight cameras [27, 28], thermal cameras [29], or a fusion of them [30, 31]. In addition, these approaches can be used for different tasks, such as mosaic generation [32], pipe rehabilitation [33], environment mapping [34], and others.

3 Architecture and Visual Cryptography Algorithm

In this section, the system architecture and the visual cryptography algorithm are described. First, the client-server approach is explained; then, the steps required to encrypt and decrypt the images are reported.

3.1 System Architecture

In Fig. 1, the architecture of the proposed client-server system is shown. The small-scale rover and the small-scale UAV are considered as the client side, while a standard workstation is considered as the server side. Before starting the environment exploration, the server must be initialized. This is done by storing in it an encrypted target image T. This image can represent a specific object or a location of interest that needs to be hidden. By applying the visual cryptography algorithm to T, two shares, i.e., \(S_1\) and \(S_2\), are generated. The adopted public key cryptography approach is designed to store only one share, i.e., \(S_1\), on the server. The latter is defined as the private key of the target image, while the share \(S_2\) is discarded. The environment can be automatically explored by a small-scale rover driven by a SLAM algorithm, or manually explored by piloting a small-scale UAV. During the exploration, the robot acquires the scene with a standard RGB camera. The visual cryptography algorithm is applied to the acquired images to generate, once again, two shares, i.e., \(S_3\) and \(S_4\). While \(S_3\) is discarded, \(S_4\) is sent to the server and defined as the public key. The latter is used in conjunction with \(S_1\) to decrypt the target image T.

The advantage of using shares instead of clear images is that, even if an intruder carries out a physical attack (e.g., a client or the server is stolen) or a digital attack (e.g., the video stream is sniffed), the original information cannot be recovered. Moreover, the encryption of objects or locations of interest can be used to generate invisible markers. This is due to the fact that a target image can be decrypted only by using the correct shares. This means that, when an image is decrypted, the robot is in a specific position that represents the target spot within the environment.
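To make the flow of Fig. 1 concrete, the following Python sketch outlines the share handling under stated assumptions: generate_shares is only a toy stand-in for the algorithm of Sect. 3.2, and the transport (pickled arrays over a TCP socket) and all function names are illustrative, not the actual implementation.

```python
import pickle
import socket

import numpy as np


def generate_shares(image: np.ndarray):
    """Toy stand-in for the visual cryptography step (see Sect. 3.2)."""
    rng = np.random.default_rng()
    share_a = rng.integers(0, 2, size=image.shape, dtype=np.uint8)
    share_b = share_a ^ (image > 127).astype(np.uint8)  # simple masking, not the real scheme
    return share_a, share_b


def server_initialization(target_image: np.ndarray) -> np.ndarray:
    """Encrypt the target image T; keep only S1 (private key), discard S2."""
    s1, _s2 = generate_shares(target_image)
    return s1  # stored on the server


def client_exploration_step(frame: np.ndarray, server_address) -> None:
    """Encrypt an acquired frame; discard S3 and send only S4 (public key)."""
    _s3, s4 = generate_shares(frame)
    with socket.create_connection(server_address) as conn:
        conn.sendall(pickle.dumps(s4))  # only a share travels over the network
```

In this reading, an intruder who sniffs the connection or seizes either machine obtains at most a single share, which by itself reveals nothing about T.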

3.2 Visual Cryptography Algorithm

In this section, the visual cryptography algorithm is explained. For the encryption and decryption stages, the approach reported in [18] is used. It consists of several steps to create the shares. The first is the application of a dithering algorithm to the original image I. Dithering is a form of noise intentionally applied to reduce the quantization error. As a result, I is converted into an approximate binary image, so that the encryption and decryption processes are easier and the decrypted image has a good quality. In the current literature, a wide range of dithering algorithms is available:

  • Average Dithering [35]: is one of the simplest techniques. It consists in calculating the middle tone of each area and assigning this value to that portion of the image;

  • Floyd-Steinberg [36]: is still the most widely used. It consists in diffusing the quantization error of each pixel to its neighbouring pixels;

  • Average Ordered Dithering [37]: is similar to average dithering, but it generates cross-hatch patterns;

  • Halftone Dithering [36]: looks similar to newspaper halftone and produces clusters of pixel areas;

  • Jarvis Dithering [38]: is similar to Floyd-Steinberg, but it distributes the quantization error farther, increasing the computational cost and time.

Due to its ease of implementation and the good quality of its results, the Floyd-Steinberg algorithm was chosen as the dithering algorithm in the present approach. This algorithm diffuses the quantization error to the neighbouring pixels as follows:

$$\begin{aligned} \begin{bmatrix} 0&0&0 \\ 0&p&\frac{7}{16} \\ \frac{3}{16}&\frac{5}{16}&\frac{1}{16} \end{bmatrix} \end{aligned}$$
(1)

where p is the current pixel being examined during the execution of the algorithm. Considering that I is scanned from left to right and from top to bottom, each pixel is quantized only once. Since the proposed system uses colour images, the chosen dithering algorithm is applied to each channel of I, thus obtaining three dithered images. In Fig. 2, the result of this step is shown.
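As an illustration of Eq. (1), a minimal single-channel Floyd-Steinberg implementation is sketched below; the 8-bit range and the function name are assumptions, and the same routine is applied independently to each channel of I to obtain the three dithered images of Fig. 2.

```python
import numpy as np


def floyd_steinberg(channel: np.ndarray) -> np.ndarray:
    """Binarize one 8-bit channel, diffusing the quantization error with
    the 7/16, 3/16, 5/16, and 1/16 weights of Eq. (1)."""
    img = channel.astype(np.float32).copy()
    h, w = img.shape
    for y in range(h):          # top to bottom
        for x in range(w):      # left to right: each pixel is quantized once
            old = img[y, x]
            new = 255.0 if old >= 128 else 0.0
            img[y, x] = new
            err = old - new
            if x + 1 < w:
                img[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1, x - 1] += err * 3 / 16
                img[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1, x + 1] += err * 1 / 16
    return (img >= 128).astype(np.uint8)  # 1 = dot, 0 = no dot
```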

Fig. 2. Dithered images for channels: (a) cyan, (b) magenta, and (c) yellow. (Color figure online)

Fig. 3. Sharing and stacking combinations in grayscale images. In both pictures, (a) and (b), the first column is the original pixel (i.e., black and white, respectively), the second and third columns are share 1 and share 2, respectively. Finally, the last column is the stacked shares.

After the generation of the dithered images, the shares can be created. Since the latter are generated starting from grayscale images, the share generation algorithm can be defined as follows (a minimal sketch is given after the list):

  1. I is transformed into a black and white halftone image H;

  2. For each pixel in the halftone image, a random combination is chosen among those depicted in Fig. 3;

  3. Repeat step 2 until every pixel in H is decomposed.
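A minimal sketch of these three steps is shown below. The six \(2\times 2\) patterns follow the classical (2, 2) scheme of Naor and Shamir [10]: identical patterns are assigned to white pixels and complementary patterns to black pixels, which is our reading of the combinations depicted in Fig. 3.

```python
import numpy as np

# The six 2x2 patterns of the (2, 2) scheme (1 = black subpixel).
PATTERNS = [np.array(p, dtype=np.uint8) for p in (
    [[1, 0], [0, 1]], [[0, 1], [1, 0]],
    [[1, 1], [0, 0]], [[0, 0], [1, 1]],
    [[1, 0], [1, 0]], [[0, 1], [0, 1]],
)]


def grayscale_shares(halftone: np.ndarray, rng=None):
    """Decompose a binary halftone image H (1 = black) into two shares,
    each twice as large as H in both dimensions."""
    rng = rng or np.random.default_rng()
    h, w = halftone.shape
    s1 = np.zeros((2 * h, 2 * w), dtype=np.uint8)
    s2 = np.zeros_like(s1)
    for y in range(h):
        for x in range(w):
            pat = PATTERNS[rng.integers(len(PATTERNS))]
            s1[2 * y:2 * y + 2, 2 * x:2 * x + 2] = pat
            # White pixel: same pattern (the stack stays half black);
            # black pixel: complementary pattern (the stack becomes fully black).
            s2[2 * y:2 * y + 2, 2 * x:2 * x + 2] = 1 - pat if halftone[y, x] else pat
    return s1, s2


def stack_shares(s1: np.ndarray, s2: np.ndarray) -> np.ndarray:
    """Decryption: overlapping the two shares (a black subpixel wins)."""
    return np.maximum(s1, s2)
```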

Algorithm 1. Pseudo-code of the grayscale share generation.

Algorithm 2. Pseudo-code of the colour share generation.

The pseudo-code is reported in Algorithm 1. To generate the shares for colour images, the third method presented in [18] was used. The choice fell on this method since it requires only two shares to encrypt/decrypt a colour image and, in addition, it does not sacrifice too much contrast in the resulting image. The method works as follows. First, a dithered image is created for each channel of I. Assuming the CMY colour model, we obtain a dithered image for the Cyan (C), Magenta (M), and Yellow (Y) channels. Subsequently, Algorithm 1 is applied to each halftone image, thus generating six sharing images, called C1, C2, M1, M2, Y1, and Y2, composed of \(2\times 2\) blocks. Each block is composed of two white pixels and two colour pixels. To generate coloured share 1, C1, M1, and Y1 are combined together, while to generate coloured share 2, C2, M2, and Y2 are combined. The colour intensity of the share blocks is a value between 0 and 1, where 0 means the absence of that colour and 1 means full intensity. So, for a pixel \(p_{i,j}\), the colour intensity over the three channels is defined as \((I_C, I_M, I_Y)\). Each block generated with this method has a colour intensity of \((\frac{1}{2}, \frac{1}{2}, \frac{1}{2})\), while after stacking share 1 and share 2, the colour intensity ranges between \((\frac{1}{2}, \frac{1}{2}, \frac{1}{2})\) and (1, 1, 1). As for the grayscale algorithm, the decryption step simply consists in overlapping the two shares, thus obtaining the decrypted image \(I_{dec}\). In Algorithm 2, the pseudo-code of the method is shown, while in Fig. 4 a representation of the algorithm is depicted.
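Reusing the floyd_steinberg and grayscale_shares sketches above, the colour variant can be outlined as follows; the naive RGB-to-CMY conversion and the modelling of stacking as a per-channel maximum are our assumptions and only approximate the third method of [18].

```python
import numpy as np


def colour_shares(image_rgb: np.ndarray):
    """Dither each of the C, M, and Y channels, build one pair of binary
    shares per channel (C1/C2, M1/M2, Y1/Y2), and combine them into the
    two coloured shares S1 and S2."""
    cmy = 255 - image_rgb  # naive RGB -> CMY conversion (assumption)
    pairs = [grayscale_shares(floyd_steinberg(cmy[..., k])) for k in range(3)]
    s1 = np.stack([a for a, _ in pairs], axis=-1)  # C1, M1, Y1
    s2 = np.stack([b for _, b in pairs], axis=-1)  # C2, M2, Y2
    return s1, s2


def stack_colour_shares(s1: np.ndarray, s2: np.ndarray) -> np.ndarray:
    """Decryption by overlapping: the ink coverage of each channel is the
    maximum of the two shares, so every 2x2 block lies between
    (1/2, 1/2, 1/2) and (1, 1, 1) in colour intensity."""
    return np.maximum(s1, s2)
```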

Fig. 4. Decomposition and reconstruction of a colour pixel [8].

Since the images acquired by the robot may not be acquired at the same distance, position, and angle as I, and since the pixels composing the two shares must be almost perfectly aligned to perform the decryption, a morphing procedure is applied to \(S_4\). This allows optimizing, in some cases, the alignment of the two shares [39, 40]. Considering that a feature-based alignment (e.g., by using keypoints and homography) cannot be performed on the shares due to their random pixel arrangement, we have defined eight standard transformations to apply. In Fig. 5, these transformations are shown.
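The eight transformations themselves are those depicted in Fig. 5 and are not listed here; the sketch below therefore only illustrates the idea with hypothetical warps (small rotations, scalings, and shifts). One plausible strategy, assumed here, is to apply each warp to \(S_4\) before stacking and to keep the candidate yielding the lowest correctness value of Eq. (2) (sketched further below).

```python
import cv2
import numpy as np

# Hypothetical candidate warps (angle in degrees, scale, x/y shift in pixels);
# the actual eight transformations are those depicted in Fig. 5.
CANDIDATE_WARPS = [(0, 1.00, 0, 0), (2, 1.00, 0, 0), (-2, 1.00, 0, 0),
                   (0, 0.95, 0, 0), (0, 1.05, 0, 0),
                   (0, 1.00, 3, 0), (0, 1.00, -3, 0), (0, 1.00, 0, 3)]


def best_aligned_decryption(s1: np.ndarray, s4: np.ndarray):
    """Warp S4 with each candidate, stack it with S1, and keep the
    decryption with the lowest correctness value (see Eq. (2))."""
    h, w = s4.shape[:2]
    best, best_score = None, np.inf
    for angle, scale, dx, dy in CANDIDATE_WARPS:
        m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
        m[:, 2] += (dx, dy)
        warped = cv2.warpAffine(s4, m, (w, h))
        decrypted = np.maximum(s1, warped)   # overlapping the shares
        score = correctness(decrypted)       # see the sketch after Eq. (2)
        if score < best_score:
            best, best_score = decrypted, score
    return best, best_score
```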

After the generation of \(I_{dec}\), a check is performed to verify the validity of the decrypted image. In particular, the difference between the standard deviation of \(I_{dec}\) and that of its smoothed copy \(Smooth_{I_{dec}}\) is computed. The smoothing operation is performed by using a median filter with a kernel size of \(3 \times 3\). Formally:

$$\begin{aligned} Correctness = \sigma (I_{dec}) - \sigma (Smooth_{I_{dec}}) \end{aligned}$$
(2)

where \(\sigma (\cdot )\) denotes the standard deviation. If \(I_{dec}\) is a valid decrypted image, this difference is low (e.g., less than 10); otherwise, higher values are obtained.
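A possible implementation of this validity check, under the reading of Eq. (2) given above and with the \(3 \times 3\) median filter, is sketched below; the threshold of 10 follows the example value mentioned in the text.

```python
import numpy as np
from scipy.ndimage import median_filter


def correctness(i_dec: np.ndarray) -> float:
    """Difference between the standard deviation of the decrypted image
    and that of its median-smoothed copy (3x3 kernel), as in Eq. (2)."""
    smoothed = median_filter(i_dec, size=3)
    return float(i_dec.std()) - float(smoothed.std())


def is_valid_decryption(i_dec: np.ndarray, threshold: float = 10.0) -> bool:
    """Low correctness values indicate a valid decryption."""
    return correctness(i_dec) < threshold
```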

3.3 SLAM Algorithm

This section briefly describes the SLAM algorithm, previously presented in [9], used by the rover to exhaustively analyse an unknown environment. Notice that the rover is composed of two main components. The first is the base, which in turn comprises the tracks, while the second is the upper part, which contains the RGB sensor and the ultrasonic sensor. A servomotor is mounted between the base and the upper part, so that the two parts can be moved independently of each other.

The algorithm considers the environment to be explored as a two-dimensional Cartesian plane C(X, Y), composed of points of the form c(x, y) reachable by the rover. At each c, the rover examines the environment by rotating the base at \(0^\circ , 90^\circ , 180^\circ \), and \(270^\circ \) with respect to the x axis. Once the rover has assumed a position, it moves the upper part at \(60^\circ , 90^\circ \), and \(120^\circ \) with respect to its local coordinates, thus acquiring an image for each of these angles. If T is not found in these images, the rover checks with the proximity sensor the next point c to which it can move. Once the distance d has been obtained with the proximity sensor, the robot checks whether it is less than the distance \(d_{threshold}\), which represents the maximum distance within which the sensor detects an obstacle. The threshold depends on the size of the robot, the size of the object to be searched, and the resolution of the images acquired by the RGB sensor. If \(d<d_{threshold}\), there is an obstacle and the rover cannot move to that point. Otherwise, the rover moves to the new point \(c'\) and repeats the steps. A sketch of the complete exploration loop, including the tilt movement described below, is reported at the end of this section.

Fig. 5. Transformations applied to \(S_4\) for optimizing the overlap of the shares.

With respect to the work presented in [8], an improvement made to the SLAM algorithm is the addition of the tilt movement of the RGB sensor. This enhances the acquisition of flat objects, such as credit cards, books, and so on. The tilt movement is performed by moving the camera at \(30^\circ , 60^\circ \), and \(75^\circ \) with respect to the camera starting position.
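Putting the pieces together, the exploration loop can be summarised by the sketch below. The angles mirror the description above, while the rover object and its motion/sensing primitives (rotate_base, rotate_head, tilt_camera, capture, read_distance, move_forward) are hypothetical placeholders; the random choice of the next direction reflects the random nature of the exploration.

```python
import random

BASE_ANGLES = (0, 90, 180, 270)   # rotations of the base w.r.t. the x axis
HEAD_ANGLES = (60, 90, 120)       # pan of the upper part (local coordinates)
TILT_ANGLES = (30, 60, 75)        # tilt of the camera w.r.t. its starting position


def explore(rover, d_threshold, target_found, max_steps=1000):
    """Sketch of the exploration loop: scan from the current point c,
    then move to a reachable neighbouring point c' and repeat."""
    for _ in range(max_steps):
        # 1. Scan the surroundings of the current point c.
        for base in BASE_ANGLES:
            rover.rotate_base(base)
            for head in HEAD_ANGLES:
                rover.rotate_head(head)
                for tilt in TILT_ANGLES:
                    rover.tilt_camera(tilt)
                    if target_found(rover.capture()):
                        return True   # T decrypted: c is the target spot
        # 2. Pick the next point c': move only if no obstacle lies closer
        #    than d_threshold in the chosen direction.
        for base in random.sample(BASE_ANGLES, len(BASE_ANGLES)):
            rover.rotate_base(base)
            if rover.read_distance() >= d_threshold:
                rover.move_forward()   # reach the new point c'
                break
    return False
```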

Fig. 6. The used robots: (a) the small-scale rover, (b) the small-scale UAV.

4 Experimental Results

In this section, the experimental tests are presented. In detail, we first report the experiments performed with the small-scale rover (Fig. 6a), then those performed with the small-scale UAV (Fig. 6b). Considering that, currently, there are no datasets for this kind of system/method, our own acquisitions are used in all the experiments. The experiments were performed in controlled conditions, with negligible changes of illumination and without moving objects. The experiments with the small-scale rover were performed in three indoor environments, while the experiments with the small-scale UAV were performed in a wide outdoor environment. The communication between the robots and the server was established through a direct Wi-Fi connection to reduce the delay introduced by sending the network packets.

4.1 Experiments with the Small-Scale Rover

In Fig. 6a, the small-scale rover used in the three indoor environments is shown. It is composed of an Arduino UNO micro-controller, which handles both the servomotors and the ultrasonic sensor used by the SLAM algorithm, and a Raspberry Pi 2 Model B, which handles the camera/video stream and the communication with the server. Although the rover is able to perform all the required tasks (i.e., running the SLAM algorithm and sending the share to the server), its low computational power affects the time needed to explore the environment. A first set of experiments was performed by running the entire system on-board: after capturing a frame, the Raspberry Pi 2 executed the pipeline described in Sect. 3, taking about 20 s per frame to complete it. To overcome this problem, the client-server approach was adopted to improve the performance.

In Fig. 7, the indoor environments used are shown. In detail, we chose two challenging environments (Fig. 7a and b) and one easy environment (Fig. 7c) to exhaustively test both the SLAM algorithm and the visual cryptography pipeline. The challenging environments are rooms containing desks and seats, while the easy environment is a hallway. For each environment, the rover always starts the recognition from the same starting point. To stress the system, we used both clear objects (i.e., simply placed on the ground) and covered objects (i.e., placed underneath desks or seats). A total of 20 objects (listed in Table 1) with high variability in colour and size was used.

Fig. 7. Environments used for the indoor experiments: (a) and (b) challenging environments, (c) an easy environment. Each environment was chosen to test the system in different conditions. The dark blue circles are the objects placed in clear, while the dashed green circles represent the covered objects. (Color figure online)

Table 1. List of objects used during the experiments [8].

Concerning the first challenging environment (Fig. 7a), the confusion matrix is reported in Table 2. In this environment, we have noticed that a good correctness value lies in the range [4, 6], while a wrongly decrypted image has a correctness value in the range [19, 25]. As can be seen, the proposed system generally works well, but, due to their characteristics, the decryption fails for the credit card and the cup. In detail, the decryption fails because, even when the transformations shown in Fig. 5 are applied, they may not be sufficient to correctly align the shares. The sponge, differently from the experiments presented in [8], was correctly decrypted thanks to the tilt movement of the RGB camera implemented in this new version of the SLAM algorithm.

Table 2. Confusion matrix of the first indoor environment [8].
Table 3. Confusion matrix of the second indoor environment.
Table 4. Confusion matrix of the third indoor environment.

Regarding the second environment, the correctness values are reported in Table 3. These values slightly differ from the ones obtained in the first indoor environment due to the different illumination conditions. In this case, the values obtained for well decrypted images lie in the range [7, 9], while the values corresponding to badly decrypted images lie in the range [27, 31]. As for the first environment, the credit card is hard to decrypt due to its flat shape, which makes it difficult to correctly acquire the object and then to generate a good share.

Finally, in Table 4 the correctness values of the third indoor environment are shown. As for the previous environments, we obtain low values for correctly decrypted images and high values otherwise. In detail, correct values lie in the range [3, 5], while wrong values lie in the range [15, 18]. Since this is an easy environment, we used only a subset of the objects and, to make the experiments more challenging, we partially occluded the targets. Although the uncovered parts were correctly decrypted, we obtained high correctness values due to the occluded ones.

Regarding the execution time, the rover took about 20 to 50 min to explore each of the first and second environments. This is due to the random approach of the used SLAM algorithm and also to the fact that, for this kind of rover, exploring under the desks is difficult. Compared with the timing reported in [8], the lower bound has been improved by 10 min. This is due to the tilt movement implemented in the new version of the SLAM algorithm, which allows targets to be found more quickly. Concerning the third environment, the rover took about 10 to 25 min to explore it. This is mainly due to the fact that no obstacles, such as seats and desks, were present.

4.2 Experiments with the Small-Scale UAV

In Fig. 6b, the small-scale UAV used in a wide outdoor environment is shown. The rotors are handled by a Pixhawk 2, while the camera, the visual cryptography algorithm, and the Wi-Fi connection are managed, as for the rover, by the Raspberry Pi 2. The correctness values are reported in Table 5. In these experiments, we obtained the best results in terms of correctness. This is due to the fact that the pose of the object placed on the ground is the same as the pose of the object at the time of the server initialization. By using the UAV, also flat objects, such as credit cards, can be correctly decrypted (as depicted in Fig. 8). A condition that must be respected to achieve a correct decryption is that the drone must perform a stabilized flight. This means that the flight height must be the same during the target acquisition as during the server initialization. In fact, zoom in/out effects (i.e., changes in flight height) and pitch, roll, and yaw movements can influence the share alignment.

Fig. 8. The cup (a) and the credit card (b) viewed from the UAV.

Table 5. Confusion matrix of the outdoor environment.

5 Conclusions

In recent years, autonomous (or semi-autonomous) small-scale robots have been increasingly used to face dangerous activities, including civilian and military operations. Usually, these robots send the acquired data to a ground station to perform a wide range of processing. In some cases, there may be the need to protect the sent data from intruders. In this paper, a system to encrypt video streams acquired by small-scale robots engaged in exploration tasks is presented. In particular, the paper shows results for two small-scale robots: a rover and a UAV. While the latter is manually piloted during the mission, the former is equipped with a SLAM algorithm that allows it to explore the environment autonomously and to exhaustively search for different targets. The experimental tests were performed in indoor and outdoor environments, showing the effectiveness of the proposed method. Currently, in the literature, there are no other approaches comparable with the one we propose. For this reason, it can be considered as a baseline in the area of encrypted target search by small-scale robots.