1 Introduction

Training is the process by which trainees learn and practice in order to gain the knowledge and skills required for their jobs. Presentation methods simply deliver information to the trainee. Simulation methods let the trainee learn and practice in a virtual environment, so the trainee can feel as if operating in the real environment. In On-the-Job Training, the trainer instructs the trainee directly, and the trainee learns and practices at the actual workplace; this method is generally the most efficient [1,2,3].

Several research works have presented systems that improve the training process by avoiding risk, reducing the cost of training, preparing the trainee for a specific task, assisting learning in difficult tasks, and so on. Table 1 compares how these works implement the training methods to facilitate and improve the trainees' learning.

Table 1. Comparison of the training methods

Virtual Reality (VR) technology has spread widely across various tasks. For training tasks, virtual reality, and especially augmented reality, allows developers to create virtual systems that assist the user in many roles, such as a virtual teacher, a virtual environment, interactive information, and simulation. A teaching algorithm can also be integrated into the system as a virtual trainer that responds to the user according to the environment and situation [4, 5]. Some tasks are inconvenient for the trainee to practice at the actual workplace because of risk, cost, restrictions, and time limits. Training systems therefore mostly aim to create simulations that provide an interactive training environment [6,7,8,9,10,11,12]. Moreover, virtual training allows the trainee to practice the same situation many times for a specific scenario, and the many case studies available through virtual training can help trainees find suitable solutions for their assigned tasks [4, 8, 13].

The above systems still have some problems related to the number of trainees supported at a time, unrealistic virtual training, distortion of the video camera, the need for markers, difficulty of interaction, and a limited point of view. The proposed system was designed to solve these problems by integrating a networking system for multiple users and by providing holograms for natural interaction with the real environment at the actual workplace.

2 System Overview

In general, most operational manuals are written on paper, which is inconvenient [14]. Previous research suggested that a Mobile Augmented Reality System (MARS) enhances the capability of providing graphical information to the user autonomously at any time and in any place [15]. MARS is therefore well suited to the On-the-Job Training method, which needs direct instructions to perform the job in real time. Moreover, MARS allows the user to interact naturally, but it has to understand the details of the environment at the workplace. A current hardware device that meets these requirements is the Microsoft HoloLens.

During the proposed On-the-Job Training, the trainer and the trainee work at the same place and share the job's information in the form of graphics at the same time, so a networking system is a good choice for sharing that kind of information. In addition, the Universal Robot 5 (UR5) was chosen as the manipulator in the proposed system because it is easy to interface with by sending URScript commands over an Ethernet connection.
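For illustration, the following minimal C# sketch shows how such a URScript command could be sent; the IP address, joint angles, and motion parameters are placeholders, and TCP port 30002 is the UR controller's standard secondary interface. This is a sketch of the interfacing idea, not the system's actual code.

    using System.Net.Sockets;
    using System.Text;

    // Minimal sketch: send one URScript command to a UR5 controller.
    // The IP address and joint angles below are placeholders.
    class URScriptSender
    {
        static void Main()
        {
            using (var client = new TcpClient("192.168.1.10", 30002)) // secondary interface
            using (var stream = client.GetStream())
            {
                // movej moves the arm to the target joint angles (radians)
                // under the given acceleration and velocity limits.
                string script = "movej([0.0, -1.5708, 1.5708, -1.5708, -1.5708, 0.0], a=1.2, v=0.25)\n";
                byte[] data = Encoding.ASCII.GetBytes(script);
                stream.Write(data, 0, data.Length);
            }
        }
    }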

The main software platform chosen to develop the proposed system is Unity, a game engine for building 3D applications that supports various platforms, including the Microsoft HoloLens [16]. In addition, Microsoft provides the MixedRealityToolkit [17], which fully supports development with Unity.

The Microsoft HoloLens was the primary hardware used to interact with the users. A desktop computer running the system core managed all user connections and handled the augmented environment. The system core maintained the augmented environment, which contained many holograms, such as the virtual robot, script generator, script cubes, script header, recycle bin, and general-purpose switch. The holograms were mapped onto the real environment according to the spatial mapping and were manipulated by the users' gestures [18, 19]. All users exchanged their information through the system core, and all users interacted with the same augmented environment at the same actual workplace.

During interaction with the augmented environment, all holograms were managed by the system core, so each user had to send information consisting of the user's position, gaze direction, and gestures to the system core. The processed hologram information was then broadcast by the system core to all users, so all users were able to visualize the same holograms at the same time. Inside the system core, the users' information was utilized by the system manager to synchronize the interactions between users and holograms, as shown in Fig. 1.

Fig. 1. System overview
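The wire format of this exchange is not fixed here; as an illustrative assumption only, the per-update user state could be packed into a small serializable structure like the following, which the system core would receive from each HoloLens:

    using UnityEngine;

    // Illustrative assumption only: the state each user sends to the system
    // core. Field names and the gesture enum are not the actual protocol.
    public enum GestureState { None, AirTap, TapAndHold }

    [System.Serializable]
    public struct UserState
    {
        public int userId;            // identifies the sending HoloLens
        public Vector3 headPosition;  // head position in the shared coordinate frame
        public Vector3 gazeDirection; // normalized gaze ray direction
        public GestureState gesture;  // gesture currently recognized, if any
    }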

3 Holograms

In the proposed system, the holograms were computer graphics displayed as 3D objects that responded to gestures and to real-world surfaces [20]. Moreover, they had the potential to mimic the behavior of real objects, including the industrial robot, in the augmented environment.

3.1 Interaction

In the augmented environment, the holograms were manipulated with the Gaze, Air Tap, and Tap and Hold gestures. A hologram responded to Air Tap as clicking or selecting, like a mouse click, and to Tap and Hold as selecting and dragging, like a mouse click and drag. Both gestures were activated after targeting the hologram with the Gaze gesture.
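With the HoloToolkit input module of that period, a hologram could receive these gestures roughly as in the following sketch; the interface names follow HoloToolkit conventions and are an assumption about the implementation, not code from the paper:

    using UnityEngine;
    using HoloToolkit.Unity.InputModule;

    // Sketch: a hologram reacting to Air Tap (click) and Tap and Hold (drag).
    public class InteractableHologram : MonoBehaviour, IInputClickHandler, IManipulationHandler
    {
        private Vector3 startPosition;

        // Air Tap while gazed at: behaves like a mouse click.
        public void OnInputClicked(InputClickedEventData eventData)
        {
            Debug.Log(name + " selected by Air Tap");
        }

        // Tap and Hold while gazed at: drag the hologram with the hand.
        public void OnManipulationStarted(ManipulationEventData eventData)
        {
            startPosition = transform.position;
        }

        public void OnManipulationUpdated(ManipulationEventData eventData)
        {
            // CumulativeDelta is the total hand movement since the hold began.
            transform.position = startPosition + eventData.CumulativeDelta;
        }

        public void OnManipulationCompleted(ManipulationEventData eventData) { }
        public void OnManipulationCanceled(ManipulationEventData eventData) { }
    }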

3.2 Virtual Robot

The virtual robot was displayed as a hologram that mimicked the behavior of the Universal Robot 5, including its joint configuration, movement, size, and shape [21]. Multiple users could interact with each part of the virtual robot at the same time through gesture recognition. The virtual robot was constructed from several parts: base, shoulder, elbow, wrist 1, wrist 2, and wrist 3. Each part was filled with a different color and labeled to help the users identify it easily. Moreover, the parts were connected by configurable joint components, which allowed only one joint to be moved at a time, and each part was given a rigidbody component for physics simulation [22, 23]. Hence, the virtual robot performed similarly to the real robot, as shown in Fig. 2.

Fig. 2. Virtual robot superimposed on the right-sided real robot
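The joint arrangement described above can be sketched in Unity C# as follows; the helper is an illustration of the rigidbody-plus-configurable-joint idea under the stated single-axis assumption, with illustrative names:

    using UnityEngine;

    // Sketch: connect two adjacent robot links with one rotational degree
    // of freedom, as on a revolute joint of the UR5.
    public static class RobotJointSetup
    {
        public static void Connect(GameObject child, GameObject parent)
        {
            Rigidbody parentBody = parent.GetComponent<Rigidbody>();
            Rigidbody childBody = child.AddComponent<Rigidbody>();
            childBody.useGravity = false; // holograms stay where they are placed

            ConfigurableJoint joint = child.AddComponent<ConfigurableJoint>();
            joint.connectedBody = parentBody;

            // Lock all linear motion and two angular axes, leaving a single
            // rotational degree of freedom about the joint's x axis.
            joint.xMotion = ConfigurableJointMotion.Locked;
            joint.yMotion = ConfigurableJointMotion.Locked;
            joint.zMotion = ConfigurableJointMotion.Locked;
            joint.angularXMotion = ConfigurableJointMotion.Free;
            joint.angularYMotion = ConfigurableJointMotion.Locked;
            joint.angularZMotion = ConfigurableJointMotion.Locked;
        }
    }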

3.3 Script Generator

The script generator consisted of two parts: a base and a gate. The base was a small gray cylinder with a radius of 0.15 m onto which a generated script cube was snapped. The gate was a blue ring spinning around itself about 0.5 m above the base. The main role of the script generator was to spawn a script cube when the user finished manipulating the virtual robot. During spawning, the gate played an animation by moving down to the base and then back to its previous position. Finally, a script cube was generated by the script generator and pushed onto the base, as shown in Fig. 3.

Fig. 3. Script generator

During spawning, if the previous script cube was still on the base, it was moved to the recycle bin automatically.
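The spawning behavior can be sketched as a Unity coroutine; the speed and distance values are illustrative, and the prefab and base references are assumptions:

    using System.Collections;
    using UnityEngine;

    // Sketch of the gate's spawn animation: move down to the base, move
    // back up, then generate the script cube on the base.
    public class ScriptGeneratorGate : MonoBehaviour
    {
        public Transform basePoint;           // center of the gray cylinder base
        public GameObject scriptCubePrefab;

        public IEnumerator Spawn()
        {
            Vector3 restPosition = transform.position; // about 0.5 m above the base
            yield return MoveTo(basePoint.position);
            yield return MoveTo(restPosition);
            Instantiate(scriptCubePrefab, basePoint.position, Quaternion.identity);
        }

        IEnumerator MoveTo(Vector3 target)
        {
            while (Vector3.Distance(transform.position, target) > 0.01f)
            {
                transform.position = Vector3.MoveTowards(transform.position, target, 0.5f * Time.deltaTime);
                yield return null;
            }
        }
    }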

3.4 Script Cube

The script cube was displayed as a virtual cube with a size of 0.3 cubic meters; it carried the virtual robot's information, namely the joint rotations and the gripper's action (active, inactive), captured at the moment of spawning. Each script cube was given a random color and labeled with a number according to its spawning order. The label was adjusted according to the user's position, so the user could read it from every point of view, as shown in Fig. 4(a). Moreover, a chain of script cubes was formed by connecting several script cubes; such a connection occurred when script cubes were placed near each other. A chain connected to the script header displayed a line with a color gradient from yellow to red, as shown in Fig. 4. Furthermore, the users could rearrange the order of the script cubes in a chain. The Air Tap gesture activated a script cube to execute its virtual robot operations, such as rotating and picking.

Fig. 4. (a) A script cube; (b) a chain of script cubes (Color figure online)
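As an illustrative assumption of the recorded state, a script cube could carry the joint angles and gripper action in a component like the one below, which also keeps its number label facing the user:

    using UnityEngine;

    // Illustrative sketch of the data a script cube records when spawned.
    public class ScriptCube : MonoBehaviour
    {
        public float[] jointAngles = new float[6]; // base..wrist 3
        public bool gripperActive;                 // true = active, false = inactive
        public Transform label;                    // child text showing the spawn order

        void Update()
        {
            // Turn the label toward the user so the number is readable from
            // every point of view (Camera.main is the HoloLens viewpoint).
            if (label != null && Camera.main != null)
                label.LookAt(Camera.main.transform);
        }
    }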

3.5 Script Header

The script header was the action executor for a chain of script cubes. The users could simulate the virtual robot operation by performing the Air Tap gesture on a small green "PLAY" hologram above the script header. The virtual robot then performed the operations according to the chain of script cubes. The sequence of operations began with the script cube connected next to the script header and ran until the end of the chain, as shown in Fig. 5.

Fig. 5. Script header (Color figure online)
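The execution of a chain can be sketched as a coroutine that steps through the ordered cubes; the VirtualRobot component and its MoveToConfiguration method are hypothetical stand-ins, and ScriptCube refers to the sketch in Sect. 3.4:

    using System.Collections;
    using System.Collections.Generic;
    using UnityEngine;

    // Hypothetical stand-in for the virtual robot controller.
    public class VirtualRobot : MonoBehaviour
    {
        public IEnumerator MoveToConfiguration(float[] jointAngles, bool gripperActive)
        {
            // The real system would drive the configurable joints here;
            // this stub only represents the motion time.
            yield return new WaitForSeconds(1.0f);
        }
    }

    // Sketch of the script header's "PLAY" behavior.
    public class ScriptHeader : MonoBehaviour
    {
        public List<ScriptCube> chain;  // ordered from the header outward
        public VirtualRobot robot;

        public void OnPlayTapped()      // called when "PLAY" receives an Air Tap
        {
            StartCoroutine(ExecuteChain());
        }

        IEnumerator ExecuteChain()
        {
            foreach (ScriptCube cube in chain)
                yield return robot.MoveToConfiguration(cube.jointAngles, cube.gripperActive);
        }
    }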

3.6 Recycle Bin

The recycle bin was used to destroy script cubes. During training, unnecessary script cubes could accumulate; moving them into the recycle bin cleaned up the augmented environment, as shown in Fig. 6.

Fig. 6. Recycle bin

3.7 General-Purpose Switch

The proposed system provided several functions that helped the users clear many script cubes at once, show the tooltips, and execute a special script cube that drives the robot to the home position. To access those functions, the Air Tap gesture was used to switch the general-purpose switch, composed of a label and a controlling handle, on and off, as shown in Fig. 7.

Fig. 7. Holograms of tooltip, clear script, and robot home

4 Coordinate Synchronization

The Unity engine supports a three-dimensional coordinate system: an object in Unity is described by its (x, y, z) position and its rotation about the three axes. The HoloLens tracks its own position and rotation relative to the real world, so Unity can read the HoloLens pose to update the position and rotation of the user's head in the Unity coordinate system, i.e., the augmented-environment coordinate system.
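Concretely, in a HoloLens Unity application the main camera follows the device, so the head pose can be read directly; this snippet is standard Unity usage rather than code from the paper:

    using UnityEngine;

    // The main camera tracks the HoloLens, so its transform is the user's
    // head pose in the augmented-environment coordinates.
    public class HeadPoseReader : MonoBehaviour
    {
        void Update()
        {
            Vector3 headPosition = Camera.main.transform.position;
            Quaternion headRotation = Camera.main.transform.rotation;
            // These values are part of what each user sends to the system core.
        }
    }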

In the proposed system, multiple users were able to join the same augmented environment and workplace, so each HoloLens had to synchronize its coordinate system with the others'. The augmented environment had its own coordinate system, with its origin at (0, 0, 0). A simple solution for synchronizing multiple HoloLenses was to place all of them at the same position and rotation when starting the proposed system, as shown in Fig. 8. All HoloLens coordinate systems were thereby mapped together, and all users were able to interact with the same augmented environment at the same workplace.

Fig. 8. Starting position

5 Experimental Results

To evaluate the proposed system's performance and usability, several experiments were conducted. System performance, such as the frame rate, was tested by the researcher. To assess usability, users were asked to participate in the proposed system as trainees. First, the researcher, acting as the trainer, gave the users a brief overview of the proposed system. The users were then trained by the trainer, using Microsoft's Holograms application, to operate the HoloLens with its recognized gestures, Air Tap and Tap and Hold, before starting the experiment. Next, the experiments were conducted as described in the following topics. Finally, the users gave their feedback via the USE Questionnaire [24].

5.1 Experimental Setup

The experiments took place in the Human-Computer Interface (HCI) Lab, and all participants, trainer and trainees, were students at the Institute of FIeld roBOtics, King Mongkut's University of Technology Thonburi. The devices consisted of two HoloLenses and a PC server connected to a dual-band Wi-Fi router. Each HoloLens used a 5.0 GHz wireless connection at 120 Mbps, and the server used an Ethernet connection at 100 Mbps. This configuration was used for the experiments described in the following topics.

5.2 System Performance

Frame Rate

To display the holograms convincingly, the system had to make the user feel that the holograms were in the real world, which requires fast graphics rendering. Normally, the rendering rate that gives the best user experience should be at least 60 fps [25]. In the proposed system, the script cubes spawned by the script generator consumed system resources and reduced rendering performance. Hence, one experiment measured the rendering frame rate as a function of the number of script cubes. The frame rate was captured with the Windows Device Portal over Wi-Fi [26], covering 5 s before and after the spawning, as shown in Fig. 9. The system spawned a script cube every 0.25 s at a position randomized within a radius of 1 m around the script header, and spawning continued until the number of script cubes reached 30. The expected minimum frame rate was 24 fps [27].

Fig. 9. Frame rates during spawning the script cubes
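The spawning procedure of this test can be sketched as follows; the prefab and header references are illustrative, while the 0.25 s interval, 1 m radius, and limit of 30 cubes follow the description above:

    using System.Collections;
    using UnityEngine;

    // Sketch of the frame-rate stress test: spawn one script cube every
    // 0.25 s at a random position within 1 m of the script header.
    public class SpawnStressTest : MonoBehaviour
    {
        public GameObject scriptCubePrefab;
        public Transform scriptHeader;

        IEnumerator Start()
        {
            for (int i = 0; i < 30; i++)
            {
                Vector3 offset = Random.insideUnitSphere * 1.0f; // radius of 1 m
                Instantiate(scriptCubePrefab, scriptHeader.position + offset, Quaternion.identity);
                yield return new WaitForSeconds(0.25f);
            }
        }
    }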

5.3 Usability

Gesture

In the proposed system, the Tap and Hold gesture was the one most used, so one experiment evaluated the capability of interacting with the holograms via Tap and Hold: the number of interactions performed by the user was compared with the number of interactions detected by the system. Nine holograms were spawned in 3 rows by 3 columns, and a spawned hologram disappeared when it was moved far from its original position. The users were asked to move the spawned holograms until they disappeared, while the researcher counted the Tap and Hold gestures they performed. The number of Tap and Hold gestures performed by the users was expected to be 9, matching the number of spawned holograms (Table 2).

Table 2. Comparison of the Tap and Hold gestures counted by the researcher and by the system
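The test layout can be sketched as a simple spawner; the prefab and 0.3 m spacing are illustrative values, while the 3-by-3 arrangement follows the description above:

    using UnityEngine;

    // Sketch of the gesture-test setup: nine holograms in a 3 x 3 grid.
    public class GestureTestGrid : MonoBehaviour
    {
        public GameObject hologramPrefab;
        public float spacing = 0.3f; // illustrative spacing between holograms

        void Start()
        {
            for (int row = 0; row < 3; row++)
                for (int col = 0; col < 3; col++)
                    Instantiate(hologramPrefab,
                        transform.position + new Vector3(col * spacing, row * spacing, 0f),
                        Quaternion.identity);
        }
    }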

Basic Virtual Robot Operation

The virtual robot operation was the core of the proposed system. One experiment evaluated the ease of use, ease of learning, and satisfaction of the virtual robot operation. First, the trainer gave a brief overview of the proposed system. Second, the trainee was trained by the trainer to operate the virtual robot. Then, the trainer assigned the trainee a specific task: operate the virtual robot to pick a virtual object and move it from point A to point B, as shown in Fig. 10. Finally, the users gave their feedback via the USE Questionnaire (Tables 3, 4 and 5).

Fig. 10. Trainer's view during the virtual robot operation

Table 3. Result of the ease of use questionnaire (−3 = Strongly disagree, 3 = Strongly agree)
Table 4. Result of the ease of learning questionnaire (−3 = Strongly disagree, 3 = Strongly agree)
Table 5. Result of the satisfaction questionnaire (−3 = Strongly disagree, 3 = Strongly agree)

6 Conclusions and Future Work

This research proposed the utilization of augmented reality for a robot training system. Several holograms were implemented to help multiple users carry out On-the-Job Training on virtual robot operation. The Microsoft HoloLens was used to display the interactive holograms, which mimicked the behaviors of real objects, especially the UR5 robot. The system core, which covered the chain of script cubes, script generator, script header, recycle bin, general-purpose switch, and coordinate synchronization, was developed at the Human-Computer Interface (HCI) Lab, Institute of FIeld roBOtics, King Mongkut's University of Technology Thonburi. The system performance and usability were tested. The experiments showed that the frame rate decreased from 60 to 30 fps after spawning, because the script cubes began forming connections among themselves; even so, with 30 script cubes the frame rate remained above the expected minimum. The error of the Tap and Hold gesture was about 22.22%, which was acceptable for operating the proposed system. Most of the users joining the experiments had no experience with the HoloLens or with robot operation. The results of the virtual robot operation showed that the users strongly agreed that the proposed system was user friendly, easy to learn to use, and fun to use, because its holograms provided natural interaction. The users only slightly agreed that the system could be used without written instructions, could be learned quickly, and was satisfying, because most of them had no experience with holograms or robot operation; they also suggested that some parts of the user interface needed improvement.

In future work, some parts of the user interface will be improved according to the users' feedback. In addition, URScript could be integrated into the proposed system to control the real robot through a socket connection and augmented reality, so that the real robot operates according to the virtual robot operation. Furthermore, the spatial mapping needs to be investigated to improve the coordinate synchronization of the proposed system.