1 Introduction

Augmented reality (AR) is a technology that provides users with supplementary information by seamlessly mixing virtual objects with the real world [1]. With AR, users can work with digital virtual elements and be guided toward needed actions. This information can be displayed on various devices such as mobile phones, PDAs, head-mounted displays (HMDs), and high-performance PCs. AR technology is applied in various fields such as interactive games, education, military training, galleries/exhibitions, and repair/maintenance [2].

For the past few decades, AR applications have mainly been developed for a single user with one-way interaction with 3D virtual objects [57]. Even though they give useful and interesting experiences to the user, they do not provide the experience of interacting and collaborating with other users. Recently, HMD-based remote collaboration systems have been developed that allow remote users to collaborate on a shared target task [3, 4]. Unlike existing remote collaboration systems [9, 10], these systems enable spatially unrestricted interactions and give a sense of co-presence to the local user. However, they provide only confined, simple interactions (e.g., flipping, grasping) by tracking a bare hand, and they require a manually defined working environment.

Meanwhile, many researchers have studied AR authoring systems that allow users to easily handle AR digital contents. For instance, AR authoring on mobile devices has been attempted in [11, 12]. [11] demonstrates interaction with AR contents using the multi-touch interface of a smart device, and [12] presents an AR authoring method for unknown outdoor scenes using mobile devices. However, because these systems do not generate a 3D map with depth sensors, they are unsuitable for registering virtual digital contents in indoor environments. On the other hand, Project Tango [13] is a mobile authoring device that builds a 3D map of an unknown indoor scene using a depth sensor. However, it is cumbersome to use: the user must hold the device at arm's length throughout the authoring process and view the augmented spot through a narrow mobile display.

In this paper, to address the aforementioned shortcomings, we present a novel HMD-based remote collaboration framework using interactive AR authoring and hand-augmented object interaction technologies. To develop the proposed framework, we integrate two main technologies: interactive AR authoring with a wearable smart device (e.g., a smartphone) for building a shared working space, and hand-augmented object interaction by tracking two bare hands in an egocentric camera view. Through the proposed system, the local user can easily author his/her own working space without any professional programming skills [8], and collaborate with remote users through intuitive interactions between the tracked hands and augmented objects. Through a preliminary prototype implementation, we confirm its feasibility as a future remote collaboration platform. We expect that the proposed system can be applied to many AR collaborative applications such as medical surgery education, urban planning, and games.

The remainder of this paper is organized as follows. The proposed framework is presented in Sect. 2. Section 3 introduces the initial implementation and preliminary results of the proposed framework. Lastly, conclusions and plans for future work are presented in Sect. 4.

2 Proposed Framework

2.1 Overall Framework

Figure 1 shows the overall system diagram of the proposed HMD-based remote collaboration. The system uses a smartphone for AR digital contents authoring, an egocentric short-range RGB-D camera and a wearable sensor (e.g., a smartwatch) for accurate two-hand tracking, and an exocentric RGB-D camera for full-body tracking. For AR authoring, we use the position, rotation, and touch input information from the smartphone. For hands-augmented object interaction, we first segment the bare hands in the egocentric camera view. Then, the hands and fingers are tracked using a model fitting method. After registration between the virtual and real hands, the user can interact with 3D augmented objects to perform a shared target task. Detailed descriptions of the interactive AR authoring and hands-augmented object interaction methods follow.

Fig. 1. The proposed framework for HMD-based remote collaboration

2.2 Interactive AR Authoring

Figure 2 shows the pipeline of the proposed interactive AR authoring system. We first compute the initial local reference coordinate system of the target working space, and then align it with the coordinate system obtained from simultaneous localization and mapping (SLAM). AR digital contents/objects are placed in the resulting local coordinate system.

Fig. 2. Pipeline of the AR authoring system

Figure 3 shows the concept of our AR authoring system. Before running the remote collaboration system, we organize a shared AR working space in which the local and remote users perform a target task.

Fig. 3. The concept of the AR authoring system: (a) real space, (b) the local reference coordinate system is calculated from the detected planes, (c) the SLAM-based coordinate system is converted to the local reference coordinate system by a transformation matrix, (d) virtual contents are augmented in AR space in the local reference coordinate system.

2.2.1 Local Reference Coordinate System

To generate the local reference coordinate system, we first select an origin point and determine its rotation by analyzing the 3D point cloud acquired from the RGB-D camera. The user chooses a region of interest (RoI) in the scene using the mobile input device; the RoI is a circular region with a radius of 50 pixels. After selecting the RoI, we estimate planes from the point cloud within the RoI. The plane parameters \( \pi_{i} = (a, b, c, d) \) are estimated by the RANSAC method [14] as follows:

$$ \pi_{i} = \mathop{\text{argmin}}\limits_{a,b,c,d} \sum\nolimits_{l}^{N} \frac{\left| a x_{l} + b y_{l} + c z_{l} + d \right|}{\sqrt{a^{2} + b^{2} + c^{2}}}, \quad (x_{l}, y_{l}, z_{l}) \in \text{RoI}. $$
(1)

We assume that the RoI contains at most three planes. Finding the origin of the local reference coordinate system can then be expressed as a linear least-squares problem:

$$ p_{center} = \mathop{\text{argmin}}\limits_{x_{i}, y_{i}, z_{i}} \left\| \begin{bmatrix} a_{1} & b_{1} & c_{1} \\ a_{2} & b_{2} & c_{2} \\ a_{3} & b_{3} & c_{3} \end{bmatrix} \begin{bmatrix} x_{i} \\ y_{i} \\ z_{i} \end{bmatrix} - \begin{bmatrix} d_{1} \\ d_{2} \\ d_{3} \end{bmatrix} \right\|^{2} . $$
(2)

The point with the minimum sum of squared distances to the planes is selected as the origin of the local reference coordinate system. The directions of the three axes are given by the three intersection lines of the planes.
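As a rough illustration of this step (not the authors' implementation), the following Python sketch estimates the RoI planes with a simple RANSAC loop and solves the least-squares problem of Eq. (2) for the origin. NumPy, the iteration count, the inlier threshold, and the plane convention ax + by + cz + d = 0 are assumptions.

```python
import numpy as np

def fit_plane_ransac(points, iters=200, thresh=0.005):
    """Estimate plane parameters (a, b, c, d) from RoI points, in the spirit of Eq. (1)."""
    best_plane, best_inliers = None, 0
    for _ in range(iters):
        sample = points[np.random.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        if np.linalg.norm(normal) < 1e-9:
            continue                                   # degenerate sample, skip
        normal /= np.linalg.norm(normal)
        d = -normal.dot(sample[0])
        dist = np.abs(points @ normal + d)             # point-to-plane distances
        inliers = np.sum(dist < thresh)
        if inliers > best_inliers:
            best_inliers, best_plane = inliers, np.append(normal, d)
    return best_plane

def local_reference_frame(planes):
    """Origin = point minimizing the squared distances to the (up to three) planes (Eq. 2);
    the axis directions follow the plane intersection lines."""
    A = np.array([p[:3] for p in planes])              # stacked plane normals
    d = np.array([p[3] for p in planes])
    origin = np.linalg.lstsq(A, -d, rcond=None)[0]     # least-squares solution of A p = -d
    x_axis = np.cross(A[0], A[1]); x_axis /= np.linalg.norm(x_axis)
    y_axis = np.cross(A[1], A[2]); y_axis /= np.linalg.norm(y_axis)
    z_axis = np.cross(A[2], A[0]); z_axis /= np.linalg.norm(z_axis)
    return origin, np.stack([x_axis, y_axis, z_axis])
```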

2.2.2 Adjusting SLAM Coordinates to Local Coordinates

Based on [15], we estimate the camera pose and build a 3D point map of the unknown scene. The SLAM-based coordinate system must be adjusted to the local reference coordinate system for seamless registration of virtual objects in real space. For this, we calculate two relations. The first relation is the scale unit. The scale parameter of the SLAM-based coordinate system is chosen arbitrarily in the initialization stage and must be replaced by a real-world scale unit; without refining it, users cannot register virtual objects at the desired positions. The scale ratio λ is calculated from the distances between the camera position and the origin of each coordinate system, where the depth from the RGB-D camera is measured in meters.

$$ \lambda = \frac{distance\;from\;camera\;to\;origin\;in\;virtual\;scale\;units}{distance\;from\;camera\;to\;origin\;in\;real\;scale\;units}. $$
(3)

The second relation is the transformation matrix \( P_{local, n} \), which transforms points from SLAM coordinates to local coordinates at the nth frame. We first calculate the initial matrix R that represents the transformation between the two coordinate systems, and then compute the motion matrix \( M_{n} \) for each frame, which represents the accumulated camera motion from the initial frame to the current frame:

$$ M_{n } = M_{n - 1 } \times \ldots \times M_{1 } \times M_{0 } . $$
(4)

\( P_{SLAM,n} \), the matrix that transforms points from world coordinates to SLAM-based coordinates, is computed from the motion matrix and \( P_{SLAM,0} \):

$$ P_{SLAM,n} = M_{n - 1 } \times P_{SLAM,0} . $$
(5)

Matrix \( P_{local, n} \) can be expressed by the following equation:

$$ P_{local,n} = \frac{1}{\lambda } \times M_{n - 1 } \times {\text{R}} \times M_{n - 1}^{ - 1} \times P_{SLAM,n} , n \ne 0. $$
(6)

Once the initial coordinates are obtained, we apply the relation matrix R, the motion matrix M, and the scale parameter to them. The transformation matrix \( P_{local,n} \) maps the augmented space onto the real space.
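A minimal sketch of this adjustment, assuming 4×4 homogeneous matrices and NumPy (the paper does not specify an implementation), could look as follows; it follows Eqs. (3)-(6) directly.

```python
import numpy as np

def scale_ratio(cam_pos_virtual, origin_virtual, cam_pos_real, origin_real):
    """Scale ratio lambda between SLAM (virtual) units and real metric units (Eq. 3)."""
    return (np.linalg.norm(cam_pos_virtual - origin_virtual) /
            np.linalg.norm(cam_pos_real - origin_real))

def p_local_n(motions, R, P_slam_0, lam):
    """Transformation from SLAM coordinates to the local reference frame at frame n.
    `motions` is the sequence of per-frame 4x4 camera motion matrices up to frame n-1."""
    M = np.eye(4)
    for M_k in motions:                 # accumulated motion M_{n-1} x ... x M_1 x M_0 (Eq. 4)
        M = M_k @ M
    P_slam_n = M @ P_slam_0             # Eq. (5)
    return (1.0 / lam) * M @ R @ np.linalg.inv(M) @ P_slam_n   # Eq. (6)
```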

2.2.3 3D Contents Authoring

The shared working space is built with smartphone gestures such as tap, pinch, and rotate. The smartphone is better suited as an input device than the user's bare hands because it allows delicate arrangement of virtual objects in real space.
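Purely as an illustration (the gesture names, fields, and mapping below are hypothetical, not taken from the paper), such gestures could drive object placement roughly like this:

```python
# Hypothetical mapping of smartphone touch gestures to virtual object placement.
def apply_gesture(obj, gesture):
    if gesture.kind == "tap":         # place the object at the touched/ray-hit point
        obj.position = gesture.hit_point
    elif gesture.kind == "pinch":     # scale the object by the pinch factor
        obj.scale *= gesture.factor
    elif gesture.kind == "rotate":    # rotate about the vertical axis (degrees)
        obj.rotation_y += gesture.angle
    return obj
```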

2.3 Hands-Augmented Object Interaction

Tracking two hands is important for natural interaction with virtual objects. There are two main approaches. [16] uses a generative method to track the full articulation of two hands; generative methods generalize well and produce continuous solutions, but they easily fall into local minima when the solution in the previous frame is poor. [17] uses a discriminative method to detect the full articulation of two hands; its advantage is that the solution in the current frame does not depend on the previous frame, so the full articulation of two hands can be detected from a single frame. However, it yields a discrete solution and tends to overfit the training data. To complement the weaknesses of each method, our method combines the generative and discriminative approaches.

2.3.1 Hand Feature Extraction

The proposed method uses a convolutional neural network (CNN) with heterogeneous input devices for hand-virtual object interaction, as illustrated in Fig. 4. The input devices are an RGB-D camera and an IMU sensor. The hand image is passed through a standard deep network with convolution and pooling layers, using the rectified linear unit as the activation function. The feature map obtained from the last pooling layer is fused with 3-DoF data from the IMU sensor. The remaining layers are fully connected, which unifies the two heterogeneous data sources. The output is a set of heat maps indicating the joint positions with the highest probability.

Fig. 4. The proposed CNN with heterogeneous inputs
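The paper does not give the exact architecture, so the following PyTorch-style sketch is only one plausible instantiation of the description above; the layer sizes, joint count, heat map resolution, and input resolution are all assumptions.

```python
import torch
import torch.nn as nn

class HandJointNet(nn.Module):
    """Sketch of the heterogeneous-input CNN: hand image features are fused with
    3-DoF IMU data in fully connected layers, producing per-joint heat maps."""
    def __init__(self, num_joints=21, heatmap_size=32):
        super().__init__()
        self.num_joints = num_joints
        self.heatmap_size = heatmap_size
        self.features = nn.Sequential(            # convolution + pooling with ReLU
            nn.Conv2d(1, 16, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.Sequential(                  # fuse image features with IMU data
            nn.Linear(64 * 16 * 16 + 3, 1024), nn.ReLU(),
            nn.Linear(1024, num_joints * heatmap_size * heatmap_size),
        )

    def forward(self, hand_image, imu):
        # hand_image: (B, 1, 128, 128) segmented hand crop; imu: (B, 3) orientation data
        x = self.features(hand_image).flatten(1)
        x = torch.cat([x, imu], dim=1)            # unify the heterogeneous inputs
        x = self.fc(x)
        return x.view(-1, self.num_joints, self.heatmap_size, self.heatmap_size)
```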

2.3.2 Hand Pose Estimation

To estimate the full articulation of the hands, we adopt two optimization schemes (see Fig. 5). The input data are the segmented hand images and the heat maps generated by the proposed feature extraction algorithm. First, an inverse kinematics (IK) optimizer is run; its main advantage is fast convergence. The objective function in Eq. (7) measures the error between the fingertip and target positions and solves for the articulation parameters, where J is the Jacobian matrix, \( \vec{e} \) is the vector from source to target, and \( \Delta \theta \) is the change in the joint parameters. However, if the feature extraction is inaccurate, the IK algorithm may fail.

$$ \text{E}_{1} = \left\| J\,\Delta \theta - \vec{e} \right\|^{2} + \lambda \left\| \Delta \theta \right\|^{2} . $$
(7)

To overcome this problem, particle swarm optimization (PSO) is employed. PSO is run only when the IK solution does not satisfy a threshold. The objective function in Eq. (8) measures the discrepancy between the observation and the hand model, where \( O_{i} \) is a 3D point of the observation, \( M_{j} \) is a 3D point of the model, and \( w_{i,j} \) is the weight between model and observation points.

$$ \text{E}_{2} = \sum\nolimits_{i} \sum\nolimits_{j} w_{i,j} \left\| O_{i} - M_{j} \right\|_{2} . $$
(8)

Fig. 5. The process of hand pose estimation
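The following Python sketch illustrates this two-stage scheme under stated assumptions: a damped least-squares IK update corresponding to Eq. (7), the Eq. (8) energy, and a minimal PSO refinement that would be triggered only when the IK residual exceeds a threshold. The forward kinematics function, parameter values, and weights are placeholders, not the authors' implementation.

```python
import numpy as np

def ik_step(J, e, lam=0.1):
    """Damped least-squares IK update for Eq. (7):
    delta_theta = (J^T J + lam I)^{-1} J^T e."""
    JtJ = J.T @ J
    return np.linalg.solve(JtJ + lam * np.eye(JtJ.shape[0]), J.T @ e)

def model_fit_energy(obs_points, model_points, weights):
    """Weighted discrepancy between observation and hand model points (Eq. 8)."""
    diff = obs_points[:, None, :] - model_points[None, :, :]
    return np.sum(weights * np.linalg.norm(diff, axis=-1))

def pso_refine(theta0, obs_points, forward_model, weights,
               n_particles=20, iters=30, w=0.7, c1=1.4, c2=1.4):
    """Minimal particle swarm optimization of the Eq. (8) energy around the IK estimate."""
    dim = theta0.shape[0]
    pos = theta0 + np.random.normal(0.0, 0.05, (n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_e = np.array([model_fit_energy(obs_points, forward_model(p), weights) for p in pos])
    gbest = pbest[np.argmin(pbest_e)].copy()
    for _ in range(iters):
        r1 = np.random.rand(n_particles, dim)
        r2 = np.random.rand(n_particles, dim)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        e = np.array([model_fit_energy(obs_points, forward_model(p), weights) for p in pos])
        better = e < pbest_e
        pbest[better], pbest_e[better] = pos[better], e[better]
        gbest = pbest[np.argmin(pbest_e)].copy()
    return gbest
```

In this sketch, `pso_refine` would be called only when the residual \( \| J\Delta \theta - \vec{e} \| \) remaining after `ik_step` exceeds a chosen threshold, mirroring the fallback behaviour described above.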

3 Implementation

3.1 Hardware Configuration

We configure our prototype system using commodity devices. It consists of a computing unit (PC), a video see-through HMD (HMD with a stereoscopic RGB camera) for visualization, a near-range depth sensor and a smartwatch for bimanual hand tracking, and a smartphone for AR authoring. We additionally use an exocentric sensor for body tracking.

For the video see-through HMD, we use an Oculus Rift DK2 with an Ovrvision stereoscopic RGB camera attached; the Oculus Rift DK2 supports position and rotation tracking through an external HMD tracker. For the near-range depth sensor, we use a Creative Senz3D, and a Samsung Gear Live serves as the smartwatch. Finally, we use a Microsoft Kinect v2 as the body tracker.

3.2 Software Configuration

We implement the initial prototype in the Unity Engine [18]. Figure 6 illustrates the components of the proposed system and their relationships.

Fig. 6. Detailed diagram of the proposed framework

For interactive AR contents authoring, we use the position, rotation, and touch input information from the smartphone. With this information, a user can organize his/her AR working space and share it with remote users.

For hands-augmented object interaction and collaboration with remote users, we utilize the near-range depth sensor and the smartwatch for bimanual hand tracking. The bimanual hand tracking result is used for virtual object manipulation. In addition, the user's bimanual hand posture is combined with the body pose from the body tracker, and the combined body-hand pose is sent to the remote space over the network in real time. At the same time, the remote user's combined body and hand pose is received in real time and used to drive the avatar's movement.
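As a sketch of what such a real-time pose exchange might look like (the message fields, joint counts, transport, and port are assumptions; the paper only states that the combined pose is sent over the network):

```python
import json
import socket
import time

def pack_pose(body_joints, left_hand_joints, right_hand_joints):
    """Combine body and bimanual hand poses into a single message (fields are illustrative)."""
    return json.dumps({
        "timestamp": time.time(),
        "body": body_joints,            # e.g., Kinect body joints as [x, y, z] lists
        "left_hand": left_hand_joints,  # e.g., per-joint hand pose from the egocentric tracker
        "right_hand": right_hand_joints,
    }).encode("utf-8")

# usage sketch: stream the local pose to the remote peer every frame over UDP
# sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# sock.sendto(pack_pose(body, left_hand, right_hand), ("remote-host", 9000))
```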

We also use the point cloud from the near-range depth sensor to generate an occlusion mask mesh, which enhances the user's depth perception between the hands and virtual objects. Finally, the virtual scene is merged with the real-world view, and the HMD displays the combined virtual-real image.
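A simplified, image-space view of this occlusion handling could be sketched as follows; the actual system builds a mask mesh in Unity, so the per-pixel formulation below is only an assumption for illustration.

```python
import numpy as np

def occlusion_visibility(real_depth, virtual_depth, hand_mask):
    """Virtual pixels are hidden where the real hand lies in front of them.
    real_depth / virtual_depth are per-pixel depths in meters; hand_mask marks hand pixels."""
    occluded = hand_mask & (real_depth < virtual_depth)
    return ~occluded                      # True where the virtual pixel may be drawn

def composite(real_rgb, virtual_rgb, virtual_valid, visible):
    """Merge the rendered virtual scene with the real camera view for the HMD."""
    out = real_rgb.copy()
    draw = virtual_valid & visible        # virtual content that exists and is not occluded
    out[draw] = virtual_rgb[draw]
    return out
```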

3.3 Initial Result

Figure 7 shows an initial result from our prototype HMD-based collaboration system. The remote user is summoned into the local user's space as a virtual avatar, and both users use bimanual hand gestures to interact with virtual objects. Unlike the previous collaboration system [3], our system supports two-hand interaction with virtual objects by tracking both hands. Furthermore, after integration with the AR authoring method, our system enables a local user to organize a user-friendly working space without any professional programming skills.

Fig. 7. Initial results: a user interacts with virtual objects using bimanual hand gestures

4 Conclusions and Future Works

In this paper we have presented a novel unified framework for HMD-based remote collaboration using interactive AR authoring and two-hand tracking, which enables a local user to organize a user-friendly working space without any professional programming skills and to collaborate with physically remote users through intuitive hands-augmented object interactions. The preliminary implementation results show its strong potential as a future remote collaboration platform. We expect that the proposed framework can be applied to many AR collaborative applications such as urban planning, games, medical surgery education, and so on.

As future work, we plan to further develop two-hand tracking with a wearable sensor and to integrate the AR authoring and collaboration systems.