MobiEye: turning your smartphones into a ubiquitous unobtrusive vital sign monitoring system

Abstract

Recent advances in mobile and wearable technologies have stimulated rapidly growing demand for more affordable, user-friendly, pervasive healthcare solutions that the public can adopt to proactively manage their health conditions and alleviate the burden of hospitalization. This study proposes a personalized, ubiquitous health monitoring system that can unobtrusively monitor individuals’ vital signs anywhere, at any time. The proposed MobiEye framework uses the regular camera available on any smartphone or tablet to record several of the most important physiological signals, without the need to acquire specialized medical devices or attach any sensor to the body. By recording the reflected light intensities that correspond to the subtle blood flow changes accompanying blood volume pulses, the proposed technique accurately extracts blood volume pulses from facial videos recorded in real-world scenarios with the designed protocol. Experiments show that the proposed system achieved \(96\,\%\) accuracy on average (standard deviation \(\pm 1.2\)) for heart rate estimation and high correlations between the pulse transit time and the reference systolic blood pressure \((mean \, r = 0.89,\, SE= 0.05)\).

Introduction

Recent advancements in wearable technologies have opened a new era in smart and connected healthcare devices to address the demanding needs of the growing population in clinical settings and, specifically, in home environments. Given the substantial technological innovations, numerous opportunities have emerged to develop miniaturized wearable devices at a lower cost. Smart health monitoring systems have great potential to reduce healthcare costs and to use the available resources effectively by reducing false alarms as well as unnecessary hospitalizations. Furthermore, they can significantly enhance pervasive personalized healthcare in home environments in terms of early detection of life-threatening events, continuous monitoring and assessment of chronic diseases, and at-home rehabilitation.

Fig. 1 Health monitoring in the clinical setting and examples of physiological signals

Traditional clinic-oriented healthcare has been moving towards patient-centered, personalized healthcare solutions. The standard vital sign monitoring systems shown in Fig. 1a are neither affordable nor accessible enough to be set up at home for individual health monitoring because of several prohibitive limitations, such as the need for a professional to interpret the data and the size, cost, and maintenance of the sophisticated machinery. On the other hand, smart health monitoring devices can provide important information about vital signs, including heart rate (HR), blood pressure (BP), and respiration rate (RR), at a much lower cost and with minimal calibration effort. They can also provide better communication for patients in remote areas, so that professionals can examine patients remotely and provide clinical consultations (as illustrated in Fig. 1c). With the increasing awareness of health across society, healthcare systems are shifting towards the proactive prevention of possible risks rather than diagnosis and treatment afterward.

Continuous vital sign monitoring systems are becoming more and more popular, considering the importance of maintaining a healthy lifestyle as well as preventing major health issues. Some critical vital signs used by medical professionals and healthcare providers in routine check-ups include HR, BP, RR, and body temperature. Other vital signs that are usually monitored for detailed diagnosis may include heart rate variability (HRV), oxygen saturation level (\(\hbox {SpO}_{2}\)), blood glucose, and pain. It has been well acknowledged that early detection of life-threatening events is possible, as subtle changes in a patient’s vitals are usually present 8 to 24 hours before such events Downey et al. (2018). Based on these detectable changes, healthcare providers can give appropriate treatment and take precautionary actions to prevent clinical deterioration.

Clinical vital sign monitoring systems use electrocardiograph (ECG) and photoplethysmograph (PPG) devices for monitoring cardiovascular activities (Fig. 1b). ECG is usually monitored by placing electrodes over certain locations on the body, such as the chest, arms, and legs. Although they are the standard procedure for accurately measuring cardiovascular activity, conventional ECG methods have several limitations: (1) electrode patches attached to the body can be uncomfortable because of the sticky adhesive; (2) skin irritation may occur during relatively long measurements; (3) the patient’s physical activities are constrained. PPG is an alternative method to monitor cardiovascular health by measuring the blood volume pulse (BVP) at peripheral arterial regions. The morphological properties of the pulse waveform represent the arterial contraction and dilation regulated by the autonomic nervous system (ANS). The signal of interest for extracting vital signs is the change in BVP, driven by the cardiac cycle, in the microvessels underneath the skin in certain regions, e.g., the finger, toe, earlobe, and face. PPG is traditionally measured with a pulse oximeter, a probe that is attached to a superficial arterial region and connected, with or without cables, to a measuring device for interpretation.

Camera-based PPG monitoring is an emerging technique discussed in many recent studies Poh et al. (2010), Kwon et al. (2012), Lewandowska et al. (2011), Wieringa et al. (2005). This technique aims to extract the PPG from recorded videos of an individual’s skin, such as the face, forehead, or palm. Prior research on camera-based non-contact vital sign monitoring has demonstrated the effectiveness of this technique, with considerable accuracy compared with conventional contact electrode techniques. With a comparatively light setup and minimal calibration, camera-based approaches have great potential in home-based healthcare scenarios. Besides, non-contact measurement largely reduces the restrictions on body motion and improves the user experience. Unfortunately, the magnitude of the PPG signal is very low, because the peripheral arterial structure underneath the skin contains only 2–5% of the total blood volume, and only about 5% of that responds to cardiac activity, resulting in changes in blood volume amplitude (BVA). In non-contact extraction, the signal strength is further reduced by the separation between the region of interest (ROI) and the camera. Quantifying the noise in the signal of interest is crucial for extracting a valid PPG because of the morphological variability introduced by different sources. In summary, camera-based vital sign monitoring faces several challenges: the low strength of the signal of interest, motion artifacts, constantly changing ambient conditions, and natural minute body movements such as those due to respiration and heartbeat.

To provide the general public with a low-cost and convenient way to manage their health proactively, in this study we propose a mobile-based, ubiquitous, and unobtrusive health monitoring system, named MobiEye, which offers ease of use, low cost, no need for professionals, and minimal calibration and maintenance. To overcome the long-standing challenges associated with camera-based PPG monitoring, we propose a Laplacian pyramid based method to enhance the low-magnitude PPG signals efficiently. A novel signal processing flow is presented to extract the PPG from the enhanced video frames, which is further used to estimate the clinically significant heart rate and blood pressure information. To evaluate the effectiveness and robustness of the proposed algorithm in real-life scenarios, videos were recorded under generalized conditions, i.e., with the cameras on different devices (including laptops, smartphones, and tablets), in multiple common device handling positions, and in varying surrounding light conditions, from a large population set.

This paper is organized as follows: Sect. 2 gives a brief introduction to the state-of-the-art research efforts in non-contact vital signs monitoring. Section 3 introduces our proposed framework with the description of each element. Section 4 provides detailed experimental protocol and experimental settings for video recordings. Results are demonstrated in Sect. 5, followed by discussion and conclusion in Sects. 6 and 7, respectively.

Related work

The word photoplethysmography comes from the Greek word “plethysmos” describing the blood volume changes in certain parts of the body. With each cardiac cycle, blood is released from the ventricles by contracting the heart muscles, and the pressure pulse travels through the circulatory system. This pressure pulse induces changes in BVA, resulting in periodic changes in proportion to the heart rate measured by the PPG technique. Two types of PPG techniques are usually used, namely transmission mode and reflectance mode. In general, the PPG device has the transmitter that transmits the infrared (IR) light and the receiver that monitors the changes in light due to blood flow. The transmission mode PPG device measures the changes in transmitted light through body structure, and the reflectance mode PPG device monitors the changes in backscattered light from the measuring area. The non-contact PPG monitoring system is primarily based on the reflectance mode PPG technique, where the measuring area is monitored with different modalities, including the Doppler technique, microwave Doppler radar, thermal imaging, and piezoelectric measurements, etc. This section explores the previous work on non-contact PPG extraction techniques with various devices.

Doppler radar techniques have been used in many applications, such as motion tracking and speed measurement Li and Lin (2010), and have also been applied to measuring the small movements caused by heartbeat and respiration. Early studies showed the potential importance of distant measurement of vital signs in different applications Lin (1975), such as life detection after a natural disaster. A low-intensity microwave beam was used in early radio frequency (RF) based vital monitoring systems Chen et al. (1986): the subject was exposed to low-intensity microwave signals, and the modulations in the backscattered signals due to heartbeats and respiration were extracted with a microwave receiving system. Thermal imaging has been used extensively in remote sensing of vital signs Sun et al. (2005), Chekmenev et al. (2005). The periodic changes in blood flow create time-varying thermal patterns proportional to cardiac activity, and continuous breathing generates temperature variations in facial thermography. The technique was intended to be used for non-contact vital sign monitoring as well as for distant identification Chekmenev et al. (2005). Another study Sun et al. (2005) used a highly sensitive thermal imaging system focusing on major superficial vessels to extract changes in emitted thermal signals. The Fast Fourier Transform (FFT) was applied to selected lines of interest to evaluate the modulation frequency, which was further quantified for pulse rate measurement. The application areas of this system included vital sign monitoring in sports training, cardiovascular disease, and sleep monitoring. Another application of the thermal IR imaging method was monitoring the breathing rate Jin and Ioannis (2010). These early studies have provided a substantial basis for distant vital sign monitoring. However, the diverse instruments adopted in these studies are not ideal for low-cost, minimum-calibration settings.

Table 1 Comparison of selected state-of-the-art non-contact health monitoring systems

An RR and HR monitoring system based on visible light sensing (VLS) Abuella and Ekin (2018) included a photodetector sensor that measures the power spectrum of the signals reflected from the subject’s body. Compared with the standard pulse oximeter, the system achieved 94% accuracy for RR and HR measurements. However, such a system needs to be validated more comprehensively on a large group of subjects, at varying distances between the subject and the device, and in test environments with constantly changing light conditions. A comprehensive system called “Vital-SCOPE” Sun et al. (2018) included multiple sensors (a reflective photosensor, a medical radar, and a thermopile) for measuring different vital signs simultaneously. The system measured pulse rate, body temperature, and RR within 10 s of measurement. Although this study proposed multiple vital sign measurements on a single platform, it used several contact sensors that restrict the subject’s movements.

In recent years, optical techniques have advanced to camera-based vital sign monitoring. The changes in blood volume and its pulsatile nature were explained as a difference between the absorbance coefficients of arterial blood and the surrounding tissue structure Aoyagi (1974). This formed a substantial basis for novel optoelectronic devices for non-invasive measurement of PPG. A monochrome complementary metal-oxide-semiconductor (CMOS) camera-based arterial \(\hbox {SpO}_{2}\) level measurement was proposed to extract the two-dimensional, spatially distributed PPG at different wavelengths (660 nm, 810 nm, 940 nm) Wieringa et al. (2005). The significant correlation between the signals extracted with the proposed technique and the ground truth demonstrated the potential of camera-based techniques for \(\hbox {SpO}_{2}\) measurements. Early research on smartphone-based pulse rate monitoring was described in Kwon et al. (2012): the front-facing camera of a smartphone was used to record videos from 10 subjects, which were then analyzed with the independent component analysis (ICA) method Poh et al. (2010) to extract HR. Monkaresi et al. (2013) simulated real-world challenges to validate the ICA-based technique Poh et al. (2010) and proposed an improved approach for HR monitoring. This study was performed on different datasets recorded under realistic environments, e.g., data recorded at rest, data recorded during simulated human-computer interaction (HCI), and data recorded in a physiological workout environment. Two different approaches were used to extract the HR, namely an ICA-based technique and a machine learning (ML) technique. Further, a linear regression model and the k-nearest neighbor (kNN) technique were validated with the extracted features for ML. These studies either used an external illumination source for video recordings or were performed in a lab environment specifically for HCI or physiological workout, without considering the effects of lighting.

PPG has been extensively studied for BP estimation based on statistical modeling, pulse transit time (PTT), and ML techniques Jeong et al. (2006), Mousavi et al. (2019), Xing and Sun (2016). Systolic blood pressure (SBP), diastolic blood pressure (DBP), and mean arterial pressure (MAP) were estimated and compared against the standard data recorded invasively. A multilayer feedforward backpropagation technique was proposed Xing and Sun (2016) to estimate BP, using FFT based features instead of traditional time series features. A novel approach for BP estimation considered the external factors such as contact pressure between ROI and the sensor, as well as the temperature at the measuring site and demonstrated the relationship between BP, blood volume (BV), and cardiac output (CO) Jeong et al. (2006). Although these studies proposed different techniques for BP monitoring, the data was recorded with contact devices limiting the applications for continuous health monitoring.

Remote vital sign monitoring systems reported here were validated for multiple datasets. In addition to the aforementioned representative work, Table 1 gives an overview of the state of the art for vital sign monitoring systems. Although these studies have proposed different techniques for non-contact vital sign monitoring, they need to be further validated for many real-world challenges such as patient comfort, illumination level, and obstructive monitoring, etc. Our study has specifically considered and validated the proposed approach under a variety of distinct real-world conditions.

Proposed framework

Our proposed processing flow includes two primary stages: extraction of PPG from the selected facial region and estimation of vital signs. The overview of the MobiEye architecture is shown in Fig. 2 and described accordingly in this section.

Fig. 2 MobiEye architecture

PPG Extraction

As discussed earlier in the Introduction section, PPG extraction from facial videos is challenging due to very low signal strength and high susceptibility to external noise. To understand BVP changes with optical techniques, it is important to define an accurate geometrical model for the interaction of incident light with structures such as human skin. The spectral properties of the skin are highly dependent on its structural characteristics, such as the thickness and composition of each layer. The skin is made of three main layers: the stratum corneum, composed of corneocytes Talreja et al. (2001), the epidermis, and the dermis Barun et al. (2007). The absorption of light in the stratum corneum is relatively low, with uniform transmitting properties Tuchin (2015). The epidermis is composed of four layers whose absorption comes mostly from the melanin chromophore. The two layers of the dermis, namely the papillary dermis and the reticular dermis, are mainly composed of irregular and dense connective tissues and blood vessels. Hemoglobin, the natural chromophore present in the blood, absorbs incident light. The amount of oxygenated hemoglobin varies between 90–95% in the arteries, and 47% of the hemoglobin in veins is oxygenated Angelopoulou (2001). The slight difference in absorption spectra between oxygenated and deoxygenated hemoglobin makes it possible to monitor the cardiac cycle with optical methods Baranoski and Krishnaswamy (2008).

Light reflection from human skin occurs at two main levels: surface specular reflection and sub-surface diffuse reflection Li and Ng (2009). Changes in the BVP directly affect the sub-surface reflection, which can be monitored by assessing the properties of the reflected light. The camera records an image as a two-dimensional function \(f(x,y)\) on a pixel grid, where the value at each pixel located along the X and Y axes Gonzalez and Woods (2007) is the amount of energy acquired by the camera sensor over the exposure time Fuchs et al. (2010). Mathematically, it can be modeled as the integral over time t of the product of the flux and the measuring kernel.

$$\begin{aligned} f(x,y) = \int _{-\infty }^{\infty } \phi (x,y,t).k(t,x,y)dt \end{aligned}$$
(1)

where \(\phi (x,y,t)\) is the flux arriving at the pixel and \(k(t,x,y)\) encodes the temporal response. The recorded image \(f(x,y)\) can be characterized by two primary components, namely the magnitude of the source illumination incident on the scene being viewed and the magnitude of the illumination reflected by the physical objects in the scene. The reflections originating from different surrounding sources can be formulated as:

$$\begin{aligned} f(x,y) = I_s(x,y) I_e(x,y) \end{aligned}$$
(2)

where \(I_s\) represents the intensity from the skin or ROI, and \(I_e\) represents the intensities from the surrounding environmental sources. Here, the surrounding light intensity \(I_e\) acts as noise.

The intensity reflected from the skin can be further modeled as the sum of the reflection from the skin surface \(I_{s1}\) and the reflection from the sub-skin surface \(I_{s2}\). The signal of interest for PPG extraction is the sub-surface reflection, as it changes in proportion to the BVP. Considering the anatomy of the skin layer structure, most of the incident light is absorbed, and the amount of blood that changes in proportion to the cardiac activity (pulse wave) is only \(\sim 5\%\) of the total blood volume, resulting in very minute changes in the reflection morphology. As these variations are inherently small, they are highly vulnerable to any noise, including motion artifacts and changes in the surrounding light.

Image pyramid

The images are represented as 2-D intensity arrays with spatially correlated pixels (in both x and y), i.e., each pixel is dependent on or similar to the neighboring pixels. Because correlated pixels replicate the data, the 2-D image array contains redundant information. As most of the pixels are correlated, and the pixel intensity values can be predicted in reference to the neighboring pixels, the valuable information associated with each pixel is small Gonzalez and Woods (2007). To extract meaningful information, a 2-D intensity array should be transformed by reducing the redundancy among spatially as well as temporally correlated pixels. An image pyramid is a conceptually simple but effective technique to represent images at different spatial resolution levels Burt and Adelson (1983). The image pyramid consists of pictures with decreasing resolutions arranged in a pyramid structure. The base of the image pyramid contains high resolution or original photos, and the image resolution decreases as it moves up to higher levels.

To overcome the low signal strength challenge, an image magnification technique Wu et al. (2012) is used to reduce the noise and improve the signal magnitude while preserving an accurate morphological representation. Image magnification uses localized spatial sampling followed by band-pass filtering to visually reveal the subtle changes corresponding to cardiac pulses. Sub-band coding Gonzalez and Woods (2007) decomposes the image into multiple band-limited elements that can be processed independently and merged to restore the original image with an increased signal-to-noise ratio (SNR). The idea of sub-band amplification allows us to enhance and visualize the subtle temporal blood volume changes at each pixel location in the selected ROI. Here, we choose a multi-resolution analysis technique, the Laplacian pyramid Burt and Adelson (1983), for sub-band coding; it decomposes the spatio-temporal frame sequence into band-limited elements at several resolution levels. The construction of the Laplacian pyramid follows the architecture of the Gaussian pyramid but instead saves the residual of the blurred image at each level Shao et al. (2013). The lowest level of the Gaussian pyramid can be considered as the original frame. The low-pass filtered image at each subsequent level is constructed by iteratively convolving the preceding level with a Gaussian filter Ji et al. (2017), as follows:

$$\begin{aligned} G_l(x,y) = \sum _{i}\sum _{j}\omega (i,j)G_{l-1}(2x+i,2y+j) \end{aligned}$$
(3)

where l indexes the level in the Gaussian pyramid G, i.e., \(l\in \{1,2,3,\ldots ,L\}\), \((x,y)\) represents the pixel position, and \(\omega (i,j)\) represents the kernel. Each level in the Laplacian pyramid is an approximation of the difference between adjacent Gaussian levels Shao et al. (2013). Specifically, each layer in the Laplacian pyramid is constructed by subtracting the expanded next (coarser) Gaussian level from the expanded current level. The construction of an L-level Laplacian pyramid can be represented as:

$$\begin{aligned} P_l = {\left\{ \begin{array}{ll} expand(G_l) - expand(G_{l+1}) &{} \quad l<L \\ expand(G_l) &{} \quad l=L \end{array}\right. } \end{aligned}$$
(4)

where the function expand generates the expanded layer of a Laplacian pyramid P. The detailed description of the Gaussian and Laplacian pyramid construction is shown in Fig. 3.
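
As a rough illustration of Eqs. (3) and (4), the sketch below builds a Gaussian pyramid with the reduce step of Eq. (3), summing over the kernel offsets i and j, and derives band-pass levels from it. It is a minimal pure-Python sketch under stated assumptions rather than the implementation used in this study: the 5-tap binomial kernel, the border replication, and the pixel-replication expand are illustrative choices, and all function names are hypothetical. For simplicity, each band is formed as \(G_l - expand(G_{l+1})\) at the resolution of level l, the form given by Burt and Adelson (1983), which differs slightly from the way Eq. (4) is written.

```python
# Assumed 5-tap binomial kernel; the 2-D kernel w(i, j) is K[i + 2] * K[j + 2].
K = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]


def clamp(v, lo, hi):
    return max(lo, min(v, hi))


def reduce_level(img):
    """One reduce step: G_l(x, y) = sum_i sum_j w(i, j) * G_{l-1}(2x + i, 2y + j)."""
    h, w = len(img), len(img[0])
    out_h, out_w = (h + 1) // 2, (w + 1) // 2
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            acc = 0.0
            for j in range(-2, 3):
                for i in range(-2, 3):
                    yy = clamp(2 * y + j, 0, h - 1)  # replicate border pixels
                    xx = clamp(2 * x + i, 0, w - 1)
                    acc += K[i + 2] * K[j + 2] * img[yy][xx]
            out[y][x] = acc
    return out


def expand_level(img, h, w):
    """Upsample to h x w by pixel replication (a deliberately crude expand)."""
    return [[img[y // 2][x // 2] for x in range(w)] for y in range(h)]


def gaussian_pyramid(img, levels):
    """Level 0 is the input frame; each further level is a reduced copy."""
    pyr = [img]
    for _ in range(levels):
        pyr.append(reduce_level(pyr[-1]))
    return pyr


def laplacian_pyramid(img, levels):
    """Band-pass levels plus the coarsest Gaussian level as the last entry."""
    g = gaussian_pyramid(img, levels)
    lap = []
    for l in range(levels):
        h, w = len(g[l]), len(g[l][0])
        up = expand_level(g[l + 1], h, w)
        lap.append([[g[l][y][x] - up[y][x] for x in range(w)] for y in range(h)])
    lap.append(g[-1])
    return lap


def reconstruct(lap):
    """Collapse the pyramid by expanding and adding, coarsest level first."""
    img = lap[-1]
    for band in reversed(lap[:-1]):
        h, w = len(band), len(band[0])
        up = expand_level(img, h, w)
        img = [[band[y][x] + up[y][x] for x in range(w)] for y in range(h)]
    return img
```

Because each band stores exactly what the expand step cannot predict from the coarser level, collapsing the pyramid recovers the input frame up to floating-point rounding. This is what allows a band to be filtered and amplified before the levels are accumulated again.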

Fig. 3 Image pyramid construction

Salient features segregated in each layer of the Laplacian pyramid at different scales were emphasized with the following operations. Each spatial band was temporally processed by applying a band-pass filter with the selected frequency range of 0.5–4 Hz, which corresponds to the human pulse rate, i.e., 30–240 beats per minute (BPM). This range was further narrowed down in the signal processing step for subject-specific signal extraction. The temporal processing was carried out uniformly on each spatial level and on the corresponding pixels in each layer. As the subtle changes in BVP are very low in magnitude, the extracted filtered signal was amplified by a factor \(\alpha\). An amplified image can be obtained by adding the magnified extracted signal to the original and accumulating all the spatial levels in the image pyramid.
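
The temporal step described above can be sketched for the intensity trace of a single pixel. This is a minimal illustration, not the authors’ implementation: it assumes an ideal (brick-wall) band-pass realized with a naive DFT, and the function names, the frame rate, and any particular value of \(\alpha\) are hypothetical. In the full pipeline the same operation would be applied to every pixel at every pyramid level.

```python
import cmath
import math


def ideal_bandpass(signal, fps, lo_hz, hi_hz):
    """Zero every DFT bin whose frequency lies outside [lo_hz, hi_hz]."""
    n = len(signal)
    spectrum = [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        # Bins above n/2 hold negative frequencies; compare their magnitude.
        freq = min(k, n - k) * fps / n
        if not lo_hz <= freq <= hi_hz:
            spectrum[k] = 0
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def magnify(signal, fps, alpha, lo_hz=0.5, hi_hz=4.0):
    """Add the band-passed component, scaled by alpha, back onto the input."""
    band = ideal_bandpass(signal, fps, lo_hz, hi_hz)
    return [s + alpha * b for s, b in zip(signal, band)]
```

The default band of 0.5–4 Hz is the 30–240 BPM range quoted above (BPM = 60 × Hz). The naive DFT costs O(n²) per trace and is used here only to keep the sketch dependency-free; an FFT would be the practical choice.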

The pre-filtered frames were further processed for face detection and ROI selection. The face detector used the Viola-Jones algorithm Viola and Jones (2001) along with the Kanade-Lucas-Tomasi (KLT) algorithm Tomasi and Kanade (1991) for tracking a set of feature points Shi and Tomasi (1993) across the selected face over time. The forehead was chosen as the ROI within the detected facial region, on the assumption that the forehead is the region least affected by facial muscle activities such as smiling, talking, and eating.

Signal Processing

The PPG was extracted in a signal averaging step and then processed with an ideal band-pass filter, ICA, and signal smoothing. Not every pixel in the selected ROI exhibits an intensity change corresponding to the BVP, so it is beneficial to take the spatial average over the ROI, which reduces the ripples in the extracted signal caused by inherent noise. The following equation represents the spatial averaging over the selected ROI.

$$\begin{aligned} I_c(n)=\frac{1}{\hat{X}\cdot \hat{Y}}\sum _{x=1}^{\hat{X}}\sum _{y=1}^{\hat{Y}}r(x,y,c,n) \end{aligned}$$
(5)

where \(I_c(n)\) is the averaged pixel intensity over the selected ROI r in the \(n^{th}\) frame, c represents the color channel, and \(\hat{X}\) and \(\hat{Y}\) are the width and height of the ROI, respectively. The video frames were filtered over the full human heartbeat range in the amplification stage, so they may still contain noise such as the body’s natural movements with respiration. To reduce such ripples, the averaged signal was band-pass filtered over a narrowed, subject-specific frequency range, e.g., 0.83–1.17 Hz for 50–70 BPM. The observed signals are a linear mixture of latent source signals with an unknown mixing architecture. The ICA technique Hyvärinen et al. (2004) was applied to separate the source signals from this mixture. To extract the PPG with an accurate morphological representation, a moving average filter was applied. In summary, an image pyramid technique, along with the signal processing algorithm, was used to extract PPGs from the selected ROI.
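
Eq. (5) and the final smoothing step can be sketched as follows. This is a minimal sketch under assumptions that are not stated in the text: frames are taken to be nested lists of (R, G, B) tuples, the ROI is an axis-aligned rectangle, and the window length of the moving average is a free parameter; all function names are hypothetical. The band-pass and ICA stages that sit between these two steps are omitted.

```python
def roi_mean(frame, x0, y0, width, height, channel):
    """Eq. (5): mean intensity of one colour channel over the ROI of one frame.

    frame[y][x] is assumed to be an (R, G, B) tuple.
    """
    total = 0.0
    for y in range(y0, y0 + height):
        for x in range(x0, x0 + width):
            total += frame[y][x][channel]
    return total / (width * height)


def raw_ppg(frames, roi, channel):
    """Apply Eq. (5) to every frame to obtain the series I_c(n)."""
    x0, y0, width, height = roi
    return [roi_mean(f, x0, y0, width, height, channel) for f in frames]


def moving_average(signal, window):
    """Centred moving average; the window shrinks near the two ends."""
    half = window // 2
    out = []
    for n in range(len(signal)):
        chunk = signal[max(0, n - half): n + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```

Averaging over the ROI trades spatial detail for noise suppression: pixel-level noise that is uncorrelated across the ROI tends to cancel, while the pulse component shared by the pixels is retained.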

Vital sign monitoring

HR detection

FFT was applied to PPG to obtain the power spectrum for HR monitoring. The signal was filtered for each subject on a specific frequency range in the signal processing step, so the resulting frequency associated with the highest power in the power spectrum can be considered as an accurate estimation of pulse frequency.
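
The peak-picking rule above can be sketched as follows. This is a minimal illustration under assumptions: a naive DFT stands in for the FFT to keep the example dependency-free, the search band defaults to the 0.5–4 Hz range used in the amplification stage, and the function name is hypothetical.

```python
import cmath
import math


def estimate_hr_bpm(ppg, fps, lo_hz=0.5, hi_hz=4.0):
    """Return the frequency (in BPM) of the strongest spectral peak in the band."""
    n = len(ppg)
    mean = sum(ppg) / n
    centred = [v - mean for v in ppg]  # drop the DC offset
    best_freq, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        freq = k * fps / n
        if not lo_hz <= freq <= hi_hz:
            continue
        coeff = sum(centred[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power = abs(coeff) ** 2
        if power > best_power:
            best_freq, best_power = freq, power
    if best_freq is None:
        raise ValueError("no DFT bin falls inside the requested band")
    return 60.0 * best_freq
```

The bin spacing of the spectrum is the reciprocal of the window length, so a 30 s recording resolves the pulse frequency in steps of 1/30 Hz, i.e., 2 BPM, regardless of frame rate. A finer estimate would need a longer window or interpolation around the peak.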

BP estimation

Several studies have shown a significant correlation between the pulse transit time (PTT) and BP. PTT is the time taken by a pulse wave to travel from the left ventricle to a distant monitoring location Mukkamala and Hahn (2017). Usually, PTT is measured as the delay between the R peaks in the ECG and the corresponding peaks in the PPG extracted from the finger, earlobe, or toe. Besides the ECG-PPG technique, several other techniques have been proposed for estimating PTT, including PPG-PPG, ballistocardiography (BCG), electrical bio-impedance (EBI), and seismocardiogram (SCG). To address the limitations of current techniques, MobiEye proposes a reliable PTT estimation method from facial regions. Our proposed algorithm extracts PPG signals from different selected ROIs on the face and measures the latency between their local maxima for PTT estimation. The selected measuring sites are shown in Fig. 4. The evaluated PTT is further used to estimate the user’s SBP characteristics. The proposed model needs to be calibrated based on the correlation between the PTT and the reference BP data. The inverse of PTT is essentially a pulse wave velocity (\(PWV=1/PTT\)), which depends substantially on the subject’s BP Mukkamala and Hahn (2017). Since PWV has been shown to be linearly proportional to BP Muehlsteff et al. (2006), the absolute blood pressure (ABP) estimate is refined using the coefficients \(K_1\) and \(K_2\) obtained from their correlation, as in the following equation:

$$\begin{aligned} BP=\frac{K_1}{PTT}+K_2 \end{aligned}$$
(6)
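
The two steps described above, peak-latency PTT estimation and calibration of Eq. (6), can be sketched as follows. The text does not specify how peaks are paired or how \(K_1\) and \(K_2\) are obtained, so this sketch makes two assumptions: each peak in the proximal PPG is paired with the next peak in the distal PPG, and the coefficients are fitted by ordinary least squares on \(1/PTT\). All function names are hypothetical.

```python
def local_maxima(signal):
    """Indices of samples strictly greater than both neighbours."""
    return [
        n for n in range(1, len(signal) - 1)
        if signal[n - 1] < signal[n] > signal[n + 1]
    ]


def estimate_ptt(proximal, distal, fps):
    """Mean delay in seconds from each proximal peak to the next distal peak."""
    distal_peaks = local_maxima(distal)
    delays = []
    for p in local_maxima(proximal):
        later = [d for d in distal_peaks if d >= p]
        if later:
            delays.append((later[0] - p) / fps)
    if not delays:
        raise ValueError("no matching peak pairs found")
    return sum(delays) / len(delays)


def fit_bp_model(ptts, bps):
    """Least-squares fit of BP = K1 / PTT + K2 (Eq. 6) to calibration pairs."""
    xs = [1.0 / p for p in ptts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(bps) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, bps))
    k1 = sxy / sxx
    k2 = my - k1 * mx
    return k1, k2


def predict_bp(ptt, k1, k2):
    """Evaluate Eq. (6) for a new PTT value."""
    return k1 / ptt + k2
```

Note that the delay can only be resolved in whole frames: at a frame rate of fps, one frame corresponds to 1/fps seconds, so averaging over many beats or interpolating the peak positions would be needed for finer resolution.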

Fig. 4 PPG extraction sites (in blue) for PTT measurements

Fig. 5 a Recording position 1, b recording position 2

Experimental setup

A series of experiments was designed and conducted to validate the proposed approach against real-world challenges. The experimental settings in previous studies often used a fixed digital camera in front of the subject with strictly controlled surrounding illumination. However, for a ubiquitous system heavily involved in people’s daily lives, it is necessary to consider the factors that influence the extraction of the subject’s vital signs in diverse real-world scenarios. The factors that can significantly affect the accuracy of MobiEye from the user end include, but are not limited to:

  • Camera resolution

  • Surrounding illumination

  • Distance between camera and measuring site

  • Recording device dimensions

  • Device holding angle/position

  • Camera frame rate

So far, very few studies have comprehensively considered the various factors from the user’s perspective while designing the experiments. We have proposed an experimental protocol based on a brief survey to simulate users’ preferences and behavioral habits. The survey was conducted with a set of questionnaires regarding:

  1. different camera devices such as smartphones, tablets, virtual and augmented reality devices, e.g., “Which device are you using currently?”

  2. device-handling positions, e.g., “How do you usually hold your smart devices?” “Does the size of the device (smartphone or tablet) affect the handling?”

  3. distance between the device and face, e.g., “What is the most comfortable distance to hold the device?”

  4. acceptance of continuous health monitoring, e.g., “How do you think of the importance of continuous health monitoring?”

21 participants were recruited randomly, and no particular screening was performed prior to the survey, as the protocol was intended to be as widely applicable (ubiquitous) as possible. The subjects who participated in the protocol survey were excluded from participating in data collection to avoid biased influence. Based on the survey, we selected three representative camera devices that were commonly used in daily life for video recording and the two most common device-holding positions (as shown in Fig. 5) to validate our algorithm for different factors. The specifications (i.e., resolution and frame rate) of selected devices are given in Table 2.

Table 2 Devices used for recording

Fig. 6 Representative examples of recording environments

The data were collected from 21 subjects aged 23 to 34 years (10 males, 11 females). All subjects were informed of the experimental procedure and the scope of the study prior to data collection. No history of chronic disease or abnormal cardiovascular activity was reported.

The reference PPG (ground truth) data were collected with the Shimmer Optical Pulse sensing probe connected through the Shimmer3 GSR+ unit (Shimmer Inc., Dublin, Ireland). The standard guidelines for BP measurements were followed, i.e., three measurements were taken before each video recording with an FDA-cleared automatic cuff-based BP measurement device (Omron Wrist Blood Pressure Monitor). The average of the BP readings was taken as the ground truth BP. The same device was used to record the ground truth HR measurements.

Fig. 7 ROI before (first row) and after amplification (second row)

The video was recorded immediately after the BP and HR ground truth measurements, with the reference PPG data recorded simultaneously. For the webcam recording, each participant was seated at a desk under ambient light. The iPad and iPhone recordings were carried out in two device-handling positions (as shown in Fig. 5), covering different recording distances and angles between camera and face, at random indoor and outdoor locations. In total, five videos of 35 s each were recorded for each individual along with the ground truth PPG, HR, and BP readings. Some representative recording environments are shown in Fig. 6.

The goal of the proposed study is to provide an accurate estimation of vital signs within a minimal time window. Therefore, the videos were recorded for 35 s with each device in the selected recording positions, and the first and last few frames were excluded to minimize motion artifacts, so that the final video used for processing was 30 s long. Participants were free to talk and to move their heads or bodies during the recording. Further processing was performed with MATLAB 2018b (MathWorks Inc., Natick, MA) on a desktop computer (Intel Core i7 CPU @ 3.40 GHz, 32 GB memory). All the experimental protocols were reviewed and approved by the Institutional Review Board of Binghamton University.
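As a rough illustration of the trimming step, the following Python sketch computes which frames to keep when a 35 s recording is cut to 30 s. The text does not state how many frames were dropped at each end, so the symmetric split, the function name `trim_window`, and the 30 frames-per-second example are assumptions made only for illustration.

```python
def trim_window(n_frames, fps, keep_seconds=30.0):
    """Return (start, stop) frame indices of a centred window of keep_seconds.

    Assumption: the excess frames are dropped equally from both ends.
    """
    keep = int(round(keep_seconds * fps))
    if keep > n_frames:
        raise ValueError("recording is shorter than the requested window")
    start = (n_frames - keep) // 2
    return start, start + keep


# Hypothetical 35 s recording at 30 frames per second.
start, stop = trim_window(n_frames=35 * 30, fps=30)
print(start, stop, (stop - start) / 30)
```

Frames with index `start` up to (but not including) `stop` would then be passed on to the PPG extraction stage.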

To evaluate the influence of surrounding light conditions, we conducted a pilot study on five subjects (3 males, 2 females). In total, five videos with a length of 10–12 s were recorded from each subject in different indoor and outdoor surroundings. The ambient illumination was measured using a standard light meter (Dr. Meter LX1330B Digital Illuminance/Light Meter), and reference HR data were recorded with a pulse oximeter (iHealth Air PO3).

Fig. 8 Average SNR of PPG extracted with image pyramid method and ROI averaging method

Fig. 9 PPG extraction after each step in the proposed algorithm

Results

This section evaluates the proposed approach by examining the quality improvement of the extracted PPG and the performance of vital sign monitoring under different experimental conditions.

PPG extraction

MobiEye uses an image magnification technique together with signal processing to reduce redundant information in each image and to extract a cleaner PPG signal that retains its morphological properties. The first row of images in Fig. 7 shows regular frames of the selected ROI from the original video, and the second row shows the magnified images, which visualize the color changes that accompany the blood volume changes. The ROI averaging method proposed by Poh et al. (2010) has been shown by several studies to be effective for PPG extraction from facial videos. We therefore compared the proposed technique against the ROI averaging method in terms of SNR, which was obtained using a modified periodogram based on the Kaiser window. Figure 8 shows the average SNR values of the PPG extracted with both techniques for different devices and operation postures. The higher SNR values indicate that the image magnification and processing technique proposed in MobiEye reduces noise and extracts PPG of substantially improved quality. Figure 9 illustrates the PPG extracted after each step of the proposed algorithm. The raw PPG signal is considerably enhanced by image magnification and further improved by the signal processing steps.
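To make the SNR metric more concrete, the sketch below estimates an in-band SNR from a Kaiser-windowed (modified) periodogram using only the Python standard library. It is not the authors' MATLAB code. The window parameter `beta`, the 0.7–4 Hz heart-rate band, the 0.2 Hz half-width of the "signal" region around the spectral peak, and all function names are assumptions chosen for illustration.

```python
import cmath
import math
import random


def bessel_i0(x):
    """Zeroth-order modified Bessel function, summed from its power series."""
    total = term = 1.0
    k = 1
    while term > 1e-16 * total:
        term *= (x / (2.0 * k)) ** 2
        total += term
        k += 1
    return total


def kaiser_window(n, beta):
    """Kaiser window of length n with shape parameter beta."""
    if n == 1:
        return [1.0]
    denom = bessel_i0(beta)
    window = []
    for i in range(n):
        r = 2.0 * i / (n - 1) - 1.0
        window.append(bessel_i0(beta * math.sqrt(max(0.0, 1.0 - r * r))) / denom)
    return window


def band_snr_db(signal, fs, beta=8.0, band=(0.7, 4.0), half_width=0.2):
    """Return (SNR in dB, peak frequency in Hz) inside the given band.

    Signal power: bins within half_width of the strongest in-band bin.
    Noise power: all other in-band bins. The periodogram scaling is
    omitted because it cancels in the ratio.
    """
    n = len(signal)
    mean = sum(signal) / n
    window = kaiser_window(n, beta)
    tapered = [(s - mean) * w for s, w in zip(signal, window)]
    freqs, power = [], []
    for k in range(n // 2 + 1):
        f = k * fs / n
        if band[0] <= f <= band[1]:
            coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                        for i, v in enumerate(tapered))
            freqs.append(f)
            power.append(abs(coeff) ** 2)
    peak = freqs[power.index(max(power))]
    sig = sum(p for f, p in zip(freqs, power) if abs(f - peak) <= half_width)
    noise = sum(p for f, p in zip(freqs, power) if abs(f - peak) > half_width)
    snr = float("inf") if noise == 0 else 10.0 * math.log10(sig / noise)
    return snr, peak


# Synthetic 10 s pulse-like trace at 30 Hz sampling: 1.2 Hz is 72 beats/min.
fs = 30.0
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 1.2 * i / fs) for i in range(300)]
noisy = [c + rng.gauss(0.0, 1.0) for c in clean]
snr_clean, peak_clean = band_snr_db(clean, fs)
snr_noisy, peak_noisy = band_snr_db(noisy, fs)
print(peak_clean, snr_clean > snr_noisy)
```

Under these assumptions, a cleaner extraction concentrates more in-band power around the pulse frequency and therefore yields a higher value, which is the sense in which the two extraction methods are compared in Fig. 8.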

Fig. 10 HR measurements with different devices for each subject

Fig. 11 Bee swarm plots for HR

Fig. 12 Accuracy (%) for HR extracted from videos and reference HR data from BP device

Fig. 13 Correlation between averaged estimated and ground truth HR

Fig. 14 Comparison of surrounding light intensity (illuminance) and estimated HR during pilot study

Fig. 15 Bland-Altman plot for HR measurements with different devices for each subject

HR monitoring

The heat map in Fig. 10 visualizes the HR measurements for each subject using different devices and postures along with the averaged ground truth. Each color represents an HR value, and a similar color pattern for each subject across all the devices demonstrates the reliability of HR monitoring for videos recorded with the proposed protocol. Figure 11 shows a beeswarm boxplot of the HR results, with the median and range of the HR measurements for each device and posture, along with the averaged ground truth. The similar median lines (in red) indicate the consistent performance of the proposed technique on videos recorded under different experimental conditions.

The HR values estimated from the videos are compared with the ground truth results recorded with the Shimmer optical probe and the Omron BP monitoring device. Figure 12 shows the averaged percentage accuracy of the HR values extracted from the videos for all the devices with different recording angles. The accuracy ranges between 81 and 100% across all the devices. Figure 13 shows the correlation between the HR estimated with the camera and the ground truth HR. Figure 15 shows the Bland-Altman plots for the HR extracted with the proposed technique against the reference HR data for each recording condition. Almost all of the data fall within the 95% limits of agreement (\(\pm 1.96\) SD), indicating the robustness of the HR estimation under the given conditions.
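The two agreement measures used here can be sketched in a few lines of Python. The text does not define the percentage accuracy, so the formula below (one minus the relative absolute error) is an assumption, and the HR values are made-up numbers used only to exercise the code.

```python
import statistics


def hr_accuracy_percent(estimated, reference):
    """Assumed definition: 100 * (1 - |estimated - reference| / reference)."""
    return 100.0 * (1.0 - abs(estimated - reference) / reference)


def bland_altman(estimated, reference):
    """Return (bias, lower limit, upper limit) with limits at bias +/- 1.96 SD."""
    diffs = [e - r for e, r in zip(estimated, reference)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical camera-based and reference HR readings in beats per minute.
est = [72.0, 65.0, 80.0, 90.0, 58.0]
ref = [70.0, 66.0, 82.0, 88.0, 60.0]

accuracies = [hr_accuracy_percent(e, r) for e, r in zip(est, ref)]
bias, lower, upper = bland_altman(est, ref)
inside = sum(lower <= e - r <= upper for e, r in zip(est, ref))
print(round(statistics.mean(accuracies), 2), round(bias, 2), inside)
```

A Bland-Altman plot then shows each difference against the mean of the paired readings, with horizontal lines at the bias and at the two limits.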

Surrounding light is another influential factor. The results of the pilot study are shown in Fig. 14, which compares the percentage accuracy of the HR estimated from videos recorded at different illumination levels for five subjects. The relatively consistent accuracy across illumination levels suggests that our approach is robust against changing light conditions.

Fig. 16 Histogram for systolic and diastolic BP

Fig. 17 PTT distribution across all the recorded videos

Fig. 18 Swarm plot for BP measurements with different devices for each subject along with averaged ground truth results

Fig. 19 Bland-Altman plot for BP measurements with different devices for each subject

BP estimation

We now compare the BP estimated with the proposed technique against the true BP. Figure 16 shows the distribution of SBP and DBP for each subject throughout the study, which represents a broad range of validation data. The PTT distributions shown in Fig. 17 indicate reliable extraction across all the recording conditions with little variation; apart from a few outliers, the outcomes are consistent throughout the study. Similarly, the swarm plot in Fig. 18 compares the SBP results with the averaged ground truth results. Although PTT is highly sensitive to several noise sources, the mean readings stay within a consistent range.

The Bland-Altman plot shown in Fig. 19 summarizes the BP results predicted with the proposed technique. Most of the SBP range is fitted well by the established linear dependency (i.e., with small differences centered around 0), with only a few more substantial disparities for larger SBP measurements. Despite these few discrepancies, the linear-mapping predictions based on the inverted PTT are within \(\pm 20\) mmHg of the recorded SBP.
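The linear mapping from inverted PTT to SBP can be illustrated with an ordinary least-squares fit of SBP against 1/PTT. The sketch below is a minimal version under that assumption; the calibration pairs are invented so that they lie exactly on a line, and the names `fit_line` and `predict_sbp` are hypothetical rather than taken from the MobiEye implementation.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_sbp(ptt_seconds, slope, intercept):
    """Map a pulse transit time to SBP through the fitted line on 1/PTT."""
    return slope * (1.0 / ptt_seconds) + intercept


# Invented calibration data: PTT in seconds and cuff SBP in mmHg.
ptt = [0.20, 0.25, 0.40, 0.50]
sbp = [130.0, 120.0, 105.0, 100.0]

slope, intercept = fit_line([1.0 / p for p in ptt], sbp)
estimate = predict_sbp(0.3125, slope, intercept)
print(round(slope, 3), round(intercept, 3), round(estimate, 3))
```

In practice the residuals between such predictions and the cuff readings are what a Bland-Altman plot like Fig. 19 displays.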

Discussion

In the previous section, we evaluated the effectiveness and robustness of the proposed MobiEye health monitoring system. The evaluation results indicate that the proposed approach can continuously obtain high-quality PPG that is highly consistent with the reference data, enabling remote monitoring of HR and BP. In this section, we further discuss several aspects of the proposed solution.

We started this study from the premise that the pixel intensities in each frame of the recorded video change with the subtle variations in BVP, which provides an opportunity to extract PPG signals and, in particular, to implement an unobtrusive health monitoring platform. The importance of continuous health monitoring has been underlined in many prior studies, with the goal of detecting life-threatening events early from observable changes in vital signs. Traditional techniques used in clinical settings have several disadvantages and are not particularly suited to continuous health monitoring. Camera-based vital sign monitoring techniques can overcome these challenges and deliver a reliable and comfortable user experience. As described in Sect. 3, the proposed solution uses image magnification and signal processing techniques to optimize the PPG extraction process. Several studies have suggested different techniques, such as the RGB method, which averages the green-channel pixels over the selected ROI, and the ICA method, which uses the same spatial averaging followed by ICA for source separation. In comparison with the widely adopted ICA-based ROI averaging method, our approach achieves a higher SNR. Because the interaction of the incident light with the underlying tissue structures is not fully understood, noise from different sources, including subjects’ motions, can influence the recorded data and even largely mask the source signals. Accordingly, the image pyramid technique does not necessarily represent the true color changes with BVP in full. The plots in Fig. 9 offer insight into the signal quality after each processing step. The raw signal is extracted as a simple average of the selected ROI and carries little meaningful information. Spatial filtering can recover useful data from the extracted raw signal; however, the signal of interest will not be revealed if the filter range is not selected appropriately.
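The ROI-averaging baseline mentioned above, in which the raw signal is a simple average over the selected ROI, can be sketched as follows. Frames are represented as nested lists of (r, g, b) tuples so the example runs without external libraries. The ROI layout `(top, left, height, width)`, the mean removal, and the synthetic frames are assumptions for illustration only.

```python
def roi_green_mean(frame, roi):
    """Mean green value inside roi = (top, left, height, width).

    frame is a list of rows, each row a list of (r, g, b) tuples.
    """
    top, left, height, width = roi
    total = 0.0
    for row in frame[top:top + height]:
        for _, green, _ in row[left:left + width]:
            total += green
    return total / (height * width)


def raw_ppg(frames, roi):
    """One sample per frame: ROI-averaged green intensity with the mean removed."""
    trace = [roi_green_mean(frame, roi) for frame in frames]
    baseline = sum(trace) / len(trace)
    return [value - baseline for value in trace]


def make_frame(green, size=4):
    """Hypothetical uniform frame whose green level stands in for skin color."""
    return [[(0, green, 0) for _ in range(size)] for _ in range(size)]


# Five synthetic frames whose green level rises and falls like one pulse.
frames = [make_frame(g) for g in (100, 102, 104, 102, 100)]
signal = raw_ppg(frames, roi=(1, 1, 2, 2))
print(signal)
```

A trace of this kind is the starting point that the later filtering steps operate on, which is why the choice of filter range matters so much for revealing the pulse component.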

We want to draw attention to the practical challenges of health monitoring techniques. Several factors must be considered when applying camera-based physiological monitoring, such as frame resolution, frame rate, ambient light condition, and camera sensor. Moreover, since these systems are meant to be used in the day-to-day life of end-users, the techniques must be validated on data collected under a protocol that emulates real-world scenarios. The primary use of this technique is likely to be in home environments, incorporated into smartphones, tablets, and personal computers. The influential factors taken into consideration are therefore rooted in the users’ interactions with the devices, e.g., the angle between face and camera, the distance between ROI and camera, and the size of the device. The takeaway is that reliable performance in real-world tasks depends on data collection of sustained quality.

As discussed earlier, the nature of incident light interaction with skin and underlying tissue limits the accurate modeling of BVP estimation, so camera-based approaches may not fully capture the source signals. This study also has several limitations. With regard to the usefulness of the proposed approach for end-users, the most apparent limitation of this work is the small number of subjects at this stage. Furthermore, all the participants reported good cardiovascular health, which leaves the applicability of the proposed approach to patients with cardiovascular disease unexplored.

The protocol will be extended to include more subjects with a broader age range and more diverse cardiovascular conditions such as atrial fibrillation and hypertension. The study follows the standard measurement protocol for BP (average of three readings) with a commercial wrist BP device for the ground truth readings. Although these recordings are assumed to be accurate, such devices can be prone to errors; the averaged readings were taken as the ground truth to minimize them. Other sources of error that have not been considered include sudden notifications or calls on the recording devices and involuntary motion artifacts in a crowded place. The data collection protocol will be modified in the future to cover these scenarios. Our previous work provided evidence supporting the pyramid technique for long-term evaluation of HR extracted from videos recorded over a month. Assessments over a much longer period are still needed to demonstrate the robustness and durability of the MobiEye system.

It is anticipated that MobiEye, when implemented in miniaturized devices, could serve as an efficient and convenient data collection module in clinical settings, allowing professionals to monitor patients remotely. Long-term data collection can provide useful metrics for assessing an individual’s evolving health status.

Conclusion

We present MobiEye, a novel mobile-based pervasive health monitoring system that takes advantage of the regular camera available in popular mobile devices to continuously record multiple vital signs from facial videos. MobiEye offers advantages in terms of ease of use, low cost, no need for professionals, and minimal calibration and maintenance. Although many prior studies have explored camera-based vital sign monitoring, our solution shows better signal quality with the proposed image and signal processing framework, and it is readily applicable to other mobile devices on the market. To evaluate the effectiveness and robustness of the proposed algorithm in real-life scenarios, videos were recorded under generalized conditions, i.e., with the cameras of different devices (including laptops, smartphones, and tablets), in multiple standard device-handling positions, and under varying surrounding light conditions. The in-the-wild validations demonstrated the extraction of high-quality, robust PPG data for physiological data estimation. Our work therefore suggests a feasible and effective alternative for ubiquitous, continuous monitoring of users’ health data while offering users a high level of flexibility and comfort in home environments.

References

  1. Abuella, H., Ekin, S.: Wireless vital signs monitoring system using visible light sensing (vls). arXiv preprint arXiv:1807.05408 (2018)

  2. Angelopoulou, E.: Understanding the color of human skin. In: Human vision and electronic imaging VI, International Society for Optics and Photonics, vol. 4299, pp. 243–252 (2001)

  3. Aoyagi, T.: Improvement of the earpiece oximeter. Abstr. Jpn. Soc. Med. Electron. Biol. Eng. 1974, 90–91 (1974)

  4. Baranoski, G.V.G., Krishnaswamy, A.: Light interaction with human skin: From believable images to predictable models. In: ACM SIGGRAPH ASIA 2008 Courses, SIGGRAPH Asia’08. ACM (2008)

  5. Barun, V.V., Ivanov, A., Volotovskaya, A., Ulashchik, V.: Absorption spectra and light penetration depth of normal and pathologically altered human skin. J. Appl. Spectrosc. 74(3), 430–439 (2007)

  6. Burt, P., Adelson, E.: The Laplacian pyramid as a compact image code. IEEE Trans. Commun. 31(4), 532–540 (1983)

  7. Chandrasekhar, A., Natarajan, K., Yavarimanesh, M., Mukkamala, R.: An iPhone application for blood pressure monitoring via the oscillometric finger pressing method. Sci. Rep. 8(1), 13136 (2018)

  8. Chekmenev, S.Y., Rara, H., Farag, A.A.: Non-contact, wavelet-based measurement of vital signs using thermal imaging. In: The first international conference on graphics, vision, and image processing (GVIP), pp. 107–112 (2005)

  9. Chen, K.M., Misra, D., Wang, H., Chuang, H.R., Postow, E.: An X-band microwave life-detection system. IEEE Trans. Biomed. Eng. 7, 697–701 (1986)

  10. Downey, C., Chapman, S., Randell, R., Brown, J., Jayne, D.G.: The impact of continuous versus intermittent vital signs monitoring in hospitals: a systematic review and narrative synthesis. Int. J. Nurs. Stud. 84, 19–27 (2018)

  11. Fuchs, M., Chen, T., Wang, O., Raskar, R., Seidel, H.P., Lensch, H.P.: Real-time temporal shaping of high-speed video streams. Comput. Graph. 34(5), 575–584 (2010)

  12. Gonzalez, R.C., Woods, R.E.: Image processing. Dig. Image Proces. 2, 1 (2007)

  13. Hyvärinen, A., Karhunen, J., Oja, E.: Independent Component Analysis, vol. 46. Wiley, Hoboken (2004)

  14. Jeong, I.C., Ko, J.I., Hwang, S.O., Yoon, H.R.: A new method to estimate arterial blood pressure using photoplethysmographic signal. In: 2006 international conference of the IEEE engineering in medicine and biology society, IEEE, pp. 4667–4670 (2006)

  15. Ji, X., Cheng, J., Tao, D., Wu, X., Feng, W.: The spatial Laplacian and temporal energy pyramid representation for human action recognition using depth sequences. Knowl. Based Syst. 122, 64–74 (2017)

  16. Jin, F., Ioannis, P.: Thermistor at a distance: unobtrusive measurement of breathing. IEEE Trans. Biomed. Eng. 57(4), 988–998 (2010)

  17. Kwon, S., Kim, H., Park, K.S.: Validation of heart rate extraction using video imaging on a built-in camera system of a smartphone. In: 2012 annual international conference of the IEEE engineering in medicine and biology society, IEEE, pp. 2174–2177 (2012)

  18. Lewandowska, M., Rumiński, J., Kocejko, T., Nowak, J.: Measuring pulse rate with a webcam—a non-contact method for evaluating cardiac activity. In: 2011 federated conference on computer science and information systems (FedCSIS), IEEE, pp. 405–410 (2011)

  19. Li, C., Lin, J.: Recent advances in Doppler radar sensors for pervasive healthcare monitoring. In: 2010 Asia-Pacific microwave conference, IEEE, pp. 283–290 (2010)

  20. Li, L., Ng, C.S.L.: A physically-based human skin reflection model. In: WSEAS International Conference. Proceedings. Mathematics and Computers in Science and Engineering, 10. World Scientific and Engineering Academy and Society (2009)

  21. Lin, J.C.: Noninvasive microwave measurement of respiration. Proc. IEEE 63(10), 1530–1530 (1975)

  22. Magdalena Nowara, E., Marks, T.K., Mansour, H., Veeraraghavan, A.: SparsePPG: Towards driver monitoring using camera-based vital signs estimation in near-infrared. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp. 1272–1281 (2018)

  23. Monkaresi, H., Calvo, R.A., Yan, H.: A machine learning approach to improve contactless heart rate monitoring using a webcam. IEEE J. Biomed. Health Inf. 18(4), 1153–1160 (2013)

  24. Mousavi, S.S., Firouzmand, M., Charmi, M., Hemmati, M., Moghadam, M., Ghorbani, Y.: Blood pressure estimation from appropriate and inappropriate ppg signals using a whole-based method. Biomed. Signal Process. Control 47, 196–206 (2019)

  25. Muehlsteff, J., Aubert, X., Schuett, M.: Cuffless estimation of systolic blood pressure for short effort bicycle tests: the prominent role of the pre-ejection period. In: 2006 international conference of the IEEE engineering in medicine and biology society, IEEE, pp. 5088–5092 (2006)

  26. Mukkamala, R., Hahn, J.O.: Toward ubiquitous blood pressure monitoring via pulse transit time: predictions on maximum calibration period and acceptable error limits. IEEE Trans. Biomed. Eng. 65(6), 1410–1420 (2017)

  27. Patil, O.R., Gao, Y., Li, B., Jin, Z.: CamBP: a camera-based, non-contact blood pressure monitor. In: Proceedings of the 2017 ACM international joint conference on pervasive and ubiquitous computing and proceedings of the 2017 ACM international symposium on wearable computers, ACM, pp. 524–529 (2017)

  28. Poh, M.Z., McDuff, D.J., Picard, R.W.: Non-contact, automated cardiac pulse measurements using video imaging and blind source separation. Opt. Express 18(10), 10762–10774 (2010)

  29. Shao, L., Zhen, X., Tao, D., Li, X.: Spatio-temporal Laplacian pyramid coding for action recognition. IEEE Trans. Cybern. 44(6), 817–827 (2013)

  30. Shi, J., Tomasi, C.: Good features to track. In: Cornell University, Tech. rep. (1993)

  31. Sun, N., Garbey, M., Merla, A., Pavlidis, I.: Imaging the cardiovascular pulse. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05), IEEE, vol. 2, pp. 416–421 (2005)

  32. Sun, G., Matsui, T., Watai, Y., Kim, S., Kirimoto, T., Suzuki, S., Hakozaki, Y.: Vital-SCOPE: design and evaluation of a smart vital sign monitor for simultaneous measurement of pulse rate, respiratory rate, and body temperature for patient monitoring. J Sensors 2018, 5 (2018)

  33. Suzuki, S., Matsui, T., Kagawa, M., Asao, T., Kotani, K.: An approach to a non-contact vital sign monitoring using dual-frequency microwave radars for elderly care. J. Biomed. Sci. Eng. 6(07), 704 (2013)

  34. Talreja, P.S., Kasting, G.B., Kleene, N.K., Pickens, W.L., Wang, T.F.: Visualization of the lipid barrier and measurement of lipid pathlength in human stratum corneum. Aaps Pharmsci. 3(2), 48–56 (2001)

  35. Tomasi, C., Kanade, T.: Detection and tracking of point features. Tech. Rep. CMU-CS-91-132, Carnegie Mellon University (1991)

  36. Tuchin, V.: Tissue Optics: Light Scattering Methods and Instruments for Medical Diagnostics. SPIE Press, Washington, DC (2015)

  37. Villarroel, M., Jorge, J., Pugh, C., Tarassenko, L.: Non-contact vital sign monitoring in the clinic. In: 2017 12th IEEE international conference on automatic face & gesture recognition (FG 2017), IEEE, pp. 278–285 (2017)

  38. Vinci, G., Lenhard, T., Will, C., Koelpin, A.: Microwave interferometer radar-based vital sign detection for driver monitoring syst. In: 2015 IEEE MTT-S international conference on microwaves for intelligent mobility (ICMIM), IEEE, pp. 1–4 (2015)

  39. Viola, P., Jones, M., et al.: Rapid object detection using a boosted cascade of simple features. CVPR 1(1), 511–518 (2001)

  40. Wieringa, F.P., Mastik, F., van der Steen, A.F.: Contactless multiple wavelength photoplethysmographic imaging: a first step toward “SpO2 camera” technology. Ann. Biomed. Eng. 33(8), 1034–1041 (2005)

  41. Wu, H.Y., Rubinstein, M., Shih, E., Guttag, J., Durand, F., Freeman, W.: Eulerian video magnification for revealing subtle changes in the world. ACM Trans. Graph. 31, 4 (2012)

  42. Xing, X., Sun, M.: Optical blood pressure estimation with photoplethysmography and FFT-based neural networks. Biomed. Opt. Express 7(8), 3007–3020 (2016)

  43. Zhang, Q., Zeng, X., Hu, W., Zhou, D.: A machine learning-empowered system for long-term motion-tolerant wearable monitoring of blood pressure and heart rate with ear-ECG/PPG. IEEE Access 5, 10547–10561 (2017)

  44. Zhao, F., Li, M., Jiang, Z., Tsien, J.Z., Lu, Z.: Camera-based, non-contact, vital-signs monitoring technology may provide a way for the early prevention of SIDS in infants. Front. Neurol. 7, 236 (2016)


Author information

Corresponding author

Correspondence to Zhanpeng Jin.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Patil, O., Wang, W., Gao, Y. et al. MobiEye: turning your smartphones into a ubiquitous unobtrusive vital sign monitoring system. CCF Trans. Pervasive Comp. Interact. (2020). https://doi.org/10.1007/s42486-020-00033-3

Keywords

  • Vital signs
  • Health monitoring
  • ECG
  • PPG
  • Heart rate
  • Blood pressure