The diagnosis of sleep apnea syndrome is performed in sleep clinics by a full-night polysomnography (PSG) recording. A PSG system comprises multiple sensors, including two respiratory inductance plethysmography (RIP) belts over the thorax and abdomen to measure the respiratory effort. According to the American Academy of Sleep Medicine (AASM) Scoring Manual v2.4, the respiratory effort signals measured by the two belts are to be used to determine the apnea types. The signals are inspected for continuous or increased breathing effort (obstructive), absence of breathing effort (central), or a combination of both (mixed). Additionally, when airflow signals from the primary sensors, such as nasal pressure cannula or nasal thermistor, are not available or are unreliable, the RIPSum—the sum of the thorax and abdomen RIP belt signals—can be used for detection of respiratory events [1]. RIPSum is listed as an alternative because there is an absence or a reduction in respiratory effort during apneas and hypopneas [2,3,4].

The drawback of RIP belts is that they are attached to the body and connected by cables. These belts and all the other sensors included in a standard PSG are uncomfortable and may contribute to the first night effect (FNE), a term used to describe the loss of sleep quality during PSG recordings [5]. Also, the signals from respiratory belts are often distorted and more susceptible to signal loss when patients are moving [2, 6]. To address these drawbacks, alternative sensors have been introduced to measure respiratory effort for the detection of sleep apnea syndrome. For example, two Doppler radars were placed under the mattress to measure vibrations to detect apnea and hypopnea [7]. With Doppler radars, 94% correlation was achieved with PSG scoring for the respiratory disturbance index (RDI). RDI reports respiratory events during sleep, including respiratory effort-related arousals. The Sonomat system (Sonomat Pty Ltd, Balmain, NSW, Australia), on the other hand, uses four electrodes in the mattress foam in locations near the abdomen and thorax [8]. The validation study of the Sonomat system reported a correlation with PSG of higher than 0.89 for the apnea–hypopnea index (AHI). Another alternative system is the SleepMinder™ (Biancamed Ltd, Dublin, Ireland), which uses an ultra-low-power radiofrequency transceiver to record respiration [9]. The AHI derived from SleepMinder was shown to have a 91% correlation with PSG scoring.

Another alternative sensor to monitor respiration during sleep is a contactless three-dimensional (3D) time-of-flight (TOF) camera. A 3D TOF camera measures the distance of objects in its field of view with a given spatial resolution and frame rate. The use of 3D cameras for medical applications is not entirely new; previous investigations have used depth cameras to monitor body movements [10,11,12,13]. To our knowledge, depth cameras for the investigation of respiratory events were first introduced by Falie and Ichim [14], but no test results from overnight sleep recordings were provided.

We used a respiratory effort signal derived from a 3D TOF camera to monitor respiration during sleep. Recording at 30 frames/s, the camera captured the depth changes caused by the movement of the abdomen and thorax during breathing. We investigated whether this respiratory effort signal derived from a 3D camera is comparable to RIP belt signals. We performed the following comparisons between the two respiratory effort signals according to how RIP belts are used in the AASM Scoring Manual:

  • absence or presence of effort for the classification of apneas into obstructive, central, or mixed

  • decrease in effort for the scoring of apnea or hypopnea events when primary sensors are unavailable.

Materials and methods

Study design and investigation methods

In this study, 52 full-night PSG and 3D camera recordings of patients with suspected sleep apnea syndrome from the Advanced Sleep Research GmbH (Berlin Clinic) and Kepler University Clinic (Linz Clinic) were included. The median age of the patients was 57 years, the youngest 25 and the oldest 80 years. There were 20 female patients. Median body mass index (BMI) was 28.5 kg/m² (range 21.8–45.4 kg/m²). This study was approved by the ethics committee of the state of Upper Austria (B-130-17) and the ethics committee of the Charité – Universitätsmedizin Berlin (EA1/127/16).

The 3D camera (KINECT V2; Microsoft, Redmond, WA, USA) setup was integrated with the PSG to synchronize recording time. The camera was mounted above the bed so that it captured the whole bed and the patient.

Apnea and hypopnea events

Overall, 639 apnea and 2235 hypopnea events were manually scored according to the AASM Scoring Manual v2.4 and using the 1B rule for hypopneas. The scoring task was carried out in our laboratory, with consultations with sleep experts from the Medical University of Vienna. These annotations served as ground truth for our investigations. From the annotations, there are 7 recordings with healthy AHI (<5), 15 recordings with mildly disturbed AHI (5–15), 9 recordings with moderately disturbed AHI (15–30), and 21 with severely disturbed AHI (>30). The severity category of AHI was based on the recommendation of the AASM Task Force [15]. The AHI was computed using the total sleep time from the PSG’s hypnogram.
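The AHI computation and severity grouping described above can be illustrated with a minimal sketch. The function names and the example numbers are hypothetical, and the handling of values that fall exactly on a category boundary (5, 15, 30) is an assumption, since the text only states the ranges.

```python
def ahi(n_apneas: int, n_hypopneas: int, total_sleep_time_h: float) -> float:
    """Apnea-hypopnea index: scored events per hour of sleep."""
    if total_sleep_time_h <= 0:
        raise ValueError("total sleep time must be positive")
    return (n_apneas + n_hypopneas) / total_sleep_time_h


def ahi_severity(index: float) -> str:
    """Map an AHI value to the severity groups named in the text.

    Assumption: boundary values are assigned as 5 -> mild, 15 -> moderate,
    30 -> moderate. The text only gives <5, 5-15, 15-30 and >30.
    """
    if index < 5:
        return "healthy"
    if index < 15:
        return "mild"
    if index <= 30:
        return "moderate"
    return "severe"


# Hypothetical recording: 12 apneas and 48 hypopneas in 6.5 h of sleep.
example_ahi = ahi(12, 48, 6.5)
print(round(example_ahi, 1), ahi_severity(example_ahi))
```

The total sleep time would come from the PSG hypnogram, as stated above; here it is simply passed in as hours.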

As we are only concerned with apnea and hypopnea events, the term respiratory events in this paper refers only to these two unless stated otherwise.

RespEffort3D

RespEffort3D is the respiratory effort signal derived from the depth signal of the 3D camera. The relevant pixels over the thorax and abdomen region were selected automatically and the depth signals of the selected pixels were then averaged per frame to create the respiratory effort signal.
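The per-frame averaging step can be sketched as follows. This is only an illustration under stated assumptions: the pixel mask is taken as given (the automatic selection of thorax and abdomen pixels is not described here and is not reproduced), and the frame data, type aliases, and function name are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[float]]  # depth image: rows of per-pixel distances
Mask = Sequence[Sequence[bool]]    # True where a pixel lies over thorax/abdomen


def resp_effort_from_depth(frames: Sequence[Frame], mask: Mask) -> List[float]:
    """Average the depth of the masked pixels in every frame.

    Assumption: the mask is supplied by some earlier selection step that
    is outside this sketch.
    """
    selected = [(r, c) for r, row in enumerate(mask)
                for c, keep in enumerate(row) if keep]
    if not selected:
        raise ValueError("mask selects no pixels")
    return [sum(frame[r][c] for r, c in selected) / len(selected)
            for frame in frames]


# Three hypothetical 2x3 depth frames (metres): the masked region moves
# towards the camera in the second frame, the unmasked column stays static.
mask = [[False, True, True],
        [False, True, True]]
frames = [
    [[2.0, 1.50, 1.50], [2.0, 1.50, 1.50]],
    [[2.0, 1.49, 1.49], [2.0, 1.48, 1.48]],
    [[2.0, 1.50, 1.50], [2.0, 1.50, 1.50]],
]
signal = resp_effort_from_depth(frames, mask)
```

At 30 frames/s, one such value per frame yields a respiratory effort signal sampled at 30 Hz.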

Classification of apnea types

The respiratory effort signals monitored by the abdomen and thorax RIP belts were used to classify apneas into three categories: obstructive, central, and mixed. We show that the RespEffort3D signal can be used to classify the events, with the classification by RIP belts serving as the ground truth. The RespEffort3D and the RIP belt signals were visually inspected to classify the apnea events. Accuracy and per-type classification rates were computed against this ground truth. In total, 618 apnea events were tested; the remaining 21 apnea events were excluded because body movements made the effort signals unreliable.
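How accuracy and per-type classification rate follow from two label lists can be shown with a short sketch. The labels below are invented for illustration and are not study data; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def classification_rates(truth: Sequence[str],
                         predicted: Sequence[str]) -> Tuple[float, Dict[str, float]]:
    """Return overall accuracy and the fraction classified correctly per type."""
    if len(truth) != len(predicted) or not truth:
        raise ValueError("label lists must be non-empty and of equal length")
    totals = Counter(truth)
    correct = Counter(t for t, p in zip(truth, predicted) if t == p)
    accuracy = sum(correct.values()) / len(truth)
    per_type = {label: correct[label] / n for label, n in totals.items()}
    return accuracy, per_type


# Hypothetical labels: ground truth from the RIP belts, prediction from
# the camera-derived signal.
truth = ["obstructive"] * 4 + ["central"] * 3 + ["mixed"] * 3
predicted = (["obstructive"] * 3 + ["central"]      # one obstructive missed
             + ["central"] * 3                      # all central correct
             + ["mixed"] + ["central"] * 2)         # two mixed missed
accuracy, per_type = classification_rates(truth, predicted)
```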

Effort signals during respiratory events

According to the AASM Scoring Manual, RIPSum can be used as an alternative sensor when the primary sensors are not working or unreliable. RIPSum typically exhibits a decrease in signal during an apnea and hypopnea, which is the basis for scoring. In hypopneas, a 50% decrease in thoracoabdominal movements is usually observed and the decrease in apneas is usually larger [3, 16]. Consequently, we expect RespEffort3D to do the same, so that it can be used to score events, too. In order to compare RespEffort3D to RIPSum, we computed the size of the decrease in effort before and during the event for both signals. This is achieved by calculating a feature we call the effort reduction factor (ERF). ERF is the ratio of the effort during the event compared to the effort before the event. It is calculated for every apnea and hypopnea by the following steps:

  1. Create an event segment of the respiratory effort during an apnea or hypopnea episode.

  2. Create a baseline segment from the 15 s segment right before the event. If an earlier event occurs within this 15 s segment, the baseline segment is shortened so that it runs from the end of the earlier event to the start of the current event. The 15 s duration was chosen as we found this to be the optimal segment length to estimate the pre-event baseline.

  3. For each segment, calculate the effort strength from the average peak-to-peak amplitude of the effort.

  4. Calculate the ERF from the ratio of the event effort strength to the baseline strength.
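The four steps above can be sketched in code. This is a minimal illustration and not the implementation used in the study: the extremum detection is a simple direction-change rule chosen for brevity (a real signal would likely need smoothing first), and all function and parameter names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def local_extrema(x: Sequence[float]) -> List[float]:
    """Values at which the signal changes direction (alternating peaks/troughs)."""
    extrema, prev_sign = [], 0
    for i in range(1, len(x)):
        diff = x[i] - x[i - 1]
        sign = (diff > 0) - (diff < 0)
        if sign == 0:
            continue  # flat stretch: keep the previous direction
        if prev_sign != 0 and sign != prev_sign:
            extrema.append(x[i - 1])
        prev_sign = sign
    return extrema


def effort_strength(segment: Sequence[float]) -> float:
    """Step 3: average peak-to-peak amplitude of a segment."""
    ext = local_extrema(segment)
    if len(ext) < 2:  # assumption: fall back to the full range
        return max(segment) - min(segment) if segment else 0.0
    swings = [abs(b - a) for a, b in zip(ext, ext[1:])]
    return sum(swings) / len(swings)


def effort_reduction_factor(signal: Sequence[float], fs: float,
                            event_start_s: float, event_end_s: float,
                            prev_event_end_s: Optional[float] = None,
                            baseline_s: float = 15.0) -> float:
    """Steps 1, 2 and 4: event segment, (shortened) baseline, ratio."""
    baseline_start_s = max(0.0, event_start_s - baseline_s)
    if prev_event_end_s is not None:
        # Shorten the baseline if an earlier event ends inside the window.
        baseline_start_s = max(baseline_start_s, prev_event_end_s)
    b0 = int(round(baseline_start_s * fs))
    e0 = int(round(event_start_s * fs))
    e1 = int(round(event_end_s * fs))
    baseline = effort_strength(signal[b0:e0])
    if baseline == 0:
        raise ValueError("baseline segment shows no effort")
    return effort_strength(signal[e0:e1]) / baseline


# Synthetic breathing at 15 breaths/min sampled at 30 Hz; the amplitude
# drops to 40% during a hypothetical event from 20 s to 35 s.
fs = 30.0
signal = [(1.0 if i / fs < 20 else 0.4) * math.sin(2 * math.pi * 0.25 * i / fs)
          for i in range(int(40 * fs))]
erf_value = effort_reduction_factor(signal, fs, 20.0, 35.0)
erf_short = effort_reduction_factor(signal, fs, 20.0, 35.0, prev_event_end_s=12.0)
```

In this synthetic case both calls give a ratio of about 0.4, matching the amplitude drop built into the signal. A library peak detector such as `scipy.signal.find_peaks` could replace `local_extrema` if SciPy is available.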

As such, an ERF value greater than 1 means there is an increase in effort during the event segment compared to the baseline segment, while a value of less than 1 means a decrease in effort during the event. The ERFs calculated for both RespEffort3D and RIPSum for all respiratory events, excluding events with an unreliable effort signal, were tested for correlation using Pearson’s correlation coefficient (Pearson’s r). The significance of the correlation was evaluated at p = 0.0001.

To assess the ability of scoring by RespEffort3D, a simple algorithm to find events of reduced effort in RIPSum and RespEffort3D was used. These reduced-effort events are candidate respiratory events, as further rules are needed according to the AASM manual. The reduced efforts from the two sensors were compared against each other. They were also compared to the annotated respiratory events. Pearson’s correlation coefficient was used to measure the correlation.
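Since the simple algorithm is not specified further, the following sketch shows one possible way such candidate events could be found and the per-recording counts correlated. Everything specific here is an assumption made for illustration: the amplitude envelope as input, the fixed baseline, the 50% threshold, the 10 s minimum duration as a default, and all names and example numbers.

```python
import math
from typing import List, Sequence, Tuple


def reduced_effort_events(envelope: Sequence[float], fs: float, baseline: float,
                          threshold: float = 0.5,
                          min_duration_s: float = 10.0) -> List[Tuple[float, float]]:
    """Find runs where the envelope stays below threshold * baseline.

    Returns (start_s, end_s) for every run lasting at least min_duration_s.
    Assumption: threshold and baseline are illustrative, not from the study.
    """
    events, run_start = [], None
    # A trailing +inf sentinel closes a run that reaches the end of the data.
    for i, value in enumerate(list(envelope) + [float("inf")]):
        below = value < threshold * baseline
        if below and run_start is None:
            run_start = i
        elif not below and run_start is not None:
            if (i - run_start) / fs >= min_duration_s:
                events.append((run_start / fs, i / fs))
            run_start = None
    return events


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical 1 Hz envelope: a 12 s reduction (kept) and a 5 s one (too short).
envelope = [1.0] * 20 + [0.3] * 12 + [1.0] * 10 + [0.3] * 5 + [1.0] * 5
events = reduced_effort_events(envelope, fs=1.0, baseline=1.0)

# Hypothetical per-recording counts: detected candidates vs. annotated events.
candidates = [12, 40, 95, 160, 33]
annotated = [8, 31, 70, 150, 20]
r = pearson_r(candidates, annotated)
```

The detected runs are only candidates; as noted above, further AASM rules would be needed before they count as scored respiratory events.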

Results and discussion

This study is based on 52 full-night recordings including PSG and 3D camera signals. For each PSG recording, manual scoring was performed according to the AASM Scoring Manual v2.4. RespEffort3D was calculated for all recordings. RespEffort3D, as shown in Fig. 1, is similar to the RIPSum, nasal thermistor, and nasal pressure cannula signals from standard PSG during and between respiratory events, as it measures the respiration and reduction in respiration effort.

Fig. 1 Respiratory effort signal derived from the depth signal of the 3D camera (RespEffort3D) and polysomnography during and between respiratory events. Airflow_Th: signal from the nasal thermistor; Airflow_Pr: signal from the nasal pressure cannula; RIPSum: thorax/abdomen respiratory inductance plethysmography belts’ signal

Classification of apneas by RespEffort3D

A total of 618 apneas were classified by visual inspection of the thorax and abdomen RIP belt signals using the AASM rules, serving as the ground truth. The same set of apneas was classified by visual inspection using RespEffort3D. The results of this classification by RespEffort3D are shown in Table 1, with 80% of the apneas classified correctly by RespEffort3D. Breaking this down into the individual types, RespEffort3D performed very well in classifying central apneas, with only 1% error. An example of RespEffort3D showing an absence of respiratory effort during a central apnea event can be found in Fig. 2. For obstructive apneas, 79% were classified correctly. During an obstructive apnea event, a decrease in respiratory effort in RespEffort3D is commonly observed (see Fig. 3).

Table 1 Apnea classification by RespEffort3D, with RIPSum classification as ground truth
Fig. 2 Central apnea as seen in thorax/abdomen respiratory inductance plethysmography belts (RIPSum) and the respiratory effort signal derived from the depth signal of the 3D camera (RespEffort3D). Airflow_Th: signal from the nasal thermistor; Airflow_Pr: signal from the nasal pressure cannula

Fig. 3 Obstructive apnea as seen in thorax/abdomen respiratory inductance plethysmography belts (RIPSum) and the respiratory effort signal derived from the depth signal of the 3D camera (RespEffort3D). Airflow_Th: signal from the nasal thermistor; Airflow_Pr: signal from the nasal pressure cannula

Of the 87 misclassified obstructive apneas, 83 were classified as central and the remaining as mixed. Of the 32 misclassified mixed apneas, 28 were classified as central apneas and 4 as obstructive. This misclassification of obstructive and mixed apneas as central only happened in 14 recordings, and in three of these, misclassification is more prevalent. However, no distinguishable pattern can be seen among these recordings with respect to the patients’ BMI or their body position during sleep. One explanation for no effort seen in RespEffort3D for obstructive and mixed apnea is paradoxical breathing, which is the out-of-phase movement of the abdomen and thorax [4]. In order to improve the classification, it might be worthwhile to separate the 3D-derived effort signal into abdomen and thorax parts in future work.

For comparison, other classification studies reported a 90% test accuracy when the thorax effort signal was used to train a neural network model [17] and a 73.1% classification accuracy when midsagittal jaw motion was used [18]. In the current study, no algorithm has yet been developed for apnea classification using RespEffort3D, but visual inspection already indicates the capability of RespEffort3D, and an accuracy of 80% is a good indicator. To improve accuracy, separating RespEffort3D into its abdomen and thorax parts is important for better classification of mixed and obstructive events.

Effort signals during respiratory events

The ERFs of all apnea and hypopnea events were calculated for both RIPSum and RespEffort3D and compared to each other. Fig. 4 shows the ERF comparison between RIPSum and RespEffort3D, where each point corresponds to a respiratory event. Between the two respiratory effort signals, Pearson’s r is 0.62 (r ≠ 0, p = 0.0001). This suggests similar behavior during respiratory events and supports that RespEffort3D decreases in a way similar to RIPSum. The average decrease during hypopneas is 54% for RIPSum and 51% for RespEffort3D, which is not far off the reported 50% reduction in effort in the thoracoabdominal movement [16] and the reported 50% reduction in polyvinylidene fluoride belts’ sum (PVDFSum), another alternative sensor [3]. The average decrease in effort during apneas over all recordings is 72% for RIPSum and 73% for RespEffort3D. In the literature, a 90% reduction was reported for PVDFSum in apneas [3]. The decrease in RespEffort3D by less than 90% can be attributed to the continued breathing effort during obstructive apneas. In some events an increase is observed for both RespEffort3D and RIPSum; however, this is not surprising because, as specified in [1], continuous or increased inspiratory effort occurs in obstructive cases.

Fig. 4 Effort reduction factor (ERF) for all respiratory events between the respiratory effort signals derived from the depth signal of the 3D camera (RespEffort3D) and thorax/abdomen respiratory inductance plethysmography belts (RIPSum). Each point corresponds to a respiratory event (2235 hypopneas and 639 apneas). An ERF below 10^0 (= 1), i.e., log10(ERF) below 0, means a decrease in effort signal during respiratory events

To strengthen our claim that RespEffort3D can be used to score apneas and hypopneas, we applied a simple algorithm to RespEffort3D and RIPSum that finds events of reduced effort lasting more than 10 s. In Fig. 5, the number of RespEffort3D events per recording is plotted against the number of apnea and hypopnea events. There is a moderate correlation between the two, with Pearson’s r = 0.75 (r ≠ 0, p = 0.0001). Between RIPSum events and the annotated respiratory events there is a moderate correlation of r = 0.78 (r ≠ 0, p = 0.0001). Between RIPSum events and RespEffort3D events there is a strong correlation of r = 0.88 (r ≠ 0, p = 0.0001), see Fig. 6. These events of reduction in the effort signals serve as candidates for respiratory events, as further rules are needed to score hypopneas. This also explains why the number of reduced-effort events systematically overestimates the number of respiratory events. One limitation of this comparison between reduced-effort events and respiratory events (apneas and hypopneas combined) is that there is no distinction between apneas and hypopneas. Further investigation is necessary to identify the difference between apneas and hypopneas in RespEffort3D.

Fig. 5 Respiratory effort signal derived from the depth signal of the 3D camera (RespEffort3D) events vs. respiratory events. Each point corresponds to a single recording; the red line is x = y

Fig. 6 Respiratory effort signal derived from the depth signal of the 3D camera (RespEffort3D) events vs. thorax/abdomen respiratory inductance plethysmography belts (RIPSum) events. Each point corresponds to a single recording; the red line is x = y

Our results show that RespEffort3D decreases similarly to RIPSum during respiratory events. We found a strong correlation between the number of events of reduction in RespEffort3D and RIPSum. Taken together, this is a very strong indication that RespEffort3D can be used for the scoring of respiratory events, similarly to the RIPSum.

Conclusion and recommendation

We were able to derive a respiratory effort signal, RespEffort3D, from the recording of a 3D TOF camera mounted above the bed. RIPSum and RespEffort3D were shown to behave similarly in the two crucial tasks, i.e., classification of apneas and scoring of respiratory events. We can therefore recommend the use of 3D cameras as a replacement for the RIP belts for sleep apnea syndrome screening. This will lead to better sleep quality for the patients, as the uncomfortable RIP belt sensors can be omitted.

For future work, deriving separate abdomen and thorax signals from the 3D TOF camera is worth looking into to improve the apnea classification rate. Furthermore, encouraged by the strong signal drops during respiratory events, we are planning to investigate the descriptive power of the RespEffort3D signal for the detection of respiratory events when combined with other noninvasive sensors.