Background

Observing animals has been the hallmark approach of ethological studies. Often credited with formalizing observational sampling methods, Altmann [1] gave researchers a toolkit for sampling behavioural states and contexts in the field through comparable and repeatable measures of activity and interaction. Quantitative observational studies have been used to understand behaviour in a wide range of contexts, such as individual- or population-level foraging decisions [2, 3] and the mechanisms of conflict and conflict avoidance [4]. Comparative observations also allow examination of how behaviour varies over time, such as differences between day and night activities [5, 6], or across individuals, including personality types and consistent individual differences [7,8,9]. With the advancement of animal-borne data loggers, researchers have been able to extend behavioural studies to species that are typically very difficult to observe in the wild, such as marine mammals. More specifically, triaxial accelerometers have been used to infer behaviour remotely in pinnipeds when they are unobservable during trips to and from feeding aggregations [10,11,12,13,14] and during other at-sea activities [15, 16]. These accelerometry deployments often focus either on building coarse-scale activity budgets to resolve the energetics of foraging and diving, or on finer-scale event detection, such as head-striking behaviour, to infer the rate of prey consumption relative to energy expenditure at sea [17,18,19,20]. These studies tend to focus on species that adopt an income-breeding strategy during the reproductive period of their life history, in which they must regularly supplement their energy stores to maintain themselves and provision their pups, or on detecting and classifying behaviour outside of the reproductive period (e.g. [10, 14]). 
While accelerometers have been used extensively to study the behaviour of terrestrial animals (e.g. [21,22,23,24,25,26]), accelerometry research has rarely addressed the consequences of behaviour during the brief but important on-land portion of seal life history.

Machine learning has also become a popular tool for remotely classifying behaviour from accelerometers in a variety of species (e.g. [27,28,29,30]). While accelerometers offer a novel tool for capturing behaviour, the resulting data sets can quickly become too large to examine manually [31]. Supervised machine learning offers a way to overcome this. By using a period of time when the behaviour of an individual is known, a concurrent set of accelerometry data can be labelled and used to train a classification algorithm of choice, which can then classify behaviour outside of the observable period [31]. Many different algorithms are available for classification, ranging from simple linear discriminant analyses (e.g. [32]) and decision tree algorithms (e.g. [33]) to more advanced black-box approaches such as random forests (e.g. [24]), support vector machines (e.g. [27]), and artificial neural networks (e.g. [34]). Gaining access to individuals in order to build a training data set can often be challenging. Captive surrogates, with accelerometers mounted in the same way as on wild animals, have been used to train algorithms to classify the behaviour of their wild counterparts (e.g. [22, 23, 35, 36]). One such study noted, however, that captive surrogates may not exhibit behaviour in the same mechanistic fashion as those in the wild, which may lead to poor, yet undetectable, model performance when classifying unknown data from wild individuals [26]. Having access to behavioural information in a wild context is therefore key to ensuring that training data match those of a wild cohort of individuals, and such data will likely characterize behaviour more accurately when animals are out of sight.
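As a minimal sketch of the labelling step described above, the Python function below pairs video-decoded behaviour bouts with concurrent accelerometry, keeping only fixed-length windows that fall entirely within a single observed bout. The 1-s window, the hypothetical name `label_windows`, and the assumption that video and logger clocks are already synchronized are illustrative choices, not the procedure used in this study.

```python
import numpy as np


def label_windows(t, bouts, window_s=1.0):
    """Label fixed-length accelerometry windows from video-decoded bouts.

    t      : sorted sample timestamps in seconds (assumed synchronized with video)
    bouts  : list of (start_s, end_s, behaviour) tuples from the video ethogram
    Returns (first_sample, end_sample_exclusive, behaviour) for every window lying
    wholly inside one bout; windows straddling bouts or unobserved time are dropped.
    """
    t = np.asarray(t, dtype=float)
    n_windows = int((t[-1] - t[0]) // window_s)
    out = []
    for i in range(n_windows):
        w_start = t[0] + i * window_s
        w_end = w_start + window_s
        for b_start, b_end, behaviour in bouts:
            if b_start <= w_start and w_end <= b_end:
                i0 = int(np.searchsorted(t, w_start, side="left"))
                i1 = int(np.searchsorted(t, w_end, side="left"))
                out.append((i0, i1, behaviour))
                break
    return out


# Usage: 10 s of 25 Hz timestamps with two hypothetical observed bouts.
t = np.arange(0, 10, 1 / 25)
bouts = [(0.0, 4.0, "Rest"), (4.0, 6.5, "Alert")]
windows = label_windows(t, bouts)
```

The window from 6 s to 7 s straddles the end of the Alert bout and is therefore left unlabelled rather than guessed.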

During their 18-day period on shore, breeding female grey seals have fixed resources that they must allocate to maintaining themselves and their pup [3, 37,38,39]. Behavioural decisions and small fluctuations in activity likely affect this energetic allocation. Grey seals offer a good system in which to look at activity in detail, but visual observations to assess behaviour are limited to daylight hours. During the UK grey seal breeding season in the autumn, daylight may cover only about one-third of the daily cycle at best. Supervised machine learning algorithms would therefore be a powerful way to infer behaviour outside of this limited observable time. While many previous studies have evaluated the mechanics of behaviour at sea, the authors have been unable to find any published studies that attempt to resolve and classify lactation and breeding behaviour on land in grey seals or other pinnipeds using accelerometry (e.g. [40]). Accelerometry-derived activity will not only allow behaviour to be assessed overnight, an area of research that is largely ignored or inaccessible (e.g. [5]), but will also overcome the limitations of visual focal sampling by recording data continuously and simultaneously for many individuals, free from observer bias.

To conserve resources, grey seals tend to remain inactive for long periods of time and move only to reposition themselves relative to their pup or to intercept a threat, be it a male or another female and her pup [38]. Grey seals are also known to travel occasionally to and from pools for thermoregulation, though the cost of these trips is largely unevaluated [41, 42]. Most active behaviours are therefore limited to those such as vigilance or pup-checking, in which the head may be in motion while the body remains largely still. Consistent individual differences in time spent alert have already been shown to be an important indicator of stress management and coping styles in grey seals [9, 43]. While many studies advise placing accelerometers close to the centre of mass as a better indicator of energy expenditure (e.g. [31]), head-mounted accelerometers may give a better indication of vigilance, an important indicator of stress management in many terrestrial animals [44,45,46,47,48,49]. This motivated us to compare the resolution of data from head-mounted (vigilance) and torso-mounted (energy expenditure) accelerometers in the same context and to directly assess the trade-offs associated with behaviour detection for a largely inactive model species (Fig. 1). Our study encompassed two successive breeding seasons, during which individuals were exposed to differing environmental and animal density conditions on the breeding colony that may confound an in situ accelerometry study. As grey seals are typically site faithful [50], we quantified the variability and repeatability between years of accelerometry features measured in repeat-capture females, as well as the variance between individual females.

Fig. 1

Example of accelerometer mounts for female grey seals. Example of the attachment set-up for (a) a head-mounted accelerometer, and (b) a torso-mounted accelerometer in addition to a head mount, contained within a custom-designed ballistic nylon footprint on a female grey seal. Tag-frame axes are labelled with arrows pointing in the direction of positive acceleration for each axis (X, Y, and Z). Each accelerometer was configured to measure ± 2 g at 50 Hz (2015) and 25 Hz (2016). A heart rate monitor is also pictured in panel (b) as part of a larger study design.

The main aim of this study was to build a usable ethogram of behavioural states derived from accelerometers during lactation, which could potentially be extended to other efforts to study grey seals and other pinnipeds that exhibit a capital breeding system. Video footage of female grey seals was decoded using a very detailed ethogram of behaviours as part of a larger effort to study grey seal ethology. These detailed behaviours were condensed into 8 broader behavioural states and used to label the concurrent acceleration data collected during the 2015 and 2016 breeding seasons on the Isle of May, Scotland. Several females in 2016 were equipped with two accelerometers, one on the head and one on the torso, to evaluate the effect of placement on behaviour detection. Due to an unforeseen glitch in the accelerometer firmware, sampling rates differed between seasons (50 Hz in 2015; 25 Hz in 2016). The labelled accelerometry data were then used to train a random forest algorithm on a 60% subset of the data, with model performance assessed on the remaining 40%, separately for each season. To examine trade-offs in behaviour detection with accelerometer placement, separate random forest models were constructed for the subset of individuals tagged with accelerometers on both the head and torso. Random forest results from pooled data were also compared with those of random forests fitted to each individual, to contrast the trade-offs between model accuracy and training data sample size. In addition, we evaluated the stereotypy of behaviours for females recaptured in two consecutive breeding seasons, with the 2015 data subsampled to match the 2016 sampling rate, by quantifying the inter-individual variability in the accelerometry features using variance and repeatability estimates.
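The season-level training and testing step can be sketched with scikit-learn's `RandomForestClassifier` and `train_test_split`. This is a minimal illustration on synthetic features, not the authors' pipeline: stratifying the 60/40 split by behaviour, the number of trees, and the hypothetical helper name `fit_season_model` are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def fit_season_model(X, y, n_trees=500, seed=0):
    """Train a random forest on 60% of labelled windows; predict the held-out 40%.

    X : (n_windows, n_features) feature matrix for one season (or one mount type)
    y : (n_windows,) behaviour labels decoded from video
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.6, stratify=y, random_state=seed  # stratifying is an assumption
    )
    model = RandomForestClassifier(n_estimators=n_trees, oob_score=True, random_state=seed)
    model.fit(X_train, y_train)
    return model, y_test, model.predict(X_test)


# Usage on synthetic, well-separated features (not study data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (100, 3)), rng.normal(1.0, 0.1, (100, 3))])
y = np.array(["Rest"] * 100 + ["Alert"] * 100)
model, y_test, y_pred = fit_season_model(X, y, n_trees=100)
```

Calling such a function once per season, or once per mount type, mirrors the separate models described above; the fitted model's `oob_score_` attribute reports the out-of-bag accuracy.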

Results

Using random forests, we were able to reliably classify four of six core behaviours (Rest, Presenting/Nursing, Alert, and Flippering pup) during lactation in grey seals (Table 1). Across years and accelerometer placement schemes, static behaviours (Rest, Presenting/Nursing, Flippering pup) were consistently classified well based on precision (positive predictive value), recall (sensitivity), and F1 (the harmonic mean of precision and recall), with models trained on 60% of the data and tested on the remaining 40%. All non-Rest behaviours were misclassified as Rest to some extent, resulting in a high number of false positives for Rest (values in italics across the top row; Table 2). Accelerometers sampling at a higher frequency (50 Hz in 2015; Fig. 2) were better able to classify behaviours such as Alert than those sampling at a lower frequency (25 Hz in 2016; Table 3), resulting in an F1 45% greater for 2015. However, torso-mounted accelerometers generally performed better than head-mounted accelerometers on many of the static behaviours associated with lactation, such as Presenting/Nursing and Rest, despite the lower sampling rate. This resulted in an F1 29% greater for accelerometers mounted on the torso than for those mounted on the head in 2016 (Table 3). Locomotion events, however, were completely undetected by the random forest models for torso-mounted accelerometers. Error estimates and out-of-bag errors (errors estimated from the samples left out of each tree's bootstrap sample during model building), plotted against the number of trees grown, can be found in the supplementary materials (see Additional files 1–3).
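The per-behaviour metrics reported above can be recovered from a confusion matrix. The sketch below assumes rows hold observed behaviours and columns hold predicted behaviours (transpose the matrix if a table uses the opposite layout); the function name `per_class_metrics` is hypothetical.

```python
import numpy as np


def per_class_metrics(cm):
    """Precision, recall and F1 per class from a confusion matrix.

    Assumes cm[i, j] counts windows observed as class i and predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    predicted = cm.sum(axis=0)  # column totals: windows predicted as each class
    observed = cm.sum(axis=1)   # row totals: windows truly in each class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, observed, out=np.zeros_like(tp), where=observed > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f1


# Usage: a toy two-behaviour matrix (Rest, Alert); counts are illustrative only.
precision, recall, f1 = per_class_metrics([[8, 2], [4, 6]])
```

Here the four Alert windows predicted as Rest are false positives for Rest, which lowers Rest precision (8/12) while leaving Rest recall (8/10) unaffected.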

Table 1 Ethogram of female grey seal behaviour during lactation
Table 2 Confusion matrix of behaviour classified from random forests
Fig. 2

Precision and recall for head-mounted accelerometers. Scatter plot of precision and recall for the random forest model for head-mounted accelerometers for 2015 (sampled at 50 Hz) on lactating grey seals. Behaviours include Rest, Alert, Presenting/Nursing, Locomotion, Comfort Movements, and Flippering pup as defined in Table 1

Table 3 Comparison of behaviour classification across accelerometer mounts

Of the feature variables calculated to summarize the acceleration data (see definitions and derivations in Table 4), components of static acceleration (those relating to body posture) were the most important for classifying behaviours. In the random forest models, stZ, stX, and stY ranked as the three most important variables by mean decrease in Gini index, followed by Pitch and Roll (Fig. 3). The Gini index approaches zero as each branch comes to contain a single behavioural category; a greater mean decrease in Gini therefore indicates that the feature variable is more important for splitting branches and differentiating behaviours within the random forest model [53]. Summaries of these top five feature variables by behaviour can be found in the additional files (see Additional file 4), together with the full Gini index rankings of all features (Additional file 5). Power spectral densities in all acceleration dimensions and those of VeDBA and VeDBAs were also very important (Additional file 5).
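The posture features named above are commonly derived by splitting raw acceleration into a static (gravitational) component, estimated with a running mean, and a dynamic remainder. The sketch below follows that common approach under stated assumptions: the 2-s smoothing window, the Pitch and Roll formulas, and the axis sign convention are illustrative and may differ from the definitions in Table 4.

```python
import numpy as np


def running_mean(x, w):
    """Centred moving average over w samples, padding the ends with edge values."""
    padded = np.pad(x, (w // 2, w - 1 - w // 2), mode="edge")
    return np.convolve(padded, np.ones(w) / w, mode="valid")


def posture_features(ax, ay, az, fs, window_s=2.0):
    """Static/dynamic split, Pitch, Roll and VeDBA from triaxial acceleration (g).

    The 2-s smoothing window and the Pitch/Roll formulas are assumptions.
    """
    w = max(1, int(round(window_s * fs)))
    raw = [np.asarray(a, dtype=float) for a in (ax, ay, az)]
    st_x, st_y, st_z = (running_mean(a, w) for a in raw)        # static: posture/gravity
    dy_x, dy_y, dy_z = (a - s for a, s in zip(raw, (st_x, st_y, st_z)))  # dynamic: movement
    vedba = np.sqrt(dy_x**2 + dy_y**2 + dy_z**2)
    return {
        "stX": st_x, "stY": st_y, "stZ": st_z,
        "Pitch": np.arctan2(st_x, np.sqrt(st_y**2 + st_z**2)),  # radians
        "Roll": np.arctan2(st_y, np.sqrt(st_x**2 + st_z**2)),   # radians
        "VeDBA": vedba,
        "VeDBAs": running_mean(vedba, w),  # smoothed VeDBA
    }


# Usage: a motionless tag with gravity on +Z, sampled at 25 Hz for 10 s.
n = 250
feats = posture_features(np.zeros(n), np.zeros(n), np.ones(n), fs=25)
```

Because a seal lying still changes only the orientation of gravity in the tag frame, the static components and the angles derived from them carry most of the postural information, consistent with their high Gini rankings.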

Table 4 Summary of feature variables extracted from acceleration data
Fig. 3

Variable importance for classifying female grey seal behaviour. Ten feature variables with the highest mean decrease in Gini, indicating the relative importance of each of the feature variables within the random forest model classifying 6 behaviours in lactating grey seals using head-mounted accelerometers (2015, 50 Hz). Top feature variables included static acceleration components (stZ, stY, stX) and their derivatives, pitch and roll, as well as smoothed VeDBA and elements of power spectrum densities (PSD1, PSD2) in the X and Y dimensions as defined in Table 4
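The Gini criterion behind this ranking can be illustrated for a single split. A random forest accumulates such impurity decreases over every node that splits on a given feature, across all trees, and the mean decrease in Gini summarizes that total. The function names below are hypothetical, and the toy split is illustrative rather than taken from the fitted models.

```python
import numpy as np


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2; zero when a node holds one behaviour only."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - float(np.sum(p**2))


def gini_decrease(parent, left, right):
    """Impurity decrease from splitting `parent` into `left` and `right`.

    Each child's impurity is weighted by its share of the parent's samples.
    """
    n = len(parent)
    return gini(parent) - len(left) / n * gini(left) - len(right) / n * gini(right)


# Usage: a split (e.g. on stZ) that perfectly separates upright Alert from lying Rest.
parent = ["Rest", "Rest", "Alert", "Alert"]
print(gini_decrease(parent, ["Rest", "Rest"], ["Alert", "Alert"]))  # 0.5
```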

The effects of year and individual on the top feature variable, stZ, were modelled with a generalized linear mixed effects model, with maternal post-partum mass as a fixed effect to account for potential inter-annual variation in the cost of transport associated with changes in mass between years. The variance of these two random effects, individual and year, was computed over 1000 bootstrapped samples for repeat-capture females using the R package ‘rptR’ [63, 64]. Overall, Presenting/Nursing and Comfort Movement varied greatly between individuals in the top feature variable, stZ, for torso-mounted data (Fig. 4). The variance component due to individuals was 12.2 ± 5.3% for Presenting/Nursing and 21.2 ± 9.6% for Comfort Movement across bootstrapped samples (Table 5). Other behaviours, however, showed less than 5% variance. No variance was explained by year across bootstrapped samples. Nevertheless, the top feature variables most likely to be associated with the position and movement mechanics of each behaviour appear to be repeatable within individuals, indicating varying degrees of stereotypy (Table 5). Alert and Locomotion, largely upright behaviours, appear to be consistent for each seal with respect to stZ, while Rest and Presenting/Nursing, in which the head is most often tilted downward, were consistent and repeatable with respect to stX (Table 5). Flippering pup was highly significantly repeatable within individuals between years with respect to Roll, potentially indicating a side preference and a high degree of stereotypy (adjusted R = 0.925; D = 1070, p < 0.001 as determined from a likelihood ratio test). This suggests that some females lie preferentially on one side of their body (as indicated by Roll) during Flippering pup, potentially indicating lateralization given its highly significant repeatability (Fig. 5). 
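The study estimated variance components with mixed models in rptR. As a simpler, swapped-in illustration of what repeatability measures, the sketch below uses the classic one-way ANOVA estimator, with the n0 correction for unequal numbers of observations per female. Unlike the rptR analysis, it ignores the mass covariate and the year effect and gives no bootstrap interval; the function name is hypothetical.

```python
import numpy as np


def anova_repeatability(groups):
    """Repeatability R = s2_among / (s2_among + s2_within) from one-way ANOVA.

    groups : one sequence of feature values (e.g. stX) per individual.
    Negative estimates can occur by chance and are usually reported as zero.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n_i = np.array([len(g) for g in groups], dtype=float)
    N = n_i.sum()
    grand = np.concatenate(groups).mean()
    ss_among = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    ms_among = ss_among / (k - 1)
    ms_within = ss_within / (N - k)
    n0 = (N - (n_i**2).sum() / N) / (k - 1)  # effective group size
    s2_among = (ms_among - ms_within) / n0
    return s2_among / (s2_among + ms_within)


# Usage: two hypothetical females, each perfectly consistent across two years.
R = anova_repeatability([[1.0, 1.0], [3.0, 3.0]])
```

When every female repeats her own value exactly, all variance lies among individuals and R equals 1; the more values overlap between females relative to within-female scatter, the closer R falls to zero.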
Four of the females preferentially lay on their right side, with Roll significantly less than 0 as determined through a one-sample Wilcoxon signed rank test (‘0J’: V = 148, p < 0.001; ‘74,789’: V = 1017, p < 0.001; ‘74,904’: V = 3598, p < 0.001; and ‘74,962’: V = 1207, p < 0.001; see Fig. 5). Likewise, five additional females preferentially lay on their left side, with Roll significantly greater than 0 as determined through a one-sample Wilcoxon signed rank test (‘45,447’: V = 145,710, p < 0.001; ‘58,038’: V = 46,524, p < 0.001; ‘74,920’: V = 475,420, p < 0.001; ‘72,146’: V = 1,125,800, p < 0.001; and ‘4H’: V = 84,251, p < 0.001; see Fig. 5).
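The V statistic reported above is the sum of the ranks of the positive values, which is the quantity R's `wilcox.test` reports for a one-sample test. The sketch below computes V for a series of Roll values and a one-sided p-value from the large-sample normal approximation; it applies no tie or continuity correction, so p-values will differ slightly from R's. Function names and the example Roll values are hypothetical.

```python
import math

import numpy as np


def average_ranks(values):
    """Ranks 1..n, with tied values sharing the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def signed_rank_v(x, mu=0.0):
    """One-sample signed rank statistic V (sum of ranks of positive differences)."""
    d = np.asarray(x, dtype=float) - mu
    d = d[d != 0]  # zero differences are dropped
    r = average_ranks(np.abs(d))
    v = float(r[d > 0].sum())
    n = len(d)
    # Large-sample normal approximation (no tie or continuity correction).
    z = (v - n * (n + 1) / 4) / math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return v, n, z


def one_sided_p(z, alternative):
    """P-value for 'less' (median below mu) or 'greater' (median above mu)."""
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return cdf if alternative == "less" else 1 - cdf


# Usage: hypothetical Roll values for one female during Flippering pup.
roll = [-0.8, -0.6, -0.7, 0.1, -0.9, -0.5]
v, n, z = signed_rank_v(roll)  # V = 1.0
p_right = one_sided_p(z, "less")
```

A small V with a small "less" p-value indicates Roll below zero (right-side preference); a large V with a small "greater" p-value indicates left-side preference.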

Fig. 4

Individual variability of behaviours with respect to static acceleration in the Z-axis. Boxplot of each behavioural group (Rest, Alert, Presenting/Nursing (Nurse), Locomotion (Loco.), Comfort movements (CM), and Flippering pup (Flip. Pup)) with respect to static acceleration in the Z-axis (stZ) for torso-mounted accelerometers, the feature variable found to be most important in differentiating behaviour in the final random forest model. A high degree of variability existed between individuals and would likely contribute to a lower Precision and Recall when fitting random forests using pooled data

Table 5 Variance and repeatability estimates for individual ID
Fig. 5

Individual differences in side preference for Flippering pup behaviour. Boxplot of static acceleration in the Y-axis, as represented by the derivative Roll, for each repeat-capture female. Some females appear to show a preference for being positioned on the right (values towards − 1) or the left (values towards + 1), indicating individual lateralization in a female–pup interaction (Flippering pup), which was found to be highly significantly repeatable. Females marked (**) spent significantly more time on their right (R) or left (L) side as determined through a one-sample signed rank test

Discussion

Four behaviours, representing upwards of 90% of a lactating female grey seal’s activity budget, were classified well using accelerometry. Some core behaviours of grey seals during lactation were resolved more successfully than others, for varying reasons. Largely stationary behaviours, such as Rest and Presenting/Nursing, were best classified by our random forest models. We were also able to reliably classify a form of mother–pup interaction, Flippering pup, with many females showing a specific bias towards left- or right-side positioning, potentially indicating a form of lateralization. Our two movement behaviours of interest, Locomotion and Comfort Movement, were poorly classified regardless of sampling rate (year) or accelerometer placement, despite being among the most popular behaviours to classify in the literature across taxa [54, 65,66,67]. Torso-mounted accelerometers generally performed better than head-mounted accelerometers on the same individuals, but a higher sampling rate still achieved better classification for most behaviours. Although a higher sampling rate may have achieved better classification overall, it comes at the cost of shorter deployments, an important consideration given the tag malfunction in this study; even so, we were able to resolve a coarse level of behaviour, with 4 of 6 target behaviours classified reliably. Notably, individuals differed significantly, with individual ID accounting for a large portion of variance for some behaviours in modelling. Individuals were, however, largely consistent within themselves in the mechanics of behaviour between years.

Limitations of behavioural classification

Random forests have been used to classify behaviour in a wide range of taxa, including domestic sheep (Ovis aries; [68]), Eurasian beavers (Castor fiber; [69]), brown hares (Lepus europaeus; [24]), puma (Puma concolor; [70]), griffon vultures (Gyps fulvus; [32]), and other pinniped species (e.g. [40]). In all these studies, only three or four behavioural states with very different feature characteristics could be discriminated successfully, as was the case in the current investigation. While random forests are computationally intensive to train, they classify new behavioural data much more quickly and are generally robust given their two layers of randomness (bootstrap sampling of the training data and random selection of candidate features at each split) [53]. Unsurprisingly, in the estimates of error across trees (Additional files 1–3), the movement behaviours (Locomotion and Comfort Movement) with the poorest Precision and Recall also had the highest error rates. Some on-land behaviours of interest in female grey seals may be too variable in execution (amplitude of signal) and duration (presence in time) to classify accurately given the sensitivity of the accelerometers in the current study design. In signal theory, random signals, such as might arise from a behaviour like Comfort Movement, are very difficult to characterize [71]. These signals often contain multiple frequency components whose spectral densities vary in magnitude over time. Such signals often violate the assumptions of transforms such as the fast Fourier transform used here, which may lead to inconsistent features even when properly windowed through more advanced signal processing methods; it may not be possible to accurately and consistently extract some of the behaviours of interest from acceleration data, even with the addition of more feature variables.
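Spectral features of the kind discussed here are typically taken from an FFT periodogram of each window. The sketch below, assuming a Hann taper and assuming that features like PSD1 and PSD2 correspond to the two strongest spectral peaks (the exact definitions are in Table 4), returns the frequency and power of the strongest bins. It is only meaningful when the window is close to stationary, which is exactly the assumption that irregular behaviours such as Comfort Movement tend to break.

```python
import numpy as np


def top_spectral_peaks(x, fs, n_peaks=2):
    """(frequency in Hz, power) of the n strongest periodogram bins in one window."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                    # drop the static (gravity) offset
    taper = np.hanning(len(x))          # Hann taper to limit spectral leakage
    power = np.abs(np.fft.rfft(x * taper)) ** 2 / (fs * np.sum(taper**2))
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    power[0] = 0.0                      # ignore the DC bin
    strongest = np.argsort(power)[::-1][:n_peaks]
    return [(float(freqs[i]), float(power[i])) for i in strongest]


# Usage: a 4-s window at 50 Hz mixing 2-Hz and 5-Hz oscillations.
fs = 50
t = np.arange(4 * fs) / fs
x = np.sin(2 * np.pi * 2 * t) + 0.8 * np.sin(2 * np.pi * 5 * t)
peaks = top_spectral_peaks(x, fs)
```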

Stationary behaviours during lactation

Overall, static acceleration and its derived components were the most important features for discriminating behaviour. Rest and Presenting/Nursing were among the best classified behaviours on both head- and torso-mounted accelerometers (Precision of 69–75% and 72–80%, respectively, and Recall of 76–93% and 19–56%, respectively). These behaviours involve extended periods of little to no movement, with only periodic, brief adjustments of body position (e.g. Comfort Movements). Rest and other static behaviours are often the most easily identified through accelerometry in a variety of taxa [70, 72, 73]. Rest and Presenting/Nursing represent the key trade-off in energy conservation in lactating phocids, maximizing the transfer of finite energy stores to the pup [39, 74,75,76,77,78]. Rest and Presenting/Nursing represent most (65–90%) of a female grey seal’s activity budget in the wild [38, 79,80,81]. In the current study, these two behaviours represented almost half of the testing data (Table 2). As capital breeders, grey seal mothers do not return to sea to forage and supplement their energy stores [82]. Resting often seems to be viewed in ethology as the leftover period of an activity budget, yet grey seals of both sexes must budget time spent resting in order to maximize their energy allocation to breeding [39, 83, 84]. For male grey seals, increasing time spent resting may extend tenure within a key breeding territory, as they may spend several weeks on the colony without supplemental energy income [85].

A key aim of many studies of lactating phocids is to track the energetics of reproduction. While Rest can be variable in overall body positioning in grey seals, Presenting/Nursing is stereotypical, as indicated by its relatively high repeatability, with females alternating regularly between lying on the right and left side to maximize access to both teats, as indicated by the wide range of the static acceleration signal across years (Additional file 4). Maternal expenditure during lactation is most accurately quantified by the fat and protein content of milk, overall milk output, or enzyme activity levels as an indication of the female’s ability to mobilize fat [82, 86, 87]. These previous studies often involved many repeated sampling events over the lactation period that potentially disturb both the female and her pup. When repeated physiological samples are unavailable, researchers often calculate mass transfer efficiency as the ratio of the mass gained by the pup to the mass lost by the mother, based on two capture events at the beginning and end of lactation [39]. Accelerometers may give a useful behavioural estimate of maternal effort in nursing for comparison across populations, especially with respect to topographical considerations, tidal effects, or the effect of disturbance. While not directly usable as a measure of discrete energy transfer between females and pups, this behaviour may be useful only as an indication of energetic differences relating to extreme outliers of low mass transfer efficiency.
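As a worked example of the two-capture calculation, mass transfer efficiency is conventionally expressed as pup mass gain as a percentage of maternal mass loss over lactation. The masses below are hypothetical round numbers chosen for illustration, not measurements from this study.

```python
def mass_transfer_efficiency(mother_start_kg, mother_end_kg, pup_start_kg, pup_end_kg):
    """Pup mass gain as a percentage of maternal mass loss between two captures."""
    maternal_loss = mother_start_kg - mother_end_kg
    pup_gain = pup_end_kg - pup_start_kg
    return 100.0 * pup_gain / maternal_loss


# Hypothetical female losing 60 kg while her pup gains 30 kg.
mte = mass_transfer_efficiency(180.0, 120.0, 15.0, 45.0)  # 50.0 (%)
```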

The stationary pup interaction, Flippering pup, was also classified well, irrespective of accelerometer sampling protocol. This behaviour also had the lowest inter-individual variability and the highest significant repeatability with respect to body position. While many other pup-directed behaviours can be identified through conventional behavioural observation, this was the only maternal behaviour other than Presenting/Nursing that was reliably classified. As with Presenting/Nursing, females often engage in Flippering pup while lying on one side or the other, repeatedly stroking or scratching the pup. While this behaviour involves a body position similar to that of Presenting/Nursing or Rest, there is a slight average increase in the frequency associated with the x-axis during this behaviour, making it relatively stereotypical in feature space. As this behaviour is often observed preceding nursing events, it may be an important tool for further assessing patterns in maternal care. Interestingly, some females appear to be selective in choosing which side to lie on, likely using their opposite front flipper to stroke the pup, as indicated by the skew in Roll towards negative values (indicating right side preference; significant in four females) or positive values (indicating left side preference; significant in five females) (Fig. 5). Our definition of Flippering pup likely describes a broad class of movement, and may include flippering associated with a positive affective state, generally preceding a nursing event, or with a negative affective state, such as stimulating a pup to move away from a threat source. It is likely that stronger side preferences would emerge if flippering associated with these different affective states were separated. These results add to a growing body of evidence for lateralization in mammals, both in humans and in other species [88,89,90]. 
While we could detect no bias in Presenting/Nursing towards lying on the left or right, our results indicate that some grey seal females may prefer left-handed flippering of the pup irrespective of affective state, which is consistent with research indicating that this keeps the pup in the left eye, allowing control by the right hemisphere of the brain, which is associated with kin recognition and threat recognition in mammals [88, 89, 91,92,93]. This intriguing evidence of handedness in female grey seals should be built upon by detailed behavioural studies assessing the degree of lateralization in other non-nursing mother–pup interactions and social contexts.

Vigilance during lactation

We were able to classify a single broad vigilance category well from accelerometry data when sampled at a high rate (Precision 64% and Recall 76% for 2015). Alert behaviour, even when the head moves periodically to scan for threats, often involves many intermittent pauses of relative stillness. What an ethologist might traditionally classify as a single bout of vigilance lasting 1 min, an accelerometer might characterize as short periods of detectable movement, correctly classified as Alert, interspersed with short periods of data that resemble Rest. At the fine scale of second-by-second classification, Alert may therefore not be distinguishable as a single state lasting several seconds or minutes. Indeed, Alert behaviours were most often mistaken for Rest. Some degree of post hoc thresholding might be necessary to improve the derivation of time-activity budgets over time.
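One simple form of the post hoc thresholding suggested above is to absorb short runs of another label that interrupt an Alert bout. The sketch below does this for per-second predictions; the function name `fill_short_gaps` and the `max_gap` threshold are hypothetical, and any threshold would need to be validated against video-decoded bouts.

```python
def fill_short_gaps(labels, target="Alert", max_gap=2):
    """Relabel short runs sandwiched between two runs of `target` as `target`.

    labels  : per-second predicted behaviours (list of str)
    max_gap : longest interruption, in seconds, absorbed into the surrounding bout
    """
    out = list(labels)
    runs = []  # (start, end_exclusive, label) for each run of identical labels
    i = 0
    while i < len(out):
        j = i
        while j < len(out) and out[j] == out[i]:
            j += 1
        runs.append((i, j, out[i]))
        i = j
    for (_, _, before), (s, e, lab), (_, _, after) in zip(runs, runs[1:], runs[2:]):
        if before == target and after == target and lab != target and e - s <= max_gap:
            out[s:e] = [target] * (e - s)
    return out


# Usage: a 1-s "Rest" pause inside an Alert bout is merged; the final Rest run is kept.
smoothed = fill_short_gaps(["Alert", "Alert", "Rest", "Alert", "Rest", "Rest", "Rest"])
```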

Vigilance has been studied extensively in a variety of terrestrial species [44, 46, 48, 94]. Understanding how individuals allocate time (and consequently energy) to vigilance has been a major topic of study in behavioural ecology. In ungulates and other prey species, this often represents a trade-off between foraging to acquire energy (head down) and looking out for potential sources of danger (head up; [21, 49, 68]). Studying the functions of vigilance has led to insights into the evolution of group living and predator–prey dynamics (e.g. [95, 96]). Even predators must allocate time to vigilance, watching for threats and prey items alike [46, 47]. Grey seals, too, must balance the time they spend vigilant for threats to their young, though we can only comment on the amount of time spent in a general state of Alert. Without information on context, it is impossible to comment on the function of accelerometer-derived vigilance activity. Most terrestrial studies evaluating vigilance have used collar-mounted accelerometers [97, 98]. Other types of Alert, or even social and aggressive behaviours and contexts, may be better classified by placing an accelerometer where postural dynamics are more varied, such as glued to the neck behind the head. The extraction of context-specific types of alert behaviour may shed light on fine-scale decision-making during this sensitive period of development for mother and pup.

Phocid locomotion on land

Perhaps surprisingly, Locomotion was not well classified in our grey seals on land. Identifying modes of locomotion is a popular aim in the accelerometry literature, from flight to running to swimming [16, 65, 99, 100]. Locomotion types are often constrained by various biomechanical pressures that limit their interpretation [101, 102] and are often easily identified and separated by their spectral densities and frequencies [70]. In other pinnipeds, differences in at-sea locomotion detected with tags mounted along the dorsal midline, often expressed as stroke frequency, are used as a reliable indicator of energy expenditure [67]. Often, as in this study, frequency and spectral density elements are extracted using a fast Fourier transform [103]. This transform assumes that the signal is stationary (stable over time) in order to decompose it into its spectral elements [62, 71]. Behaviours like swimming in marine mammals are often stable and can last over many minutes or hours. However, if a signal is too brief or inconsistent in execution, the transform is unlikely to detect changes in frequency and power accurately; the signal may be missed entirely. In grey seals on land, locomotion is typically brief, as females tend to stay within a few body lengths of their pups, with only the rare long-distance trip to a pool of water [41, 42]. In total, Locomotion comprises only about 1% of a female’s activity budget, even across different breeding colonies where topographical differences may alter locomotory needs (e.g. [3, 81]). Grey seals generally appear to limit the time spent locomoting, likely as a mechanism for conserving energy and avoiding being away from offspring [38]. Female grey seals must prioritize maximizing energy stores upon arrival at a breeding colony in order to maintain themselves and nourish their pup while fasting during lactation [52]. 
While Locomotion was clearly present in the accelerometry signal upon visual inspection, with individual ‘steps’ visible, it was generally missed entirely by our classification algorithms, as indicated by a high precision (92%) and extremely low recall (5.4%) when sampled at the highest rate in 2015. In addition to being brief, grey seal Locomotion on land may not be stereotypical enough to classify accurately when females move over short distances, as they often alternate between vigilance and directed movement and can locomote while still on their side. Even though PSD was an important predictor of behaviour in the current study, Locomotion was identifiable only in head-mounted accelerometer deployments, where it was very poorly classified and often confused with Alert or Rest (Table 2). Seal locomotion on land, especially at slower speeds, is typically led by the head and forelimbs rather than the centre of mass. This may partly explain why Locomotion was marginally better classified by head-mounted than by torso-mounted accelerometers. Torso-mounted accelerometers sampling at a higher rate to capture more subtle movements may be able to detect Locomotion and the associated energy use on land accurately, but may still suffer from the confounding effects discussed above.
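The effect of brevity on FFT-based features can be illustrated with a toy comparison: the periodogram power at a stepping frequency when stepping fills a window versus when it occupies only the first second. All numbers here (25 Hz, a 10-s window, a 1.5-Hz stepping rhythm) are hypothetical and are not measured grey seal gait parameters.

```python
import numpy as np


def power_at(x, fs, f_target):
    """Periodogram power of window x in the frequency bin nearest f_target (Hz)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    return float(power[np.argmin(np.abs(freqs - f_target))])


# Hypothetical numbers: a 10-s window at 25 Hz and a 1.5-Hz stepping rhythm.
fs, window_s, step_hz = 25, 10, 1.5
t = np.arange(window_s * fs) / fs
stepping = np.sin(2 * np.pi * step_hz * t)

sustained = stepping                      # stepping fills the whole window
brief = np.where(t < 1.0, stepping, 0.0)  # a 1-s shuffle, then stillness
p_sustained = power_at(sustained, fs, step_hz)
p_brief = power_at(brief, fs, step_hz)
```

In this idealized case the brief bout retains only about 1% of the sustained power, because power scales roughly with the square of the fraction of the window the movement occupies; a few steps inside an otherwise still window can easily fall below the spectral signature of other behaviours.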

Limitations of accelerometry and individual differences

Context-dependent and interaction behaviours were removed from classification, as they could not be distinguished in feature space given our study design. Several other studies have also identified confounding factors in classifying such contextual behaviours. One study on baboons found poor Precision and Recall when attempting to distinguish grooming in which the individual was the actor (grooming another) from grooming in which it was the receiver (being groomed by another; [25]). Another study on captive elephants showed that, although differences in affective state could be discriminated, acceleration needed to be sampled at extremely high rates (1000 Hz) to resolve minute differences in postural dynamics [104]. Given the inherent trade-offs among battery longevity, storage capacity, and sampling rate, as well as best-practice recommendations for tagging, such highly sensitive measurements are unlikely to be applicable in a wild setting yet. Torso-mounted accelerometers show promise for extracting key behaviours while seals are on land, though a higher sampling rate than was used here may be necessary to classify behaviours with greater Precision and Recall. A higher sampling rate may also reveal minute differences in postural dynamics that could improve the identification of contextual interactions in grey seals. Nevertheless, the behavioural resolution achieved in the current study is comparable to that of previous efforts to classify behaviour in other vertebrates (e.g. [13, 23, 40, 59, 66]).

When examining inter-annual differences in behavioural mechanics for repeat-capture females, we found that individual ID, included as a random effect, explained a relatively high proportion of the variance. While there were clear inter-individual differences in the mechanics of certain behaviours, females were largely consistent within themselves between years. For comparison, we fitted random forests to individual seals and indeed found higher F1 values for all behaviours. Although building a random forest for each individual overcomes the inter-individual variability in behavioural mechanics that is clearly apparent in Fig. 4, only a small subset of individuals had enough training data to build a random forest for all 6 behaviours investigated here. One of our main aims in pooling data from all individuals was to increase the overall sample size of behavioural reference data, particularly to overcome the difficulty of observing behaviour in a wild context without the use of captive surrogates. As the results presented here illustrate, researchers must weigh data availability (in a wild context or with captive surrogates) against random forest model accuracy (fitting to an individual or pooling data) within the context of the study at hand.

While the exact reason for such a high amount of variance is unclear, differences in substrate within and among study locations on the colony likely contributed to inter-individual differences and may have confounded the classification of behaviour from accelerometers, even when every effort was made to tag the same individuals. Future work should carefully consider the overall effect of individual variability, especially that associated with the surrounding context, when classifying behaviour using accelerometers (e.g. [40]). Several other studies have pointed out the potential confounding effects of the environment in dictating the overall body position of an individual [54, 99]. Static acceleration was one of the most important predictors of behaviour in the favoured random forest model classifying our 6 behavioural states. While female grey seals tended to return to similar locations on the colony between years, the topography of the island is highly variable and has already been shown to be an important consideration in the behaviour of this species [3, 50, 79]. It is unclear how, or whether, the effect of topography on body position and dynamic movement can be addressed or corrected for without additional sensors that model movement within quantified fine-scale topography, such as magnetometers and GPS (e.g. [105]). Individuals did vary significantly within themselves in the static components of acceleration during Presenting/Nursing. Rather than being a mechanistic error, this likely reflects an attempt by females to maximize their pup's access to milk by ensuring fairly equal access to the nipples during suckling bouts. Separating left- and right-side Presenting/Nursing may therefore improve classification. In addition, higher Precision and Recall would very likely be achieved if behaviours were defined exclusively by their mechanics. This would, however, risk losing what little contextual information is contained in the behaviours we attempted to classify, which, arguably, is key to understanding their functions.

Conclusions

Head-mounted accelerometers sampling at a higher frequency were better able to identify rare behaviours using random forest models than those sampling at a lower frequency. Accelerometers placed near the centre of gravity show promise for extracting a number of key behavioural states during lactation and would likely benefit from a higher sampling rate than tested here. Grey seals often remain inactive for long periods during lactation to conserve resources, so most movement is limited to head movement or postural changes for nursing. While we achieved only a coarse level of behavioural resolution, it may be advisable to place accelerometers on the neck of breeding grey seals to capture the greatest changes in position and postural dynamics if additional sensor data are not available. States identified using torso-mounted accelerometers may be more informative for quantifying differences in energetic expenditure. Improved accuracy could be achieved by classifying fewer behaviours defined exclusively by their mechanics, but at the potential loss of contextual and social information. We have also shown that individuals may vary in the execution of behaviours in a wild context, supporting previous work that has flagged discrepancies within training data sets. Future work should consider this when training a classification algorithm on only a handful of animals, as doing so may lead to poor detection in subsequent deployments. We hope that the results presented here will inform efforts to classify behaviour during lactation in other phocid seals and in other species.

Methods

Study animals and accelerometer deployments

This study focused on lactating adult female grey seals on the Isle of May, Scotland (56.1°N, 2.55°W), located in the outer Firth of Forth and managed by Scottish Natural Heritage as a National Nature Reserve. Adult female grey seals typically begin to arrive on the island in early October to pup and mate; numbers peak around mid-November and decline slowly until mid-December [106]. Adult females were sampled both early and late in their approximately 18-day lactation period [39, 106, 107]. Accelerometers were attached at the initial sampling event and removed at the final handling event. Fifty-three female grey seals were equipped with small data-logging accelerometers (AXY-Depth, TechnoSmart Europe, Italy) during the core of the lactation period (10.7 ± 2.7 days) in the 2015 and 2016 breeding seasons (n = 11 females recaptured in successive breeding seasons). All individuals in the 2015 and 2016 seasons were equipped with an accelerometer mounted on the head, while 10 individuals in the 2016 season were additionally equipped with an accelerometer on the torso, mounted roughly between the shoulder blades (Fig. 1). Tags were housed in custom-designed ballistic nylon pouches attached to dry pelage using superglue (Loctite, formula 422; Fig. 1). Due to an unforeseen glitch in the accelerometer firmware, sampling rates differed between seasons (50 Hz in 2015; 25 Hz in 2016). Both rates were sufficient to capture a seal's fastest movements, which last between 0.5 and 1 s (e.g. head lunges associated with intraspecific interactions).
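As a rough check of this claim, treating a 0.5-s movement as a single cycle of a roughly 2 Hz oscillation (a simplifying assumption), the number of samples per movement and the Nyquist frequency $f_{\text{s}}/2$ at each rate are:

$$\begin{aligned} 50\ \text{Hz} \times 0.5\ \text{s} & = 25\ \text{samples}, & f_{\text{Nyquist}} & = 25\ \text{Hz} \\ 25\ \text{Hz} \times 0.5\ \text{s} & = 12.5\ \text{samples}, & f_{\text{Nyquist}} & = 12.5\ \text{Hz} \\ \end{aligned}$$

Both Nyquist frequencies lie well above ~2 Hz, so either rate can in principle resolve such movements, although 25 Hz provides half as many samples to describe each one.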

Derivation of accelerometry features

Acceleration signals were processed to derive 33 separate feature variables measured in all three axes of movement, X, Y, and Z [54, 55]. Static acceleration (stX-Z), the gravitational component indicating position and posture in each axis of movement, was calculated using a moving-average filter over a 3-s overlapping window, i.e. 150 data points when sampled at 50 Hz (75 data points at 25 Hz; [17, 54,55,56]). Dynamic acceleration (dyX-Z), the component due to the movement and postural dynamics of an individual, was then calculated by subtracting the static component from the raw acceleration in each axis [23, 54, 55, 57]. Partial dynamic body acceleration (PDBAX-Z) was calculated as the absolute value of dynamic acceleration in each axis [40, 58, 59]. Overall dynamic body acceleration (ODBA) and vectorial dynamic body acceleration (VeDBA) were also calculated as,

$$\begin{aligned} {\text{ODBA}} & = \left| {{\text{dy}}X} \right| + \left| {{\text{dy}}Y} \right| + \left| {{\text{dy}}Z} \right| \\ {\text{VeDBA}} & = \sqrt {{\text{dy}}X^{2} + {\text{dy}}Y^{2} + {\text{dy}}Z^{2} } \\ \end{aligned}$$
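A minimal NumPy sketch of the static/dynamic separation and the DBA metrics above. It assumes a centred running mean whose edge windows average only the samples available; the original filter's edge handling is not stated, and the function names are hypothetical.

```python
import numpy as np

def running_mean(x, window):
    """Centred moving average; near the edges only the available samples are averaged."""
    kernel = np.ones(window)
    num = np.convolve(x, kernel, mode="same")
    den = np.convolve(np.ones_like(x), kernel, mode="same")
    return num / den

def dba_features(acc, fs, smooth_s=3.0):
    """acc: (n, 3) raw X, Y, Z acceleration in g; n should exceed the smoothing window."""
    window = int(round(smooth_s * fs))                 # 150 samples at 50 Hz, 75 at 25 Hz
    static = np.column_stack([running_mean(acc[:, i], window) for i in range(3)])
    dynamic = acc - static                             # dyX-Z
    pdba = np.abs(dynamic)                             # PDBA in each axis
    odba = pdba.sum(axis=1)                            # |dyX| + |dyY| + |dyZ|
    vedba = np.sqrt((dynamic ** 2).sum(axis=1))        # Euclidean norm of dynamic acceleration
    return static, dynamic, pdba, odba, vedba

# Usage on a hypothetical 20-s record at 25 Hz: gravity on Z plus a small 1 Hz sway on X.
fs = 25
t = np.arange(20 * fs) / fs
acc = np.column_stack([0.2 * np.sin(2 * np.pi * t), np.zeros_like(t), np.ones_like(t)])
static, dynamic, pdba, odba, vedba = dba_features(acc, fs)
```

Because ODBA sums absolute values while VeDBA takes the vector norm, ODBA is always at least as large as VeDBA for the same sample.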

We also included a smoothed vector of VeDBA (VeDBAs), derived as a 3-s running mean, as for static acceleration [60, 61]. The ratio of VeDBA to PDBA in each axis was also included to capture the relative contribution of each axis to the overall vector of movement [25]. The rate of change of acceleration over time, the third derivative of position commonly referred to as jerk, was derived by taking the first difference of each axis of acceleration. We also calculated the norm of jerk as the square root of the sum of the squared differences of acceleration across the three axes, scaled by the sampling frequency,

$${\text{norm}}\,{\text{jerk}} = f_{\text{s}} \cdot \sqrt {\sum\limits_{A \in \{ X,Y,Z\} } {\text{diff}}\left( A \right)^{2} }$$

where f_s is the sampling frequency (Hz) and A is each axis of acceleration, as outlined in [18]. Pitch and roll (in radians) were derived by taking the arcsine of static acceleration in the heave (dorso-ventral) and sway (lateral) axes, respectively [54]. Once derived, these attributes were summarized by their mean over 1-s windows to match the resolution of the video observations.
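The remaining derived features and the 1-s summarization can be sketched as follows. This is illustrative only: which array column holds the heave and sway axes depends on tag orientation, so `heave_col` and `sway_col` are hypothetical parameters, and a trailing partial second is simply dropped.

```python
import numpy as np

def running_mean(x, window):
    """Centred moving average; near the edges only the available samples are averaged."""
    kernel = np.ones(window)
    return np.convolve(x, kernel, "same") / np.convolve(np.ones_like(x), kernel, "same")

def per_second_mean(x, fs):
    """Average consecutive blocks of fs samples (1 s); a trailing partial second is dropped."""
    n = int(fs)
    usable = (len(x) // n) * n
    return x[:usable].reshape(-1, n, *x.shape[1:]).mean(axis=1)

def derived_features(acc, static, fs, heave_col=2, sway_col=1):
    """acc, static: (n, 3) arrays in g. heave_col/sway_col are assumed column indices."""
    dynamic = acc - static
    pdba = np.abs(dynamic)
    vedba = np.sqrt((dynamic ** 2).sum(axis=1))
    vedbas = running_mean(vedba, int(round(3.0 * fs)))           # 3-s smoothed VeDBA
    # VeDBA / PDBA per axis; NaN where an axis has no dynamic acceleration
    ratio = np.divide(vedba[:, None], pdba, out=np.full_like(pdba, np.nan), where=pdba > 0)
    diffs = np.diff(acc, axis=0, prepend=acc[:1])                  # first difference; row 0 is 0
    jerk = fs * diffs                                              # per-axis jerk (g/s)
    norm_jerk = fs * np.sqrt((diffs ** 2).sum(axis=1))             # f_s * sqrt(sum diff(A)^2)
    pitch = np.arcsin(np.clip(static[:, heave_col], -1.0, 1.0))    # radians
    roll = np.arcsin(np.clip(static[:, sway_col], -1.0, 1.0))      # radians
    per_sample = {"VeDBAs": vedbas, "ratio": ratio, "jerk": jerk,
                  "norm_jerk": norm_jerk, "pitch": pitch, "roll": roll}
    # Seconds containing a NaN ratio stay NaN after averaging.
    return {name: per_second_mean(v, fs) for name, v in per_sample.items()}
```

Clipping before the arcsine guards against static values that drift slightly beyond ±1 g through sensor noise or calibration error.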

To characterize oscillations in dynamic body movement, power spectral density and frequency elements were also calculated for each second of acceleration data using Fourier analysis, following the methodology of [25]. A fast Fourier transform decomposes an acceleration signal, converting it from the time domain to the frequency domain, from which elements of frequency and power (amplitude) can be extracted [62]. Traditional Fourier analysis assumes that the signal continues indefinitely. Therefore, to reduce spectral leakage and to sample a long enough window to capture cyclical behaviours such as Locomotion, spectral elements were calculated over a window spanning 1 s on either side of the current time point [62]. To summarize each window, the first and second highest power spectral density peaks (PSD) were extracted along with their associated frequencies (Freq) in each axis of movement [25]. A summary list of feature variables can be found in Table 4.
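A minimal sketch of the per-second spectral summary for one axis, assuming a plain (unwindowed) periodogram and treating "peaks" as interior local maxima ranked by power. Both choices, and the function name, are assumptions for illustration; the exact procedure of [25] may differ.

```python
import numpy as np

def spectral_peaks(x, fs, centre, half_width_s=1.0, n_peaks=2):
    """Largest PSD peaks and their frequencies in a window of +/- half_width_s around `centre`.

    Peaks are interior local maxima of a one-sided periodogram, ranked by power
    (an assumed peak-picking rule). Missing peaks are returned as NaN.
    """
    half = int(round(half_width_s * fs))
    lo, hi = max(0, centre - half), min(len(x), centre + half + 1)
    seg = np.asarray(x[lo:hi], dtype=float)
    seg = seg - seg.mean()                                # drop the DC (static) component
    psd = np.abs(np.fft.rfft(seg)) ** 2 / (fs * len(seg))
    freqs = np.fft.rfftfreq(len(seg), d=1.0 / fs)
    k = np.arange(1, len(psd) - 1)                         # interior bins only
    is_peak = (psd[k] > psd[k - 1]) & (psd[k] >= psd[k + 1])
    peaks = k[is_peak]
    top = peaks[np.argsort(psd[peaks])[::-1][:n_peaks]]    # strongest first
    out_psd = np.full(n_peaks, np.nan)
    out_freq = np.full(n_peaks, np.nan)
    out_psd[:len(top)] = psd[top]
    out_freq[:len(top)] = freqs[top]
    return out_psd, out_freq

# Hypothetical signal at 25 Hz: a strong 2 Hz and a weaker 6 Hz component.
fs = 25
t = np.arange(250) / fs
x = np.sin(2 * np.pi * 2 * t) + 0.7 * np.sin(2 * np.pi * 6 * t)
psd_peaks, freq_peaks = spectral_peaks(x, fs, centre=125)
```

With a 2-s window the frequency resolution is roughly 0.5 Hz, which is coarse but adequate for separating slow postural oscillations from faster movements.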

Time-matching behaviours and training data sets

Over the deployment period, each individual was sampled for behaviour using a focal sampling approach in at least 3 dedicated sessions during daylight hours [1]. Videos were recorded using a digital high-definition video recorder (Panasonic HC-V700, 1920 × 1080 resolution, 46× zoom; Panasonic Corp.) mounted on a tripod at least 50 m away. Video footage for all individuals and years was decoded in real time by the lead author (CRS) according to the ethogram of behavioural states listed in Table 1, at a resolution of 1 s. Approximately 10% of the video footage was re-watched to check the consistency of behavioural decoding, resulting in an average difference in cumulative time spent in each behaviour of about 5 s per video (approximately 0.07 ± 1.8% difference in the resulting activity budget), with moderate agreement (Cohen's kappa = 0.57). Concurrent sections of summarized acceleration attributes were extracted and time-matched to the 8 behavioural states to create a training data set for each year and tag attachment type. Labelled data for 2015 head-mounted accelerometers totalled 45.7 h (n = 29 individuals), while 2016 head- and torso-mounted accelerometers totalled 91.3 h (n = 24) and 65.7 h (n = 10), respectively, averaging 7.36 ± 15.5 h of video footage for each behaviour across all years. The mean proportion of time spent in each behaviour from video footage (± standard deviation) is given in Table 1 for all study females.
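The agreement statistic above compares two decodings of the same footage. A minimal sketch of Cohen's kappa, assuming the original and re-watched label sequences are aligned second by second; the behaviour labels in the usage line are hypothetical.

```python
import numpy as np

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    if a.shape != b.shape:
        raise ValueError("label sequences must have the same length")
    classes = np.union1d(a, b)
    p_observed = np.mean(a == b)                                          # raw agreement
    p_expected = sum(np.mean(a == c) * np.mean(b == c) for c in classes)  # chance agreement
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both passes use one identical label")
    return float((p_observed - p_expected) / (1.0 - p_expected))

# Hypothetical 1-s labels from a first and a second viewing of the same clip.
first_pass = ["Rest", "Rest", "Alert", "Rest"]
second_pass = ["Rest", "Alert", "Alert", "Rest"]
kappa = cohens_kappa(first_pass, second_pass)
```

Kappa discounts the agreement expected by chance, which matters here because Rest dominates seal activity budgets: two passes that label nearly everything as Rest would agree often even if decoding were careless.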

Random forests

The random forest algorithm is a relatively recent extension of classification and regression trees [53]. Classification trees are typically built by recursively partitioning the data with binary splits into increasingly homogeneous regions with respect to the desired classification [108]. The resulting subsets, referred to as nodes, are subdivided until there is no further decrease in the Gini impurity index, G (which here approaches zero as a node comes to contain a single behaviour):

$$G = \mathop \sum \limits_{i = 1}^{n} p_{i} \left( {1 - p_{i} } \right)$$

where n is the number of behavioural classes and p_i is the proportion of class i in a set of observations. A random forest fits many such classification trees to a data set and combines the predictions of all trees to classify new data [25, 53, 108]. First, the training data set is sampled randomly with replacement, producing many bootstrapped samples. From each bootstrapped sample, the model grows one tree that classifies observations into classes, or behaviours, through hierarchical decisions down each node, using a random selection of predictor variables (here, accelerometry features) to partition the data [53, 108]. Out-of-bag observations, i.e. those not included in a given bootstrapped sample, are then used to calculate model accuracies and error rates, which are averaged across all trees. The large number of iterations (trees grown) and several layers of randomness make random forests a robust and powerful tool for classifying new data, while limiting overfitting and the problems associated with unbalanced data sets, such as a seal's activity budget, which is often dominated by rest (e.g. [25, 38]). Random forests also allow the importance of each feature variable to be assessed as the decrease in Gini impurity from a parent node to its two daughter nodes, accumulated over all splits on that variable and averaged across trees. For this machine learning algorithm, the data were split into training (60%) and testing (40%) sets, and 500 trees were grown using the 'randomForest' package in R [109].
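The quantities named above can be illustrated in a few lines of NumPy. This is a teaching sketch, not the 'randomForest' implementation: it computes G for a node, the size-weighted impurity decrease of one binary split (the quantity that, summed over splits on a variable and averaged over trees, underlies Gini-based variable importance), and the average out-of-bag share of a bootstrap sample. Function names and labels are hypothetical.

```python
import numpy as np

def gini(labels):
    """Gini impurity G = sum_i p_i (1 - p_i) of a non-empty set of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * (1.0 - p)))

def gini_decrease(parent, left, right):
    """Impurity decrease of one binary split: parent G minus size-weighted daughter G."""
    n = len(parent)
    return gini(parent) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)

def oob_fraction(n_obs, n_trees, seed=0):
    """Average share of observations left out of each bootstrap sample (out-of-bag)."""
    rng = np.random.default_rng(seed)
    fractions = []
    for _ in range(n_trees):
        boot = rng.integers(0, n_obs, size=n_obs)   # draw rows with replacement
        in_bag = np.zeros(n_obs, dtype=bool)
        in_bag[boot] = True
        fractions.append(1.0 - in_bag.mean())
    return float(np.mean(fractions))

node = ["Rest", "Rest", "Alert", "Alert"]
split_gain = gini_decrease(node, ["Rest", "Rest"], ["Alert", "Alert"])
oob = oob_fraction(n_obs=1000, n_trees=500)
```

A pure node has G = 0, a perfect split of a two-class 50/50 node removes all of its impurity (0.5), and roughly a third of observations (about 1/e ≈ 0.37) are out-of-bag for any given tree, which is what makes out-of-bag error estimates possible without a separate validation set.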

Classification and assessment of random forests

To compare model performance in each of the machine learning algorithms used in this study, Precision, Recall, and the F1 statistic were calculated from the confusion matrices produced by cross-validation with the testing data sets. Following cross-validation, the numbers of true positives (observations of a behaviour correctly classified as that behaviour, TP), false positives (observations of other behaviours incorrectly classified as that behaviour, FP), and false negatives (observations of a behaviour incorrectly classified as another behaviour, FN) for each behavioural category were used to calculate Precision, Recall, and F1 [110]. Precision, also referred to as the true positive accuracy, was defined as the proportion of positive classifications of a behaviour that were correct [57], and was calculated as:

$${\text{Precision}} = \frac{\text{TP}}{{{\text{TP}} + {\text{FP}}}}$$

Recall, also known as sensitivity, was defined as the proportion of observations of a given behaviour that were correctly classified as that behaviour [57], and was calculated as:

$${\text{Recall}} = \frac{\text{TP}}{{{\text{TP}} + {\text{FN}}}}$$

The F1 statistic is the harmonic mean of Precision and Recall and was used as the metric of overall performance for each behavioural category, as it balances both performance metrics [110]. F1 was calculated as:

$$F1 = \frac{2}{{\frac{1}{\text{Precision}} + \frac{1}{\text{Recall}}}}$$

Values closer to 1 for all metrics stated above represent better model performance. Model creation and validation were performed separately for the 2015 and 2016 season, as well as separately for head-mounted and torso-mounted accelerometers (2016 only), resulting in 3 separate random forest models. Variable importance plots for the random forest models were also examined.
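The three metrics can be computed for every behaviour at once from a confusion matrix. A minimal sketch, assuming rows hold the observed (video) behaviour and columns the predicted behaviour; the counts and class names in the usage example are hypothetical.

```python
import numpy as np

def per_class_metrics(confusion):
    """confusion[i, j] = number of observations of true class i predicted as class j."""
    cm = np.asarray(confusion, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp          # predicted as the class but truly another
    fn = cm.sum(axis=1) - tp          # truly the class but predicted as another
    with np.errstate(divide="ignore", invalid="ignore"):
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
    return precision, recall, f1

# Hypothetical counts for Rest, Alert, Locomotion (rows: observed, columns: predicted).
cm = [[50, 5, 0],
      [4, 30, 1],
      [3, 6, 1]]
precision, recall, f1 = per_class_metrics(cm)
```

In this toy matrix Locomotion is predicted only twice and correct once (Precision 0.5), but 9 of its 10 observations are assigned to other behaviours (Recall 0.1), giving F1 ≈ 0.17. Classes that are never predicted yield NaN Precision rather than a division error.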

Mechanics of behaviour

The repeatability of the mechanics of behaviour, with respect to the features found to be most important in random forest model building, was also assessed across seasons for repeat-capture females (n = 11), something that is rarely possible for non-captive individuals. Because of the firmware malfunction mentioned previously, loggers sampled at a lower rate in 2016. To achieve equivalent sampling rates between seasons, the 2015 accelerometry data (50 Hz) were down-sampled by half to match the 2016 data (25 Hz). Generalized linear mixed effects models were built to predict the top feature variables deemed most relevant for each behaviour, with individual ID and year included as random effects. To account for potential changes in cost of transport between years, individual estimated post-partum mass was included as a fixed effect (R package 'nlme'; [111]). Variance and repeatability estimates associated with individual ID and year were calculated using the 'rptR' package [63] over 1000 bootstrapped samples. Because the model included a fixed effect, all repeatability measures are adjusted repeatabilities (adj.-R) as per [63]. The significance of repeatability was assessed with the likelihood ratio test implemented in the package, comparing each model against one without the random effect.
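The text does not specify how the 2015 data were down-sampled, so two hypothetical options are sketched below: keeping every other sample (simple decimation, which applies no low-pass filter first, so any content above 12.5 Hz could alias) and averaging consecutive pairs. The second function shows the ratio behind an adjusted repeatability for Gaussian data as we understand rptR's default, with the fixed-effect variance left out of the denominator; it does not reproduce rptR's model fitting, bootstrapping, or likelihood ratio test, and the variance values in the usage lines are invented.

```python
import numpy as np

def downsample_by_two(acc, method="decimate"):
    """Halve the sampling rate of an (n,) or (n, 3) record, e.g. 50 Hz -> 25 Hz."""
    acc = np.asarray(acc, dtype=float)
    if method == "decimate":
        return acc[::2]                                       # keep every other sample
    if method == "mean":
        usable = (len(acc) // 2) * 2                          # drop an odd trailing sample
        return acc[:usable].reshape(-1, 2, *acc.shape[1:]).mean(axis=1)
    raise ValueError(f"unknown method: {method}")

def adjusted_repeatability(var_id, var_year, var_resid):
    """Share of non-fixed-effect variance attributable to individual ID."""
    return var_id / (var_id + var_year + var_resid)

# Hypothetical variance components for one feature (not results from this study).
r_id = adjusted_repeatability(var_id=2.0, var_year=1.0, var_resid=1.0)
halved = downsample_by_two(np.arange(10), method="mean")
```

Pair-averaging acts as a crude low-pass filter before halving the rate, which reduces, but does not eliminate, the aliasing risk of plain decimation.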