Introduction

Autism spectrum disorder (ASD) is a lifelong neurodevelopmental disorder characterised by difficulties in social interaction, communication, and repetitive behaviours [1]. It is commonly described as a childhood disorder, as symptoms often first become apparent during early development. Its symptoms are diverse and behaviours associated with ASD vary in expression among those with the disorder, which can often make diagnosis difficult [2]. Recently, diagnostic categories for variants of autism, e.g. pervasive developmental disorder not otherwise specified (PDD-NOS) and Asperger’s syndrome (AS), have been reclassified as ASD in the DSM-5 [1] (see [3,4,5] for a review of re-diagnosis). ASD has been recognised as a more meaningful and sufficiently inclusive construct to describe the variations in behaviour, functioning, and presentation that commonly characterise the phenotype [6,7,8]. However, there is evidence that individuals with these variant diagnoses differ significantly from those with more severe ASD in intelligence [9,10,11], verbal ability [12], and overall cognitive profile [13, 14], including discrete differences in brain morphology [15]. The diagnostic definitions used in assessment have changed despite evidence of differences in symptomology between PDD-NOS/AS and ASD [16], with these differences in behavioural trait severity evident in early life and showing trait stability [17].

During the first years of a child’s life, parents and caretakers are most likely to be the first to observe, evaluate and interpret a child’s behaviour. They are also often the first to seek professional advice when a child’s behaviour seems ‘odd’ or ‘unusual’ [18, 19]. While developmental variation in the population is to be expected, a cluster of behaviours or missed milestones (‘red flags’) can signal the presence of a potential underlying problem and may be indicative of disordered behaviour [20]. Parental ‘prediction’ of ASD diagnosis via problematic behaviours has been shown to be reliable, both in prospective and longitudinal studies [21, 22] and in studies utilising retrospective population data [23, 24].

It has been proposed, therefore, that assessment should take place as soon as developmental issues become apparent [25, 26]. The average ASD diagnosis age in the US is 4 years [27] though diagnosis can be made as early as 2 years [28] with predictive diagnostic tools existing for those as young as 12 months [29]. Early intervention strategies have been shown to be advantageous in capitalising on toddlers’ developmental plasticity [30] and have shown significant benefits in adaptive behaviour, language, and overall functioning [31,32,33].

While critics of early diagnosis cite ‘normal’ slow development and ASD false-positive diagnoses [34], the benefits of early diagnosis and intervention have been strongly advocated. Initial diagnoses are routinely revisited and show early behavioural symptoms to be consistent with ASD outcomes [35,36,37]. Early intervention has been shown to be beneficial for the development of cognitive, social, and communication skills [38], and because further refinement of these skills comes with growth and experience, an early ASD diagnosis does not always predict diagnostic outcomes or level of deficit in later childhood [36]. A favourable or ‘optimal outcome’ (OO), in which functioning improves to the point that the child loses the ASD diagnosis, is not a recent concept [39]. Recent research has highlighted that OO children still show deficits in social relationships and some developmental issues that affect social functioning [40, 41] but can function normally with typically developing children. Higher initial functioning in combination with early intervention seems to be the strongest predictor of OO [42,43,44], and while trait expression and behaviour improve out of the clinical range, physical brain activity remains closer to that of an ASD population [45].

The current study, therefore, utilised retrospective data with the aim of exploring the variation in parent reports of child ‘red flag’ traits, before and after age 3, in the general population, to test (1) whether a spectrum of presentation in behavioural (‘red flag’) traits could be identified in both early and later childhood while displaying the variance of previous autism diagnostic categories, (2) whether severity in ‘red flag’ traits ≤ 3 years would be associated with established ASD risk correlates and (3) whether severity in ‘red flag’ traits ≤ 3 years of age would meaningfully correlate with severity in later childhood or an ‘optimal outcome’. It was predicted that distinct profiles of ‘red flag’ traits would emerge for both time periods and that these profiles would reflect variation in ASD risk mirroring the PDD-NOS/AS/HFA variant diagnoses. It was also predicted that parental reporting of ‘red flag’ traits would be a reliable indicator of ASD risk and that this would be demonstrated via dose–response associations with established ASD risk correlates. Finally, given that extant evidence supports the ‘parental concern’ model, with concerns raised in specific developmental domains between the ages of 1 and 3 years correlating with ASD diagnosis and later diagnostic outcomes [46,47,48,49,50], it was hypothesised that the most severe ‘red flag’ trait profiles, retrospectively reported by parents, at ≤ 3 years of age, would be associated with the severity of ‘red flag’ trait reporting in later childhood in keeping with previous ‘optimal outcome’ findings.

Method

Sample

Data for this study were sourced from the national survey, Mental Health of Children and Young People in Great Britain (GB), 2004 [51], collected by the Office for National Statistics (ONS; all analyses were performed in accordance with ONS ethical and data handling regulations); the survey was a thorough census of the medical, emotional, and social health of young people in GB. The sampling frame was drawn from Child Benefit records held by the Department for Work and Pensions’ Child Benefit Centre (CBC). From this frame, a sample of postal areas in England, Scotland, and Wales was selected, and ultimately a random sample of addresses. Households where the CBC had an ‘action’ open, such as child death or CBC involvement, were described as ‘sensitive’ and excluded. This census was multiphasic, conducted in 1999 and 2004. As the chosen measure did not appear in the 1999 phase, this study utilises only the 2004 data.

The resultant sample of 12,294 households was contacted via post by the CBC with survey details and opt-out instructions. Of these, 1085 (9%) opted out, 631 (5%) had moved, and 82 (1%) were found ineligible, so that 1798 (15%) in total were not approached for an interview. Of the 10,496 approached for an interview, 2183 (21%) refused and 313 (3%) could not be contacted, leaving a sample of 7977 households where an ONS representative conducted interviews with a parent. The final interview sample (N = 7977) was 52% male (4111) with a mean age of 10.54 years (SD 3.40) and a range of 4–17 years.
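The first stage of the sample flow can be re-derived from the reported counts. The short sketch below only re-computes figures stated in the text; the variable names and the `pct` helper are illustrative and not part of the survey documentation.

```python
# Counts as reported in the Sample section; names are illustrative only.
issued = 12294
opted_out, moved, ineligible = 1085, 631, 82

# Households never approached for an interview (reported as 1798).
not_approached = opted_out + moved + ineligible
# Households approached for an interview (reported as 10,496).
approached = issued - not_approached

refused, no_contact = 2183, 313


def pct(part, whole):
    """Return a percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


print(not_approached, approached)
print(pct(opted_out, issued), pct(moved, issued),
      pct(ineligible, issued), pct(not_approached, issued))
print(pct(refused, approached), pct(no_contact, approached))
```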

Measures

A general development questionnaire was created by the ONS for this survey, containing a parent/caregiver-directed autism sub-questionnaire of 43 items. The sub-questionnaire opened with five binary-answer anchor questions pertaining to the parent/caregiver’s child before the age of 3, each concerning an area of potential autistic behaviour:

  1. Was there anything that seriously worried you or anyone else about the way his/her speech developed?
  2. Was there anything that seriously worried you or anyone else about how s/he got on with other people?
  3. Was there anything that seriously worried you or anyone else about the way his/her pretend or make-believe play developed?
  4. Was there anything that seriously worried you or anyone else about any odd rituals or unusual habits that were very hard to interrupt?
  5. Was there anything that seriously worried you or anyone else about his/her ability to learn and do new things, such as puzzles or helping get dressed?

Following was a binary-answer gatekeeper question ‘Have the things that seriously worried you or someone else now cleared up completely?’ A positive answer ended the autism section while a negative answer led to the remaining autism questions. Ten questions were selected from the body of this questionnaire; two questions to approximate each of the five behaviours detailed in the anchor questions (see “Appendix 1”) acting as broad early ‘red flag’ behavioural markers [37].
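The routing described above can be sketched as a small piece of skip logic. This is an illustration only: the class and field names are hypothetical, and the sketch assumes that the gatekeeper question is only relevant when at least one anchor concern was reported, which the survey description does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class AnchorResponses:
    """Hypothetical container for the five anchor answers and the gatekeeper."""
    speech: bool
    social: bool
    pretend_play: bool
    rituals: bool
    learning: bool
    cleared_up: bool  # gatekeeper: have the worries cleared up completely?


def any_early_concern(r: AnchorResponses) -> bool:
    """True if any of the five anchor questions was endorsed."""
    return any([r.speech, r.social, r.pretend_play, r.rituals, r.learning])


def continues_to_follow_up(r: AnchorResponses) -> bool:
    """True if the respondent would go on to the remaining autism questions.

    Assumption: the gatekeeper only applies when a concern was reported.
    """
    return any_early_concern(r) and not r.cleared_up
```

Under these assumptions, a parent who reported a speech concern that has not cleared up would proceed to the remaining items, while a parent whose concerns have cleared up would leave the section.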

Additional data

The Mental Health of Children and Young People in Great Britain, 2004 [51] survey also involved a full health questionnaire, including ICD-10 criteria of mental and physical health issues. Four known comorbid conditions for autism were chosen as covariates for analysis: epilepsy, learning difficulties, poor coordination, and any anxiety disorder. These were scored as binary variables indicating either presence or absence of each condition. In addition, an ICD-10 diagnosis of autism spectrum disorder was used to validate the interpretation that the latent factor typified by ‘red flag’ behaviours reflects autistic behavioural traits.

Analytic strategy

Latent transition analysis (LTA) is a longitudinal modelling technique used to examine whether individuals transition between latent classes over time. LTA consists of two components: a measurement model and an autoregressive model [52, 53]. The measurement model (i.e. a latent class analysis, LCA) describes the structure of the latent classes at each time point. The autoregressive model (i.e. a Markov model) examines individual-level transitions between these classes over time [52, 53]. For a more detailed description of LTA and its applications in the social and behavioural sciences, see Nylund [54]. LTA was conducted in the following steps.
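As a sketch of how the measurement and autoregressive components combine in a two-wave design, the joint probability of a respondent's item responses can be written as below. The notation is an assumption following common LTA convention rather than notation taken from the cited sources: δ denotes class prevalences at the first time point, τ the transition probabilities, and ρ the item-response probabilities. Local independence of items within a class is also assumed.

```latex
P(\mathbf{u}_1, \mathbf{u}_2)
  = \sum_{a=1}^{K_1} \sum_{b=1}^{K_2}
    \delta_a \, \tau_{b \mid a}
    \prod_{j=1}^{J_1} \rho_{1j}(u_{1j} \mid a)
    \prod_{m=1}^{J_2} \rho_{2m}(u_{2m} \mid b)
```

Here δ_a = P(C_1 = a), τ_{b|a} = P(C_2 = b | C_1 = a), K_1 and K_2 are the numbers of classes at each time point, and J_1 and J_2 are the numbers of indicators (five anchor items and ten later items in this study).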

Step 1: determine the best measurement model

To determine the best measurement model, a series of LCAs was specified and tested separately at the two time points using the available binary ‘red flag’ ASD trait variables as observed indicators. The first LCA was used to determine the number and nature of sub-types of autistic trait variation (red flags) based on endorsement of each of the five behavioural anchor questions (≤ 3 years of age) from the general development questionnaire devised by the ONS. These five items were binary and treated as categorical. Three latent class models were tested (two- through four-class solutions). The second LCA was used to determine the number and nature of sub-types of autistic trait variation based on endorsement of each of the ten behavioural questions from the body of the questionnaire relating to red flag traits ≥ 3 years of age. These items were also treated as categorical, and the same three models were tested (two- through four-class solutions).

Models were compared using a range of common fit statistics. The Akaike information criterion (AIC) [55], the Bayesian information criterion (BIC) [56], and the sample size-adjusted Bayesian information criterion (ssaBIC) [57] were used to compare model fit, with lower values indicative of better fit. The Lo–Mendell–Rubin likelihood ratio test (LMR-LRT) was used to compare a solution with k classes against a solution with k − 1 classes [58]. A non-significant p value indicates that the k-class model does not fit significantly better, so the more parsimonious model with k − 1 classes is preferred [58]. Model fit was also assessed using the entropy criterion [59]. This statistic reflects how accurately individuals were assigned to their classes based on the posterior probabilities [59]. Entropy values range from 0 to 1, with higher values reflecting more accurate classification [59]. To ensure that the models converged on global rather than local solutions, 100 random sets of starting values and 50 final stage optimizations were used.
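For readers unfamiliar with these indices, the following sketch shows how they are conventionally computed from a fitted model's log-likelihood and posterior class probabilities. The formulas are the standard textbook definitions (with the ssaBIC replacing n by (n + 2)/24); the function names are illustrative, and this is not the software routine used in the study.

```python
import math


def information_criteria(log_lik, n_params, n_obs):
    """Return (AIC, BIC, ssaBIC) for a fitted model; lower is better."""
    aic = -2 * log_lik + 2 * n_params
    bic = -2 * log_lik + n_params * math.log(n_obs)
    # Sample-size-adjusted BIC replaces n with (n + 2) / 24.
    ssabic = -2 * log_lik + n_params * math.log((n_obs + 2) / 24)
    return aic, bic, ssabic


def relative_entropy(posteriors):
    """Relative entropy (0 to 1) from an n x K matrix of posterior probabilities.

    Values near 1 indicate that cases are assigned to classes with little
    ambiguity; values near 0 indicate near-uniform posteriors.
    """
    n = len(posteriors)
    k = len(posteriors[0])
    total = -sum(p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1 - total / (n * math.log(k))
```

Candidate solutions (here, two to four classes) would each be passed through `information_criteria`, and the values compared across models alongside the likelihood ratio test.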

It remains debated whether the LMR-LRT or BIC is more useful when it comes to determining the optimal number of classes in an LCA [54]. A number of simulation studies suggest that the BIC is highly effective at identifying the correct underlying class structure, while the LMR-LRT can occasionally extract too many classes when the sample size is large (N > 1000) [54, 60]. Given that the sample size was relatively large in the present study, the BIC was considered a more reliable indicator of the optimal class solution.

Step 2: validate classes using ASD diagnosis and clinical correlates

Multinomial logistic regression was used to test whether the red flag trait classes/typologies at time one were meaningful in relation to ASD. Associations between class membership and an ASD diagnosis variable, together with four common clinical correlates of ASD, were estimated.
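To make the resulting odds ratios concrete, the sketch below computes an unadjusted odds ratio from a 2 × 2 table comparing one class with the reference class on a single binary correlate. This is a simplification under stated assumptions: the counts are hypothetical, no sampling weights or other covariates are included, and the function is not the estimation routine used in the study. With a single binary predictor, a multinomial logistic model yields this same cross-product ratio for each class contrasted with the reference class.

```python
def odds_ratio(target_with, target_without, reference_with, reference_without):
    """Unadjusted odds ratio for a binary condition, target class vs reference class."""
    return (target_with * reference_without) / (target_without * reference_with)


# Hypothetical counts, for illustration only (not taken from the survey):
# 30 of 100 children in a target class have the condition,
# compared with 50 of 1000 in the reference class.
example_or = odds_ratio(30, 70, 50, 950)
print(round(example_or, 2))
```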

Step 3: specify latent transition model

In this step, the LTA model is specified, producing a matrix of latent transition probabilities. This model affords the opportunity to classify individuals as ‘movers’ (i.e. those who transition from one class to a different class over time) or ‘stayers’ (i.e. those who remain in the same class across time) [52, 53]. Mover–stayer models more accurately describe transitions between classes, as transition probabilities are estimated for ‘movers’ only [52, 53].
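The idea of a transition matrix and a mover–stayer split can be illustrated with hard class assignments. Note the assumptions: an LTA estimates transition probabilities within the model using posterior class probabilities, whereas this sketch simply cross-tabulates each person's most likely class at two time points. The data and function names are hypothetical.

```python
from collections import Counter


def transition_matrix(time1, time2, classes):
    """Row-normalised proportions of moving from each time-1 class to each time-2 class."""
    counts = Counter(zip(time1, time2))
    matrix = {}
    for a in classes:
        row_total = sum(counts[(a, b)] for b in classes)
        matrix[a] = {
            b: (counts[(a, b)] / row_total if row_total else 0.0)
            for b in classes
        }
    return matrix


def mover_stayer(time1, time2):
    """Count people who keep the same class (stayers) and those who change (movers)."""
    stayers = sum(a == b for a, b in zip(time1, time2))
    return {"stayers": stayers, "movers": len(time1) - stayers}


# Hypothetical most-likely class memberships for eight children.
classes = ["high", "moderate", "low"]
t1 = ["high", "high", "moderate", "moderate", "low", "low", "low", "low"]
t2 = ["high", "moderate", "low", "high", "low", "low", "low", "moderate"]

matrix = transition_matrix(t1, t2, classes)
pattern = mover_stayer(t1, t2)
```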

All analyses incorporated weighting variables from the first-stage sampling to account for unequal probabilities of selection. This standardising technique adds an additional ‘weight’ to under-represented sub-populations that may not be accurately represented due to missing data and was used rather than excluding cases listwise. Analyses were conducted using Mplus 4 [61].
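The effect of such weights on a simple estimate can be sketched as follows. The function and the example values are hypothetical and are not the survey's actual weighting scheme; the point is only that cases with larger weights contribute proportionally more to the estimate.

```python
def weighted_proportion(indicator, weights):
    """Weighted proportion of cases endorsing a binary item (coded 1 or 0)."""
    return sum(w * x for x, w in zip(indicator, weights)) / sum(weights)


# Hypothetical example: four respondents, the second carrying a double weight.
endorsed = [1, 0, 0, 1]
weights = [1.0, 2.0, 1.0, 1.0]
estimate = weighted_proportion(endorsed, weights)  # 2.0 / 5.0
```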

Results

Table 1 (section A) shows the fit indices for the first latent class analysis (≤ 3 years of age). The three-class solution was the model of best fit: the likelihood ratio chi-square was non-significant, the AIC was lower for the three-class solution than for the two-class solution, and the Lo–Mendell–Rubin LRT showed that the four-class solution was not significantly better than the three-class solution. The entropy value (0.83) also showed a meaningful classification of cases.

Table 1 Fit indices for the latent class analyses

The three-class model, shown in Fig. 1, revealed a ‘High endorsement class’ comprising 1.9% of the population, in which the probability of ‘red flag’ trait endorsement was > 70% for all five traits. A larger ‘Moderate endorsement class’ also emerged, representing 10.8% of the population and characterised by moderate endorsement probabilities of language, social, and developmental problem behaviours. A large ‘Low endorsement class’ (baseline class), comprising 87.3% of the population, was characterised by extremely low endorsement probabilities (< 10%) across all ‘red flag’ traits.

Fig. 1 Endorsement probability plot for autism anchor questions

A second LCA was carried out to identify distinct groups characterised by red flag traits after the age of 3 using 10 questions from the body of the questionnaire (see “Appendix 1”); two exemplifying each of the five behaviours described by the anchor questions. Another three-class solution emerged. Table 1 (section B) shows the fit indices for the second latent class analysis. The three-class solution was the model of best fit; the AIC was lower for the three-class solution than for the two-class and the Lo–Mendell–Rubin’s LRT showed that the four-class solution was not significantly better than the three-class solution. The entropy value (0.72) showed a meaningful classification of cases. Probability estimates for class membership in both LCAs were robust.

The three-class model for the ‘≥ 3 years of age’ data, shown in Fig. 2, revealed a small moderate presentation class, 16.7% of the sample, characterised by varied endorsement of the five ‘red flag’ behaviour categories. A larger high presentation class emerged, in which 31.5% of the sample were characterised by high endorsement probabilities of most items. The largest class, a low presentation class comprising 51.8% of the sample, was characterised mainly by moderate endorsement probabilities relating to pretend/play and ritual/habit traits, with a higher endorsement of language issues.

Fig. 2 Endorsement probability plot for > 3 years of age ASD items

A multinomial logistic regression was conducted to test whether severity in ‘red flag’ traits ≤ 3 years would be associated with established ASD risk correlates (see Table 2). The ASD risk variables showed significant odds ratios across the high and moderate classes when compared to the low class. Moderate presentation was described by highly significant odds ratios of epilepsy, learning difficulties, poor coordination, and anxiety and by significant odds ratios of ASD diagnosis. High presentation showed very highly significant odds ratios of epilepsy, learning difficulties, poor coordination, and ASD diagnosis and highly significant odds ratios of anxiety.

Table 2 ASD risk for high, moderate, and low presentations at ≤ 3 years of age

A latent transition analysis (LTA) was used to identify transitions in class membership from early to later childhood (see Tables 3, 4).

Table 3 Average latent class probabilities for most likely latent class membership
Table 4 Latent transition probabilities from ≤ 3 years of age to > 3 years of age

Members of the ‘≤ age 3’ high class were more likely to remain in the high class ≥ age 3 (82.3%) than to transition to the moderate (15.1%) or low (2.6%) classes. The moderate class showed a mixed effect, with 22.6% remaining moderate, 33.1% transitioning ‘up’ to high and 44.2% transitioning ‘down’ to low. Members of the low class were more likely to remain low (74.2%) than to transition into either the moderate (14.5%) or high (11.3%) classes. When examined in the context of a mover–stayer model (see Table 5), the majority of the overall sample (89.9%) fell into the stayer category, beginning and remaining low. Movers showed a ‘downward’ transitional trend, with 72.7% of this sample category transitioning from high to moderate or moderate to low.
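The transition percentages reported above can be arranged as a matrix and checked row by row. The sketch below uses only the figures given in the text; the dictionary layout and variable names are illustrative.

```python
# Reported transition percentages: rows are the class at <= age 3,
# columns are the class at >= age 3.
reported = {
    "high": {"high": 82.3, "moderate": 15.1, "low": 2.6},
    "moderate": {"high": 33.1, "moderate": 22.6, "low": 44.2},
    "low": {"high": 11.3, "moderate": 14.5, "low": 74.2},
}

# Each row should sum to roughly 100%, allowing for rounding in the source.
row_sums = {origin: round(sum(row.values()), 1) for origin, row in reported.items()}

# Most probable destination class for each starting class.
most_likely = {origin: max(row, key=row.get) for origin, row in reported.items()}

print(row_sums)
print(most_likely)
```

Read this way, the high and low classes most often stay where they began, while the moderate class most often moves to the low class.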

Table 5 Count and relative percent of mover–stayer patterns

Discussion

The first analysis revealed three profiles characterised by behaviours that would strongly reflect ‘red flag’ risk at an age where initial assessment is common. The later scale questions were narrower in focus, helping to present a better picture of these behaviours, for example, moving from ‘speech development’ to correct application of speech to the situation and appropriate phrase use. The differences in functioning between the classes may have demonstrated varied expression of ASD in accordance with older diagnostic categories; however, it is important to note that they were only approximations of ASD ‘red flags’. Variation in the latent class profiles after the age of 3, however, showed levels of expression in specific traits that were consistent with variation in ASD presentation [62]. The low presentation profile, largest at 51.8% of the subsample, was described by problems with formal/informal language use and some repetitive play and obsession behaviours but low instances of the other traits. These issues were enough for parents to report when asked, but may not have constituted an actual clinical threshold for ASD, as such traits vary naturally in the population [8]. The line between ‘personality quirk’ and ‘symptom’ is often quite thin [63], especially for the atypical functioning of individuals formerly categorised under PDD-NOS [16]. The low presentation profile represents a group that varies from a baseline population (‘no ongoing concerns’ or minimal trait expression) to just over the cusp of an older clinical designation. The moderate presentation profile, smallest at 16.7%, was typified by all-cause developmental problems, repetitive play, and problems with formal/informal language use in keeping with the presentation of AS. This profile also showed a 100% endorsement of eye contact issues, one of the hallmarks of ASD [64, 65], though it was low in other domains. The high presentation profile, 31.5% of the subsample, showed overall higher endorsements of more behaviours at levels which could be interpreted as past the clinical threshold for disorder and fitting the perception of ‘classical’ autism.

The first LCA provided a rough estimate of ‘red flag’ behaviour before the age of 3 and the second analysis focused on a more nuanced picture of these behaviours after the age of 3. A quasi-latent transition analysis between those two ‘time points’ described transition between the latent classes, representing changes in symptom severity and occurrence over time. Those with membership in the ‘≤ age 3’ high class had the highest probability of remaining high ≥ age 3, while the greatest percentage of the sample population were those classified as low ≤ age 3 who remained so afterwards. Those classified as moderate showed the greatest variance of transition ≥ age 3, as would be expected from this class, which had the lowest overall classification probabilities in both LCAs. These results showed a clear relationship between initial ‘red flag’ behaviour severity (ASD risk) and behaviour persistence. Wolff et al. [66] observed this effect in the persistence of repetitive behaviour in toddlers over time, with the most severe behaviour persisting in an ASD sample. It is important to note, however, that possible interventions, family/school socialisation, and general development [36] may have accounted for the downward trend in symptom severity over time, but detailed intervention/treatment data were not available (see limitations below).

In seeking external validation for the hypothesis that the latent factor was ASD risk, the resultant high and moderate profiles were examined against the low category as a baseline in terms of four known ASD comorbidities (learning difficulties, poor coordination, any anxiety disorder, and epilepsy) and instances of ICD-10 ASD diagnosis in the sample. The odds ratios were highly significant, providing empirical support for ASD risk as the latent factor. ASD diagnosis odds were 954 times higher than baseline for the high class and 20.73 times higher for the moderate class, indicating that ‘red flag’ behaviour traits are a predictor of ASD risk. Higher odds ratios were found in the high class than in the moderate class for all items, further confirming a difference in severity in accordance with the hypothesis of multiple levels of severity and functioning aligning with older variant diagnoses. Epilepsy rates are important to note here, as the relationship between ASD and epilepsy is well established, with research indicating a potential genetic relationship [67,68,69,70].

These results support the hypothesis that parental perception of ‘red flag’ behaviours before the age of 3 is predictive of ASD risk and severity. Analysis revealed distinct behavioural profiles with variance in expression of ASD consistent with variant diagnostic categories, validated by the presence of known comorbidities and gender rates. The data from this survey are from 2004 and the sample’s mean age was 10.51 (SD 3.39), indicating that most children from the sample were below the age of 3 in the mid to late 1990s. Research into ASD as a developmental disorder increased during the 1970s–1980s, challenging the notion of autism resulting from ‘refrigerator mothers’ [71]. The 1990s saw a wider awareness of variation in ASD and the concept of a spectrum of behaviours/diagnoses, but the true tipping point of media awareness (and perhaps over-awareness) would not come until the early 2000s [72]. The survey asked about ‘red flag’ behaviours worrying parents at that time, that is, before widespread ASD awareness and media coverage of the ‘autism epidemic’, removing an element of potential contamination in parental perception.

There is a solid foundation of research supporting the stability of ASD behaviours from early diagnosis ≤ age 3 to later reassessment ≥ age 4 in both general population [73, 74] and high-risk samples [75, 76]. That stability does not contradict the OO model of improvement through early intervention programs and socialisation based on severity of initial symptomology. The trend for functioning-based improvement [40, 42] was replicated in this study, as severe individuals tended to stay severe and moderate-to-low severity individuals tended to improve.

The evidence presented should be evaluated with the study’s limitations in mind.

The ASD section of the general development questionnaire designed by the ONS was in non-clinical language, was not a measure intended for clinical diagnosis, and was not a pre-existing, tested psychometric instrument. The latent classes in the first LCA had a large catchment due to the general ‘red flag’ ASD terminology used, which was intended for parents rather than clinicians. The wording, “Was there anything that seriously worried you or anyone else about the way his/her speech developed?” failed to differentiate between language use and speech disorder. “Odd rituals or unusual habits that were very hard to interrupt” could have described the repetitive behaviours of ASD but could also have described the ritualistic learning behaviour common in toddlers [77, 78]. Issues of ‘learning development’ could have referred to any developmental delay or learning disability, even the traditional ‘late bloomer’ who goes on to develop with no impairment [79, 80]. While this may have seemed like a limitation to the validity of the study, the non-clinical language was actually considered a strength of the analysis, in that it described issues in a way that might be considered consistent with parental disclosures of concern to GPs/paediatricians. No health professionals were involved in the interviews and questionnaire responses were only parental perception/interpretation of the child’s behaviour. No data were collected concerning any treatment the children may have received for their issues after the age of 3, any effects that treatment may have had, or the presence of other developmental disorders. In addition, these parental-report data were susceptible to hindsight bias as no prospective screening was performed to isolate an ASD risk sample. Hindsight bias appears in diagnosis even amongst clinicians trained to use diagnostic criteria and conduct assessments [81, 82], so its effect should not be underestimated in non-clinical individuals.
As there was little diagnostic outcome information available for the sample, the model of change in ASD behaviours over time could only be described as a trend model; further studies would benefit from a longer longitudinal design. In addition, the LTA performed was only a quasi-LTA, as the data used for each time period differed, and the items in the second model attempted only to approximate those in the first.

Parents’ perceptions of early ‘red flag’ behaviours appear to act as valid predictors of future ASD risk while reinforcing the variation in expression of ASD that had previously been described by variant diagnoses. While many missed milestones or developmental delays can clear up in time, some behavioural symptomology may persist and signify developmental disorder. Meaningful transition from disorder to a typical development population is the hoped-for OO model depending on the initial level of functioning. Further studies of large general population data with a longitudinal design, featuring clinical measures and clinician involvement, could help create a clearer picture of ASD variation and transition in the population. Such parental report data would be valuable assets in the development and testing of future diagnostic tools.