Twin studies are a key tool in quantitative behavioral genetic research. They rely on the fact that monozygotic and dizygotic twins share genes and environments to varying degrees. Models utilizing data from twins range from simple univariate genetic models to complex multivariate models dealing with cross-sectional as well as longitudinal information. More recent extensions of twin studies incorporate data from twins and their families.
Twin studies are considered a key tool in quantitative behavioral genetic research, in which genetically informative data (i.e., data from sibling pairs, families, and pedigrees) are used to draw inferences about the genetic and environmental causes of individual differences in one or more measured variables (i.e., phenotypes). Galton’s pioneering use of twins to study inheritance (Galton 1869) marks the beginning of the systematic investigation into the sources of individual psychological differences. The development of the twin method is usually also ascribed to Galton (1876), although it is uncertain whether Galton was aware of the distinction between monozygotic and dizygotic twins (see Boomsma et al. 2002).
Twin studies rely on the fact that monozygotic (identical; MZ) and dizygotic (fraternal; DZ) twins systematically vary in the extent to which they share genes and environmental influences. While MZ twins are assumed to share 100% of their genes, DZ twins share on average 50% of their segregating genes, the same percentage as non-twin siblings (Plomin et al. 2013). Twins also share many aspects of their environment, e.g., uterine environment, parental behavior, socioeconomic status, culture, or neighborhood.
Twin studies compare the resemblance of MZ and DZ twins in a given trait to explore the effects of genetic and environmental variance on a phenotype by leveraging the known differences in genetic similarity between MZs and DZs (Neale and Maes 2004). Assuming the validity of the equal environment assumption (i.e., the assumption that the environments of MZs and DZs do not differ in any systematic way that would affect the trait under study; Plomin et al. 2013), greater phenotypic similarity between MZ twins compared to DZ twins is taken as evidence of the importance of genetic effects for the trait under consideration. Effectively, quantitative behavior genetic research has demonstrated that all human behavioral traits are heritable, that the effect of common environmental influences (i.e., the effect of being raised in the same family) is smaller than the effect of genes, and that a considerable portion of the variation in complex human behavioral traits is accounted for by environmental effects that are not shared by members of a family. These findings, termed the three laws of behavior genetics by Turkheimer (2000), marked the end of the so-called nature-nurture debate and led to a shift in behavior genetic research from investigating main effects of genes and environments to a more integrated understanding of the interplay of genes and environments.
However, rapidly developing molecular genetic methodologies and the availability of measured DNA have reignited the discussion as to whether the era of twin studies or, more generally, the classical quantitative behavioral genetic approach has come to an end (e.g., Charney 2012). In this regard, Turkheimer and Harden (2014) argue that molecular genetic methodologies themselves have a number of methodological difficulties and that there is no reason to “move on from one poorly understood method to the next, motivated not by the theoretical completion of the old paradigm but rather by the availability of new technology” (p. 160). Moreover, they make the case that figuring out how “heritable” traits are has never been the core motivation of behavioral genetic studies of complex human traits. Heritability itself is defined as the proportion of the total phenotypic variation in a given trait accounted for by the total effect of the genotype (broad-sense heritability) or by the additive effect of the genotype (narrow-sense heritability). With that in mind, it becomes clear that heritability, as a standardized variance component, is not invariant across times and populations and, thus, is not a meaningful indicator of a causal effect from genotype to phenotype (Turkheimer and Harden 2014). The power of the twin design, even in times of molecular genetics, arises rather from the possibility of analyzing associations between phenotypic traits while controlling for the heritability of these traits. In this way, twin studies provide a significant but imperfect quasi-experimental control over nonexperimental phenotypic associations (Turkheimer and Harden 2014). Further, more recent developments in twin modeling extend the analytic focus to (step)parents, (half-)siblings, children, and partners of twins.
By utilizing such extended twin family designs, researchers can investigate, e.g., the effect of assortative mating, the direct effects of parents on children, or the correlation between genes and environment. Finally, twin studies remain important for investigating the interplay between genes and environments.
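The two definitions of heritability given above can be written compactly. The symbols are notational assumptions introduced here for illustration only: V_P is the total phenotypic variance, V_G the total genetic variance, V_A, V_D, and V_I the additive, dominance, and epistatic variance, and V_E the environmental variance. The decomposition assumes that genetic and environmental effects are uncorrelated and do not interact.

```latex
H^2 = \frac{V_G}{V_P}, \qquad
h^2 = \frac{V_A}{V_P}, \qquad
V_G = V_A + V_D + V_I, \qquad
V_P = V_G + V_E
```

Because both quantities are ratios, a change in V_E alone changes H^2 and h^2 even when the genetic variance stays the same, which is one way to see why heritability is not invariant across times and populations.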
Two types of genetic effects are distinguished, namely, additive genetic effects (A), which encompass the sum of all allelic effects within and across genes, and nonadditive genetic effects, which represent effects of alleles (dominance, D) or loci (epistasis, I) that interact with other alleles or loci. Both nonadditive genetic effects are completely shared in MZ twins, while DZ twins, on average, share only 25% of the dominance and 0% of the epistatic effects. However, as epistatic effects are hard to detect in nonexperimental designs, they are typically not estimated in twin studies. With respect to environmental effects, shared environmental effects (C; common environment) comprise all environmental conditions and experiences that contribute to the resemblance of family members. In contrast, non-shared environmental effects (E) are unique to each family member and therefore contribute to their phenotypic dissimilarity. It is important to note that environmental influences are defined in terms of their effect. Even if twins are exposed to the same event (e.g., parental divorce), the impact of this event on each individual twin may be different and would therefore contribute to the non-shared rather than the shared environmental effect. The best-known design for drawing inferences about these genetic and environmental effects is the classical twin design (CTD), which is based on the analysis of reared-together MZ and DZ twins (Boomsma et al. 2002).
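The sharing proportions described above translate into expected variances and twin covariances for a single phenotype. The notation is an assumption made here for illustration: V_A, V_D, V_C, and V_E denote the variance attributable to A, D, C, and E, epistasis is omitted (as is typical in twin studies), and random mating is assumed.

```latex
\begin{aligned}
\operatorname{Var}(P)   &= V_A + V_D + V_C + V_E \\
\operatorname{Cov}_{MZ} &= V_A + V_D + V_C \\
\operatorname{Cov}_{DZ} &= \tfrac{1}{2} V_A + \tfrac{1}{4} V_D + V_C
\end{aligned}
```

Reared-together twin data supply only these three statistics, so at most three of the four variance components can be estimated at the same time.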
The Classical Twin Design
The validity of the CTD depends on specific assumptions (for an exhaustive discussion of this topic, we refer the reader to the literature, e.g., Plomin et al. 2013), namely, the accuracy of the equal environments assumption and the absence of assortative mating, gene-environment interactions, gene-environment correlations, and nonadditive genetic effects, the latter being confounded with C in the CTD (Neale and Maes 2004). In more complex designs, however, it becomes feasible to estimate those effects directly and also to tease apart shared environmental and nonadditive genetic effects.
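When the assumptions listed above hold and nonadditive effects are ignored, the MZ and DZ twin correlations give rough variance-component estimates through the widely used Falconer-type formulas. The following is a minimal sketch in Python under those assumptions. It is not the model-fitting approach itself, and the function name and the example correlations are hypothetical.

```python
def falconer_estimates(r_mz: float, r_dz: float) -> dict:
    """Rough standardized ACE estimates from twin correlations.

    Assumes the classical twin design holds: equal environments,
    random mating, no dominance or epistasis, no GxE or rGE.
    """
    a2 = 2.0 * (r_mz - r_dz)   # additive genetic share
    c2 = 2.0 * r_dz - r_mz     # shared environmental share
    e2 = 1.0 - r_mz            # non-shared environment (plus measurement error)
    return {
        "a2": a2,
        "c2": c2,
        "e2": e2,
        # An MZ correlation more than twice the DZ correlation drives c2
        # below zero, a common hint that nonadditive effects may be present.
        "dominance_suspected": r_mz > 2.0 * r_dz,
    }


# Hypothetical correlations, chosen only for illustration.
estimates = falconer_estimates(r_mz=0.60, r_dz=0.40)
for name, value in estimates.items():
    print(name, value)
```

The three shares sum to one by construction. A negative `c2` is not an admissible variance and should be read as a warning about the model rather than as an estimate.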
Beginning in the late 1970s, quantitative behavioral genetics transitioned to modeling genetic covariance structures using maximum likelihood methods (Martin and Eaves 1977). Within this structural equation modeling (SEM) approach, genetic and environmental effects are modeled as the contribution of unmeasured (latent) variables to the phenotypic differences between individuals (Boomsma et al. 2002; Franić et al. 2012; Neale and Maes 2004). This model-fitting approach has numerous benefits, such as the possibility to test for gender and age effects, to compute confidence intervals on parameters, or to explicitly compare models.
The simplest model is the univariate twin model, which can easily be extended to multiple variables, measurement occasions, and/or groups. Although not the focus of the current chapter, it should be noted that it is also possible to model any measured environmental and/or genetic information directly.
Univariate Twin Modeling
In principle, any general SEM software, such as Mplus (Muthén and Muthén 1998–2010), can be adapted to fit twin models. However, the OpenMx package (Neale et al. 2016) for R (R Core Team 2016) provides a flexible matrix syntax well suited to the model requirements of family data (Neale and colleagues (2003) provide an introduction to twin modeling based on matrix algebra).
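To make the logic of the univariate model concrete without relying on the syntax of any particular package, the sketch below fits A, C, and E variance components to observed MZ and DZ covariance matrices by minimizing the usual maximum likelihood discrepancy function. It is an illustrative assumption-laden toy in plain Python, not OpenMx or Mplus code: all function names are hypothetical, means are ignored, and a coarse grid search stands in for a proper optimizer.

```python
import math


def ml_discrepancy(observed, implied):
    """ML discrepancy between two symmetric 2x2 covariance matrices:
    ln|implied| - ln|observed| + trace(observed @ inverse(implied)) - 2."""
    det_obs = observed[0][0] * observed[1][1] - observed[0][1] * observed[1][0]
    det_imp = implied[0][0] * implied[1][1] - implied[0][1] * implied[1][0]
    inverse = [
        [implied[1][1] / det_imp, -implied[0][1] / det_imp],
        [-implied[1][0] / det_imp, implied[0][0] / det_imp],
    ]
    trace = sum(observed[i][k] * inverse[k][i] for i in range(2) for k in range(2))
    return math.log(det_imp) - math.log(det_obs) + trace - 2.0


def implied_cov(var_a, var_c, var_e, genetic_sharing):
    """Model-implied twin covariance matrix (genetic_sharing: 1.0 MZ, 0.5 DZ)."""
    total = var_a + var_c + var_e
    cross = genetic_sharing * var_a + var_c
    return [[total, cross], [cross, total]]


def fit_ace(cov_mz, cov_dz, n_mz, n_dz, steps=20, max_var=1.0):
    """Coarse grid search over the three variance components."""
    grid = [max_var * i / steps for i in range(steps + 1)]
    best = None
    for var_a in grid:
        for var_c in grid:
            for var_e in grid[1:]:  # var_e > 0 keeps the implied matrix invertible
                fit = (
                    n_mz * ml_discrepancy(cov_mz, implied_cov(var_a, var_c, var_e, 1.0))
                    + n_dz * ml_discrepancy(cov_dz, implied_cov(var_a, var_c, var_e, 0.5))
                )
                if best is None or fit < best[0]:
                    best = (fit, var_a, var_c, var_e)
    return {"fit": best[0], "A": best[1], "C": best[2], "E": best[3]}


# Hypothetical observed covariance matrices for a standardized phenotype.
cov_mz = [[1.0, 0.6], [0.6, 1.0]]
cov_dz = [[1.0, 0.4], [0.4, 1.0]]
result = fit_ace(cov_mz, cov_dz, n_mz=200, n_dz=200)
print(result)
```

Dedicated SEM software additionally models means, handles raw data with missing values, and returns confidence intervals and fit statistics for model comparison, none of which this sketch attempts.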
Analyses based on the univariate twin model have contributed to our understanding of the causes of individual differences in a plethora of human traits, such as cognitive abilities, personality, and psychopathology (see Polderman et al. 2015). However, as Martin and Eaves (1977) already outlined 40 years ago, a powerful extension of the model lies in the possibility of analyzing multivariate phenotypes. The following section is intended to provide a basic introduction to the principles of multivariate genetic modeling. The description covers the most commonly used models but is by no means exhaustive. For more advanced models and further reading on the formal specification of such models, the reader is referred to the pertinent literature (e.g., Neale et al. 2003, 2016).
In the triangular (Cholesky) decomposition, the first latent factors (genetic, shared, and non-shared environmental) load on all observed variables, the second on all except the first, and so on. It is important to note that the order of the observed variables is arbitrary in cross-sectional data and that the model merely allows for inferences about the genetic and environmental overlap between the variables. Only with genetically informative longitudinal data can it be tested, for example, whether new genetic influences become important over time (by testing whether the influence of the second genetic factor on the second measurement significantly differs from zero) or whether genetic amplification is present, i.e., when the paths from the first genetic factor to the respective traits are equal (Posthuma 2009). For example, Klump et al. (2007) used trivariate genetic modeling to investigate the emergence of new genetic effects in disordered eating symptoms between the ages of 11 and 18. They found that genetic factors accounted for a small proportion of variance at age 11 (6%), but that genes increased in importance at ages 14 and 18, explaining almost half of the variance in disordered eating. The authors conclude that their findings highlight the transition from early to mid-adolescence as a critical time for the emergence of a genetic diathesis for problematic eating behavior.
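The triangular loading pattern described above can be illustrated numerically. In the sketch below, the genetic path coefficients form a lower-triangular matrix, and multiplying it by its transpose gives the implied genetic covariance matrix, from which a genetic correlation follows. The path values and function names are hypothetical and serve only to show the arithmetic. The same steps apply to the shared and non-shared environmental factors.

```python
def lower_times_transpose(paths):
    """Implied covariance matrix L * L^T for a lower-triangular path matrix L."""
    n = len(paths)
    return [
        [sum(paths[i][k] * paths[j][k] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def correlation(cov, i, j):
    """Correlation between variables i and j implied by a covariance matrix."""
    return cov[i][j] / (cov[i][i] * cov[j][j]) ** 0.5


# Hypothetical genetic paths for two traits (rows) on two genetic factors (columns).
# The first factor loads on both traits, the second only on the second trait.
genetic_paths = [
    [0.6, 0.0],
    [0.3, 0.4],
]

genetic_cov = lower_times_transpose(genetic_paths)
r_genetic = correlation(genetic_cov, 0, 1)

# Share of the genetic variance in trait 2 that is due to the second factor.
new_genetic_share = genetic_paths[1][1] ** 2 / genetic_cov[1][1]

print(genetic_cov)
print(r_genetic, new_genetic_share)
```

With longitudinal data, a nonzero second-factor path (0.4 in this toy example) would correspond to genetic influences that are new at the second occasion. With cross-sectional data, the ordering of the traits is arbitrary and only the implied covariance and correlation are interpretable.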
Further Applications of Twin Studies
Behavior genetic studies based on twin data have given rise to a plethora of alternative parameterizations of multivariate or more complex designs. Some of these are described below.
Direction of Causation and Random Effects Models
Under certain conditions (see Heath et al. 1993; Gillespie and Martin 2005), cross-sectional twin data are also informative about the direction of causation between two traits, that is, whether trait A causes trait B or vice versa. For example, Gillespie et al. (2012) used this approach to examine the direction of causation between disrupted sleep, anxiety, and depression.
With recourse to phenotypic random effects models (also known as hierarchical linear models or mixed effects models; see Raudenbush and Bryk (2002) for an introduction), cross-sectional twin data can also be used to augment classic linear regression analyses. Rather than studying the genetic and environmental effects on the respective traits, the goal of such regression-based analyses of twin data is to estimate the part of the regression that is independent of the heritability of the traits (see Turkheimer and Harden (2014) for a detailed description of this approach).
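One simple way to see the idea behind such regression-based analyses is the within-pair comparison: regressing co-twin differences in an outcome on co-twin differences in a predictor removes everything the twins of a pair share. In MZ pairs this covers genotype and shared environment. The sketch below is a fixed-effects simplification under that reading, not the full random effects model described by Turkheimer and Harden (2014). The function name and the data are hypothetical.

```python
def within_pair_slope(pairs):
    """Regression slope of outcome differences on predictor differences.

    pairs: iterable of ((x1, y1), (x2, y2)), one tuple per twin pair.
    The regression is forced through the origin because the order of the
    twins within a pair is arbitrary.
    """
    sum_xy = 0.0
    sum_xx = 0.0
    for (x1, y1), (x2, y2) in pairs:
        dx = x1 - x2
        dy = y1 - y2
        sum_xy += dx * dy
        sum_xx += dx * dx
    if sum_xx == 0.0:
        raise ValueError("co-twins do not differ on the predictor")
    return sum_xy / sum_xx


# Hypothetical MZ pairs: y = 0.5 * x + family effect, with a different
# family effect (10, -3, 4) shared by both twins of each pair.
mz_pairs = [
    ((1.0, 10.5), (3.0, 11.5)),
    ((2.0, -2.0), (5.0, -0.5)),
    ((4.0, 6.0), (0.0, 4.0)),
]

print(within_pair_slope(mz_pairs))
```

The family effects differ widely between pairs, yet they cancel in the differences, so the slope reflects only the association that holds within families.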
Twin studies are also concerned with the question of whether sex moderates the genetic and environmental effects on a trait; the corresponding models are known as sex-limitation models. The CTD provides the information needed (1) to investigate the magnitude of genetic and environmental effects on male and female phenotypes (quantitative sex differences) and (2) to determine whether or not the same genetic factors or shared environmental experiences influence a trait in males and females (qualitative sex differences; Neale and Maes 2004). The second question, however, can only be studied when data from opposite-sex DZ twin pairs are available (Eaves et al. 1978). Various studies in different domains of genetic research, e.g., psychopathology, intelligence, personality, and well-being, indicate the absence of substantial sex-related differences. One exception is a study by Rettew et al. (2006) showing that different genes may be involved in the variation of neuroticism in males and females.
Thus far, all of the described models relied, among other things, on the assumption that genes operate in the same manner across different levels of environmental variables. However, it is understood that the exposure to a given environment can moderate the importance of genetic and/or environmental contributions for a given phenotype (Neale and Maes 2004). This gene-environment interaction (GxE) may have a biasing effect on parameter estimates derived in the CTD (Purcell 2002). The most widely used continuous moderator model (Purcell 2002) tests whether the genetic and environmental effects found within the CTD change as a linear function of the moderator after accounting for the main effect of the moderator on the outcome. For example, the heritability of cognitive ability appears to vary as a function of family socioeconomic status in US samples, with higher heritability at the socially valued end of the distribution (Tucker-Drob and Bates 2016). The classic GxE model has recently been extended by van der Sluis and colleagues to address some methodological drawbacks inherent in the modeling procedure of the continuous moderator model (van der Sluis et al. 2012).
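In the continuous moderator model, each path coefficient is written as a linear function of the moderator, so the variance components (the squared paths) and the implied heritability change with the moderator level. The sketch below illustrates this arithmetic only. The parameter names (`beta_a`, `beta_c`, `beta_e`) and all values are hypothetical, and the main effect of the moderator on the trait mean is left out because it does not enter the variance decomposition.

```python
def moderated_components(moderator, a, c, e, beta_a=0.0, beta_c=0.0, beta_e=0.0):
    """Variance components at a given moderator level.

    Each path is a linear function of the moderator, e.g. a + beta_a * M,
    and the corresponding variance component is the squared path.
    """
    var_a = (a + beta_a * moderator) ** 2
    var_c = (c + beta_c * moderator) ** 2
    var_e = (e + beta_e * moderator) ** 2
    total = var_a + var_c + var_e
    return {"A": var_a, "C": var_c, "E": var_e, "total": total, "h2": var_a / total}


# Hypothetical parameters: only the genetic path depends on the moderator.
for level in (-1.0, 0.0, 1.0):
    comp = moderated_components(level, a=0.5, c=0.5, e=0.6, beta_a=0.2)
    print(level, round(comp["h2"], 3))
```

In this toy configuration the standardized heritability rises with the moderator even though C and E are constant in absolute terms, which is why moderation results are usually inspected on the unstandardized as well as the standardized scale.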
When a trait is measured repeatedly for each twin in a pair, these data can be utilized to disentangle the genetic and environmental contributions to stability and change in a trait over time. Different methods have been developed for serially correlated longitudinal data. In the Cholesky factorization, the multiple trait measures are treated in a multivariate genetic analysis framework (Posthuma 2009). Markov chain (or simplex) models assume that future values of the trait solely depend on the current trait values, not on the entire past history (Dolan et al. 1991). Also, growth curve models can be applied to longitudinal twin data to investigate the role of genetic and environmental factors in growth and change (Neale and McArdle 2000).
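The simplex assumption mentioned above has a simple consequence for the covariance structure: a latent factor at each occasion is the previous occasion's factor multiplied by a transmission coefficient, plus a new, independent innovation. The sketch below computes the covariance matrix implied for one latent component (for example, the genetic one) under that first-order autoregressive assumption. The function name and all parameter values are hypothetical.

```python
def simplex_cov(innovation_vars, transmissions):
    """Covariance matrix across occasions implied by a first-order simplex.

    innovation_vars[t]: variance of the new influences entering at occasion t
                        (for t = 0 this is the full variance at the first occasion).
    transmissions[t-1]: coefficient carrying the factor from occasion t-1 to t.
    """
    n = len(innovation_vars)
    if len(transmissions) != n - 1:
        raise ValueError("need one transmission coefficient per interval")

    # Variance at each occasion: transmitted variance plus new innovation.
    variances = [innovation_vars[0]]
    for t in range(1, n):
        variances.append(transmissions[t - 1] ** 2 * variances[t - 1] + innovation_vars[t])

    # Covariance between occasions s < t: variance at s times all
    # transmission coefficients on the way from s to t.
    cov = [[0.0] * n for _ in range(n)]
    for s in range(n):
        cov[s][s] = variances[s]
        carried = variances[s]
        for t in range(s + 1, n):
            carried *= transmissions[t - 1]
            cov[s][t] = cov[t][s] = carried
    return cov


# Hypothetical values for three measurement occasions.
for row in simplex_cov(innovation_vars=[1.0, 0.5, 0.25], transmissions=[0.8, 0.9]):
    print(row)
```

Large transmission coefficients with small innovations indicate stability of the same influences over time, whereas sizeable innovations indicate that new influences emerge at later occasions.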
A powerful extension of the CTD arises from the incorporation of data from other groups of family members. Nuclear models are based on data from parents and children, while extended family models (such as stealth and cascade) also rely on data from nonnuclear family members such as uncles or cousins (see Keller et al. 2009). One benefit of such models is the possibility of simultaneously estimating A, D, and E while decomposing C into multiple components, including an environment common to all family members and a twin-specific environment. It is also possible to estimate additional types of effects, such as gene-environment correlations (the effect that certain genotypes are selectively found in certain environments, represented by a correlation among genetic and environmental factors) and vertical cultural transmission (the effects of parents on offspring due to environmental factors). Further, the genetic effects can be adjusted for potential assortative mating among parents.
Twin studies remain a powerful tool in quantitative behavior genetic research. New modeling techniques and flexible SEM procedures can be extended to incorporate more complex effects in order to deal with some of the limitations of the CTD that arise from its underlying assumptions or from power-related issues (e.g., Visscher 2004). However, it is not possible to include all these modifications at the same time, and researchers are advised to select the model to be fitted on the basis of a set of a priori hypotheses.
- Falconer, D. S., & Mackay, T. F. C. (1996). Introduction to quantitative genetics (4th ed.). Harlow: Pearson.
- Franić, S., Dolan, C. V., & Boomsma, D. I. (2012). Structural equation modeling in genetics. In R. H. Hoyle (Ed.), Handbook of structural equation modeling (pp. 341–358). New York: The Guilford Press.
- Gillespie, N. A., & Martin, N. G. (2005). Direction of causation models. In B. S. Everitt & D. C. Howell (Eds.), Encyclopedia of statistics in behavioral science. Chichester: Wiley.
- Gillespie, N. A., Gehrman, P., Byrne, E. M., Kendler, K. S., Heath, A. C., & Martin, N. G. (2012). Modeling the direction of causation between cross-sectional measures of disrupted sleep, anxiety and depression in a sample of male and female Australian twins. Journal of Sleep Research, 21, 675–683. https://doi.org/10.1111/j.1365-2869.2012.01026.x.
- Keller, M. C., Medland, S. E., Duncan, L. E., Hatemi, P. K., Neale, M. C., Maes, H. M., & Eaves, L. J. (2009). Modeling extended twin family data I: Description of the cascade model. Twin Research and Human Genetics, 12, 8–18. https://doi.org/10.1375/twin.12.1.8.
- Muthén, L. K., & Muthén, B. O. (1998–2010). Mplus user’s guide (6th ed.). Los Angeles: Muthén & Muthén.
- Neale, M. C., & Maes, H. H. (2004). Methodology for genetic studies of twins and families. Dordrecht: Kluwer.
- Neale, M. C., Boker, S. M., Xie, G., & Maes, H. H. (2003). Mx: Statistical modeling. Richmond: Department of Psychiatry, Virginia Institute for Psychiatric and Behavior Genetics, Virginia Commonwealth University.
- Neale, M. C., Hunter, M. D., Pritikin, J. N., Zahery, M., Brick, T. R., Kirkpatrick, R. M., … Boker, S. M. (2016). OpenMx 2.0: Extended structural equation and statistical modeling. Psychometrika, 81, 535–549. https://doi.org/10.1007/s11336-014-9435-8.
- Plomin, R., DeFries, J. C., Knopik, V. S., & Neiderhiser, J. (2013). Behavioral genetics. New York: Palgrave Macmillan.
- Polderman, T. J. C., Benyamin, B., de Leeuw, C. A., Sullivan, P. F., van Bochoven, A., Visscher, P. M., & Posthuma, D. (2015). Meta-analysis of the heritability of human traits based on fifty years of twin studies. Nature Genetics, 47, 702–709. https://doi.org/10.1038/ng.3285.
- R Core Team. (2016). R: A language and environment for statistical computing [R Foundation for Statistical Computing]. Vienna. Retrieved from https://www.R-project.org/.
- Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods. Thousand Oaks: SAGE.
- Rettew, D. C., Vink, J. M., Willemsen, G., Doyle, A., Hudziak, J. J., & Boomsma, D. I. (2006). The genetic architecture of neuroticism in 3301 Dutch adolescent twins as a function of age and sex: A study from the Dutch twin register. Twin Research and Human Genetics, 9, 24–29. https://doi.org/10.1375/twin.9.1.24.
- Turkheimer, E., & Harden, K. P. (2014). Behavior genetic research methods. In H. Reis & C. Judd (Eds.), Handbook of research methods in social and personality psychology (pp. 159–187). Cambridge: Cambridge University Press.