Myosin identification and classification
The myosin protein family is known to be particularly diverse and to be characterised by multiple and independent gene losses that happened throughout eukaryotic evolution. Thus, it is not possible to choose representative species for obtaining a comprehensive picture of myosin evolution. Instead, we performed deep sequence and taxonomic sampling covering as many major lineages and as many related species as possible, resulting in a final dataset of 7852 myosins from 929 species (Fig. 1, Additional file 1: Text, Additional file 1: Figures S1-S3). To minimize the effects of missing or wrong sequence data on phylogenetic tree computations, we extensively corrected automatic gene predictions by hand. Every effort has been made not only to correctly predict and reconstruct myosin motor domains but also to improve tail domain sequences to get the best representation of myosin domain architectures. However, tail domains are usually less conserved than motor domains, and some myosins with unique tail architecture might still contain incorrect sequence (Additional file 1: Figures S3-S5).
Initial myosin classification was based on motor domain phylogeny. The myosin superfamily is particularly complex compared to other cytoskeletal protein families, necessitating stringent criteria for choosing appropriate nodes to distinguish classes from variants (Figs. 1 and 2, Additional file 1: Text). For example, setting a parent node of two classes in the phylogenetic tree as the new class-defining node will turn different myosin subfamilies (classes) into subfamily variants (same class). Variants result from ancient gene duplication, and there is usually a species with a single class member that diverged before the duplication event. Variants are also expected to have similar domain architectures, while classes usually acquired new domains. Thus, we compared the domain architectures and species phylogenies of the leaves in the tree and combined myosins into a class until choosing parent nodes would considerably break species relations and myosin architecture similarity. Classes should have high bootstrap support and be stable independent of changes to the dataset. We observed that a few single myosins and some small groups of myosins group differently in trees from different datasets. These “jumping myosins” are very divergent members of their classes and they mutually influence their phylogenetic grouping. The “correct” placement of one subgroup seems to lead to misplacement of other groups. To obtain stable phylogenetic groupings, “missing links” coming from higher taxonomic sampling of the respective sequences/groups are needed. Here, we classified these “jumping myosins” by analysing trees of full-length myosins and by comparing gene structures (Fig. 2, Additional file 1: Figures S6 and S7). Readers interested in the full details of the classification process are referred to Additional file 1: Text.
Intron position conservation as an additional and independent criterion for myosin classification
All sequenced eukaryotes contain at least a few spliceosomal introns. Many studies comparing gene structures across ancient eukaryotic lineages suggest an intron-rich eukaryotic ancestor and describe intron loss to have happened at a substantially higher rate than intron gain [30,31,32]. Therefore, we assumed that gene structure conservation could provide additional, protein sequence-independent information for myosin classification. Fortunately, the myosin motor domain is one of the largest known protein domains. This considerably increases the chance to observe intron position conservation even in genes from species with very low intron densities. Myosin gene structures were reconstructed with WebScipio. The regions coding for the motor domains were then compared with GenePainter, which resolves conserved intron positions at nucleotide resolution. To obtain class-specific intron position patterns, we wanted to prevent the patterns from being dominated by genes with low intron densities (e.g. the intron-poor fungal myosin genes) or by multiple (so far) unique intron positions (e.g. from branches with low taxonomic sampling). Given the high intron loss rates in all eukaryotic lineages, it is also clear that most intron positions are only present in a minority of the available sequences. Therefore, to become part of a class-specific intron position pattern, an intron position needs to be present in a minimum number of genes. This number depends on the number of sequences available for a myosin class. These intron positions were termed “conserved intron positions”. The gene structure comparison resulted in 349 (421 including orphans) conserved intron positions across all myosin classes, of which 156 (45%; 221, or 52%, including orphans) are shared between at least two myosin classes (Fig. 3 and Additional file 1: Figures S8-S10). Accordingly, 193 (200 including orphans) conserved intron positions are unique to a single class.
Still, some classes contain only single exon genes (e.g. class-37 and classes-74 to -79) or not enough genes with introns to determine intron patterns (Fig. 3a). The conservation of the intron positions across myosins of the same class is in agreement with the phylogenetic tree-, species- and domain architecture-based class assignment. The gene structure comparison also does not show any common introns, apart from the generally conserved introns, between the class-5 and class-11, and the class-2 and class-18 myosins (Fig. 3c and d). The intron position conservation also supports the class assignment of the “jumping myosins” (Additional file 1: Text).
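The thresholding and the shared/unique split described above can be illustrated with a minimal Python sketch. The threshold rule (`min_genes_required`, a fixed fraction of the class size with a floor of two genes), the encoding of intron positions as integers, and the toy classes are assumptions made purely for illustration; the actual rule and the GenePainter output format are not specified in the text.

```python
from collections import Counter
from math import ceil


def min_genes_required(n_genes, fraction=0.05, floor=2):
    """Hypothetical threshold: scales with class size, never below `floor`."""
    return max(floor, ceil(fraction * n_genes))


def conserved_positions(genes, fraction=0.05, floor=2):
    """Return intron positions present in at least the required number of genes.

    genes: list of sets, one set of intron positions per gene of a class.
    """
    counts = Counter(pos for gene in genes for pos in gene)
    needed = min_genes_required(len(genes), fraction, floor)
    return {pos for pos, n in counts.items() if n >= needed}


def shared_and_unique(classes, fraction=0.05, floor=2):
    """Split conserved positions into those shared by >= 2 classes and
    those found in exactly one class."""
    per_class = {
        name: conserved_positions(genes, fraction, floor)
        for name, genes in classes.items()
    }
    class_counts = Counter(p for positions in per_class.values() for p in positions)
    shared = {p for p, n in class_counts.items() if n >= 2}
    unique = {p for p, n in class_counts.items() if n == 1}
    return per_class, shared, unique


# Toy data (hypothetical): two classes with three genes each.
toy_classes = {
    "class-A": [{10, 50, 120}, {10, 50}, {10, 300}],
    "class-B": [{10, 75}, {10, 75, 200}, {75}],
}

per_class, shared, unique = shared_and_unique(toy_classes)
print(sorted(shared), sorted(unique))
```

In this toy example, positions seen in only one gene of a class are discarded before classes are compared, which mirrors the intent of keeping sparsely sampled, so far unique positions out of the class-specific patterns.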
Myosin diversity across eukaryotes
Myosins have been identified in almost all eukaryotes sequenced so far. Still, a few species do not contain myosin genes. We propose that the red algae Porphyridium purpureum and Chondrus crispus (rhodophytes), and the diplomonad Spironucleus salmonicida are species without myosins. Previously, we have described the absence of myosins in the unicellular red alga Cyanidioschyzon merolae (rhodophyte), the flagellated protozoan parasite Giardia lamblia (diplomonad), and the protozoan parasite Trichomonas vaginalis (parabasalid). For Giardia and Trichomonas the genomes of several strains are available. Thus, we can exclude that the lack of myosins is due to genome assembly gaps. The absence of myosins could be branch-specific (Diplomonadida, Parabasalia) or species-specific. To date, this cannot be conclusively resolved as only single species of these branches have been sequenced. At the very least, red algae are not myosin-free in general, as demonstrated by Galdieria sulphuraria and its myosin gene.
The myosins group into 79 classes, of which 70 currently have at least five members (Additional file 1: Figure S1C). Two hundred eighty-seven sequences (termed orphans) from 69 species still remain unclassified (Fig. 1), although some of these orphans are close homologs to established classes. Examples are the lophotrochozoan orphans that have the same domain architecture as the newly defined class-80 myosins, which contain most prominently an MH2 domain (MAD homology 2; also called DWB domain: domain B in dwarfin family proteins) at the C-terminus. However, in all trees of motor domain sequences these sequences are polyphyletic and form a sister group to or group together with the class-36 myosins. Improved taxonomic sampling will help to unambiguously resolve their class membership. Other examples are three choanoflagellate orphan myosins, which are closely related to the class-3 myosins and even share the N-terminal phospho-kinase domain. Because the node of the supposed last common ancestor is not strongly supported and because they do not form a sister group to the class-3 myosins in all trees, we suggest not yet designating these three choanoflagellate orphans as class-3 myosins. Other orphans are mainly specific to species unique for their branch (e.g. Cyanophora paradoxa, Guillardia theta, Aureococcus anophagefferens) or specific to branches consisting of only two related species (e.g. Ectocarpus siliculosus + Saccharina japonica, Emiliania huxleyi + Prymnesium parvum, Monosiga brevicollis + Salpingoeca rosetta). Assigning classes to the 287 orphan myosins by merging within-species gene duplicates and cross-species homologs would result in 160 additional, distinct myosin types (Fig. 1, Additional file 1: Figure S7). Accordingly, it seems likely that the current number of myosin classes will at least triple as soon as more species of the respective branches become sequenced.
We based the class numbering on our previous analysis. Three previously distinct, but taxonomically restricted classes (part of the “jumping myosins”, see Additional file 1: Text for more details) are now joined to more broadly distributed classes: the previously arthropod-specific class-21 myosins are now a subgroup of the class-3 myosins; the previously nematode-specific class-12 and the previously vertebrate-specific class-35 myosins are now subgroups of the class-15 myosins. The freed class numbers were not reused to avoid confusion with previous literature. In total, 47 new classes have been established (Fig. 2, Additional file 1: Figures S6 and S7, and Additional file 1: Text). There are two new classes within Metazoa. The mollusc myosins containing a chitin synthase and multiple transmembrane domains in the tail are now grouped together with rotifer and annelid homologs into class-36. The new class-80 myosins have not been described so far. Members of this class are currently restricted to Ambulacraria (echinoderms and hemichordates). Closely related myosins are available in annelids and molluscs (see notes about orphan myosins above), so the phylogenetic distribution of this class might be extended in the future.
In addition to establishing new classes, we both extended the taxonomic distribution of many classes to earlier branching lineages and refined the distribution within established branches. For example, we identified class-6, class-18 and class-28 myosins in the ichthyosporean Capsaspora owczarzaki, thus dating the origin of these classes back to the last common holozoan ancestor. Also, we confirmed the presence of class-22 myosins in several fungal lineages and found strong support for orthology with a group of amoebozoan myosins including the former Dictyostelium MyoI (now Myo22). Thus, we date the origin of class-22 back to the last common ancestor of the Amorphea. It is now without doubt that these amoebozoan myosins are class-22 myosins, and that they neither form an independent class, as previously proposed, nor belong to the class-7 myosins. While the class-19 myosins have previously not been identified in hexapods [27, 38], we were now able to identify and reconstruct class-19 myosins in Hymenoptera and Orthopteroidea. This indicates that class-19 myosins have been lost independently in most insects. The identification of a class-19 myosin in Apis mellifera (not detected in our previous analyses although an almost complete genome assembly was already available at that time) also shows that continuous re-analysis of species’ myosin repertoires will occasionally reveal additional myosins that are presently not detectable.
Two myosins in the last eukaryotic common ancestor
To reconstruct the evolution of the myosin family, we plotted myosin class gain events onto the most commonly agreed tree of the eukaryotes [3, 12,13,14,15] (Fig. 4). Accordingly, the almost ubiquitous distribution of the class-1 myosins strongly suggests that a class-1 prototype motor was present in the LECA, as proposed earlier. Did the other classes evolve independently from this prototype class-1 myosin across the major domains? Class-1 myosins contain a unique and almost invariant proline insertion at the base of the lever helix (Additional file 1: Figure S13), which makes it very unlikely that new classes evolved from class-1 myosins several times. In the latter case of multiple independent duplications, one would expect this proline to be retained (and possibly mutated) in at least one of the new classes or one of the orphans. This insertion is, however, not found in any other myosin. Given the length of the motor domain, independent loss of the proline insertion seems highly unlikely. Also, a new class that evolved late from class-1 myosins would most likely have an intron position pattern more closely related to the class-1 intron pattern than to any other intron pattern. Such a closely related intron pattern is, however, also not found (Fig. 3, Additional file 1: Figure S10). These considerations suggest that the LECA must have contained another prototype myosin, from which all other classes evolved (Fig. 4). Which myosin was first, the class-1 myosin prototype or the other myosin prototype? Because all intron positions conserved in more than 8 classes are also present in class-1 myosins (Additional file 1: Figure S8), it is most likely that the ur-myosin had a gene structure with a class-1 myosin intron pattern and that the other prototype myosin appeared by duplication of the class-1 myosin prototype.
In the alternative scenario, in which the class-1 prototype would have resulted from a duplication of the other myosin prototype, at least a few intron positions conserved in several myosin classes except class-1 would have been expected. Such introns are, however, not found. The class-1 specific proline insertion might have been gained in the class-1 myosin prototype after the other myosin prototype appeared, or the other prototype myosin lost the insertion before further gene duplication events happened.
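The intron-based part of this argument amounts to a simple set comparison, sketched below in Python. The function names, the integer encoding of intron positions, and the toy patterns are hypothetical and serve only to illustrate the logic; they do not reproduce the data in Additional file 1: Figure S8.

```python
from collections import Counter


def widely_conserved(per_class, min_classes):
    """Positions that are conserved in more than `min_classes` classes."""
    counts = Counter(p for positions in per_class.values() for p in positions)
    return {p for p, n in counts.items() if n > min_classes}


def missing_from(per_class, reference, min_classes):
    """Widely conserved positions that the reference class does NOT contain.

    An empty result is what one would expect if the reference class carries
    the ancestral intron pattern.
    """
    return widely_conserved(per_class, min_classes) - per_class[reference]


# Toy patterns (hypothetical): conserved intron positions per class.
toy_patterns = {
    "class-1": {10, 50, 90},
    "class-2": {10, 50},
    "class-5": {10, 50, 200},
    "class-11": {10, 200},
}

# With a cutoff of "more than 2 classes", positions 10 and 50 qualify,
# and both are present in the toy class-1 pattern.
print(sorted(missing_from(toy_patterns, "class-1", min_classes=2)))
```

Under the alternative scenario, running the same check with another class as reference would be expected to leave class-1 without some widely conserved positions, which is the signal the text reports as absent.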
Eukaryote-eukaryote horizontal gene transfer
The only other classes with members present in more than one major lineage are class-2 and class-4. Both classes are ubiquitous in one lineage (class-2 myosins in Amorphea and class-4 myosins in Rhizaria, Haptophyceae and Stramenopiles) and are otherwise narrowly distributed in isolated, late-diverging branches (Fig. 4). This distribution can be explained by two scenarios. The first suggests appearance of class-2 and class-4 in Amorphea and SAR/Haptophyceae, respectively, followed by horizontal gene transfer (HGT, Fig. 4). In the second scenario, both classes would have been present in the LECA and subsequently been lost independently in many lineages (Additional file 1: Figure S14). Naegleria species contain three class-2 myosins that all group with the amoebozoan class-2 myosins in the phylogenetic trees. In contrast, five of the six heterolobosean class-1 myosins group basal to all amorphean class-1 myosins. Therefore, the phylogenetic data suggest an origin of the class-2 myosins in the last common amorphean ancestor, and gain of the class-2 myosins in heteroloboseans by HGT from an early amoebozoan. The alternative scenario of a class-2 myosin in the LECA is not in agreement with the phylogenetic grouping of the heterolobosean homologs and would require at least three independent class-2 myosin loss events: two loss events in the ancestors of the kinetoplastids and parabasalids, and one in the ancestor of the Diaphoretickes (SAR + Archaeplastida) if the Diaphoretickes is considered a monophyletic taxon (Additional file 1: Figure S14). Even more loss events would have to be considered if other similarly likely early and independent branchings of the SAR and Archaeplastida were assumed [3, 12, 14]. Given the importance of class-2 myosins in cytokinesis in extant species, and given that non-muscle class-2 myosins are the only invariant myosins in all amorphean species, it seems likely that their ancestor had a similarly complex machinery.
This means that this machinery would have been lost multiple times independently, if class-2 myosins were present in the LECA. Alternatively, class-2 myosins might have had a different function during early eukaryotic evolution, and their role in cytokinesis might have been established later and independently in the ancestors of the Amorphea and the Heterolobosea.
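The loss count for the LECA scenario can be reproduced with a small parsimony-style sketch in Python. It assumes that the gene was present at the root and counts the minimal number of branches on which it must have been lost to explain the observed presence/absence pattern. The tree below is a deliberately simplified, hypothetical topology (three major groups, with the excavate lineages arranged for illustration) and is not the tree shown in Fig. 4.

```python
def min_losses(tree, present):
    """Minimal number of independent losses, assuming presence at the root.

    tree:    nested tuples of taxon names, e.g. ("A", ("B", "C")).
    present: set of taxon names in which the gene is observed.
    """
    def walk(node):
        if isinstance(node, str):                # leaf taxon
            return node in present, 0
        results = [walk(child) for child in node]
        if not any(has for has, _ in results):
            # Whole subtree lacks the gene: one loss on the branch above,
            # which is counted by the parent node.
            return False, 0
        losses = sum(n for _, n in results)
        losses += sum(1 for has, _ in results if not has)
        return True, losses

    has, losses = walk(tree)
    return losses if has else 1


# Simplified toy topology (an assumption for illustration only).
TOY_TREE = (
    "Amorphea",
    ("Parabasalia", ("Heterolobosea", "Kinetoplastida")),  # excavate lineages
    ("SAR", "Archaeplastida"),                             # Diaphoretickes
)

# Class-2 myosins observed in Amorphea and Heterolobosea.
print(min_losses(TOY_TREE, {"Amorphea", "Heterolobosea"}))
```

With this toy topology the sketch yields three losses (parabasalids, kinetoplastids, and the ancestor of the Diaphoretickes), in line with the count given in the text. Splitting the Diaphoretickes into independently branching lineages would increase the number.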
Class-4 myosins are present in Stramenopiles, Rhizaria and Haptophyceae and, in addition, in the unrelated Lobosa (Amoebozoa) and Apusozoa taxa (only a single apusozoan species, Thecamonas trahens, has been sequenced so far; Fig. 4). The restricted distribution of class-4 myosins outside the SAR/Haptophyceae branch, together with their phylogenetic grouping to rhizarian class-4 myosins and their domain architecture shared with rhizarian class-4 myosins, suggests that an early lobosan or apusozoan species obtained a class-4 prototype by HGT from an ancient rhizarian. Subsequently, the ancient Lobosa and Apusozoa shared the class-4 myosin via another HGT event. In an alternative scenario, a class-4 myosin would have been present in the LECA and subsequently been lost in multiple lineages (Additional file 1: Figure S14). This scenario, however, is not compatible with the phylogenetic grouping of the class-4 myosins, and would also include multiple tail domain changes in haptophycean and Stramenopiles class-4 myosins while the rhizarian, lobosean and apusozoan class-4 myosin domain architectures remained conserved. A phylogenomic analysis of the Acanthamoeba castellanii (Lobosa) and Naegleria gruberi genomes has identified hundreds of genes that arose through inter-kingdom HGT. Although a genome-wide analysis of inter-domain HGT between eukaryotes is still missing, it seems possible that these phagotrophic protozoans also acquired a significant number of eukaryotic genes from phylogenetically unrelated species. Assuming that the phagotrophic lifestyle had already been present in ancient loboseans, apusozoans and heteroloboseans, this could explain HGT of class-2 and class-4 myosins.
The phylogenetic distribution of class-2 and class-4 myosins suggests that the HGT events in ancient loboseans, apusozoans and heteroloboseans already occurred more than 1 billion years ago. The only obvious example in our dataset suggesting gain of myosins by relatively recent HGT is the marine diatom Nitzschia (Stramenopiles: Bacillariophyta). Nitzschia is supposed to have incorporated thousands of algal genes, which could explain the presence of an algae-like class-11 myosin gene (Additional file 1: Figure S15A). More surprising was the identification of not yet described HGT of heterolobosean-related genes, a class-1 myosin and three myosins related to orphan myosins from Naegleria, indicating integration of genetic information from multiple species from several domains into the Nitzschia genome (Additional file 1: Figure S15A).
Could there have been more myosins in the last eukaryotic common ancestor?
Other studies suggested three and six myosin subtypes in the LECA: two myosin-1-like subtypes, and four ancestral myosins representing class-2, class-4, class-5, and class-6 myosins. According to our data, one of the myosin-1 subtypes (containing the SH3 domain C-terminal to the TH1 domain) and the class-2 and class-4 myosins are restricted to isolated late-diverging lineages apart from their main occurrence, indicating a late origin by duplication (from the universal class-1 myosin) and origin by HGT in case of class-2 and class-4 myosins, as described above. The proposed deep branching of class-6 myosins, indicated by the grouping of haptophyte myosins to class-6 myosins, is not supported by our data. Reasons why those few sequences from a single haptophyte (E. huxleyi), which lack the characteristic class-6 myosin motor domain loops and tail domains, should be grouped to class-6 myosins restricted to Holozoa were not given. While previous analyses always showed independent branchings for class-5 and class-11 myosins, our data reveal considerable phylogenetic support for a common origin of class-5 and class-11 myosins in several of the trees (see for example the tree in Additional file 1: Figure S6). However, because an ancestral DIL domain-containing myosin is not yet supported by gene structure data and the phylogenetic grouping of class-5 and class-11 is inconsistent, we refrain from proposing a class-5 prototype in the LECA based on the current data. Proposing a common origin for the class-5 and class-11 myosins solely based on the shared tail domain architecture seems arbitrary because many other domains are also shared between classes from different major taxa. Similarly, although MyTH4-FERM domains are present in many classes, their origin could be related to only two or three independent domain fusion events at the origin of the Amorphea and SAR.
Assuming a MyTH4-FERM domain containing myosin in the LECA would accordingly entail multiple independent loss events in other major branches. The molecular phylogenetic data and gene structure comparisons currently do not support a myosin-rich LECA. Instead, our data suggest that myosin diversification started after the four major eukaryotic domains, Amorphea, Excavata, SAR and Haptophyta, and plants (Archaeplastida), had been established in early eukaryotic evolution.
The timing of myosin gain and loss shows a burst of myosin innovation in the Mesoproterozoic era
The presence of at least 79 myosin classes in extant species together with two classes in the last common eukaryotic ancestor raises several questions. New classes could have emerged continuously over time, or multiple classes could have appeared in “burst” events within the relatively short time between the formation of a lineage and its further split. The invention of new classes might have happened at similar rates in the various major eukaryotic lineages, and myosin class evolution might coincide with major events in Earth history. New classes might also be related to major eukaryotic innovations. To correlate the evolution of myosin diversity with time, we enhanced the tree of the eukaryotes with divergence time estimates from a comprehensive taxon-rich study using multiple fossil records for generating a time-resolved phylogenetic tree (Fig. 4). Almost identical results are obtained when using the TimeTree Of Life divergence time estimates or the median/mean time estimates of all studies available from the TimeTree webpage that have included the respective branching (Additional file 1: Figure S16A).
Independently of whether two (class-1, unknown myosin) or three (class-1, class-2, class-4) myosins are assumed for the LECA, early eukaryotic evolution in the Mesoproterozoic era (1600–1000 Ma) is characterized by multiple myosin inventions in all major lineages (Fig. 4). The most prominent bursts happened in the ancestor of the Holozoa (Ichthyosporea, Metazoa, Choanoflagellida), the last common ancestor of the Apicomplexa and Dinophyceae (dinoflagellates), at the origin of the Stramenopiles, and, although later in time, at the origin of the Kinetoplastids. Reconstructing these early myosin innovations requires sufficiently deep taxonomic sampling, which we are the first to provide. For example, the last common holozoan ancestor acquired seven new classes (classes-6, -7, -9, -10, -15, -18, -28) resulting in a set of 11 myosin subtypes (classes-1, -2, -5, -6, -7, -9, -10, -15, -18, -22, -28; Fig. 4). The subsequent evolution towards Bilateria was accompanied by only one or two new myosins at each split (Fig. 5). Notably, the origins of these five additional myosins are based on single/three species representatives of the respective taxa (Ctenophora, Porifera, Placozoa, Cnidaria). Due to the limited taxonomic sampling it seems likely that these myosin inventions will have to be assigned to earlier branchings, if not to the origin of the Holozoa, as soon as further sequenced genomes become available. In contrast to these sporadic myosin inventions in early metazoans, massive and independent myosin loss events characterize the further evolution of the metazoan myosins, and the dense taxonomic sampling now allows tracing of their evolution at high resolution (Fig. 5). The only myosins shared by all sequenced metazoans are the class-2 myosins. In contrast to HGT of class-2 and class-4 myosins between early eukaryotes, the scattered distribution of the metazoan-specific myosin classes can be explained best by multiple and independent loss events.
A burst of myosin innovation is also found in the ancestor of the Apicomplexa and Dinophyceae. It became apparent through the expanded taxonomic sampling of alveolates, which not only provides strong support for our previous class assignments but also demonstrates the early origin of the newly defined class-46, class-55, and class-57 myosins. Members of these classes were already identified in 2007 but not yet classified because of their limited taxonomic distribution. These ancient classes have most probably been missed in other studies because the first sequenced Apicomplexa incidentally lost many myosin genes (Additional file 1: Figure S15B). Class-26 myosins have recently been found in the dinophytes Vitrella brassicaformis and Chromera velia (data not shown but available at CyMoBase), demonstrating that improved species sampling in the future will lead to more ancient origins of many classes. Similarly, the kinetoplastids except Trypanosoma cruzi were long thought to contain only single class-1 and class-13 myosins. Therefore, the myosin repertoire expansion in T. cruzi has long been regarded as species-specific. However, by identifying orthologs in the early diverging kinetoplastid Bodo saltans and in further Trypanosoma relatives, we can now confidently date the myosin burst back to at least the kinetoplastid ancestor.
The Stramenopiles are probably as divergent as the Holozoa/Metazoa, but gene-rich and taxonomically broad phylogenetic studies are rare and only a few dozen genomes have been sequenced so far. Accordingly, divergence time estimates for major phyla such as the Oomycetes and Ochrophyta differ by as much as 400 million years [3, 45]. Fossil records are rare, and given the Stramenopiles’ underrepresentation in taxonomically broader studies we suppose that even the oldest reported divergence times considerably underestimate their early evolution. Currently, six myosin classes are common to most Stramenopiles (classes-1, -4, -30, -31, -75, -81). More classes are specific to Labyrinthulomycetes, Oomycota, and Ochrophyta, and dozens of orphan myosins have not yet been classified (Fig. 4 and Additional file 1: Figure S15A). The further evolution within these major branches resembles the metazoan myosin evolution with massive and independent myosin loss events. Therefore, we propose that analysis of further genomes will lead to an increased set of myosin classes in the last common ancestor of the Stramenopiles and more loss events in major subbranches, similar to the situation in the Holozoa, Alveolata, and Kinetoplastida.
Our data showing a Mesoproterozoic origin of myosin invention and a late Proterozoic start of myosin loss events (Fig. 6) are in contrast to the recent hypothesis based on a LECA containing six myosins. Although myosin classes are not entirely comparable, that study inferred 25 myosin gain events but also 64 loss events up to a similar branching depth as shown in Fig. 4, assuming a root of the eukaryotes at the unikont/bikont split. Placing the root at similarly likely other branches [3, 12, 14] would not result in more gain events, but in even more loss events. The lower number of gain events compared to our findings is a result of the considerably lower taxonomic sampling compared to our study. However, most of the loss events resulted from proposing a myosin-rich LECA, which we suppose to be less likely than a LECA with only two myosins.