Expanding our understanding of marine viral diversity through metagenomic analyses of biofilms


Recent metagenomic surveys have provided insights into the marine virosphere. However, these surveys have focused solely on viruses in seawater, neglecting those associated with biofilms. By analyzing 1.75 terabases of metagenomic data from biofilms collected at eight locations around the world, we identified 3974 viral sequences. Over 90% of these viral sequences were not found in previously reported datasets. Comparisons between biofilm and seawater metagenomes identified viruses that are endemic to the biofilm niche. Analysis of viral sequences integrated within biofilm-derived microbial genomes revealed potential functional genes for trimeric autotransporter adhesin and polysaccharide metabolism, which may contribute to biofilm formation by the bacterial hosts. However, more than 70% of the genes could not be annotated. These findings show that marine biofilms are a reservoir of novel viruses and enhance our understanding of natural virus-bacteria ecosystems.


Viruses make a significant contribution to nutrient and energy conversion in marine ecosystems by modulating the structure and function of protistan, bacterial, and archaeal communities (Breitbart 2012; Kristensen et al. 2011; Suttle 2005; Zhang et al. 2014). The viral shunt releases around 10 billion tons of carbon per day and is probably a fundamental step in marine carbon cycling (Breitbart 2012). Viruses that infect bacteria, known as phages, express auxiliary metabolic genes (AMGs) that influence central metabolic processes of their hosts, such as photosynthesis and nutrient acquisition (Thompson et al. 2011; Xu et al. 2018).

However, because marine viral populations are large and highly dynamic, most marine viruses remain unexplored. Considerable effort has been devoted to isolating viruses that infect major marine bacterial lineages, such as Prochlorococcus (Sullivan et al. 2003), SAR11 (Zhao et al. 2013), and Roseobacter (Zhang et al. 2019a). Recent advances in culture-independent approaches (e.g., metagenomics) have enabled an unprecedented expansion in the analysis of the diversity of marine microbes (bacteria and archaea) and viruses (reviewed in Coutinho et al. 2018). The global ocean dsDNA viromic dataset was established by the Tara Oceans project to explore ocean virus diversity, to better understand the ecological and evolutionary drivers of viral communities, and to reveal new mechanisms by which these viruses affect global oceanic microbial processes (Brum et al. 2015). In addition to Tara Oceans, several other projects have revealed viruses to be the most abundant biological entities in marine ecosystems, e.g., in coral reefs (Thurber et al. 2017) and in marine sediments (Danovaro et al. 2008; Engelhardt et al. 2014).

Most oceanic surveys have focused on viruses infecting free-living bacterioplankton, whereas those associated with biofilms have been neglected. Biofilm formation confers several ecological advantages on bacteria and archaea, such as environmental protection, increased access to nutrients, and enhanced interspecies interactions (Dang and Lovell 2016). Biofilm architecture also confers both individual and collective protection against viral infection (Vidakovic et al. 2018). Biofilms formed on artificial surfaces have been used as models to study biofilm development, microbe-invertebrate interactions, and novel microbial diversity and functions in marine environments (Chung et al. 2010; Salta et al. 2013; Zhang et al. 2015). In a recent study, Zhang et al. (2019b) examined 101 biofilm samples formed on man-made panels and natural rocks immersed at eight locations across the Atlantic, Indian, and Pacific Oceans, and investigated the microbial (bacterial and archaeal) diversity and functional potential of these microbiomes. In the present study, we analyzed the viral sequences extracted from the same 101 biofilm metagenomes to obtain a systematic understanding of viral diversity and function.


Metagenomic identification of viruses in marine biofilms

The sampling locations of the 101 biofilm samples are shown in Fig. 1. The biofilms developed on eight types of artificial substrates (polystyrene Petri dishes, zinc panels, aluminum, poly(ether-ether-ketone), polytetrafluoroethylene, poly(vinyl chloride), stainless steel, and titanium). The substrates were deployed at a depth of 1–2 m at eight locations around the world (the South Atlantic, the Red Sea, the waters off Hong Kong, Yung Shue O Bay, the East China Sea, and three sites in the South China Sea). Metagenome assembly generated 72,132,494 contigs in total, from which 3974 viral sequences (longer than 5.0 kbp) were predicted. The viral sequences had a maximum length of 351.55 kbp, an average length of 14.57 kbp, and an average guanine-cytosine (GC) content of 47.75%. The novelty of the viruses was evaluated by comparing the viral sequences with the Integrated Microbial Genomes/Virus (IMG/VR) database (Paez-Espino et al. 2016), which contains sequences from almost 8500 isolated viruses and over 700,000 viral contigs from metagenomes. Following the criteria used in a previous study (Paez-Espino et al. 2017), biofilm-derived viral sequences sharing 90% or higher similarity with an IMG/VR sequence over an alignment longer than 1 kbp were considered known viruses; only 358 (9.01%) of the 3974 biofilm-derived viral sequences met this criterion (Fig. 2).
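The known-versus-novel rule above can be expressed as a simple filter over BLASTn tabular output. The sketch below is illustrative only. It assumes BLAST+ `-outfmt 6` lines (column 1 = query ID, column 3 = percent identity, column 4 = alignment length), treats "similarity" as percent identity, and calls a query known if any single hit reaches at least 90% identity over an alignment longer than 1000 bp. The function name `classify_known` is hypothetical and is not the authors' code.

```python
from typing import Iterable, Set, Tuple


def classify_known(blast_lines: Iterable[str], all_queries: Iterable[str],
                   min_identity: float = 90.0,
                   min_aln_len: int = 1000) -> Tuple[Set[str], Set[str]]:
    """Split query IDs into (known, novel) sets from BLAST+ -outfmt 6 lines.

    Assumed columns: 0 = qseqid, 2 = pident, 3 = alignment length.
    A query is 'known' if any hit has pident >= min_identity over an
    alignment longer than min_aln_len; every other query is 'novel'.
    """
    known: Set[str] = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        qseqid, pident, aln_len = fields[0], float(fields[2]), int(fields[3])
        if pident >= min_identity and aln_len > min_aln_len:
            known.add(qseqid)
    # Queries with only weak hits, or no hits at all, count as novel.
    novel = set(all_queries) - known
    return known, novel
```

With the counts reported above, the known fraction is 358 / 3974, or about 9.01%.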

Fig. 1

Sampling locations of the 101 biofilms. The eight locations include (1) South Atlantic, (2) Red Sea, (3) Hong Kong Water, (4) Yung Shue O Bay, (5) East China Sea, (6) South China Sea 1, (7) South China Sea 2, and (8) South China Sea 3. Tara surface seawater samples used for comparison are also shown

Fig. 2

Similarity between viral sequences identified from marine biofilms and those documented in the IMG/VR database. The biofilm-derived viral sequences were searched against the IMG/VR database using BLASTn. Hits with over 90% similarity over alignments longer than 1000 bp were considered known viruses (red dots), whereas the other hits (blue dots) and sequences with no significant similarity were considered novel viruses

In total, 74,895 open reading frames (ORFs) were predicted from the biofilm-derived viral sequences. To confirm the VirSorter predictions, the ORFs were searched against the virus orthologous groups (VOGs) database using HMMER. Every viral sequence contained at least one ORF with a hit in the VOG database. In total, 31,038 VOG hits (41.44% of all ORFs) were obtained, of which 2764 were non-redundant. The 30 most abundant VOGs were dominated by genes encoding viral structural and DNA-packaging proteins, such as the terminase large subunit (VOG09355), baseplate protein J (VOG00195), terminase large subunit gp2 (VOG00080), and probable capsid protein gp17 (VOG02249) (Supplementary Fig. S1). Other abundant VOGs included genes involved in DNA replication and transcription, such as DNA polymerase (VOG00073) (Supplementary Fig. S1).
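One way to reproduce hit counts like those above is to keep, for each ORF, its best-scoring VOG profile below an E-value cutoff, and then count ORFs with hits and distinct VOGs. The sketch assumes HMMER `hmmscan --tblout` output (whitespace-separated; field 1 = profile name, field 3 = query ORF, field 5 = full-sequence E-value, with a free-text description as the last field) and the 1e-7 cutoff given in the Methods. The one-best-hit-per-ORF rule and the function names are assumptions, not the published pipeline.

```python
from typing import Dict, Iterable, Tuple


def best_vog_hits(tblout_lines: Iterable[str],
                  evalue_cutoff: float = 1e-7) -> Dict[str, str]:
    """Map each ORF to its best VOG profile from hmmscan --tblout lines.

    Assumed fields: 0 = target (VOG profile), 2 = query (ORF),
    4 = full-sequence E-value; the description (field 18) may contain spaces.
    """
    best: Dict[str, Tuple[float, str]] = {}
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(maxsplit=18)
        vog, orf, evalue = fields[0], fields[2], float(fields[4])
        if evalue > evalue_cutoff:
            continue  # hit not significant at the chosen cutoff
        if orf not in best or evalue < best[orf][0]:
            best[orf] = (evalue, vog)  # keep the lowest E-value per ORF
    return {orf: vog for orf, (_, vog) in best.items()}


def summarize_hits(best: Dict[str, str], total_orfs: int) -> Tuple[int, float, int]:
    """Return (ORFs with hits, percent of all ORFs, number of distinct VOGs)."""
    n_hits = len(best)
    return n_hits, 100.0 * n_hits / total_orfs, len(set(best.values()))
```

If each of the 31,038 hits corresponds to one ORF, then 31,038 / 74,895 gives the 41.44% reported above.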

Taxonomic classification indicated that 81.60% of the VOGs were associated with Caudovirales and 0.32% with Maveriviricetes, while the remaining VOGs were assigned to unclassified viruses (Supplementary Fig. S2). Phylogenetic analysis of the terminase large subunit (VOG09355), using sequences identified from the biofilm viruses together with reference sequences from the VOG database, revealed three relatively independent branches formed by biofilm viruses, most likely representing novel viral lineages (Fig. 3).

Fig. 3

Phylogenetic tree of the terminase large subunit gene (VOG09355) identified from the biofilm phage sequences. Closely related terminase large subunit sequences in the VOG database were identified with hmmscan and used as references. Protein sequences that could be aligned by ClustalW were used to construct a maximum likelihood tree with 1000 bootstrap replicates. Bootstrap values (> 50) are shown on the branches. All gene sequences from biofilms are shown in blue, and branches that represent potentially novel viral lineages are shown in red
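The 1000 replicates in the tree above refer to nonparametric bootstrapping: alignment columns are resampled with replacement, a tree is rebuilt from each pseudo-replicate, and the support on a branch is the percentage of replicate trees that recover the same clade. The sketch below shows only the resampling and the support count. Tree inference itself (done in MEGA 6 in this study) is out of scope, so the replicate clades are passed in as hypothetical precomputed sets.

```python
import random
from typing import FrozenSet, List, Sequence, Set


def bootstrap_replicate(alignment: Sequence[str], rng: random.Random) -> List[str]:
    """Build one pseudo-replicate by sampling alignment columns with replacement.

    The same sampled column indices are applied to every sequence, so the
    replicate keeps the alignment's rows and its original length.
    """
    n_cols = len(alignment[0])
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


def clade_support(clade: FrozenSet[str],
                  replicate_clades: List[Set[FrozenSet[str]]]) -> float:
    """Percentage of replicate trees (each given as a set of clades) containing `clade`."""
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)
```

In practice, each replicate is passed to the same tree-building method used for the original alignment, and only supports above 50% are displayed in this figure.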

Endemism of the viruses to marine biofilms

To explore the niche specificity of the viruses detected in the biofilms, the abundance of the biofilm-derived viruses was estimated by mapping the metagenomic reads of the 101 biofilm and 91 seawater samples (10 million reads per sample) to the viral sequences. In total, 250 viral sequences had coverage > 1 in at least one biofilm and coverage = 0 in all seawater samples (Fig. 4), suggesting the existence of viruses that are endemic to the biofilm niche. To confirm this result, five phages that were abundant in the Red Sea biofilms were selected, and their distribution in nine Red Sea biofilm samples and nine adjacent seawater samples was examined. The number of reads mapped to these phages exceeded 100 in almost all of the biofilm samples but was close to zero in the seawater metagenomes (Supplementary Fig. S3).
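The endemism criterion above (coverage > 1 in at least one biofilm, coverage = 0 in every seawater sample) amounts to a row filter on a virus-by-sample coverage table. In this sketch the table is a hypothetical nested dictionary (virus, then sample, then mean coverage, e.g. derived from BBMap output), and a sample missing for a virus is treated as zero coverage, which is an assumption.

```python
from typing import Dict, List


def biofilm_endemic(coverage: Dict[str, Dict[str, float]],
                    biofilm_samples: List[str],
                    seawater_samples: List[str],
                    min_biofilm_cov: float = 1.0) -> List[str]:
    """Return viruses with coverage > min_biofilm_cov in at least one biofilm
    and exactly zero coverage in every seawater sample."""
    endemic = []
    for virus, cov in coverage.items():
        # Missing samples are assumed to have zero coverage.
        in_biofilm = any(cov.get(s, 0.0) > min_biofilm_cov for s in biofilm_samples)
        absent_at_sea = all(cov.get(s, 0.0) == 0.0 for s in seawater_samples)
        if in_biofilm and absent_at_sea:
            endemic.append(virus)
    return sorted(endemic)
```

Note that the strict zero threshold for seawater makes the result sensitive to sequencing depth, which is one reason the Methods normalize every metagenome to the same number of reads before mapping.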

Fig. 4

The endemism of biofilm-derived viruses. Metagenomic reads of 101 biofilm and 91 seawater samples (101-bp reads; ten million reads per sample) were mapped to the biofilm-derived viral contigs to compare their abundance in biofilms and seawater. Viral sequences with coverage > 1 in at least one biofilm and coverage = 0 in all seawater samples are presented

Viruses in single genomes and their functions

To investigate the hosts of the viruses and potential virus-host interactions, 479 microbial genome bins extracted from the biofilm metagenomes were analyzed. These genome bins belonged to 20 different microbial phyla, including Proteobacteria (272 genomes), six ‘Candidatus’ phyla (7 genomes), Acidobacteria (6 genomes), Actinobacteria (10 genomes), Bacteroidetes (100 genomes), Cyanobacteria (34 genomes), Deinococcus-Thermus (1 genome), Firmicutes (2 genomes), Lentisphaerae (2 genomes), Parcubacteria (1 genome), Planctomycetes (22 genomes), Rhodothermaeota (1 genome), Verrucomicrobia (19 genomes), and Euryarchaeota (2 genomes) (Supplementary Fig. S4). Viral sequences in the genome bins were identified using the software PHASEER (McCoy et al. 2007), which was designed for mining phage sequences from draft genomes. In total, 149 phage sequences were found in 101 bacterial genome bins affiliated with Alphaproteobacteria, Gammaproteobacteria, Acidobacteria, Actinobacteria, Bacteroidetes, Candidatus Gracilibacteria, Cyanobacteria, Firmicutes, Lentisphaerae, Oligoflexia, Planctomycetes, and Rhodothermaeota (Fig. 5a). Among these taxa, Gammaproteobacteria (n = 43) and Alphaproteobacteria (n = 30) had the largest numbers of phage-containing genome bins (Fig. 5a). The GC content of the phage contigs was very similar to that of their host bacterial genomes (Supplementary Fig. S5).
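The phage-host GC comparison reduces to computing GC content for each set of sequences. This sketch computes pooled (length-weighted) GC over unambiguous A/C/G/T bases and reports the absolute difference in percentage points. Pooling and ignoring ambiguous bases are assumptions; the text does not state how the comparison in Supplementary Fig. S5 was computed.

```python
from typing import List


def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T); N and other IUPAC codes are ignored."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no unambiguous bases")
    return 100.0 * (s.count("G") + s.count("C")) / acgt


def pooled_gc_difference(phage_seqs: List[str], host_seqs: List[str]) -> float:
    """Absolute difference (percentage points) between pooled phage GC and pooled host GC."""
    return abs(gc_content("".join(phage_seqs)) - gc_content("".join(host_seqs)))
```

Pooling weights each contig by its length; averaging per-contig values instead would give short contigs more influence, so the choice should be stated when reporting such a comparison.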

Fig. 5

Identification of viral sequences from microbial genome bins and viral gene functional annotation. A Viral sequences were identified from 100 microbial genomes distributed across 11 bacterial phyla (Proteobacteria were divided into Alpha- and Gamma-proteobacteria). B The number of viral genes annotated by BLASTp searching against the COG database for functional classification

To detect potential functions encoded by these phages, all genes derived from the phage sequences (4121 predicted ORFs) were classified using the COG database (Galperin et al. 2015; Tatusov et al. 2000), yielding assignments to 22 COG categories (Fig. 5b). In total, 1023 ORFs (24.82%) had hits in the COG database; however, 521 of these were classified only as “general function prediction” [R] or “function unknown” [S]. Of the remaining 502 ORFs, 40 were classified as involved in amino acid transport and metabolism [E], nucleotide transport and metabolism [F], or carbohydrate transport and metabolism [G], such as genes encoding a Na+/glutamate symporter [COG0786], deoxynucleoside kinases [COG1428], and chitinase [COG3325] (Fig. 5b).
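The COG bookkeeping above can be sketched as a tally of one-letter functional categories per annotated ORF. The sketch assumes each ORF with a hit carries a single category letter (real COG entries can carry several), treats R and S as the uninformative categories, and groups E, F, and G as the metabolic categories mentioned in the text. `summarize_cog` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict

UNINFORMATIVE = ("R", "S")  # general function prediction only / function unknown
METABOLIC = ("E", "F", "G")  # amino acid, nucleotide, carbohydrate transport and metabolism


def summarize_cog(orf_to_category: Dict[str, str], total_orfs: int) -> Dict[str, float]:
    """Summarize COG hits; orf_to_category maps each ORF with a hit to one category letter."""
    n_hits = len(orf_to_category)
    counts = Counter(orf_to_category.values())  # missing categories count as 0
    uninformative = sum(counts[c] for c in UNINFORMATIVE)
    return {
        "hits": n_hits,
        "percent_hits": round(100.0 * n_hits / total_orfs, 2),
        "uninformative": uninformative,
        "informative": n_hits - uninformative,
        "EFG": sum(counts[c] for c in METABOLIC),
    }
```

With the reported totals, 1023 / 4121 gives 24.82%, and 1023 − 521 = 502 ORFs fall into informative categories.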

The functions of all genes derived from these phage contigs (4121 predicted ORFs) were further analyzed by searching them against the KEGG (Kanehisa et al. 2017) and CAZy (Lombard et al. 2014) databases. In total, 1062 ORFs had hits in the KEGG database, and the 18 most abundant KEGG orthologs are shown in Fig. 6. Interestingly, the most abundant KEGG hit was trimeric autotransporter adhesin (K21449). Genes for viral structure (e.g., K06909), transcriptional regulation (e.g., ParB family transcriptional regulator, chromosome partitioning protein K03497), and DNA replication (e.g., putative DNA primase/helicase K06919) were also annotated. KEGG annotation also revealed uncharacterized but relatively conserved genes (n = 96; 9.0% of all KEGG hits), such as K06903, K06907, and K06904. In parallel, 351 ORFs were annotated by CAZy; these mostly comprised lysozymes, chitinases, lyases, and peptidoglycan lytic transglycosylases (Supplementary Fig. S6). In total, 1133 ORFs were annotated by the CAZy and/or KEGG databases, whereas the remaining 2988 ORFs (72.51%) had no hits.
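Because an ORF can be annotated by both KEGG and CAZy, the combined annotation count is the size of the union of the two hit sets, not their sum. A minimal sketch with Python set operations; the ORF identifiers and the helper name `annotation_coverage` are hypothetical.

```python
from typing import Iterable, Tuple


def annotation_coverage(all_orfs: Iterable[str], kegg_hits: Iterable[str],
                        cazy_hits: Iterable[str]) -> Tuple[int, int, float]:
    """Return (annotated ORFs, unannotated ORFs, percent unannotated)."""
    all_set = set(all_orfs)
    # Union so that ORFs hit by both databases are counted once.
    annotated = (set(kegg_hits) | set(cazy_hits)) & all_set
    unannotated = all_set - annotated
    return len(annotated), len(unannotated), round(100.0 * len(unannotated) / len(all_set), 2)
```

For the reported totals, (4121 − 1133) / 4121 gives the 72.51% of ORFs left without a hit.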

Fig. 6

Potential auxiliary metabolic genes among the phage genes extracted from the bacterial bins. Gene functions were predicted by BLASTp searches against the KEGG database. The 18 most abundant KEGG orthologs (> 10 genes each) are shown


The finding that biofilms harbor many previously unknown viruses is consistent with the notion that biofilm formation promotes virus accumulation and that biofilms may act as reservoirs of infectious pathogens (Bettarel et al. 2006). When the biofilm-derived viral sequences were compared with the VOG database, the most abundant genes were related to virion structure and DNA replication. In particular, the baseplate is a component of tailed prokaryotic viruses (Caudovirales), so the abundance of baseplate genes suggests that tailed viruses are prevalent in marine biofilms. The terminase large subunit is the motor of the viral DNA-packaging machinery: it cleaves concatemeric viral DNA into individual genomes and translocates them into a procapsid, powered by ATP hydrolysis (Rao and Feiss 2008). Capsid proteins, encoded by relatively short genes, protect the viral nucleic acids, and their tertiary structure contains all the information required for virus assembly (Hagan and Zandi 2016). The annotation of these VOGs is consistent with the conserved structure and function of biofilm-derived viruses; however, phylogenetic analysis of these proteins also indicates the existence of novel viral lineages in marine biofilms.

Few studies have reported on virus endemism in environmental niches. In this study, 250 viral sequences were detected in biofilms collected from different oceans but were absent from all seawater samples, indicating substantial virus endemism in biofilms. Although surface-associated microbes and viruses must ultimately be seeded from seawater, many viruses may be so scarce in seawater that they are unlikely to be sampled. Extracellular DNA released through phage-mediated cell lysis has been shown to enhance biofilm formation (Gödeke et al. 2011). Certain viruses are capable of forming biofilm-like assemblies for propagation (Thoulouze and Alcover 2011). In addition, phages can select for mucoid bacterial phenotypes, driving co-evolution and inducing biofilm formation (Scanlan and Buckling 2012). One of the mechanisms coordinating the relationship between viruses and biofilms involves quorum-sensing signals, which upregulate the expression of CRISPR-related genes (Høyland-Kroghsbo et al. 2017; Patterson et al. 2016) and decrease the level of phage receptors (Høyland-Kroghsbo et al. 2013; Tan et al. 2015). Another possible reason why so many novel viruses were discovered in biofilms is that biofilms effectively filter seawater, concentrating low-abundance viruses that are missed during seawater sampling.

According to previous metagenomic analyses of marine viruses (Coutinho et al. 2017; Mizuno et al. 2013), Cyanobacteria, Actinobacteria, Alphaproteobacteria, Gammaproteobacteria, and Verrucomicrobia are the most prevalent phage hosts. Consistent with these reports, Alpha- and Gammaproteobacteria were the major phage hosts in the biofilms. Genomic GC content can confer survival advantages during adaptation to environmental conditions (Almpanis et al. 2018; Mann and Chen 2010). The similar GC content of the phages and their hosts observed here suggests that the viruses have adapted to their hosts and that environmental factors may have shaped the intimate relationships between the phages and bacteria in the biofilms (Motlagh et al. 2017). Viral sequences identified within microbial genomes probably represent prophages; however, owing to technical limitations, it is difficult to recover all genomes from metagenomes and to distinguish all prophages from free viruses.

With regard to phage function, more than 70% of the ORFs could not be annotated using the COG, KEGG, or CAZy databases, reflecting our limited understanding of the functions of biofilm-derived viruses and the need for further experimental research. COG annotation suggested that phages inhabiting biofilms may encode enzymes involved in central carbon metabolism. No phage genes for photosynthesis were detected, suggesting that the phages contribute little to carbon fixation in the biofilm communities; this contrasts with previous findings that photosynthetic genes are prevalent in phages infecting subtidal microbial communities (McMinn et al. 2020; Sullivan et al. 2005; Thompson et al. 2011). Notably, 89 genes were found to encode trimeric autotransporter adhesin (K21449), which promotes biofilm formation in bacteria (Fey et al. 2002; Luqman et al. 2018; Raghunathan et al. 2011). Mutation of a predicted trimeric autotransporter adhesin gene abolished the ability of Burkholderia pseudomallei to form biofilms on plastic surfaces (Lazar Adler et al. 2013), and over-expression of the trimeric autotransporter SadA in Salmonella enterica increased cell aggregation and adhesion to human intestinal Caco-2 epithelial cells (Raghunathan et al. 2011). Similarly, a recent study showed that SadA-expressing staphylococci from the human gut displayed increased cell adherence and internalization (Luqman et al. 2018). The high abundance of K21449 suggests that phages may facilitate biofilm formation by their bacterial hosts, which may help explain the distinctiveness of the viral community in marine biofilms. Virus-encoded transcriptional regulators can also mediate interactions between Epstein-Barr virus and its human host (Arvey et al. 2012; Marshall and Sample 1995); however, the function of transcriptional regulators in marine viruses is unclear. Furthermore, the polysaccharide metabolism genes (e.g., chitinases) annotated by CAZy may be used by phages to lyse their hosts and may be involved in carbon recycling within the biofilm communities.


Here we found that over 90% of the biofilm-derived viruses had no close match in the IMG/VR database, and we provided evidence for the existence of viruses endemic to biofilms, suggesting that biofilm sampling enables the discovery and reconstruction of viral genomes from marine environments. We identified potential auxiliary metabolic genes for trimeric autotransporter adhesin and polysaccharide metabolism in viral sequences integrated into biofilm-derived microbial genomes, suggesting that phages may contribute to biofilm formation by their bacterial hosts; nevertheless, the functions of more than 70% of the phage genes remain unknown. Taken together, the present study has unveiled a hidden marine virosphere with novel viral diversity and unexplored functions.

Materials and methods


The biofilms developed on eight types of artificial substrates: polystyrene Petri dishes (9 × 1.2 cm), zinc panels (11 × 11 cm), aluminum, poly(ether-ether-ketone), polytetrafluoroethylene, poly(vinyl chloride), stainless steel, and titanium (5 × 5 cm). The artificial substrates were deployed at a depth of 1–2 m at eight locations around the world: the South Atlantic, the Red Sea, the waters off Hong Kong, Yung Shue O Bay, the East China Sea, and three sites in the South China Sea. The Petri dishes were immersed in seawater for 12 days to allow biofilm formation; the other artificial substrates were immersed for 30 days to allow visible bacterial attachment. Biofilms that had formed on natural rocks were also collected. After collection, the biofilms were immediately transferred to the laboratory, and the surface bacterial cells were removed using sterile cotton swabs and stored in 5 ml of DNA storage buffer (500 mmol/L NaCl, 50 mmol/L Tris–HCl, 40 mmol/L EDTA, and 50 mmol/L glucose) at −80 °C. During biofilm development, adjacent seawater samples were collected and successively filtered through 0.1-μm polycarbonate membrane filters (Millipore, Massachusetts, USA). The filters were stored in 5 ml of DNA storage buffer at −80 °C. In total, 101 biofilm and 24 seawater samples were collected. In addition, 67 Tara Oceans surface seawater samples (Sunagawa et al. 2015) were used for comparisons between biofilms and seawater (Supplementary Table S1).

DNA extraction and sequencing

Biofilm samples (on cotton swabs) and seawater samples (on filters) were re-suspended in Tris–HCl buffer, pelleted by centrifugation at 4000 × g for 10 min, and lysed with lysozyme (37 °C for 30 min) and the lysis buffer provided with the TIANamp Genomic DNA Kit (Tiangen Biotech, Beijing, China). DNA was then extracted using the TIANamp Genomic DNA Kit following the manufacturer’s protocol. The Red Sea samples were sequenced at the Beijing Genomics Institute (BGI, Beijing, China), and the other samples were sequenced at the Novogene Bioinformatics Institute (Novogene, Beijing, China). After construction of 350-bp insert libraries, the DNA was sequenced on the HiSeq X Ten System at Novogene and the HiSeq 2500 System at BGI. Quality control was performed on a local server using NGS QC Toolkit (version 2.0) (Patel and Jain 2012) to remove low-quality reads (reads in which > 30% of bases had a quality score < 20) and unpaired high-quality reads. Information on the metagenomic reads is given in Supplementary Table S1.
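The read-level criterion above (discard a read if more than 30% of its bases have Phred quality < 20) can be sketched as follows. This is not NGS QC Toolkit's code: it assumes Phred+33 encoded quality strings, and the rule that a pair is kept only when both mates pass (so that no unpaired reads remain) is an assumption about how unpaired reads were handled.

```python
from typing import List, Tuple

Read = Tuple[str, str]  # (sequence, Phred+33 quality string)


def is_high_quality(qual: str, min_q: int = 20,
                    max_low_frac: float = 0.30, offset: int = 33) -> bool:
    """True unless more than max_low_frac of bases have Phred quality < min_q."""
    if not qual:
        return False
    low = sum(1 for ch in qual if ord(ch) - offset < min_q)
    return low / len(qual) <= max_low_frac


def filter_pairs(pairs: List[Tuple[Read, Read]]) -> List[Tuple[Read, Read]]:
    """Keep a pair only if both mates pass; a lone passing mate is discarded too."""
    return [(r1, r2) for r1, r2 in pairs
            if is_high_quality(r1[1]) and is_high_quality(r2[1])]
```

In Phred+33 encoding, `'5'` is Q20 (which passes) and `'4'` is Q19 (which counts as low quality), so the threshold is inclusive at Q20.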

Metagenomic assembly and microbial genome binning

After quality control, reads from the biofilm metagenomes were assembled into contigs using MEGAHIT (version 1.0.2) (Li et al. 2015) with k-mer sizes from 21 to 121 in steps of 10. Coverage information was generated by mapping metagenomic reads to the contigs using Bowtie2 (FASTQ input, --sensitive-local mode). The contigs and coverage information were used as input for MaxBin (version 2.0) (Wu et al. 2016) to assign contigs to genome bins. The bins were further refined using MetaBAT. The completeness and contamination of the genome bins were assessed using CheckM (Parks et al. 2015). Duplicate genomes were removed based on average nucleotide identity (ANI) values from the ANI calculator (Yoon et al. 2017); genome pairs with ANI values exceeding 0.99 were treated as redundant. Information on the assembled metagenomic contigs is given in Supplementary Table S2, and information on the genome bins is provided in Supplementary Table S3.
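The text specifies only that genome pairs with ANI > 0.99 are redundant, not which member of a redundant pair was kept. The sketch below assumes a greedy rule: genomes are visited in order of decreasing CheckM completeness (ties broken by lower contamination), and a genome is kept unless it exceeds the ANI threshold with a genome already kept. The function name and this ranking rule are illustrative assumptions.

```python
from typing import Dict, FrozenSet, List, Tuple


def dereplicate(genomes: Dict[str, Tuple[float, float]],
                ani: Dict[FrozenSet[str], float],
                threshold: float = 0.99) -> List[str]:
    """Greedy ANI-based dereplication.

    genomes: name -> (completeness %, contamination %), e.g. from CheckM.
    ani: frozenset({a, b}) -> ANI as a fraction (0-1); missing pairs are
    treated as distinct genomes.
    """
    # Best-first order: highest completeness, then lowest contamination, then name.
    order = sorted(genomes, key=lambda g: (-genomes[g][0], genomes[g][1], g))
    kept: List[str] = []
    for g in order:
        if all(ani.get(frozenset((g, k)), 0.0) <= threshold for k in kept):
            kept.append(g)
    return kept
```

A greedy pass like this is order-dependent, which is why the ranking rule matters; dedicated tools such as dRep cluster genomes first and then pick a representative per cluster.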

Viral sequence prediction and annotation

The software VirSorter (version 1.0.5) (Roux et al. 2015), installed on a local server, was used to identify viral sequences from the metagenomic contigs and genome bins. The ‘Refseqdb’ database and the ‘BLASTp’ mode were used for mining viral sequences, and only sequences in the ‘sure’ or ‘somewhat sure’ categories were retained for subsequent analyses. Metagenomic reads of the 101 biofilm and 91 seawater samples were mapped to the viral sequences using BBMap (version 2) (Bushnell 2014) with a minimum alignment identity of 0.76 to estimate viral coverage in biofilms and seawater. Before mapping, all metagenomes were normalized to 10 million reads each, and all reads were trimmed to 101 bp using NGS QC Toolkit (version 2.0). Viral ORFs were predicted using Prodigal (version 2.0) (Hyatt et al. 2010) in metagenomic mode, allowing only closed ends. HMMER hmmscan (Johnson et al. 2010) was run against the VOG database (https://vogdb.org) with an E-value cutoff of 1e-7 to classify the ORFs, and taxonomic affiliation was then examined with MEGAN (Huson et al. 2016). Reference sequences were selected from the VOG database with hmmscan, aligned with ClustalW, and used to construct a phylogenetic tree with 1000 bootstrap replicates in MEGA 6 (Tamura et al. 2013). For potential function mining, the phage genes were annotated by BLASTp searches (E-value 1e-7) against the COG (Galperin et al. 2015; Tatusov et al. 2000), KEGG (Kanehisa et al. 2017), and CAZy (Lombard et al. 2014) databases. The workflow of the present study is summarized in Supplementary Fig. S7.
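The Methods do not state how each metagenome was normalized to 10 million reads. One standard option, shown here purely as an assumption, is uniform random subsampling by reservoir sampling (Algorithm R), which draws a fixed number of items from a stream of unknown length in a single pass; trimming to 101 bp is then a simple prefix cut. For paired-end data the sampled items would be read pairs rather than single reads. The function names are hypothetical.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(reads: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Uniformly sample k items from a stream of unknown length (Algorithm R).

    If the stream holds fewer than k items, all of them are returned
    (no upsampling).
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, read in enumerate(reads):
        if i < k:
            reservoir.append(read)
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = read  # replace with probability k / (i + 1)
    return reservoir


def trim(seq: str, length: int = 101) -> str:
    """Keep the first `length` bases of a read."""
    return seq[:length]
```

A fixed seed makes the subsample reproducible, so that downstream coverage values (such as the zero-coverage test for seawater) can be regenerated exactly.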

Data availability

All the metagenomic datasets (101 biofilm and 24 adjacent seawater metagenomes) have been deposited in the NCBI database under BioProject accession no. PRJNA438384. The 479 microbial genome bins have been deposited in figshare (https://figshare.com/s/2994fdafe79112b99907, https://doi.org/10.6084/m9.figshare.7082684).


  1. Almpanis A, Swain M, Gatherer D, McEwan N (2018) Correlation between bacterial G+C content, genome size and the G+C content of associated plasmids and bacteriophages. Microb Genom 4:000168

  2. Arvey A, Tempera I, Tsai K, Chen HS, Tikhmyanova N, Klichinsky M, Leslie C, Lieberman PM (2012) An atlas of the Epstein-Barr virus transcriptome and epigenome reveals host-virus regulatory interactions. Cell Host Microbe 12:233–245

  3. Bettarel Y, Bouvy M, Dumont C, Sime-Ngando T (2006) Virus-bacterium interactions in water and sediment of West African inland aquatic systems. Appl Environ Microbiol 72:5274–5282

  4. Breitbart M (2012) Marine viruses: truth or dare. Ann Rev Mar Sci 4:425–448

  5. Brum JR, Ignacio-Espinoza JC, Roux S, Doulcier G, Acinas SG, Alberti A, Chaffron S, Cruaud C, De Vargas C, Gasol JM, Gorsky G, Gregory AC, Guidi L, Hingamp P, Iudicone D, Not F, Ogata H, Pesant S, Poulos BT, Schwenck SM et al (2015) Patterns and ecological drivers of ocean viral communities. Science 348:1261498

  6. Bushnell B (2014) BBMap: a fast, accurate, splice-aware aligner. Lawrence Berkeley National Lab (LBNL), Berkeley

  7. Chung HC, Lee OO, Huang YL, Mok SY, Kolter R, Qian PY (2010) Bacterial community succession and chemical profiles of subtidal biofilms in relation to larval settlement of the polychaete Hydroides elegans. ISME J 4:817–828

  8. Coutinho FH, Silveira CB, Gregoracci GB, Thompson CC, Edwards RA, Brussaard CP, Dutilh BE, Thompson FL (2017) Marine viruses discovered via metagenomics shed light on viral strategies throughout the oceans. Nat Commun 8:1–2

  9. Coutinho FH, Gregoracci GB, Walter JM, Thompson CC, Thompson FL (2018) Metagenomics sheds light on the ecology of marine microbes and their viruses. Trends Microbiol 26:955–965

  10. Dang H, Lovell CR (2016) Microbial surface colonization and biofilm development in marine environments. Microbiol Mol Biol Rev 80:91–138

  11. Danovaro R, Dell’Anno A, Corinaldesi C, Magagnini M, Noble R, Tamburini C, Weinbauer M (2008) Major viral impact on the functioning of benthic deep-sea ecosystems. Nature 454:1084–1087

  12. Engelhardt T, Kallmeyer J, Cypionka H, Engelen B (2014) High virus-to-cell ratios indicate ongoing production of viruses in deep subsurface sediments. ISME J 8:1503–1509

  13. Fey P, Stephens S, Titus MA, Chisholm RL (2002) SadA, a novel adhesion receptor in Dictyostelium. J Cell Biol 159:1109–1119

  14. Galperin MY, Makarova KS, Wolf YI, Koonin EV (2015) Expanded microbial genome coverage and improved protein family annotation in the COG database. Nucleic Acids Res 43:261–269

  15. Gödeke J, Paul K, Lassak J, Thormann KM (2011) Phage-induced lysis enhances biofilm formation in Shewanella oneidensis MR-1. ISME J 5:613–626

  16. Hagan MF, Zandi R (2016) Recent advances in coarse-grained modeling of virus assembly. Curr Opin Virol 18:36–43

  17. Høyland-Kroghsbo NM, Mærkedahl RB, Svenningsen SL (2013) A quorum-sensing-induced bacteriophage defense mechanism. mBio 4:00362

  18. Høyland-Kroghsbo NM, Paczkowski J, Mukherjee S, Broniewski J, Westra E, Bondy-Denomy J, Bassler BL (2017) Quorum sensing controls the Pseudomonas aeruginosa CRISPR-Cas adaptive immune system. Proc Natl Acad Sci USA 114:131–135

  19. Huson DH, Beier S, Flade I, Górska A, El-Hadidi M, Mitra S, Ruscheweyh HJ, Tappu R (2016) MEGAN community edition-interactive exploration and analysis of large-scale microbiome sequencing data. PLoS Comput Bio 12:e1004957

  20. Hyatt D, Chen GL, LoCascio PF, Land ML, Larimer FW, Hauser LJ (2010) Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinf 11:119

  21. Johnson LS, Eddy SR, Portugaly E (2010) Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinf 11:431

  22. Kanehisa M, Furumichi M, Tanabe M, Sato Y, Morishima K (2017) KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res 45:353–361

  23. Kristensen DM, Mushegian AR, Koonin EV (2011) Systems biology of bacteriophage proteins and new dimensions of the virus world discovered through metagenomics. Genome Biol 12:9

  24. Lazar Adler NR, Dean RE, Saint RJ, Stevens MP, Prior JL, Atkins TP, Galyov EE (2013) Identification of a predicted trimeric autotransporter adhesin required for biofilm formation of Burkholderia pseudomallei. PLoS ONE 8:79461

  25. Li D, Liu CM, Luo R, Sadakane K, Lam TW (2015) MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31:1674–1676

  26. Lombard V, Golaconda Ramulu H, Drula E, Coutinho PM, Henrissat B (2014) The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res 42:490–495

  27. Luqman A, Nega M, Nguyen MT, Ebner P, Götz F (2018) SadA-expressing staphylococci in the human gut show increased cell adherence and internalization. Cell Rep 22:535–545

  28. Mann S, Chen YP (2010) Bacterial genomic G+C composition-eliciting environmental adaptation. Genomics 95:7–15

  29. Marshall D, Sample C (1995) Epstein-Barr virus nuclear antigen 3C is a transcriptional regulator. J Virol 69:3624–3630

  30. McCoy AJ, Grosse-Kunstleve RW, Adams PD, Winn MD, Storoni LC, Read RJ (2007) Phaser crystallographic software. J Appl Crystallogr 40:658–674

  31. McMinn A, Liang Y, Wang M (2020) Minireview: the role of viruses in marine photosynthetic biofilms. Mar Life Sci Technol 2:203–208

  32. Mizuno CM, Rodriguez-Valera F, Kimes NE, Ghai R (2013) Expanding the marine virosphere using metagenomics. PLoS Genet 9:e1003987

  33. Motlagh AM, Bhattacharjee AS, Coutinho FH, Dutilh BE, Casjens SR, Goel RK (2017) Insights of phage-host interaction in hypersaline ecosystem through metagenomics analyses. Front Microbiol 1:1–15

  34. Paez-Espino D, Chen IM, Palaniappan K, Ratner A, Chu K, Szeto E, Pillay M, Huang J, Markowitz VM, Nielsen T, Huntemann M, Reddy TBK, Pavlopoulos GA, Sullivan MB, Campbell BJ, Chen F, McMahon KD, Hallam SJ, Denef VJ, Cavicchioli R et al (2016) IMG/VR: a database of cultured and uncultured DNA viruses and retroviruses. Nucleic Acids Res 30:1030

  35. Paez-Espino D, Pavlopoulos GA, Ivanova NN, Kyrpides NC (2017) Nontargeted virus sequence discovery pipeline and virus clustering for metagenomic data. Nat Protoc 12:1673–1682

  36. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW (2015) CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res 25:1043–1055

  37. Patel RK, Jain M (2012) NGS QC Toolkit: a toolkit for quality control of next generation sequencing data. PLoS ONE 7:e30619

  38. Patterson AG, Jackson SA, Taylor C, Evans GB, Salmond GP, Przybilski R, Staals RH, Fineran PC (2016) Quorum sensing controls adaptive immunity through the regulation of multiple CRISPR-Cas systems. Mol Cell 64:1102–1108

  39. Raghunathan D, Wells TJ, Morris FC, Shaw RK, Bobat S, Peters SE, Paterson GK, Jensen KT, Leyton DL, Blair JM, Browning DF, Pravin J, Flores-Langarica A, Hitchcock J, Moraes CTP, Piazza RMF, Maskell DJ, Webber M, May RC, Maclennan CA et al (2011) SadA, a trimeric autotransporter from Salmonella enterica serovar Typhimurium, can promote biofilm formation and provides limited protection against infection. Infect Immun 79:4342–4352

  40. Rao VB, Feiss M (2008) The bacteriophage DNA packaging motor. Annu Rev Genet 42:647–681

  41. Roux S, Enault F, Hurwitz BL, Sullivan MB (2015) VirSorter: mining viral signal from microbial genomic data. PeerJ 3:e985

  42. Salta M, Wharton JA, Blache Y, Stokes KR, Briand JF (2013) Marine biofilms on artificial surfaces: structure and dynamics. Environ Microbiol 15:2879–2893

  43. Scanlan PD, Buckling A (2012) Co-evolution with lytic phage selects for the mucoid phenotype of Pseudomonas fluorescens SBW25. ISME J 6:1148–1158

  44. Sullivan MB, Waterbury JB, Chisholm SW (2003) Cyanophages infecting the oceanic cyanobacterium Prochlorococcus. Nature 424:1047–1051

  45. Sullivan MB, Coleman ML, Weigele P, Rohwer F, Chisholm SW (2005) Three Prochlorococcus cyanophage genomes: signature features and ecological interpretations. PLoS Biol 3:790–806

  46. Sunagawa S, Coelho LP, Chaffron S, Kultima JR, Labadie K, Salazar G, Djahanschiri B, Zeller G, Mende DR, Alberti A, Cornejo-Castillo FM, Costea PI, Cruaud C, D'Ovidio F, Engelen S, Ferrera I, Gasol JM, Guidi L, Hildebrand F, Kokoszka F et al (2015) Structure and function of the global ocean microbiome. Science 348:1261359

  47. Suttle CA (2005) Viruses in the sea. Nature 437:356–361

  48. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S (2013) MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol 30:2725–2729

  49. Tan D, Svenningsen SL, Middelboe M (2015) Quorum sensing determines the choice of antiphage defense strategy in Vibrio anguillarum. mBio 6:e00627-15

  50. Tatusov RL, Galperin MY, Natale DA, Koonin EV (2000) The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res 28:33–36

  51. Thompson LR, Zeng Q, Kelly L, Huang KH, Singer AU, Stubbe J, Chisholm SW (2011) Phage auxiliary metabolic genes and the redirection of cyanobacterial host carbon metabolism. Proc Natl Acad Sci USA 108:E757–E764

  52. Thoulouze MI, Alcover A (2011) Can viruses form biofilms? Trends Microbiol 19:257–262

  53. Thurber RV, Payet JP, Thurber AR, Correa AM (2017) Virus–host interactions and their roles in coral reef health and disease. Nat Rev Microbiol 15:205–216

  54. Vidakovic L, Singh PK, Hartmann R, Nadell CD, Drescher K (2018) Dynamic biofilm architecture confers individual and collective mechanisms of viral protection. Nat Microbiol 3:26–31

  55. Wu YW, Simmons BA, Singer SW (2016) MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics 32:605–607

  56. Xu Y, Zhang R, Wang N, Cai L, Tong Y, Sun Q, Chen F, Jiao N (2018) Novel phage-host interactions and evolution as revealed by a cyanomyovirus isolated from an estuarine environment. Environ Microbiol 20:2974–2989

  57. Yoon SH, Ha SM, Lim J, Kwon S, Chun J (2017) A large-scale evaluation of algorithms to calculate average nucleotide identity. Antonie Van Leeuwenhoek 110:1281–1286

  58. Zhang R, Wei W, Cai L (2014) The fate and biogeochemical cycling of viral elements. Nat Rev Microbiol 12:850–851

  59. Zhang W, Wang Y, Bougouffa S, Tian R, Cao H, Li Y, Cai L, Wong YH, Zhang G, Zhou G, Zhang X, Bajic VB, Al-Suwailem A, Qian PY (2015) Synchronized dynamics of bacterial niche-specific functions during biofilm development in a cold seep brine pool. Environ Microbiol 17:4089–4104

  60. Zhang W, Ding W, Li YX, Tam C, Bougouffa S, Wang R, Pei B, Chiang H, Leung P, Lu Y, Sun J, Fu H, Bajic VB, Liu H, Webster NS, Qian PY (2019) Marine biofilms constitute a bank of hidden microbial diversity and functional potential. Nat Commun 10:517

  61. Zhang Z, Chen F, Chu X, Zhang H, Luo H, Qin F, Zhai Z, Yang M, Sun J, Zhao Y (2019) Diverse, abundant, and novel viruses infecting the marine Roseobacter RCA lineage. mSystems 4:e00494-19

  62. Zhao Y, Temperton B, Thrash JC, Schwalbach MS, Vergin KL, Landry ZC, Ellisman M, Deerinck T, Sullivan MB, Giovannoni SJ (2013) Abundant SAR11 viruses in the ocean. Nature 494:357–360



The authors are grateful for a grant from the National Key Research and Development Program of China (2018YFC0310600) and two grants from the Ocean University of China (841912035 and 842041010) to W.Z. The authors are also grateful for a grant from the China Ocean Mineral Resources Research and Development Association (DY135-B2-03) and a grant from the Hong Kong Branch of the Southern Marine Science and Engineering Guangdong Laboratory (SMSEGL20SC01) to P.Y.Q.

Author information

WZ, P-YQ, and ZR designed the project; WD, RW, and ZL performed the analysis; WD, RW, and WZ wrote the manuscript.

Corresponding authors

Correspondence to Pei-Yuan Qian or Weipeng Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Animal and human rights statement

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Edited by Chengchao Chen.

Electronic supplementary material

Supplementary file1 (DOCX 3277 kb)

About this article

Cite this article

Ding, W., Wang, R., Liang, Z. et al. Expanding our understanding of marine viral diversity through metagenomic analyses of biofilms. Mar Life Sci Technol (2021). https://doi.org/10.1007/s42995-020-00078-4


  • Ocean
  • Virus
  • Biofilm
  • Metagenomics