Introduction

The correct identification of biological species is important in many situations, e.g., biodiversity analyses, monitoring of endangered species, control of weeds, pests and invasive species, and studies on the impact of climate change on species distribution. However, identification is not always simple or even possible based on morphology alone. Therefore, precise DNA sequencing-based methods of identification have become increasingly common, especially after the development of the idea of DNA barcoding (Hebert et al. 2003), i.e., the use of short DNA sequences from a standardized position in the genome. The practical use of DNA barcodes requires reference databases, namely compiled public libraries of sequences linked to named specimens. There has been considerable international effort to build such databases, many of them focused on a particular region or taxonomic group. The global International Barcode of Life Project (iBOL; www.ibol.org) is a large biodiversity genomics initiative that paves the way for a global digital identification system for life.

The biological materials used for compiling reference libraries for DNA barcodes are either fresh samples or specimens held in museums, herbaria, botanical gardens and zoos, or in other ex situ conservation collections (von Cräutlein et al. 2011). For plant barcoding, herbarium collections are a major source of materials. It has been well documented that DNA is often well preserved in herbarium samples and can therefore be used for DNA analyses, such as sequencing in molecular systematic and phylogenetic studies, and DNA barcoding (e.g., Lehtonen and Christenhusz 2010; von Cräutlein et al. 2011; Xu et al. 2015; Kuzmina et al. 2017).

In this study, we developed rbcL and matK barcodes for angiosperm species collected in Finland over a period of well over 100 years and maintained in herbarium collections. This work is a part of the Finnish Barcode of Life (FinBOL; www.finbol.org) initiative. Besides producing barcodes for Finnish taxa, we specifically report the barcoding success for herbarium materials varying widely in age, also paying attention to variation in success rates and genetic distances among different plant families. An additional aim was to determine whether the level of intraspecific variation differs between native and introduced species, the hypothesis being that non-native species contain less variation due to their possibly sporadic introductions and short evolutionary history in the area.

Materials and methods

Sampling and barcoding

Angiosperm plant materials used for DNA barcoding and included in the present study consisted of herbarium samples originating from the collections of the Botanical Museum (H), Finnish Museum of Natural History, University of Helsinki. Sampling was conducted in 2012–2016 and covered 3176 specimens representing 1068 species, 456 genera and 77 families. All included specimens had been collected in Finland between 1867 and 2013, the mean age being 39 years at the time of barcoding. The number of very old samples (collected between 1867 and 1910) was only 59 (1.8%). Taxonomic assignments and geographic information were obtained from herbarium voucher labels. All specimens were photographed. It was also recorded whether a species was native (including archaeophytes) or introduced (definitions following Hämet-Ahti et al. 1998). All plant samples were shipped to the Canadian Centre for DNA Barcoding for DNA extraction, PCR and sequencing of chloroplast matK and rbcL barcodes using standard plant barcoding protocols (Ivanova et al. 2008; Kuzmina and Ivanova 2011; Kuzmina et al. 2017). All barcodes, images and other sample information are available in BOLD (http://www.boldsystems.org) under the FinBOL-Plants (FBPL) project (https://doi.org/10.5883/ds-fbpl1).

Data analysis

First, all sequences were subjected to a genus-level similarity search against the National Center for Biotechnology Information (NCBI) database using BLASTn to confirm the assignment of taxonomic identities. Matches were found for all sequences. Then, the barcoding success of samples, i.e., barcode compliance (sequences > 500 bp long, matK or rbcL or both), was calculated for all specimens and species, and for each family and sample age group. Final age comparisons were conducted for the whole data set and for the families with the greatest sample sizes, namely Asteraceae, Caryophyllaceae, Cyperaceae, Lamiaceae, Plantaginaceae, Poaceae, Ranunculaceae and Rosaceae. Age-specific barcoding success rates among age groups and among the eight above-mentioned families were compared with Chi-square tests (Fisher 1922).
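
As an illustration of the two calculations described above, the following minimal Python sketch flags barcode compliance from sequence lengths and computes a Pearson Chi-square statistic for a table of successes and failures per age group. The function names, the use of length 0 for a missing sequence, and the counts in the example table are assumptions made for illustration only; they are not the study's data or the software actually used.

```python
def is_barcode_compliant(matk_len, rbcl_len, min_len=500):
    """True if matK or rbcL (or both) is longer than min_len bp.

    A missing sequence is represented here by length 0 (an assumption).
    """
    return matk_len > min_len or rbcl_len > min_len


def chi_square_statistic(table):
    """Pearson Chi-square statistic and degrees of freedom for an
    r x c contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical counts: one row per age group, columns = [success, failure].
toy_table = [[50, 50], [70, 30], [90, 10]]
stat, df = chi_square_statistic(toy_table)
print(f"chi-square = {stat:.2f}, df = {df}")
```

The statistic would then be compared with the Chi-square distribution for the given degrees of freedom to obtain a P value.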

Based on barcode-compliant matK and rbcL sequences, intra- and interspecific genetic distances were calculated. The matrices of pairwise distances were generated using MEGA6 (Tamura et al. 2013). The MEGA output files were used as input files for the ExcaliBAR software to produce intra- and interspecific pairwise genetic distances (Aliabadian et al. 2014). Average intraspecific and interspecific genetic distances, calculated separately for matK and rbcL, were produced for all samples and separately for families with more than 12 samples. Intraspecific genetic distances were also generated separately for native (including archaeophytes) and introduced species, but only for those genera that contained both native and introduced species. Diversity values were compared using t tests (Student 1908).
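
The separation of pairwise distances into intraspecific and interspecific (within-genus) comparisons can be sketched as follows. This is not the MEGA6/ExcaliBAR workflow itself: the uncorrected p-distance used here is an assumption (the substitution model is not stated above), and the genus and species labels and the short aligned sequences are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Uncorrected p-distance: proportion of differing sites among aligned
    positions where both sequences carry an unambiguous base."""
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def split_distances(records):
    """records: list of (genus, species, aligned_sequence).

    Returns (intraspecific, interspecific) lists of pairwise distances.
    Interspecific comparisons are restricted to species of the same genus.
    """
    intra, inter = [], []
    for (g1, sp1, s1), (g2, sp2, s2) in combinations(records, 2):
        if g1 != g2:
            continue  # between-genus pairs are not used
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    return intra, inter


# Hypothetical toy alignment (10 sites).
toy = [
    ("GenusA", "sp. 1", "ACGTACGTAC"),
    ("GenusA", "sp. 1", "ACGTACGTAC"),
    ("GenusA", "sp. 2", "ACGTACGTAT"),
    ("GenusB", "sp. 3", "ACGAACGTAC"),
]
intra, inter = split_distances(toy)
print(intra, inter)
```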

Results and discussion

The barcoding success rates are shown in Fig. 1 for all data and in Online Resource 1 for those families that contained at least 12 samples (41 out of 77 families). The average success rates for any barcode (matK or rbcL), rbcL, matK and both barcodes among all specimens equaled 81, 79, 55 and 53%, respectively, and among species (at least one specimen per species barcoded successfully) 95, 95, 74 and 73%, respectively. Success rates for rbcL varied from 33% in Nymphaeaceae specimens to 100% in Alismataceae, Betulaceae, Papaveraceae and Rubiaceae, while success rates for matK ranged from 0% in Boraginaceae, Crassulaceae, Geraniaceae, Liliaceae, Onagraceae, Orobanchaceae, Pinaceae and Saxifragaceae specimens to 100% in Gentianaceae and Papaveraceae. Only in Papaveraceae was the success rate 100% for both barcoding regions. Kuzmina et al. (2017) previously observed great differences in barcoding success among plant families, especially for the matK barcode, for which the primers used are not fully universal.
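
The four reported success rates are linked by a simple identity: the share of samples with any barcode equals the rbcL share plus the matK share minus the share with both. The short Python check below applies this to the rounded percentages given above; the function name is hypothetical, and because the inputs are rounded to whole percentages, a difference of about one percentage point from the reported value is to be expected.

```python
def any_barcode_rate(rbcl, matk, both):
    """Inclusion-exclusion: share with rbcL or matK (or both), in %."""
    return rbcl + matk - both


# Rounded percentages reported in the text.
specimens_any = any_barcode_rate(rbcl=79, matk=55, both=53)  # reported: 81
species_any = any_barcode_rate(rbcl=95, matk=74, both=73)    # reported: 95

print(specimens_any, species_any)
```

For specimens the identity reproduces the reported 81% exactly; for species it gives 96% against the reported 95%, a gap consistent with rounding of the individual rates.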

Fig. 1

Barcoding success rates for any barcode (rbcL and/or matK), rbcL, matK and both of them among all specimens and species

Among very old samples, 17 were collected between 1867 and 1899. Only six of them (35%) provided barcode-compliant results, the oldest one being from the year 1880 (Fumaria officinalis, Papaveraceae). A further 37 samples were collected between 1900 and 1909 and 36 samples between 1910 and 1919. Among them, 15 (41%) and 24 samples (67%), respectively, gave barcode-compliant results. Figure 2 shows age-specific success rates for rbcL, matK and both in the whole data set and separately for eight families. Since the oldest age groups contained only small numbers of samples, all material collected before 1931 was combined into one group. The success rates were tested with a Chi-square test only for rbcL, because the amplification of matK was poorer even in young material and strongly dependent on factors other than age, largely due to the low performance of standard primers in many plant groups. Among the nine age groups (collected < 1931, 1931–1940, 1941–1950, 1951–1960, 1961–1970, 1971–1980, 1981–1990, 1991–2000 and 2001–), the results differed significantly (P < 0.001). The rbcL barcoding success increased steadily from 51.4% in the oldest age group to 88.0% in the youngest, the biggest change being between the age groups 1931–1940 and 1941–1950 (success rates 56.6% and 68.3%, respectively).
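
The grouping of specimens by collection year and the calculation of a success rate per age group can be sketched in Python as below. The group boundaries follow the nine age groups named above; the function names and the toy records (collection year, barcoding success) are hypothetical and are not taken from the data set.

```python
from collections import defaultdict


def age_group(year):
    """Assign a collection year to one of the nine age groups."""
    if year < 1931:
        return "<1931"
    if year >= 2001:
        return "2001-"
    start = (year - 1) // 10 * 10 + 1  # decades run 1931-1940, 1941-1950, ...
    return f"{start}-{start + 9}"


def success_by_group(samples):
    """samples: iterable of (collection_year, succeeded).

    Returns {group: (n_success, n_total, percent_success)}.
    """
    counts = defaultdict(lambda: [0, 0])
    for year, succeeded in samples:
        group = counts[age_group(year)]
        group[0] += bool(succeeded)
        group[1] += 1
    return {g: (s, n, 100.0 * s / n) for g, (s, n) in counts.items()}


# Hypothetical toy records.
toy_samples = [(1880, True), (1905, False), (1935, True),
               (1940, False), (1995, True), (2013, True)]
rates = success_by_group(toy_samples)
for group, (s, n, pct) in rates.items():
    print(f"{group}: {s}/{n} = {pct:.1f}%")
```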

Fig. 2

Barcoding success rates for rbcL, matK and both of them in each age group for the whole data set and separately for eight plant families. Barcoding was conducted in 2012–2016. Black columns, the proportion of successful barcodes; gray columns, the proportion of unsuccessful barcodes

When rbcL barcoding success was compared statistically in the eight individual families, samples were combined into three age groups (< 1950, 1951–1990 and 1991–) due to the limited amount of data per family. Chi-square testing was not possible for two families, Lamiaceae and Poaceae. Among the other six families, five showed significant age differences (i.e., a negative age effect), whereas Plantaginaceae showed no significant age effect. When barcoding success was tested across families within each of the three age groups, significant differences among families were found in the age groups < 1950 (P < 0.001, Lamiaceae and Poaceae excluded) and 1951–1990 (P < 0.001, all eight families), while in the youngest age group, 1991– (Lamiaceae and Poaceae excluded), no significant differences among families were detected. Ranunculaceae and Caryophyllaceae were the families with the poorest barcoding performance among old samples. Kuzmina et al. (2017) also observed differences in specimen age effects among plant families.

Herbarium samples are commonly used for studies on molecular systematics or for the generation of DNA barcoding libraries. For instance, Lehtonen and Christenhusz (2010) used historical herbarium specimens to study the molecular systematics of the fern genus Lindsaea using two chloroplast loci (trnL-trnF and trnSGGA-rps4). The age of the samples ranged from 4 to 172 years, and the total success rate was 57%. For Lindsaea, specimen age was found to be of little importance for sequencing success in samples less than 75 years old, while among older samples the sequencing success declined considerably. In our study, the greatest decline in sequencing success occurred for specimens over 100 years old. Although fresh specimens generally provide uniformly good sequencing results (H. Korpelainen, pers. obs.) and herbarium samples quite good results (e.g., the present study; Kuzmina et al. 2017), some investigations have reported low success rates for herbarium samples. For instance, in a small-scale investigation, Enan et al. (2017) reported that the success rates for matK and rbcL barcodes were 90% and 90% for fresh samples, but 40% and 50% for herbarium samples, respectively.

Aging or inappropriately preserved herbarium specimens lead to complications in DNA analyses, such as DNA fragmentation and other quality issues (see Xu et al. 2015). In addition, the taxonomic origin affects barcoding success (e.g., the present study; Kuzmina et al. 2017). Kuzmina et al. (2017) hypothesized that in certain plant families, DNA degradation occurs soon after sample collection owing to the presence of compounds that degrade DNA or irreversibly bind to it. One way to overcome obstacles caused by poor DNA quality would be to use improved techniques, for instance, DNA reconstruction (Xu et al. 2015), which enables the amplification of sufficiently long fragments for DNA barcoding and other purposes even for otherwise failing specimens.

Table 1 and Online Resource 2 give the values for intraspecific and interspecific genetic distances across all data and in different plant families. Intraspecific distances per family ranged from 0 to 0.0062 (average 0.0005) and from 0 to 0.0048 (average 0.0007) based on rbcL and matK barcodes, respectively, while interspecific distances ranged from 0 to 0.0216 (average 0.0064) and from 0.0019 to 0.0497 (average 0.0150) based on rbcL and matK, respectively. Mean intraspecific distances were significantly lower than interspecific distances for both rbcL and matK barcodes (P < 0.001). The presence of intraspecific variation means that DNA barcodes can potentially be used as a tool for inferring biogeographic patterns, as also suggested by Costion et al. (2016). However, sampling should be sufficiently comprehensive to reveal geographic patterns in molecular variation. Increasingly common and cost-efficient whole-chloroplast genome sequencing for plant DNA barcoding purposes will not only provide more accurate identification (Li et al. 2015; Coissac et al. 2016; Zhang et al. 2017) but also improve insights into biogeographic variation patterns.

Table 1 Minimum, maximum and average pairwise intraspecific and interspecific (intra-generic) genetic distances based on DNA barcodes (rbcL and matK)

Both native and introduced species were present in 50 genera, including 276 native species and 118 introduced species (2–3 successfully barcoded specimens per species). Intraspecific genetic distances based on rbcL equaled 0.0021 ± 0.0018 and 0.0016 ± 0.0042 in native and non-native species, respectively. Comparable values based on matK equaled 0.0007 ± 0.0013 and 0.0018 ± 0.0038, respectively. The differences were nonsignificant. Dlugosch and Parker (2008) provided evidence that introduced populations possess lower levels of neutral genetic diversity than their conspecific native-range populations. On the other hand, Oduor et al. (2016) found that invasive and native plant species do not differ consistently in the extent and frequency of local adaptation, which supports the view that rapid post-introduction adaptive evolution may enable invasive plant species to persist and expand their ecological niche in introduced ranges. Then again, Bock et al. (2015) emphasized that numerous unknowns remain in invasion genetics, such as the sources of genetic variation, the role of so-called expansion load, and the relative importance of propagule pressure versus genetic diversity for successful establishment. Our comparison of intraspecific variation in partial chloroplast genes (DNA barcodes rbcL and matK) between native and alien plants (most not considered invasive) showed no significant difference in genetic variation between these two groups. It remains unresolved whether this may be due to multiple introductions and recombination or to rapid adaptive evolution among alien plants. However, our sampling is not sufficient to allow definite conclusions on the patterns of intraspecific variation in native and alien plants. Yet, DNA barcoding with sufficient sampling is a tool for investigating specific evolutionary questions, such as the adaptive capacity of invasive and other alien plants or biogeographic patterns.