Introduction

The Lauraceae are among the most frequent woody plant families in moist tropical areas and include about 55 genera with 2500–3500 species1,2,3. The genus Ocotea Aubl., in its current circumscription, is the largest genus among the Neotropical Lauraceae, consisting of about 400–450 recognized species2,3,4,5,6,7,8. The number of Paleotropical Ocotea species is far smaller. The majority (34 spp.) is endemic to Madagascar, four are found in Continental Africa, three on Mauritius, one on Réunion Island, and one on the Comoro islands. Ocotea foetens (Aiton) Baill. is endemic to Macaronesia3,7.

Most of the molecular phylogenetic studies in Lauraceae published so far focused on other genera or on the major evolutionary lineages in the Lauraceae and included only a relatively small number of Ocotea species9,10,11. Nevertheless, they suggested that Ocotea was paraphyletic with respect to most other New World genera, viz. Aniba Aubl., Damburneya Raf., Dicypellium Nees & Mart., Endlicheria Nees, Kubitzkia van der Werff, Licaria Aubl., Nectandra Rol. ex Rottb., Paraia Rohwer, H.G. Richt. & van der Werff, Pleurothyrium Nees, Rhodostemonodaphne Rohwer & Kubitzki, Umbellularia (Nees) Nuttall and Urbanodendron Mez. A recent study based on RAD-seq data12 added Phyllostemonodaphne Kosterm. to this list. These genera, plus presumably Gamanthera van der Werff and Povedadaphne W.C. Burger, which have not been studied yet, are collectively referred to as the Ocotea complex10 or Supraocotea12, a group of about 950 species. A higher number of Ocotea species than in previous studies, plus representative species of other genera of the Ocotea complex, were studied by Trofimov et al.2 and Trofimov and Rohwer3, with similar results. Using sequences of the nuclear internal transcribed spacer (ITS) and one of the most informative parts of the chloroplast genome, the psbA-trnH spacer, they separated two genera from Ocotea s. lat., namely Mespilodaphne Nees & Mart. and Kuloa Trofimov & Rohwer. In addition, several of the morphological groups described by Rohwer4 were confirmed as monophyletic in these studies. Resolution and/or support values at the lower nodes within the Ocotea complex, however, remained poor. Most of the other established chloroplast markers tested in the research group of the senior author [JGR] (atpB-rbcL, matK, ndhF-rpl32, psbK-psbI, rbcL, rpl16, rpb2, rpl3–trnL, rpl32-trnL, rpoB, rpoC1, trnG–trnS, trnL-trnF, and trnT-trnL) turned out to be less informative in molecular analyses of the Ocotea complex, or problematic because of too many single nucleotide repeats. Therefore, no significant improvement was to be expected from sequencing individual chloroplast markers any more. Sequencing of entire chloroplast (cp) genomes, on the other hand, is expected to yield a higher number of informative characters, which will probably lead to better support for the lower nodes within the Ocotea complex. The present study of selected Ocotea plastomes is intended as a first step towards this goal. The most recent phylogeny by Penagos Zuluaga et al.12 based on RAD-seq data is fully resolved at the lower nodes, with strong bootstrap support for all of the basal and most of the more distal nodes, so that it will provide an ideal basis for comparison with our and future cp genome data.

The chloroplast (cp) genome is a circular molecule ranging in size from 107 to 218 kb. It shows a characteristic quadripartite structure with a pair of inverted repeats (IRs) separating a large single copy (LSC) and a small single copy (SSC) region13,14. The typical angiosperm cp genome consists of 120–130 genes, coding mainly for RNAs and photosynthesis-related genes15.

Up to the present, the plastomes of Lauraceae were studied mainly in Asian species of Actinodaphne Nees, Alseodaphne Nees, Beilschmiedia Nees, Cryptocarya R. Br., Caryodaphnopsis Airy Shaw, Cassytha L., Cinnamomum Schaeff., Dehaasia Blume, Endiandra R. Br., Eusideroxylon Teijsm. & Binn., Iteadaphne Blume, Laurus L., Lindera Thunb., Litsea Lam., Machilus Nees, Neocinnamomum H. Liu, Neolitsea (Benth. & Hook. f.) Merr., Nothaphoebe Blume, Parasassafras D.G. Long, Phoebe Nees, Sassafras J. Presl and Syndiclis Hook. F.16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33. Neotropical species were poorly represented in previous studies of the cp genome. Only Nectandra angustifolia (Schrad.) Nees & Mart. (but see below) and Persea americana Mill. have been studied so far, plus the North American P. borbonia (L.) Spreng.24,25,28. These studies considerably improved support values among the major phylogenetic lineages in the Asian Lauraceae, especially among Cassytha, Caryodaphnopsis and Neocinnamomum. In other plant groups, such as the genus Quercus L., Poaceae-Arundinarieae and Rosaceae, they allowed resolving phylogenetic relationships on different levels34,35,36.

In this study, we sequenced and analyzed the complete chloroplast genomes of six Neotropical and the only Macaronesian Ocotea species using Illumina high-throughput sequencing technology. This is the first study of this kind in this species-rich and ecologically important group. We describe the structure of the plastomes examined, amino acid percentage of protein-coding genes, content of Simple Sequence Repeats (SSRs), relative synonymous codon usage for protein coding nucleotides and variability values in the Ocotea plastomes, and compare them to 85 plastomes of Lauraceae. In addition, we performed a preliminary phylogenetic analysis to show the positions of the seven Ocotea species among 62 plastomes of Core Lauraceae in the sense of Rohwer and Rudolph37 examined so far, i.e. Cinnamomeae, Laureae and Perseeae. The Ocotea complex forms the largest clade within the Cinnamomeae, with the Laureae and the Perseeae as consecutive sister groups. However, this paper does not have a phylogenetic focus but rather provides basic data on chloroplast genomes that may be used in future phylogenetic studies.

Results

Organization of the plastomes of Ocotea

The chloroplast genome sequences of the seven Ocotea species range from 152,630 bp in O. porosa (Nees & Mart.) Barroso to 152,685 bp in O. aciphylla (Nees & Mart.) Mez (Table 1). The plastomes show the typical quadripartite structure of chloroplast genomes. Two inverted repeat (IR) regions (20,009–20,015 bp) are separated by a large single copy (LSC) region (93,815–93,859 bp) and a small single copy (SSC) region (18,775–18,818 bp) (Fig. 1, Table 1). All seven Ocotea plastomes contain a total of 131 genes (114 unique), among which 87 (80 unique) encode proteins (Table 2). The order of genes (if present) is the same in all Lauraceae so far examined. Fourteen genes have one intron (atpF, ndhA, ndhB, rpl2, rpl16, rpoC1, rps12, rps16, trnA-UGC, trnG-UCC, trnI-GAU, trnK-UUU, trnL-UAA, and trnV-UAC), and two (clpP and pafI) have two introns (Table 2, Supplementary Table S1). The total GC content in plastomes is same in all Ocotea species examined (39.2%; Table 1). Contents of nucleotides in the LSC, IR and SSC of the plastomes were similar in all species of Ocotea examined (Supplementary Table S2). About 30.3–30.4%, 27.3–28.3%, and 32.9% were detected for A; 19.3–19.4%, 21.1–23.4%, and 21.1% for C; 18.6%, 21.0–23.4%, and 18.1% for G; and 31.6–31.7%, 27.2–28.3%, and 33.1% for T, respectively. The GC content in the IR regions was higher than in the LSC and SSC regions (44.4%, vs. 37.9–38.0% and 33.9–34.0%, respectively).

Table 1 Summary of seven complete plastomes of Ocotea species.
Figure 1
figure 1

Gene map of Ocotea species (O. aciphylla, O. daphnifolia, O. foetens, O. guianensis, O. odorifera, O. porosa, and O. tabacifolia) chloroplast genomes. The genes shown on the inside and the outside of the outer circle are transcribed in clockwise and counterclockwise direction, respectively. The coloured bars denote gene functional groups. The dark gray and light gray shading within the inner circle correspond to percentage GC and AT content, respectively. IR inverted repeat, LSC large single copy, SSC small single copy.

Table 2 Genes encoded by seven Ocotea plastomes.

Determination of the most variable regions

The nucleotide diversity (Pi) values within 600 bp across the seven Ocotea plastomes vary from 0 to 0.015, with a mean value of 0.001 (Fig. 2a). Four variable loci with Pi ≥ 0.006 were found in the LSC region (psbA-trnH, Pi = 0.007; ycf2, Pi = 0.006) and in the SSC region (ycf1, Pi = 0.008; ndhH, Pi = 0.008; trnL(UAG)-ndhF, Pi = 0.015). At the family level, sequence divergence was calculated using published chloroplast genomes of Alseodaphne, Cinnamomum, Laurus, Lindera, Litsea, Machilus, Neolitsea, Parasassafras, Persea, Phoebe, and Sassafras (see “Materials and methods” section). Unfortunately, the sequence of Nectandra angustifolia (marked as “unverified” in GenBank) had to be excluded because it differs so strongly from those of all other Core Lauraceae that large parts of it could not be readily aligned. The Pi values among the 69 plastomes vary from 0 to 0.022, with a mean value of 0.0045 (Fig. 2b). Variable loci with Pi > 0.01 were identified in the LSC region (rps16-trnQ, Pi = 0.01; rpoB-psbD, Pi = 0.01; trnT-trnL, Pi = 0.01; rpl23-ycf2, Pi = 0.014) and in the SSC region (ycf1, Pi = 0.019; trnL(UAG)-ycf1, Pi = 0.022). The open reading frames ycf1 and ycf2 are located in one of the IR regions (IRb), at the border of the SSC and the LSC region, respectively.

Figure 2
figure 2

Comparison of the nucleotide variability (Pi) values, (a) among the seven Ocotea plastomes and (b) among 69 plastomes of Lauraceae. The linear gene map of Ocotea species is shown below.

Comparative analysis of plastomes

A comparison of the LSC, IR and SSC junction positions in the Ocotea plastomes is shown in Fig. 3. The ycf1 gene crosses the boundary between the IRb (1408 bp) and the SSC (4163 bp) regions. The ycf2 gene is found in the boundary between the LSC (3852 bp) and the IRb (3162 bp) regions. Fragments (pseudogenes) of ycf1 (1408 bp) and ycf2 (3162 bp) are located in the IRa region. The distances between the ndhF gene and the ycf1 fragment and between the ycf2-fragment and the trnH gene are 21 bp and 27 bp, respectively. The pairwise cp genomic alignment between six Ocotea species and O. aciphylla as reference showed very high similarity in all sequences (Fig. 4). The LSC and SSC regions were more variable in comparison with the IR regions. The noncoding regions showed a relatively higher mutation rate than protein-coding regions in the Ocotea plastomes examined.

Figure 3
figure 3

Comparison of LSC, IR, and SSC junction positions among seven Ocotea chloroplast genomes.

Figure 4
figure 4

Visualization of Ocotea chloroplast genomes using mVISTA program with O. aciphylla as reference. CNS conserved non-coding sequence; UTR untranslated region.

Codon usage analysis

The count of codons in the plastoms examined here were 25,503–25,520 with an average number of about 25,514 (Supplementary Tables S3, S4). The effective number of codons (ENC), Codon Bias Index (CBI) as well the Scaled Chi-square (SChi2) were very similar in all Ocotea plastomes (56.59–56.62; 0.15; 0.073–0.074, respectively) (Supplementary Table S4). The GC content at coding positions is about 39.1% in the examined Ocotea plastomes. The GC contents at second and at third codon positions were also very similar (35.5–35.6%; 39.2% respectively). All possible codon types are used for each amino acid. The most frequent amino acids encoded in the Ocotea plastomes are leucine (Leu; 11.76–11.83%), isoleucine (Ile; 8.05%–8.11%), and serine (Ser; 7.93–8.01%) (Fig. 5). The amino acids arginine (Arg), glycine (Gly), lysine (Lys), phenylalanine (Phe), and valine (Val) account for 5.02–5.95% each. Least represented in the chloroplast genomes examined were cysteine (Cys; 1.81–1.87%) and tryptophan (Trp; 1.94–1.95%). The relative synonymous codon usage (RSCU) was greater than 1.0 in 31 codons (Supplementary Table S3). The count of preferred codons ending with A/U or G/C were 25 and six, respectively. The frequency of different codons coding for the same amino acid was almost the same in all Ocotea species examined. The Macaronesian Ocotea foetens presented slightly higher frequencies for arginine, cysteine, serine, histidine (His), and tyrosine (Tyr) in comparison with the Neotropical Ocotea species, whereas the contents of alanine (Ala), isoleucine and leucine were slightly lower.

Figure 5
figure 5

Amino acid percentage of protein-coding genes in seven Ocotea plastomes.

Simple sequence repeats (SSRs) analysis

The seven chloroplast genomes examined showed a total 586 SSRs with a repeat length of one to six bp (Fig. 6a, Supplementary Table S5). These SSRs were mainly mononucleotide repeats (433 SSRs = 74%) of A or T (417), less frequently C or G (16). In addition, there were 65 dinucleotide repeats (11%), 21 tri- (4%), 55 tetra- (9%), five penta- (1%), and seven hexanucleotide repeats (1%). The numbers of SSRs observed in the different Ocotea species were relatively similar. In each plastome we identified 77–89 SSRs, incl. 57–67 mono-, nine or ten di-, three tri-, seven or eight tetra-, zero, one or two penta-, and zero, one or three hexanucleotide repeats (Fig. 6b). The SSRs were identified mainly in the LSC region (62–70 SSRs; Supplementary Table S5), compared to one or two and 11–17 SSRs in the IR and SSC regions, respectively.

Figure 6
figure 6

Simple sequence repeats (SSRs) in seven chloroplast genomes of Ocotea. (a) Counts of nucleotide repeats; (b) counts of mono-, di-, tri-, tetra-, penta- and hexanucleotides.

Phylogenetic analysis of Lauraceae plastomes

The data matrix consisted of 160,629 characters, among which 5266 were variable but parsimony-uninformative, and 4631 were parsimony-informative. However, only 168 characters were parsimony-informative among the seven Ocotea species in this analysis. Most clades in the Maximum Likelihood analysis received 100% bootstrap support (ML-BS, Fig. 7, Supplementary Fig. S1). With the Perseeae defined as the outgroup, Laureae and Cinnamomeae are shown as sister clades in the ingroup. Among the Cinnamomeae, species of Cinnamomum, with Sassafras nested among them, form the sister group to the seven Ocotea species examined here. The Macaronesian Ocotea foetens is shown as sister taxon to the six Neotropical species. Among these, the two dioecious species, Ocotea guianensis Aubl. and O. tabacifolia (Meisn.) Rohwer, form the sister group to the remaining species, which are bisexual or gynodioecious [O. daphnifolia (Meisn.) Mez]. The latter clade, however, is barely supported (57% ML-BS). Ocotea aciphylla (Nees & Mart.) Mez appears as sister taxon to the remaining species, and among these O. porosa (Nees & Mart.) Barroso is shown as sister taxon to O. daphnifolia and O. odorifera (Vell.) Rohwer.

Figure 7
figure 7

Maximum Likelihood phylogeny including 69 complete chloroplast genome sequences of Core Lauraceae. Color codes: blue—Cinnamomeae, incl. pale blue for Ocotea spp.; pink—Laureae; green—Perseeae. Maximum likelihood bootstrap support values (ML-BS) are shown next to the branches.

Discussion

The genome sizes of the Ocotea species examined in this study are similar to those of other Core Lauraceae16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33, as well as Caryodaphnopsis and Neocinnamomum species25 (Supplementary Table S6). The genomes of Cryptocaryeae (Beilschmiedia, Cryptocarya, Endiandra, Eusideroxylon and Syndiclis) are more than 5000 bp larger25,28. The cp genome of the hemiparasitic Cassytha, on the contrary, is ca. 40,000 bp smaller than those of the Core Lauraceae25. Cassytha has lost not only its functional ndh genes, like many hemiparasitic plants38, but also an entire inverted repeat region.

The seven Ocotea chloroplast genomes show some length variation in all of their parts (LSC, IRs, SSC). Consistently smaller length variation was found among the species of Alseodaphne (3 spp.), Endiandra (4 spp.), Neocinnamomum (2 spp.), Phoebe (3 spp.) and Syndiclis (2 spp.) so far examined25,26,27,28. Larger variation was found among the species of Beilschmiedia (6 spp.), Caryodaphnopsis (3 spp.), Cinnamomum (7 spp.), Litsea (14 spp.) and Persea (3 spp.)16,17,24,25,28,29,30. However, the intrageneric differences are expected to increase, also in Ocotea, with increased number of species examined. It is therefore too early to make statements about the relative length variability in different clades. Surprisingly large intrageneric differences of about 6000 bp in total chloroplast, LSC and IR regions, and about 400 bp in the SSC region were observed among Caryodaphnopsis species, C. henryi, C. malipoensis and C. tonkinensis25,28.

A total of 131 (114 unique) genes were identified in the Ocotea species examined. For most Lauraceae examined so far (species of Actinodaphne, Alseodaphne, Beilschmiedia, Caryodaphnopsis, Cinnamomum, Cryptocarya, Eusideroxylon, Lindera, Machilus, Nectandra, Neocinnamomum, Neolitsea, Persea, Phoebe and Sassafras), the number of genes was indicated as 128–130 (113 unique)23,24,25,26,27,33. Lower numbers (127 total/112 unique) were reported for some species of Actinodaphne, Cinnamomum, Lindera, Litsea, and Neolitsea17,29,30, but only 107 genes (total and unique) in two Cassytha species25.

A total of 87 protein coding genes were identified in the Ocotea species examined. The counts of total protein coding genes in Lauraceae in previous studies ranged from 73 genes in Cassytha species via 79 in Cinnamomum camphora to 86 genes in Caryodaphnopsis henryi Airy Shaw16,25,30. Consistently 85 protein coding genes have been reported for the genera of the early divergent Cryptocaryeae (Beilschmiedia, Cryptocarya and Eusideroxylon), as far as they have been examined. Among the remaining Lauraceae, the most frequent count is 8430. Lower numbers have been reported for Cinnamomum micranthum and C. kanehirae (83)29, Litsea glutinosa (83)17, Cinnamomum camphora (79)16 and two Cassytha species (73)25. The differences among the counts of genes in the Lauraceae species, except the hemiparasitic Cassytha, may be due to different annotation of genes. Particularly the rpl22 gene has not been annotated in most of the earlier studies17,19,23,24,26,27,29,30,33,39,40,41.

The psbA-trnH, ycf1, ycf2, ndhH and trnL(UAG)-ndhF regions were identified as hypervariable loci (Pi ≥ 0.006) at the species level among the Ocotea species examined here. Seven hypervariable regions (Pi > 0.014), ihbA-trnG, ndhA, ndhF-rpl32, psbK-psbI, rps16, trnS-trnG and ycf1 were identified in Lindera species33. The psbA-trnH, ycf2 and ndhH regions are not among the most variable regions in these species. Alseodaphne species show six hypervariable loci (Pi > 0.006), accD-psaI, ndhF-rpl32, rps19-rpl3, rpl32-trnL, trnG-UCC, and ycf126. Seven hypervariable loci (Pi > 0.008), clpP, ndhF-rpl32, rpl32-trpL, rps8-rpl14, trnQ-psbI, ycf1, and ycf2, were identified in Machilus species23. At the family level, we identified additional hypervariable regions (Pi ≥ 0.01) among 69 Core Lauraceae species, viz. rpoB-psbD and trnT-trnL. Zhao et al.33 detected only ndhF-rpl32 and ycf1 as hypervariable loci (Pi > 0.014) among the Core Lauraceae. By comparing the Ocotea plastomes using the mVISTA program42,43, we confirmed that the IR regions are more conservative than the LSC and SSC regions. The LSC and SSC regions comprise more noncoding regions with higher mutation rates. The protein-coding sequences, including 80 genes, were longer in the Ocotea plastomes than in the plastome of Cinnamomum camphora (76,509–76,560 bp = 25,503–25,520 codons vs. 63,654 bp = 21,218 codons)16.

In our study and in Chen et al.16 the codons coding for leucine and for cysteine were the most and the least frequent, respectively. 11.76–11.83% of the codons in Ocotea and 10.87% in Cinnamomum camphora are coding for leucine, whereas only 1.81–1.87% or 1.25%, respectively, are coding for cysteine. Like in C. camphora, preferred codons in Ocotea are more frequently ending in A/U than in G/C (27 vs. two in C. camphora, 25 vs. six in Ocotea).

Simple sequence repeats (SSRs) are widely distributed in chloroplast genomes of Lauraceae. Chen et al.16 detected 81, 82, 83–88, and 86 SSRs in Litsea, Machilus, Cinnamomum and Persea species, respectively. In this study, we found more SSRs in the two Neotropical dioecious Ocotea species examined, O. guianensis and O. tabacifolia, than in the other four Neotropical species (87 vs. 75–82 SSRs), which are bisexual or gynodioecious (O. daphnifolia). The plastome of the Macaronesian Ocotea foetens contains 84 SSRs. It remains to be checked if the number of SSRs is indeed correlated with larger clades within the Ocotea complex. Mononucleotide SSRs are very predominant in the chloroplast sequences of Lauraceae. The counts of mononucleotide SSRs varied from 54 to 65 in Litsea, Machilus, Cinnamomum and Persea species16. In Ocotea, we detected 57–67 mononucleotide SSRs. Among the Neotropical Ocotea species, we found the highest numbers in the two dioecious species (66–67, vs. 57–61 in the other four species). The Macaronesian Ocotea foetens showed 63 mononucleotide SSRs. Hexanucleotide repeats were rare in all Lauraceae species examined so far. No hexanucleotide SSRs were found in Ocotea daphnifolia and O. odorifera. However, we detected three hexanucleotide repeats in Ocotea porosa, instead of only one in most other Lauraceae. The numbers of SSRs in the LSC, SSC and IR regions were similar for all Lauraceae species studied. Chen et al.16 detected 63, 16 and four SSRs in the LSC, SSC and IR regions of Cinnamomum camphora vs. 62–70, 11–17 and one or two SSRs in the Ocotea species in our study.

As expected, addition of the seven Ocotea species does not change the result of the phylogenetic analysis significantly compared to previous cp genome studies28,33. The topology among the major clades, Cinnamomeae, Laureae and the Perseeae, is the same in all studies. As excpected, the seven Ocotea species form a monophyletic group that is sister to Cinnamomum s.lat., i.e., including Sassafras. It is unfortunate that the cp genome of the taxon recorded as ‘UNVERIFIED Nectandra angustifolia’ in GenBank (MF939340) is so divergent from all other Core Lauraceae that large parts of it could not even be aligned. Based on the results of earlier studies2,3,9,10,12, Nectandra was expected to be nested in Ocotea, as sister taxon to the dioecious clade. Not only because of its aberrant sequence it is questionable if the species listed as N. angustifolia in the study of Song et al.25 has been determined correctly. The real N. angustifolia is known from the type collection from Bahia only, so that it appears unlikely that it was cultivated in Sulawesi. Apart from Nectandra angustifolia, no complete plastomes have been sequenced so far in any of the genera that are usually found nested among the Ocotea species (Aniba, Damburneya, Dicypellium, Endlicheria, Kubitzkia, Licaria, Mespilodaphne, Nectandra, Paraia, Pleurothyrium, Rhodostemonodaphne, Umbellularia and Urbanodendron). The number of Ocotea species examined here is still too small to reach any definite conclusions about their phylogeny. There are, however, two differences compared to the recent study by Penagos et al.12. In their study, the Old World Ocotea species (the clade named Palaeocotea) form the sister group to a clade named Praelicaria, which is represented by Ocotea aciphylla, O. odorifera and O. porosa in our study. Ocotea daphnifolia, which is nested among the Praelicaria taxa in our result, is a member of the O. minarum group and as such a member of the Pluriocotea clade in the study by Penagos et al.12. In their result, the Pluriocotea clade is the sister group to a clade consisting of the dioecious taxa (Diocotea, represented by O. guianensis and O. tabacifolia in our study), the O. helicterifolia group and the genera Nectandra, Pleurothyrium and Damburneya, which are not represented in our study. It needs to be checked if these differences persist when further cp genomes become available. As expected, entire plastomes have the potential to increase resolution and support values among the clades of the Ocotea complex. Our phylogeny is fully resolved, and not only the Ocotea complex receives 100% bootstrap support, but also four of the five nodes within it. There is still one node that is scarcely supported, but that may change with denser taxon sampling.

Sequence divergence among the seven Ocotea species is rather low, compared to the most closely related, likewise species-rich genera Cinnamomum, Lindera and Litsea. Even though we selected Ocotea species from widely divergent clades, there were only 168 parsimony-informative characters among them in the entire chloroplast genomes. If we arbitrarily select the first seven species of Cinnamomum, Lindera or Litsea from our data matrix, these numbers are 414, 423 or 410, respectively. This confirms the results of the tests of individual established chloroplast markers mentioned in the introduction, and may point to a rather recent diversification of the Ocotea complex, as was first suggested by Chanderbali et al.10. However, a much larger number of sequences will be required for a molecular clock analysis of this group.

Materials and methods

Plant materials

Silica-gel dried leaf material of seven Ocotea species, O. aciphylla, O. daphnifolia, O. foetens, O. guianensis, O. odorifera, O. porosa, and O. tabacifolia, was used for the present analysis (Supplementary Table S7). According to previous analyses2,3,12, these species belong to different clades within the genus, except Ocotea odorifera and O. porosa from the O. indecora group. The plant material was collected in accordance with the relevant institutional, national, and international guidelines and legislation. PLRM obtained the collecting permits for the material collected in Brazil 2011. Ocotea foetens was collected in the Botanical Garden of Berlin, with permission of the curator G. Parolly, from a tree of unknown origin that had been growing in the garden for decades. Voucher specimens are deposited in the herbarium Rioclarense (HRCB) at the Universidade Estadual Paulista, Rio Claro (Brazil), the herbarium Hamburgense (HBG) at the University of Hamburg (Germany), and the garden herbarium of the Botanical Garden and Botanical Museum Berlin (Germany).

DNA preparation and chloroplast sequencing

DNA was isolated with the innuPREP Plant DNA Kit (Analytik Jena, Germany) according to the manufacturer’s protocol, with modifications9,37. DNA libraries were built using the QIAseq FX DNA Library Kit (Qiagen, Germany) and 120 ng of each DNA. Normalized samples were pooled and sequenced using the 300-cycles (2 × 150 bp paired-end) MiSeq reagent kit v3 (Illumina, San Diego, CA) on a MiSeq platform at the NGS Core Facility at the Bernhard Nocht Institute for Tropical Medicine, Hamburg, Germany. The generated raw reads were first checked qualitatively, with Phred quality score < 20 trimmed and filtered to remove polyclonal and low quality reads (< 55 bases long) using CLC workbench v. 20.0.1 (Qiagen).

Plastomes assembly and annotation

Analyses of genome sequence and genomic organization were performed using Geneious Prime 2021.0.344. The generated contigs of Ocotea foetens were assembled de novo and annotated using the plastomes of Cinnamomum camphora (GenBank accession number MH050970) and Persea americana (NC_031189) for comparison. The contigs of the remaining taxa were assembled and annotated using the chloroplast genome of O. foetens as a reference. The contigs were inspected visually for any signs of erroneous assembly. In a few cases, doubtful regions were verified by Sanger sequencing (methods described earlier2,3,9,11). The circular plastome maps of Ocotea were drawn using OGDRAW v1.229,33,34,35,36,37,38,44,45,46,47.

Determination of the most variable regions of plastomes

The chloroplast genomes of seven Ocotea species and 63 other Lauraceae were downloaded from the NCBI GenBank (Supplementary Table S8) All 70 sequences were aligned using MAFFT v748 with default parameters. Visual inspection of the alignment showed that large parts of the sequence of Nectandra angustifolia could not be aligned with confidence, so that this species had to be removed. Ten small inversions (5–39 base pairs), bordered by long palindromic sequences, were identified and reversed, because earlier analyses had shown that the orientation of such hairpin loops varies even within a single population. In the final alignment, these inversions correspond to positions 272–276, 480–487, 29,786–29,810, 66,770–66,809, 69,509–69,522, 70,298–70,314, 117,696–117,701 (in Parasassafras only), 126,376–126,385, 132,449–132,505 and 140,474–140,479 (in Parasassafras only). Also a few additional minor adjustments of the alignment were made manually during inspection of the sequences, mostly in regions of SSRs. DnaSP v649 was used for calculating the nucleotide variability values (Pi) within the plastomes. The sliding window length was set to 600 bp, and the step size was set to 200 bp. Microsoft Excel50 was used to plot the Pi values. These data were used to identify hypervariable regions among the seven Ocotea plastomes examined as well as among the sequences retrieved from the NCBI GenBank (Supplementary Tables S7, S8).

Comparative analysis of Ocotea plastomes

A comparison of the LSC, IR and SSC junction positions in the Ocotea plastomes was carried out in Geneious Prime 2021.0.344. The mVISTA program in Shuffle-LAGAN mode42,43 was used for the visualization of the differences in the seven Ocotea chloroplast genomes.

Codon usage and SSRs analyses

The protein-coding genes of Ocotea plastoms were extracted using the program Geneious Prime 2021.0.344. The sequences were aligned using MAFFT v748. Codon usage frequency, Codon Bias, and G + C content were calculated using the program DnaSP v6.

The SSR motifs were scanned using MISA v2.151. The minimum thresholds were set to 10 repetitions for mononucleotide SSRs, five repeat units for dinucleotide SSRs, four repetitions for trinucleotide SSRs and three repetitions for tetra-, penta- and hexanucleotide SSRs. The maximum length of interruption between two SSRs was chosen as 100 bp.

Phylogenetic analysis of Lauraceae plastomes

The data matrix that had been prepared for the determination of the most variable regions was analyzed using maximum likelihood analyses (ML) in MEGA 10.2.552, with the following parameters: nrep = 500, Tamura-Nei model, uniform rates among sites and Nearest-Neighbor-Interchange (NNI). The chloroplast genomes of the Perseeae (Alseodaphne spp., Machilus spp., Persea americana, and Phoebe spp.) were used as outgroup.