Connecting systematic and ecological studies using DNA barcoding in a population survey of Drosophilidae (Diptera) from Mt Oku (Cameroon)

Abstract. The characters used in taxonomy to describe new species cannot always be used to identify species in population surveys involving large samples. We used DNA barcoding to validate the taxonomic status of the morphospecies used in an ecological study involving 11 000 individual African drosophilids which had been determined without dissection. Some taxonomic information had been lost by not discriminating between rare species or by mistakenly splitting a morphologically variable species into two groups. However, the original ecological dataset provided a reliable picture of species diversity and the conclusions based on the original dataset are still supported by the molecular data.


Introduction
Reliable procedures to evaluate species diversity are required given the global loss of biodiversity and its consequences for environmental policies. However, environmental studies generally need datasets involving samples containing large numbers of individuals in which species cannot always be determined using the wealth of information provided by taxonomists when describing new taxa, since their methods cannot routinely be applied to a large number of specimens. This occurs frequently in insects, for example in beetles (Oliver & Beattie 1996), mayflies (Bauernfeind & Moog 2000) and chironomids (Carew et al. 2013), in which species often need to be distinguished using cryptic characters such as male genitalia and internal parts observed through dissection. This is especially challenging in tropical environments in which many species remain to be described. As a compromise, environmental studies often use operational taxonomic units (OTUs) at a lower resolution than the species level, and assume they are sufficiently informative for analysis. These OTUs retain biological meaning as long as each consists of a monophyletic unit containing species sharing morphological, physiological and behavioral characters. The accepted risk is a loss of resolution of the study, since two related species can diverge in a number of characters. This approximation cannot, however, be assessed as long as the ability of the researcher to recognize species has not been evaluated using molecular analysis, inter-specific crosses or examination of museum types.
A useful tool to clarify this point is DNA barcoding, which was designed as a reliable technique for identifying species (Hebert et al. 2003; Monaghan et al. 2005; Janzen et al. 2009; Lukhtanov et al. 2009) and detecting cryptic species (Hebert et al. 2004; Yassin et al. 2008). It may also be used for biodiversity assessment in groups where taxonomic information is incomplete or difficult to use (Blaxter et al. 2005; Passmore et al. 2006; Frezal & Leblois 2008) or for large inventories (Telfer et al. 2015).
Here we apply DNA barcoding to a sample derived from a study carried out on a drosophilid community from the Central African montane forest of Mt Oku in Cameroon (Prigent et al. 2013). Drosophilids, an important model in evolutionary biology (Powell 1997; Ashburner et al. 2004; Markow & O'Grady 2005), are also considered a useful marker in ecosystem dynamics studies (Parsons 1991; Davis & Jones 1994; Mata et al. 2008). They cover a large diversity of habitats, and species assemblages vary according to environmental changes in time and space (Krijger & Sevenster 2001; Avondet et al. 2003; Tidon 2006; Prigent et al. 2013). Standardized traps can be used to collect them efficiently in large numbers.
The study took place from October 2008 to October 2009, in the montane forest of Mt Oku, on the Cameroon volcanic line (Prigent et al. 2013). About 11 000 specimens were caught using banana traps over a year at different altitudes between 2200 m and 2800 m, and preserved in 70% ethanol until further determination. Drosophilid identification involves a large number of characters including relative size and shape of body parts, color pattern, wing venation, chaetotaxy and form of the male genitalia. The male genitalia are, in fact, the main structure used to discriminate between taxa. These minute organs usually need to be dissected and are often difficult to examine in ethanol-fixed individuals. Also, in a large survey it is not logistically feasible to dissect every specimen. Moreover, several hundred species are known in Africa (537 valid species according to the TaxoDros database; www.taxodros.uzh.ch) and species were described with varying precision. Old descriptions often lack detail (Coquillett 1902; Adams 1905; Séguy 1933). Some revisions of African taxa include new descriptions for formerly described species within a group (Tsacas 1972, 1980a), but no comprehensive comparative survey is available for the complete African fauna. The most important works providing a key were carried out on the drosophilid faunas of Ivory Coast (Burla 1954b), South Africa (Tsacas 1990) and Malawi (Chassagnard et al. 1997). Identification remains difficult, since Drosophila includes several species complexes, each containing morphologically similar species (Tsacas 1980b; Tsacas & Lachaise 1981; Tsacas & Chassagnard 1994; Tsacas 2002).
The material was therefore sorted on the basis of visible characters. Morphospecies (i.e., OTUs) were then defined as groups that could be separated from other groups by their external morphology. Determining females is a real challenge given the prevalent use of male genitalia as taxonomic criteria. Females might be expected to share characters with conspecific males. However, coloration patterns are sexually dimorphic in many species and in several species from the very speciose montium subgroup (or group, see Da Lage et al. 2007), females are dimorphic for abdominal color patterns. Sorting the specimens resulted in 62 recognizable morphospecies that were assigned, where possible, to known species according to published descriptions of external morphology (Prigent et al. 2013). Sample size was highly variable across species. One of them, Zaprionus vittiger Coquillett, 1901, made up 81% of the catches, whereas many other species were rare. The spatial and temporal distribution of dominant species was a clear illustration of the high dependence of the drosophilid community on climatic factors. It was therefore essential to assess the taxonomic reliability of this study.
DNA barcoding confirmed the correct determination of dominant species. We detected some problems in the identification of cryptic species and in the allocation of females to the right species in groups of related species. The delimitation of species remained uncertain in some cases due to low divergence between them. Mitochondrial introgression has been reported in some species and thus cannot be excluded in our sample. The conclusions of the environmental study are not modified, given the co-occurrence and low abundance of the species involved in these uncertainties. DNA barcoding actually helped in clarifying temporal or distributional patterns of cryptic species. It should allow comparison with further studies.

Material and methods
We used specimens collected in the framework of the ecological study carried out on Mount Oku (Prigent et al. 2013) along with an additional sample, involving the same morphospecies, collected at Koh Kesoten, a locality close to the sampling place. We extracted DNA from 125 individuals using the Qiagen® DNA extraction kit. Whole photographs of the individuals were taken under a Leica M165C binocular stereo microscope with coupled digital imaging, and the legs, wings and genitalia were kept apart as vouchers. The rest of the body of the flies was used for DNA extraction. Forty additional extractions were made from one leg, allowing us to preserve the rest of the specimen as a voucher. All of the material will be deposited at the Muséum national d'Histoire naturelle (MNHN), Paris, when fully identified. Amplification and sequencing were carried out using already published primer sequences for COI (Bouiges et al. 2013). Barcoding was applied to both males and females for each morphospecies, when they were available. Each sequence was blasted against the NCBI sequence database. All available drosophilid COI sequences were downloaded (more than 3700 sequences, including more than 600 species mostly from Drosophila and closely affiliated genera), aligned with our data and analyzed using MEGA5.
The molecular phylogeny of Drosophila has been thoroughly studied using a number of markers (e.g., Pelandakis & Solignac 1993; Remsen & O'Grady 2002; Da Lage et al. 2007; van der Linde et al. 2010), including whole genomes (Drosophila 12 genomes consortium 2007). These studies were searching deep nodes in order to identify the main groups in this very large genus. Their results were of relatively little help to us, for two reasons. First, there was little overlap between the species used in these studies and those from our sample, and thus they could not be used as barcode standards. Second, the purpose of a taxonomic study like ours is not to identify deep nodes but shallow ones. This is a technical issue. Hajibabaei et al. (2007) addressed the question of which method is best suited to which kind of study. These authors stated that "the analysis of DNA barcoding data is usually performed by a clustering method, such as distance-based neighbor-joining (NJ)", whereas the construction of phylogenetic trees is carried out "by using optimality criteria such as Maximum Likelihood, Maximum Parsimony, or Bayesian analysis". Austerlitz et al. (2009) compared several phylogenetic and classification methods in analyses using known experimental datasets or data obtained from simulations using known parameter values. They confirmed the good performance of methods aimed at identifying closely related groups, such as nearest neighbor and neighbor-joining. For this reason, we analyzed our data using neighbor-joining, but we also ran a maximum-likelihood analysis on the same data, using the same software (MEGA5), in order to assess congruency between two methods that are widely divergent in their principles of analysis. We used a Kimura 2-parameter (K2P) substitution model for neighbor-joining, and a GTR + G + I model for maximum likelihood.
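As an illustration of the distance underlying the neighbor-joining analysis, the Kimura 2-parameter distance between two aligned sequences can be sketched as follows. This is a minimal sketch, not the MEGA5 implementation: the function name `k2p_distance` and the choice to skip gaps and ambiguous bases pair by pair are assumptions made for the example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Hypothetical helper for illustration; sites with gaps or ambiguous
    bases in either sequence are skipped (an assumption of this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G or C<->T is a transition; any other change is a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared    # proportion of transitions
    q = transversions / compared  # proportion of transversions
    inner = (1 - 2 * p - q) * math.sqrt(1 - 2 * q) if q < 0.5 else 0.0
    if inner <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(inner)
```

A matrix of such pairwise distances is what a neighbor-joining algorithm takes as input; the correction grows relative to the raw proportion of differences as sequences diverge, which is one reason deep nodes are poorly resolved by short barcode fragments.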

Sequencing success
We obtained 111 sequences out of 165 specimens (by body or leg DNA extraction). Our success in DNA amplification and sequencing was very heterogeneous across samples, but was not linked to time, seasonality or sample size (Fig. 1). We obtained a COI sequence for 52 of the 62 morphospecies (Table 1). Six of the ten unsuccessful assays were OTUs represented by only one individual. Both male and female specimens were successfully sequenced in 40 morphospecies. Only one sex was available in seven other morphospecies (Prigent et al. 2013). In the five remaining morphospecies only one sex was successfully sequenced, although both sexes were available.
[…] Prigent et al. (2013) and their taxonomic interpretation based on DNA barcode analysis. Classification: the position of the taxon in the classification of Drosophilidae; the loiciana and the nutrita species complexes were established by Tsacas (2002) and Tsacas & Chassagnard (1994), respectively. The adamsi, brachytarsa and dyaramankana species groups are not formally defined; they are simply putative groups according to Prigent et al. (2013). Morphospecies: the identification of the taxon given in table 1 of Prigent et al. (2013).
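The counts reported in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only figures stated in the text; the variable names and the `percent` helper are introduced here for illustration.

```python
# Figures taken from the text; names are illustrative only.
specimens_extracted = 125 + 40   # whole-body plus single-leg extractions
sequences_obtained = 111
morphospecies_total = 62
morphospecies_sequenced = 52

# Sex coverage among the sequenced morphospecies
both_sexes_sequenced = 40
only_one_sex_available = 7
only_one_sex_sequenced = 5


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# The three sex-coverage categories should account for all sequenced morphospecies.
assert (both_sexes_sequenced + only_one_sex_available
        + only_one_sex_sequenced) == morphospecies_sequenced

print("specimens:", specimens_extracted)
print("sequencing success per specimen (%):",
      percent(sequences_obtained, specimens_extracted))
print("morphospecies with a COI sequence (%):",
      percent(morphospecies_sequenced, morphospecies_total))
```

The per-specimen success rate is thus lower than the per-morphospecies rate, which is consistent with most failures concerning OTUs represented by a single individual.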

Species identification using DNA databases
All COI sequences were blasted against the NCBI sequence database (Table 1). Percent divergence from the closest sequence ranged from 0 to 12% (Fig. 2). The distribution of matches was bimodal, with a first peak around 0-1% divergence and a second one around 8-9%, meaning that some of our taxa were present in the database, whereas others belonged to groups which were virtually absent from it.
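A distribution such as the one described above can be obtained by converting each best-hit percent identity into a percent divergence and counting hits per 1% bin. The sketch below is only an illustration under stated assumptions: the function name `divergence_histogram` is hypothetical, and the identity values in the usage lines are invented placeholders, not data from this study.

```python
from collections import Counter


def divergence_histogram(percent_identities, bin_width=1.0):
    """Count best hits per divergence bin.

    Divergence is taken as 100 minus percent identity; each key of the
    returned dict is the lower bound of a bin, in percent.
    """
    bins = Counter()
    for identity in percent_identities:
        divergence = 100.0 - identity
        lower_bound = (divergence // bin_width) * bin_width
        bins[lower_bound] += 1
    return dict(sorted(bins.items()))


# Placeholder identities (NOT the study's data), chosen to show two modes.
example_identities = [100.0, 99.6, 91.8, 91.1, 88.0]
for lower, count in divergence_histogram(example_identities).items():
    print(f"{lower:4.0f}-{lower + 1:.0f}% divergence: {count} hit(s)")
```

A first mode near zero corresponds to taxa already represented in the database, and a second mode at high divergence to taxa whose closest available relative is only distantly related.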
Our identification of species (Table 1) was correct with 100% sequence similarity for Drosophila busckii Coquillett, 1901, D. immigrans Sturtevant, 1921, D. yakuba Burla, 1954 and D. nikananu Burla, 1954; and […] similarity), two related species known to be hardly distinguishable from Z. tuberculatus (Tsacas et al. 1977). Misidentification was observed for females of the melanogaster subgroup. While males of this subgroup are easily determined using genitalia, females are morphologically very similar to each other and their identification requires dissection (Yassin & Orgogozo 2013). Introgression of mitochondrial DNA across species boundaries has been documented between several related species, including D. yakuba, D. teissieri Tsacas, 1971 and D. santomea Lachaise & Harry, 2000 (Llopart et al. 2005; Bachtrog et al. 2006). In this group, females thought to belong to D. erecta and D. teissieri matched the D. yakuba sequence (100% similarity) and the sequence of the D. teissieri male matched that of its sister species D. yakuba (99% similarity). The differences in abdominal color pattern used to distinguish morphospecies overlapped the polymorphism of D. yakuba and were thus useless for species identification.
In all remaining morphospecies, similarity with available data was so low that it was irrelevant for this study. For example, the best match of Drosophila dyaramankana Burla, 1954, a drosophiline, was with the steganine Leucophenga sp. (90% similarity), and Drosophila sp. 5 matched several Hawaiian Drosophila (89%). Overall, two thirds of our morphospecies corresponded to poorly represented taxa in the COI database, either in terms of species or in terms of higher taxonomic units.

Phylogenetic relatedness
A neighbor-joining tree using the whole dataset (not shown) proved to be unreliable for deep nodes, since major groups from the known phylogeny of drosophilids were disrupted and showed incorrect connections. This was expected given the relatively poor information provided by the short barcode fragment, which was designed for identification purposes by searching the closest neighbors of unidentified taxa. Moreover, the genus Drosophila is a very large taxonomic ensemble which is fragmented into smaller "groups" and "subgroups", involving some endemism. In addition, some distinct genera, including Zaprionus, are actually internal branches of the Drosophila tree. Most groups are not represented in Africa, and a reliable topology of drosophilids can hardly be inferred from a geographically limited sample.
A neighbor-joining tree was constructed for the species of Zaprionus and Microdrosophila aff. mamaru (Burla, 1954) (Fig. 3) and a second for the Sophophora lineage of Drosophila and Lissocephala aff. diola Tsacas & Lachaise, 1979 (Fig. 4). A maximum-likelihood tree was built from the same datasets. A remarkable result is that in each analysis, the two trees were fully congruent for all nodes supported by a 50% bootstrap value or more. The two series of bootstrap values are shown on Figs 3 and 4 (on the neighbor-joining tree), except for one case where the two values were very close, although above and below this threshold: 48% and 52%, respectively.
Below we present and discuss the results obtained for each group of taxa separately.
The most abundant genus in the Mt Oku forest is Zaprionus (Fig. 3), of which we found 16 morphospecies. COI sequences were obtained for 13 of them. Nine of them belonged to the vittiger species group. The Z. indianus morphospecies departed from Z. indianus and was related to Z. africanus, although with a node supported only by an 84-86% bootstrap value in phylogenetic analysis (Fig. 3). The Z. davidi morphospecies was split into two different clusters, as seen above, one of them being the true Z. davidi, the other being closer to Z. proximus. Zaprionus aff. ornatus clustered with Z. taronus, but was not closely related to the reference sequence for Z. ornatus. Zaprionus aff. lachaisei matched the true Z. lachaisei. Two specimens misidentified as Z. taronus and Z. vittiger clustered together ("Zaprionus sp. 1") and matched the Z. aff. vittiger sequence of Bouiges et al. (2013). The Z. aff. vittiger morphospecies of Prigent et al. (2013) branched within sequences from Z. vittiger, suggesting they belong to the same species. Our two sequences for Z. koroleu Burla, 1954 made up a new lineage related to Z. camerounensis Chassagnard & Tsacas, 1993 and Z. nigranus Yassin et al., 2008. This taxon, however, is different from Z. beninensis Chassagnard & Tsacas, 1993, which is considered a synonym of Z. koroleu (Yassin & David 2010).
Fig. 3. Phylogenetic analysis of the genus Zaprionus and Microdrosophila aff. mamaru (Burla, 1954). This tree is the neighbor-joining tree. The maximum likelihood tree gives the same topology. Nodes with a bootstrap value lower than 50% were merged. Bootstrap values were calculated over 1000 repeats. Above nodes: bootstrap values for maximum likelihood using a GTR + G + I model. Below nodes: bootstrap values for neighbor-joining using the Kimura-2p distance.
Z. tuberculatus belongs to a closely related group of species within the inermis group. Our sequences for Z. tuberculatus were split into three clusters corresponding to Z. tuberculatus, Z. sepsoides and Z. mascariensis (Fig. 3). This was expected, since our ecological study made no attempt to distinguish these three species, which can only be recognized by dissection. Our results indicate that the three of them were present in the study. The three remaining morphospecies, Z. spineus Tsacas & Chassagnard, 1990, Z. momorticus Graber, 1957 and Z. badyi Burla, 1954, gave no consistent results. The sequences for male and female for each of them were separate in the tree, suggesting that each individual belonged to a different species. Only two sequences clustering together indicated that the dark male of the Z. momorticus morphospecies belonged to the same species as the yellowish female of the Z. badyi morphospecies. Moreover, our two sequences for the Z. badyi morphospecies lie outside the tuberculatus cluster, in which the database reference for Z. badyi is found.
For the Sophophora subgenus of Drosophila, morphospecies from Mt Oku clustered into four distinct lineages (Fig. 4). They correspond to the melanogaster subgroup, the montium subgroup (both in the melanogaster group), and to two lineages for which no COI sequence has hitherto been available: the dentissima group and the fima group. The two morphospecies recognized for the fima group probably belong to the same species, D. microralis Tsacas, 1981. In the dentissima group two morphospecies were recognized, D. lamottei Tsacas, 1981 and D. aff. matilei Tsacas, 1975. Two females that we thought belonged to the montium subgroup were actually members of the dentissima group. The three D. melanogaster subgroup species from the Mt Oku forest, D. erecta, D. teissieri and D. yakuba, were correctly identified.
We used the sequences of the montium subgroup found in databases and found that African species cluster in a sub-section of the tree (data not shown), also including the sequences of this group from Mt Oku. The D. burlai, D. aff. chauvacae sp. 1 and D. aff. chauvacae sp. 2 morphospecies clustered with the D. burlai database sequence (Fig. 4). Another cluster contained D. bocqueti Tsacas & Lachaise, 1974 and an undetermined morphospecies. The D. aff. megapyga sp. 1 and D. aff. megapyga sp. 2 morphospecies made up a third cluster. The D. seguyi Smart, 1945 and D. bakoue Tsacas & Lachaise, 1974 morphospecies made up a fourth cluster. Six morphospecies were represented by both males and females. In five morphospecies, all females formed clades apart from the males we identified as conspecific. Thus, their determination was erroneous. Of two females sequenced for the D. aff. megapyga sp. 1 morphospecies, one was wrong and the other correct. Thus, females of similar morphology belonged to different species, confirming that the females of this group are difficult to determine.
Among the remaining species, identification was correct in D. immigrans and D. busckii, but unsettled in other species, since sequences did not generally cluster with available sequences. We did find some unexpected associations where OTUs clustered with supposedly unrelated taxa. These were those between Microdrosophila aff. mamaru and the genus Zaprionus (Fig. 3) and between Lissocephala aff. diola and the fima group of the subgenus Sophophora (Fig. 4). However, the bootstrap values were always very low (40-60%). Thus, these results may be artefacts resulting from the lack of appropriate barcode references and from the low resolving power of DNA barcodes when divergence is high.
The remaining morphospecies confirmed this conclusion. The closest relatives in the tree were connected with very low support. For example, D. acanthomera Tsacas, 2001 clustered with a group of Hawaiian species with a support of only 35%. None of the Scaptodrosophila morphospecies clustered with a species of this genus which had already been sequenced except for Scaptodrosophila sp. 3. This OTU grouped either with S. riverata Singh & Gupta, 1977 (45%) or with D. immigrans (76%). At such a low level of discrimination and as a result of homoplasy, branching patterns and bootstrap values above the specific level are highly dependent on the size and on the composition of the sample of species used to construct the tree. They are unreliable and do not deserve further consideration.

Consequences for the Mt Oku ecological survey
Our ecological survey included 62 morphospecies (Prigent et al. 2013). Barcodes were obtained for 52 of them. The clustering of the specimens by their barcode suggests that nine morphospecies were duplicates of another species in the sample, as a probable result of color polymorphism within biological species. This does not substantially affect the conclusions of the survey, since the two incorrectly split subsamples occurred together in all cases. Moreover, such cases did not generally involve numerically important species. The COI barcode also showed that some morphospecies actually included several species. This was expected in some well-documented cases, for which external morphology is known to be insufficient to distinguish closely related species. DNA barcoding confirms the presence of two species in each of the Z. davidi, Z. vittiger and Z. taronus morphospecies, and of three species in the Z. tuberculatus morphospecies. This was an accepted lack of precision of the ecological survey, since these species were too numerous to allow for the dissection of all specimens.
Barcoding a stratified sample of morphospecies allowed us to check hypotheses as to underlying species assemblages in species complexes. The ecological distribution of the Z. tuberculatus morphospecies (which is made up of three biological species) had shown two distinct peaks in altitudinal distribution with an absence of flies at intermediate elevations (Prigent et al. 2013). We wondered whether this was due to habitat heterogeneity or to heterogeneity in the specific composition of the samples, given our inability to discriminate the three biological species by external morphology. The sequenced specimens from an elevation of 2300 m belonged to the three species; those from 2700 m belonged to two of them. This suggests that the gap in their distribution probably results from habitat heterogeneity, rather than from taxonomic confusion.
The Z. indianus morphospecies can similarly include three species in this geographical area. It shows a bimodal distribution around the year. A specimen collected in October belongs to the same species as two specimens collected in April. Thus, barcoding does not suggest that distribution heterogeneity results from species heterogeneity, even though appropriate sampling would be necessary to reach a firm conclusion.
Finally, barcoding allowed us to validate the determination of 43 morphospecies. Nine cryptic species had to be added to this count (Table 1), making up a total number of 52 barcoded species.
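The tally above can be retraced from the figures given in this section. The sketch below assumes, as one possible reading of the text, that the 43 validated morphospecies are the 52 barcoded morphospecies minus the nine identified as duplicates; this derivation is an interpretation and is not stated explicitly in the study. Variable names are illustrative.

```python
# Figures from the text; the subtraction below is an assumed reading.
morphospecies_barcoded = 52
duplicate_morphospecies = 9   # morphospecies that split an existing species
cryptic_species_added = 9     # species hidden within pooled morphospecies

# Assumption: validated morphospecies = barcoded minus duplicates
validated_morphospecies = morphospecies_barcoded - duplicate_morphospecies

# Total number of barcoded species after adding the cryptic ones
barcoded_species = validated_morphospecies + cryptic_species_added

print("validated morphospecies:", validated_morphospecies)
print("barcoded species:", barcoded_species)
```

Under this reading, excessive splitting and unrecognized cryptic species cancel out numerically, so that the number of barcoded species equals the number of barcoded morphospecies even though their composition differs.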

Discussion
This study took place on the Cameroon volcanic line, which belongs to an endangered biodiversity hotspot (Myers et al. 2000). This concentration of species has been interpreted as resulting from successive cycles of extension and regression of the forest having taken place during the Quaternary (Maley 1996; Plana 2004). During dry periods montane forests were restricted to montane refuges and to gallery forests. During rainy periods they could extend more widely and connect with each other to make up a continuous forest. Drosophilids are an important component of this rich biodiversity. This study, together with an earlier one (Prigent et al. 2013), was designed to record the response of drosophilid species to physical factors contributing to climate variation, including changes in habitat according to season and altitude. In this study the validity of the recognized morphospecies was tested using barcoding.
Consistent with the principle of DNA barcoding, the study was very informative when closely related reference sequences were present in DNA banks and almost intractable when an entire natural group of species was not represented. This occurred on several occasions, since African Drosophila have not been as well studied as those from most other areas. Overall the classification of our morphospecies was correct in the Zaprionus genus and the Sophophora subgenus of Drosophila. It was also correct for the two cosmopolitan species, D. busckii and D. immigrans. Morphospecies classification was more challenging in the Drosophila subgenus and in other genera, probably due to the fact that none of these species were included in the barcode database, and due to a lack of information on related taxa. Four taxa from our study were not represented in the database. The fima group belongs to the subgenus Sophophora but its branching position was uncertain due to a low bootstrap value. Similarly, the dentissima group was associated with the melanogaster subgroup (although with a low bootstrap value of 40-41%, Fig. 4) even though the dentissima group is considered to be distinct from the melanogaster group (Tsacas 1980). This result is, however, in agreement with the proposition of raising the ananassae and montium species subgroups to species groups (Da Lage et al. 2007). A barcode in the genus Lissocephala was sequenced for the first time and branched within the subgenus Sophophora, an unexpected observation needing confirmation, since this genus is generally given a basal position in Drosophilinae based on morphology (Throckmorton 1975) or molecular data (Harry et al. 1996, 1998; Yassin 2013). A species which we identified as belonging to the genus Microdrosophila, based on Burla's (1954b) key to drosophilids from Ivory Coast, branches in the genus Zaprionus, an unexpected result since the morphology of this morphospecies differs from that of Zaprionus, which is very homogeneous across species. The lack of definition of barcoding in some drosophilid genera may result from the relatively poor record of African drosophilids in databases, a requirement for such studies (Davison et al. 2009). Thus, the lack of similar works constitutes an extrinsic limitation of our study. Since Drosophila taxonomy relies heavily upon male genitalia, and since most species are sexually dimorphic in color patterns, our results illustrate the poor correspondence between males and females across a number of species. Several members of the melanogaster group show dimorphic females with black and yellow forms (Burla 1954a), presumably evolved through sexual selection (Yassin et al. 2016). Furthermore, females from different species of the montium subgroup are practically indistinguishable. Hence, some females thought to belong to the montium subgroup eventually proved to belong to the dentissima group.
The main conclusion of this study is that DNA barcoding is a great help in making a link between taxonomic and ecological studies. This provides a means to assess our ability to identify species despite the methodological constraints brought about by the quantitative requirements of ecological studies. The results of our former ecological study (Prigent et al. 2013) were practically unaffected by the fact that nine morphospecies represented excessive splitting, and that nine true species had not been recognized. The misidentified species happened to be rare ones. As a rule, abundant species were correctly determined. In three cases we had consciously pooled groups of two or three related species under a common heading ("Zaprionus tuberculatus", "Z. indianus" and "Z. davidi"), since these taxa could be distinguished only through dissection, and since these groups made up very important sample numbers (Prigent et al. 2013). We also knew from previous studies that the ecological similarity of the species within each of these groups would not substantially affect the data. Our barcoding results confirm these assumptions. This positive outcome must, however, be viewed as a warning for possibly less favorable cases.

Fig. 1. Percent success and failure in obtaining a COI sequence from specimens. A. Total number of flies in the sample. B. Date of field collection (month indicated in lowercase Roman numerals). n = number of specimens used.

Fig. 2. Percent divergence of the morphospecies DNA barcode from the closest neighbor found in the barcode database.
The fusion of D. aff. adamsi with D. adamsi Wheeler, 1959 and that of D. aff. microralis with D. microralis add up to a sample size of 20 individuals. Similarly, the Z. aff. vittiger sample size represents only 1.1% of the Z. vittiger catches. The most important changes occurred within the montium subgroup. Pooling the D. aff. chauvacae sp. 1, D. aff. chauvacae sp. 2 and D. burlai morphospecies made up a total of 101 individuals. In the same way, pooling D. aff. megapyga sp. 1 and D. aff. megapyga sp. 2 made up 77 individuals. In view of the total collection of 10 839 specimens, these changes had minor consequences for the statistics of abundance.
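The weight of these pooled groups in the total collection can be made explicit with a short calculation. The sketch below uses only the counts given in the paragraph above; the dictionary labels and the `share_of_collection` helper are introduced here for illustration.

```python
TOTAL_SPECIMENS = 10_839

# Individuals affected by each pooling, as stated in the text.
pooled_groups = {
    "adamsi + microralis fusions": 20,
    "chauvacae sp. 1 + sp. 2 + burlai": 101,
    "megapyga sp. 1 + sp. 2": 77,
}


def share_of_collection(count, total=TOTAL_SPECIMENS):
    """Share of the whole collection, in percent, rounded to two decimals."""
    return round(100 * count / total, 2)


for label, count in pooled_groups.items():
    print(f"{label}: {count} individuals, "
          f"{share_of_collection(count)}% of the collection")

combined = sum(pooled_groups.values())
print(f"combined: {combined} individuals, "
      f"{share_of_collection(combined)}% of the collection")
```

Each pooled group accounts for less than 1% of the specimens, and the three together for less than 2%, which is the quantitative basis for describing the consequences as minor.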