Mysterious chokeberries: new data on the diversity and phylogeny of Aronia Medik. (Rosaceae)

. Aronia Medik. (chokeberry, Rosaceae) is a genus of woody shrubs with two or three North American species. Species boundaries and relationships between species of Aronia are frequently under question. The only European species in the genus, A. mitschurinii A.K.Skvortsov & Maitul., is suggested to be an inter-generic hybrid. In order to clarify the relationships between species of Aronia , we performed several morphometric and molecular analyses and found that the molecular and morphological diversity within data on American Aronia is low, and species boundaries are mostly not clearly expressed. Whereas morphology is able to separate American species from A. mitschurinii , there is no support for such discrimination from the molecular data; our analyses did not reveal evidence of A. mitschurinii hybrid origin. We believe that higher-resolution markers are needed to resolve species boundaries and putative hybridization events.

Chokeberries are agriculturally promising for fruit and nutraceuticals, and also have applications as ornamental landscape plants (Brand 2010;Taheri et al. 2013;Connolly 2014).Aronia mitschurinii has bigger and tastier fruits then other aronias and was adopted as valuable agricultural fruit plant across North America (Kask 1987).
We believe that instability in Aronia taxonomy is largely the result of insufficient sampling; thus, our goal was to increase sampling, use expert knowledge to identify or re-identify our samples, and then to employ DNA sequence analyses and advanced morphometry methods in order to perform a comprehensive analysis of Aronia diversity which will clarify species relationships.

DNA sequencing and molecular analysis
We sequenced the nuclear rDNA internal transcribed spacer 2 (ITS2), the chloroplast encoded rbcL gene and trnL-F spacer (Kuzmina & Ivanova 2011) from several representatives of each of five Aronia species (34, 38 and 15, respectively).These markers are commonly used for studying the phylogeny and origins of angiosperms species including Rosaceae (Campbell et al. 2007;Li et al. 2012;Lo & Donoghue 2012;Dluzewska et al. 2013;Zarrei et al. 2014).In addition, we amplified and sequenced the second intron of the nuclear low-copy LEAFY gene which has been considered a useful marker in phylogenetic analysis of Rosaceae (Oh & Potter 2003;Lo et al. 2007;Li et al. 2017).
We obtained 42 tissue samples from herbarium collections in the USA (CAS, F, HUH, JEPS, MO, NY, UC, US) and Russia (MHA).We also used 31 freshly collected leaf tissue samples taken in the field and from the Rosaceae living collection of the University of Connecticut (USA).Field collected material (mostly of A. mitschurinii) was from the Central Russian Tverj and Moscow regions, where tissue samples were obtained from the living plants growing in the cultivation and the living plants that have escaped from the cultivation.The list of samples, all sequences, datasets and scripts are available on Zenodo (https://doi.org/10.5281/zenodo.3276157).All sequences were also submitted to GenBank.
In addition to Aronia, we used samples of Sorbus aucuparia L. (Linnaeus 1753) and of the intergeneric hybrid, × Sorbaronia fallax (C.K.Schneid.)C.K.Schneid.(Sorbus aucuparia × Aronia melanocarpa : Schneider 1906;Connolly 2009).GenBank sequences of S. aucuparia were also used.The first sample is phylogenetically distant from both Aronia and Sorbus s.str.(Sun et al. 2018) and therefore was used as an outgroup in most phylogenetic analyses.Considering problems with Aronia species identification (typical issue is that proper determination requires checking the autumn coloration of fruits : Hardin 1973), GenBank sequences of this genus were not thought to be reliably identified.
DNA was extracted using either a MO BIO PowerPlant DNA Isolation Kit (MO BIO Laboratories, Carlsbad, California, USA), or NUCLEOSPIN Plant II Kit (MACHEREY-NAGEL GmbH & Co. KG, Düren, Germany).Dry plant leaf material (typically, 0.05-0.09g) was powdered using a sterile mortar and pestle and then processed in accordance with the supplied protocol.We increased the lysis time to 30 minutes and used thermomixer on the slow rotation speed (350 rpm) instead of water bath.
Nanodrop 1000 Spectrophotometer (Thermo Scientific, Wilmington, DE, USA) was used to assess the concentration and purity (the 260/280 nm ratio of absorbance) of DNA samples.
We sequenced the markers mentioned above using primers and protocols in accordance with recommendations of the Barcoding of Life Consortium (Kuzmina & Ivanova 2011) and the LFY1/LFY2 primers (Oh & Potter 2003).PCR was carried out as follows: the reaction mixture in a total volume of 20 μL contained 5.2 μL of PCR Master Mix (components from QIAGEN Corporation, Germantown, Maryland), 1 μL of 10 μM forward and reverse primers, 1 μL of DNA solution from the extraction above and 11.8 μL purified water.Samples were incubated in a thermal cycler: 94° for 5 min, then 35 cycles of 94° for 1 min; 51° for 1 min, 72° for 2 min, and finally 72° for 10 min.Regardless to the marker and species, we always received single band PCR products.They were sent for purification and sequencing to Functional Biosciences, Inc. (Madison, Wyoming) and sequenced there in accordance with standard protocol.Sequences were obtained, assembled and edited using Sequencher  ) and ClustalX (Thompson et al. 1997) using gap opening cost 9, gap extension cost 0.05 and IUB weight matrix.
In total, we produced 309 DNA sequences, and after the preliminary analysis, selected the 113 longest and minimally noisy sequences for further research.Those selected sequences belonged to 42 individual plants from all species of Aronia (12-24 sequences per species).In terms of the markers, most successful were the amplification and subsequent sequencing of rbcL (35 sequences) and the least successful the amplification of the first variant of LEAFY second intron (6 sequences).Phylogenetic analyses were implemented with the help of the R ape and phangorn packages (Paradis et al. 2004;Schliep 2011).We used the Kimura distance, neighbor-joining and maximum likelihood (under a GTR model with gamma distribution selected using the Akaike information criterion implemented in R) trees for the most of our calculations.

Morphometric analyses
We gathered morphometric data from 436 herbarium samples and living plants of Aronia (including 187 samples of A. mitschurinii).We applied several methods of multivariate analyses using both 'classical' and geometric morphometrics approaches.As the dried herbarium samples might suffer from shrinking (Volkova et al. 2011), we compared the fresh and the dried herbarium samples of A. mitschurinii to calculate the shrinking coefficient.This coefficient was less than 4%, therefore shrinking was not taken into account in our geometric morphometrics analyzes.
Several leaf-related morphological characters were measured, including 'classic' petiole and leaf length, leaf maximal width and position of maximal width.In geometric morphometrics analyses, we used the standard thin plate spline approach (Zelditch et al. 2012) with 10 landmarks located on the tip and base of the leaf plus on the points of maximal curvature on the leaf contour (see Fig. 7 for the examples).Coordinates of the landmarks were written to the data file with tpsDig (Rohlf 2010).Consensus configuration, values of principal relative and partial warps (which characterize the degree of differences between the specimen and consensus configuration), were calculated with the R geomorph package (Adams & Otarola-Castillo 2013) which implements TPS analysis in a way similar to principal component analysis (PCA).Principal component analysis of the relative warps matrix was employed for the classification of leaf shapes, similarly to the way described elsewhere (Shipunov & Bateman 2005).
Many statistical methods have the ability to combine different types of data, and we used this to study the joint matrix that includes morphological characters and haplotype data of exactly the same samples (similar to Shipunov et al. 2004).We employed model-based clustering (Scrucca et al. 2016)

DNA sequencing and molecular analysis
Most of the analyses were based on ITS2 and rbcL alignments.Analysis of the trnL-F and LEAFY data was more problematic, since they did not amplify well from many of the studied herbarium samples; therefore, these markers were analyzed separately.The resulting trees always had extremely short branches (Fig. 1).Bootstrap was high enough (> 97%) to support only the whole Aronia group.Within Aronia, there were no clades receiving high statistical support.
Within each locus, most of our sequences differ only by several nucleotides.Across all sequences analyzed, we found ten haplotypes.
We counted only three haplotypes from A. mitschurinii samples whereas the more diverse sets were found in A. melanocarpa and A. arbutifolia (Fig. 2).As our phylogeny trees are not highly informative, we employed the cluster analysis of the haplotype occurrence (Fig. 3) which revealed the tendency of A. arbutifolia samples to group in one cluster (7 out of 10 are in one group).Aronia mitschurinii samples did not form a cluster, and grouped together with the bulk of other Aronia.Data from our trnL-F sequences were generally in agreement with rbcL data.Only two haplotypes were found, one the most frequent and the other specific to A. arbutifolia.
We were able to sequence both variants (potential paralogs) of LEAFY second intron (Burgess et al. 2015), and therefore constructed two separate datasets, one for each variant.From × Sorbaronia fallax, we amplified the second copy, which was identical to the same copy of Sorbus aucuparia (GenBank ID KT834297).Each LEAFY dataset contained at least one A. mitschurinii sample.In the analyses of both datasets, A. mitschurinii samples always robustly grouped with A. melanocarpa and did not group with Sorbus.This pattern is illustrated on Fig. 4 which reflects relationships within the second, most samplerich LEAFY dataset.

Morphometric and combined analyses
The principal component analysis (PCA) of morphometric data (Fig. 5) was able to clearly separate the A. mitschurinii samples.Samples of other species did not form clusters.
In the case of joint matrix (morphological characters + haplotype data), the scarcity of A. mitschurinii data did not allow for robust conclusions, but it is remarkable that these two A. mitschurinii samples occupy a separate branch on the minimum spanning tree and therefore distinctive (Fig. 6) from all other Aronia.
Geometric morphometrics was first used to produce the consensus shapes for each of our species.These shapes demonstrated the bending required to make imaginary thin plate splines to converge.and A. floribunda require more modifications of their average shapes.PCA ordination of relative warps provided another view on the diversity of our samples.It was similar to the morphology ordination, and separated A. mitschurinii whereas other species were intermixed (Fig. 8).
We also tried two combined approaches which included geometric morphometrics data together with (1) morphology and (2) morphology + haplotype data from the same samples.PCA was not really helpful for the analysis of geometric morphometrics + morphology data, but t-SNE (van der Maaten & Hinton 2008) allowed to arrange the samples in more understandable way (Fig. 9); here most of A. mitschurinii samples were clearly separated from the rest of Aronia.On the other hand, the model-based clustering (Scrucca et al. 2016) returned the clustering model with only one component (i.e., no internal clusters).

Discussion
Our phylogenetic analyses show that the overall genetic and morphological diversity within Aronia is low, and no clear species boundaries could be detected based on DNA data.In our morphological and combined analyses, only A. mitschurinii could be separated with confidence.These results are in contrast to Connolly (2014) where four Aronia species could be morphologically distinguished.
This discrepancy between morphological and molecular data could mean that Aronia still awaits more advanced methods of data collection and data analysis.However, considering the diversity of our approaches which include phylogeny trees, haplotype analysis, classic and geometric morphometry and combined integrative analysis, the possible conclusion is that at least some borders within Aronia are not expressed as it is typical in neighboring genera (Guo et al. 2012, Li et al. 2012) but may delimit only subspecific or lower level variation; it is even possible that most of the Aronia samples represent one polymorphic species.
No clear similarities, morphological or genetic, between any Aronia and Sorbus aucuparia were discovered.The × Sorbaronia fallax sample has the ITS2 and second copy of LEAFY second intron identical to Sorbus aucuparia, indicative of a hybridization event between Sorbus aucuparia and Aronia melanocarpa.However, this pattern was not present in our samples of A. mitschurinii.Therefore, we believe that the hybrid origin of A. mitschurinii is under question.This hypothesis has an independent support from observations on A. mitschurinii propagation which almost always results in the uniform progeny (Vinogradova & Kuklina 2014).In addition, whereas morphology alone allows for the separation of A. mitschurinii from other species, there is no support of such dissimilarity from molecular data.DNA-wise, A. mitschurinii does not differ from the other Aronia species.
The apparent conflict between AFLP (Leonard et al. 2013) and our sequencing analysis could be due to over-sensitivity of the first analysis (Pelser et al. 2003).It is important to note that Leonard et al. (2013) did not show the affinity of A. mitschurinii to the Sorbus, they have had only found similarities with × Sorbaronia hybrids.
It is also possible that A. mitschurinii is a backcross to Aronia, and the Sorbus genome is largely lost.Therefore, we cannot fully exclude the possibility that thorough cloning of multiple markers or highthroughput sequencing, might reveal the traces of others genomes in A. mitschurinii.Thus, the next step in Aronia phylogeny research should involve more high resolution molecular markers, especially multilocus nuclear data.
'Lake Moldino' biological station, and we thank the station staff for the assistance.We thank the College of Art and Sciences and the Department of Biology of Minot State University for the financial support.
From May 2014, this research is supported by North Dakota INBRE.

Fig. 1 .
Fig. 1.ML tree resulted from the analysis of concatenated ITS2 and rbcL sequences. 2014

Fig. 2 .
Fig. 2.Abundance of haplotypes per species.Each rectangle is a species.The first letter of haplotype name represent the marker: i for ITS2, r for rbcL and t for trnL-F.Dots correspond with haplotype counts.

Fig. 3 .
Fig. 3. Cluster analysis (binary distance, Ward clustering method) of the occurrence of eight basic rbcL and ITS2 haplotypes.

Fig. 5 .
Fig. 5. Ordination on the plane of first two principal components from the analysis of morphological data.

Fig. 6 .
Fig.6.Ordination from multidimensional scaling of the Gower distance matrix from the combined dataset of haplotypes occurrence and morphology.

Fig. 7 .
Fig. 7.Transformation grids of leaf shape from four Aronia species, with deformations required to reach the overall average shape.

Fig. 8 .
Fig. 8. Ordination from the principal component analysis of relative warps.