TreePics: visualizing trees with pictures

While many programs are available to edit phylogenetic trees, none offers an efficient and automatic way to associate pictures with branch tips. Here, we present TreePics, a standalone program that uses a web browser to visualize phylogenetic trees in Newick format and that associates pictures (typically, pictures of the voucher specimens) with the tip of each branch. Pictures are displayed as thumbnails and can be enlarged by a mouse rollover. Further, several pictures can be selected and displayed in a separate window for visual comparison. TreePics works either online or in a full standalone version, where it can display trees with several thousand pictures (depending on the memory available). We argue that TreePics can be particularly useful in a preliminary stage of research, for example to quickly detect conflicts between a DNA-based phylogenetic tree and morphological variation, which may be due either to contamination that needs to be removed prior to final analyses or to the presence of species complexes.


Introduction
The DNA-barcoding initiative (Hebert et al. 2003) and Next-Generation Sequencing technologies have dramatically increased the amount of molecular data available for biodiversity studies. It is now common to manipulate datasets of hundreds or even thousands of haplotypes for a given genetic marker. Typically, DNA barcodes obtained from several specimens in several species are compared to test whether intra- and interspecies diversity matches (or not) the accepted species hypotheses, which are generally based on morphological characters. Such comparisons eventually lead to the detection of highly structured groups within a given species, which are then used as primary species hypotheses in an integrative taxonomy approach (Dayrat 2005).
There is a need for efficient bioinformatic tools to handle increasingly large datasets, and many tools have been proposed to turn raw data into a wide range of input formats for analytic software. One critical step is to detect potential conflicts between DNA sequences and the phenotypes of the specimens from which the sequences were obtained (the vouchers). Such conflicts could correspond to "true" discrepancies, i.e., several morphs correspond to a single haplotype (or closely related and highly similar haplotypes) due to, e.g., rapid adaptive radiation, incomplete lineage sorting or phenotypic plasticity, or conversely, one morph encompasses several molecular units due to, e.g., convergence or retention of ancestral polymorphism (e.g., Sangster 2009). However, contamination by other living organisms can also happen, even if evidence for it in the literature is not common (see, e.g., Schmidt et al. 1995): negative results are rarely published and in most cases contamination is removed prior to final analyses. While some contaminations are easy to detect (e.g., contamination by very distant organisms), others are trickier to identify (e.g., contamination by a closely related species). The BLAST tool in GenBank is commonly used to detect such contaminations, but many species are not represented in GenBank, and detecting contaminations often requires comparing sequences from closely related species. Thus, one way to efficiently detect such discrepancies is to visualize the data with a phylogenetic tree and compare the phenotypes of the sequenced specimens in the order in which they appear in the tree.
While numerous standalone programs are available to visualize phylogenetic trees (e.g., FigTree (Rambaut & Drummond 2009), Treeview (Page 2002) and MrEnt (Zuccon & Zuccon 2010)), none of them offers, to our knowledge, the possibility to automatically display pictures associated with the sequenced specimens in a way that enables a direct comparison of a tree and hundreds of associated images. In our case, this gap in the analytical workflow appeared when we started to accumulate barcoding data from the specimens preserved in the collection of the MNHN (Muséum national d'Histoire naturelle) and stored in its databases (Puillandre et al. 2012). Sample contamination occurred regularly because we were working with universal primers on specimens that had been collected over several decades and were potentially not perfectly preserved. To allow a rapid identification of conflicts between DNA phylogenies and morphological characters, we present the program TreePics as a web tool and as a standalone program. TreePics can be used to visualize a tree in Newick format and, at the tip of each branch, a picture corresponding to the specimen (typically a picture of the voucher, but potentially any picture related to it, such as a detail of the voucher or an icon related to a phenotypic trait). It is not designed to edit a tree, and can thus be used in combination with other programs that are dedicated to tree editing. The ETE toolkit (Huerta-Cepas et al. 2016) is also able to associate pictures with branch tips, but it needs to be used within a Python framework, where pictures and taxa names are associated manually. TreePics has been designed to automatically associate hundreds or thousands of pictures with a phylogenetic tree, using only a web browser.

General functioning
TreePics is accessible through a web browser. It is based on the web standards HTML5 and CSS3, on JavaScript and on the jQuery library. The input tree in Newick format is converted to the SVG (Scalable Vector Graphics) format using the library jsPhyloSVG (Smits & Ouverney 2010) to obtain a vector graphic representation of the tree that is understood natively by modern browsers.
All operations are performed in JavaScript, locally in the web browser. TreePics is thus independent from any application server, and the pictures do not need to be uploaded (either to a server or to cloud storage). Because the total size of the pictures can easily reach several gigabytes, uploading them would have been highly time- and space-consuming.
Although a web application is not constrained by the heterogeneity of operating systems, it must nonetheless take into account the security rules imposed by the web browser. Indeed, a web browser cannot access local files without an explicit request from the user. This request is made through different steps in the two versions of TreePics that we propose:
1. The web version available at https://treepics.mnhn.fr builds a tree in two steps (Fig. 1): first, the pictures that will eventually be displayed are selected, and second, the Newick tree is uploaded. In the HTTP protocol, the web browser does not allow a deferred use of local resources (the pictures), so the pictures must be embedded in the web browser when they are selected. To do that, they are converted to base64-encoded images by the web browser and directly associated with the SVG tree. This approach reduces the process to only two steps, but the memory used (number of pictures × size of each picture) can limit the number of pictures selected. We translated TreePics into two languages (French and English); the language is automatically detected from the language preference of the web browser, and it is also possible to switch from one language to the other.

2. A full standalone version (downloadable at https://treepics.mnhn.fr) can be installed locally. Here the process is divided into three steps: in addition to, and before, the two steps of the web version, the user needs to provide the path to the folder that contains the pictures. As in the web version, the operating system, and not the web browser, manages the local resources. However, the web browser obtains the data stream (base64) or a list of file names, but not the local path, which needs to be specified by the user. This adds a step to the process compared to the web version, but the standalone version does not require an internet connection and reduces the memory needed. The standalone TreePics also exists in French and English, and the language is chosen when downloading.
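The embedding step used by the web version can be illustrated outside the browser. The following Python sketch is not TreePics code (TreePics itself runs in JavaScript); the function names `to_data_uri`, `encoded_size` and `estimated_payload` are hypothetical. It shows how raw image bytes become a base64 data URI of the kind that can be referenced from an SVG image element, and why the encoded text is about one third larger than the original files.

```python
import base64


def to_data_uri(image_bytes, mime_type="image/jpeg"):
    """Return a data URI embedding the raw image bytes as base64 text."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def encoded_size(raw_size):
    """Number of base64 characters (padding included) for raw_size bytes."""
    # Base64 maps every group of 3 bytes (the last one padded) to 4 characters.
    return 4 * ((raw_size + 2) // 3)


def estimated_payload(picture_sizes):
    """Total number of base64 characters for pictures of the given byte sizes."""
    return sum(encoded_size(size) for size in picture_sizes)
```

This estimate only covers the encoded text; the memory actually used by a given web browser also depends on how it decodes and renders the pictures, which is not modeled here.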

Data input and visualization
Two types of files are provided: the pictures, in JPEG or PNG format, and the tree, in Newick format (https://en.wikipedia.org/wiki/Newick_format). Values for the tree nodes (typically, statistical support) provided in the Newick tree are displayed by TreePics. An example (a tree with the associated pictures) is provided in supplementary material 1.
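For readers unfamiliar with the Newick format, the following Python sketch shows where tip names and node values sit in a Newick string. It is a deliberately simplified illustration and not the parser used by jsPhyloSVG or TreePics: it assumes unquoted labels without spaces or bracketed comments, and a tree with at least one pair of parentheses. The function names and the specimen identifiers in the usage example are hypothetical.

```python
import re

# An unquoted Newick label: any run of characters that are not
# structural symbols (parentheses, colon, comma, semicolon) or whitespace.
LABEL = r"[^():,;\s]+"


def tip_labels(newick):
    """Return the tip names: labels that directly follow '(' or ','."""
    return re.findall(r"[(,]\s*(" + LABEL + r")", newick)


def node_values(newick):
    """Return the internal node values: labels that directly follow ')'."""
    return re.findall(r"\)\s*(" + LABEL + r")", newick)
```

For the string `((IM-1_sp1:0.1,IM-2_sp1:0.2)95:0.3,IM-3_sp2:0.4);`, `tip_labels` returns the three tip names and `node_values` returns the single support value `95`; branch lengths are ignored because they follow a colon.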
To associate the pictures with the names of the taxa in the tree, TreePics compares the names of the pictures with the names of the taxa in the tree. The rule is simple: the name of the picture (without the extension) should be included entirely in the name of the taxon in the tree. The name of the picture should contain only alphanumeric characters, dashes ("-") and underscores ("_"); in addition to the constraints linked to the OS (e.g., no ~@#$%^&*() for UNIX), it should not contain an exclamation point (!) or a single quote ('), as these are used by TreePics as internal delimiters.
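The naming rule above can be sketched as follows. This Python fragment is an illustration under stated assumptions, not the TreePics implementation: the function names are hypothetical, the comparison is assumed to be case-sensitive, and, because the text does not say how TreePics handles a taxon matched by several pictures, the sketch simply returns every matching file.

```python
import re
from pathlib import PurePath

# Allowed picture names: alphanumeric characters, dashes and underscores only.
VALID_NAME = re.compile(r"[A-Za-z0-9_-]+")


def picture_name(filename):
    """Return the file name without its extension."""
    return PurePath(filename).stem


def is_valid_picture_name(filename):
    """Check the picture name (without extension) against the naming rule."""
    return VALID_NAME.fullmatch(picture_name(filename)) is not None


def match_pictures(taxa, picture_files):
    """Map each taxon name to the valid picture files whose name it contains."""
    valid = [f for f in picture_files if is_valid_picture_name(f)]
    return {
        taxon: [f for f in valid if picture_name(f) in taxon]
        for taxon in taxa
    }
```

Because the rule is a plain inclusion test, a short picture name such as `IM-1` is also contained in a taxon named `IM-10_sp`; under the assumptions of this sketch, unambiguous identifiers (for example, zero-padded numbers) avoid such double matches.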
When a picture is associated with a name in the tree, the part of the name corresponding to the name of the picture appears in blue. Pictures are displayed as thumbnails at the tip of each branch, and when the mouse rolls over a thumbnail, a pop-up window opens with an enlarged view of the picture. A click on a thumbnail selects it and places it in a basket below the tree. The selected thumbnails (red squares) thus also appear in the list below the tree. The pictures placed in the basket can be visualized in a larger format in a separate window for easy comparison (Fig. 2). Pictures can be unselected either by clicking on the thumbnail or by clicking on the red cross in the basket. A click on the zoom icon on the picture opens it in a new tab in full screen. Finally, clicking on the "print" button saves the tree with the thumbnails in PDF format.

Restrictions
Because TreePics uses a web browser, the problems linked to the different operating systems are minimized, but not totally removed. In particular, the classic rules for naming files should be followed (e.g., no spaces on Mac). We tested several web browsers: Google Chrome makes better use of memory than Firefox, but it is more restrictive than Firefox in the number of files that can be selected (depending on their size). Using a PC with 16 GB of RAM and Firefox, we were able to visualize a tree with 3000 pictures. In any case, the full standalone version is more robust and should be preferred for large datasets.

Data accessibility
The web version can be accessed at https://treepics.mnhn.fr. The full standalone version can be downloaded on the same page.

Supplementary material
Supplementary material 1: a test dataset, with a tree and the associated pictures.