B. cereus group Multi-Scheme MLST Supertree Reconstruction

Here is a brief description of the strategy used to reconstruct multi-gene supertrees for the B. cereus group. The same technique was used to build both the global supertree, based on all isolates and gene fragments from the five MLST schemes, and the scheme-specific supertrees. For further details, please see Tourasse and Kolstø 2008.

Supertrees were built using the widely used Matrix Representation with Parsimony (MRP) method. Briefly, a phylogenetic tree is first reconstructed for each gene by the Maximum Likelihood method with the PHYML 3.0 program (see below). Each gene tree is then recoded into a binary matrix representing its branching order (i.e., its phylogenetic groupings). All gene tree matrices are concatenated into a supermatrix, in which isolates missing from a particular tree are coded with the "?" character, representing unknown data. In this supermatrix, the sequence of 0's, 1's, and ?'s defines the branching profile of a strain; closely related strains have similar branching profiles.
Supertrees are then generated from the supermatrix by the Maximum Parsimony technique. For this step we now use the TNT (Tree analysis using New Technology) software instead of MIX from the PHYLIP package, which was used previously. The Maximum Parsimony step infers the trees that require the minimum number of changes between the branching profiles of all isolates, where each unknown character can take either of the two possible states, 0 or 1 (unknown characters are not treated as gaps). Because several trees can be equally parsimonious, the final supertree is taken as the strict consensus of all the most parsimonious trees. TNT is specifically designed for the analysis of large datasets and permits very fast supertree building: the global MLST supertree is built in about four hours on this webserver, compared to 2-3 days with MIX of PHYLIP. In addition, TNT has shown improved accuracy over other parsimony programs (including PAUP and PHYLIP), because its speed and search algorithms allow a broader and more efficient exploration of tree space and can therefore find more parsimonious trees (see Goloboff 1999).
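The MRP coding and parsimony scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the TNT or PHYLIP implementation. It assumes rooted, binary trees written as nested tuples of isolate names (a hypothetical representation), uses Baum/Ragan-style coding with "?" for isolates absent from a gene tree, scores a candidate supertree with Fitch parsimony treating "?" as either 0 or 1, and takes the strict consensus as the groupings shared by all equally parsimonious trees. All function names are hypothetical, and the sketch does not search tree space, which is the part TNT actually performs.

```python
from itertools import chain

# Hypothetical tree representation for this sketch: a rooted binary tree
# written as nested tuples of isolate names, e.g. (("A", "B"), "C").


def leaves(tree):
    """Return the set of isolate names below a (sub)tree."""
    if isinstance(tree, str):
        return {tree}
    return set(chain.from_iterable(leaves(child) for child in tree))


def clusters(tree):
    """Return the groupings (clades) of a rooted tree, excluding the root."""
    found = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        taxa = frozenset().union(*(walk(child) for child in node))
        found.append(taxa)
        return taxa

    root = walk(tree)
    return [clade for clade in found if clade != root]


def mrp_matrix(gene_trees):
    """Code each clade of each gene tree as one binary supermatrix column.

    1 = isolate inside the clade, 0 = isolate in that gene tree but outside
    the clade, ? = isolate missing from that gene tree (unknown data).
    """
    all_taxa = sorted(set().union(*(leaves(t) for t in gene_trees)))
    rows = {taxon: [] for taxon in all_taxa}
    for tree in gene_trees:
        present = leaves(tree)
        for clade in clusters(tree):
            for taxon in all_taxa:
                if taxon not in present:
                    rows[taxon].append("?")
                else:
                    rows[taxon].append("1" if taxon in clade else "0")
    return {taxon: "".join(states) for taxon, states in rows.items()}


# '?' is an ambiguous state: it may take either value, 0 or 1.
STATE_SETS = {"0": {"0"}, "1": {"1"}, "?": {"0", "1"}}


def _fitch(node, matrix, col):
    """Fitch pass for one column: return (possible states, number of changes)."""
    if isinstance(node, str):
        return STATE_SETS[matrix[node][col]], 0
    (left, cost_l), (right, cost_r) = (_fitch(c, matrix, col) for c in node)
    if left & right:
        return left & right, cost_l + cost_r
    return left | right, cost_l + cost_r + 1


def fitch_length(tree, matrix):
    """Parsimony length of a candidate supertree over all supermatrix columns."""
    n_cols = len(next(iter(matrix.values())))
    return sum(_fitch(tree, matrix, col)[1] for col in range(n_cols))


def strict_consensus(trees):
    """Groupings present in every one of several equally parsimonious trees."""
    return set.intersection(*(set(clusters(t)) for t in trees))


gene_trees = [((("A", "B"), "C"), "D"), (("A", "B"), ("C", "E"))]
matrix = mrp_matrix(gene_trees)
print(matrix)  # {'A': '1110', 'B': '1110', 'C': '0101', 'D': '00??', 'E': '??01'}
print(fitch_length(((("A", "B"), ("C", "E")), "D"), matrix))  # 4
print(fitch_length(((("A", "C"), ("B", "E")), "D"), matrix))  # 7
```

In this toy example, isolate D is absent from the second gene tree and E from the first, so their rows contain "?". The first candidate supertree needs 4 changes and the second 7, so parsimony prefers the first.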
To compute branch lengths and obtain statistical support values for all groupings (i.e., internal branches) in the supertree, the Maximum Likelihood method and the PHYML 3.0 program were employed. Branch confidence was assessed with the approximate likelihood-ratio test (aLRT) using Shimodaira-Hasegawa-like (SH-like) support values, which estimate for each branch the probability (or p-value) that it is significant. Computing the branch supports took about two hours.
Trees for each MLST gene were reconstructed by the Maximum Likelihood method with the PHYML 3.0 program. The Felsenstein 1984 (F84) nucleotide substitution model supplemented with a gamma distribution (F84+Γ) was used in the Maximum Likelihood computations for both the individual gene trees and the supertree. This model allows for unequal base frequencies, a transition/transversion rate bias, and gamma-distributed substitution rate variation among sites. It was chosen empirically as a consensus from exploratory model testing with ModelTest, which indicated that models including these three factors were the most appropriate for the MLST loci studied, although the best models for individual loci differed slightly.
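To illustrate what F84+Γ means in practice, the sketch below computes F84 transition probabilities and averages them over discrete gamma rate categories. It is a hedged sketch, not PhyML's code. It assumes the two-process formulation of F84: "general" substitution events at rate lam draw a new base from the base frequencies, and "within-group" events at rate mu draw a new base from the same purine/pyrimidine group, which produces the transition bias. The parameter names lam and mu are hypothetical (PhyML reports a transition/transversion ratio instead), time is not rescaled to expected substitutions per site, and rate variation uses Yang's (1994) equal-probability discrete gamma with category means. All function names are hypothetical.

```python
import math

BASES = "ACGT"
PURINES = {"A", "G"}


def f84_matrix(t, freqs, lam, mu):
    """F84 transition probabilities P[i][j] over time t (two-process form)."""
    # Total frequency of the purine or pyrimidine group each base belongs to.
    group = {b: sum(freqs[x] for x in BASES if (x in PURINES) == (b in PURINES))
             for b in BASES}
    no_event = math.exp(-(lam + mu) * t)  # no event of either kind
    no_general = math.exp(-lam * t)       # no general event
    matrix = {}
    for i in BASES:
        matrix[i] = {}
        for j in BASES:
            p = (1.0 - no_general) * freqs[j]  # at least one general event
            if (i in PURINES) == (j in PURINES):
                # no general event, but at least one within-group event
                p += (no_general - no_event) * freqs[j] / group[j]
            if i == j:
                p += no_event
            matrix[i][j] = p
    return matrix


def lower_reg_gamma(a, x, terms=500):
    """Regularized lower incomplete gamma P(a, x) via its power series."""
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, terms):
        term *= x / (a + n)
        total += term
        if term < total * 1e-15:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def gamma_quantile(p, alpha):
    """Quantile of Gamma(shape=alpha, rate=alpha), i.e. mean 1, by bisection."""
    lo, hi = 0.0, 1.0
    while lower_reg_gamma(alpha, alpha * hi) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if lower_reg_gamma(alpha, alpha * mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha, k=4):
    """Mean rate of each of k equal-probability gamma categories."""
    cuts = [0.0] + [gamma_quantile(i / k, alpha) for i in range(1, k)] + [math.inf]
    rates = []
    for a, b in zip(cuts, cuts[1:]):
        upper = 1.0 if math.isinf(b) else lower_reg_gamma(alpha + 1, alpha * b)
        lower = lower_reg_gamma(alpha + 1, alpha * a)
        rates.append(k * (upper - lower))
    return rates


def f84_gamma_matrix(t, freqs, lam, mu, alpha, k=4):
    """Average the F84 matrix over k discrete gamma rate categories (F84+G)."""
    mats = [f84_matrix(r * t, freqs, lam, mu) for r in discrete_gamma_rates(alpha, k)]
    return {i: {j: sum(m[i][j] for m in mats) / k for j in BASES} for i in BASES}


freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}  # illustrative values only
P = f84_gamma_matrix(0.1, freqs, lam=1.0, mu=2.0, alpha=0.5)
print(round(sum(P["A"].values()), 12))  # 1.0 (each row sums to 1)
print(P["A"]["G"] > P["A"]["C"])  # True: transition more likely than transversion
```

The unequal base frequencies enter through freqs, the transition/transversion bias through mu, and among-site rate variation through alpha; these are the three factors the text names. Smaller alpha values give more strongly heterogeneous rates across sites.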

A schematic overview of the multi-scheme MLST supertree reconstruction procedure is shown below.

[Figure: Multi-scheme MLST supertree reconstruction summary]
