Information on published B. cereus group MLST schemes

Five MLST schemes have been developed for the B. cereus group of bacteria:

Scheme | Genes | Total seq. length (bp) | Total number of isolates (a) | Also used in
Tourasse, Helgason et al. 2006 (TH) (this website) | adk, ccpA, glpF, glpT, panC, pta, and pycA(2) | 2658 | 497 (478) | Fagerlund et al. 2007 and Guinebretiere et al. 2012
Helgason et al. 2004 (H) | adk, ccpA, ftsA, glpT, pyrE, recF, and sucC | 2938 | 441 (424) | Ehling-Schulz et al. 2005, Klee et al. 2006, and Olsen et al. 2007
Ko et al. 2004 (K) | gyrB, mbl, mdh, mutS, pycA(1), and rpoB | 2002 | 371 (352) |
Priest et al. 2004 (P)* | glpF, gmk, ilvD, pta, purH, pycA(2), and tpi | 2829 | 2051 (2015) | Kim et al. 2005a, Kim et al. 2005b, Barker et al. 2005, Marston et al. 2006, Vassileva et al. 2006, Klee et al. 2006, Vassileva et al. 2007, Didelot and Falush 2007, Bizzarri et al. 2008, Cardazzo et al. 2008, Hoffmaster et al. 2008, Didelot et al. 2009, and Raymond et al. 2010
Candelon et al. 2004 - Sorokin et al. 2006 (CS)# | clpC, dinB, gdpD, panC, purF, and yhfL | 2850 | 457 (442) |

a Numbers in parentheses are the numbers of isolates used for analysis in SuperCAT after removal of strains exhibiting conflicting typing or phylogenetic data
 Visit the specific database for the TH scheme at http://mlstoslo.uio.no/
* A specific database system for the P scheme is available at http://pubmlst.org/bcereus/
# A specific BLAST database for the CS scheme is available at http://spock.jouy.inra.fr/cgi-bin/bacilliMLSopen.cgi

Note that, while the TH and P schemes use the same gene fragment for the pycA gene, the K scheme is based on a different and non-overlapping gene region. The B. cereus group-specific transcriptional regulator plcR was originally included in the CS and K schemes. However, plcR follows a phylogeny different from the other MLST loci (see Candelon et al. 2004 and Ko et al. 2004) and is no longer used for MLST; therefore it is not included here.

In the SuperCAT database, all isolates and genes (except plcR) used in the MLST publications cited above have been compiled, together with the completely sequenced strains available in GenBank, giving a total dataset of 2357 strains and 26 gene fragments from 25 genes, with a total sequence length of 10,619 bp. After removal of isolates that exhibit conflicting typing or phylogenetic data, the final dataset was reduced to 2342 isolates. However, since most of the strains have been typed using only 6 or 7 of the 26 loci, only about one third of all possible strain-by-locus sequences are actually available. Visit the Gene Distribution and Strain Distribution pages to see which gene sequences are available for each strain. Information and sequences for isolates typed by the TH and P schemes were retrieved from the databases devoted to these schemes at http://mlstoslo.uio.no/ and http://pubmlst.org/bcereus/, respectively. MLST data for additional strains and for the other schemes were taken from the published literature and the GenBank nucleotide sequence database.
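The totals quoted above (26 gene fragments, 25 genes, 10,619 bp) can be re-derived from the per-scheme fragment lengths listed in the gene tables on this page. The following is a minimal sketch only; the dictionary layout and the function names (scheme_length, unique_fragments, gene_name) are illustrative assumptions and not part of SuperCAT.

```python
# Fragment lengths (bp) per scheme, transcribed from the gene tables on this page.
# pycA(1) and pycA(2) are two distinct, non-overlapping fragments of the same gene.
SCHEMES = {
    "TH": {"adk": 411, "ccpA": 418, "glpF": 372, "glpT": 330,
           "panC": 350, "pta": 414, "pycA(2)": 363},
    "H": {"adk": 411, "ccpA": 418, "ftsA": 401, "glpT": 330,
          "pyrE": 404, "recF": 470, "sucC": 504},
    "K": {"gyrB": 300, "mbl": 360, "mdh": 278, "mutS": 367,
          "pycA(1)": 379, "rpoB": 318},
    "P": {"glpF": 372, "gmk": 504, "ilvD": 393, "pta": 414,
          "purH": 348, "pycA(2)": 363, "tpi": 435},
    "CS": {"clpC": 500, "dinB": 400, "gdpD": 500, "panC": 350,
           "purF": 600, "yhfL": 500},
}


def scheme_length(name):
    """Total sequence length (bp) of one scheme."""
    return sum(SCHEMES[name].values())


def unique_fragments():
    """Merge all schemes, counting each shared fragment only once."""
    merged = {}
    for fragments in SCHEMES.values():
        merged.update(fragments)
    return merged


def gene_name(fragment):
    """Strip the fragment suffix, e.g. 'pycA(1)' -> 'pycA'."""
    return fragment.split("(")[0]


if __name__ == "__main__":
    for name in SCHEMES:
        print(name, scheme_length(name))
    fragments = unique_fragments()
    genes = {gene_name(f) for f in fragments}
    print(len(fragments), "fragments,", len(genes), "genes,",
          sum(fragments.values()), "bp")
```

Because every TH fragment is shared with the H, P, or CS scheme, the TH scheme adds nothing to the combined total; the 10,619 bp figure is the sum of the H, K, P, and CS schemes.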

The TH scheme is a combined scheme based on 3 genes from the H scheme (adk, ccpA, and glpT), 3 genes from the P scheme (glpF, pta, and pycA(2)), and the panC gene from the CS scheme. Otherwise, all schemes are based on different gene sets. However, schemes share a small common subset of isolates. In particular, the strains with complete genome sequences (currently 330 in total), for which all MLST loci are thus available, can be used to join the schemes by supertree analysis (see the Supertree Info page). In addition, there is a large strain overlap between the H and TH schemes. Go to the Scheme Overlap page for detailed information about strain overlap between the schemes.
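The composition of the TH scheme described above can be checked with simple set operations on the locus lists. This is an illustrative sketch; the names SCHEME_LOCI, shared_loci, and th_origin are hypothetical and chosen for this example only.

```python
# Loci of each scheme, as listed in the scheme overview on this page.
SCHEME_LOCI = {
    "TH": {"adk", "ccpA", "glpF", "glpT", "panC", "pta", "pycA(2)"},
    "H": {"adk", "ccpA", "ftsA", "glpT", "pyrE", "recF", "sucC"},
    "K": {"gyrB", "mbl", "mdh", "mutS", "pycA(1)", "rpoB"},
    "P": {"glpF", "gmk", "ilvD", "pta", "purH", "pycA(2)", "tpi"},
    "CS": {"clpC", "dinB", "gdpD", "panC", "purF", "yhfL"},
}


def shared_loci(a, b):
    """Loci used by both scheme a and scheme b."""
    return SCHEME_LOCI[a] & SCHEME_LOCI[b]


def th_origin():
    """Map each TH locus to the other schemes that also use it."""
    return {
        locus: sorted(s for s in SCHEME_LOCI
                      if s != "TH" and locus in SCHEME_LOCI[s])
        for locus in sorted(SCHEME_LOCI["TH"])
    }


if __name__ == "__main__":
    for locus, sources in th_origin().items():
        print(locus, "<-", ", ".join(sources))
```

Treating pycA(1) and pycA(2) as separate loci keeps the K scheme disjoint from the P and TH schemes, which matches the note that the K scheme uses a different, non-overlapping region of pycA.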

The positional distribution of the 25 genes on the 5,411,809 bp chromosome of the type strain of B. cereus, ATCC 14579, is shown below (oriC indicates the origin of replication and the genes are colored by MLST scheme; genes used for the combined TH scheme are indicated with an asterisk). The relative positions are similar in the chromosomes of other completely sequenced B. cereus group strains. Note that no gene is located in the ~1 Mbp area surrounding the replication termination region (dif). This region has been shown to be highly variable among B. cereus group strains (see Read et al. 2003).

[Figure: positions of the 25 MLST genes on the chromosome of B. cereus ATCC 14579, colored by MLST scheme]

Below you will find gene details for each of the 5 MLST schemes.

Tourasse, Helgason et al. 2006 (TH) scheme:

gene | locus in B. cereus ATCC 14579 | encoded protein | genomic position in B. cereus ATCC 14579 | total gene length (bp) | fragment length used for MLST (bp) | fragment position in the gene
adk | BC0152 | adenylate kinase | 137017-137667 | 651 | 411 | 163-573
glpT | BC0656 | glycerol-3-phosphate permease / transporter | 653160-651811 | 1,350 | 330 | 958-1287
glpF | BC1034 | glycerol uptake facilitator protein | 1014999-1015835 | 822 | 372 | 151-522
panC | BC1541 | pantoate-β-alanine ligase | 1489715-1490563 | 849 | 350 | 307-656
pycA | BC3947 | pyruvate carboxylase | 3930297-3926851 | 3,447 | 363 | 2443-2805
ccpA | BC4672 | catabolite control protein A | 4612709-4611711 | 999 | 418 | 567-150
pta | BC5387 | phosphate acetyltransferase | 5304503-5303532 | 972 | 414 | 181-594
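As a worked check of the TH table above, each fragment length should equal the span of its listed fragment position. The sketch below assumes that positions are 1-based and inclusive, and that a position listed in descending order (as for ccpA) only reflects orientation; both are assumptions, and the names TH_FRAGMENTS, span_length, and mismatches are illustrative.

```python
# gene: (fragment length in bp, first listed position, second listed position)
# Values transcribed from the TH table above.
TH_FRAGMENTS = {
    "adk": (411, 163, 573),
    "glpT": (330, 958, 1287),
    "glpF": (372, 151, 522),
    "panC": (350, 307, 656),
    "pycA": (363, 2443, 2805),
    "ccpA": (418, 567, 150),  # listed in descending order in the table
    "pta": (414, 181, 594),
}


def span_length(first, second):
    """Length of an inclusive, 1-based interval, regardless of listing order."""
    return abs(second - first) + 1


def mismatches(table):
    """Genes whose stated fragment length differs from the listed span."""
    return [gene for gene, (length, first, second) in table.items()
            if span_length(first, second) != length]


if __name__ == "__main__":
    print(mismatches(TH_FRAGMENTS))
```

For example, the adk fragment spans positions 163 to 573 of the gene, i.e. 573 - 163 + 1 = 411 bp.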

The adk gene fragment has been shortened from 450 bp (as originally published in Helgason et al. 2004) to 411 bp by trimming the first 16 bp and the last 23 bp. The glpF fragment has likewise been shortened from 381 bp (as originally published in Priest et al. 2004) to 372 bp by trimming the first 9 bp. These changes give better sequencing reads, because the trimmed regions lie close to the primers used for PCR amplification and sequencing.
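Sequences produced with the originally published fragment boundaries can be brought to the shortened lengths by removing the stated number of bases from each end. A minimal sketch, assuming the input sequence covers exactly the originally published fragment; trim_fragment and the two constants are hypothetical names introduced for this example.

```python
# (bases removed from the start, bases removed from the end), as stated above.
ADK_TRIM = (16, 23)   # 450 bp -> 411 bp
GLPF_TRIM = (9, 0)    # 381 bp -> 372 bp


def trim_fragment(seq, head, tail):
    """Return seq without its first `head` and last `tail` characters."""
    if head < 0 or tail < 0 or head + tail > len(seq):
        raise ValueError("trim lengths do not fit the sequence")
    # Slice with an explicit end index so that tail == 0 keeps the full end.
    return seq[head:len(seq) - tail]


if __name__ == "__main__":
    original_adk = "A" * 450  # placeholder sequence, not real data
    print(len(trim_fragment(original_adk, *ADK_TRIM)))
```

The explicit end index is used instead of a negative slice because `seq[head:-0]` would return an empty string when nothing is trimmed from the end, as is the case for glpF.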

Helgason et al. 2004 (H) scheme:

gene | locus in B. cereus ATCC 14579 | encoded protein | genomic position in B. cereus ATCC 14579 | total gene length (bp) | fragment length used for MLST (bp) | fragment position in the gene
recF | BC0004 | DNA replication and repair protein | 3298-4425 | 1,128 | 470 | 194-663
adk | BC0152 | adenylate kinase | 137017-137667 | 651 | 411 | 163-573
glpT | BC0656 | glycerol-3-phosphate permease / transporter | 653160-651811 | 1,350 | 330 | 958-1287
sucC | BC3834 | succinyl-CoA synthase, β subunit | 3814395-3813235 | 1,161 | 504 | 431-934
pyrE | BC3882 | orotate phosphoribosyltransferase | 3862257-3861625 | 633 | 404 | 36-439
ftsA | BC3907 | cell division protein | 3888107-3886800 | 1,308 | 401 | 47-447
ccpA | BC4672 | catabolite control protein A | 4612709-4611711 | 999 | 418 | 567-150

The adk gene fragment has been shortened from 450 bp (as originally published in Helgason et al. 2004) to 411 bp by trimming the first 16 bp and the last 23 bp. This gives better sequencing reads, because the trimmed regions lie close to the primers used for PCR amplification and sequencing.

Ko et al. 2004 (K) scheme:

gene | locus in B. cereus ATCC 14579 | encoded protein | genomic position in B. cereus ATCC 14579 | total gene length (bp) | fragment length used for MLST (bp) | fragment position in the gene
gyrB | BC0005 | DNA gyrase, B subunit | 4458-6386 | 1,398 | 300 | 735-1034
rpoB | BC0122 | DNA-directed RNA polymerase, β subunit | 113908-117441 | 3,534 | 318 | 1219-1536
mutS | BC3769 | DNA mismatch repair protein | 3741019-3738365 | 2,679 | 367 | 1837-2203
pycA | BC3947 | pyruvate carboxylase | 3930297-3926851 | 3,447 | 379 | 658-1036
mdh | BC4592 | malate dehydrogenase | 4538999-4538061 | 939 | 278 | 533-256
mbl | BC5281 | rod shape-determining protein | 5191988-5190960 | 1,002 | 360 | 463-822

Priest et al. 2004 (P) scheme:

gene | locus in B. cereus ATCC 14579 | encoded protein | genomic position in B. cereus ATCC 14579 | total gene length (bp) | fragment length used for MLST (bp) | fragment position in the gene
purH | BC0333 | phosphoribosylaminoimidazole carboxamide formyltransferase; inosine-monophosphate cyclohydrolase | 309016-310551 | 1,536 | 348 | 502-849
glpF | BC1034 | glycerol uptake facilitator protein | 1014999-1015835 | 822 | 372 | 151-522
ilvD | BC1780 | dihydroxyacid dehydratase | 1732373-1734034 | 1,674 | 393 | 1111-1503
gmk | BC3869 | guanylate kinase (GMP kinase) | 3849416-3848799 | 618 | 504 | 28-531
pycA | BC3947 | pyruvate carboxylase | 3930297-3926851 | 3,447 | 363 | 2443-2805
tpi | BC5137 | triosephosphate isomerase | 5041317-5040562 | 756 | 435 | 319-753
pta | BC5387 | phosphate acetyltransferase | 5304503-5303532 | 972 | 414 | 181-594

The glpF fragment has been shortened from 381 bp (as originally published in Priest et al. 2004) to 372 bp by trimming the first 9 bp. This gives better sequencing reads, because the trimmed region lies close to the primer used for PCR amplification and sequencing.

Candelon et al. 2004 - Sorokin et al. 2006 (CS) scheme:

gene | locus in B. cereus ATCC 14579 | encoded protein | genomic position in B. cereus ATCC 14579 | total gene length (bp) | fragment length used for MLST (bp) | fragment position in the gene
clpC | BC0102 | ATP-dependent protease; negative regulator of genetic competence ClpC/MecB | 95187-97622 | 2,436 | 500 | 264-763
purF | BC0330 | amidophosphoribosyltransferase | 305846-307261 | 1,416 | 600 | 126-725
gdpD | BC0591 | glycerophosphoryl diester phosphodiesterase | 579430-581301 | 1,872 | 500 | 268-767
yhfL | BC1088 | Long-chain-fatty-acid--CoA ligase | 1073787-1075319 | 1,533 | 500 | 482-981
panC | BC1541 | pantoate-β-alanine ligase | 1489715-1490563 | 849 | 350 | 307-656
dinB | BC4142 | DNA polymerase IV | 4108856-4107609 | 1,239 | 400 | 645-246
