PROCOV
PROtein COVarion analysis
Maximum likelihood estimation of phylogeny under protein covarion models
The covarion hypothesis of protein evolution proposes that selective pressures on an amino acid or nucleotide site change over time, resulting in changes in the evolutionary rates of sites along the branches of a phylogenetic tree (W. M. Fitch & E. Markowitz, Biochem. Genet. 4: 579-593, 1970). Covarion-like evolution is now recognized as an important mode of molecular evolution in proteins, structural RNA genes and protein-coding genes. Empirical studies have shown that phylogenetic estimation under a covarion model may recover a different optimal topology than estimation that ignores covarion effects. Simulation studies have demonstrated that, under some edge-length conditions, rates-across-sites models that ignore covarion effects may cause long-branch repulsion biases in the resulting phylogenetic estimates (Wang, Susko, Spencer & Roger, 2008).
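To make the ON/OFF idea concrete, here is a sketch of a Tuffley and Steel style covarion rate matrix. Each character state is duplicated into an ON copy (free to substitute) and an OFF copy (invariable), and a site switches ON to OFF at rate s10 and OFF to ON at rate s01. It is a minimal illustration only, assuming a plain list-of-lists rate matrix `q` whose rows sum to zero; the function name `covarion_generator` and the toy two-state `q` are hypothetical, and this is not PROCOV's own code. In a protein analysis `q` would be a 20 x 20 empirical matrix, and richer models (e.g. Galtier, 2001; Huelsenbeck, 2002) extend this scheme with additional rate variation.

```python
from typing import List

Matrix = List[List[float]]


def covarion_generator(q: Matrix, s01: float, s10: float) -> Matrix:
    """Build a 2n x 2n Tuffley-Steel-style covarion rate matrix from an n x n matrix q.

    States 0..n-1 are characters in the ON class; states n..2n-1 are the same
    characters in the OFF class. s01 is the OFF -> ON rate, s10 the ON -> OFF rate.
    """
    n = len(q)
    big = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            big[i][j] = q[i][j]      # substitutions occur only while ON
        big[i][i] -= s10             # extra outflow from ON state i: switching OFF
        big[i][n + i] = s10          # ON -> OFF, character unchanged
        big[n + i][i] = s01          # OFF -> ON, character unchanged
        big[n + i][n + i] = -s01     # OFF states can only leave by switching ON
    return big


# Toy two-state base matrix (hypothetical values, for illustration only).
q = [[-1.0, 1.0],
     [1.0, -1.0]]
g = covarion_generator(q, s01=0.5, s10=1.5)
for row in g:
    print(row)  # first row: [-2.5, 1.0, 1.5, 0.0]
```

Every row of the result sums to zero, and in the long run a site spends a fraction s01 / (s01 + s10) of its time in the ON class.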
PROCOV implements a number of covarion models of protein evolution (Tuffley and Steel, 1998; Galtier, 2001; Huelsenbeck, 2002; Wang et al., 2007). It evaluates the maximum likelihood of a given tree under these covarion models and optimizes the tree topology using the subtree pruning and regrafting (SPR) tree-search algorithm. Covarion models may be especially useful for phylogenetic estimation when sequences diverged long ago and the rates of evolution at sites are likely to have changed over the tree. PROCOV can also be used to study functional shifts in protein families that result in changes in site rates within subtrees.
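The SPR search mentioned above generates candidate topologies by removing a subtree, letting its sibling take the parent's place, and reattaching the subtree on every edge of what remains. The sketch below illustrates only that move. For simplicity it works on rooted binary trees written as nested Python tuples, whereas a real implementation would typically operate on unrooted trees; the representation and all function names are hypothetical and not taken from PROCOV. A real search would also score each candidate by its likelihood under the chosen covarion model and keep the best one.

```python
from typing import Iterator, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]   # leaf name or (left, right)
Path = Tuple[int, ...]                      # sequence of child indices from the root


def canonical(t: Tree) -> Tree:
    """Order children deterministically so equal topologies compare equal."""
    if isinstance(t, str):
        return t
    a, b = canonical(t[0]), canonical(t[1])
    return (a, b) if str(a) <= str(b) else (b, a)


def paths(t: Tree, prefix: Path = ()) -> Iterator[Path]:
    """Yield the path to every node (i.e. every edge above a node) in the tree."""
    yield prefix
    if not isinstance(t, str):
        yield from paths(t[0], prefix + (0,))
        yield from paths(t[1], prefix + (1,))


def get(t: Tree, path: Path) -> Tree:
    for i in path:
        t = t[i]
    return t


def replace(t: Tree, path: Path, new: Tree) -> Tree:
    """Return a copy of t with the node at path replaced by new."""
    if not path:
        return new
    children = list(t)
    children[path[0]] = replace(t[path[0]], path[1:], new)
    return tuple(children)


def prune(t: Tree, path: Path) -> Tree:
    """Remove the subtree at path; its sibling takes the parent's place."""
    parent_path, i = path[:-1], path[-1]
    sibling = get(t, parent_path)[1 - i]
    return replace(t, parent_path, sibling)


def spr_neighbors(t: Tree) -> Set[Tree]:
    """All distinct topologies one SPR move away from t (t itself excluded)."""
    start = canonical(t)
    result: Set[Tree] = set()
    for p in paths(t):
        if not p:
            continue  # the whole tree cannot be pruned
        subtree, rest = get(t, p), prune(t, p)
        for q in paths(rest):
            candidate = canonical(replace(rest, q, (get(rest, q), subtree)))
            if candidate != start:
                result.add(candidate)
    return result


print(spr_neighbors((("A", "B"), "C")))
```

For three taxa every other rooted topology is one SPR move away, so the example prints the two alternatives to ((A,B),C).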
- New version of PROCOV available: Procov_2.0
- Features of Procov_2.0 compared with Procov_1.0:
  - Tree search
  - New user interface: command-line arguments are used
  - Three amino acid substitution models available: JTT, WAG and LG
  - Uses numerical libraries (BLAS) for matrix manipulations, increasing PROCOV's running speed about threefold
  - Can print site-wise log-likelihoods
- Wang H-C, Spencer M., Susko E. & A. J. Roger, Topological estimation biases with covarion evolution. J. Mol. Evol. 66: 50-60, 2008. (Simulation studies showing the impact of covarions on phylogenetic inference: ignoring covarion effects may cause long-branch repulsion bias in phylogenies.)
- PROCOV history:
  - January 2009: Procov_2.0 released; adds a tree-search function to version 1.0.
  - January 2007: Procov_1.0 released. This version evaluates the maximum likelihood for a fixed topology and protein alignment.
covTests implements three statistical tests for detecting whether a protein sequence alignment exhibits heterotachy. The test statistics are:
- The w-statistic: compares amino acid substitution patterns between two monophyletic groups of protein sequences. It is defined as the difference between the fraction of sites that are variable in both groups and the fraction of sites that are variable in each group (Lockhart et al., Mol. Biol. Evol. 15: 1183–1188, 1998).
- The w'-statistic: using site entropy as a measure of the variability of a site, the w-statistic is generalized to a w'-statistic by reassigning sites that are variable in both groups but have a large entropy difference to the class of sites that are variable in only one group, thus modifying the fractions of variable sites in each group and in both groups.
- Pearson correlation coefficient (r): under a rates-across-sites (RAS) model, the r between the site entropies of two monophyletic groups is positive. Under covarion models, r is also positive but smaller than under RAS, because sites switching from ON to OFF and from OFF to ON diminish the correlation. Under an equal-rates (ER) model, r is expected to be zero.
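The quantities behind these tests can be computed directly from an alignment split into two groups. The sketch below is an illustration under stated assumptions, not the covTests code: it treats a site as variable in a group when that group's column holds more than one residue (gaps are not handled), uses the Shannon entropy of residue frequencies as the site entropy, and, for the w'-style adjustment, reassigns a site to the higher-entropy group when the entropy difference exceeds a threshold `entropy_gap` whose value and direction of reassignment are hypothetical choices. It returns the fractions of sites variable in group A, in group B and in both, from which w and w' are formed (the exact combination and significance assessment follow Lockhart et al. (1998) and the covTests reference and are not reproduced here), plus the Pearson r of site entropies. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def site_entropy(residues: Sequence[str]) -> float:
    """Shannon entropy (natural log) of the residue frequencies in one column."""
    n = len(residues)
    return -sum((c / n) * math.log(c / n) for c in Counter(residues).values())


def column(group: List[str], k: int) -> List[str]:
    """Residues at site k for every sequence in a group."""
    return [seq[k] for seq in group]


def variable_fractions(group_a: List[str], group_b: List[str],
                       entropy_gap: Optional[float] = None) -> Tuple[float, float, float]:
    """Fractions of sites variable in group A, in group B, and in both.

    If entropy_gap (a hypothetical threshold) is given, a site variable in both
    groups whose entropies differ by more than entropy_gap is counted as variable
    only in the higher-entropy group (a w'-style reassignment).
    """
    n_sites = len(group_a[0])
    var_a = var_b = var_both = 0
    for k in range(n_sites):
        col_a, col_b = column(group_a, k), column(group_b, k)
        in_a, in_b = len(set(col_a)) > 1, len(set(col_b)) > 1
        if in_a and in_b and entropy_gap is not None:
            h_a, h_b = site_entropy(col_a), site_entropy(col_b)
            if abs(h_a - h_b) > entropy_gap:
                in_a, in_b = h_a > h_b, h_b > h_a
        var_a += in_a
        var_b += in_b
        var_both += in_a and in_b
    return var_a / n_sites, var_b / n_sites, var_both / n_sites


def entropy_correlation(group_a: List[str], group_b: List[str]) -> float:
    """Pearson correlation of per-site entropies between the two groups."""
    x = [site_entropy(column(group_a, k)) for k in range(len(group_a[0]))]
    y = [site_entropy(column(group_b, k)) for k in range(len(group_b[0]))]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("site entropies are constant in one group; r is undefined")
    return sxy / math.sqrt(sxx * syy)


# Toy alignment (hypothetical data): two groups of three sequences, four sites.
group_a = ["ACDE", "ACDF", "AKDG"]
group_b = ["ACEE", "ACEE", "ACQF"]
print(variable_fractions(group_a, group_b))                   # (0.5, 0.5, 0.25)
print(variable_fractions(group_a, group_b, entropy_gap=0.3))  # (0.5, 0.25, 0.0)
print(entropy_correlation(group_a, group_b))
```

In this toy example the fourth site is variable in both groups but much more diverse in group A, so the w'-style reassignment moves it into the "variable in A only" class.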
- Source code and EF-related data
- Reference: Wang H-C, Susko E. & A. J. Roger, Fast statistical tests for detecting heterotachy in protein evolution. Submitted.
Simulating sequence evolution under covarion models and other heterotachy models: Seq-gen-aminocov and various versions of Seq-gen
In developing covarion models, applying them in phylogenetic analyses and developing methods for detecting covarion/heterotachy signals in protein alignments, we frequently need to simulate sequences under covarion (or other heterotachy) evolution and compare them with sequences simulated under RAS or ER models. The sequence simulation programs we often use are listed below.
- Seq-gen: a general program developed by Andrew Rambaut and Nicholas Grassly for simulating sequence alignments based on a given phylogenetic tree and common models of the substitution process, described in Rambaut A & N C Grassly, Computer Applications in the Biosciences 13: 235-238, 1997.
- Seq-gen-cov: Cécile Ané modified Seq-gen to include two covarion models for nucleotide sequences (the Tuffley and Steel model and the Huelsenbeck model), described in Ané C, Burleigh J, McMahon M & M Sanderson, Mol. Biol. Evol. 22: 914–924, 2005.
- Seq-gen-aminocov: my former colleague Matthew Spencer further modified Seq-gen and Seq-gen-cov so that the simulator can simulate nucleotide and protein evolution under various covarion models, including the Tuffley and Steel model, the Huelsenbeck model, the Galtier model and the general covarion model. The program is described in Wang H-C, Spencer M., Susko E. & A J Roger, J. Mol. Evol. 66: 50-60, 2008.
- indel-seq-gen: developed by Strope C L, Abel K, Scott S D & E N Moriyama (Mol. Biol. Evol. 26: 2581-2593, 2009) to allow the simulation of insertion and deletion events and the use of multiple related root sequences.
- LineageSpecificSeqgen: generates sequence data with lineage-specific variation in the proportion of variable sites, described in Shavit Grievink L, Penny D, Hendy MD & B R Holland, BMC Evol. Biol. 8: 317, 2008.
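The programs above do this job properly. As a rough illustration of what happens at a single site during covarion simulation, the sketch below evolves one site along one branch by drawing exponential waiting times between events (the standard Gillespie-style approach for continuous-time Markov chains): while ON, the site can substitute according to `q` or switch OFF; while OFF, it can only switch back ON. The function `evolve_site` and the toy two-state matrix are hypothetical; this is not code from Seq-gen or its variants, and a full simulator would also draw root states from the stationary distribution and walk the tree from root to tips.

```python
import random
from typing import List, Tuple


def evolve_site(q: List[List[float]], s01: float, s10: float,
                char: int, on: bool, t: float,
                rng: random.Random) -> Tuple[int, bool]:
    """Evolve one site along a branch of length t under an ON/OFF covarion process.

    q is the substitution rate matrix used while the site is ON (rows sum to 0);
    s01 is the OFF -> ON rate and s10 the ON -> OFF rate.
    Returns the (character, on) state at the end of the branch.
    """
    elapsed = 0.0
    while True:
        subst_rate = -q[char][char] if on else 0.0
        switch_rate = s10 if on else s01
        total = subst_rate + switch_rate
        if total <= 0.0:
            return char, on                      # no event can occur
        elapsed += rng.expovariate(total)        # waiting time to the next event
        if elapsed > t:
            return char, on                      # branch ends before the event
        if rng.random() * total < subst_rate:
            # substitution: new character j chosen with probability q[char][j] / subst_rate
            targets = [j for j in range(len(q)) if j != char]
            weights = [q[char][j] for j in targets]
            char = rng.choices(targets, weights=weights)[0]
        else:
            on = not on                          # ON <-> OFF switch


rng = random.Random(1)
q = [[-1.0, 1.0], [1.0, -1.0]]  # toy two-state matrix (hypothetical)
ends = [evolve_site(q, 0.5, 1.5, 0, True, 50.0, rng) for _ in range(2000)]
frac_on = sum(on for _, on in ends) / len(ends)
print(frac_on)  # should be close to s01 / (s01 + s10) = 0.25
```

On a long branch the ON/OFF class forgets its starting value, so the fraction of sites ending ON approaches s01 / (s01 + s10).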
Last updated: 01/21/2011