The makefile should handle both compilation and installation. [See the installation notes for any issues on your system.] Typing make compiles the program with the default options. After this, typing make install installs the program. [You may need root access for this, depending on where you want to install the program.] To install with modified default options, type make optionname=value, with as many such assignments as desired. The following options can be modified in this way; they are listed with their default values:
Option | Default value |
NUMTHREADS | 4 |
BUFFERSIZE | 50 |
DEFAULTPARAMS | parametermatrices |
DEFAULTVARIABLES | .variables |
DEFAULTMATRIX | ECMq.txt |
DEFAULTNUMTIMES | 100 |
DEFAULTTREE | testtrees |
DEFAULTDATA | testdata |
DEFAULTMASKFILE | masks |
DEFAULTMODELFILE | models |
The only options that are likely to need changing are NUMTHREADS, which should probably be set to the number of available processors, but might need to be smaller if memory is a problem [see Choosing the number of threads for more detail]; DEFAULTDATA, which could be changed to a database stored on your system; and DEFAULTTREE, which could be set to a file containing the trees that you are particularly interested in. The other default files are specific to cold, and you could just edit those files instead of changing the options.
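As a rough sketch of the NUMTHREADS advice above, the following shell snippet looks up the number of online processors and prints the corresponding make commands. It assumes that getconf understands _NPROCESSORS_ONLN (true on Linux and macOS, but not guaranteed everywhere); the fallback of 4 simply mirrors the default in the table above, and mytrees is a hypothetical file name.

```shell
# Number of online processors. _NPROCESSORS_ONLN is an assumption about
# your system; if the lookup fails we fall back to the makefile default of 4.
ncpu=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 4)
case "$ncpu" in
  ''|*[!0-9]*) ncpu=4 ;;   # not a plain number: use the default
esac

# Print the commands rather than running them, so they can be checked first.
# "mytrees" is a hypothetical tree file used only for illustration.
echo "make NUMTHREADS=$ncpu DEFAULTTREE=mytrees"
echo "make install"
```

If memory turns out to be the limiting factor, a smaller value of NUMTHREADS may work better than the processor count.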
The runtime configuration file can also be modified using the makefile. This time type make config optionname=value. The following options can be modified:
showeverysite
path
Alternatively, the variables file (called .variables by default) can be modified directly.
Variable | Meaning | Default value |
SHELL | shell used for executing system commands. | /bin/sh |
CXX | c++ compiler. | set by OS. |
CFLAGS | compiler flags. | -Wall -ggdb -O2 -pthread |
PACKAGENAME | the name of the executable file. | cold |
VERSION | the version of the program. | 1.0.0 |
forceremakeconfig | indicates whether configuration file needs to be remade. | |
forceremakevar | indicates whether .variables file needs to be remade. | |
mainsearchdir | directory in which application files are based. | $(HOME)/$(PACKAGENAME) |
prefix | directory in which installed files should be stored. | /usr/local/ |
exec_prefix | directory in which executable files should be stored. | $(prefix) |
bindir | directory in which the actual binary executable files should be stored. | $(exec_prefix)/bin |
DESTDIR | user input prefix to installation directory. | |
installbin | executable files to be installed. | $(PACKAGENAME) DNAml Utilities/setup Utilities/reordermatrix Utilities/summats Utilities/stripambiguous |
installdat | data files to be installed. | .variables .flags Models/parametermatrices Models/ECMq.txt Data/testtrees Data/testdata Models/mixture Models/models Models/masks Models/standardmodelmatrices Utilities/codonlistexample.txt Utilities/coldcodonorder.txt Utilities/codonlistalph.txt Models/varnames |
installdoc | documentation files to be installed. | Documentation/cold.info Documentation/files.info Documentation/installation.info Documentation/utilities.info |
INSTALL | command for installing programs. | set by OS. |
Target | Action |
make all | default action. Makes all executable files. |
make utilities | just makes the utility programs reordermatrix, readmatricesfromcols, summats and stripambiguous. |
make cold | |
make DNAml | |
make Utilities/reordermatrix | |
make Utilities/readmatricesfromcols | |
make Utilities/summats | |
make Utilities/stripambiguous | |
make clean | deletes all files made by previous makes, leaving just the original distribution files. |
make config | updates the .variables file. |
make dist | collects all the files in the distribution into a zipped file, ready for distribution. |
make backup | like make dist, but copies the file to a backup directory. |
make uninstall | deletes all installed files. |
make remove | collects all the distribution files into a zipped file, then deletes everything except the zipped file. |
make install | copies the executable files into a directory in the search path, where they can be automatically run. |
The following programs are required for correct execution of the makefile:
awk
tee
mv
cp
rm
echo
tar
gzip
rmdir
c++ compiler
If any of these programs is missing, the program may not compile. The compiler options are based on the GNU c++ compiler (g++ version 4.4.3). To compile with another compiler, set the CFLAGS variable to the appropriate flags for your compiler, i.e. make CFLAGS=...
awk and tee are used for editing the configvars.h header file. If these programs are not available on your system, you can manually edit the configvars.template file to get your configvars.h, then run make with the setting NOAWK=1. (The value 1 is arbitrary; any value will do.) None of the other programs should present a problem. If there is a problem, you can manually compile the main executable cold by just compiling the following c++ files:
Main.cpp CommandLine.cpp Likelihood.cpp Matrix.cpp
Miscellaneous.cpp Optimisation.cpp Parameters.cpp
Sequence.cpp SignalHandler.cpp Tree.cpp TreeLikelihood.cpp
Input.cpp Debug.cpp ErrorHandler.cpp
MixTreeLikelihood.cpp
You will need to compile with support for pthreads. [I might try to create a version which can be compiled without this support if there are many problems with this.] Other compilation options are not essential, but using optimisation is recommended, and adding debugging information improves your chance of getting useful replies to any problems.
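A manual build along these lines might look like the sketch below. It reuses the default CFLAGS from the variables table and assumes that g++ is the compiler, that configvars.h has already been prepared, and that the command is run from the directory containing the source files. When Main.cpp is not found, the command is only printed.

```shell
# Source files from the list above.
SRCS="Main.cpp CommandLine.cpp Likelihood.cpp Matrix.cpp \
Miscellaneous.cpp Optimisation.cpp Parameters.cpp \
Sequence.cpp SignalHandler.cpp Tree.cpp TreeLikelihood.cpp \
Input.cpp Debug.cpp ErrorHandler.cpp MixTreeLikelihood.cpp"

CXX=${CXX:-g++}                        # assumed compiler; override via CXX
CXXFLAGS="-Wall -ggdb -O2 -pthread"    # default CFLAGS from the table above

cmd="$CXX $CXXFLAGS $SRCS -o cold"
if [ -f Main.cpp ]; then
  $cmd            # run the compilation in the source directory
else
  echo "$cmd"     # otherwise just show the command that would be run
fi
```

Other compilers may need a different flag in place of -pthread, and may not accept -ggdb.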
As mentioned earlier, the computation of likelihoods for different sites is close to independent, so it lends itself well to parallel computation. If your system has multiple processors, each processor can work on a different site. Choosing the number of threads to be the same as the number of processors can therefore minimise computation time.
Unfortunately, reality is more complicated than this. The computation is very memory intensive. Computing a sitewise hessian for a tree with n branches, for a model with p parameters, requires total memory usage of about (3×61/2)n(p²+n) variables of type long double, each of which typically occupies somewhere around 8-16 bytes. Since 3×61/2 = 91.5 such variables take up roughly 0.7 to 1.4 kilobytes, each thread could need about n(p²+n) kilobytes of memory. For large trees with lots of parameters, this can quickly use up most of your available memory, actually causing the program to run more slowly. Therefore, if your memory is limited, or if other programs are also using it, it may be more efficient to use fewer threads. [On Unix-based systems, you can check memory usage using the top command. There are probably other ways too.]
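The rule of thumb above can be turned into a quick estimate. The sketch below is only illustrative: estimate_kb is a hypothetical helper, not part of cold, and the branch count assumes an unrooted binary tree, which has 2s − 3 branches for s species (695 for 349 species).

```shell
# Rule-of-thumb memory per thread, in kilobytes: n * (p^2 + n),
# where n is the number of branches and p the number of parameters.
estimate_kb() {
  awk -v n="$1" -v p="$2" 'BEGIN { printf "%d\n", n * (p * p + n) }'
}

# Illustrative inputs: 32 parameters and 695 branches (2*349 - 3, assuming
# an unrooted binary tree with 349 species).
per_thread_kb=$(estimate_kb 695 32)
threads=4

echo "per thread:  $per_thread_kb KB"
echo "all threads: $((per_thread_kb * threads / 1024)) MB"
```

With these inputs the estimate is a little over 1 gigabyte per thread. For small trees the formula gives only a few megabytes, so it should be read as the part of memory usage that grows with tree size rather than as a total.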
As an example of typical memory usage, on my computer, for a tree with 6 species, using a model with 32 parameters, and 4 threads, the memory usage was about 750 megabytes. For a tree with 12 species, it was using 875 megabytes. For a tree with 25 species, it was using about 800 megabytes. For a tree with 349 species it used about 5 gigabytes.
For the tree with 349 species (using sequences of 987 nucleotides) the times for a single hessian calculation with various numbers of threads are as follows:
Number of Threads | Time for a single hessian calculation |
1 | 24m22.164s |
2 | 18m40.653s |
3 | 24m11.497s |
4 | 22m35.210s |
5 | 14m?? |
6 | 20m?? |
These tests were run on a computer with two 2.66 GHz Intel i5 processors (which support hyperthreading), with 4 MB of shared L3 cache and 6 GB of total RAM.
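To compare the timings more easily, the sketch below converts the complete entries from the table into seconds and prints the speedup relative to a single thread. to_seconds is a hypothetical helper; the rows for 5 and 6 threads are left out because their seconds were not recorded.

```shell
# Convert a time such as "24m22.164s" into seconds.
to_seconds() {
  awk -v t="$1" 'BEGIN {
    split(t, a, "m")          # a[1] = minutes, a[2] = seconds (with "s")
    sub(/s$/, "", a[2])       # drop the trailing "s" if present
    printf "%.3f\n", a[1] * 60 + a[2]
  }'
}

base=$(to_seconds 24m22.164s)   # time with 1 thread

# threads:time pairs taken from the table above.
for entry in 2:18m40.653s 3:24m11.497s 4:22m35.210s; do
  threads=${entry%%:*}
  secs=$(to_seconds "${entry#*:}")
  awk -v k="$threads" -v b="$base" -v s="$secs" \
    'BEGIN { printf "%d threads: speedup %.2f\n", k, b / s }'
done
```

On these figures, two threads give a speedup of about 1.3, while three and four threads are close to 1, which fits the memory pressure described above.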
[Note that the threaded version is not yet optimally implemented. Many values are needlessly calculated once for each thread. Improvements to later versions of the program should increase the benefits from using additional threads.]