COLD

The makefile handles both compilation and installation. [See the installation notes for any issues on your system.] Typing make compiles the program with the default options. After this, typing make install installs the program. [You may need root access for this, depending on where you want to install the program.] To install with modified default options, type make optionname=value, with as many such assignments as desired. The following options can be modified in this way, listed with their default values:

Option	Default value
`NUMTHREADS`	`4`
`BUFFERSIZE`	`50`
`DEFAULTPARAMS`	`parametermatrices`
`DEFAULTVARIABLES`	`.variables`
`DEFAULTMATRIX`	`ECMq.txt`
`DEFAULTNUMTIMES`	`100`
`DEFAULTTREE`	`testtrees`
`DEFAULTDATA`	`testdata`
`DEFAULTMASKFILE`	`masks`
`DEFAULTMODELFILE`	`models`

The only options that are likely to need changing are NUMTHREADS, which should probably be set to the number of available processors but might need to be smaller if memory is a problem [see Choosing the number of threads for more detail]; DEFAULTDATA, which could be changed to a database stored on your system; and DEFAULTTREE, which could be set to a file containing the trees that you are particularly interested in. The other default files are specific to cold, and the user could simply edit those files instead.
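As a sketch of such an invocation, the following wraps the make call in a small shell function. The function name build_cold and the values passed to it are illustrative assumptions; only the option names NUMTHREADS and DEFAULTDATA come from the list above. Nothing runs until the function is called.

```shell
# Hypothetical wrapper: compile cold with two defaults overridden.
# The argument values are supplied by the caller; none is a shipped default.
build_cold() {
    threads=$1   # becomes NUMTHREADS
    data=$2      # becomes DEFAULTDATA
    make NUMTHREADS="$threads" DEFAULTDATA="$data"
}
```

For example, build_cold 2 /data/alignments (a made-up path) would compile with two threads and that data file, after which make install can be run as described above.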

The runtime configuration file can also be modified using the makefile. This time type make config optionname=value. The following options can be modified:

showeverysite

path

Alternatively, the variables file (called .variables by default) can be edited directly.

Possible make problems:

The following programs are required for correct execution of the makefile:

awk

tee

mv

cp

rm

echo

tar

gzip

rmdir

any C++ compiler

If any of these programs is missing, the program may not compile. The compiler options are based on the GNU C++ compiler (g++ version 4.4.3). To compile with another compiler, set the CFLAGS variable to the appropriate flags for your compiler, i.e.

make CFLAGS=...

awk and tee are used to edit the configvars.h header file. If these programs are not available on your system, you can manually edit the configvars.template file to produce your configvars.h, then run make with the setting NOAWK=1. (The value 1 is arbitrary; any value will do.) None of the other programs should present a problem. If there is a problem, you can manually compile the main executable cold by compiling the following C++ files:

Main.cpp CommandLine.cpp Likelihood.cpp Matrix.cpp Miscellaneous.cpp Optimisation.cpp Parameters.cpp Sequence.cpp SignalHandler.cpp Tree.cpp TreeLikelihood.cpp Input.cpp Debug.cpp ErrorHandler.cpp MixTreeLikelihood.cpp

You will need to compile with support for pthreads. [I might try to create a version which can be compiled without this support if there are many problems with this.] Other compilation options are not essential, but using optimisation is recommended, and adding debugging information improves your chance of getting useful replies to any problems.
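A manual build along these lines might look like the sketch below. It assumes that configvars.h has already been produced, that the compiler accepts GCC-style flags (the flags shown are the makefile's default CFLAGS), and that the output should be named cold. The function name compile_cold is hypothetical, and CXX falls back to g++ if it is not set.

```shell
# Sketch of a manual build of the main executable.
# Assumes configvars.h exists and the compiler understands GCC-style flags.
compile_cold() {
    ${CXX:-g++} -Wall -ggdb -O2 -pthread \
        Main.cpp CommandLine.cpp Likelihood.cpp Matrix.cpp Miscellaneous.cpp \
        Optimisation.cpp Parameters.cpp Sequence.cpp SignalHandler.cpp Tree.cpp \
        TreeLikelihood.cpp Input.cpp Debug.cpp ErrorHandler.cpp \
        MixTreeLikelihood.cpp \
        -o cold
}
```

Calling compile_cold from the source directory runs the compiler once over all fifteen files. The -pthread flag is the one option that is required; the others can be changed.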

Choosing the number of threads

As mentioned earlier, the computation of likelihoods for different sites is close to independent, so it lends itself well to parallel computation. If your system has multiple processors, each processor can work on a different site. Choosing the number of threads to be the same as the number of processors can therefore minimise computation time.
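One way to find a starting value is to ask the system how many processors are online. The sketch below assumes getconf supports the _NPROCESSORS_ONLN query, which holds on Linux and macOS but is not guaranteed everywhere. The count is of logical processors, so it includes hyperthreads. If the query fails, the sketch falls back to the default of 4.

```shell
# Count online logical processors; fall back to cold's default of 4
# when the query fails or returns something that is not a number.
ncpu=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || ncpu=4
case $ncpu in
    ''|*[!0-9]*) ncpu=4 ;;
esac
# Print the make command rather than running it, so it can be checked first.
echo "make NUMTHREADS=$ncpu"
```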

Unfortunately, reality is more complicated than this. The computation is very memory intensive. Computing a sitewise hessian for a tree with n branches, for a model with p parameters, requires about (3×61/2)n(p²+n) variables of type long double, each of which typically occupies 8 to 16 bytes. That is, each thread could need about n(p²+n) kilobytes of memory. For large trees with lots of parameters, this can quickly use up most of your available memory, actually causing the program to run more slowly. Therefore, if your memory is limited, or if other programs are also using it, it may be more efficient to use fewer threads. [On Unix-based systems, you can check memory usage with the top command.] [There are probably other ways too.]
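The estimate above can be evaluated with a few lines of awk. This is only a sketch of the arithmetic: the function name estimate_mb is hypothetical, the size of a long double is a parameter because it depends on the platform (16 bytes is assumed if none is given), and the result is a rough figure rather than a measurement.

```shell
# estimate_mb BRANCHES PARAMETERS [BYTES_PER_LONG_DOUBLE]
# Evaluates (3*61/2) * n * (p^2 + n) values and converts to megabytes.
estimate_mb() {
    awk -v n="$1" -v p="$2" -v b="${3:-16}" 'BEGIN {
        values = (3 * 61 / 2) * n * (p * p + n)   # number of long doubles
        printf "%.0f\n", values * b / (1024 * 1024)
    }'
}
```

For instance, estimate_mb 695 32 gives the figure for 695 branches and 32 parameters. The value 695 assumes an unrooted binary tree on 349 species (2×349−3 branches), which may not be how cold counts branches.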

As an example of typical memory usage, on my computer, for a tree with 6 species, using a model with 32 parameters, and 4 threads, the memory usage was about 750 megabytes. For a tree with 12 species, it was using 875 megabytes. For a tree with 25 species, it was using about 800 megabytes. For a tree with 349 species it used about 5 gigabytes.

For the tree with 349 species (using sequences of 987 nucleotides) the times for a single hessian calculation with various numbers of threads are as follows:

Number of Threads	Time for a single hessian calculation
1	24m22.164s
2	18m40.653s
3	24m11.497s
4	22m35.210s
5	14m??
6	20m??

These tests were done on a computer with two 2.66 GHz Intel i5 processors (which support hyperthreading), with 4 MB shared L3 cache and 6 GB total RAM.
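To compare the rows more easily, the complete timings can be converted to seconds and expressed as a speedup relative to one thread. The sketch below uses only the four fully recorded times from the table; the entries for 5 and 6 threads are left out because their seconds were not recorded.

```shell
# Times for 1 to 4 threads, converted from minutes and seconds to seconds.
speedups=$(awk 'BEGIN {
    t[1] = 24 * 60 + 22.164
    t[2] = 18 * 60 + 40.653
    t[3] = 24 * 60 + 11.497
    t[4] = 22 * 60 + 35.210
    # Speedup = single-thread time divided by the time with i threads.
    for (i = 1; i <= 4; i++) printf "%d %.2f\n", i, t[1] / t[i]
}')
echo "$speedups"
```

Each output line holds a thread count followed by its speedup, so values near 1 indicate little gain over a single thread.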

[Note that the threaded version is not yet optimally implemented. Many values are needlessly calculated once for each thread. Improvements to later versions of the program should increase the benefits from using additional threads.]

Other make variables that could be set by the user:

Variable	Meaning	Default value
`SHELL`	shell used for executing system commands.	`/bin/sh`
`CXX`	`c++` compiler.	set by OS.
`CFLAGS`	compiler flags.	`-Wall -ggdb -O2 -pthread`
`PACKAGENAME`	the name of the executable file.	`cold`
`VERSION`	the version of the program.	`1.0.0`
`forceremakeconfig`	indicates whether configuration file needs to be remade.
`forceremakevar`	indicates whether .variables file needs to be remade.

`mainsearchdir`	directory in which application files are based.	`$(HOME)/$(PACKAGENAME)`
`prefix`	directory in which installed files should be stored.	`/usr/local/`
`exec_prefix`	directory in which executable files should be stored.	`$(prefix)`
`bindir`	directory in which the actual binary executable files should be stored.	`$(exec_prefix)/bin`
`DESTDIR`	user input prefix to installation directory.
`installbin`	executable files to be installed.	`$(PACKAGENAME) DNAml Utilities/setup Utilities/reordermatrix Utilities/summats Utilities/stripambiguous`
`installdat`	data files to be installed.	`.variables .flags Models/parametermatrices Models/ECMq.txt Data/testtrees Data/testdata Models/mixture Models/models Models/masks Models/standardmodelmatrices Utilities/codonlistexample.txt Utilities/coldcodonorder.txt Utilities/codonlistalph.txt Models/varnames`
`installdoc`	documentation files to be installed.	`Documentation/cold.info Documentation/files.info Documentation/installation.info Documentation/utilities.info`
`INSTALL`	command for installing programs.	set by OS.
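As an illustration of how the directory variables might be used, the sketch below installs under a prefix chosen by the caller, optionally staged beneath DESTDIR. The variable names prefix and DESTDIR come from the table above; the function name install_cold is hypothetical, and whether other paths (such as mainsearchdir) also need adjusting for a non-default location is not covered here.

```shell
# Hypothetical wrapper: install under a chosen prefix, optionally staged.
install_cold() {
    prefix=$1        # e.g. a directory you can write to without root
    destdir=${2:-}   # optional staging root; empty means install in place
    make install prefix="$prefix" DESTDIR="$destdir"
}
```

With the defaults shown in the table, bindir expands to $(exec_prefix)/bin, so a call such as install_cold "$HOME/cold" would be expected to place the executables under $HOME/cold/bin.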

Other make options available:

`make all`	default action. Makes all executable files.
`make utilities`	just makes the utility programs `reordermatrix`, `readmatricesfromcols`, `summats` and `stripambiguous`
`make cold`
`make DNAml`
`make Utilities/reordermatrix`
`make Utilities/readmatricesfromcols`
`make Utilities/summats`
`make Utilities/stripambiguous`
`make clean`	deletes all files made by previous makes, leaving just the original distribution files.
`make config`	updates the `.variables` file.
`make dist`	collects all the files in the distribution into a zipped file, ready for distribution.
`make backup`	like `make dist`, but copies the file to a backup directory.
`make uninstall`	deletes all installed files.
`make remove`	collects all the distribution files into a zipped file, then deletes everything except the zipped file.
`make install`	copies the executable files into a directory in the search path, where they can be automatically run.

Codon Optimal Likelihood Discoverer
