Codon Optimal Likelihood Discoverer


Installation:

The makefile should handle both compilation and installation. [See the installation notes for any issues on your system.] Typing make compiles the program with the default options; typing make install afterwards installs it. [You may need root access for this, depending on where you want to install the program.] To install with modified default options, type make optionname=value, with as many such assignments as desired. The following options can be modified in this way; they are listed with their default values:

NUMTHREADS        4
BUFFERSIZE        50
DEFAULTPARAMS     parametermatrices
DEFAULTVARIABLES  .variables
DEFAULTMATRIX     ECMq.txt
DEFAULTNUMTIMES   100
DEFAULTTREE       testtrees
DEFAULTDATA       testdata
DEFAULTMASKFILE   masks
DEFAULTMODELFILE  models

The only options likely to need changing are NUMTHREADS, which should probably be set to the number of available processors but might need to be smaller if memory is a problem [see Choosing the number of threads for more detail]; DEFAULTDATA, which could be changed to a database stored on your system; and DEFAULTTREE, which could be set to a file containing the trees that you are particularly interested in. The other default files are specific to cold, so the user can simply edit those files instead.
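As an illustration of the override syntax described above, the commands below build with two threads and a different default tree file, then install. The value 2 and the file name mytrees are placeholders chosen for this example, not values shipped with cold.

```sh
# Compile with two threads and a user-chosen default tree file.
# "mytrees" is a hypothetical file name; substitute your own.
make NUMTHREADS=2 DEFAULTTREE=mytrees

# Install the result (may require root access, depending on the target directory).
make install
```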

The runtime configuration file can also be modified using the makefile. This time type make config optionname=value. The following options can be modified:

  • showeverysite
  • path

    Alternatively, the variables file (called .variables by default) can be modified directly.

    Other make variables that could be set by the user:

    Variable           Meaning                                                   Default value
    SHELL              shell used for executing system commands.                 /bin/sh
    CXX                C++ compiler.                                             set by OS
    CFLAGS             compiler flags.                                           -Wall -ggdb -O2 -pthread
    PACKAGENAME        the name of the executable file.                          cold
    VERSION            the version of the program.                               1.0.0
    forceremakeconfig  indicates whether configuration file needs to be remade.
    forceremakevar     indicates whether .variables file needs to be remade.
    mainsearchdir      directory in which application files are based.           $(HOME)/$(PACKAGENAME)
    prefix             directory in which installed files should be stored.      /usr/local/
    exec_prefix        directory in which executable files should be stored.     $(prefix)
    bindir             directory in which the actual binary executable files     $(exec_prefix)/bin
                       should be stored.
    DESTDIR            user input prefix to installation directory.
    installbin         executable files to be installed.                         $(PACKAGENAME) DNAml Utilities/setup Utilities/reordermatrix
                                                                                 Utilities/summats Utilities/stripambiguous
    installdat         data files to be installed.                               .variables .flags Models/parametermatrices Models/ECMq.txt
                                                                                 Data/testtrees Data/testdata Models/mixture Models/models
                                                                                 Models/masks Models/standardmodelmatrices
                                                                                 Utilities/codonlistexample.txt Utilities/coldcodonorder.txt
                                                                                 Utilities/codonlistalph.txt Models/varnames
    installdoc         documentation files to be installed.                      Documentation/cold.info Documentation/files.info
                                                                                 Documentation/installation.info Documentation/utilities.info
    INSTALL            command for installing programs.                          set by OS
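For instance, assuming the makefile honours prefix in the way the table describes, installing into a directory under your home directory (which avoids the need for root access) might look like the sketch below. The directory name cold-install is a placeholder.

```sh
# Install under $HOME/cold-install instead of the default /usr/local/.
# With the default exec_prefix and bindir from the table, the binaries
# would then be placed in $HOME/cold-install/bin.
make install prefix="$HOME/cold-install"
```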

    Other make options available:

    make all                             default action. Makes all executable files.
    make utilities                       just makes the utility programs reordermatrix, readmatricesfromcols, summats and stripambiguous.
    make cold
    make DNAml
    make Utilities/reordermatrix
    make Utilities/readmatricesfromcols
    make Utilities/summats
    make Utilities/stripambiguous
    make clean                           deletes all files made by previous makes, leaving just the original distribution files.
    make config                          updates the .variables file.
    make dist                            collects all the files in the distribution into a zipped file, ready for distribution.
    make backup                          like make dist, but copies the file to a backup directory.
    make uninstall                       deletes all installed files.
    make remove                          collects all the distribution files into a zipped file, then deletes everything except the zipped file.
    make install                         copies the executable files into a directory in the search path, where they can be automatically run.

    Possible make problems:

    The following programs are required for correct execution of the makefile:

  • awk
  • tee
  • mv
  • cp
  • rm
  • echo
  • tar
  • gzip
  • rmdir
  • any C++ compiler

    If any of these programs is missing, the program may not compile. The compiler options are based on the GNU C++ compiler (g++ version 4.4.3). To compile with another compiler, set the CFLAGS variable to the appropriate flags for that compiler, i.e.

    make CFLAGS=...

    The use of awk and tee is for editing the configvars.h header file. If these programs are not available on your system, you can manually edit the configvars.template file to get your configvars.h, then run make with the setting NOAWK=1. (The value 1 is arbitrary; any value will do.) None of the other programs should present a problem. If there is a problem, you can manually compile the main executable cold by compiling the following C++ files:

    Main.cpp CommandLine.cpp Likelihood.cpp Matrix.cpp Miscellaneous.cpp Optimisation.cpp Parameters.cpp Sequence.cpp SignalHandler.cpp Tree.cpp TreeLikelihood.cpp Input.cpp Debug.cpp ErrorHandler.cpp MixTreeLikelihood.cpp

    You will need to compile with support for pthreads. [I might try to create a version which can be compiled without this support if there are many problems with this.] Other compilation options are not essential, but using optimisation is recommended, and adding debugging information improves your chance of getting useful replies to any problems.
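A manual build along these lines might look like the following sketch. It assumes that g++ is the compiler, that configvars.h has already been produced from configvars.template as described above, and that the sources need no libraries beyond pthreads. The flags are the CFLAGS defaults listed earlier.

```sh
# Compile and link the main executable by hand, bypassing the makefile.
# -pthread is required; -O2 and -ggdb are the recommended optimisation
# and debugging flags.
g++ -Wall -ggdb -O2 -pthread -o cold \
    Main.cpp CommandLine.cpp Likelihood.cpp Matrix.cpp Miscellaneous.cpp \
    Optimisation.cpp Parameters.cpp Sequence.cpp SignalHandler.cpp Tree.cpp \
    TreeLikelihood.cpp Input.cpp Debug.cpp ErrorHandler.cpp MixTreeLikelihood.cpp
```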

    Choosing the number of threads

    As mentioned earlier, the computation of likelihoods for different sites is close to independent, so it lends itself well to parallel computation. If your system has multiple processors, each processor can work on a different site. Choosing the number of threads to be the same as the number of processors can therefore minimise computation time.
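One way to find the number of processors to pass as NUMTHREADS is sketched below. It assumes that getconf or nproc is available (neither is among the programs the makefile requires), and the function name is invented for this example. Both tools report logical processors, so a hyperthreaded core may be counted more than once.

```sh
#!/bin/sh
# Print the number of online logical processors.
# Tries getconf first, then nproc; falls back to 4, the makefile's
# default NUMTHREADS, if neither tool is available.
count_processors() {
    getconf _NPROCESSORS_ONLN 2>/dev/null || nproc 2>/dev/null || echo 4
}

# Show the build command that matches this machine.
echo "make NUMTHREADS=$(count_processors)"
```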

    Unfortunately, reality is more complicated than this. The computation is very memory intensive. Computing a sitewise hessian for a tree with n branches, for a model with p parameters, requires total memory of about (3×61/2)n(p²+n) variables of type long double, each of which typically occupies somewhere around 8-16 bytes. That is, each thread could need about n(p²+n) kilobytes of memory. For large trees with many parameters, this can quickly use up most of your available memory, which actually causes the program to run more slowly. Therefore, if your memory is limited, or if other programs are also using it, it may be more efficient to use fewer threads. [On Unix-based systems, you can check memory usage with the top command; there are probably other ways too.]

    As an example of typical memory usage, on my computer, for a tree with 6 species, using a model with 32 parameters, and 4 threads, the memory usage was about 750 megabytes. For a tree with 12 species, it was using 875 megabytes. For a tree with 25 species, it was using about 800 megabytes. For a tree with 349 species it used about 5 gigabytes.
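The rule of thumb of about n(p²+n) kilobytes per thread can be turned into a quick estimate. The sketch below is illustrative only: the function names are invented for this example, and the branch count of 695 assumes that the 349-species tree is an unrooted binary tree with 2×349−3 branches, which the text above does not state.

```sh
#!/bin/sh
# Per-thread memory estimate in kilobytes from the rule of thumb n(p^2 + n).
#   $1 = n, number of branches in the tree
#   $2 = p, number of model parameters
estimate_kb() {
    echo $(( $1 * ($2 * $2 + $1) ))
}

# Largest thread count whose combined estimate fits a memory budget,
# never below 1 and never above the number of processors.
#   $1 = n, $2 = p, $3 = memory budget in kilobytes, $4 = number of processors
max_threads() {
    per_thread=$(estimate_kb "$1" "$2")
    fit=$(( $3 / per_thread ))
    if [ "$fit" -lt 1 ]; then fit=1; fi
    if [ "$fit" -gt "$4" ]; then fit=$4; fi
    echo "$fit"
}

estimate_kb 695 32              # assumed branch count for the 349-species tree
max_threads 695 32 3000000 4    # about 3 GB free, 4 processors
```

Under those assumptions the estimate is about 1.2 gigabytes per thread, or roughly 4.8 gigabytes for 4 threads, which is of the same order as the 5 gigabytes observed for the 349-species tree. With only about 3 gigabytes free, the same rule would suggest 2 threads rather than 4.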

    For the tree with 349 species (using sequences of 987 nucleotides) the times for a single hessian calculation with various numbers of threads are as follows:

    Number of threads    Time for a single hessian calculation
    1                    24m22.164s
    2                    18m40.653s
    3                    24m11.497s
    4                    22m35.210s
    5                    14m??
    6                    20m??
    These tests were run on a computer with two 2.66 GHz Intel i5 processors (which support hyperthreading), a 4 MB shared L3 cache, and 6 GB of RAM in total.
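To compare such timings, it can help to express them as a speedup relative to the single-thread run. The sketch below does this with awk for two of the fully recorded times. The function names are invented for this example, and the input format is assumed to be exactly minutes, "m", seconds, "s".

```sh
#!/bin/sh
# Convert a time written like 24m22.164s into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Speedup of a run relative to a baseline time: baseline divided by run.
#   $1 = baseline time, $2 = time of the run being compared
speedup() {
    base=$(to_seconds "$1")
    run=$(to_seconds "$2")
    awk -v b="$base" -v r="$run" 'BEGIN { printf "%.2f\n", b / r }'
}

speedup 24m22.164s 18m40.653s   # 2 threads against 1 thread
speedup 24m22.164s 22m35.210s   # 4 threads against 1 thread
```

For the times recorded above, this gives a speedup of about 1.30 with two threads and about 1.08 with four.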

    [Note that the threaded version is not yet optimally implemented. Many values are needlessly calculated once for each thread. Improvements to later versions of the program should increase the benefits from using additional threads.]

