This page contains resources relating to the IMR/DENOM package for assembling genomes from Illumina short-read sequence data. This package was used to assemble 18 accessions of A. thaliana, as described in our paper Multiple reference genomes and transcriptomes for Arabidopsis thaliana, Nature 2011. IMR v0.1.0 has also been used in Mouse genomic variation and its effect on phenotypes and gene regulation, Nature 2011.
IMR/DENOM comprises three independent programs, devised and written by Xiangchao Gan. The programs can be run separately or as a pipeline.
Descriptions of the algorithms are available.
Free for non-profit research purposes. Please contact authors otherwise. The program itself may not be modified in any way and no redistribution is allowed.
No condition is made or to be implied, nor is any warranty given or to be implied, as to the accuracy of IMR/DENOM, or that it will be suitable for any particular purpose or for use under any specific conditions, or that the content or use of IMR/DENOM will not constitute or result in infringement of third-party rights.
Authors
Currently the software is downloadable as precompiled Linux binaries. Version 0.5.0 is now available. Please send bug reports and comments to the authors. We also provide a step-by-step example of how we assembled the Bur-0 accession (download; note that the data is about 3.3G and will probably take a long time to download).
Make sure all the downloaded binaries are in a directory in your executable path.
IMR/DENOM needs samtools (v1.3.1+), picard, BWA and SOAPdenovo. By default, IMR/DENOM uses BWA to align reads and then postprocesses divergent reads using Stampy. Users need to install and test all of the above software on their own. Note that Stampy requires Python and picard requires a suitable version of Java. To allow IMR/DENOM to find picard, you can copy it to the default external/ subfolder, or define PICARD_PATH as the directory containing the binary file in your project description file.
You can install all of the above software by simply running: ./bootstrap
in your IMR/DENOM folder. All of the above software will then be installed into the external/ subfolder inside the IMR/DENOM folder. You still need to test picard and Stampy, but you do not need to define the system variable PICARD_PATH.
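Before launching a run, it can help to confirm that the external tools are actually reachable. The following is a minimal sketch and not part of IMR/DENOM: the function name missing_tools is hypothetical, and the executable names in the usage lines (samtools, bwa) are assumptions about how those tools are installed on your system. It only inspects the executable path, so it does not cover picard, which is located through external/ or PICARD_PATH.

```python
import shutil


def missing_tools(names):
    """Return the subset of `names` that cannot be found on the executable path."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Executable names are assumptions; adjust them to match your installation.
    missing = missing_tools(["samtools", "bwa"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

If the tools were installed with ./bootstrap into external/, they may not be on the executable path at all, in which case a "not found" result here is expected.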
simbam files can be directly downloaded from simulation bamfile. At the moment, simbam files for Arabidopsis are available.
You can also generate your own simbam file by running the script sim_imrdenom.pl with the reference genome as a parameter. It will generate two simulated read files and a description file, sim.t. Run
imr easyrun --imrnocall [-m bwa] sim.t
You will get a bam file. That is the simbam you need.
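The two steps above can be chained from a script. This is only a sketch under stated assumptions: the helper names simbam_commands and run_all are hypothetical, and it assumes that sim_imrdenom.pl is executable and on your path, that it takes the reference FASTA as its only argument, and that it writes sim.t into the working directory. The optional -m bwa flag is added only on request, mirroring the bracketed form in the command above.

```python
import subprocess


def simbam_commands(reference_fasta, use_bwa=False, description_file="sim.t"):
    """Build the two command lines described above as argument lists."""
    simulate = ["sim_imrdenom.pl", reference_fasta]
    easyrun = ["imr", "easyrun", "--imrnocall"]
    if use_bwa:
        easyrun += ["-m", "bwa"]  # optional, shown as [-m bwa] in the text
    easyrun.append(description_file)
    return [simulate, easyrun]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    # "reference.fa" is a placeholder path, not a file shipped with IMR/DENOM.
    run_all(simbam_commands("reference.fa", use_bwa=True))
```

The default is a dry run, so the commands are printed for inspection first; pass dry_run=False once they look right.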
A typical project might involve the assembly of more than one library with different insert sizes. The description file tells IMR/DENOM how to interpret the input files and group them together. Please read the details carefully. A simple example (example1) is here, and a more complicated example with multiple libraries (example2) is also provided. Users can revise them accordingly.
imrdenom <proj_description_file> will finish all steps of the assembly for you. However, we strongly encourage users to run IMR, DENOM and MCMERGE separately for better parameter tuning and a lower computational burden.
To run each component independently, please find the guidance here.
Users can generate the fasta file easily using imr getgenome with the .sdi file if needed.
Each line of an sdi file consists of the columns chromosome, position, length, reference base, consensus base, and quality value [*0-9] (* or a numeric value from 0-255).
Chr1 723 0 C T 2
Chr1 2719 -4 TGCA - 1
Chr1 6786 1 - T 1
Chr1 16786 -4 AGGCA T 1
The columns after the 6th are optional. In the IMR/DENOM output, the 7th to 9th columns are: HMQ coverage (the coverage from reads with mapq >= 30), SNP Phred score, and HMQ consensus base (the consensus base when considering only reads with mapq >= 30).
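For downstream scripting, the sdi columns described above can be read with a few lines of Python. This is a minimal sketch, not code from IMR/DENOM: the names SdiRecord and parse_sdi_line are hypothetical, whitespace-separated columns are an assumption, and the check length = len(consensus) - len(reference) (with "-" counted as empty) is inferred only from the four example rows rather than from a formal specification. Optional columns beyond the 6th are kept as raw strings because their exact types are not stated here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SdiRecord:
    chrom: str
    pos: int
    length: int
    ref: str                 # empty string when the file shows "-"
    cons: str                # empty string when the file shows "-"
    quality: Optional[int]   # None when the file shows "*"
    extra: Tuple[str, ...]   # optional columns 7+, kept as raw strings

    def length_matches_alleles(self) -> bool:
        # Assumption inferred from the example rows: length = len(cons) - len(ref).
        return self.length == len(self.cons) - len(self.ref)


def parse_sdi_line(line: str) -> SdiRecord:
    """Parse one sdi line; columns are assumed to be whitespace-separated."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(fields)}")
    chrom, pos, length, ref, cons, qual = fields[:6]
    quality = None if qual == "*" else int(qual)
    if quality is not None and not 0 <= quality <= 255:
        raise ValueError(f"quality out of range 0-255: {quality}")
    return SdiRecord(
        chrom=chrom,
        pos=int(pos),
        length=int(length),
        ref="" if ref == "-" else ref,
        cons="" if cons == "-" else cons,
        quality=quality,
        extra=tuple(fields[6:]),
    )


if __name__ == "__main__":
    rows = [
        "Chr1 723 0 C T 2",
        "Chr1 2719 -4 TGCA - 1",
        "Chr1 6786 1 - T 1",
        "Chr1 16786 -4 AGGCA T 1",
    ]
    for row in rows:
        rec = parse_sdi_line(row)
        print(rec.chrom, rec.pos, rec.length, rec.length_matches_alleles())
```

Treating "-" as an empty allele keeps substitutions, insertions and deletions in one record type, so a caller can tell them apart from the sign of length alone.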