IMR/DENOM short-read analysis software

This page contains resources relating to the IMR/DENOM package for assembling genomes from Illumina short-read sequence data. This package was used to assemble 18 accessions of A. thaliana, as described in our paper Multiple reference genomes and transcriptomes for Arabidopsis thaliana (Nature, 2011). IMR v0.1.0 has also been used in Mouse genomic variation and its effect on phenotypes and gene regulation (Nature, 2011).

IMR/DENOM comprises three independent programs, devised and written by Xiangchao Gan. The programs can be run separately or as a pipeline.

IMR iterative realignment to the reference genome
DENOM integration of de novo-assembled contigs with a reference genome for variant calling
MCMERGE integration of variant calls from the mapping-based (IMR) and de novo assembly-based (DENOM) methods

Descriptions of the algorithms are available.

LICENCE

Free for non-profit research purposes. Please contact authors otherwise. The program itself may not be modified in any way and no redistribution is allowed.

No condition is made or to be implied, nor is any warranty given or to be implied, as to the accuracy of IMR/DENOM, or that it will be suitable for any particular purpose or for use under any specific conditions, or that the content or use of IMR/DENOM will not constitute or result in infringement of third-party rights.

Authors

Xiangchao Gan gan@mpipz.mpg.de
Richard Mott r.mott@ucl.ac.uk

Download

The software is currently available for download as precompiled Linux binaries; the current version is 0.5.0. Please send bug reports and comments to the authors. We also provide a step-by-step example of how we assembled the Bur-0 accession (download). Note that the data set is about 3.3 GB and may take a long time to download.

Installation

Make sure all the downloaded binaries are in a directory on your executable path.

IMR/DENOM needs samtools (v1.3.1+), picard, BWA and SOAPdenovo. By default, IMR/DENOM uses BWA to align reads and then post-processes divergent reads using Stampy. Users need to install and test all of the above software themselves. Note that Stampy requires Python and picard requires a suitable version of Java. To allow IMR/DENOM to find picard, you can copy it to the default external/ subfolder, or define PICARD_PATH as the directory containing the binary file in your project description file.

Alternatively, you can install all of the above software by running ./bootstrap in your IMR/DENOM folder. Everything is then installed into the external/ subfolder inside the IMR/DENOM folder. You still need to test picard and Stampy, but you do not need to define the system variable PICARD_PATH.
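If you installed the external tools yourself rather than through ./bootstrap, a quick check that they are visible on the executable path can save a failed run. The following is a minimal Python sketch, not part of IMR/DENOM. The executable names in REQUIRED_TOOLS are assumptions and will differ between installations (SOAPdenovo in particular is often installed under a different binary name), and picard is not checked because it is normally distributed as a Java archive rather than as an executable.

```python
import shutil

# Assumed executable names; edit this list to match your own installation.
REQUIRED_TOOLS = ["samtools", "bwa", "SOAPdenovo", "stampy.py", "java", "python"]


def missing_tools(tools, which=shutil.which):
    """Return the names in `tools` that cannot be found on the executable path."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

Finding a tool on the path does not show that it works, so each program should still be tested on its own.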

Prepare genome profile file in bam format (needed by MCMERGE)

We recommend downloading the simbam file directly from simulation bamfile. At the moment, simbam files are available for Arabidopsis. You can also generate your own simbam file by running the script sim_imrdenom.pl with the reference genome as its parameter. It generates two simulated read files and a description file, sim.t. Then run

 imr easyrun --imrnocall [-m bwa] sim.t

to obtain a bam file. This is the simbam file you need.

Running IMR/DENOM

Prepare Project Description File

A typical project might involve the assembly of more than one library with different insert sizes. The description file tells IMR/DENOM how to interpret the input files and how to group them together. Please read the details carefully. A simple example example1 is here and a more complicated example with multiple libraries example2 is also provided. Users can adapt them as needed.

Single command mode

 imrdenom <proj_description_file>

To run each component independently, please see the guidance here .

Output

sdi format

If needed, users can easily generate the fasta file from the .sdi file using imr getgenome.

The sdi (Snps, Deletions and Insertions) file format

Each line of an sdi file consists of the columns chromosome, position, length, reference base, consensus base and quality value (* or a numeric value from 0 to 255).

chromosome The same name as the chromosome id in reference fasta file
position 1-based leftmost position
length the difference in length between the changed sequence and the reference (0 for SNPs, negative for deletions, positive for insertions)
reference base [-A-Z]+ (regular expression range)
consensus base [-A-Z]+ (regular expression range); IUPAC codes are used for heterozygous sites
quality value * means no value is available; [0-9]+ gives the quality of this variant. It is not necessarily a Phred quality.

Chr1    723      0      C           T       2
Chr1    2719    -4      TGCA        -       1
Chr1    6786     1      -           T       1
Chr1    16786   -4      AGGCA       T       1
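To make the column definitions concrete, here is a minimal Python sketch of a reader for sdi lines. It is not part of IMR/DENOM. It rests on assumptions drawn from the description and example lines above: columns are separated by whitespace, any columns beyond the sixth are ignored, "-" stands for an empty sequence, and the length column equals the consensus length minus the reference length. The names SdiRecord, parse_sdi_line, variant_type, length_is_consistent and is_heterozygous are hypothetical helpers introduced for illustration.

```python
import re
from typing import NamedTuple, Optional

# Both base columns are described as matching [-A-Z]+
SEQ_PATTERN = re.compile(r"[-A-Z]+")

# Standard two-base IUPAC ambiguity codes, as used for heterozygous sites
IUPAC_HET = {"R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}

# The four example lines shown above
EXAMPLE_LINES = [
    "Chr1    723      0      C           T       2",
    "Chr1    2719    -4      TGCA        -       1",
    "Chr1    6786     1      -           T       1",
    "Chr1    16786   -4      AGGCA       T       1",
]


class SdiRecord(NamedTuple):
    chromosome: str
    position: int            # 1-based leftmost position
    length: int              # 0 = SNP, negative = deletion, positive = insertion
    reference: str
    consensus: str
    quality: Optional[int]   # None when the file holds "*"


def parse_sdi_line(line: str) -> SdiRecord:
    """Parse one whitespace-separated sdi line (extra columns are ignored)."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
    chrom, pos, length, ref, cons, qual = fields[:6]
    for seq in (ref, cons):
        if not SEQ_PATTERN.fullmatch(seq):
            raise ValueError(f"unexpected characters in {seq!r}")
    quality = None if qual == "*" else int(qual)
    return SdiRecord(chrom, int(pos), int(length), ref, cons, quality)


def variant_type(rec: SdiRecord) -> str:
    """Classify a record by the sign of its length column."""
    if rec.length == 0:
        return "snp"
    return "deletion" if rec.length < 0 else "insertion"


def length_is_consistent(rec: SdiRecord) -> bool:
    """Check length == len(consensus) - len(reference), treating '-' as empty."""
    ref_len = len(rec.reference.replace("-", ""))
    cons_len = len(rec.consensus.replace("-", ""))
    return cons_len - ref_len == rec.length


def is_heterozygous(rec: SdiRecord) -> bool:
    """True if the consensus contains a two-base IUPAC ambiguity code."""
    return any(base in IUPAC_HET for base in rec.consensus)


if __name__ == "__main__":
    for text in EXAMPLE_LINES:
        rec = parse_sdi_line(text)
        print(rec.chromosome, rec.position, variant_type(rec),
              "consistent" if length_is_consistent(rec) else "inconsistent")
```

Because the quality value is not necessarily a Phred score, the sketch keeps it as a plain integer and does not convert it to a probability.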
