Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads
- Rei Kajitani1,
- Kouta Toshimoto1,
- Hideki Noguchi2,
- Atsushi Toyoda2,
- Yoshitoshi Ogura3,
- Miki Okuno1,
- Mitsuru Yabana1,
- Masahira Harada1,
- Eiji Nagayasu3,
- Haruhiko Maruyama3,
- Yuji Kohara2,
- Asao Fujiyama2,
- Tetsuya Hayashi3 and
- Takehiko Itoh1,4
- * Corresponding author; email: takehiko{at}bio.titech.ac.jp
Abstract
Although many de novo genome assembly projects have recently been performed using high-throughput sequencers, assembling highly heterozygous diploid genomes remains a major scientific challenge owing to the increased complexity of the predominantly employed de Bruijn graph structure. To meet the increasing demand for sequencing non-model and/or wild-type samples, inbred lines or fosmid-based hierarchical sequencing methods are employed in most cases to overcome such problems. However, these methods are costly and time consuming, forfeiting the advantage of massively parallel sequencing. Here, we describe a novel de novo assembler, Platanus, which can effectively manage high-throughput data from heterozygous samples. Platanus assembles DNA fragments (reads) into contigs by constructing de Bruijn graphs with automatically optimized k-mer sizes, followed by scaffolding of contigs based on paired-end information. The complicated graph structures that result from heterozygosity are simplified during both the contig assembly step and the scaffolding step. We evaluated the assembly results on eukaryotic samples with various levels of heterozygosity. Compared with other assemblers, Platanus produced assemblies with a larger NG50 length, without any accompanying loss of accuracy, on both simulated and real data. In addition, Platanus recorded the largest NG50 values for two of the three low-heterozygosity species used in the de novo assembly contest, Assemblathon2. Platanus therefore provides a novel and efficient approach for the assembly of gigabase-sized, highly heterozygous genomes and is also an attractive alternative to the existing assemblers designed for genomes of lower heterozygosity.
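To illustrate why heterozygosity complicates a de Bruijn graph, the following is a minimal Python sketch and not the Platanus implementation. It assumes error-free reads, treats two short made-up haplotype strings as the reads, ignores reverse complements, and uses a fixed k rather than an automatically optimized one. The function names `build_de_bruijn` and `branching_nodes` are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a simple de Bruijn graph.

    Nodes are (k-1)-mers; each k-mer in a read adds an edge from its
    (k-1)-length prefix to its (k-1)-length suffix.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph):
    """Return the nodes that have more than one successor."""
    return sorted(node for node, succ in graph.items() if len(succ) > 1)


# Toy haplotypes (illustrative only) that differ by one SNP: G versus T.
hap_a = "ACGTTGCAGGCTA"
hap_b = "ACGTTGCATGCTA"

homozygous = build_de_bruijn([hap_a], k=5)
heterozygous = build_de_bruijn([hap_a, hap_b], k=5)

print("branching nodes, one haplotype: ", branching_nodes(homozygous))
print("branching nodes, two haplotypes:", branching_nodes(heterozygous))
```

With these toy inputs, the single-haplotype graph is a simple path with no branching node, whereas the two-haplotype graph has one branching node where the SNP opens a "bubble" of two parallel paths that later rejoin. An assembler has to recognize and simplify such structures to avoid breaking contigs at every heterozygous site.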
- Received December 6, 2013.
- Accepted April 21, 2014.
- Published by Cold Spring Harbor Laboratory Press
This manuscript is Open Access.
This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 3.0 Unported), as described at http://creativecommons.org/licenses/by-nc/3.0/.