The MUMmer system and the genome sequence aligner nucmer included within it are among the most widely used alignment packages in genomics. Since the last major release of MUMmer version 3 in 2004, it has been applied to many types of problems including aligning whole genome sequences, aligning reads to a reference genome, and comparing different assemblies of the same genome. Despite its broad utility, MUMmer3 has limitations that can make it difficult to use for large genomes and for the very large sequence data sets that are common today. In this paper we describe MUMmer4, a substantially improved version of MUMmer that addresses genome size constraints by changing the 32-bit suffix tree data structure at the core of MUMmer to a 48-bit suffix array, and that offers improved speed through parallel processing of input query sequences. With a theoretical limit on the input size of 141 Tbp, MUMmer4 can now work with input sequences of any biologically realistic length. We show that as a result of these enhancements, the nucmer program in MUMmer4 is easily able to handle alignments of large genomes; we illustrate this with an alignment of the human and chimpanzee genomes, which allows us to compute that the two species are 98% identical across 96% of their length. With the enhancements described here, MUMmer4 can also be used to efficiently align reads to reference genomes, although it is less sensitive and accurate than the dedicated read aligners. The nucmer aligner in MUMmer4 can now be called from scripting languages such as Perl, Python and Ruby. These improvements make MUMmer4 one of the most versatile genome alignment packages available.

Citation: Marçais G, Delcher AL, Phillippy AM, Coston R, Salzberg SL, Zimin A (2018) MUMmer4: A fast and versatile genome alignment system.

University of Technology Sydney, AUSTRALIA

Received: Aug; Accepted: Janu; Published: January 26, 2018

This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Data Availability: The data used for this paper is available from the NCBI SRA, and from the Cold Spring Harbor Laboratory web site.

Funding: This research was supported in part by the U.S. National Institutes of Health under grant R01 GM083873 to Steven Salzberg, in part by the Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative through Grant GBMF4554 to Carl Kingsford, and in part by National Science Foundation Grants IOS-1238231 to Jan Dvorak, IOS-144893 to Herbert Aldwinckle, Keithanne Mockaitis, Aleksey Zimin, James Yorke and Marcela Yepes. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

This is a PLOS Computational Biology Software paper.

Since the 2004 publication of the MUMmer3 sequence alignment package, the bioinformatics landscape has changed dramatically. The cost of generating sequence data has decreased rapidly, leading to an exponential increase in the number of assembled genomes and a proliferation of sequencing-based assays. Along with these increases came a corresponding increase in the demand for efficient sequence alignment algorithms. Applications of alignment include resequencing humans to discover single nucleotide polymorphisms (SNPs), sequencing and comparison of different species to detect evolutionarily conserved elements, alignment to detect large-scale chromosomal rearrangements, and more. Alignment algorithms are also used to create and validate genome assemblies and to compare them from one version of a genome to the next. These and other applications motivate the need for fast and reliable sequence alignment techniques that are capable of handling large genomes and large volumes of sequence data.
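The text mentions that MUMmer4 replaces a suffix tree with a suffix array as its core index. As a rough illustration of what a suffix array is and how it supports exact-match lookup, here is a minimal Python sketch. It is not MUMmer4's implementation: the function names `build_suffix_array` and `find_occurrences` are hypothetical, the construction is the naive sort-all-suffixes approach (far too slow and memory-hungry for real genomes), and it finds only exact occurrences of a short pattern rather than maximal unique matches.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of `text`, in sorted order.

    Naive construction for illustration only: it sorts full suffix strings,
    which is impractical for genome-scale input.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return the sorted start positions where `pattern` occurs in `text`.

    Suffixes sharing a prefix are contiguous in the suffix array, so two
    binary searches delimit the block of suffixes that start with `pattern`.
    """
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[start:lo])


if __name__ == "__main__":
    reference = "ACGTACGTGA"  # toy sequence, not real data
    sa = build_suffix_array(reference)
    print(find_occurrences(reference, sa, "ACGT"))  # [0, 4]
```

A suffix array stores one integer per position of the indexed sequence, so the width of that integer bounds the sequence length that can be indexed. That is why moving from 32-bit to wider indices matters for large genomes.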