Result: The fragment assembly string graph.

Title:
The fragment assembly string graph.
Authors:
Myers EW; Department of Computer Science, University of California Berkeley, CA, USA. gene@eecs.berkeley.edu
Source:
Bioinformatics (Oxford, England) [Bioinformatics] 2005 Sep 01; Vol. 21 Suppl 2, pp. ii79-85.
Publication Type:
Journal Article
Language:
English
Journal Info:
Publisher: Oxford University Press; Country of Publication: England; NLM ID: 9808944; Publication Model: Print; Cited Medium: Internet; ISSN: 1367-4811 (Electronic); Linking ISSN: 13674803; NLM ISO Abbreviation: Bioinformatics; Subsets: MEDLINE
Imprint Name(s):
Original Publication: Oxford : Oxford University Press, c1998-
Entry Date(s):
Date Created: 20051006 Date Completed: 20070827 Latest Revision: 20240109
Update Code:
20250114
DOI:
10.1093/bioinformatics/bti1114
PMID:
16204131
Database:
MEDLINE

Further Information

We present a concept and formalism, the string graph, which represents all that is inferable about a DNA sequence from a collection of shotgun sequencing reads collected from it. We give time- and space-efficient algorithms for constructing a string graph given the collection of overlaps between the reads and, in particular, present a novel linear expected time algorithm for transitive reduction in this context. The result demonstrates that the decomposition of reads into k-mers employed in the de Bruijn graph approach described earlier is not essential, and exposes its close connection to the unitig approach we developed at Celera. This paper is a preliminary piece giving the basic algorithm and results that demonstrate the efficiency and scalability of the method. These ideas are being used to build a next-generation whole genome assembler called BOA (Berkeley Open Assembler) that will easily scale to mammalian genomes.
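
To illustrate what transitive reduction means for an overlap graph, the following minimal Python sketch removes every edge u -> w that is implied by a two-step path u -> v -> w. It is a simple illustration only, not the linear expected time algorithm of the paper: it ignores overlap lengths and read orientation, and the function name `transitive_reduction` and the read labels are assumptions made for the example.

```python
from collections import defaultdict


def transitive_reduction(edges):
    """Drop each edge u->w for which some v exists with u->v and v->w.

    `edges` is an iterable of (u, w) pairs, one per overlap.
    Simplified illustration: overlap lengths and strand are ignored.
    """
    succ = defaultdict(set)
    for u, w in edges:
        succ[u].add(w)

    reduced = set()
    for u, targets in succ.items():
        for w in targets:
            # u->w is redundant if another successor v of u also points to w.
            # succ.get avoids inserting keys while iterating over succ.
            redundant = any(w in succ.get(v, ()) for v in targets if v != w)
            if not redundant:
                reduced.add((u, w))
    return reduced


# Hypothetical reads A, B, C laid out left to right; A overlaps both B and C.
overlaps = [("A", "B"), ("B", "C"), ("A", "C")]
reduced = transitive_reduction(overlaps)
```

In this toy input the overlap A -> C carries no information beyond A -> B and B -> C, so only the two adjacent overlaps remain. Removing such implied edges is what leaves a graph in which chains of reads can be read off directly.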