Title:
Designing Efficient Spaced Seeds for SOLiD Read Mapping.
Authors:
Contributors:
Laboratoire d'Informatique Fondamentale de Lille (LIFL), Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS)
Sequential Learning (SEQUOIA), Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS)-Centre Inria de l'Université de Lille, Institut National de Recherche en Informatique et en Automatique (Inria)
ANR-07-BLAN-0367, FLASH, Comparison of Complete Genomes: an algorithmic and statistical approach to investigate the mechanisms of bacterial genome evolution (2007)
Publisher Information:
CCSD; Hindawi Publishing Corporation, 2010.
Publication Year:
2010
Collection:
collection:UNIV-LILLE3
collection:CNRS
collection:INRIA
collection:INRIA-LILLE
collection:LIFL
collection:INRIA_TEST
collection:CV_LIGM
collection:TESTALAIN1
collection:CRISTAL
collection:INRIA2
collection:CRISTAL-BONSAI
collection:ANR
Subject Terms:
Spaced seeds, seed design, read mapping, Applied Biosystems SOLiD, color space alignment, ACM: J.: Computer Applications, J.3: LIFE AND MEDICAL SCIENCES, J.3.0: Biology and genetics, ACM: E.: Data, E.2: DATA STORAGE REPRESENTATIONS, E.2.2: Hash-table representations, ACM: G.: Mathematics of Computing, G.2: DISCRETE MATHEMATICS, G.2.3: Applications, G.4: MATHEMATICAL SOFTWARE, G.4.4: Parallel and vector implementations, [INFO.INFO-BI] Computer Science [cs]/Bioinformatics [q-bio.QM], [SDV.BIBS] Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]
Original Identifier:
PUBMED: 20936175
PUBMEDCENTRAL: PMC2945724
HAL:
Document Type:
Journal article
Language:
English
ISSN:
1687-8027
Relation:
info:eu-repo/semantics/altIdentifier/doi/10.1155/2010/708501; info:eu-repo/semantics/altIdentifier/pmid/20936175
DOI:
10.1155/2010/708501
Availability:
Accession Number:
edshal.inria.00527029v1
Database:
HAL
Further Information
The advent of high-throughput sequencing technologies constituted a major advance in genomic studies, offering new prospects in a wide range of applications. We propose a rigorous and flexible algorithmic solution to mapping SOLiD color-space reads to a reference genome. The solution relies, on the one hand, on an advanced method of seed design that uses a faithful probabilistic model of read matches and, on the other hand, on a novel seeding principle especially adapted to read mapping. Our method can handle both lossy and lossless frameworks and is able to distinguish, at the level of seed design, between SNPs and reading errors. We illustrate our approach with several seed designs and demonstrate their efficiency.
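The abstract's claim that SNPs and reading errors can be told apart in color space rests on how SOLiD encodes reads. The sketch below is a minimal illustration under stated assumptions, not the authors' seed-design method: it assumes the common 2-bit base encoding (A=0, C=1, G=2, T=3), under which each color is the XOR of two adjacent base codes, and it uses a plain binary spaced seed ('#' = must match, '-' = don't care). The function names `to_colors`, `color_diffs`, `snp_like` and `seed_hits` are hypothetical.

```python
# Hypothetical sketch: SOLiD di-base color encoding and a plain spaced-seed hit test.
# Assumption: A=0, C=1, G=2, T=3, and the color of a dinucleotide is the XOR of its codes.

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_colors(seq: str) -> list[int]:
    """Encode a nucleotide string as its color sequence (length len(seq) - 1)."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def color_diffs(read: list[int], ref: list[int]) -> list[int]:
    """Positions where two equal-length color sequences disagree."""
    return [i for i, (a, b) in enumerate(zip(read, ref)) if a != b]


def snp_like(read: list[int], ref: list[int]) -> bool:
    """True if the only differences are two adjacent colors changed by the same
    XOR value, the signature of a single-base substitution in color space."""
    d = color_diffs(read, ref)
    return (
        len(d) == 2
        and d[1] == d[0] + 1
        and (read[d[0]] ^ ref[d[0]]) == (read[d[1]] ^ ref[d[1]])
    )


def seed_hits(seed: str, read: list[int], ref: list[int]) -> list[int]:
    """Offsets i at which read and ref (aligned without gaps) agree on every
    '#' position of the spaced seed placed at i; '-' positions are ignored."""
    care = [j for j, c in enumerate(seed) if c == "#"]
    return [
        i
        for i in range(len(read) - len(seed) + 1)
        if all(read[i + j] == ref[i + j] for j in care)
    ]


if __name__ == "__main__":
    ref = to_colors("ACGTTGCA")   # reference window
    snp = to_colors("ACGATGCA")   # same window with a T->A substitution
    err = list(ref)
    err[5] ^= 3                   # a single simulated color-reading error
    print(color_diffs(snp, ref), snp_like(snp, ref))
    print(color_diffs(err, ref), snp_like(err, ref))
    print(seed_hits("##-##", err, ref), seed_hits("##-##", snp, ref))
```

Under this encoding a single-base substitution changes two adjacent colors by the same XOR value, whereas an isolated reading error changes one color. In the usage above, the plain seed `##-##` still hits the read containing the isolated error but misses the SNP-carrying read, which is the kind of asymmetry a seed designed to be SNP-aware would need to account for.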