Result: MarkerScan: Separation and assembly of cobionts sequenced alongside target species in biodiversity genomics projects.

Title:
MarkerScan: Separation and assembly of cobionts sequenced alongside target species in biodiversity genomics projects.
Authors:
Vancaester E; Tree of Life, Wellcome Sanger Institute, Hinxton, England, UK.
Blaxter ML; Tree of Life, Wellcome Sanger Institute, Hinxton, England, UK.
Source:
Wellcome open research [Wellcome Open Res] 2024 Feb 13; Vol. 9, pp. 33. Date of Electronic Publication: 2024 Feb 13 (Print Publication: 2024).
Publication Type:
Journal Article
Language:
English
Journal Info:
Publisher: Wellcome Trust; Country of Publication: England; NLM ID: 101696457; Publication Model: eCollection; Cited Medium: Print; ISSN: 2398-502X (Print); Linking ISSN: 2398502X; NLM ISO Abbreviation: Wellcome Open Res; Subsets: PubMed not MEDLINE
Imprint Name(s):
Original Publication: [London] : Wellcome Trust, [2016]-
Grant Information:
United Kingdom WT_ Wellcome Trust
Contributed Indexing:
Keywords: bioinformatics tools; cobionts; database contamination; eukaryotic genomics; genome sequencing
Local Abstract: [plain-language-summary] This article addresses a common issue in genetic research: the accidental mixing of genetic information from different species in public databases, often due to mislabelling or contamination. Interestingly, this ‘contamination’ can sometimes lead to exciting discoveries, like identifying DNA from unexpected species in a sample, revealing insights about organisms that live in the environment of the target organism. In our study, we developed a tool called MarkerScan for identifying these additional species found alongside the target species in eukaryotic genome sequencing projects. The method includes a way to sequence the whole genomes of the additional species. Our method involves sorting through the genetic data to identify certain small RNA sequences, which we then use as markers. These markers help to classify and assemble high-quality genomes from these additional species. This not only cleans up the main target species’ genome data but also provides new, valuable genomes for further exploration.
Entry Date(s):
Date Created: 20240415; Latest Revision: 20241208
Update Code:
20250114
PubMed Central ID:
PMC11016177
DOI:
10.12688/wellcomeopenres.20730.1
PMID:
38617467
Database:
MEDLINE

Further Information

Contamination of public databases by mislabelled sequences has been highlighted for many years, and the avalanche of novel sequencing data now being deposited has the potential to make databases difficult to use effectively. It is therefore crucial that sequencing projects and database curators perform pre-submission checks to remove obvious contamination and avoid propagating erroneous taxonomic relationships. However, it is also important to recognise that biological contamination of a target sample with DNA from unexpected species can lead to the discovery of fascinating biological phenomena through the identification of environmental organisms or endosymbionts. Here, we present a novel, integrated method for detecting all non-target genomes co-sequenced in eukaryotic genome sequencing projects and generating high-quality assemblies of them. After taxonomic profiling of an assembly from the raw data, small rRNA sequences discovered therein serve as markers, and a targeted classification approach then retrieves and assembles high-quality genomes. The genomes of these cobionts are thus not only removed from the target species' genome but also made available for further interrogation. Source code is available from https://github.com/CobiontID/MarkerScan. MarkerScan is written in Python and is deployed as a Docker container.
(Copyright: © 2024 Vancaester E and Blaxter ML.)
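
The marker-based separation described in the abstract can be illustrated with a minimal Python sketch. This is not MarkerScan's code: the function name `split_by_marker_taxonomy`, the input format of (contig, family) pairs, the choice of family as the grouping rank, and the example contig names and taxa are all assumptions made up for illustration. It only shows the general idea of using the taxonomic identity of rRNA markers to separate target contigs from candidate cobiont bins.

```python
from collections import defaultdict


def split_by_marker_taxonomy(marker_hits, target_family):
    """Group contig IDs by the family assigned to their rRNA marker.

    marker_hits: iterable of (contig_id, family) pairs. This input format
    is hypothetical; a real pipeline would derive it from rRNA detection
    and taxonomic classification steps.
    Returns (target_contigs, cobiont_bins), where cobiont_bins maps each
    non-target family to a sorted list of contig IDs.
    """
    target = set()
    cobionts = defaultdict(set)
    for contig_id, family in marker_hits:
        if family == target_family:
            target.add(contig_id)
        else:
            cobionts[family].add(contig_id)
    return sorted(target), {fam: sorted(ids) for fam, ids in cobionts.items()}


# Made-up example data: one target contig and three cobiont contigs.
hits = [
    ("contig_1", "Drosophilidae"),
    ("contig_7", "Anaplasmataceae"),
    ("contig_9", "Anaplasmataceae"),
    ("contig_12", "Enterobacteriaceae"),
]
target_contigs, cobiont_bins = split_by_marker_taxonomy(hits, "Drosophilidae")
for family, contigs in sorted(cobiont_bins.items()):
    print(family, contigs)
```

Each non-target family found this way would then be a candidate for the targeted read retrieval and reassembly step that the abstract describes; that step is not sketched here.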

No competing interests were disclosed.