Storing Mass-Spectrometry Data in Simple Databases Enables Flexible and Intuitive Exploration without Time or Space Penalties.
Mass spectrometry (MS) generates large data sets that are stored in increasingly optimized and complex file types, which demand technical expertise to extract information quickly and easily. We wondered whether a simple structured query language (SQL) database could hold raw MS data and allow for easily readable queries without incurring major penalties in read time or disk space relative to other popular MS formats. Here, we describe a basic MS schema with intuitive database tables and fields that can outperform other formats for exploratory and interactive analysis across six commonly extracted data subsets: single scans (both MS¹ and MS²), ion chromatograms, retention time ranges, and fragmentation searches (both precursor and fragment search). Additionally, we compare SQLite, DuckDB, and Parquet implementations and find that they can perform these tasks in under a second, even when the files occupy more than a gigabyte on disk. We believe that this tidy data schema extends naturally to most forms of MS data and offers a way to query data sets transparently while preserving computational performance.
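The abstract does not spell out the schema. The following is only a minimal sketch of how the six data subsets could map onto plain SQL range queries. It assumes a tidy layout with one row per data point: an `ms1` table (`scan`, `rt`, `mz`, `intensity`) and an `ms2` table (`scan`, `rt`, `premz`, `fragmz`, `intensity`). These table and column names, the indexes, the 5 ppm default window, and the toy data are all hypothetical illustrations, not the schema defined in the paper. It uses Python's built-in `sqlite3` module. DuckDB and Parquet are not shown.

```python
import sqlite3

# Hypothetical tidy schema: one row per (scan, m/z) data point.
# Table/column names are illustrative assumptions, not the paper's schema.
SCHEMA = """
CREATE TABLE ms1 (scan INTEGER, rt REAL, mz REAL, intensity REAL);
CREATE TABLE ms2 (scan INTEGER, rt REAL, premz REAL, fragmz REAL, intensity REAL);
-- Assumed optimization: index the columns used in range filters.
CREATE INDEX ms1_mz ON ms1 (mz);
CREATE INDEX ms1_rt ON ms1 (rt);
CREATE INDEX ms2_premz ON ms2 (premz);
CREATE INDEX ms2_fragmz ON ms2 (fragmz);
"""

# Single-scan queries, one per MS level (fixed SQL, no string formatting).
SCAN_QUERIES = {
    "ms1": "SELECT rt, mz, intensity FROM ms1 WHERE scan = ? ORDER BY mz",
    "ms2": "SELECT rt, premz, fragmz, intensity FROM ms2 WHERE scan = ? ORDER BY fragmz",
}


def mz_window(mz, ppm):
    """Return (low, high) bounds of a +/- ppm window around mz."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


def single_scan(con, level, scan):
    """All data points of one MS1 or MS2 scan."""
    return con.execute(SCAN_QUERIES[level], (scan,)).fetchall()


def ion_chromatogram(con, mz, ppm=5):
    """Summed MS1 intensity inside the m/z window, one row per retention time."""
    lo, hi = mz_window(mz, ppm)
    return con.execute(
        "SELECT rt, SUM(intensity) FROM ms1 WHERE mz BETWEEN ? AND ? "
        "GROUP BY rt ORDER BY rt", (lo, hi)).fetchall()


def rt_range(con, rt_min, rt_max):
    """All MS1 data points between two retention times (inclusive)."""
    return con.execute(
        "SELECT rt, mz, intensity FROM ms1 WHERE rt BETWEEN ? AND ? "
        "ORDER BY rt, mz", (rt_min, rt_max)).fetchall()


def precursor_search(con, premz, ppm=5):
    """Fragments from MS2 scans whose precursor m/z falls in the window."""
    lo, hi = mz_window(premz, ppm)
    return con.execute(
        "SELECT scan, rt, fragmz, intensity FROM ms2 WHERE premz BETWEEN ? AND ? "
        "ORDER BY scan, fragmz", (lo, hi)).fetchall()


def fragment_search(con, fragmz, ppm=5):
    """Distinct MS2 scans (and their precursors) containing a matching fragment."""
    lo, hi = mz_window(fragmz, ppm)
    return con.execute(
        "SELECT DISTINCT scan, rt, premz FROM ms2 WHERE fragmz BETWEEN ? AND ? "
        "ORDER BY scan", (lo, hi)).fetchall()


def demo_connection():
    """In-memory database filled with made-up example points."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO ms1 VALUES (?, ?, ?, ?)", [
        (1, 1.0, 118.0865, 1000.0), (1, 1.0, 200.0, 50.0),
        (3, 1.1, 118.0866, 3000.0), (3, 1.1, 200.0, 60.0),
    ])
    con.executemany("INSERT INTO ms2 VALUES (?, ?, ?, ?, ?)", [
        (2, 1.05, 118.0865, 58.0651, 400.0),
        (2, 1.05, 118.0865, 59.0730, 200.0),
        (4, 1.15, 150.0, 58.0652, 100.0),
    ])
    return con


if __name__ == "__main__":
    con = demo_connection()
    print(ion_chromatogram(con, 118.0865))  # [(1.0, 1000.0), (1.1, 3000.0)]
    print(fragment_search(con, 58.0651))    # [(2, 1.05, 118.0865), (4, 1.15, 150.0)]
```

In this sketch every extraction reduces to an equality or `BETWEEN` filter on a single column. That is one reason a flat, tidy table can stay readable: the query states directly which subset is wanted. Whether an index on each filtered column is worth its extra disk space is a design trade-off this example does not measure.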