Result: Prepared binned DNA data storage datasets for reconstruction benchmarking.

Title:

Prepared binned DNA data storage datasets for reconstruction benchmarking.

Authors:

Daniella Bar Lev, Itai Orr, Omer Sabary, Tuvi Etzion, Eitan Yaakobi

Publisher Information:

Zenodo

Publication Year:

2024

Collection:

Zenodo

Document Type:

dataset

Language:

unknown

Relation:

https://zenodo.org/records/14296588; oai:zenodo.org:14296588; https://doi.org/10.5281/zenodo.14296588

DOI:

10.5281/zenodo.14296588

Availability:

https://doi.org/10.5281/zenodo.14296588
https://zenodo.org/records/14296588

Rights:

Creative Commons Attribution 4.0 International ; cc-by-4.0 ; https://creativecommons.org/licenses/by/4.0/legalcode

Accession Number:

edsbas.4E9F730

Database:

BASE

Further Information

This repository includes datasets from the following publications.

1. Grass, R. N., Heckel, R., Puddu, M., Paunescu, D., & Stark, W. J. Robust chemical preservation of digital information on DNA in silica with error-correcting codes. Angewandte Chemie International Edition, 54, 8, 2552–2555 (2015).
2. Erlich, Y. & Zielinski, D. DNA fountain enables a robust and efficient storage architecture. Science, 355, 6328, 950–954 (2017).
3. Srinivasavaradhan, S. R., Gopi, S., Pfister, H. D. & Yekhanin, S. Trellis BMA: Coded Trace Reconstruction on IDS Channels for DNA Storage. In 2021 IEEE International Symposium on Information Theory (ISIT), Melbourne, Australia, 2453–2458 (2021).

The datasets are given in a binned format to enhance the reproducibility of the results presented in the paper: Bar-Lev, D., Orr, I., Sabary, O., Etzion, T., & Yaakobi, E. Scalable and robust DNA-based storage via coding theory and deep learning. 2024.

Detailed description of the format

The binned format was created using the binning step described in the paper ("Scalable and robust DNA-based storage via coding theory and deep learning"). Each cluster of reads appears in the file with a header followed by the reads. More specifically:

- The header consists of 2 lines: the first is the encoded sequence of the cluster, and the second is a line of 18 "*" characters that should be ignored.
- The reads in the cluster are provided after the header, with each read on a separate line.
- Each cluster ends with two empty lines.

Data processing

To ease the processing of our datasets, we also provide the following Python scripts (see https://github.com/itaiorr/Deep-DNA-based-storage):

- reads_preprocessor.py - includes our preprocessing procedure for the raw reads. The procedure detects and truncates the primers.
- binning.py - parses the file of the binned reads and creates two Python dictionaries. In the first dictionary, each key is an encoded sequence, and the value is a list of the reads in the cluster.
In the second dictionary the keys are the ...
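
The binned layout described in this record (a two-line header, one read per line, two empty lines closing each cluster) can be parsed along the following lines. This is only a minimal sketch and is not the repository's binning.py: the function name parse_binned_text is hypothetical, it builds only a mapping from encoded sequence to reads, and it assumes that any blank line closes the current cluster and that the line after the encoded sequence is the "*" separator.

```python
from typing import Dict, List


def parse_binned_text(text: str) -> Dict[str, List[str]]:
    """Map each encoded sequence to the list of reads in its cluster.

    Hypothetical helper (not part of the published scripts). Assumed layout:
    line 1 of a cluster is the encoded sequence, line 2 is a run of '*'
    characters to ignore, the following non-empty lines are reads, and
    empty lines close the cluster.
    """
    clusters: Dict[str, List[str]] = {}
    current = None            # encoded sequence of the cluster being read
    expect_separator = False  # True right after a header's first line

    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            # An empty line ends the current cluster (the format uses two).
            current = None
            continue
        if current is None:
            # First non-empty line of a cluster: the encoded sequence.
            current = line
            clusters.setdefault(current, [])
            expect_separator = True
            continue
        if expect_separator:
            expect_separator = False
            if set(line) == {"*"}:
                # Second header line: the '*' separator, ignored.
                continue
        # Any other line inside a cluster is a read.
        clusters[current].append(line)

    return clusters
```

A cluster with no reads still gets an entry with an empty list, and clusters that repeat the same encoded sequence are merged under one key; both are choices of this sketch rather than documented behavior of the dataset tools.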
