Representing DNA for machine learning algorithms: A primer on one-hot, binary, and integer encodings.
Original Publication: Oxford, UK : Elsevier, c2000-
Further information
This short paper presents an educational approach to teaching three popular methods for encoding DNA sequences: one-hot encoding, binary encoding, and integer encoding. Aimed at bioinformatics and computational biology students, the learning intervention develops practical skills in implementing these techniques for representing and analyzing genetic data. Its primary goal is to improve students' understanding and practical application of DNA encoding methods, which underpin many computational analyses in bioinformatics. The intervention has three components: (1) a conceptual framework that places the encoding methods in the context of broader bioinformatics applications, (2) an interactive Jupyter Notebook with Python code examples (https://github.com/yashmgupta/Representing-DNA/tree/main), and (3) a Streamlit application (https://dnaencoding.streamlit.app/) in which students can enter their own DNA sequences and visualize the output of each encoding method. By combining a conceptual overview with practical coding and visualization tools, the approach gives students a foundation for using these encoding methods in their future work. The study contributes to bioinformatics education by offering hands-on learning resources that bridge the gap between theoretical knowledge and practical application in DNA sequence analysis, preparing students for advanced research and data analysis projects.
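To make the three encodings named in the abstract concrete, the following is a minimal Python sketch. It is not taken from the authors' notebook: the function names, the base order A, C, G, T, the integer assignment A=0, C=1, G=2, T=3, and the 2-bit binary codes are all illustrative assumptions, and other conventions are equally valid.

```python
# Illustrative mappings only (assumptions, not the authors' notebook code).
NUCLEOTIDES = "ACGT"
# Assumed integer assignment: A=0, C=1, G=2, T=3.
INTEGER_CODE = {base: index for index, base in enumerate(NUCLEOTIDES)}


def integer_encode(sequence):
    """Integer encoding: one integer per base."""
    return [INTEGER_CODE[base] for base in sequence.upper()]


def binary_encode(sequence):
    """Binary encoding: one 2-bit string per base (A=00, C=01, G=10, T=11)."""
    return [format(INTEGER_CODE[base], "02b") for base in sequence.upper()]


def one_hot_encode(sequence):
    """One-hot encoding: one 4-element vector per base with a single 1."""
    return [
        [1 if INTEGER_CODE[base] == position else 0
         for position in range(len(NUCLEOTIDES))]
        for base in sequence.upper()
    ]


if __name__ == "__main__":
    example = "GATTACA"
    print(integer_encode(example))
    print(binary_encode(example))
    print(one_hot_encode(example))
```

In this sketch, any character outside A, C, G, and T (for example the ambiguity code N) raises a `KeyError`. Handling such symbols is a design choice the sketch leaves open.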
(© 2024 International Union of Biochemistry and Molecular Biology.)
The authors declare no conflicts of interest.