University of Pisa [Italy] = Università di Pisa [Italia] = Université de Pise [Italie] (UniPi), Equipe de recherche européenne en algorithmique et biologie formelle et expérimentale (ERABLE), Laboratoire de Biométrie et Biologie Evolutive - UMR 5558 (LBBE), Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Centre Inria de Lyon, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), This work was supported in part by European Union’s Horizon 2020 Research and Innovation Program under the Marie Skłodowska-Curie Grant through the Algorithms for Pangenome Computational Analysis (ALPACA) Project under Grant 956229 and through the Pangenome Graph Algorithms and data Integration (PANGAIA) Project under Grant 872539, in part by Miistero Universita’ e Ricerca (MUR) Progetto di Ricerca di Interesse Nazionale (PRIN) through the Project Pangenome Informatics: from Theory to Applications (PINC) under Grant 2022YRB97K, and in part by the NextGeneration European Union (EU) Program Tuscany Health Ecosystem under Grant PNRR ECS00000017., European Project: 956229,H2020-MSCA-ITN-2020,H2020-MSCA-ITN-2020,ALPACA(2021), European Project: 872539,H2020-MSCA-RISE-2019,H2020-MSCA-RISE-2019,PANGAIA(2020)
info:eu-repo/semantics/altIdentifier/doi/10.1109/ACCESS.2025.3597547; info:eu-repo/grantAgreement//956229/EU/ALgorithms for PAngenome Computational Analysis/ALPACA; info:eu-repo/grantAgreement//872539/EU/Pan-genome Graph Algorithms and Data Integration/PANGAIA
The outcome of a Multiple Sequence Alignment (MSA) can be compactly represented by means of Elastic-Degenerate Strings (ED-strings) by collapsing conserved fragments into standard linear strings, while representing gaps and variants as sets of alternative strings. These alternative variants can differ in size and can possibly include the empty string. In 2022, Lee et al. introduced Partial Order Alignment (POA) to enable the alignment of a string against a graph-like structure derived from an MSA. However, the POA edit transcript (the sequence of edit operations that describe the alignment) does not reflect the possible elasticity of the MSA (such as different gaps sizes in the aligned string), leaving room for possible misalignment and its propagation in progressive MSA strategies. In this paper, we propose a dynamic programming based method that optimally aligns a string to an ED-string, the latter compactly representing an MSA, overcoming the ambiguity in the POA edit transcript while maintaining its time and space complexity. Moreover, since pangenomes ca also be represented using ED-strings, our algorithm paves the way to a new class of sequence to graph alignment methods capable of taking into account possible gaps in the pangenome representation, thus offering a richer and more flexible model for pangenomic analysis.