Result: Exact computation of protein structure similarity

Title:
Exact computation of protein structure similarity
Authors:
Source:
Computational geometry (SCG'06) (June 5-7, 2006, Sedona AZ). :468-474
Publisher Information:
New York NY: ACM Press, 2006.
Publication Year:
2006
Physical Description:
print, 27 ref 1
Original Material:
INIST-CNRS
Document Type:
Conference Conference Paper
File Description:
text
Language:
English
Author Affiliations:
Computer Science Dept Cornell University, Ithaca, NY 14853, United States
Rights:
Copyright 2006 INIST-CNRS
CC BY 4.0
Sauf mention contraire ci-dessus, le contenu de cette notice bibliographique peut être utilisé dans le cadre d’une licence CC BY 4.0 Inist-CNRS / Unless otherwise stated above, the content of this bibliographic record may be used under a CC BY 4.0 licence by Inist-CNRS / A menos que se haya señalado antes, el contenido de este registro bibliográfico puede ser utilizado al amparo de una licencia CC BY 4.0 Inist-CNRS
Notes:
Computer science; theoretical automation; systems
Accession Number:
edscal.18297856
Database:
PASCAL Archive

Further Information

A protein can be considered as a string (on the alphabet of 20 amino acids) or as a structure (each protein folds into a particular 3D configuration). Consider the following string-based problem: Given two protein strings that are not necessarily similar in their entirety, determine the most similar contiguous substrings, one from each protein. The exact meaning of most similar here is determined by the user; it is based on user-specified scores for character vs. character similarity and for character vs. space similarity. It is important to allow for spaces or gaps because evolutionary changes to proteins often involve insertion or deletion of one or more individual amino acids. For this kind of string-based similarity, the most-similar substrings can be determined in time O(mn) using Dynamic Programming (DP). The goal here is to design an algorithm for similarity of protein structures as opposed to protein strings. The inspiration for our algorithm is drawn from the DP-based similarity algorithm for strings. Instead of comparing sequences of characters, we compare sequences of vectors. One complication for working with structures instead of strings is the problem of orientation: basically, two structures that have similar shape can look different if they are at different orientations. Algorithmically, this means that we must establish the optimal orientations for our two proteins as well as finding the similar subsequences. In other words, an algorithm for similarity of structures involves both discrete optimization (to find the corresponding subsequences) and continuous optimization (to find the optimal orientation). Interestingly, if the correspondence is given then the optimal orientation (for that correspondence) is easy to find, and if the the orientation is given then the optimal correspondence (for that orientation) is easy to find. The challenge is to accomplish both optimizations at once. Note that the technique presented here produces a globally optimal solution; there are no approximations or assumptions of randomness.