A Genetic Algorithm for Ranking Peptide Sequences For Duchenne Muscular Dystrophy

Author name: Vasiliki Konstantopoulou
Title: A Genetic Algorithm for Ranking Peptide Sequences For Duchenne Muscular Dystrophy
Year: 2017-2018
Supervisor: George Paliouras


Summary

Duchenne Muscular Dystrophy (DMD) is caused by a disrupted reading frame in the gene that encodes the protein dystrophin. The therapeutic goal is to convert Duchenne into Becker muscular dystrophy, a form with the same underlying cause but milder symptoms and a better quality of life. This can be achieved with a technique called exon skipping, a therapeutic approach based on RNA splicing that skips the faulty sections and restores the reading frame, so that a shorter but functional form of dystrophin can be produced. Exon skipping uses antisense oligonucleotides, which enter the cells and bind to the target exon. The success rate of this binding increases when the oligonucleotide is attached to a Cell Penetrating Peptide (CPP). CPPs can penetrate biological membranes and deliver a wide variety of cargos into cells, and they have attracted much attention because of their high transduction efficiency and low cytotoxicity. Peptides are chains of amino acids, and the possible combinations of amino acids yield billions of peptides that could be studied for their efficacy in exon skipping. It is impossible for a biomedical researcher to test all of these combinations in the laboratory. The algorithm presented in this study aims to help biomedical researchers reduce the search space, defined as the set of all feasible solutions, among which the desired solution resides. It does so by selecting and ranking the best Cell Penetrating Peptides, producing a list that biologists can realistically use and evaluate in the laboratory. This thesis develops a novel genetic algorithm that shows promising results in selecting and ranking peptides likely to be efficient for exon skipping in Duchenne Muscular Dystrophy.
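To give a sense of the scale involved, the size of the search space can be computed directly. This is only an illustrative sketch: it assumes peptides are built from the 20 standard amino acids, and the peptide lengths shown are hypothetical examples rather than lengths taken from the thesis.

```python
# Illustrative only: number of distinct peptides of a given length,
# assuming each position can hold any of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes


def search_space_size(length, alphabet_size=len(AMINO_ACIDS)):
    """Return alphabet_size ** length, the count of possible sequences."""
    return alphabet_size ** length


# Hypothetical peptide lengths chosen for the example.
for length in (5, 7, 10):
    print(length, search_space_size(length))
# 5 3200000
# 7 1280000000
# 10 10240000000000
```

Under these assumptions, sequences of only seven residues already give more than a billion candidates, which is why exhaustive laboratory testing is not feasible.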
Because the search space is enormous, a genetic algorithm is the most suitable approach. Genetic algorithms are heuristic search procedures inspired by natural evolution. The process starts by initializing a population, which in this project consists of randomly created peptides (individuals), and then selects the fittest individuals from that population. Genetic operators are applied to the selected individuals to create new individuals that inherit their parents' characteristics. These new individuals are called offspring and are added to the next generation. Selection is performed by a fitness function: only the highest-ranked individuals of each generation are selected, and their offspring form the next generation. The fitness function in this work combines two functions. The first is a machine learning model, a Support Vector Machine (SVM), which evaluates individuals by their characteristics; the second is the Hamming distance function, which evaluates them by their position in the search space. Together, these functions decide whether an individual is useful enough to continue to the next generation. The algorithm outputs a ranked list of peptides that are capable of entering the cell and improving the efficacy of the corresponding Antisense Oligonucleotides (AONs) for exon 23 skipping. The results could be improved both by training the algorithm on a larger dataset and by experimenting with machine learning algorithms that can learn from fewer training instances. Although the complete list of results could not be tested, the evaluation shows that the algorithm successfully recognizes the useful peptides.
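The loop described above (initialization, selection by fitness, creation of offspring) can be sketched roughly as follows. This is a minimal illustration and not the thesis implementation: `model_score` is a hypothetical placeholder for the trained SVM, the distance term is taken here to be the Hamming distance (the number of differing positions) to a made-up reference sequence, and the equal weighting, single-point crossover, mutation rate and carrying parents over into the next generation are all assumptions.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def model_score(peptide):
    """Placeholder for the trained SVM (assumption): fraction of R and K residues."""
    return sum(c in "RK" for c in peptide) / len(peptide)


def fitness(peptide, reference):
    """Average of the model score and closeness to the reference (assumed weighting)."""
    closeness = 1.0 - hamming(peptide, reference) / len(reference)
    return 0.5 * model_score(peptide) + 0.5 * closeness


def crossover(a, b, rng):
    """Single-point crossover: prefix of one parent, suffix of the other."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(peptide, rng, rate=0.1):
    """Replace each residue with a random amino acid with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else c for c in peptide)


def run_ga(reference, pop_size=40, generations=30, n_parents=10, seed=0):
    """Evolve random peptides and return the final population ranked by fitness."""
    rng = random.Random(seed)
    length = len(reference)
    population = ["".join(rng.choice(AMINO_ACIDS) for _ in range(length))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the highest-ranked individuals as parents.
        ranked = sorted(population, key=lambda p: fitness(p, reference), reverse=True)
        parents = ranked[:n_parents]
        # Reproduction: each offspring inherits from two parents, then mutates.
        offspring = []
        while len(offspring) < pop_size - n_parents:
            mother, father = rng.sample(parents, 2)
            offspring.append(mutate(crossover(mother, father, rng), rng))
        population = parents + offspring
    unique = list(dict.fromkeys(population))  # drop duplicates, keep order
    return sorted(unique, key=lambda p: fitness(p, reference), reverse=True)


ranked_peptides = run_ga("RKRRKRRK")  # hypothetical reference sequence
print(ranked_peptides[:5])
```

In a real setting the placeholder scorer would be replaced by a classifier trained on experimentally characterized peptides, and the way the distance term enters the fitness would follow the thesis rather than the simple average used here.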