
How does a genetic algorithm work in bioinformatics?



A.    Initialization

Initially, many individual solutions are generated at random to form the initial population. The population size depends on the nature of the problem, but it typically contains several hundred to several thousand candidate solutions. Usually the population is generated randomly so that it covers the entire range of possible solutions (the search space). Sometimes solutions are “seeded” in regions where optimal solutions are likely to be found.
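
As a rough illustration of this step, here is a minimal Python sketch. It assumes that each solution is a fixed-length list of bits; the function name init_population and its parameters are made up for this example and do not come from any particular library.

```python
import random

def init_population(pop_size, genome_length, seeds=None, rng=None):
    """Build an initial population of bit-string individuals.

    Assumption: a solution is a list of 0/1 genes of fixed length.
    seeds: optional hand-picked individuals that are placed in the
    population first; the rest is filled with random individuals.
    """
    rng = rng or random.Random()
    # Copy the seeded individuals, never taking more than pop_size of them.
    population = [list(s) for s in (seeds or [])][:pop_size]
    # Fill the remaining slots uniformly at random over the search space.
    while len(population) < pop_size:
        population.append([rng.randint(0, 1) for _ in range(genome_length)])
    return population

# Example: 200 individuals of 16 bits each, one of them seeded by hand.
population = init_population(pop_size=200, genome_length=16,
                             seeds=[[1] * 16], rng=random.Random(0))
```

Seeding is optional here: leaving seeds empty gives a purely random population.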

B.    Selection

During each successive generation, a fraction of the current population is selected to breed a new generation. Individual solutions are chosen through a fitness-based process, in which fitter solutions (as measured by a fitness function) are more likely to be selected. Some selection procedures rate the fitness of every solution and preferentially select the best ones. Other procedures rate only a random sample of the population, because rating every solution can be too time-consuming. Most selection procedures are stochastic and designed so that a small proportion of less fit solutions is also selected. This helps keep the population diverse and prevents premature convergence on poor solutions. Popular and well-studied selection procedures include tournament selection and roulette wheel selection.
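
The two selection procedures named above could be sketched in Python roughly as follows. The function names, the tournament size k and the toy fitness function (number of 1-bits) are assumptions made for this illustration only.

```python
import random

def tournament_select(population, fitness, k=3, rng=random):
    """Draw k individuals at random and return the fittest of them."""
    contenders = rng.sample(population, k)
    return max(contenders, key=fitness)

def roulette_select(population, fitness, rng=random):
    """Return one individual with probability proportional to its fitness.

    Assumption: fitness values are non-negative and not all zero.
    """
    weights = [fitness(ind) for ind in population]
    return rng.choices(population, weights=weights, k=1)[0]

# Toy fitness for the example: the number of 1-bits in the individual.
def count_ones(individual):
    return sum(individual)

toy_population = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
winner = tournament_select(toy_population, count_ones, k=4)
picked = roulette_select(toy_population, count_ones)
```

In both procedures a less fit individual can still be chosen, which is what keeps the population diverse. A smaller k in the tournament gives weaker individuals a better chance.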

C.    Reproduction

In the next step, a second population of solutions is generated from the selected ones through the genetic operators: recombination (crossover) and/or mutation. To produce each new solution, a pair of “parent” solutions is chosen for breeding from the previously selected pool. A “child” solution produced by crossover and mutation usually shares many of the characteristics of its “parents”. New parents are chosen for every new child, and the process continues until a new population of solutions of appropriate size has been generated. Although reproduction with two parents is the approach inspired by biology, some research suggests that using more than two “parents” can produce higher-quality chromosomes. These procedures ultimately yield a next-generation population of chromosomes that differs from the initial generation. In general, the average fitness of the population increases through this procedure, because mostly the best individuals of the current generation are chosen for breeding, along with a small proportion of less fit solutions, for the reasons given under Selection.
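
A minimal Python sketch of the reproduction step is given below. It assumes bit-string individuals, one-point crossover and bit-flip mutation, which are only one common choice among many operators; all function names and the default mutation rate are assumptions for this example.

```python
import random

def one_point_crossover(parent_a, parent_b, rng=random):
    """Join the head of one parent to the tail of the other.

    Assumption: both parents have the same length, at least 2.
    """
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]

def bit_flip_mutation(individual, rate=0.01, rng=random):
    """Flip each bit independently with probability `rate`."""
    return [1 - gene if rng.random() < rate else gene for gene in individual]

def breed(pool, size, rate=0.01, rng=random):
    """Create `size` children, choosing a new pair of parents for each child."""
    children = []
    while len(children) < size:
        parent_a, parent_b = rng.sample(pool, 2)
        child = one_point_crossover(parent_a, parent_b, rng)
        children.append(bit_flip_mutation(child, rate, rng))
    return children

# Example: breed 10 children from a small pool of selected parents.
pool = [[0] * 8, [1] * 8, [0, 1] * 4]
children = breed(pool, size=10)
```

Each child starts as a mix of its two parents and is then slightly perturbed, so it usually shares many properties with them.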

D.    Termination

This generational process is repeated until a termination condition is reached. Common termination conditions are:
• A solution has been found that satisfies the minimum criteria;
• A fixed number of generations has been reached;
• The allocated budget (computation time and/or money) has been reached;
• The fitness of the best solution has reached a plateau, so that successive iterations no longer produce better results;
• Manual inspection;
• Any combination of the above.
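
The automatic conditions from this list could be combined in a single check, roughly as in the Python sketch below. The function name should_stop, its parameters and the default values are assumptions for this illustration. Manual inspection is left out because it cannot be automated.

```python
import time

def should_stop(generation, best_history, start_time,
                target_fitness=None, max_generations=100,
                time_budget_s=None, patience=20):
    """Return True as soon as any configured termination condition holds.

    best_history: best fitness of each generation so far, oldest first.
    start_time: value of time.monotonic() taken when the run started.
    """
    # A solution that satisfies the minimum criteria has been found.
    if (target_fitness is not None and best_history
            and best_history[-1] >= target_fitness):
        return True
    # The fixed number of generations has been reached.
    if generation >= max_generations:
        return True
    # The allocated computation time has been used up.
    if (time_budget_s is not None
            and time.monotonic() - start_time >= time_budget_s):
        return True
    # Plateau: no improvement over the last `patience` generations.
    if (len(best_history) > patience
            and best_history[-1] <= best_history[-patience - 1]):
        return True
    return False

# Example: generation 5 of a run whose best fitness is still improving.
keep_going = not should_stop(5, [1, 2, 3], time.monotonic())
```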

E.    Simple genetic algorithm pseudocode

Step no.1: Choose the initial population of individuals.
Step no.2: Evaluate the fitness of every individual in that population.
Step no.3: Repeat on this generation until termination (sufficient fitness achieved, time limit reached, etc.):
·       Select the best-fit individuals for reproduction.
·       Breed new individuals through crossover and mutation operations.
·       Evaluate the fitness of the new individuals.
·       Replace the least-fit individuals of the population with the new individuals.
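
These steps can be put together in a short runnable Python sketch. It uses a toy problem that is not from bioinformatics: maximizing the number of 1-bits in a bit string (often called OneMax). The function name run_ga, the parameter values and the choice to keep the better half of the population as parents are all assumptions for this illustration, not a reference implementation.

```python
import random

def run_ga(genome_length=20, pop_size=50, max_generations=200,
           mutation_rate=0.02, seed=0):
    """Toy genetic algorithm that maximizes the number of 1-bits."""
    rng = random.Random(seed)
    fitness = sum  # toy fitness: count of 1-bits in the individual

    # Step 1: choose the initial population at random.
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(pop_size)]
    generation = 0
    for generation in range(max_generations):
        # Step 2: evaluate fitness and rank the individuals.
        population.sort(key=fitness, reverse=True)
        # Step 3: stop once an adequate (here: perfect) solution exists.
        if fitness(population[0]) == genome_length:
            break
        # Select the best-fit half of the population for reproduction.
        parents = population[:pop_size // 2]
        # Breed new individuals by one-point crossover and bit-flip mutation.
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            point = rng.randint(1, genome_length - 1)
            child = a[:point] + b[point:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        # Replace the least-fit individuals with the new ones.
        population = parents + children
    best = max(population, key=fitness)
    return best, fitness(best), generation

best, score, generations = run_ga()
```

For a real bioinformatics task such as motif finding, the bit string and the fitness function would be replaced by a problem-specific encoding and score. The loop itself would stay the same.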
