
How does a genetic algorithm work in Bioinformatics?



A.    Initialization

Initially, many individual solutions are generated at random to form the initial population. The population size depends on the nature of the problem, but it typically contains several hundred to several thousand candidate solutions. The population is usually generated randomly so that it covers the whole range of possible solutions (the search space). Sometimes the solutions are “seeded” in regions where optimal solutions are likely to be found.
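The initialization step can be sketched as follows. This is a minimal illustration, not a definitive implementation: the bit-string encoding, the function name `init_population`, and its parameters are assumptions made for the example, since the post does not prescribe an encoding.

```python
import random

def init_population(pop_size, genome_length, seeds=(), rng=None):
    """Build an initial population of bit-string genomes (assumed encoding).

    `seeds` optionally holds hand-picked genomes that are placed in the
    population first; the remaining slots are filled with random genomes.
    """
    rng = rng or random.Random()
    # Copy the seeded genomes, never exceeding the requested population size.
    population = [list(genome) for genome in seeds][:pop_size]
    # Fill the rest uniformly at random to cover the search space.
    while len(population) < pop_size:
        population.append([rng.randint(0, 1) for _ in range(genome_length)])
    return population

# Hypothetical usage: 200 genomes of 16 bits, one of them seeded by hand.
population = init_population(200, 16, seeds=[[1] * 16], rng=random.Random(0))
print(len(population), population[0])
```

Passing an explicit `random.Random` instance keeps a run reproducible, which helps when comparing parameter settings.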

B.    Selection

During each successive generation, a fraction of the current population is selected to breed a new generation. Individual solutions are chosen through a fitness-based process, in which fitter solutions (as measured by a fitness function) are more likely to be selected. Some selection methods rate the fitness of every solution and preferentially select the best ones. Other methods rate only a random sample of the population, because rating every solution can be very time-consuming. Most selection functions are stochastic and are designed so that a small proportion of less-fit solutions is also selected. This keeps the population diverse and prevents premature convergence on poor solutions. Popular and well-studied selection methods include roulette wheel selection and tournament selection.
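The two selection methods named above could look roughly like this. The function names, the tournament size `k`, and the use of `sum` as a fitness function are assumptions for illustration only; the roulette wheel version additionally assumes that all fitness values are non-negative.

```python
import random

def tournament_select(population, fitness, k=3, rng=random):
    """Draw k random contestants and return the fittest of them."""
    contestants = rng.sample(population, k)
    return max(contestants, key=fitness)

def roulette_select(population, fitness, rng=random):
    """Return one individual with probability proportional to its fitness.

    Assumes every fitness value is non-negative and at least one is positive.
    """
    weights = [fitness(individual) for individual in population]
    return rng.choices(population, weights=weights, k=1)[0]

# Hypothetical usage with bit-string genomes and "number of ones" as fitness.
population = [[0, 0, 0, 1], [0, 1, 1, 1], [1, 1, 1, 1], [0, 0, 1, 1]]
print(tournament_select(population, sum, k=2))
print(roulette_select(population, sum))
```

Both methods are stochastic, so less-fit individuals still have a chance of being picked: in a tournament this happens when no fitter contestant is drawn, and on the roulette wheel every individual with positive fitness owns a slice.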

C.    Reproduction

The next step is to generate a second-generation population of solutions from those selected, using the genetic operators crossover (also called recombination) and/or mutation. For each new solution to be produced, a pair of “parent” solutions is chosen for breeding from the previously selected pool. Producing a “child” solution through crossover and mutation creates a new solution that typically shares many characteristics of its “parents”. New parents are chosen for each new child, and the process continues until a new population of solutions of appropriate size has been generated. Although reproduction methods based on two parents are more closely inspired by biology, some research suggests that using more than two “parents” can produce higher-quality chromosomes. These processes ultimately result in a next-generation population of chromosomes that differs from the initial generation. In general, this procedure increases the average fitness of the population, since mainly the best individuals of the current generation are selected for breeding, along with a small proportion of less-fit solutions, for the reasons mentioned above.
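A minimal sketch of the two genetic operators, assuming bit-string genomes of length two or more. One-point crossover and bit-flip mutation are chosen here only as common examples; the post does not specify which variants it has in mind, and the names `one_point_crossover` and `mutate` are hypothetical.

```python
import random

def one_point_crossover(parent_a, parent_b, rng=random):
    """Return a child made of the head of parent_a and the tail of parent_b."""
    # Cut strictly inside the genome so that both parents contribute.
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]

def mutate(genome, rate=0.01, rng=random):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in genome]

# Hypothetical usage: breed one child from two parents, then mutate it.
child = one_point_crossover([0] * 8, [1] * 8)
child = mutate(child, rate=0.05)
print(child)
```

The child inherits a contiguous block of genes from each parent, which is why it usually shares properties of both, while mutation occasionally introduces values that neither parent carried.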

D.    Termination

This generational process is repeated until a termination condition is reached. Common termination conditions are:
• A solution has been found that satisfies the minimum criteria;
• A fixed number of generations has been reached;
• The allocated budget (computation time and/or money) has been reached;
• The fitness of the best solution has reached a plateau, so that successive iterations no longer produce better results;
• Manual inspection;
• A combination of the conditions above.
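Several of the conditions listed above can be combined into a single check, sketched below. The function name `should_stop`, its parameters, and the default values are assumptions for illustration; manual inspection and monetary budgets are left out because they cannot be expressed in a few lines of code.

```python
import time

def should_stop(generation, best_history, max_generations=100,
                target_fitness=None, patience=10, deadline=None):
    """Return True as soon as any termination condition holds.

    best_history: best fitness of each generation so far, oldest first.
    deadline: optional time.monotonic() value after which the run must end.
    """
    # A solution satisfying the minimum criteria has been found.
    if target_fitness is not None and best_history and best_history[-1] >= target_fitness:
        return True
    # The fixed number of generations has been reached.
    if generation >= max_generations:
        return True
    # The allocated computation time has been used up.
    if deadline is not None and time.monotonic() >= deadline:
        return True
    # Plateau: no improvement during the last `patience` generations.
    if len(best_history) > patience:
        if max(best_history[-patience:]) <= best_history[-patience - 1]:
            return True
    return False

# Hypothetical usage: the best fitness has not improved for three generations.
print(should_stop(4, [5, 7, 7, 7, 7], patience=3))
```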

E.    Simple genetic algorithm pseudocode:

Step no.1: Choose the initial population of individuals.
Step no.2: Evaluate the fitness of every individual in that population.
Step no.3: Repeat on this generation until termination (adequate fitness attained, time limit reached, etc.):
·       Select the best-fit individuals for reproduction.
·       Breed new individuals through crossover and mutation operations to give birth to offspring.
·       Evaluate the fitness of the new individuals.
·       Replace the least-fit individuals of the population with the new individuals.
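The pseudocode above can be turned into a small runnable sketch. Everything specific here is an assumption made for the example: the bit-string encoding, the "count the ones" fitness function passed as `sum`, truncation selection of the best-fit half, and the parameter names and defaults of the hypothetical `run_ga` function.

```python
import random

def run_ga(fitness, genome_length, target, pop_size=50,
           max_generations=200, mutation_rate=0.02, seed=0):
    """Return (best genome found, number of generations used)."""
    rng = random.Random(seed)
    # Step 1: choose the initial population at random.
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(pop_size)]
    for generation in range(max_generations):
        # Step 2: evaluate fitness and rank the individuals, fittest first.
        ranked = sorted(population, key=fitness, reverse=True)
        # Termination: adequate fitness attained.
        if fitness(ranked[0]) >= target:
            return ranked[0], generation
        # Step 3a: select the best-fit half for reproduction.
        parents = ranked[:pop_size // 2]
        # Step 3b: breed new individuals through crossover and mutation.
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            point = rng.randint(1, genome_length - 1)
            child = a[:point] + b[point:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        # Step 3c/3d: the children replace the least-fit half of the population.
        population = parents + children
    # Termination: generation limit reached.
    return max(population, key=fitness), max_generations

# Hypothetical usage: maximise the number of ones in a 20-bit genome.
best, generations_used = run_ga(sum, genome_length=20, target=20)
print(sum(best), generations_used)
```

Because the parents survive into the next generation, the best fitness in the population never decreases in this sketch; a real application would swap in a problem-specific encoding and fitness function, for example an alignment score.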
