
Genetic Algorithm and Bioinformatics



The genetic algorithm (GA) is a heuristic search method based on population genetics. John Holland introduced it in the 1970s. A GA imitates the mechanics of natural genetics and natural selection and starts with a population, that is, a set of candidate solutions. Each solution is represented by a chromosome, and the population size is kept constant from one generation to the next. The fitness of every chromosome is assessed in each generation, and chromosomes are then selected probabilistically for the next generation according to their fitness. Selected chromosomes are mated at random to yield offspring. Because chromosomes with high fitness have a high probability of selection, the new generation tends to have a higher average fitness than the previous one. This evolutionary process is repeated until a stopping condition is satisfied.

The solutions in a genetic algorithm are chromosomes, which in many cases are represented as strings or lists; for this reason, many GA operations are designed to work on strings and lists. Genetic algorithms are commonly implemented in high-level languages such as Perl, Python, C, C++, and Java. These languages are productive and widely used in bioinformatics.
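The generational loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference implementation: the chromosome is a list of bits, the toy fitness function simply counts the 1 bits, and the names `run_ga`, `select`, `crossover`, and `mutate`, along with all parameter values, are hypothetical choices made for this example.

```python
import random

def fitness(chrom):
    # Toy fitness (assumption for this sketch): the number of 1 bits.
    return sum(chrom)

def select(population, rng):
    # Fitness-proportionate selection of two parents.
    # The +1 keeps every weight positive even when a fitness is zero.
    weights = [fitness(c) + 1 for c in population]
    return rng.choices(population, weights=weights, k=2)

def crossover(a, b, rng):
    # Single-point crossover: head of one parent, tail of the other.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(chrom, rate, rng):
    # Flip each bit independently with probability `rate`.
    return [1 - g if rng.random() < rate else g for g in chrom]

def run_ga(length=20, pop_size=30, generations=200, rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Stopping condition: a chromosome with the maximum fitness exists.
        if fitness(max(population, key=fitness)) == length:
            break
        next_gen = []
        for _ in range(pop_size):  # population size stays constant
            parent_a, parent_b = select(population, rng)
            child = crossover(parent_a, parent_b, rng)
            next_gen.append(mutate(child, rate, rng))
        population = next_gen
    return max(population, key=fitness)

best = run_ga()
print(fitness(best), best)
```

For a real problem, only `fitness` and the chromosome encoding would normally need to change; the loop itself stays the same.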

GA is a search method used to find exact or approximate solutions to search and optimization problems. Genetic algorithms are classed as search heuristics and form a particular group of evolutionary algorithms, which use techniques inspired by evolutionary biology: inheritance, mutation, selection, and crossover. They are used to find optimal or near-optimal solutions to problems ranging from simple to complex in different domains, e.g., engineering, biology, social science, and computer science. In these domains GA serves as an alternative to hill climbing, simulated annealing (SA), or tabu search. As opposed to local search methods, genetic algorithms rely on a set of independent computations governed by a probabilistic strategy. A GA models the natural selection of the fittest individuals over successive generations. In the classical definition, an individual is a candidate solution to the problem under consideration, and the population is the set of individuals under consideration. Every individual has a single chromosomal string that encodes its properties. One unit of information is represented by a sequence of chromosomal alleles, e.g., bits, digits, or letters. Because the chromosome is an alternative representation of the data, solutions must be encoded and decoded to exchange them with the original problem space. GA is an evolutionary algorithm suited to problems that have no known efficient solution and to optimization problems such as system modeling and scheduling.
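The encoding and decoding step can be made concrete with a small example. The sketch below assumes one common choice, a fixed-length bit string that represents a real number in a given interval; the function names `encode` and `decode` and the interval `[-1, 1]` are illustrative assumptions, not something prescribed by the text above.

```python
def encode(value, low, high, n_bits):
    # Map a real value in [low, high] to a chromosome of n_bits bits.
    max_level = 2 ** n_bits - 1
    level = round((value - low) / (high - low) * max_level)
    return [int(bit) for bit in format(level, f"0{n_bits}b")]

def decode(bits, low, high):
    # Map a bit-string chromosome back to a real value in [low, high].
    max_level = 2 ** len(bits) - 1
    level = int("".join(str(bit) for bit in bits), 2)
    return low + (high - low) * level / max_level

chromosome = encode(0.3, -1.0, 1.0, 8)
print(chromosome, decode(chromosome, -1.0, 1.0))
```

The round trip is not exact: with `n_bits` bits the interval is divided into `2**n_bits - 1` steps, so the decoded value can differ from the original by up to half a step. More bits give a finer resolution at the cost of a longer chromosome.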

Genetic algorithm programming (genetic programming) is an evolutionary method that helps map data to a given output, especially when no explicit formula is known. Programmers and mathematicians can devise procedures for problems with a limited number of variables, but as the number of variables grows beyond about 10 (and especially above 50), the problem under consideration becomes very difficult to solve. When the input data and the corresponding outputs are available but the expression that relates them is missing, a GA can "evolve" an expression tree that closely fits the data. Crossover, mutation, and the other components of genetic algorithms are used to breed the highest-fitness tree for the given problem. The resulting tree maps the variables to the answers and produces output that is close to, though not necessarily identical with, the required answer.
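A minimal sketch of evolving an expression tree is shown below. Everything in it is an assumption made for illustration: trees are nested tuples over a single variable `x`, the operator set is limited to `+`, `-`, and `*`, fitness is the sum of squared errors, half of the population survives each generation, and the names `graft`, `evolve`, and `MAX_DEPTH` are hypothetical. It is not guaranteed to recover the exact formula.

```python
import operator
import random

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
MAX_DEPTH = 8  # assumed cap so that trees cannot grow without bound

def random_tree(rng, depth=3):
    # Leaves are the variable "x" or a small integer constant.
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else rng.randint(-2, 2)
    op = rng.choice(list(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, x):
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree

def error(tree, data):
    # Lower is better: sum of squared differences from the known answers.
    return sum((evaluate(tree, x) - y) ** 2 for x, y in data)

def depth(tree):
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(tree[1]), depth(tree[2]))

def random_subtree(tree, rng):
    # Walk down from the root, stopping at a random node.
    while isinstance(tree, tuple) and rng.random() < 0.7:
        tree = tree[rng.randint(1, 2)]
    return tree

def graft(tree, donor, rng):
    # Replace one randomly chosen node of `tree` with `donor`.
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return donor
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, graft(left, donor, rng), right)
    return (op, left, graft(right, donor, rng))

def evolve(data, pop_size=60, generations=40, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: error(t, data))
        if error(population[0], data) == 0:
            break
        parents = population[: pop_size // 2]  # keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = graft(a, random_subtree(b, rng), rng)  # crossover
            if rng.random() < 0.2:
                child = graft(child, random_tree(rng, 2), rng)  # mutation
            children.append(child if depth(child) <= MAX_DEPTH else a)
        population = parents + children
    return min(population, key=lambda t: error(t, data))

# Known inputs and outputs; the relating expression is treated as unknown.
data = [(x, x * x + x) for x in range(-3, 4)]
best = evolve(data)
print(best, error(best, data))
```

Because the fitter half of each generation is carried over unchanged, the best error never gets worse from one generation to the next, although a perfect fit may not be reached within the given number of generations.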
