
Genetic Algorithm and Bioinformatics



The genetic algorithm (GA) is a heuristic search method based on population genetics. John Holland introduced it in the 1970s. A GA mimics the mechanics of natural genetics and natural selection, and it starts with a population, that is, a set of candidate solutions. Each solution is represented by a chromosome, and the population size is conserved from one generation to the next. The fitness of every chromosome is assessed at each generation, and chromosomes are then selected for the next generation probabilistically, according to their fitness. Selected chromosomes are paired at random to mate and yield offspring. Chromosomes with higher fitness have a higher probability of being selected, so each new generation tends to have a higher average fitness than the previous one. This process of evolution is repeated until a termination condition is satisfied. The solutions in a genetic algorithm are chromosomes, which in many cases are represented as strings or lists; for this reason, many genetic algorithm operations are designed to work on strings and lists. Genetic algorithms are commonly implemented in high-level languages such as Perl, Python, C, C++, and Java. These languages are productive and widely used in bioinformatics.
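The loop described above can be sketched in Python. This is a minimal illustration, not a reference implementation: the toy objective (counting 1-bits, often called OneMax), the function names, and all parameter values are assumptions chosen for the example. It uses fitness-proportionate selection, single-point crossover, and bit-flip mutation on chromosomes stored as lists.

```python
import random

def fitness(chromosome):
    """Toy objective (an assumption for illustration): count the 1-bits."""
    return sum(chromosome)

def select(population, rng):
    """Fitness-proportionate selection of one parent."""
    # +1 keeps every weight positive, so the wheel is never all zeros.
    weights = [fitness(c) + 1 for c in population]
    return rng.choices(population, weights=weights, k=1)[0]

def crossover(a, b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]

def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chromosome]

def run_ga(pop_size=30, length=20, generations=40, rate=0.01, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # The population size is conserved: pop_size offspring replace
        # the pop_size parents in every generation.
        population = [
            mutate(crossover(select(population, rng),
                             select(population, rng), rng), rate, rng)
            for _ in range(pop_size)
        ]
        # Remember the best chromosome seen so far.
        best = max(population + [best], key=fitness)
    return best, population

if __name__ == "__main__":
    best, _ = run_ga()
    print(fitness(best), best)
```

Here the termination condition is simply a fixed number of generations; a real application would more likely stop when the fitness stops improving or reaches a target value.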

A GA is a search method used to find exact or approximate solutions to search and optimization problems. Genetic algorithms are characterized as search heuristics and form a particular class of evolutionary algorithms that use techniques inspired by evolutionary biology: inheritance, mutation, selection, and crossover. They are used to find good solutions to problems ranging from simple to complex in many domains, e.g. engineering, biology, social science, and computer science, where they serve as an alternative to hill climbing, simulated annealing (SA), or tabu search. As opposed to local search methods, a genetic algorithm relies on a set of independent computations controlled by a probabilistic strategy. It models the natural selection of the fittest individuals over successive generations. In the classical definition, an individual is a candidate solution to the problem under consideration, and a population is a set of such individuals. Every individual has a single chromosome string that encodes its properties. One unit of information is represented by a sequence of chromosomal alleles, e.g. bits, digits, or letters. Because this representation differs from the one used in the original problem space, solutions must be encoded into chromosomes and decoded back again. GAs are typically applied to problems for which no efficient exact method is known, including optimization problems such as system modeling and scheduling.
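As a small illustration of the encoding and decoding step, the sketch below maps a real-valued solution to a fixed-length bit chromosome and back. The function names `encode` and `decode`, the interval bounds, and the bit width are assumptions made for this example; other problems call for other representations.

```python
def encode(x, lo, hi, n_bits):
    """Map a real value in [lo, hi] to n_bits bits, most significant first."""
    levels = (1 << n_bits) - 1              # largest integer the bits can hold
    k = round((x - lo) / (hi - lo) * levels)
    return [(k >> i) & 1 for i in reversed(range(n_bits))]

def decode(bits, lo, hi):
    """Map a bit chromosome back to a real value in [lo, hi]."""
    levels = (1 << len(bits)) - 1
    k = int("".join(map(str, bits)), 2)
    return lo + (hi - lo) * k / levels

if __name__ == "__main__":
    chromosome = encode(2.5, -5.0, 5.0, 16)
    print(chromosome, decode(chromosome, -5.0, 5.0))
```

Decoding returns the original value only up to the resolution of the encoding, which here is (hi - lo) / (2^n_bits - 1); more bits give a finer grid at the cost of a longer chromosome.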

Genetic programming is an evolutionary method that helps map input data to a given output, especially when the underlying formula is unknown. Programmers and mathematicians can often devise procedures for problems that involve a limited number of variables, but as the number of variables grows (from around 10 to 50 or more), the problem becomes very difficult to solve by hand. If the input data and the corresponding outputs are available but the expression that relates them is missing, a GA can ‘evolve’ an expression tree that closely fits the data. Crossover, mutation, and the other components of genetic algorithms are used to breed the ‘highest-fitness’ tree for the given problem. The resulting tree will not necessarily match the variables to the answers perfectly, but it produces an output close to the required answer.
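The idea of evolving an expression tree to fit data can be sketched as follows. This is a simplified illustration under several assumptions: trees are nested tuples over a single variable `x`, the only operators are `+`, `-`, and `*`, fitness is the sum of squared errors (lower is better), and new trees are bred by subtree mutation of the better half of the population. Subtree crossover between two parents is omitted to keep the example short. All names and parameter values are hypothetical.

```python
import operator
import random

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def random_tree(rng, depth=3):
    """Grow a random expression over 'x' and small integer constants."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else rng.randint(-2, 2)
    op = rng.choice(sorted(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, x):
    """Evaluate a tree for a given value of x."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree       # variable or constant leaf

def error(tree, samples):
    """Sum of squared errors over (x, y) samples; lower means fitter."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in samples)

def mutate(tree, rng, depth=2):
    """Replace one randomly chosen subtree with a fresh random subtree."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return random_tree(rng, depth)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, rng, depth), right)
    return (op, left, mutate(right, rng, depth))

def evolve(samples, pop_size=50, generations=30, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: error(t, samples))
        survivors = population[: pop_size // 2]   # keep the better half
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return min(population, key=lambda t: error(t, samples))

if __name__ == "__main__":
    data = [(x, x * x + 1) for x in range(-3, 4)]  # hidden target: x*x + 1
    best = evolve(data)
    print(best, error(best, data))
```

Because the better half of each generation is carried over unchanged, the best error never gets worse from one generation to the next, although nothing guarantees that the search reaches an exact fit.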
