Camilo Silva :: Blog :: BIOinformatics: Improving the Efficiency in Genomic Analysis

April 22, 2008

The main characteristic of genomic data is its large size. For
example, GDB, a public repository of information on the human
genome, is 8 GB in size; GenBank, the NIH sequence database,
has over 65 billion bases. This research will focus on
implementing solutions to problems in the storage, search, and
retrieval of genomic sequence data using high-performance computing
systems.

One of the most striking features of DNA is the extent to which
repeated substrings occur in the genome. In C. elegans (3.6 million
bases) over 7,000 families of repetitive sequences have been
identified. Families of reiterated sequences account for about one
third of the human genome.

Repeat sequences come in many varieties and are associated with
different biological functions and diseases. Finding repeats has
applications in locating defective genes and in forensic DNA
fingerprinting.
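
To make the idea of repeat finding concrete, here is a minimal sketch in Python. It assumes the simplest possible definition of a repeat: an exact substring of fixed length k that occurs more than once. The function name find_repeats is hypothetical, and real repeat finders typically use more sophisticated index structures and tolerate inexact matches, so treat this only as an illustration.

```python
from collections import defaultdict


def find_repeats(sequence, k):
    """Return {substring: [start positions]} for every exact length-k
    substring that occurs more than once in `sequence`.

    Hypothetical helper for illustration; exact matches only.
    """
    positions = defaultdict(list)
    # Slide a window of length k across the sequence and record where
    # each distinct substring starts.
    for i in range(len(sequence) - k + 1):
        positions[sequence[i:i + k]].append(i)
    # Keep only the substrings seen at two or more positions.
    return {kmer: starts for kmer, starts in positions.items() if len(starts) > 1}


repeats = find_repeats("ACGTACGTTTACG", 4)
print(repeats)
```

Note that the dictionary holds an entry for every distinct substring, so its memory use grows with the length of the sequence, which is exactly the difficulty with genome-sized inputs.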

The goal of this research is to contribute to the creation of
applications that improve the efficiency of programs that find and
store genomic sequences. Because the data being analyzed is often
too large to fit in a single computer's RAM, distributed computing
techniques will be used to break the problem into smaller
subproblems, which are then solved in a grid computing environment.
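
One way to picture this decomposition is the sketch below, again in Python. It is an assumption-laden illustration, not the method this research will necessarily use: the sequence is cut into chunks that overlap by k - 1 characters, so every length-k substring falls in exactly one chunk, each chunk is counted on its own, and the partial counts are merged. The names split_with_overlap, count_kmers, and count_in_chunks are hypothetical, and the chunks are processed in a plain loop here where a grid would hand each one to a separate node.

```python
from collections import Counter


def split_with_overlap(sequence, chunk_size, k):
    """Yield chunks overlapping by k - 1 characters, so that each
    length-k window of `sequence` lies in exactly one chunk."""
    if chunk_size < k:
        raise ValueError("chunk_size must be at least k")
    step = chunk_size - (k - 1)  # number of new windows per chunk
    for start in range(0, max(len(sequence) - k + 1, 0), step):
        yield sequence[start:start + chunk_size]


def count_kmers(chunk, k):
    """Count every length-k substring in one chunk (one subproblem)."""
    return Counter(chunk[i:i + k] for i in range(len(chunk) - k + 1))


def count_in_chunks(sequence, chunk_size, k):
    """Solve each chunk independently, then merge the partial counts.
    In a grid setting each count_kmers call could run on its own node."""
    total = Counter()
    for chunk in split_with_overlap(sequence, chunk_size, k):
        total.update(count_kmers(chunk, k))
    return total


counts = count_in_chunks("ACGTACGTTTACG", chunk_size=6, k=4)
print([kmer for kmer, n in counts.items() if n > 1])
```

The overlap is the important design choice: without it, substrings that straddle a chunk boundary would be missed, and with a larger overlap some would be counted twice.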

References
http://www.cs.fiu.edu/~sadjadi/Teaching/Autonomic%20Grid%20Co

Keywords: Bioinformatics Genome Efficiency Analysis Biology genomic data

Posted by Camilo Silva
