Student Name: Justin Beyer
Supervisor’s Name and Title at FIU/FAU: Xingquan Zhu
Name of the PIRE International Partner’s Institution: BSC
Supervisor’s Name and Title at the PIRE International Partner’s Institution: Dr. Josep Lluis Gelpi, Barcelona Supercomputing Center
Project Title: Gene Selection for Cancer Classification
Problem Statement: The purpose of this project is to use data mining tools to select a set of important genes for cancer classification.
Motivation and Impact: Gene expression data provide important information for scientists studying cancer tissues and identifying genes associated with different types of cancer. Because microarray experiments usually measure a large number of genes (e.g., more than 10,000) on a very limited number of tissue samples (e.g., fewer than 100), screening genes and finding the important ones associated with a disease becomes a huge burden for scientists. This research intends to use a set of data mining tools to discover a small subset of genes related to the diseased tissues, and then to use the selected genes to build machine learning classifiers that automatically predict whether an input tissue is diseased, as well as the likelihood that the tissue belongs to a certain type of cancer.
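As a rough illustration of the gene screening step described above, the sketch below ranks genes with a simple two-class signal-to-noise score (absolute difference of class means divided by the sum of class standard deviations) and keeps the top k. It is only one possible filter-style method, not necessarily the tool used in this project. The data are synthetic, and the function names, the informative gene indices (3, 17, 42), and the effect size are assumptions made for the example.

```python
import random
import statistics


def make_synthetic_expression(n_samples=40, n_genes=500,
                              informative=(3, 17, 42), seed=0):
    """Return (X, y) for a hypothetical data set.

    X[i][g] is the expression of gene g in sample i;
    y[i] is 0 (normal tissue) or 1 (diseased tissue).
    """
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]  # alternate the two classes
    X = []
    for label in y:
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in informative:
            # Assumed effect: informative genes are shifted in diseased samples.
            row[g] += 4.0 * label
        X.append(row)
    return X, y


def signal_to_noise(X, y, g):
    """Score gene g by |mean_0 - mean_1| / (sd_0 + sd_1)."""
    normal = [row[g] for row, label in zip(X, y) if label == 0]
    diseased = [row[g] for row, label in zip(X, y) if label == 1]
    spread = statistics.stdev(normal) + statistics.stdev(diseased)
    if spread == 0:
        return 0.0
    return abs(statistics.mean(normal) - statistics.mean(diseased)) / spread


def select_top_genes(X, y, k):
    """Return the indices of the k highest-scoring genes."""
    n_genes = len(X[0])
    ranked = sorted(range(n_genes),
                    key=lambda g: signal_to_noise(X, y, g),
                    reverse=True)
    return ranked[:k]


X, y = make_synthetic_expression()
top_genes = select_top_genes(X, y, k=3)
print("Selected gene indices:", sorted(top_genes))
```

With far more genes than samples, a univariate filter like this is cheap to compute, but it scores each gene independently and therefore ignores interactions between genes.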
Current Status: We are currently using existing data mining tools to investigate a collection of gene expression data that includes samples from breast cancer, colon cancer, and prostate cancer. We will identify a set of genes associated with each particular cancer and then incorporate domain knowledge to refine the gene selection process.
Research Roadmap:
Week 1-2: Using existing data mining tools and feature selection methods to select a number of genes for each particular cancer.
Week 3-4: Determining the most effective tools for gene selection, and further validating the effectiveness of the selected genes in supporting cancer classification.
Week 5-6: Combining domain knowledge to refine the gene selection results.
Week 7-8: Comparatively studying the improvement that gene selection brings to cancer classification, and investigating the gene interactions related to different types of cancer.
Week 9-12: Technical report.
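The validation step in the roadmap (checking how well the selected genes support classification) could be sketched as follows. This is a minimal example under stated assumptions, not the project's actual pipeline: it uses synthetic data, a signal-to-noise gene filter, a nearest-centroid classifier, and leave-one-out cross-validation, all chosen here for illustration. Gene selection is repeated inside every fold so that the held-out sample never influences which genes are chosen.

```python
import random
import statistics


def make_data(n_samples=30, n_genes=200, informative=(5, 50), seed=1):
    """Hypothetical data: rows are samples, columns are genes, labels are 0/1."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]
    X = [[rng.gauss(0.0, 1.0) + (4.0 * label if g in informative else 0.0)
          for g in range(n_genes)]
         for label in y]
    return X, y


def rank_genes(X, y, k):
    """Return the k genes with the largest two-class signal-to-noise score."""
    def score(g):
        a = [row[g] for row, t in zip(X, y) if t == 0]
        b = [row[g] for row, t in zip(X, y) if t == 1]
        spread = statistics.stdev(a) + statistics.stdev(b)
        if spread == 0:
            return 0.0
        return abs(statistics.mean(a) - statistics.mean(b)) / spread
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]


def nearest_centroid_predict(X_train, y_train, x, genes):
    """Predict the label whose class centroid (on the selected genes) is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(y_train)):
        rows = [row for row, t in zip(X_train, y_train) if t == label]
        centroid = [statistics.mean([row[g] for row in rows]) for g in genes]
        dist = sum((x[g] - c) ** 2 for g, c in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(X, y, k):
    """Leave-one-out accuracy, with gene selection redone on each training fold."""
    correct = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        genes = rank_genes(X_train, y_train, k)
        if nearest_centroid_predict(X_train, y_train, X[i], genes) == y[i]:
            correct += 1
    return correct / len(X)


X, y = make_data()
accuracy = loocv_accuracy(X, y, k=2)
print(f"Leave-one-out accuracy with 2 selected genes: {accuracy:.2f}")
```

Selecting genes on the full data set before cross-validating would leak information from the test samples and overstate accuracy, which matters especially when there are thousands of genes and fewer than 100 samples.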
Relation to PIRE Core Research Projects: One important goal of the PIRE project is to leverage computing resources (e.g., LA Grid) and boost international cooperation on bioinformatics-related research activities. This research will bring together experts from FAU and the Barcelona Supercomputing Center to collaborate on molecular biology and bioinformatics research topics.