Project Proposal - Filebox





Abstract


DNA microarray data is available from numerous different sources
in a wide array of formats. To use this data efficiently and
effectively, suitable tools must be made available to the end
user. We propose a visualization and analysis tool that improves
on what existing systems currently provide, so that a researcher
can arrive at a proper domain-relevant determination. The system
will enhance the user's perception of the data and offer the
ability to visualize gene expression microarray data in a
three-dimensional context.


Index Terms

visualization, data mining, bioinformatics, genes, microarray.

I. INTRODUCTION


Most gene expression microarray data sets come with some form of
visualization that can be used to analyze the data. Unfortunately,
visualization still has a long way to go before it gives the
researcher enough tools to make an informed decision based on his
or her observations. To facilitate this, we propose to develop a
3D visualization system with better capabilities than those
existing today. The amount of bioinformatics data is massive, and
the problem with any massive amount of data is making sense of it
and transforming it into knowledge that people can use. Much
research is being done to solve the problems inherent in this
task, and in this paper we explore possible solutions in two
areas: visualization with 3D mapping and application of
dimensional reduction algorithms. By providing the researcher
with the best domain-specific views of the complex nature of
genes and their interactions, we believe a more comprehensive
solution can be obtained than is available today.



Visualization tools are very useful when examining gene data; a
picture is, after all, worth a thousand words. A static or dynamic
image can convey large amounts of knowledge to the human brain in
seconds, whereas reading a table of data and interpreting it might
take hours, if it is possible at all. In this regard, our approach
to DNA microarray data visualization and analysis will enhance the
known tool set.


The main purpose of this project is to apply data mining
algorithms and create new or improved visualization techniques to
enhance the capabilities of bioinformatics.










Bioinformatics is a rapidly growing discipline in which over 300
visualization tools are currently in use or in development. Each
day brings more data that the researcher needs to analyze.
Depending on the problem at hand, the researcher needs a
visualization tool that provides the right perception of the data.
We believe our approach will provide the proper domain-specific
interpretation of the data.


The rest of this paper is laid out as follows: Section II examines
some of the related works, Section III discusses the proposed
system architecture and technologies, and Section IV provides a
division of responsibilities for the team members and establishes
a draft schedule.

II. RELATED WORK


There are a number of related systems that perform gene expression
DNA microarray data visualization. Existing visualization tools
for gene expression layouts that are related to our approach
include, but are not limited to:

--- BioLayout Express 3D (2006-2010)
--- Cytoscape (Shannon P 2003)
--- GeneVAnD (Matthew A. Hibbs 2005)
--- Gene DIVER (Gunjan Gupta 2010)


Additional related resources we will leverage for both domain and
general knowledge include:

--- The Gene Ontology (2000): though the type of data in this
    project is different from what we will use, the website
    contains useful information on existing tools and methods
--- Visualization of microarray gene expression data (Prasad and
    Ahson 2006): a study of various visualization methods,
    weighing the strengths and weaknesses of each

III. PROPOSED VISUALIZATION APPROACH


Our approach to enhancing visualization tools for DNA microarray
gene expression data is to provide a 3D capability that improves
the researcher's perception of the data.


We will implement our system using Weka, a free, open-source data
mining tool written in Java. It contains a number of machine
learning algorithms for classification, regression, clustering,
and attribute selection. We plan to use Weka to apply PCA and
ISOMAP, or some other non-linear reduction algorithm, to the
ColonCA dataset. The ColonCA dataset contains 62 tissue samples,
of which 22 are normal and 40 are from tumors. Each sample was
analyzed using microarray technology, and 2000 genes were
selected.
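To make the dimension-reduction step concrete, the following is a minimal, dependency-free sketch of what PCA computes when reducing samples to a few components for plotting. It is not Weka's implementation: the class name PcaSketch, the power-iteration approach, the fixed iteration count, and the toy data are all assumptions chosen for illustration, and at least two samples are assumed.

```java
import java.util.Arrays;

/** Dependency-free PCA sketch (hypothetical helper, not Weka's implementation). */
final class PcaSketch {

    /** Projects the rows of data (samples x attributes) onto the top k principal components. */
    static double[][] project(double[][] data, int k) {
        int n = data.length, d = data[0].length;

        // Center every attribute (column) on its mean.
        double[][] x = new double[n][d];
        for (int j = 0; j < d; j++) {
            double mean = 0;
            for (double[] row : data) mean += row[j];
            mean /= n;
            for (int i = 0; i < n; i++) x[i][j] = data[i][j] - mean;
        }

        // Sample covariance matrix (d x d); assumes n >= 2.
        double[][] cov = new double[d][d];
        for (double[] row : x)
            for (int a = 0; a < d; a++)
                for (int b = 0; b < d; b++)
                    cov[a][b] += row[a] * row[b] / (n - 1);

        double[][] out = new double[n][k];
        for (int c = 0; c < k; c++) {
            double[] v = dominantEigenvector(cov);
            double lambda = dot(v, multiply(cov, v));
            // Deflation: remove the component just found so the next pass finds the next one.
            for (int a = 0; a < d; a++)
                for (int b = 0; b < d; b++)
                    cov[a][b] -= lambda * v[a] * v[b];
            for (int i = 0; i < n; i++) out[i][c] = dot(x[i], v);
        }
        return out;
    }

    /** Power iteration with a fixed start vector; returns the zero vector if no variance is left. */
    static double[] dominantEigenvector(double[][] m) {
        int d = m.length;
        double[] v = new double[d];
        for (int j = 0; j < d; j++) v[j] = 1.0 / (j + 1);
        for (int iter = 0; iter < 500; iter++) {
            double[] w = multiply(m, v);
            double norm = Math.sqrt(dot(w, w));
            if (norm < 1e-12) { Arrays.fill(v, 0.0); break; }
            for (int j = 0; j < d; j++) v[j] = w[j] / norm;
        }
        return v;
    }

    static double[] multiply(double[][] m, double[] v) {
        double[] r = new double[m.length];
        for (int a = 0; a < m.length; a++) r[a] = dot(m[a], v);
        return r;
    }

    static double dot(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    public static void main(String[] args) {
        double[][] samples = { {0, 0}, {1, 2}, {2, 4}, {3, 6} }; // toy data, not ColonCA
        System.out.println(Arrays.deepToString(project(samples, 1)));
    }
}
```

Calling project(data, 3) would yield three coordinates per sample, which is the shape a 3D scatter plot needs. For a dataset the size of ColonCA, a library implementation is likely the better choice, since this sketch builds the full covariance matrix and runs a fixed number of iterations.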




Design and Implementation of a Visualization Tool
for Colon Cancer Genes


Mike Fritz, Eric Frohnhoefer, Daniel Pechulis

Computer Science Department

Virginia Polytechnic Institute and State University




For the visualization component, we will create a 3D scatter plot
plug-in for Weka. Scatter plots can be used to plot genes after
Principal Component Analysis (PCA). In addition to PCA, the
scatter plot can also be used to show the results of
classification. With a 3D plot, three attributes can be used to
position each point. Various colors can be used to differentiate
samples from different classes. We propose to use JOGL, a Java
OpenGL binding, to create the 3D visualization. OpenGL is the
industry standard for high-performance 3D graphics. Using OpenGL
will allow rendering of very large datasets and quick manipulation
of the graph.
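As an illustration of the scatter plot idea described above, the sketch below prepares the data a 3D plot would draw: it rescales three attributes into a [-1, 1] cube and picks one color per class. The class name ScatterPrep, the color values, and the class ordering (0 for normal, 1 for tumor) are assumptions made for this example, and the actual JOGL rendering is deliberately left out.

```java
import java.util.Arrays;

/** Sketch: prepare 3D scatter-plot vertices and per-class colors before rendering. */
final class ScatterPrep {

    // One RGB color per class; index 0 = normal, 1 = tumor (illustrative choice only).
    static final float[][] CLASS_COLORS = { {0f, 0.6f, 1f}, {1f, 0.2f, 0.2f} };

    /** Linearly rescales each of the three axes to the range [-1, 1]. */
    static float[][] normalize(double[][] points) {
        double[] min = new double[3], max = new double[3];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
        for (double[] p : points)
            for (int a = 0; a < 3; a++) {
                min[a] = Math.min(min[a], p[a]);
                max[a] = Math.max(max[a], p[a]);
            }
        float[][] out = new float[points.length][3];
        for (int i = 0; i < points.length; i++)
            for (int a = 0; a < 3; a++) {
                double range = max[a] - min[a];
                // A constant axis carries no information, so place it at the cube center.
                out[i][a] = range == 0 ? 0f : (float) (2 * (points[i][a] - min[a]) / range - 1);
            }
        return out;
    }

    /** Returns the color for a class index, cycling if there are more classes than colors. */
    static float[] colorFor(int classIndex) {
        return CLASS_COLORS[Math.floorMod(classIndex, CLASS_COLORS.length)];
    }

    public static void main(String[] args) {
        double[][] points = { {0, 10, 5}, {10, 20, 5}, {5, 15, 5} }; // toy coordinates
        int[] classes = { 0, 1, 0 };
        float[][] vertices = normalize(points);
        for (int i = 0; i < vertices.length; i++)
            System.out.println(Arrays.toString(vertices[i]) + " color "
                    + Arrays.toString(colorFor(classes[i])));
    }
}
```

Normalizing to a fixed cube keeps the camera and axis setup independent of the units of the chosen attributes, so the same view works whether the three axes are principal components or raw expression values.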

IV. RESPONSIBILITIES AND SCHEDULE


While each member of our group will assist wherever necessary, we
intend to divide the labor as follows. Please note that some tasks
can be done in parallel, as reflected in Table 1.




Mike Fritz
  o Data format and parsing
  o Dataset development
  o Algorithm development

Eric Frohnhoefer
  o Visualization tool
  o Java programming
  o Algorithm development

Daniel Pechulis
  o Java development
  o Visualization tool












































Table 1: Proposed Draft Schedule of Deliverables

Dates               Deliverable
------------------  ----------------------------------------------
Sept. 12-16         Project proposal
Sept. 16-30         Research visualization tools, dataset, and
                    algorithms
Sept. 30 - Oct. 21  Download, format, parse, and import data into
                    Weka
Oct. 14-28          Code linear and non-linear dimension reduction
                    algorithms along with Weka
Oct. 14 - Nov. 11   Java development with visualization toolset
Nov. 11-18          Integrate Weka, algorithms, and visualization
                    tool
Nov. 18-25          Testing of system and final report analysis
Nov. 25 - Dec. 1    YouTube video testing and final presentation
Dec. 2              Presentation, final report, and YouTube video



V. REFERENCES

(2000). "Gene ontology: tool for the unification of biology."
Nature Genetics 25(1): 9.

(2006-2010). "BioLayout Express 3D." Retrieved 09/15/2010, from
http://www.biolayout.org/.

Gunjan Gupta, A. L., Joydeep Ghosh (2010). "Automated Hierarchical
Density Shaving: A Robust Automated Clustering and Visualization
Framework for Large Biological Data Sets." IEEE/ACM Transactions
on Computational Biology and Bioinformatics 7(2): 15.

Matthew A. Hibbs, N. C. D., Kai Li, Olga G. Troyanskaya (2005).
"Visualization Methods for Statistical Analysis of MicroArray
Clusters." BMC Bioinformatics 6.

Prasad, T. V. and S. I. Ahson (2006). "Visualization of microarray
gene expression data." Bioinformation 1: 141-145.






Shannon P, M. A., Ozier O, Baliga NS, Wang JT, Ramage D, Amin N,
Schwikowski B, Ideker T. (2003). "Cytoscape: a software
environment for integrated models of biomolecular interaction
networks." Genome Research 13(11): 7.