Bioinformatics for Microarray Studies - TIGP--Bioinformatics Program


3/24/2005

TIGP

1

Bioinformatics for Microarray Studies at IBS

Pei-Ing Hwang, Ph.D.

Mar. 24, 2005


Different aspects for life science research

genomics

transcriptomics

proteomics


Building blocks for DNA or RNA


DNA: A, T, G, C


RNA: A, U, G, C



DNA: deoxyribonucleic acid

Double stranded


Antiparallel
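The base alphabets and the antiparallel pairing above can be illustrated with a small sketch. The function names are illustrative and not part of the original slides; it only assumes standard Watson-Crick pairing (A with T, G with C) and that RNA uses U in place of T.

```python
# Watson-Crick pairing for DNA; RNA replaces T with U.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the antiparallel partner strand, read 5'->3'."""
    # Reversing the strand accounts for the antiparallel orientation.
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna.upper()))


def transcribe(coding_strand: str) -> str:
    """Return the RNA with the same sequence as the DNA coding strand."""
    return coding_strand.upper().replace("T", "U")


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(transcribe("ATGC"))
```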



Why microarray?

Gene Expression
    To simultaneously study multiple genes
    To obtain an overview of gene expression at the transcriptional level under specific experimental conditions
    To study gene interaction networks from the transcriptional aspect

Genome
    SNP detection
    To find recombination sites in the chromosome/genome
    Hopefully, to discover the gene responsible for a genetic disease



Outline


Introduction to Microarray experiments


Experiences at IBS for the cDNA arrays


Data generated with microarray


DNA annotation


Data Analysis


Data Management




About Microarray Technology - 1

Up to hundreds of thousands of spots in a fixed area on a glass slide or a membrane
One species of DNA molecule per spot
    A spot is also called a "feature"
    DNA fixed on the chip or membrane is also called a "probe"
The sequence and/or function of the DNA species on each spot is known.


About Microarray Technology - 2

Making use of the "hybridization" method
    A : T, U
    G : C
Image processing
Data analysis
Interpretation of results from a biological perspective


Types of Microarray

Types of DNA immobilized on the solid support
    cDNA vs. oligonucleotides
Manufacturing methods
    Printing vs. photolithography
Solid support
    Glass slides
    Membrane
Nucleotide labeling (slide scanning condition)
    One color vs. two colors




GeneChip® Array Manufacturing

Figure 1. Affymetrix uses a unique combination of photolithography and combinatorial chemistry to manufacture GeneChip® Arrays.


Microarray printing machine

http://arrayit.com/Products/MicroarrayI/NanoPrint/Nano-Print-new-600.jpg


Procedure for one-channel array


Experimental Procedure for 2-channel Microarray


Data Analyses


Feature intensity acquisition


Image analyses:

To identify differentially expressed genes


Normalization (global, local, print-tip, between-array, etc.)


Clustering or Classification


Analyses from biology aspect


Significant genes


Transcriptional regulation study


Cellular pathway or network finding
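As one illustration of the "global" normalization option listed above, the sketch below centers the log2 ratios of a two-channel array on their median and then flags features whose normalized ratio passes a cutoff. This is only a minimal sketch: the function names, the example intensities and the two-fold threshold are assumptions, not the procedure used at IBS.

```python
import math
from statistics import median


def log_ratios(red, green):
    """log2(red / green) for each feature of a two-channel array."""
    return [math.log2(r / g) for r, g in zip(red, green)]


def global_median_normalize(ratios):
    """Shift all log ratios so that their median becomes zero."""
    center = median(ratios)
    return [m - center for m in ratios]


def flag_differential(names, normalized, threshold=1.0):
    """Names of features whose |log2 ratio| is at least `threshold`.

    threshold=1.0 corresponds to a two-fold change (an assumed cutoff).
    """
    return [n for n, m in zip(names, normalized) if abs(m) >= threshold]


if __name__ == "__main__":
    names = ["gene_a", "gene_b", "gene_c", "gene_d"]  # hypothetical features
    red = [200.0, 400.0, 1600.0, 100.0]
    green = [100.0, 200.0, 200.0, 50.0]
    normalized = global_median_normalize(log_ratios(red, green))
    print(flag_differential(names, normalized))
```

Median centering assumes that most genes on the array are not differentially expressed, which is why it is usually applied to whole-array data rather than to small, pre-selected gene sets.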



Experiences at IBS for the cDNA arrays


About IBS tomato arrays

~13,000 spots/features per chip
1 clone per spot
cDNA clones from about a dozen different cDNA libraries
At least two different protocols were followed and six different vectors were used
More than ten technicians were involved



Bioinformatics for Microarray at IBS (cont'd)

IBS tomato EST database construction
Installation, management and maintenance of data analysis software
Reference information searching
Batch submission of EST sequences


Bioinformatics Needs for Microarray Studies at IBS

Pre-arraying data management
    cDNA info collection, vector trimming, sequence annotation, EST submission, etc.
Array information management
    Gene set characterization, data storage, data retrieval
Post-hybridization data analysis and management
    Array data analyses, storage of the scanning results, biology-oriented bioinformatics analyses


Bioinformatics Service Work for Microarray Studies at IBS

Data pre-processing for the cDNAs
    Clone id assignment
    Sequence trimming
    Gene annotation
    Function classification
Data sheet preparation for commercial software to analyze microarray data
    GAL file preparation for GenePix Pro
    Master Gene List preparation for GeneSpring



[Workflow diagram. Labels recovered from the figure:
    cDNA clones: PCR, sequencing, vector trimming, assembly, function annotation, database
    GenePix: feature intensities, normalization
    Spotfire, GeneSpring: data analysis (normalization, variance, clustering)
    Biological meaning: pathway analysis, transcription network, gene-gene interaction]


Pre-array Bioinformatics

Clones from labs, sequencing, raw EST seq

Data Processing and Management
1. Clone id generation
2. Vector trimming
3. Sequence assembly
4. Seq annotation (BLAST)
5. EST submission to NCBI
6. Database construction


Clone id generation


Data centralization following sequencing


Rules for re-arraying


96 well plate to/from 384 well


PCR from 96 well and spotting from 384 well


Order of A1, A2, B1, B2


[Diagram of plate formats. Labels recovered from the figure: cDNA clones, sequencing, PCR; 96 or 384 well, 96 well, 96 well, 384 well]


96-well to 384-well plates

A1

B2

A2

B1
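A possible way to encode the re-arraying rule is sketched below. It assumes the common interleaved layout, in which four 96-well plates fill the four positions (A1, A2, B1, B2) of every 2x2 block of the 384-well plate; the slides do not spell out the exact rule used at IBS, so the offsets and function names here are hypothetical.

```python
import string

ROWS_96 = string.ascii_uppercase[:8]    # rows A-H, columns 1-12
ROWS_384 = string.ascii_uppercase[:16]  # rows A-P, columns 1-24

# Assumed (row, column) offset of each quadrant inside a 2x2 block.
QUADRANT_OFFSET = {"A1": (0, 0), "A2": (0, 1), "B1": (1, 0), "B2": (1, 1)}


def to_384(well_96: str, quadrant: str) -> str:
    """Map a 96-well position plus its quadrant to a 384-well position."""
    row = ROWS_96.index(well_96[0].upper())
    col = int(well_96[1:]) - 1
    d_row, d_col = QUADRANT_OFFSET[quadrant]
    return f"{ROWS_384[2 * row + d_row]}{2 * col + d_col + 1}"


def from_384(well_384: str):
    """Inverse mapping: return (96-well position, quadrant)."""
    row = ROWS_384.index(well_384[0].upper())
    col = int(well_384[1:]) - 1
    quadrant = "AB"[row % 2] + "12"[col % 2]
    return f"{ROWS_96[row // 2]}{col // 2 + 1}", quadrant


if __name__ == "__main__":
    print(to_384("B3", "A2"))
    print(from_384("C6"))
```

Keeping both directions of the mapping in code makes it possible to trace a spot on the array back to the well that was sequenced.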


Data collection

Raw sequencing data obtained from the sequencing company
Both ABI and text files organized and stored by lab and by date
Clone info confirmed with each sequence contributor
Clone ids matched with raw sequences




Processing the sequencing data


cDNA library procedures confirmed with each individual lab

Vector/linker/primer trimming (Seqclean)

Function annotation
    BLAST against different databases
    Gene Ontology annotation


Sequence Assembly (Phrap)
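The idea behind the trimming step can be shown with a deliberately simple sketch. It removes flanking vector sequence by exact string matching only; this is not how Seqclean works internally, and the function name, the flank sequences and the minimum-length cutoff are all assumptions made for illustration.

```python
def trim_vector(read, vector_5prime, vector_3prime, min_length=100):
    """Strip known flanking vector sequence from a raw EST read.

    Returns the trimmed insert, or None if it is shorter than min_length.
    Only exact matches are handled; real trimming tools also tolerate
    sequencing errors and partial matches.
    """
    seq = read.upper()
    left = vector_5prime.upper()
    right = vector_3prime.upper()

    # Drop everything up to and including the 5' vector flank, if present.
    start = seq.find(left) if left else -1
    if start != -1:
        seq = seq[start + len(left):]

    # Drop the 3' vector flank and everything after it, if present.
    end = seq.rfind(right) if right else -1
    if end != -1:
        seq = seq[:end]

    return seq if len(seq) >= min_length else None


if __name__ == "__main__":
    raw = "GGATCCAAATTTCCCGAATTC"  # hypothetical read with both flanks
    print(trim_vector(raw, "GGATCC", "GAATTC", min_length=5))
```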


Procedure to generate cDNA clones


IBS tomato EST Database


Cloning information

Sequencing data

Vector/adaptor trimming information


EST assembly


Function annotation


Cross Reference





The Tomato Database Entity-Relationship model

ID MAP: Seq_id, Clone_id, Contig_id, Lab_id#1, Lab_id#2, NCBI_sbmt_id93, NCBI_sbmt_id94, dbEST_accn_no, note
Untrimmed Sequence: Seq_id, Trimmed Sequence
Trimmed Sequence: Seq_id, Trimmed Sequence, Method, Trim set
Assembly Information: Contig_id, Contig Sequence, BLAST Result, Position, Component seq id
TAIR Result: Seq_id, At number, E-Value, Description, Identity, Other result
NCBI BLAST Result: Seq_id, NCBI_id, E-Value, Description, Identity, Other result
TIGR Result: Seq_id, TC number, E-Value, Description, Identity, Other result
Lab info: Seq_id, Comment, Primer, Biotech, Sender, Collect From
cDNA Library Information: Clone_id (3)(4), Name, Date made, Developmental stage, Cloning sites, Description, Library, Host, Species, Vector, Antibiotic, Authors, Tissue, Primer
Gene Ontology: TC number, EC number, Process (GO_id, Description), Function (GO_id, Description), Component (GO_id, Description)

Linking keys shown on the diagram: Seq_id, Clone_id, TC number (1-to-n relationships)
Other labels on the diagram: TOM 3, TOM 4
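A few of these entities can be sketched as relational tables to show how the Seq_id, Clone_id and Contig_id keys tie the data together. The sketch uses SQLite purely as a stand-in; the slides do not name the database engine behind the tomato EST database, and the table names, column subset and sample rows below are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE cdna_library (
    clone_id TEXT PRIMARY KEY,
    library  TEXT,
    vector   TEXT,
    tissue   TEXT
);
CREATE TABLE assembly (
    contig_id       TEXT PRIMARY KEY,
    contig_sequence TEXT
);
CREATE TABLE id_map (
    seq_id    TEXT PRIMARY KEY,
    clone_id  TEXT NOT NULL REFERENCES cdna_library(clone_id),
    contig_id TEXT REFERENCES assembly(contig_id)
);
CREATE TABLE trimmed_sequence (
    seq_id           TEXT PRIMARY KEY REFERENCES id_map(seq_id),
    trimmed_sequence TEXT,
    method           TEXT
);
"""


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO cdna_library VALUES (?, ?, ?, ?)",
        [("clone_1", "lib_A", "vec_X", "fruit"),
         ("clone_2", "lib_B", "vec_Y", "leaf")],
    )
    conn.execute("INSERT INTO assembly VALUES (?, ?)", ("contig_1", "ATGGCC"))
    conn.executemany(
        "INSERT INTO id_map VALUES (?, ?, ?)",
        [("seq_1", "clone_1", "contig_1"),
         ("seq_2", "clone_1", "contig_1"),
         ("seq_3", "clone_2", None)],
    )
    return conn


def sequences_per_clone(conn):
    """Count sequences per clone (the assumed 1-to-n side of the model)."""
    query = ("SELECT clone_id, COUNT(*) FROM id_map "
             "GROUP BY clone_id ORDER BY clone_id")
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(sequences_per_clone(build_demo_db()))
```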


Information to be further analyzed


Gene set characterization


Number of unique genes on the array


Number of known/unknown genes


Coordinates of each spotted sequence


Statistics about spotted cDNA


grouped by function/pathway


grouped by sequence similarity
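The counts listed above could be derived along the following lines. This is a sketch under assumptions: each spot is taken to carry a contig id (used here as a proxy for "unique gene") and an optional function label from annotation, and the field names are hypothetical.

```python
from collections import Counter


def characterize(spots):
    """Summarize a spotted gene set.

    `spots` is a list of dicts with a 'contig_id' and a 'function' entry;
    'function' is None when no annotation was found.
    """
    genes = {}
    for spot in spots:
        # Several spots may belong to the same contig; keep the first label.
        genes.setdefault(spot["contig_id"], spot.get("function"))

    known = sum(1 for function in genes.values() if function)
    by_function = Counter(f for f in genes.values() if f)
    return {
        "spots": len(spots),
        "unique_genes": len(genes),
        "known": known,
        "unknown": len(genes) - known,
        "by_function": dict(by_function),
    }


if __name__ == "__main__":
    demo = [
        {"contig_id": "c1", "function": "kinase"},
        {"contig_id": "c1", "function": "kinase"},
        {"contig_id": "c2", "function": None},
        {"contig_id": "c3", "function": "kinase"},
    ]
    print(characterize(demo))
```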



Post-hybridization data analysis and management


Post-hybridization data analysis

Software for Microarray Analysis at IBS
    GenePix Pro 5.0: image processing
    GeneSpring: microarray data analysis
    Spotfire: microarray data analysis and data storage
    TransPath: pathway searching



Image Processing


GenePix Pro 5.0

GAL (GenePix Array List) file
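A GAL file tells the image-analysis software which clone sits at which block, row and column of the array. The sketch below writes only a tab-delimited feature table with assumed column names (Block, Column, Row, ID, Name); the header records that a real GAL file also needs are omitted and should be taken from the GenePix documentation. The function and field names are hypothetical.

```python
import csv
import io

# Assumed column names for the feature table; check them against the
# GenePix documentation before use.
FEATURE_COLUMNS = ["Block", "Column", "Row", "ID", "Name"]


def write_feature_table(features):
    """Return a tab-delimited table with one line per spotted feature."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(FEATURE_COLUMNS)
    ordered = sorted(features, key=lambda f: (f["block"], f["row"], f["column"]))
    for f in ordered:
        writer.writerow([f["block"], f["column"], f["row"], f["clone_id"], f["name"]])
    return buffer.getvalue()


if __name__ == "__main__":
    demo = [
        {"block": 1, "row": 1, "column": 2, "clone_id": "c2", "name": "gene b"},
        {"block": 1, "row": 1, "column": 1, "clone_id": "c1", "name": "gene a"},
    ]
    print(write_feature_table(demo), end="")
```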


From multi-well plate to microarray


GAL online


GeneSpring at IBS


for microarray data analyses


standalone software


providing statistical methods for data analysis


Some bioinformatics


providing visualization


licensed annually


rigid format requirement for input data


requiring installation of a master gene list (master table) prior to data analysis


Master table for GeneSpring


The master table contains:
    Id
    Source of DNA
    Gene name
    Gene function annotation (from BLAST results)
    GO annotation
Each array needs its own master table
The format of the master table may vary with different versions of the software.




To generate master table for GeneSpring

Batch BLAST against three sequence databases
Parsing BLAST results
Incorporating EC number, GO number and other related data from the best BLAST matches
Integrating all required data from the various files and generating the master table
Checking
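The parsing and integration steps can be sketched as follows, assuming the BLAST results are available in the standard 12-column tabular format (query, subject, % identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score). Picking the hit with the highest bit score as "best" is an assumption, as are the function names, the GO lookup and the output fields; the actual GeneSpring master-table layout is not reproduced here.

```python
def best_hits(tabular_text):
    """Return the best-scoring hit per query from tabular BLAST output."""
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        hit = {
            "subject": subject,
            "identity": float(fields[2]),
            "evalue": float(fields[10]),
            "bitscore": float(fields[11]),
        }
        current = best.get(query)
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[query] = hit
    return best


def master_table(seq_ids, hits, go_terms):
    """Combine ids, best hits and GO terms into one row per sequence.

    `go_terms` is a hypothetical lookup from subject id to GO id.
    Sequences without any hit keep empty annotation fields.
    """
    rows = []
    for seq_id in seq_ids:
        hit = hits.get(seq_id)
        subject = hit["subject"] if hit else ""
        rows.append({
            "id": seq_id,
            "best_hit": subject,
            "evalue": hit["evalue"] if hit else None,
            "go": go_terms.get(subject, ""),
        })
    return rows
```

The final "checking" step then amounts to verifying that every spotted sequence appears exactly once in the table, with or without annotation.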


Spotfire


for microarray data analyses
server-client software
linked to Oracle database for data storage
providing various statistical methods for data analysis
capable of establishing links to more bioinformatics tools
can record the analysis procedure
more flexible format requirement for input data



One color array for Arabidopsis


Affymetrix ATH1 chip


Annotation information provided by the company and available on the internet



Bioinformatics support at Affymetrix


Projects for now and the near future


Infrastructure build-up


Microarray data management system


Platform for Bioinformatics analyses


Plant Signaling Pathway Database


Team


Thank you!