FW551A (Special Topics) Molecular Genetics of Trees


FW4089: Bioinformatics (3 credits)

FW5089: Tools of Bioinformatics (4 credits)





Time:

Every Tuesday and Thursday, 9.35 am to 10.55 am (3 hours)



Place:
Forestry, Room No. 139


Note: Presentation of the class paper will be arranged sometime in early April 2006.
The final exam will be held sometime in the week of April 24 to 28, 2006.


Instructor:


Shekhar Joshi (C. P. Joshi),

Associate Professor of Plant Molecular Genetics, SFRES

Room 168, Forestry, Phone: 487-3480 (cpjoshi@mtu.edu)

Office hours: 9 am to 6 pm except when I teach this class!


Teaching assistants:

Shiv T. and Frank Xu

(FMGB Graduate students)


Course Description


The main purpose of this course is to provide extensive hands-on experience in using a
variety of Bioinformatics tools, so that in future you can extend that knowledge to other
fields of biology such as genomics, molecular phylogenetics, and biotechnology. You will
not write Bioinformatics programs but will use the available ones for extensive sequence
analysis.


Why was this course proposed?


A number of sequence analysis packages and databases are currently available from
commercial sources as well as public web sites. In our day-to-day molecular biology
research, we use some of these programs and databases to analyze the significance of the
new genetic information that we obtain. But it is not always easy to choose the correct
approach or the appropriate tool. Databases are growing at a very fast pace and new
questions are constantly popping up. Moreover, genomics is a new and exciting field of
biotechnology that has recently witnessed many conceptual and technical advances. The
ability to make sense of this information explosion will make our students more
competitive in the current job market in both academia and industry. There is no doubt
that this knowledge will be extremely valuable for living in this century.






GENERAL TEXTBOOKS (Optional Reading material)








1) Genes VII
Benjamin Lewin, 2000, Oxford University Press

2) Molecular Biology
Robert F. Weaver, 1999, McGraw-Hill Press

3) Bioinformatics
David W. Mount, 2001, CSH Press


All these books will provide only supplemental material for the course and may be
available at the MTU Book Store or in the library.

Reading materials for the topics being covered in the class will be provided.

Although there is no specific prerequisite for this class, it is advisable to have taken
at least one of the following and have some background in genomics and bioinformatics:


BL4030: Molecular Biology

FW4087/5087: Plant Molecular Genetics







THIS COURSE WILL NOT TEACH YOU HOW TO WRITE PROGRAMS.


Bioinformatics Reference Books available in the MTU Library




Guide to Human Genome Computing (Second Edition)
by M. J. Bishop
Call No. QH445.3 .G85 1998

Bioinformatics: The Machine Learning Approach
by P. Baldi and S. Brunak
Call No. QH506 .B35 1998

Sequence Analysis in Molecular Biology
by G. von Heijne
Call No. QP551 .H43 1987

Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids
by R. Durbin, S. Eddy, A. Krogh, G. Mitchison
Call No. QP620 .B576 1998

Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology
by Dan Gusfield
Call No. QA76.9 .A43 G87 1997

Introduction to Computational Biology
by Michael S. Waterman
Call No. QH438.4 .M33W38 1995

Calculating the Secrets of Life
by Eric Lander and Michael Waterman
Call No. QH438.4 .M3 C35 1995


Some internet addresses where Bioinformatics information is available:


National Center for Biotechnology Information (GenBank): http://www.ncbi.nlm.nih.gov/

Genetics Computer Group: http://www.GCG.com

Protein analysis: http://www.expasy.ch

Celera Genomics: http://www.celera.com





GRADING SYSTEM



Grade Scale




100-95   = A    Excellent
94-90    = AB   Very Good
89-85    = B    Good
84-80    = BC   Above Average
79-75    = C    Average
74-70    = CD   Below Average
69-60    = D    Inferior
Below 60 = F    Failure




Course Points





Homework, quizzes, etc. = 30%

Mid-term Exam 1 = 30%

Final Exam = 30%

Class Participation = 10%

Exams: The midterm and cumulative final exams will be worth 100 points.

Class Paper = One Credit for FW5089
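
If you want to check where you stand, the weights and the grade scale can be combined in a few lines of shell. This is only a rough sketch and not an official calculator: the function name course_grade is made up, each component is assumed to be given as a percentage (0 to 100), a total that falls between two bands is assumed to drop to the lower one, and a total of exactly 60 is assumed to count as D.

```shell
#!/bin/sh
# Hypothetical helper, not part of the course materials.
# Arguments: homework/quiz %, mid-term %, final %, participation %
course_grade() {
    awk -v hw="$1" -v mid="$2" -v fin="$3" -v part="$4" 'BEGIN {
        # Weights from the Course Points list: 30 / 30 / 30 / 10
        total = (30 * hw + 30 * mid + 30 * fin + 10 * part) / 100
        # Lower bounds of each band from the Grade Scale (assumed inclusive)
        if (total >= 95)      letter = "A"
        else if (total >= 90) letter = "AB"
        else if (total >= 85) letter = "B"
        else if (total >= 80) letter = "BC"
        else if (total >= 75) letter = "C"
        else if (total >= 70) letter = "CD"
        else if (total >= 60) letter = "D"
        else                  letter = "F"
        printf "%.1f %s\n", total, letter
    }'
}

# Example: 90% homework, 80% mid-term, 85% final, full participation
course_grade 90 80 85 100
```

The example call prints the weighted total followed by the letter grade.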






Jobs! Jobs! Jobs!


Current Job trends:
http://www.sloan.org/programs/scitech_page1.html


Jobs in Genomics:
http://www.genomejobs.com


See also Science and Nature for job ads.



Bioinformatics is a young science, but the information explosion has demanded more
people in academia and industry. It is easy to find either a molecular biologist or a
computer scientist, but the job of a bioinformatician needs both. A biologist who can
compute and a computer scientist who can make sense of biological data are hot
commodities.


Supply and demand!



This is what I heard but do not quote me anywhere!


MS in Bioinformatics: 60-100K

Ph.D. in Bioinformatics: 80-100K or higher


Not all CS people find the money that attractive! But those who are interested in the
topic do very well in this field. Biologists face new challenges and questions every day,
and CS is providing the answers. True collaboration!


Having this course listed in your CV will help your job prospects.




http://www.bio.mtu.edu/campbell/bl4820/intro/plagiarism.htm

Plagiarism - What It Is and How to Avoid It!

Adapted from notes prepared by Ron Gratz

Scientists do not work in isolation from each other. Attendance at scientific meetings
exposes us to the work of our colleagues and allows for the free exchange of ideas.
Reading the published literature in our fields is vital for all scientists, who must keep
themselves current with what is being done in other laboratories. Scientists continually
refer to the work of their colleagues and most scientific research is based at least in part
on ideas derived from others. Review articles and textbooks are often wholly based on
already published work. It is thus necessary for you as developing scientists to learn how
to properly use previously reported knowledge.

While a free flow of ideas and information is vital to scientific progress, it also presents
avenues for fraud, particularly plagiarism. Plagiarism can be defined as "Taking the ideas
from another and passing them off as one's own" (Webster's New World Dictionary) and
is unacceptable under any circumstances. Despite this universal disapproval, it is one of
the more common faults with student papers. In some cases, it is a case of downright
dishonesty brought on by laziness, but more often it is a lack of experience in how to
properly use material taken from another source.

To avoid plagiarism you must not only properly attribute the ideas of another but must
also either paraphrase what the original author said or wrote or you must enclose that
person's exact words in quotation marks. To use another's exact words with attribution
but without quotation marks implies that the ideas belong to the original source but that
the words are your own. Besides being dishonest, copying another’s work defeats the
purpose of your education. Writing about the subject you are studying is a great way to
learn. Ideas become more firmly implanted in your memory if you have to think about
them and then write a coherent statement using them. Copying another’s work prevents
you from learning, which is the whole purpose of your education.

Whenever the words or ideas of another individual are used, proper attribution must be
given. In other words, you must give credit for those ideas and words to their originator.
Not to do so is a clear case of plagiarism. Plagiarism in classwork may result in a failing
grade or even expulsion from the university. Plagiarism in professional work may result
in dismissal from an academic position, being barred from publishing in a particular
journal or from receiving funds from a particular granting agency, or even a lawsuit and
criminal prosecution.

In a review article, the author attempts to summarize all of the pertinent work done in a
particular field of study. The goal is generally twofold: (1) to report what has been done
and what has been learned; and (2) to use this knowledge to generate general conclusions
based on these previous works. The author of a review article must be able to present the
cited work accurately and be able to synthesize new ideas from this work. In order to
accurately represent the work of others and at the same time avoid plagiarism, the author
of a review will often paraphrase the statements made in the cited work.

The problem for many students, and some professional scientists, is that they do not know
how to properly paraphrase another's words. Several general rules for paraphrasing that
are relevant for students learning to master this skill are:

1. You should change both the sentence structure and the non-technical terms in order to
avoid plagiarism.

2. You can also avoid plagiarism by altering the sequence of subject matter within and
between sentences.

3. Don't paraphrase technical terms unless you are certain of their exact meaning and can
provide an exact equivalent.

4. Credit the original author within the group of sentences using his/her work.



FW4089 and FW5089: Bioinformatics questionnaire



Your name:


ID number:


Department:


Graduate student/Undergraduate:



Name of Advisor if Graduate student:



Motivation for taking this course:




Previous experience with Unix, GCG or other sequence analysis packages:












What do you expect to get out of this course?





Have you understood the problems of plagiarism? Yes No


Do you know what my office hours are? Yes No


Are you clear about grading policy? Yes No



First QUIZ of Plant Bioinformatics


Date: January 10, 2006


Write one-line answers to as many questions as possible in the next 45 minutes. Feel free
to refer to books, the web, etc. This will not be counted towards your grade. I just want
to know where you stand with your molecular biology background:


1. DNA stands for
2. RNA stands for
3. DNA is made up of
4. RNA is made up of
5. What is the difference between deoxyribose sugar and ribose sugar?
6. What are the different types of nitrogen bases in DNA?
7. What are the different types of nitrogen bases in RNA?
8. What is the difference between purines and pyrimidines?
9. Name two purines and three pyrimidines.
10. Which purine pairs with which pyrimidine? State the number of H bonds between each pair.
11. What are the differences between DNA and RNA?
12. What are transcription and translation?
13. What is the central dogma of molecular biology?
14. What is reverse transcription?
15. What is a prokaryote?
16. What is a eukaryote?
17. What are the differences between prokaryotes and eukaryotes?
18. What is a genome?



19. What is genomics?
20. How many genomes are present in viruses, prokaryotes, plants and animals? Where?
21. What is bioinformatics?
22. What is the biological (binomial) name for humans?
23. How big is the human genome?
24. How many chromosomes are there in a human diploid and haploid cell?
25. How are human genes arranged in the genome?
26. How many human genes are there?
27. What proportion of the human genome is made up of genes?
28. What is a gene?
29. Why are eukaryotic genes said to be split?
30. How does DNA replicate? Conservatively or semi-conservatively? What is the difference?
31. How does DNA make RNA?
32. How many types of RNA are produced in a cell?
33. How many of these RNAs are said to be protein coding?
34. What is pre-mRNA? Is it present in bacteria?
35. What are the three main steps in pre-mRNA processing?
36. What are the 5’ leader and 3’ trailer sequences in pre-mRNA?
37. What is the difference between exons and introns?
38. How are introns spliced out?
39. Why are introns there?
40. How is the transcription process regulated in prokaryotes?


41. How is the transcription process regulated in eukaryotes?
42. What are a TATA box and an AATAAA box?
43. What is a transcription factor?
44. Why is TFIID said to be a commitment factor?
45. What is a transcription start site?
46. What is polyadenylation? Why is it an important biological process? Is it present in bacteria?
47. Describe the process of polyadenylation.
48. Define “protein”. In what alternative forms are proteins present in a cell?
49. How many types of amino acids are typically present? Name five amino acids. What are their 3-letter and 1-letter codes?
50. How is the code present in DNA used to make proteins?
51. Do you believe that the genome is life’s instruction book? Why?
52. If you have a disease gene (what does that mean?), do you always get the disease?
53. What is a mutation? Name a few types of mutations.
54. What are the translation start and stop sites?
55. What is tRNA?
56. What is rRNA?
57. What is a ribosome?
58. What is the genetic code? Who discovered it? (Bonus)
59. Is the genetic code universal? What does it tell us about our evolution?
60. Why is the code said to be made up of triplets?
61. What is codon bias?



62. What is the wobble hypothesis?
63. Who discovered the structure of DNA?
64. What is reverse transcription? Who discovered it?
65. Do you believe that viruses are the most evolved organisms? If yes, why? If not, why not?
66. What are mitosis and meiosis?
67. What are the main steps in mitosis? How many cells are produced at the end of one cycle of mitosis?
68. What are the main steps in meiosis? How many cells are produced at the end of one cycle of meiosis?
69. What is recombination?
70. Do bacteria recombine?
71. What is DNA sequencing? Who discovered it?
72. What are dideoxynucleotides? Why are they important in sequencing?
73. How can you sequence a gene?
74. Why is a DNA sequence written on only one line when it is double stranded?
75. Which DNA strand is always denoted when writing a gene sequence?
76. How can you derive which protein a gene encodes by just looking at a gene sequence? (BONUS)















Bioinformatics and The Human Genome


The human genome is the biggest gift of science to humanity.


We achieved something new in 2001 that we had only dreamed of for many years. The
human genome is just the beginning of our exciting and sometimes fearful journey. Fear
of the unknown lurks out there, but the promise of tomorrow is also bright and vivid.


Sequenced organisms (from Science 291, Feb 2001, p. 1178)

Organism                      Genome size   Year completed   No. of genes
H. influenzae                 1.8 Mb        1995             1,740
S. cerevisiae (yeast)         12.1 Mb       1996             6,034
C. elegans (worm)             97 Mb         1998             19,099
A. thaliana (thale cress)     100 Mb        2000             25,000
D. melanogaster (fruit fly)   180 Mb        2000             13,061
H. sapiens (human)            3000 Mb       2001             35,000-45,000

Rice…Poplar…mouse… more than 200 genomes have been sequenced and the list is
ever-increasing.


The human genome was a dream for which thousands of scientists worked for over 15 years.


Celera and HGP provided two books for the price of one. Celera achieved it in 3 years but
depended heavily on public data. How did we do what we set out to do? That is what is now
written in the Science and Nature articles.


What it means is still unknown.


They say that the equivalent of 200 New York telephone books would be needed to print
the 3 billion bp of genome per cell. But the Internet handles this easily.


Humans were supposed to have 100,000 genes, but it seems that only 32,000 are likely.


Does that make humans less powerful or inadequate in any way?


No. “The purpose of science is to find meaningful simplicity in the midst of complexity”
(Herbert Simon, Nature 409, 771, 2001). DNA structure and PCR are the best examples.




One gene works harder, at many places and many times. So less is better in that
crammed nuclear space. Alternative splicing.

Human proteins have the same domains as worms, but the way these domains come
together is unique.

We will know one day what makes up a human.

We are all unique! In sexually reproducing organisms, the entire ensemble of genes
comes together in one organism only once. One genotype occurs only once.




There are also some surprises in the human genome!

SNPs accumulate with a specific pattern

Regulatory CpG islands occur more in gene-rich regions than in gene-poor ones

TEs in gene-poor regions

Only 1.1-1.5% of the genome is coding, not even the 3% widely estimated earlier

Parts of chromosome 12 in men and chromosome 16 in women are recombination prone.

Repetitive DNA is only 40-45%

Humans share 223 genes with bacteria that are absent from the worm, fly and yeast
genomes.

Did the genome duplicate early on, similar to plants?

We will know how humans develop from the zygote: ontogeny

We will know our phylogeny by looking at ontogeny: molecular archeology

One day we will be able to trace our evolution using the genome information.

Genealogy of the human race!


CLASS PAPER (1 credit worth of extra work)


Each of you will select a different gene family from the human genome to write an essay on


How to build a better human?


You will also present your research findings to the class. You may select either a human
disease or a trait that you are interested in studying further. Collect all necessary
background information and collect genes associated with your topic. Find the
counterparts of your gene of interest in other organisms and develop a phylogenetic tree.


You are expected to use as many of the bioinformatics programs that you learnt in this
class as possible to create a comprehensive database of the genes that you have selected.


Important: Provide me with a list of all reference work (printed materials and web site
addresses) that you used. Write in your own words. I plan to put your essays and
databases on the web, so take care that you are not accused of plagiarism. See the
handout for more information on plagiarism.


For FW5089: You have to do one more extra project to earn the fourth credit. I will
discuss this separately with you all.








FW4089: How to use GCG in the GIS lab?


Sit at any computer and shake the mouse to wake the computer up. Press
<CTRL> <ALT> <DELETE> and then enter your username and password (first initial of your
first name and first 7 numbers of your ID).


Your userids may be the MTU ones.


The following procedure you will do every time you come to class (unless things
change in the next few days due to the new arrival of GCG at the Mango server):


Go to telnet and connect with oak by typing

telnet oak.ffr.mtu.edu


You will get a window for login: type your login name and enter your password. You
should see oak%


Type

source /gcg/gcgstartup

then hit return.

Then type

gcg


You should see the GCG logo!


Start using GCG programs!


For GCG manuals go to:

http://forestry.mtu.edu/manuals/gcg/index.htm



Tutorial on using Unix:


Useful Unix Commands: GCG is unfriendly!! It is not Mac or PC based.


Not for distribution. For personal use only.


Login: connect or telnet with oak, the server where GCG is loaded!

Type the password correctly and press <RETURN>.

You should see oak%

Logout: Do not forget to log out at the end of the session. Nothing saved will be lost.


Important note:
Do not give your username or password to anyone. If someone wants to
use it for GCG, ask him or her to contact his or her supervisor and then me. Any
unauthorized use will cost you the loss of GCG privileges.


UNIX Commands


UNIX commands are entered at the prompt> and delivered to the system with the
<RETURN> key.


UNIX commands have a syntax, just like any language; there is a correct order for the
words in a command, and MANY incorrect orders. Mix up the order, and UNIX is
unlikely to be clever enough to understand what you want it to do! It is a dumb
computer!


The most general form of UNIX command syntax is

Prompt> command -flag(s) argument(s)

Prompt = oak%


The command is WHAT you want to do, the -flags help refine the command, saying
HOW you want it done, and the arguments tell the OBJECT of the command - the things
to be acted upon.


UNIX expects all of its commands to be lower-case, though flags and arguments may be a
mixture of cases. Remember, UNIX is case-sensitive!


As a trivial example, suppose you wanted to translate the following English request

"Would you please quickly shovel the snow in the driveway today?"

into UNIX. The translation might look something like

prompt> shovel -quickly -today snow


In fact, given the absence of vowels and longer words from most UNIX commands and
flags, the actual command is more likely to be

prompt> sw -f -n snow

where sw is short for shovel, -f is short for fast (=quickly), and -n is short for now
(=today).


For a genuine example of a UNIX command, consider

mango% ls -la Dirname

Here, ls is short for list, -l is short for long (=all details), and -a is short for all
(=all files, even the hidden ones). Dirname is the name of the directory of files for which
you want the listing.
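
If you want to see the flags at work without touching your own files, something like the following can be tried. It is only a sketch: the scratch directory and the two file names are invented for the demonstration, and it assumes the mktemp command is available on your system.

```shell
#!/bin/sh
# Illustrative only: the names below are invented for this demo.
scratch=$(mktemp -d)                 # temporary directory, removed at the end
touch "$scratch/visible.txt" "$scratch/.hidden"

plain=$(ls "$scratch")               # no flags: hidden files are skipped
all=$(ls -a "$scratch")              # -a: also shows ".", ".." and .hidden
ls -la "$scratch"                    # -l and -a together: long listing of everything

echo "without -a: $plain"
rm -r "$scratch"                     # clean up the scratch directory
```

Note that the flags are lower-case; typing LS or -LA instead would not give the same result.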


Finally, when using GCG commands in UNIX, there is one important "feature" for
the arguments; the case you use for the names of database entries is unimportant,
but all filenames must be in lower case and typed or copied and pasted correctly.


Text files


Data on computers (text, programmes, sequences etc.) is held in blocks of information
called 'files'.


Different files have different names and/or different locations - and there is a
convention that filenames end with a three-letter extension that indicates the type of
data held in the file, e.g., .txt for text, .seq for sequences, .pep for peptides, .dat for
generic data, etc.


Files can be created, deleted, altered, overwritten, moved around, copied, renamed,
printed out to a screen or a printer, searched, compared, sorted, counted and transferred
over the network to computers on other sites.
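
As a taste of searching, counting, sorting and comparing, here is a rough sketch using four standard UNIX utilities that this handout does not otherwise cover (grep, wc, sort and cmp). The file names and the short sequences in them are invented for illustration, and mktemp is assumed to be available.

```shell
#!/bin/sh
scratch=$(mktemp -d)                       # work in a throwaway directory
cd "$scratch" || exit 1

# Two small invented sequence files, one sequence per line.
printf 'GATTACA\nACGTACGT\nTTGACA\n' > demo1.seq
printf 'GATTACA\nACGTACGT\nTTGACA\n' > demo2.seq

matches=$(grep -c 'ACA' demo1.seq)         # search: count lines containing ACA
lines=$(wc -l < demo1.seq | tr -d ' ')     # count: number of lines in the file
first=$(sort demo1.seq | head -n 1)        # sort: alphabetical order, first line
if cmp -s demo1.seq demo2.seq; then        # compare: silent byte-by-byte check
    same=yes
else
    same=no
fi

echo "matches=$matches lines=$lines first=$first same=$same"
cd / && rm -r "$scratch"                   # clean up
```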


Some UNIX commands for file management:



touch filename - create a file [ holding no information! ]

pico filename - edit the file using the pico editor [ use <CTRL> X to exit ]

cp filename newfilename - copy a file to a new file [ retains the old file ]

mv filename newfilename - move (rename) a file to a new file [ deletes the old file ]

cat filename - concatenate (print) a file's contents to the screen

more filename - print a file's contents to the screen, one page at a time [ use
<SPACE> to see the next page ]

cat filename1 filename2 > filename3 - concatenate (print) the contents of the first two
files into the third

rm filename - remove (delete) the file [ dangerous to use with wildcard * ]
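
The commands above can be strung together as in the sketch below, which also shows one cautious habit for wildcards: list what a pattern matches with ls before handing the same pattern to rm. The file names are invented, the work happens in a throwaway directory, and mktemp is assumed to be available.

```shell
#!/bin/sh
scratch=$(mktemp -d)                  # throwaway directory for the demo
cd "$scratch" || exit 1

printf 'first part\n'  > part1.txt
printf 'second part\n' > part2.txt

cat part1.txt part2.txt > whole.txt   # join the two files into a third
cp whole.txt backup.txt               # copy: both files now exist
mv backup.txt renamed.txt             # rename: backup.txt is gone

# Before deleting with a wildcard, look at what the pattern matches.
ls part*.txt
rm part*.txt                          # removes part1.txt and part2.txt only

whole=$(cat whole.txt)                # keep the results for inspection
remaining=$(ls)
echo "$remaining"
cd / && rm -r "$scratch"              # clean up
```

Only renamed.txt and whole.txt are left at the end; a careless "rm *" would have removed those as well.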



Exercise DNA Analysis - UNIX 1: create and manage files


Create a file named easyunix.txt

prompt> touch easyunix.txt

(NB: you may use any UNIX text editor you like - pico is probably the simplest, but we
will use vi today)

prompt> vi easyunix.txt

Edit the file and enter "UNIX is EASY!" (press i to start typing and <ESC> when you are
done). Exit and save the changes by typing :x (lower-case x).



Print easyunix.txt to the screen.

prompt> more easyunix.txt

Copy easyunix.txt to the file opinion.txt (How would you do this with cat? Hint!)

prompt> cp easyunix.txt opinion.txt

Rename easyunix.txt to unixcmds.txt

prompt> mv easyunix.txt unixcmds.txt

Edit the file unixcmds.txt with the vi editor. Move down the screen with the arrow cursor
keys and type what you now know about UNIX. Exit and save the new changes.

prompt> vi unixcmds.txt

Print unixcmds.txt to the screen to see how clever you have become.

prompt> more unixcmds.txt

Delete opinion.txt.

prompt> rm opinion.txt
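
To check your work, the whole exercise can be replayed as a small script. This is a sketch under a few assumptions: echo stands in for the interactive vi sessions, cat is used instead of more so that nothing waits for a key press, and everything happens in a throwaway directory made with mktemp rather than in your home directory.

```shell
#!/bin/sh
scratch=$(mktemp -d)                              # throwaway working directory
cd "$scratch" || exit 1

touch easyunix.txt                                # create the empty file
echo "UNIX is EASY!" > easyunix.txt               # stands in for the first vi edit
cat easyunix.txt                                  # print it to the screen
cp easyunix.txt opinion.txt                       # copy
mv easyunix.txt unixcmds.txt                      # rename
echo "Commands are lower-case." >> unixcmds.txt   # stands in for the second vi edit
cat unixcmds.txt
rm opinion.txt                                    # delete the copy

final_files=$(ls)                                 # what is left at the end
final_text=$(cat unixcmds.txt)
cd / && rm -r "$scratch"                          # clean up
echo "files left: $final_files"
```

At the end only unixcmds.txt remains, holding both lines of text.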


Directories


A directory is a group of files or other directories. A directory within another is often
called a sub-directory, to reflect this hierarchical organization.


Directories can be created, copied, deleted, renamed, searched and transferred over the
network to computers on other sites. Files can be moved between or copied among
specified directories.


You work in one directory at a time. This is known as the present working directory. The
directory you begin with when you log in is your home directory.


pwd: print working directory


You can easily return to your home directory from any other directory by giving the
UNIX command "cd" with no argument.


Some UNIX commands for directory management:



cd dirname - change to the directory named dirname

cd .. - change to the directory above the present one [ ".." = up ]

cd - change to your home directory [ the default argument for cd is your home
directory ]

ls - list the files in the present working directory

ls -l - a file list that is longer, more detailed

mkdir subdirname - make (create) a new sub-directory in the present directory

rmdir subdirname - remove (delete) a sub-directory in the present directory

mv filename dirname - move a file into a sub-directory






Exercise: create and manage directories




Create a sub-directory named Unixinfo

prompt> mkdir Unixinfo

Switch your present working directory to the new sub-directory

prompt> cd Unixinfo

Check to see you are there

prompt> pwd

Move a file from the directory above into your new present working directory (".." is a
short form for the directory above, and "." is a short form for the present directory)

prompt> mv ../unixcmds.txt .

Has the file moved? It should appear only in the second list (";" separates the two list
commands)

prompt> ls -l .. ; ls -l

Get back to your home directory

prompt> cd
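
The directory exercise can also be replayed as a script. The sketch below additionally contrasts cp and mv for the step that brings the file into the new sub-directory: after cp the file is still listed in the directory above, after mv it is not. A throwaway directory made with mktemp (assumed to be available) stands in for your home directory.

```shell
#!/bin/sh
scratch=$(mktemp -d)                 # stands in for your home directory
cd "$scratch" || exit 1
echo "UNIX is EASY!" > unixcmds.txt  # the file left over from the previous exercise

mkdir Unixinfo                       # create the sub-directory
cd Unixinfo || exit 1                # switch into it
where=$(basename "$(pwd)")           # check where we are

cp ../unixcmds.txt .                 # copy: the original stays in the directory above
after_cp=$(ls ..)
rm unixcmds.txt                      # undo the copy before trying mv

mv ../unixcmds.txt .                 # move: the original leaves the directory above
after_mv=$(ls ..)
inside=$(ls)

cd / && rm -r "$scratch"             # clean up
echo "in $where: $inside; left above after mv: $after_mv"
```

After the mv step the directory above contains only Unixinfo, while unixcmds.txt sits inside it.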





