
Introduction to Computational Biosciences and Bioinformatics

These materials were developed with funding from the US National Institutes of Health grant #2T36 GM008789 to the Pittsburgh
Supercomputing Center.



Alex Ropelewski

ropelews@psc.edu

Pittsburgh Supercomputing Center

National Resource for Biomedical Supercomputing


http://staff.psc.edu/ropelews/jsu/Begin_CS_Jackson_State_Intro_Computational_BioScience.ppt


http://compbio.jsums.edu/awareness/week1.html

Computational Biosciences

The application of computer science, engineering,
physical science and mathematics to understanding
how plants, animals and humans function.

Computational Bioscience Fields


Bioinformatics
Structural biology
Genetic databases
Quantitative ecology
Physiological modeling
Medical informatics
Image processing and visualization
Medical imaging
Biomedical instrumentation
Biomathematics
Neuroscience
Telemedicine
Biomedical engineering
Other related areas

Bioinformatics

The interdisciplinary science of using computational approaches
to analyze, classify, collect, represent and store biological data
with the goal of accelerating and enhancing the understanding
of DNA, RNA and protein sequences.



Structural Biology

The branch of the sciences concerned with the molecular
structure of biological macromolecules such as proteins and
nucleic acids, how they acquire the structures they have, and
how alterations in their structures affect their function.

Physiological Modeling

The study of the mechanical, physical, and biochemical
functions of living organisms through the use and creation of
mathematical models of physiological systems. Examples
include models of components of organisms, such as
particular organs or cell systems.


Image Processing and Visualization

The science of organizing, displaying, and analyzing image data
taken from any living organism in a realistic, lifelike manner.


Computational Neuroscience and Signal Processing

Applying mathematical and computational methods to understand
the signaling, control and other networks in living organisms.

Who Employs Computational Bioscientists?


Pharmaceuticals & Biotechnology (Bayer, Schering-Plough,
Amgen, Merck, Eli Lilly, etc.)


Hospitals (particularly research hospitals)


Agriculture (Monsanto, Pioneer, etc.)


Academia (particularly research universities/institutes)


Government


NIH (many institutes and centers, including NLM, NCBI and NCI) and CDC


DOE (National labs)


Department of Defense (including Army Corps of Engineers)


Agriculture, Veterans Affairs, NSF


Government Contractors (such as Computercraft, SRA)

Computational Biosciences Job Growth

Engineers, Life and Physical Scientists and Related Occupations. Occupational Outlook Handbook, 2008-09 Edition.

Department of Labor, Bureau of Labor Statistics

Computational Biosciences Salaries

National Occupational Employment and Wage Estimates

Department of Labor, Bureau of Labor Statistics, May 2007

Computational Biosciences


Interdisciplinary skills are required


Knowledge is required in the following areas:


Biology


Chemistry


Computer Science


Mathematics


Statistics


Physics


Engineering

Computational Biosciences Required Skill Sets


Agricultural and food scientists need “…the ability to apply
statistical techniques, and the ability to use computers to analyze
data and to control biological and chemical processing.”


Biological scientists “…usually study allied disciplines such as
mathematics, physics, engineering and computer science. Computer
courses are beneficial for modeling and simulating biological
processes, operating some laboratory equipment and performing
research in the emerging field of bioinformatics”



“Computer skills are essential for prospective environmental
scientists and hydrologists. Students who have some experience
with computer modeling, data analysis and integration, digital
mapping, remote sensing and Geographic Information Systems
will be the most prepared to enter the job market”


Medical scientists: “in addition to required courses in chemistry
and biology undergraduates should study allied disciplines such
as mathematics, engineering, physics, and computer science…”

Engineers, Life and Physical Scientists and Related Occupations. Occupational Outlook Handbook, 2008-09 Edition.

Department of Labor, Bureau of Labor Statistics

Computational Biosciences Required Skill Sets


“Developments in the field of Chemistry that involve life sciences
will expand, resulting in more interaction among biologists,
engineers, computer specialists and chemists.” Chemistry majors
“usually study biological sciences; mathematics; physics; and
increasingly computer science. Computer courses are essential
because employers prefer job applicants who are able to apply
computer skills to modeling and simulation tasks and operate
computerized laboratory equipment. This is increasingly important
as combinatorial chemistry and advanced screening techniques are
more widely applied. Courses in statistics are useful because
chemists… need the ability to apply basic statistical techniques.”

“Chemists should experience employment growth in pharmaceutical
and biotechnology research as recent advances in genetics open
new avenues of treatment for diseases…. Job growth for chemists is
expected to be strongest in pharmaceutical and biotechnology
firms.”

Engineers, Life and Physical Scientists and Related Occupations. Occupational Outlook Handbook, 2008-09 Edition.

Department of Labor, Bureau of Labor Statistics




What is a Sequence?


A sequence is a way to represent a protein, DNA, or
RNA molecule as a character string.

MRLLVLAALLTVGAGQAGLNSRALWQFNGM
IKCKIPSSEPLLDFNNYGCYCGLGGSGTPV
DDLDRCCQTHDNCYKQAKKLDSCKVLVDNP
YTNNYSYSCSNNEITCSSENNACEAFICNC
DRNAAICFSKVPYNKEHKNLDKKNC


Phospholipase A2, Bos taurus (bovine).
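Because a sequence is just a character string, ordinary string tools already go a long way. The sketch below (Python, chosen for illustration; the names `PLA2_BOVINE` and `composition` are hypothetical) stores the phospholipase A2 sequence shown above as one string and reports its length and the fraction contributed by each residue letter.

```python
from collections import Counter

# Phospholipase A2, Bos taurus, copied from the slide and joined into one string.
PLA2_BOVINE = (
    "MRLLVLAALLTVGAGQAGLNSRALWQFNGM"
    "IKCKIPSSEPLLDFNNYGCYCGLGGSGTPV"
    "DDLDRCCQTHDNCYKQAKKLDSCKVLVDNP"
    "YTNNYSYSCSNNEITCSSENNACEAFICNC"
    "DRNAAICFSKVPYNKEHKNLDKKNC"
)


def composition(seq: str) -> dict:
    """Return {letter: fraction of the sequence}, sorted by letter."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty sequence")
    return {letter: n / total for letter, n in sorted(counts.items())}


if __name__ == "__main__":
    print("length:", len(PLA2_BOVINE))
    for letter, fraction in composition(PLA2_BOVINE).items():
        print(f"{letter}: {fraction:.3f}")
```

Nothing in `composition` is protein-specific, so the same function works unchanged on DNA or RNA strings.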

Molecular Alphabet


DNA/RNA Sequences: Letters represent nucleotide bases:

A - Adenine
C - Cytosine
G - Guanine
T - Thymine (DNA)
U - Uracil (RNA)
X or N - Unknown

Image from Wikimedia Commons: http://en.wikipedia.org/wiki/File:DNA_chemical_structure.svg
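A minimal sketch, assuming the letters listed above (with N or X for an unknown base) are the only ones allowed: check that a string is valid DNA, then write the corresponding RNA by replacing T with U. This mirrors how an mRNA is conventionally written from the coding strand; it does not model the biochemistry of transcription. The function names are hypothetical.

```python
# Letters from the slide: A, C, G, T (DNA only) plus N or X for an unknown base.
DNA_LETTERS = frozenset("ACGTNX")


def is_dna(seq: str) -> bool:
    """True if seq is non-empty and uses only the DNA letters above."""
    return bool(seq) and set(seq.upper()) <= DNA_LETTERS


def dna_to_rna(dna: str) -> str:
    """Write a coding-strand DNA sequence as RNA by replacing thymine (T) with uracil (U)."""
    if not is_dna(dna):
        raise ValueError(f"not a DNA sequence: {dna!r}")
    return dna.upper().replace("T", "U")


print(dna_to_rna("ATGGCCTTN"))  # AUGGCCUUN
```

Rejecting U in the input keeps the two alphabets distinct, which is exactly the DNA/RNA difference the table highlights.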

Molecular Alphabet

Protein Sequences: Letters represent amino acids:

A - Alanine
R - Arginine
N - Asparagine
D - Aspartic acid
C - Cysteine
E - Glutamic acid
Q - Glutamine
G - Glycine
H - Histidine
I - Isoleucine
L - Leucine
K - Lysine
M - Methionine
F - Phenylalanine
P - Proline
S - Serine
T - Threonine
W - Tryptophan
Y - Tyrosine
V - Valine
B - Asparagine or aspartic acid
Z - Glutamine or glutamic acid
J - Leucine or isoleucine
X - Any amino acid
U - Selenocysteine
O - Pyrrolysine

Image from Wikimedia Commons: http://en.wikipedia.org/wiki/File:Oxytocin.jpg

Figure: structure of oxytocin, with each residue labeled by its one-letter code (C Y I Q N C P L G).
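The alphabet above can be stored directly as a lookup table. The hypothetical `decode` helper below turns a one-letter protein string into residue names, using the oxytocin sequence (CYIQNCPLG) pictured in the figure as input.

```python
# One-letter codes from the table above, including the ambiguity codes (B, Z, J, X)
# and the two less common residues (U, O).
AMINO_ACIDS = {
    "A": "Alanine", "R": "Arginine", "N": "Asparagine", "D": "Aspartic acid",
    "C": "Cysteine", "E": "Glutamic acid", "Q": "Glutamine", "G": "Glycine",
    "H": "Histidine", "I": "Isoleucine", "L": "Leucine", "K": "Lysine",
    "M": "Methionine", "F": "Phenylalanine", "P": "Proline", "S": "Serine",
    "T": "Threonine", "W": "Tryptophan", "Y": "Tyrosine", "V": "Valine",
    "B": "Asparagine or aspartic acid", "Z": "Glutamine or glutamic acid",
    "J": "Leucine or isoleucine", "X": "Any amino acid",
    "U": "Selenocysteine", "O": "Pyrrolysine",
}


def decode(protein: str) -> list:
    """Translate a one-letter protein sequence into residue names."""
    names = []
    for position, letter in enumerate(protein.upper(), start=1):
        if letter not in AMINO_ACIDS:
            raise ValueError(f"unknown letter {letter!r} at position {position}")
        names.append(AMINO_ACIDS[letter])
    return names


print(decode("CYIQNCPLG"))  # oxytocin
```

For oxytocin this prints Cysteine, Tyrosine, Isoleucine, Glutamine, Asparagine, Cysteine, Proline, Leucine, Glycine. Reporting the position of an unknown letter makes typos in pasted sequences easy to find.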

What is an Information Library?


A compilation of prior experimental knowledge about
biologically relevant molecules into a computer system.


The power of bioinformatics lies in the ability to leverage and
apply this prior experimental knowledge to additional
biological problems.


To search prior experimental knowledge effectively, it must be
organized in a way that makes sense from both a computer
science perspective and a biological point of view.


How is Information Organized?


From a computer science perspective, there are several
ways that data can be organized and stored:

In a relational database
In a flat file
In a networked (hyperlinked) model


From a biologist's perspective, there are also several
different ways that data can be organized:

Sequence
Structure
Family/Domain
Species
Taxonomy
Function/Pathway
Disease/Variation
Publication/Journal
And many other ways
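To make the relational and flat-file options concrete, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The table layout, the `toy_001`-style identifiers and the helper names are hypothetical; the two rows reuse a 10-residue fragment of the phospholipase A2 sequence shown earlier and human oxytocin. Querying by species illustrates one biological way of organizing the records, and `to_fasta` writes the same records out as FASTA, a common character-based flat-file format.

```python
import sqlite3


def build_library() -> sqlite3.Connection:
    """Create an in-memory relational 'library' with a hypothetical one-table schema."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE sequences ("
        " id TEXT PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " species TEXT NOT NULL,"
        " residues TEXT NOT NULL)"
    )
    rows = [  # toy records; the ids are made up
        ("toy_001", "Phospholipase A2 fragment", "Bos taurus", "MRLLVLAALL"),
        ("toy_002", "Oxytocin", "Homo sapiens", "CYIQNCPLG"),
    ]
    con.executemany("INSERT INTO sequences VALUES (?, ?, ?, ?)", rows)
    con.commit()
    return con


def by_species(con: sqlite3.Connection, species: str) -> list:
    """Biological organization: select every record from one species."""
    return con.execute(
        "SELECT id, name, length(residues) FROM sequences"
        " WHERE species = ? ORDER BY id",
        (species,),
    ).fetchall()


def to_fasta(con: sqlite3.Connection) -> str:
    """Flat-file organization: dump all records as FASTA text."""
    lines = []
    for seq_id, name, residues in con.execute(
        "SELECT id, name, residues FROM sequences ORDER BY id"
    ):
        lines.append(f">{seq_id} {name}")
        lines.append(residues)
    return "\n".join(lines) + "\n"


con = build_library()
print(by_species(con, "Bos taurus"))  # [('toy_001', 'Phospholipase A2 fragment', 10)]
print(to_fasta(con), end="")
```

The same data serves both views: the database answers structured questions (all bovine entries), while the flat file is easy to pass to other sequence tools.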


Representing Biological Data


Sequence Libraries:


Character based


Classification Libraries (Aligned sets of sequences):


Ambiguous consensus patterns


Weight Matrix


Position Specific Scoring Matrix (Profile)


Hidden Markov Models


Structural Libraries


X,Y,Z coordinates for each alpha carbon atom


Taxonomy


Tree structure represents the taxonomic lineage
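A position-specific scoring matrix can be built from a set of aligned sequences by counting residues column by column. The sketch below is a minimal version under stated assumptions: a DNA alphabet to keep it short (the same code handles proteins if the 20-letter alphabet is passed in), a pseudocount of 1 per letter, a uniform background distribution, and log2-odds scores. Real profile tools use more careful sequence weighting and background models, and hidden Markov models add insertion and deletion states that this sketch does not attempt. The function names are hypothetical.

```python
import math


def build_pssm(alignment, alphabet="ACGT", pseudocount=1.0):
    """Return one {letter: log2-odds score} dict per alignment column.

    Assumes a uniform background and adds `pseudocount` to every letter so
    unseen letters get a finite (negative) score instead of minus infinity.
    Letters outside `alphabet` (such as gaps) are simply not counted.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    background = 1.0 / len(alphabet)
    pssm = []
    for i in range(length):
        column = [seq[i] for seq in alignment]
        total = len(column) + pseudocount * len(alphabet)
        pssm.append({
            letter: math.log2((column.count(letter) + pseudocount) / total / background)
            for letter in alphabet
        })
    return pssm


def score(pssm, seq):
    """Sum the per-column scores for a sequence as long as the profile."""
    if len(seq) != len(pssm):
        raise ValueError("sequence length must match the profile length")
    return sum(column[letter] for column, letter in zip(pssm, seq))


def consensus(pssm):
    """Highest-scoring letter in each column (a simple consensus string)."""
    return "".join(max(column, key=column.get) for column in pssm)


family = ["ACGT", "ACGA", "ACTT", "AGGT"]  # toy alignment
profile = build_pssm(family)
print(consensus(profile))  # ACGT
print(score(profile, "ACGT") > score(profile, "TTTA"))  # True
```

Positive scores mean a letter is more common in that column than the background predicts, so a new sequence that scores well resembles the family.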

What does a biologist do with this data?


Search for similar sequences (sequences that share a
biological relationship)
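Search tools such as BLAST begin by finding short word matches (seeds) between the query and each library sequence and then extend them into alignments. The sketch below keeps only the first idea: it ranks library sequences by how many length-k words (k-mers) they share with the query. It illustrates the principle under that simplification and is not BLAST itself; the function names and the toy library are made up.

```python
def kmers(seq: str, k: int = 3) -> set:
    """All distinct length-k words in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_by_shared_kmers(query: str, library: dict, k: int = 3) -> list:
    """Return (name, shared word count) for entries sharing any word with the query, best first."""
    query_words = kmers(query, k)
    hits = []
    for name, seq in library.items():
        shared = len(query_words & kmers(seq, k))
        if shared:
            hits.append((name, shared))
    # Most shared words first; ties broken alphabetically by name.
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits


library = {  # toy library; names are made up
    "pla2_fragment": "MRLLVLAALL",
    "oxytocin": "CYIQNCPLG",
    "variant": "MRLLVQ",
}
print(rank_by_shared_kmers("MRLLVLAA", library))
# [('pla2_fragment', 6), ('variant', 3)]
```

Sharing many words suggests, but does not prove, a biological relationship; a real search would go on to align and statistically score each candidate.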


What does a biologist do with this data?


Align groups of sequences that share a biological
relationship (family)
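Pairwise alignment is the building block of most multiple-alignment methods. The sketch below is a textbook Needleman-Wunsch global alignment with a linear gap penalty; the scores (match +1, mismatch -1, gap -2) are arbitrary assumptions, whereas real protein work typically uses a substitution matrix such as BLOSUM62 and affine gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment of a and b. Returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def pair(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # align two residues
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

Progressive multiple-alignment programs repeat this kind of pairwise step, merging the most similar sequences first.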

What does a biologist do with this data?


Understand phylogenetic relationships of the family.
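A simple starting point for phylogenetic analysis is a matrix of pairwise distances computed from an alignment. This sketch uses the p-distance (the fraction of differing columns, skipping columns with a gap), chosen here for simplicity; tree-building methods such as UPGMA or neighbor joining can then work from the matrix, and in practice evolutionary corrections to the raw distance are often applied. The function names and toy sequences are hypothetical.

```python
def p_distance(x: str, y: str) -> float:
    """Fraction of differing columns between two aligned sequences (gapped columns skipped)."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(x.upper(), y.upper()) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    return sum(a != b for a, b in columns) / len(columns)


def distance_matrix(aligned: dict):
    """Return (names, matrix) with matrix[i][j] = p-distance between names[i] and names[j]."""
    names = sorted(aligned)
    matrix = [[p_distance(aligned[r], aligned[c]) for c in names] for r in names]
    return names, matrix


names, matrix = distance_matrix({  # toy alignment
    "seq1": "ACGTACGT",
    "seq2": "ACGTACGA",
    "seq3": "TCGAACGA",
})
for name, row in zip(names, matrix):
    print(name, [round(d, 3) for d in row])
```

Here seq1 and seq2 differ at only one of eight columns, so a tree built from this matrix would join them first.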

What does a biologist do with this data?


Understand key positions (residues) of the family.
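A first pass at finding key positions is to list the alignment columns that are completely conserved. The hypothetical helper below reports 1-based column numbers where every sequence carries the same residue and none has a gap; real analyses usually score partial conservation as well.

```python
def conserved_positions(alignment: list) -> list:
    """1-based columns where every aligned sequence has the same residue and none has a gap."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    positions = []
    for i in range(length):
        column = {seq[i].upper() for seq in alignment}
        if len(column) == 1 and "-" not in column:
            positions.append(i + 1)
    return positions


print(conserved_positions(["MRLC-K", "MKLCAK", "MRLCGK"]))  # [1, 3, 4, 6]
```

Fully conserved columns, such as the cysteines that often form disulfide bonds, are natural candidates for positions that matter to structure or function.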

What does a biologist do with this data?


Understand how key positions affect the structure and function of
the molecule being studied

What does a biologist do with this data?


Use structural data for a molecule from one species to model a
related molecule from another species.
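When a model built from a related species' structure is compared with an experimental structure, one common summary is the root-mean-square deviation (RMSD) of the alpha-carbon coordinates. This minimal sketch assumes the two coordinate lists are already superimposed and matched residue for residue; a full comparison would first find the optimal superposition (for example with the Kabsch algorithm). The function name and coordinates are hypothetical.

```python
import math


def ca_rmsd(coords_a: list, coords_b: list) -> float:
    """RMSD between two equal-length lists of (x, y, z) alpha-carbon coordinates.

    Assumes the structures are already superimposed and that coords_a[i]
    and coords_b[i] belong to equivalent residues.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Toy coordinates (in angstroms) for a three-residue stretch; values are made up.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
experiment = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 3.0, 0.0)]
print(ca_rmsd(model, experiment))  # about 1.73
```

A smaller RMSD means the model's backbone lies closer to the experimental one; because the value is an average, one badly placed loop can dominate it.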

Job Opportunities in Bioinformatics


This course will teach you many essential skills that are
asked for in these job postings.


Let’s look at actual job postings asking for bioinformatics
expertise:


Not all jobs will be labeled “bioinformatics” or “sequence
analysis”; many are in a related computational bioscience field.


Specific skills required

Summer Internship: Computational Biology



Qualifications:

To be eligible for a Computational Biology Summer
Scientific Internship, students must have completed their
undergraduate sophomore year (by June 2009).


Be majoring in a biological, chemistry or computer science
program.


Candidates should have completed at least one programming
course before the start of the internship.


All interns must have current authorization to work for any
employer within the United States.


Experience with MATLAB, SQL, C++ and/or Perl is desired.


http://jobview.monster.com/getjob.aspx?JobID=78206043&JobTitle=Summer+Internship-Computational+Biology&q=computational+biology&cy=us&lid=316&re=0&pg=1&dv=1&AVSDM=2008-12-18+14%3a20%3a00&seq=2&fseo=1&isjs=1&re=1000

Bioinformatics Assembly Analyst

Responsibilities:



assembling genome sequence data using a variety of tools and parameters and
performing the experiments needed to evaluate sequencing strategies


using existing software and databases to analyze genomic data and correlating assemblies
and sequences with a variety of genetic and physical maps and other biological
information


identifying problems and serving as point of contact for various groups to propose and
implement solutions


proposing and implementing upgrades to existing tools and processes to enhance
analysis techniques and quality of results


developing and implementing scripts to manipulate, format, parse, analyze, and display
genome sequence data; and developing new strategies for analysis and presentation of
results.

Requirements:



a bachelor's degree in biology or related field


at least three years of experience in DNA sequencing
and sequence analysis.



Must possess solid knowledge of sequencing software and public sequencing databases.



Knowledge of bioinformatics tools helpful.


http://sh.webhire.com/servlet/av/jd?ai=631&ji=2285147&sn=I


Bioinformatics Analyst:

Responsibilities:


The Bioinformatics Analyst will process sequence data and apply quality control measures for generating high quality
raw sequence and assembled data from next generation sequencing technologies.




Perform whole-genome alignments using existing alignment tools, including BLAST, MUMmer and PatternHunter.


Perform mapping and post-mapping analysis of short reads using third-party and internally developed tools.




Responsible for receiving, processing and managing sequence data.




Evaluate new methodologies and tools and improve data processing and quality control protocols.




Develop suitable metrics for reporting the completeness and quality of the sequence delivered to the customers.




Requirements:


B.S. in biology, computer science, bioinformatics or a related field, or an equivalent combination of education and
experience


A minimum of 2 years of experience in genomics and bioinformatics-related work.




Proficiency in Unix and experience in one or more of the following languages is required: Perl, SQL, Jython and Java.



Familiar with the use of commonly used sequence analysis tools and genomic databases


Willing to multitask and respond to new challenges as required.



Excellent communication skills.



Hands-on experience in a research or production environment

http://jobview.monster.com/getjob.aspx?JobID=78527133&JobTitle=Bioinformatics+Analyst&brd=1&q=bioinformatics&cy=us&lid=316&re=130&AVSDM=2009-01-09+12%3a56%3a00&pg=1&seq=11&fseo=1&isjs=1&re=1000

Business Systems Analyst:

Responsibilities



The ideal candidate should be a highly motivated team player
with a strong understanding of informatics solutions
to biology and chemistry, especially in the area of data visualization/statistical analysis,
and with a proven record of building/integrating effective tools that help scientists in their daily work.




Actively work with scientists/computational biologists in a disease area to understand their needs


Define proper data analysis solution(s) to meet their scientific needs


Perform rapid prototyping to refine the requirements with proper documentation


Work with internal and external software teams, where appropriate, to design/implement proper solutions to meet
scientists' needs


Work either as a team member or lead a team to deliver data analysis platforms to scientists/computational
biologists


Work effectively with different NITAS groups to ensure a globally consistent implementation scheme.




Requirements:




Bachelor's degree in computer science, Biology, Bioinformatics or comparable qualification


At least 3-5 years of hands-on experience in data analysis in a drug discovery, scientific or biotech environment


Strong communications and interpersonal skills


Proven capabilities interacting with scientists and being customer service oriented


Ability to work independently and/or as part of a team


Familiarity with scientific LIMS such as ActivityBase, and data visualization/analysis tools such as Spotfire


Solid understanding of relational databases and familiarity with Oracle and/or SQL Server


Good understanding of the fundamentals of software engineering.


Summary


Wide variety of jobs


Biology, especially molecular biology and genetics


Some statistics


Computer skills:


UNIX


Bioinformatics Tools


Database (SQL)


Some Programming


Web


Bioinformatics can be a rewarding career path

