R. P. Deolankar


Nov 16, 2013


Half knowledge is always dangerous

Wet lab

A laboratory allowing for hands-on scientific research
and equipped with appropriate plumbing



High-throughput technology

Technology that handles a high volume of data, or
large-scale methods to purify, identify, and
characterize DNA, RNA, proteins and other molecules.
These methods are usually automated, allowing rapid
analysis of very large numbers of samples.


Microarray

A tool used to sift through and analyze the
information contained within a genome. A microarray
consists of different nucleic acid probes that are
chemically attached to a substrate, which can be a
microchip, a glass slide or a microsphere-sized bead.

DNA microarray

A microarray of immobilized single-stranded DNA
fragments of known nucleotide sequence that is used
especially in the identification and sequencing of DNA
samples and in the analysis of gene expression (as in a
cell or tissue)

Protein microarray

A protein microarray is a piece of glass on which
different protein molecules have been affixed at
separate locations in an ordered manner, thus forming
a microscopic array.

Mass spectrometry

An instrumental method for identifying the chemical
constitution of a substance by means of the separation
of gaseous ions according to their differing mass and
charge; also called mass spectroscopy

Mass spectrometry: A method used to determine the
masses of atoms or molecules in which an electrical
charge is placed on the molecule and the resulting ions
are separated by their mass-to-charge ratio

Tandem mass spectrometry

Multiple steps of mass spectrometry selection, with
some form of fragmentation occurring in between the
stages
Immunofluorescence and immunocytochemistry,
ELISA, immunoblotting

Dry lab

A laboratory for making computer simulations or for
data analysis, especially by computers; also called
dry laboratory

Gene prioritization

The results of experimental or computational analyses
in the post-genomic era (e.g., those from microarrays,
proteomics, ChIP-chip, genome-wide in silico
searches, genetic linkages, etc.) often consist of long
lists of candidate genes. Some methods assign a score
to each gene and rank the genes accordingly. This
process is known as gene prioritization.
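
The score-and-rank idea can be sketched in a few lines of Python. This is only an illustration: the gene names, the scores, and the function name prioritize_genes are hypothetical, and real prioritization tools derive their scores from much richer evidence.

```python
def prioritize_genes(scores):
    """Return gene names ordered from highest to lowest score.

    Ties are broken alphabetically so the ranking is reproducible.
    """
    return sorted(scores, key=lambda gene: (-scores[gene], gene))


# Hypothetical candidate genes with hypothetical evidence scores.
candidate_scores = {"GENE_A": 0.42, "GENE_B": 0.91, "GENE_C": 0.67}

for rank, gene in enumerate(prioritize_genes(candidate_scores), start=1):
    print(rank, gene, candidate_scores[gene])
```

The top of the resulting list is where follow-up experiments would usually start.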


PhenoGO

PhenoGO is a multiorganism database that adds
phenotypic context, such as the cell type, disease,
tissue and organ, to existing associations between gene
products and Gene Ontology (GO) terms as specified
in the Gene Ontology Annotations (GOA).


BioMedLEE

One existing Natural Language Processing (NLP)
system, known as BioMedLEE, automatically extracts
biological information consisting of biomolecular
substances and phenotypic data.


Medical Subject Headings (MeSH)

MeSH is the National Library of Medicine's controlled
vocabulary thesaurus. It consists of sets of terms
naming descriptors in a hierarchical structure that
permits searching at various levels of specificity.


PhenOS

The Phenotype Organizer System (PhenOS) is a system
under development by the Lussier research group with
the purpose of bridging the gap between heterogeneous
biomedical terminologies.

Inparanoid algorithm

The protein interaction networks of two species are
aligned by assigning proteins to sequence homology
clusters using the Inparanoid algorithm


POCUS: Prioritization Of Candidate genes Using Statistics

Reference: Turner FS, Clutterbuck DR, Semple CA.
POCUS: mining genomic sequence annotation to
predict disease genes. Genome Biol. 2003;4(11):R75.


Online Mendelian Inheritance in Man (OMIM)

Online Mendelian Inheritance in Man is a catalog
of human genes and genetic disorders authored and
edited by Dr. Victor A. McKusick and his colleagues at
Johns Hopkins and elsewhere, and provided through
NCBI. The database contains information on disease
phenotypes and genes, including extensive
descriptions, gene names, inheritance patterns, map
locations and gene polymorphisms.


TOM (Transcriptomics of OMIM)

A web-based integrated approach for identification of
candidate disease genes

Reference: Rossi S, Masotti D, Nardini C, Bonora E,
Romeo G, Macii E, Benini L, Volinia S. TOM: a
web-based integrated approach for identification of
candidate disease genes. Nucleic Acids Res. 2006 Jul

Data mining

Data mining (sometimes called data or knowledge
discovery) is the process of analyzing data from
different perspectives and summarizing it into useful
information
Online Predicted Human Interactions Database (OPHID)

Designed to be both a resource for the laboratory
scientist to explore known and predicted
protein-protein interactions, and to facilitate
bioinformatics initiatives exploring protein interaction
networks.

Single nucleotide polymorphisms

A single nucleotide polymorphism (SNP, pronounced
snip) is a DNA sequence variation occurring when a
single nucleotide (A, T, C, or G) in the genome (or
other shared sequence) differs between members of a
species (or between paired chromosomes in an
individual)



Synonymous and nonsynonymous substitutions

Substitutions that result in amino acid replacements
are said to be nonsynonymous, while substitutions that
do not cause an amino acid replacement (such as a
GGG to GGC change, where both codons still encode
glycine) are said to be synonymous substitutions.
Because of the difference in their effects on the
physiology of the organism, synonymous and
nonsynonymous substitutions can have quite different
dynamics. For example, synonymous substitutions
usually occur at a much faster rate than do
nonsynonymous substitutions. Hence, for coding
sequences it is often desirable to separate these two.
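
Whether a codon change is synonymous can be checked against the genetic code. The sketch below assumes the standard genetic code (written in the usual T, C, A, G ordering) and DNA codons; the function name classify_substitution is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon_before, codon_after):
    """Label a codon change as 'synonymous' or 'nonsynonymous'."""
    before = codon_before.upper()
    after = codon_after.upper()
    if before not in CODON_TABLE or after not in CODON_TABLE:
        raise ValueError("codons must be three letters from A, C, G, T")
    same_residue = CODON_TABLE[before] == CODON_TABLE[after]
    return "synonymous" if same_residue else "nonsynonymous"


print(classify_substitution("GGG", "GGC"))  # both codons encode glycine
print(classify_substitution("GGG", "GAG"))  # glycine to glutamate
```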

Ka/Ks values

In genetics, the Ka/Ks ratio or dN/dS ratio is the ratio
of the rate of nonsynonymous substitutions (Ka) to
the rate of synonymous substitutions (Ks), which can
be used as an indication of selection on a
protein-coding gene.
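
As a minimal sketch, the ratio and its conventional reading (below 1 suggests purifying selection, above 1 suggests positive selection, close to 1 suggests neutrality) might be coded as follows. The input rates and function names are hypothetical; estimating Ka and Ks from real alignments requires dedicated methods that are not shown here.

```python
def ka_ks_ratio(ka, ks):
    """Ratio of nonsynonymous (Ka) to synonymous (Ks) substitution rates."""
    if ks <= 0:
        raise ValueError("Ks must be positive for the ratio to be defined")
    return ka / ks


def interpret_ka_ks(ratio):
    """Conventional reading of a Ka/Ks value."""
    if ratio < 1:
        return "purifying (negative) selection"
    if ratio > 1:
        return "positive selection"
    return "neutral evolution"


# Hypothetical rates for one protein-coding gene.
ratio = ka_ks_ratio(ka=0.02, ks=0.10)
print(round(ratio, 3), interpret_ka_ks(ratio))
```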


dbSNP: Database of Single Nucleotide Polymorphisms

A public-domain archive for a broad collection of
Single Nucleotide Polymorphisms (SNPs), hosted at
the National Center for Biotechnology Information
(NCBI)

OrthoDisease, a comprehensive database of model
organism genes that are orthologous to human disease
genes

OrthoDisease is constructed primarily using
Inparanoid analysis. Inparanoid is a program that
automatically detects orthologs (or groups of
orthologs) from two species


Biology of organisms living in their natural

Applications in Ecology and Evolutionary Biology


Epidemiology

Epidemiology is the study of how often diseases occur
in different groups of people and why

Planning and evaluating strategies to prevent illness

Guide to the management of patients in whom disease
has already developed

Reference: Epidemiology for the Uninitiated by
Coggon, Rose and Barker

Population at risk

The population at risk is the group of people, healthy
or sick, who would be counted as cases if they had the
disease being studied

It defines the denominator for the calculation of
incidence and prevalence rates

It is the number of persons potentially capable of
experiencing the event or outcome of interest

Floating numerator

Numerator floating without its denominator

Common error occurring in field investigations

The error occurs due to the number of cases not
relating to the “at risk” population

Epidemiological conclusions (on risk) cannot be
drawn from purely clinical data (on the number of sick
people seen)

Target population

It is the population about which the conclusions are to
be drawn

Sometimes measurements can be made on the full
target population; otherwise, study samples are used

Study population and study sample

The group of individuals in a study

In a clinical trial, the participants make up the study
population

The study sample is chosen from the study population


Etiology

The study of the factors that predispose to or
precipitate the disease

An external agent, a susceptible host, and an
environment that brings the host and agent together
form the disease etiology triad


Surveillance

Watching over a population and recording data likely
to have epidemiological significance, usually with the
aim of early detection of disease. Essentially an
interventionist exercise compared with monitoring,
which is passive.


Case definition

Disease in populations exists as a continuum of
severity rather than as an all-or-none phenomenon

The real question in population studies is not “has the
person got the disease?” but “How much of the disease
has he or she got?”

The diagnostic continuum is dichotomized into “cases”
and “non-cases” on the basis of statistical, clinical,
prognostic or operational options

Hence the case definition should be precise

Epidemiological case definitions are narrower and
more rigid than clinical ones


Incidence

It is the rate at which new cases occur in a population
during a specified period

Incidence = (number of new cases) / [(population at
risk) * (time during which cases were ascertained)]
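
A small worked example may help make the denominator explicit: the population at risk is multiplied by the observation time, and the number of new cases is divided by that product. The counts below and the function name incidence_rate are hypothetical.

```python
def incidence_rate(new_cases, population_at_risk, time_at_risk):
    """New cases per person per unit of time.

    The denominator is person-time: population at risk multiplied by
    the time during which cases were ascertained.
    """
    person_time = population_at_risk * time_at_risk
    if person_time <= 0:
        raise ValueError("population at risk and time must be positive")
    return new_cases / person_time


# Hypothetical figures: 50 new cases among 10,000 people followed for 2 years.
rate = incidence_rate(new_cases=50, population_at_risk=10_000, time_at_risk=2)
print(f"{rate * 1000:.1f} new cases per 1,000 person-years")
```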


Point prevalence

The proportion of a population that are cases at a point
in time

Period prevalence

The proportion of a population that are cases at any
time within a stated period
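
Both kinds of prevalence are simple proportions; they differ only in which cases are counted in the numerator. The sketch below uses hypothetical counts and a hypothetical function name.

```python
def prevalence(cases, population):
    """Proportion of a population that are cases."""
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 <= cases <= population:
        raise ValueError("cases must lie between 0 and the population size")
    return cases / population


# Hypothetical survey of 4,000 people.
# Point prevalence: people who are cases on the survey date.
point = prevalence(cases=120, population=4_000)
# Period prevalence: people who were cases at any time in the stated period.
period = prevalence(cases=200, population=4_000)

print(f"point prevalence:  {point:.1%}")
print(f"period prevalence: {period:.1%}")
```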

Attributable risk and relative risk

Attributable risk is the disease rate in exposed persons
minus that in people who are unexposed

Relative risk is the ratio of the disease rate in exposed
persons to that in people who are unexposed

Attributable risk = rate of disease in unexposed
persons * (relative risk - 1)
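
The two measures, and the identity linking them, can be checked numerically. The disease rates below are hypothetical, as are the function names.

```python
import math


def relative_risk(rate_exposed, rate_unexposed):
    """Ratio of the disease rate in exposed to unexposed persons."""
    if rate_unexposed <= 0:
        raise ValueError("rate in unexposed persons must be positive")
    return rate_exposed / rate_unexposed


def attributable_risk(rate_exposed, rate_unexposed):
    """Difference between the disease rates in exposed and unexposed persons."""
    return rate_exposed - rate_unexposed


# Hypothetical rates: 30 per 1,000 in exposed, 10 per 1,000 in unexposed.
exposed, unexposed = 0.030, 0.010
rr = relative_risk(exposed, unexposed)
ar = attributable_risk(exposed, unexposed)

# Identity: attributable risk = rate in unexposed * (relative risk - 1)
assert math.isclose(ar, unexposed * (rr - 1))
print(f"relative risk = {rr:.2f}, attributable risk = {ar:.3f}")
```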



Confounding

Causing confusion about causation due to two or more
variables associated with the disease

Confounding may give rise to spurious associations
when in fact there is no causal relation, or at the other
extreme, it may obscure the effects of a true cause



Bias

Bias is the deviation of inferences from the truth

Selection bias is the biased selection of individuals
into the study

Information bias is the biased collection or biased
analysis of the data

The motto of the epidemiologist could well be “dirty
hands but a clean mind” (manus sordidae, mens pura)


Probability

A measure of how likely it is that some event will occur

Random, unpredictable influences on events

The association between the exposure and disease is
considered to be “statistically significant” if the
probability of obtaining a test statistic at least as
extreme as the one observed, assuming no true
association (the p-value), is < 0.05
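
As a rough illustration of the 0.05 threshold, the sketch below runs a chi-square test (1 degree of freedom, no continuity correction) on a hypothetical 2x2 exposure-by-disease table. The choice of test, the counts, and the function name are assumptions made for illustration; they are not taken from the lecture.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic and p-value for a 2x2 table.

    a: exposed with disease      b: exposed without disease
    c: unexposed with disease    d: unexposed without disease
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts.
chi2, p = chi_square_2x2(a=30, b=70, c=10, d=90)
print(f"chi-square = {chi2:.2f}, p = {p:.5f}")
print("statistically significant" if p < 0.05 else "not significant")
```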


Sensitivity

The proportion of persons with the disease who are
correctly identified by defined criteria

The proportion of persons with the disease who are
correctly identified by a screening test

The ability of a system to detect epidemics and other
changes in disease occurrence

A sensitive test detects a high proportion of the true
cases

Specificity

The proportion of persons without a disease who are
correctly identified by a test

The number of true negative results divided by the
total number of all those without the disease
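
Sensitivity and specificity can both be computed from the four cells of a test-versus-disease table. The counts and function names below are hypothetical.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of persons with the disease who test positive."""
    diseased = true_positives + false_negatives
    if diseased == 0:
        raise ValueError("no persons with the disease")
    return true_positives / diseased


def specificity(true_negatives, false_positives):
    """Proportion of persons without the disease who test negative."""
    healthy = true_negatives + false_positives
    if healthy == 0:
        raise ValueError("no persons without the disease")
    return true_negatives / healthy


# Hypothetical screening results: 100 diseased and 200 healthy persons.
print(f"sensitivity = {sensitivity(true_positives=90, false_negatives=10):.2f}")
print(f"specificity = {specificity(true_negatives=160, false_positives=40):.2f}")
```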


Randomization

Randomization is used to obtain a similar allocation of
individuals to each group; the groups are followed at
the same time

Purpose of randomization: To obtain unbiased
estimates of differences among treatment responses
(means or effects) and to obtain an unbiased estimate
of the random error variation in the experiment
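
One simple way to randomize is to shuffle the participants and deal them out to the groups in turn, which keeps group sizes balanced. This is a minimal sketch under that assumption; the participant labels, group names, and function name are hypothetical, and real trials often use more elaborate schemes.

```python
import random


def randomize(participants, groups=("treatment", "control"), seed=None):
    """Randomly allocate participants to groups of near-equal size."""
    rng = random.Random(seed)  # a seed makes the allocation reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    allocation = {group: [] for group in groups}
    for index, person in enumerate(shuffled):
        allocation[groups[index % len(groups)]].append(person)
    return allocation


# Hypothetical participant identifiers.
people = [f"P{number:02d}" for number in range(1, 11)]
for group, members in randomize(people, seed=2013).items():
    print(group, sorted(members))
```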

Replication and local control

Replication is the repetition of an experiment in order
to test the validity of its conclusion

Local control is blocking or grouping to eliminate or to
control the various sources of variation (error)

Replication and local control are necessary to achieve a
reduction in the random variation among treatment
effects in the experiment

Observational (non-experimental) studies

Personal-level unit of observation


Longitudinal measurements


Cohort samples


Case-control samples


Cross-sectional measurements

Aggregate-level units of observation (ecological
studies)

Reference: Epidemiology Kept Simple: An
Introduction to Traditional and Modern
Epidemiology; by B. Burt Gerstman

Personal-level vs. aggregate-level

A personal-level study on smoking might collect
information on each person’s smoking habits, age and
disease status

An aggregate-level study on smoking might collect
information on each region’s per capita cigarette
consumption, age distribution and disease rate

Longitudinal studies

Longitudinal studies are studies in which the sequence
of events in individuals can be delineated over time

In cohort studies the incidence of disease in exposed
and non-exposed groups is compared

In case-control studies people with disease (cases) and
people without disease (controls) are sampled from
the source population and exposure histories of cases
and controls are compared

Longitudinal vs. cross-sectional

Longitudinal measurements relate exposures and
diseases in individuals at various time references

Cross-sectional measurements are not definitively
time sequenced in individuals

In cross-sectional studies, data are gathered from
samples at one point in time. Since both the outcome
and the variables are measured at the one time, these
studies are not strong at showing cause-effect
relationships.

Experimental studies

In experimental studies, the investigator introduces or
removes an exposure in order to observe its influence
on a health outcome. Such allocations may be based
on a chance mechanism (randomized trials) or on other
deliberate mechanisms built into the study’s protocol
(non-randomized trials)

Other disease informatics lectures:

Supercourse: Epidemiology, the Internet and Global Health

Lecture numbers 31981, 30331, 28921, 25381, 25371, and 34011