PhD Brain Project
Vertical Data Integration
This document describes the purpose and design of the project, defines its
activities and phases, and presents the state of the art in the vertical
integration of biological data.
Topics: domain problem, description of data, what is missing, needs (vertical integration)
The project belongs to the genetic and biological field related to neuroscience.
Much effort in this research area has gone into discovering models and features
of the brain, and we currently have a basic knowledge of the whole nervous
system, especially of the brain.
The high complexity of the nervous system is the biggest mountain to climb with
traditional methods such as statistics and mathematics, but it is also very hard
for computer science methods such as learning theory.
In recent years a huge amount of biological data has been produced in different
bioinformatics areas, the omics fields. Large projects and efforts have been
devoted to the integration and fusion of these data, and recent ones have gone
to biobank-related projects, whose objective is to integrate different
heterogeneous or homogeneous sources, widely distributed, in order to improve
statistical analysis. What currently remains a challenge is the integration of
the biological [...]. Based on this integration, we could use the biological
data, for example genotypes, [...] or discover other knowledge (disease genes,
for example).
This new kind of integration is exactly perpendicular to the first one and for
this reason is sometimes called “vertical integration”. In other words, what is
missing in the bioinformatics scenario is a comprehensive system for ontological
integration, starting from genes (or other measures) and combining gene
networks, pathways, systems biology models, and so forth.
Once this system has been built, researchers can insert biological data into the
system, analyze it from a semantic perspective, and thus infer new knowledge.
These data can also be acquired and integrated from different sources, such as
those that biobanks will provide.
DOUBT ABOUT THE SECTION ABOVE: citing biobanks is not appropriate, because that
is data integration, not semantic integration; here we mean vertical/horizontal
with respect to semantics (VERIFY THIS
The available data relate to brain diseases and are of different kinds: genotype
information, phenotype data, and clinical records.
BRIEFLY DESCRIBE THE DATA HERE
The objective of the project is to infer the phenotype of a person, or of a
group of people, from the genotype, exploiting the current knowledge of the
nervous system domain in all the bioinformatics areas, such as genomics,
proteomics, systems biology, pharmacology, etc. This must be based on an
ontological perspective and allow dynamic, incremental knowledge based on
statistical measures (alpha error, confidence intervals, and empirical risk).
The computer science domains related to the objective are data mining, machine
learning, data integration and fusion, and data quality. Recent literature
refers to such a process as o[...] integration and vertical integration.
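As a purely illustrative sketch of the statistical measures named above, the following Python fragment computes the empirical risk (0-1 loss) of a phenotype predictor on a handful of individuals and a confidence interval at significance level alpha. The function names, the labels, and the toy data are hypothetical. The use of Hoeffding's inequality for the interval is an assumption made here for simplicity, not a choice stated by the project.

```python
import math


def empirical_risk(predicted, observed):
    """Fraction of individuals whose predicted phenotype differs from the observed one."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    errors = sum(p != o for p, o in zip(predicted, observed))
    return errors / len(predicted)


def hoeffding_interval(risk, n, alpha=0.05):
    """Two-sided (1 - alpha) confidence interval for the true risk.

    Derived from Hoeffding's inequality; it holds only for a predictor
    that was fixed before looking at the n evaluation samples.
    """
    half_width = math.sqrt(math.log(2.0 / alpha) / (2.0 * n))
    return max(0.0, risk - half_width), min(1.0, risk + half_width)


# Made-up phenotype labels for eight individuals (illustration only).
predicted = ["affected", "healthy", "healthy", "affected",
             "healthy", "healthy", "affected", "healthy"]
observed = ["affected", "healthy", "affected", "affected",
            "healthy", "healthy", "healthy", "healthy"]

risk = empirical_risk(predicted, observed)
low, high = hoeffding_interval(risk, len(observed), alpha=0.05)
print(f"empirical risk = {risk:.2f}, 95% interval = [{low:.2f}, {high:.2f}]")
```

With so few samples the interval is very wide, which is the point of reporting it next to the empirical risk: the interval narrows as more individuals are added, in line with the incremental view of knowledge described above.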
The proposed solutions
Description of the two approaches and of the different converging perspectives
Description of the final goal
High-level design, oriented toward the goal of an international call
Drafting of the project phases
Framing of the phases within the literature (table of articles