WEBHOUSEx


17 Nov 2013



MEMBERS:


Venigandla, Prajeeth

Gupta, Rajan

Meka, Manohar

Obillaneni, Sita Rama Swamy

Makol, Saurabh


PROJECT OBJECTIVE


In this project the data sources are Web tables, defined
here as tables enclosed in the <table> tag of HTML pages.
Each such table can be regarded as a separate data source.
The goal is to warehouse the data from a set of these
data sources.


INTRODUCTION


What are Web Tables?




Web tables are structured data embedded in HTML pages
inside the <table> tag, on pages that are already
surfaced and crawlable.

What are Data Sources?



A data source is a data structure that contains
information about a specific database.

What is a Data Warehouse?



A data warehouse is a collection of data from a variety of
sources, organized to provide useful guidance to
the organization's decision makers.

Introduction to the Project




In our project we treat Web tables as data sources. These
tables are enclosed in the <table> tag of HTML pages. We
will apply an algorithm to extract the desired data from
these sources and display it on a web page.
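
As an illustration of what extracting data from a <table> tag can look like, here is a minimal sketch. It is written in Python with only the standard library for brevity, not in the Java/JSP stack the project names, and the names TableExtractor and extract_tables are hypothetical. It assumes reasonably well-formed markup and does not handle nested tables.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects every <table> on a page as a list of rows of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []     # finished tables
        self._table = None   # rows of the table currently being read
        self._row = None     # cells of the row currently being read
        self._cell = None    # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <Table> and <table> are the same.
        if tag == "table":
            self._table = []
        elif tag == "tr" and self._table is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            if self._cell is not None and self._row is not None:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr":
            if self._row is not None and self._table is not None:
                self._table.append(self._row)
            self._row = None
            self._cell = None
        elif tag == "table":
            if self._table is not None:
                self.tables.append(self._table)
            self._table = None


def extract_tables(html):
    """Return all tables found in an HTML string, one list of rows per table."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Each returned table is a list of rows, and each row is a list of cell strings, so the first row usually carries the column headers when the page uses <th> cells.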


CORE TECHNOLOGIES


Eclipse as the IDE, with Java 1.4 compatibility.



Oracle as the database server.



Apache Tomcat 6.0 server.



DIFFERENT PHASES OF THE PROJECT (1)


Extract Transform Load (ETL):


Major operations include:


1) Extraction.


2) Transformation.


3) Loading.



How is it used in our project?
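
One way the transformation and loading steps could be applied to rows already extracted from a Web table is sketched below. This is an assumption-laden illustration, not the project's implementation: it uses Python and an in-memory SQLite database as a stand-in for the Oracle server, and the function names transform, to_number and load are hypothetical. Table and column names are assumed to come from a trusted source, since they are placed directly into the SQL text.

```python
import sqlite3


def to_number(text):
    """Return an int or float when the text looks numeric, else the text."""
    candidate = text.replace(",", "")
    for cast in (int, float):
        try:
            return cast(candidate)
        except ValueError:
            pass
    return text


def transform(raw_rows):
    """Turn [header, row, row, ...] into clean column names and typed rows."""
    header, *body = raw_rows
    columns = [name.strip().lower().replace(" ", "_") for name in header]
    cleaned = []
    for row in body:
        if len(row) != len(columns):
            continue  # drop rows that do not match the header width
        cleaned.append([to_number(cell.strip()) for cell in row])
    return columns, cleaned


def load(conn, table, columns, rows):
    """Create the target table if needed and insert the cleaned rows."""
    col_list = ", ".join('"%s"' % c for c in columns)
    marks = ", ".join("?" for _ in columns)
    conn.execute('CREATE TABLE IF NOT EXISTS "%s" (%s)' % (table, col_list))
    conn.executemany('INSERT INTO "%s" VALUES (%s)' % (table, marks), rows)
    conn.commit()
```

Cell values go through parameter placeholders, so only the identifiers are interpolated into the statement. A real warehouse load would also need declared column types and a policy for rejected rows.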



DIFFERENT PHASES OF THE PROJECT (2)


OLAP:


Full Form: Online Analytical Processing.


Operation: An approach to quickly answering
multi-dimensional analytical queries.


OLAP cube and schemas.




How is it used in our project?
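
To make the idea of a multi-dimensional query concrete, the sketch below shows two basic cube operations, roll-up and slice, over a list of fact records. It is an illustration in Python under stated assumptions, not the project's OLAP layer: facts are assumed to be dictionaries holding dimension values and one numeric measure, and the names roll_up and slice_cube are hypothetical.

```python
from collections import defaultdict


def roll_up(facts, dimensions, measure):
    """Sum `measure` over all facts, grouped by the given dimensions."""
    totals = defaultdict(float)
    for fact in facts:
        key = tuple(fact[d] for d in dimensions)
        totals[key] += fact[measure]
    return dict(totals)


def slice_cube(facts, dimension, value):
    """Keep only the facts where one dimension has a fixed value."""
    return [f for f in facts if f[dimension] == value]
```

Rolling up on fewer dimensions gives coarser totals, and an empty dimension list gives the grand total under the key (). A slice followed by a roll-up answers questions of the form "totals per region for one year".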


DIFFERENT PHASES OF THE PROJECT (3)


Data Mining:


Operation: The process of extracting hidden patterns
from data.


Different classes of data mining algorithms:


1) Classification.


2) Clustering.


3) Regression.


4) Association rule learning.
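
As a small example of one of these classes, the sketch below computes the two basic measures of association rule learning, support and confidence, and lists frequent item pairs. The deck does not say which algorithm the project chose, so this is only an illustration in Python. The function names are hypothetical, each transaction is assumed to be a collection of items, and the transaction list is assumed to be non-empty.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / base


def frequent_pairs(transactions, min_support):
    """All item pairs whose support reaches `min_support`."""
    items = sorted({item for t in transactions for item in t})
    return [pair for pair in combinations(items, 2)
            if support(transactions, pair) >= min_support]
```

This brute-force version checks every pair, which is fine for a handful of items. Algorithms such as Apriori prune the candidates so the same measures scale to larger item sets.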


DIFFERENT PHASES OF THE PROJECT (4)


Cont…



Algorithm chosen, and its advantages and disadvantages.



How is it used in our project?



PROJECT ARCHITECTURE



External Sources

Data Sources

Extraction of Raw Data

Transforming of Raw Data

Loading Data in Warehouse

Storing Data in Data Management System

Implementation of Data Mining Algorithm

Building the Result


ROLES & RESPONSIBILITIES


Each member is responsible for working on one data
source.



Each member is responsible for coding his own JSP page
to retrieve the information from his data source.



Testing and documentation: TBD.





THANK YOU.