notes - USERLab


Dec 4, 2013


Variations in Searching for Information


CMPT 455/826 - Week 11, Day 2



Approximate Query Processing


Abstract [1]


This article describes query processing in the DBO database system.



Like other database systems designed for ad hoc analytic processing, DBO is
able to compute the exact answers to queries over a large relational database in
a scalable fashion.



Unlike any other system designed for analytic processing, DBO can constantly
maintain a guess as to the final answer to an aggregate query throughout
execution, along with statistically meaningful bounds for the guess’s accuracy.



As DBO gathers more and more information, the guess gets more and more
accurate, until it is 100% accurate as the query is completed.



This allows users to stop the execution as soon as they are happy with the query
accuracy, and thus encourages exploratory data analysis.



1. Scalable Approximate Query Processing with the DBO Engine, by Chris Jermaine, Subramanian Arumugam, Abhijit Pol, and Alin Dobra



Purpose:


To get fast intermediate results for queries whose exact answer would take longer to compute than the extra precision is worth.



Technique:


Uses random sampling rather than sequential processing, so that the estimate becomes more and more exact as more data is accumulated.
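
The idea of a guess that tightens as more data is read can be sketched in a few lines. This is only an illustration of sampling-based estimation, not the DBO engine's algorithm: the function name `online_average`, the batch size, and the normal-approximation bound (z = 1.96) are all assumptions made for the example.

```python
import math
import random


def online_average(values, batch_size=1000, z=1.96, seed=0):
    """Yield (rows_seen, estimate, half_width) after each batch of a random scan.

    Hypothetical helper: estimates AVG(values) and an approximate 95% bound.
    """
    rng = random.Random(seed)
    order = list(values)
    rng.shuffle(order)  # a random visiting order makes every prefix a random sample
    n_total = len(order)
    count, mean, m2 = 0, 0.0, 0.0  # Welford running mean and squared deviations
    for x in order:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
        if count % batch_size == 0 or count == n_total:
            if count > 1:
                variance = m2 / (count - 1)
                # Finite-population correction: reaches 0 once every row is read,
                # so the bound collapses when the query completes.
                fpc = (n_total - count) / n_total
                half_width = z * math.sqrt(variance / count * fpc)
            else:
                half_width = math.inf
            yield count, mean, half_width
```

A caller can iterate over the generator and stop as soon as `half_width` is small enough, which mirrors the "stop when you are happy with the accuracy" behaviour described in the abstract.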



Comments:


The paper is very technical, but the underlying concept is what is important to consider.

Inconsistent Databases


Abstract [2]


Query answering from inconsistent databases amounts to finding “meaningful” answers to queries posed over database instances that do not satisfy integrity constraints specified over their schema.

A declarative approach to this problem relies on the notion of repair, that is, a database that satisfies integrity constraints and is obtained from the original inconsistent database by “minimally” adding and/or deleting tuples.



2. Repair Localization for Query Answering from Inconsistent Databases, by Thomas Eiter, Michael Fink, Gianluigi Greco, and Domenico Lembo



Purpose:


A database may become inconsistent in many ways.

This is particularly challenging in the context of data integration, where a number of heterogeneous and widely distributed data sources must be presented to the user as if they were a single (virtual) centralized database, which is often equipped with a rich set of constraints expressing important semantic properties of the application at hand.

Since, in general, the integrated sources are autonomous, the data resulting from the integration are likely to violate these constraints.

The standard approach, data cleaning, may be insufficient even if only a few inconsistencies are present in the data.




Technique:


The notion of a repair for an inconsistent database: a repair is a new database which satisfies the constraints in the schema and minimally differs from the original one.

The suitability of a possible repair depends on
» the underlying semantics adopted for the inconsistent database,
» and on the kinds of integrity constraints allowed on the schema.

Multiple repairs might be possible.

The standard way of answering a user query is to compute the answers that are true in every possible repair.
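
The "true in every repair" idea can be made concrete with a small sketch. It assumes the simplest possible setting, which is narrower than the paper's: the only constraint is a primary key, and repairs are obtained by deleting tuples only. The function names `repairs` and `consistent_answers` are hypothetical.

```python
from itertools import product


def repairs(rows, key):
    """Enumerate deletion-only repairs under a primary-key constraint.

    Each repair keeps exactly one row per key value, so it is a maximal
    consistent subset of the original rows.
    """
    groups = {}
    for row in rows:
        groups.setdefault(key(row), []).append(row)
    for choice in product(*groups.values()):
        yield set(choice)


def consistent_answers(rows, key, query):
    """Return the answers that the query produces in every repair."""
    answer_sets = [set(query(repair)) for repair in repairs(rows, key)]
    return set.intersection(*answer_sets)


# Hypothetical data: "bob" violates the key because he has two departments.
EMPLOYEES = [("alice", "sales"), ("bob", "hr"), ("bob", "it"), ("carol", "it")]


def in_it(db):
    """Query: names of employees who work in IT."""
    return {name for name, dept in db if dept == "it"}


def by_name(row):
    return row[0]
```

Here `consistent_answers(EMPLOYEES, by_name, in_it)` keeps carol but not bob, because bob is in IT in only one of the two repairs. The number of repairs grows exponentially with the number of conflicting keys, so enumerating them like this is only workable for tiny examples.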


Comments:




The immediate problem here is having inconsistent information in a database.

A more important problem is the reason why the information became inconsistent in the first place.


When combining differing database schemas, it is difficult to decide in what form the information should be represented.

If this is not done carefully, the database is likely to end up with misleading or inconsistent data.


The query is checked against all the possible repairs to the database.

The answer is based on a comparison of the results across the available repairs, but how likely is it that the query was answered in the desired way?


Instead of doing extra work rewriting queries as they are asked, why not use the information uncovered by these techniques to determine a more permanent fix for the inconsistency of the data?

If a consistent answer can be determined from an inconsistent database, then it seems likely that the information could be made consistent in the database for future queries.

Dynamic Spatial Queries


Abstract [3]


Conventional spatial queries are usually meaningless in dynamic environments, since their results may be invalidated as soon as the query or data objects move.

In this paper we formulate two novel query types:
» a time-parameterized query
» a continuous query




3. Spatial Queries in Dynamic Environments, by Yufei Tao and Dimitris Papadias



Purpose:


As opposed to traditional, “instantaneous” queries that are evaluated only once to return a single result, continuous queries may require constant evaluation and updates of the results as the query conditions or database contents change.




Technique:


A time-parameterized (TP) query returns:
» the objects that satisfy the corresponding spatial query at the time when the query is issued,
» the expiry time of the result, given the current motion of the query and database objects,
» the change that causes the expiration of the result.

A continuous query retrieves tuples of the form <result, interval>, where each result is accompanied by a future interval during which it is valid.

NOTE: A continuous query can be answered by repetitive execution of TP queries until some termination clause is satisfied.
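
The relationship between the two query types can be sketched under heavy simplifications that are not taken from the paper: space is one-dimensional, the query window [lo, hi] is static, and every object moves with constant velocity. The names `MovingPoint`, `tp_window_query`, and `continuous_window_query` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MovingPoint:
    name: str
    pos: float  # position at time 0
    vel: float  # constant velocity (an assumption of this sketch)

    def at(self, t):
        return self.pos + self.vel * t


def tp_window_query(objects, lo, hi, t, current=None):
    """Time-parameterized window query at time t.

    Returns (result, expiry, change): the names inside [lo, hi] at time t,
    the time at which that result stops being valid, and the (name, event)
    pair that invalidates it (None if the result never expires).
    """
    if current is None:
        current = frozenset(o.name for o in objects if lo <= o.at(t) <= hi)
    expiry, change = math.inf, None
    for o in objects:
        if o.vel == 0:
            continue  # a stationary object never changes the result
        if o.name in current:
            boundary, event = (hi if o.vel > 0 else lo), "leaves"
        elif o.vel > 0 and o.at(t) < lo:
            boundary, event = lo, "enters"
        elif o.vel < 0 and o.at(t) > hi:
            boundary, event = hi, "enters"
        else:
            continue  # outside the window and moving away from it
        when = (boundary - o.pos) / o.vel  # time of the boundary crossing
        if when < expiry:
            expiry, change = when, (o.name, event)
    return current, expiry, change


def continuous_window_query(objects, lo, hi, start, end):
    """Answer a continuous query on [start, end] by chaining TP queries."""
    t, current, timeline = start, None, []
    while t < end:
        result, expiry, change = tp_window_query(objects, lo, hi, t, current)
        timeline.append((set(result), (t, min(expiry, end))))
        if change is None or expiry >= end:
            break  # termination clause: nothing else changes before `end`
        name, event = change
        current = result | {name} if event == "enters" else result - {name}
        t = expiry
    return timeline
```

Each TP query hands back exactly what is needed to issue the next one: the expiry time becomes the next query time, and the change tells us how to update the result. The expiry times are only valid while velocities stay constant, so a change of direction or speed would require re-issuing the TP query.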



Comments:


In addition to getting the correct result from the spatial queries, the paper should have addressed how a dynamic database could be updated.

E.g., a dynamic environment such as an automated car park involves both vehicles moving in and out of the parking lot and the database being updated with the number of available lots at a given time.

There are issues with how the expiry time is dealt with: what happens when the entity changes direction or velocity? Does the expiry time remain valid?



Querying the Semantic Web


Abstract [4]


The Resource Description Framework (RDF) enables the creation and exchange of metadata as any other Web data.

There is a need for sufficiently expressive declarative query languages for querying Web pages that make use of RDF.

We propose RQL, a new query language adapting the functionality of semistructured or XML query languages to the peculiarities of RDF but also extending this functionality in order to uniformly query both RDF descriptions and schemas.




4. Querying the Semantic Web with RQL, by G. Karvounarakis, A. Magkanaraki, S. Alexaki, V. Christophides, D. Plexousakis, M. Scholl, and K. Tolle



Purpose:


RQL adapts the functionality of semistructured or XML query languages to the peculiarities of RDF but also extends this functionality in order to uniformly query both RDF descriptions and schemas.

With RQL users are able to query resources described according to their preferred schema, while discovering how the same resources are also described using another classification schema.



Technique:


We introduce a formal data model and type system for description bases created according to the RDF Model & Syntax and Schema specifications.

In order to support superimposed RDF descriptions, the main modeling challenge is to represent properties as self-existent individuals, as well as to introduce a graph instantiation mechanism permitting multiple classification of resources.
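
What "uniformly querying descriptions and schemas" means can be illustrated with a toy triple store. This is not RQL syntax and not the paper's data model. It is a plain-Python sketch in which schema statements and resource descriptions live in the same set of triples, and the sample resources and helper names are hypothetical.

```python
TYPE, SUBCLASS, DOMAIN = "rdf:type", "rdfs:subClassOf", "rdfs:domain"

TRIPLES = {
    # schema statements
    ("Painter", SUBCLASS, "Artist"),
    ("Sculptor", SUBCLASS, "Artist"),
    ("creates", DOMAIN, "Artist"),  # the property is itself a subject
    # resource descriptions
    ("picasso", TYPE, "Painter"),
    ("picasso", TYPE, "Sculptor"),  # multiple classification
    ("rodin", TYPE, "Sculptor"),
    ("picasso", "creates", "guernica"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return {t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}


def subclasses(triples, cls):
    """Schema query: all direct and indirect subclasses of cls."""
    found, frontier = set(), [cls]
    while frontier:
        parent = frontier.pop()
        for child, _, _ in match(triples, p=SUBCLASS, o=parent):
            if child not in found:
                found.add(child)
                frontier.append(child)
    return found


def instances(triples, cls):
    """Mixed query: resources typed with cls or any of its subclasses."""
    classes = {cls} | subclasses(triples, cls)
    return {s for s, _, o in match(triples, p=TYPE) if o in classes}


def classes_of(triples, resource):
    """Data query: every class a resource is directly classified under."""
    return {o for _, _, o in match(triples, s=resource, p=TYPE)}


def properties_with_domain(triples, cls):
    """Schema query: properties, treated as individuals, declared on cls."""
    return {s for s, _, _ in match(triples, p=DOMAIN, o=cls)}
```

The same `match` primitive serves schema questions (`subclasses`, `properties_with_domain`), data questions (`classes_of`), and questions that need both (`instances`), which is the uniformity the slide refers to.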



Comments:


The type system used for RQL is extremely useful in that it is actually read from the RDF schema: the type system is specific to the schema being used.

However, all types fit into a finite list of categories, which contains literal types, resource types, class types, property types, and others.

The discussion on typing as it relates to RDF would be useful in considering various other approaches to typing for other means of modeling (ER or class diagrams).

In ER modeling this could be achieved through choosing property names/attributes for a relationship and including them in the diagram (and not just “is-a”).

Entity Search Engine


Abstract [5]


The Web has become a rich collection of data-rich pages, on the “surface Web” of static URLs as well as the “deep Web” of database-backed contents.

The richness of data, while a promising opportunity, has challenged us to effectively find data we need, from one or multiple sources.

We are motivated by the need of large scale on-the-fly integration for online structured data.


5. Entity Search Engine: Towards Agile Best Effort Information Integration over the Web, by Tao Cheng and Kevin Chen-Chuan Chang



Purpose:


How do we identify and integrate the structured data embedded in unstructured result pages?



Technique:


Search engines, such as Google, Yahoo, or MSN, search for pages by keywords. While being “IR-style” with a scalable text processing framework, they are not data aware.

Integration services, such as Expedia.com or PriceGrabber.com, exist online for specific domains. They provide “DB-style” precise querying, but they can hardly scale to the amount of data and the number of sources on the Web.

We propose a solution where the two extremes meet, with a synergistic “marriage” in the middle.
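
One way to picture a middle ground between keyword search and typed querying is a search that takes keywords plus an entity type and returns entities rather than pages. The sketch below is purely illustrative and is not the system described in the paper: the regular-expression recognisers, the context window, and the vote-counting ranking are all assumptions.

```python
import re
from collections import Counter

# Hypothetical entity recognisers: one regular expression per entity type.
ENTITY_PATTERNS = {
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "price": re.compile(r"\$\d+(?:\.\d{2})?"),
}


def entity_search(pages, keywords, entity_type, window=40):
    """Rank entities of entity_type that occur near all the keywords.

    pages is a list of plain-text page contents. An entity gets one vote for
    every occurrence whose surrounding text contains every keyword.
    """
    pattern = ENTITY_PATTERNS[entity_type]
    keywords = [k.lower() for k in keywords]
    votes = Counter()
    for text in pages:
        lowered = text.lower()
        for m in pattern.finditer(text):
            context = lowered[max(0, m.start() - window): m.end() + window]
            if all(k in context for k in keywords):
                votes[m.group()] += 1
    return votes.most_common()
```

The keyword part is the IR-style half (any page can contribute, with no per-site wrapper), while the typed pattern is the data-aware half (the answer is a value of a known type, aggregated across pages).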



Comments:


There are still problems with sites that embed their data in inaccessible formats that cannot be queried.