notes - USERLab


Dec 4, 2013


Variations in Searching for Information

CMPT 455/826

Week 11, Day 2


Approximate Query Processing


This article describes query processing in the DBO database system.

Like other database systems designed for ad hoc analytic processing, DBO is
able to compute the exact answers to queries over a large relational database in
a scalable fashion.

Unlike any other system designed for analytic processing, DBO can constantly
maintain a guess as to the final answer to an aggregate query throughout
execution, along with statistically meaningful bounds for the guess’s accuracy.

As DBO gathers more and more information, the guess gets more and more
accurate, until it is 100% accurate as the query is completed.

This allows users to stop the execution as soon as they are happy with the query
accuracy, and thus encourages exploratory data analysis.


Scalable Approximate Query Processing with the DBO Engine by Chris Jermaine, Subramanian Arumugam, Abhijit
Pol, and Alin Dobra

Approximate Query Processing


To get fast intermediate results on queries that could take longer
than the extra precision is worth


Uses random sampling rather than sequential processing to
keep accumulating more and more exact information


The paper is very technical, but the concept is what is important
to consider
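
The concept can be illustrated with a minimal sketch. It is not DBO's actual algorithm (the paper handles joins and disk-based scans); it assumes a single in-memory column, a SUM aggregate, rows visited in random order, and a normal-approximation 95% bound. The function name `running_estimates` and its parameters are hypothetical.

```python
import math
import random

def running_estimates(values, report_every, seed=0):
    """Yield (rows_seen, estimate, half_width) for SUM(values).

    Rows are visited in random order, so the rows seen so far form a
    random sample without replacement of the whole column.
    """
    n = len(values)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    total = 0.0
    total_sq = 0.0
    for seen, idx in enumerate(order, start=1):
        v = values[idx]
        total += v
        total_sq += v * v
        if seen % report_every == 0 or seen == n:
            mean = total / seen
            # Sample variance of the rows seen so far (clamped against rounding).
            var = (total_sq - seen * mean * mean) / (seen - 1) if seen > 1 else 0.0
            var = max(var, 0.0)
            # Finite population correction: reaches 0 once every row is seen,
            # so the bound collapses and the guess becomes the exact answer.
            fpc = (n - seen) / n
            half_width = 1.96 * n * math.sqrt(var / seen * fpc)
            yield seen, n * mean, half_width

column = list(range(1, 1001))
estimates = list(running_estimates(column, report_every=100))
for seen, guess, bound in estimates:
    print(f"{seen:5d} rows: SUM is about {guess:.0f} +/- {bound:.0f}")
```

A user watching this output could stop as soon as the bound is tight enough for the question at hand, which is the behaviour the abstract describes.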

Inconsistent Databases


Query answering from inconsistent databases

amounts to finding “meaningful” answers to queries posed over databases

that do not satisfy integrity constraints specified over their schema.

A declarative approach to this problem relies on

the notion of repair,

that is, a database that satisfies integrity constraints

and is obtained from the original inconsistent database

by “minimally” adding and/or deleting tuples.


Repair Localization for Query Answering from Inconsistent Databases by Thomas Eiter, Michael
Fink, Gianluigi Greco, and Domenico Lembo

Inconsistent Databases


A database may become inconsistent in many ways

This is particularly challenging in the context of data integration,

where a number of data sources, heterogeneous and widely
distributed, must be presented to the user as if they were a single
(virtual) centralized database, which is often equipped with a rich set of
constraints expressing important semantic properties of the application
at hand.

Since, in general, the integrated sources are autonomous, the data
resulting from the integration are likely to violate these constraints.

The standard approach through data cleaning

may be insufficient

even if only a few inconsistencies are present in the data

Inconsistent Databases


The notion of a repair for an inconsistent database

a repair is a new database which satisfies the constraints in the
schema and minimally differs from the original one.

The suitability of a possible repair depends on

the underlying semantics adopted for the inconsistent database,

and on the kinds of integrity constraints allowed on the schema.

multiple repairs might be possible

the standard way of answering a user query is

to compute the answers that are true in every possible repair
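
A minimal sketch of answers that are "true in every possible repair", under simplifying assumptions that are not taken from the paper: the only constraint is a primary key, and repairs are obtained by deleting tuples only (the paper also allows additions and richer constraints). The helper names `repairs_for_key` and `consistent_answers` and the sample rows are hypothetical.

```python
from itertools import product

def repairs_for_key(rows, key):
    """Enumerate deletion-only repairs for a primary-key constraint.

    Rows that share a key value conflict with each other; a minimal
    repair keeps exactly one row from each such group.
    """
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    for choice in product(*groups.values()):
        yield list(choice)

def consistent_answers(rows, key, query):
    """Return the answers that `query` produces in every repair."""
    answer_sets = [set(query(repair)) for repair in repairs_for_key(rows, key)]
    return set.intersection(*answer_sets) if answer_sets else set()

# "ann" appears twice with different departments, violating the key on "name".
employees = [
    {"name": "ann", "dept": "sales"},
    {"name": "ann", "dept": "hr"},
    {"name": "bob", "dept": "it"},
]

# Every repair contains some row for ann, so her name is a consistent answer.
names = consistent_answers(employees, "name", lambda db: [r["name"] for r in db])
# Her department differs between repairs, so only bob's pair survives.
pairs = consistent_answers(
    employees, "name", lambda db: [(r["name"], r["dept"]) for r in db]
)
print(names, pairs)
```

Enumerating repairs explicitly is exponential in the number of conflicting groups, which is one reason work in this area looks for ways to avoid materialising every repair.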


Inconsistent Databases


The major problem here is having inconsistent information in a database.

A more important problem is the reason behind the inconsistency in
information throughout the database.

It is difficult to decide what form information should be represented in
when combining differing database schemes.

If this is not done carefully it is likely that the database will end up with
misleading or inconsistent data.

The query is checked against all the possible repairs to the database.

The answer is based on some evaluation between the repairs that are
available, but how likely is it that the query was answered in the desired way?

Instead of doing extra work with rewriting queries as they are asked

why not use the information found out by these techniques to determine a
more permanent fix for the inconsistency of the data

If a consistent answer can be determined from an inconsistent database, then it
seems likely that the information could be made consistent in the database for
future queries.

Dynamic Spatial Queries


Conventional spatial queries are usually meaningless in dynamic environments,

since their results may be invalidated

as soon as the query or data objects move.

In this paper we formulate two novel query types,

A time-parameterized query

A continuous query


Spatial Queries in Dynamic Environments by Yufei Tao and Dimitris Papadias

Dynamic Spatial Queries


As opposed to traditional, “instantaneous”, queries

that are evaluated only once to return a single result,

continuous queries

may require constant evaluation and updates of the results

as the query conditions or database contents change

Dynamic Spatial Queries


A time-parameterized (TP) query returns:

the objects that satisfy the corresponding spatial query at the time when the
query is issued

the expiry time of the result given the current motion of the query and
database objects

the change that causes the expiration of the result

A continuous query retrieves

tuples of the form <result, interval>,

where each result is accompanied by a future interval, during which it is valid

NOTE: A continuous query can be answered by repetitive execution of TP
queries until some termination clause is satisfied.
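
The three components of a TP result, and the note about repeating TP queries, can be sketched as follows. This is a brute-force illustration, not the paper's index-based method. It assumes one-dimensional positions, objects moving at constant velocity, a static query window [lo, hi], and a termination clause given as an end time. All function and object names are hypothetical.

```python
import math

def position(obj, t):
    """Position at time t of an object given as (position at time 0, velocity)."""
    start, velocity = obj
    return start + velocity * t

def time_to_change(pos, vel, lo, hi, is_member):
    """Time until the object leaves the window (members) or enters it (others)."""
    if vel == 0:
        return math.inf
    if is_member:
        edge = hi if vel > 0 else lo
        return max((edge - pos) / vel, 0.0)
    if pos < lo and vel > 0:
        return (lo - pos) / vel
    if pos > hi and vel < 0:
        return (hi - pos) / vel
    return math.inf  # moving away from the window, or just left it

def tp_query(objects, lo, hi, now, members=None):
    """Return (result, expiry_time, changes) for a window query issued at `now`."""
    if members is None:
        members = {n for n, o in objects.items() if lo <= position(o, now) <= hi}
    waits = {
        n: time_to_change(position(o, now), o[1], lo, hi, n in members)
        for n, o in objects.items()
    }
    soonest = min(waits.values(), default=math.inf)
    if soonest == math.inf:
        return set(members), math.inf, []
    changes = [
        (n, "leaves" if n in members else "enters")
        for n, w in waits.items() if w == soonest
    ]
    return set(members), now + soonest, changes

def continuous_query(objects, lo, hi, start, end):
    """Repeat TP queries until `end`, collecting (result, interval) tuples."""
    out, now, members = [], start, None
    while now < end:
        result, expiry, changes = tp_query(objects, lo, hi, now, members)
        out.append((sorted(result), (now, min(expiry, end))))
        # Apply the reported change to obtain the next result.
        members = set(result)
        for name, kind in changes:
            if kind == "enters":
                members.add(name)
            else:
                members.discard(name)
        now = expiry
    return out

moving = {"a": (0.0, 1.0), "b": (5.0, -1.0)}
timeline = continuous_query(moving, lo=2, hi=4, start=0, end=10)
for result, interval in timeline:
    print(result, interval)
```

The next result is derived from the reported change rather than recomputed from positions, so an object sitting exactly on the window edge at the expiry time is not counted twice.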

Dynamic Spatial Queries


In addition to getting the correct result from the spatial queries,
should have addressed how a dynamic database could be

E.g. Dynamic environment such as automated car park involves
both vehicles moving in and out of the parking lot and the database
being updated on the number of available lots at a given time.

There are issues with how expiry time is dealt with:

what happens when the entity changes direction or velocity, does
the expiry time remain valid?

Querying the Semantic Web


The Resource Description Framework (RDF)

enables the creation and exchange of metadata as any other Web data.

There is a need for sufficiently expressive declarative query languages

for querying Web pages that make use of RDF

We propose RQL, a new query language

adapting the functionality of semistructured or XML query languages

to the peculiarities of RDF

but also extending this functionality

in order to uniformly query both RDF descriptions and schemas.


Querying the Semantic Web with RQL by G. Karvounarakis, A. Magkanaraki, S. Alexaki, V.
Christophides, D. Plexousakis, M. Scholl, and K. Tolle

Querying the Semantic Web


RQL adapts the functionality

of semistructured or XML query languages

to the peculiarities of RDF

but also extends this functionality in order

to uniformly query both RDF descriptions and schemas.

With RQL users are able to query resources

described according to their preferred schema,

while discovering how the same resources

are also described using another classification schema.
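
The idea of querying schemas and descriptions together can be sketched in plain Python over a list of triples. This is not RQL syntax and not the paper's data model. The `art:`, `museum:`, and `ex:` names are hypothetical, and the helper functions are illustrative only.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def subclasses(triples, cls):
    """Return cls together with all of its transitive subclasses (schema level)."""
    found = {cls}
    frontier = [cls]
    while frontier:
        parent = frontier.pop()
        for s, p, o in triples:
            if p == SUBCLASS and o == parent and s not in found:
                found.add(s)
                frontier.append(s)
    return found

def instances_of(triples, cls):
    """Resources typed with cls or any subclass (schema and data combined)."""
    wanted = subclasses(triples, cls)
    return {s for s, p, o in triples if p == TYPE and o in wanted}

def classes_of(triples, resource):
    """Every class a resource is directly classified under, in any schema."""
    return {o for s, p, o in triples if s == resource and p == TYPE}

triples = [
    # Schema statements from one (hypothetical) art vocabulary.
    ("art:Painter", SUBCLASS, "art:Artist"),
    ("art:Sculptor", SUBCLASS, "art:Artist"),
    # Descriptions; ex:picasso is also classified under a second schema.
    ("ex:picasso", TYPE, "art:Painter"),
    ("ex:picasso", TYPE, "museum:Exhibitor"),
    ("ex:rodin", TYPE, "art:Sculptor"),
]

artists = instances_of(triples, "art:Artist")
picasso_classes = classes_of(triples, "ex:picasso")
print(artists, picasso_classes)
```

A user who asks for artists under their preferred schema can then call `classes_of` on a result to discover how the same resource is classified elsewhere.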

Querying the Semantic Web


We introduce a formal data model and type system

for description bases created according to the RDF Model & Syntax
and Schema specifications

In order to support superimposed RDF descriptions,

the main modeling challenge is

to represent properties as self-existent individuals,

as well as to introduce a graph instantiation mechanism permitting
multiple classification of resources.

Querying the Semantic Web


The type system used for RQL is extremely useful

in that it is actually read from the RDF schema

the type system is
specific to the schema being used.

However all types fit into a finite list of types,

which contains literal types, resource types, class types, property
types and others.

The discussion on typing as it relates to RDF

would be useful in considering various other approaches to typing
for other means of modeling (ER or class diagrams).

In ER modeling this could be achieved

through choosing property names/attributes for a relationship and
including them in the diagram (and not just “is

Entity Search Engine


The Web has become a rich collection of data-rich pages,

on the “surface Web” of static URLs

as well as the “deep Web” of database-backed contents

The richness of data,

while a promising opportunity,

has challenged us to effectively find data we need,

from one or multiple sources.

We are motivated by the need of

large scale on-the-fly integration for online structured data.


Entity Search Engine: Towards Agile Best Effort Information Integration over the Web by Tao
Cheng and Kevin Chen-Chuan Chang

Entity Search Engine


How do we identify and integrate the structured data

embedded in unstructured result pages?

Entity Search Engine


Search engines, such as Google, Yahoo, or MSN, search for pages by keywords.

While being “IR-style” with a scalable text processing framework,

they are not data aware.

Integration services exist online for specific domains.

such as or

They provide “DB-style” precise querying,

but they can hardly scale to the amount of data and the number of
sources on the Web.

We propose a solution

where the two extremes meet,

with a synergistic “marriage” in the middle.

Entity Search Engine


There are still problems with sites that embed their data in
inaccessible formats that cannot be queried