Knowledge Discovery in Databases and Information Retrieval



Knowledge Management Systems

Anne Marie Donovan

April 22, 2003

Knowledge Management Systems, LIS 385T

The University of Texas at Austin

School of Information

KDD and IR


The processes of Knowledge Discovery in Databases (KDD) and Information
Retrieval (IR) appear deceptively simple when viewed from the perspective of
terminological definition. Fayyad, Piatetsky-Shapiro, and Smyth (1996) define KDD as
"the nontrivial process of identifying valid, novel, potentially useful, and ultimately
understandable patterns in data" (p. 30). The closely related process of IR is defined by
Rocha (2001) as "the methods and processes for searching relevant information out of
information systems that contain extremely large numbers of documents" (1.1). In
execution, however, these processes are not simple at all, especially when executed to
satisfy specific personal or organizational Knowledge Management (KM) requirements
or as the core functionality of Knowledge Management Systems (KMS).

The potential validity or usefulness of an individual data element or pattern of
data elements may change dramatically from individual to individual, organization to
organization, or task to task. Relevance is a highly contextual and personal data
characteristic, changing even as the IR process is underway and information requirements
are incrementally met. Making retrieved data or a description of data patterns generally
understandable is also highly problematic.

Data that may appear relevant and easily
understandable in one retrieval context may be completely unintelligible in another, even
to the same audience. KDD and IR are, in fact, highly complex processes that are
strongly affected by a wide range of factors. These factors include the needs and
information seeking characteristics of system users as well as the tools and methods used
to search and retrieve, the structure and size of the data set or database, and the nature of
the data itself.

KDD and IR: An Historical Perspective


Information professionals often describe the KDD and IR processes in the context
of specific types of Database Management Systems (DBMS). Devarakonda (2001)
divides DBMS into four types: simple data without query, simple data with query,
complex data without query, and complex data with query. An example of the first type,
simple data without query, is a filing system, including files that may exist only in paper
form. The second, third and fourth types are exemplified by Relational DBMS
(RDBMS), Object-Oriented DBMS (OODBMS), and Object-Relational DBMS
(ORDBMS), respectively (Devarakonda, 2001, ORDBMS). The type of database that is
queried significantly affects the processes of knowledge discovery (KD) and IR.


Because an RDBMS of some type forms the core of almost all KMS,
improvement of RDBMS functionality for KD and IR has been a crucial part of KMS
refinement for the past three decades. The relatively recent introduction of OODBMS to
KMS has created many new KD and IR problem sets for researchers. These challenges
have been met, thus far, primarily through the introduction of certain features of RDBMS
to OODBMS. The result has been the development of a small group of ORDBMS that
combine the best KD and IR features of both.


Information professionals familiar with traditional filing systems are acutely
aware of the limitations imposed on KD and IR by their pre-set filing structure. Although
technically a database, this type of DBMS does not lend itself to automated searching, but
only to browsing or search by pre-designated subject categories and file descriptions
(e.g., library card catalogs). The difficulties presented for KD and IR by simple filing
structures were initially replicated in computer-supported file structures and were only
alleviated with the introduction of the Relational Database Model (RDM) by E. F. Codd
in 1970 (Devarakonda, 2001, RDBMS).

Introduction of the RDM resulted in rapid adoption of RDBMS for information
on and control across a broad range of commercial and social organizations as
well as the development of increasingly effective data collection and storage
technologies. RDBMS permitted much more flexibility in data organization and retrieval
than traditional data filing systems, but traditional IR methods did not permit flexibility in
the characterization of user needs or the delineation of search parameters (Rocha, 2001,
1.2). The result, of course, was increasing numbers of organizations that possessed very
large and continually growing databases but only rudimentary tools for KD and IR. Two
areas of research focus in information management developed in response to this
problem: data warehousing and data mining.

Data warehousing, defined by Fayyad et al. as "collecting and 'cleaning'
transactional data to make it available for online analysis and decision support" (1996, p.
30), focuses on the methodical collection and pre-processing of data for specific
analytical uses. The data is subject-oriented, time-stamped, and integrated to permit
interactive analysis in support of decision-making processes. A data warehouse normally
integrates data from a variety of sources, "thus enriching the data and broadening the
context and value of the information" (Rauber et al., 2002, Data Warehousing and ...).

Data mining, defined as "the application of specific algorithms to a data set for the
purpose of extracting data patterns" (p. 28), focuses on improving the utility of large data
sets as well as IR response. Data mining, in particular the algorithms used in data
mining, has received a lion's share of attention in the development of Decision Support
Systems (DSS) and RDBMS research because results are often immediately applicable in
high-payoff decision-making industries such as insurance, sales, and financial and
medical services.

Inspirations and Intentions for the Technology

Rocha describes the ultimate goal of IR as the production or recommendation of
relevant information to users (2001, 1.2). We can ascribe the same motivation to the
development of KDD systems and methods in general, particularly in regard to the
refinement of DBMS. Research in data collection, storage, and retrieval has focused on
issues specifically related to the improvement of KD and IR functionality. Among the
topics given special attention have been data translation, change detection, integration,
duplication, summarization, aggregation, and timeliness (Widom, 1995).

Research has also focused on the need to improve automation in KD and IR,
especially in the areas of data selection and pre-processing, data transformation, and data
interpretation and evaluation (Fayyad et al., 1996, p. 28). However, increased automation
in KD and IR requires increased attention to the methods used for data collection and
storage as well as the statistical foundations of the search and retrieval processes (p. 29).
Despite this complication, it is clear that manual analysis of billions of records
and hundreds of fields is impractical and that automated data handling will be even more
in demand as requirements for on-the-fly analysis and more flexible presentation of
search results increase (p. 28).

KDD and IR: Application to KMS

Technological Systems and Processes

Interface, interaction, and ubiquity.
The relationship of KDD and IR to KMS is
intimate: all KMS rely in some form on the aggregation of data for search and retrieval.
Historically, improvements in the utility of KMS have depended in large part on
improvements in KDD and IR functionality. Fayyad et al. describe KDD as "the overall
process of knowledge discovery from data, including how the data is stored and accessed,
how algorithms can be scaled to massive data sets and still run efficiently, how results
can be interpreted and visualized, and how the overall human-machine interaction can be
modeled and supported" (Fayyad et al., 1996, p. 29). This comprehensive list of KDD
processes, which encompasses IR, also serves to describe the core functionality of most
KMS (pp. 30-31). Research issues that have arisen in the development of DBMS and the
study of KDD are also closely related to the development and deployment of KMS.
Among these are: data collection and pre-processing; continually increasing volumes of
data; increasingly complex forms of data; identifying and extracting useful knowledge
from extremely large repositories; means for identifying knowledge of value about as
well as in the data set; extracting knowledge from data and presenting that knowledge in
usable forms (pp. 3

The development of highly specialized DBMS for data warehousing and the
continual refinement of data mining methods and technologies have been motivated in
large part by the deployment of KMS throughout industry. Many KMS are simply
elaborated RDBMS integrated with IR and communication systems. More sophisticated
KMS may also add collaborative work tools. Decisions related to data mining, including
model functions, model representation, and preference criteria, are an elemental part of
KMS development and deployment (pp. 31-32). Data mining tasks (classification,
forecasting, clustering, description, deviation detection, link analysis, and visualization;
Piatetsky-Shapiro, 1998, Slide 17) and search algorithms are fundamentally affected by
the focus and purpose of an organization's KMS.
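
To make one of the data mining tasks named above concrete, the following is a minimal sketch of deviation detection, assuming a simple z-score rule. The function name `find_deviations`, the threshold of 2.0, and the sample figures are hypothetical and chosen only for illustration; they are not drawn from Piatetsky-Shapiro or any other cited source, and production systems typically use more robust methods.

```python
from statistics import mean, pstdev


def find_deviations(values, threshold=2.0):
    """Return (index, value) pairs whose z-score magnitude exceeds threshold.

    Hypothetical helper: a z-score rule is only one of many possible
    deviation-detection techniques.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing deviates.
        return []
    return [
        (i, v) for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold
    ]


# Invented sample data: daily claim counts with one unusual day.
daily_claims = [102, 98, 101, 97, 103, 99, 100, 250]
outliers = find_deviations(daily_claims)
print(outliers)
```

The choice of threshold is exactly the kind of decision that depends on the focus and purpose of the organization's KMS: a fraud-detection application might accept many false alarms, while a reporting application might not.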

System architecture.
The characteristics of the underlying DBMS
determine the architecture of KD and IR systems. RDBMS are composed of many
relations in the form of two-dimensional tables of rows and columns containing related
data. The rows (tuples) are called records, and the columns (fields in the record) are
called attributes. Each column is accorded a specific data type. The type of data stored in
an RDBMS has traditionally been constrained to ensure that there are no ambiguous
tuples in the database (Devarakonda, 2001, RDBMS), although in the case of very
complex data types, for example scientific data, programmers have overcome the
constraints of the DBMS by employing Binary Large Objects (BLOBs) to store data in a
database. This "solution" creates its own set of problems, however. BLOBs are usually
much larger than a single block of storage in a database, a characteristic that undermines
the efficiency of the database. As well, because of their size, and because BLOBs in a
single database may contain a variety of data types and compound data, the data content
of the BLOB is not visible to the database. The opacity of data content means that a user
cannot perform a high-level search across the BLOBs in a database (Wallace, Benschop,
and Köhntopp, 1999).

RDBMS use Structured Query Language (SQL) for data definition, modification,
querying and constraint specification. Queries can range from simple single-table queries
to complicated multi-table queries. A commonly used RDBMS is Microsoft Access, but
the existence of a standard query language allows data to be migrated easily from one
RDBMS to another (Devarakonda, 2001, RDBMS). Although the structure of RDBMS
renders them incapable of handling complex data types such as spatial data, images, or
number arrays without the use of BLOBs, it does permit rapid data access and large
storage capacities.
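
The difference between a single-table and a multi-table query can be illustrated with a short sketch. It uses Python's standard `sqlite3` module as a stand-in for an RDBMS; the `author` and `document` tables and their rows are hypothetical and invented purely for this example.

```python
import sqlite3

# Hypothetical two-table schema, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (
        author_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE document (
        doc_id    INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        year      INTEGER NOT NULL,
        author_id INTEGER NOT NULL REFERENCES author(author_id)
    );
    INSERT INTO author VALUES (1, 'Author A'), (2, 'Author B');
    INSERT INTO document VALUES
        (10, 'Report A', 1970, 1),
        (11, 'Report B', 1995, 2);
""")

# Simple single-table query: one relation, one attribute constraint.
single = conn.execute(
    "SELECT title FROM document WHERE year < 1980"
).fetchall()

# Multi-table query: join the two relations on their shared key.
multi = conn.execute(
    """
    SELECT a.name, d.title
    FROM document AS d
    JOIN author AS a ON a.author_id = d.author_id
    WHERE d.year >= 1990
    ORDER BY d.year
    """
).fetchall()

print(single)
print(multi)
conn.close()
```

Each column carries a declared data type, and the join works only because both tables expose the `author_id` attribute as ordinary, visible column data. Content stored inside a BLOB offers the query engine no such handle.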

The data management limitations of RDBMS led to the development of
OODBMS. In OODBMS, internal data structure is hidden so that external operations can
be performed on the data as an Abstract Data Type (ADT). RDBMS and OODBMS are
fundamentally different in the way they handle data relationships; OODBMS represent
relationships explicitly, which improves data access performance. Nonetheless,
OODBMS are plagued by poor query performance and problems of database scalability
(Devarakonda, 2001, OODBMS).

ORDBMS, a relatively recent innovation, are designed to incorporate the best
features of both RDBMS and OODBMS. Data is stored in tables, but some entries may
have richer data structure; as in OODBMS, these entries are called ADTs. Because the
data is stored in rows and columns, the ORDBMS maintains a relational data model,
although it must be heavily modified to support object-oriented programming. In
essence, the object-relational model adds a new object-oriented layer to support rich data
types on top of the relational database model. ORDBMS support query and handle data
objects; they can also be built on a massive scale. These features make ORDBMS
particularly useful for the development of KMS for handling complex data types.

System configuration and deployment. A primary concern for many organizations
during the configuration and deployment of KD and IR systems has been the creation of
data and query context. Some efforts to create context have been retrospective. Lee and
Hwang (2002) describe the process of extracting and visualizing semantic metadata from
databases. This process, called relational database reverse engineering (RDRE), "extracts
a conceptual model from an existing relational database by analyzing data instances as
well as metadata" (Lee and Hwang, 2002, Conclusion). RDRE has been especially useful
in creating shared "conceptual schema" for multiple databases (Introduction). A
conceptual schema describes the database in terms of data items and relationships
between data items in a form "suitable for human presentation" (Introduction), thereby
enhancing KD and IR. The ability to discover and describe data relationships within and
between databases allows organizations to profile and map information in their data
warehouses in ways that were previously unimaginable. Mapping and profiling of data
not only creates discovery and retrieval context to enhance data reuse, it can also reveal
entirely new use possibilities. A well-defined database reengineering project enables an
organization to integrate the masses of transactional data that lie in its data warehouse
with information collected from other enterprise systems or from outside the company.
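
A small part of this idea can be sketched in code. The example below reads only the declared metadata (columns and foreign keys) of a SQLite database through its `PRAGMA table_info` and `PRAGMA foreign_key_list` statements; it does not analyze data instances, so it is far simpler than the RDRE process Lee and Hwang describe and should not be read as their method. The `describe_schema` helper and the `customer`/`orders` tables are hypothetical.

```python
import sqlite3


def describe_schema(conn):
    """Collect columns and declared foreign keys for each user table.

    Hypothetical helper: it inspects catalog metadata only and does not
    examine the stored rows.
    """
    schema = {}
    tables = [
        row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    for table in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [
            (row[1], row[2])
            for row in conn.execute(f'PRAGMA table_info("{table}")')
        ]
        # foreign_key_list rows: (id, seq, table, from, to, ...)
        references = [
            (row[3], row[2], row[4])
            for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')
        ]
        schema[table] = {"columns": columns, "references": references}
    return schema


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        total       REAL
    );
""")
schema = describe_schema(conn)
for table, info in schema.items():
    print(table, info)
conn.close()
```

The resulting dictionary is a crude map of data items and the relationships between them, which is the raw material a conceptual schema would present in a more human-readable form.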

Another common method for creating data and query context for enterprise data
warehouses is the establishment of mechanisms for creating context during data creation
and collection or during query construction. Many personal KMS provide robust
mechanisms for data contextualization through the addition of metadata or by data
structuring. KMS such as PeopleGarden (Xiong and Donath, 1999) extract social
context for data during the processes of data collection and data exchange. Extending IR
throughout the social network of an organization, as is done by Answer Garden
(Ackerman, 1998; Ackerman and Malone, 1990; Ackerman and McDonald, 1996), is
another method for providing query context for KD and IR.

Technology transition in organizations.
Institutions that have pioneered the use of
KDD and IR, especially in the form of data mining, have traditionally been those that rely
heavily on knowledge-based decisions for their success. Because their operations have
historically relied heavily on data collection, these organizations normally have a large
quantity of accessible, relevant, historical and current data. They also anticipate a high
payoff for making rapid, correct decisions based on their collected data and they actively
seek a technological advantage in knowledge management. Financial institutions such as
banking and investment firms, healthcare and insurance organizations, and businesses
that rely heavily on marketing and customer relations are emblematic of sectors that have
aggressively pursued technological innovations in KD and IR (Piatetsky-Shapiro, 1998,
Slides 28

The development of Decision Support Systems (DSS) based on electronic data
processing (EDP) was an early application of database technology to KM in large
enterprises. In many cases, however, technological strides in data collection (hardware
and software) rapidly outpaced the enterprises' ability to understand and manage the data
that was being collected and stored. Information was often plentiful without being
relevant, and extensive data warehouses often proved inadequate for applied decision
making (Bass, 1983, p. 189).

Another difficulty faced by organizations that relied on large databases for
decision support was the danger that decisions would be made based on data that was
poorly contextualized or poorly understood. Managers faced with a complex decision
process might misinterpret the applicability of a data set to the problem or fail to
investigate the existence of contradictory data (Calvert, 1993, p. 91). The less contextual
the data, the more easily it may be misinterpreted or misapplied.

Organizational Systems and Processes

The introduction of automated KDD and IR changed the fundamental nature of
knowledge work, organizational architectures, management practices, and
communication flows in organizations. The introduction of Web-served data collection,
query and delivery has also significantly affected these systems. In particular, the
expansive application of KDD and IR technologies and techniques to information
management for distributed or "flattened" organizations has resulted in KM becoming a
ubiquitous "industrial" product in many business sectors.

Two aspects of knowledge work profoundly affected by the pervasive use of
KDD and IR technologies have been knowledge creation and communication in the
context of collaboration. The enhancement of collaborative possibilities in knowledge
work created by distributed KDD and IR has had significant social effects in
organizations and among individuals. The problem of creating shared context for data
collection, retrieval, and delivery in distributed DBMS has already been mentioned.
Equally difficult are the incitement of collaboration and the creation of networks of trust
among the dispersed users of distributed DBMS.

The creation of massive, increasingly powerful DBMS and more effective KDD
and IR technologies and techniques has also raised many complex social issues outside
business processes. One significant social concern is the increasingly pervasive
collection of detailed individual data that is enabled by sophisticated DBMS. Many
individuals enjoy the convenience offered by the maintenance of personal information in
commercial databases, but are unaware of the privacy implications inherent in the
services these databases enable. Many individuals are faced with a daily choice:
convenience and service or security and privacy?

KDD and IR: Looking to the Future

KDD and IR research problems

The demands of commercial KM markets drive the lifecycle of KD and IR
systems. The creation of highly dimensional, massive data sets and the increasing
sophistication of users and complexity of database uses have directed KDD research in
specific directions. High priority research topics include: problems of statistical
significance and missing data; the understandability of data patterns; the management of
changing data and data integration; and the manipulation of non-standard, multimedia,
and object-oriented data (Fayyad, Piatetsky-Shapiro, & Smyth, 1996, pp. 33

Research and development in IR is equally market driven. In 1995, Croft
published a "top ten" list of IR research issues based on his experiences in the area of
industrial and government research priorities as a member of the National Science
Foundation (NSF) Center for Intelligent Information Retrieval (CIIR) (¶ 3). These
research priorities, derived from surveys of companies that use and sell IR systems, still
resonate today:

1. Integrated solutions (standardized architectures and common platforms; the
integration of database management and IR systems with multimedia capabilities)

2. Distributed IR (retrieval systems that can work in distributed, wide-area
network environments)

3. Efficient, flexible indexing and retrieval (including ability to handle a wide
variety of data formats)

4. Automatic query expansion (to overcome vocabulary mismatch between users
and databases)

5. Interfaces and browsing (interfaces that support a range of functions including
query formulation, presentation of retrieved information, feedback, and browsing
in a conceptually simple way)

6. Routing and filtering (many companies considered data routing to be the main
function required for a text-based DBMS, with IR being a secondary function)

7. Effective retrieval (companies are particularly interested in techniques that
produce significant improvements in precision but still avoid occasional major
retrieval mistakes)

8. Multimedia retrieval (techniques for accessing image, video and sound
databases without text descriptions)

9. Information extraction (techniques to identify database entities, attributes and
relationships in full text)

10. Relevance feedback (improved algorithms and models for automatic relevance
feedback) (Croft, 1995)
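
Item 4 in the list above, automatic query expansion, lends itself to a brief illustration. The sketch below assumes the simplest possible approach, a hand-built synonym table; the functions `expand_query` and `search`, the thesaurus, and the three sample documents are all hypothetical, and Croft's list does not prescribe this or any other particular technique.

```python
def expand_query(terms, thesaurus):
    """Add thesaurus synonyms to the query terms, preserving order."""
    expanded = []
    for term in terms:
        for candidate in [term, *thesaurus.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def search(documents, terms):
    """Return sorted ids of documents containing at least one term."""
    wanted = set(terms)
    return sorted(
        doc_id for doc_id, text in documents.items()
        if wanted & set(text.lower().split())
    )


# Invented thesaurus and documents, for illustration only.
THESAURUS = {"car": ["automobile", "vehicle"]}
DOCS = {
    1: "used automobile prices",
    2: "car insurance rates",
    3: "bicycle repair",
}

plain_hits = search(DOCS, ["car"])
expanded_hits = search(DOCS, expand_query(["car"], THESAURUS))
print(plain_hits, expanded_hits)
```

The unexpanded query misses document 1 because the user said "car" where the database says "automobile", which is the vocabulary mismatch the list item refers to. Expansion raises recall at some risk to precision, the trade-off highlighted in item 7.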

New developments

KD and IR problems for Web resources.

The rapid growth of the Web and
increasing reliance on the Web for the collection and delivery of data for KM have created
new problems in KD and IR as well as bringing some older problems to the fore. Among
the problems are: standardization of data collection and pre-processing; huge volumes
of continually changing data; complex, streaming, and multi-media data; identifying and
extracting useful knowledge from Web resources; a lack of consistent data models and
context; a lack of available descriptive information; the problem of presenting knowledge
in usable forms; and the rapid development of more time
sensitive, multi
applications for Web resources.

Many of these problems reflect the inadequacy of current methods for Web
resource KD and IR. Data collection is presently performed primarily by automated Web
crawlers. Pre-processing consists of link-based ranking or human indexing and
categorization. The identification and extraction of useful knowledge from Web
resources is dependent on highly inefficient keyword searches on natural language text or
on imprecise topical directories or topical Web sites. Retrieved knowledge can be
viewed only in its native format (with a plugin) or sometimes only as derived HTML.
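
As an illustration of link-based ranking, the following sketch implements a PageRank-style power iteration, which is one well-known instance of the idea. The paper does not say which ranking scheme it has in mind, so the algorithm choice, the function name `link_rank`, the damping factor of 0.85, and the four-page link graph are all assumptions made for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by incoming links using PageRank-style power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(
        set(links) | {t for targets in links.values() for t in targets}
    )
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Invented link graph: three pages point at "c", and "c" points at "a".
WEB = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = link_rank(WEB)
best = max(ranks, key=ranks.get)
print(best, ranks)
```

Note that the ranking depends only on link structure, not on what the pages say, which is one reason such pre-processing still leaves users dependent on keyword search to find relevant content.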

A variety of research and development projects are underway to enable more
efficient, automated KD and IR for Web resources. Among the best known efforts are
those that seek to apply semantic markup to Web resources to enable machine
understanding and processing and inference analysis. Related projects seek to develop
intelligent search engines and agents to exploit the semantic statements created by this
markup, while still others are creating ontologies to provide context for these search
engines and agents (Shah et al., 2002).

Other researchers are examining improved methods for automated data and
context collection (data pre-processing), the provision of value-added services such as
query routing, the development of integrated query and knowledge delivery systems, and
the establishment of social accounting metrics to provide context for humans (Smith,
2002, p. 52). Another major area of research focuses on leveraging historical information
about individual and group Web browsing experience and patterns to enable more
efficient KD and IR (Chakrabarti et al., 1998, Abstract). Rauber et al. (2002) provide an
evocative description of the potential for enhanced KD and IR that is as yet untapped:
"With [such] a repository of Web data, as well as the metadata associated with the
documents and domains, we have a powerful source of information that goes beyond the
content of Web pages ... in order for the most useful analyses to yield answers to project
questions and issues, a different perspective of the Web and Web archives is needed, a
perspective focusing not solely on content, but on the wealth of information
automatically associated with each object on the Web" (Introduction). Capturing an
understanding of how other individuals have discovered, retrieved, and used Web content
provides invaluable context for users who are accessing that content for the first time.

Integration with Other Technologies

Enhanced presentation for the Web.

The need for better integration of KDD and
IR systems with delivery and presentation technologies has already been mentioned and it
is a need that cannot be overstated. This is particularly true in the case of information
presentation on the Web. Considerable research is underway in the area of reformatting
data for discovery and presentation through Web-enabled devices. Another area of
research focus is differentiated service for different devices that would enable variable
visualization of retrieved information depending on a user's needs and device
characteristics. Researchers in the field of adaptive graphics, "a unifying framework that
allows visual representations of information to be customized and mixed together into
new ones," have proposed content pre-viewing, interactive content, selective presentation,
and customized views of Web-served content (Boier-Martin, 2003, pp. 6-9) as areas ripe
for progressive research. Many of these researchers refer to the work of Turner Whitted,
who in 1998 suggested the use of computer displays as "wallpaper" for interactive
information exchange to enable pervasive collaboration and information retrieval (1999,
p. 6).

KDD and IR for pervasive computing.

Achieving what Cherniack, Franklin, and
Zdonik term "ubiquitous data access" (2001, slide 7) presents several unique challenges
in system integration. Many of these challenges reflect data management problems.
Among these are: the resolution of context-dependent data (e.g., push/data pull delivery
issues); synchronization of data from multiple, distributed sensors and collectors; the
efficient renewal of data streams; effecting profile-driven data management; dealing with
location aware, mobile devices; and the enabling of service mobility and service
discovery (slides 8

The next generation

Research trends and priorities suggest a number of substantial advances in next-
generation KDD and IR systems. We can expect them to enable the solving of business
problems, not data analysis problems. They will embed knowledge discovery engines
and integrate access to enterprise and external data on the back end. Most
importantly, they will integrate the knowledge discovery process with knowledge
delivery tools (Piatetsky-Shapiro, 1998, Slide 7). We can also expect next-generation
KDD and IR systems to manage information retrieval contextually, allow contextual
query/continuous query, enable KD in virtual networks of peer-to-peer databases, and
interpolate or extrapolate for missing data (Cherniack et al., 2001, slides 115

To enable mobile and pervasive computing applications, future KDD and IR
systems will also have to be able to characterize information resources, recognize
individual users, provide variable means to exchange knowledge between users and
information sources (push and pull of information), adapt to the user community, and
enable the reuse and recombination of information as well as its exchange (Rocha, 2001,
1.2). The most fundamental and difficult of these challenges will be information

Conclusion: On the Bleeding Edge

One might reasonably ask if the KDD and IR systems described above fall in the
realm of science or science fiction. The answer is, assuredly, in the realm of science,
although science fiction has often been influential in application development. This
answer is supported by a brief examination of the KDD and IR research being funded by
the Defense Advanced Research Projects Agency (DARPA) (the folks who brought us
the Internet) under the auspices of the federal Total Information Awareness
Program. This research covers substantially new database technologies, architectures,
population techniques, search algorithms, and data models.

One funded project, Genisys, has the goal of producing technology to enable
large, all-source information repositories (DARPA, 2003b, Program Strategy).
Unlike RDBMS in use today, Genisys-developed DBMS will require no prior data
modeling; support automated restructuring and projection of data; store data in context of
time and space; and develop a large, distributed system architecture for managing a huge
volume of raw data input, analysis results, and feedback (DARPA, 2003a, TIA System:
Program Strategy). Programs such as Genisys are building aggressively on a foundation
of 30 years of research in KDD and IR technology and techniques. Although these
initiatives raise new social as well as technical problems, they also suggest the possibility
of substantially new applications for these technologies.

The difficulties of contextualizing and interpreting data for KM have increased
fold in the past decade. New technologies for data collection and storage have led
to ever-larger data warehouses containing hugely complex data types, a development
that has greatly complicated data discovery, retrieval, visualization, and sharing within
organizations. A growing need to incorporate increasingly disparate data sources from
outside the organization has transformed enterprise KM from a cluster of internal
management problems into a problem set that also encompasses an organization's
relationships with clients and competitors, as well as its ability to participate in lucrative
cooperative ventures. Enterprises now seek to use information technology to support not
just individual problem solving, but entire decision making processes.

KD and IR have become tools that not only enhance human decision making but
that also compensate for inherent weaknesses in human decision making processes. The
result has been the development of powerful new EDP applications in knowledge
discovery, KM, and enterprise decision making, especially in the areas of collaborative
ventures, market forecasting, the management of customer relations, and fraud or crime
detection. If these technologies are to progress even further, however, researchers must
deal with the essential task of describing (characterizing) our growing wealth of
information resources (online and offline). Only when we are able to visualize
meaningfully the vast extent of our available information resources will we be able to
develop new approaches to KD and IR. The fundamental problems in KM today relate to
our inability to find and understand the information we already possess, not to an
inability to collect and manipulate new data. It is in the development of better KD and IR
tools that the future of KM and KMS lies.

References


Ackerman, M. S. (1998, July). Augmenting the organizational memory: A field study of
Answer Garden. ACM Transactions on Information Systems (3), 203. Retrieved
March 28, 2003 from

Ackerman, M. S., & Malone, T. W. (1990, April). Answer Garden: A tool for growing
organizational memory. ACM SIGOIS Bulletin, 11, 31-39. Retrieved March
28, 2003 from

Ackerman, M. S., & McDonald, D. W. (1996). Answer Garden 2: Merging organizational
memory with collaborative help. Proceedings of the ACM Conference on
Computer-Supported Cooperative Work 1996 (CSCW96 Boston, MA). Retrieved
March 28, 2003 from

Bass, B. M. (1983). Organizational decision making. In L. L. Cummins, E. Kirby
Warren, & J. F. Mee (Eds.), The Irwin series in management and the behavioral
sciences. Homewood, IL: Richard D. Irwin.

Boier-Martin, I. M. (2003, January/February). Adaptive graphics. In T. Rhyne (Ed.),
Visualization Viewpoints, IEEE Computer Graphics and Applications, 23.
Retrieved April 5, 2003 from

Calvert, G. (1993). Highwire management: Risk-taking tactics for leaders, innovators,
and trailblazers. San Francisco, CA: Jossey-Bass Publishers.

Chakrabarti, S., Srivastava, S., Subramanyam, M., & Tiware, M. (1998). Using Memex
to archive and mine community Web browsing experience. A paper presented at
the 9th International World Wide Web Conference, Amsterdam, May 15
2000. Retrieved April 12, 2003 from

Croft, W. B. (1995, November). What do people want from information retrieval?: The
top 10 research issues for companies that use and sell IR systems.
Retrieved April 5, 2003 from

DARPA. (2003a). Retrieved from the DARPA Information Awareness Office
Web site at:

DARPA. (2003b). Total Information Awareness System. Retrieved from the DARPA
Information Awareness Office Web site at:

Devarakonda, R. (2001, March). Object-relational database systems: The road ahead.
ACM Crossroads Student Magazine. Retrieved April 12, 2003 from

Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996, November). The KDD process for
extracting useful knowledge from volumes of data. Communications of the ACM,
(11), 27-34. Retrieved March 03, 2003 from

Lee, D., & Hwang, Y. (2002, March 1). Extracting semantic metadata and its
visualization. ACM Crossroads Student Magazine. Retrieved March 27, 2003

Piatetsky-Shapiro, G. (1998, December 4). Data mining and knowledge discovery tools:
The next generation. Retrieved February 27, 2003 from

Rauber, A., Aschenbrenner, A., Witvoet, O., Bruckner, R. M., & Kaiser, M. (2002,
December). Uncovering information hidden in Web archives: A glimpse at Web
analysis building on data warehouses. D-Lib Magazine, 8. Retrieved March
28, 2003 from

Rocha, L. M. (2001). TalkMine: A soft computing approach to adaptive knowledge
recommendation [Electronic version]. In V. Loia & S. Sessa (Eds.), Studies in
fuzziness and soft computing: Vol. 75. Soft computing agents: New trends for
designing autonomous systems (pp. 89-116). New York: Springer. Retrieved
March 28, 2003 from

Shah, U., Finin, T., Joshi, A., Cost, R. S., & Mayfield, J. (2002, November). Information
retrieval on the Semantic Web. Paper presented at The ACM Conference on
Information and Knowledge Management, November 2002. Retrieved March
28, 2003 from

Smith, M. (2002). Tools for navigating large social cyberspaces. Communications of the
ACM, 45(4), 51-55. Retrieved March 28, 2003 from

Wallace, N., Benschop, O., & Köhntopp, K. (1999). What is a BLOB?
Retrieved May 1, 2003 from

Whitted, T. (1999, July/August). Draw on the Wall. IEEE Computer Graphics and
Applications, 19(4), 6-9. Retrieved April 8, 2003 from ieeeexp at:

Widom, J. (1995, November). Research problems in data warehousing. Proceedings of
the 4th International Conference on Information and Knowledge Management.
Retrieved March 28, 2003 from

Xiong, R., & Donath, J. (1999). PeopleGarden: Creating data portraits for users.
Letters, 1, 37-44. Retrieved April 8, 2003 from