How to Retrieve the Web Intelligence


Juan Chamero, CEO, Intag, July 3rd, 2003





[Figure] Darwin-FIRST starts with the red dot, retrieving a primal Knowledge Map from the Web. Once a Thesaurus has been retrieved from the primal Knowledge Map, an exponential expansion process begins, retrieving as many "Authorities" as possible via similarities, as explained below. See Advanced Search for the meaning of the colors.





The Web is a huge reservoir of "documents". Websites are like units of communication between the "owners" of a site and its "users", users being those Web navigators enabled to "see" all or part of the hosted content. Web technology makes the magic of "Full Duplex" (FD) communication possible. However, in the great majority of cases this FD communication has not been enabled yet.


Darwin-FIRST enables and enhances FD communication: broadcasting messages to users, informing and even "teaching" them, while also "learning" from users and even "nurturing" itself from the whole Web. We are going to talk a little about this potential nurturing process.


To accomplish the Information Retrieval task we assign "procurement agents". With the Web space so huge and apparently so chaotic, it is not an easy matter to define this complex task. Our primal human intelligence of this apparent chaos tells us that documents could be reasonably classified by the "Major Subjects" (disciplines) of the realms of Human Internet resources: information, knowledge, and entertainment. But even split into Major Subjects, the Web reservoirs are extremely huge, noisy, and fuzzy. So the problem of retrieving information efficiently still remains.


The Darwin-FIRST methodology uses consensual Major Subjects' Logical Trees to identify reliable content within the chaos. Why? Because consensual Logical Trees form a coupled feedback unit with the discipline they intend to represent. That continuous coupling assures, among many other influences, a large Web popularity. And Web popularity means "authorities": presence at the top of the main Search Engines' results.
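
The article does not give a concrete representation of these Logical Trees. A minimal sketch in Python, with invented subject names, could look like this:

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class TreeNode:
        """One branch or leaf of a Major Subject's consensual Logical Tree."""
        subject: str
        children: List["TreeNode"] = field(default_factory=list)

        def leaves(self):
            """Yield every leaf below this node (a leaf has no children)."""
            if not self.children:
                yield self
            else:
                for child in self.children:
                    yield from child.leaves()

    # A toy consensual tree for one Major Subject (the names are invented).
    medicine = TreeNode("Medicine", [
        TreeNode("Cardiology", [TreeNode("Arrhythmia"), TreeNode("Heart failure")]),
        TreeNode("Neurology", [TreeNode("Epilepsy")]),
    ])

    print([leaf.subject for leaf in medicine.leaves()])
    # ['Arrhythmia', 'Heart failure', 'Epilepsy']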


The strategy for guiding procurebots (procurement agents) to the right target through the chaos then simplifies significantly. A first step is oriented to map any collection of cognitive objects hosted on the Web: the procurebots gather authority candidates to be mapped, guided by Logical Trees adjusted by a set of parameters in order to emulate human search skills. In this way, through Darwin-FIRST we could map any Major Subject at a given level of "redundancy", where redundancy stands for the average number of cognitive images retrieved from the Web per branch/leaf of the Logical Tree.
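
The internals of a procurebot are not detailed in the article. One plausible reading, reusing the TreeNode sketch above and assuming a hypothetical search(query, n) wrapper around a search engine, is:

    def procure(tree, search, redundancy=10):
        """Walk a Logical Tree and gather authority candidates per leaf.

        `search(query, n)` is an assumed placeholder that returns the
        top-n result URLs for a query; `redundancy` is the target number
        of cognitive images per branch/leaf.
        """
        knowledge_map = {}
        for leaf in tree.leaves():
            # The leaf subject serves as the query, emulating how a
            # human expert would search for that topic.
            knowledge_map[leaf.subject] = search(leaf.subject, n=redundancy)
        return knowledge_map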


Once this first step is done, we may capture the greater part of the intelligence retrieved. How? By building a Thesaurus from this "reasonably good" sample of knowledge. Above a minimum redundancy, and for a given level of retrieval quality, the worth of the Thesaurus will remain almost constant. Why? Because similar documents share almost the same keyword set. Keywords are like the intellectual bricks of documents: knowing the keywords of a document allows us to infer pretty much about its content, since they act as its "spectral marks". Once known, the Thesaurus permits us to keep extending our intellectual reach over the Web!
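
The article leaves the Thesaurus-building step abstract. A minimal sketch, assuming a hypothetical extract_keywords(url) helper that returns a document's keyword set, might distill the recurring terms per branch like this:

    from collections import Counter

    def build_thesaurus(knowledge_map, extract_keywords, min_docs=2):
        """Distill a per-subject Thesaurus from the primal Knowledge Map.

        `extract_keywords(url)` is an assumed helper returning a document's
        keyword set (its "spectral marks"). A term enters the Thesaurus only
        if it recurs in at least `min_docs` documents of the same branch,
        exploiting the redundancy of the sample.
        """
        thesaurus = {}
        for subject, urls in knowledge_map.items():
            counts = Counter()
            for url in urls:
                counts.update(set(extract_keywords(url)))
            thesaurus[subject] = {kw for kw, n in counts.items() if n >= min_docs}
        return thesaurus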


Now we could train our agents to perform substantially smarter tasks than before. We may easily get documents similar to the ones already mapped in our primal map, because they share the same intimate structural intelligence. It is like implementing a virtual Expert System within the Web, using the Search Engines as huge reservoirs from which to retrieve as much intelligence as possible! Current Search Engines cannot do that. Google, for instance, delivers similar results, but only limited to the keyword queried. Even in the hypothetical case that it decided to use its huge Thesaurus to extract the keywords of a target document, the resulting set would be so large and detailed that the similarity set would render null. Remember that in Google, as in many other Search Engines, almost every word is considered a potential keyword.
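
The article claims that similar documents share almost the same keyword set, but names no similarity measure. One common choice, offered here purely as an illustration, is the Jaccard overlap of the two keyword sets:

    def jaccard(a, b):
        """Overlap of two keyword sets: |A intersect B| / |A union B|."""
        return len(a & b) / len(a | b) if (a or b) else 0.0

    def similar_documents(seed_keywords, candidates, threshold=0.4):
        """Rank candidates by keyword overlap with an already-mapped seed.

        `candidates` maps URL -> keyword set. Candidates whose "spectral
        marks" overlap the seed's above `threshold` are deemed similar and
        feed the exponential expansion step described earlier.
        """
        scored = [(jaccard(seed_keywords, kws), url)
                  for url, kws in candidates.items()]
        return sorted([su for su in scored if su[0] >= threshold], reverse=True)

The threshold trades precision for reach: a high value keeps only near-duplicates, while a low one drifts toward the keyword-by-keyword matching the article attributes to ordinary Search Engines.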