The Semantic Web
How NLP Can Resolve the Chicken & Egg Issue
By Alessandro Marcias
Student No. 2523889
Natural Language Processing
This research will explore the meaning, main components and vision behind the Semantic Web.
The report will analyze the structure underpinning the Semantic Web, and we will look at the objects and technologies that make it possible. We will look at what has been done, what challenges we are facing to achieve the full development of such a "dream", and what tools and techniques can help to facilitate and accelerate its fulfilment.
We can only express our intents properly with our natural language; keyword search engines cannot catch exactly what we really want them to fetch.
NLP techniques are thought to be a field of AI that can help the development of the Semantic Web. We will also look at some applications that implement NLP techniques and try to solve the problems facing the fulfilment of the Semantic Web's vision.
Questions for Discussion
The Semantic Web: an Overview
(Definition and general views of the Semantic Web)
(Why is the Semantic Web so difficult to implement?)
The Chinese Room Argument
The Chinese Room Experiment
Natural Language Processing
The Chatbot Approach
The Child Approach
The Adult Approach
The Chicken & Egg
(Overview of some applications that tackle some of the Semantic Web issues)
(A natural language research company)
(An intelligent discovery engine)
(Links to more interesting websites)
In 2001 Tim Berners-Lee, James Hendler and Ora Lassila published an article about the Semantic Web.
The article described how two people, a brother and sister named Pete and Lucy, were able to let their Semantic Web agents deal with the bookings of their mother's physical therapy sessions; how Pete is not happy with the first solution, and how Pete's web agent finally finds a solution suitable for Pete, though in doing so it reschedules some less important appointments.
For many thinkers and internet speculators this visionary scenario is not that far away, even though a lot of work still has to be done in order for it to be feasible.
The first vision of the Semantic Web came precisely from Tim Berners-Lee, but after him many have embraced the idea of using technological innovations not only to control information but to convert it into intelligence.
By using the word intelligence I mean that the data present within the web could be analysed and processed by the machines themselves, in a way that will look to our eyes as if they were actually reasoning instead of us, helping us to find intelligent solutions to our problems.
Questions for Discussion
But how is this possible?
What is the Semantic Web?
What are the basic bricks that make the Semantic Web possible and alive?
Can computers understand language?
How can NLP help the development of the Semantic Web?
What is already out there to help us contribute to this fascinating and exciting perspective?
The Semantic Web is seen as an evolution of the World Wide Web, where the information held in it can be used both by humans and machines and can be transformed into intelligence.
To do so, not only is communication necessary at a human/machine and machine/machine level, but a system of semantic knowledge is also needed in the background: that will be the Semantic Web.
"The World Wide Web Consortium (W3C) creates Web standards and its mission is to lead the Web to its full potential, which it does by developing technologies (specifications, guidelines, software, and tools) that will create a forum for information, commerce, inspiration, independent thought, and collective understanding."
One of its seven goals and operating principles is the Semantic Web. On the Semantic Web people will be able to "communicate" with computers, to express themselves in a way that a machine will be able to compute, elaborate and exchange information and knowledge, saving us from tedious and boring jobs, such as fetching a receipt, checking travel times, looking up medical information, fixing appointments, et cetera.
These operations called Web Services will be carried out by Web Agents.
What are Web Agents and Web Services?
A Web Service is defined by the W3C as "a software system designed to support interoperable machine-to-machine interaction over a network".
In the vision of the Semantic Web these services will be carried out by Web Agents.
Web agents are very complex software systems operating on the web.
They shall be able to fetch, compute, elaborate and exchange data both with humans and with machines. In other words, they are pieces of software working on behalf of people.
They will have access to all the information on the web and they will be able, thanks to the Semantic Web, to carry out intelligent tasks.
Unfortunately the Semantic Web is far from being fully operative.
To achieve its goal we need tools and technologies that are not yet optimal.
Too many different companies are offering services that claim to be the final solution to the Semantic Web's problems.
In order to better understand what these services are and what they do, I will now give a brief introduction to the vision and technologies behind the concept of the Semantic Web.
The Semantic Web is thought to be an evolution of the World Wide Web.
When the Web was developed it was thought to be used not only for human-to-human communication: machines were also meant to be part of this "information space", as defined by Tim Berners-Lee; but the information on the web is designed to be meaningful for humans rather than machines. In other words, the Internet was not designed to teach machines what the information they hold really means.
The long-term aim of this vision is to imbue the Web itself with meaning. That is, providing meaningful ways to describe the resources available on the Web and, perhaps more importantly, why there are links connecting them together.
A Web of resources with semantics behind it: a sort of global database of intercorrelated knowledge, in which every item or entity has its own description, specification and annotations.
Computers will be able to understand a particular question asked by the user and find related documents linked together by a semantic net, and not simply by keyword comparison or ranking.
The aim of this vision consists in creating a metadata-rich Web of resources that can describe themselves not only by how they should be displayed (HTML) or syntactically (XML), but also by the meaning of their content.
What is metadata?
"Metadata is data associated with objects which relieves their potential users of having full advance knowledge of their existence or characteristics."
Metadata can be thought of as data about data.
During the annotation process we create metadata on a particular document. In the context of the Semantic Web, an annotation is a set of instantiations attached to an HTML document.
In other words, we add extra comments to our documents or web pages, and this will enable computers or other people to better understand the nature of the resource and to link resources together.
Annotations will be a sort of summary describing the content of a resource, be it a document, an image, a web page and/or basically anything that can be stored online.
Annotations will also be the ground where intelligent systems will carry out their reasoning and inducing processes in order to answer a query, e.g. fetch a particular document, or simply solve our particular needs and problems. Annotations can also take the form of human-readable text describing data, helping us better understand the online content, e.g. tags on a website.
Adding annotations to web resources is the essential step to produce semantics and knowledge within the web.
Semantic annotations are used to tag ontology class instance data and map it into ontology classes.
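To make this idea concrete, here is a minimal sketch in Python of annotations as metadata attached to web resources. The URLs, property names and values are all invented for illustration; real semantic annotations would use RDF, but the principle, extra machine-readable descriptions that software can query, is the same:

```python
# Toy sketch (URLs and properties are invented): annotations are
# metadata attached to web resources, so software can reason about them.

annotations = {
    "http://example.org/report.pdf": {
        "type": "Document", "topic": "Semantic Web", "format": "pdf"},
    "http://example.org/home.html": {
        "type": "WebPage", "topic": "Semantic Web", "language": "en"},
    "http://example.org/cat.jpg": {
        "type": "Image", "topic": "Cats"},
}

def find(topic):
    """Return every annotated resource about the given topic."""
    return [uri for uri, meta in annotations.items()
            if meta.get("topic") == topic]

print(find("Semantic Web"))  # the report and the home page both match
```

A keyword engine scanning raw pages could not know that a PDF and an HTML page are "about" the same topic; the annotations make that explicit.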
Figure 1 represents five web resources: three web pages, a document and a library.
Each resource has some arrows linking it to the symbol I used to describe annotations.
Thanks to the annotations it is possible to understand to which sub-class the object belongs.
Sub-classes are instances of more general classes, which are described in ontologies.
Therefore it is possible to understand the nature of a resource, assign it a class and link it together with other objects related to it, as described in the ontology.
Fig. 1: Ontology, Classes and Annotations
The main aim of the Semantic Web is in fact to empower people: to give them a better tool to connect and sift knowledge within the web.
"The central principle behind the success of the giants born in the Web 1.0 era who have survived to lead the Web 2.0 era appears to be this, that they have embraced the power of the web to harness collective intelligence."
There is far too much information on the web, and we have neither the capabilities nor the time to go through all of it in an effective way. With the Semantic Web and its technologies working properly, we will be able to find what we need effectively and efficiently.
In 1998 the basis for the technologies and architectures necessary to the fulfilment of this vision was outlined.
In the next section we will look at the technologies that support, and will make possible, the Semantic Web.
Figure 2 gives a conceptualized vision of the various components of the Semantic Web.
Fig. 2: The Semantic Web Structure
I will now give a brief description of the conceptual layers that constitute the Semantic Web, in a bottom-up approach.
Unicode and URIs
URIs (Uniform Resource Identifiers) are unique identifiers for resources of any type: for instance a document, a person, a page on the web, et cetera. Unicode is the standard for computer character representation. Together they are the foundations of the Semantic Web's architecture.
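Python's standard library can decompose a URI into its parts, which shows the structure that makes an identifier globally unique. The URL below is only an illustration:

```python
from urllib.parse import urlsplit

# A URI uniquely identifies a resource; urlsplit breaks one into the
# parts (scheme, authority, path, query) that give it global uniqueness.
parts = urlsplit("https://www.w3.org/standards/semanticweb/?lang=en")
print(parts.scheme)   # https
print(parts.netloc)   # www.w3.org
print(parts.path)     # /standards/semanticweb/
print(parts.query)    # lang=en
```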
XML
The XML with its related standards (e.g. XML Schemas) is "a language that lets one write structured documents"; it is particularly suitable for sending documents across the Web. "XML is a markup language, just like HTML, and allows one to write some content and provide information about what role that content plays. Like HTML, XML is based on tags."
The creation of XML is considered by many as what has made, or will make, the Semantic Web possible.
"XML is a structured set of rules for how one might define any kind of data to be shared on the web. It is called 'extensible' because it can be modified to suit any purpose... The primary goal of XML is to describe information on the web."
Thanks to DTDs (Document Type Definitions) we can define the tags within our XML documents.
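A tiny example makes the point about tags describing roles. The XML document below is invented, and parsing it with Python's standard library shows how software can pick out the pieces by their role rather than by their position on a page:

```python
import xml.etree.ElementTree as ET

# A tiny XML document: the tags say what role each piece of content
# plays, which is what makes the data meaningful to software.
doc = """
<book>
    <title>Weaving the Web</title>
    <author>Tim Berners-Lee</author>
    <year>1999</year>
</book>
"""

book = ET.fromstring(doc)
print(book.find("title").text)   # Weaving the Web
print(book.find("author").text)  # Tim Berners-Lee
```

An HTML page could display the same text, but only the XML version tells a machine which string is the title and which is the author.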
Resource Description Framework
The Resource Description Framework, or RDF, is a framework that enables the interchange and description of metadata. RDF is a data modelling language.
It allows modelling information through a variety of syntax formats and enables us to classify data on the web.
RDF is graph-based, but usually serialised as XML. Essentially, it consists of triples: subject, predicate, object.
It also represents the relationships between entities and resources via graph models.
"RDF is a basic data model, like the entity-relationship model, for writing simple statements about Web objects (resources)."
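The triple idea needs no special library to demonstrate. Below is a minimal sketch, with made-up resources, of a triple store as plain Python tuples and a pattern query over them (real RDF stores work on the same principle at much larger scale):

```python
# RDF reduces statements to (subject, predicate, object) triples.
# A minimal sketch with invented resources, no RDF library required:

triples = [
    ("Paris", "isCapitalOf", "France"),
    ("France", "isA", "Country"),
    ("Paris", "isA", "City"),
]

def query(subject=None, predicate=None, obj=None):
    """Return triples matching the given pattern (None = wildcard)."""
    return [t for t in triples
            if subject in (None, t[0])
            and predicate in (None, t[1])
            and obj in (None, t[2])]

print(query(predicate="isCapitalOf"))
# [('Paris', 'isCapitalOf', 'France')]
```

The wildcard query is the heart of it: "find everything that is a capital of something" is a structural question, not a keyword match.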
RDF Schema
The RDF Schema is a knowledge representation language and it is used to describe ontologies.
It uses a class/sub-class fashion and gives a reasoning framework for inferring the types of resources.
"RDF Schema defines the vocabulary used in RDF data models. In RDFS we can define the vocabulary, specify which properties apply to which kinds of objects and what values they can take, and describe the relationships between objects."
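The kind of type inference RDFS enables can be sketched in a few lines. The class names below are invented; the point is that from "Paris is a Capital" and a sub-class hierarchy, a machine can infer that Paris is also a City and a Place:

```python
# Sketch of RDFS-style reasoning: a subClassOf hierarchy lets a machine
# infer that an instance of a subclass is also an instance of its
# ancestor classes. Class names here are invented for illustration.

subclass_of = {"City": "Place", "Capital": "City"}   # child -> parent
instance_of = {"Paris": "Capital"}

def types_of(entity):
    """Infer every class an entity belongs to, walking up the hierarchy."""
    types = []
    cls = instance_of[entity]
    while cls is not None:
        types.append(cls)
        cls = subclass_of.get(cls)   # None when we reach the root
    return types

print(types_of("Paris"))  # ['Capital', 'City', 'Place']
```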
Ontology or RDF Vocabularies
Philosophically speaking, ontology is "the study of the nature of being". It is about grouping entities and structuring them in a hierarchy, subdividing them by their similarities and differences.
Ontologies are also referred to as metadata vocabularies, and they are systems that provide extra constraints on things such as entity types and their attributes.
"An Ontology is a shared conceptualization of the world. Ontologies consist of definitional aspects such as high-level schemas and assertional aspects such as entities, interrelationships between entities, domain vocabulary and factual knowledge, connected in a semantic manner. Ontologies provide a common understanding of a particular domain. They allow the domain to be communicated between people, organizations, and application systems. Ontologies provide the specific tools to organize and provide a useful description of heterogeneous content."
Logic and Proof
This layer is where the AI reasoning kicks in: the inference process takes place at this stage.
It is in fact in this system that the web agents take their decisions and make their deductions on the particular problems posed by the users.
"The logical foundations of the Semantic Web allow us to construct proofs that can be used to improve transparency, understanding, and trust."
In order to gain the user's trust in the truthfulness of the data, resources and conclusions elaborated by the web agents, we need to address issues such as the validation of the evidence and facts used in the logic process of finding the solutions to the tasks.
Therefore this layer is extremely dependent on the accuracy of the metadata.
Explanation facilities can help users gain confidence in the system. I discuss explanation facilities, especially relating to recommender systems, in my final year project.
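The inference step described above can be sketched as simple forward chaining: rules are applied to known facts until nothing new can be derived. The facts and rules below are invented toy examples, far simpler than the description logics real Semantic Web reasoners use:

```python
# Minimal sketch of the logic layer: forward chaining applies if-then
# rules to known facts until no new fact can be derived (a fixed point).

facts = {("Paris", "capitalOf", "France"),
         ("France", "partOf", "Europe")}

def rules(fact):
    """Invented toy rules that derive new triples from a known fact."""
    s, p, o = fact
    if p == "capitalOf":
        yield (s, "locatedIn", o)          # a capital is located in its country
    if p in ("locatedIn", "partOf"):
        # toy transitivity: propagate location through partOf
        for (s2, p2, o2) in list(facts):
            if s2 == o and p2 == "partOf":
                yield (s, "locatedIn", o2)

changed = True
while changed:                  # iterate until nothing new appears
    changed = False
    for fact in list(facts):
        for new in rules(fact):
            if new not in facts:
                facts.add(new)
                changed = True

print(("Paris", "locatedIn", "Europe") in facts)  # True
```

No single stated fact says Paris is in Europe; the conclusion is derived, and the chain of rule applications is exactly the kind of "proof" that could be shown to a user to earn their trust.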
Further discussions and recommendations for interested, curious or just masochist readers can be found at the following links (Recommendations References, page 30 of this paper):
Best Practice Recipes for Publishing RDF Vocabularies [D]
Extensible Markup Language (XML) 1.0 (Fifth Edition) [E]
OWL Web Ontology Language Guide [Q]
OWL Web Ontology Language Overview [O]
OWL Web Ontology Language Reference [I]
OWL Web Ontology Language Semantics and Abstract Syntax
OWL Web Ontology Language Test Cases [G]
RDF Primer [N]
RDF Semantics [K]
RDF Test Cases [J]
RDF Vocabulary Description Language 1.0: RDF Schema [F]
RDF/XML Syntax Specification (Revised) [C]
RDFa Primer [A]
RDFa in XHTML [B]
Resource Description Framework (RDF) [M]
Web Ontology Language (OWL) [L]
XML Schema Datatypes in RDF and OWL [H]
This was the basic structure of the Semantic Web and its vision.
Of course we could go into much deeper detail, but the aim of this paper for now is to give a broad view of it and to focus on the problems of its making and some solutions that have been proposed.
If everything has already been outlined and is so well structured and thought out, why is the development of the Semantic Web taking so long?
Why am I still writing this paper instead of talking to my laptop?
Why do I spend hours searching for some relevant document, and why, after a keyword search on any search engine, are there documents at page four that have nothing to do with my intended search?
It seems that for the Semantic Web to work, by which I mean to actually be fully part of our lives, we require computers to think.
Or maybe just to process data in a more seemingly intelligent way?
Since its first days the study of AI was intended to be the way to create, one day, a machine with a mind. Humans would be able to assemble inanimate parts and give them life, reason, thinking; in other words, they could be God for one day.
Ambition is one of the strongest stimulating feelings that humans have, and I am not going to talk about what is right to think or what is not; I will just say that sometimes ambition can be productive while sometimes it may just be counter-productive, but without ambition humankind would never have got anywhere far from a tree and a bunch of bananas.
The history of AI studies is full of successes and failures.
Most of the time failures can be due to the scope of the particular project, which is often too big.
Because of that, AI researchers tend to carry out a bottom-up approach to solve problems, which means subdividing the main problem into smaller chunks: once every little problem has been solved, the main goal will be achieved.
One of these sub-problems is to get machines to understand our language (Natural Language Processing, or NLP, studies).
Language is in fact the first barrier between human beings and computers.
Can computers understand language?
Can computers fool us into believing they understand language?
The Chinese Room Argument
There are two main schools of thought in the AI world; one is called strong AI and the other weak AI.
Thinkers who reside on the strong AI side believe that computers will one day acquire the capability to think just like us: they will be able to carry out any task that we associate with our human brain. In other words, computers will one day be able to do things such as dream, imagine and create on their own, just like we do.
All these attributes and processes are in fact usually referred to as defining characteristics of the human mind.
The branch of research, or school of thought, called weak AI, on the other hand, believes that computers will never be able to gain our brain's peculiar attributes.
What they believe is that computers will only ever be able to compute a finite amount of tasks, following some given rules or logic that will never comprise creativity or creation.
In other words, they can only mimic the process of human thinking, and they argue that computers are not conscious of what they are doing.
Therefore the study of AI can only be thought of as a tool to better understand how the human brain actually works. AI models in fact can be used to mimic the mind and better understand what goes on inside our brain.
The Chinese Room Experiment
John Searle is one of the promoters of weak AI; in 1980 he coined the term "strong AI" and devised a thought experiment against the beliefs of that school of thinking.
The experiment is called the Chinese Room experiment.
Searle asks the reader to imagine a room with two holes, one marked with "INPUT" and the other one marked with "OUTPUT".
Inside the room there is a person who can only speak English, and there are a number of books with Chinese language rules.
Outside the room there is a Chinese person who puts a paper with a question written in Chinese through the "INPUT" hole.
The English person inside the room reads the first symbol of the "Chinese query", gets the book with that symbol on the cover, and then checks which book to take if the first symbol is followed by the second symbol, and so on. Reputedly, in this way, thanks to the books and their links to each other, he manages to answer the question that has been asked perfectly. He then puts the answer through the "OUTPUT" hole.
The Chinese person outside the room will believe that inside the room there is someone who speaks and understands Chinese perfectly but obviously, as we know, that is not the case.
In conclusion, for Searle a symbol-processing machine, despite the fact that it can make the user believe it actually understands the meaning of our queries in depth, will never have a complete understanding of what has been asked. For Searle such machines will just be manipulators of symbols, and they will never have conscious mental states about what they are saying.
The issue for many is not to get machines to actually have cognition of what we are saying but just to understand us, or in other words to give us a pertinent answer when we ask them something.
If computers were able to better understand our language there would be a number of advantages.
Let's think, for example, how much easier, in terms of scalability and usability, it would be to communicate with a database if we could query the DB in spoken English instead of using some specific query language. In this scenario anyone who could write a query in the English language, if the database was written in English, could get information out of it without studying any particular query language.
Let's think, for another example, of a medical expert system. If the system could actually process natural language there would be no need to add new rules every time something new has been discovered. We would just give it journals and reports to process, and it would extract the new information, store it, and use it for future cases.
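The database scenario above can be sketched in a few lines. This is a hypothetical toy, a single hand-written pattern translated into SQL, whereas a real natural-language interface would need full parsing; the table and question are invented:

```python
import re
import sqlite3

# Hypothetical sketch of querying a database in plain English: one
# hand-written pattern is translated into SQL. Real NL interfaces need
# far more than a single regular expression.

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE capitals (country TEXT, city TEXT)")
db.executemany("INSERT INTO capitals VALUES (?, ?)",
               [("France", "Paris"), ("Italy", "Rome")])

def ask(question):
    m = re.match(r"what is the capital of (\w+)\?", question.lower())
    if not m:
        return "Sorry, I only understand one kind of question."
    row = db.execute("SELECT city FROM capitals WHERE lower(country) = ?",
                     (m.group(1),)).fetchone()
    return row[0] if row else "I don't know."

print(ask("What is the capital of France?"))  # Paris
```

Even this toy shows both the appeal (no query language to learn) and the fragility: rephrase the question slightly and the pattern no longer matches.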
The benefits from successful natural language processing would be amazing, but it seems that nothing has properly been achieved yet (see later in this paper).
Natural Language Processing is difficult: it has been studied for many years and not many solutions have proved to be fully successful.
Natural Language Processing
Natural Language Processing, or NLP, is a subfield of Artificial Intelligence.
Its main concerns and efforts are towards the "translation" of computer-readable data and information into a more human-friendly format.
A state-of-the-art NLP system will translate normal human spoken sentences into a format understandable by machines; its goal is therefore to allow users to "talk" to computers as if they were talking to another human being.
Of course this process raises many issues and problems.
A computer just doesn't have the necessary amount of experience, or sometimes common sense, that allows us human beings to understand sentences and the meaning behind them.
Examples of such problems can be disarming. Instead of listing all the different types of problems, which can be found on the relevant Wikipedia page, I will outline the three main approaches used to solve them, and I will then focus on how one of these approaches can help the Semantic Web.
There are many ways in which scientists and researchers have tried to overcome the NLP problems, and in the following sections I will outline the three that seem most promising to me.
The Chatbot Approach
A chatterbot (chatbot) is a computer program designed to mimic a natural conversation with a human being.
It consists of an input textbox where the user will type a message, and an output region where the software will display the answer.
Chatbots don't know anything about the structure of the language.
What they do is a simple find-and-match. They are actually very good at that, but the problem is that they can often make stupid mistakes.
This approach can be compared to the situation where two people are dialoguing in English and one of them doesn't speak English fluently. He may fool the native English speaker for a couple of ice-breaker questions, but then, inevitably, if he tries to answer (or rather to guess) every question, he will give a silly answer.
Chatbots are based on keywords and pattern matching, and they haven't given encouraging results so far with respect to the mission of NLP.
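The find-and-match mechanism is easy to demonstrate. Below is a minimal ELIZA-style sketch; the keywords and canned replies are invented, and the fallback line is exactly the kind of "silly answer" described above:

```python
# ELIZA-style sketch: a chatbot that knows nothing about language
# structure and simply matches keywords to canned replies.

rules = [
    ("hello", "Hello! How are you today?"),
    ("weather", "I hear it is lovely outside."),
    ("name", "My name is ChatSketch (an invented bot)."),
]

def reply(message):
    msg = message.lower()
    for keyword, answer in rules:
        if keyword in msg:          # simple find-and-match, no grammar
            return answer
    return "Interesting. Tell me more."   # fallback: the "silly answer"

print(reply("Hello there"))        # Hello! How are you today?
print(reply("Do you like Kant?"))  # Interesting. Tell me more.
```

The second exchange shows the failure mode: any question outside the keyword list gets a vague deflection, just like the non-fluent speaker guessing his way through a conversation.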
The Child Approach
A different approach is to allow computers to learn a language like a child does.
Some religions and philosophies believe that we already know everything about everything and we just need to dig it out of our brain: to unveil the knowledge that resides inside us.
Others think of our brain as a tabula rasa; a blank slate on which, day by day, we write down knowledge and notions.
Thinking in this way, a child, when born, doesn't know anything about the meaning or the structure of words, and therefore of language; but s/he manages to learn it.
Thanks to examples and the context the child lives in, s/he will be able to pick up a language and to express him or herself with it.
Computers can be compared to a child: in fact they know nothing unless we put data into them.
We can give them lots of examples to process and store, and they have the ability to process and compare them very fast.
This example-based approach turned out to be a good approach, but not the most satisfying one.
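The example-based idea can be sketched with a tiny statistical learner. The program below knows no grammar at all: it simply counts, over a handful of invented example sentences, which word tends to follow which, exactly as the "child" soaks up patterns from context:

```python
from collections import Counter

# Sketch of the example-based ("child") approach: no grammar is given,
# the program only counts which word follows which in example sentences.

examples = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]

follows = {}
for sentence in examples:
    words = sentence.split()
    for a, b in zip(words, words[1:]):       # each adjacent word pair
        follows.setdefault(a, Counter())[b] += 1

def predict(word):
    """Guess the most common word seen after `word` in the examples."""
    return follows[word].most_common(1)[0][0]

print(predict("sat"))  # on  (seen twice after "sat")
```

With three sentences the predictions are trivial, and that is the point: the approach only becomes useful with enormous amounts of example data, and even then it captures habit, not meaning.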
The Adult Approach
The third, and personally the most promising, approach is the so-called adult approach.
The way it tries to process a language is just like the way an adult person would proceed in learning a new language: syntax + vocabulary, semantics, pragmatics.
When we study the rules and principles for constructing a sentence in any spoken language we are studying the language's syntax, or grammar.
Every language has a structure. For example, most languages in an affirmative positive sentence will use the following structure: subject first, followed by a verb and then one or more objects.
Different ways have been used in NLP studies to represent syntax, from parse trees to lists, and techniques such as rewrite rules and transition networks have been used to represent the possible structures allowed in a grammar. Bottom-up and top-down approaches, and algorithms such as depth-first and breadth-first search, are examples of algorithms used to find out the structure of grammars.
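A toy parser makes the subject-verb-object structure just described concrete. The word lists are invented and the grammar is a single rule, a far cry from the rewrite-rule systems mentioned above, but it shows how a structural check produces a small parse tree:

```python
# Toy parser for the structure described in the text: subject first,
# then a verb, then an object. Word lists are invented; real grammars
# and parsers are vastly larger.

NOUNS = {"alice", "bob", "cat", "mouse"}
VERBS = {"sees", "chases", "eats"}

def parse(sentence):
    """Return a parse tree for 'subject verb object', or None."""
    words = sentence.lower().split()
    if (len(words) == 3
            and words[0] in NOUNS
            and words[1] in VERBS
            and words[2] in NOUNS):
        return ("S",
                ("NP", words[0]),
                ("VP", ("V", words[1]), ("NP", words[2])))
    return None

print(parse("Alice sees Bob"))
# ('S', ('NP', 'alice'), ('VP', ('V', 'sees'), ('NP', 'bob')))
print(parse("sees Alice Bob"))  # None: wrong word order
```

Note that the parser accepts or rejects sentences purely on structure; as the next paragraphs argue, that says nothing yet about what the sentence means.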
The study of syntax and the codification of its rules in a machine-understandable way are useless if we don't "teach" a computer how to understand the semantics behind the words.
If I say "I could eat a horse!" then for any listener, apart from 2-5 year old babies, it will be obvious that I am just saying that I am hungry in a "colourful" way.
Unfortunately computers lack the most obvious notions and common sense that we have acquired in our lives.
Semantics is the branch of studies that tackles the meaning behind words and sentences, while pragmatics is more concerned with the figurative, allusive ways in which we can express ourselves, so many different ways, even if we are still saying the same thing.
It seems to me that to overcome NLP's problems and difficulties a hybrid approach is needed, where first we teach the machine the syntax, then we give it a huge amount of data/examples, and then it could elaborate them in a pattern/example-based way.
Of course it's easy to think that if, for instance, we could feed the machine every possible combination of sentences, it would be able to understand everything we say; but that is, as a matter of fact, impossible. First because it would take an infinite amount of time, and secondly because languages evolve every day, with new words, composed words, terms and situations, just as human history is a continuous, yet cyclic, flow. [9, 23]
The Chicken & Egg: Problems! ...Solutions?!
What has to be done to get the Semantic Web going?
The process of developing the Semantic Web is cost- and time-consuming, and that's why people often prefer an egg (a quick solution that meets some requirements but is available now) to a chicken (a solution that meets all the requirements but will take years to fully develop and deploy).
The chicken-and-egg problem also means that vendors and developers are waiting to put in real effort and money until the market is created, but the market won't create itself if there are no applications.
Basically those applications require data that is not out on the web, and for this data to be created it needs those applications, and so on.
That is known as a chicken-and-egg problem: which came first, the chicken or the egg?
Despite these problems, there has recently been an acceleration in the creation of such applications.
In a very simple way, what the IT world, researchers of any type and/or anyone interested in the fulfilment of these promising ideas are trying to do is to give every entity or item that can be stored or linked to on the web a meaning and/or a way to be referred to in a meaningful way. (See Fig. 1.) To do so, people have to start writing data/documents in an RDF format.
Many claim to have built applications that use NLP to better search the web, or to have created applications able to automatically create annotations and metadata.
In the next section we will look at some of these applications.
In order to actually get the Semantic Web to work properly, data has to be published to the web in a machine-readable format.
One way of doing so is "screen scraping". In the screen scraping process a computer basically extracts data from the output displayed by another program.
The purpose of this process is to format the data in a more manageable way.
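A minimal scraping sketch, using only Python's standard library, shows the idea: pull structured data out of HTML that was written for human eyes. The page below is invented:

```python
from html.parser import HTMLParser

# Sketch of "screen scraping": extracting structured data from HTML
# meant for human readers. The page and its contents are invented.

page = """
<html><body>
  <h1>Capitals</h1>
  <li>France: Paris</li>
  <li>Italy: Rome</li>
</body></html>
"""

class ItemScraper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_item = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.in_item = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_item = False

    def handle_data(self, data):
        if self.in_item and data.strip():
            country, city = data.split(":")    # relies on the page layout
            self.items.append((country.strip(), city.strip()))

scraper = ItemScraper()
scraper.feed(page)
print(scraper.items)  # [('France', 'Paris'), ('Italy', 'Rome')]
```

The comment on the split line points at the weakness of scraping: the extraction rule depends entirely on the page's layout, which is exactly why publishing data in a semantic format in the first place is preferable.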
Another way is to get the data directly from a user's inputs or forms.
Many companies claim to have created applications that can help the Semantic Web's development.
During the research for this coursework I came across a number of these statements, and I will now show some of my results.
Powerset: a Natural Language Search Company
The creators of Powerset, a natural language search engine, state that their tool enables us to carry out more meaningful searches within Wikipedia documents. This is clearly a limited data set of what is on the web.
Fig. 3: Screenshot of the Powerset Homepage
The Powerset Search Engine is meant to allow us both a keyword-based search and a phrase-based search.
In a video lecture Barney Pell, founder and CEO of Powerset, Inc., states Powerset's techniques and principles, which are summarized in the following bullet points:
"Goal: enable people to interact with information and services as naturally and effectively as possible
Combine deep NL and scalable search technology
How?: natural language search throughout the web
Interpret the query
The system creates and uses Semantic Web information in multiple ways"
He states that the main innovation that Powerset is bringing to the web searching and Semantic Web world is to understand how the document's intent is encoded.
"Goal: Matching query intents with document intents
Changes to document model drive largest innovations:
Proximity: shift from 'doc as bag of keywords' to 'doc as vector'
Anchor Text: Adding off-page text to doc"
How is it able to understand and better answer a query?
"Parses each sentence on the page
Extracts entities & semantic relationships
Identifies and expands to similar entities, relationships & abstractions
Indexes multiple facts for each sentence"
Let's see it in action.
If I type in the query "What is the capital of France?" I get the following results.
If we look at the highlighted words in the results, we can notice that Powerset took the initial phrase "What is the capital of France?" and understood that it was a question; in fact the words are now arranged in a different way: an affirmative way.
While Powerset analyzes the query in a semantic way, Google will give us just a series of documents containing the words in our query.
But, as we can see, Powerset also gives us results such as capital punishment in France, which is clearly unrelated to my first query.
If I do the same search on Google the output will still be acceptable:
Fig. 5: Google, "What is the capital of France?"
A definition that states that Paris is the capital of France.
I will now try an example to discredit Google, because, after all, what I am trying to show is whether those new technologies that claim to use and empower the Semantic Web are really helpful and have something new to offer.
If I type "rare wildlife of the Amazon", Powerset gives me a list of papers, works and TV series that treat the topic of the rare wildlife of the Amazon.
Google, on the other hand, straight from the first page gives me a link to an Amazon.com item; it took Powerset 41 links to stick one of those in the list.
Page 5, link 41: Powerset's first "wrong" translation of the term Amazon (Figure 7).
From the very first links, Google is already showing some Amazon.com links.
A final example will be given by typing the following query: "Why is natural language processing difficult?"
It is obvious from these results that Google doesn't have a clue what we are talking about; in fact it just gives us a series of links with the NLP words in them, though it also asks us if we meant a different query: "why is NLP difficult".
By selecting the suggested query, Google still doesn't give us more than just a series of documents with the words Natural Language Processing and difficulty in them.
On the other hand, using Powerset we get a little bit closer to our goal (which is to find documents that explain why NLP is so difficult).
Fig. 10: Powerset results on the query "why is natural language processing difficult?"
With Powerset the results are a bit more accurate than Google's, but there are still a lot of links unrelated to our query.
In both searches, from the two search engines, most results relate to pages where the words NLP + the word difficult appear.
It is clear from the above examples how much more accurate the understanding of a query is in a search engine that implements NLP techniques, compared to a traditional keyword indexing and ranking technique; but it is also obvious how ineffective and inefficient those applications still are.
But something more accurate is already out there...
Another interesting search engine application which I came across during my research is Juice.
Juice is an add-on for Firefox. An add-on is an application that enhances Firefox by adding new functionality to it.
Juice is meant to be an intelligent discovery engine.
What it does, thanks to an NLP system and a dictionary management system, is "help the semantic web by connecting keywords with the most relevant, rich content from third parties".
How juice works?
As we can see from figure 11, Juice resides on the right-hand side of the screen but can easily be removed by clicking a button on the toolbar.
Fig. 11: Screenshot of the Juice add-on panel.
One of the main characteristics of Juice is the ability to highlight text and drag & drop it onto the panel. This action activates a search and automatically generates a list of links.
Juice also allows us to search for topics within its panel while keeping the page we started from on the main screen.
Let's try with the following: why is Natural Language Processing difficult?
As the screenshot shows, Juice gives much more accurate results than our previous experiment with both Powerset and Google.
In fact it actually finds a document with the title "The Difficulty of Natural Language" and another one stating "Natural Language Processing: Difficulties".
In summary, Juice has found more pertinent documents on a query where our previously tested search engines failed miserably.
In addition to the better "understanding" of the query, Juice gives us the chance to navigate through its findings without losing the original webpage from which we started.
Juice also allows us to search for images, news, videos and blogs.
The results of the Juice search on "why is Natural Language Processing difficult?" are shown in the figure.
Another interesting feature of Juice is that when a video is on the screen, a small "drag me" button appears next to it, giving us the possibility to paste the video into the Juice search panel and watch it there, leaving the main page free and "usable".
Juice will also store videos or images for future viewing.
In summary, it seems that Juice is much "cleverer" than the search engines analyzed previously; plus it offers a series of entertainment features.
I have to admit that many of the papers I found during this research I found thanks to Juice.
Moreover, the ability to navigate without overwriting the initial page is really useful; a user will never be carried too far away while surfing new pages, and s/he can always come back to the starting point.
I believe that is quite an intelligent way to search the web.
Until now I have shown applications that, thanks to the Semantic Web, help us find information or resources on the World Wide Web.
The following section analyzes a way of helping the Semantic Web by adding metadata to documents.
Calais or OpenCalais
The following sentences are cited from the Calais website:
"Calais is a rapidly growing toolkit of capabilities that allow you to readily incorporate state-of-the-art semantic functionality within your blog, content management system, website or application."
“Calais is a web service that uses natural language processing (NLP) technology to
semantically tag text that is input to the service. The tags are delivered to the user who can
incorporate them into other applications
for search, news aggregation, blogs,
catalogs et cetera.”
Calais, in other words, enables users to create tags simply by copying and pasting a document into it; it will output tags, making the document linkable.
Figure 16 shows a screenshot of the input form where we can paste our document into Calais.
As input for testing purposes I used the definitions I gave earlier in this paper of XML, RDF and so on.
Calais uses different colours for different types of tags, such as green for cities, red for companies, and so on. We can see which colour has been used for which type of tag in the left panel of the screen.
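Calais's real pipeline is proprietary and far more sophisticated, but the basic idea of typed entity tagging can be sketched as a gazetteer lookup: scan the text for known names and report each match with its type. The entity list and type labels below are invented for illustration only.

```python
# Minimal sketch of dictionary-based entity tagging in the spirit of Calais.
# The entries and type names are invented for illustration; Calais's actual
# NLP pipeline recognizes far more than a fixed word list.

ENTITY_DICTIONARY = {
    "London": "City",
    "Paris": "City",
    "Reuters": "Company",
    "Tim Berners-Lee": "Person",
}

def tag_text(text):
    """Return (entity, type) pairs for every dictionary entry found in the text."""
    return [(name, etype)
            for name, etype in ENTITY_DICTIONARY.items()
            if name in text]

sample = "Tim Berners-Lee presented the Semantic Web vision in London."
print(tag_text(sample))  # [('London', 'City'), ('Tim Berners-Lee', 'Person')]
```

Each (entity, type) pair corresponds to one coloured highlight in the Calais interface, with the type deciding the colour.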
The output will be what is shown in figure 17:
Calais has produced many tags for the given document, and if I hover over a tag with the mouse, a window appears giving some information and a description.
The interesting thing is that if the user presses the "show RDF" button, the system will show the document in RDF format, as we can see in figure 18.
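The point of that RDF export is that each tag becomes a machine-readable statement about the document, a (subject, predicate, object) triple. As a rough sketch (the namespace and property names below are hypothetical; Calais defines its own vocabulary), the tags could be serialized like this:

```python
# Rough sketch: turn (entity, type) tags into N-Triples-style RDF statements.
# The namespace and property names are invented for illustration only.

def tags_to_triples(doc_uri, tags):
    """Serialize (entity, type) tags as simple subject-predicate-object lines."""
    ns = "http://example.org/tags/"  # hypothetical vocabulary namespace
    triples = []
    for name, etype in tags:
        entity_uri = ns + name.replace(" ", "_")
        # The document mentions the entity...
        triples.append(f"<{doc_uri}> <{ns}mentions> <{entity_uri}> .")
        # ...and the entity has a type.
        triples.append(f"<{entity_uri}> <{ns}type> \"{etype}\" .")
    return triples

for line in tags_to_triples("http://example.org/doc1", [("London", "City")]):
    print(line)
```

Because the output is plain RDF, any Semantic Web agent can merge these statements with triples from other sources, which is exactly what makes the tagged document "linkable".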
Once again the results are not optimal. The tags that Calais annotates do not cover all the important words held in the text, but it is still impressive what they have managed to do, and I believe there is reason to view those results in an optimistic way.
If you think about it, the majority of people who use the internet don't know anything about tags, RDF, XML and so on. Therefore there is a real need for sophisticated tools that do the annotation for the users, or in other words that take care of the dirty work without the unaware user having a clue of what is going on in the background. But those users are the ones that will be surprised the most when they see "magic" connections coming out of apparently nowhere.
Our community, and by that I mean us human beings, cannot cope with everything we have at our hands, or virtual hands. So to those who hate computers because they are changing our world into a colder and more electronic scenario, I suggest that they just use them and abuse them, and remember that computers are actually making our lives easier. After all, we created them to make them do something useful, and especially something we prefer not to do, so that we can concentrate on the creative aspects of our lives. We can create; they cannot and they never will.
Unfortunately and inevitably, when discussing topics this current it is impossible to cover everything; even during the writing of this paper, new technologies and new tools just kept popping up.
Below is a list of some other very interesting applications and research groups that, due to time constraints, I wasn't able to discuss but which are of great interest:
"an open, shared database of the world's
A smarter way to track interest ;
a UK com
pany creating tools to control personal
information on the Web;
online TV provider;
a vendor of software that makes data
"available to share,
remix and reuse"
a European enterprise information integration
a German ve
ndor of ontology
related tools; “
(Descriptions have been taken from the website’s homepages)
I believe those types of tools are really helpful, and they are already making a big difference in the way we use the internet and its resources. What they link together is changing the way we surf, research and discover the web.
Further discussion of how tags are changing our web browsing experience, and of the evolution of information visualization, can be found in my Information Retrieval coursework.
It is clear, though, that such tools are not yet optimal and fully efficient, but we have seen an enormous effort and development in the last decade.
I strongly believe that what is driving the world today, apart from the obvious profit-making processes, is a community-driven fashion of applications, where people are able to share their resources, be that for knowledge and research purposes or simply for pleasure.
People have to feel connected and able to communicate with everything and anything in the world.
The Semantic Web is an amazing and exciting prospect that is trying to make this possible, and I do believe that even if what was described in the article by T. Berners-Lee never becomes possible, at least not in my lifetime unfortunately, it will still make our lives a lot more interesting.
We are already able to discover a lot of new and interesting things thanks to these sorts of applications, and I am confident there are definitely more amazing things still to come.
G. Antoniou, F. van Harmelen (2008). A Semantic Web Primer. The MIT Press, Cambridge, Massachusetts / London, England.
T. Berners-Lee (1989). Information Management: A Proposal. CERN. Available at:
T. Berners-Lee (1998). Semantic Web Road Map. W3C. Available at:
T. Berners-Lee, J. Hendler, O. Lassila (2001). The Semantic Web. Scientific American.
J. Cardoso (2006). Semantic Web Services: Theory, Tools and Applications.
K. Darlington (2005). Effective Website Development: Tools and Techniques.
Semantic Web Technologies: Trends and Research in Ontology-based Systems. Wiley.
L. Dempsey, R. Heery (1997). Metadata [DESIRE]: Specification for resource description methods. Part 1: A review of metadata: a survey of current resource description formats.
J. Gleick (1987). Chaos: Making a New Science.
Juice. Available at: http://www.juiceapp.com
B. Matthews (2005). Semantic Web Technologies.
D. McGuinness, M. Dean (2005). Substance of the Semantic Web. SWANS.
Natural Language Processing. Wikipedia. Available at:
OpenCalais. Available at:
T. O'Reilly (2005). What is Web 2.0. Available at:
Powerset. Available at: http://www.powerset.com/
(2006). Visualizing the Semantic Web: XML.
Screen scraping. Wikipedia. Available at:
J. Searle (1980). Minds, Brains and Programs. Behavioral and Brain Sciences.
R. Studer, S. Grimm, A. Abecker (2007). Semantic Web Services: Concepts, Technologies, and Applications.
Aristotle (unknown). Tabula rasa.
World Wide Web Consortium homepage: http://www.w3.org/
G. P. Williams (1997). Chaos Theory Tamed.
B. Adida, M. Birbeck (2008). RDFa Primer: Bridging the Human and Data Webs. W3C Working Group Note, 14 October 2008.
B. Adida, M. Birbeck, S. McCarron, S. Pemberton (2008). RDFa in XHTML: Syntax and Processing. A collection of attributes and processing rules for extending XHTML to support RDF. W3C Recommendation.
D. Beckett (2004). RDF/XML Syntax Specification (Revised). W3C Recommendation. [last accessed 01/12/08]
D. Berrueta, J. Phipps (2008). Best Practice Recipes for Publishing RDF Vocabularies. W3C Working Group Note.
T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, F. Yergeau (2008). Extensible Markup Language (XML) 1.0 (Fifth Edition). W3C Recommendation. [last accessed 01/12/08]
D. Brickley, R. V. Guha (2004). RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. Available at:
J. Carroll, J. De Roo (2004). OWL Web Ontology Language Test Cases. W3C Recommendation. Available at:
J. J. Carroll, J. Z. Pan (2006). XML Schema Datatypes in RDF and OWL. W3C Working Group Note. [last accessed 01/12/08]
M. Dean, G. Schreiber, F. van Harmelen, J. Hendler, I. Horrocks, D. L. McGuinness, P. F. Patel-Schneider, L. A. Stein (2004). OWL Web Ontology Language Reference. W3C Recommendation. Available at: [last accessed 01/12/08]
J. Grant, D. Beckett (2004). RDF Test Cases. W3C Recommendation. Available at: [last accessed 01/12/08]
P. Hayes (2004). RDF Semantics. W3C Recommendation. Available at: [last accessed 01/12/08]
J. Heflin (2004). Web Ontology Language (OWL) Use Cases and Requirements. W3C Recommendation. Available at:
G. Klyne, J. Carroll (2004). Resource Description Framework (RDF): Concepts and Abstract Syntax. W3C Recommendation. Available at:
F. Manola, E. Miller (2004). RDF Primer. W3C Recommendation. Available at:
D. L. McGuinness, F. van Harmelen (2004). OWL Web Ontology Language Overview. W3C Recommendation. Available at:
P. F. Patel-Schneider, P. Hayes, I. Horrocks (2004). OWL Web Ontology Language Semantics and Abstract Syntax. W3C Recommendation. Available at: [last accessed 01/12/08]
M. K. Smith, D. McGuinness, R. Volz, C. Welty (2004). OWL Web Ontology Language Guide. W3C Recommendation. Available at: