Harvesting Semantic Content from the Web
4:30, Tuesday, October 14, 2008, Lecture Hall 1
Fall 2008 PLATO Royalty Lecture Series
Research in natural language processing (NLP) over the past fifteen years has
produced impressive practical results using statistical methods. But increasingly there are
signs that continued quality improvement in language processing applications (including QA,
summarization, information extraction, opinion mining, and machine translation) requires
deeper and richer representations, possibly even (shallow) semantics of text meaning.
Although theories of semantics (formal and informal) abound, no one has yet built a resource
of semantic symbols that effectively supports NLP, that is empirically based, and that has
been validated through human agreement scores. Can this be done?
This talk describes the harvesting of semantic knowledge from the web, and
reformulation of that knowledge into the Omega ontology, to support various NLP
applications. We will explore a series of increasingly detailed experiments in knowledge
harvesting and organization: from fully automated, through partly automated, ending with
work requiring manual annotation. The first two make extensive use of the web; the third is
a large collaborative project to build a manually annotated corpus of one
million words of English, Chinese, and Arabic text, with an accompanying
ontology for the senses of nouns and verbs.
Throughout the lecture, we will touch on some problematic aspects of semantic
representations that must support robust large-scale reasoning and other
applications. We will see examples of cases where traditional formal semantics simply does
not work, and where what does work instead looks woefully simplistic.
Eduard Hovy leads the Natural Language Research Group at the Information
Sciences Institute of the University of Southern California and is Deputy Director of the
Intelligent Systems Division, as well
as a research associate professor of the Computer
Science Department of USC and Advisory Professor of the Beijing University of Posts and
Telecommunications. He completed a Ph.D. in Computer Science (Artificial Intelligence)
in 1987, and his research focuses on information extraction, automated text
summarization, the semi-automated construction of large lexicons and ontologies, machine
translation, question answering, and digital government. He is the author or co-editor of five
books and over 180 technical articles. Dr. Hovy regularly serves in an advisory capacity to
funders of NLP research in the US and EU. In 2001 Dr. Hovy served as President of the
Association for Computational Linguistics (ACL) and in 2001-03 as President of the
International Association for Machine Translation (IAMT); he currently serves as President
of the Digital Government Society of North
America (DGSNA). He teaches a course in the Master’s Degree Program in Computer Science at the
University of Southern California, as well as occasional short courses at
universities and conferences. He has served on the Ph.D. and M.S. committees for students
from USC, Carnegie Mellon University, Taiwan National U, the Universities of Toronto,
Karlsruhe, Pennsylvania, Stockholm, Waterloo, Nijmegen, Pretoria, and Ho Chi Minh City.
This Lecture Series is sponsored by Evergreen’s PLATO Royalty Fund, a fund established with royalties
from computer-assisted instruction (CAI) software written by Evergreen faculty John Aikin Cushing and
students in the early 1980s for the Control Data PLATO system.
for the Lecture, and Reading
Deepak Ravichandran and Eduard Hovy, Learning Surface Text Patterns for a Question Answering System.
The 90% Solution.
Min Kim and Eduard Hovy, Identifying and Analyzing Judgment Opinions.
Especially the Abstract and Sections 1-3, 7: Chin-Yew Lin and Eduard Hovy, The Automated Acquisition of Topic Signatures for Text Summarization.
Dongui Feng, Eduard Hovy, Handling Biographical Questions with Implicature.
Ken Barker, Bhalchandra Agashe, et al., Learning by Reading: A Prototype System, Performance Baseline and Lessons Learned.