Knowledge Management - AU-KBC Research Centre

Nov 6, 2013

Knowledge Management

Indo-German Workshop on Language Technologies

AU-KBC Research Centre, Chennai

Speaker

Prof. Sudeshna Sarkar

Computer Science & Engineering Department,
Indian Institute of Technology Kharagpur, Kharagpur

sudeshna@cse.iitkgp.ernet.in

Research Activities

Department of Computer Science & Engineering

College of Engineering, Guindy

Chennai - 600025

Participant: Dr. T. V. Geetha

Other members: Dr. Ranjani Parthasarathi, Ms. D. Manjula, Mr. S. Swamynathan

Knowledge Management, Semantic Web Retrieval -

Possible Areas of cooperation

- Semantic-based approaches to information retrieval and extraction

- Cognitive approaches to semantic search engines with user profiles and user perspective

- Multilingual semantic search engines
  - use of an intermediate representation like UNL

- Goal-based information extraction from semi-structured documents
  - use of ontology

- Information extraction and its visualization
  - development of timeline visualization of documents

Contacted: Dr. Steffen Staab

University of Karlsruhe

Institute of Applied Informatics and Formal Description Methods

Core Competencies: Knowledge Management

Knowledge Management, Web Services -

Work done in the area


- Design and implementation of reactive web services using active databases

- Design and implementation of a rule engine

- Design and implementation of complex rules to tackle client- and server-side semantics of the rule engine

- Development of intelligent web services for e-commerce

- Extension to tackle multiple and cooperative web service environments
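The reactive web services listed above build on active-database style event-condition-action (ECA) rules. A minimal, illustrative rule-engine sketch; the events, conditions and actions are invented examples, not the project's actual API:

```python
# Minimal event-condition-action (ECA) rule engine in the active-database
# style. Events, conditions and actions are invented examples.
class RuleEngine:
    def __init__(self):
        self.rules = []  # list of (event, condition, action) triples

    def on(self, event, condition, action):
        """Register a rule: when `event` fires and `condition` holds, run `action`."""
        self.rules.append((event, condition, action))

    def fire(self, event, payload):
        """Fire an event; return the results of every triggered action."""
        return [act(payload)
                for ev, cond, act in self.rules
                if ev == event and cond(payload)]

engine = RuleEngine()
engine.on("order_placed",
          condition=lambda o: o["amount"] > 100,
          action=lambda o: f"apply discount to order {o['id']}")

print(engine.fire("order_placed", {"id": 7, "amount": 150}))
```

Client- and server-side semantics would then amount to deciding where each rule's condition and action run; the engine itself stays the same.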



Knowledge Management, Web Services -

Possible Areas of cooperation

- Formalization and description of web service semantics using the Semantic Web

- Introspection between web services

- Personalization of web services

- Rating of web services



Natural Language Processing, Knowledge Representation -

Possible Areas of cooperation

- Knowledge representation architecture based on Indian logic

- Argumentative reasoning models based on Indian logic

- Knowledge representation and interpretation strategies based on Indian sastras like Mimamsa

- Building domain ontologies based on the above architecture

- Knowledge management based on the above approaches


Contacted: Prof. Dr. Gerd Unruh

University of Applied Sciences Furtwangen

Department of Informatics

Core Competencies: WordNet, Databases


Utkal University

We Work On

Image Processing

Speech Processing

Knowledge Management


Knowledge Management

Machine Translation

- Normal sentences with WSD

Lexical Resources

(A) e-Dictionary (Oriya-English-Hindi)

- Got IPR; tested by SQTC, ETDC Bangalore. 27,000 Oriya, 30,000 English and 20,000 Hindi words.

(B) Oriya WordNet with Morphological Analyzer

- Got IPR; tested by SQTC, ETDC Bangalore. 1,000-word lexicon.

(C) Ori-Spell (Oriya Spell Checker)

- Got IPR; tested by SQTC, ETDC Bangalore. 1,70,000 words (root and derived).

(D) Trilingual Word Processor (Hindi-English-Oriya)

- Integrated with Spell Checker and Grammar Checker.







San-Net (Sanskrit WordNet)

- Developed using Navya-Nyaya philosophy and Paninian grammar.

- Besides synonym, antonym, hypernym, hyponym, holonym, meronym, etc., some additional relations have been introduced in San-Net: analogy, etymology, definition, nominal verb, nominal qualifier, verbal qualifier and verbal noun.

- San-Net can be used for Indian language understanding, translation, summarization and generation.

- A standard Knowledge Base (KB) has been developed for analyzing the syntactic, semantic and pragmatic aspects of any lexicon.

KM (Sanskrit)
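The extended relation set described above could be held in a small lexical-network structure. A toy sketch; the Sanskrit entries are placeholders, not real San-Net data:

```python
# Toy lexical network with San-Net's extended relation set. The Sanskrit
# entries below are placeholders, not real San-Net data.
RELATIONS = {"synonym", "antonym", "hypernym", "hyponym", "holonym", "meronym",
             "analogy", "etymology", "definition", "nominal_verb",
             "nominal_qualifier", "verbal_qualifier", "verbal_noun"}

lexicon = {}

def add_relation(word, relation, related_word):
    if relation not in RELATIONS:
        raise ValueError(f"unknown relation: {relation}")
    lexicon.setdefault(word, {}).setdefault(relation, set()).add(related_word)

def related(word, relation):
    return lexicon.get(word, {}).get(relation, set())

add_relation("gaja", "synonym", "hastin")   # placeholder entries
add_relation("gaja", "hypernym", "pashu")
add_relation("gamana", "verbal_noun", "gam")

print(sorted(related("gaja", "synonym")))  # -> ['hastin']
```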


Present Interest

- Sanskrit WordNet based Machine Translation System

- Morphological Analyser for Sanskrit

- Navya-Nyaya philosophy to be used extensively for it

- Helps achieve better WSD, as Navya-Nyaya philosophy provides an effective conceptual analysis capability

Natural Language Processing Group

Computer Sc. & Engg. Department

JADAVPUR UNIVERSITY

KOLKATA - 700 032, INDIA

Professor Sivaji Bandyopadhyay

sivaji_ju@vsnl.com

Cross-lingual Information Management

Multilingual and Cross-lingual IR

- A Cross Language Database (CLDB) System in Bengali and Hindi developed

- Natural language query analyzed using a Template Grammar and Knowledge Bases to produce the corresponding SQL statement

- Cooperative response in the query language

- Anaphora / coreference in CLDB studied

- Database updates and elliptical queries also supported
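The query-analysis step above can be sketched as template matching: each template pairs a surface pattern for the natural-language query with an SQL skeleton. The patterns and the table/column names below are invented for illustration, not the CLDB system's actual grammar or schema:

```python
import re

# Each template pairs a query pattern with an SQL skeleton. Table and column
# names are assumptions for illustration, not the CLDB system's schema.
TEMPLATES = [
    (re.compile(r"how many (\w+) are there", re.IGNORECASE),
     "SELECT COUNT(*) FROM {0};"),
    (re.compile(r"list all (\w+) in (\w+)", re.IGNORECASE),
     "SELECT * FROM {0} WHERE city = '{1}';"),
]

def query_to_sql(question):
    """Match the question against each template and fill the SQL skeleton."""
    for pattern, skeleton in TEMPLATES:
        m = pattern.search(question)
        if m:
            return skeleton.format(*m.groups())
    # no template matched; a real system would give a cooperative response
    return None

print(query_to_sql("How many employees are there?"))  # -> SELECT COUNT(*) FROM employees;
```

A cross-lingual version would hold one surface pattern per language against the same SQL skeleton.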

Cross-lingual Information Management

Open Domain Question Answering

- Work being done for English

- Currently building a set of question templates (Qtargets) and the corresponding answer patterns with relative weights

- Input question analyzed to produce the corresponding question template

- Appropriate answer pattern retrieved

- Answer generated using the input document and the synthesis rules of the language
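The Qtarget pipeline above might look like this in miniature: a question template carries answer patterns with relative weights, tried from heaviest to lightest. All patterns, weights and the sample document are illustrative, not the group's actual resources:

```python
import re

# One Qtarget with weighted answer patterns, tried heaviest first.
# {X} marks where the question focus is substituted into an answer pattern.
QTARGETS = [
    (re.compile(r"^who (?:invented|made) (.+)\?$", re.IGNORECASE),
     [(0.9, r"{X} was (?:invented|patented|made) by ((?:[A-Z]\w+ ?)+)"),
      (0.4, r"((?:[A-Z]\w+ ?)+) (?:invented|made) {X}")]),
]

def answer(question, document):
    for qpattern, answer_patterns in QTARGETS:
        m = qpattern.match(question)
        if not m:
            continue
        focus = m.group(1)
        # try answer patterns from highest to lowest relative weight
        for weight, apattern in sorted(answer_patterns, reverse=True):
            am = re.search(apattern.replace("{X}", re.escape(focus)), document)
            if am:
                return am.group(1).strip()
    return None

document = "Records show the telephone was patented by Alexander Graham Bell in 1876."
print(answer("Who invented the telephone?", document))  # -> Alexander Graham Bell
```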

Search and Information Extraction Lab

IIIT Hyderabad

The Search and Information Extraction Lab (SIEL) focuses on building technologies for personalized, customizable and highly relevant information retrieval and extraction systems. Vertical (domain-specific) search, when combined with personalization, can drastically improve the quality of search results.

Current work includes building search engines that are vertical portals in nature: they are specific to a chosen domain and aim at producing high-quality results (with high recall and precision). It has been realized in the recent past that it is very difficult to build a generic search engine that can be used for all kinds of documents and domains and still produce high-quality results. Tasks involved in building domain-specific search engines include representing the domain as an ontology or taxonomy and the ability to "deeply understand" the documents belonging to that domain using techniques like natural language processing, semantic representation and context modeling. Another area of immediate interest for English is summarization of documents. Work is also going on on text categorization and clustering.




The development makes use of the basic technology already developed for English, as well as for Indian languages, pertaining to word analyzers, sentential parsers, dictionaries, statistical techniques, keyword extraction, etc. These have been woven into a novel architecture for information extraction.

Knowledge-based approaches are being experimented with. The emphasis is on using a combination of approaches involving automatic processing together with handcrafting of knowledge. Applications that match information extracted from documents against given specifications are being looked at. For example, a given job requirement could be matched with resumes (say, after information is extracted from them).

A number of sponsored projects from industry and government are running at the Centre in this area. A major knowledge management initiative in the area of eGovernance is also being planned.










We are building search engines and named entity extraction tools specifically for the Indian context. As a test bed, we are building an experimental system codenamed PSearch (http://nlp.iiit.net/~psearch).

SIEL is also actively developing proper-name gazetteers to cover the commonly used names of people, places, organizations, etc. in the Indian news media for various languages. These resources will help in information extraction, categorization, and machine translation activities.
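Gazetteer-based named-entity tagging of the kind described can be sketched as longest-match lookup against the name lists. The gazetteer entries below are examples only, not SIEL's actual resources:

```python
# Longest-match gazetteer lookup. The name lists are examples only.
GAZETTEERS = {
    "PERSON": {"Sudeshna Sarkar", "Sivaji Bandyopadhyay"},
    "PLACE": {"Chennai", "Kolkata", "Hyderabad"},
    "ORG": {"IIIT Hyderabad", "Utkal University"},
}

def tag_entities(text):
    """Return (name, type) pairs found in the text, longest names tried first."""
    entries = sorted(
        ((name, etype) for etype, names in GAZETTEERS.items() for name in names),
        key=lambda entry: -len(entry[0]),
    )
    found = []
    remaining = text
    for name, etype in entries:
        if name in remaining:
            found.append((name, etype))
            # blank out the match so "IIIT Hyderabad" blocks a later "Hyderabad" hit
            remaining = remaining.replace(name, " ")
    return found

print(tag_entities("SIEL at IIIT Hyderabad builds NE tools for Chennai news."))
```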


For further information and details, please email vv@iiit.net


Efforts in Language & Speech Technology

Natural Language Processing Lab

Centre for Development of Advanced Computing

(Ministry of Communications & Information Technology)

'Anusandhan Bhawan', C 56/1 Sector 62, Noida - 201 307, India

karunesharora@cdacnoida.com

Gyan Nidhi: Parallel Corpus

'GyanNidhi', which stands for 'Knowledge Resource', is a parallel corpus in 12 Indian languages, a project sponsored by TDIL, DIT, MC&IT, Govt. of India.

Gyan Nidhi: Multi-Lingual Aligned Parallel Corpus

What is it?

The multilingual parallel text corpus contains the same text translated into more than one language.

What does Gyan Nidhi contain?

The GyanNidhi corpus consists of text in English and 11 Indian languages (Hindi, Punjabi, Marathi, Bengali, Oriya, Gujarati, Telugu, Tamil, Kannada, Malayalam, Assamese). It aims to digitize 1 million pages altogether, containing at least 50,000 pages in each Indian language and English.



Source for Parallel Corpus

- National Book Trust India

- Sahitya Akademi

- Navjivan Publishing House

- Publications Division

- SABDA, Pondicherry

GyanNidhi Block Diagram

Platform: Windows

Data Encoding: XML, UNICODE

Portability of Data: Data in XML format supports various platforms

Applications of GyanNidhi

- Automatic dictionary extraction

- Creation of translation memory

- Example Based Machine Translation (EBMT)

- Language research, study and analysis

- Language modeling
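Automatic dictionary extraction from an aligned parallel corpus can be sketched as co-occurrence counting over aligned segments: word pairs that keep appearing together across translations are dictionary candidates. The toy "corpus" below is invented; a real system would add association measures and morphological normalization:

```python
from collections import Counter
from itertools import product

# Toy paragraph-aligned corpus; real GyanNidhi data is book-length text.
aligned = [
    ("water is life", "pani jeevan hai"),
    ("clean water", "saaf pani"),
    ("life is good", "jeevan achha hai"),
]

# Count how often each (source word, target word) pair co-occurs in
# aligned segments; frequent pairs are dictionary candidates.
cooc = Counter()
for en, hi in aligned:
    for e, h in product(en.split(), hi.split()):
        cooc[(e, h)] += 1

def best_translation(word):
    candidates = [(count, h) for (e, h), count in cooc.items() if e == word]
    return max(candidates)[1] if candidates else None

print(best_translation("water"))  # -> pani
```

The same co-occurrence table is also the starting point for translation memory and EBMT fragment matching.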


Gyan Nidhi: Multi-Lingual Aligned Parallel Corpus

Tools: Prabandhika: Corpus Manager

- Categorisation of corpus data in various user-defined domains

- Addition/deletion/modification of any Indian language data files in HTML / RTF / TXT / XML format

- Selection of languages for viewing parallel corpus with data aligned up to paragraph level

- Automatic selection and viewing of parallel paragraphs in multiple languages

- Abstract and metadata

- Printing and saving parallel data in Unicode format
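Paragraph-level alignment of the sort Prabandhika exposes might be stored as XML with shared paragraph ids across language files. The layout below is an assumption for illustration, not GyanNidhi's actual schema:

```python
import xml.etree.ElementTree as ET

# Assumed layout: each <para> carries the same id across languages.
xml_doc = """
<corpus>
  <para id="1" lang="en">Water is essential for life.</para>
  <para id="1" lang="hi">जल जीवन के लिए आवश्यक है।</para>
  <para id="2" lang="en">Rivers feed the plains.</para>
  <para id="2" lang="hi">नदियाँ मैदानों को सींचती हैं।</para>
</corpus>
"""

def aligned_paragraphs(xml_text, langs=("en", "hi")):
    """Group <para> elements by id and return tuples aligned across langs."""
    root = ET.fromstring(xml_text)
    by_id = {}
    for para in root.iter("para"):
        by_id.setdefault(para.get("id"), {})[para.get("lang")] = para.text
    # keep only paragraph ids present in every requested language
    return [tuple(versions[lang] for lang in langs)
            for versions in by_id.values()
            if all(lang in versions for lang in langs)]

print(len(aligned_paragraphs(xml_doc)))  # -> 2
```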

Sample Screen Shot: Prabandhika

Tools: Vishleshika: Statistical Text Analyzer

Vishleshika is a tool for statistical text analysis for Hindi, extendible to text in other Indian languages.

It examines input text and generates various statistics, e.g.:

- Sentence statistics

- Word statistics

- Character statistics

The Text Analyzer presents the analysis in textual as well as graphical form.

Sample output: Character statistics

[Chart: percentage of occurrence (0-14%) of consonants in Hindi and Nepali sample text, with the most frequent consonants in Hindi and in Nepali marked]

The graph shows that the distribution is almost equal in Hindi and Nepali in the sample text. Results also show that these six consonants constitute more than 50% of the consonant usage.

Vishleshika: Word and sentence Statistics
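A Vishleshika-style analysis can be sketched as follows; the sentence-splitting rule and the English sample are placeholders (the real tool targets Hindi text, where '।', the danda, ends a sentence):

```python
import re

def text_stats(text):
    """Sentence, word and character statistics; '|' stands in for the danda."""
    sentences = [s for s in re.split(r"[.?!|]", text) if s.strip()]
    words = re.findall(r"\w+", text)
    chars = [c for c in text if not c.isspace()]
    return {
        "sentences": len(sentences),
        "words": len(words),
        "chars": len(chars),
        "avg_words_per_sentence": len(words) / len(sentences),
        "avg_word_length": sum(len(w) for w in words) / len(words),
    }

stats = text_stats("Rain fell all night. The river rose quickly!")
print(stats["sentences"], stats["words"])  # -> 2 8
```

Character statistics like the consonant distribution above would be a frequency count over `chars` restricted to the consonant range of the script.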

Knowledge Management

Information Retrieval / Information Extraction

AU-KBC Research Centre

IE in Partially Structured Data

- Information extraction on partially structured, domain-dependent data is done for IB.

- The sample data was in the criminal domain.

- This is a rule-based system and the rules are hand-crafted.

- There are various dictionaries for places, events and the basic verbs, which are used by the rules.

- The dictionary can be dynamically updated.

- The template is pre-defined.

Example:

Event:

An exchange of fire took place between the police and CPML-PW extremists ( 2 ) at Basheera ( Kamarpally mandal/district Nizamabad/January 9 ) resulting in the death of a DCM of the outfit. The police also recovered wireless sets ( 2 ), hand-grenade ( 1 ) and revolver ( 1 ) from the site.

Participant 1 = police
Participant 2 = CPML-PW_extremists
No of Participant 2 = ( 2 )
Material = revolver
Date = January 9 2002
Police Station = Nizamabad
Mandal = Kamarpally
District = Nizamabad
Event = exchange of fire
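The hand-crafted, dictionary-driven rules described above can be sketched as follows for a few of the template slots; the dictionaries and regular expressions are simplified stand-ins for the actual system's rules:

```python
import re

# Tiny dictionary of event phrases; the real system has many more,
# plus dictionaries of places and basic verbs.
EVENT_DICT = ["exchange of fire", "ambush", "raid"]

MONTHS = (r"(?:January|February|March|April|May|June|July|August|"
          r"September|October|November|December)")

def extract_event(text):
    """Fill a pre-defined template from the report text via hand-crafted rules."""
    template = {"Event": None, "Participant 1": None, "Participant 2": None,
                "District": None, "Date": None}
    for event in EVENT_DICT:                       # dictionary lookup rule
        if event in text:
            template["Event"] = event
            break
    m = re.search(r"between the (\w+) and ([\w\- ]+?) extremists", text)
    if m:
        template["Participant 1"] = m.group(1)
        template["Participant 2"] = m.group(2).strip() + "_extremists"
    m = re.search(r"district (\w+)", text)         # location rule
    if m:
        template["District"] = m.group(1)
    m = re.search(MONTHS + r" \d+", text)          # date rule
    if m:
        template["Date"] = m.group(0)
    return template

report = ("An exchange of fire took place between the police and CPML-PW "
          "extremists ( 2 ) at Basheera ( Kamarpally mandal/district "
          "Nizamabad/January 9 ) resulting in the death of a DCM of the outfit.")
result = extract_event(report)
print(result["Event"], result["Participant 1"], result["District"])
```

Dynamic dictionary updates then amount to appending to `EVENT_DICT` (or its place/verb counterparts) at run time.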

IE in Unstructured Data

- Information extraction on unstructured, domain-dependent data is done on online matrimonials.

- The sample data was taken from The Hindu online matrimonials.

- This is a rule-based system and the rules are hand-crafted. Linguistic rules as well as heuristic rules play a major role.

- There are various dictionaries for caste, religion, language, etc., which are used by the system.

- The template to be filled is static and pre-defined.
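A miniature version of the heuristic extraction described above, combining dictionary lookup with linguistic pattern rules; the dictionaries, rules and sample ad are invented illustrations, far smaller than the system's hand-crafted resources:

```python
import re

# Toy dictionaries; the real system uses large hand-crafted lists.
RELIGIONS = {"Hindu", "Muslim", "Christian", "Sikh"}
LANGUAGES = {"Tamil", "Telugu", "Hindi", "Bengali", "Oriya"}

def extract_profile(ad):
    """Fill a static, pre-defined template from a matrimonial ad."""
    profile = {"Religion": None, "Language": None, "Age": None, "Profession": None}
    tokens = set(re.findall(r"[A-Za-z]+", ad))
    profile["Religion"] = next((r for r in RELIGIONS if r in tokens), None)
    profile["Language"] = next((l for l in LANGUAGES if l in tokens), None)
    m = re.search(r"\b(\d{2})\s*(?:years?|yrs)\b", ad)    # heuristic age rule
    if m:
        profile["Age"] = int(m.group(1))
    m = re.search(r"working as an? ([\w ]+?)[,.]", ad)    # linguistic profession rule
    if m:
        profile["Profession"] = m.group(1)
    return profile

ad = ("Alliance invited for Hindu Tamil girl, 27 years, working as a "
      "software engineer, from a respectable family.")
profile = extract_profile(ad)
print(profile["Religion"], profile["Age"], profile["Profession"])
```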