Assignment 1: Database Search Results (10%)

pantgrievous, Artificial Intelligence and Robotics

30 Nov 2013

Submit a report that includes:

1) a list of 10-20 articles or books which seem potentially relevant to your research topic

1. Manning, C. and H. Schütze, Foundations of Statistical Natural Language Processing. MIT Press, 1999.

2. Sebastiani, F. Machine Learning in Automated Text Categorisation. http://faure.iei.pi.cnr.it/~fabrizio/

3. Dale, R., H. Moisl, and H. Somers, Handbook of Natural Language Processing. Marcel Dekker, 2000.

4. Aas, K. and L. Eikvil, Text Categorisation: A Survey. 1999.

5. Pal, M., Multiclass Approaches for Support Vector Machine Based Land Cover Classification.

6. Baud, R., A. Rassinoux, and J. Scherrer, Natural language processing and semantical representation of medical texts. Methods Inf Med, 1992. 31(2): p. 117-125.

7. Bodenreider, O., et al., Investigating subsumption in SNOMED CT: An exploration into large Description Logic-based biomedical terminologies. Artificial Intelligence in Medicine (Special issue on Formal Biomedical Knowledge Representation), 2005.

8. Hu, J. and H. Huang. An algorithm for text categorization with SVM. in TENCON '02. Proceedings. 2002 IEEE Region 10 Conference on Computers, Communications, Control and Power Engineering. 2002.

9. Chua, S. and N. Kulathuramaiyer. Semantic Feature Selection Using WordNet. in Web Intelligence, 2004. WI 2004. Proceedings. IEEE/WIC/ACM International Conference on. 2004.

10. Silva, C. and B. Ribeiro. Labeled and unlabeled data in text categorization. in Neural Networks, 2004. Proceedings. 2004 IEEE International Joint Conference on. 2004.

11. Xiao-Yun, C., et al. Text categorization based on frequent patterns with term frequency. in Machine Learning and Cybernetics, 2004. Proceedings of 2004 International Conference on. 2004.

12. Doan, S. A Fuzzy-Based Approach for Text Representation in Text Categorization. in Fuzzy Systems, 2005. FUZZ '05. The 14th IEEE International Conference on. 2005.

13. Qi-Rui, Z., et al. Document indexing in text categorization. in Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on. 2005.

14. Spyropoulos, C.D. and V. Karkaletsis, Information extraction and summarization from medical documents. Artificial Intelligence in Medicine, 2005. 33: p. 107-110.

15. Xiujuan, W., G. Jun, and Z. Kangfeng. Normalized and classified feature selection in text categorization. in Communications and Information Technology, 2005. ISCIT 2005. IEEE International Symposium on. 2005.

16. Moises, G., H. Hugo, and C. Edgar. Contextual Entropy and Text Categorization. in Web Congress, 2006. LA-Web '06. Fourth Latin American. 2006.

17. Amine, B.M. and M. Mimoun. WordNet based Cross-Language Text Categorization. in Computer Systems and Applications, 2007. AICCSA '07. IEEE/ACS International Conference on. 2007.

18. Davy, M. and S. Luz. Dimensionality Reduction for Active Learning with Nearest Neighbour Classifier in Text Categorisation Problems. in Machine Learning and Applications, 2007. ICMLA 2007. Sixth International Conference on. 2007.

19. Sarinnapakorn, K. and M. Kubat, Combining Subclassifiers in Text Categorization: A DST-Based Solution and a Case Study. Knowledge and Data Engineering, IEEE Transactions on, 2007. 19(12): p. 1638-1651.


2) A review of 1 paper from this list that you and your supervisor consider particularly important for your research topic. This should be no more than 1 page long and include:

- Details, summary, main strengths and weaknesses of the paper

It is worth mentioning a report named “Text categorisation: A survey”, which was contributed in 1999 by K. Aas and L. Eikvil of the Norwegian Computing Center.

Following the introduction, Chapter 2 lists the pre-processing steps that transform free text into a representation suitable for the subsequent categorisation task. Chapter 3 describes six classification algorithms, all of which have been successfully applied in previous text classification work. Chapter 4 sets out performance measures for evaluating category ranking and binary categorisation. Building on the two preceding chapters, Chapter 5 describes and evaluates previous work that used the Reuters-21578 collection. The authors' own work on the same Reuters collection is then presented in Chapter 6, followed by a summary in Chapter 7.

To conclude, this report describes different approaches to pre-processing, indexing, dimensionality reduction, and classification, which together constitute a typical text categorisation process. Moreover, based on the results of previous text categorisation work using the Reuters collection, plus the authors' own experiments, the following classification methods were evaluated: Rocchio's, NB, KNN, DT, SVM, and Voted Classification. The report shows that all of the methods produced reasonable classifiers.
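The pre-processing, indexing, and dimensionality reduction steps mentioned above can be illustrated with a minimal sketch. It assumes a bag-of-words representation and a document-frequency threshold; the stop-word list, the function names, the threshold value, and the sample sentences are all hypothetical and are not taken from the surveyed report.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "of", "and", "in", "to", "is"}

def tokenize(text):
    """Lower-case the text, split it into alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def build_vocabulary(documents, min_doc_freq=2):
    """Keep only terms that occur in at least min_doc_freq documents."""
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))  # count each term once per document
    return sorted(term for term, df in doc_freq.items() if df >= min_doc_freq)

def to_vector(text, vocabulary):
    """Represent a document as term counts over the vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]

docs = [
    "The price of wheat rose in the market",
    "Wheat and corn exports rose",
    "The bank raised the interest rate",
]
vocab = build_vocabulary(docs)
vectors = [to_vector(d, vocab) for d in docs]
```

With the default threshold, terms that appear in only one document are dropped, so the feature space shrinks before any classifier is trained.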

From my own point of view, the pros and cons of the paper are as follows:

Pros 1. It is persuasive that the surveyed previous work applied a number of statistical classification and machine learning techniques.

Pros 2. The standardised Reuters-21578 collection was used for all the implementations, which makes it suitable for measuring progress in the field.

Cons 1. Multi-class and multi-label problems have not been analysed in enough depth.


- How does the paper relate to your research topic, and questions that you want to ask

The objective of my research topic, “Converting Natural Language to Medical Codes”, is to convert the natural language in clinical notes into pre-defined medical codes. Text categorisation methods and technologies are therefore needed to classify these clinical notes and reports.

In the pre-processing stage, unique (rarely occurring) words are usually filtered out to reduce dimensionality, since they carry little statistical weight; the task then costs millions of calculations to assign one or more class values to the input. However, some medical terms such as “endarterectomy” and “hepatorrhexis” can directly indicate the report type through their literal meaning, with as little as a single hash lookup. I would like to know whether it is worth adding such a literal lookup step to the pre-processing stage, with assistance from a terminology database.
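The question above can be made concrete with a small sketch of such a lookup step. It assumes the terminology database can be reduced to an in-memory table from term to report type; the table entries, the report-type labels, and the function names are hypothetical and serve only as illustration.

```python
# Hypothetical terminology table: term -> report type. A real system would
# load this from a terminology database; these entries are illustrative only.
TERMINOLOGY = {
    "endarterectomy": "vascular surgery",
    "hepatorrhexis": "hepatology",
}

def lookup_report_type(note, terminology=TERMINOLOGY):
    """Return the report type of the first known term in the note, else None."""
    for raw in note.lower().split():
        token = raw.strip(".,;:()\"'")
        if token in terminology:  # one hash lookup per token
            return terminology[token]
    return None

def categorise(note, fallback_classifier):
    """Try the terminology lookup first; otherwise use the statistical classifier."""
    report_type = lookup_report_type(note)
    if report_type is not None:
        return report_type
    return fallback_classifier(note)
```

For example, `categorise("Patient underwent carotid endarterectomy.", some_classifier)` would return the table entry without calling the classifier, while a note with no known term would be passed on to it. Whether this shortcut improves accuracy in practice is exactly the open question raised above.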

- Directions for future research that you think are worthwhile to consider

Considering Mahesh Pal's article “Multi-class Approaches for Support Vector Machine Based Land Cover Classification”, building more powerful multi-class SVMs is a worthwhile direction. A number of methods for constructing multi-class SVMs have been proposed by researchers and are still being improved, with the aim of extending the SVM's good binary classification performance to the more practical multi-class setting.
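One commonly used way to extend a binary classifier to several classes is the one-vs-rest decomposition, sketched below. To stay dependency-free, the sketch uses a simple perceptron as the binary learner in place of an SVM; the function names and toy data are hypothetical, and the scheme is not claimed to be the one favoured in Pal's article.

```python
def train_perceptron(samples, labels, epochs=20):
    """Train a binary perceptron; labels are +1 or -1. Returns (weights, bias)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * score <= 0:  # misclassified: move the boundary towards x
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
    return weights, bias

def train_one_vs_rest(samples, labels):
    """Train one binary model per class: that class against all others."""
    models = {}
    for cls in sorted(set(labels)):
        binary = [1 if lab == cls else -1 for lab in labels]
        models[cls] = train_perceptron(samples, binary)
    return models

def predict(models, x):
    """Pick the class whose binary model gives the highest score."""
    def score(model):
        weights, bias = model
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return max(models, key=lambda cls: score(models[cls]))

# Toy data: three classes, each dominated by one feature.
samples = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [2, 0, 0], [0, 2, 0], [0, 0, 2]]
labels = ["a", "b", "c", "a", "b", "c"]
models = train_one_vs_rest(samples, labels)
```

The same wrapper would work with any binary learner that returns a real-valued score, which is why this decomposition is often paired with SVMs.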

3) A list of the top conference and journals in your research area (up to 6 each)

top conferences:

1. Data Mining, 2007. ICDM 2007. Seventh IEEE International Conference on. 2007.

2. Advanced Learning Technologies, 2007. ICALT 2007. Seventh IEEE International Conference on. 2007.

3. Fuzzy Systems and Knowledge Discovery, 2007. FSKD 2007. Fourth International Conference on. 2007.

4. Pattern Recognition, 2006. ICPR 2006. 18th International Conference on. 2006.

5. Communications and Information Technology, 2005. ISCIT 2005. IEEE International Symposium on. 2005.

6. Computer and Information Technology, 2004. CIT '04. The Fourth International Conference on. 2004.


journals:

1. Artificial Intelligence in Medicine

2. Bioinformatics

3. Computers and Biomedical Research

4. Computers and Biomedical Research

5. Concepts, knowledge, and language in health-care information systems (IMIA)

6. Data and Knowledge Engineering


4) A list of the main research groups working in your area (up to 6).

main research groups

1


2


3


4


5


6