Nov 8, 2013

Improving quality of graduate students by data mining

Asst. Prof. Kitsana Waiyamai, Ph.D.

Dept. of Computer Engineering

Faculty of Engineering, Kasetsart University

Bangkok, Thailand


Content

PART I
- Introduction to data mining
- Data mining technique: association rule discovery
- Data mining technique: data classification

PART II
- Improving quality of graduate students by data mining

Conclusion

What Is Data Mining?

Knowledge Discovery from Data (KDD), also called data mining:

The process of nontrivial extraction of patterns from data. The patterns are:
- implicit,
- previously unknown, and
- potentially useful.

Patterns must be comprehensible to human users.

Knowledge Discovery Process: Iterative & Interactive Process

[Diagram: stages of the knowledge discovery process]
- Mining objective
- Data sources: databases, flat files, complex data, data warehouses
- Preprocessing data: gathering, cleaning and selecting data
- Search for patterns (data mining): neural nets, machine learning, statistics and others
- Analyst reviews output
- Interpret results
- Report findings
- Take actions based on findings

What kind of data can be mined?

- Relational databases
- Data warehouses
- Transactional databases and flat files
- Advanced DB systems and information repositories
  - Object-oriented and object-relational databases
  - Spatial databases
  - Time-series data and temporal data
  - Text databases, multimedia databases
  - Heterogeneous and legacy databases
  - World Wide Web
  - Bioinformatic data

[Figure labels: Databases; Data Warehouse]

Two modes of data mining

Predictive data mining
- Predict behavior based on historic data
- Use data with known results to build a model that can later be used to explicitly predict values for different data
- Methods: classification, prediction, etc.

Descriptive data mining
- Describe patterns in existing data that may be used to guide decisions
- Methods: association rule discovery, sequence pattern discovery, clustering, etc.


Data Mining Techniques

- Data clustering
- Association rule discovery
- Data classification
- Outlier detection
- Data regression
- Etc.


Data Classification

Classification is the process of assigning new objects to predefined categories or classes:
- Given a set of labeled records
- Build a model
- Predict labels for future unlabeled records

Example:
- Age, Educational background, Annual income, Current debts, Housing location => Making Decision
- Degree = Master and Income = 7500 => Credit = Excellent
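
The credit rule in the example can be read as a one-rule classifier. A minimal sketch, assuming each record is a Python dict; the "Unknown" fallback label and the function name are hypothetical, since the slide gives only this single rule:

```python
def classify_credit(record):
    """Apply the one rule from the slide: Degree=Master and Income=7500 => Credit=Excellent."""
    if record.get("Degree") == "Master" and record.get("Income") == 7500:
        return "Excellent"
    # The slide states no other rules, so every other record is left undecided (assumption).
    return "Unknown"


applicant = {"Degree": "Master", "Income": 7500}
print(classify_credit(applicant))
```

A real classifier would hold many such rules, learned from labeled records rather than written by hand.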

Three-Step Process of Classification

- Model construction: Training Data -> Classifier Model
- Model evaluation: Testing Data -> Classifier Model
- Classification: Unseen Data -> Classifier Model
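
The three steps can be sketched end to end with a deliberately simple model. This is an illustration only: the model is a majority-label lookup on one attribute (not the decision tree used later in the talk), and all records and function names are made up.

```python
from collections import Counter, defaultdict


def build_model(training_data, attribute):
    """Model construction: learn the majority label for each value of one attribute."""
    counts = defaultdict(Counter)
    for features, label in training_data:
        counts[features[attribute]][label] += 1
    rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    # Fallback label for attribute values never seen in training.
    default = Counter(label for _, label in training_data).most_common(1)[0][0]
    return rules, default


def classify(model, features, attribute):
    """Classification: label a record with the learned rule, or the default label."""
    rules, default = model
    return rules.get(features[attribute], default)


def evaluate(model, testing_data, attribute):
    """Model evaluation: fraction of held-out records that are labeled correctly."""
    correct = sum(classify(model, f, attribute) == y for f, y in testing_data)
    return correct / len(testing_data)


# Toy, made-up records: (features, label).
training = [
    ({"Degree": "Master"}, "Excellent"),
    ({"Degree": "Master"}, "Excellent"),
    ({"Degree": "Bachelor"}, "Fair"),
    ({"Degree": "Bachelor"}, "Fair"),
]
testing = [
    ({"Degree": "Master"}, "Excellent"),
    ({"Degree": "Bachelor"}, "Excellent"),
]

model = build_model(training, "Degree")
accuracy = evaluate(model, testing, "Degree")
prediction = classify(model, {"Degree": "Master"}, "Degree")
print(accuracy, prediction)
```

Keeping the testing data separate from the training data is what makes the evaluation step meaningful: the model is scored on records it has not seen.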

Data Mining Tools

- ANGOSS KnowledgeStudio
- IBM Intelligent Miner
- Megaputer PolyAnalyst
- SAS Enterprise Miner
- SGI Mineset
- SPSS Clementine
- Many others

More at http://www.kdnuggets.com/software


Data Mining Projects

Checklist:
- Start with well-defined questions
- Define measures of success and failure

Main difficulty: No automation
- Understanding the problem
- Data preparation
- Selection of the right mining methods
- Interpretation


Using Data Mining for Improving Quality of Engineering Graduates

Objective:
- Discover knowledge from large databases of engineering student records.

The discovered knowledge is useful in:
- Assisting in the development of new curricula,
- Improving existing curricula,
- Helping students to select the appropriate major.

Using a data mining technique to help students in selecting their majors

Motivation:
- A student's choice of major is a very important factor in his/her success.
- Students lack experience and information about each major.

Solution:
- Find the profiles of good students for each major using the student profile database and the course enrollment student databases (10 years)
- Determine the most appropriate major for each student



A Data Mining based Approach for Improving Quality of Engineering Graduates

[Architecture diagram: course enrollment student databases and student profile database (DB2, SQL Server) -> Data Mining Tool -> Java Servlet -> User]

Data for Data Mining

Student profile database:

Stu_code   Sex    Address   Sch_GPA   .....   GPA
37058063   male   Bangkok   2.5       .....   2.3
37058167   male   Songkla   3.4       .....   3.2
........   ....   .......   ......    ....    ....

Course enrollment student databases:

Stu_code   Sub_code   Term   Year   Grade
37058063   204111     1      2537   C+
37058063   403111     1      2537   D
37058063   208111     1      2537   B+

Data preparation for a classification model

Student profile database:

Stu_code   Sex    Address   Sch_GPA   .....   GPA
37058063   male   Bangkok   2.5       .....   2.3
37058167   male   Songkla   3.4       .....   3.2
........   ....   .......   ......    ....    ....

+

Course enrollment student databases:

Stu_code   Sub_code   Term   Year   Grade
37058063   204111     1      2537   C+
37058063   403111     1      2537   D
37058063   208111     1      2537   B+

Combined table (one row per student, one column per course):

Stu_code   Sex    204111   403111   .....   GPA
37058063   male   Medium   Low      .....   2.3
37058167   male   High     High     .....   3.2
........   ....   ......   ......   .....   ......
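
The preparation step, joining the profile rows with the enrollment rows and turning letter grades into Low/Medium/High bands, might look like the sketch below. The grade bands are an assumption: the slide only shows C+ mapped to Medium and D mapped to Low. The enrollment rows of student 37058167 are not shown on the slide, so they are not invented here.

```python
# Assumed grade bands; only C+ -> Medium and D -> Low are confirmed by the slide.
GRADE_BAND = {
    "A": "High", "B+": "High", "B": "High",
    "C+": "Medium", "C": "Medium",
    "D+": "Low", "D": "Low", "F": "Low",
}

profiles = [
    {"Stu_code": "37058063", "Sex": "male", "GPA": 2.3},
    {"Stu_code": "37058167", "Sex": "male", "GPA": 3.2},
]
enrollments = [
    {"Stu_code": "37058063", "Sub_code": "204111", "Grade": "C+"},
    {"Stu_code": "37058063", "Sub_code": "403111", "Grade": "D"},
    {"Stu_code": "37058063", "Sub_code": "208111", "Grade": "B+"},
]


def prepare(profiles, enrollments):
    """Join profile rows with enrollment rows, adding one column per course code."""
    rows = {p["Stu_code"]: dict(p) for p in profiles}
    for e in enrollments:
        row = rows.get(e["Stu_code"])
        if row is not None:  # skip enrollments without a matching profile
            row[e["Sub_code"]] = GRADE_BAND.get(e["Grade"])
    return list(rows.values())


table = prepare(profiles, enrollments)
for row in table:
    print(row)
```

Each resulting row is one training record for the classifier, with course codes acting as attribute names.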

Global Classification Model

- A global decision tree determines which majors are appropriate for which students.
- Each internal node represents a test on the student’s profile.
- Each leaf node represents an appropriate major to be selected.


Drawbacks of Global Classification Model

- Low precision (~50%) due to the large number of majors
- The number of students is different in each department => the model cannot correctly predict the best major to be selected.
- The model proposes a single major to be selected; a set of possible majors ordered by appropriateness score would be preferred.


Classification Model for Each Major

- A decision tree predicts whether a student is likely to be a good student in a given major.
- Good students are those who graduate within 4 years and rank in the top 40% of a given major.
- Leaf nodes represent two classes: Good and Bad.
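
The Good/Bad labeling rule can be sketched as follows. Ranking by GPA and rounding the 40% cutoff up are assumptions, as are all field names and student records; the slide does not say how the ranking is computed.

```python
def label_students(students, top_percent=40, max_years=4):
    """Label each student of one major as Good or Bad."""
    # Rank by GPA, highest first (ranking by GPA is an assumption).
    ranked = sorted(students, key=lambda s: s["gpa"], reverse=True)
    # Integer ceiling of top_percent % of the class size (rounding up is an assumption).
    cutoff = (len(ranked) * top_percent + 99) // 100
    top_ids = {s["id"] for s in ranked[:cutoff]}
    return {
        s["id"]: "Good" if s["id"] in top_ids and s["years"] <= max_years else "Bad"
        for s in students
    }


# Made-up students of a single major.
students = [
    {"id": "s1", "gpa": 3.6, "years": 4},
    {"id": "s2", "gpa": 3.4, "years": 5},
    {"id": "s3", "gpa": 3.0, "years": 4},
    {"id": "s4", "gpa": 2.6, "years": 4},
    {"id": "s5", "gpa": 2.2, "years": 4},
]
labels = label_students(students)
print(labels)
```

Student s2 ranks in the top 40% but took 5 years, so both conditions are needed for the Good label.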

Advantages of the Major’s Classification Model

- Good precision: 80%
- The model predicts the best major to be selected even if the number of students in each major is different
- It proposes a set of possible majors to be selected, ordered by appropriateness score.
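
Ranking majors by appropriateness score might be sketched as below. The slides do not define the score, so it is assumed here to be a number between 0 and 1 returned by each per-major model; the major names and scoring functions are stand-ins, not the trained decision trees.

```python
def rank_majors(student, major_models):
    """Score the student with every per-major model and sort the majors best first."""
    scores = {major: model(student) for major, model in major_models.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical stand-in scoring functions, one per major.
major_models = {
    "Computer": lambda s: 0.9 if s["204111"] == "High" else 0.4,
    "Civil": lambda s: 0.7 if s["403111"] == "High" else 0.2,
    "Electrical": lambda s: 0.5,
}

student = {"204111": "Medium", "403111": "High"}
ranking = rank_majors(student, major_models)
print(ranking)
```

Because each major has its own model, a major with few students is scored independently of the larger ones.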

Encountered problems

- Database size
- Other factors that could affect a student’s decision: teacher preference, etc.

Presentation of Discovered Knowledge

Applying Association rule discovery for Grade prediction

[Figure: basket analysis applied to education. Example basket: 204111 = Medium, 403111 = High, 417167 = Medium, 417168 = Medium]
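
In basket-analysis terms, each student is a basket of (course, grade band) items, and a candidate rule is judged by its support and confidence. A minimal sketch with made-up baskets; the slides do not state which rule-mining algorithm or thresholds were used.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) divided by support(antecedent)."""
    return support(transactions, antecedent | consequent) / support(transactions, antecedent)


# One basket per student; all data below is made up for illustration.
transactions = [
    frozenset({("204111", "Medium"), ("403111", "High"), ("417167", "Medium")}),
    frozenset({("204111", "Medium"), ("403111", "High"), ("417167", "Medium")}),
    frozenset({("204111", "Medium"), ("403111", "Low"), ("417167", "Low")}),
    frozenset({("204111", "High"), ("403111", "High"), ("417167", "High")}),
]

# Candidate rule: 204111=Medium and 403111=High => 417167=Medium
antecedent = frozenset({("204111", "Medium"), ("403111", "High")})
consequent = frozenset({("417167", "Medium")})

rule_support = support(transactions, antecedent | consequent)
rule_confidence = confidence(transactions, antecedent, consequent)
print(rule_support, rule_confidence)
```

A rule with high confidence could then be used to predict a student's grade band in a course not yet taken.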

Grade Prediction for the Coming Term

Presentation of Discovered Knowledge

Conclusion & Future work

- Application of data mining in education
- Use of data mining techniques for improving the quality of engineering students
- Apply data mining techniques to several other educational domains.