Nov 7, 2013

Predicting Water Quality in
Northwest Indiana

Team members:


Carl Summers, Zhe Wei Wang,


Brian Hunter, Joseph Robertson


Project Mentor:


Dr. Ruijian Zhang


Purdue University Calumet

Purdue University Calumet
Undergraduate Research

Achievements


Research extended to the IEEE CHC61 Web
Programming Competition


Received funding through the Purdue University
Research Department to pursue the See5.0 Web
implementation


Collaborating with Indiana’s Department of
Environmental Management

Outline of Presentation:


Water Quality Prediction


Motivation


Preparing Data


Output of See5 decision tree



Website


Data Graphical Representation


Web Technologies


Flash Professional 8


Cascading Style Sheets


ASP.NET Framework 2.0


I. Water Quality Prediction


Current mechanistic models require significant
expert input to provide accurate forecasts.



These systems are typically used to predict trends
in water quality over a vast region and long
timelines.



Improving the detail of a mechanistic model may
be too difficult, costly, or time-consuming.

Traditional Mechanistic Models

Modeling Methods

Artificial Intelligence

Data Mining

Bayesian Statistics

Decision Tree

See5

Traditional Mechanistic Models

Implement and compare Decision Trees, Bayesian
Networks, and the traditional mechanistic modeling
techniques.

See5


A Decision Tree Tool


See5 generates a text file containing a rule set,
used for classifying (predicting) each record in a
data set into one of a discrete set of predetermined
classifications ({Good, Bad}, {Above, Normal,
Below}, etc.).



Utilizes information gain, from information
theory, to determine which attributes to “split” the
data on.
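The split criterion named above can be sketched in a few lines. This is a minimal illustration of plain information gain on a numeric attribute, not See5's own implementation, whose exact criterion and threshold search may differ. The function names are hypothetical; the sample values are the Flow Rate and Classification columns of the first six rows of the data table shown later in these slides.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting the records at value <= threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing gains nothing
    weighted = (len(left) * entropy(left)
                + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


def best_threshold(values, labels):
    """Candidate threshold (an observed value) with the highest gain."""
    candidates = sorted(set(values))[:-1]  # the largest value cannot split
    return max(candidates, key=lambda t: information_gain(values, labels, t))


# Flow Rate and Classification, first six rows of the sample table.
flow_rate = [30, 35, 46, 52, 31, 43]
labels = ["Good", "Bad", "Good", "Bad", "Bad", "Good"]

threshold = best_threshold(flow_rate, labels)
print(threshold, information_gain(flow_rate, labels, threshold))
```

A tree builder would repeat this search over every attribute, split on the winner, and recurse into each side.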

Data Set


Raw data was sparse



Many attributes were useless



Required extensive work to glean useful
information.



Not classified

Clustering

Unclassified data from USGS -> Clustering Process -> Classified Data

See5 requires classified input data.

Clustering is composed of two parts:


1) A function to group together similar points, and ultimately similar
clusters. We refer to these functions collectively as Joining Methods.

2) A function to quantify the similarity between points or clusters. These
are referred to as Similarity Metrics.
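The two parts can be sketched as a small agglomerative clustering routine. The slides do not say which joining method or similarity metric the project used, so single linkage and Euclidean distance are assumptions chosen for illustration, and all function names are hypothetical.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Similarity metric: a smaller distance means more similar points."""
    return math.dist(p, q)


def single_linkage(a, b, metric):
    """Joining method: cluster distance is that of the closest member pair."""
    return min(metric(p, q) for p in a for q in b)


def agglomerate(points, k, joining=single_linkage, metric=euclidean):
    """Repeatedly merge the two closest clusters until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: joining(clusters[ij[0]], clusters[ij[1]], metric),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index i stays valid
        del clusters[j]
    return clusters


# Two attributes per point, two well-separated groups.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(agglomerate(points, k=2))
```

Each resulting cluster can then be given a class label, which supplies the classified input that See5 requires. Swapping in another joining method or metric only means passing a different function.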

[Figure: Clustering, with axes Attribute 1 and Attribute 2]

Date        Precipitation  Suspended Sediment  Dissolved Oxygen  Flow Rate  Temperature  Classification
12/15/2006  0.34           28                  6.8               30         14.9         Good
12/22/2006  0              9                   7                 35         11.9         Bad
12/29/2006  1.6            10                  6.4               46         9.5          Good
1/5/2007    3              10                  6.4               52         8.5          Bad
1/12/2007   0.56           11                  5.9               31         9.3          Bad
1/19/2007   0              12                  8.4               43         10.8         Good
1/26/2007   0.12           20                  9.2               25         11.9         Bad
2/2/2007    0              21                  9.3               54         9.2          Bad
2/9/2007    0              20                  8.4               35         7.9          Good
2/16/2007   0.4            20                  6.4               47         8.9          Good
2/23/2007   0              17                  6.1               38         9.1          Good
3/2/2007    0.13           17                  6.2               29         11.4         Bad
3/9/2007    2.2            17                  6.7               50         11.7         Bad
3/16/2007   1.7            15                  5.5               50.1       11.9         Good
3/23/2007   0.09           18                  5.7               41         12.2         Good

Clustered Data Set

Offset Classification

Date        Precipitation  Suspended Sediment  Flow Rate  Temperature
12/15/2006  0.34           28                  30         14.9
12/22/2006  0              9                   35         11.9
12/29/2006  1.6            10                  46         9.5
1/5/2007    3              10                  52         8.5
1/12/2007   0.56           11                  31         9.3
1/19/2007   0              12                  43         10.8

Classification: Good, Bad, Good, Bad, Bad, Good

Decision Tree

Date        Precipitation  Suspended Sediment  Dissolved Oxygen  Flow Rate  Temperature  Classification
12/15/2006  0.34           28                  6.8               30         14.9         Good
12/22/2006  0              20                  7                 35         16           Bad
05/23/2007  1.6            10                  6.4               46         9.5          ???
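Filling in the "???" means running the unclassified record through the rule set. The sketch below shows the mechanics only: the rules and thresholds are invented for illustration and are not See5 output, and the first-match-wins policy with a default class is a simplifying assumption (See5's own handling of overlapping rules may differ).

```python
# Hypothetical rules, invented purely for illustration. They are NOT See5
# output and say nothing about real water quality thresholds.
RULES = [
    (lambda r: r["flow_rate"] > 50, "Bad"),
    (lambda r: r["temperature"] > 12 and r["precipitation"] <= 0.1, "Bad"),
]
DEFAULT_CLASS = "Good"  # assumed fallback when no rule fires


def classify(record, rules=RULES, default=DEFAULT_CLASS):
    """Return the class of the first rule whose condition holds."""
    for condition, label in rules:
        if condition(record):
            return label
    return default


# The unclassified record from the table above.
record = {
    "date": "05/23/2007",
    "precipitation": 1.6,
    "suspended_sediment": 10,
    "dissolved_oxygen": 6.4,
    "flow_rate": 46,
    "temperature": 9.5,
}
print(classify(record))
```

With a real rule set, only the RULES list would change; the classification loop stays the same.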


II. See5.0 Web Solution


Objective


Share a visualization of the predictions
generated by See5 with the public.



Provide viewers with a user interface that
displays descriptive and complex data in an
easily comprehensible form.




Methods


To provide a cross-platform interface by conforming
to W3C Standards


Web languages will function through various Web
browsers


Provides consistency to define the appearance of an entire
Web site


Take advantage of Web technologies


No package installation required from the user


Always available (per server uptime)


User interaction


Easy to deploy and manage


Website

Interactive Content Page


Data Graphical Representation


Applying various languages to supply a fully
scalable application to the user


Flash Professional 8 will provide rich animation
and an elegant user interface


CSS will allow consistency of format throughout
the site


ASP.NET 2.0 allows embedded Flash objects


Renders server-side code and code-behind files into
plain HTML


Flash Professional 8


Many users won’t be able to install arbitrary ActiveX
controls or use a Java plug-in, whereas Flash is
preinstalled with Windows on corporate machines;
even most Linux distributions come pre-packaged
with Flash


Flash can consume raw XML data to draw real-time
graphs to easily determine water quality


Advantages of ActionScript 2.0


Object Oriented Programming Language


Permits vector-based objects to be manipulated quickly and
easily, on the fly!
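The XML feed mentioned above could be produced along these lines. This is a sketch under stated assumptions: the element and attribute names (predictions, record, flowRate, classification) are a hypothetical schema, not the project's actual format, and Python stands in here for the ASP.NET server side purely for illustration.

```python
import xml.etree.ElementTree as ET


def predictions_to_xml(records):
    """Serialize prediction records to an XML string for a graphing client."""
    root = ET.Element("predictions")
    for rec in records:
        node = ET.SubElement(root, "record", date=rec["date"])
        ET.SubElement(node, "flowRate").text = str(rec["flow_rate"])
        ET.SubElement(node, "classification").text = rec["classification"]
    return ET.tostring(root, encoding="unicode")


# Two rows taken from the sample data table in these slides.
sample = [
    {"date": "12/15/2006", "flow_rate": 30, "classification": "Good"},
    {"date": "12/22/2006", "flow_rate": 35, "classification": "Bad"},
]
print(predictions_to_xml(sample))
```

Keeping the data in a plain XML document means the graphing front end only needs a URL to fetch, and the same feed can serve other clients later.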


Cascading Style Sheets


Allows the provision of a standardized layout
throughout the site


Modularity


End result with CSS means cleaner code


Provides the user with a consistent interface


Consistent conventions throughout the entire site


CSS allows updating to become an easy task


Modifications to one style sheet can affect some
or all pages that are linked to that style sheet



ASP.NET Framework 2.0


Access to the .NET
Framework 2.0 Class Library


Easy deployment, configuration,
and management with IIS 6
(Windows Server 2003)


XML Metabase Schema provides
quick deployment


Easy to use GUI management utility
(inetmgr)


Quick to apply the latest security
patches


Authentication to lock out users
without the proper credentials to
administer or view the content of
the page


Summary


Use clustering tools to classify data in
preparation for See5


Use See5 to generate a rule set


Use the rule set to obtain predictions


Ultimately implement and compare other
prediction methods


Provide a public website for visualizing
the predictions