International Workshop ADVANCES IN STATISTICAL HYDROLOGY

May 23-25, 2010, Taormina, Italy

Ghimire and Reddy, Development of Stage-Discharge RC in River using GA and MT

DEVELOPMENT OF STAGE-DISCHARGE RATING CURVE IN RIVER

USING GENETIC ALGORITHMS AND MODEL TREE

by

Bhola N.S. Ghimire (1) and M. Janga Reddy (2)

(1) Research Scholar (ghimire@iitb.ac.in)

(2) Assistant Professor (mjreddy@civil.iitb.ac.in)

Department of Civil Engineering, Indian Institute of Technology, Bombay, India

ABSTRACT

Discharge measurement in rivers is a challenging task for hydraulic engineers. A graph of stage versus discharge, or the line through the data points, represents the stage-discharge relationship, also known as the rating curve. The stage-discharge relationship is an approximate method for estimating discharge in rivers and streams. For many hydrological applications, such as water and sediment budget analysis and the operation and control of water resources projects, accurate information about river flow is very important. Stage is much easier to measure than discharge. However, even under meticulous observation, the stage-discharge relationship at a particular river cross-section is not necessarily unique, because rivers are influenced by several other factors that are neither always understood nor easy to quantify. In reality, discharge is not a function of stage alone; it also depends on the longitudinal slope of the river, the channel geometry, bed roughness and other factors. Since these parameters cannot be measured at every time step and section, there is a need to establish an accurate relationship between stage and discharge. Conventional parametric regression methods usually fail to model this relationship.

This paper presents the use of genetic algorithms (GA), a search procedure based on the mechanics of natural selection and natural genetics, and the model tree (M5), a data-driven technique for continuous-class problems that provides a structural representation of the data and a piecewise linear fit, to establish the stage-discharge relationship in rivers. The results are compared with those of other methods, namely gene-expression programming (GEP), multiple linear regression (MLR) and the classical stage-discharge rating curve (RC). Model performance is measured with the coefficient of determination and the root mean square error. The GA-based and MT-based models are found to perform much better than the other methods.

Keywords: Genetic algorithms, Model tree, Gene-expression programming, Multiple linear regression, Rating curve.

1 INTRODUCTION

Hydraulic engineers need discharge measurements in rivers for various purposes, and obtaining them is a challenging job. Discharge depends primarily on rainfall over the catchment, which is stochastic in nature, and the stage varies accordingly. A graph of stage versus discharge, together with the line through the data points, represents the stage-discharge relationship, commonly called the rating curve. The rating curve is a fundamental tool for computing discharge. For many hydrological applications, such as water resources planning, reservoir operation, sediment handling and hydrologic modelling, accurate information about discharge and stage is very important. Stage can be measured at any time, whereas measuring discharge requires considerable preparation and is not always practical. Hence, to predict discharge from measured stage, a well-defined relationship between the two is needed. Even under meticulous observation, the stage-discharge relationship at a particular river cross-section is not necessarily unique, because rivers are often influenced by factors that are neither always understood nor easy to quantify (Sefe, 1996). In reality, discharge is not a function of stage alone; it also depends on the longitudinal slope of the river, the channel geometry, bed roughness and other factors. However, these parameters cannot be measured reliably at every time step and section, so in practice discharge is usually expressed as a function of stage only. There is therefore a clear need to establish an accurate relationship between discharge and stage. The conventional


parametric regression methods usually fail to model these relationships (Habib and Meselhe, 2006). Habib and Meselhe identified two distinct approaches to stage-discharge modelling: numerical solutions and data-driven techniques. The first approach requires data from sites with accurately known boundary conditions. Following the second approach, they developed stage-discharge relationships for coastal low-gradient streams using neural networks and nonparametric regression.

Tawfik et al. (1997) introduced an approach based on a multilayer artificial neural network (ANN) for modelling the stage-discharge relationship. The same approach was followed by Jain and Chalisgaonkar (2000), Sudheer and Jain (2003) and Bhattacharya and Solomatine (2005); Bhattacharya and Solomatine (2005) also used the M5 model tree, in addition to ANN, to relate stage and discharge in rivers. Petersen-Øverleir (2006) introduced a methodology based on the Jones formula and nonlinear regression for situations where the stage-discharge relationship is affected by hysteresis due to unsteady flow. Tayfur and Singh (2006) used ANN and fuzzy logic to model laboratory rainfall-runoff data. Baiamonte and Ferro (2007) obtained relationships for estimating the two coefficients of the stage-discharge equation from experimental runs in flumes with contraction ratios ranging from 0.17 to 0.81 and slopes ranging from 0.5% to 3.5%. Using a compound neural network, Jain (2008) developed an integrated stage-discharge-suspended sediment relationship.

Soft-computing techniques such as ANN are widely used in water resources engineering, whereas GP and GA have been used by only a few researchers. Savic et al. (1999) and Babovic and Keijzer (2002) developed GP models relating rainfall and runoff at different locations. Dorado et al. (2003) applied GP and ANN to predict runoff from rainfall in urban areas. Giustolisi (2004) used GP to determine the Chezy resistance coefficient for full circular corrugated channels. Cheng et al. (2005) used GA to calibrate a rainfall-runoff model developed with fuzzy methods. Rabunal et al. (2007) used GP and ANN to derive the unit hydrograph of a typical urban basin. Kumar and Reddy (2007) used GA to optimize multipurpose reservoir operation. Sivapragasam et al. (2008) used GP, an evolutionary-algorithm-based modelling approach, to derive the storage-discharge relationship for the nonlinear Muskingum model; comparing their results with particle swarm optimization, they found the same optimum values from both techniques. Recently, Aytek and Kisi (2008) used GEP for suspended sediment modelling, and Guven and Aytek (2009) used GEP for stage-discharge modelling in American rivers.

Similarly, another data-driven tool, the model tree (MT), has been used by a few researchers in hydrology. MT has been reported to give better accuracy than ANN for water management problems, rainfall-runoff modelling, canal sedimentation and similar applications (Solomatine, 2002; Solomatine and Dulal, 2003; Bhattacharya et al., 2005). Reddy and Ghimire (2009) successfully used model trees to estimate suspended sediment load (SSL) in American rivers.

The objective of this article is to demonstrate the use of two soft-computing techniques, GA and MT, in water resources engineering, specifically for modelling the relationship between stage and discharge. The model results are compared with those of the conventional methods, namely the stage-discharge rating curve (RC) and multiple linear regression (MLR), and with the predictions of a GEP model.

2 MODELLING TECHNIQUES

2.1 Genetic Algorithms (GAs)

Genetic algorithms (GAs) are a particular class of evolutionary algorithms that use techniques inspired by evolutionary biology to solve problems. GAs are population-based search techniques that work on Darwin's principle of "survival of the fittest" (Goldberg, 1989). The idea in all evolutionary algorithms is to evolve a population of candidate solutions to a given problem using operators inspired by natural genetic variation and natural selection, such as inheritance, mutation, selection and crossover.

Genetic algorithms were invented by John Holland in the 1960s and were developed by Holland and his students and colleagues at the University of Michigan (Goldberg, 1989). In this formulation, a GA is a method for moving from one population of "chromosomes" (e.g., strings of ones and zeros, called "bits") to a new population by using a kind of "natural selection" together with the genetics-inspired operators of


crossover, mutation and inversion. Each chromosome consists of "genes" (e.g., bits), each gene being an instance of a particular "allele" (e.g., 0 or 1). The selection operator chooses the chromosomes in the population that will be allowed to reproduce, and on average the fitter chromosomes produce more offspring than the less fit ones. Crossover exchanges subparts of two chromosomes, roughly mimicking biological recombination between two single-chromosome organisms; mutation randomly changes the allele values at some locations in the chromosome; and inversion reverses the order of a contiguous section of the chromosome, thus rearranging the order in which genes are arrayed. In-depth details about GAs can be found in Goldberg (1989).
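The vocabulary above can be made concrete with a minimal sketch, assuming a chromosome is held as a Python list of bits (genes whose allele is 0 or 1). Only chromosome creation and the inversion operator are shown here; the function names and the inclusive index convention are illustrative choices, not part of the original study.

```python
import random

def random_chromosome(length, rng=random):
    """Create a chromosome: a list of genes, each holding the allele 0 or 1."""
    return [rng.randint(0, 1) for _ in range(length)]

def invert(chromosome, start, end):
    """Inversion: reverse the genes between positions start and end (inclusive)."""
    if not 0 <= start <= end < len(chromosome):
        raise ValueError("invalid inversion section")
    segment = chromosome[start:end + 1]
    return chromosome[:start] + segment[::-1] + chromosome[end + 1:]

print(invert([1, 0, 0, 1, 1], 1, 3))  # [1, 1, 0, 0, 1]
```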

2.1.1 Elements of GA. In a GA, the search starts with an initial set of random solutions known as the population. Each chromosome in the population is evaluated with a fitness function, which measures the success of that chromosome. Based on the fitness values, a set of chromosomes is selected for breeding. To create a new generation, genetic operators such as crossover and mutation are applied. According to their fitness values, parents and offspring are then selected and some are rejected, so that the population size remains constant in the new generation. The evaluation–selection–reproduction cycle is repeated until an optimal or near-optimal solution is found. The fundamental steps of the algorithm are shown in Figure 1.

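Figure 1 – Schematic diagram of genetic algorithms (Tung et al., 2006): initial population; evaluate fitness of all individuals in the population; if the termination criteria are met, stop the search; otherwise select individuals for the next generation, apply crossover and mutation, and form the next generation.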
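The loop in Figure 1 can be sketched in plain Python. This is a toy illustration only, assuming bit-string chromosomes, a fitness function to be maximized, a fixed number of generations as the termination criterion, binary tournaments, single-point crossover, bit-flip mutation and elitism; these choices and all parameter defaults are assumptions, not the configuration used later in this study.

```python
import random

def run_ga(fitness, n_bits, pop_size=30, generations=40,
           p_crossover=0.8, p_mutation=0.02, seed=0):
    """Toy binary GA following Figure 1; returns the best fitness of each generation."""
    rng = random.Random(seed)
    # Initial population of random bit strings
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):  # termination: fixed number of generations
        scores = [fitness(c) for c in pop]  # evaluate fitness of all individuals
        best = max(range(pop_size), key=scores.__getitem__)
        history.append(scores[best])

        def pick():
            # Binary tournament: the fitter of two randomly chosen individuals
            i, j = rng.sample(range(pop_size), 2)
            return pop[i] if scores[i] >= scores[j] else pop[j]

        next_pop = [pop[best][:]]  # elitism: carry the best individual over unchanged
        while len(next_pop) < pop_size:
            a, b = pick()[:], pick()[:]
            if rng.random() < p_crossover:  # single-point crossover
                cut = rng.randrange(1, n_bits)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):  # bit-flip mutation, then add to the next generation
                for k in range(n_bits):
                    if rng.random() < p_mutation:
                        child[k] = 1 - child[k]
                if len(next_pop) < pop_size:
                    next_pop.append(child)
        pop = next_pop
    return history

# Usage: maximise the number of ones in a 20-bit string ("OneMax")
history = run_ga(sum, n_bits=20)
```

Because the best individual is always copied forward, the recorded best fitness can never decrease from one generation to the next.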

Selection. Selection applies pressure to the population in a manner similar to natural selection in biological systems. Before entering the next generation's population, selected chromosomes may undergo crossover or mutation (depending on the crossover and mutation probabilities), in which case the offspring chromosome(s) are the ones that enter the next generation. Poorer-performing individuals (as evaluated by the fitness function) are weeded out, and better-performing, or fitter, individuals have a greater than average chance of passing the information they contain to the next generation. Of the several available selection methods, tournament selection is applied in this study. Tournament selection uses roulette selection N times to produce a tournament subset of chromosomes; the best chromosome in this subset is then chosen as the selected chromosome.
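The tournament variant described above (contestants drawn by roulette selection, best one kept) can be sketched as follows. Because the fitness in this study is an error to be minimized, the roulette weights 1/(1 + error) are an assumption introduced only for illustration, and the function names are hypothetical.

```python
import random

def roulette_index(errors, rng):
    """One roulette-wheel draw; a smaller error gives a larger slice (assumed weights)."""
    weights = [1.0 / (1.0 + e) for e in errors]  # errors are non-negative (e.g. RMSE)
    return rng.choices(range(len(errors)), weights=weights, k=1)[0]

def tournament_select(population, errors, size=4, rng=random):
    """Draw `size` contestants by roulette selection and return the one with the lowest error."""
    contestants = [roulette_index(errors, rng) for _ in range(size)]
    winner = min(contestants, key=lambda i: errors[i])
    return population[winner]

rng = random.Random(42)
picks = [tournament_select(["good", "bad"], [0.1, 100.0], 4, rng) for _ in range(1000)]
```

With these weights the low-error individual wins almost every tournament, since "bad" is returned only if all four roulette draws land on it.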

Crossover. Crossover allows solutions to exchange information in a way similar to that used by a natural organism undergoing reproduction. In other words, crossover is a genetic operator that combines (mates) two chromosomes (parents) to produce new chromosomes (offspring). The operator randomly chooses a locus and exchanges the subsequences before and after that locus between two chromosomes to create two offspring. The idea behind crossover is that a new chromosome may be better than both parents if it takes the best characteristics from each of them. Crossover occurs during evolution according to a user-definable crossover probability. For example, if two parents (chromosomes) A and B, each having four genes, form two children (offspring) by exchanging the genes after the second gene (Figure 2), the operation is called single-point crossover; if genes are exchanged between two points, it is called two-point crossover. Two-point crossover is used in this study.

Figure 2 – Single point crossover operator
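Single-point and two-point crossover on list-encoded chromosomes can be sketched as below. The crossover points are explicit parameters so that the example of Figure 2 (exchange after the second gene) can be reproduced; the function names are hypothetical.

```python
import random

def single_point_crossover(a, b, point):
    """Exchange the genes after position `point` between parents a and b."""
    return a[:point] + b[point:], b[:point] + a[point:]

def two_point_crossover(a, b, p1, p2):
    """Exchange the segment between loci p1 and p2 (p1 < p2) between parents a and b."""
    return a[:p1] + b[p1:p2] + a[p2:], b[:p1] + a[p1:p2] + b[p2:]

def random_two_point(a, b, rng=random):
    """Two-point crossover at random loci; chromosomes need at least three genes."""
    p1, p2 = sorted(rng.sample(range(1, len(a)), 2))
    return two_point_crossover(a, b, p1, p2)

# Figure 2 style example: four-gene parents, exchange after the second gene
print(single_point_crossover([0, 0, 0, 0], [1, 1, 1, 1], 2))  # ([0, 0, 1, 1], [1, 1, 0, 0])
```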

Mutation. Mutation randomly changes (flips) the value of single bits within individual strings to maintain the diversity of the population and to help a genetic algorithm escape from a local optimum. It is typically used sparingly. For example, in Figure 3 the parent becomes a new child when gene number two is mutated.

Figure 3 – Mutation operator
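Bit-flip mutation can be sketched in two forms: a random version in which each bit flips with a small probability, and a deterministic single-gene flip mirroring the Figure 3 example. The function names are hypothetical, and the parent bits in the checks are made up because the figure itself is not reproduced here.

```python
import random

def bit_flip_mutation(chromosome, rate, rng=random):
    """Flip each bit independently with probability `rate` (kept small in practice)."""
    return [1 - g if rng.random() < rate else g for g in chromosome]

def flip_gene(chromosome, index):
    """Deterministically flip one gene (index 1 is gene number two, as in Figure 3)."""
    child = list(chromosome)
    child[index] = 1 - child[index]
    return child
```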

2.1.2 Fitness function used in GA. Many fitness functions can be used in a GA to estimate parameters. In this study, the root mean square error was used, and the GA minimizes it. The fitness function is given in Equation (1), where Q_oi and Q_pi are the values observed in the field and predicted by the GA model, respectively, n is the total number of observations and F is the resulting error.

$$\min F = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Q_{oi}-Q_{pi}\right)^{2}} \qquad (1)$$
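Equation (1) translates directly into a small helper. This is a sketch with a hypothetical function name; inside the GA, the predicted values Q_pi would come from the candidate rating curve being evaluated.

```python
import math

def rmse_fitness(observed, predicted):
    """Equation (1): root mean square error between observed and predicted discharge."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(observed)
    return math.sqrt(sum((qo - qp) ** 2 for qo, qp in zip(observed, predicted)) / n)
```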

2.2 Model Tree (MT)

A model tree is a data-driven technique for continuous-class problems that provides a structural representation of the data and a piecewise linear fit. It is a kind of decision tree that predicts numeric values with linear regression functions at its leaves. A model tree partitions the data according to their similarity and then fits local regression equations, which helps to minimize the model error. These techniques are explained by Quinlan (1992) and Wang and Witten (1997).

The fundamental steps of the M5 model tree (Reddy and Ghimire, 2009) followed to process the data in this study are as follows. First, the parameter space is split into subspaces, and a linear regression model is built for each subspace. An information-based criterion is used to split the data so that an appropriate model can be fitted to each part. During model formulation, each split follows the decision-tree idea of integrating several models. Finally, computational intelligence techniques are used to find possible solutions for each model. The major advantages of model trees over regression trees are that (a) model trees are much smaller than regression trees, (b) the decision structure is clear and (c) the regression functions do not normally involve many variables. Although the computational requirements of model trees grow with dimensionality, tasks involving hundreds of attributes can be handled. Tree-based models are developed by a divide-and-conquer method, and the standard deviation reduction (SDR), given by Equation (2), is the main splitting criterion.

$$\mathrm{SDR} = sd(T) - \sum_{i}\frac{|T_i|}{|T|}\,sd(T_i) \qquad (2)$$


where T is the set of examples that reaches the node; T_i is the subset of examples that have the i-th outcome of the potential test (i.e., the sets that result from splitting the node according to the chosen attribute); and sd(.) is the standard deviation.
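Equation (2) and its use for choosing a split on a single attribute (here, stage) can be sketched as below. The sketch assumes sd(.) is the population standard deviation and tries thresholds midway between consecutive distinct stage values; both are assumptions for illustration, and a full M5 implementation also grows the tree recursively, fits the leaf regressions, prunes and smooths.

```python
from statistics import pstdev

def sdr(values, subsets):
    """Equation (2): standard deviation reduction when `values` is split into `subsets`."""
    n = len(values)
    return pstdev(values) - sum(len(s) / n * pstdev(s) for s in subsets)

def best_split(stage, discharge):
    """Try each threshold between consecutive distinct stages; keep the largest SDR."""
    pairs = sorted(zip(stage, discharge))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    best = (None, float("-inf"))
    for k in range(1, len(xs)):
        if xs[k] == xs[k - 1]:
            continue  # no threshold can separate equal stages
        threshold = (xs[k - 1] + xs[k]) / 2
        score = sdr(ys, [ys[:k], ys[k:]])
        if score > best[1]:
            best = (threshold, score)
    return best

print(best_split([1, 2, 3, 4, 5, 6], [1, 2, 3, 10, 11, 12]))  # threshold 3.5
```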

Pruning and smoothing. If the generated tree has more leaves than necessary, it may fit the training data 'too accurately', that is, overfit the existing data and generalize poorly. The tree can be made more robust by simplifying it; this process of merging lower subtrees into a single node is called pruning. The process used to compensate for the sharp discontinuities that occur between adjacent linear models at the leaves of the pruned tree is called smoothing. Smoothing is particularly important for models constructed from a small number of training samples.
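The text does not give the smoothing formula. The sketch below assumes the rule described for M5 by Wang and Witten (1997), p' = (np + kq)/(n + k), where p is the prediction passed up from the node below, q is the prediction of the linear model at the current node, n is the number of training instances reaching the node below and k is a smoothing constant (15 is the commonly quoted default). The function name is hypothetical.

```python
def smooth(p_below, q_node, n_below, k=15.0):
    """Blend the prediction passed up from a child with the current node's linear model.

    p_below: prediction passed up from the node below
    q_node:  prediction of the current node's linear model
    n_below: number of training instances reaching the node below
    k:       smoothing constant (assumed default 15)
    """
    return (n_below * p_below + k * q_node) / (n_below + k)

# A leaf with few instances is pulled strongly toward its parent's model...
print(smooth(10.0, 20.0, n_below=4))    # about 17.89
# ...while a well-populated leaf keeps most of its own prediction
print(smooth(10.0, 20.0, n_below=400))  # about 10.36
```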

Advantages of model trees. Model trees are in effect a set of local linear models. They can serve as an alternative to ANNs and are often almost as accurate (Solomatine, 2002). They have the following advantages: (a) an MT trains much faster than an ANN; (b) the results given by a model tree are transparent and can easily be understood by decision makers; and (c) using pruning, it is easy to generate a range of MTs, from a simple linear regression to a much more accurate but complex combination of local models (many branches and leaves).

2.3 Multiple linear regression (MLR)

Many engineering and scientific problems are concerned with determining a relationship between a set of variables. Usually, a single response variable Y (the dependent variable) is expressed as a function of a set of independent variables x_1, x_2, x_3, ..., x_n. It can be written as

$$Y = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3 + \cdots + a_n x_n \qquad (3)$$

where a_i is the regression coefficient for the i-th independent variable x_i, computed by the least squares method. When n = 1, Equation (3) is a simple linear regression (a straight line). Similarly, when n = 2 the function corresponds to a plane in three dimensions, and for n greater than 2 it is a hyperplane in (n+1)-dimensional space. If Y_i is the observed value of the dependent variable and Y_pi is the value predicted with Equation (3), the sum of squared errors over the N observations, which the least squares method minimizes, is given by Equation (4).

$$\sum_{i=1}^{N} e_{yi}^{2} = \sum_{i=1}^{N}\left(Y_i - Y_{pi}\right)^{2} \qquad (4)$$
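For the single-variable case used later in this study (stage as the only predictor), the least squares coefficients of Equation (3) have a closed form. A minimal sketch with hypothetical function names:

```python
def fit_simple_linear(x, y):
    """Least squares fit of Y = a1 * x + a0 (Equation (3) with a single variable)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a1 = sxy / sxx          # slope
    a0 = my - a1 * mx       # intercept
    return a1, a0

def sum_squared_error(y, y_pred):
    """Equation (4): sum of squared differences between observed and predicted values."""
    return sum((yi - ypi) ** 2 for yi, ypi in zip(y, y_pred))
```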

2.4 Stage-Discharge Rating Curve (RC)

A stage-discharge rating curve (simply a rating curve, RC) describes the relationship between the water level (stage) at a channel cross-section and the discharge at that section. Ideally, a rating curve describes a unique functional relationship between stage and discharge and is therefore a smooth, continuous curve with a reasonable degree of sensitivity. In reality, a unique stage-discharge relationship cannot exist unless the flow is uniform, and because rainfall is stochastic, river flow is not uniform either. Hence an ideal relationship between stage and discharge does not hold in practice and serves only as an approximation (Henderson, 1966).

When a sufficient number of measured discharges are plotted against the corresponding stages, the resulting relationship represents the integrated effect of a wide range of channel and flow parameters. The combined effect of these parameters, called the control, is usually classified as permanent or shifting. In a shifting control the parameters are not fixed but change with time, whereas in a permanent control they are constant (Subramanya, 2006).

The majority of streams and rivers, especially non-alluvial rivers, exhibit permanent control. In this case, the relationship between stage and discharge is single-valued and is expressed by Equation (5), a parabola-type (power-law) curve, where Q is the discharge in m³/s, G is the gauge height (stage) in m, a is a constant representing the gauge reading corresponding to zero discharge, and β and C are rating curve constants.

$$Q = C\,(G - a)^{\beta} \qquad (5)$$


Traditionally, the best values of a, β and C in Equation (5) for a given range of stage are obtained by the least squares method. Taking logarithms of Equation (5) gives Equation (6):

$$\log Q = \beta \log(G - a) + \log C \qquad (6)$$

or

$$Y = \beta X + c' \qquad (7)$$

Equation (6) is the equation of a straight line, equivalent to Equation (7), in which the dependent variable is Y = log Q, the independent variable is X = log(G − a) and c' = log C. To obtain the best-fit straight line through n observations of X and Y, the dependent variable is regressed on the independent variable. Depending on the nature of the data, two or more straight lines may be required to fit the data. A preliminary analysis of the data usually reveals the approximate positions of the break points for each range, and the actual break points can be determined by solving the two equations for Q and G or graphically. Sometimes the curve changes from a parabolic to a more complex shape and vice versa, and sometimes the constants and exponents vary over the range (Guven and Aytek, 2009). It is therefore not easy to find the values of the parameters (a, β and C) in every case, and sometimes it may be impossible to obtain their true values.
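For a fixed zero-flow stage a, the procedure above reduces to a straight-line fit of Y = log Q on X = log(G − a), with β as the slope and C = 10^c'. A minimal sketch, assuming base-10 logarithms, a below every stage and positive discharges; the function names are hypothetical. In practice a would be chosen by trial, for example from scatter plots, as done for RC2 in Section 3.2.

```python
import math

def rating_curve(G, C, a, beta):
    """Equation (5): Q = C (G - a)^beta; zero flow at or below the gauge height a."""
    return C * (G - a) ** beta if G > a else 0.0

def fit_rating_curve(stages, discharges, a):
    """For a fixed a, regress Y = log Q on X = log(G - a) (Equation (7)); needs G > a, Q > 0."""
    X = [math.log10(g - a) for g in stages]
    Y = [math.log10(q) for q in discharges]
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    beta = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / sum((x - mx) ** 2 for x in X)
    c_prime = my - beta * mx
    return 10 ** c_prime, beta  # C = 10^c'

# Synthetic check: data generated from C = 90, a = 1.2, beta = 2 (hypothetical values)
stages = [1.5, 2.0, 3.0, 4.0, 5.0]
flows = [rating_curve(g, 90.0, 1.2, 2.0) for g in stages]
C, beta = fit_rating_curve(stages, flows, a=1.2)
```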

Given these difficulties, this study focuses on optimizing the parameters (a, β and C) of Equation (5) with GA and on developing piecewise linear equations with MT. The methodology gave good results for the case studies, and it is expected to help solve many practical problems related to stage-discharge relations.

3 CASE STUDIES

3.1 Stage – Discharge Data

To demonstrate the application of GA and MT, daily time series of stage and discharge from two stations on the Schuylkill River, USA, were used: Berne (station no. 01470500, Lat. 40°31'21'', Long. 75°59'55'') and Philadelphia (station no. 01474500, Lat. 39°58'04'', Long. 75°11'20''). The catchment area is about 919.45 km² at the Berne station and 4902.85 km² at the Philadelphia station. This information was obtained from the USGS website.

Data for the period October 01, 2000 to September 30, 2006 were used for both stations. The first five years of data were used for training and the last year (October 01, 2005 to September 30, 2006) for testing at both stations. Statistical parameters of the training and testing sets for these sites are shown in Table I, where T, σ, σ/T, C_sx, X_max and X_min are the mean, standard deviation, coefficient of variation, skewness, maximum and minimum values, respectively. The discharge ranges from 2.125 to 972.014 m³/s at the Berne station and from 2.239 to 1484.943 m³/s at the Philadelphia station; the corresponding stages range from 1.384 to 5.088 m and from 1.686 to 3.463 m, respectively. The developed models are valid only within these ranges.
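The chronological split and the Table I statistics can be sketched as follows. The use of the sample standard deviation and the bias-adjusted sample skewness are assumptions, since the paper does not state which estimators were used, and the function names are hypothetical.

```python
import statistics
from datetime import date, timedelta

def daily_dates(start, end):
    """All calendar days from start to end, inclusive."""
    return [start + timedelta(days=k) for k in range((end - start).days + 1)]

def split_train_test(days, test_start):
    """Chronological split: days before test_start form the training set, the rest the testing set."""
    train = [d for d in days if d < test_start]
    test = [d for d in days if d >= test_start]
    return train, test

def summary(values):
    """Mean T, standard deviation sigma, sigma/T and skewness (needs at least 3 values)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumed estimator)
    skew = n / ((n - 1) * (n - 2)) * sum(((v - mean) / sd) ** 3 for v in values)
    return mean, sd, sd / mean, skew

days = daily_dates(date(2000, 10, 1), date(2006, 9, 30))
train, test = split_train_test(days, date(2005, 10, 1))
print(len(train), len(test))  # 1826 365
```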

Table I – The daily statistical parameters for training and testing data sets for two stations on the Schuylkill River

Data set   Station                 Basin area (km²)   Data type   T        σ        σ/T    C_sx    X_max      X_min
Training   Berne 01470500          919.45             Stage*      1.65     0.22     0.13   2.25    3.418      1.384
Training   Berne 01470500          919.45             Flow*       21.95    26.72    1.22   5.23    399.574    2.125
Training   Philadelphia 01474500   4902.85            Stage       1.96     0.18     0.09   2.04    3.338      1.686
Training   Philadelphia 01474500   4902.85            Flow        97.15    111.18   1.14   3.88    1312.07    2.239
Testing    Berne 01470500          919.45             Stage       1.66     0.32     0.19   5.09    5.088      1.396
Testing    Berne 01470500          919.45             Flow        24.32    61.88    2.54   11.46   972.014    2.522
Testing    Philadelphia 01474500   4902.85            Stage       1.98     0.20     0.10   3.39    3.463      1.774
Testing    Philadelphia 01474500   4902.85            Flow        109.42   147.86   1.35   5.63    1484.943   17.258

*The units of stage and flow are m and m³/s, respectively.


3.2 Development of Models based on conventional methods

The stage-discharge rating curve (RC) and multiple linear regression (MLR) are considered as the conventional methods. The RC was developed in two forms: a simple power equation (RC1), which does not consider the stage corresponding to zero discharge, and a slightly more complex form (RC2), which does. The developed RC and MLR models are given in Equations (8) to (10) for the Berne station and Equations (11) to (13) for the Philadelphia station. For RC2, the stage corresponding to zero discharge was fixed with the help of scatter plots of the training data. The reference stages used for this purpose were 1.418, 1.628 and 2.064 m for the Berne station and 1.765, 1.945 and 2.396 m for the Philadelphia station. The adopted stages corresponding to zero discharge are 1.223 m for Berne and 1.645 m for Philadelphia. For comparison with the other models, the MLR models were developed with a single independent variable (stage), so they reduce to the simple linear models of Equations (10) and (13).

$$Q = 0.441\,H^{7.036} \qquad (8)$$

$$Q = 93.951\,(H - 1.223)^{1.9885} \qquad (9)$$

$$Q = 116.069\,H - 170.356 \qquad (10)$$

$$Q = 0.055\,H^{10.512} \qquad (11)$$

$$Q = 670.039\,(H - 1.645)^{1.841} \qquad (12)$$

$$Q = 602.645\,H - 1084.43 \qquad (13)$$

In Equations (8) to (13), Q is the discharge in m³/s and H is the stage height in m above the reference datum.
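Equations (8) to (13) can be collected in one function to evaluate all the conventional models at a given stage. A sketch with a hypothetical function name: clipping (H − a) at zero is an added assumption so that stages below the zero-flow gauge height return zero instead of an undefined power, and the linear MLR models can still return negative values at low stages.

```python
def conventional_models(H):
    """Discharge (m^3/s) from Equations (8)-(13) for stage H (m)."""
    return {
        "RC1 Berne (8)": 0.441 * H ** 7.036,
        "RC2 Berne (9)": 93.951 * max(H - 1.223, 0.0) ** 1.9885,
        "MLR Berne (10)": 116.069 * H - 170.356,
        "RC1 Philadelphia (11)": 0.055 * H ** 10.512,
        "RC2 Philadelphia (12)": 670.039 * max(H - 1.645, 0.0) ** 1.841,
        "MLR Philadelphia (13)": 602.645 * H - 1084.43,
    }

q = conventional_models(1.65)  # close to the mean training stage at Berne
```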

3.3 Development of Models based on Genetic Algorithms (GAs)

The parameters (a, β and C) of the basic Equation (5) are optimized with GA. First, the parameters are found using the training set selected from the whole data. The resulting relation is then used to predict the discharge values in the testing set, and the predicted values are compared with the measured values using statistical performance measures, namely the coefficient of determination and the root mean square error.

Figure 4 – Fitness convergence for the Philadelphia station (fitness versus generations 0 to 20, with an inset for generations 10 to 20)

A function program was written in MATLAB and the optimization was carried out with the following settings: a population size of 200 with a uniform creation function, tournament selection with a tournament size of 4 and rank fitness scaling, an adaptive-feasible mutation function, two-point crossover and forward migration. The program was run five times, and for both the Berne and Philadelphia stations the parameters corresponding to the best fitness value were recorded. The fitness convergence for the Philadelphia training set is shown in Figure 4; similar behaviour was observed for the Berne station. As Figure 4 shows, the fitness function reached its minimum of 19.63 m³/s at the 14th generation for the Philadelphia station. For the Berne station, the minimum of 3.67 m³/s was also reached at the 14th generation.
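The MATLAB toolbox run cannot be reproduced exactly here, but the idea can be sketched in plain Python. This is a stand-in under stated assumptions: real-valued genes (a, β, C), uniform creation within hypothetical bounds, tournament size 4 with uniformly drawn contestants, two-point crossover, Gaussian mutation clipped to the bounds in place of the adaptive-feasible operator, elitism, no rank scaling or migration, a smaller population than the study's 200, and synthetic data generated from assumed parameters rather than the USGS records.

```python
import math
import random

def rc_rmse(params, stages, flows):
    """Fitness (Equation (1)): RMSE of Q = C (H - a)^beta against observed flows."""
    a, beta, C = params
    total = 0.0
    for h, q in zip(stages, flows):
        q_pred = C * max(h - a, 0.0) ** beta  # clip avoids complex powers when H < a
        total += (q - q_pred) ** 2
    return math.sqrt(total / len(flows))

def tournament(pop, errs, rng, size=4):
    """Pick `size` individuals at random and return the one with the lowest error."""
    return pop[min(rng.sample(range(len(pop)), size), key=errs.__getitem__)]

def ga_fit(stages, flows, bounds, pop_size=60, generations=60, p_mut=0.2, seed=1):
    rng = random.Random(seed)
    # Uniform creation inside the parameter bounds
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        errs = [rc_rmse(p, stages, flows) for p in pop]
        elite = min(range(pop_size), key=errs.__getitem__)
        history.append(errs[elite])
        new_pop = [pop[elite][:]]  # elitism keeps the best parameter set
        while len(new_pop) < pop_size:
            p1, p2 = tournament(pop, errs, rng), tournament(pop, errs, rng)
            i, j = sorted(rng.sample(range(len(bounds) + 1), 2))  # two crossover points
            child = p1[:i] + p2[i:j] + p1[j:]
            for k, (lo, hi) in enumerate(bounds):  # Gaussian mutation, clipped to bounds
                if rng.random() < p_mut:
                    child[k] = min(hi, max(lo, child[k] + rng.gauss(0.0, 0.1 * (hi - lo))))
            new_pop.append(child)
        pop = new_pop
    errs = [rc_rmse(p, stages, flows) for p in pop]
    best = min(range(pop_size), key=errs.__getitem__)
    return pop[best], history

# Hypothetical synthetic data generated from assumed parameters a=1.2, beta=1.8, C=95
stages = [1.4 + 0.1 * k for k in range(37)]
flows = [95.0 * (h - 1.2) ** 1.8 for h in stages]
bounds = [(0.0, 1.35), (1.0, 3.0), (10.0, 200.0)]  # (a, beta, C); a kept below the lowest stage
best, history = ga_fit(stages, flows, bounds)
```

With observed training stages and discharges in place of the synthetic lists, the returned (a, β, C) would play the role of the fitted rating-curve parameters; no particular recovery accuracy is claimed for this toy setup.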

The parameter values corresponding to these best fitness values were used for the final stage-discharge relations. The values of the parameters (a, β and C) are 1.262, 1.765 and 94.848 for the Berne station and 1.695, 1.526 and 630 for the Philadelphia station. The explicit GA models for the Berne and Philadelphia stations are given in Equations (14) and (15), respectively.

$$Q = 94.848\,(H - 1.262)^{1.765} \qquad (14)$$

$$Q = 630\,(H - 1.695)^{1.526} \qquad (15)$$
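Equations (14) and (15) can be evaluated directly. The short sketch below (hypothetical function names) reproduces the GA predictions at the maximum recorded stages that are discussed in the conclusions.

```python
def ga_berne(H):
    """Equation (14): GA rating curve for Berne (H in m, Q in m^3/s)."""
    return 94.848 * max(H - 1.262, 0.0) ** 1.765

def ga_philadelphia(H):
    """Equation (15): GA rating curve for Philadelphia (H in m, Q in m^3/s)."""
    return 630.0 * max(H - 1.695, 0.0) ** 1.526

print(round(ga_berne(5.088)), round(ga_philadelphia(3.463)))  # 1013 1503
```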

3.4 Development of Models based on Model Tree (MT)

The MT models were formulated based on the fitness function given in Equation (4), with the minimum number of instances set to four. The same training and testing sets as in the GA model formulation were used. The rule sets generated by the program are shown in Table II; for each stage value in the time series, these rules select the linear model that gives the predicted discharge.

Table II - Model tree logic sets.

Berne Station (01470500)
Rules:
If H_t <= 1.572 [721/1.785%] : Rule 1
elseif H_t <= 1.691 [503/4.826%] : Rule 2
elseif H_t <= 1.929 [430/7.216%] : Rule 3
elseif H_t <= 2.247 [131/13.753%] : Rule 4
else [41/16.241%] : Rule 5
end

Philadelphia Station (01474500)
Rules:
If H_t <= 1.898 [408/2.114%] : Rule 1
elseif H_t <= 1.901 [369/4.163%] : Rule 2
elseif H_t <= 1.984 [417/2.216%] : Rule 3
elseif H_t <= 2.057 [257/1.539%] : Rule 4
elseif H_t <= 2.228 [245/3.151%] : Rule 5
elseif H_t <= 2.467 [93/6.874%] : Rule 6
else [37/13.366%] : Rule 7
end

Based on Table II, five linear models were developed for the Berne station and seven for the Philadelphia station. The developed linear models are shown in Equations (16) and (17), where Q_t is the discharge (m³/s) and H_t the stage (m) at time t.

$$\begin{aligned}
\text{LM 1: } Q_t &= 54.5153\,H_t - 74.1687\\
\text{LM 2: } Q_t &= 78.4768\,H_t - 111.6473\\
\text{LM 3: } Q_t &= 106.0997\,H_t - 158.8781\\
\text{LM 4: } Q_t &= 142.2248\,H_t - 228.7286\\
\text{LM 5: } Q_t &= 225.1834\,H_t - 420.9018
\end{aligned} \qquad (16)$$


$$\begin{aligned}
\text{LM 1: } Q_t &= 339.3015\,H_t - 590.8252\\
\text{LM 2: } Q_t &= 23.5523\,H_t - 25.4191\\
\text{LM 3: } Q_t &= 465.2298\,H_t - 828.8627\\
\text{LM 4: } Q_t &= 520.3345\,H_t - 938.6954\\
\text{LM 5: } Q_t &= 628.692\,H_t - 1163.6821\\
\text{LM 6: } Q_t &= 808.0967\,H_t - 1563.0307\\
\text{LM 7: } Q_t &= 985.0411\,H_t - 2009.3265
\end{aligned} \qquad (17)$$
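The rule sets of Table II and the linear models of Equations (16) and (17) combine into a piecewise linear predictor. A sketch, assuming that Rule k selects LM k and that all intercepts are negative (their signs were lost in the extracted text; negative intercepts make adjacent Berne models nearly continuous at the thresholds). The dictionary and function names are hypothetical.

```python
import bisect

# Thresholds (m) and (slope, intercept) pairs from Table II and Equations (16)-(17);
# negative intercepts are an assumption.
MODEL_TREES = {
    "Berne": ([1.572, 1.691, 1.929, 2.247],
              [(54.5153, -74.1687), (78.4768, -111.6473), (106.0997, -158.8781),
               (142.2248, -228.7286), (225.1834, -420.9018)]),
    "Philadelphia": ([1.898, 1.901, 1.984, 2.057, 2.228, 2.467],
                     [(339.3015, -590.8252), (23.5523, -25.4191), (465.2298, -828.8627),
                      (520.3345, -938.6954), (628.692, -1163.6821), (808.0967, -1563.0307),
                      (985.0411, -2009.3265)]),
}

def model_tree_predict(station, H):
    """Route stage H (m) to its rule and apply the matching linear model (Q in m^3/s)."""
    thresholds, models = MODEL_TREES[station]
    rule = bisect.bisect_left(thresholds, H)  # first threshold >= H, so H <= t_k selects rule k
    slope, intercept = models[rule]
    return slope * H + intercept
```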

3.5 Results and discussions

The GEP models presented by Guven and Aytek (2009) for the Berne and Philadelphia stations are given in Equations (18) and (19), respectively, where Q is the discharge in m³/s and h is the stage height in m measured from the datum. Guven and Aytek showed the usefulness of GEP models over conventional models. Here, the results of their published models are compared with those of the models developed in this study. Model performance is evaluated with the coefficient of determination (R²) and the root mean square error (RMSE), which are widely used in many research areas. Table III shows the performance values of all the models.

743.27738.4313.10

65.1

−+=

−

hhQ

(18)

349.8)/715.42(421.54925.42

22

−−+−= hhhhQ (19)

Table III - The R² and RMSE values for the testing period

Models   Berne 01470500           Philadelphia 01474500
         R²       RMSE (m³/s)     R²       RMSE (m³/s)
RC1      0.78     2142            0.668    1674.8
RC2      0.993    25.7            0.985    43.6
MLR      0.866    31.5            0.941    42.2
GA       0.997    5.9             0.998    5.8
MT       0.970    13.3            0.998    7.3
GEP*     0.942    61.9            0.998    23.1

* Guven and Aytek (2009)
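The two performance measures can be sketched as below, assuming R² is the squared Pearson correlation between observed and predicted values. This reading is consistent with RC1 in Table III, which combines R² = 0.78 at Berne with an RMSE of 2142 m³/s; under the alternative definition 1 − SSE/SST, such an error would give a strongly negative value. The function names are hypothetical.

```python
import math

def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def r_squared(obs, pred):
    """Squared Pearson correlation between observed and predicted values (assumed definition)."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov * cov / (var_o * var_p)

# Predictions that are perfectly correlated but biased: R² = 1, yet RMSE is large
obs, pred = [1, 2, 3, 4], [2, 4, 6, 8]
```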

Table III shows that, for Berne station, the GA model (R² = 0.997, RMSE = 5.9) performs better than the conventional models as well as the GEP model proposed by Guven and Aytek (2009). The MT model (R² = 0.970, RMSE = 13.3) also gives a lower RMSE than all of these models, although its R² is slightly lower than that of RC2 (0.993). Similarly, for Philadelphia station, the GA model (R² = 0.998, RMSE = 5.8) and the MT model (R² = 0.998, RMSE = 7.3) perform better than the conventional models. At Philadelphia, the GA and MT models give the same R² as the GEP model, but at both stations their RMSE values are lower than those of the GEP model.
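The two performance measures used in Table III can be computed as in the minimal Python sketch below (not the authors' code). It assumes that R² is defined as 1 − SS_res/SS_tot; some studies instead report the squared correlation coefficient between observed and predicted values, which can differ for biased predictions. The sample discharges are made up for illustration and are not data from this study.

```python
# Minimal sketch of the two measures reported in Table III.
# Assumption: R^2 is taken as 1 - SS_res / SS_tot; some studies report the
# squared Pearson correlation instead, which can differ for biased predictions.
import math
from typing import Sequence


def _check(observed: Sequence[float], predicted: Sequence[float]) -> None:
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error, in the units of the discharge (e.g. m^3/s)."""
    _check(observed, predicted)
    sq = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sq / len(observed))


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    _check(observed, predicted)
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up discharges (m^3/s), not data from this study.
    q_obs = [10.0, 20.0, 30.0]
    q_pred = [12.0, 18.0, 33.0]
    print(f"R^2 = {r_squared(q_obs, q_pred):.3f}, RMSE = {rmse(q_obs, q_pred):.2f}")
```

Because RMSE is expressed in the units of discharge, it is sensitive to errors at high flows, while R² measures the fraction of the variance in the observed discharges that the model explains; this is why the two measures can rank models differently, as for RC2 and MT at Berne.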

4 CONCLUSIONS

In this study, an optimization tool, the genetic algorithm (GA), and a data-driven technique, the model tree (MT), are used to develop the relationship between river stage and discharge. The results of the GA and MT models are compared with those of models developed using conventional methods and with the GEP model. The results show that both the GA and MT models perform better than the conventional models. Of the two, the GA model appears to be superior to the MT model. For the maximum recorded stage of 5.088 m at Berne, the GA model predicted a discharge of 1013 m³/s, which is close to the observed value of 972 m³/s. Similarly, at Philadelphia station, for the maximum recorded stage of 3.463 m, the GA model predicted a discharge of 1503 m³/s, whereas the observed value is 1485 m³/s. This suggests that the GA model also performs well at high stages. The proposed methodology may also be useful at other sites.
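The closeness of the peak predictions quoted above can be expressed as a relative error. The short Python sketch below computes it from the predicted and observed discharges given in the text; the helper name `relative_error_percent` is illustrative only.

```python
# Signed relative error of the GA model at the maximum recorded stages,
# using the predicted and observed discharges (m^3/s) quoted in the text.
def relative_error_percent(predicted: float, observed: float) -> float:
    """Return 100 * (predicted - observed) / observed."""
    if observed == 0:
        raise ValueError("observed value must be non-zero")
    return 100.0 * (predicted - observed) / observed


# station label -> (GA prediction, observation), both in m^3/s
PEAK_CASES = {
    "Berne (H = 5.088 m)": (1013.0, 972.0),
    "Philadelphia (H = 3.463 m)": (1503.0, 1485.0),
}

if __name__ == "__main__":
    for station, (pred, obs) in PEAK_CASES.items():
        print(f"{station}: {relative_error_percent(pred, obs):+.1f} %")
```

At both stations the GA model slightly overestimates the peak discharge, by about 4.2 % at Berne and 1.2 % at Philadelphia.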

Acknowledgments: The authors thank the USGS for making the data freely available for download on its website.

5 REFERENCES

Aytek, A. and Kisi, O. (2008). A genetic programming approach to suspended sediment modelling. Journal of Hydrology, 351, 288-298.

Babovic, V. and Keijzer, M. (2002). Rainfall runoff modelling based on genetic programming. Nordic Hydrology, 33(5), 331-346.

Baiamonte, G. and Ferro, V. (2007). Simple flume for flow measurement in sloping open channel. J. Irrig. Drain. Eng., 133(1), 71-78.

Bhattacharya, B. and Solomatine, D.P. (2005). Neural networks and M5 model trees in modelling water level-discharge relationship. Neurocomputing, 63, 381-396.

Bhattacharya, B., Price, R.K. and Solomatine, D.P. (2005). Data-driven modelling in the context of sediment transport. Journal of Physics and Chemistry of the Earth, 30(4-5), 297-302.

Cheng, C., Wu, X. and Chau, K.W. (2005). Multiple criteria rainfall-runoff model calibration using a parallel genetic algorithm in a cluster of computers. J. Hydrol. Sciences, 50(6), 1069-1087.

Dorado, J., Rabunal, J.R., Pazos, A., Rivero, D., Santos, A. and Puertas, J. (2003). Prediction and modelling of the rainfall-runoff transformation of a typical urban basin using ANN and GP. Applied Artificial Intelligence, 17(4), 329-343.

Giustolisi, O. (2004). Using genetic programming to determine Chezy resistance coefficient in corrugated channels. J. Hydroinformatics, 6(3), 157-173.

Goldberg, D.E. (1989). Genetic algorithms in search, optimization, and machine learning. Addison-Wesley, Boston.

Guven, A. and Aytek, A. (2009). New approach for stage-discharge relationship: Gene-expression programming. J. Hydrologic Eng., 14(8), 812-820.

Habib, E.H. and Meselhe, E.A. (2006). Stage-discharge relations for low-gradient tidal streams using data-driven models. J. Hydraul. Eng., 132(5), 482-492.

Henderson, F.M. (1966). Open channel flow. The Macmillan Company, New York.

Jain, S.K. (2008). Development of integrated discharge and sediment rating relation using a compound neural network. J. Hydrologic Eng., 13(3), 124-131.

Jain, S.K. and Chalisgaonkar, D. (2000). Setting up stage-discharge relations using ANN. J. Hydrologic Eng., 5(4), 428-433.

Kumar, D.N. and Reddy, M.J. (2007). Multipurpose reservoir operation using particle swarm optimization. J. Water Res. Plan. Manage., 133(3), 192-201.

Petersen-Øverleir, A. (2006). Modelling stage-discharge relationships affected by hysteresis using the Jones formula and nonlinear regression. J. Hydrological Sciences, 51(3), 365-388.

Quinlan, J.R. (1992). Learning with continuous classes. Proceedings of the Australian Joint Conference on Artificial Intelligence, 343-348. World Scientific, Singapore.

Rabunal, J.R., Puertas, J., Suarez, J. and Rivero, D. (2007). Determination of the unit hydrograph of a typical urban basin using genetic programming and artificial neural networks. Hydrol. Process., 21, 476-485.

Reddy, M.J. and Ghimire, B.N.S. (2009). Use of model tree and gene expression programming to predict the suspended sediment load in rivers. J. Intelligent Systems, 18(3), 211-227.

Savic, D.A., Walters, G.A. and Davidson, J.W. (1999). A genetic programming approach to rainfall-runoff modelling. Water Resources Management, 13, 219-231.

Sefe, F.T.K. (1996). A study of the stage-discharge relationship of the Okavango River at Mohembo, Botswana. J. Hydrological Sciences, 41(1), 97-116.

Sivapragasam, C., Maheswaran, R. and Venkatesh, V. (2008). Genetic programming approach for flood routing in natural channels. Hydrol. Process., 22, 623-628.

Solomatine, D.P. (2002). Computational intelligence techniques in modelling water systems: some applications. IEEE, 078037278, 2, 1853-1858.

Solomatine, D.P. and Dulal, K.N. (2003). Model tree as an alternative to neural network in rainfall-runoff modeling. Hydrological Sciences Journal, 48(3), 399-411.

Subramanya, K. (2006). Engineering Hydrology. Tata McGraw-Hill, New Delhi.


Sudheer, K.P. and Jain, S.K. (2003). Radial basis function neural network for modelling rating curves. J. Hydrologic Eng., 8(3), 161-164.

Tawfik, M., Ibrahim, A. and Fahmy, H. (1997). Hysteresis sensitive neural network for modelling rating curves. J. Computing Civil Eng., 11(3), 206-211.

Tayfur, G. and Singh, V.P. (2006). ANN and fuzzy logic models for simulating event-based rainfall-runoff. J. Hydraul. Eng., 132(12), 1321-1330.

Tung, Y.K., Yen, B.C. and Melching, C.S. (2006). Hydrosystems engineering reliability assessment and risk analysis. McGraw-Hill, New York.

USGS website, http://www.usgs.gov.

Wang, Y. and Witten, I.H. (1997). Induction of model trees for predicting continuous classes. Proceedings of the European Conference on Machine Learning, University of Economics, Faculty of Informatics and Statistics, Prague.
