International Workshop ADVANCES IN STATISTICAL HYDROLOGY
May 23-25, 2010, Taormina, Italy
Ghimire and Reddy, Development of Stage-Discharge RC in River using GA and MT
DEVELOPMENT OF STAGE-DISCHARGE RATING CURVE IN RIVER
USING GENETIC ALGORITHMS AND MODEL TREE
by
Bhola N.S. Ghimire (1) and M. Janga Reddy (2)

(1) Research Scholar (ghimire@iitb.ac.in)
(2) Assistant Professor (mjreddy@civil.iitb.ac.in)
Department of Civil Engineering, Indian Institute of Technology, Bombay, India


ABSTRACT
Measuring discharge in rivers is a challenging task for hydraulic engineers. A graph of stage versus discharge, or the line fitted through the data points, represents the stage-discharge relationship, also known as the rating curve. The stage-discharge relationship is an approximate method for estimating discharge in rivers and streams. Accurate flow information is very important for hydrological applications such as water and sediment budget analysis and the operation and control of water resources projects. Stage is much easier to measure than discharge. Even under meticulous observation, the stage-discharge relationship at a particular river cross-section is not necessarily unique, because rivers are influenced by several other factors that are neither always understood nor easy to quantify. In reality, discharge is not a function of stage alone; it also depends on the longitudinal slope of the river, channel geometry, bed roughness, etc. However, measuring these parameters at every time step and section is not possible. Hence there is a need to establish an accurate relationship between stage and discharge. Conventional parametric regression methods usually fail to model these relationships.
This paper presents the use of genetic algorithms (GA), a search procedure based on the mechanics of natural selection and natural genetics, and the model tree (M5), a data-driven technique for continuous-class problems that provides a structural representation of the data and a piecewise linear fit of the classes, to establish the stage-discharge relationship in river hydrology. The results are compared with those of other methods: gene-expression programming (GEP), multiple linear regression (MLR) and the classical stage-discharge rating curve (RC). Model performance is measured with the coefficient of determination and the root mean square error. The GA-based and MT-based models are found to perform much better than the other methods.

Keywords: Genetic algorithms, Model tree, Gene-expression programming, Multiple linear regression, Rating curve.

1 INTRODUCTION
Hydraulic engineers need discharge measurements in rivers for various purposes, and obtaining them is one of their more challenging tasks. Discharge depends mainly on rainfall in the catchment, which is stochastic, and stage varies with discharge accordingly. A graph of stage versus discharge, with the line fitted through the data points, represents the stage-discharge relationship, commonly called the rating curve. The rating curve is a fundamental technique in discharge calculation. Accurate information about discharge and stage is very important for hydrological applications such as water resources planning, reservoir operation, sediment handling and hydrologic modelling. Stage can be measured at any time, but measuring discharge needs considerable preparation that may not always be practical. Hence, to predict discharge from measured stage, a specified relation between the two is required. Even under meticulous observation, the stage-discharge relationship at a particular river cross-section is not necessarily unique, as rivers are often influenced by factors neither always understood nor easy to quantify (Sefe, 1996). In reality, discharge is not a function of stage alone; it also depends on the longitudinal slope of the river, channel geometry, bed roughness, etc. However, measuring these parameters at every time step and section is not practical, so in practice discharge is usually treated as dependent on stage alone. It is therefore necessary to establish an accurate relationship between discharge and stage. Conventional parametric regression methods usually fail to model these relationships (Habib and Meselhe, 2006). Habib and Meselhe distinguished two approaches to stage-discharge modelling: numerical solutions and data-driven techniques. Following the second approach, they developed stage-discharge relationships for coastal low-gradient streams using neural networks and nonparametric regression. The first approach relies on data from sites with accurate boundary conditions.
Tawfik et al. (1997) introduced an approach based on a multilayer artificial neural network (ANN) for modelling the stage-discharge relationship. The same approach was followed by Jain and Chalisgaonkar (2000), Sudheer and Jain (2003) and Bhattacharya and Solomatine (2005). Bhattacharya and Solomatine (2005) used the M5 model tree in addition to ANN to relate stage and discharge in rivers. Petersen-Øverleir (2006) introduced a methodology based on the Jones formula and nonlinear regression for situations where the stage-discharge relationship is affected by hysteresis due to unsteady flow. Tayfur and Singh (2006) used ANN and fuzzy logic to model rainfall-runoff laboratory data. Baiamonte and Ferro (2007) obtained relationships for estimating the two coefficients of the stage-discharge equation from experimental runs with flumes of different contraction ratios (ranging from 0.17 to 0.81) and flume slopes (ranging from 0.5 to 3.5%). Using a compound neural network, Jain (2008) developed an integrated stage-discharge-suspended sediment relationship.
Soft-computing techniques such as ANN are widely used in water resources engineering, whereas genetic programming (GP) and GA have been used by only a few researchers. Savic et al. (1999) and Babovic and Keijzer (2002) developed GP models relating rainfall and runoff at different sites. Dorado et al. (2003) applied GP and ANN to predict runoff from rainfall in urban areas. Giustolisi (2004) used GP to determine the Chezy resistance coefficient for full circular corrugated channels. Cheng et al. (2005) used GA to calibrate a rainfall-runoff model developed from fuzzy methods. Rabunal et al. (2007) used GP and ANN to derive the unit hydrograph for a typical urban basin. Kumar and Reddy (2007) used GA to optimize multipurpose reservoir operation. Sivapragasam et al. (2008) modelled the storage-discharge relationship of the nonlinear Muskingum model with GP, an evolutionary algorithm-based modelling approach; comparing the results with particle swarm optimization, they found the same optimum values from both techniques. Recently, Aytek and Kisi (2008) used GEP for suspended sediment modelling and Guven and Aytek (2009) used GEP for stage-discharge modelling in American rivers.
Similarly, another data-driven tool, the model tree (MT), has been used by a few researchers in hydrology. MT has been reported to give better accuracy than ANN in water management problems, rainfall-runoff modelling, canal sedimentation, etc. (Solomatine, 2002; Solomatine and Dulal, 2003; Bhattacharya et al., 2005). Reddy and Ghimire (2009) successfully used the model tree to estimate suspended sediment load (SSL) in American rivers.
The objective of this article is to support the use of the soft-computing techniques GA and MT in water resources engineering, in particular for modelling the relation between stage and discharge. The model results are compared with those of conventional methods, namely the stage-discharge rating curve (RC) and multiple linear regression (MLR), as well as with the predictions of the GEP model.
2 MODELLING TECHNIQUES
2.1 Genetic Algorithms (GAs)
Genetic algorithms (GAs) are a class of evolutionary algorithms that use techniques inspired by evolutionary biology to solve a problem. GAs are population-based search techniques that work on Darwin's principle of survival of the fittest (Goldberg, 1989). The idea in all evolutionary algorithms is to evolve a population of candidate solutions to a given problem, using operators inspired by natural genetic variation and natural selection, such as inheritance, mutation, selection and crossover.
GAs were invented by John Holland in the 1960s and were developed by Holland and his students and colleagues at the University of Michigan (Goldberg, 1989). In Holland's formulation, a GA is a method for moving from one population of "chromosomes" (e.g., strings of ones and zeros, called "bits") to a new population by using a kind of "natural selection" together with the genetics-inspired operators of crossover, mutation, and inversion. Each chromosome consists of "genes" (e.g., bits), each gene being an instance of a particular "allele" (e.g., 0 or 1). The selection operator chooses those chromosomes in the population that will be allowed to reproduce, and on average the fitter chromosomes produce more offspring than the less fit ones. Crossover exchanges subparts of two chromosomes, roughly mimicking biological recombination between two single-chromosome organisms; mutation randomly changes the allele values of some locations in the chromosome; and inversion reverses the order of a contiguous section of the chromosome, thus rearranging the order in which genes are arrayed. In-depth details about GA can be found in Goldberg (1989).
2.1.1 Elements of GA. In a GA, the search starts with an initial set of random solutions known as the population. Each chromosome in the population is evaluated with a fitness function, which measures the success of the chromosome. Based on the fitness values, a set of chromosomes is selected for breeding. To produce a new generation, genetic operators such as crossover and mutation are applied. Parents and offspring are then selected according to their fitness, and some are rejected so that the population size stays constant in the new generation. The cycle of evaluation, selection and reproduction continues until an optimal or near-optimal solution is found. The fundamental steps of the algorithm are shown in Figure 1.

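Figure 1 – Schematic diagram of genetic algorithms (Tung et al., 2006). Flowchart steps: initial population generation; evaluate fitness of all individuals in the population; if the termination criteria are met, stop the search; otherwise select individuals for the next generation, apply crossover and mutation, and evaluate the next generation.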

Selection. Selection applies pressure on the population in a manner similar to natural selection in biological systems. Before entering the next generation's population, selected chromosomes may undergo crossover or mutation (depending on the crossover and mutation probabilities), in which case the offspring chromosomes are the ones that enter the next generation. Poorer performing individuals (as evaluated by the fitness function) are weeded out, and better performing, or fitter, individuals have a greater than average chance of passing the information they contain to the next generation. Of the several selection methods available, tournament selection is applied in this study. The tournament selection operator uses roulette selection N times to produce a tournament subset of chromosomes; the best chromosome in this subset is then chosen as the selected chromosome.
Crossover. Crossover allows solutions to exchange information in a way similar to that used by a natural organism undergoing reproduction. In other words, crossover is a genetic operator that combines (mates) two chromosomes (parents) to produce new chromosomes (offspring). The operator randomly chooses a locus and exchanges the subsequences before and after that locus between the two chromosomes to create two offspring. The idea behind crossover is that a new chromosome may be better than both parents if it takes the best characteristics from each of them. Crossover occurs during evolution according to a user-definable crossover probability. For example, if two parents (chromosomes) A and B with four genes each form two children (offspring) by exchanging genes after the second gene (Figure 2), the operation is a single-point crossover; if the exchange takes place between two points, it is a two-point crossover. Two-point crossover is used in this study.

Figure 2 – Single point crossover operator
Mutation. Mutation randomly changes (flips) the value of single bits within individual strings to maintain the diversity of the population and to help the genetic algorithm escape from local optima. It is typically used sparingly. For example, in Figure 3 the parent becomes a new child when gene number two is mutated.

Figure 3 – Mutation operator
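
The three operators described above can be sketched in a few lines of Python. This is only a minimal illustration on bit-string chromosomes, not the program used in this study: the function names are hypothetical, fitness is assumed to be an error to be minimized, and the tournament subset is drawn uniformly at random, which simplifies the roulette-based draw described in the text.

```python
import random


def tournament_select(population, fitness, size, rng):
    """Return the fittest of `size` randomly drawn chromosomes.

    Assumption: lower fitness is better (an error measure), and the
    tournament subset is drawn uniformly without replacement.
    """
    contenders = rng.sample(range(len(population)), size)
    winner = min(contenders, key=lambda idx: fitness[idx])
    return population[winner]


def two_point_crossover(parent_a, parent_b, rng):
    """Swap the genes lying between two random cut points."""
    lo, hi = sorted(rng.sample(range(1, len(parent_a)), 2))
    child_a = parent_a[:lo] + parent_b[lo:hi] + parent_a[hi:]
    child_b = parent_b[:lo] + parent_a[lo:hi] + parent_b[hi:]
    return child_a, child_b


def bit_flip_mutation(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - gene if rng.random() < rate else gene for gene in chromosome]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
parent_a, parent_b = [0, 0, 0, 0], [1, 1, 1, 1]
child_a, child_b = two_point_crossover(parent_a, parent_b, rng)
mutant = bit_flip_mutation(child_a, 0.05, rng)
print(child_a, child_b, mutant)
```

With four-gene parents, as in the example of Figure 2, the two cut points always enclose at least one gene, so each child carries material from both parents.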
2.1.2 Fitness function used in GA. Many fitness functions can be used in a GA for parameter estimation. In this study, the root mean square error is minimized. The fitness function is given in Equation (1), where $Q_{oi}$ and $Q_{pi}$ are the values observed in the field and the values predicted by the developed GA model, respectively, $n$ is the total number of observations and $F$ is the error function.
$$\min F = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Q_{oi}-Q_{pi}\right)^{2}} \qquad (1)$$
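
As a hedged sketch, Equation (1) might be coded as follows, assuming the model being fitted is the rating curve Q = C (G − a)^β of Section 2.4. The function names and the infinite penalty for infeasible candidates are illustrative choices, not taken from the study.

```python
import math


def rating_curve(stage, a, beta, c):
    """Rating curve Q = C * (G - a) ** beta."""
    return c * (stage - a) ** beta


def rmse_fitness(params, stages, observed):
    """Equation (1): RMSE between observed and predicted discharge."""
    a, beta, c = params
    # Assumption: a candidate whose zero-discharge stage `a` is not below
    # every observed stage is treated as infeasible, because (G - a) must
    # stay positive for a real-valued fractional power.
    if min(stages) <= a:
        return math.inf
    squared = [(q_obs - rating_curve(g, a, beta, c)) ** 2
               for g, q_obs in zip(stages, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic check: data generated from known parameters give zero error.
stages = [1.5, 2.0, 3.0]
observed = [rating_curve(g, 1.0, 2.0, 50.0) for g in stages]
print(rmse_fitness((1.0, 2.0, 50.0), stages, observed))
print(rmse_fitness((1.0, 2.0, 60.0), stages, observed))
```

A GA would call such a function once per chromosome and per generation, so keeping it free of side effects makes it easy to plug into any optimizer.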

2.2 Model Tree (MT)
The model tree is a data-driven technique for continuous-class problems that provides a structural representation of the data and a piecewise linear fit of the classes. It is a kind of decision tree that predicts numeric values with linear regression functions at the leaves. The model tree groups the data according to their similarity and then fits local regression equations, which helps to minimize the model error. The technique is described by Quinlan (1992) and Wang and Witten (1997).
The fundamental steps of the M5 model tree, following the flow chart of Reddy and Ghimire (2009), were used to process the data in this study. The algorithm first splits the parameter space into subspaces and then builds a linear regression model for each subspace. It uses information theory to split the data and to fit an appropriate model. During model formulation, each split follows the decision-tree idea of integrating several models. Finally, it uses computational intelligence techniques to find possible solutions for each model. The major advantages of model trees over regression trees are: (a) model trees are much smaller than regression trees, (b) the decision strength is clear and (c) the regression functions normally do not involve many variables. Although computational requirements grow with dimensionality, model trees can handle tasks involving hundreds of attributes, which helps to give a better formulation. Tree-based models are built by a divide-and-conquer method. The standard deviation reduction (SDR), given by Equation (2), is the main criterion for choosing a split.
$$SDR = sd(T) - \sum_{i}\frac{|T_i|}{|T|}\, sd(T_i) \qquad (2)$$
where $T$ represents the set of examples that reach the node; $T_i$ represents the subset of examples that have the $i$-th outcome of the potential set (i.e., the sets that result from splitting the node according to the chosen attribute); and $sd(\cdot)$ represents the standard deviation.
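
The SDR criterion of Equation (2) can be illustrated with a small Python sketch for a single attribute (stage). It assumes the population standard deviation and binary splits at midpoints between sorted stage values; both are illustrative assumptions, and the helper names are hypothetical.

```python
from statistics import pstdev


def sdr(parent, subsets):
    """Equation (2): sd(T) - sum(|T_i| / |T| * sd(T_i))."""
    weighted = sum(len(s) / len(parent) * pstdev(s) for s in subsets)
    return pstdev(parent) - weighted


def best_split(stages, flows):
    """Return (threshold, SDR) of the best binary split on stage."""
    pairs = sorted(zip(stages, flows))
    targets = [q for _, q in pairs]
    best = (None, float("-inf"))
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold can separate equal stage values
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
        gain = sdr(targets, [targets[:k], targets[k:]])
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Toy data: flows jump between the second and third stage value.
print(best_split([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 5.0, 5.0]))
```

The split that separates the two flow levels removes all within-subset variation, so it attains the largest possible reduction for this toy data set.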
Pruning and smoothing. If the generated tree has more leaves than necessary, it may fit the existing data too closely (overfitting) and generalize poorly. The tree can be made more robust by simplifying it. Merging the lower subtrees into one node is called pruning. Smoothing is the process used to compensate for the sharp discontinuities that occur between adjacent linear models at the leaves of the pruned tree. Smoothing is difficult for models constructed from a small number of training samples.
Advantages of model trees. A model tree is in effect a set of local linear models. Model trees may serve as an alternative to ANNs and are often almost as accurate (Solomatine, 2002). They have the following advantages: (a) an MT trains much faster than an ANN, (b) the results given by a model tree are transparent and can be easily understood by decision makers, and (c) using pruning, it is easy to generate a range of MTs, from a simple linear regression to a much more accurate but complex combination of local models (many branches and leaves).
2.3 Multiple linear regression (MLR)
Many engineering and scientific problems are concerned with determining a relationship between a set of variables. Usually, a single response variable $Y$ (the dependent variable) is expressed as a function of a set of independent variables $x_1, x_2, x_3, \ldots, x_n$. It can be written as
$$Y = a_1 x_1 + a_2 x_2 + a_3 x_3 + \cdots + a_n x_n + a_0 \qquad (3)$$
where $a_i$ is the regression coefficient of the $i$-th independent variable $x_i$ and $a_0$ is the intercept, all computed by the least squares method. When $n = 1$, Equation (3) reduces to simple linear regression. When $n = 2$, the function corresponds to a plane in three dimensions, and for $n > 2$ it is a hyperplane in $(n+1)$-dimensional space. If $Y_i$ is the observed dependent variable and $Y_{pi}$ is the value predicted by Equation (3), the sum of squared errors $\sum e_{yi}^{2}$ is given by Equation (4).
$$\sum_{i=1}^{N} e_{yi}^{2} = \sum_{i=1}^{N}\left(Y_i - Y_{pi}\right)^{2} \qquad (4)$$
2.4 Stage-Discharge Rating Curve (RC)
A stage-discharge rating curve (simply: rating curve, RC) describes the relationship between the water level (stage) at a channel cross-section and the rate of discharge at that section. Ideally, a rating curve describes a unique functional relationship between stage and discharge, and is therefore obtained as a smooth and continuous curve with a reasonable degree of sensitivity. Unfortunately, there cannot be a unique stage-discharge relationship unless the flow is uniform, and because rainfall is stochastic, river flow is not uniform either. Hence an ideal relation between stage and discharge does not hold exactly and serves only as an approximation (Henderson, 1966).
When a sufficient number of measured discharges are plotted against the corresponding stages, they give a relationship that represents the integrated effect of a wide range of channel and flow parameters. The control (the combined effect of these parameters) is usually categorized as permanent or shifting. In a shifting control, the parameters are not fixed and change with time. In a permanent control, the parameters are constant (Subramanya, 2006).
A majority of streams and rivers, especially non-alluvial rivers, exhibit permanent control. In this case, the relationship between stage and discharge is single-valued and is expressed by Equation (5), which has the form of a parabola, where $Q$ = discharge in m³/s, $G$ = gauge height (stage) in m, $a$ = a constant representing the gauge reading corresponding to zero discharge, and $\beta$ and $C$ are rating curve constants.
$$Q = C\,(G - a)^{\beta} \qquad (5)$$

Traditionally, the best values of $a$, $\beta$ and $C$ in Equation (5) for a given range of stage are obtained by the least squares method. Taking logarithms of Equation (5) gives Equation (6),
$$\log Q = \beta \log(G - a) + \log C \qquad (6)$$
or
$$Y = \beta X + c' \qquad (7)$$

Equation (6) is the equation of a straight line, equivalent to Equation (7), where the dependent variable is $Y = \log Q$, the independent variable is $X = \log(G - a)$ and $c' = \log C$. The best-fit straight line through $n$ observations of $X$ and $Y$ is normally obtained by regressing the dependent variable on the independent variable. Depending on the nature of the data, two or more straight lines are often required to fit it. A preliminary analysis of the data usually gives the approximate position of the break points for each range, and the actual break points may be determined by solving the two adjacent equations for $Q$ and $G$ or graphically. Sometimes the curve changes from a parabolic to a complex curve and vice versa, and sometimes the constants and exponents vary through the range (Guven and Aytek, 2009). It is therefore not easy to find the parameter values ($a$, $\beta$ and $C$) in each case, and sometimes it may be impossible to obtain the true values.
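
The traditional log-linear fit of Equations (6) and (7) might look like the following Python sketch. It assumes a single straight-line segment and a zero-discharge stage `a` that has already been fixed (for instance from a scatter plot); the function name and the synthetic data are illustrative only.

```python
import math


def fit_rating_curve(stages, discharges, a):
    """Least squares fit of log Q = beta * log(G - a) + log C.

    Returns (beta, C). Assumes every stage exceeds `a` and every
    discharge is positive, so that both logarithms exist.
    """
    xs = [math.log10(g - a) for g in stages]
    ys = [math.log10(q) for q in discharges]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                      # slope of Equation (7)
    c = 10 ** (mean_y - beta * mean_x)    # intercept c' = log C
    return beta, c


# Synthetic data generated from Q = 50 * (G - 1) ** 2.
stages = [1.5, 2.0, 3.0, 4.0]
discharges = [50.0 * (g - 1.0) ** 2 for g in stages]
beta, c = fit_rating_curve(stages, discharges, a=1.0)
print(beta, c)
```

Because `a` enters inside the logarithm, it cannot be estimated by this linear regression, which is one reason a search method such as GA is attractive for fitting all three parameters together.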
Given this difficulty, this study focuses on optimizing the parameters ($a$, $\beta$ and $C$) of Equation (5) using GA and on developing piecewise linear equations using MT. The methodology gave good results in the case studies, and it is expected to help solve many practical problems related to stage-discharge relations.
3 CASE STUDIES
3.1 Stage – Discharge Data
To demonstrate the application of GA and MT, daily time series of stage and discharge were taken from two stations on the Schuylkill River, USA: Berne (Station no. 01470500, Lat. 40º31'21'', Long. 75º59'55'') and Philadelphia (Station no. 01474500, Lat. 39º58'04'', Long. 75º11'20''). The catchment area of the Berne station is about 919.45 km² and that of the Philadelphia station is 4902.85 km². This information was obtained from the USGS website.
Data for the period October 01, 2000 to September 30, 2006 were taken for both stations. The first five years of data were used for training and the last year (October 01, 2005 to September 30, 2006) was used for testing at both stations. Statistical parameters of the training and testing sets at these sites are shown in Table I. The parameters T, σ, σ/T, C_sx, X_max and X_min are the mean, standard deviation, coefficient of variation, skewness, maximum and minimum values, respectively. The discharge ranges from 2.125 to 972.014 m³/s at the Berne station and from 2.239 to 1484.943 m³/s at the Philadelphia station. The corresponding stage ranges are 1.384 to 5.088 m and 1.686 to 3.463 m, respectively. The developed models are valid for these ranges.
Table I – Daily statistical parameters of the training and testing data sets for the two stations on the Schuylkill River
Data set  Station                Basin area (km²)  Data type  T       σ       σ/T   C_sx   X_max     X_min
Training  Berne 01470500         919.45            Stage*     1.65    0.22    0.13  2.25   3.418     1.384
Training  Berne 01470500         919.45            Flow*      21.95   26.72   1.22  5.23   399.574   2.125
Training  Philadelphia 01474500  4902.85           Stage      1.96    0.18    0.09  2.04   3.338     1.686
Training  Philadelphia 01474500  4902.85           Flow       97.15   111.18  1.14  3.88   1312.07   2.239
Testing   Berne 01470500         919.45            Stage      1.66    0.32    0.19  5.09   5.088     1.396
Testing   Berne 01470500         919.45            Flow       24.32   61.88   2.54  11.46  972.014   2.522
Testing   Philadelphia 01474500  4902.85           Stage      1.98    0.20    0.10  3.39   3.463     1.774
Testing   Philadelphia 01474500  4902.85           Flow       109.42  147.86  1.35  5.63   1484.943  17.258
*The units of stage and flow are m and m³/s, respectively.
3.2 Development of Models based on conventional methods
The stage-discharge rating curve (RC) and multiple linear regression (MLR) are taken as the conventional methods. The RC is developed in two forms: a simple power equation (RC1, which ignores the stage corresponding to zero discharge) and a slightly more complex form (RC2, which includes the stage corresponding to zero discharge). The developed RC and MLR models are given in Equations (8) to (10) for the Berne station and Equations (11) to (13) for the Philadelphia station. For the complex rating curve (RC2), the stage corresponding to zero discharge was fixed with the help of scatter plots of the training data. The reference stages used for this purpose were 1.418, 1.628 and 2.064 m for the Berne station and 1.765, 1.945 and 2.396 m for the Philadelphia station. The adopted stages corresponding to zero discharge are 1.223 m for Berne and 1.645 m for Philadelphia. The MLR models use a single independent variable so that their performance can be compared with the other models; they therefore reduce to the simple linear models shown in Equations (10) and (13).
$$Q = 0.441\,H^{7.036} \qquad (8)$$
$$Q = 93.951\,(H - 1.223)^{1.9885} \qquad (9)$$
$$Q = 116.069\,H - 170.356 \qquad (10)$$
$$Q = 0.055\,H^{10.512} \qquad (11)$$
$$Q = 670.039\,(H - 1.645)^{1.841} \qquad (12)$$
$$Q = 602.645\,H - 1084.43 \qquad (13)$$
In Equations (8) to (13), $Q$ is the discharge in m³/s and $H$ is the stage height in m measured above the reference datum.
3.3 Development of Models based on Genetic Algorithms (GAs)
The parameters ($a$, $\beta$ and $C$) of Equation (5) are optimized with GA. First, the parameters are estimated from the training set selected from the whole data. The resulting relation is then used to predict the discharge values in the testing set. The predicted values are compared with the measured values using the coefficient of determination and the root mean square error as statistical performance measures.
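
The two performance measures can be sketched in Python as follows. The text does not state which definition of the coefficient of determination is used; this sketch assumes the squared Pearson correlation between observed and predicted values, which is an assumption, and the function names are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equally long sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Squared Pearson correlation (assumed definition of R^2)."""
    n = len(observed)
    mean_o, mean_p = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


# A prediction that is perfectly correlated but twice too large.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 4.0, 6.0, 8.0]
print(r_squared(observed, predicted), rmse(observed, predicted))
```

Under this definition a systematically biased model can still score a high R², which is why the RMSE is reported alongside it.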

Figure 4 – Fitness convergence for the Philadelphia station (fitness versus generations, 0 to 20 generations; inset: generations 10 to 20)
A function program was written in the Matlab environment to carry out the optimization. The population size was fixed at 200 with a uniform creation function. Tournament selection of size 4 with rank scaling was selected, the adaptive feasible mutation function was used, and two-point crossover and forward migration were set in the program. The program was run five times for each of the Berne and Philadelphia stations, and the parameters giving the best fitness value were recorded. The fitness convergence for the Philadelphia training set is shown in Figure 4; similar behaviour was observed for the Berne station. Figure 4 shows that the fitness function reaches its minimum value of 19.63 m³/s at the 14th generation for the Philadelphia station. For the Berne station, the minimum of 3.67 m³/s was also reached at the 14th generation.
The parameter values at these fitness values are used in the final relations between stage and discharge. The values of the parameters ($a$, $\beta$ and $C$) are 1.262, 1.765 and 94.848 for the Berne station and 1.695, 1.526 and 630 for the Philadelphia station. The explicit GA models for the Berne and Philadelphia stations are given in Equations (14) and (15), respectively.
$$Q = 94.848\,(H - 1.262)^{1.765} \qquad (14)$$
$$Q = 630\,(H - 1.695)^{1.526} \qquad (15)$$
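
As a quick check, the GA models can be evaluated at the maximum recorded stages of Table I (5.088 m at Berne and 3.463 m at Philadelphia). The sketch below uses only the parameter values quoted above; the function and variable names are illustrative.

```python
def ga_discharge(stage, a, beta, c):
    """GA rating curve Q = C * (H - a) ** beta, with Q in m3/s and H in m."""
    return c * (stage - a) ** beta


# Parameter values (a, beta, C) reported for the two stations.
BERNE = {"a": 1.262, "beta": 1.765, "c": 94.848}
PHILADELPHIA = {"a": 1.695, "beta": 1.526, "c": 630.0}

q_berne = ga_discharge(5.088, **BERNE)
q_phila = ga_discharge(3.463, **PHILADELPHIA)
print(round(q_berne), round(q_phila))
```

The predictions come out near 1013 and 1503 m³/s, against observed maxima of about 972 and 1485 m³/s.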

3.4 Development of Models based on Model Tree (MT)
MT models are formulated based on the error function given in Equation (4). The minimum number of instances was set to four. The training and testing sets are the same as those used for the GA models. The rule sets produced by the program are shown in Table II. The rules are applied to the input time series, and the value is decided according to the fitness function.

Table II - Model tree logic sets.

Berne Station (01470500)
Rules:
If H_t <= 1.572 [721/1.785%] : Rule 1
elseif H_t <= 1.691 [503/4.826%] : Rule 2
elseif H_t <= 1.929 [430/7.216%] : Rule 3
elseif H_t <= 2.247 [131/13.753%] : Rule 4
else [41/16.241%] : Rule 5
end

Philadelphia Station (01474500)
Rules:
If H <= 1.898 [408/2.114%] : Rule 1
elseif H <= 1.901 [369/4.163%] : Rule 2
elseif H <= 1.984 [417/2.216%] : Rule 3
elseif H <= 2.057 [257/1.539%] : Rule 4
elseif H <= 2.228 [245/3.151%] : Rule 5
elseif H <= 2.467 [93/6.874%] : Rule 6
else [37/13.366%] : Rule 7
end

Based on Table II, five linear models were developed for the Berne station and seven for the Philadelphia station. The linear models are shown in Equations (16) and (17), respectively.

LM 1 : Q_t = 54.5153 · H_t - 74.1687
LM 2 : Q_t = 78.4768 · H_t - 111.6473
LM 3 : Q_t = 106.0997 · H_t - 158.8781
LM 4 : Q_t = 142.2248 · H_t - 228.7286
LM 5 : Q_t = 225.1834 · H_t - 420.9018
(16)

LM 1 : Q_t = 339.3015 · H_t - 590.8252
LM 2 : Q_t = 23.5523 · H_t - 25.4191
LM 3 : Q_t = 465.2298 · H_t - 828.8627
LM 4 : Q_t = 520.3345 · H_t - 938.6954
LM 5 : Q_t = 628.692 · H_t - 1163.6821
LM 6 : Q_t = 808.0967 · H_t - 1563.0307
LM 7 : Q_t = 985.0411 · H_t - 2009.3265
(17)
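
The Berne rules of Table II and the linear models of Equation (16) combine into a piecewise linear predictor, sketched below in Python. Two assumptions are made: Rule n is taken to select LM n, and the intercepts are taken as negative, which keeps adjacent models nearly continuous at the split points. The names are illustrative.

```python
# (upper stage bound in m, slope, intercept) for each linear model at Berne.
# Assumption: Rule n of Table II selects LM n of Equation (16).
BERNE_MODELS = [
    (1.572, 54.5153, -74.1687),
    (1.691, 78.4768, -111.6473),
    (1.929, 106.0997, -158.8781),
    (2.247, 142.2248, -228.7286),
    (float("inf"), 225.1834, -420.9018),
]


def mt_discharge(stage, models=BERNE_MODELS):
    """Return discharge (m3/s) from the first rule whose bound covers `stage`."""
    for upper, slope, intercept in models:
        if stage <= upper:
            return slope * stage + intercept
    raise ValueError("stage not covered by any rule")


for h in (1.5, 1.8, 3.0):
    print(h, mt_discharge(h))
```

Ordering the rules by increasing upper bound mirrors the if/elseif chain of Table II, so the first matching bound decides which local model is applied.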

3.5 Results and discussions
The GEP models presented by Guven and Aytek (2009) for the Berne and Philadelphia stations are shown in Equations (18) and (19), respectively. In these equations, Q is the discharge in m³/s and h is the stage height in m measured from the datum. Guven and Aytek showed the usefulness of GEP models over conventional models. This study compares the results of their models with those of the models developed here. Model performance is measured by the coefficient of determination (R²) and the root mean square error (RMSE), which are widely used in many research areas. Table III shows the performance values of all the models.
743.27738.4313.10
65.1
−+=

hhQ
(18)



349.8)/715.42(421.54925.42
22
−−+−= hhhhQ (19)


Table III - R² and RMSE values for the testing period
Models  Berne 01470500      Philadelphia 01474500
        R²       RMSE       R²       RMSE
RC1     0.78     2142       0.668    1674.8
RC2     0.993    25.7       0.985    43.6
MLR     0.866    31.5       0.941    42.2
GA      0.997    5.9        0.998    5.8
MT      0.970    13.3       0.998    7.3
GEP*    0.942    61.9       0.998    23.1
* Guven and Aytek (2009)

Table III shows that, for the Berne station, the models developed using GA (R² = 0.997 and RMSE = 5.9) and
MT (R² = 0.970 and RMSE = 13.3) give considerably lower RMSE than the conventional models and than the
model proposed by the earlier researchers. Similarly, for the Philadelphia station, the GA model (R² = 0.998
and RMSE = 5.8) and the MT model (R² = 0.998 and RMSE = 7.3) perform better than the conventional models.
The RMSE values of both models are also lower than those of the GEP model proposed by Guven and Aytek (2009)
at both stations (61.9 at Berne and 23.1 at Philadelphia).
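The two performance measures reported in Table III can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the coefficient of determination is taken here as 1 - SS_res / SS_tot (a squared correlation coefficient is another common definition), and the sample discharges are invented for demonstration only.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the same units as the data."""
    n = len(observed)
    squared_errors = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared_errors / n)


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical discharges in m^3/s, invented for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print(f"RMSE = {rmse(observed, predicted):.3f}")
print(f"R2   = {r_squared(observed, predicted):.3f}")
```

RMSE is expressed in the units of discharge, so it penalizes large absolute errors at high flows, whereas R² is dimensionless; reporting both, as Table III does, guards against a model that scores well on one measure only.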
4 CONCLUSIONS

In this study, an optimization tool, the Genetic Algorithm (GA), and a data-driven technique, the Model Tree (MT),
are used to develop the relation between river stage and discharge. The results of the GA and MT models are
compared with those of models developed by conventional methods and with the GEP model. The results show that
the GA and MT models perform better than the conventional models. Between the two, the GA model appears
superior to the MT model. For the maximum recorded stage of 5.088 m at Berne, the GA model predicted a
discharge of 1013 m³/s, which is close to the observed value of 972 m³/s. Similarly, at the Philadelphia station,
for the maximum recorded stage of 3.463 m, the predicted value is 1503 m³/s whereas the observed value is
1485 m³/s. It can therefore be said that the GA model also gives good results at higher stages. The proposed
methodology is expected to be useful for other sites.
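The closeness of the GA predictions at the maximum recorded stages can be quantified as a relative error. The short sketch below uses only the predicted and observed discharges quoted above; the function name relative_error_percent is illustrative.

```python
def relative_error_percent(predicted, observed):
    """Signed relative error of a prediction, as a percentage of the observed value."""
    return 100.0 * (predicted - observed) / observed


# (predicted, observed) discharges in m^3/s at the maximum recorded stage,
# as quoted in the conclusions.
peaks = {
    "Berne": (1013.0, 972.0),
    "Philadelphia": (1503.0, 1485.0),
}

for station, (predicted, observed) in peaks.items():
    error = relative_error_percent(predicted, observed)
    print(f"{station}: {error:+.1f}%")
# Berne: +4.2%
# Philadelphia: +1.2%
```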

Acknowledgments: The authors would like to thank the USGS for making the data freely downloadable from its
web site.
5 REFERENCES
Aytek, A. and Kisi, O. (2008). A genetic programming approach to suspended sediment modelling. Journal of
Hydrology, 351, 288-298.
Babovic, V. and Keijzer, M. (2002). Rainfall runoff modelling based on genetic programming. Nordic Hydrology,
33(5), 331-346.
Baiamonte, G. and Ferro, V. (2007). Simple flume for flow measurement in sloping open channel. Journal of Irrig.
Drain. Eng., 133(1), 71-78.
Bhattacharya, B. and Solomatine, D.P. (2005). Neural networks and M5 model trees in modelling water level-discharge
relationship. Neurocomputing, 63, 381-396.
Bhattacharya, B., Price, R.K. and Solomatine, D.P. (2005). Data-driven modelling in the context of sediment transport.
Journal of Physics and Chemistry of the Earth, 30(4-5), 297-302.
Cheng, C., Wu, X. and Chau, K.W. (2005). Multiple criteria rainfall-runoff model calibration using a parallel genetic
algorithm in a cluster of computers. J. Hydrol. Sciences, 50(6), 1069-1087.
Dorado, J., Rabunal, J.R., Pazos, A., Rivero, D., Santos, A. and Puertas, J. (2003). Prediction and modelling of the
rainfall-runoff transformation of a typical urban basin using ANN and GP. Applied Artificial Intelligence, 17(4),
329-343.
Giustolisi, O. (2004). Using genetic programming to determine Chezy resistance coefficient in corrugated channels. J.
Hydroinformatics, 6(3), 157-173.
Goldberg, D.E. (1989). Genetic algorithms in search, optimization, and machine learning. Addison-Wesley, Boston.
Guven, A. and Aytek, A. (2009). New approach for stage-discharge relationship: Gene-expression programming. J.
Hydrologic Eng., 14(8), 812-820.
Habib, E.H. and Meselhe, E.A. (2006). Stage-discharge relations for low-gradient tidal streams using data-driven
models. J. Hydraul. Eng., 132(5), 482-492.
Henderson, F.M. (1966). Open channel flow. The Macmillan Company, New York.
Jain, S.K. (2008). Development of integrated discharge and sediment rating relation using a compound neural network.
J. Hydrologic Eng., 13(3), 124-131.
Jain, S.K. and Chalisgaonkar, D. (2000). Setting up stage-discharge relations using ANN. J. Hydrologic Eng., 5(4),
428-433.
Kumar, D.N. and Reddy, M.J. (2007). Multipurpose reservoir operation using Particle Swarm Optimization. J. Water
Res. Plan. Manage., 133(3), 192-201.
Petersen-Øverleir, A. (2006). Modelling stage-discharge relationships affected by hysteresis using the Jones formula and
nonlinear regression. J. Hydrological Sciences, 51(3), 365-388.
Quinlan, J.R. (1992). Learning with continuous classes. Proceedings Australian Joint Conference on Artificial
Intelligence, 343-348. World Scientific, Singapore.
Rabunal, J.R., Puertas, J., Suarez, J. and Rivero, D. (2007). Determination of the unit hydrograph of a typical urban
basin using genetic programming and artificial neural networks. Hydrol. Process., 21, 476-485.
Reddy, M.J. and Ghimire, B.N.S. (2009). Use of Model tree and Gene expression programming to predict the suspended
sediment load in rivers. J. Intelligent Systems, 18(3), 211-227.
Savic, D.A., Walters, G.A. and Davidson, J.W. (1999). A genetic programming approach to rainfall-runoff modelling.
Water Resource Management, 13, 219-231.
Sefe, F.T.K. (1996). A study of the stage-discharge relationship of the Okavango River at Mohembo, Botswana. J.
Hydrological Sciences, 41(1), 97-116.
Sivapragasam, C., Maheswaran, R. and Venkatesh, V. (2008). Genetic programming approach for flood routing in
natural channels. Hydrol. Process., 22, 623-628.
Solomatine, D.P. (2002). Computational intelligence techniques in modelling water systems: some applications. IEEE,
0-7803-7278, 2, 1853-1858.
Solomatine, D.P. and Dulal, K.N. (2003). Model tree as an alternative to neural network in rainfall-runoff modeling.
Hydrological Sciences Journal, 48(3), 399-411.
Subramanya, K. (2006). Engineering Hydrology. Tata McGraw-Hill, New Delhi.
Sudheer, K.P. and Jain, S.K. (2003). Radial basis function Neural Network for modelling rating curves. J. Hydrologic
Eng., 8(3), 161-164.
Tawfik, M., Ibrahim, A. and Fahmy, H. (1997). Hysteresis sensitive Neural Network for modelling rating curves. J.
Computing Civil Eng., 11(3), 206-211.
Tung, Y.K., Yen, B.C. and Melching, C.S. (2006). Hydrosystems engineering reliability assessment and risk analysis.
McGraw-Hill, New York.
Tayfur, G. and Singh, V.P. (2006). ANN and fuzzy logic models for simulating event-based rainfall-runoff. J. Hydraul.
Eng., 132(12), 1321-1330.
USGS website, http://www.usgs.gov.
Wang, Y. and Witten, I.H. (1997). Induction of model trees for predicting continuous classes. Proc. European
Conference on Machine Learning, University of Economics, Faculty of Informatics and Statistics, Prague.
***