PC TIME SERIES DATABASE ON CEREALS GROWING IN BULGARIA
Dr. Svetla Bachvarova
National Center of Agrarian Sciences
Sofia 1373, 30 Suhodolska str.
E-mail: s.batchvarova@mzgar.government.bg
Abstract.
Cereal crop production is of strategic importance to Bulgarian agriculture. A comprehensive analysis of the factors that influence cereal yields, and sound decision making, require the study of real data organized in time series by means of modern information technologies. The paper presents a data model for storing and organizing data on cereal crops grown in Bulgaria by regions. Applying the model repeatedly, once per variable, allows a PC time series database to be created. This structure makes the database easy to update, which is important because it keeps the time series open to further study in the future. Integrating the database with statistical software makes time series analysis possible. A computer simulation of the application of statistical software to trend modeling is presented.
Key words: data model, statistical software, simulation, decision making
Introduction
Cereal crops are of strategic importance both globally and for Bulgaria. Many factors influence the size of the sown area and the crop production obtained. Over the last 10-15 years cereal growing in Bulgaria has declined. A brief analysis of FAOSTAT and EUROSTAT data on wheat production in Bulgaria confirms this tendency. The estimated average annual production for the three-year period 1990-1992 is 4410.6 thousand tons, while for the period 2004-2006 it is 3584.4 thousand tons. This drop of 18.7% is worrying, but at the same time it shows that the country possesses resources that have not been used properly during the last 10 years. A thorough analysis of the factors that influence cereal yields, the elaboration of forecasts, and sound decision making all require the study of real data arranged in time series by means of modern information technologies.
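The 18.7% figure can be checked directly from the two period averages quoted above. The short sketch below only reproduces that arithmetic; the variable names are illustrative and the values are taken as given in the text, in whatever unit they share.

```python
# Three-year average wheat production figures quoted in the text (same unit).
avg_1990_1992 = 4410.6
avg_2004_2006 = 3584.4

# Relative decline between the two periods, in percent.
drop_percent = (avg_1990_1992 - avg_2004_2006) / avg_1990_1992 * 100
print(f"{drop_percent:.1f}%")
```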
The Internet-based databases maintained by the international organisations FAOSTAT and EUROSTAT are usually very large, but they do not provide detailed information for individual countries. In principle, very large databases are not built for a specific research purpose; they have an archival function [Javier C., 2004]. The information on particular fields for each country is very limited or almost absent [Brooks P. et al., 1996]. The Internet sites of FAOSTAT and EUROSTAT present only general data on the quantity of cereals and wheat produced. FAOSTAT covers the period 1990-2004, while EUROSTAT presents data for a shorter but more recent period, 1995-2006. There are no data on agro-technical measures (machinery, fertilisers), regions, employed persons, etc. Therefore managers involved in cereal production at the national and regional level cannot rely on the Internet for detailed data.
PC-based databases at branch level not only make it possible to store detailed data in time series [Tegos, G. et al., 2004], but also provide easy-to-use software techniques for processing it [Onkov K. et al., 2005]. Massive data on a single branch, especially on cereals, is stored annually in a mainframe database, since the statistical methodology for data collection, evaluation and publishing is ordinarily carried out yearly. If the “historical” data extracted from this type of database is in ASCII txt format, then the known approaches of Stanford’s TSIMMIS project [Hammer J. et al., 1995] and [Garcia-Molina H. et al., 1995] should be taken into consideration.
The basic purpose of this paper is to present a data model for storing and organizing data on cereals growing in Bulgaria in a PC time series database, as well as an example of the results to be expected when the database is integrated with statistical software for trend modeling.
Data model
The data model for storing and organizing data on cereals growing in Bulgaria is of the relational type. It has to represent precisely the variables and objects as well as their relationships. A high-level manager in the field of cereal production is usually interested in yield, price, machinery, and applied fertilizers and pesticides, and studies the values of these variables by region and by cereal. The relational data model also has to organize the data in time series (Table 1), which additionally complicates the model.
Table 1 Variables and objects of data model

Variables (time series on)    Objects
Yield                         Regions
Price                         Cereals
Machinery
Fertilisers
Pesticides
The data model is developed for only one variable, yield, which keeps the model simple and clear. The PC database can then be created by applying the developed data model repeatedly, once for each of the variables discussed.
Three data sets are evident when creating the data model for the single variable yield: regions, cereals and time series (Figure 1).
Figure 1 Data model for regions, cereals and yield
Three tables are designed to apply the data model to the database: Regions, Cereals and Yield. The first two tables hold the lists of region names and cereal names respectively. Each row (record) of the table “Yield” contains exactly one time series. Each column of this table stores the value for the corresponding year.
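The three-table layout described above can be sketched as follows. This is a minimal illustration only: it uses Python's built-in sqlite3 module as a stand-in for the DBMS, and the column names (`region_id`, `cereal_id`, `y1999` and so on) and the span of years are assumptions made for the example, not details taken from the paper.

```python
import sqlite3

# Hypothetical span of annual columns; the real database defines its own.
YEARS = range(1999, 2007)

conn = sqlite3.connect(":memory:")

# Two lookup tables holding the lists of region and cereal names.
conn.execute(
    "CREATE TABLE Regions (region_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)
conn.execute(
    "CREATE TABLE Cereals (cereal_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)

# One row of Yield is one time series; one column per year holds the annual value.
year_columns = ", ".join(f"y{year} REAL" for year in YEARS)
conn.execute(
    f"""CREATE TABLE Yield (
        cereal_id INTEGER NOT NULL REFERENCES Cereals(cereal_id),
        region_id INTEGER NOT NULL REFERENCES Regions(region_id),
        {year_columns},
        PRIMARY KEY (cereal_id, region_id)
    )"""
)

# List the resulting column names of the Yield table.
yield_columns = [row[1] for row in conn.execute("PRAGMA table_info(Yield)")]
print(yield_columns)
```

The composite primary key enforces the rule that each (cereal, region) pair owns exactly one time series.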
This data organization is very important for updating the database. A yearly update only requires appending one number (the new annual value) to each time series. The database is therefore open-ended, which is significant because it allows the time series to be studied further in the future.
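A yearly update of this kind might look like the sketch below. It again uses sqlite3 as a stand-in DBMS; the helper `append_year`, the simplified Yield table with text keys, and all values are hypothetical and serve only to show that one new number per time series is sufficient.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified Yield table with two existing annual columns (placeholder values).
conn.execute("CREATE TABLE Yield (cereal TEXT, region TEXT, y2005 REAL, y2006 REAL)")
conn.execute("INSERT INTO Yield VALUES ('Wheat', 'Region 1', 100.0, 110.0)")


def append_year(conn, year, new_values):
    """Add a column for `year` and store one new value per (cereal, region) series."""
    column = f"y{int(year)}"  # int() keeps the generated column name safe
    conn.execute(f"ALTER TABLE Yield ADD COLUMN {column} REAL")
    for (cereal, region), value in new_values.items():
        conn.execute(
            f"UPDATE Yield SET {column} = ? WHERE cereal = ? AND region = ?",
            (value, cereal, region),
        )


append_year(conn, 2007, {("Wheat", "Region 1"): 120.0})
row = conn.execute("SELECT * FROM Yield").fetchone()
print(row)
```

Existing values are never rewritten, so earlier years remain intact after each update.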
There are two ways of accessing the time series:
Regions → Cereals → Yield (time series)    (1)
Cereals → Regions → Yield (time series)    (2)
Only one of these two ways is implemented using the Database Management System (DBMS) “Access”. The first one (1) is more appropriate because the number of cereals is smaller than the number of regions. For the case of Bulgaria, the table “Yield” contains 5 subtables corresponding to the cereals, and each subtable has 8 rows, one per region. The relations (2) can be obtained with the software for “Data views”.
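The idea that one access path is stored and the other is derived as a view can be illustrated with a small sketch. The function `build_view`, the flat row layout and all values are assumptions for illustration; the paper's own “Data views” software is not reproduced here.

```python
from collections import defaultdict

# Flat Yield rows: (cereal, region, time series). Placeholder values only.
rows = [
    ("Wheat", "Region 1", [1.0, 2.0]),
    ("Wheat", "Region 2", [3.0, 4.0]),
    ("Barley", "Region 1", [5.0, 6.0]),
]


def build_view(rows, outer, inner):
    """Nest the rows as view[outer_key][inner_key] -> time series."""
    view = defaultdict(dict)
    for cereal, region, series in rows:
        keys = {"cereal": cereal, "region": region}
        view[keys[outer]][keys[inner]] = series
    return dict(view)


# The same series reached through both orderings of the two objects.
by_cereal = build_view(rows, "cereal", "region")
by_region = build_view(rows, "region", "cereal")
print(by_cereal["Wheat"]["Region 2"])
print(by_region["Region 1"]["Barley"])
```

Both views point at the same underlying series, so only one physical organization has to be maintained.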
The preparation of the following queries is important for managers and scientists:
- Missing or neglected quantities of yield by regions and by cereals;
- Percentage of quantities of yield by regions and by cereals;
- Generalization by regions or by cereals;
- Forming groups of regions in geographic areas.
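The four kinds of query listed above can be sketched on a single year of data. Everything in this example is hypothetical: the cereal and region names, the grouping of regions into areas, and the numbers are placeholders, and plain Python dictionaries stand in for the database queries.

```python
# One year of yield values keyed by (cereal, region); None marks a missing value.
# All names and numbers are placeholders, not data from the paper.
yield_by_key = {
    ("Wheat", "Region 1"): 300.0,
    ("Wheat", "Region 2"): 100.0,
    ("Barley", "Region 1"): None,
    ("Barley", "Region 2"): 50.0,
}
# Hypothetical grouping of regions into geographic areas.
areas = {"North": ["Region 1"], "South": ["Region 2"]}

# Missing quantities by region and by cereal.
missing = [key for key, value in yield_by_key.items() if value is None]


def totals_by(position):
    """Sum the known values by cereal (position 0) or by region (position 1)."""
    totals = {}
    for key, value in yield_by_key.items():
        if value is not None:
            totals[key[position]] = totals.get(key[position], 0.0) + value
    return totals


# Generalization by cereals and by regions.
by_cereal = totals_by(0)
by_region = totals_by(1)

# Percentage of the wheat total contributed by each region.
wheat_share = {
    region: 100.0 * value / by_cereal["Wheat"]
    for (cereal, region), value in yield_by_key.items()
    if cereal == "Wheat" and value is not None
}

# Groups of regions aggregated into geographic areas.
by_area = {
    area: sum(by_region.get(region, 0.0) for region in regions)
    for area, regions in areas.items()
}
print(missing, wheat_share, by_area)
```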
Simulation experiment
The developed system for packaging and integration (Onkov K., 2006) has software instruments for integrating a PC time series database with statistical software. Its software modules make it possible to apply the proper procedures for calculating basic statistics of time series (mean value, standard deviation, variance) as well as for time series analysis: trend modeling and forecasting.
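The basic statistics named above can be computed for one time series as in the sketch below. It uses Python's standard statistics module rather than the system described in the paper, the series is a placeholder, and the choice of the sample (n - 1) variance is an assumption, since the text does not say which definition the software applies.

```python
import statistics

# Placeholder time series, not data from the paper.
series = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean_value = statistics.mean(series)
variance = statistics.variance(series)  # sample variance, divides by n - 1
std_dev = statistics.stdev(series)      # square root of the sample variance
print(mean_value, variance, std_dev)
```

If the population definitions are wanted instead, `statistics.pvariance` and `statistics.pstdev` are the corresponding functions.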
The following simulation experiment shows the results of trend modeling of wheat production for two Bulgarian regions, named “Region 1” and “Region 2”. The software tests the adequacy of the models and chooses the best model automatically. Figure 2 presents the real data for both regions.
[Figure 2: line chart. Horizontal axis: Year, 1999 to 2006. Vertical axis: Quantity /thousand tons/, 0 to 1800. Series: Region 1; Model, Region 1; Region 2.]
Figure 2 Quantity of yield from two regions and trend model
Four trend models are studied for each region: linear, second-degree polynomial, third-degree polynomial and exponential. The second-degree polynomial model for Region 1 is adequate at the 0.95 level of significance. For Region 2 none of the models is adequate. The difficulty in finding an adequate model is due to the relatively large decline in yield in 2003.
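Fitting the four trend models can be sketched with ordinary least squares in plain Python. This is not the software used in the experiment: the series is a made-up quadratic, all function names are hypothetical, and the coefficient of determination R² is used here only as a simple goodness-of-fit measure, since the adequacy test applied by the statistical software is not specified in the text.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_polynomial(t, y, degree):
    """Least-squares coefficients c[0] + c[1]*t + ... from the normal equations."""
    size = degree + 1
    a = [[sum(ti ** (i + j) for ti in t) for j in range(size)] for i in range(size)]
    b = [sum(yi * ti ** i for ti, yi in zip(t, y)) for i in range(size)]
    return solve(a, b)


def predict_polynomial(coeffs, t):
    """Evaluate the fitted polynomial at every time point."""
    return [sum(c * ti ** k for k, c in enumerate(coeffs)) for ti in t]


def r_squared(y, fitted):
    """Coefficient of determination of the fitted values against the data."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def fit_trends(y):
    """Return R^2 for the linear, quadratic, cubic and exponential trend models."""
    t = [i - (len(y) - 1) / 2 for i in range(len(y))]  # centred time index
    scores = {}
    for name, degree in [("linear", 1), ("polynomial2", 2), ("polynomial3", 3)]:
        coeffs = fit_polynomial(t, y, degree)
        scores[name] = r_squared(y, predict_polynomial(coeffs, t))
    # Exponential trend y = a * exp(b * t), fitted as a line through log(y); needs y > 0.
    log_coeffs = fit_polynomial(t, [math.log(v) for v in y], 1)
    fitted = [math.exp(v) for v in predict_polynomial(log_coeffs, t)]
    scores["exponential"] = r_squared(y, fitted)
    return scores


# Hypothetical eight-year series, not data from the paper: an exact quadratic.
series = [20.0 + 2.0 * i - 0.5 * i * i for i in range(8)]
scores = fit_trends(series)
print(scores)
```

In-sample R² never decreases as the polynomial degree grows, so a real model choice needs a statistical adequacy test of the kind the paper's software performs; the sketch only shows how the candidate models are fitted.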
Conclusion
The PC time series database uses a data model that describes cereals growing in Bulgaria by yields and regions. Integrating this database with statistical software makes it possible to process the data: computation of basic statistics, trend modeling and forecasting. Finally, the paper suggests a solution for creating and updating a PC time series database on cereals growing in Bulgaria that will support the decision making of managers at the regional and national level.
References:
1. Brooks P., Ch. Wollenweber, 1996, “Reporting Against Large Databases. DBMS Data Warehouse Supplement”, www.dbmsmag.com/index.shtml
2. Garcia-Molina H., J. Hammer, K. Ireland, Y. Papakonstantinou, J. Ullman, and J. Widom, 1995, “Integrating and Accessing Heterogeneous Information Sources in TSIMMIS”, In Proceedings of the AAAI Symposium on Information Gathering, Stanford, California, pp. 61-64.
3. Javier C., 2004, Seminario ITI, Fast Nearest Neighbours Search in High-Dimensional Large-Sized Databases.
4. Hammer J., H. Garcia-Molina, K. Ireland, Y. Papakonstantinou, J. Ullman, and J. Widom, 1995, “Information Translation, Mediation, and Mosaic-Based Browsing in the TSIMMIS System”, In Exhibits Program of the Proceedings of the ACM SIGMOD International Conference on Management of Data, San Jose, California, p. 483.
5. Onkov K., Tegos G., 2005, “An approach to software packaging of PC time series database relying on statistical data from mainframe computer”, IEEE Journal of Annual School. Lectures, Vol. 25, N 2, pp. 3-7.
6. Onkov K., 2006, “System on packaging and integration of PC time series database and statistical software”, Proceedings of the conference “Objekty 2006”, Prague, November, pp. 245-251.
7. Tegos, G., K. Onkov, 2004, “Ecological aspects of data series computer analysis concerning quantities of sea fish catches by principal species and fishing areas”, Journal of Environmental Protection and Ecology, vol. 5, No 4, pp. 836-843.