Dr. Svetla Bachvarova


8 Nov 2013


PC TIME SERIES DATABASE ON CEREALS GROWING IN BULGARIA

Dr. Svetla Bachvarova

National Center of Agrarian Sciences

Sofia 1373, 30 Suhodolska str.

E-mail: s.batchvarova@mzgar.government.bg


Abstract.

Cereal crop production is of strategic importance to Bulgarian agriculture. A comprehensive analysis of the factors that influence cereal yields, and sound decision making, require the study of real data organized in time series by means of modern information technologies. The paper presents a data model for storing and organizing data on cereal crop growing in Bulgaria by regions. Applying the model repeatedly makes it possible to create a PC time series database. The model also ensures that the database can be updated, which is important because it allows the data to be studied as time series in the future. Integrating this database with statistical software gives the opportunity for time series analysis. A computer simulation of applying statistical software to trend modeling is presented.

Key words: data model, statistical software, simulation, decision making




Introduction

Cereal crops are of strategic importance both globally and for Bulgaria. Many factors influence the size of the sown area and the crop production obtained. Over the last 10-15 years cereal growing in Bulgaria has declined. A brief analysis of FAOSTAT and EUROSTAT data on wheat production in Bulgaria confirms this tendency. The estimated average annual production for the three-year period 1990-1992 is 4410.6 thousand tons, while for the period 2004-2006 it is 3584.4 thousand tons. This drop of 18.7% is worrying, but at the same time it reveals that the country possesses resources that have not been used properly during the last 10 years. A thorough analysis of the factors that influence cereal yields, the elaboration of forecasts and sound decision making all require the study of real data arranged in time series by means of modern information technologies.
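The 18.7% figure quoted above follows directly from the two period averages. A minimal check, using only the two numbers given in the text (the variable names are illustrative):

```python
# Period averages of wheat production quoted in the text (same unit for both).
avg_1990_1992 = 4410.6
avg_2004_2006 = 3584.4

# Relative drop between the two three-year averages, in percent.
drop_pct = (avg_1990_1992 - avg_2004_2006) / avg_1990_1992 * 100
print(f"Drop: {drop_pct:.1f}%")
```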

The Internet-based databases maintained by the international organisations FAOSTAT and EUROSTAT are usually very large, but do not provide detailed information for individual countries. In principle, very large databases are not built for a specific research purpose and they have an archival function [Javier C., 2004]. The information about particular fields for each country is very limited or almost absent [Brooks P. et al., 1996]. Only general data on the quantity of cereals and wheat produced is presented on the Internet sites of FAOSTAT and EUROSTAT. FAOSTAT covers the period 1990-2004, while EUROSTAT presents data for a shorter but more recent period, 1995-2006. There are no data on agro-technical measures (machinery, fertilisers), regions, employment, etc. Therefore managers involved in cereal production at the national and regional level cannot rely on detailed data from the Internet.

PC-based databases at branch level give the opportunity not only to store detailed data in time series [Tegos, G. et al., 2004], but also to provide easy-to-use software techniques for processing it [Onkov K. et al., 2005].

Massive data referring to a separate branch, especially cereals, is stored annually in a mainframe database, since the statistical methodology for data collection, evaluation and publishing is ordinarily carried out yearly. If the “historical” data extracted from this type of database is in ASCII txt format, then the known approaches of Stanford’s TSIMMIS project [Hammer J. et al., 1995] and [Garcia-Molina H. et al., 1995] should be taken into consideration.

The basic purpose of this paper is to present the data model for storing and organizing data on cereal growing in Bulgaria in a PC time series database, as well as an example of the results to be expected when the database is integrated with statistical software for trend modeling.


Data model

The data model for storing and organizing data on cereal growing in Bulgaria is of the relational type. It has to represent precisely the variables and objects as well as their relationships. Usually a high-level manager in the field of cereal production is interested in the yield, price, machinery, and applied fertilizers and pesticides, and studies the values of these variables by region and by cereal. The relational data model must also organize the data in time series (Table 1), which additionally complicates the data model.


Table 1 Variables and objects of data model

Variables      Objects      Time series on
Yield          Regions
Price          Cereals
Machinery
Fertilisers
Pesticides












The data model is developed for only one variable, yield. In this way the model stays simple and clear. The developed data model can then be applied repeatedly to all the discussed variables to create the PC database.

Three data sets are evident when creating the data model for one variable, yield: regions, cereals and time series (Figure 1).




Figure 1 Data model for regions, cereals and yield


Three tables are designed to apply the data model to the database: Regions, Cereals and Yield. The first two tables consist of the lists of the names of regions and cereals respectively. Each row (record) of the table “Yield” contains exactly one time series. Each column of this table stores the value for the corresponding year.

This data organization is very important for updating the database. A yearly update only requires appending one number (the new annual value) to each time series. In this way the database is open-ended, which is significant because it allows the data to be studied as time series in the future.
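The three tables and the yearly update can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in SQLite module as a stand-in for the DBMS, and the column names (region_id, cereal_id, y2004, ...), the helper append_year and all values are hypothetical, not taken from the paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Lookup tables: lists of region and cereal names.
cur.execute("CREATE TABLE Regions (region_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)")
cur.execute("CREATE TABLE Cereals (cereal_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)")

# One row of Yield = one time series (one cereal in one region);
# one column per year.
cur.execute("""
    CREATE TABLE Yield (
        cereal_id INTEGER NOT NULL REFERENCES Cereals(cereal_id),
        region_id INTEGER NOT NULL REFERENCES Regions(region_id),
        y2004 REAL,
        y2005 REAL,
        PRIMARY KEY (cereal_id, region_id)
    )
""")

cur.execute("INSERT INTO Regions (region_id, name) VALUES (1, 'Region1')")
cur.execute("INSERT INTO Cereals (cereal_id, name) VALUES (1, 'Wheat')")
cur.execute("INSERT INTO Yield VALUES (1, 1, 100.0, 110.0)")  # made-up values


def append_year(cur, year, values):
    """Yearly update: add a column for `year` and store one new number per series."""
    column = f"y{int(year)}"  # int() keeps the generated column name safe
    cur.execute(f"ALTER TABLE Yield ADD COLUMN {column} REAL")
    for (cereal_id, region_id), value in values.items():
        cur.execute(
            f"UPDATE Yield SET {column} = ? WHERE cereal_id = ? AND region_id = ?",
            (value, cereal_id, region_id),
        )


append_year(cur, 2006, {(1, 1): 95.0})
series = cur.execute(
    "SELECT y2004, y2005, y2006 FROM Yield WHERE cereal_id = 1 AND region_id = 1"
).fetchone()
print(series)
```

Existing rows are never rewritten during an update; only the new year's value is added to each series, which is what keeps the database open-ended.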


There are two ways of accessing the time series:

Regions → Cereals → Yield (time series)     (1)

Cereals → Regions → Yield (time series)     (2)


Only one of these two ways is implemented using the Database Management System (DBMS) “Access”. The first one (1) is more appropriate because the number of cereals is smaller than the number of regions. For the case of Bulgaria, the table “Yield” contains 5 subtables corresponding to the cereals, and each one has 8 rows for the regions. The relations (2) can be produced by the software for “Data views”.
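The idea of storing one access path and deriving the other as a view can be illustrated with nested mappings. This is a sketch, not the Access implementation: the cereal and region names are placeholders (the paper gives only the counts, 5 and 8), and view_by_region is a hypothetical helper.

```python
# Hypothetical names; the paper gives only the counts (5 cereals, 8 regions).
cereals = [f"Cereal{i}" for i in range(1, 6)]
regions = [f"Region{i}" for i in range(1, 9)]

# Stored layout: one subtable per cereal, one row (time series) per region.
# Empty lists stand in for the yearly values.
yield_by_cereal = {c: {r: [] for r in regions} for c in cereals}


def view_by_region(stored):
    """Build the alternative access path as a view, without copying the series."""
    view = {}
    for cereal, rows in stored.items():
        for region, series in rows.items():
            view.setdefault(region, {})[cereal] = series
    return view


yield_by_region = view_by_region(yield_by_cereal)
print(len(yield_by_cereal), len(yield_by_region))
```

Because the view holds references to the same series objects, a value appended through one access path is visible through the other.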

Preparing the following queries is important for managers and scientists:

- Missing or neglected quantities of yield by regions and by cereals;
- Percentage of quantities of yield by regions and by cereals;
- Generalization by regions or by cereals;
- Forming groups of regions in geographic areas.
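The four kinds of queries listed above can be sketched in a few lines. The example below is an assumption-laden illustration for a single year: the region names, cereal names, quantities and the grouping of regions into areas are all made up, and the function names are hypothetical.

```python
# Made-up yields for one year, keyed by (region, cereal); None marks a missing value.
yields = {
    ("Region1", "Wheat"): 300.0,
    ("Region1", "Barley"): 100.0,
    ("Region2", "Wheat"): 500.0,
    ("Region2", "Barley"): None,
    ("Region3", "Wheat"): 200.0,
    ("Region3", "Barley"): 50.0,
}
# Hypothetical grouping of regions into geographic areas.
areas = {"Region1": "North", "Region2": "North", "Region3": "South"}


def missing(data):
    """Keys whose quantity has not been recorded."""
    return sorted(k for k, v in data.items() if v is None)


def totals(data, key_index):
    """Generalization: sum by region (key_index=0) or by cereal (key_index=1)."""
    out = {}
    for key, value in data.items():
        if value is not None:
            out[key[key_index]] = out.get(key[key_index], 0.0) + value
    return out


def shares(totals_by_key):
    """Percentage of each total in the grand total."""
    grand = sum(totals_by_key.values())
    return {k: 100.0 * v / grand for k, v in totals_by_key.items()}


def totals_by_area(data, area_of):
    """Regional totals regrouped into geographic areas."""
    out = {}
    for region, value in totals(data, 0).items():
        out[area_of[region]] = out.get(area_of[region], 0.0) + value
    return out


print(missing(yields))
print(shares(totals(yields, 1)))
print(totals_by_area(yields, areas))
```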


Simulation experiment

The developed system on packaging and integration [Onkov K., 2006] has software instruments to integrate a PC time series database with statistical software. Its software modules make it possible to apply the proper procedures for calculating basic statistics of time series (mean value, standard deviation, variance) as well as for time series analysis: trend modeling and forecasting.
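As a small illustration of the basic statistics named above, the sketch below uses Python's standard statistics module on a made-up series. It assumes the sample (n - 1) form of the variance; the text does not say which form the statistical software uses.

```python
import statistics

# Made-up yearly values for one time series (one row of the Yield table).
series = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.mean(series)
variance = statistics.variance(series)  # sample variance (n - 1 in the denominator)
std_dev = statistics.stdev(series)      # square root of the sample variance

print(mean, variance, std_dev)
```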

The following simulation experiment shows the results of trend modeling of wheat production for two Bulgarian regions, named “Region1” and “Region2”.

The software automatically checks the adequacy of the models and chooses the best model. Figure 2 presents the real data for both regions.


[Figure 2: line chart of Quantity /thousand tons/ (axis from 0 to 1800) against Year (1999 to 2006); legend: “Region 1”, “Model, Region1”, “Region 2”]

Figure 2 Quantity of yield from two regions and trend model


Four trend models are studied for each region: linear, second- and third-degree polynomial, and exponential. The second-degree polynomial model for Region 1 is adequate at the 0.95 confidence level. For Region 2 none of the models is adequate. The difficulty in finding an adequate model is due to the relatively large decline in yield in 2003.
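The four trend models can be fitted by ordinary least squares. The sketch below is not the statistical software used in the experiment: it is a plain-Python illustration on a made-up series, it reports only the coefficient of determination R^2 as a rough fit measure, and it does not reproduce the adequacy test, which the text does not specify. The exponential model is fitted on log-transformed values, which is one common choice and an assumption here.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def polyfit(t, y, degree):
    """Least-squares coefficients c[0] + c[1] t + ... via the normal equations."""
    k = degree + 1
    a = [[sum(ti ** (i + j) for ti in t) for j in range(k)] for i in range(k)]
    b = [sum(yi * ti ** i for ti, yi in zip(t, y)) for i in range(k)]
    return solve(a, b)


def r_squared(y, fitted):
    """Coefficient of determination of the fitted values."""
    mean = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def fit_trends(y):
    """Fit the four trend models; return {name: (predict, R^2)}."""
    t = list(range(len(y)))  # time index 0, 1, 2, ... instead of calendar years
    models = {}
    for name, degree in [("linear", 1), ("polynomial-2", 2), ("polynomial-3", 3)]:
        c = polyfit(t, y, degree)
        predict = lambda x, c=c: sum(ci * x ** i for i, ci in enumerate(c))
        models[name] = (predict, r_squared(y, [predict(ti) for ti in t]))
    # Exponential y = a * exp(b t), fitted as a straight line through log(y); needs y > 0.
    ln_a, b = polyfit(t, [math.log(v) for v in y], 1)
    predict = lambda x, ln_a=ln_a, b=b: math.exp(ln_a + b * x)
    models["exponential"] = (predict, r_squared(y, [predict(ti) for ti in t]))
    return models


# Made-up series following an exact quadratic, y = 1000 - 60 t + 5 t^2.
demo = [1000 - 60 * t + 5 * t * t for t in range(8)]
results = fit_trends(demo)
for name, (_, r2) in results.items():
    print(f"{name}: R^2 = {r2:.4f}")
```

Calling a fitted predict function with the next time index (8 for this series) gives a one-step forecast. Note that R^2 never decreases when the polynomial degree is raised, so it cannot by itself choose the best model; a formal adequacy test is still needed for that.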


Conclusion

The PC time series database uses a data model that covers cereal growing in Bulgaria by yields and regions. Integrating this database with statistical software provides data processing: computation of basic statistics, trend modeling and forecasting. Finally, the paper suggests a solution for creating and updating a PC time series database on cereal growing in Bulgaria that will support the decision making of managers at the regional and national level.


References:

1. Brooks P., Ch. Wollenweber, 1996, “Reporting Against Large Databases. DBMS Data Warehouse Supplement”, www.dbmsmag.com/index.shtml

2. Garcia-Molina H., J. Hammer, K. Ireland, Y. Papakonstantinou, J. Ullman, and J. Widom, 1995, “Integrating and Accessing Heterogeneous Information Sources in TSIMMIS”, In Proceedings of the AAAI Symposium on Information Gathering, Stanford, California, pp. 61-64.

3. Javier C., 2004, Seminario ITI, Fast Nearest Neighbours Search in High-Dimensional Large-Sized Databases.

4. Hammer J., H. Garcia-Molina, K. Ireland, Y. Papakonstantinou, J. Ullman, and J. Widom, 1995, “Information Translation, Mediation, and Mosaic-Based Browsing in the TSIMMIS System”, In Exhibits Program of the Proceedings of the ACM SIGMOD International Conference on Management of Data, San Jose, California, p. 483.

5. Onkov K., Tegos G., 2005, “An approach to software packaging of PC time series database relying on statistical data from mainframe computer”, IEEE Journal of Annual School. Lectures, Vol. 25, N 2, pp. 3-7.

6. Onkov K., 2006, “System on packaging and integration of PC time series database and statistical software”, Proceedings of the conference “Objekty 2006”, Prague, November, pp. 245-251.

7. Tegos G., K. Onkov, 2004, “Ecological aspects of data series computer analysis concerning quantities of sea fish catches by principal species and fishing areas”, Journal of Environmental Protection and Ecology, vol. 5, No 4, pp. 836-843.