Guidelines on the Application of New Information Technology to Population Data Dissemination
2. Statistical Information Systems

2.1 Networks: LAN/WAN/Intranet/Internet

From mainframes to personal computers
Information technology has developed rapidly during the past
decade. Previously, mainframe computer systems dominated the
computer market and represented the primary computing power of
most national statistical offices. Today mainframe computer
systems have diminished considerably in importance. While the
mainframe computer still offers massive power and storage
capacity, it has been superseded in many areas by the greater
flexibility, portability and wider range of powerful software
tools available for use with personal computers.

Networks
Personal computers can be connected together to form a network
of virtually any size or capacity. The computers may be located
on a single floor of a building, on multiple floors of the same
building, or across buildings, cities or even countries. Each
network is controlled by a file server, a more powerful computer
at the centre of the network. The software used to drive the
computers and manage data can reside on the file server or on
the hard disks of the individual computers, and data can likewise
be stored either centrally or on the individual hard disks.

LANs and WANs
A Local Area Network or LAN is the term used to describe a
situation where two or more computers are linked together
(usually by cable) within the same building. A Wide Area
Network or WAN is the term used to describe a situation where
two or more LANs are linked together (usually by telephone
cable) across buildings, cities or localities. LANs and WANs
connect computing systems sited both locally and at remote
locations. They enable the user to send and receive data and
information, to manipulate, cross tabulate, graph and
interrogate databases, and to present data in many meaningful
ways. The key to a successful LAN/WAN is to use a suitably
powerful file server to manage operations efficiently and to
provide the users with appropriate software tools.

Intranet
Broadly speaking, the term Intranet is used to describe a
situation where electronic mail, data and other information are
sent via a LAN/WAN. The system operates very much like the
Internet but is contained and accessed solely within a single
organisation. An Intranet may comprise a number of databases
located on numerous file servers, where access may be
restricted to personnel from a specific subject matter area or to
personnel at specific levels within the organisation.
However, the general sending and receiving of electronic mail is
usually unrestricted. A narrower definition restricts the term
Intranet to applications that use the TCP/IP (Internet)
protocols for data transfer within an organisation's private
network.

Internet
The Internet is a global network of interconnected computing
systems. The World Wide Web (WWW), with which the term is often
used interchangeably, is one of the services that runs on it.
The Internet is used extensively by millions of people every day
for electronic mail and to obtain information about, and to
purchase, products and services. Thousands of organisations
worldwide have recognised its growing importance and have
established home pages to promote themselves and to display
their products and services. The number of connections to the
Internet is growing daily and gives every national statistical
office an opportunity to reach the widest possible number of
potential customers.


2.2 Database Development and Data Warehousing

Database Management Systems
In a computing environment a database is a structured means of
storing data so that it can be held and retrieved efficiently.
Databases enable access to data and updating to be centrally
controlled, giving better security and integrity. Data
redundancy can be reduced by eliminating duplicate data and
storing only essential data. Databases make efficient use of
available disk space and can be designed for the efficient
extraction of data using specially developed software. A
database management system (DBMS) is the software that allows
users to create, maintain and report on the data.

Relational DBMS
There are several database models, including the relational,
hierarchical and network models. The relational database model
is commonly used when ad hoc queries need to be made and is
suitable for population data. Data is organised into a series of
relational tables divided into fields (columns) and rows
(records). Each table has a primary key, a field or combination
of fields that uniquely identifies each row. Individual tables
are linked together using a common field or combination of
fields, typically the primary key of one table repeated in the
other. Referential integrity can be used to ensure that links
between tables are valid for all entries in the table. Integrity
is maintained by checking these links every time data is
inserted, deleted or changed and allowing only those changes
that maintain the links. A database should be normalised, that
is, restructured so that each fact is stored only once and
redundant duplication is eliminated.
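
As a minimal sketch of the ideas above, the following Python example
uses the standard-library sqlite3 module to create two linked tables
and shows referential integrity rejecting an entry whose link is not
valid. The table names, column names and figures are hypothetical and
chosen only for illustration.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with two linked tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE region ("
        " region_code TEXT PRIMARY KEY,"
        " region_name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE population ("
        " region_code TEXT NOT NULL REFERENCES region(region_code),"
        " year INTEGER NOT NULL,"
        " persons INTEGER NOT NULL,"
        " PRIMARY KEY (region_code, year))"
    )
    conn.execute("INSERT INTO region VALUES ('R01', 'Northern')")
    conn.execute("INSERT INTO population VALUES ('R01', 1996, 120000)")
    conn.commit()
    return conn

def try_insert(conn, region_code, year, persons):
    """Return True if the row is accepted, False if an integrity check rejects it."""
    try:
        conn.execute(
            "INSERT INTO population VALUES (?, ?, ?)",
            (region_code, year, persons),
        )
        return True
    except sqlite3.IntegrityError:
        return False

conn = build_demo_db()
accepted = try_insert(conn, "R01", 1997, 123000)  # region R01 exists
rejected = try_insert(conn, "R99", 1997, 5000)    # no region R99: link is invalid
duplicate = try_insert(conn, "R01", 1996, 1)      # primary key already present
```

Here the population table repeats the primary key of the region table
(region_code), and the DBMS refuses any row that would break that link
or duplicate an existing key.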

Stand-alone databases
Databases may operate in a stand-alone environment or be
integrated into a data warehouse. Stand-alone databases are
easier and cheaper to establish. They store each year of each
data set in a separate database. This means that considerably
more data manipulation is needed to obtain time series data. In
this operating environment, to create a new data set from
several others, the user must extract the data from each
database and then, using another software package such as a
spreadsheet or database, merge the data sets together using a
common variable. However, access is generally quicker and less
complex than in a data warehouse situation.
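
The merge step described above can be sketched in Python as follows.
This is an illustration only: it assumes each yearly extract has
already been read into a list of rows, and the names area_code and
persons are hypothetical stand-ins for the common variable and the
figure of interest.

```python
def merge_yearly_extracts(extracts, key="area_code", value="persons"):
    """Combine per-year extracts into one time series per area.

    `extracts` maps a year to the rows extracted from that year's
    stand-alone database; each row is a dict holding the common
    variable (`key`) and the figure of interest (`value`).
    """
    series = {}
    for year in sorted(extracts):
        for row in extracts[year]:
            # Group by the common variable, then record the value by year.
            series.setdefault(row[key], {})[year] = row[value]
    return series

# Hypothetical extracts taken from two separate yearly databases.
extracts = {
    1996: [
        {"area_code": "A1", "persons": 1000},
        {"area_code": "A2", "persons": 2500},
    ],
    1991: [
        {"area_code": "A1", "persons": 900},
    ],
}
series = merge_yearly_extracts(extracts)
```

Areas that are missing from one of the yearly databases simply have a
shorter series, which the user must allow for when comparing periods.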

Data warehouses
The data warehouse concept has its own set of advantages. The
data warehouse is generally stored in a single area and all data
sets are available at the same time. This means that data sets
from different subject matter areas may be linked together using
a common variable and time series data can be compiled easily.
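
As a rough sketch of this advantage, the Python example below holds
two subject matter data sets in one sqlite3 database and links them
with common variables to compile a time series in a single query. The
tables, columns and figures are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two data sets from different subject areas, held in the same store.
conn.executescript("""
    CREATE TABLE population (area_code TEXT, year INTEGER, persons INTEGER);
    CREATE TABLE employment (area_code TEXT, year INTEGER, employed INTEGER);
    INSERT INTO population VALUES ('A1', 1991, 900), ('A1', 1996, 1000);
    INSERT INTO employment VALUES ('A1', 1991, 400), ('A1', 1996, 480);
""")

def time_series(conn, area_code):
    """Return (year, persons, employed) rows for one area, oldest first."""
    query = (
        "SELECT p.year, p.persons, e.employed "
        "FROM population AS p "
        "JOIN employment AS e "
        "  ON e.area_code = p.area_code AND e.year = p.year "
        "WHERE p.area_code = ? "
        "ORDER BY p.year"
    )
    return conn.execute(query, (area_code,)).fetchall()

rows = time_series(conn, "A1")
```

Because every year of both data sets sits in the same place, no
separate extraction and merging step is needed.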


2.3 Case Study 1: Integrated Regional Database (IRDB) – Australian
Bureau of Statistics
The Integrated Regional Database (IRDB) was developed by the Australian Bureau of
Statistics to accommodate a growing number of data sets containing data for small areas
such as regions and localities. Two needs, to compile time series data from the original
data sets and to provide a flexible presentation tool, resulted in the development of a
comprehensive software package. The basic principle behind the development of the
IRDB was to allow a user to tabulate data over time for a specific area, or areas.
The IRDB stores data from a number of different economic, social and demographic
surveys and covers a number of different time periods. The user first selects the
variables required, then the areas for which data is required and finally the time period or
periods desired. The software automatically extracts the data from the database and
displays it as a table, graph or map, as the user chooses.
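
The three-step selection described above can be sketched in Python as
follows. This is not the IRDB software itself, whose internals are not
described here; the record layout, function name and figures are
hypothetical.

```python
def select_table(records, variables, areas, periods):
    """Tabulate the chosen variables for the chosen areas and periods.

    `records` is a list of dicts with the keys "area", "period",
    "variable" and "value". The result maps (area, period) to a
    dict of {variable: value}.
    """
    table = {}
    for rec in records:
        if (rec["variable"] in variables
                and rec["area"] in areas
                and rec["period"] in periods):
            cell = table.setdefault((rec["area"], rec["period"]), {})
            cell[rec["variable"]] = rec["value"]
    return table

# Hypothetical small area records.
records = [
    {"area": "Region A", "period": 1991, "variable": "population", "value": 900},
    {"area": "Region A", "period": 1996, "variable": "population", "value": 1000},
    {"area": "Region A", "period": 1996, "variable": "dwellings", "value": 410},
    {"area": "Region B", "period": 1996, "variable": "population", "value": 2500},
]
table = select_table(records, {"population"}, {"Region A"}, {1991, 1996})
```

The resulting dictionary could then be passed to whatever table,
graph or map display the user selects.
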
The IRDB is relatively simple to use and does not require extensive training. The system
is operated with a mouse using point-and-click technology. A series of drop-down menus
guides the user through the data selection and extraction process.
The biggest trap for unwary IRDB users is the complexity of possible combinations of
available areas, items and time points. The initial concept of the IRDB was to place no
restrictions on the user's selection of areas, items or time points. However, this has
caused problems for inexperienced users who select inappropriate combinations of data
item, area and time point. Market research identified this as one of the problems for IRDB
users and it has been targeted as one of the main areas for improvement during
redevelopment.
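
One possible guard against inappropriate combinations is to check a
request against the combinations the database actually holds before
extracting anything. The Python sketch below illustrates that idea
only; it is not a description of how the redeveloped IRDB addresses
the problem, and all names and data are hypothetical.

```python
from itertools import product

def missing_combinations(available, items, areas, periods):
    """List the requested (item, area, period) combinations with no data.

    `available` is the set of combinations the database actually holds.
    """
    requested = product(sorted(items), sorted(areas), sorted(periods))
    return [combo for combo in requested if combo not in available]

# Hypothetical catalogue of what is held for one region.
available = {
    ("population", "Region A", 1991),
    ("population", "Region A", 1996),
    ("dwellings", "Region A", 1996),
}
missing = missing_combinations(
    available, {"population", "dwellings"}, {"Region A"}, {1991, 1996}
)
```

An empty result means every requested combination exists; otherwise
the list can be shown to the user before the extraction is run.
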
The IRDB is currently being redeveloped and is scheduled for release in June 1999. The
redeveloped IRDB will be supported by an on-demand data and metadata delivery
mechanism (the “pipeline”), which will greatly improve the much sought-after flexibility
and timeliness of data delivery to Australian Bureau of Statistics clients.
For further information contact: Frank Blanchfield, Director Geographic Technology
Division, Australian Bureau of Statistics, email: frank.blanchfield@abs.gov.au





2.4 Case Study 2: Statistical Information System (SISMAC) – Statistics
Bureau of Japan

SISMAC is the Statistical Information System of the Management and Co-ordination
Agency in Japan. Development of SISMAC commenced in 1989 and the first system was
installed on an IBM mainframe computer. The second version was started in 1993, and
that system used DBase2 as its DBMS on an IBM mainframe computer. The third version
of SISMAC, which is now in service, has been in operation since October 1998.
SISMAC is a statistical database system that provides almost all of the statistical data
produced by the Japan Statistics Bureau and the Statistics Centre to other government
ministries and agencies. SISMAC contains not only population data but also household and
economic data. The SISMAC service is not provided to the general public and is therefore
not available over the Internet.

SISMAC is provided to other ministries and governmental agencies through a direct
electronic link known as the Kasumigaseki-WAN (Wide Area Network), which connects
most government offices in Japan to the SISMAC database.

SISMAC has numerous functions that users can employ to extract data relevant to their
work and to modify how the data is displayed on screen. The user may search any table
within SISMAC and display the result of the search in a data table on their own personal
computer. The user may also download the retrieved data as a CSV (comma-separated
values) or text format file, modify the table display and arrange the data in different
styles. For instance, the user can swap the position of the rows and columns and amend
the table title and the row and column headings.
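
As an illustration of two of these functions, the Python sketch below
swaps the rows and columns of a small table and serialises the result
as CSV text using the standard-library csv module. It is not SISMAC
code; the function names and the data are hypothetical.

```python
import csv
import io

def transpose(table):
    """Swap the rows and columns of a rectangular table (list of equal-length rows)."""
    return [list(column) for column in zip(*table)]

def to_csv(table):
    """Serialise a table as CSV text, one line per row."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(table)
    return buffer.getvalue()

# Hypothetical search result: areas in rows, years in columns.
table = [
    ["Area", "1990", "1995"],
    ["Prefecture X", "100", "110"],
    ["Prefecture Y", "200", "190"],
]
swapped = transpose(table)   # years in rows, areas in columns
csv_text = to_csv(swapped)
```
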
Users can access the system from their own personal computers within their individual
offices by entering a user identification (ID) and password issued by the SISMAC
manager (the Japan Statistics Bureau). The Bureau manages the IDs and passwords for
each user. All passwords are changed annually to protect the integrity and security of
data access. The Japan Statistics Bureau now manages approximately 1,000 user
connections and their passwords.
The data tables contained within the SISMAC database are maintained by the Japan
Statistics Bureau and are updated on a regular basis to ensure that the latest data is
available quickly to users. An external supplier maintains the hardware.
For further information contact: Koki Toida, Statistics Bureau of Japan, email:
ktoida@stats.go.jp