Data Mining

voraciousdrabSoftware and s/w Development

Dec 14, 2013 (3 years and 8 months ago)


Chapter 4


IT infrastructure:

Provides the platform for supporting all information systems in the business.


Computer hardware


Computer software


Data management technology

• Organizes and manages the firm's data


Networking and telecommunications technology


Technology services




IT Infrastructure Components





Types of Computers:

Computers come in different sizes with varying capabilities for processing information.


1- Servers:

• Type of midrange computer.

• Support a computer network, sharing files and resources.


2- Mainframes:

• Large-capacity, high-performance computer that can process large amounts of data very rapidly.


3- Supercomputer:

• Computer used for complex calculations with thousands of variables and millions of measurements.

• Used in engineering, among other fields.


4- Grid computing:

• Power of geographically remote computers connected into a single network to act as a “virtual supercomputer”.


5- Client/server computing:

• Form of distributed computing

• Splits processing between “clients” and “servers”

• Clients: user point of entry

• Servers: store and process shared data and perform network management activities
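The split between clients and servers can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the price table, the `PriceHandler` class, and the `ask_server` function are hypothetical names invented for the example, and both sides run on one machine over the loopback address.

```python
import socket
import socketserver
import threading

# Hypothetical shared data held on the server side (invented for illustration).
PRICES = {"widget": "4.50", "gadget": "12.00"}


class PriceHandler(socketserver.StreamRequestHandler):
    """Server side: stores shared data and answers client requests."""

    def handle(self):
        item = self.rfile.readline().decode().strip()
        reply = PRICES.get(item, "unknown")
        self.wfile.write((reply + "\n").encode())


def ask_server(address, item):
    """Client side: the user's point of entry; sends a request, reads the reply."""
    with socket.create_connection(address) as conn:
        conn.sendall((item + "\n").encode())
        return conn.makefile().readline().strip()


# Port 0 asks the operating system to pick any free port.
server = socketserver.TCPServer(("127.0.0.1", 0), PriceHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

widget_price = ask_server(server.server_address, "widget")
missing_price = ask_server(server.server_address, "sprocket")

server.shutdown()
server.server_close()
```

The client knows nothing about how prices are stored; it only sends a request and displays the answer, while the server holds the shared data.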


Client/Server Computing




A Multitiered Client/Server Network (N-Tier)






Primary secondary storage technologies:

- Magnetic disk: hard drives

- Flash memory: USB flash drives

- Optical disks: CD-ROM, CD-RW, DVD

- Magnetic tape

- Storage networking: storage area networks (SANs)




Input devices:

Gather data and convert them into electronic form.


• Keyboard

• Computer mouse

• Touch screen

• Digital scanner

• Audio input

• Sensors


Output devices:

Display data after they have been processed.


• Monitor

• Printer

• Audio output


- Information systems collect and process information in one of two ways:

• Batch processing

• Online processing
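The difference between the two modes can be sketched as follows, assuming the usual definitions: online processing handles each transaction as soon as it is entered, while batch processing accumulates transactions and processes them together later. The account data and all names here are hypothetical.

```python
def apply_transaction(balances, txn):
    """Apply one (account, amount) transaction to the balances."""
    account, amount = txn
    balances[account] = balances.get(account, 0) + amount


def online_processing(balances, txn):
    """Online: process the transaction immediately when it is entered."""
    apply_transaction(balances, txn)


class BatchProcessor:
    """Batch: accumulate transactions and process them together later."""

    def __init__(self):
        self.pending = []

    def submit(self, txn):
        self.pending.append(txn)

    def run(self, balances):
        for txn in self.pending:
            apply_transaction(balances, txn)
        processed = len(self.pending)
        self.pending.clear()
        return processed


# Online: the balance changes right away.
online_balances = {"A": 100}
online_processing(online_balances, ("A", -30))

# Batch: nothing changes until the batch is run.
batch_balances = {"A": 100}
batch = BatchProcessor()
batch.submit(("A", -30))
batch.submit(("A", 50))
balance_before_run = batch_balances["A"]
processed_count = batch.run(batch_balances)
```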





The emerging mobile digital platform

• Wireless communications through 3G cell networks and Wi-Fi.

• New software apps.



Cloud Computing:

A model of computing in which firms and individuals obtain computing resources over the Internet.


• Cloud infrastructure as a service

• Cloud platform as a service

• Cloud software as a service


Multicore processors:

• Integrated circuit with two or more processors

• Enhanced performance


Operating System Software:

The software that manages and controls the computer’s activities.


- PC operating systems and graphical user interfaces (GUIs)

• Windows XP, Windows Vista, and Windows Server 2003

• UNIX

• Linux: open-source software



- Application Software:

Application programming languages for business:

• COBOL

• C, C++

• Visual Basic: Visual programming language



Software packages and desktop productivity tools:


Word processing software


Spreadsheet software


Data management software


Presentation graphics


Software suites


Web browsers



Software for the Web: Java, AJAX, and HTML.


Java:

Operating system-independent, processor-independent programming language.

AJAX:

Allows a client and server to exchange data behind the scenes.

Hypertext markup language (HTML):

Language for creating Web pages with links to other pages and objects.



Web services:

• Software components that exchange information with one another using universal Web communication standards and languages

• XML (extensible markup language)

• SOAP (simple object access protocol)

• WSDL (Web services description language)

• UDDI (universal description, discovery, and integration)

• SOA (service-oriented architecture)
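As a rough illustration of how XML lets two software components exchange information in a shared format, the sketch below builds a small XML message and reads it back with Python's standard library. The element names `priceRequest` and `partNumber` are invented for the example; this is plain XML, not a SOAP envelope or a real service interface.

```python
import xml.etree.ElementTree as ET


def build_price_request(part_number):
    """Sender side: encode a request as an XML string."""
    request = ET.Element("priceRequest")  # hypothetical element name
    ET.SubElement(request, "partNumber").text = str(part_number)
    return ET.tostring(request, encoding="unicode")


def read_part_number(message):
    """Receiver side: parse the XML string and extract the part number."""
    return int(ET.fromstring(message).findtext("partNumber"))


message = build_price_request(137)
part_number = read_part_number(message)
```

Because both sides agree only on the message format, they can be written in different languages and run on different platforms.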



Software Trends:


Open Source Software


Linux, Apache



Cloud Computing


Google Apps, Office Web Apps





Direct costs: hardware and software purchase costs.

Indirect costs: technical support, training.

Hidden costs: support staff, downtime.











Chapter 5



Database:

• Collection of related files containing records on people, places, or things.

• Prior to digital databases, businesses kept such records in paper files.

Entity:

• Generalized category representing a person, place, or thing on which we store and maintain information

• E.g., SUPPLIER, PART

Attributes:

• Specific characteristics of each entity:

• SUPPLIER: name, address

• PART: description, unit price, supplier


Relational database:

• Organizes data into two-dimensional tables (relations) with columns and rows.

• One table for each entity:

• E.g., CUSTOMER, SUPPLIER, PART, SALES

• Fields (columns) store data representing an attribute.

• Rows store data for separate records, or tuples.



Key field: uniquely identifies each record.


Primary key: (not repeated)

• One field in each table

• Cannot be duplicated

• Provides a unique identifier for all information in any row
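A minimal sketch of a table with a primary key, using Python's built-in `sqlite3` module and an in-memory database. The `supplier_number` column and the sample rows are assumptions made for illustration; only the SUPPLIER entity with its name and address attributes comes from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each column is a field; supplier_number (hypothetical) is the primary key.
conn.execute(
    """
    CREATE TABLE supplier (
        supplier_number INTEGER PRIMARY KEY,
        name            TEXT NOT NULL,
        address         TEXT
    )
    """
)

# Each inserted row is one record (tuple).
conn.execute("INSERT INTO supplier VALUES (1, 'Acme Parts', '12 Main St')")
conn.execute("INSERT INTO supplier VALUES (2, 'Bolt Supply', '9 Oak Ave')")

# A primary key value cannot be duplicated, so this insert is rejected.
try:
    conn.execute("INSERT INTO supplier VALUES (1, 'Duplicate Co', 'Nowhere')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM supplier").fetchone()[0]
```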


Relational database tables may have:

• One-to-one relationship

• One-to-many relationship

• Many-to-many relationship



Entity-relationship diagram

• Used to clarify table relationships in a relational database


A Simple Entity-Relationship Diagram



DBMS (Database Management Systems)

Examples: Microsoft Access, DB2, Oracle Database, Microsoft SQL Server, MySQL



Operations of a Relational DBMS


Select:

• Creates a subset of all records meeting stated criteria


Join:

• Combines relational tables to present the user with more information than is available from individual tables


Project:

• Creates a subset consisting of columns in a table

• Permits the user to create new tables containing only desired information
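The three operations can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumed data: the SUPPLIER and PART tables follow the entities named earlier, but the key columns and every row are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE supplier (
        supplier_number INTEGER PRIMARY KEY,
        name TEXT,
        address TEXT
    );
    CREATE TABLE part (
        part_number INTEGER PRIMARY KEY,
        description TEXT,
        unit_price REAL,
        supplier_number INTEGER REFERENCES supplier(supplier_number)
    );
    INSERT INTO supplier VALUES
        (1, 'Acme Parts', '12 Main St'),
        (2, 'Bolt Supply', '9 Oak Ave');
    INSERT INTO part VALUES
        (137, 'Door latch', 22.0, 1),
        (150, 'Door seal', 6.0, 2),
        (152, 'Compressor', 54.0, 1);
    """
)

# Select: the subset of records (rows) meeting a stated criterion.
selected = conn.execute(
    "SELECT * FROM part WHERE unit_price > 20 ORDER BY part_number"
).fetchall()

# Join: combine two tables to show more than either table holds alone.
joined = conn.execute(
    """
    SELECT part.part_number, part.description, supplier.name
    FROM part
    JOIN supplier ON part.supplier_number = supplier.supplier_number
    ORDER BY part.part_number
    """
).fetchall()

# Project: the subset of columns that is actually wanted.
projected = conn.execute(
    "SELECT description, unit_price FROM part ORDER BY part_number"
).fetchall()
```

In SQL the three operations map onto the `WHERE` clause (select), the `JOIN` clause (join), and the column list after `SELECT` (project).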


Capabilities of Database Management Systems


Data definition capabilities:


Data dictionary:


Querying and reporting:


Object-Oriented DBMS (OODBMS)

• Stores data and the procedures that act on those data as objects to be retrieved and shared

• Better suited for storing graphic objects, drawings, and video than a DBMS designed for structuring data only



Tools for analyzing:


Data warehousing


Multidimensional data analysis


Data mining


Utilizing Web interfaces to databases



Data warehouse:

• Database that stores current and historical data that may be of interest to decision makers

• Data can be accessed but not altered


Data mart:

• Subset of a data warehouse that is highly focused and isolated for a specific population of users




Business intelligence:

Tools for consolidating, analyzing, and providing access to large amounts of data to improve decision making

• Data mining


Online Analytical Processing (OLAP)

Supports multidimensional data analysis, letting users view the same data in different ways using multiple dimensions
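The idea of viewing the same data along different dimensions can be sketched in plain Python. The sales figures and the `summarize` helper below are hypothetical, and a real OLAP tool would work on far larger data with precomputed aggregates.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, quarter, units sold).
SALES = [
    ("nuts", "East", "Q1", 50),
    ("nuts", "West", "Q1", 30),
    ("bolts", "East", "Q1", 20),
    ("bolts", "East", "Q2", 40),
    ("nuts", "West", "Q2", 10),
]

# Position of each dimension inside a fact tuple.
DIMENSIONS = {"product": 0, "region": 1, "quarter": 2}


def summarize(facts, *dims):
    """Total units sold, grouped by the chosen dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[DIMENSIONS[d]] for d in dims)
        totals[key] += fact[3]
    return dict(totals)


# The same facts viewed two different ways.
by_product = summarize(SALES, "product")
by_region_quarter = summarize(SALES, "region", "quarter")
```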



Data Mining

Finds hidden patterns and relationships in large databases and infers rules from them to predict future behavior.



Types of information obtainable from data mining


Associations


Sequences


Classifications


Clusters


Forecasting
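To make the first of these types concrete, an association describes items that tend to occur together, such as products bought in the same purchase. The sketch below counts item pairs and computes a simple confidence measure; the baskets are invented, and this is a toy calculation rather than a full association-rule algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (one set of items per purchase).
BASKETS = [
    {"chips", "cola"},
    {"chips", "cola", "salsa"},
    {"chips", "salsa"},
    {"cola", "bread"},
]


def pair_counts(baskets):
    """Count how many baskets contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def confidence(baskets, if_item, then_item):
    """Share of baskets containing if_item that also contain then_item."""
    with_if = [b for b in baskets if if_item in b]
    if not with_if:
        return 0.0
    return sum(then_item in b for b in with_if) / len(with_if)


counts = pair_counts(BASKETS)
chips_to_cola = confidence(BASKETS, "chips", "cola")
```

Here chips appear in three baskets and cola accompanies them in two, so the rule "chips implies cola" holds in two thirds of the relevant purchases.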


Data mining versus privacy concerns

• Can be used to create a detailed data image of each individual




Text Mining

• Unstructured data (mostly text files) accounts for 80 percent of an organization’s useful information.

• Text mining allows businesses to extract key elements from, discover patterns in, and summarize large unstructured data sets.
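A very small sketch of extracting key elements from unstructured text is to count the most frequent meaningful words. The complaint texts and the stop-word list are made up for illustration, and real text-mining tools go well beyond word counts.

```python
import re
from collections import Counter

# Hypothetical stop words: common words that carry little meaning.
STOP_WORDS = {"the", "a", "is", "was", "and", "my", "to", "it", "but"}


def key_terms(documents, top_n=2):
    """Return the top_n most frequent non-stop words across the documents."""
    counts = Counter()
    for text in documents:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)


# Hypothetical customer complaints (unstructured text).
COMPLAINTS = [
    "The delivery was late and the box was damaged.",
    "Late delivery again, but support was helpful.",
    "My refund is late.",
]

top_terms = key_terms(COMPLAINTS)
```

Even this crude count surfaces a pattern: lateness, and late deliveries in particular, dominates the complaints.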


Web Mining

• Discovery and analysis of useful patterns and information from the Web

• Content mining, structure mining, usage mining




Establishing an Information Policy



Information policy

• States the organization’s rules for organizing, managing, storing, and sharing information


Data administration

• Responsible for the specific policies and procedures through which data can be managed as a resource


Database administration

• Database design and management group responsible for defining and organizing the structure and content of the database and for maintaining the database.