2. Data Formats

honorableclunk, Software & Software Development

30 Oct 2013

ITEC 1011

Introduction to Information Technologies

2. Data Formats

Chapt. 3

Introduction

Examples (pp. 59-61)

Real-world data     Input device      Computer data
Dear Mom:           Keyboard          10110010…
[image]             Digital camera    10110010…


Format must be appropriate


The internal representation must be
appropriate for the type of processing to
take place (e.g., text, images, sound)

Rules/Conventions

- Proprietary formats
    - Unique to a product or company
    - E.g., Microsoft Word, Corel WordPerfect, IBM Lotus Notes
- Standards
    - Evolve two ways:
        - Proprietary formats become de facto standards (e.g., Adobe PostScript, Apple QuickTime)
        - A committee is struck to solve a problem (Moving Picture Experts Group, MPEG)

pp. 61-62

Standards Organizations

- ISO: International Organization for Standardization
- CSA: Canadian Standards Association
- ANSI: American National Standards Institute
- IEEE: Institute of Electrical and Electronics Engineers
- Etc.

Examples of Standards

Type of Data             Standards
Alphanumeric             ASCII, EBCDIC, Unicode
Image                    JPEG, GIF, PCX, TIFF
Motion picture           MPEG-2, QuickTime
Sound                    Sound Blaster, WAV, AU
Outline graphics/fonts   PostScript, TrueType, PDF

Why Standards?

- Standards are “arbitrary”
- They exist because they are
    - Convenient
    - Efficient
    - Flexible
    - Appropriate
    - Etc.

Alphanumeric Data

- Problem: Distinguishing between the number 123 (one hundred and twenty-three) and the characters “123” (one, two, three)
- Four standards for representing letters (alpha) and numbers:
    - BCD: Binary-coded decimal
    - ASCII: American Standard Code for Information Interchange
    - EBCDIC: Extended binary-coded decimal interchange code
    - Unicode

pp. 63-69
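
The distinction between the number 123 and the characters “123” can be made concrete with a minimal Python sketch. It is not part of the original slides, and the variable names are illustrative only. The number is shown as a single binary value, while the text is shown as three separate ASCII codes.

```python
number = 123   # the quantity one hundred and twenty-three
text = "123"   # the characters one, two, three

# The number fits in a single byte as a binary value.
number_bits = format(number, "08b")

# The text needs one ASCII code per character.
text_bits = " ".join(format(b, "08b") for b in text.encode("ascii"))

print(number_bits)  # 01111011
print(text_bits)    # 00110001 00110010 00110011
```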


Binary-Coded Decimal (BCD)

- Four bits per digit

Digit   Bit pattern
0       0000
1       0001
2       0010
3       0011
4       0100
5       0101
6       0110
7       0111
8       1000
9       1001

Note: the following bit patterns are not used: 1010, 1011, 1100, 1101, 1110, 1111

Example

7093 (base 10) = ? (in BCD)

   7    0    9    3
0111 0000 1001 0011
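
The digit-by-digit conversion can be sketched in Python as follows. This is an illustration rather than anything from the slides; the function name `to_bcd` is hypothetical, and the sketch handles non-negative integers only.

```python
def to_bcd(number: int) -> str:
    """Encode a non-negative decimal integer as BCD, four bits per digit."""
    if number < 0:
        raise ValueError("this sketch handles non-negative integers only")
    # Each decimal digit is converted independently to its 4-bit pattern.
    return " ".join(format(int(digit), "04b") for digit in str(number))


print(to_bcd(7093))  # 0111 0000 1001 0011
```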


The Problem

- Representing text strings, such as “Hello, world”, in a computer

Codes and Characters

- Each character is coded as a byte
- Most common coding system is ASCII (pronounced “ass-key”)
- ASCII = American National Standard Code for Information Interchange
- Defined in ANSI document X3.4-1977

ASCII Features

- 7-bit code
- 8th bit is unused (or used for a parity bit)
- 2^7 = 128 codes
- Two general types of codes:
    - 95 are “Graphic” codes (displayable on a console)
    - 33 are “Control” codes (control features of the console or communications channel)
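
The counts above can be checked with a short Python sketch, which is not part of the original slides. It assumes the usual split of the code range: 32 through 126 are graphic codes, and 0 through 31 plus 127 (DEL) are control codes. The helper `with_even_parity` is hypothetical, and even parity is an assumption, since the slide does not say which parity scheme is meant.

```python
# Codes 32-126 (space through tilde) are the graphic codes;
# codes 0-31 and 127 (DEL) are the control codes.
graphic = [code for code in range(2 ** 7) if 32 <= code <= 126]
control = [code for code in range(2 ** 7) if code < 32 or code == 127]

print(len(graphic), len(control))  # 95 33


def with_even_parity(code: int) -> int:
    """Use the 8th bit as a parity bit (even parity is an assumption here)."""
    parity = bin(code).count("1") % 2  # 1 if the 7-bit code has an odd number of 1 bits
    return (parity << 7) | code


print(format(with_even_parity(ord("a")), "08b"))  # 11100001
```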

ASCII Chart

[Chart not reproduced. Successive slides annotate it with: most significant bit and least significant bit; e.g., ‘a’ = 1100001; 95 graphic codes; 33 control codes; alphabetic codes; numeric codes; punctuation, etc.]

“Hello, world” Example

Character   Binary     Hexadecimal   Decimal
H           01001000   48            72
e           01100101   65            101
l           01101100   6C            108
l           01101100   6C            108
o           01101111   6F            111
,           00101100   2C            44
(space)     00100000   20            32
w           01110111   77            119
o           01101111   6F            111
r           01110010   72            114
l           01101100   6C            108
d           01100100   64            100
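
As a minimal Python sketch, not part of the original slides, the binary, hexadecimal, and decimal ASCII codes can be derived directly from each character of the string. The variable names are illustrative.

```python
text = "Hello, world"

# One row per character: (character, 8-bit binary, 2-digit hex, decimal).
rows = [(ch, format(ord(ch), "08b"), format(ord(ch), "02X"), ord(ch)) for ch in text]

for ch, binary, hexadecimal, decimal in rows:
    print(f"{ch!r:5} {binary}  {hexadecimal}  {decimal}")
```

Each character maps to exactly one code, so repeated letters such as `l` and `o` produce identical rows.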

Common Control Codes

Code   Hexadecimal code   Meaning
CR     0D                 carriage return
LF     0A                 line feed
HT     09                 horizontal tab
DEL    7F                 delete
NUL    00                 null
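
Several of these control codes have escape notations in programming languages. The following Python sketch is an illustration only, not from the slides; it looks up the codes through Python's own string escapes, and the dictionary name is hypothetical.

```python
# Python string escapes for the control codes; DEL has no dedicated escape.
control_codes = {
    "CR": ord("\r"),   # carriage return
    "LF": ord("\n"),   # line feed
    "HT": ord("\t"),   # horizontal tab
    "DEL": 0x7F,       # delete
    "NUL": ord("\0"),  # null
}

for name, code in control_codes.items():
    print(f"{name:4} {code:02X}")
```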


Terminology

Learn the names of the special symbols:

[ ]   brackets
{ }   braces
( )   parentheses
@     commercial ‘at’ sign
&     ampersand
~     tilde


Escape Sequences

- Extend the capability of the ASCII code set
- For controlling terminals and formatting output
- Defined by ANSI in documents X3.41-1974 and X3.64-1977
- The escape code is ESC = 1B (hex)
- An escape sequence begins with two codes:

    ESC        [
    1B (hex)   5B (hex)

Examples

- Erase display: ESC [ 2 J
- Erase line: ESC [ K
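
A minimal Python sketch, not from the slides, of how these two sequences are built from the ESC code followed by `[`. The constant names are illustrative. The sketch prints the byte values rather than the sequences themselves, because writing them to a terminal that honors them would erase its display.

```python
ESC = "\x1b"      # escape code, hex 1B
CSI = ESC + "["   # the two-code introducer: hex 1B 5B

ERASE_DISPLAY = CSI + "2J"
ERASE_LINE = CSI + "K"

# Show the underlying byte values instead of sending them to the terminal.
print(ERASE_DISPLAY.encode("ascii").hex(" "))  # 1b 5b 32 4a
print(ERASE_LINE.encode("ascii").hex(" "))     # 1b 5b 4b
```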


EBCDIC

- Extended BCD Interchange Code (pronounced “ebb-se-dick”)
- 8-bit code
- Developed by IBM
- Rarely used today
    - IBM mainframes only
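
To see how EBCDIC differs from ASCII, the following Python sketch compares the codes for a few characters. It is an illustration only: the slides do not name a specific EBCDIC code page, so the use of Python's `cp037` codec (one common EBCDIC variant) is an assumption.

```python
# "cp037" is one common EBCDIC code page; choosing it here is an assumption.
codes = {ch: (ch.encode("ascii")[0], ch.encode("cp037")[0]) for ch in "Aa0"}

for ch, (ascii_code, ebcdic_code) in codes.items():
    print(f"{ch}: ASCII {ascii_code:02X}, EBCDIC {ebcdic_code:02X}")
```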



Unicode

- 16-bit standard (as originally designed)
- Developed by a consortium
- Intended to supersede older 7- and 8-bit codes

Unicode Version 2.1

- 1998
- Improves on version 2.0
- Includes the Euro sign (20AC hex = €)
- From the standard:

    …contains 38,887 distinct coded characters derived from the supported scripts. These characters cover the principal written languages of the Americas, Europe, the Middle East, Africa, India, Asia, and Pacifica.

http://www.unicode.org
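
As a small illustration, not from the slides, the Euro sign's code point can be inspected in Python. The UTF-16 and UTF-8 encodings shown are not discussed in the slides and are included only to show how the 16-bit code value appears as bytes.

```python
euro = chr(0x20AC)  # the character at code point 20AC (hex)

code_point = f"U+{ord(euro):04X}"
utf16_bytes = euro.encode("utf-16-be").hex()  # the 16-bit value as two bytes
utf8_bytes = euro.encode("utf-8").hex()       # the same character in UTF-8

print(code_point)   # U+20AC
print(utf16_bytes)  # 20ac
print(utf8_bytes)   # e282ac
```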

Keyboard Input

- Key (“scan”) codes are converted to ASCII
- ASCII code sent to host computer
- Received by the host as a “stream” of data
- Stored in buffer
- Processed
- Etc.

p. 69

Shift Key

- Inhibits bit 5 in the ASCII code

Key(s)      ASCII code (bits 6 5 4 3 2 1 0)   Character
a           1 1 0 0 0 0 1                     a
Shift + a   1 0 0 0 0 0 1                     A
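
The effect of the Shift key on a letter's code can be sketched as a bit mask in Python. This is an illustration, not from the slides; the function name `shift` is hypothetical, and the sketch applies to lowercase letters only.

```python
BIT_5 = 0b0100000  # bit 5 of the 7-bit code (decimal 32)


def shift(ch: str) -> str:
    """Clear bit 5, as Shift does for a lowercase letter."""
    return chr(ord(ch) & ~BIT_5)


print(format(ord("a"), "07b"), "->", format(ord(shift("a")), "07b"))  # 1100001 -> 1000001
print(shift("a"))  # A
```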

Control Key

- Inhibits bits 5 & 6 in the ASCII code

Key(s)     ASCII code (bits 6 5 4 3 2 1 0)   Character
c          1 1 0 0 0 1 1                     c
Ctrl + c   0 0 0 0 0 1 1                     ETX (control code)
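
The Control key's effect can be sketched the same way, by clearing bits 5 and 6. This is a Python illustration, not from the slides, and the function name `control` is hypothetical.

```python
BITS_5_AND_6 = 0b1100000  # bits 5 and 6 of the 7-bit code


def control(ch: str) -> int:
    """Clear bits 5 and 6, as Ctrl does, and return the resulting control code."""
    return ord(ch) & ~BITS_5_AND_6


print(format(control("c"), "07b"))  # 0000011, the ETX control code
```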

Other Input

- OCR (optical character recognition)
- Bar code readers
- Voice/audio input
- Punched cards
- Images / objects
- Pointing devices

pp. 69-86

OCR

[Figure: a page of text (“Hello, world”) is optically scanned into a computer file (10110110…)]


Bar Codes


An automatic identification (Auto ID)
technology that streamlines identification
and data collection


See http://www.digital.net/barcoder/barcode.html


Voice/audio Input

- Input device: microphone
- Audio input is “digitized” and stored
- Processed in two ways:
    - As is (no recognition)
    - Recognized and converted to alphanumeric data (ASCII)

[Figure: audio input -> Digitize -> 10110010…]
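
One way to picture “digitizing” is sampling a waveform at regular intervals and rounding each sample to a fixed number of bits. The Python sketch below is a rough illustration under stated assumptions, not the method described in the slides: the sample rate, the tone frequency, the 8-bit resolution, and the function name `digitize` are all chosen arbitrarily for the example.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for illustration)
FREQUENCY = 440     # tone frequency in Hz (assumed for illustration)


def digitize(num_samples: int) -> bytes:
    """Sample a sine wave and quantize each sample to 8 bits (0-255)."""
    samples = []
    for n in range(num_samples):
        amplitude = math.sin(2 * math.pi * FREQUENCY * n / SAMPLE_RATE)  # -1.0 to 1.0
        samples.append(round((amplitude + 1) / 2 * 255))                 # scale to 0-255
    return bytes(samples)


data = digitize(8)
print(" ".join(format(sample, "08b") for sample in data))
```

Stored “as is”, the resulting bytes are simply a stream of bit patterns; recognition into ASCII text would be a separate step.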


Punched Cards


Invented by Herman Hollerith (founder of the Tabulating Machine Company, a forerunner of IBM)


Each card holds 80 characters


Images


Typically images are pictures that are
optically scanned and saved as a “bit map”
or in some other format


Many formats (GIF, JPEG, …)

[Figure: typical “Save As” dialog]


Objects


Images made of geometrically definable
shapes


Offer efficiency, flexibility, small size, etc.


Pointing Devices


Originally used for specifying coordinates (x, y) for graphical input


Today used as general-purpose devices for “graphical user interfaces” (GUIs)


Thank you