Chinese Character Output

concretecakeUrban and Civil

Nov 29, 2013 (3 years and 8 months ago)


Lecture

9


1


Character (字符): an abstract object recognized by humans in
communication; it is the representation at the conceptual level.
Control characters in a computer's internal code are not
considered characters.


Glyph (字形): a character in its concrete form, without regard to
thickness, style, size, or the computer's internal
representation (bitmap, outline, etc.).


Font (font set) (字體 / 字型庫): a specific form of a character,
with all of its computer internal representation attributes.




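The three levels of representation

[Figure: the three levels of representation. Labels, in the order
they appear on the slide: Image (圖像), Font (字型), External
Representation (外部表示); GID (Glyph ID), Glyph (字形), Document
Description; Character (字符), Code, Internal Representation
(內部表示). Further labels: Rendering, Association, Human perception.]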


Glyph Representation: Bitmaps


A matrix of 1s and 0s to represent a character


A typical monitor displays a character using a 16 x 16 bitmap











Typical sizes and storage demands are shown below

(note: doubling the size quadruples the storage)

Data compression is possible (a lot of empty space)


Total characters: 87 x 94 = 8,178

Type      Size       Storage (est., bytes)
Simple    16 x 16    262k
Common    24 x 24    589k
Common    32 x 32    1M
Detailed  64 x 64    4M
Detailed  96 x 96    9M
Detailed  128 x 128  16M
Detailed  256 x 256  64M
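
The storage estimates follow from one bit per pixel. A minimal sketch, assuming uncompressed storage and the 8,178-character set; the function name `bitmap_font_bytes` is ours, chosen for illustration:

```python
def bitmap_font_bytes(side, total_chars=87 * 94):
    """Bytes needed for total_chars square bitmaps at 1 bit per pixel."""
    bits_per_glyph = side * side
    return bits_per_glyph * total_chars // 8


for side in (16, 24, 32, 64, 96, 128, 256):
    print(f"{side} x {side}: {bitmap_font_bytes(side):,} bytes")
```

Doubling the side from 16 to 32 multiplies the result by four, which is the quadrupling noted above.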


Fonts usually store small bitmaps and scale them up, but scaling
degrades the quality of slanted edges






Linear scaling: from $\mathrm{Old}(x_{old}, y_{old})$ to $\mathrm{New}(x_{new}, y_{new})$,

where $0 \le x_{old} \le Width_{OLD} - 1$, $0 \le y_{old} \le Height_{OLD} - 1$

and $0 \le x_{new} \le Width_{NEW} - 1$, $0 \le y_{new} \le Height_{NEW} - 1$,

assuming Height and Width values are integers

$r_x = Width_{NEW} / Width_{OLD}$, $r_y = Height_{NEW} / Height_{OLD}$

If $r_x > 1$ and $r_y > 1$, then it is called scaling up

$\mathrm{New}(x_{new}, y_{new}) = \mathrm{New}(x \cdot r_x, y \cdot r_y) = \mathrm{Old}(\lfloor x \rfloor, \lfloor y \rfloor)$
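
The mapping above can be sketched as pixel replication. This is only an illustration: it assumes the bitmap is a list of rows of 0/1 values and that the old coordinate is rounded down; the function name `scale_bitmap` is ours. Integer arithmetic is used so that no floating-point rounding is involved.

```python
def scale_bitmap(old, new_width, new_height):
    """Scale a bitmap (list of rows of 0/1) by replicating old pixels."""
    old_height, old_width = len(old), len(old[0])
    new = []
    for y_new in range(new_height):
        # y = y_new / r_y, rounded down, with r_y = new_height / old_height
        y_old = (y_new * old_height) // new_height
        row = []
        for x_new in range(new_width):
            # x = x_new / r_x, rounded down, with r_x = new_width / old_width
            x_old = (x_new * old_width) // new_width
            row.append(old[y_old][x_old])
        new.append(row)
    return new


diagonal = [[1, 0],
            [0, 1]]
for row in scale_bitmap(diagonal, 4, 4):
    print("".join("#" if pixel else "." for pixel in row))
```

Scaling the 2 x 2 diagonal to 4 x 4 turns each pixel into a 2 x 2 block, so the diagonal becomes a staircase. This is the slanted-edge problem mentioned earlier.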


Smoothing techniques for scaling


Ad hoc techniques (no underlying model, but cheap):


Enlargement (matrix manipulation)


Thresholding: convert back into a bitmap (assign 1 if the value
is >= 0.4 for unidirectional)
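
The slide does not say how the intermediate grey values are produced, so the following is only one possible reading. It assumes each pixel is replaced by the average of its 3 x 3 neighbourhood (our assumption) and then thresholded at 0.4, the value given above. The function name `smooth_and_threshold` is hypothetical.

```python
def smooth_and_threshold(bitmap, threshold=0.4):
    """Average each pixel over its 3x3 neighbourhood, then threshold to 0/1."""
    height, width = len(bitmap), len(bitmap[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            total = 0
            count = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    # Neighbours outside the bitmap are ignored (assumption).
                    if 0 <= yy < height and 0 <= xx < width:
                        total += bitmap[yy][xx]
                        count += 1
            row.append(1 if total / count >= threshold else 0)
        result.append(row)
    return result


staircase = [[1, 1, 0, 0],
             [1, 1, 0, 0],
             [0, 0, 1, 1],
             [0, 0, 1, 1]]
for row in smooth_and_threshold(staircase):
    print("".join("#" if pixel else "." for pixel in row))
```

On this staircase input the step between the two blocks is filled in, giving a more gradual diagonal.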




Smoothing spline (齒形) and interpolation (嵌入法) (costly)


Basis: Character bitmaps are a coarse sample of the
original character


Approach: Recover the curves of the character as
continuous functions (cubic spline) and then interpolate
or generate the bitmaps of another size


Optimization: minimize the unsmoothness of the recovered curves



Bezier Curves


P(t) = (X(t), Y(t)): any point on the curve (0 <= t <= 1)


Cubic Bezier: 4 points


end points coincide with curve


other points control shape (can specify gradient at end points)


$X(t) = X_0 (1-t)^3 + 3 X_1 (1-t)^2 t + 3 X_2 (1-t) t^2 + X_3 t^3$

$Y(t) = Y_0 (1-t)^3 + 3 Y_1 (1-t)^2 t + 3 Y_2 (1-t) t^2 + Y_3 t^3$
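
The two polynomials can be evaluated directly. A minimal sketch, assuming control points are given as (x, y) tuples; the function name `cubic_bezier` is ours:

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Point on the cubic Bezier curve with control points p0..p3, 0 <= t <= 1."""
    u = 1 - t
    x = p0[0] * u**3 + 3 * p1[0] * u**2 * t + 3 * p2[0] * u * t**2 + p3[0] * t**3
    y = p0[1] * u**3 + 3 * p1[1] * u**2 * t + 3 * p2[1] * u * t**2 + p3[1] * t**3
    return (x, y)


# Sample the curve at a few parameter values.
points = [(0, 0), (0, 1), (1, 1), (1, 0)]
for t in (0, 0.25, 0.5, 0.75, 1):
    print(t, cubic_bezier(*points, t))
```

At t = 0 and t = 1 the result is the first and last control point, which is the "end points coincide with curve" property; the two middle points only pull the curve towards them.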




Glyph Representation: Outline


Characters are described as shapes enclosed by lines or curves,
which are specified by parameters (i.e. the data is an ASCII
file, and an interpreter generates the graphic image)


Line specified by 2 points


Curve: (usually cubic Bezier) specified by 4 points


end points coincide with curve


other points control shape



Advantages compared to bitmaps:


Scaling does not affect quality (major)


No need to store fonts in different sizes (in effect a
compression of extremely detailed/large fonts)


Compression (as for standard text)


Email transport without encoding and decoding


Example of a PostScript description of a Chinese character:




Unit of measurement: 1 point = 1/72 of an inch. Coordinates
start at the bottom-left corner, so coordinate translation is
needed.


A PostScript Type 1 font (base font) can handle only up to
256 characters in each set.


Its encoding maps the 256 codes to the names of the characters in the set.


PostScript Type 0 fonts: composite fonts


Double byte encoding:


1st byte: index to base font


2nd byte: code in the particular base font
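
The double-byte scheme can be sketched as a simple split of a 16-bit code. This assumes the first byte selects the base font directly, as described above; the function name `split_double_byte` and the sample code value are ours, for illustration only.

```python
def split_double_byte(code):
    """Split a two-byte code into (base font index, code within that base font)."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("expected a two-byte code")
    first_byte = code >> 8      # index to the base font
    second_byte = code & 0xFF   # one of the 256 codes in that base font
    return first_byte, second_byte


font_index, char_code = split_double_byte(0xB0A1)
print(hex(font_index), hex(char_code))
```

With 256 base fonts of 256 characters each, a composite font can address up to 65,536 characters.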



CID-keyed fonts (p. 288)


A technique that makes character glyph definitions
independent of the codeset.


Each character glyph is given a CID which uniquely
defines a glyph shape.


A CMap is a file that maps character encodings to
glyphs (CIDs).


A CIDFont file contains the pointers to the actual
descriptions of the glyphs. A CIDFont file usually keeps
character glyphs with the same style.
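
The two-step lookup can be sketched with dictionaries. All code values, CID numbers and glyph strings below are made up for illustration and do not come from any real CMap or CIDFont file; the names `CMAP_A`, `CMAP_B`, `CID_FONT` and `glyph_for` are hypothetical.

```python
# Two CMaps for two different (hypothetical) encodings of the same characters.
CMAP_A = {0x3021: 1200, 0x3022: 1201}
CMAP_B = {0xB0A1: 1200, 0xB0A2: 1201}

# One CIDFont: CID -> glyph description (placeholder strings here).
CID_FONT = {
    1200: "<outline for CID 1200>",
    1201: "<outline for CID 1201>",
}


def glyph_for(code, cmap, cid_font):
    """Look up a glyph: character code -> CID (via the CMap) -> glyph description."""
    cid = cmap[code]
    return cid_font[cid]


print(glyph_for(0x3021, CMAP_A, CID_FONT))
print(glyph_for(0xB0A1, CMAP_B, CID_FONT))
```

Both calls reach the same glyph description, so supporting another encoding only needs another CMap, not another copy of the glyphs.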


Other outline font formats include TrueType and
OpenType. They differ in their data structures and
header formats.



Bitmap-to-Outline Conversion


Determine outline for all the straight lines


Generate curve list: a curve must begin and end at two
different corners (so corners must be found first, by
computing the angle between two vectors along the
outline)


Preprocessing for curve fitting: knee removal, smooth
filtering to yield finer coordinates of sample points.


Perform curve fitting: iterations try to improve the goodness
of fit (measured as the least-squares error)


End point alignment: nearby end points of two
consecutive splines are merged by averaging their
positions
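
The corner-finding step can be sketched as follows. This is an illustration under our own assumptions: the outline is a closed list of (x, y) points in order, the two vectors run from each point to the points `step` positions behind and ahead, and a point counts as a corner when the angle between them is at most `max_angle_deg`. Neither parameter value comes from the lecture.

```python
import math


def find_corners(outline, step=2, max_angle_deg=120.0):
    """Indices of points on a closed outline where the boundary turns sharply."""
    n = len(outline)
    corners = []
    for i in range(n):
        px, py = outline[i]
        bx, by = outline[(i - step) % n]  # point behind on the outline
        ax, ay = outline[(i + step) % n]  # point ahead on the outline
        v1 = (bx - px, by - py)
        v2 = (ax - px, ay - py)
        norm = math.hypot(*v1) * math.hypot(*v2)
        if norm == 0:
            continue  # degenerate: repeated points
        cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
        # Clamp to guard against rounding slightly outside [-1, 1].
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
        if angle <= max_angle_deg:
            corners.append(i)
    return corners


# Outline of a 4 x 4 square, one point per unit step.
square = ([(x, 0) for x in range(4)] + [(4, y) for y in range(4)]
          + [(x, 4) for x in range(4, 0, -1)] + [(0, y) for y in range(4, 0, -1)])
print([square[i] for i in find_corners(square)])
```

On a straight run the two vectors point in opposite directions (180 degrees), so only the four square corners (90 degrees) pass the test.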


Getting outline pixels through erosion


Finding the outline of a bitmap means finding the pixels that are
located inside an object but have at least one neighbour
outside the object



Basic idea


Find the bitmap with its edge pixels removed: erosion (a smaller cross)


Take the original bitmap with the eroded bitmap removed



More mathematical terms and binary image operations are needed


Translation: the displacement in the x direction, the y
direction, or both at once. It is a repositioning of the
coordinate system.


Suppose B is a binary image; $B_{xy}$ means B moved by the
coordinates (x, y).


[Figure: B at the origin (0,0) and its translated copy at (x,y)]
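
Translation is easy to express if a binary image is stored as the set of coordinates of its black pixels. That representation is our assumption, not the lecture's, and the function name `translate` is ours.

```python
def translate(image, x, y):
    """Return the binary image moved by (x, y).

    image is a set of (col, row) coordinates of the black pixels.
    """
    return {(px + x, py + y) for (px, py) in image}


b = {(0, 0), (1, 0), (0, 1)}
print(sorted(translate(b, 2, 3)))
```

The shape is unchanged; every pixel is shifted by the same offset.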



Erosion of B (a bitmap) by S: the set of coordinates (x, y)
such that S, translated by (x, y), is contained in B.


$E = B \ominus S = \{(x, y) \mid S_{xy} \subseteq B\}$


[Figure: S (4 black pixels), shown against a second pattern, and their rotations]


Returns all the points in B that are not border (edge) pixels.



Outline pixels: $B - (B \ominus S)$
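
Erosion and the outline formula can be sketched on sets of black-pixel coordinates. The structuring element used here, a plus-shaped cross centred on the origin, is an assumption for illustration and may not be the exact S drawn on the slide; the names `CROSS`, `erode` and `outline` are ours.

```python
# Assumed structuring element: a plus-shaped cross centred on the origin.
CROSS = {(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)}


def erode(b, s=CROSS):
    """B eroded by S: all (x, y) such that S translated by (x, y) lies inside B.

    Only points of B are tested, which is valid because S contains the
    origin: if S translated by (x, y) fits in B, then (x, y) itself is in B.
    """
    return {(x, y) for (x, y) in b
            if all((x + sx, y + sy) in b for (sx, sy) in s)}


def outline(b, s=CROSS):
    """Outline pixels: the original bitmap with the eroded bitmap removed."""
    return b - erode(b, s)


block = {(x, y) for x in range(4) for y in range(4)}
edge = outline(block)
for y in range(4):
    print("".join("#" if (x, y) in edge else "." for x in range(4)))
```

For a filled 4 x 4 block the erosion keeps only the inner 2 x 2 pixels, so the outline is the 12-pixel ring around them.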