Making Cents of Yens and Euros: Web 2.0 ... - Digital Silk Road


© Copyright 2007 Achim Ruopp

Web 2.0 Expo 2007

Making Cents of Yens and Euros: Web 2.0 Internationalization

Achim Ruopp

achim@digitalsilkroad.net

http://www.digitalsilkroad.net/

Demo

A Currency Converter Application


before and after

Web 2.0 Internationalization

Agenda


Introduction to Web Internationalization (i18n)


Selecting and Persisting User Preferences


Locales and Locale Identifiers


Unicode


Localization


Model and Tools


Multi-lingual Syndication


RSS


Atom


Client-side Scripting


Javascript Internationalization


Ajax


International Web Services Design


REST


SOAP

Intro to Web Internationalization

Language and Location

en-US

fr, en;q=0.8

da-DK

Intro to Web Internationalization

User Preferences


Language


HTTP Accept-Language header


E.g.: en, fr-CA;q=0.8, fr;q=0.6


Language negotiation with the server


Locale


Cultural preferences for formatting, sorting etc.


Infer from Accept-Language header


Map IPv4 address to ccTLD (country code top-level domain)


Public information accessible through libraries


E.g. Perl IP::Country CPAN module


Commercial services offer more precision


Always provide option to change defaults


Store preferences in cookies
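
The language negotiation outlined above can be sketched in Python. This is a minimal illustration under simplifying assumptions, not a complete HTTP implementation: the function names `parse_accept_language` and `negotiate` are hypothetical, and on the wire each weight is written with a `q=` prefix (for example `fr-CA;q=0.8`).

```python
def parse_accept_language(header):
    """Return (tag, quality) pairs sorted by descending quality."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # default weight when no q parameter is given
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((tag.strip().lower(), quality))
    # sorted() is stable, so equal weights keep their header order
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def negotiate(header, supported, default="en"):
    """Pick the best supported language, falling back from 'fr-ca' to 'fr'."""
    by_lower = {name.lower(): name for name in supported}
    for tag, quality in parse_accept_language(header):
        if quality <= 0:
            continue
        if tag in by_lower:
            return by_lower[tag]
        primary = tag.split("-")[0]
        if primary in by_lower:
            return by_lower[primary]
    return default  # assumed site default; the user should be able to change it
```

For a site that offers only French and German, `negotiate("en, fr-CA;q=0.8, fr;q=0.6", ["fr", "de"])` falls through `en` and matches `fr-CA` on its primary language subtag.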

Intro to Web Internationalization

Internet Language Tags


IETF Language Tags (BCP 47)


Language[-Language]*3[-Script][-Region][-Variant]*[-Extension]*[-PrivateUse]*


Examples


en-CA: English in Canada


zh-Hant-TW: Chinese written in Traditional Chinese script, as used in Taiwan


Obsoletes RFC 3066 & RFC 1766


Often still used in products/earlier standards
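
As a rough illustration of the tag structure above, the following Python sketch splits the most common shape of a language tag (language, optional script, optional region). It is deliberately simplified: extended language subtags, variants, extensions and private-use subtags are rejected rather than parsed, and the name `split_language_tag` is hypothetical.

```python
import re

# Simplified pattern: 2-3 letter language, optional 4-letter script,
# optional region (2 letters or 3 digits). Not the full BCP 47 grammar.
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def split_language_tag(tag):
    """Split a simple tag into its subtags, using conventional casing."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"unsupported or malformed tag: {tag!r}")
    parts = match.groupdict()
    parts["language"] = parts["language"].lower()
    if parts["script"]:
        parts["script"] = parts["script"].title()
    if parts["region"]:
        parts["region"] = parts["region"].upper()
    return parts
```

Tags are case-insensitive, so the sketch normalizes to the customary casing (lowercase language, title-case script, uppercase region) before returning.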

Internationalization Changes


Intro to Web Internationalization

POSIX Locales


Cross-platform API


Locale identifiers can have variations


Un*x: en_US


Windows: English_United States


Results can be platform-dependent


Basis for locale functionality in all scripting languages


Provides functionality for


Number Formatting: 1,000,000.23


Date/Time Formatting: 8 Μάρτιος 2007 12:00:00 μμ


Sorting


String processing (e.g. upper-/lower-casing)


Some translated strings like weekdays, yes/no messages
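
Python's standard `locale` module wraps the POSIX locale API, so it can illustrate both the number formatting and the platform dependence described above. This is a sketch only: the helper name `format_number` is hypothetical, and whether an identifier such as `en_US.UTF-8` or `English_United States` is accepted depends on the locales installed on the host.

```python
import locale


def format_number(value, locale_name):
    """Format value with two decimals under locale_name, or None if unavailable."""
    previous = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, locale_name)
    except locale.Error:
        return None  # the identifier is not known on this platform
    try:
        # grouping=True applies the locale's thousands separator, if it has one
        return locale.format_string("%.2f", value, grouping=True)
    finally:
        locale.setlocale(locale.LC_NUMERIC, previous)  # restore process-wide state
```

Because `setlocale` changes process-wide state, the sketch restores the previous setting before returning; in a multi-threaded web server that global state is one reason to prefer a library with explicit locale objects.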

Intro to Web Internationalization


International Components for Unicode


IBM Open Source project


Extensive locale data and APIs


Data vetted as part of Common Locale Data Repository (CLDR) project


Java and C++ APIs


Wrappers for scripting languages


PyICU (Python)


ICU4R (Ruby)


abandoned?


DIY


difficult because of API complexity and character encoding issues

Intro to Web Internationalization

Microsoft Internationalization APIs


Windows NLS API


Microsoft .NET Framework System.Globalization namespace


Similar set of data to ICU


Data vetted by Microsoft subsidiaries


APIs accessible from all Microsoft programming languages


Intro to Web Internationalization

Unicode 5.0

[Figure: Unicode code space. Planes from 00000 to 100000: Basic Multilingual Plane (00000), Dead Languages & Math (10000), Han Characters (20000), Language Tags (E0000), Private Use (F0000 and 100000). Within the Basic Multilingual Plane (0000 to F000), in order: Alphabets, Punctuation, Asian Languages, Han Characters, Yi, Hangul, Surrogates, Private Use, Legacy/Compatibility.]

99,024 of 1,114,112 code points (U+0000 to U+10FFFF) defined

Intro to Web Internationalization

Unicode Encoding Forms


Variable length: UTF-8/UTF-16


Fixed length: UTF-32


U+2122: ™: Trade Mark Sign

UTF-8: 0xE2 0x84 0xA2 (1110 0010, 10 000100, 10 100010)

UTF-16: 0x2122 (00100001 00100010)

UTF-32: 0x00002122 (0…00100001 00100010)
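
The byte sequences above can be reproduced with Python's built-in codecs. The helper name `encoded_hex` is hypothetical; the big-endian codec variants are used so that no byte order mark is prepended.

```python
import unicodedata

TRADE_MARK = "\u2122"  # U+2122 TRADE MARK SIGN


def encoded_hex(text, encoding):
    """Return the encoded bytes of text as space-separated hex values."""
    return " ".join(f"0x{byte:02X}" for byte in text.encode(encoding))


if __name__ == "__main__":
    print(unicodedata.name(TRADE_MARK))
    for codec in ("utf-8", "utf-16-be", "utf-32-be"):
        print(codec, encoded_hex(TRADE_MARK, codec))
```

UTF-8 needs three bytes for this character, UTF-16 one 16-bit code unit and UTF-32 one 32-bit code unit.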

* source: Google presentation at IUC30

Intro to Web Internationalization

Unicode on the Web


XML processors are required to process UTF-8/UTF-16


Encoding declaration precedence

1. HTTP Content-Type header charset declaration

2. XML encoding declaration (XHTML)

3. meta charset declaration in (X)HTML

4. link element charset attribute


Approx. 4% of pages have encoding errors*


No real need for character references


ü: &uuml; or &#252;


Exceptions: <, >, &, "


Use styles to control font selection
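
The precedence order and the escaping exceptions above can be sketched as follows. Both function names are hypothetical, and the first function only models the order in which declarations win; it does not sniff bytes or parse documents.

```python
import html


def effective_charset(http_charset=None, xml_declaration=None,
                      meta_charset=None, link_charset=None):
    """Return the first declared charset in precedence order, or None."""
    for declared in (http_charset, xml_declaration, meta_charset, link_charset):
        if declared:
            return declared
    return None


def escape_markup(text):
    """Escape only the markup-significant characters; leave ü and friends alone."""
    # html.escape replaces &, < and >, plus both quote characters when quote=True
    return html.escape(text, quote=True)
```

With a Unicode encoding declared correctly, a character such as ü passes through `escape_markup` unchanged, and only the markup-significant characters become character references.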

Demo

A Currency Converter Application


globalized but not localized

Intro to Web Internationalization

Localization Recommendations

Avoid translatable text in graphics


Make sure graphics are culturally neutral

Avoid absolute sizing

Use HTML flow layout

Write complete sentences

Intro to Web Internationalization

Localization Model and Tools


Text translation


Localization formats


HTML with template library


W3C Internationalization Tag Set (tool support?)


GNU gettext/PO


XLIFF (XML Localization Interchange File Format)


Localization tools


OmegaT


Open Language Tools (Sun)


The WordForge Project: Pootle





Searchability


Links/Sitemap

Demo

A Currency Converter Application


fully internationalized Web 1.0 application

Client-side Scripting

Javascript Internationalization


ECMAScript edition 3 added a range of internationalization features (1999)


Good support for Unicode processing


Set of locale-sensitive functions


Dependent on host locale (i.e. browser)


Set of locale-insensitive functions


No number or date/time parsing


Javascript libraries with additional internationalization functionality


dojo Toolkit (i18n contributed by IBM)


Microsoft AJAX Library

Client-side Scripting

AJAX Recommendations


Late globalization


Transmit data in locale-independent form with XMLHttpRequest


Might require some creative parsing/UI


Early localization


Text localization server-side


Browsers are missing a message-catalog facility


Dynamically created page content is invisible to search engines
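
One way to read "late globalization" is that the server sends values in a locale-independent form and leaves the formatting to the client. The sketch below shows a server-side helper producing such a payload; the function name `build_rate_message`, the JSON field names and the sample values are assumptions for illustration, not taken from the demo application.

```python
import json
from datetime import datetime, timedelta, timezone


def build_rate_message(currency, rate, as_of):
    """Serialize an exchange rate without any locale-specific formatting."""
    if as_of.tzinfo is None:
        raise ValueError("as_of must be timezone-aware")
    return json.dumps({
        "currency": currency,        # ISO 4217 code rather than a translated name
        "rate": rate,                # plain number, '.' as decimal separator
        "asOf": as_of.isoformat(),   # ISO 8601 timestamp with UTC offset
    })


if __name__ == "__main__":
    eastern = timezone(timedelta(hours=-5))
    print(build_rate_message("CAD", 1.1785, datetime(2007, 3, 8, 12, 0, tzinfo=eastern)))
```

The client script can then format the number and the timestamp for the user's locale, which is where the "creative parsing/UI" comes in.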

Multi-lingual Syndication

RSS 2.0


Character encoding


RSS 2.0 is an XML application


XML encoding rules apply


Language


Element only on channel (feed), not on item


Create one channel per language


Specified to comply with RFC 1766 language tags


Date/Time


In standard RFC 822 format (including 4-digit years)


E.g. “Wed, 02 Oct 2002 08:00:00 EST”
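
Dates in this format can be produced and parsed with Python's standard `email.utils` module, which implements the e-mail date syntax that RSS borrows. The wrapper names `to_rss_date` and `from_rss_date` are hypothetical. Note that the formatter emits a numeric offset such as `-0500` rather than a zone name such as `EST`.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def to_rss_date(moment):
    """Format a timezone-aware datetime for an RSS 2.0 pubDate element."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    return format_datetime(moment)


def from_rss_date(text):
    """Parse an RFC 822 style date, including common North American zone names."""
    return parsedate_to_datetime(text)


if __name__ == "__main__":
    eastern = timezone(timedelta(hours=-5))
    print(to_rss_date(datetime(2002, 10, 2, 8, 0, 0, tzinfo=eastern)))
```

The weekday and month names in this format are always English, regardless of the feed's language.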


Multi-lingual Syndication

Atom Syndication


More granular language marking


xml:lang can be applied to any human-readable text in the format


Aggregators need to deal with this


Better date/time format: RFC 3339


E.g. “2003-12-13T18:30:02-05:00”
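
RFC 3339 timestamps of this shape map directly onto ISO 8601 handling in Python's `datetime` module. The function names below are hypothetical. The sketch assumes a numeric offset like the one in the example, because `datetime.fromisoformat` accepts the `Z` suffix only from Python 3.11 onwards.

```python
from datetime import datetime, timezone


def parse_atom_timestamp(text):
    """Parse an RFC 3339 timestamp that carries a numeric UTC offset."""
    return datetime.fromisoformat(text)


def to_utc_atom_timestamp(moment):
    """Normalize a timezone-aware datetime to UTC and format it."""
    return moment.astimezone(timezone.utc).isoformat()


if __name__ == "__main__":
    updated = parse_atom_timestamp("2003-12-13T18:30:02-05:00")
    print(to_utc_atom_timestamp(updated))
```

Unlike the RSS date format, this one contains no language-dependent names and sorts correctly as plain text once normalized to a single offset.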


Acknowledgement: Tim Bray

Demo

A Currency Converter Application


adding a syndication feed with exchange rate information

International Web Services Design

Service Patterns

| Pattern | Description | Request data | Return data |
|---|---|---|---|
| Locale Neutral | Neutral data formats | CAD | 1.1785 |
| Client Influenced | Service reacts to client locale, e.g. HTTP Accept-Language | CAD (Accept-Language: de) | Kanadischer Dollar |
| Service Determined | Service is locale-specific and ignores client preference | | 03/08/2007 12:00pm EST |
| Data Driven | Service adjusts formatting and language to the locale the data refers to | NOK; CHF | norske kroner; ? |

International Web Services Design

REST


REST naturally ties into i18n features in HTTP/HTML/XML


Locale indicated with HTTP Accept-Language


Encoding and language marking in markup


Special caution for HTTP GET parameters


Locale-independent formatting recommended


Text parameters


Encode in UTF-8 and escape in URIs


IRI (Internationalized Resource Identifier) functionality might provide this for you
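
Encoding text parameters as UTF-8 and percent-escaping them can be done with Python's standard `urllib.parse` module, which uses UTF-8 by default. The helper names and the sample parameters below are hypothetical.

```python
from urllib.parse import quote, unquote, urlencode


def escape_path_segment(text):
    """Percent-escape text as UTF-8 for use in a single URI path segment."""
    return quote(text, safe="")


def build_query(params):
    """Build a query string; non-ASCII values are UTF-8 encoded and escaped."""
    return urlencode(params)


if __name__ == "__main__":
    print(escape_path_segment("€"))
    print(build_query({"from": "EUR", "note": "Zürich"}))
    print(unquote("Z%C3%BCrich"))
```

The receiving service has to decode with the same encoding, which is one more reason to standardize on UTF-8 for all text parameters.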

International Web Services Design

SOAP


Locale can be communicated in


Transport header (e.g. HTTP)


SOAP header


SOAP message body


Beware of automatically generated SOAP interfaces


Might be locale-dependent without allowing the caller to specify a locale


Use of XML Schema data types promotes locale-independence


Also consider localization of error messages


Conclusions



Unification


One code base


Customization


Localization and adaptation for locales


Next step: cross-language “leakage”


Provide views in multiple languages of the same (user-generated) data


Translate user-generated content


Volunteers


Machine Translation

Call for Contributions


Presentation and Perl CGI demo code


http://www.digitalsilkroad.net/web2expo


Add a version in your preferred language


Ruby on Rails


PHP


Python





Similar ASP.NET application


http://quickstarts.asp.net/QuickStartv20/aspnet/doc/localization/default.aspx



References



W3C Internationalization Activity


http://www.w3.org/International/


POSIX Locale


http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html


International Components for Unicode


http://www-306.ibm.com/software/globalization/icu/


Unicode/Common Locale Data Repository


http://www.unicode.org/


Microsoft Internationalization APIs


http://msdn2.microsoft.com/en-us/library/ms776254.aspx


http://msdn2.microsoft.com/en-us/library/system.globalization.aspx

References




OmegaT


http://www.omegat.org/omegat/omegat_en/omegat.html


Open Language Tools


https://open-language-tools.dev.java.net/


The WordForge Project


http://www.wordforge.org/drupal/


Javascript Internationalization


http://www.icu-project.org/docs/papers/internationalization_support_for_javascript.html


RSS 2.0


http://www.rssboard.org/rss-specification


Atom Syndication


http://www.atomenabled.org/developers/syndication


RSS 1.0


http://web.resource.org/rss/1.0/spec


W3C Web Services Internationalization Usage Scenarios


http://www.w3.org/TR/ws-i18n-scenarios/

Additional Slides

Multi-lingual Syndication

RSS 1.0


Character encoding


RSS 1.0 is an XML application


XML encoding rules apply


Complies with the RDF (Resource Description Framework) specification


Definition of language and date/time formats is left to RDF metadata formats


Dublin Core Metadata Element Set


Language: RFC 1766/ISO 639-2


Date/Time: ISO 8601 (superset of RFC 3339)


Dublin Core also allows time periods to be specified!