Semantic Microformats - Copia

Nov 5, 2013 (3 years and 9 months ago)

Semantic Microformats


Uche Ogbuji

uche@zepheira.com

About the presenter

XML expert since 1997

RDF/Semantic Web expert since 1999

Web services/SOA expert since 1999

Heavy work with XML and RDF/SemWeb/SOA
best practices

Lead developer of key open specs and OSS, e.g.

Versa RDF query language

4Suite XML/RDF processing toolkit

Amara XML toolkit

About Zepheira, LLC

Firm offering Semantic technologies solutions
bridging technology and business

Team of leaders in next generation Web
technology and business applications

Founded in 2007, already featured in MIT Tech
Review and BusinessWeek

If you see me in the halls later, ask me about
“Zepheira 3D”

Agenda

A brief intro to microformats

The basic concepts

The Good, the bad and the ugly

Bringing Semantic Technology to microformats

Semantic transparency

GRDDL

Leapfrogging microformats altogether

XML or even JSON with Semantic schema

RDF/A

A brief intro to microformats

In their own words...

“Designed for humans first and machines second,
microformats are a set of simple, open data
formats built upon existing and widely adopted
standards. Instead of throwing away what works
today, microformats intend to solve simpler
problems first by adapting to current behaviors and
usage patterns”


microformats.org

Basic idea

Start with a base schema such as XHTML, or Atom

Define specialized variations

Conventions for use of, say, attribute values already defined by the base
schema

Modest set of extensions to the base schema

Embed minor, specialized variations within
established vocabularies

It’s just Jargon

Jargon is not inventing a completely new
language

Start with an existing language

Come up with variations: specialized syntax or specialized
vocabulary (mostly the latter)

Establishing and propagating conventions in
the language variation

Formal terms and definitions (professional glossaries)

Informally shared trends

Elemental microformat
examples

<p>Nice blog. Buy your medz <a href='http://medz.com' rel='nofollow'>here</a></p>


<div class='blogroll'>
  <a href="http://chimezie.ogbuji.net/" rel="brother met">Chimezie</a>
</div>



<p>We decided not to implement <a rev="vote-against"
  href="http://www.w3.org/TR/xquery/" title="way too complex">XQuery</a>...</p>
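A minimal sketch of how a consumer might read these elemental microformats: it collects the space-separated rel and rev tokens from each link, using Python's standard html.parser module. The names LinkRelationParser and extract_relations are illustrative only and are not part of any microformats tooling.

```python
from html.parser import HTMLParser


class LinkRelationParser(HTMLParser):
    """Collect (href, attribute, token) for each rel/rev token on <a> elements."""

    def __init__(self):
        super().__init__()
        self.relations = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        for name in ("rel", "rev"):
            # rel and rev hold space-separated tokens, e.g. rel="brother met"
            for token in (attributes.get(name) or "").split():
                self.relations.append((href, name, token))


def extract_relations(markup):
    parser = LinkRelationParser()
    parser.feed(markup)
    parser.close()
    return parser.relations


SAMPLE = """
<p>Nice blog. Buy your medz <a href='http://medz.com' rel='nofollow'>here</a></p>
<div class='blogroll'>
  <a href="http://chimezie.ogbuji.net/" rel="brother met">Chimezie</a>
</div>
<p>We decided not to implement <a rev="vote-against"
  href="http://www.w3.org/TR/xquery/" title="way too complex">XQuery</a>...</p>
"""

if __name__ == "__main__":
    for href, attribute, token in extract_relations(SAMPLE):
        print(attribute, token, href)
```

Note that nothing in the markup itself says which vocabulary a token such as "met" belongs to; the code can only report the bare strings.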

Compound microformat
example

<ol class='xoxo'>
  <li>Subject 1
    <ol>
      <li>subpoint a</li>
      <li>subpoint b</li>
    </ol>
  </li>
  <li>Subject 2
    <ol compact="compact">
      <li>subpoint c</li>
      <li>subpoint d</li>
    </ol>
  </li>
</ol>

XOXO is a microformat for outlines, which
can be anything from blogrolls to
presentations
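As a rough illustration of what a consumer has to do with a compound microformat, here is a sketch that turns the XOXO outline above into nested (label, children) pairs with Python's xml.etree.ElementTree. It assumes the fragment is well-formed XML with no namespace declaration; parse_xoxo and outline are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

XOXO = """
<ol class='xoxo'>
  <li>Subject 1
    <ol>
      <li>subpoint a</li>
      <li>subpoint b</li>
    </ol>
  </li>
  <li>Subject 2
    <ol compact="compact">
      <li>subpoint c</li>
      <li>subpoint d</li>
    </ol>
  </li>
</ol>
"""


def outline(list_element):
    """Turn an <ol> of <li> items into a list of (label, children) pairs."""
    items = []
    for li in list_element.findall("li"):
        # The label is the text that precedes any nested list.
        label = (li.text or "").strip()
        nested = li.find("ol")
        children = outline(nested) if nested is not None else []
        items.append((label, children))
    return items


def parse_xoxo(markup):
    root = ET.fromstring(markup)
    # The class attribute is the only signal that this list is an outline.
    if "xoxo" not in root.get("class", "").split():
        raise ValueError("root element is not marked class='xoxo'")
    return outline(root)


if __name__ == "__main__":
    print(parse_xoxo(XOXO))
```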

A seriously mixed bag

The good:

Rough consensus is good, even for minor
syntactic details

The bad:

It’s very easy for microformats to clash,
because of the lack of Semantic grounding (yes, this is
biting people in practice already)

The ugly:

Putting it bluntly, have you ever seen a
format as horrid as XOXO? That’s what you get when
you try to tunnel one set of semantics into another.


(OK granted OPML might be even uglier, but we can create a far more humane
format)

Note: XOXO was inspired by a desire to avoid the horrid macro-format OPML

The Ugly: Absurd markup
indirection

A condition where the attempt to fit a microformat into
a host leads to unnaturally indirect markup design

<product>
  <property>
    <name>ID</name>
    <value>xyz123</value>
  </property>
</product>

<element tagname='description'>
  My favorite Weblog
</element>

<dl>
  <dt>description</dt>
  <dd>My favorite Weblog</dd>
</dl>

<product xml:id='xyz123'>
  <name>Widget wonder</name>
</product>

<description>
  My favorite Weblog
</description>

What to do about the ugly

Elemental microformats minimize syntactic footprint,
and are thus harmless.

Most compound microformats are hopeless cases. In
almost all cases it’s better to just use XML.


Who needs hAtom when Atom does the trick, and is thriving?


XBEL is a more humane alternative to XOXO (and even OPML)

BTW “Ugly” is not a negligible problem.
Form is Function.

The bad: Semantic waffling

Microformats have to, in effect, steal and re-purpose
the host format’s semantics

Some formats make somewhat dodgy use of host
language constructs (e.g. a/@rev in vote-links)

Few microformats have even a reasonable syntactic
schema, let alone a semantic one

Note: Schematron is a great schema language for the likes of
microformats, and even for their semantics. Ask me later about
Schematron Abstract Patterns

Microformat profiles

Microformat advocates have taken a stab at alleviating
the “bad” by specifying a “profile” link from the source
document to a description of the microformat.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://gmpg.org/xfn/11">
    <title>Sample document</title>
  </head>

XMDP for microformat
profiles

Microformat descriptions were originally ad-hoc Wiki
pages. The profiles movement has defined a super-simple
format based on a subset of XHTML.

Called XHTML Meta Data Profile (XMDP)

(Figure: example Web view of XMDP)

What else to do about the bad

XMDP is all about prose description, and is very
informal

We can do so much better

Bringing Semantic Technology to microformats

Beyond syntax

We know that XML is a useful alphabet, but a library of
XML needs a reference section for true advancement


That’s what we’re all at this conference to talk about, right?

Microformats need a reference section even more
badly, because they often turn simple alphabets into
arcane substitution ciphers

Θe quik βroυν φox jυμπσ oveρ θe λειζy δoγ

Semantic transparency

Semantic transparency is the ability to share context
between systems by accessing resources that can be
automatically located from the syntax.

Microformats seek to share context, but in their basic
form do not offer the tools for context sharing.

Semantic transparency would add value and longevity
to documents using microformats.

Just use RDF?

RDF does provide for semantic transparency, unlike
XML, or microformats. It has very formalized semantic
expressiveness, and healthy support in tools and
standards

We could just advocate standardizing on RDF
representations of such rich data.

Let’s get real

RDF/XML’s syntax is awkward, especially for the sorts
of prose/narrative documents that serve as host to
microformats


N3 and such are better, but hardly likely to inspire enthusiasm with Web authors

RDF is not popular with many in the XML and Web 2.0
world; this “political” battle distracts from getting useful
work done

Practical meeting

The time is finally right for the practical meeting of XML and
RDF technology that I’ve been personally advocating for
seven years

Why not use syntax transform technology to extract
Semantic details from whatever conventions we can
identify in XML and microformats?

Enter GRDDL

W3C draft for using transforms such as XSLT
to convert XML to RDF

General conventions for associating XML languages with GRDDL
transforms

Piggybacks on microformat profiles

Rather than XMDP, the profile link is to an XHTML page with
embedded RDF

You would add transforms to the base GRDDL
profile for the host language

Pronounced “Griddle”


Short for “Gleaning Resource Descriptions from Dialects of Languages”

Dublin Core/RDF from XHTML

Part of a theoretical script, dc-from-xhtml.xslt, for
generating Dublin Core in RDF/XML from XHTML
metadata

W3C hosts a very convoluted implementation of this:
dc-extract.xsl

...

<xsl:template match='xhtml:title'>
  <dc:title><xsl:apply-templates/></dc:title>
</xsl:template>

...

XHTML GRDDL profile

grrdl-example.xhtml

<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Some Document</title>
    <link rel="transformation"
          href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl" />
    <meta name="DC.Subject"
          content="ADAM; Simple Search; Index+; prototype" />
    ...
  </head>
  ...
</html>
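A sketch of the discovery step a GRDDL-aware agent might perform on a document like the one above: confirm that the head declares the GRDDL profile, then list the transformation links. This is an illustration in Python using xml.etree.ElementTree, not the behaviour of any particular GRDDL implementation; grddl_transformations is a hypothetical name, and the "..." placeholders are dropped from the sample so that it parses.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"

DOCUMENT = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Some Document</title>
    <link rel="transformation"
          href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl" />
    <meta name="DC.Subject"
          content="ADAM; Simple Search; Index+; prototype" />
  </head>
  <body/>
</html>"""


def grddl_transformations(markup):
    """Return transform URLs, or [] if the GRDDL profile is not declared."""
    root = ET.fromstring(markup)
    head = root.find(XHTML + "head")
    # The profile attribute is a space-separated list of profile URIs.
    if head is None or GRDDL_PROFILE not in head.get("profile", "").split():
        return []
    return [
        link.get("href")
        for link in head.findall(XHTML + "link")
        if "transformation" in link.get("rel", "").split() and link.get("href")
    ]


if __name__ == "__main__":
    print(grddl_transformations(DOCUMENT))
```

An agent would then fetch each listed transform and apply it to the document to obtain RDF; that step needs an XSLT processor and is left out here.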

RDF/XML result

<rdf:RDF
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>
  <rdf:Description rdf:about="">
    <dc:title>Some Document</dc:title>
  </rdf:Description>
</rdf:RDF>
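For readers without an XSLT processor to hand, the effect of the title template can be approximated in plain Python. This sketch swaps XSLT for xml.etree.ElementTree and handles only the XHTML title; it is not a port of dc-extract.xsl, and title_to_rdf is a hypothetical name.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
DC_NS = "http://purl.org/dc/elements/1.1/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Ask ElementTree to serialize with the conventional prefixes.
ET.register_namespace("dc", DC_NS)
ET.register_namespace("rdf", RDF_NS)

DOCUMENT = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Some Document</title>
  </head>
  <body/>
</html>"""


def title_to_rdf(markup):
    """Build an rdf:RDF element carrying the XHTML title as dc:title."""
    source = ET.fromstring(markup)
    title = source.find("{%s}head/{%s}title" % (XHTML_NS, XHTML_NS))
    rdf = ET.Element("{%s}RDF" % RDF_NS)
    # rdf:about="" means the description is about the source document itself.
    description = ET.SubElement(
        rdf, "{%s}Description" % RDF_NS, {"{%s}about" % RDF_NS: ""}
    )
    if title is not None:
        dc_title = ET.SubElement(description, "{%s}title" % DC_NS)
        dc_title.text = "".join(title.itertext()).strip()
    return rdf


if __name__ == "__main__":
    print(ET.tostring(title_to_rdf(DOCUMENT), encoding="unicode"))
```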

Using GRDDL with
microformats

grrdl-xfn-example.xhtml

<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Some Document</title>
    <link rel="transformation"
          href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl" />
    <link rel="transformation"
          href="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokXFN.xsl" />
  </head>
  <body>
    ...
    <div class='blogroll'>
      <a href="http://chimezie.ogbuji.net/" rel="brother met">Chimezie</a>
    </div>
    ...
  </body>
</html>
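To give a feel for what "gleaning" relationship data from such a page could yield, here is a sketch that turns each rel token into a triple and prints it in an N-Triples-like form. It does not reproduce the actual output of grokXFN.xsl. The predicate base http://gmpg.org/xfn/11# and the document URI http://example.org/blog/ are assumptions made for illustration, and every rel token is treated as a term without checking it against the XFN vocabulary.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
XFN_VOCAB = "http://gmpg.org/xfn/11#"  # assumed predicate base, for illustration

DOCUMENT = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Some Document</title>
  </head>
  <body>
    <div class='blogroll'>
      <a href="http://chimezie.ogbuji.net/" rel="brother met">Chimezie</a>
    </div>
  </body>
</html>"""


def xfn_triples(markup, document_uri):
    """Return (subject, predicate, object) for every rel token on a link."""
    root = ET.fromstring(markup)
    triples = []
    for anchor in root.iter(XHTML + "a"):
        href = anchor.get("href")
        if not href:
            continue
        for token in anchor.get("rel", "").split():
            triples.append((document_uri, XFN_VOCAB + token, href))
    return triples


def to_ntriples(triples):
    """Serialize URI-only triples, one statement per line."""
    return "\n".join("<%s> <%s> <%s> ." % triple for triple in triples)


if __name__ == "__main__":
    # http://example.org/blog/ is a hypothetical address for the page itself.
    print(to_ntriples(xfn_triples(DOCUMENT, "http://example.org/blog/")))
```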

GRDDL in practice

The main implementation is a Firefox
“GreaseMonkey” script

microcontentextractor.user.js applies GRDDL profiles and builds an
aggregated model for the user

GRDDL does provide a way of increasing the
formality of microformat models, an important
step towards semantic transparency

With GRDDL in place the microformats buzz at
least offers Webmasters a rough-and-ready
path toward Semantic technology

Leapfrogging microformats altogether

Just use XML (or JSON or...)

Blogroll example in plain XML (compare XOXO)

<folder>
  <title>Technology</title>
  <weblog href="http://weblog.foo">
    <title>Example Weblog</title>
    <webfeed href="http://weblog.foo/atom" type="application/atom+xml"/>
    <description>That good ole Weblog</description>
  </weblog>
</folder>

Blogroll example in plain JSON

[
  {"blog": "http://example.com/bud/",
   "feed": "http://example.com/bud/atom",
   "description": "My buddy's Weblog",
   "tags": ["buddy"]
  }
]
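The two plain formats carry much the same information, which a short sketch can make concrete: it reads the XML blogroll and emits records in the shape of the JSON example. The field mapping (weblog/@href to "blog", webfeed/@href to "feed", the folder title to a tag) is an assumption made for illustration, since neither format states its semantics; blogroll_to_records is a hypothetical name.

```python
import json
import xml.etree.ElementTree as ET

BLOGROLL_XML = """<folder>
  <title>Technology</title>
  <weblog href="http://weblog.foo">
    <title>Example Weblog</title>
    <webfeed href="http://weblog.foo/atom" type="application/atom+xml"/>
    <description>That good ole Weblog</description>
  </weblog>
</folder>"""


def blogroll_to_records(markup):
    """Map each <weblog> to a dict shaped like the JSON blogroll entries."""
    folder = ET.fromstring(markup)
    folder_title = folder.findtext("title", default="")
    records = []
    for weblog in folder.findall("weblog"):
        feed = weblog.find("webfeed")
        records.append({
            "blog": weblog.get("href"),
            "feed": feed.get("href") if feed is not None else None,
            "description": weblog.findtext("description", default=""),
            # Assumption: treat the enclosing folder's title as a tag.
            "tags": [folder_title],
        })
    return records


if __name__ == "__main__":
    print(json.dumps(blogroll_to_records(BLOGROLL_XML), indent=2))
```

The mapping lives entirely in the code, which is the point of the next slide: without shared semantics, every consumer has to guess it again.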

Not as ugly, but just as bad,
right?

Semantic technology advocates need to do a better job
of building on plain old XML. GRDDL is a good first
step, but more is needed.


This is my primary space of professional interest.


Again: if you see me later, ask me about
Schematron Abstract Patterns

As for JSON, work is just starting, but let’s not forget the
XML lesson:


Collaborate rather than compete with emerging syntax.

RDF/A

RDF/A is a system for embedding triples right into
documents using ideas similar to microformats.


It eliminates the extra transform step required by
GRDDL for extracting RDF


Note: RDF/A predates both microformats and GRDDL

rel-license microformat example

rel-license is a microformat for providing a link
specifically to the license for the source document.

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Some Document</title>
  </head>
  <body>
    ...
    <p>This document is licensed under a
      <a rel="license"
         href="http://creativecommons.org/licenses/by-nc/2.5/">
        Creative Commons Non-Commercial License
      </a>.
    </p>
    ...
  </body>
</html>

RDF/A license example

In RDF/A you would repurpose an existing RDF
vocabulary for a license relationship, and just embed
this into the host document

<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:cc="http://creativecommons.org/licenses/">
  <head>
    <title>Some Document</title>
  </head>
  <body>
    ...
    <p>This document is licensed under a
      <a rel="cc:license"
         href="http://creativecommons.org/licenses/by-nc/2.5/">
        Creative Commons Non-Commercial License
      </a>.
    </p>
    ...
  </body>
</html>

Yes, QNames in content stink, but they’re all over the XML and RDF worlds

Advantage RDF/A?

Qualification of the license relationship provides for
discovery and semantic precision. The namespace can
be treated as a link and dereferenced for discovery
and hence Semantic transparency.
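A sketch of the qualification step described above: the prefix in rel="cc:license" is resolved against the namespace declarations in the document, giving a full URI that an agent could dereference. It uses xml.etree.ElementTree.iterparse with "start-ns" events to recover the prefix mappings. For simplicity it gathers every declaration in the document rather than tracking true in-scope bindings, and the function names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

DOCUMENT = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:cc="http://creativecommons.org/licenses/">
  <head>
    <title>Some Document</title>
  </head>
  <body>
    <p>This document is licensed under a
      <a rel="cc:license"
         href="http://creativecommons.org/licenses/by-nc/2.5/">
        Creative Commons Non-Commercial License
      </a>.
    </p>
  </body>
</html>"""


def declared_prefixes(markup):
    """Collect prefix -> namespace URI for every xmlns declaration."""
    prefixes = {}
    source = io.BytesIO(markup.encode("utf-8"))
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        prefixes[prefix] = uri
    return prefixes


def qualified_relations(markup):
    """Return (expanded relation URI, href) for each prefixed rel token."""
    prefixes = declared_prefixes(markup)
    root = ET.fromstring(markup)
    results = []
    for anchor in root.iter(XHTML + "a"):
        for token in anchor.get("rel", "").split():
            prefix, separator, local = token.partition(":")
            if separator and prefix in prefixes:
                results.append((prefixes[prefix] + local, anchor.get("href")))
    return results


if __name__ == "__main__":
    for relation, target in qualified_relations(DOCUMENT):
        print(relation, target)
```

The plain rel="license" token from the microformat version would be skipped by this code, because it has no prefix to resolve.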


That fixes the “bad”, but RDF/A is no help with the
“ugly”. Stretching RDF to fit an XHTML skeleton gets
grungy in a hurry.

Note: RDF/A is not the same as the Talis “RDF in HTML” system used in the
GRDDL profile document. Yeah, we need to get our act together, folks.

Questions?

Resources

Microformats home: http://microformats.org

“Thinking XML” column: http://uche.ogbuji.net/tech/publications/thinkingxml

“Perspective on XML” column: http://www.adtmag.com/article.asp?id=8515

GRDDL: http://www.w3.org/2001/sw/grddl-wg/

“Microformats in Context”: http://www.xml.com/pub/a/2006/04/26/microformats-grddl-rdfa-nvdl.html

XMDP: http://gmpg.org/xmdp/

More fun links

http://dannyayers.com/archives/2005/08/01/microformats-on-the-grddl/

http://copia.ogbuji.net/blog/2005-11-16/Does_it_co...

http://www.oasis-open.org/cover/xmlAndSemantics.html

http://theryanking.com/blog/archives/2005/06/10/wordpress-kubrick-hreview-trouble/