Real-world HTML Markup usage

tomatogilamonster, Internet and Web Development

Jun 24, 2012

Viral Ad Network – Jan 2012
Overview
Much has been said about HTML5 and related technologies, and how these specifications will change the behaviour of the web.
There is, however, little information about how widespread the usage of the new markup exposed in these standards is.
We analyse a subset of our internal web-crawl data (hundreds of thousands of web pages) and count how many of these pages use specific tags.
This information is useful for anyone writing scrapers, indexers, or parsers which try to analyse information content found on websites.
In particular, we look at the proportion of web pages which are found to use each of the following tags within their markup: summary, ruby, canvas, video, section, nav, article, aside, footer, header.
Note that we only look at the markup served on page load. We did not analyse the effect of executing any scripts found on these sites.
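The per-tag measurement described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the tooling used for the study: the names TagCollector, tags_in_page and page_proportions are hypothetical, and the standard-library html.parser module is assumed as the parser. It inspects static markup only, matching the note above, and counts each page at most once per tag.

```python
from collections import Counter
from html.parser import HTMLParser

# The ten tags examined in this study.
TAGS_OF_INTEREST = frozenset({
    "summary", "ruby", "canvas", "video", "section",
    "nav", "article", "aside", "footer", "header",
})


class TagCollector(HTMLParser):
    """Records which of the studied tags appear in static markup (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag in TAGS_OF_INTEREST:
            self.found.add(tag)


def tags_in_page(markup):
    """Return the set of studied tags present in one page's markup."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.found


def page_proportions(pages):
    """Map each studied tag to the fraction of pages using it at least once."""
    total = len(pages)
    if total == 0:
        return {}
    counts = Counter()
    for markup in pages:
        # A page contributes at most 1 to each tag's count.
        counts.update(tags_in_page(markup))
    return {tag: counts[tag] / total for tag in sorted(TAGS_OF_INTEREST)}
```

Calling page_proportions on a list of markup strings returns one fraction per studied tag; a crawler would pass the raw response bodies rather than a rendered DOM.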
We find that the most commonly used tags in markup are header, footer, and aside.
Results
The relative frequency of tag occurrences is shown below.
Of the tags studied, footer and header were the tags most commonly found in HTML documents, followed closely by aside.
The article and nav tags were less common, with an occurrence probability of under ¼ of that of the footer and header tags.
Interestingly, canvas and video were found very infrequently within HTML markup. This may be because usage of these tags commonly involves scripted DOM access, and elements created by scripts in the page were not included in this analysis.
[Figure: Relative Frequency of HTML5 tags. Categories shown: header, footer, aside, article, nav, section, video, canvas, ruby, summary.]
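The limitation mentioned above is easy to reproduce. The following sketch is an assumption-laden illustration rather than part of the original analysis: it uses Python's standard html.parser, and the page in SCRIPTED_PAGE and the helper names are hypothetical. It shows that a canvas created by a script never appears as a start tag when only the served markup is parsed.

```python
from html.parser import HTMLParser


class StartTagLister(HTMLParser):
    """Collects the name of every start tag seen in static markup (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def static_tags(markup):
    """Return the start tags found without executing any scripts."""
    lister = StartTagLister()
    lister.feed(markup)
    lister.close()
    return lister.tags


# Hypothetical page: the canvas element only exists after the script has run.
SCRIPTED_PAGE = """
<html>
  <body>
    <div id="chart"></div>
    <script>
      var c = document.createElement('canvas');
      document.getElementById('chart').appendChild(c);
    </script>
  </body>
</html>
"""

tags = static_tags(SCRIPTED_PAGE)
print("canvas seen in static markup:", "canvas" in tags)
print("script seen in static markup:", "script" in tags)
```

A crawl that wanted to count such elements would need to render pages in a scripting-capable browser, which this study did not do.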