[WG1: Web and Cloud Applications]


4 November 2013




Members: Tobias Hoßfeld (UWue), Raimund Schatz (FTW), Patrick Le Callet (UNantes), Matthias Hirth, Bruno Gardlo (UZil)

This document describes the Qualinet WG Task Force on crowdsourcing. The goals are:

- to identify the scientific challenges and problems of QoE assessment via crowdsourcing, but also its strengths and benefits,
- to derive a methodology and setup for crowdsourcing in QoE assessment,
- to test the crowdsourcing QoE assessment approach against the usual “lab” studies,
- to develop mechanisms and statistical approaches for identifying reliable ratings from remote crowdsourcing users,
- to define requirements on crowdsourcing platforms for improved QoE assessment.

As a result, a common roadmap for this task force is envisioned, and joint activities among the members of Qualinet are to be stimulated.

Motivation and problem statement

Subjective user studies are typically carried out by a test panel of real users in a laboratory environment. While many and possibly even diverging views on the quality of the media consumption can be taken into account, entailing accurate results and a good understanding of the QoE and its sensitivity, laboratory-based user studies can be time-consuming and costly, since the tests have to be conducted by a large number of users to obtain statistically relevant results. Costs and time demands increase further if the design and execution of the tests as well as the analysis of the user ratings are performed iteratively, i.e. the QoE model is developed through repeated cycles of design, implementation, and statistical analysis of the tests. This iterative approach is unavoidable when addressing new QoE aspects, e.g. for 3D video or cloud applications.

also discuss panel limitations wit



Crowdsourcing seems to be an appropriate alternative approach. Crowdsourcing means outsourcing a job (like video quality testing) to a large, anonymous crowd of users in the form of an open call. Crowdsourcing platforms on the Internet, like Amazon Mechanical Turk or Microworkers, offer access to a large number of internationally distributed users and distribute the work submitted by an employer among them. With crowdsourcing, subjective user studies can be conducted efficiently at low cost with user numbers adequate for obtaining statistically significant QoE scores. In addition, the desktop PC-based setting of crowdsourcing is highly realistic for scenarios like online video streaming or other Internet applications. However, the reliability of the results cannot be assumed due to the anonymity and remoteness of the participants. Some subjects may submit incorrect results in order to maximize their income by completing as many jobs as possible; others may simply not work carefully due to the lack of supervision. Therefore, it is necessary to develop an appropriate methodology that addresses these issues, ensures consistent behavior of the test subjects throughout a test session, and thus yields reliable QoE results.

Furthermore, due to the remoteness of the participants, it is necessary to monitor the users' environment, e.g. their viewing distance, since the environment cannot be set up identically for all users.

We could also address the question of how to trigger and sustain sufficient interest for th…

Social networks can also be seen as crowdsourcing platforms, since the crowdsourcing platform is often used only for acquiring the users, while the user survey itself is implemented on a different web server (belonging to the researcher issuing the survey). Hence, the same user survey can be conducted with social network users. The same problems as with crowdsourcing occur, since the reliability of user ratings is not guaranteed.

Which other problems do occur? Please feel free to add/modify/comment.

Key applications for crowdsourcing-based QoE assessment

Crowdsourcing platforms are typically accessible via a web browser, and the crowdsourcing jobs (like a user study) also run in the web browser. This desktop-based setting is highly realistic for scenarios like online video streaming or other Internet applications. Thus, key applications are all kinds of applications running in a web browser:

- Web browsing
- Online video streaming
- Cloud applications like cloud gaming



Not possible: The applications must be feasible in an Internet setting. For example, QoE assessment of 3D HD video streaming via crowdsourcing is currently not feasible, since the available Internet access bandwidth is not sufficient. In addition, special hardware that end users do not have may be required for 3D television. In general, crowdsourcing tests can hardly be conducted if special hardware is needed; such tests still require a laboratory.

Development of a specific video player suitable for crowdsourcing tests … to pre-upload all the …

The QoE of the crowdsourcing platform itself has not been investigated so far.

Please feel free to add/modify/comment.

Required methodology

The overall goal is to define a ‘certified’ (i.e. agreed) methodology for QoE assessment based on crowdsourcing. This includes test design methods, monitoring of tests and environments, and statistical measures for reliability analysis.


Task design methods are an important part of crowdsourcing studies, since they allow unreliable users to be filtered out. These task design methods are illustrated here for YouTube QoE tests.


Gold Standard Data. The most common mechanism to detect unreliable workers and to estimate the quality of the results is to use questions whose correct answers are already known. These gold standard questions are interspersed among the normal tasks the worker has to process. After the worker submits the results, the answers are compared to the gold standard data. If the worker did not process the gold standard questions correctly, the non-gold-standard results should be assumed to be incorrect as well. For example, the worker answered “yes” to the question “Did you notice any pauses in the video?”, but the video was played without any interruptions.
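The gold standard check amounts to a strict filter over the submissions. The following is a minimal Python sketch, assuming each submission is stored as a mapping from question IDs to answers; the question ID `pauses_clip3` stands for the pause question above and, like the other names, is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Submission:
    worker_id: str
    answers: dict[str, str]  # question id -> answer given by the worker


def passes_gold_standard(sub: Submission, gold: dict[str, str]) -> bool:
    """True only if every gold standard question was answered as expected."""
    return all(sub.answers.get(qid) == expected for qid, expected in gold.items())


def filter_reliable(subs: list[Submission], gold: dict[str, str]) -> list[Submission]:
    """Keep only submissions whose gold standard answers are all correct."""
    return [s for s in subs if passes_gold_standard(s, gold)]


# Hypothetical example: clip 3 played without interruptions, so "no" is correct.
gold = {"pauses_clip3": "no"}
subs = [
    Submission("w1", {"pauses_clip3": "no", "rating_clip1": "4"}),
    Submission("w2", {"pauses_clip3": "yes", "rating_clip1": "5"}),
]
print([s.worker_id for s in filter_reliable(subs, gold)])  # ['w1']
```

Rejecting the whole submission after a single wrong gold standard answer is the strict rule described above; a softer variant could tolerate a small number of errors.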


Consistency Tests. In this approach, the worker is asked the same question multiple times in slightly different ways. For example, at the beginning of the survey the worker is asked how often she visits the YouTube web page; at the end of the survey she is asked how often she watches videos on YouTube. The answers may differ slightly but should lie within the same order of magnitude. Another example is to ask the user about his country of origin at the beginning and about his continent of origin at the end. The ratings of a participant are discarded if not all answers to the test questions are consistent. An unresolved problem concerns subjects who are not willing to provide correct personal data and thus provide inconsistent data. In that case, the user ratings are rejected, although they could be valid quality ratings.
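A consistency check for the two question pairs above might look as follows. This is a sketch, assuming the YouTube frequencies are collected as numbers per week and the country and continent as names; the country-to-continent table is hypothetical and incomplete, and reading "same order of magnitude" as "less than a factor of 10 apart" is our own assumption.

```python
import math

# Hypothetical, deliberately incomplete lookup table for illustration.
COUNTRY_TO_CONTINENT = {
    "Germany": "Europe",
    "Austria": "Europe",
    "India": "Asia",
    "Bangladesh": "Asia",
    "United States": "North America",
}


def same_order_of_magnitude(a: float, b: float) -> bool:
    """Assumed rule: two frequencies agree if they differ by less than a factor of 10."""
    if a <= 0 or b <= 0:
        return a == b  # zero only matches zero
    return abs(math.log10(a) - math.log10(b)) < 1


def is_consistent(answers: dict) -> bool:
    """Check the question pairs asked at the start and at the end of the survey."""
    continent = COUNTRY_TO_CONTINENT.get(answers["country"])
    if continent != answers["continent"]:  # an unknown country also counts as inconsistent
        return False
    return same_order_of_magnitude(
        answers["youtube_visits_per_week"], answers["youtube_videos_per_week"]
    )


print(is_consistent({"country": "Germany", "continent": "Europe",
                     "youtube_visits_per_week": 5, "youtube_videos_per_week": 12}))  # True
```

As noted above, such a strict rule also rejects participants who rate carefully but deliberately give wrong personal data.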


Content Questions. After watching a video, the users are asked to answer simple questions about the video clip, for example “Which sport was shown in the clip? A) Tennis. B) Soccer. C) Skiing.” or “The scene was from the TV series... A) Star Trek Enterprise. B) Sex and the City. C) The Simpsons.” The user's ratings are considered in the QoE analysis only if the answers are correct.





Monitoring users during task completion can also be used to detect cheating workers. The most common approach is to measure the time the worker spends on the task. If the worker completes a task very quickly, this might indicate sloppy work. However, it has to be noted that the reaction times of different subjects may differ significantly depending on the actual … A more robust method is to monitor browser events in order to measure the focus time, i.e. the time interval during which the browser focus is on the website belonging to the user test.
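The focus time can be computed from the `focus` and `blur` events of the browser window, logged by the test page. The following Python sketch assumes these events have been sent to the test server as timestamped records; the record layout is hypothetical, and flagging workers whose focus time is shorter than the total duration of the clips they rated is an illustrative rule, not a validated threshold.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class BrowserEvent:
    t: float   # seconds since the test page was loaded
    kind: str  # "focus" or "blur", as logged by the test page


def focus_time(events: list[BrowserEvent], end: float) -> float:
    """Total time the test page had the browser focus between t=0 and `end`.

    Assumes the page is focused when it is loaded; repeated events of the
    same kind (e.g. two "blur" events in a row) are ignored.
    """
    total = 0.0
    focused_since: Optional[float] = 0.0
    for ev in sorted(events, key=lambda e: e.t):
        if ev.kind == "blur" and focused_since is not None:
            total += ev.t - focused_since
            focused_since = None
        elif ev.kind == "focus" and focused_since is None:
            focused_since = ev.t
    if focused_since is not None:
        total += end - focused_since
    return total


def is_suspicious(events: list[BrowserEvent], end: float, clip_durations: list[float]) -> bool:
    """Illustrative rule: flag workers whose focus time is shorter than the clips they rated."""
    return focus_time(events, end) < sum(clip_durations)


log = [BrowserEvent(30, "blur"), BrowserEvent(50, "focus"), BrowserEvent(100, "blur")]
print(focus_time(log, end=120))  # 80.0
```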

Monitoring and downloading of test contents

Since the tests are conducted remotely and the users have to download the contents of the user test (within the web browser), two options are possible:

- To avoid effects of the Internet transmission on the QoE test, the user survey (e.g. the video clips to be evaluated) is downloaded completely before the test starts.
- The user test is monitored on the network and application layer in order to take into account additional impairments caused by the network transmission on the application layer (e.g. insufficient bandwidth leads to artifacts or pauses in the video).
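For the second option, the application-layer log of the video player can be reduced to the stalling pattern the user actually experienced. A minimal sketch, assuming the player reports timestamped state changes named `buffering`, `playing` and `ended` (hypothetical names, not those of a specific player API):

```python
from __future__ import annotations


def stall_stats(log: list[tuple[float, str]]) -> tuple[float, int, float]:
    """Return (initial delay, number of stalls, total stall duration) in seconds.

    `log` holds (timestamp, state) pairs with the hypothetical states
    "buffering", "playing" and "ended". Buffering before playback first
    starts counts as initial delay; later buffering periods count as stalls.
    A buffering period still open at the end of the log is not counted.
    """
    initial_delay, stalls, stall_time = 0.0, 0, 0.0
    started = False
    buffering_since = None
    for t, state in sorted(log):
        if state == "buffering":
            if buffering_since is None:
                buffering_since = t
            continue
        # "playing" or "ended": close an open buffering period, if any.
        if buffering_since is not None:
            if started:
                stalls += 1
                stall_time += t - buffering_since
            else:
                initial_delay = t - buffering_since
            buffering_since = None
        if state == "playing":
            started = True
    return initial_delay, stalls, stall_time


log = [(0.0, "buffering"), (2.0, "playing"), (10.0, "buffering"),
       (13.0, "playing"), (20.0, "buffering"), (21.5, "playing"), (40.0, "ended")]
print(stall_stats(log))  # (2.0, 2, 4.5)
```

Sessions with unexpected stalls can then either be excluded or analyzed with the stalling parameters as additional factors.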

Statistical measures to identify unreliable user ratings

- Inter- and intra-rater reliability, e.g. based on Spearman’s rank correlation coefficient as a non-parametric measure of statistical dependence between user rating and …
- Confidence intervals are often misused as a measure for the reliability of subjective tests.
- SOS hypothesis: Considering the MOS values alone does not allow drawing any conclusions about … users and the credibility of the presented subjective test results. However, considering the standard deviation of opinion scores (SOS) in addition to the MOS values helps to identify implausible results.
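Two of these measures can be sketched in Python as follows. The Spearman screening compares each user's ratings with the MOS of the remaining users per test condition; this choice of reference and the threshold of 0.5 are our assumptions. The SOS part fits the parameter a of the SOS hypothesis in the form published by Hoßfeld, Schatz and Egger, SOS(x)^2 = a(-x^2 + (V_1 + V_K)x - V_1 V_K), where V_1 and V_K are the lowest and highest points of the rating scale; the formula is taken from that publication, not from this document, and should be checked against it.

```python
from __future__ import annotations

import math
import statistics


def average_ranks(values: list[float]) -> list[float]:
    """Ranks 1..n; tied values receive the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho as Pearson correlation of tie-averaged ranks; NaN if one side is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.mean(rx), statistics.mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def screen_users(ratings: dict[str, list[float]], threshold: float = 0.5) -> list[str]:
    """Flag users whose ratings correlate weakly with the MOS of all other users.

    ratings: user id -> one rating per test condition, in the same condition order.
    """
    flagged = []
    for uid, r in ratings.items():
        others = [v for k, v in ratings.items() if k != uid]
        mos_others = [statistics.mean(col) for col in zip(*others)]
        rho = spearman(r, mos_others)
        if math.isnan(rho) or rho < threshold:
            flagged.append(uid)
    return flagged


def fit_sos_parameter(conditions: list[list[float]], v1: float = 1.0, vk: float = 5.0) -> float:
    """Least-squares estimate of a in SOS(x)^2 = a * (-x^2 + (v1 + vk) * x - v1 * vk).

    conditions: the opinion scores of each test condition (at least two per condition).
    """
    num = den = 0.0
    for scores in conditions:
        x = statistics.mean(scores)
        sos = statistics.stdev(scores)  # sample standard deviation of the opinion scores
        f = -x * x + (v1 + vk) * x - v1 * vk
        num += f * sos * sos
        den += f * f
    return num / den


ratings = {"u1": [1, 2, 3, 4, 5], "u2": [1, 2, 3, 5, 4],
           "u3": [2, 1, 3, 4, 5], "u4": [5, 4, 3, 2, 1]}
print(screen_users(ratings))  # ['u4']
print(round(fit_sos_parameter([[2, 4], [1, 1, 1]]), 3))  # 0.5
```

Conditions whose MOS lies at the ends of the scale contribute nothing to the SOS fit, since the parabola is zero there.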





The important point when designing the test methods is that the user should not be overloaded with too many questions of the same kind. When investigating QoE using various crowdsourcing or social network platforms, it must be stressed that the users mostly evaluate in their home environment. If they get bored with exhausting questionnaires, they will simply lose interest in further evaluation and quit the application, or (perhaps even worse) they will just rate the given sequence randomly.

The same issue applies to the overall duration of the assessment. For subjective testing in the laboratory, it is generally advised not to exceed an overall duration of 30 minutes. However, this duration is far too long for online testing and crowdsourcing. Whereas in the laboratory it is possible to present, for example, 30 videos to users in a single testing session, in the crowdsourcing scenario it would be more efficient to split these 30 videos into several test sets, present them to several groups of users, and then combine the results.
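Splitting a long test into shorter sets and merging the ratings afterwards can be sketched as follows, assuming every video is rated by exactly one group and that ratings of different groups can be pooled directly (i.e. the groups are comparable, which should itself be checked). Function names and the fixed random seed are illustrative.

```python
import random
from collections import defaultdict


def make_test_sets(video_ids, set_size, seed=0):
    """Shuffle the videos and split them into test sets of at most `set_size` clips."""
    ids = list(video_ids)
    random.Random(seed).shuffle(ids)  # spreads the clips randomly over the sets
    return [ids[i:i + set_size] for i in range(0, len(ids), set_size)]


def combine_results(group_ratings):
    """Pool the ratings of all groups and return the MOS per video.

    group_ratings: iterable of dicts, one per group, mapping video id -> list of ratings.
    """
    pooled = defaultdict(list)
    for ratings in group_ratings:
        for video, scores in ratings.items():
            pooled[video].extend(scores)
    return {video: sum(s) / len(s) for video, s in pooled.items()}


sets = make_test_sets(range(30), set_size=10)  # three sets of ten clips each
print([len(s) for s in sets])  # [10, 10, 10]
print(combine_results([{"v1": [4, 5]}, {"v1": [3], "v2": [2, 2]}]))  # {'v1': 4.0, 'v2': 2.0}
```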

Concerning the downloading of the test set: if one wants to evaluate the effect of long sequences (e.g. 30-minute video sequences) on QoE, it is also possible to use RTMP streaming with a Flash player, which is able to adapt the video source to the current Internet connection. Video sequences with various bitrates would be prepared in advance on the server side. In my opinion, this approach is more reasonable and better reflects real-world conditions than requiring users with a slow Internet connection to download one big file. Of course, application monitoring on the user side is necessary, but it is easy to implement.
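To illustrate the idea of prepared bitrate versions, the following sketch picks one of them from the measured throughput. It is a generic throughput rule with a hypothetical bitrate ladder and safety factor, not the adaptation logic of RTMP or the Flash player; logging the selected bitrates is one form of the application monitoring mentioned above.

```python
# Hypothetical bitrate versions (kbit/s) prepared on the server side.
BITRATE_LADDER_KBPS = [400, 800, 1500, 3000, 6000]


def select_bitrate(throughput_kbps: float, ladder=BITRATE_LADDER_KBPS, safety: float = 0.8) -> int:
    """Highest prepared bitrate that fits into `safety` times the measured throughput.

    Falls back to the lowest version if even that exceeds the budget.
    """
    budget = safety * throughput_kbps
    fitting = [b for b in sorted(ladder) if b <= budget]
    return fitting[-1] if fitting else min(ladder)


# Bitrates chosen for a sequence of throughput measurements; logging them is
# one part of the application monitoring on the user side.
trace = [select_bitrate(t) for t in [2000, 300, 10000]]
print(trace)  # [1500, 400, 6000]
```

The safety factor leaves headroom for throughput fluctuations; a lower value trades video quality for fewer stalls.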

Comparison of Testing Methods

There are basically four different types of subjective user studies:

- laboratory studies (with paid users in a controlled test environment, with a test …)
- crowdsourcing studies (with paid users conducting a test remotely)
- social networking studies (without any payments; with some additional, but unreliable, “social” information about the users remotely conducting the test)
- field trials (in an uncontrolled, but realistic environment)

The main differences between crowdsourcing and lab studies are compared qualitatively, i.e. considering various effects emerging in subjective studies and their test design, and quantitatively, i.e. considering the impact on concrete user ratings and …

Uncontrolled tests require monitoring of the environment.

Please feel free to add/modify/comment.

Desired features of crowdsourcing platforms

The following features would be useful to have in a crowdsourcing platform. Since UniWue has very good connections to one crowdsourcing provider, it may be possible to integrate some desired features directly into the platform.

- Special demands on user demographics are of interest, for example, specifying that 50% of the test subjects are younger than 30 years and 50% are older than 30 years.
- Targeting different types of users is possible in the Facebook social network.
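Until such quotas are supported by a platform, they could be enforced on the external test server. A minimal sketch of quota-based admission for the 50/50 age split above, assuming the age is self-reported in an entry questionnaire and, as a further assumption, counting participants aged exactly 30 in the older group; the class and group names are hypothetical.

```python
class QuotaSampler:
    """Admit participants until each age group has reached its share of the sample."""

    def __init__(self, total: int, shares=None):
        # Default: the 50/50 split below/above 30 years. Custom shares must use
        # the same group names as group_of().
        shares = shares or {"under_30": 0.5, "30_and_over": 0.5}
        self.quota = {group: round(total * f) for group, f in shares.items()}
        self.count = {group: 0 for group in shares}

    @staticmethod
    def group_of(age: int) -> str:
        # Assumption: participants aged exactly 30 are counted in the older group.
        return "under_30" if age < 30 else "30_and_over"

    def try_admit(self, age: int) -> bool:
        group = self.group_of(age)
        if self.count[group] >= self.quota[group]:
            return False  # quota for this group already filled
        self.count[group] += 1
        return True

    def is_full(self) -> bool:
        return all(self.count[g] >= self.quota[g] for g in self.quota)


sampler = QuotaSampler(total=4)
print([sampler.try_admit(a) for a in [22, 25, 27, 41, 35, 60]])  # [True, True, False, True, True, False]
print(sampler.is_full())  # True
```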

Collaboration with Microworkers.com Provider

UniWue collaborates with Microworkers.com, which allows us to request the integration of features. Since UniWue is actively using this platform, we can offer some support for Qualinet members, e.g. on how to design and launch tests.

Currently available features on Microworkers.com


- Microworkers.com provides access to more than 200,000 registered users, who can be identified by unique, public user IDs. The crowd is distributed all over the world, with a large majority located in Asia. Details can be found in …
- The location of individual workers is verified by IP monitoring and a postal identification during the payout process. The home country of each worker is available in her public user profile.
- Tasks are described in a simple textual manner, and the workers submit their work results using a web form. For QoE tests, the workers can be pointed to an external server hosting the test environment. A payment code generated at the end of the test is then submitted at Microworkers as proof of work.
- Besides tasks offered to the whole crowd, the platform offers means to choose only a selected group of workers for certain tasks. The workers can be chosen, e.g., based on their location or on their performance in previous tasks.
- To support more sophisticated tasks, the interface will be redesigned in the coming months, and an API for automated interactions with the platform is currently under development.
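One way to generate the payment code is to derive it from the worker ID with a keyed hash, so that codes cannot be guessed or reused by other workers. The following Python sketch is a hypothetical scheme based on the standard `hmac` module, assuming the worker enters her public Microworkers user ID at the start of the test; it is not a description of UniWue's payment key generation or of a Microworkers interface, and the campaign ID parameter is our own addition.

```python
import hashlib
import hmac
import secrets

# Secret known only to the test server (hypothetical setup).
SECRET_KEY = secrets.token_bytes(32)


def payment_code(worker_id: str, campaign_id: str, key: bytes = SECRET_KEY) -> str:
    """Code shown to the worker at the end of the test, bound to worker and campaign."""
    message = f"{campaign_id}:{worker_id}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()[:12]


def verify_payment_code(worker_id: str, campaign_id: str, code: str, key: bytes = SECRET_KEY) -> bool:
    """Check a code submitted as proof of work, using a constant-time comparison."""
    return hmac.compare_digest(payment_code(worker_id, campaign_id, key), code)


code = payment_code("w42", "youtube_qoe_1")
print(verify_payment_code("w42", "youtube_qoe_1", code))  # True
print(verify_payment_code("w43", "youtube_qoe_1", code))  # False (code is bound to w42)
```

Because the code is derived rather than stored, the server needs no table of issued codes; truncating it to 12 hex characters (48 bits) trades typing convenience against resistance to guessing.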

Available support from UniWue


- We can provide introductions to and support for the usage of the Microworkers.com platform, such as account and task creation. Furthermore, we can help with the task design, which strongly affects the result quality.
- The results from the current crowdsourcing tests can be used to form an initial pool of trustworthy workers for QoE tests. This pool can be extended and adapted depending on the results of other users' QoE tests.
- As a result of our current tests, we have ready-to-use hardware to run web-based crowdsourcing tasks and validated mechanisms to integrate these tasks into the Microworkers.com platform, like payment key generation strategies.
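Forming and updating the pool of trustworthy workers can be sketched as a pass-rate filter over past tests. The minimum number of tests and the required pass rate are hypothetical parameters; `passed` stands for the outcome of the reliability checks (gold standard, consistency, content questions) in one test. Results from other users' QoE tests are simply appended to the history.

```python
from collections import defaultdict


def trustworthy_pool(history, min_tests=2, min_pass_rate=0.9):
    """Workers who completed at least `min_tests` tests and passed the checks often enough.

    history: iterable of (worker_id, passed) pairs, one per completed test.
    """
    stats = defaultdict(lambda: [0, 0])  # worker -> [tests passed, tests completed]
    for worker, passed in history:
        stats[worker][0] += int(passed)
        stats[worker][1] += 1
    return {w for w, (ok, n) in stats.items() if n >= min_tests and ok / n >= min_pass_rate}


history = [("a", True), ("a", True), ("b", True), ("c", True), ("c", False),
           ("d", True), ("d", True), ("d", True)]
print(sorted(trustworthy_pool(history)))  # ['a', 'd']
```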

Interaction with other Working Groups / Subgroups / Task Forces

WG1 subgroup on Web and Cloud Applications

Please feel free to add/modify/comment.

Joint Activities

- Submission to …
  Example: Updated subjective test methodologies and metrics for web browsing QoE
- Validation of test methodologies and results
  Example: YouTube QoE tests based on crowdsourcing vs. lab vs. field
- Creation of common test data sets, which would allow cross-…

Please feel free to add/modify/comment.

Requirements for other WGs

Application areas

WG3: Quality metrics

WG4: Databases and validation


- Databases: yet to be created (by Qualinet members)

WG5: Standardization, certification and dissemination

- Guidelines for quality testing based on crowdsourcing (ITU …)


References

(Not intended as self-promotion, but to show our own activities in this direction; papers can be sent to interested people.)


M. Hirth, T. Hoßfeld, P. Tran-Gia. Anatomy of a Crowdsourcing Platform - Using the Example of Microworkers.com. Workshop on Future Internet and Next Generation Networks (FINGNet), Seoul, Korea, June 2011.

T. Hoßfeld, R. Schatz, et al. Quantification of YouTube QoE via Crowdsourcing. Currently under submission; available as Technical Report No. 483, University of Würzburg.

C. Horch, C. Keimel, K. Diepold. Crowdsourcing for Subjective Video Quality Tests. Technical Report (in German), available online.

K.-T. Chen, C.-C. Wu, Y.-C. Chang, C.-L. Lei. A crowdsourceable QoE evaluation framework for multimedia content. Proceedings of the 17th ACM International Conference on Multimedia (MM '09), ACM, New York, NY, USA, 2009, pp. 491-…

K.-T. Chen, C.-J. Chang, C.-C. Wu, Y.-C. Chang, C.-L. Lei. Quadrant of euphoria: a crowdsourcing platform for QoE assessment. IEEE Network 24(2), March 2010, pp. 28-…

B. Gardlo, M. Ries, M. Rupp, R. Jarina. A QoE evaluation methodology for HD video streaming using social networking. 2011 IEEE International Symposium on Multimedia (ISM), December 2011, pp. 222-…