EXTRACTING STATE MODELS OF WEB APPLICATIONS

A. Y. Zakonov


Saint-Petersburg National Research University of Information Technologies,
Mechanics and Optics, Saint-Petersburg, Russia


1. INTRODUCTION

Over recent years, web sites have transformed from collections of static
HTML pages into complex interactive web applications that keep growing in
complexity and popularity, as examples like Facebook, GMail and Amazon show.
By a web application we understand a set of pages connected by hyperlinks,
ranging from HTML pages displaying static content to complex single-page
applications whose Document Object Model (DOM) can be dynamically modified by
application logic implemented as JavaScript handlers for user interactions
and Ajax callbacks.

While source code is the most accurate description of the behavior of a
web application, this description is expressed in low-level program statements
and is hardly suitable for high-level understanding of the application’s
intended behavior. The goal of the presented research is to propose a method
for automatic extraction of state models from existing web applications. The
models should be suitable for writing formal specification requirements and
for verifying them automatically with existing model checking tools. To do so,
we pursue research in two complementary directions.

First, a method for discovering web application states is proposed. It
relies on dynamic analysis of the application, implemented as random-driven
exhaustive exploration of page states and page transitions. The facilities of
the Selenium framework that emulate user interactions with the graphical
interface were employed to develop a proof-of-concept tool that explores the
possible states of a web application and the transitions between them.

Secondly, an algorithm is proposed to discover similar states and merge
them. For complex applications, counting each distinct DOM tree as a distinct
application state would lead to a model consisting of thousands of states and
transitions, which would make the model practically useless. The proposed
algorithm analyses the differences between application states and finds which
of them are semantically the same. Such an algorithm makes it possible to
generate human-readable models automatically even for complex real-world web
applications. Models are considered an essential step in capturing different
system behaviors and in simplifying the analysis required to check or improve
the quality of the software.

The research aims to propose solutions for a number of tasks:

1. A method for discovering as many different application states as
possible.

2. An algorithm to measure the similarity of web application states.

3. A tool that automatically builds a human-readable model of an existing
web application, given the discovered states, transitions and a similarity
measure function.

4. A way to apply model checking techniques to the extracted models.

5. A method to apply model-based testing techniques to web applications.

2. RELATED WORK

Several techniques have been presented in the literature for automatic
model extraction and verification of existing web applications. In [1] the
authors survey 24 different modelling methods used in web site verification
and testing. Very few studies extract models from existing web applications
in order to support their maintenance and evolution [2, 3, 4]. A common
drawback of these studies is that they aim to create a model that is useful
for understanding dynamic application behavior, but not a formal model that
can be verified against given requirements.

In [5] the authors propose a runtime enforcement mechanism that restricts
the control flow of a web application to a state machine model specified by
the developer, and use model checking to verify temporal properties on these
state machines. This approach requires the developer to build the state model
manually, which is time-consuming and error-prone, especially for complex web
applications.

The work most similar to our approach is described in [6]. That paper
proposes a state-based testing approach specifically designed to exercise
Ajax web applications. Test cases are derived from the state model based on
the notion of semantically interacting events. The approach is limited to
single-page applications and Ajax callbacks. Our approach handles all the
possible state changes, which include page transitions and JavaScript DOM
manipulation in event handlers triggered by user actions, as well as Ajax
callbacks. Handling a web application as a whole makes it possible to apply
our approach to real-world applications and to achieve more accurate models.

Research on model checking of web applications [7, 8, 9] concentrates
mostly on the model checking process rather than on model extraction. Model
extraction is critically important for complex real-world web applications,
because straightforward model extraction would generate huge models with
hundreds of states and transitions, which would be practically useless.
Creating the model manually is error-prone and time-consuming. This paper
describes an approach that simplifies the automatically extracted state and
transition information and generates human-readable models, which could
significantly ease both the definition of formal specification requirements
and model checking.

3. ALGORITHM TO DISCOVER POSSIBLE WEB APPLICATION
STATES

In the proposed approach it is assumed that a web application state is
completely described by the source code of a web page. The page source code
defines what information the user can see and what actions the user can take.
In practice, some information could be hidden from the user and stored on the
server side, and the result of a user’s action could depend on the server-side
state. At the current stage, however, we assume that the page source code
describes the application state.

Each application state defines a set of possible user actions: buttons or
hyperlinks to click, text inputs to fill, checkboxes or radio buttons to
select, etc. Each action could trigger a page update (a JavaScript event
handler) or a page transition. In general it is impossible to guarantee
discovery of all web application states, as there is an infinite number of
user action sequences and possible Ajax callbacks to these actions. Each
action sequence could potentially lead to a new, unknown application state.
In the current research we propose a state discovery algorithm based on the
generation of random user action sequences. The Selenium tool is employed to
emulate user interaction with a web application and to record the discovered
states. Before and after each action, a snapshot of the web page’s DOM is
recorded. The proposed algorithm consists of the following steps:

1. Static analysis of the page source code is used to get the list of
available actions, actionlist, which consists of all the HTML elements that
are present on the page and could trigger a page transition or JavaScript
code.

2. Randomly select an action from actionlist and execute it.

3. Append the triad < state1, action, state2 > to the execution trace.

4. Continue iterating while at least one of the last 20 actions has
discovered a new state. This stop condition ends the exploration once all the
possible states have been discovered or the web application has reached a
dead-end state or group of states. The limit should be tuned for specific
tasks: for some applications 20 actions could be insufficient, while for
others it could be more than enough.

Finally, the execution trace is stored as a sequence of triads < state1,
action, state2 >, where each state holds the DOM tree of the page and the
action describes the action taken and the object it was applied to,
referenced using the XPath language.
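
The four steps above can be sketched in Python as follows. This is a minimal
illustration, not the tool's actual code. The callbacks get_state,
list_actions and execute are hypothetical; in a Selenium-based tool they
would wrap a browser session. ToyApp is an invented three-page application
used only to make the sketch runnable, and the max_steps safety limit is an
assumption that is not part of the described algorithm.

```python
import random


def explore(get_state, list_actions, execute, patience=20, max_steps=10000, rng=None):
    """Randomly explore an application and return the trace of triads.

    get_state, list_actions and execute are hypothetical callbacks that
    stand in for a browser session.
    """
    rng = rng or random.Random()
    seen = {get_state()}
    trace = []
    steps_without_new_state = 0
    while steps_without_new_state < patience and len(trace) < max_steps:
        state1 = get_state()
        actionlist = list_actions()              # step 1: available actions
        if not actionlist:
            break                                # dead end: nothing left to try
        action = rng.choice(actionlist)          # step 2: pick one at random
        execute(action)
        state2 = get_state()
        trace.append((state1, action, state2))   # step 3: record the triad
        if state2 in seen:                       # step 4: stop condition
            steps_without_new_state += 1
        else:
            seen.add(state2)
            steps_without_new_state = 0
    return trace


class ToyApp:
    """Hypothetical three-page application replacing a real browser."""

    PAGES = {
        "home": {"go_list": "list", "go_about": "about"},
        "list": {"go_home": "home"},
        "about": {"go_home": "home"},
    }

    def __init__(self):
        self.page = "home"

    def state(self):
        return self.page

    def actions(self):
        return sorted(self.PAGES[self.page])

    def execute(self, action):
        self.page = self.PAGES[self.page][action]


app = ToyApp()
trace = explore(app.state, app.actions, app.execute, rng=random.Random(0))
print(len(trace), "triads recorded")
```

In this sketch a state is any hashable value. A real run would store a
serialized DOM snapshot instead of a page name.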

4. APPROACH TO MEASURE SIMILARITY OF WEB APPLICATION
STATES

A web application could consist of hundreds or even thousands of pages
with different source code. For example, in a mail application the source
code of the inbox page depends on the number of messages currently in the
inbox: an inbox page with two messages differs from a page with three
messages. Even two inbox pages that display the same number of messages but
with different subjects would differ. If we are building a model of such an
application, all these pages should be considered one state, as they describe
the same state of the application from the user’s or developer’s point of
view. Another example is a web page with an advertisement block that includes
code from other web sites. The same application page would have different
source code depending on the advertisement blocks included. All these
insignificant differences in page source code would lead to a huge number of
different states being discovered even for simple web applications. A model
that contains hundreds of states and transitions would be practically
useless, as it would be impossible for a developer to understand the
application logic from this model or to define any formal requirements. A
sophisticated algorithm to merge similar states is therefore required. This
section describes an approach that detects similar states by analyzing the
corresponding DOM trees and thereby reduces the number of states in the
model, achieving human-readable models even for complex real-world web
applications.

4.1 Filter out DOM elements that the page state does not depend on

The DOM tree is traversed and all nodes of the following types are
filtered out: link, script, meta. Nodes of these types do not directly affect
the page state that the user can see or the set of actions that the user can
take. We also propose to ignore the text values of elements and to compare
only the DOM structure. All element attributes are ignored except the style
attribute. The style attribute cannot be ignored completely, as CSS could
directly affect the user’s perception of the page: elements (including
controls) could be made invisible or disabled using CSS styles.
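
A minimal sketch of this filtering step is given below. It assumes the page
is available as well-formed XML so that Python's standard
xml.etree.ElementTree can parse it; real pages would need an HTML parser. The
name filter_dom and the sample page are hypothetical.

```python
import xml.etree.ElementTree as ET

# Node types that the text above says do not affect the visible page state.
IGNORED_TAGS = {"link", "script", "meta"}


def filter_dom(node):
    """Return a stripped copy of node, or None if node itself is filtered out.

    The copy keeps only the tag structure and the style attribute; text
    values and all other attributes are dropped.
    """
    if node.tag in IGNORED_TAGS:
        return None
    kept = ET.Element(node.tag)
    if "style" in node.attrib:
        kept.set("style", node.attrib["style"])
    for child in node:
        filtered_child = filter_dom(child)
        if filtered_child is not None:
            kept.append(filtered_child)
    return kept


page = ET.fromstring(
    '<html><head><meta charset="utf-8"/><script>var x = 1;</script></head>'
    '<body><div id="inbox" style="display:none"><p class="m">Hello</p></div>'
    '</body></html>'
)
print(ET.tostring(filter_dom(page), encoding="unicode"))
```

The function builds a new tree rather than editing the snapshot in place, so
the recorded trace stays untouched.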

4.2 Filter out external dependencies

A web application often contains links that lead to other web sites.
These could be informational links for the user (e.g. Google’s search results
page), links to partner web sites, or advertisement banners and links.
Whether these external elements are part of the web application’s business
logic or are unimportant and could be filtered out depends on the specific
web site. External dependencies, i.e. elements that depend on external web
sites, are detected by the following set of rules:

• the src attribute of an img or iframe element leads to an external web
site;

• the href attribute of a link contains the address of an external web site;

• the element is associated with (or initialized by) JavaScript code that
contains AJAX GET requests to an external web site.
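
The first two rules can be sketched as below, under several assumptions. The
word "link" is read as a hyperlink (an a element). An address counts as
external when its host name differs from the application's host, so
subdomains are treated as external. The third rule is not covered because it
requires analysis of JavaScript code. The function names and host names are
hypothetical.

```python
from urllib.parse import urlparse
import xml.etree.ElementTree as ET


def is_external_url(url, app_host):
    """True if url names a host other than app_host.

    Relative URLs and URLs without a host part are treated as internal.
    """
    host = urlparse(url).hostname
    return host is not None and host.lower() != app_host.lower()


def is_external_element(node, app_host):
    """Apply the src rule (img, iframe) and the href rule (a) to one node."""
    if node.tag in ("img", "iframe"):
        return is_external_url(node.get("src", ""), app_host)
    if node.tag == "a":
        return is_external_url(node.get("href", ""), app_host)
    return False


banner = ET.fromstring('<img src="http://ads.example.net/b.png"/>')
logo = ET.fromstring('<img src="/static/logo.png"/>')
print(is_external_element(banner, "mail.example.com"))
print(is_external_element(logo, "mail.example.com"))
```

Whether matching elements are then dropped or kept remains a per-site
decision, as noted above.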

4.3 Recursive node similarity definition

To discover similar states we introduce the following recursive
definition of similarity: for DOM nodes A and B, similar(A,B) == True if and
only if they have the same type and the same number of children, and each
child of A is similar to the corresponding child of B.
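
The definition translates almost directly into code. The sketch below assumes
that DOM nodes are represented as xml.etree.ElementTree elements and that the
node type is the tag name. The sample fragments are hypothetical.

```python
import xml.etree.ElementTree as ET


def similar(a, b):
    """Recursive structural similarity of two DOM nodes.

    Two nodes are similar when they have the same tag, the same number of
    children, and pairwise similar children in document order.
    """
    if a.tag != b.tag or len(a) != len(b):
        return False
    return all(similar(x, y) for x, y in zip(a, b))


row_a = ET.fromstring("<tr><td>Hi</td><td>Mon</td></tr>")
row_b = ET.fromstring("<tr><td>Re: report</td><td>Tue</td></tr>")
row_c = ET.fromstring("<tr><td>Hi</td></tr>")
print(similar(row_a, row_b), similar(row_a, row_c))
```

Text values and attributes play no part in the comparison, which matches the
filtering described in section 4.1.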

4.4 Collapse similar node sequences

An important feature of the proposed approach is the similar node
“collapse” step. Let us illustrate the idea with the inbox page of a mail web
application. A page with 10 messages and a page with 11 messages would have
different DOM trees but, from the user’s or developer’s point of view, they
denote the same state of the application, as does a page with 1000 messages
displayed. An important difference arises if there are no messages in the
inbox, that is, on an empty inbox page. The empty inbox page is a different
state, as the user has a different set of possible actions, while pages with
some messages present in the inbox are all the same from this point of view.
The same situation occurs in many other popular web applications: different
numbers of links in a list, search result items, to-do items in a task list,
etc.

“Collapse” is implemented by the following algorithm:

1. Traverse the DOM tree, starting from its leaves.

2. For a given node, fetch the list of its child nodes, listc.

3. Check all possible pairs xi, xj ∈ listc and, if similar(xi, xj) == True,
remove node xj.
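
A minimal sketch of the collapse step follows, again assuming
xml.etree.ElementTree elements as DOM nodes. The similar function repeats the
definition from section 4.3 so that the block is self-contained, and the
inbox fragments are hypothetical.

```python
import xml.etree.ElementTree as ET


def similar(a, b):
    """Structural similarity as defined in section 4.3."""
    if a.tag != b.tag or len(a) != len(b):
        return False
    return all(similar(x, y) for x, y in zip(a, b))


def collapse(node):
    """Collapse similar siblings in place, bottom-up, and return node."""
    # Step 1: handle the subtrees first, so the traversal starts at the leaves.
    for child in list(node):
        collapse(child)
    # Steps 2 and 3: keep the first child of each group of similar children.
    kept = []
    for child in list(node):
        if any(similar(child, k) for k in kept):
            node.remove(child)
        else:
            kept.append(child)
    return node


two = ET.fromstring("<ul><li><b/></li><li><b/></li></ul>")
three = ET.fromstring("<ul><li><b/></li><li><b/></li><li><b/></li></ul>")
empty = ET.fromstring("<ul/>")
print(similar(collapse(two), collapse(three)))
print(similar(collapse(empty), two))
```

After collapsing, inbox pages with two and three messages compare as similar,
while the empty inbox stays a distinct state, as in the example above.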

6. CASE STUDY

A proof-of-concept tool was developed using the Python 2.7 programming
language and the Selenium [10] and Graphviz [11] frameworks. The current
version of the tool is capable of fully automated web application analysis.
Random-driven state exploration is implemented according to the algorithm
described in section 3. The tool also supports two modes of the
simplification algorithm: DOM structure comparison and action set comparison.
The tool currently provides a console interface and produces output in the
form of an XML file describing the extracted model and a PNG image. The XML
description could be converted to the Promela language and used as input to
the Spin model checker. The XML could also be used by model-based test
automation tools. The PNG image contains a human-readable representation of
the model. State labels contain page titles, and transition labels contain
descriptions of the actions taken, like “click object L” or “type text A into
field B”. Object references are described using the XPath language. The PNG
model is useful for developers to review the overall application design and
to write down formal specification requirements using the proposed state and
transition ids.
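
The paper does not describe the tool's XML schema or its Promela encoding.
The sketch below is therefore only one possible encoding, given as an
assumption. It takes the model as a list of (state, action, state) triples
and emits a single Promela process in which a global variable state holds the
index of the current state. The name to_promela and the sample transitions
are hypothetical.

```python
def to_promela(transitions, initial):
    """Render a state model as Promela source text.

    transitions is a list of (source, action, target) triples and initial is
    the name of the start state. States are numbered in sorted order.
    """
    if not transitions:
        raise ValueError("at least one transition is required")
    names = {initial}
    for source, _, target in transitions:
        names.update((source, target))
    index = {name: i for i, name in enumerate(sorted(names))}
    lines = [
        "byte state = %d;" % index[initial],
        "",
        "active proctype App() {",
        "  do",
    ]
    for source, action, target in transitions:
        comment = action.replace("*/", "* /")  # keep the comment well-formed
        lines.append(
            "  :: state == %d -> state = %d  /* %s */"
            % (index[source], index[target], comment)
        )
    lines += ["  od", "}"]
    return "\n".join(lines)


model = [
    ("inbox", "click compose", "compose"),
    ("compose", "click send", "inbox"),
]
promela_text = to_promela(model, "inbox")
print(promela_text)
```

With a global state variable, a requirement can refer to state indices
directly, for example as an LTL formula over state.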

The developed tool was applied to a number of existing popular web
applications. For each web application, random-driven state exploration was
run with a 10-minute time limit. All the execution traces were stored in
external files as sequences of triads < state1, action, state2 >. Trace file
sizes vary from 1.5 MB to 30 MB depending on the complexity of the
application’s pages. Each execution trace was then simplified using the
proposed algorithm. For all the examined real-world web applications, the
automatic exploration tool was able to discover more than 80 different states
in a reasonably short execution time. State models containing 80-200 states
and the transitions between them are useless in practice, as they are not
human-readable and it is impossible to write down any adequate formal
requirements using them. For TadaList.com and m.VK.com, the proposed
simplification algorithms were able to produce models that contain fewer than
20 states. Such models are human-readable and would be useful for developers
and QA specialists. For more complex web sites, the models contain more
states and some manual review of the proposed models is advisable. The
further improvements of the simplification algorithm proposed in the
conclusion section are expected to cope better with complex web sites like
Amazon.com.

8. CONCLUSION

In this paper we have presented an approach to extract a finite state
model of an existing web application. Such a model would be suitable for
writing formal specification requirements, for automating model checking and
for applying model-based testing techniques. A method based on static
analysis of page source code is proposed to explore web application states,
using the Selenium tool to emulate user interaction. Two algorithms are
proposed that measure the similarity of web application pages. These
algorithms are designed to reduce the number of states in the state models
and to make them human-readable even for complex real-world web applications.

Further research is aimed at improving the similarity measure algorithms.
Currently the research proposes two different approaches to this problem,
whose performance in practice is close. The quality of the page comparison
results is planned to be improved by using pattern discovery algorithms and
more sophisticated page source code analysis. Improving the similarity
measure is a critical issue for the proposed approach, as it is used to
reduce the number of states in the model, which is generally huge for complex
applications.

It should be noted that extracted models cannot be expected to cover all
the possible states and transitions of the web application: such a model
could be achieved only from an exhaustive execution trace that explores each
link and triggers each possible event handler on every page of the
application. Generally such a trace is infeasible, and the model only
approximates web application behavior. Nevertheless, such a model could be
used for automated model checking and model-based testing and could
significantly improve software quality and the defect detection rate.

9. REFERENCES

1. Alalfi, M.H., Cordy, J.R., Dean, T.R.: Modelling methods for web
application verification and testing: state of the art. Softw. Test., Verif.
Reliab. (2009) 265-296

2. Hassan AE, Holt RC. Architecture recovery of web applications.
Proceedings of the 24th ICSE, ACM Press: New York, NY, USA, 2002; 349-359.

3. Antoniol G, Di Penta M, Zazzara M. Understanding Web Applications
through Dynamic Analysis. Proceedings of the IWPC 2004; 120-131.

4. Di Lucca GA, Di Penta M. Integrating Static and Dynamic Analysis to
improve the Comprehension of Existing Web Applications. Proceedings 7th
IEEE WSE: Washington, DC, USA, 2005; 87-94.

5. Sylvain Hallé, Taylor Ettema, Chris Bunch, Tevfik Bultan:
Eliminating navigation errors in web applications via model checking and
runtime enforcement of navigation state machines. ASE 2010: 235-244

6. Marchetto, A., Tonella, P., Ricca, F.: State-Based Testing of Ajax Web
Applications. ICST 2008: 121-130

7. Y. Huang, F. Yu, C. Hang, C. Tsai, D.T. Lee, and S. Kuo, “Verifying
Web Applications Using Bounded Model Checking”, DSN 2004: 199-208.

8. Homma, Kei and Izumi, Satoru and Abe, Yuki and Takahashi, Kaoru
and Togashi, Atsushi “Using the Model Checker Spin for Web Application
Design”, SAINT 2010: 137-140

9. Homma, K. Izumi, S. Takahashi, K. Togashi, A., et al “Modeling Web
Applications Design with Automata and Its Verification”, ISADS 2011: 103-112

10. Antawan Holmes, Marc Kellogg, Automating Functional Tests Using
Selenium, AGILE 2006: 270-275

11. Kaufmann M., Wagner D. (editors) Drawing Graphs: Methods and
Models, Springer, 2001. 326 pages