Assessment of Communication Protocols in the
EPC Network – Replacing Textual SOAP and XML
with Binary Google Protocol Buffers Encoding
Jürgen Müller, Martin Lorenz, Felix Geller, Matthieu Schapranow, Thomas Kowark, and Alexander Zeier
Hasso Plattner Institute for IT Systems Engineering
Prof.-Dr.-Helmert-Str. 2-3
14482 Potsdam, Germany
{juergen.mueller|matthieu.schapranow|thomas.kowark|alexander.zeier}@hpi.uni-potsdam.de
{martin.lorenz|felix.geller}@student.hpi.uni-potsdam.de
Abstract—The diffusion of RFID technology continues using
the Electronic Product Code (EPC) as the unique identifier for physical
objects. The EPC Network enables companies to share information about
read events by defining EPC Information Services (EPCIS). The
communication protocols in the EPC Network are defined to be XML and
SOAP. We claim that a binary encoded communication protocol is more
appropriate, given that billions of read events will occur. We extend
the Fosstrak EPCIS repository with Google’s binary encoded Protocol
Buffers. Our experimental evaluation shows that switching the protocol
from textual encoding to binary encoding reduces the message size by
about 75%, and the processing time by at least 11% for XML and at least
59% for SOAP communication. With regard to the Internet of Things, our
contribution shows the importance of an appropriate choice of
communication protocols. We encourage extending the EPC Network with a
binary encoded communication protocol.
I. INTRODUCTION AND MOTIVATION
Radio-Frequency Identification (RFID) technology is being used in many
industries for different purposes such as (a) Production, Monitoring and
Maintenance, (b) Product Safety, Quality and Information, and
(c) Logistical Tracking & Tracing. An RFID tag is attached to a physical
object, which is then uniquely identifiable by its identifier, the
Electronic Product Code (EPC) [7], which is stored on the RFID tag.
Companies install RFID readers that recognize RFID tags in their
proximity. The EPC is transferred from the RFID tag to the RFID reader
via the Generation 2 Air Interface [6] using reflected power of the
reader. The reader transfers the data to a middleware which performs
filtering and collection operations. After that, the low-level, binary
encoded read event is converted into an Application Level Event
(ALE) [5]. This also includes enriching the read event with information
such as the event time, the read point where the read event occurred,
and the respective business context. Finally, the read event is stored
in a repository. This repository implements the EPC Information Services
(EPCIS) [4] standard.
The EPCIS standard defines a capture interface and a query interface for
this repository. The capture interface is used by the middleware to
store RFID read events in the EPCIS repository. The query interface is
used by applications which access the EPCIS repository in order to
create business value by sharing read events [15].
The EPCIS standard defines the protocol for the EPCIS capture interface
to be based on the Extensible Markup Language (XML). The query interface
is defined to use the SOAP Web Service protocol [4]. We claim that the
decision for these protocols causes performance problems and puts
unnecessary load on the Internet, which is the underlying infrastructure
of the EPC Network. To overcome these hurdles, we propose to enhance the
EPCIS standard with a binary encoded protocol.
As a real-world example to motivate our belief that performance is of
utmost importance for the EPC Network, we present a use case from the
pharmaceutical industry. The two U.S. states California and Florida have
issued laws which aim to reduce the counterfeiting of pharmaceuticals.
Turkey has also issued such a law, and the European Commission is
considering issuing one for the whole European Union to fight the
emerging counterfeiting of pharmaceuticals. Therefore, companies which
produce, store, transport, or sell pharmaceuticals have to track their
products on item level. This encompasses 15 billion packages of
pharmaceuticals which are produced each year [12]. Given that a package
of pharmaceuticals is subject to about five read events and about six
queries, this results in approximately 75 billion read events and
90 billion queries each year which are handled by EPCIS repositories.
This example includes about 2,000 producers, 50,000 distributors, and
142,000 retailers, each of which operates an EPCIS. For obvious reasons,
performance and efficient consumption of resources such as network
bandwidth will be of essential importance. For example, it is not
acceptable to wait several minutes to gather all information necessary
to verify the origin of a single package of pharmaceuticals.
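The volumes above follow from simple multiplication. The following minimal sketch reproduces them and, under the additional assumption (ours, not stated in the use case) that the load is spread evenly over a year, derives average rates per second. Real traffic would be bursty, so these averages are lower bounds on the peak load.

```python
# Figures from the use case; the uniform distribution over the year is an assumption.
PACKAGES_PER_YEAR = 15_000_000_000
READ_EVENTS_PER_PACKAGE = 5
QUERIES_PER_PACKAGE = 6
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def yearly_load(packages: int, operations_per_package: int) -> int:
    """Total number of operations per year."""
    return packages * operations_per_package


def average_rate_per_second(total_per_year: int) -> float:
    """Average rate if the yearly load were spread evenly over all seconds."""
    return total_per_year / SECONDS_PER_YEAR


read_events = yearly_load(PACKAGES_PER_YEAR, READ_EVENTS_PER_PACKAGE)
queries = yearly_load(PACKAGES_PER_YEAR, QUERIES_PER_PACKAGE)

print(f"read events per year: {read_events:,}")
print(f"queries per year:     {queries:,}")
print(f"average read events per second: {average_rate_per_second(read_events):.0f}")
print(f"average queries per second:     {average_rate_per_second(queries):.0f}")
```

Even as a network-wide average, this amounts to a few thousand read events and queries every second.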
In addition to the existing goals for the EPC Network interfaces, we add
performance and efficient resource consumption. We focus on the EPCIS as
a core component in the EPC Network and, from our point of view, it
seems to be the information system with the highest potential for
improvement. Readers and middleware communicate rather efficiently and
do not have an immediate need for any redesign. We question the usage of
SOAP as the Web Service technology of choice when high performance and
throughput are required. In this paper, we primarily take a look at
interfaces, data type definitions, and communication protocols around
the EPCIS repository and discuss alternative and more efficient
technologies for the replacement of XML as the programmatic
representation of events in the context of the EPC Network.
II. RELATED WORK
Having motivated the context and the goals of our work, we now present
work that is related to different aspects of our contribution, the
introduction of new communication protocols for EPCIS communication.
First, we discuss work related to specifying a protocol for
communication in large-scale networks. We then go into detail about the
representation of data when transferring it across different nodes, more
specifically the serialization of application data for transfer over the
wire.
A. Communication Protocols
1) HTTP: One of the most commonly used application-level protocol
standards is the Hypertext Transfer Protocol (HTTP), specified by the
Internet Engineering Task Force (IETF) together with the World Wide Web
Consortium (W3C) [9]. Its most prominent application is the World Wide
Web, where it defines the communication based on the distinction between
servers and clients. The server provides access to certain resources,
such as text documents, which a client may request based on a
pre-defined set of operations, so-called verbs. The standard specifies
both the request and the response format, but abstracts from the actual
content which may be transmitted and also from the underlying hardware
technologies. Therefore, the protocol is applicable to any server/client
scenario, including the EPC Network.
One of the main advantages of HTTP is that the set of operations and the
extent of the request and response formats are rather lightweight and
general. Its usage across the World Wide Web can be interpreted as an
indication of this. More specifically, HTTP forms a simple but
functional and mature set of rules which ensures independent
communication between agents unknown to each other. Furthermore,
considering the vast size of the Internet, the benefits of complying
with a widely accepted and implemented standard are immediate.
2) SOAP: Another prominent protocol specification is SOAP, which was
previously an acronym for Simple Object Access Protocol. Today it is
associated mainly with Web Services, as it is commonly used as a basis
of the Web Services standard stack. SOAP is accompanied by a whole
framework of specifications for communication based on Web Services, for
example the Web Services Description Language (WSDL) for interface
definitions.
One of the core differences to HTTP is that SOAP is aimed at
transferring structured information. For this purpose, SOAP specifies
the respective formats for requesting and providing information in more
detail, relying mainly on XML for the representation of information.
The consequences of applying SOAP and related technologies are already a
focus of academic research. Muehlen et al. present the underlying design
decisions and their impact when developing “standards in the area of
cross-organizational workflows” [16] with respect to SOAP. They use
Representational State Transfer (REST) to contrast the implications of
SOAP with a related set of principles applicable to communication via
Web Services. They provide a comprehensive case study which presents
advantages and disadvantages of both sides, but does not evaluate the
technical aspects regarding performance.
Kohlhoff and Steele provide an evaluation of SOAP with respect to “High
Performance Business Applications” [11]. They implement functionality
using SOAP and compare its performance to that of an industry-specific
protocol. Their results show the benefits of the industry-specific data
representation format, which relies on binary encoding.
On the other hand, Ng et al. find that, given an optimized engine for
processing SOAP messages, SOAP might provide performance which is
“competitive to a fast binary protocol” [13]. However, their performance
evaluations were aimed mainly at the compression of message size,
including the case of embedded binary information such as an image. As
such, their comparison with binary protocols is mainly about reducing
the overhead of “Base64” encoding, which is commonly used for converting
binary data into a textual representation. Reducing message size is also
a goal of our evaluations. Nevertheless, their work shows that SOAP
engines need improvement to reach a level of performance comparable to
processors of binary encoded messages.
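To illustrate the Base64 overhead mentioned above, the following minimal sketch measures the size of the Base64 text relative to the raw binary input. The random 3000-byte payload is merely a stand-in for embedded binary data such as an image; it is not taken from the cited work.

```python
import base64
import os


def base64_overhead(raw: bytes) -> float:
    """Size of the Base64 text relative to the raw binary input."""
    encoded = base64.b64encode(raw)
    return len(encoded) / len(raw)


# Stand-in for embedded binary content, e.g. an image.
payload = os.urandom(3000)

print(len(payload), len(base64.b64encode(payload)))
print(f"size factor: {base64_overhead(payload):.3f}")
```

Base64 maps every three input bytes to four output characters, so the textual representation is about one third larger than the binary data before any markup is added around it.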
Chiu et al. present an evaluation of SOAP-related technologies for
scientific computing [2], which involves simple data structures such as
arrays and computation based on floating point numbers which need to be
represented for transmission. As such, the commonalities with our
scenario are limited, but the demand for performance even under high
load is similar. Therefore, many of their findings relate directly to
our scenario, as they identify the overhead produced by the markup
language, both in message size and deserialization cost.
Suzumura et al. aim to improve the processing of SOAP messages by
addressing the deserialization of individual messages [14]. Their work
makes use of the similarities between messages and speeds up
deserialization by “Differential Deserialization”, which omits sections
which have already been processed. As such, their work has a similar
focus as ours, meaning that they also aim to improve performance by
inspecting the shortcomings of deserialization. Their approach retains
SOAP messages, which allows the reuse of existing technology. The
crucial idea is to exploit the fact that messages commonly share
redundant parts. Our approach, on the other hand, is a more fundamental
change, as we evaluate the use of a different encoding. Suzumura et al.
claim that their changes to the Axis SOAP engine can result in a speedup
by a factor close to three. A similar approach is presented by
Abu-Ghazaleh and Lewis [1], also aiming at improving performance by
differential deserialization. The main benefit of this technique is its
compliance with existing SOAP technologies, which would ease the
transition. However, their mechanism is not directly applicable to our
scenario and needs, because the messages in our scenario usually have a
similar structure, but their content is rarely redundant.
B. Serialization Mechanisms
We refer to serialization as the process of converting an object used in
a programming environment, in our case in a virtual machine, into a
bit-based representation that can be exchanged between different nodes
and used to restore the state of the object, or deserialize it, on a
different node at a later point in time. More informally, serialization
is the process of storing an object such that it can be transferred
across a wire or stored for later usage. The following subsections
discuss the different aspects of using textual or binary encoding of
object state for this purpose.
1) Textual Encoding: We refer to textual encoding of information as the
representation of object state in human-readable form. More
specifically, the individual bytes are used for encoding letters, or
glyphs in Unicode terminology, which are used for defining the object’s
state. As such, textual encoding involves at least two steps of
encoding: first mapping bytes to characters, and second converting
object state to a collection of characters. The second step usually
involves the formal definition of a markup language in order to
facilitate the automated processing of information. An example of such a
markup language is XML, where tags are used to hierarchically structure
and label information.
One of the main benefits of textual encoding is that it is
human-readable and therefore eases, for example, understanding and
troubleshooting. On the other hand, a textual representation is commonly
very verbose, as the data must be structured such that it is processable
by a machine while also providing human-readable labels for individual
entries. The data-interchange format JSON aims to address this issue of
redundant information, but in any case the textual encoding of structure
for the sake of human readability is expensive when compared to binary
encoding, as outlined in the following section. The immediate reason is
the encoding necessary for representing characters. While
variable-length encodings such as “UTF-8” attempt to reduce the memory
footprint, they still require a minimum of one byte for each individual
character. Encodings such as “UTF-32” allocate four bytes for each
character, which causes a memory overhead in most cases, but reduces the
time needed for decoding.
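The per-character cost of the two encodings can be checked directly. The sketch below compares the number of characters with the number of bytes under UTF-8 and UTF-32. The sample strings are illustrative only.

```python
def encoded_sizes(text: str) -> dict:
    """Number of characters and number of bytes under UTF-8 and UTF-32."""
    return {
        "characters": len(text),
        "utf-8": len(text.encode("utf-8")),
        # The "-be" variant avoids the 4-byte byte-order mark that Python
        # prepends for plain "utf-32".
        "utf-32": len(text.encode("utf-32-be")),
    }


# An ASCII-only markup label and a name containing a non-ASCII character.
print(encoded_sizes("<eventTime>"))
print(encoded_sizes("Müller"))
```

For ASCII-only markup, UTF-8 needs exactly one byte per character while UTF-32 needs four. Characters outside the ASCII range take two or more bytes in UTF-8.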
2) Binary Encoding: We refer to binary encoding of information as the
usage of individual bits to represent the structure of the data at hand.
This means that single bytes can be used to represent the hierarchy and
labels of information, rather than text characters as is the case with
textual encoding. The difference to textual encoding is that the markup
for the information is not easily understandable by humans, as the
labels are numbers in binary form and not text characters. A possible
problem which arises from discarding human-readability is the question
of byte order. The two communicating nodes, or the protocol itself, must
specify the byte order or “endianness” in order to guarantee
interoperability.
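The byte-order issue can be made concrete with a small sketch. It packs the same 32-bit integer in big-endian and little-endian order and shows what happens when a receiver assumes the wrong order. The value is arbitrary and chosen only so that the individual bytes are easy to tell apart.

```python
import struct

value = 0x0A0B0C0D

# ">I" is a big-endian and "<I" a little-endian unsigned 32-bit integer.
big_endian = struct.pack(">I", value)
little_endian = struct.pack("<I", value)

print(big_endian.hex())     # 0a0b0c0d
print(little_endian.hex())  # 0d0c0b0a

# A receiver that assumes the wrong byte order silently reads a different number.
misread = struct.unpack("<I", big_endian)[0]
print(hex(misread))         # 0xd0c0b0a
```

No error is raised in the mismatched case, which is why the byte order has to be fixed by the protocol rather than left to the communicating nodes.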
A prominent use of binary encoding is the storage of object state from
within a programming environment, also referred to as “marshalling”. As
the programming environment is responsible for storing and restoring,
there is no immediate need for the information to be processable by
humans. However, there are also several protocols which use binary
encoded messages, for reasons we discuss in more detail below. The
following is a selection of binary encoding technologies: Common Object
Request Broker Architecture (CORBA), Microsoft’s Distributed Component
Object Model (DCOM), and Google’s Protocol Buffers (GPB).
More importantly, binary encoding is a preferable choice because
machines process numbers faster than characters. One reason for this is
that the additional step of decoding characters from their byte
representation can be omitted. The cost of this decoding step becomes
obvious when considering variable-length encoding schemes such as UTF-8,
where each byte needs to be processed individually to check whether the
succeeding byte has to be included for decoding the character.
Furthermore, using bits to represent structure clearly reduces the
overhead of the markup at hand. Instead of verbose markup labels, binary
encoded messages commonly have little overhead and consist mostly of the
actual content.
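The difference in markup overhead can be sketched with a deliberately simplified comparison. One field is labeled once with XML tags and once with a numeric label plus a length byte. The tag-length-value layout below is a generic illustration, not the format of any particular protocol, and the element name, field number, and EPC value are illustrative assumptions.

```python
def xml_field(name: str, value: str) -> bytes:
    """Label the value with an opening and a closing textual tag."""
    return f"<{name}>{value}</{name}>".encode("utf-8")


def tagged_field(field_number: int, value: str) -> bytes:
    """Label the value with one numeric byte, followed by one length byte.

    Simplification: field numbers and payload lengths must fit into 7 bits.
    """
    payload = value.encode("utf-8")
    if not (0 < field_number < 128 and len(payload) < 128):
        raise ValueError("this sketch only supports small fields")
    return bytes([field_number, len(payload)]) + payload


epc = "urn:epc:id:sgtin:0614141.107346.2017"  # illustrative EPC
content_size = len(epc.encode("utf-8"))

as_xml = xml_field("epc", epc)
as_binary = tagged_field(4, epc)

print("XML overhead in bytes:   ", len(as_xml) - content_size)
print("binary overhead in bytes:", len(as_binary) - content_size)
```

The XML overhead grows with the length of the element name and is paid twice per field, whereas the numeric label stays at a fixed two bytes in this sketch.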
Juric et al. present work which evaluates the performance difference
between protocols which use binary and textual encoding, more
specifically Java’s Remote Method Invocation and Web Services using XML
to transfer information [10]. They show that communication based on
binary encoding offers significant performance improvements. More
interestingly, they identify the need for “marshaling/demarshaling
optimizations” to overcome the difference in performance, which aligns
with the goals of our work.
Elfwing et al. present a comparison of SOAP and a CORBA middleware
implementation [3]. Preliminary results showed that the “ratio between
SOAP and CORBA was 400:1 in response time”. Given their analysis of the
Axis SOAP implementation, they claim that they are able to lower the
“theoretical ratio to 7:1”. From our perspective, this remains a
significant performance bottleneck, showing that protocols based on
binary encoding are a compelling choice whenever performance is crucial
to the application at hand.
3) Google’s Protocol Buffers (GPB): For our implementation, we choose
GPB because this protocol has hardly been addressed in academia up to
now. GPB was developed by Google for inter-application communication
within Google and is now available as open source software.
Employing GPB involves several steps: First, the structure of the
“message” which shall be sent over the wire is formally defined using a
simple data definition language. In the regular Web Services technology
stack, the counterpart is W3C’s XML Schema, which is commonly used for
WSDL definitions. However, the data definition language used for GPB is
not as expressive as XML Schema; for example, the concept of inheritance
is not supported. It aims at facilitating extensibility, as messages
with new fields remain compatible with older versions of the definition
that lack them.
Listing 1 presents a sample definition of a message as used
in our implementation:
Listing 1. Google Protocol Buffer Object Event Schema
message ProtoObjectEvent {
  required string event_time = 1;
  optional string record_time = 2;
  required string eventTimeZoneOffset = 3;
  repeated string epc_list = 4;
  required ProtoAction action = 5;
  optional string biz_step = 6;
  optional string disposition = 7;
  optional string read_point = 8;
  optional string biz_location = 9;
  repeated ProtoBusinessTransaction biz_transaction_list = 10;
}
The example highlights the simplicity of the definitions and
demonstrates several features of the data definition language, such as
the nesting of messages (for example, “ProtoBusinessTransaction” is
another message defined similarly to “ProtoObjectEvent”), the marking of
required and optional fields, and fields which are sequences (repeated).
The given data definition is used by a compiler for a certain
programming language to create structures which can be used in the
programming environment of choice. Currently, the project offers
bindings for Java, C++, Python, and Ruby.
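To make the numeric labels and the extensibility property more tangible, the following is a minimal hand-written sketch of how string fields are laid out in the publicly documented Protocol Buffers wire format. It is not the generated API used in our implementation and handles only length-delimited fields. Field numbers 1 and 4 follow Listing 1, while field number 11 and all values are hypothetical.

```python
WIRE_TYPE_LENGTH_DELIMITED = 2


def encode_varint(number: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, high bit set on all but the last."""
    if number < 0:
        raise ValueError("this sketch only encodes non-negative integers")
    out = bytearray()
    while True:
        byte = number & 0x7F
        number >>= 7
        if number:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int):
    """Return the decoded integer and the position after it."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_string_field(field_number: int, value: str) -> bytes:
    """Key (field number and wire type), payload length, then the UTF-8 payload."""
    payload = value.encode("utf-8")
    key = (field_number << 3) | WIRE_TYPE_LENGTH_DELIMITED
    return encode_varint(key) + encode_varint(len(payload)) + payload


def decode_string_fields(data: bytes, known_fields: dict) -> dict:
    """Collect known string fields as lists; skip field numbers not in the schema."""
    message, pos = {}, 0
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type != WIRE_TYPE_LENGTH_DELIMITED:
            raise ValueError("wire type not handled in this sketch")
        length, pos = decode_varint(data, pos)
        payload, pos = data[pos:pos + length], pos + length
        if field_number in known_fields:
            message.setdefault(known_fields[field_number], []).append(
                payload.decode("utf-8"))
    return message


# A receiver that only knows fields 1 and 4 of Listing 1.
OLD_SCHEMA = {1: "event_time", 4: "epc_list"}

wire = (encode_string_field(1, "2009-03-01T12:00:00+01:00")
        + encode_string_field(4, "urn:epc:id:sgtin:0614141.107346.2017")
        + encode_string_field(11, "hypothetical field from a newer schema"))

print(decode_string_fields(wire, OLD_SCHEMA))
```

Because every field carries its number and length, a receiver built against an older definition can step over field 11 without understanding it, which is the compatibility property described above.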
III. IMPLEMENTATION
The following section discusses the implementation of the new, binary
encoded protocol for EPCIS communication. As the focus of our work is
the evaluation of the interfaces of an EPCIS repository, we decided to
make use of existing technologies in order to obtain results that are as
realistic as possible for our scenario. More specifically, we do not
simply evaluate the performance of sample Web Service interfaces;
instead, we focus on actual interfaces which are part of the EPC Network
specification. As the basis for our examinations, we use the Fosstrak
open source EPCIS repository [8]. Fosstrak is an “open source RFID
software platform that implements the EPC Network specifications” and
aims to enable prototyping within the context of the EPC Network. The
main reason for our choice of this platform, though, is Fosstrak’s
compliance with the EPC Network requirements, which we consider crucial
in order to obtain realistic results for our evaluation.
We first present the Fosstrak EPCIS repository architecture before we
describe the modifications and additions we made to the Fosstrak
implementation in order to replace the existing XML and SOAP
technologies.
A. Fosstrak Open Source EPCIS Repository Architecture
Fosstrak is implemented using standard technologies such as Java and
Hibernate. This makes it easy to identify the individual parts relevant
to our modifications and evaluation.
Figure 1 shows the architecture of the system. The architectural style
is three-tier client-server; a client is either an EPCIS capture
application, an EPCIS query application, or both. The server is an EPCIS
repository providing interfaces for the clients. The Fosstrak EPCIS
repository parses client requests and processes them according to the
EPCIS standard. The transport protocols used by the client applications
are XML over HTTP for capturing and SOAP over HTTP for querying.
Fig. 1. Fosstrak EPCIS Architecture
The HTTP binding for the capture interface is implemented by providing a
Java Servlet which is registered with the Servlet container (we use
Apache Tomcat). The Servlet receives capture requests from an EPCIS
capturing application. These requests must include EPCIS events
serialized in XML in the payload of an HTTP POST request. The Servlet
validates the incoming XML documents against the EPCIS schema and passes
them to the CaptureOperationsModule. After that, the EPCIS events are
stored in the database using Hibernate as an Object-Relational Mapper
(O/R Mapper).
The second interface is the query interface. Fosstrak uses the Apache
CXF Web Service framework in combination with the JAX-WS API to
implement the SOAP/HTTP binding for the query interface. CXF maps the
contents of the incoming SOAP requests to Java objects and hands them
over to the QueryOperationsModule, where a database query is performed,
and the query results are sent back to the QueryClient.
B. Design and Rationale
Our main goal was the replacement of XML and SOAP for data exchange. As
mentioned before, the time it takes to serialize and deserialize XML and
the memory it consumes are crucial factors that influence the
performance of the EPCIS repository. Unfortunately, the software design
of the Fosstrak EPCIS repository is very closely tied to the decision to
use XML for communication and data representation. Internally, XML
messages sent to the EPCIS repository are accessed using the Java
Architecture for XML Binding (JAXB).
In contrast to older technologies like JAXP with its DOM and SAX
parsers, JAXB follows a different approach. Instances of XML documents
defined by a schema can be unmarshaled into a tree of standard Java
objects. The XML schema contains the information about the structure and
the data types of the XML documents it defines. JAXB ships with a
compiler that uses the schema to create a hierarchy of Java objects
(content objects), which represent the content of the XML documents. At
runtime, the JAXB API is used to unmarshal a given XML document instance
into a tree of pre-compiled content objects. The created content objects
can be used throughout the rest of the application. We doubt that the
decision to use JAXB is appropriate, since the JAXB content objects are
not just normal Java objects. They contain many methods that refer back
to their XML origin. This design makes working with such objects very
unintuitive. Both the capture and the query interface receive XML
messages for further processing. The JAXB object representation of the
XML is used internally all the way down to the database. The data model
corresponds directly to the object tree created by the JAXB compiler. We
assume that Hibernate was used to derive the data model directly from
the object model created by JAXB. The choice of XML thus has a deep
impact on implementation details of the software.
To replace the XML-based communication at the capture interface and the
SOAP-based communication at the query interface, we implement three
components: a GPB capture interface Servlet, a GPB query interface
Servlet, and a converter from GPB to JAXB.
Fig. 2. Fosstrak EPCIS Repository Adapted Architecture
Our adapted Fosstrak EPCIS repository, shown in Figure 2, is able to
handle XML and Google Protocol Buffer requests simultaneously. We left
the existing business logic untouched and provided a second way to
capture and query events.
1) GPB Capture Interface: We added a new capture Servlet to the Fosstrak
EPCIS repository. Like the original one, our Servlet extracts the stream
from the request’s payload. The difference is that this stream contains
a serialized GPB message instead of XML. We pass the stream to our own
CustomCaptureOperationsModule, which uses the GPB API to generate Java
objects from the stream and converts these objects into JAXB content
objects, which can be stored using Hibernate. Our implementation thus
differs only in that we read a serialized GPB message instead of XML
from the request’s stream and that we need to convert the objects into
JAXB content objects before we can store them.
2) GPB Query Interface: The EPCIS query interface is more complex. It
provides a framework by which client applications may query EPCIS
repository data. The interface provides two types of queries: one-off
queries and standing queries. One-off queries represent a single
transaction between client application and EPCIS repository in which the
client application states one request and gets one response. Standing
queries are long-term transactions where the client application has an
ongoing interest in a query and thereafter receives periodic delivery of
results via the EPCIS query interface without making further requests.
Providing this service to the client obviously requires considerably
more business logic. For this reason, the Fosstrak EPCIS repository
implements the query interface as a SOAP-based Web Service. Since our
goal is to improve the performance of the system, we decided to move
away from SOAP. Internally, most of the business logic, such as query
execution, is found inside the QueryOperationsModule. We use this module
as our main connection point to the EPCIS repository. Just as for the
capture interface, we added a new Servlet to the application.
Analogously, this Servlet extracts GPB messages from the HTTP POST
request stream. Whereas we implement a customized
CaptureOperationsModule for the capture interface, we use the original
QueryOperationsModule for the Query Control Interface. The reason for
that is the complex business logic that is needed in order to create the
internal query from the abstract query received from the client. We
added a private method to the Servlet that converts the GPB message into
a query that can be executed by the original QueryOperationsModule. The
result of the query is converted back into GPB format and serialized to
the Servlet’s response, which can be interpreted by the client.
3) Converter from GPB to JAXB: The third component that we add to the
Fosstrak EPCIS repository is our converter. We need to be able to
receive and process messages in GPB format, but at the same time, we
need JAXB content objects to use the internal functions of the Fosstrak
EPCIS repository. Our converter can be used to convert JAXB into GPB
objects and vice versa. Keeping in mind that our goal is better
performance, we have to mention that this conversion incurs a
performance penalty, which we consider in our evaluation.
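The role of such a converter can be sketched independently of Java. In the sketch below, two plain data classes stand in for a generated GPB message and a JAXB content object, and two functions translate between them. All class, field, and function names are hypothetical and do not reflect the actual Fosstrak or GPB classes.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class WireObjectEvent:
    """Stand-in for a generated GPB message: flat, string-typed fields."""
    event_time: str
    action: str
    epc_list: List[str] = field(default_factory=list)


@dataclass
class DomainObjectEvent:
    """Stand-in for the content object used inside the repository."""
    event_time: datetime
    action: str
    epcs: List[str] = field(default_factory=list)


def to_domain(message: WireObjectEvent) -> DomainObjectEvent:
    """Convert the wire representation into the internal representation."""
    return DomainObjectEvent(
        event_time=datetime.fromisoformat(message.event_time),
        action=message.action,
        epcs=list(message.epc_list),
    )


def to_wire(event: DomainObjectEvent) -> WireObjectEvent:
    """Convert the internal representation back into the wire representation."""
    return WireObjectEvent(
        event_time=event.event_time.isoformat(),
        action=event.action,
        epc_list=list(event.epcs),
    )


incoming = WireObjectEvent(
    event_time="2009-03-01T12:00:00+01:00",
    action="ADD",
    epc_list=["urn:epc:id:sgtin:0614141.107346.2017"],
)
stored = to_domain(incoming)
print(stored)
print(to_wire(stored) == incoming)
```

Every request passes through such a copy step once in each direction, so the conversion cost is paid on top of the serialization cost itself.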
IV. EMPIRICAL EVALUATION
In this section, we first present the test design including the metrics
we use. We then discuss the test setup in order to qualify the numerical
values of our findings and facilitate reproduction of our results.
Finally, we present and analyze the respective empirical results.
A. Test Design
In our empirical evaluation, we proceed in two steps: first, we perform
black-box testing, and then we analyze the internals of the EPCIS
repository. We believe that processing time will be a crucial factor
when processing high volumes of messages in the EPC Network. In
addition, since we analyze the communication protocol, the second topic
of interest is serialization.
1) Processing Time: To determine processing time, we measure the time
needed by a client for constructing a message which is understood by the
server, the time it takes to contact the server, and subsequently the
time it takes the server to respond. These pieces can be composed to
identify the roundtrip time necessary to state a request to the
interface and receive a response. The measurement can be performed
without causing change in the system under examination, as is common in
black-box testing.
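A client-side measurement of this kind could be structured as in the following minimal sketch. The request builder and the in-process server function are hypothetical stand-ins for a real client and EPCIS repository. Because the exchange happens in-process, the time to contact the server and the time the server needs to respond cannot be separated here and are reported as one value.

```python
import time


def timed(function, *args):
    """Run function(*args) and return its result and the elapsed seconds."""
    start = time.perf_counter()
    result = function(*args)
    return result, time.perf_counter() - start


def build_request(epc: str) -> bytes:
    """Hypothetical client-side message construction."""
    return f"query:{epc}".encode("utf-8")


def fake_server(request: bytes) -> bytes:
    """Hypothetical stand-in for contacting the repository and awaiting its reply."""
    return b"result-for:" + request.split(b":", 1)[1]


def measure_roundtrip(epc: str) -> dict:
    """Compose the individually measured pieces into a roundtrip time."""
    request, construct_s = timed(build_request, epc)
    response, exchange_s = timed(fake_server, request)
    return {
        "construct_s": construct_s,
        "exchange_s": exchange_s,
        "roundtrip_s": construct_s + exchange_s,
        "response": response,
    }


print(measure_roundtrip("urn:epc:id:sgtin:0614141.107346.2017"))
```

The timing wraps the calls from the outside only, so the system under examination is not modified.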
After having examined the overall performance of a given interface, we
continue with further investigations using white-box testing in order to
identify the specific times needed for the individual steps of serving a
given response. The single steps are grouped by purpose, such as
computation needed for adhering to the protocol used or for
serialization, which we discuss in further detail in the following
section.
2) Serialization: One important point in serialization is the resulting
size of a serialized object. This size impacts the time required for
transferring information between nodes as well as the memory footprint
on the EPCIS repository. Furthermore, it is vital for an EPCIS
repository to be able to efficiently convert a serialized message into
an internally processable representation and vice versa. Therefore, we
compare the time for serialization and deserialization when employing
different mechanisms as an indicator for the performance.
B. Test Setup
For the purpose of testing, we employed a Microsoft Windows Vista
installation on an x86 architecture computer with an Intel Core2 Duo
T7500 CPU running at 2.20 GHz and 3 GB of RAM. All testing has been
performed solely on this single machine, as we decided to exclude
network latency from our tests. This is mainly due to the fact that we
consider network latency a factor depending on the message size, which
we discuss in the following section. Instead, we concentrate mainly on
the overall processing times and attempt to identify the individual
parts which have the highest impact on the final result.
We installed the Fosstrak EPCIS repository as described in its user
guide at http://www.fosstrak.org/epcis using Apache Tomcat (version
6.0.18) and MySQL Community Server (version 5.1.30). Furthermore, we
used Sun’s Java HotSpot Client VM (build 11.2-b01) from a Java Runtime
Environment (version 1.6.0_12) with no special settings or parameters
for initializing the virtual machine. For adapting the Fosstrak
platform, we used GPB version 2.0.2 with the compiler option
optimize_for = SPEED.
For the purpose of black-box testing, we implemented several clients
which measure the times needed for querying the EPCIS repository
interfaces. For white-box testing, we made use of dynamic
instrumentation as supported by Sun’s HotSpot Client VM, driven by the
profiling application JProfiler (version 5.2.1). JProfiler connects to
the Tomcat container and instruments the code for the purpose of
measuring time and memory footprints. We are therefore able to obtain
immediate information about a running server session down to the level
of single method invocations. We made use of this technology to measure
the overall processing times of individual components of the deployed
application. However, our measurements are only upper bounds, as the
code instrumentation imposes a certain overhead which is absent in
black-box testing. Nevertheless, as we compare the performance of the
interface of the existing Fosstrak EPCIS repository with our
modifications during the same session, both are instrumented alike, so
we believe that our evaluations are valid.
C. Test Results
In our presentation of the test results, we focus on results
for the query interface. Both the capture and the query interface
are implemented using a Java Servlet and query the database
through an O/R mapper. However, the query interface is compliant
with a WSDL specification using the SOAP standard. Its
implementation therefore also includes computation required for
maintaining protocol conformity, which is omitted in our modified
GPB version, as we use plain HTTP messages. Nevertheless, we
point out our results for the capture interface where appropriate
for the sake of completeness.
1) Black-box Testing: In order to evaluate the overall
performance of the given interfaces, we used black-box
testing as outlined above. For this purpose we implemented a
plain client using existing technology of the Fosstrak platform
as well as our own implementation for the use of GPB. The
client instantiated a shared message object only once and reused
it in consecutive invocations of the respective interface
in order to reduce the effect of serializing the message before
sending it to the server. For the capture interface, we used
a sample message to store a single ObjectEvent, while the
message to the query interface requested a single event
related to a given EPC. The messages were sent in
alternating order, meaning that each invocation of the
capture interface was followed by a message to the query
interface, and so on.
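The measurement loop described above can be sketched as follows. This is a minimal illustration rather than the clients used in the study: the function names run_batch and average_per_batch are hypothetical, and the capture and query arguments stand in for whatever code actually sends a prebuilt message to the respective interface.

```python
import time
from statistics import mean


def run_batch(capture, query, capture_msg, query_msg, n,
              clock=time.perf_counter):
    """Send n alternating capture/query invocations.

    `capture` and `query` are caller-supplied callables (assumed here)
    that transmit one message each. The message objects are created
    once by the caller and reused for every invocation.
    Returns two lists with the per-call durations in seconds.
    """
    capture_times, query_times = [], []
    for _ in range(n):
        start = clock()
        capture(capture_msg)
        capture_times.append(clock() - start)

        start = clock()
        query(query_msg)
        query_times.append(clock() - start)
    return capture_times, query_times


def average_per_batch(capture, query, capture_msg, query_msg, batch_sizes):
    """Run one batch per size and return
    {batch_size: (mean capture time, mean query time)}."""
    results = {}
    for n in batch_sizes:
        c_times, q_times = run_batch(capture, query,
                                     capture_msg, query_msg, n)
        results[n] = (mean(c_times), mean(q_times))
    return results
```

Calling average_per_batch with increasing batch sizes, once per serialization mechanism, would give the kind of per-batch averages that can be plotted against batch size.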
The invocations were sent in batches in order to examine
the performance with respect to time. The first batches
were small, while later batches contained higher volumes in
order to compute valid averages. The performance of the query
interface with respect to the batch sizes is shown in Figure 3.
The figure shows that the use of GPB imposes a certain
overhead for the first set of messages. More importantly,
it shows that the use of binary encoding for the
serialization mechanism offers a significant performance
increase when sending higher volumes. The values
become constant after about five hundred consecutive
invocations, yielding time savings of 59% for the query interface.
For the capture interface, we measured an improvement of
about 11% when employing GPB. The reason for this difference
lies partially in the fact that the query interface conforms
to a verbose WSDL/SOAP specification.
Fig. 3. Black-box Evaluation of Query Interface Performance for varying
Serialization Mechanisms
After measuring the overall performance of our modifications
with respect to the existing implementation of the Fosstrak
EPCIS repository, we are able to support our claim that
the serialization mechanism has to be considered when
designing an efficient EPCIS repository implementation.
2) White-box Testing: The presented results of the black-box
testing raise several issues, such as the question why the
improvement for the capture interface is significantly lower than
that for the query interface. We address these issues in
this section, as they require more detailed knowledge about the
individual steps required for computing a response
and cannot be explained based on results retrieved from
black-box testing. We use the same messages for white-box testing
as we did for black-box testing. To gain insights into the (adapted)
Fosstrak EPCIS repository, we employed dynamic instrumentation
of the executed application code in order to profile the
application and determine the individual times categorized by
purpose. We identify the following categories that we consider
relevant to our work: time needed for (a) serialization,
(b) deserialization, (c) conversion between GPB and JAXB,
(d) database management, and (e) other purposes.
Figure 4 presents the results of the white-box test of the
query interface. It shows the accumulated times for
one thousand consecutive invocations, measured after having
first performed two thousand consecutive invocations in order to
overcome the initialization cost for GPB. The values are grouped
by the above categories, where the category “Other” refers to
computation that could not be classified unambiguously. As the
criterion for classification, we used the respective method names
and their namespaces to decide which category to assign the
elapsed time to.
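A classification of this kind can be sketched as a small rule table keyed on namespace and method-name prefixes. The prefixes below (example.gpb, example.convert, example.db) are placeholders invented for illustration; the actual package and method names used in the study are not given here.

```python
from collections import defaultdict

# Hypothetical rules: (namespace prefix, method-name prefix, category).
# An empty method-name prefix matches every method in that namespace.
RULES = [
    ("example.gpb.", "write", "serialization"),
    ("example.gpb.", "parse", "deserialization"),
    ("example.convert.", "", "conversion"),
    ("example.db.", "", "database"),
]


def classify(qualified_method):
    """Map a fully qualified method name to a category.

    Anything that matches no rule falls into "other".
    """
    namespace, _, method = qualified_method.rpartition(".")
    namespace += "."
    for ns_prefix, method_prefix, category in RULES:
        if namespace.startswith(ns_prefix) and method.startswith(method_prefix):
            return category
    return "other"


def accumulate(samples):
    """Sum elapsed times per category.

    `samples` is an iterable of (qualified method name, elapsed time).
    """
    totals = defaultdict(float)
    for qualified_method, elapsed in samples:
        totals[classify(qualified_method)] += elapsed
    return dict(totals)
```

Feeding the per-method times exported by a profiler into accumulate would yield one total per category, with unmatched methods collected under "other".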
These results explain some of the results we obtained from
client-side testing and raise issues for further investigation.
The measurements show that serialization is indeed
faster for GPB: serialization takes about 69% of the
time needed for the respective XML message, and deserialization
is also faster by about 30%. This aligns with
our claim that binary encoded messages can be created and
processed more efficiently than the respective XML messages.
Fig. 4. White-box Evaluation of Query Interface Performance for varying
Serialization Mechanisms
Considering a gain of about 30% when examining the
deserialization mechanism alone raises the question why the
modified capture interface performs poorly when compared to these
numbers and yields only an 11% gain. Using white-box testing,
we are able to explain this with our empirical results:
the time is spent mostly on conversion between the
GPB representation and Fosstrak’s internal analog. Therefore,
we conclude that 11% is a lower bound, as the conversion
is required only because we maintain compliance with the
Fosstrak EPCIS repository internals.
Another important aspect is that, for the query interface,
a significant amount of time is spent on computation which
cannot be classified as serialization- or database-related: more
than 60% for the current implementation and only about 14%
for our modified version. The major part of this time is spent
on managing the overhead required for processing SOAP
messages. As our modified version uses a plain HTTP message,
none of this logic is required and the interface implementation
can focus on deserializing the message while discarding
most of the protocol overhead. Therefore, we identify this
issue as the focus for future work with respect to our scenario.
More specifically, we propose to reconsider the choice of
SOAP over the more lightweight plain HTTP.
3) Evaluation of Message Size: When thinking about
performance for distributed systems, network traffic and the size
of the messages exchanged are relevant factors. Since the idea
of EPCglobal is to use the Internet as the connection medium
of choice, one point of further improvement could be the
transfer time between nodes in the network. One of the advantages
of GPB is the resulting message size. Compared to the
verbose markup language XML, GPB can reduce the message
size significantly.
Consider the sample XML message shown in Listing
2, as taken from the Fosstrak user guide. The message
requires 1,380 bytes when stored on an NTFS filesystem
on our test machine. Even after removing white space
and assuming at least one byte per character, as applies
to UTF-8, the message still requires at least 1,033 bytes
when transferred.
Listing 2. Sample XML capture event message
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<epcis:EPCISDocument xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:epcis="urn:epcglobal:epcis:xsd:1" xmlns:epcglobal="urn:epcglobal:xsd:1"
    xsi:schemaLocation="urn:epcglobal:epcis:xsd:1 EPCglobal-epcis-1_0.xsd"
    creationDate="2008-03-16T22:13:16.397+01:00" schemaVersion="1.0">
  <EPCISBody>
    <EventList>
      <ObjectEvent>
        <eventTime>2008-03-16T22:13:16.397+01:00</eventTime>
        <eventTimeZoneOffset>+01:00</eventTimeZoneOffset>
        <epcList>
          <epc>urn:epc:id:sgtin:0614141.107346.2017</epc>
        </epcList>
        <action>OBSERVE</action>
        <bizStep>urn:epcglobal:epcis:bizstep:fmcg:shipped</bizStep>
        <disposition>urn:epcglobal:epcis:disp:fmcg:unknown</disposition>
        <readPoint><id>urn:epc:id:sgln:0614141.07346.1234</id></readPoint>
        <bizLocation>
          <id>urn:epcglobal:fmcg:loc:0614141073467.A2349</id>
        </bizLocation>
        <bizTransactionList>
          <bizTransaction type="urn:epcglobal:fmcg:btt:po">
            http://transaction.acme.com/po/12345678
          </bizTransaction>
        </bizTransactionList>
      </ObjectEvent>
    </EventList>
  </EPCISBody>
</epcis:EPCISDocument>
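One way to estimate the transferred size of such a message is to drop the white space between elements and count the UTF-8 bytes of what remains. The sketch below is an assumption about how such a figure could be obtained, not the procedure used for the numbers in this paper; the helper names are hypothetical, and white space inside text content is deliberately left untouched.

```python
import re


def utf8_size(text):
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def strip_interelement_whitespace(xml_text):
    """Remove white space that sits purely between two tags.

    Leading and trailing white space of the document is trimmed as
    well. White space inside text nodes is kept, so this yields an
    upper estimate of the fully compacted size.
    """
    return re.sub(r">\s+<", "><", xml_text.strip())


def size_report(xml_text):
    """Return (size as given, size after stripping) in bytes."""
    return utf8_size(xml_text), utf8_size(strip_interelement_whitespace(xml_text))
```

Applied to a pretty-printed document, size_report gives the byte count as written and the byte count after compaction, so the share of pure formatting overhead can be read off directly.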
Encoding precisely the same information using our data
definition for GPB as presented in Listing 1, the message size is
reduced to 350 bytes when stored on an NTFS filesystem. This
is a reduction of nearly 75% compared to the respective XML
representation. Thus we believe that the reduction in size may
provide a speed improvement of equal magnitude when
transferring the data across regular network lines, when
neglecting possible network problems, which are not the focus
of our work. We evaluated different messages which are
expected to be common in an actual EPC Network and measured
an average size reduction by a factor of three. To recall the
use case of the European pharmaceutical industry, the 75 billion
read events would result in network traffic of about 94
terabytes using the verbose XML representation and about 24
terabytes using GPB.
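The extrapolation above can be checked with a few lines of arithmetic. The sketch assumes exactly one message per read event and reuses the stored sizes of the sample message (1,380 and 350 bytes); the constant and function names are ours. With these inputs, the totals come to roughly 94 and 24 when expressed in units of 2^40 bytes.

```python
XML_BYTES = 1380             # stored size of the sample XML message
GPB_BYTES = 350              # stored size of the same event encoded with GPB
EVENTS = 75_000_000_000      # 75 billion read events
TIB = 2 ** 40                # bytes per binary terabyte (tebibyte)


def reduction(before, after):
    """Fraction by which the size shrinks, e.g. 0.75 for a 75% reduction."""
    return 1 - after / before


def total_bytes(per_message, count):
    """Total traffic in bytes, assuming one message per event."""
    return per_message * count


if __name__ == "__main__":
    print(f"reduction: {reduction(XML_BYTES, GPB_BYTES):.1%}")
    print(f"XML total: {total_bytes(XML_BYTES, EVENTS) / TIB:.1f} TiB")
    print(f"GPB total: {total_bytes(GPB_BYTES, EVENTS) / TIB:.1f} TiB")
```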
V. FUTURE WORK
We stress that the next step beyond the application of a
binary-encoded communication protocol for EPCIS
communication would be to replace the XML-related object structure
in the Fosstrak EPCIS repository source code. Together with
this, JAXB should be removed because it binds the
language-independent data exchange format XML to Java’s
data types and object structures, which is not appropriate for
the simple operations conducted within the EPC Network.
Furthermore, the database itself can be a source of
better performance. Our suggestion is a design with a
write-optimized part for the capturing of events and a
read-optimized part for querying. The write-optimized part
could be as simple as taking the stream from the incoming
request and storing it in the database without further processing.
The stored streams would be processed periodically in order to
move them from the write-optimized to the read-optimized
storage. The read-optimized part of the database design could
be a column-oriented database that is tuned to execute
high-performance queries.
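The proposed split can be illustrated with a small in-memory sketch. It is only meant to make the idea concrete and is not a database design: the class name, the field list, and the decode callable are all assumptions, and a real system would persist both parts.

```python
from collections import defaultdict

# Hypothetical set of event fields kept in the read-optimized part.
FIELDS = ("epc", "event_time", "action")


class TwoStageEventStore:
    """Write-optimized log plus read-optimized column layout."""

    def __init__(self, decode):
        # decode: caller-supplied callable turning raw bytes into a
        # dict that contains at least the keys listed in FIELDS.
        self._decode = decode
        self._write_log = []                       # raw requests, unparsed
        self._columns = {f: [] for f in FIELDS}    # one list per column
        self._epc_index = defaultdict(list)        # EPC -> row numbers

    def capture(self, raw):
        """Store the incoming request body without any processing."""
        self._write_log.append(raw)

    def merge(self):
        """Periodic step: move pending raw events into the columns.

        Returns the number of events moved.
        """
        pending, self._write_log = self._write_log, []
        for raw in pending:
            event = self._decode(raw)
            row = len(self._columns["epc"])
            for field in FIELDS:
                self._columns[field].append(event[field])
            self._epc_index[event["epc"]].append(row)
        return len(pending)

    def query_by_epc(self, epc):
        """Return all merged events for one EPC as dicts."""
        rows = self._epc_index.get(epc, [])
        return [{f: self._columns[f][r] for f in FIELDS} for r in rows]
```

In this sketch an event becomes visible to queries only after the next merge, which is the trade-off the periodic processing step implies.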
VI. SUMMARY
In this paper, we questioned the choice to use textual
encoded messages for the EPCIS communication in the EPC
Network. We presented the EPC Network, its intention, and a
real-world use case. Afterwards, we analyzed related work, which
supported our belief that binary encoded EPCIS communication
would be more appropriate. To prove our claim, we used
the Fosstrak open source EPCIS repository implementation
and extended its interfaces to also understand the binary
encoded message protocol Google Protocol Buffers. In our
empirical evaluation, we conducted black-box and white-box
testing and were able to show that the message size is reduced
by about 75%, the capture interface performance increased by
about 11%, and the query interface performance increased by
about 59%, just by changing the underlying communication
protocol. Finally, we pointed out future work which can
leverage our implementation and further improve Fosstrak’s
EPCIS repository performance.
REFERENCES
[1] N. Abu-Ghazaleh and M. J. Lewis. Differential deserialization for
optimized SOAP performance. In SC ’05: Proceedings of the 2005 ACM/IEEE
Conference on Supercomputing, page 21, Washington, DC, USA, 2005.
IEEE Computer Society.
[2] K. Chiu, M. Govindaraju, and R. Bramley. Investigating the limits of
SOAP performance for scientific computing. In HPDC ’02: Proceedings
of the 11th IEEE International Symposium on High Performance
Distributed Computing, pages 246–254, Washington, DC, USA, 2002. IEEE
Computer Society.
[3] R. Elfwing, U. Paulsson, and L. Lundberg. Performance of SOAP in web
service environment compared to CORBA. In APSEC ’02: Proceedings of
the Ninth Asia-Pacific Software Engineering Conference, pages 84–93,
Washington, DC, USA, 2002. IEEE Computer Society.
[4] EPCglobal Inc. EPC Information Services. Technical Report Version
1.0.1, 2007.
[5] EPCglobal Inc. Application Level Events Standard. Technical Report
Version 1.1, 2008.
[6] EPCglobal Inc. Class 1 Generation 2 UHF Air Interface Protocol Standard
"Gen 2". v.1.2.0, 2008.
[7] EPCglobal Inc. EPCglobal Tag Data Standards. Technical Report Version
1.4, 2008.
[8] C. Floerkemeier, C. Roduner, and M. Lampe. RFID application
development with the Accada middleware platform. IEEE Systems Journal,
Special Issue on RFID Technology, 1(2):82–94, 2007.
[9] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and
T. Berners-Lee. Hypertext Transfer Protocol - HTTP/1.1. RFC 2616,
1999.
[10] M. B. Juric, B. Kezmah, M. Hericko, I. Rozman, and I. Vezocnik. Java
RMI, RMI tunneling and web services comparison and performance
analysis. SIGPLAN Not., 39(5):58–65, 2004.
[11] C. Kohlhoff and R. Steele. Evaluating SOAP for high performance
applications in capital markets. Comput. Syst. Sci. Eng., 19(4), 2004.
[12] J. Müller, C. Pöpke, M. Urbat, A. Zeier, and H. Plattner. A simulation of
the pharmaceutical supply chain to provide realistic test data. Advances
in System Simulation, International Conference on, pages 44–49, 2009.
[13] A. Ng, P. Greenfield, and S. Chen. A study of the impact of compression
and binary encoding on SOAP performance. Australasian Workshop on
Software and System Architecture, pages 46–56, 2005.
[14] T. Suzumura, T. Takase, and M. Tatsubori. Optimizing web services
performance by differential deserialization. Web Services, IEEE
International Conference on, pages 185–192, 2005.
[15] F. Thiesse, C. Floerkemeier, M. Harrison, F. Michahelles, and
C. Roduner. Technology, standards, and real-world deployments of the EPC
Network. IEEE Internet Computing, 13(2):36–43, 2009.
[16] M. zur Muehlen, J. V. Nickerson, and K. D. Swenson. Developing
web services choreography standards: the case of REST vs. SOAP. Decis.
Support Syst., 40(1):9–29, 2005.