A model-driven framework for domain specific languages

demonstrated on a test automation language
Thesis by
Martin Karlsch, 711330
In Partial Fulfillment of the Requirements
for the Degree of
Master of Science in Software Systems Engineering
Supervisors
Prof. Dr. Andreas Polze (HPI)
Dr. Martin von Löwis (HPI)
Dr. Klaus Ries (BMW Group)
Dipl.-Inf. Markus Hillebrand (BMW Group)
Hasso-Plattner-Institute of Software Systems Engineering
Potsdam, Germany
January 12, 2007
(Submitted March 7, 2007)
© January 12, 2007
Martin Karlsch, 711330
All Rights Reserved
Abstract
Innovation in today’s automotive industry is largely based on electronics and software. To gain a competitive edge, a company has to deliver new features steadily. This trend is reinforced by shortening technology life cycles. The result is an exponential growth of software inside a car. As quality plays a critical role, software and hardware have to be tested properly, and manual testing is no longer practicable. At the BMW Group a framework for test automation has been developed, including a Microsoft Excel based test specification language. The language and its implementation approach leave room for improvement in various places; however, due to the chosen technology the possibilities for improving them are limited. Therefore it was decided to develop a new domain specific language (DSL).
However, developing a new DSL is currently a difficult task. The developer needs expertise in the field of language development, which involves design, lexical analysis, parser and compiler construction and the implementation of a suitable runtime, and, most importantly, knowledge of the concepts within the specific domain. A large number of tools supporting domain specific language development are available, yet most fall short in categories such as reusability, maintainability, interoperability or ease of use.
For that reason this thesis proposes a new framework for the development of domain specific languages which is based on the vision of Model-Driven Engineering (MDE). In MDE, models are the central artifact of software engineering and enable reuse. A domain and its language (syntax and semantics) are described by models. The framework allows the reuse of existing textual grammars for compiler generator tools by translating them into models. It thereby becomes possible to apply a large number of MDE transformation and generation tools to them. In turn, a language parser and generator can be produced from these grammar models. An additional goal is to automatically generate a suitable language editor and debugger to improve the developer experience.
Using the proposed framework, the test automation language is reimplemented. This is realized by reusing an existing language and extending it. In addition, problems such as error reporting and tool support for the DSL are solved.
Zusammenfassung
About 90% of the innovation in today’s automotive industry originates in electronics and software development. Product life cycles are becoming ever shorter, and to stay competitive manufacturers have to integrate new functions ever faster. As a consequence, the amount of software in a car grows exponentially. Both the software and the hardware have to be tested carefully in order to guarantee high quality. Because of the amount and complexity this can no longer be done manually and has to be automated. For this purpose the BMW Group has developed a dedicated test automation framework, including a Microsoft Excel based test specification language. The language and its implementation leave room for improvement. This, however, proves difficult because of the chosen technology. Therefore it was decided to develop a new domain specific language (DSL) for test automation.
Developing a new language is not trivial, however. The developers need a broad range of knowledge, above all in the areas of language design and implementation. Not to be underestimated is precise knowledge of the domain for which the new language is intended, so that syntax and semantics can be defined as domain-specifically as possible. This process is supported by a multitude of available tools. Frequently, however, these are inadequate in one or even several aspects such as reusability, maintainability or interoperability.
This thesis proposes and implements a new framework for the development of domain specific languages. It builds on the principles of Model-Driven Engineering (MDE), which moves models into the center of software development and uses them to make reuse easier. A domain and its language (syntax and semantics) are described by models. The framework enables the reuse of existing textual grammars by converting them into grammar models. These can be modified, analyzed, checked and transformed with tools that likewise follow the MDE approach. Furthermore, it is possible to generate a parser from these grammar models, and additionally an editor and a debugger for the domain specific language, which benefits its usability for developers. The development of a completely new language is possible as well.
The existing test automation language is reimplemented with the help of the new framework, reusing existing artifacts and extending them. This solves problems in the areas of error handling and tool support.
Contents
Abstract
Zusammenfassung
Acknowledgments
I Preliminaries
1 Introduction
1.1 Motivation
1.2 Description of the remaining chapters
2 Technological context and related work
2.1 Domain specific languages
2.1.1 What is a domain specific language?
2.1.2 Advantages
2.1.3 Disadvantages
2.1.4 Development phases and patterns
2.1.5 Decision phase
2.1.6 Analysis phase
2.1.7 Design phase
2.1.8 Implementation phase
2.1.9 Approaches supporting DSL development
2.2 Model-Driven Engineering
2.2.1 Models - the foundation of MDE
2.2.2 Domain specific modeling languages
2.2.3 Generators and transformations
2.2.4 Different MDE technologies in detail
2.3 Model-driven DSL development
2.3.1 Model-driven DSLs by example
2.3.2 Model-based domain specific language tools
2.4 Summary
3 Domains in the automotive industry
II Frodo
4 One framework to bind them all?
4.1 Requirements and solutions
4.2 User roles
4.3 Framework Architecture
4.3.1 Components of the framework
4.3.2 Language specification artifacts
4.3.3 Back-end
4.4 Implementation
4.4.1 Overview
4.4.2 ECore - Metametamodel Implementation
4.4.3 Grammar/Syntax View Handling
4.4.4 Transformation Engine
4.4.5 Plugin Generation
4.4.6 Implementation problems and issues
5 Examples
5.1 A simple state machine language
5.1.1 Design
5.1.2 Implementation
5.2 Test automation language
5.2.1 Introduction
5.2.2 The current system
5.2.3 A better test automation language
III Conclusions
6 Future work
6.1 DSL Framework
6.1.1 Theoretical outlook
6.1.2 Practical outlook
6.2 TestDSL
7 Summary
7.1 Survey
7.2 Contributions
A Fundamental Modeling Concepts
B Statement of Academic Honesty
List of Figures
2.1 Simplified cost prediction for DSL-based methodologies [59]
2.2 Domain analysis, taken from [147]
2.3 Different technological space and the metamodeling stacks
2.4 Example of MOF layers
2.5 MDA PIM to PSM (FMC)
2.6 ECore metametamodel
2.7 XMI used for exchange.
4.1 Proposed architecture for a model-driven DSL framework (FMC)
4.2 Frodo’s architecture (FMC)
4.3 Excerpt from Frodo grammar metamodel
5.1 Finite state machine domain model
5.2 Example of a finite state machine
5.3 Screenshot of the FSM editor
5.4 Current PyTAF architecture (FMC)
5.5 Excel TestDSL example
A.1 FMC block diagram example
A.2 FMC block diagram reference sheet
List of Tables
2.1 Well known DSLs [92]
2.2 Lines of Code for different approaches to implement FDL
2.3 Different MDE implementations, different metametamodels
2.4 Different tools for creating DSML
2.5 Different model-to-text approaches
2.6 Different generator tools
2.7 Transformation engines from different technology spaces
2.8 Some model-to-model transformers
Listings
2.1 SPL metamodel
2.2 SPL TCS template specification
2.3 A TCSSL grammar example
4.1 Creating an ECore metamodel for a carpool in Python
4.2 Carpool metamodel as XMI
4.3 Auto importing the carpool metamodel and creating a model
4.4 XMI representation of a carpool model
4.5 Excerpt from the generative grammar for ANTLR grammars
5.1 Example of FSM DSL (demo.fsm).
5.2 Grammar of the FSM DSL as EBNF
5.3 The FSM parser grammar
5.4 The FSM lexer grammar
5.5 A simple FSM interpreter in Python, interpreting demo.fsm
5.6 The XMI representation of demo.fsm
5.7 A generative grammar, producing a C back-end.
5.8 Example of generated code for TestDSL
5.9 Example of the embedded Python test language
5.10 Example of the new wait statement
5.11 Excerpt from the Python grammar metamodel to Frodo metamodel transformation
5.12 Example of unmatchable rule with finite lookahead and a solution.
Acknowledgments
I would like to thank Klaus Ries and Markus Hillebrand at the BMW Group for many interesting discussions, even the philosophical ones. Inspiring and wild ideas were discussed, but in the end their guidance helped me to keep the right direction. Martin von Löwis, my tutor at the HPI, also helped with additional ideas and many corrections. Furthermore I would like to thank Hannah Ulferts for motivating me, correcting this thesis and eating my chocolate. Additional corrections were made by Thomas Hille, Friederike Karlsch and Jens Ulferts; many thanks go to them too, especially to Thomas because he shared my thesis-writing fate.
Last but not least, I could not have done all of this without the never-ending support of my family. Thank you.
The limits of my language determine the limits of my world.
Ludwig Wittgenstein
Part I
Preliminaries
Chapter 1
Introduction
1.1 Motivation
Today the German automotive industry holds a high market share and has a good reputation all over the world [57]. However, competition with low-cost manufacturers forces companies like BMW Group, Audi or DaimlerChrysler to concentrate on innovative features [57], for example sophisticated control of the engine, brakes or acceleration, and infotainment functionality such as GPS navigation, head-up displays or advanced audio/video functionality. Almost every new feature contains a combination of electronics and software [35]. To reduce costs, devices are built out of standard hardware components while the software is heavily customized, not least to avoid imitation.
This has led to an exponential growth of software inside a car [35]. However, the software engineering capabilities within the automotive industry, at suppliers and OEMs alike, have not grown as fast as the complexity. The cause of the increased complexity is not only the amount of software but also increasingly distributed, interconnected and interdependent functionality, which leads to a higher error rate.
A number of measures have been taken to manage this complexity. One is to automate testing, especially integration testing. Many different test automation solutions exist nowadays, but not all fulfill the requirements of automotive integration testing. Therefore PyTAF [62], a practical Python¹ test automation framework, has been developed at the BMW Group. It is programmed in a domain specific test automation language, not in Python directly. From this language Python code is generated.
¹ The programming language. http://www.python.org
Domain specific languages (DSLs) are computer languages tailored to a special field of application. Compared to general purpose languages they offer an advantage in expressiveness and ease of use in their domain. This is possible due to higher abstraction, a clear and concise program specification and improved program checking possibilities.
For the current test automation language, test cases have to be written inside Microsoft Excel. This is unsatisfactory because of the imposed and inappropriate spreadsheet structure and the limited integration into the development environment. Furthermore the language itself suffers from design problems such as an incomplete data model. This can lead to ineffective test development or, even worse, to defective tests and increased maintenance costs. The current issues are elaborated in more detail later in this thesis.
Most of the problems can be solved by a new textual domain specific language. This language needs to be equally powerful but has to improve upon the current shortcomings. Nevertheless, the development of a new language is a difficult task to undertake, even if the language is domain specific. The developer needs to be qualified in language design and implementation and to possess a good understanding of the target domain. For tasks such as compiler development several tools are available. Likewise, tools explicitly supporting the development of domain specific languages have been created. Using most of them still requires great expertise. Additionally, reuse is difficult in most cases.
A vision which focuses on reuse is Model-Driven Engineering (MDE). Models move into the center of the development process. They allow a more domain specific description of problems than modern third generation programming languages are able to offer, exploiting advanced modeling techniques, code generation and model transformations. This approach is very promising, as first results show [17].
Recently, first attempts have been made to combine technologies for DSL development with the MDE paradigm, motivated by improved reuse properties and the need to specify domain specific languages more easily [64, 94, 96, 146, 151].
In this thesis, beyond the improved test automation language, a model-driven framework is presented which allows the former to be implemented in an easier and more reusable way. In [106] Klint demands an “engineering discipline for grammarware”, stating that the engineering aspect of grammar-dependent software is insufficiently understood. Most DSL development tools are grammar based. Still, questions of grammar transformation, maintenance, verification and reuse are hard to answer. While the available DSL tools answer some of these questions in a versatile way, they remain unconnected island solutions. For example, grammar refactoring is available in almost none of the current tools. Providing answers will reduce the required development effort and decrease the number of errors.
For that reason the proposed framework tries to combine DSL development and MDE, backed by the idea of explicitly enabling reuse. A metamodel for grammars is specified which is able to represent a large number of grammars as grammar models. These models can then be modified using a large number of available MDE-enabled tools. For example, it is possible to refactor a grammar by transforming its grammar model, or to apply aspects. Nevertheless, refactoring and aspect introduction are not the focus of this thesis. To enable MDE in the first place, this thesis also discusses the implementation of a popular metametamodel on a new platform.
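To make the idea of a grammar model more tangible, the following minimal Python sketch represents a grammar as plain objects and performs a rule renaming as a model-to-model transformation. It is an illustration only: the classes Terminal, RuleRef, Rule and Grammar and the functions rename_rule and to_ebnf are hypothetical names chosen for this example and are not the grammar metamodel of the framework described in this thesis.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class Terminal:
    """A literal token such as a keyword (hypothetical model element)."""
    text: str


@dataclass(frozen=True)
class RuleRef:
    """A reference to another rule by name (hypothetical model element)."""
    name: str


Symbol = Union[Terminal, RuleRef]


@dataclass
class Rule:
    name: str
    alternatives: List[List[Symbol]]  # each alternative is a sequence of symbols


@dataclass
class Grammar:
    name: str
    rules: List[Rule] = field(default_factory=list)


def rename_rule(grammar: Grammar, old: str, new: str) -> Grammar:
    """Refactoring as a transformation: build a new model with one rule renamed."""
    def rename(symbol: Symbol) -> Symbol:
        if isinstance(symbol, RuleRef) and symbol.name == old:
            return RuleRef(new)
        return symbol

    return Grammar(
        grammar.name,
        [
            Rule(
                new if rule.name == old else rule.name,
                [[rename(s) for s in alt] for alt in rule.alternatives],
            )
            for rule in grammar.rules
        ],
    )


def to_ebnf(grammar: Grammar) -> str:
    """Model-to-text step: print the grammar model in a simple EBNF-like form."""
    lines = []
    for rule in grammar.rules:
        alternatives = " | ".join(
            " ".join(
                f'"{s.text}"' if isinstance(s, Terminal) else s.name
                for s in alt
            )
            for alt in rule.alternatives
        )
        lines.append(f"{rule.name} = {alternatives} ;")
    return "\n".join(lines)


# A made-up toy grammar, used only to exercise the transformation.
fsm = Grammar("fsm", [
    Rule("machine", [[Terminal("machine"), RuleRef("ident"), RuleRef("state")]]),
    Rule("state", [[Terminal("state"), RuleRef("ident")]]),
    Rule("ident", [[Terminal("a")], [Terminal("b")]]),
])
refactored = rename_rule(fsm, "ident", "name")
print(to_ebnf(refactored))
```

Because the transformation returns a new model instead of editing text, the original grammar stays untouched and every reference to the renamed rule is updated consistently, which is the kind of refactoring that is hard to do reliably on a textual grammar file.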
Furthermore, not only formal aspects are important in DSL development but also the surrounding toolset and its integration. Developers are used to capable editors with syntax highlighting, folding and code completion, and moreover to excellent debugging support in modern IDEs. These features can greatly improve productivity but are seldom available for domain specific languages, as a high development effort is necessary to implement editors and debuggers. The proposed framework shows a possible way to deliver such an environment automatically.
In the implementation of the new test automation language, for example, the formal Python grammar is reused and extended. This saved a large amount of development time, based the new language on an already existing formal specification and created a reusable model of Python and the DSL.
Additionally, error reporting, exception mapping from the back-end to the DSL and some debugging capabilities are enabled by using the presented framework. These features were not included in the original test automation language.
1.2 Description of the remaining chapters
This thesis is divided into three parts.
Part one gives an extensive overview of concepts and current technologies. It consists of three chapters. First, domain specific languages are described: what is a DSL, and what advantages and disadvantages are associated with it? Then development phases and design patterns for each phase are discussed, including references to different technologies. Finally, a review of some more concrete tools and frameworks that allow the implementation of DSLs is given.
In the chapter Model-Driven Engineering this term is introduced as a new software engineering paradigm. Definitions of model, metamodel, metametamodel and technology space are provided, followed by an attempt at a formal specification. Moreover, the concepts of domain specific modeling languages, generators and transformations are elucidated, in each case underpinned by a description of current technology. The last section explains concrete technologies in more detail, such as the Meta Object Facility, Model-Driven Architecture, the Eclipse Modeling Framework and XML Metadata Interchange.
The last chapter of part one outlines research which is more closely related to this thesis, namely examples of DSLs implemented using model-driven techniques and MDE-enabled DSL development frameworks.
Part two contains the main contributions of this thesis, split up into a framework proposal, the realization of that proposal and two examples. While the proposal is generic and does not discuss concrete technologies, the implementation shows which technologies are available and which of them were used to realize the framework. Furthermore, different problems encountered are described. At the end, two examples are implemented with the help of the framework: a simple state machine language and a more complex test automation language. The latter is elaborated in more detail because it is a major contribution of this thesis. First the currently used system is described, followed by an enumeration of its problems in design and implementation. Then a new language and its development are shown.
Part three, the last part of this work, consists of two chapters. The first chapter shows possibilities for future work, in general for the proposed and implemented framework and in particular for the test automation language. The second chapter summarizes this thesis and names its important contributions. Finally, the appendix contains an overview of the FMC² notation often used in this work.
² Fundamental Modeling Concepts.
Chapter 2
Technological context and related work
This chapter first introduces domain specific languages, explains their advantages and disadvantages, summarizes common patterns and categorizes the domain specific language development process. Furthermore, different approaches and tools supporting language development are discussed. Second, the approach of Model-Driven Engineering is presented, including an explanation of terms such as metamodel or technical space. Afterwards different incarnations of Model-Driven Engineering are described, together with the meaning of generators and transformations in this context and the kinds of approaches that are available. A closer look at MOF/MDA, ECore/EMF and XMI follows. Finally, an insight into various frameworks which directly support model-driven language development is given.
2.1 Domain specific languages
“Works of imagination should be written in very plain language; the more purely imaginative they are the more necessary it is to be plain.”¹
¹ Samuel Taylor Coleridge
This section describes what a domain specific language is, what advantages and disadvantages a DSL has and which common DSL analysis, design and implementation patterns exist.
2.1.1 What is a domain specific language?
To understand the meaning of the term domain specific language, or more precisely domain specific programming language, the term programming language is defined first. One possible definition is given by [72]:
“A programming language or computer language is a standardized communication technique for expressing instructions to a computer. It is a set of syntactic and semantic rules used to define computer programs. A language enables a programmer to precisely specify what data a computer will act upon, how these data will be stored/transmitted, and precisely what actions to take under various circumstances.”
However, there is no definition on which all authors agree. Watts therefore proposes [181] some criteria which have to be fulfilled by a programming language:
• Must be universal (every problem must have a solution that can be programmed in the language, if that problem can be solved at all by computer).
• Must be implementable on a computer.
• Should also be reasonably natural for solving problems, at least problems within its intended application area.
Programming languages in general can be grouped or classified by different criteria. Possible criteria are the purpose (for example Fortran for scientific programming versus C [102] for system programming), the paradigm (LISP as a functional language or Smalltalk as an object-oriented language), the generation (1GL up to 5GL), whether it is imperative or declarative, and whether it is domain specific or general purpose. General purpose languages (GPLs) are less specialized and are suited for a wide range of applications, from business processing to scientific computing. Java² is a prominent representative.
² http://java.sun.com/
The term domain specific means that the language is explicitly tailored to a target domain. Complex constructs and abstractions of the domain are offered within the language, increasing its expressiveness in comparison to GPLs. Solutions to domain problems can be expressed with less effort. The higher abstraction and the compactness, and therefore better readability and writability, enable a larger group of people with less programming knowledge to be productive using the DSL. This leads to productivity gains in general and also to decreased maintenance costs.
Often a DSL does not fulfill all criteria given by Watts. Nevertheless, many DSLs are regarded as special programming languages.
Today there are many well known DSLs such as HTML, SQL, VHDL, make (software build process), LaTeX (document preparation), BNF (context free grammars) or even Excel. The use of DSLs is not new. Before the term domain specific language was coined, these languages were called special-purpose languages, end-user languages or, as Bentley [23] called them, “little languages”. As early as 1957 APT [34], a language for numerically controlled machines, was developed at MIT; it can be considered one of the first DSLs. The border between a DSL and a GPL is fuzzy: COBOL, for example, was considered a GPL but also a DSL for business applications. Another example is Prolog, which can be understood as a DSL for applications specified in predicate calculus. One attempt to classify languages was made by Jones [92]: a higher level stands for more domain specificity whereas a lower level means more generality (Table 2.1). As stated by Mernik [126], the domain specificity of a language is a matter of degree. In this thesis the definition by Mernik serves as guidance:
“DSLs are languages tailored to a specific application domain. They offer substantial gains in expressiveness and ease of use compared with GPLs in their domain of application”
However, many similar definitions can be found in the literature [99].
DSL      Domain              Level
Java     GPL                 5
VHDL     hardware design     17
HTML     web pages markup    22
SQL      database queries    25
Excel    spreadsheets        57
Table 2.1: Well known DSLs [92]
2.1.2 Advantages
As stated in Section 2.1.1, a DSL offers several advantages. Productivity and maintainability [172] are increased due to an appropriate domain specific notation. DSLs are more suitable for end-user programming. Domain experts are able to understand, validate, modify and develop within the language (better readability, writability and high abstraction). The gains can be measured quantitatively and qualitatively. Most qualitative reasoning is backed up by practical observations. According to [126] the quantitative validation of DSL advantages is an ongoing field of research, yet supporting results have been reported. Figure 2.1 shows the advantage of DSLs with regard to long-term cost.
Because of their concise nature and domain-fitting notation, DSLs are to a certain degree self-documenting. This also facilitates the embodiment of domain knowledge, which eases reuse [63, 166] and conservation.
Another advantage is the possibility to validate at the domain level [49]. While normal GPL compilers do not know about any domain concept beyond the general language constructs, a DSL program can be checked against any domain specific constraint. An example is real-time properties: as long as a certain execution time is guaranteed for every language construct, it is possible to prove such properties automatically for the whole program. Just like verification, optimization can be done more effectively at the domain level [18].
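The real-time example can be sketched in a few lines of Python. The sketch assumes a made-up mini language whose programs are lists of statement names plus a ("repeat", count, body) construct, and an invented table WCET_MS of guaranteed worst-case execution times; none of these names or numbers come from the test automation language discussed in this thesis.

```python
# Hypothetical guaranteed worst-case execution times per statement, in milliseconds.
WCET_MS = {"set_signal": 2, "read_signal": 3, "wait_10ms": 10}


def worst_case_time(program):
    """Sum the guaranteed times of all statements, expanding repeat blocks."""
    total = 0
    for statement in program:
        if isinstance(statement, tuple):
            # ("repeat", count, body): the body runs a fixed number of times.
            _, count, body = statement
            total += count * worst_case_time(body)
        else:
            # An unknown statement raises KeyError: no guarantee, no proof.
            total += WCET_MS[statement]
    return total


def meets_deadline(program, deadline_ms):
    """Domain-level check: does the whole program fit into the deadline?"""
    return worst_case_time(program) <= deadline_ms


program = [
    "set_signal",
    ("repeat", 3, ["read_signal", "wait_10ms"]),
    "set_signal",
]
print(worst_case_time(program))      # 2 + 3 * (3 + 10) + 2 = 43
print(meets_deadline(program, 50))   # True
```

The check works only because the language is restricted: every construct has a known bound and loops have a fixed count. A general purpose language with unbounded loops and arbitrary library calls gives a checker no such footing.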
2.1.3 Disadvantages
A DSL has not only advantages but also potential shortcomings. One drawback is the high development effort needed for a new language. The language developer needs at least experience in language design and knowledge of the target domain, and has to find fitting abstractions, the right scope and a balance between GPL and DSL constructs. Furthermore, the language must be implemented and maintained.
Figure 2.1: Simplified cost prediction for DSL-based methodologies [59]
Other problems are tooling, user training costs and performance. While general purpose languages such as Java or C#³ have strong tool support, corresponding tools for a new DSL have to be created. IDEs like Eclipse or Visual Studio offer deep integration with these languages, such as powerful editors with syntax highlighting and checking, integrated compilers and advanced debuggers. Creating a tool ecosystem for a DSL is a time-consuming process which adds to the total costs caused by language design and implementation. Without a development methodology and suitable tools, the risk is high that the DSL development costs surpass the estimated savings from using a DSL.
³ http://www.ecma-international.org/publications/standards/Ecma-334.htm
The mentioned training costs originate from the fact that potential DSL users have by definition never used the language before. However, this is mitigated because in most cases the new language should match the domain experts’ expectations.
Often a DSL will suffer from lower performance than handwritten software. As long as performance is not critical, the other DSL benefits make this a minor problem. Nevertheless, in some cases performance can be equal or better because optimization is possible at a high abstraction level, but in most cases the potential is limited.
2.1.4 Development phases and patterns
The development of a DSL can be divided into different phases. The design and the implementation phase were already mentioned. A more fine-grained subdivision is possible: five stages can be distinguished, namely decision, analysis, design, implementation and deployment. The development process of a DSL does not have to follow these phases sequentially.
Different authors [161, 166] have identified numerous patterns which recur in DSL development and can serve as guidance for a developer without prior expertise in this field. Each pattern can be assigned to one of the five phases. The patterns are divided into decision patterns, analysis patterns, design patterns and implementation patterns, each capturing common approaches. In the following sections the phases and patterns are described according to the extensive analysis by Mernik et al. [126].
2.1.5 Decision phase
Before the development of a new DSL can begin, a decision has to be made: is it feasible or not? Economic considerations have to be taken into account. Do the accumulated development, deployment and maintenance costs justify a new DSL in comparison with other, conventional approaches? Is there already a suitable existing DSL? If so, are its documentation and maintenance good enough? If not, is the risk of developing a new DSL acceptable?
The following decision patterns have been identified. Most of them are based on the same general concerns, such as allowing domain experts with less programming experience [130, 163] to develop software or improving software economics.
• Notation: An improved new or existing domain specific notation can be a decisive factor. Two common subpatterns are the transformation of a visual into a textual notation and the creation of a user-friendly notation for an existing API. The first subpattern, for example, enables easier composition of large artifacts.
• AVOPT: Domain-specific analysis, verification, optimization, parallelization and transformation of applications developed in a GPL are in general time-consuming and hard to automate, for example due to source code complexity. With a well-defined DSL, AVOPT is more feasible.
• Task automation: In some cases GPL programming suffers from repetitive programming tasks. Automatic code generation driven by an appropriate DSL can ease this [160].
• Product line: Some software products do not exist as a single standalone application but are part of a product line or software family, sharing common parts. A DSL can facilitate the specification and support automated assembly [182].
• Data structure representation: Representing structured data in a form that is easy to read, write and maintain helps to make complex structures accessible. An appropriate DSL can help to achieve this goal. YAML [22] and JSON [52] are examples.
• Data structure traversal: Like representation, traversal of data structures can often be expressed more effectively with a fitting DSL (for example SQL [84]).
• System front-end: DSL-based configuration and adaptation of system front-ends.
• Interaction: Text, menu, dialog or voice based applications which interact with the user can benefit from a DSL which specifies input and reaction in a high-level representation.
• GUI construction: GUI design is often done using a DSL. For example, XUL and XAML are XML-based DSLs for GUI description [25].
2.1.6 Analysis phase
After the decision in favor of a (new) DSL is made, the specific domain has to be analyzed with the goal of gathering as much domain knowledge as possible. It is important to ensure a high quality of the gathered material and to have access to domain experts. The term domain analysis was introduced by Neighbors [132] and defined as identifying similar objects and operations in a particular domain. Different sources of information can be examined, for example existing technical documents, APIs and GPL code, or knowledge from domain experts.
After gathering, the knowledge must be clustered to find meaningful abstractions and must be consolidated. In most cases the results of the analysis are a domain definition, the domain specific terminology and concepts, a domain model, the domain scope and a description of the (operational) semantics. Figure 2.2 summarizes different sources and possible results. So far there is no widely adopted notation to capture the results of the analysis phase.
Three different domain analysis patterns can be identified: informal, formal and extraction from code.
Figure 2.2: Domain analysis, taken from [147]
Informal pattern
The informal pattern means that the domain analysis is done informally, without a defined process. Most DSLs are developed without an analysis methodology [126]. This often leads to incomplete requirements and can complicate the development process. While it is possible to get first results earlier, the quality is not as high as with formal patterns. For simple domains an informal process is often enough.
Formal pattern
Domain analysis can also be done using a defined process or methodology; analyses that do so follow the formal pattern. Using a formal pattern helps to avoid missing important parts of the domain and can lead to more appropriate requirements. A large number of the methodologies used come from another field of research: domain engineering. Domain engineering is derived from the area of software reuse and refers to the systematic modeling of a target domain. This is strongly related to the notion of program families [171] and software product lines [113, 162].
While domain engineering and analysis techniques focus mainly on commonalities, family and product line analysis examine the variations inside a domain. Several methodologies exist today: FAST (Family-Oriented Abstraction, Specification and Translation) [182], Sherlock [168], DSSA (Domain-Specific Software Architectures) [165], DARE (Domain Analysis and Reuse Environment) [75], FODA (Feature-Oriented Domain Analysis) [100], PROTEUS [89], ODE (Ontology-based Domain Engineering) [54] or ODM (Organization Domain Modeling) [158]. This list consists of the most well known methods but is by no means complete.
An example where FODA and FAST are applied can be found in [126]. While most methodologies have a graphical feature diagram or domain model as result, Deursen and Klint propose a formalized textual⁴ representation [58] which can be used to generate UML [133] diagrams, other types of documentation or even code.
Semi formal
A specific semi formal approach (domain driven design) covering analysis is proposed in [66]. The creation of a fitting domain model is the most important step in domain driven design. First, domain experts and software architects try to find a domain model which serves as the base for a common communication language (Ubiquitous Language). This language is used later on in all aspects of the development process. UML is the advised notation for the domain model. Instead of one large diagram, several small diagrams should be used, each describing a certain aspect or part, in order to avoid clutter and reduce complexity. The UML artifacts should be accompanied by documents that contain information not captured by UML, like the meaning of concepts or what certain objects are supposed to do. In comparison to other methodologies, Evans gives extensive information on how to continue after the domain model is established or the feature analysis is done.
4 The Feature Description Language (FDL), which is itself a separate DSL.
Extraction from code pattern
The last identified pattern, extraction from code, derives a DSL directly from an existing implementation. In most cases this implementation is done in a GPL, though it is also possible to derive from another DSL.
2.1.7 Design phase
The design of a DSL, and therefore the development of the language itself, is based on the results of the earlier phases. Two questions have to be answered when approaching the design: how is the DSL related to existing languages, and what kind of formal description is chosen for the language? Each question is associated with different possible design patterns that help to find an appropriate answer.
Creating a language based on an existing one can have different advantages. Some users may be familiar with the base language, resulting in reduced training cost. Common operations, such as arithmetic in the C family of languages, are well known to many developers. Furthermore, an existing implementation and/or ecosystem can be leveraged.
Three different approaches to reusing an existing language can be distinguished. The fourth approach is the development of an entirely new language.
Piggyback
The new language can piggyback domain specific features on part of an existing language. Examples are Hancock [51], lava [159] or Facile [154]. Hancock is a DSL for high performance signature processing; it piggybacks on C by modifying parts of the language and adding processing related constructs. From this DSL, C code is generated again. Similarly, lava, a production grammar DSL to describe and generate test cases for a JVM, piggybacks on the textual Java byte code representation. The byte code is generated from the DSL. The Facile language helps in developing high performance processor simulators, also by augmenting C.
Extension
A related pattern is extension. The base language is extended by features corresponding to domain concepts. In comparison to piggybacking, the base language is not modified or replaced. A problem of this approach is the seamless integration of new features with existing ones. A DSL which follows the extension pattern is SWUL [32]. SWUL supports the development of Java Swing GUIs and is embedded into Java.
Specialization
Developing a new DSL does not always mean creating something new. A less common pattern is specialization (not to be confused with specialization in UML). An existing language is reduced to fit the needs of a special domain. Examples are RPython [149] or OWL-Lite [174]. RPython is a subset of the Python language used inside the PyPy project [149]. The complexity of Python is reduced in order to make C code generation from RPython easier. OWL-Lite is a subset of the OWL Web Ontology Language designed to be easier to reason about than full OWL.
Entirely new language
While building on another language can have several advantages, as discussed above, creating a completely new language offers great flexibility. However, with great flexibility comes increased difficulty in designing the language. The design of most GPLs is directed by a number of desirable characteristics [72], namely simplicity, abstraction, expressiveness, uniformity, efficiency, safety, modularity, clarity and orthogonality. More essential or desirable characteristics, as well as language design advice, are stated by Horowitz [90] in his evaluation of programming languages. Software economics may influence the design, too. Costs arise from programmer training, software creation, compilation, execution and maintenance time, all of which depend on the DSL.
Following the guidelines for GPL design while designing a new DSL is considered a good idea; many DSL researchers believe that the same rules apply to general and domain specific languages. This needs caution though, as real life DSL development has shown [186]. Some domain models or established domain notations can conflict with the basic rules. The DSL users can be end-users and do not need to be programmers. Wile formulates some lessons learned from real life DSL development:
“Lesson 3: You are almost never designing a programming language.
Lesson 3 Corollary: Design only what is necessary. Learn to recognize your tendency to over-design.
Most DSL designers come from language design backgrounds. There the admirable principles of orthogonality and economy of form are not necessarily well-applied to DSL design. Especially in catering to the pre-existing jargon and notations of the domain, one must be careful not to embellish or over-generalize the language.” [186]
Informal or formal design
Once it has been decided if and how the DSL is related to other languages, the designer has to specify the language itself. This can be done formally or informally.
Using the informal design pattern is often easier. The specification is usually written in some sort of natural language. Sample DSL code snippets or short programs are sometimes used as examples to show language constructs and their meaning. This approach suffers from certain restrictions: it is not clear whether the descriptions are precise enough, language syntax and semantic problems can only be detected in the implementation phase, and informal artifacts are often not directly useful for the implementation.
A formal language design can solve some of these problems. Syntax and semantic problems can be checked automatically if an appropriate formalism is used. With a formal specification it is possible to derive parts of the implementation and therefore reduce the needed implementation effort. Different formal methods may be applied, including grammars for syntax definition (like BNF), abstract state machines for semantic definition, attributed grammars or rewrite systems.
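As a hedged illustration of how a formal syntax definition can drive part of the implementation, the following Python sketch states a small BNF-style grammar for a hypothetical arithmetic DSL in a comment and implements a hand-written recursive descent parser with one method per grammar rule. The grammar, the token set and all names are assumptions made for this example and are not taken from the cited works.

```python
import re

# Grammar (BNF-style), assumed for illustration only:
#   expr   ::= term   (("+" | "-") term)*
#   term   ::= factor (("*" | "/") factor)*
#   factor ::= NUMBER | "(" expr ")"

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split the input into number and operator tokens; reject anything else."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """One method per grammar rule; each method evaluates while it parses."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            right = self.term()
            value = value + right if op == "+" else value - right
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            right = self.factor()
            value = value * right if op == "*" else value / right
        return value

    def factor(self):
        token = self.take()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token.isdigit():
            return int(token)
        if token == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text):
    parser = Parser(tokenize(text))
    result = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return result
```

Because the parser mirrors the grammar, a syntactically invalid input such as `2 +` is rejected with a `SyntaxError` instead of being silently misinterpreted. A parser generator would derive comparable code from the grammar automatically.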
2.1.8 Implementation phase
After the hurdles of analysis and design are taken, the DSL has to be implemented (in case it is to be executed, edited or analyzed). A wide range of possible implementation strategies exists and the most appropriate one should be chosen. This decision can strongly influence the needed development effort and should be considered carefully. As with decision, analysis and design, different possible implementation patterns can be identified. Fowler [74] distinguishes between external and internal DSLs. Spinellis [161] and Mernik [126] found similar patterns but did a deeper analysis. Some of them are discussed in the following sections.
Compiler⁵/Interpreter pattern
The most obvious strategy is implementing a DSL in the way a large number of GPLs are implemented: by using an interpreter or a compiler/application generator approach.
An interpreter interprets DSL code in a recognize-fetch-decode-run cycle. A great number of dynamic languages (such as Ruby⁶ or Python) are implemented following this pattern. An interpreter is much easier to implement and later to extend than a compiler. Furthermore, the runtime environment is often easier to control.
A compiler approach offers different advantages. Because the complete code has to be analyzed by the compiler, extensive static analysis including error checking and appropriate error reporting can be done. The source is then transformed into library calls and lower level language constructs (for example, C# is compiled into MSIL and calls to .NET libraries). This allows fine grained optimizations and often offers a performance advantage due to native execution of the compilation results.
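The difference between the two strategies can be sketched for a hypothetical three-statement DSL (`set`, `add`, `emit`). The DSL, its statements and all function names below are assumptions made for illustration. The interpreter runs statement by statement and only notices an undefined variable when it reaches the offending line, while the compiler checks the whole program first and then translates it into Python source as a stand-in for a lower level target.

```python
def parse(source):
    """Turn 'set x 5' / 'add x 3' / 'emit x' lines into (op, args) tuples."""
    program = []
    for line in source.strip().splitlines():
        op, *args = line.split()
        program.append((op, args))
    return program


def interpret(program):
    """Interpreter: fetch, decode and run one statement at a time."""
    env, output = {}, []
    for op, args in program:
        if op == "set":
            env[args[0]] = int(args[1])
        elif op == "add":
            env[args[0]] += int(args[1])  # an undefined name fails only here
        elif op == "emit":
            output.append(env[args[0]])
        else:
            raise ValueError(f"unknown statement {op!r}")
    return output


def compile_to_python(program):
    """Compiler: analyze the whole program, then translate it to Python source."""
    defined = set()
    lines = ["def run():", "    output = []"]
    for number, (op, args) in enumerate(program, start=1):
        if op == "set":
            defined.add(args[0])
            lines.append(f"    v_{args[0]} = {int(args[1])}")
        elif op in ("add", "emit"):
            if args[0] not in defined:
                # Static check: reported before anything is executed.
                raise NameError(f"line {number}: {args[0]!r} used before 'set'")
            if op == "add":
                lines.append(f"    v_{args[0]} += {int(args[1])}")
            else:
                lines.append(f"    output.append(v_{args[0]})")
        else:
            raise ValueError(f"line {number}: unknown statement {op!r}")
    lines.append("    return output")
    return "\n".join(lines)


def load(python_source):
    """Execute the generated source and return its run() function."""
    namespace = {}
    exec(python_source, namespace)
    return namespace["run"]
```

For a valid program both paths produce the same output. For a program that emits an undefined variable on its last line, `interpret` executes the earlier statements before failing, whereas `compile_to_python` rejects the program as a whole and reports the line number.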
Preprocessors
Another classical approach to DSL implementation involves preprocessors. The preprocessing step happens before the code is interpreted or compiled. Different flavors can be distinguished.
The macro preprocessor is one of the most used forms of preprocessing. Macros can be simple substitutions on the lexical level before any parsing is done. One or more tokens can be replaced by a sequence of characters or other tokens. These substitutions can often be parametrized. The C language with the C preprocessor is an example of simple macros. However, working only on the lexical level has many shortcomings. The preprocessor is independent of the underlying language, and therefore syntax or logical errors can easily be introduced while expanding the macros. These errors can only be detected at interpretation or compile time.
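The shortcomings of purely lexical substitution can be reproduced in a few lines. The sketch below is a simplified stand-in for a macro preprocessor, not the C preprocessor itself. The `expand` function and the `SQUARE` macro are hypothetical names, and only single-parameter macros without nested parentheses are handled.

```python
import re


def expand(source, macros):
    """Purely lexical expansion: replace NAME(argument) by the macro body as text."""
    def substitute(match):
        name, argument = match.group(1), match.group(2)
        parameter, body = macros[name]
        # The argument text is pasted in without any knowledge of the grammar.
        return re.sub(rf"\b{re.escape(parameter)}\b", lambda _: argument, body)

    names = "|".join(re.escape(name) for name in macros)
    pattern = re.compile(r"\b(" + names + r")\(([^()]*)\)")
    return pattern.sub(substitute, source)


macros = {"SQUARE": ("x", "x * x")}

# Logical error: operator precedence changes the meaning after expansion.
expanded = expand("SQUARE(1 + 2)", macros)   # "1 + 2 * 1 + 2"

# Syntax error: the preprocessor happily produces text that cannot be parsed.
broken = expand("SQUARE(1 +)", macros)       # "1 + * 1 +"
```

Evaluating `expanded` yields 5 instead of the intended 9, and `broken` is only rejected once the underlying language tries to parse it. Writing the macro body as `(x) * (x)` avoids the first problem but not the second.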
5 In the context of DSL development, compilers are sometimes called application generators.
6 http://www.ruby-lang.org/
This problem is solved by syntactic macros, which work on the syntactic level and hence on the parse tree. Scheme or <bigwig>, for example, offer hygienic macros [109]. These macros are checked and are guaranteed not to produce collisions with existing symbol definitions during expansion. The C++ language also offers a form of syntactic macros, called templates or template metaprogramming⁷. An example DSL constructed with C++ template metaprogramming techniques and operator overloading is Spirit (a C++ parser framework [55]).
Another form of preprocessing is source-to-source transformation. Unlike macros, where the scope is restricted to some tokens, the whole source is translated into another language. An example is the language SWUL mentioned in 2.1.7. Here the complete DSL, which is embedded inside a Java source file, is replaced by Java code.
A less common preprocessing pattern is the pipeline. The source is not changed in one but in multiple steps, with each processor being responsible for a subset of the DSL. The language CHEM is an example using a pipeline architecture.
Embedding
While preprocessors replace the DSL code, the embedding approach⁸ directly embeds DSL constructs⁹ inside a host language. The most basic form of embedding is the creation of a simple application library. To allow a more domain specific notation, new types, operators and other constructs are created, depending on what the base language permits. The combination of the new constructs is then used to describe a domain specific problem. In C++ an often used feature to allow domain specific notation is operator overloading¹⁰.
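Operator overloading as an embedding technique is not limited to C++. The following minimal sketch uses Python as the host language. The `Matrix` class is a hypothetical illustration written for this example and is unrelated to Liboctave. It lets matrix expressions be written with the ordinary `+` and `*` operators.

```python
class Matrix:
    """A tiny embedded notation for matrix arithmetic (illustrative only)."""

    def __init__(self, rows):
        self.rows = [list(row) for row in rows]

    def __add__(self, other):
        # Element-wise addition via the host language's "+" operator.
        return Matrix(
            [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(self.rows, other.rows)]
        )

    def __mul__(self, other):
        # Matrix product via "*": rows of self times columns of other.
        columns = list(zip(*other.rows))
        return Matrix(
            [[sum(a * b for a, b in zip(row, col)) for col in columns] for row in self.rows]
        )

    def __eq__(self, other):
        return isinstance(other, Matrix) and self.rows == other.rows

    def __repr__(self):
        return f"Matrix({self.rows})"


a = Matrix([[1, 2], [3, 4]])
identity = Matrix([[1, 0], [0, 1]])
result = a * a + identity
```

The expression `a * a + identity` reads like the domain notation, yet it is ordinary host language code: parsing, precedence and error reporting all come from Python.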
Not all languages are equally well suited as a host for a DSEL. Hudak states that functional languages like Haskell¹¹ or MetaOCaml¹² are far more convenient than classical statically typed compiled languages such as Java (Lava is a Haskell DSEL example [27]). Mostly this is due to features like partial and lazy evaluation, higher order functions or generic and polymorphic strong types. However, dynamic languages like Smalltalk [81] or lately Ruby [167] have also proven to be equally suitable for embedding a DSL because of their very flexible syntax and advanced introspection, reflection and dynamic runtime modification features. In contrast to Hudak, Freeman and Pryce [77] show that it is possible to implement an acceptable DSEL in Java. By separating syntax from interpretation through object call chains which create an object graph, a good compromise is reached. In the end they conclude, now in agreement with Hudak, that even though they reached their goal, other languages are better suited.
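The idea of separating syntax from interpretation through call chains can be sketched as follows. This is not the design of Freeman and Pryce. It is a minimal Python illustration with hypothetical names (`select`, `from_`, `where`, `run`), in which the chained calls only record structure and a separate function interprets the recorded objects.

```python
class Query:
    """Syntax layer: each chained call records structure and returns self."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.table = None
        self.conditions = []

    def from_(self, table):
        self.table = table
        return self

    def where(self, column, op, value):
        self.conditions.append((column, op, value))
        return self


def select(*columns):
    return Query(columns)


OPERATORS = {
    "==": lambda a, b: a == b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}


def run(query, database):
    """Interpretation layer: walks the recorded objects and evaluates them."""
    rows = database[query.table]
    for column, op, value in query.conditions:
        rows = [row for row in rows if OPERATORS[op](row[column], value)]
    return [tuple(row[c] for c in query.columns) for row in rows]


database = {"users": [{"name": "Ann", "age": 30}, {"name": "Bob", "age": 17}]}
adults = select("name").from_("users").where("age", ">", 18)
```

Because `adults` is only a description, the same chain could also be handed to a different interpreter, for example one that renders it as text instead of evaluating it.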
Some newer languages, like Fortress [9] or Scala [138], are built especially with extensibility in mind.
7 The C++ template system is a Turing-complete, almost functional compile time programming language [176].
8 Often called domain specific embedded language (DSEL) or embedded domain specific language (EDSL).
9 DSL constructs have to be valid constructs in the context of the host language.
10 For example, Liboctave, a C++ math library, allows matrix addition and multiplication using the normal addition or multiplication operators.
11 http://www.haskell.org/
12 http://www.metaocaml.org/
Extension
An approach related to embedding is the extension of the language itself. The compiler or interpreter is modified to work with domain specific constructs. This is more difficult than other approaches since most compilers or interpreters are not designed for extension. Modular and safe techniques are still a matter of research. One possible solution is Meta Object Protocols (MOPs), like the runtime MOP [103] for CLOS or the compile time MOP for C++ [48].
Metaobjects are objects modeling the object system itself, and they allow introspection and customization. A metaobject protocol guarantees that the object system will follow a documented protocol for various parts of its operation. Therefore a MOP defines the meaning and behavior of a program. Through the MOP an existing language can be modified and extended.
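Python's metaclasses offer a small-scale impression of what customization through metaobjects looks like. They are used here only as an analogy and are not the CLOS or C++ protocols cited above. In the sketch, the hypothetical metaclass `DomainMeta` hooks into two documented steps of the object system, class creation and instantiation, to enforce domain rules.

```python
class DomainMeta(type):
    """Metaobject: controls how classes (which are objects too) are created and called."""

    registry = {}

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        if bases:  # the root class itself is exempt from the rule
            if "unit" not in namespace:
                raise TypeError(f"{name} must declare a 'unit'")
            mcls.registry[name.lower()] = cls
        return cls

    def __call__(cls, value):
        # Customized instantiation: every quantity is validated on creation.
        if value < 0:
            raise ValueError("quantities must be non-negative")
        return super().__call__(value)


class Quantity(metaclass=DomainMeta):
    def __init__(self, value):
        self.value = value


class Distance(Quantity):
    unit = "m"
```

Defining a subclass without a `unit`, or creating `Distance(-1)`, is rejected by the metaclass, so the domain rule is attached to the object system rather than repeated in every class.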
COTS
With the rise of XML another form of DSL implementation has evolved. DSL implementation can also be approached by using commercial off-the-shelf (COTS) tools and libraries and restricting them to fit a special domain [101]. The grammar for an XML DSL can be customized by using a DTD or an XML Schema. As a broad range of XML libraries and tools is available for parsing (DOM, SAX and StAX parsers), for analyzing (XQuery) and for transforming (XSLT), the effort for a standalone parser and compiler can be reduced [139]. In addition, tools for writing and displaying XML in a structured manner are available. However, XML is not the only possibility, as Wile showed [186].
Hybrid
The different strategies each have advantages and disadvantages. With the compiler/interpreter pattern it is possible to have a syntax very close to the domain, domain specific verification and optimization (AVOPT in general) and good error reporting, but the pattern often causes problems due to the complexity of developing the tools and the need to design a language from scratch instead of leveraging an existing solution. In 2.1.9 some frameworks and tools are discussed which try to solve some of these problems. The framework presented in this thesis also tries to solve some of them.
The embedding approach has similar advantages and problems, but the sides are flipped. The advantages: existing languages and their tool chains can often be reused and the implementation effort is reduced. On the other side, it is hard to create a domain fitting syntax while being bound to a host language. Although the existing infrastructure is leveraged, the DSL concepts are not known to the base language, with the result that error reporting or debugging happens in terms of the host language, not the DSL.
Therefore a hybrid approach can also be used. Examples are combining macros and an extensible compiler [164], combining embedding and compiling [65], combining macros and embedding [71] and many others.
Which kind of approach is appropriate has to be decided case by case. A simplified, not generally applicable comparison of the needed effort is presented in table 2.2. Mernik implemented the Feature Description Language (FDL) [58] using several approaches and compared the results.
Approach              Lines of Code
Compiler              1100
Macro-processing      50
Embedding (Prolog)    40
XML-based             310
Table 2.2: Lines of Code for different approaches to implement FDL
2.1.9 Approaches supporting DSL development
Many different tools and frameworks are available to support the development of domain specific languages. They differ in their approaches and their maturity. Some support the whole development life cycle and offer a solution for editor and debugger generation, whereas others are simple compiler generators. Most of them are based on various forms of language descriptions.
Compiler Compiler
The lowest common denominator are compiler compiler tools¹³ that can be used to develop new domain specific languages. They are able to construct a lexer/parser from a given grammar. This can be the base for an interpreter or compiler. The most prominent ones are Lex/Yacc¹⁴, Flex/Bison¹⁵, JavaCC¹⁶, SableCC¹⁷, Coco/R¹⁸ and ANTLR [141] (a more comprehensive list is given by [2, 185]). They often differ only slightly in the grammar notation used or the power of the generated lexers/parsers (GLR, LALR, SLR, LR or LL). A large number of different DSLs have been constructed using one of these tools. However, they only solve the problem of syntax recognition, so that the parser does not have to be developed manually. It should also be mentioned that they often require a solid background in language development theory, for example to understand the difference in power between a GLR and an LL generator.
JTS, Maya, JSE
In section 2.1.8 different approaches to DSL implementation are described. As stated there, it is normally hard to extend existing languages if no special arrangements were made by the original language implementers. Java is such a case. Nevertheless, the Jakarta Tool Suite (JTS) [19] supports the extension of the Java language with domain specific constructs. JTS consists of two components, Bali and Jak. Bali is a Java precompiler which takes an annotated EBNF grammar as input, and Jak is a JTS generated superset of the Java language with metaprogramming capabilities. Together they allow an easier extension of Java, corresponding to a macro approach. Similar approaches are Maya [16] and the Java Syntax Extender (JSE) [15].
13 Often also named lexer/parser generators.
14 http://dinosaur.compilertools.net/
15 http://www.gnu.org/software/flex/
16 https://javacc.dev.java.net/
17 http://sablecc.org/
18 http://www.ssw.uni-linz.ac.at/Research/Projects/Coco/
JLE
While JTS provides facilities for error checking, there is no support for more general static analysis. The Java Language Extension Framework (JLE) [188] is an alternative to JTS. Whereas the primary idea behind JTS was macro processing, the JLE framework builds a new customized Java compiler out of different selectable language extensions. The language extensions are formulated as attributed grammars and are strictly modular. They can be used for the simple extension of the host language grammar productions or for the definition of static analyses and optimizations. Finally, the customized compiler generates pure Java from the input code.
EasyExtend
Another approach to language extension with domain specific constructs is EasyExtend. EasyExtend allows the Python programming language to be extended. This is done by modular modifications to the original Python grammar. The new grammar is used in combination with a mapping to valid Python constructs, creating a new domain specific parser that can parse and run the DSL on top of Python. The original CPython parser is not modified and does not have to be recompiled¹⁹.
metafront
A framework which goes beyond the tools introduced so far is metafront [30]. Metafront is language agnostic. It consists of two parts: grammar and syntactic transformation specifications. An input stream conforming to the grammar is parsed by a new parsing algorithm (specificity parsing). The input grammar can be created in a modular way by combining multiple grammar input files (one grammar can extend another grammar). The resulting parse tree can then be transformed by given declarative transformation rules. For every transformation an input and an output language (grammar) have to be defined. This allows metafront to statically check the transformations for type safety and to guarantee the termination of the rules. The metafront approach can be used as a (macro) preprocessor for domain specific language extension or for the development of a completely new language. The language is then transformed into a base language like C.
19 A modified parser from the PyPy project is used (written in Python).
LISA
While the previous approaches can be used for the extension of an existing language or for a completely new DSL, they do not solve the DSL ecosystem problems. Different frameworks try to improve this situation.
The Language Implementation System on Attribute Grammars (LISA) [88] uses attribute grammars not only to generate a compiler or interpreter. A large number of different tools can be generated by the Java based LISA tool, including a syntax aware editor or finite state automata visualization editors. One reason for that is the possibility to separate semantic rules from grammar production rules. This, in addition to advanced templates and inheritance formalism support for grammars, allows better modularization. Another part of LISA is debugger generation, although this was not a primary target for LISA and is therefore only rudimentary.
MetaEnvironment - ASF+SDF, TIDE, AspectAsf
While LISA is based on attribute grammars, the MetaEnvironment [169] is based on algebraic specifications. It has been in development for several years and has gone through several rewrites, but can be considered stable today. The most important subpart is ASF+SDF (Algebraic Specification Formalism and Syntax Definition Formalism), allowing an integrated but still modularized definition of language syntax and semantics. ASF+SDF delivers executable specifications based on conditional equations and term rewriting²⁰. Other important parts are an SGLR parser, the ToolBus for coordinating different tools, and ATerms, a format to exchange annotated trees between the tools. Based on this foundation the MetaEnvironment can generate interpreters, editors or pretty printers for DSLs.
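To give an impression of what executing a specification by term rewriting means, the following Python sketch normalizes arithmetic terms with a handful of guarded equations. It does not use ASF+SDF syntax or tooling. The term encoding (nested tuples such as `("add", left, right)`) and the rule set are assumptions chosen for illustration.

```python
def rewrite_once(term):
    """Apply the first matching equation at the root of the term, or return None."""
    if term[0] in ("add", "mul"):
        op, left, right = term
        if left[0] == "num" and right[0] == "num":       # fold constants
            value = left[1] + right[1] if op == "add" else left[1] * right[1]
            return ("num", value)
        if op == "add" and left == ("num", 0):           # 0 + x = x
            return right
        if op == "add" and right == ("num", 0):          # x + 0 = x
            return left
        if op == "mul" and ("num", 0) in (left, right):  # 0 * x = x * 0 = 0
            return ("num", 0)
        if op == "mul" and left == ("num", 1):           # 1 * x = x
            return right
        if op == "mul" and right == ("num", 1):          # x * 1 = x
            return left
    return None


def normalize(term):
    """Innermost rewriting: normalize the subterms first, then the root, until no rule applies."""
    if term[0] in ("add", "mul"):
        term = (term[0], normalize(term[1]), normalize(term[2]))
    reduced = rewrite_once(term)
    return term if reduced is None else normalize(reduced)


# (1 * x) + 0 rewrites to x
simplified = normalize(("add", ("mul", ("num", 1), ("var", "x")), ("num", 0)))
```

Every rule strictly shrinks the term, so rewriting terminates. The equations themselves act as the executable definition of the simplification.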
With the ToolBus Integrated Debugging Environment (TIDE) [170] it is also possible to get debugging support for a DSL, but this still needs several manual steps. Another solution to get debug support is AspectAsf [143]. AspectAsf annotates the grammar with all possible debug actions at every potential place, which increases the complexity of the resulting grammar.
The MetaEnvironment aims to reduce the development effort for new languages, but the developer has to learn the complex ASF+SDF formalism. Additionally, the MetaEnvironment has grown over the years into a large collection of components, making the framework flexible and generic but harder to maintain. Due to this flexibility, learning and mastering the environment is difficult. The MetaEnvironment is only available for Linux based systems.
Strongly related to the MetaEnvironment is Stratego/XT²¹ [177], which is based on SDF, too. It uses a different and more powerful language for term rewriting, with which different tools can be generated. Stratego is even more complex due to highly advanced concepts. The MetaEnvironment and LISA are both suited to implementing DSLs and even GPLs, and can also be used for program transformations.
20 An extended explanation can be found in [107].
21 Stratego is the language and XT is a tool collection.
SmartTools
Like LISA, SmartTools [13] is a development environment generator based on Java; it additionally makes heavy use of XML. A DSL can be defined as an AST²². This AST is in turn manipulated. From it SmartTools is capable of generating a structured editor, a pretty-printer, a UML model and a parser specification. A debugger cannot be generated, though.
Horn logic approach
An approach able to generate a compiler and even a profiler is presented in [180]. It is based on denotational semantics and uses Horn logic: the syntax and semantics, written as Horn logic code, are executable and yield an interpreter. This interpreter can be partially evaluated together with a DSL program, resulting in compiled code. This approach is very powerful, though mathematical, and requires considerable theoretical skills from the DSL developer. Also, no surrounding toolset is generated.
DSL Debugging Framework
As manually building debugging tools for each new DSL is time consuming, Hu [187] proposes a DSL Debugging Framework (DDF) based on the Eclipse debugger platform and on DMS. A DSL grammar is defined with ANTLR. From this grammar a DSL parser and additional source mappings are generated. The mappings relate each DSL construct to the back-end code. A mapping component communicating with the Eclipse debugging perspective and the back-end debugger is derived from the mapping. The mapping instructions are applied automatically to the grammar by aspects, using DMS (a transformation system). They contain instructions which are inserted into the grammar actions. The aspects are specified in AspectG, a language for modifying the ANTLR grammar.
DDF supports the generation of DSL debuggers for imperative, declarative and hybrid languages. Currently only Java as back-end and ANTLR as front-end are supported. Moreover, the actions are specified in Java. For these reasons the approach can be seen as rather inflexible at the moment.
Other frameworks
Most of the presented frameworks and tools are based on previous research results. For completeness a few should be mentioned: Draco [131], which allowed the creation of a parser, transformer, optimizer, analyzer and pretty printer for a DSL in an integrated way; KHEPERA [67], which worked as a source-to-source translator powered by a simple AST constructing parser, a tree transformer and a pretty printer for the resulting trees; and Sprint [55], based on denotational semantics and partial evaluation of a DSL interpreter. Other approaches are similar and will not be discussed: APPLAB [26], DMS [20], smgn [104], Eli [97], SPARK [121] and TXL [50]. This list is by no means complete and several other solutions exist.
Some commercial solutions, such as the JetBrains Meta Programming System²³ or Intentional Programming²⁴, also promise DSL development support by generating editors, debuggers and other tools. However, none of these has reached production state or is available to the public.
22 abstract syntax tree
2.2 Model-Driven Engineering
While section 2.1 focused on DSLs, this section discusses Model-Driven Engineering. In the following, the terms model, metamodel, metametamodel and technology space are defined and several examples are given. Afterwards a possible formalization of the terms is presented. Then some key techniques are described, such as domain specific modeling languages, transformations and generators. Finally, the Model-Driven Architecture is described briefly, including MOF, EMF/Ecore and XMI.
Model-Driven Engineering²⁵ (MDE) is a vision that moves the model into the center of the software development process. Moreover, any software artifact is considered a model or model element. While previous approaches used models for documentation or communication of ideas, in MDE they are first class entities throughout the whole engineering life cycle [40]. MDE is an open and integrative approach not tied to a particular standard; therefore different implementations exist.
The problem to be solved is the growth of platform complexity beyond the ability of current GPLs to mask it. MDE tries to capture what is often expressed in an informal way (prose, diagrams) as formal model based specifications.
Key concepts to mitigate current software engineering problems are models, metamodels, technology spaces, domain specific modeling languages and different kinds of model transformation, like model-to-model or model-to-text²⁶.
Model-Driven Engineering is currently only described informally, but different formalization approaches are being researched [68, 150].
2.2.1 Models - the foundation of MDE
Model, Metamodel and Metametamodel
Models are the key artifact of MDE. For that reason the term model is defined first. The word model has its etymological root in the Latin word modulus, the diminutive of modus. Modus in turn denoted a prescribed ratio between the parts of a building in architecture. A general definition based merely on the etymological meaning is not easy. Today the interpretation of the word model strongly depends on the viewpoint of the observer and the observer's domain.
23 http://www.jetbrains.com/mps/
24 http://intentsoft.com/
25 The often used term Model-Driven Development (MDD) is equivalent to MDE, but is a registered trademark of OMG. Also used: model driven software development (MDSD).
26 Often also called generators
As this thesis is a computer science thesis, a definition from the computer scientists Gerbé and Bézivin [41] is used (a more general discussion of the term model can be found in [24, 114, 156]).

“A model is a simplification of a system built with an intended goal in mind. The model should be able to answer questions in place of the actual system.”

While this serves as a starting point, Kleppe et al. [105] give a definition directed even more towards MDE.

“A model is a description of a (part of) systems written in a well-defined language. A well-defined language is a language with well-defined form (syntax), and meaning (semantics), which is suitable for automated interpretation by a computer.”

Kleppe speaks about a “well-defined language” which can be used to create a model. In MDE, metamodels define what a model can look like.
The prefix “meta” is Greek and literally means “after” or “beyond”; here it is commonly read as “above”, therefore the term metamodel can be interpreted as a model describing another model²⁷. To understand the term metamodel a simple analogy to languages is drawn. A language consists of words whose combination is constrained by a grammar. If a sentence in a language is seen as one possible model, the definition of its structure, the grammar, can be seen as its metamodel. Earlier it was said that in MDE a metamodel defines what a model can look like; this can be formulated more precisely as: a metamodel defines the constructs and rules usable to create a class of models. This is consistent with the following definitions:

“A meta-model is a model of a set of models.” [69]

“A meta-model is a model that defines the language for expressing a model.” [135]
Notice that a metamodel is itself a model. Consequently the metamodel also has to be defined in a “language”, that is, it needs a metamodel of its own. For that reason the metametamodel is introduced, which allows metamodels to be specified. This raises the question of how the metametamodel is defined. To avoid an infinite stacking of meta levels, metametamodels are often specified self-reflexively, so the metamodel of the metametamodel is the metametamodel itself.

Most approaches implementing MDE define a three level meta stack: model, metamodel, metametamodel. A metamodel can be used to clearly define a class of models and the metametamodel should allow the specification of all possible metamodels including itself. Therefore one metametamodel should be enough.
This one metametamodel still needs to be defined. Currently different candidates compete: the Meta Object Facility (MOF [135]), the Kernel Metametamodel (KM3 [93]), EMF ECore [37], MetaGME [119], the Coral Simple Metamodel Description (SMD [8]), Kermeta [146], the Microsoft DSL Tools Metametamodel [127] and many others. All serve a certain MDE implementation as metametamodel but differ in expressiveness. Unifying these or enabling translation between them will mitigate fragmentation in the long term. Several approaches have already been made [42, 43, 79] to bridge metametamodels, for instance MDA to MIC or MDA to Microsoft DSL Tools and vice versa. Yet as Bézivin states [45], simple translation does not solve all fragmentation problems and this remains an area of ongoing research.

MDE implementation                 | Main contributor      | Metametamodels                | Tools
Model-Driven Architecture (MDA)    | OMG                   | MOF, ECore, KM3, SMD, Kermeta | Eclipse EMF
Software Factories                 | Microsoft             | unnamed                       | Microsoft DSL Tools
Model Integrated Computing (MIC)   | Vanderbilt University | MetaGME                       | GME
Model Centric Software engineering | Lockheed              | ?                             | Lockheed internal
Table 2.3: Different MDE implementations, different metametamodels

27 It is important to differentiate between a model about a model and a model of a model. While a painting of a painting showing a house can be seen as a model of a model, it does not describe, for example, what colors can be used in a painting.
At the end of this section it should be mentioned that the described terms are still subject to philosophical debate. For slightly different views, [24, 39, 69, 70, 148] are good starting points.
Technological spaces
Different notations like BNF, XML Schema or ontology description languages can be seen as metametamodels, too (for grammars, XML documents and ontologies respectively). This depends on the technological space (TS) under consideration. Technological spaces were introduced in [116]. The idea is to have an abstract base to reason about similarities, differences and integration possibilities of different technologies. The following definition is given by Kurtev et al. [117]:

“A technical space is a model management framework with a set of tools that operate on the models definable within the framework.”

Kurtev continues with the observation that each technological space is based on a three-level metamodeling stack (model, metamodel and metametamodel) as can be seen in figure 2.3. This has already been described for the model technological space (MDE or ModelWare in general) but is also true for technological spaces such as DataWare, GrammarWare [106] or XMLWare.
[Figure: one three-layer stack per technology space, with conformsTo arrows pointing from each layer to the layer above it]
Technology space | Layer 3    | Layer 2               | Layer 1
OMG/MDA          | MOF        | UML metamodel         | UML model
Grammar          | EBNF       | C Grammar             | a C program
XML              | XML Schema | XML Schema Definition | an XML document
Figure 2.3: Different technological spaces and their metamodeling stacks
As different metametamodels exist, various frameworks, languages and tools are available for generation and transformation.
Formalization
Currently the terms model, metamodel and metametamodel have only been defined informally. Kurtev et al. [117] propose a formal definition of these terms. It takes an organizational point of view (as a set of graphs constrained by other graphs). Wherever the remainder of this thesis refers to these terms, the definitions below apply.
Definition 2.1: A directed multigraph $G = (N_G, E_G, \Gamma_G)$ consists of a set of nodes $N_G$, a set of edges $E_G$, and a mapping function $\Gamma_G : E_G \to N_G \times N_G$.

Definition 2.2: A model $M = (G, \omega, \mu)$ is a triple where: $G = (N_G, E_G, \Gamma_G)$ is a directed multigraph, $\omega$ is itself a model (called the reference model of $M$) associated to a graph $G_\omega = (N_\omega, E_\omega, \Gamma_\omega)$ and $\mu : N_G \cup E_G \to N_\omega$ is a function associating elements (nodes and edges) of $G$ to nodes of $G_\omega$. The relation between a model and its reference model is called conformance. We denote it as conformsTo²⁸.

Definition 2.3: A metametamodel is a model that is its own reference model (i.e. it conforms to itself).

Definition 2.4: A metamodel is a model such that its reference model is a metametamodel.

Definition 2.5: A terminal model is a model such that its reference model is a metamodel.
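The definitions can be made concrete with a small sketch. The following Python fragment is only an illustration under stated assumptions: the class `Model`, the function `kind` and the three toy models are hypothetical and not taken from [117]. Node and edge names are assumed to be disjoint, and the check only verifies that Γ and µ are total and well-typed, not any further constraints a real metamodel would impose.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass(eq=False)
class Model:
    """A model M = (G, omega, mu) in the sense of Definitions 2.1 and 2.2."""
    name: str
    nodes: Set[str]                      # N_G
    edges: Dict[str, Tuple[str, str]]    # Gamma_G: edge name -> (source, target)
    mu: Dict[str, str] = field(default_factory=dict)  # mu: N_G u E_G -> N_omega
    reference: Optional["Model"] = None  # omega; None encodes "conforms to itself"

    def omega(self) -> "Model":
        return self if self.reference is None else self.reference

    def is_well_formed(self) -> bool:
        # Gamma_G must map every edge to a pair of existing nodes.
        ends_ok = all(s in self.nodes and t in self.nodes
                      for s, t in self.edges.values())
        # mu must be defined for every node and edge and point into N_omega.
        elements = self.nodes | set(self.edges)
        mu_ok = all(e in self.mu and self.mu[e] in self.omega().nodes
                    for e in elements)
        return ends_ok and mu_ok


def kind(model: Model) -> str:
    """Classify a model according to Definitions 2.3 to 2.5."""
    ref = model.omega()
    if ref is model:
        return "metametamodel"
    ref_kind = kind(ref)
    if ref_kind == "metametamodel":
        return "metamodel"
    if ref_kind == "metamodel":
        return "terminal model"
    return "unclassified"


# Hypothetical toy metametamodel: classes and references, described by itself.
mmm = Model(
    "toyMMM",
    nodes={"Class", "Reference"},
    edges={"owner": ("Reference", "Class"), "type": ("Reference", "Class")},
    mu={"Class": "Class", "Reference": "Class",
        "owner": "Reference", "type": "Reference"},
)

# Hypothetical state machine metamodel conforming to the toy metametamodel.
mm = Model(
    "stateMachineMM",
    nodes={"State", "Transition", "source", "target"},
    edges={"source.owner": ("source", "Transition"), "source.type": ("source", "State"),
           "target.owner": ("target", "Transition"), "target.type": ("target", "State")},
    mu={"State": "Class", "Transition": "Class",
        "source": "Reference", "target": "Reference",
        "source.owner": "Reference", "source.type": "Reference",
        "target.owner": "Reference", "target.type": "Reference"},
    reference=mmm,
)

# Terminal model: one transition from "red" to "green".
m = Model(
    "trafficLight",
    nodes={"red", "green", "go"},
    edges={"go.source": ("go", "red"), "go.target": ("go", "green")},
    mu={"red": "State", "green": "State", "go": "Transition",
        "go.source": "source", "go.target": "target"},
    reference=mm,
)
```

In this sketch the self-reference of Definition 2.3 is encoded by leaving `reference` unset, which keeps the object graph free of explicit cycles during construction.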
2.2.2 Domain specific modeling languages
The ideas behind domain specific modeling languages (DSML) and domain specific modeling are similar to the DSL concepts presented in section 2.1 but applied to the world of models. Whereas in DSLs the language, its textual representation, is domain specific, in DSM²⁹ the models are domain specific. In contrast to DSLs the representation is graphical in almost every case. The gained benefits are similar, too.

UML, as a unified modeling language, tries to offer a set of different model types able to describe every imaginable domain. Yet UML and other traditional modeling languages often lack the facilities to capture domain knowledge precisely, leading to abuse and a large number of proposed extensions [124, 137]. The recent version of UML allows modification by profiles, enabling specialization. Of greater importance is the fact that UML diagram types are now based on a single metamodel³⁰ (MOF).
Using a metamodel for the definition of a DSML offers a number of advantages:
• Decreased development effort for visualization and editors because of a common metamodel
• Model verification and checking according to a metamodel with available tools [144]
• The DSML stays extensible (for example SysML based on the UML2 metamodel [137])
• Uniform exchange formats exist (like XMI) and a next generation model exchange is proposed [111]
• Approaches for model comparison based on a metamodel exist [110, 123]
• The semantics of a DSML can be described by a metamodel [60]
28 Alternatively instanceOf is also correct.
29 Domain Specific Modeling
30 In this case MOF is the metamodel for the definition of what elements can exist in a diagram, or simply the metamodel of the DSML. If the definition of what kind of elements can exist is itself treated as a metamodel, MOF is its metametamodel. Whether MOF is treated as model, metamodel or metametamodel depends on the point of view.
Name | Metamodel | Information
Eclipse EMF/GMF | ECore/XML Schema | XMI support, complex but most complete and powerful compared with others [11]
MetaEdit+ | GOPRR | customizable diagram editor including element constraints
Generic Modeling Environment | MetaGME | OCL and XMI support, complete graphical customization possible
Generic Eclipse Modeling System | ECore | based on Eclipse GMF, a large number of DSML development steps is automated in comparison to raw GMF
XMF-Mosaic | MOF/XCore | based on Eclipse, editors customizable, OCL support
Tau G2 | MOF/UML | DSML are created by defining UML 2.0 profiles through extending the UML metamodel. Modification of graphical representation is restricted to model elements (for example links cannot be customized).
Rational Software Architect | MOF/UML | DSML are created by defining UML 2.0 profiles but only stereotypes are possible. Graphical representation is even more restricted than in Tau G2.
MagicDraw 12.0 | MOF/UML | extension by UML 2.0 profiles, API for defining custom diagram elements
Microsoft DSL Tools | unnamed | plugin for Visual Studio .NET, representation completely customizable
Table 2.4: Different tools for creating DSML
Current tools
A large number of tools for specifying and then using a DSML are available. They allow the definition of the permitted elements and of their graphical representation. Most of them are based either on ECore or on MOF. A detailed review goes beyond the scope of this thesis but a few are listed in table 2.4.
Earlier approaches
The described approach to modeling a specific domain is not new. CASE tools already implemented many of the ideas. In contrast to the described DSML tools, however, they were based on proprietary modeling languages and metamodels and on ineffective code generators, and failed for these reasons [82].

“Most of these tools did not take advantage of platform-specific features, and produced naive, inefficient, least common denominator code. The upfront costs of adopting the modeling techniques they required were prohibitively large. Added to this was the risk inherent in spending a substantial portion of the projects budget building models with the promise of code appearing only in later stages. This required enormous confidence in the tools and in the longevity of their vendors. Also, concepts like round-trip engineering, where a model could be synchronized with changes made independently to the code it had generated from them, were overwhelmingly complex. Another criticism was that many CASE tools imposed a methodical top-down process. This was the ultimate kiss of death, since rapid iteration of partial solutions has proven to be critically important to the success of application development, as so clearly demonstrated by agile development methods.” [21]
2.2.3 Generators and transformations
Transformations and generators make it possible to synthesize different artifacts such as source code or alternative model representations, and to transform between different models in a traceable way with the possibility of ensuring consistency. Furthermore they open up the possibility of model refactoring, specialization, annotation or model aspect weaving. Transformations and generators are enabled by metamodels.
Generators
In general most available generators can be categorized as either template based or visitor based. Generators are often referred to as model-to-text transformations. Template based generators replace certain parts inside a template document with information from the model. One instance of a template based generator is JET, which is part of EMF. In addition to JET there is a large number of other template based generators, such as MOFScript, XPand or Acceleo, which focus on model-to-text, and more general ones such as Smarty, Contemplate, Cheetah, Jinja, Savant or Liquid, to name only a few.
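The template idea can be sketched in a few lines. The example below is a minimal illustration, not the way JET or any of the named engines works: it uses `string.Template` from the Python standard library, and the model dictionary, the two templates and the function `generate` are hypothetical names chosen for this sketch.

```python
from string import Template

# Hypothetical model: one entity with (attribute name, type) pairs.
model = {"name": "Customer", "attributes": [("id", "int"), ("email", "String")]}

# Templates: fixed text with placeholders that are filled from the model.
CLASS_TEMPLATE = Template("public class $name {\n$fields\n}\n")
FIELD_TEMPLATE = Template("    private $type $field;")


def generate(entity: dict) -> str:
    """Fill the class template with information taken from the model."""
    fields = "\n".join(
        FIELD_TEMPLATE.substitute(type=attr_type, field=attr_name)
        for attr_name, attr_type in entity["attributes"]
    )
    return CLASS_TEMPLATE.substitute(name=entity["name"], fields=fields)


generated = generate(model)
```

The fixed parts of the output live in the template text, so changing the target language mainly means changing the templates while the model stays untouched.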
The visitor approach works as described by the visitor pattern [78], with the model as the visited structure. Each model element is visited and a corresponding action is triggered. In the context of the currently visited model element specific text is generated.
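A visitor based generator could look roughly as follows. This is a sketch under assumptions: the model classes `Entity` and `Attribute` and the visitor `SqlVisitor` are hypothetical and only illustrate the pattern from [78] applied to a model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Attribute:
    name: str
    sql_type: str

    def accept(self, visitor):
        # Double dispatch: the element selects the matching visit method.
        return visitor.visit_attribute(self)


@dataclass
class Entity:
    name: str
    attributes: List[Attribute]

    def accept(self, visitor):
        return visitor.visit_entity(self)


class SqlVisitor:
    """Generates text in the context of the currently visited element."""

    def visit_entity(self, entity: Entity) -> str:
        columns = ",\n".join(a.accept(self) for a in entity.attributes)
        return f"CREATE TABLE {entity.name} (\n{columns}\n);"

    def visit_attribute(self, attribute: Attribute) -> str:
        return f"  {attribute.name} {attribute.sql_type}"


customer = Entity("customer", [Attribute("id", "INTEGER"), Attribute("email", "TEXT")])
ddl = customer.accept(SqlVisitor())
```

In contrast to a template, the output text is scattered over the visit methods, but a second visitor can produce a different artifact from the same model without changing the model classes.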
Another approach, which falls under the category of template generators but is used less frequently, is generative grammars. The main idea is to use grammars as in compiler generator tools, but to reverse the direction. While compiler tools generate a lexer and a parser that consume arbitrary text conforming to the grammar, reverse grammar tools generate text conforming to the grammar according to a given model (input data). Examples of such tools are ST [140] and Anti-Yacc [87].
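The reversed direction can be illustrated with a toy grammar whose productions are walked to emit text instead of to recognize it. The sketch does not reproduce how ST or Anti-Yacc work; the grammar encoding (`lit`, `attr` and `each` symbols), the function `unparse` and the model are assumptions made for this example.

```python
from typing import Dict, List, Tuple

Symbol = Tuple[str, ...]

# Hypothetical grammar: each production is a sequence of symbols.
#   ("lit", text)        emits fixed text (a terminal)
#   ("attr", key)        emits a value of the current model element
#   ("each", rule, key)  expands `rule` once per child stored under `key`
GRAMMAR: Dict[str, List[Symbol]] = {
    "state_machine": [("lit", "machine "), ("attr", "name"), ("lit", " { "),
                      ("each", "transition", "transitions"), ("lit", "}")],
    "transition": [("attr", "source"), ("lit", " -> "),
                   ("attr", "target"), ("lit", "; ")],
}


def unparse(rule: str, element: dict) -> str:
    """Generate text conforming to `rule`, driven by the model element."""
    out: List[str] = []
    for symbol in GRAMMAR[rule]:
        kind = symbol[0]
        if kind == "lit":
            out.append(symbol[1])
        elif kind == "attr":
            out.append(str(element[symbol[1]]))
        elif kind == "each":
            out.extend(unparse(symbol[1], child) for child in element[symbol[2]])
        else:
            raise ValueError(f"unknown symbol kind: {kind}")
    return "".join(out)


model = {
    "name": "light",
    "transitions": [
        {"source": "red", "target": "green"},
        {"source": "green", "target": "red"},
    ],
}
text = unparse("state_machine", model)
```

Because the same grammar could in principle also drive a parser, the generated text is guaranteed to have the shape the grammar prescribes.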
Approach   | Tools
Template   | AndroMDA [128], FUUT-je [173], OptimalJ³¹, Xpand³², ArcStyler³³, MOFScript³⁴, Acceleo³⁵
Generative | ST [140], Anti-Yacc [87]
Table 2.5: Different model-to-text approaches
Tool | Description
AndroMDA, FUUT-je, OptimalJ | template engines based on UML
MOFScript, XPand, Acceleo | template engines based on MOF, more flexible than UML based engines
Smarty³⁶, Contemplate³⁷, Cheetah³⁸, Jinja³⁹, Savant⁴⁰ or Liquid⁴¹ | generally applicable template engines, no special MDE support
ST, Anti-Yacc | generative grammar engines, no special MDE support
Table 2.6: Different generator tools
Transformations
Transformations and the systems executing them can be grouped by different criteria, for example whether they are imperative or declarative, how many input and output models are involved, whether they are horizontal⁴² or vertical⁴³, and whether they directly manipulate⁴⁴ a model, are described via algebraic relations, are graph based or use a hybrid approach. Which of the techniques is most appropriate has to be decided depending on the requirements. Mens and Van Gorp proposed a taxonomy [125] that can help with this decision. Again the technology space has to be considered: is the transformation intra or inter technology space? An intra-TS transformation would be, for example, a transformation from XML to XML performed with XSLT and XQuery. An inter-TS transformation could be UML to XML [178].

Technology space | Metametamodel | Standard/Tools
MDA/EMF | MOF/ECore | QVT [136], ATL [95], Tefkat [118], YATL [142], MTF⁴⁵, UMT [83], BOTL [31], MOLA [98], SmartQVT⁴⁶, Borland TogetherQVT⁴⁷, MiA⁴⁸, MTRANS [145]
XML | Schema | XSLT⁴⁹
Grammar | Grammar in (E)BNF | ANTLR AST Transform⁵⁰, TXL [50], ASF+SDF [169]
Data | - | SQL
Graph | different | AGG⁵¹, GReaT [6], Viatra, PROGRES [155], Groove, FUJABA [179]
Table 2.7: Transformation engines from different technology spaces

31 http://www.compuware.com/products/optimalj/
32 http://www.openarchitectureware.org/
33 http://www.interactive-objects.com/
34 http://www.eclipse.org/gmt/mofscript/
35 http://www.acceleo.org/
36 http://smarty.php.net/
37 http://www.typea.net/software/contemplate/assembled/home.html
38 http://www.cheetahtemplate.org/
39 http://wsgiarea.pocoo.org/jinja/
40 http://phpsavant.com/yawiki/
41 http://home.leetsoft.com/liquid
42 The source and the target model share the abstraction level, for example when a refactoring is performed.
43 Source and target model exist on a different abstraction level, for example a refinement.
44 The user has to implement transformation rules and their sequence through API calls in a GPL like Java, which directly modify the model.
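To make the notion of a model-to-model transformation more tangible, the following sketch maps a tiny class model to a relational model with one rule per source element type and records trace links. It is an illustration under assumptions and not an implementation of QVT, ATL or any other listed engine: the element dictionaries, `TYPE_MAP`, `RULES` and `transform` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Element = Dict[str, str]

# Source model, conforming to a hypothetical class metamodel.
source: List[Element] = [
    {"type": "Class", "name": "Customer"},
    {"type": "Attribute", "owner": "Customer", "name": "email", "datatype": "String"},
]

# Assumed mapping of source data types to SQL types.
TYPE_MAP = {"String": "VARCHAR", "Integer": "INTEGER"}


def class_to_table(element: Element) -> Element:
    return {"type": "Table", "name": element["name"].lower()}


def attribute_to_column(element: Element) -> Element:
    return {
        "type": "Column",
        "table": element["owner"].lower(),
        "name": element["name"],
        "sqltype": TYPE_MAP.get(element["datatype"], "TEXT"),
    }


# Rules are selected by the metamodel type of the source element.
RULES: Dict[str, Callable[[Element], Element]] = {
    "Class": class_to_table,
    "Attribute": attribute_to_column,
}


def transform(model: List[Element]) -> Tuple[List[Element], List[Tuple[str, str, str]]]:
    """Apply the matching rule to each element and keep trace links."""
    target: List[Element] = []
    trace: List[Tuple[str, str, str]] = []
    for element in model:
        rule = RULES.get(element["type"])
        if rule is None:
            continue  # no rule matches, the element is not transformed
        produced = rule(element)
        target.append(produced)
        trace.append((element["name"], produced["type"], produced["name"]))
    return target, trace


target, trace = transform(source)
```

The rule table is itself plain data, which hints at why transformations can in turn be treated as models.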
A large number of model-to-model transformation frameworks have been developed lately, driven by a growing interest in MDA/MOF/EMF and MDE in general. The OMG proposed the Query/View/Transformation (QVT) standard based on MOF. There are several implementations claiming to be compliant with QVT up to a certain level. Nonetheless, no complete implementation of version 1.0 of the standard exists today. The most advanced freely available implementation is the Atlas Transformation Language (ATL).

Note that transformations themselves can be expressed as models [44]. These models are called transformation models and behave like regular models. They conform to a transformation metamodel and can be transformed themselves (metatransformations or higher-order transformations).

While there are several attempts to classify [53] or benchmark [175] different model transformation approaches, an exhaustive comparison of the various transformation technologies is still pending but would be of great use.
45 http://www.alphaworks.ibm.com/tech/mtf
46 http://sourceforge.net/projects/smartqvt
47 http://www.borland.com/de/products/together/index.html
48 http://www.mia-software.com/miaStudio/indexOfMiaStudio.php?lang=en&theme=prod-miagen
49 http://www.w3.org/TR/xslt
50 http://www.antlr.org/doc/sor.html
51 http://tfs.cs.tu-berlin.de/agg
Tool | Description
YATL | part of the Kent Modeling Framework, textual, MOF based, declarative/imperative hybrid
MOLA | visual language, simple and loop patterns
BOTL | bijective, visual language, declarative, object oriented mathematical foundation
Tefkat | textual, declarative/imperative hybrid, based on theory of stratified logic programs
ATL | ATLAS Transformation Language, based on ECore EMF metametamodel, very actively developed, declarative/imperative hybrid
MTRANS | textual, MOF based, translated into XSLT which modifies XMI
UMLX | graph based, visual language, only for UML models, translated into XSLT which modifies XMI
BOTL | graph based, bidirectional, only for UML models
Viatra2 | graph based and abstract state machine based hybrid, pre/post conditions for a transformation, recursive and negative pattern support
VMT | graph based, visual language, declarative
MISTRAL [115] | MOF based, textual, declarative/imperative hybrid
Table 2.8: Some model-to-model transformers
2.2.4 Different MDE technologies in detail
MOF
As described in section 2.2.1, different metametamodels exist. A well-known one is described in the OMG Meta Object Facility (MOF) standard. The idea behind MOF is a four-layered architecture as depicted in figure 2.4. It is sometimes referred to as a closed metamodeling architecture because the defined metametamodel conforms to itself. Currently, with version 2.0, different flavors of MOF exist: the Complete MOF (CMOF), the Essential MOF (EMOF) and the Semantic MOF. CMOF is the whole MOF whereas EMOF is only a subset containing the most important elements.

The widespread use of the Unified Modeling Language (UML) is one of the reasons for the recent popularity of MOF. The current version is the base for UML2. All the different diagram types are defined conforming to the metamodel. MOF is the foundation of the OMG Model Driven Architecture⁵² (MDA), too.
52 The term Model Driven Architecture is misleading. While the part “Model Driven” has its explicit meaning in the systematic use of models, the part “Architecture” was chosen by accident [70]. In contrast to what could be assumed, “Architecture” does not mean software architecture in this case. The OMG has been criticized more than once for the misleading naming. In fact the name MDA derives from the renaming of the earlier Object Management Architecture (OMA) to MDA. However, the OMG sometimes uses architecture to refer to its “four layer architecture”.
M
3
:
metametamodel
(
MOF
)
M