The very least about Xtext




Juri Luca De Coi

Saint-Etienne, France, 16-05-2011

Outline

Introduction
How to specify the target language
    Terminal rules
    Parser rules
How to specify the target AST
    The working example
    Return types
Further features
The header
Getting Xtext up & running


Xtext (I)

Xtext is a language development framework, i.e., a technology supporting the activity of developing languages.

Given the Xtext grammar of a language, it provides you with the following for that language:
    an Eclipse editor with content assistance, quick fixes, template proposals, outline, hyperlinking, syntax coloring and a project wizard
    a code generator
    a serializer and code formatter
    a scoping and linking framework
    a validator
    an AST builder
    a parser

Xtext (II)

The more you want, the more you have to pay, BUT
    if you are fine with the (reasonable) defaults, your amount of work will be pretty low
    otherwise, Xtext is highly configurable: each automatically generated class can be replaced in a non-invasive way

What do we want?
    A parser
    An AST builder

We will (almost) only focus on Xtext’s grammar language.

Technical remark

Xtext is based on Ecore
    Knowledge of Ecore is required to exploit Xtext’s full potential
    Ecore is the core of the Eclipse Modeling Framework Project (EMF)
    EMF is “a modeling framework and code generation facility for building tools and other applications based on a structured data model”

I will try to leave Ecore out as much as possible.
I will skip some (most) parts of Xtext.

Xtext’s grammar language (XGL)

A language to describe (textual) languages.

An Xtext grammar describes
    the syntax of the target language
    the structure of the target AST

The syntax of the target language

Not surprisingly, XGL distinguishes between a lexical level and a syntactic level:

Level             Specifies the language’s   By means of                 Exploited by the
Lexical level     tokens                     keywords, terminal rules    lexer (a.k.a. scanner or tokenizer)
Syntactic level   grammar                    parser rules                parser


Terminal rules (I)

terminal NAME : expression ;

expression can contain

1. keywords
    single- or double-quoted
    can have any length
    can contain arbitrary characters, including the escape sequences \b, \t, \n, \f, \r, \", \' and \\
    Unicode escape sequences (e.g., \u123) are not supported
    EX: 'foo', "\foo", '"', "'"

Terminal rules (II)

2. wildcard (.)
    An arbitrary character
    EX: .

3. rule calls
    Terminal rules can only point to other terminal rules
    EX: ID (assuming that ID is the name of a terminal rule)

4. character ranges (..)
    Extremes are included
    EX: 'a'..'z', 'A'..'Z', '0'..'9'

Terminal rules (III)

5. until token (->)
    All input between the preceding and the following token (extremes are included)
    EX: '/*' -> '*/'

6. negated token (!)
    Input different from the following token
    EX: !'\n'

7. cardinality operators (? for optional, * for zero or more, + for one or more, or nothing for exactly once)
    EX: '^'?, '\r'*, '9'+
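The until (->) and negated (!) operators may be easier to grasp by writing by hand what a lexer does for the two examples above. The following Java sketch is only an illustration: the class and method names are made up, and the lexer Xtext actually produces is generated by ANTLR, not written like this.

```java
// Hypothetical helpers illustrating two terminal-rule operators by hand.
final class TerminalOperators {

    // Until token, as in  '/*' -> '*/' : if the input at pos starts with
    // open, consume everything up to and including the FIRST occurrence of
    // close. Returns the index just past the match, or -1 if there is none.
    static int matchUntil(String input, int pos, String open, String close) {
        if (!input.startsWith(open, pos)) {
            return -1;
        }
        int end = input.indexOf(close, pos + open.length());
        return end < 0 ? -1 : end + close.length();
    }

    // Negated token, as in  !'\n' : consume exactly one character,
    // provided it differs from the excluded one.
    // Returns the index just past the match, or -1 if there is none.
    static int matchNot(String input, int pos, char excluded) {
        if (pos >= input.length() || input.charAt(pos) == excluded) {
            return -1;
        }
        return pos + 1;
    }
}
```

For instance, matchUntil("/* a */ b */", 0, "/*", "*/") returns 7, i.e., the match stops at the first "*/" rather than the last one.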

Terminal rules (IV)

8. groups (token sequences)
    EX: 'a' . ID (assuming that ID is the name of a terminal rule)

9. alternatives (|)
    EX: ' ' | '\t' | '\r' | '\n'

Operator priority

Ordered by decreasing priority:
    Character ranges: ..
    Until token, negated token: ->, !
    Cardinality operators: ?, *, + or nothing
    Groups: token sequences
    Alternatives: |

Parentheses (()) can override default priorities.

EX:
    terminal ID:
        '^'? ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
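To see what the ID rule above accepts, here is a hand-written Java sketch that checks the same pattern character by character: an optional '^', one letter or underscore, then any number of letters, underscores or digits. IdTerminal and its methods are hypothetical names used only for illustration.

```java
// Hand-written equivalent (hypothetical) of
//   terminal ID: '^'? ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
final class IdTerminal {

    // ('a'..'z'|'A'..'Z'|'_'): character ranges include both extremes
    private static boolean isStart(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    // ('a'..'z'|'A'..'Z'|'_'|'0'..'9')
    private static boolean isPart(char c) {
        return isStart(c) || (c >= '0' && c <= '9');
    }

    // Length of the longest ID starting at pos, or 0 if there is none.
    static int match(String input, int pos) {
        int i = pos;
        if (i < input.length() && input.charAt(i) == '^') {
            i++;                                  // '^'?  (optional)
        }
        if (i >= input.length() || !isStart(input.charAt(i))) {
            return 0;                             // this group is mandatory
        }
        i++;
        while (i < input.length() && isPart(input.charAt(i))) {
            i++;                                  // (...)*  (zero or more)
        }
        return i - pos;
    }

    // True if the whole string is a single ID token.
    static boolean matches(String s) {
        return !s.isEmpty() && match(s, 0) == s.length();
    }
}
```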

Technical remark

NOTE: Terminal rules can hide each other
    The order of terminal rules is crucial
    This is especially important when mixing newly introduced rules and rules from imported grammars (cf. below)
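How one terminal rule can hide another is easiest to see with two overlapping rules, say INT: ('0'..'9')+ and DIGIT: '0'..'9'. The toy lexer below assumes the usual ANTLR-style policy (take the longest match; on a tie, the rule declared first wins) and uses Java regular expressions instead of XGL notation; ToyLexer and the rule names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy lexer (hypothetical, not Xtext's). Assumed policy: longest match wins;
// among equally long matches, the rule declared first wins.
final class ToyLexer {
    private final Map<String, Pattern> rules = new LinkedHashMap<>(); // keeps declaration order

    ToyLexer rule(String name, String regex) {
        rules.put(name, Pattern.compile(regex));
        return this;
    }

    // Returns the names of the rules that produced each token of input.
    List<String> tokenize(String input) {
        List<String> kinds = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String best = null;
            int bestLen = 0;
            for (Map.Entry<String, Pattern> r : rules.entrySet()) {
                Matcher m = r.getValue().matcher(input);
                m.region(pos, input.length());
                // strictly longer only: an equally long, later rule cannot win
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    best = r.getKey();
                    bestLen = m.end() - pos;
                }
            }
            if (best == null) {
                throw new IllegalArgumentException("No rule matches at " + pos);
            }
            kinds.add(best);
            pos += bestLen;
        }
        return kinds;
    }
}
```

With INT declared before DIGIT, tokenizing "7 42" yields [INT, WS, INT]: DIGIT is hidden and never produced. Swapping the two declarations yields [DIGIT, WS, INT].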


Parser rules (I)

name : expression ;

expression can contain

1. keywords
    single- or double-quoted
    can have any length
    can contain arbitrary characters, including the escape sequences \b, \t, \n, \f, \r, \", \' and \\
    Unicode escape sequences (e.g., \u123) are not supported
    EX: 'foo', "\foo", '"', "'"

Parser rules (II)

2. rule calls
    EX: ID (assuming that ID is the name of a rule)

3. cardinality operators (?, *, + or nothing)
    EX: '^'?, '\r'*, '9'+

4. groups (token sequences)
    EX: 'a' ID

5. unordered groups (&)
    Elements can appear in any order, but only once
    Elements with cardinality * or + must appear contiguously, without interruption
    EX: 'a' & ID*
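The "any order, but only once" constraint of unordered groups can be sketched in Java as a check over a token list. The example assumes an unordered group of three distinct, mandatory keywords, 'public' & 'static' & 'final' (a made-up rule), and ignores cardinalities; UnorderedGroup is a hypothetical name.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical recognizer for an unordered group of mandatory, distinct
// elements: each element must occur exactly once, in any order.
final class UnorderedGroup {
    static boolean accepts(List<String> elements, List<String> tokens) {
        Set<String> seen = new HashSet<>();
        for (String t : tokens) {
            // reject unknown tokens and second occurrences of an element
            if (!elements.contains(t) || !seen.add(t)) {
                return false;
            }
        }
        return seen.size() == elements.size(); // every element was present
    }
}
```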

Parser rules (III)

6. alternatives (|)
    EX: ' ' | '\t' | '\r' | '\n'

Operator priority

Ordered by decreasing priority:
    Cardinality operators: ?, *, + or nothing
    Groups: token sequences
    Unordered groups: &
    Alternatives: |

Parentheses (()) can override default priorities.

EX:
    Action:
        '{' TypeRef ('.' ID ('='|'+=') 'current')? '}' ;


The structure of the resulting AST

You typically know (or should know) the AST you want before defining a textual representation for it.

You will now learn how to instruct Xtext to build the ASTs you want.

Let’s start with the classical example.

Arithmetical expressions (I)

Arithmetical expressions (II)

We have to define a corresponding textual representation, i.e., a corresponding grammar.

To keep things easy, let’s define a grammar which
    does not consider operator priorities
    does not consider operator associativity
    requires parentheses to be specified explicitly

EX: not 1 + 2 * (3 - 4 / 5) but 1 + (2 * (3 - (4 / 5)))

Arithmetical expressions (III)

Expression ::= IntOrPar ( FactorSign IntOrPar | TermSign IntOrPar )?
IntOrPar   ::= INT | '(' Expression ')'
INT        ::= '0' | '1'..'9' '0'..'9'*
FactorSign ::= '*' | 'multiply' | '/' | 'divide'
TermSign   ::= '+' | 'plus' | '-' | 'minus'
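Before looking at how Xtext builds the AST, it may help to see that this grammar is directly implementable: each rule becomes one method of a recursive-descent recognizer. The Java sketch below is hand-written, not Xtext output; it assumes blanks are skipped between tokens, and ArithRecognizer and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written recursive-descent recognizer (hypothetical) for the grammar
// above: one method per rule. Blanks are assumed to be skipped.
final class ArithRecognizer {
    private final List<String> tokens;
    private int pos = 0;

    private ArithRecognizer(List<String> tokens) {
        this.tokens = tokens;
    }

    // Splits the input into letter/digit runs and single-character symbols.
    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            int j = i + 1;
            if (Character.isLetterOrDigit(c)) {
                while (j < s.length() && Character.isLetterOrDigit(s.charAt(j))) j++;
            }
            out.add(s.substring(i, j));
            i = j;
        }
        return out;
    }

    // True if the whole input is an Expression.
    static boolean accepts(String input) {
        ArithRecognizer r = new ArithRecognizer(tokenize(input));
        return r.expression() && r.pos == r.tokens.size();
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "";
    }

    // Expression ::= IntOrPar ( FactorSign IntOrPar | TermSign IntOrPar )?
    private boolean expression() {
        if (!intOrPar()) return false;
        if (isFactorSign(peek()) || isTermSign(peek())) {
            pos++;
            return intOrPar();
        }
        return true; // the optional part is absent
    }

    // IntOrPar ::= INT | '(' Expression ')'
    private boolean intOrPar() {
        String t = peek();
        if (t.equals("(")) {
            pos++;
            if (!expression() || !peek().equals(")")) return false;
            pos++;
            return true;
        }
        if (t.matches("0|[1-9][0-9]*")) { // INT ::= '0' | '1'..'9' '0'..'9'*
            pos++;
            return true;
        }
        return false;
    }

    // FactorSign ::= '*' | 'multiply' | '/' | 'divide'
    private static boolean isFactorSign(String t) {
        return List.of("*", "multiply", "/", "divide").contains(t);
    }

    // TermSign ::= '+' | 'plus' | '-' | 'minus'
    private static boolean isTermSign(String t) {
        return List.of("+", "plus", "-", "minus").contains(t);
    }
}
```

With this sketch, accepts("1 + (2 * (3 - (4 / 5)))") is true while accepts("1 + 2 * (3 - 4 / 5)") is false: since the grammar ignores priorities, 1 + 2 is already a complete Expression and the trailing * (...) is left over.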

Arithmetical expressions (IV)

How do we instruct Xtext to build this AST out of 1 + (2 * (3 - (4 / 5)))?


Return types

Each rule has a return type, which can be specified with returns.
The return type defaults to
    ecore::EString (for terminal and data type rules, cf. below)
    the rule’s name (otherwise)

EX:
    IntOrPar returns Expression: … ;
    terminal INT returns ecore::EInt: … ;

The Xtext framework will create a Java class for each (non-existing) return type.
The parser generated by the Xtext framework will create an instance of such a class whenever applying a rule with such a return type.

Enumeration rules

TermSign and FactorSign are enumerations
    enumerations can be specified by (enumeration) rules

    enum TermSign: PLUS = '+' | PLUS = 'plus' | MINUS = '-' | MINUS = 'minus';

    If the literal (the part after =) is omitted, it is the same as the name
    The first enumeration value is the default one

The Xtext framework will create an enumeration for each enumeration rule.
The parser generated by the Xtext framework will create an enumeration value whenever applying the corresponding enumeration rule.

It is (theoretically) possible to use alternative literals by referencing a value twice (as PLUS and MINUS above).
In practice, Xtext complains.
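In Java terms, the enumeration rule above roughly boils down to one enum constant per name plus a mapping from every literal to its constant, which is what alternative literals would mean if Xtext accepted them. This is a hand-written sketch, not the code Xtext generates (Xtext goes through Ecore); fromLiteral and defaultValue are hypothetical helpers.

```java
import java.util.Map;

// Hand-written sketch of
//   enum TermSign: PLUS = '+' | PLUS = 'plus' | MINUS = '-' | MINUS = 'minus';
// fromLiteral and defaultValue are hypothetical helpers.
enum TermSign {
    PLUS, MINUS; // PLUS is declared first, so it plays the role of the default

    // Alternative literals: two literals per constant.
    private static final Map<String, TermSign> LITERALS = Map.of(
            "+", PLUS, "plus", PLUS,
            "-", MINUS, "minus", MINUS);

    static TermSign fromLiteral(String literal) {
        TermSign sign = LITERALS.get(literal);
        if (sign == null) {
            throw new IllegalArgumentException("Not a TermSign literal: " + literal);
        }
        return sign;
    }

    static TermSign defaultValue() {
        return values()[0]; // the first enumeration value
    }
}
```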


Terminal rules

Terminal and data type rules (cf. below) return ecore::EString by default.

You probably want the following rule to return an integer:

    terminal INT: '0' | '1'..'9' '0'..'9'*;

To this end, you have to
1. declare the return type in the rule
       terminal INT returns ecore::EInt: … ;
2. create a value converter (VC)
3. create a value converter service (VCS)
4. register the VC at the VCS

Creating a value converter

Create a class implementing IValueConverter<X>, i.e., providing the methods

    /* Responsible for the string-to-value conversion */
    X toValue(String, AbstractNode)

    /* Responsible for the value-to-string conversion */
    String toString(X)

    X is the return type of the grammar rule
    ValueConverterExceptions signal conversion errors

IValueConverter and ValueConverterException belong to package org.eclipse.xtext.conversion
AbstractNode belongs to package org.eclipse.xtext.parsetree
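Here is a minimal sketch of a value converter for the INT rule. To keep it compilable without Xtext on the classpath, it uses hypothetical stand-ins, SimpleValueConverter and SimpleValueConverterException, which mirror the shape of IValueConverter<X> and ValueConverterException but drop the AbstractNode parameter of toValue. With Xtext, you would implement IValueConverter<Integer> instead.

```java
// Hypothetical stand-ins for IValueConverter<X> and ValueConverterException,
// so that the sketch compiles without Xtext. The real toValue also receives
// the parse-tree node (AbstractNode); it is omitted here.
interface SimpleValueConverter<X> {
    X toValue(String string);     // string-to-value conversion
    String toString(X value);     // value-to-string conversion
}

class SimpleValueConverterException extends RuntimeException {
    SimpleValueConverterException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Converter for  terminal INT returns ecore::EInt: '0' | '1'..'9' '0'..'9'*;
final class IntConverter implements SimpleValueConverter<Integer> {
    @Override
    public Integer toValue(String string) {
        try {
            return Integer.valueOf(string);
        } catch (NumberFormatException e) {
            // e.g. a literal too large to fit in an int
            throw new SimpleValueConverterException("Not an int: " + string, e);
        }
    }

    @Override
    public String toString(Integer value) {
        if (value == null || value < 0) {
            // the INT rule has no sign, so negative values cannot be written back
            throw new SimpleValueConverterException("Cannot serialize: " + value, null);
        }
        return value.toString();
    }
}
```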

Creating a value converter service

Create a class implementing IValueConverterService
    The easiest way is by extending AbstractDeclarativeValueConverterService
    Extend DefaultTerminalConverters instead if your grammar imports the grammar Terminals (cf. below)

IValueConverterService belongs to package org.eclipse.xtext.conversion
AbstractDeclarativeValueConverterService belongs to package org.eclipse.xtext.conversion.impl
DefaultTerminalConverters belongs to package org.eclipse.xtext.common.services
Terminals belongs to package org.eclipse.xtext.common

Registering VCs at VCSs

Declare as many VCS fields as IValueConverters you need

    @Inject private type name;

    type implements IValueConverter
    name is an arbitrary name

Declare as many VCS methods as grammar rules you handle

    @ValueConverter(rule = "rule")
    public IValueConverter<returnType> rule() {
        return converter;
    }

    rule is the name of the grammar rule
    returnType is the type returned by converter
    converter is the IValueConverter responsible for rule (typically one of the fields declared above)

Inject belongs to package com.google.inject
ValueConverter belongs to package org.eclipse.xtext.conversion
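What AbstractDeclarativeValueConverterService does with the @ValueConverter annotations can be sketched with plain Java reflection: collect the annotated methods, call them to obtain the converters, and look converters up by rule name. Everything below is hypothetical (ConverterFor stands in for @ValueConverter, converters are modelled as Function<String, ?>, and a field initializer replaces @Inject); it illustrates the idea, not Xtext's actual implementation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for @ValueConverter(rule = "...").
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface ConverterFor {
    String rule();
}

// Sketch of a declarative converter service (not Xtext's code).
abstract class DeclarativeConverterService {
    private Map<String, Function<String, ?>> byRule; // built lazily, on first use

    @SuppressWarnings("unchecked")
    private Map<String, Function<String, ?>> converters() {
        if (byRule == null) {
            byRule = new HashMap<>();
            for (Method m : getClass().getMethods()) {
                ConverterFor a = m.getAnnotation(ConverterFor.class);
                if (a == null) continue;
                try {
                    // call the annotated method to obtain the converter
                    byRule.put(a.rule(), (Function<String, ?>) m.invoke(this));
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return byRule;
    }

    Object toValue(String rule, String text) {
        Function<String, ?> converter = converters().get(rule);
        if (converter == null) {
            throw new IllegalArgumentException("No converter for rule " + rule);
        }
        return converter.apply(text);
    }
}

// The part you write: one field per converter, one annotated method per rule.
class MyDslConverters extends DeclarativeConverterService {
    private final Function<String, Integer> intConverter = Integer::valueOf;

    @ConverterFor(rule = "INT")
    public Function<String, Integer> INT() {
        return intConverter;
    }
}
```

The map is built lazily on purpose: while the superclass constructor runs, subclass fields such as intConverter are not initialized yet, so calling the annotated methods from the constructor would register null converters.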

Simple actions

    IntOrPar returns Expression:
        '(' Expression ')' |
        {Integer} value=INT;

The Xtext framework will
    create a class Expression
    create a class Integer (extending Expression)
    add a field value of type ecore::EInt to class Integer

The parser generated by the Xtext framework will
    in the first case, return the created Expression
    in the second case
        create an Integer
        assign the parsed INT to its field value
        return the created Integer

The right-hand side of an assignment can be any of
    a rule call
    a keyword
    a cross-reference (cf. below)
    an alternative of the former

Field assignment

The operator = assigns atomic values to fields
The operator += assigns multiple values to fields

    Pair: values+=Element ',' values+=Element;

The operator ?= assigns boolean values to fields

    Wrapper: isNull?='null' | inner=Wrapped;

The Xtext framework will add
    a list field (with values of the proper type) for each assignment with the += operator
    a boolean field for each assignment with the ?= operator

The parser generated by the Xtext framework will
    add elements to such a list whenever creating the corresponding object
    initialize such a boolean field to false (resp. true) if it does not scan (resp. scans) the assignment’s right-hand side
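The fields implied by the Pair and Wrapper rules can be sketched as plain Java classes, together with a hand-written stand-in for what the parser does when applying the rules. Xtext actually generates EMF classes; AssignmentDemo, Element and Wrapped are hypothetical here (the slides leave the Element and Wrapped rules unspecified, so they just keep their text).

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written analogues (hypothetical) of the fields implied by
//   Pair:    values+=Element ',' values+=Element;
//   Wrapper: isNull?='null' | inner=Wrapped;
class Element {
    final String text;
    Element(String text) { this.text = text; }
}

class Wrapped {
    final String text;
    Wrapped(String text) { this.text = text; }
}

class Pair {
    final List<Element> values = new ArrayList<>(); // '+=' yields a list field
}

class Wrapper {
    boolean isNull;  // '?=' yields a boolean field, false unless 'null' is scanned
    Wrapped inner;   // '=' yields a single-valued field
}

final class AssignmentDemo {
    // Pair: both '+=' assignments append to the same list.
    static Pair parsePair(String input) {
        String[] parts = input.split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected Element ',' Element");
        }
        Pair p = new Pair();
        for (String part : parts) {
            p.values.add(new Element(part.trim()));
        }
        return p;
    }

    // Wrapper: the keyword 'null' sets isNull; anything else goes to inner.
    static Wrapper parseWrapper(String token) {
        Wrapper w = new Wrapper();
        if (token.equals("null")) {
            w.isNull = true;
        } else {
            w.inner = new Wrapped(token);
        }
        return w;
    }
}
```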

Assigned actions

    Expression:
        IntOrPar (
            {Factor.left=current} sign=FactorSign right=IntOrPar
          | {Term.left=current} sign=TermSign right=IntOrPar
        )? ;

The Xtext framework will
    create classes Factor and Term (extending Expression)
    add to them the fields left, sign and right (of the proper type)

The parser generated by the Xtext framework will
    if there is no optional part, return the created Expression
    otherwise
        create a Factor or a Term
        assign the parsed IntOrPar to its field left
        go on as expected
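Putting simple and assigned actions together, the following hand-written Java sketch builds the same kind of tree the rules above describe. It is an illustration, not generated code: the class for {Integer} is called IntLit to avoid clashing with java.lang.Integer, signs are kept as raw keywords instead of FactorSign/TermSign values, and AstBuilder is a hypothetical name. The object built by IntOrPar becomes current and, when the optional part is present, is moved into the left field of a new Factor or Term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hand-written analogues (hypothetical) of the classes derived from the rules.
abstract class Expression { }

final class IntLit extends Expression { int value; }

final class Factor extends Expression { Expression left; String sign; Expression right; }

final class Term extends Expression { Expression left; String sign; Expression right; }

final class AstBuilder {
    private static final List<String> FACTOR_SIGNS = List.of("*", "multiply", "/", "divide");
    private static final List<String> TERM_SIGNS = List.of("+", "plus", "-", "minus");
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private AstBuilder(String input) {
        // Assumed lexing: digit runs, lowercase words, or any other single
        // non-blank character; blanks are skipped.
        Matcher m = Pattern.compile("[0-9]+|[a-z]+|\\S").matcher(input);
        while (m.find()) tokens.add(m.group());
    }

    static Expression parse(String input) {
        AstBuilder b = new AstBuilder(input);
        Expression e = b.expression();
        if (b.pos != b.tokens.size()) throw new IllegalArgumentException("Trailing input");
        return e;
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : ""; }

    // Expression: IntOrPar ( {Factor.left=current} sign=FactorSign right=IntOrPar
    //                      | {Term.left=current} sign=TermSign right=IntOrPar )? ;
    private Expression expression() {
        Expression current = intOrPar();
        if (FACTOR_SIGNS.contains(peek())) {
            Factor f = new Factor();
            f.left = current;             // {Factor.left=current}
            f.sign = tokens.get(pos++);   // sign=FactorSign
            f.right = intOrPar();         // right=IntOrPar
            return f;
        }
        if (TERM_SIGNS.contains(peek())) {
            Term t = new Term();
            t.left = current;             // {Term.left=current}
            t.sign = tokens.get(pos++);   // sign=TermSign
            t.right = intOrPar();         // right=IntOrPar
            return t;
        }
        return current;                   // no optional part
    }

    // IntOrPar returns Expression: '(' Expression ')' | {Integer} value=INT;
    private Expression intOrPar() {
        String t = peek();
        if (t.equals("(")) {
            pos++;
            Expression e = expression();
            if (!peek().equals(")")) throw new IllegalArgumentException("Expected ')'");
            pos++;
            return e;
        }
        if (t.matches("0|[1-9][0-9]*")) {    // INT
            IntLit lit = new IntLit();       // {Integer}
            lit.value = Integer.parseInt(t); // value=INT
            pos++;
            return lit;
        }
        throw new IllegalArgumentException("Unexpected token '" + t + "'");
    }

    // Renders the tree with node names, to inspect its shape.
    static String show(Expression e) {
        if (e instanceof IntLit) return String.valueOf(((IntLit) e).value);
        if (e instanceof Factor) {
            Factor f = (Factor) e;
            return "Factor(" + show(f.left) + " " + f.sign + " " + show(f.right) + ")";
        }
        Term t = (Term) e;
        return "Term(" + show(t.left) + " " + t.sign + " " + show(t.right) + ")";
    }
}
```

For instance, AstBuilder.show(AstBuilder.parse("1 + (2 * (3 - (4 / 5)))")) yields Term(1 + Factor(2 * Term(3 - Factor(4 / 5)))). Note that parentheses leave no trace in the tree: IntOrPar simply returns the inner Expression.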


Hidden tokens

Can be defined at (parser) rule level or grammar level
    Rule-level hidden tokens override grammar-level ones
Are automatically skipped when processing the rule/grammar

EX: Expression returns Expression hidden(WS): … ;

Grammar level: when importing one single grammar, its hidden tokens are reused
Rule level: hidden tokens defined for a calling rule are reused for called rules (unless these define their own hidden tokens)
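Conceptually, hidden tokens are still produced by the lexer; the parser just does not see them while processing the rule. The Java sketch below models that as a filter, plus the override rule for rule-level declarations. HiddenTokens and its methods are hypothetical, and null stands for "this rule declares no hidden(...) clause".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch (hypothetical, not Xtext's implementation): the lexer still emits
// hidden tokens such as WS, but the parser only consumes the visible ones.
final class HiddenTokens {
    static final class Token {
        final String kind;
        final String text;

        Token(String kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    // Hidden tokens in effect for a rule: its own hidden(...) clause if it
    // declares one, otherwise those inherited from the grammar or the caller.
    static Set<String> effectiveHidden(Set<String> inherited, Set<String> declaredByRule) {
        return declaredByRule != null ? declaredByRule : inherited;
    }

    // The tokens the parser actually sees while processing the rule.
    static List<Token> visible(List<Token> all, Set<String> hidden) {
        List<Token> out = new ArrayList<>();
        for (Token t : all) {
            if (!hidden.contains(t.kind)) {
                out.add(t);
            }
        }
        return out;
    }
}
```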

Data type rules

They are parser rules which
    contain neither assignments nor actions
    only call terminal or data type rules
The AST builder simply concatenates the parsed text

Why should we use data type rules instead of terminal rules?
    They allow hidden tokens
    They allow backtracking
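As an illustration of how a data type rule can combine hidden tokens with a plain string value, here is a hand-written Java sketch of a made-up rule QualifiedName hidden(WS): ID ('.' ID)*; It assumes that the skipped WS tokens do not end up in the concatenated value, so "org . eclipse.xtext" yields "org.eclipse.xtext". DataTypeRule is a hypothetical name, and ID is simplified to runs of letters, digits and underscores.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch (hypothetical, not Xtext's code) of the data type rule
//   QualifiedName hidden(WS): ID ('.' ID)*;
// The value is the concatenated text of the consumed, non-hidden tokens.
final class DataTypeRule {
    static String qualifiedName(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; } // hidden(WS): skipped
            if (c == '.') { tokens.add("."); i++; continue; }  // the '.' keyword
            int j = i;
            while (j < input.length()
                    && (Character.isLetterOrDigit(input.charAt(j)) || input.charAt(j) == '_')) {
                j++;
            }
            if (j == i) throw new IllegalArgumentException("Unexpected '" + c + "'");
            tokens.add(input.substring(i, j));                 // simplified ID
            i = j;
        }
        // ID ('.' ID)*: an odd number of tokens, IDs at even positions, dots at odd ones
        if (tokens.size() % 2 == 0) throw new IllegalArgumentException("Not a qualified name");
        StringBuilder value = new StringBuilder();
        for (int k = 0; k < tokens.size(); k++) {
            if (tokens.get(k).equals(".") != (k % 2 == 1)) {
                throw new IllegalArgumentException("Not a qualified name");
            }
            value.append(tokens.get(k));                       // concatenate
        }
        return value.toString();
    }
}
```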

References: Motivation

In a language, the same entity is often referred to over and over
    EX: variables and methods in Java programs
You do not want the AST builder to create a new instance of the entity whenever a reference is found
You rather want the AST builder to point to the entity created at definition time

References: Syntax

    field=[type|rule]

where
    field is the field of the object created by the AST builder which is supposed to refer to an entity
    type is the class of the referred entity
    rule is a grammar rule specifying the string representation of the reference
        if omitted (together with the preceding |), org.eclipse.xtext.common.Terminals.ID is assumed

Notice that
    references can only be used within assignments
    entities of different classes can have the same string representation
    cross-references across file boundaries are supported, as long as the referenced entities are on the classpath

References: (Default) Semantics

In order to be referenceable, entities must have a field name
Reference resolution is based on qualified names
An entity’s qualified name is computed by concatenating
    the qualified name of the entity’s container
    a dot (.)
    the entity’s name
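The default qualified-name computation, and resolution of a reference by looking the name up, can be sketched as follows. NamedEntity and NameIndex are hypothetical classes (Xtext's real machinery goes through its scoping framework), and the sketch assumes every container has a name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: an entity with a name field and an optional container.
final class NamedEntity {
    final String name;
    final NamedEntity container; // null for a top-level entity
    final List<NamedEntity> children = new ArrayList<>();

    NamedEntity(String name, NamedEntity container) {
        this.name = name;
        this.container = container;
        if (container != null) container.children.add(this);
    }

    // container's qualified name + "." + own name
    String qualifiedName() {
        return container == null ? name : container.qualifiedName() + "." + name;
    }
}

final class NameIndex {
    private final Map<String, NamedEntity> byQualifiedName = new HashMap<>();

    // Indexes root and every entity it (transitively) contains.
    void addAll(NamedEntity root) {
        byQualifiedName.put(root.qualifiedName(), root);
        for (NamedEntity child : root.children) addAll(child);
    }

    // Resolves a reference to the entity created at definition time (no new
    // instance is created); returns null if the name cannot be resolved.
    NamedEntity resolve(String qualifiedName) {
        return byQualifiedName.get(qualifiedName);
    }
}
```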


The header of Xtext grammars

Consists of declarations of
    the grammar’s name
and possibly
    imported grammars
    grammar-scoped hidden tokens
    imported Ecore packages
    the Ecore package to generate

The first rule in a grammar (entry rule) is assumed to be its entry point
    i.e., it is the first rule the parser generated by Xtext will try to apply

Name and imported grammars

The grammar’s name
    Xtext grammar names follow Java’s naming conventions
    The grammar file must have the same name as the grammar it contains (and extension .xtext)
    EX: grammar org.eclipse.xtext.Xtext

Imported grammars
    The current grammar can reuse (or override) rules defined in other grammars
    EX: with org.eclipse.xtext.common.Terminals

Hidden tokens and imported packages

Grammar-scoped hidden tokens
    Are declared just like hidden tokens for rules
    EX: hidden(WS)

Imported Ecore packages
    You do not really need to care about them
    Just do not be scared if you see something like
        import "http://www.eclipse.org/emf/2002/Ecore" as ecore

The package to generate
    Among other things, Xtext creates an Ecore package (whatever it is)
    Just keep in mind that
        a name and a namespace URI are required to create an Ecore package
        you must provide Xtext with such data
    EX: generate myDsl "http://www.univStEtienne.fr/mydsl/MyDsl"

Exercise

To test your understanding of XGL, have a look at XGL’s Xtext grammar:
http://dev.eclipse.org/viewcvs/viewvc.cgi/org.eclipse.tmf/org.eclipse.xtext/plugins/org.eclipse.xtext/src/org/eclipse/xtext/Xtext.xtext?root=Modeling_Project&view=markup


Getting Xtext up & running (I)

Install Eclipse
1. Download Eclipse Modeling Tools
       http://www.eclipse.org/downloads/
2. Start Eclipse Modeling Tools
3. Click on Install Modeling Components (the fifth icon from the left on the icon bar right below the menu bar)
4. Select Xtext

Getting Xtext up & running (II)

Create an Xtext project
1. File > New > Project… > Xtext > Xtext project
2. Choose a meaningful project name, language name and file extension
3. Uncheck the Create generator project box
4. Click on Finish
5. Add http://download.itemis.com/antlr-generator-3.0.1.jar to the project’s classpath

Getting Xtext up & running (III)

Generate the language artifacts
1. Replace the content of the automatically opened grammar file with your grammar
2. Locate the file Generate<grammarName>.mwe2 (where <grammarName> is the name of your grammar) next to the grammar file in the package explorer view
3. Choose Run As > MWE2 Workflow from its context menu
4. Possibly add your converters and converter service to the non-UI project
       Add the following method to the class <grammarName>RuntimeModule, where converterService stands for the class name of your value converter service:

       @Override
       public Class<? extends IValueConverterService> bindIValueConverterService() {
           return converterService.class;
       }

Getting Xtext up & running (IV)

Run the generated IDE plug-in
1. Right-click on the Xtext project and choose Run As > Eclipse Application
       This will spawn a new Eclipse workbench
2. Create a new project
3. Create a new file with the file extension you chose in the beginning
       This will open the generated editor
4. Enjoy the editor